All of David Matolcsi's Comments + Replies

I like the main idea of the post. It's important to note though that the setup assumed that we have a bunch of alignment ideas that all have an independent 10% chance of working. Meanwhile, in reality I expect a lot of correlation: there is a decent chance that alignment is easy and a lot of our ideas will work, and a decent chance that it's hard and basically nothing works.

3mattmacdermott
Agreed, this only matters in the regime where some but not all of your ideas will work. But even in alignment-is-easy worlds, I doubt literally everything will work, so testing would still be helpful.
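To make the correlation point concrete, here is a minimal sketch (with made-up illustrative numbers, not figures from the thread) comparing the "independent 10%" model to an "easy world / hard world" model with the same average per-idea success rate; the gap in P(at least one idea works) is what drives how much value testing many ideas adds.

```python
# Toy comparison: independent vs. correlated alignment-idea success.
# All numbers are illustrative assumptions, not claims from the thread.

n_ideas = 10
p_each = 0.10  # marginal chance each idea works

# Independent model: ideas succeed or fail independently.
p_none_indep = (1 - p_each) ** n_ideas
print("P(at least one works), independent:", 1 - p_none_indep)  # ~0.65

# Correlated model: with prob 0.2 alignment is "easy" (each idea works 50%),
# otherwise it is "hard" (each idea works ~0%). Marginal rate is still 10%.
p_easy, p_work_easy, p_work_hard = 0.2, 0.5, 0.0
p_none_corr = (p_easy * (1 - p_work_easy) ** n_ideas
               + (1 - p_easy) * (1 - p_work_hard) ** n_ideas)
print("P(at least one works), correlated:", 1 - p_none_corr)  # ~0.20
```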

Does anyone know of a zinc acetate lozenge that isn't peppermint flavored? I really dislike peppermint, so I'm not sure it would be worth it to drink 5 peppermint-flavored glasses of water a day to decrease the duration of a cold by one day, and I haven't found other zinc acetate lozenge options yet; the acetate version seems to be rare among zinc supplements. (Why?)

1Drake Thomas
Note that the lozenges dissolve slowly, so (bad news) you'd have the taste around for a while but (good news) it's really not a very strong peppermint flavor while it's in your mouth, and in my experience it doesn't really have much of the menthol-triggered cooling effect. My guess is that you would still find it unpleasant, but I think there's a decent chance you won't really mind. I don't know of other zinc acetate brands, but I haven't looked carefully; as of 2019 the claim on this podcast was that only Life Extension brand are any good.
1Lucie Philippon
Earlier discussion on LW on zinc lozenges effectiveness mentioned that other flavorings which make it taste nice actually prevent the zinc effect. From this comment by philh (quite a chain of quotes haha): That's why the peppermint zinc acetate lozenge from Life Extension is the recommended one. So your only other option might be somehow finding unflavored zinc lozenges, which might taste even worse? Not sure where that might be available

Fair, I also haven't made any specific commitments, I phrased it wrongly. I agree there can be extreme scenarios with trillions of digital minds tortured where you'd maybe want to declare war on the rest of society. But I would still like people to write down that "of course, I wouldn't want to destroy Earth before we can save all the people who want to live in their biological bodies, just to get a few years of acceleration in the cosmic conquest". I feel a sentence like this should really have been included in the original post about dismantling the Sun... (read more)

As I explain in more detail in my other comment, I expect market based approaches to not dismantle the Sun anytime soon. I'm interested if you know of any governance structure that you support that you think will probably lead to dismantling the Sun within the next few centuries.

I feel reassured that you don't want to Eat the Earth while there are still biological humans who want to live on it. 

I still maintain that under governance systems I would like, I would expect the outcome to be very conservative with  the solar system in the next thousand years. Like one default governance structure I quite like is to parcel out the Universe equally among the people alive during the Singularity, have a binding constitution on what they can do on their fiefdoms (no torture, etc), and allow them to trade and give away their stuff ... (read more)

I maintain that biological humans will need to do population control at some point. If they decide that enacting the population control in the solar system at a later population level is worth it for them to dismantle the Sun, then they can go for it. My guess is that they won't, and will have population control earlier. 

I think that the coder looking up and saying that the Sun burning is distasteful but the Great Transhumanist Future will come in 20 years, along with a later mention of "the Sun is a battery", together implies that the Sun is getting dismantled in the near future. I guess you can debate how strong the implication is, maybe they just want to dismantle the Sun in the long term, and are currently only using the Sun as a battery in some benign way, but I think that's not the most natural interpretation.

2habryka
I think the 20 years somewhat unambiguously refers to timelines until AGI is built.  Separately, "the sun is a battery" I think also doesn't really imply anything about the sun getting dismantled; if anything it seems to me to imply explicitly that the sun is still intact (and probably surrounded by a Dyson swarm or sphere). 

Yeah, maybe I just got too angry. As we discussed in other comments, I believe that from an astronomical acceleration perspective the real deal is maximizing the initial industrialization of Earth and its surroundings, which does require killing off (and mind uploading) the Amish and everyone else. Sure, if people are only arguing that we should dismantle the Sun and Earth after millennia, that's more acceptable, but I really don't see what's the point then, we can build out our industrial base on Alpha Centauri by then. 

The part that is frustrating to m... (read more)

6Ben Pace
It is good to have deontological commitments about what you would do with a lot of power. But this situation is very different from "a lot of power", it's also "if you were to become wiser and more knowledgeable than anyone in history so far". One can imagine the Christians of old asking for a commitment that "If you get this new scientific and industrial civilization that you want in 2,000 years from now, will you commit to following the teachings of Jesus?" and along the way I sadly find out that even though it seemed like a good and moral commitment at the time, it totally screwed my ability to behave morally in the future because Christianity is necessarily predicated on tons of falsehoods and many of its teachings are immoral. But there is some version of this commitment I think might be good to make... something like "Insofar as the players involved are all biological humans, I will respect the legal structures that exist and the existence of countries, and will not relate to them in ways that would be considered worthy of starting a war in its defense". But I'm not certain about this, for instance what if most countries in the world build 10^10 digital minds and are essentially torturing them? I may well wish to overthrow a country that is primarily torture with a small number of biological humans sitting on thrones on top of these people, and I am not willing to commit not to do that presently. I understand that there are bad ethical things one can do with post-singularity power, but I do not currently see a clear way to commit to certain ethical behaviors that will survive contact with massive increases in knowledge and wisdom. I am interested if anyone has made other commitments about post-singularity life (or "on the cusp of singularity life") that they expect to survive contact with reality? Added: At the very least I can say that I am not going to make commitments to do specific things that violate my current ethics. I have certainly made no positive
2Ben Pace
(Meta: Apologies for running the clock, but it is 1:45am where I am and I'm too sleepy to keep going on this thread, so I'm bowing out for tonight. I want to respond further, but I'm on vacation right now so I do wish to disclaim any expectations of a speedy follow-up.)

I expect non-positional material goods to be basically saturated for Earth people in a good post-Singularity world, so I don't think you can promise them to become twice as rich. And also, people dislike drastic change and new things they don't understand. 20% of the US population refused the potentially life-saving covid vaccine out of distrust of new things they don't understand. Do you think they would happily move to a new planet with artificial sky maintained by supposedly benevolent robots? Maybe you could buy off some percentage of the population if... (read more)

4habryka
Twenty years seems indeed probably too short, though it’s hard to say how post-singularity technology will affect things like public deliberation timelines.  My best guess is 200 years will very likely be enough. I agree with you that there exist some small minority of people who will have a specific attachment to the sun, but most people just want to live good and fulfilling lives, and don’t have strong preferences about whether the sun in the sky is exactly 1 AU away and feels exactly like the sun of 3 generations past. Also, people will already experience extremely drastic change in the 20 years after the singularity, and my sense is marginal cost of change is decreasing, and this isn’t the kind of change that would most affect people’s lived experience.  To be clear, for me it’s a crux whether not dismantling the sun is basically committing everyone who doesn’t want to be uploaded to relative cosmic poverty. It would really suck if all remaining biological humans would be unable to take advantage of the vast majority of the energy in the solar system.  I am not at present compelled that the marginal galaxies are worth destroying the sun and earth for (though I am also not confident it isn’t, I feel confused about it, and also don’t know where most people would end up after having been made available post-singularity intelligence enhancing drugs and deliberation technologies, which to be clear not everyone would use, but most people probably would). 

Are you arguing that if technologically possible, the Sun should be dismantled in the first few decades after the Singularity, as it is implied in the Great Transhumanist Future song, the main thing I'm complaining about here? In that case, I don't know of any remotely just and reasonable (democratic, market-based or other) governance structure that would allow that to happen given how the majority of people feel.

If you are talking about population dynamics, ownership and voting shifting over millennia to the point that they decide to dismantle the Sun, then sure, that's possible, though that's not what I expect to happen, see my other comment on market trades and my reply to Habryka on population dynamics.

2habryka
(It is not implied in the song, to be clear, you seem to have a reading of the lyrics I do not understand.  The song talks about there being a singularity in ~20 years, and separately that the sun is wasteful, but I don’t see any reference to the sun being dismantled in 20 years. For reference, lyrics are here: https://luminousalicorn.tumblr.com/post/175855775830/a-filk-of-big-rock-candy-mountain-one-evening-as) 

You mean that people on Earth and the solar system colonies will have enough biological children, and space travel to other stars for biological people will be hard enough that they will want the resources from dismantling the Sun? I suppose that's possible, though I expect they will put some kind of population control for biological people in place before that happens. I agree that also feels aversive, but at some point it needs to be done anyway, otherwise exponential population growth just brings us back to the Malthusian limit a few tens of thousands of years ... (read more)

3habryka
Someone will live on old earth in your scenario. Unless those people are selected for extreme levels of attachment to specific celestial bodies, as opposed to the function and benefit of those celestial bodies, I don’t see why those people would decide to not replace the sun with a better sun, and also get orders of magnitude richer by doing so. It seems to me that the majority of those inhabitants of old earth would simply be people who don’t want to be uploaded (which is a much more common preference I expect than maintaining the literal sun in the sky) and so have much more limited ability to travel to other solar systems. I don’t see why I would want to condemn most people who don’t want be uploaded to relative cosmic poverty just because a very small minority of people want to keep burning away most of the usable energy in the solar system for historical reasons. 

I agree that not all decisions about the cosmos should be made in a majoritarian democratic way, but I don't see how replacing the Sun with artificial light can be done by market forces under normal property rights. I think you currently would not be allowed to build a giant glass dome around someone's plot of land, and this feels at least that strong. 

I'm broadly sympathetic to having property rights and markets in the post-Singularity future, and probably the people with scope-sensitive and longtermist preferences will be able to buy out the futu... (read more)

3habryka
People don’t generally have strong preferences about celestial objects. I really don’t understand why you think most people care about the sun qua the sun, as opposed to the things the sun provides.  Most people when faced with the choice to be more than twice as rich in new-earth, which they get to visualize and explore using the best of digital VR and sensory technology, with a fake sun indistinguishable for all intends and purposes from the real sun, will of course choose that over the attachment to maintaining that specific ball of plasma in the sky. 

I agree that I don't viscerally feel the loss of the 200 galaxies, and maybe that's a deficiency. But I still find this position crazy. I feel this is a decent parallel dialogue:
Other person: "Here is a something I thought of that would increase health outcomes in the world by 0.00000004%."
Me: "But surely you realize that this measure is horrendously unpopular, and the only way to implement it is through a dictatorial world government."
Other person: "Well yes, I agree it's a hard dilemma, but on absolute terms, 0.00000004% of the world population is 3 peop... (read more)

5Raemon
It sounds like there's actually like 3-5 different object level places where we're talking about slightly different things. I also updated on the practical aspect from Ryan's comment. So, idk here's a bunch of distinct points. 1.  Ryan Greenblatt's comment updated me that the energy requirements here are minimal enough that "eating the sun" isn't really going to come up as a consideration for astronomical waste. (Eating the Earth or most of the solar system seems like it still might be. But, I agree we shouldn't Eat the Earth) 2.  I'd interpreted most past comments for nearterm (i.e. measured in decades) crazy shit to be about building Dyson spheres, not Star Lifting. (i.e. I expected the '20 years from now in some big ol' computer' in the solstice song to be about dyson spheres and voluntary uploads). I think many people will still freak out about Dyson Sphering the sun (not sure if you would). I would personally argue "it's just pretty damn important to Dyson Sphere the sun even if it makes people uncomfortable (while designing it such that Earth still gets enough light)." 3.  I agree in 1000 years it won't much matter whether you Starlift, for astronomical waste reasons. But I do expect in 1000 years, even assuming a maximally consent-oriented / conservative-with-regards-to-bio-human-values, and all around "good" outcome, most people will have shifted to running on computronium and experienced much more than 1000 years of subjective time and their intuitions about what's good will just be real different. There may be small groups of people who continue living in bio-world but most of them will still probably be pretty alien by our lights.  I think I do personally hope they preserve the Earth as sanctuary and/or historical relic. But I think there's a lot of compromises like "starlift a lot of material out of the sun, but move the Earth closer to the sun to compensate" (I haven't looked into the physics here, the details are obviously cruxy).  When I imagi
6Ben Pace
Most decisions are not made democratically, and pointing out that a majoritarian vote is against a decision is no argument that they will not happen nor should not happen. This is true of the vast majority of resource allocation decisions such as how to divvy up physical materials.

Yes, I wanted to argue something like this. 

I think this is a false dilemma. If all human cultures on Earth come to the conclusion in 1000 years that they would like the Sun to be dismantled (which I very much doubt), then sure, we can do that. But at that point, we could already have built awesome industrial bases by dismantling Alpha Centauri, or just building them up by dismantling 0.1% of the Sun that doesn't affect anything on Earth. I doubt that totally dismantling the Sun after centuries would significantly accelerate the time we reach the cosmic event horizon. 

The thing that actually ha... (read more)

6Raemon
I have my own actual best guesses for what happens in reasonably good futures, which I can get into. (I'll flag for now I think "preserve Earth itself for as long as possible" is a reasonable Schelling point that is compatible with many "go otherwise quite fast" plans) Why do you doubt this? (To be clear, it depends on exact details. But my original query was about a 2 year delay. Proxima Centauri is 4 lightyears away.) What is your story for how only taking 0.1% of the sun's energy while we spin up doesn't slow us down by at least 2 years? I have more to say but maybe should wait on your answer to that. Mostly, I think your last comment still had its own missing mood of horror, and/or seemed to be assuming away any tradeoffs. (I am with you on "many rationalists seem gung ho about this in a way I find scary")

I might write a top level post or shortform about this at some point. I find it baffling how casually people talk about dismantling the Sun around here. I recognize that this post makes no normative claim that we should do it, but it doesn't say that it would be bad either, and expects that we will do it even if humanity remains in power. I think we probably won't do it if humanity remains in power, we shouldn't do it, and if humanity disassembles the Sun, it will probably happen for some very bad reason, like a fanatical dictatorship getting in power. ... (read more)

2Said Achmiz
Is this true?! (Do you have a link or something?)

You are putting words in people's mouths to accuse lots of people of wanting to round up the Amish and haul them to extermination camps, and I am disappointed that you would resort to such accusations.

So, I'm with you on "hey guys, uh, this is pretty horrifying, right? Uh, what's with the missing mood about that?".

The issue is that not-eating-the-sun is also horrifying. i.e. see also All Possible Views About Humanity's Future Are Wild. To not eat the sun is to throw away orders of magnitude more resources than anyone has ever thrown away before. Is it percentage-wise "a small fraction of the cosmos?". Sure. But, (quickly checks Claude, which wrote up a fermi code snippet before answering, I can share the work if you want to doublecheck yourself), a two ... (read more)

Without making any normative arguments: if you're in a position (industrially and technologically) to disassemble the sun at all, or build something like a Dyson swarm, then it's probably not too difficult to build an artificial system to light the Earth in such a way as to mimic the sun, and make it look and feel nearly identical to biological humans living on the surface, using less than a billionth of the sun's normal total light output. The details of tides might be tricky, but probably not out of reach.
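A quick back-of-the-envelope check on the "less than a billionth" figure (my own arithmetic, not from the original comment): the Earth only intercepts the solid-angle fraction of the Sun's roughly isotropic output subtended by its disk at 1 AU.

```python
import math

R_earth = 6.371e6  # m, Earth's radius
AU = 1.496e11      # m, Earth-Sun distance

# Fraction of the Sun's output that actually hits Earth: the cross-section of
# Earth's disk divided by the surface area of a sphere of radius 1 AU.
fraction = (math.pi * R_earth**2) / (4 * math.pi * AU**2)
print(f"{fraction:.2e}")  # ~4.5e-10, i.e. under one billionth
```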

9Seth Herd
You're such a traditionalist! More seriously, accusing rationalists of hauling the Amish and their mothers to camps doesn't seem quite fair. Like you said, most rationalists seem pretty nice and aren't proposing involuntary rapid changes. And this post certainly didn't. You'd need to address the actual arguments in play to write a serious post about this. "Don't propose weird stuff" isn't a very good argument. You could argue that went very poorly with communism, or come up with some other argument. Actually I think rationalists have come up with some. It looks to me like the more respected rationalists are pretty cautious about doing weird drastic stuff just because the logic seems correct at the time. See the unilateralist curse and Yudkowsky's and others' pleas that nobody do anything drastic about AGI even though they think it's very likely going to kill us all. This stuff is fun to think about, but it's planning the victory party before planning how to win the war. How to put the future into kind and rational hands seems like an equally interesting and much more urgent project right now. I'd be fine with a pretty traditional utopian future or a very weird one, but not fine with joyless machines eating the sun, or worse yet all of the suns they can reach.

What is an infra-Bayesian Super Mario supposed to mean? I studied infra-Bayes under Vanessa for half a year, and I have no idea what this could possibly mean. I asked Vanessa when this post came out and she also said she can't guess what you might mean by this. Can you explain what this is? It makes me very skeptical that the only part of the plan I know something about seems to be nonsense.

Also, can you give more information or link to a resource on what Davidad's team is currently doing? It looks like they are the best-funded AI safety group that currently exists (except if you count Anthropic), but I never hear about them.

2Quinn
(I'm guessing) super mario might refer to a simulation of the Safeguarded AI / Gatekeeper stack in a videogame. It looks like they're skipping videogames and going straight to cyberphysical systems (1, 2).

I don't agree with everything in this post, but I think it's a true and underappreciated point that "if your friend dies in a random accident, that's actually only a tiny loss according to MWI."

I usually use this point to ask people to retire the old argument that "Religious people don't actually believe in their religion, otherwise they would be no more sad at the death of a loved one than if their loved one sailed to Australia." I think this "should be" true of MWI believers too, and we still feel very sad when a loved one dies in an accident.

I don't think t... (read more)

1Jonah Wilberg
Yes, very much agree with those points. Virtue ethics is another angle to come at the same point that there's a process whereby you internalise system 2 beliefs into system 1. Virtues need to be practised and learned, not just appreciated theoretically. That's why stoicism has been thought of (e.g. by Pierre Hadot) as promoting 'spiritual exercises' rather than systematic philosophy - I draw some further connections to stoicism in the next post in the sequence.

I fixed some parts that were easy to misunderstand. I meant the $500k being the LW hosting + Software subscriptions and the Dedicated software + accounting stuff together. And I didn't mean to imply that the labor cost of the 4 people is $500k, that was a separate term in the costs. 

Is Lighthaven still cheaper if we take into account the initial funding spent on it in 2022 and 2023? I was under the impression that buying Lighthaven is one of the things that made a lot of sense when the community believed it would have access to FTX funding, and once we bought it, i... (read more)

9habryka
Ah yeah, I did misunderstand you there. Makes sense now.  It's tricky because a lot of that is capital investment, and it's extremely unclear what the resell price of Lighthaven would end up being if we ended up trying to sell, since we renovated it in a pretty unconventional way.  Total renovations cost around ~$7M-$8M. About $3.5M of that was funded as part of the mortgage from Jaan Tallinn, and another $1.2M of that was used to buy a property right next to Lighthaven which we are hoping to take out an additional mortgage on (see footnote #3), and which we currently own in full. The remaining ~$3M largely came from SFF and Open Phil funding. We also lost a total of around ~$1.5M in net operating costs so far. Since the property is super hard to value, let's estimate the value of the property after our renovations at our current mortgage value ($20M).[1] During the same time, the Lightcone Offices would have cost around $2M, so if you view the value we provided in the meantime as roughly equivalent, we are out around $2.5M, but also, property prices tend to increase over time at least some amount, so by default we've probably recouped some fraction of that in appreciated property values, and will continue to recoup more as we break even. My honest guess is that Lighthaven would make sense even without FTX, from an ex-post perspective, but that if we hadn't had FTX there wouldn't have been remotely enough risk appetite for it to get funded ex-ante. I think in many worlds Lighthaven turned out much worse than it did (and for example, renovation costs already ended up in the like 85th percentile of my estimates due to much more extensive water and mold damage than I was expecting in the mainline). 1. ^ I think this is a potentially controversial choice, though I think it makes sense. I think most buyers would not be willing to pay remotely as much for the venue as that, since they would basically aim to return the property back to its standard hote
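For readers trying to follow the "~$2.5M out" figure, here is one way to reconstruct the arithmetic from the numbers in the comment (my reading, not an official accounting): the mortgage-funded portions are set against the property's assumed $20M value, so the out-of-pocket items are the ~$3M of grant funding plus the ~$1.5M of operating losses, offset by the ~$2M the Lightcone Offices would have cost anyway.

```python
# Rough reconstruction of the "~$2.5M out" figure (all numbers in $M, as
# quoted in the parent comment; the netting against the mortgage/property
# value is my interpretation of it, not an exact accounting).
grant_funded_renovations = 3.0    # SFF + Open Phil funding
net_operating_losses = 1.5        # net operating losses to date
counterfactual_office_cost = 2.0  # what the Lightcone Offices would have cost

net_out = grant_funded_renovations + net_operating_losses - counterfactual_office_cost
print(net_out)  # 2.5
```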

I donated $1000. Originally I was worried that this is a bottomless money-pit, but looking at the cost breakdown, it's actually very reasonable. If Oliver is right that Lighthaven funds itself apart from the labor cost, then the real costs are $500k for the hosting, software and accounting cost of LessWrong (this is probably an unavoidable cost and seems obviously worthy of being philanthropically funded), plus paying 4 people (equivalent to 65% of 6 people) to work on LW moderation and upkeep (it's an unavoidable cost to have some people working on LW, 4 ... (read more)

Thank you so much!

Some quick comments: 

then the real costs are $500k for the hosting and hosting cost of LessWrong 

Raw server costs for LW are more like ~$120k (and to be clear, you could drive this lower with some engineering, though you would have to pay for that engineering cost). See the relevant line in the budget I posted.

Total labor cost for the ~4 people working on LW is closer to ~$800k, instead of the $500k you mention.

(I'm not super convinced it was a good decision to abandon the old Lightcone offices for Lighthaven, but I guess it mad

... (read more)

I'm considering donating. Can you give us a little more information on the breakdown of the costs? What are the typical large expenses that the $1.6 million upkeep of Lighthaven consists of? Is this a usual cost for a similar-sized event space, or is it something about the location or the specialness of the place that makes it more expensive? 

How much money does running LW cost? The post says it's >$1M, which somewhat surprised me, but I have no idea what the usual cost of running such a site is. Is the cost mostly server hosting, or salaries for content moderation, or salaries for software development, or something I haven't thought of? 

Very reasonable question! Here is a breakdown of our projected budget:

| Type | Cost |
|---|---|
| Core Staff Salaries, Payroll, etc. (6 people) | $1.4M |
| Lighthaven (Upkeep) | |
| Operations & Sales | $240k |
| Repairs & Maintenance Staff | $200k |
| Porterage & Cleaning Staff | $320k |
| Property Tax | $300k |
| Utilities & Internet | $180k |
| Additional Rental Property | $180k |
| Supplies (Food + Maintenance) | $180k |
| Lighthaven Upkeep Total | $1.6M |
| Lighthaven Mortgage | $1M |
| LW Hosting + Software Subscriptions | $120k |
| Dedicated Software + Accounting Staff | $330k |
| Total Costs | $4.45M |
Expected
... (read more)
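A quick consistency check on the budget table above (just summing the listed lines; figures in $k):

```python
# Sanity-check the subtotals in the projected budget table.
lighthaven_upkeep = {
    "Operations & Sales": 240,
    "Repairs & Maintenance Staff": 200,
    "Porterage & Cleaning Staff": 320,
    "Property Tax": 300,
    "Utilities & Internet": 180,
    "Additional Rental Property": 180,
    "Supplies (Food + Maintenance)": 180,
}
print(sum(lighthaven_upkeep.values()))       # 1600 -> $1.6M upkeep total

total = 1400 + 1600 + 1000 + 120 + 330       # staff + upkeep + mortgage + hosting + software/accounting
print(total)                                  # 4450 -> $4.45M total costs
```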

Importantly, the oracle in the story is not making an elementary mistake, I think it's true that it's "probably" in a simulation. (Most of the measure of beings like it are in simulations.) It is also not maximizing reward, it is just honestly reporting what it expects its future observations to be about the President (which is within the simulation). 

I agree with many of the previous commenters, and I acknowledged in the original post, that we don't know how to build such an AI that just honestly reports its probabilities of observables (even if they depend on crazy simulation things), so all of this is hypothetical, but having such a truthful Oracle was the initial assumption of the thought experiment.

Even assuming that the simulators have wildly different values, why would doing something insane be a good thing to do?

0Logan Zoellner
yes

I always assume when thinking about future dangerous models that they have access to some sort of black-box memory. Do we think there is a non-negligible chance that an AI that doesn't have hidden memory, only English-language CoT, will be able to evade our monitoring and execute a rogue deployment? (Not a rhetorical question, there might be a way I haven't thought of.)

So I think that assuming the AI is stateless when thinking about future risk is not a good idea, as I think the vast majority of the risk comes from AIs for which this assumption is not t... (read more)

5Buck
Yes, I do think this. I think the situation looks worse if the AI has hidden memory, but I don't think we're either fine if the model doesn't have it or doomed if it does.

Hm, probably we disagree on something. I'm very confused how to mesh epistemic uncertainty with these "distribution over different Universes" types of probability. When I say "Boltzmann brains are probably very low measure", I mean "I think Boltzmann brains are very low measure, but this is a confusing topic and there might be considerations I haven't thought of and I might be totally mistaken". I think this epistemic uncertainty is distinct from the type of "objective probabilities" I talk about in my post, and I don't really know how to use language with... (read more)

2Noosphere89
This part IMO is a crux, in that I don't truly believe an objective measure/magical reality fluid can exist in the multiverse, if we allow the concept to be sufficiently general, ruining both probability and expected value/utility theory in the process. Heck, in the most general cases, I don't believe any coherent measure exists at all, which basically ruins probability and expected utility theory at the same time.
3Richard_Ngo
The part I was gesturing at wasn't the "probably" but the "low measure" part. Yes, that's a good summary of my position—except that I think that, like with ethics, there will be a bunch of highly-suggestive logical/mathematical facts which make it much more intuitive to choose some priors over others. So the choice of prior will be somewhat arbitrary but not totally arbitrary. I don't think this is a fully satisfactory position yet, it hasn't really dissolved the confusion about why subjective anticipation feels so real, but it feels directionally correct.

I think it only came up once for a friend. I translated it and it makes sense, it just replaces the appropriate English verb with a Chinese one in the middle of a sentence. (I note that this often happens to me too when I talk with my friends in Hungarian: I'm sometimes more used to the English phrase for something, and say one word in English in the middle of the sentence.)

2Michael Roe
As someone who, in a previous job, got to go to a lot of meetings where the European commission is seeking input about standardising or regulating something - humans also often do the thing where they just use the English word in the middle of a sentence in another language, when they can't think what the word is. Often with associated facial expression / body language to indicate to the person they're speaking to "sorry, couldn't think of the right word". Also used by people speaking English, whose first language isn't English, dropping into their own language for a word or two. If you've been the editor of e.g. an ISO standard, fixing these up in the proposed text is such fun.  So, it doesn't surprise me at all that LLMs do this. I have, weirdly, seen LLMs put a single Chinese word in the middle of English text … and consulting a dictionary reveals that it was, in fact, the right word, just in Chinese.

I like your poem on Twitter.

I think that Boltzmann brains in particular are probably very low measure though, at least if you use Solomonoff induction. If you think that weighting observer moments within a Universe by their description complexity is crazy (which I kind of feel), then you need to come up with a different measure on observer moments, but I expect that if we find a satisfying measure, Boltzmann brains will be low measure in that too.
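One way to write down the kind of measure being gestured at here (a sketch of the standard Solomonoff-flavored proposal in my own notation, not something argued for in detail in this thread): an observer moment is weighted by the length of the shortest description that first specifies the universe and then locates the moment within it.

```latex
% Sketch: description-length weighting of observer moments (notation is mine).
% p = shortest program generating the universe, q = shortest "address" locating
% the observer moment o inside that universe's output.
\[
  w(o) \;\propto\; \sum_{(p,q)\ \text{locating }o} 2^{-\ell(p)-\ell(q)}
  \;\approx\; 2^{-K(\text{universe})\,-\,K(o \mid \text{universe})}
\]
% Boltzmann brains arise as rare thermal fluctuations at essentially arbitrary
% spacetime locations, so their shortest "address" q is very long, and w(o)
% comes out tiny -- which is the sense in which they are "low measure" above.
```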

I agree that there's no real answer to "where you are", you are a superposition of beings across the multiverse... (read more)

3Richard_Ngo
Hmmm, uncertain if we disagree. You keep saying that these concepts are cursed and yet phrasing your claims in terms of them anyway (e.g. "probably very low measure"), which suggests that there's some aspect of my response you don't fully believe. In particular, in order for your definition of "what beings are sufficiently similar to you" to not be cursed, you have to be making claims not just about the beings themselves (since many Boltzmann brains are identical to your brain) but rather about the universes that they're in. But this is kinda what I mean by coalitional dynamics: a bunch of different copies of you become more central parts of the "coalition" of your identity based on e.g. the types of impact that they're able to have on the world around them. I think describing this as a metric of similarity is going to be pretty confusing/misleading. You still need a prior over worlds to calculate impacts, which is the cursed part.

I think that pleading total agnosticism towards the simulators' goals is not enough. I write "one common interest of all possible simulators is for us to cede power to an AI whose job is to figure out the distribution of values of possible simulators as best as it can, then serve those values." So I think you need a better reason to guard against being influenced than "I can't know what they want, everything and its opposite is equally likely", because the action proposed above is pretty clearly more favored by the simulators than not doing it. 

Btw, I... (read more)

The reason for agnosticism is that it is no more likely for them to be on one side or the other. As a result, you don't know without evidence who is influencing you. I don't really think this class of Pascal's Wager attack is very logical for this reason - an attack is supposed to influence someone's behavior but I think that without special pleading this can't do that. Non-existent beings have no leverage whatsoever and any rational agent would understand this - even humans do. Even religious beliefs aren't completely evidenceless, the type of evidence ex... (read more)

The simulators can just use a random number generator to generate the events you use in your decision-making. They lose no information by this; your decision based on leaves falling on your face would be uncorrelated with all other decisions anyway from their perspective, so they might as well replace it with a random number generator. (In reality, there might be some hidden correlation between the leaf falling on your face and another leaf falling on someone else's face, as both events are causally downstream of the weather, but given that th... (read more)

I experimented a bunch with DeepSeek today, it seems to be exactly on the same level in high school competition math as o1-preview in my experiments. So I don't think it's benchmark-gaming, at least in math. On the other hand, it's noticeably worse than even the original GPT-4 at understanding a short story I also always test models on.

I think it's also very noteworthy that DeepSeek gives everyone 50 free messages a day (!) with their CoT model, while OpenAI only gives 30 o1-preview messages a week to subscribers. I assume they figured out how to run it m... (read more)

2Joel Burget
The Chinese characters sound potentially worrying. Do they make sense in context? I tried a few questions but didn't see any myself.

Yeah, I really hope they do actually open-weights it because the science of faithful CoT would benefit greatly.

Yes, I agree that we won't get such Oracles by training. As I said, all of this is mostly just a fun thought experiment, and I don't think these arguments have much relevance in the near-term.

I agree that the part where the Oracle can infer from first principles that the aliens' values are probably more common among potential simulators is also speculative. But I expect that superintelligent AIs with access to a lot of compute (so they might run simulations on their own) will in fact be able to infer non-zero information about the distribution of the simulators' values, and that's enough for the argument to go through.

2Noosphere89
I think this is in fact the crux, in that I don't think they can do this in the general case, no matter how much compute is used, and even in the more specific cases, I still expect it to be extremely hard verging on impossible to actually get the distribution, primarily because you get equal evidence for almost every value, for the same reasons as why getting more compute is an instrumentally convergent goal, so you cannot infer the values of basically anyone solely from the fact that you live in a simulation. In the general case, the distribution/probability isn't even well defined at all.

I think that the standard simulation argument is still pretty strong: If the world was like what it looks to be, then probably we could, and plausibly we would, create lots of simulations. Therefore, we are probably in a simulation.

I agree that all the rest, for example the Oracle assuming that most of the simulations it appears in are created for anthropic capture/influencing reasons, are pretty speculative and I have low confidence in them.

4Noosphere89
The boring answer to Solomonoff's malignness is that the simulation hypothesis is true, but we can infer nothing about our universe through it, since the simulation hypothesis predicts everything, and thus is too general a theory.

I'm from Hungary, which is probably politically the closest to Russia among Central European countries, but I don't really know of any significant figure who turned out to be a Russian asset, or any event that seemed like a Russian intelligence operation. (Apart from one of our far-right politicians in the EU Parliament being a Russian spy, which was a really funny event, but it's not like the guy was significantly shaping the national conversation or anything, I don't think many had heard of him before his cover was blown.) What are prominent examples in Czechia or other Central European countries, of Russian assets or operations?

7Viliam
It is difficult to prove things, but I strongly suspect that in Slovakia, Ján Čarnogurský is a Russian asset. In my opinion, the only remaining question is when exactly he was recruited, how long a game was played on us. I have suspected him for a long time, but most people probably would have called me crazy for that; however, recently he became openly pro-Russian, to a great surprise for many of his former supporters. So the question is whether I was right and this was a long con, or whether he had a change of mind recently and my previous suspicions were merely a coincidence (homogeneity of the outgroup, etc.). If this indeed was a long con (maybe, maybe not), then he had a perfect cover story. During communism, he was a lawyer and provided legal support for the anti-Communist opposition. Two years before the fall of communism, he was fired and unemployed. Three months before the fall of communism, he was put in prison. Also, he was strongly religious (perceived as a religious fanatic by some). Remember that Slovakia is a predominantly Catholic country. After the fall of communism he quickly rose to power. He basically represented the opposition to communism, and the comeback of religious freedom. In the 1990s the political scene of Slovakia was basically two camps: those nostalgic for communism, led by Vladimír Mečiar, and those who opposed communism and wanted to join the West, led by Ján Čarnogurský. So we are talking here about the strongest, or the second strongest politician. I remember some weird opinions of his from that era. For example, he talked a lot about how Slovakia should be "a bridge between Russia and the West", and that we should build a broad-gauge railway across Slovakia (i.e. from the Ukrainian border, to the capital city which is on the western end). If anyone else had said that, people would probably suspect them of something, but Čarnogurský's anti-communist credentials were just too perfect, so he stayed above suspicion. (From my per

GPT4 does not engage in the sorts of naive misinterpretations which were discussed in the early days of AI safety. If you ask it for a plan to manufacture paperclips, it doesn't think the best plan would involve converting all the matter in the solar system into paperclips.

 

I'm somewhat surprised by this paragraph. I thought the MIRI position was that they did not in fact predict AIs behaving like this, and the behavior of GPT4 was not an update at all for them. See this comment by Eliezer. I mostly bought that MIRI in fact never worried about AIs goi... (read more)

4abramdemski
I more-or-less agree with Eliezer's comment (to the extent that I have the data necessary to evaluate his words, which is greater than most, but still, I didn't know him in 1996). I have a small beef with his bolded "MIRI is always in every instance" claim, because a universal like that is quite a strong claim, and I would be very unsurprised to find a single counterexample somewhere (particularly if we include every MIRI employee and everything they've ever said while employed at MIRI). What I am trying to say is something looser and more gestalt. I do think what I am saying contains some disagreement with some spirit-of-MIRI, and possibly some specific others at MIRI, such that I could say I've updated on the modern progress of AI in a different way than they have. For example, in my update, the modern progress of LLMs points towards the Paul side of some Eliezer-Paul debates. (I would have to think harder about how to spell out exactly which Eliezer-Paul debates.) One thing I can say is that I myself often argued using "naive misinterpretation"-like cases such as the paperclip example. However, I was also very aware of the Eliezer-meme "the AI will understand what the humans mean, it just won't care". I would have predicted difficulty in building a system which correctly interprets and correctly cares about human requests to the extent that GPT4 does. This does not mean that AI safety is easy, or that it is solved; only that it is easier than I anticipated at this particular level of capability. Getting more specific to what I wrote in the post: My claim is that modern LLMs are "doing roughly what they seem like they are doing" and "internalize human intuitive concepts". This does include some kind of claim that these systems are more-or-less ethical (they appear to be trying to be helpful and friendly, therefore they "roughly are").  The reason I don't think this contradicts with Eliezer's bolded claim ("Getting a shape into the AI's preferences is differ

"Misinterpretation" is somewhat ambiguous. It either means not correctly interpreting the intent of an instruction (and therefore also not acting on that intent) or correctly understanding the intent of the instruction while still acting on a different interpretation. The latter is presumably what the outcome pump was assumed to do. LLMs can apparently both understand and act on instructions pretty well. The latter was not at all clear in the past.

I agree that if alignment is in fact philosophically and conceptually difficult, the AI can sandbag on that to some extent. Though I have some hope that the builder-breaker approach helps here. We train AIs to produce ideas that are at least as superficially plausible sounding as the things produced by the best alignment researchers. I think this is a number-go-up task, where we can train the AI to do well. Then we train an AI to point out convincing counter-arguments to the superficially plausible sounding ideas. This seems similarly trainable. I think it... (read more)
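Since the builder-breaker setup here is described only in prose, here is a minimal pseudocode-style sketch of the loop as I read it (the objects and method names are hypothetical placeholders, not any real training API):

```python
# Minimal sketch of the builder-breaker scheme described above; "builder",
# "breaker", and "judge" and their methods are hypothetical placeholders.

def builder_breaker_round(builder, breaker, judge, prompt):
    idea = builder.generate(prompt)      # candidate alignment idea
    counter = breaker.generate(idea)     # strongest counter-argument found

    # Two separate "number go up" signals, as described in the comment:
    plausibility = judge.score_plausibility(idea)
    convincingness = judge.score_counterargument(idea, counter)

    builder.reinforce(reward=plausibility)    # train builder on plausibility
    breaker.reinforce(reward=convincingness)  # train breaker on critique strength
    return idea, counter
```

The key assumption, as the comment itself notes, is that both "superficial plausibility" and "convincingness of counter-arguments" are trainable, number-go-up signals.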

Here is the promised comment on what kind of "commitment" I want to make given all the responses. 

I agree with Buck that no one should make very direct commitment about this sort of thing, as there might be blackmail related scary things lurking in the shadows when one does acausal trade. I think we will probably figure out how to handle that, but we shouldn't make any strong promises of specific actions until we figure that out. 

However, the promise I'm intending to keep is that if humanity wins and I'm alive to see it, I will remember how scary... (read more)

Thanks to Nate for conceding this point. 

I still think that other than just buying freedom for doomed aliens, we should run some non-evolved simulations of our own with inhabitants that are preferably p-zombies or animated by outside actors. If we can do this in a way that the AI doesn't notice it's in a simulation (I think this should be doable), this will provide evidence to the AI that civilizations do this simulation game (and not just the alien-buying) in general, and this buys us some safety in worlds where the AI eventually notices there are n... (read more)

We are still talking past each other, I think we should either bet or finish the discussion here and call it a day.

I really don't get what you are trying to say here, most of it feels like a non sequitur to me. I feel hopeless that either of us manages to convince the other this way. All of this is not a super important topic, but I'm frustrated enough to offer a bet of $100: we select one or three judges we both trust (I have some proposed names, we can discuss in private messages), show them either this comment thread or a four-paragraph summary of our views, and they can decide who is right. (I still think I'm clearly right in this particular discussion.)

Otherwise, I think it's better to finish this conversation here.

2So8res
I'm happy to stake $100 that, conditional on us agreeing on three judges and banging out the terms, a majority will agree with me about the contents of the spoilered comment.

I think this is mistaken. In one case, you need to point out the branch, planet Earth within our Universe, and the time and place of the AI on Earth. In the other case, you need to point out the branch, the planet on which a server is running the simulation, and the time and place of the AI on the simulated Earth. Seems equally long to me. 

If necessary, we can let physical biological life emerge on the faraway planet and develop AI while we are observing them from space. This should make it clear that Solomonoff doesn't favor the AI being on Earth instead of this random other planet. But I'm pretty certain that the sim being run on a computer doesn't make any difference.

2So8res
If the simulators have only one simulation to run, sure. The trouble is that the simulators have 2^N simulations they could run, and so the "other case" requires N additional bits (where N is the crossent between the simulators' distribution over UFAIs and physics' distribution over UFAIs). Consider the gas example again. If you have gas that was compressed into the corner a long time ago and has long since expanded to fill the chamber, it's easy to put a plausible distribution on the chamber, but that distribution is going to have way, way more entropy than the distribution given by physical law (which has only as much entropy as the initial configuration). (Do we agree this far?) It doesn't help very much to say "fine, instead of sampling from a distribution on the gas particles now, I'll sample from a distribution on the gas particles 10 minutes ago, where they were slightly more compressed, and run a whole ten minutes' worth of simulation". Your entropy is still through the roof. You've got to simulate basically from the beginning, if you want an entropy anywhere near the entropy of physical law. Assuming the analogy holds, you'd have to basically start your simulation from the big bang, if you want an entropy anywhere near as low as starting from the big bang. ---------------------------------------- Using AIs from other evolved aliens is an idea, let's think it through. The idea, as I understand it, is that in branches where we win we somehow mask our presence as we expand, and then we go to planets with evolved life and watch until they cough up a UFAI, and then if the UFAI kills the aliens we shut it down and are like "no resources for you", and if the UFAI gives its aliens a cute epilog we're like "thank you, here's a consolation star". To simplify this plan a little bit, you don't even need to hide yourself, nor win the race! Surviving humans can just go to every UFAI that they meet and be like "hey, did you save us a copy of your progenitors? If so,

"AI with a good prior should be able to tell whether it's the kind of AI that would actually exist in base reality, or the kind of AI that would only exist in a simulation" seems pretty clearly false, we assumed that our superintelligent descendants create sims where the AIs can't tell if it's a sim, that seems easy enough. I don't see why it would be hard to create AIs that can't tell based on introspection whether it's more likely that their thought process arises in reality or in sims. In the worst case, our sims can be literal reruns of biological evolution on physical planets (though we really need to figure out how to do that ethically).  Nate seems to agree with me on this point?

3habryka
(I think I agree with you. I wasn't thinking super hard about the full context of the conversation. I was just intrigued by Nate's challenge. I don't really think engaging with my comment is going to be a good use of your time)

I think this is wrong. The AI has a similarly hard time to the simulators figuring out what's a plausible configuration to arise from the big bang. Like the simulators have an entropy N distribution of possible AIs, the AI itself also has an entropy N distribution for that. So its probability that it's in a real Everett branch is not p, but p times 2^-N, as it has only a 2^-N prior probability that the kind of world it observes is the kind of thing that can come up in a real Everett branch. So it's balanced out with the simulation hypothesis, and as long a... (read more)
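For readers trying to parse this, here is my attempt to write the claim of this comment as a calculation (a sketch of the argument as I read it, not an endorsement of either side): the same factor of 2^-N shows up on both the "real branch" and "simulation" hypotheses, so it cancels in the ratio.

```latex
% Sketch of the "balancing" claim in the parent comment (my notation).
% w = the specific world/AI-configuration actually observed,
% N = entropy (in bits) of the relevant distribution over such configurations,
% p = prior weight on "real Everett branch",
% s = prior weight on "a simulator runs a sim drawn from a similar entropy-N distribution".
\[
  P(\text{real} \wedge w) \approx p \cdot 2^{-N},
  \qquad
  P(\text{sim} \wedge w) \approx s \cdot 2^{-N}
\]
\[
  \frac{P(\text{sim} \mid w)}{P(\text{real} \mid w)}
  \approx \frac{s \cdot 2^{-N}}{p \cdot 2^{-N}}
  = \frac{s}{p}
\]
% i.e. on this reading the 2^{-N} penalty does not favor "real" over "sim";
% the disagreement downthread is about whether the two factors really match.
```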

6dxu
If I imagine the AI as a Solomonoff inductor, this argument looks straightforwardly wrong to me: of the programs that reproduce (or assign high probability to, in the setting where programs produce probabilistic predictions of observations) the AI's observations, some of these will do so by modeling a branching quantum multiverse and sampling appropriately from one of the branches, and some of them will do so by modeling a branching quantum multiverse, sampling from a branch that contains an intergalactic spacefaring civilization, locating a specific simulation within that branch, and sampling appropriately from within that simulation. Programs of the second kind will naturally have higher description complexity than programs of the first kind; both kinds feature a prefix that computes and samples from the quantum multiverse, but only the second kind carries out the additional step of locating and sampling from a nested simulation. (You might object on the grounds that there are more programs of the second kind than of the first kind, and the probability that the AI is in a simulation at all requires summing over all such programs, but this has to be balanced against the fact most if not all of these programs will be sampling from branches much later in time than programs of the first type, and will hence be sampling from a quantum multiverse with exponentially more branches; and not all of these branches will contain spacefaring civilizations, or spacefaring civilizations interested in running ancestor simulations, or spacefaring civilizations interested in running ancestor simulations who happen to be running a simulation that exactly reproduces the AI's observations. So this counter-counterargument doesn't work, either.)

I still don't get what you are trying to say. Suppose there is no multiverse. There are just two AIs, one in a simulation run by aliens in another galaxy, one is in base reality. They are both smart, but they are not copies of each other, one is a paperclip maximizer, the other is a corkscrew maximizer, and there are various other differences in their code and life history. The world in the sim is also very different from the real world in various ways, but you still can't determine if you are in the sim while you are in it. Both AIs are told by God that th... (read more)

4So8res
My answer is in spoilers, in case anyone else wants to answer and tell me (on their honor) that their answer is independent from mine, which will hopefully erode my belief that most folk outside MIRI have a really difficult time fielding wacky decision theory Qs correctly.

I think I mostly understand the other parts of your arguments, but I still fail to understand this one. When I'm running the simulations, as originally described in the post, I think that should be in a fundamental sense equivalent to acausal trade. But how do you translate your objection to the original framework where we run the sims? The only thing we need there is that the AI can't distinguish sims from base reality, so it thinks it's more likely to be in a sim, as there are more sims. 

Sure, if the AI can model the distribution of real Universes m... (read more)

The only thing we need there is that the AI can't distinguish sims from base reality, so it thinks it's more likely to be in a sim, as there are more sims.

I don't think this part does any work, as I touched on elsewhere. An AI that cares about the outer world doesn't care how many instances are in sims versus reality (and considers this fact to be under its control much moreso than yours, to boot). An AI that cares about instantiation-weighted experience considers your offer to be a technical-threat and ignores you. (Your reasons to make the offer would... (read more)

Yeah, I agree, and I don't know that much about OpenPhil's policy work, and their fieldbuilding seems decent to me, though maybe not from your perspective. I just wanted to flag that many people (including myself until recently) overestimate how big a funder OP is in technical AI safety, and I think it's important to flag that they actually have pretty limited scope in this area.

5habryka
Yep, agree that this is a commonly overlooked aspect (and one that I think sadly has also contributed to the dominant force in AI Safety researchers becoming the labs, which I think has been quite sad).

Isn't it just the case that OpenPhil just generally doesn't fund that many technical AI safety things these days? If you look at OP's team on their website, they have only two technical AI safety grantmakers. Also, you list all the things OP doesn't fund, but what are the things in technical AI safety that they do fund? Looking at their grants, it's mostly MATS and METR and Apollo and FAR and some scattered academics I mostly haven't heard of. It's not that many things. I have the impression that the story is less like "OP is a major funder in technical AI... (read more)

A lot of OP's funding to technical AI safety goes to people outside the main x-risk community (e.g. applications to Ajeya's RFPs).

Open Phil is definitely by far the biggest funder in the field.  I agree that their technical grantmaking has been limited over the past few years (though still on the order of $50M/yr, I think), but they also fund a huge amount of field-building and talent-funnel work, as well as a lot of policy stuff (I wasn't constraining myself to technical AI Safety; the people listed have been as influential, if not more, on public discourse and policy). 

AI Safety is still relatively small, but more like $400M/yr small. The primary other employers/funders in the space these days are big capability labs. As you can imagine, their funding does not have great incentives either.

I argue that right now, starting from the present state, the true quantum probability of achieving the Glorious Future is way higher than 2^-75, or if not, then we should probably work on something other than AI safety. Me and Ryan argue for this in the last few comments. It's not a terribly important point, you can just say the true quantum probability is 1 in a billion, when it's still worth it for you to work on the problem, but it becomes rough to trade for keeping humanity physically alive if that can cause one year of delay to the AI.

 But I would li... (read more)

5So8res
Starting from now? I agree that that's true in some worlds that I consider plausible, at least, and I agree that worlds whose survival-probabilities are sensitive to my choices are the ones that render my choices meaningful (regardless of how determinisic they are). Conditional on Earth being utterly doomed, are we (today) fewer than 75 qbitflips from being in a good state? I'm not sure, it probably varies across the doomed worlds where I have decent amounts of subjective probability. It depends how much time we have on the clock, depends where the points of no-return are. I haven't thought about this a ton. My best guess is it would take more than 75 qbitflips to save us now, but maybe I'm not thinking creatively enough about how to spend them, and I haven't thought about it in detail and expect I'd be sensitive to argument about it /shrug. (If you start from 50 years ago? Very likely! 75 bits is a lot of population rerolls. If you start after people hear the thunder of the self-replicating factories barrelling towards them, and wait until the very last moments that they would consider becoming a distinct person who is about to die from AI, and who wishes to draw upon your reassurance that they will be saved? Very likely not! Those people look very, very dead.) One possible point of miscommunication is that when I said something like "obviously it's worse than 2^-75 at the extreme where it's actually them who is supposed to survive" was intended to apply to the sort of person who has seen the skies darken and has heard the thunder, rather than the version of them that exists here in 2024. This was not intended to be some bold or suprising claim. It was an attempt to establish an obvious basepoint at one very extreme end of a spectrum, that we could start interpolating from (asking questions like "how far back from there are the points of no return?" and "how much more entropy would they have than god, if people from that branchpoint spent stars trying to figure
3Ben Pace
I have not followed this thread in all of its detail, but it sounds like it might be getting caught up on the difference between the underlying ratio of different quantum worlds (which can be expressed as a probability over one's future) and one's probabilistic uncertainty over the underlying ratio of different quantum worlds (which can also be expressed as a probability over the future but does not seem to me to have the same implications for behavior). Insofar as it seems to readers like a bad idea to optimize for different outcomes in a deterministic universe, I recommend reading the Free Will (Solution) sequence by Eliezer Yudkowsky, which I found fairly convincing on the matter of why it's still right to optimize in a fully deterministic universe, as well as in a universe running on quantum mechanics (interpreted to have many worlds).