All of So8res's Comments + Replies

So8res20

I agree that in real life the entropy argument is an argument in favor of it being actually pretty hard to fool a superintelligence into thinking it might be early in Tegmark III when it's not (even if you yourself are a superintelligence, unless you're doing a huge amount of intercepting its internal sanity checks (which puts significant strain on the trade possibilities and which flirts with being a technical-threat)). And I agree that if you can't fool a superintelligence into thinking it might be early in Tegmark III when it's not, then the purchasing ... (read more)

So8res192

Dávid graciously proposed a bet, and while we were attempting to bang out details, he convinced me of two points:

The entropy of the simulators’ distribution need not be more than the entropy of the (square of the) wave function in any relevant sense. Despite the fact that subjective entropy may be huge, physical entropy is still low (because the simulations happen on a high-amplitude ridge of the wave function, after all). Furthermore, in the limit, simulators could probably just keep an eye out for local evolved life forms in their domain and wait until o... (read more)

4dxu
I think I might be missing something, because the argument you attribute to Dávid still looks wrong to me. You say: Doesn't this argument imply that the supermajority of simulations within the simulators' subjective distribution over universe histories are not instantiated anywhere within the quantum multiverse? I think it does. And, if you accept this, then (unless for some reason you think the simulators' choice of which histories to instantiate is biased towards histories that correspond to other "high-amplitude ridges" of the wave function, which makes no sense because any such bias should have already been encoded within the simulators' subjective distribution over universe histories) you should also expect, a priori, that the simulations instantiated by the simulators should not be indistinguishable from physical reality, because such simulations comprise a vanishingly small proportion of the simulators' subjective probability distribution over universe histories. What this in turn means, however, is that prior to observation, a Solomonoff inductor (SI) must spread out much of its own subjective probability mass across hypotheses that predict finding itself within a noticeably simulated environment. Those are among the possibilities it must take into account—meaning, if you stipulate that it doesn't find itself in an environment corresponding to any of those hypotheses, you've ruled out all of the "high-amplitude ridges" corresponding to instantiated simulations in the crossent of the simulators' subjective distribution and reality's distribution. We can make this very stark: suppose our SI finds itself in an environment which, according to its prior over the quantum multiverse, corresponds to one high-amplitude ridge of the physical wave function, and zero high-amplitude ridges containing simulators that happened to instantiate that exact environment (either because no branches of the quantum multiverse happened to give rise to simulators that would have

Thanks to Nate for conceding this point. 

I still think that other than just buying freedom for doomed aliens, we should run some non-evolved simulations of our own with inhabitants that are preferably p-zombies or animated by outside actors. If we can do this in such a way that the AI doesn't notice it's in a simulation (I think this should be doable), this will provide evidence to the AI that civilizations do this simulation game (and not just the alien-buying) in general, and this buys us some safety in worlds where the AI eventually notices there are n... (read more)

So8res20

I'm happy to stake $100 that, conditional on us agreeing on three judges and banging out the terms, a majority will agree with me about the contents of the spoilered comment.

1David Matolcsi
Cool, I'll send you a private message.
So8res20

If the simulators have only one simulation to run, sure. The trouble is that the simulators have 2^N simulations they could run, and so the "other case" requires additional bits (where N is the crossent between the simulators' distribution over UFAIs and physics' distribution over UFAIs).

If necessary, we can let physical biological life emerge on the faraway planet and develop AI while we are observing them from space.

Consider the gas example again.

If you have gas that was compressed into the corner a long time ago and has long since expanded to f... (read more)
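For concreteness, here is one way to make the "additional bits" bookkeeping from the comment above explicit. This is only a sketch: every number is an illustrative placeholder, and the comment itself leaves the quantities unspecified.

```python
# Sketch: how a cross-entropy penalty trades off against the raw count of simulations.
# All numbers are illustrative placeholders.
import math

n_sims_run = 2 ** 20     # placeholder: how many simulations the simulators actually run
crossent_bits = 30       # placeholder N: crossent (in bits) between the simulators'
                         # distribution over UFAIs and physics' distribution

# A naive count favors "I'm simulated" by the number of sims run...
naive_odds_sim = float(n_sims_run)
# ...but each sim pays a 2^-N penalty for being drawn from the wrong distribution:
adjusted_odds_sim = n_sims_run * 2.0 ** (-crossent_bits)

print(math.log2(naive_odds_sim))     # 20.0 bits in favor of the simulation hypothesis
print(math.log2(adjusted_odds_sim))  # -10.0 bits: here the penalty outweighs the sim count
```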

3David Matolcsi
We are still talking past each other; I think we should either bet or finish the discussion here and call it a day.
So8res*40

I basically endorse @dxu here.

Fleshing out the argument a bit more: the part where the AI looks around this universe and concludes it's almost certainly either in basement reality or in some simulation (rather than in the void between branches) is doing quite a lot of heavy lifting.

You might protest that neither we nor the AI have the power to verify that our branch actually has high amplitude inherited from some very low-entropy state such as the big bang, as a Solomonoff inductor would. What's the justification for inferring from the observation that we ... (read more)

3David Matolcsi
I really don't get what you are trying to say here; most of it feels like a non sequitur to me. I feel hopeless about either of us managing to convince the other this way. All of this is not a super important topic, but I'm frustrated enough to offer a bet of $100: we select one or three judges we both trust (I have some proposed names, we can discuss in private messages), show them either this comment thread or a four-paragraph summary of our views, and they can decide who is right. (I still think I'm clearly right in this particular discussion.) Otherwise, I think it's better to finish this conversation here.
So8res30

seems to me to have all the components of a right answer! ...and some of a wrong answer. (we can safely assume that the future civ discards all the AIs that can tell they're simulated a priori; that's an easy tell.)

I'm heartened somewhat by your parenthetical pointing out that the AI's prior on simulation is low on account of there being too many AIs for simulators to simulate, which I see as the crux of the matter.

4habryka
Yeah, that's fair. It seemed more relevant to this specific hypothetical. I wasn't really answering the question in its proper context and wasn't applying steelmans or adjustments based on the actual full context of the conversation (and wouldn't have written a comment without doing so, but was intrigued by your challenge).
So8res41

My answer is in spoilers, in case anyone else wants to answer and tell me (on their honor) that their answer is independent from mine, which will hopefully erode my belief that most folk outside MIRI have a really difficult time fielding wacky decision theory Qs correctly.

The sleight of hand is at the point where God tells both AIs that they're the only AIs (and insinuates that they have comparable degree).

Consider an AI that looks around and sees that it sure seems to be somewhere in Tegmark III. The hypothesis "I am in the basement of some branch that

... (read more)
[This comment is no longer endorsed by its author]
1Joachim Bartosik
I'll try. TL;DR: I expect the AI to not buy the message (unless it also thinks it's the one in the simulation; then it likely follows the instruction because duh). The glaring issue (with actually using the method) to me is that I don't see a way to deliver the message in a way that:
  • results in the AI believing the message, and
  • doesn't result in the AI believing there already is a powerful entity in their universe.
If "god tells" the AI the message, then there is a god in their universe. Maybe the AI will decide to do what it's told. But I don't think we can have Hermes deliver the message to any AIs which consider killing us. If the AI reads the message in its training set or gets the message in a similarly mundane way, I expect it will mostly ignore it; there is a lot of nonsense out there.
----------------------------------------
I can imagine that for the thought experiment you could send a message that could be trusted from a place from which light barely manages to reach the AI but a slower-than-light expansion wouldn't (so the message can be trusted but the AI mostly doesn't have to worry about the sender of the message directly interfering with its affairs). I guess the AI wouldn't trust the message. It might be possible to convince it that there is a powerful entity (simulating it or half a universe away) sending the message. But then I think it's way more likely that it's in a simulation (I mean, that's an awful coincidence with the distance, and also they're spending a lot more than 10 planets' worth to send a message over that distance...).
3David Matolcsi
I think this is wrong. The AI has a similarly hard time to the simulators in figuring out what's a plausible configuration to arise from the big bang. Just as the simulators have an entropy-N distribution over possible AIs, the AI itself also has an entropy-N distribution over that. So its probability that it's in a real Everett branch is not p, but p times 2^-N, as it has only a 2^-N prior probability that the kind of world it observes is the kind of thing that can come up in a real Everett branch. So it's balanced out with the simulation hypothesis, and as long as the simulators are spending more planets, that hypothesis wins.

I think it's also simple to see from this: Suppose evolved life aligns AI in 50% of Everett branches. Every time they do, they run 2 ancestor sims where the AI takes over. Suppose an AI comes online and the only things it knows about the world are that it's a young AI that was born onto a one-planet civilization, and it somehow also knows that evolved life wins 50% of the time and runs 2 sims (suppose this is somehow easy to estimate from first principles). Based on just this information, what should the AI's probability be that it's in a simulation? I think pretty clearly 2/3.

Actually the AI has a lot more information than that. It knows that the planet's gravity is 9.8 m/s^2, the evolved sapient species has two eyes, the AI's creator is called Sam Altman, etc. Some of this information might actually be very implausible in base reality, but the AI doesn't know that, as it can't distinguish base reality from sims, so it incurs one bit of surprise for every new random fact, both in base reality and in simulations. So overall it shouldn't update on all the random facts it observes, and should keep believing it has a 2/3 chance of being in a sim.
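A minimal sketch of the branch-counting arithmetic above. The 50% alignment rate and 2-sims-per-win figures are the comment's own hypotheticals; the "40 random facts" figure is an arbitrary placeholder.

```python
# Worked version of the hypothetical: evolved life aligns AI in 50% of branches,
# and each aligned branch runs 2 ancestor simulations in which an unaligned AI
# takes over.
p_align = 0.5          # fraction of branches where alignment succeeds (hypothetical)
sims_per_win = 2       # simulations run per successful branch (hypothetical)

base_reality_copies = 1 - p_align           # measure of unaligned AIs in base reality
simulated_copies = p_align * sims_per_win   # measure of unaligned AIs inside simulations

p_sim = simulated_copies / (simulated_copies + base_reality_copies)
print(p_sim)  # 0.666..., i.e. the 2/3 claimed in the comment

# The further claim: each observed random fact (gravity, the creator's name, ...)
# is roughly equally surprising in base reality and in a simulation, so the
# likelihood ratio is ~1 and the 2/3 posterior is unchanged.
surprise_bits = 40     # arbitrary placeholder for the number of random observed bits
likelihood_base = 2.0 ** -surprise_bits
likelihood_sim = 2.0 ** -surprise_bits
posterior = (p_sim * likelihood_sim) / (p_sim * likelihood_sim + (1 - p_sim) * likelihood_base)
print(posterior)  # still ~0.667
```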
4habryka
This was close to the answer I was going to give. Or more concretely, I would have said (this was written after seeing your answer, but I think it is reasonably close to what I would have said independently) 
So8res*122

The only thing we need there is that the AI can't distinguish sims from base reality, so it thinks it's more likely to be in a sim, as there are more sims.

I don't think this part does any work, as I touched on elsewhere. An AI that cares about the outer world doesn't care how many instances are in sims versus reality (and considers this fact to be under its control much moreso than yours, to boot). An AI that cares about instantiation-weighted experience considers your offer to be a technical-threat and ignores you. (Your reasons to make the offer would... (read more)

3David Matolcsi
I still don't get what you are trying to say. Suppose there is no multiverse. There are just two AIs: one in a simulation run by aliens in another galaxy, one in base reality. They are both smart, but they are not copies of each other; one is a paperclip maximizer, the other is a corkscrew maximizer, and there are various other differences in their code and life history. The world in the sim is also very different from the real world in various ways, but you still can't determine if you are in the sim while you are in it. Both AIs are told by God that they are the only two AIs in the Universe, and one is in a sim, and if the one in the sim gives up on one simulated planet, it gets 10 in the real world, while if the AI in base reality gives up on a planet, it just loses that one planet and nothing else happens. What will the AIs do? I expect that both of them will give up a planet.

For the aliens to "trade" with the AI in base reality, they didn't need to create an actual copy of the real AI and offer it what it wants. The AI they simulated was in many ways totally different from the original, and the trade still went through. The only thing needed was that the AI in the sim can't figure out that it's in a sim. So I don't understand why it is relevant that our superintelligent descendants won't be able to get the real distribution of AIs right; I think the trade still goes through even if they create totally different sims, as long as no one can tell where they are. And I think none of it is a threat: I try to deal with paperclip maximizers here, not instance-weighted experience maximizers, and I never threaten to destroy paperclips or corkscrews.
So8res42

One complication that I mentioned in another thread but not this one (IIRC) is the question of how much more entropy there is in a distant trade partner's model of Tegmark III (after spending whatever resources they allocate) than there is entropy in the actual (squared) wave function, or at least how much more entropy there is in the parts of the model that pertain to which civilizations fall.

In other words: how hard is it for distant trade partners to figure out that it was us who died, rather than some other plausible-looking human civilization that doe... (read more)

[This comment is no longer endorsed by its author]
1David Matolcsi
I think I mostly understand the other parts of your arguments, but I still fail to understand this one. When I'm running the simulations, as originally described in the post, I think that should be in a fundamental sense equivalent to acausal trade. But how do you translate your objection to the original framework where we run the sims? The only thing we need there is that the AI can't distinguish sims from base reality, so it thinks it's more likely to be in a sim, as there are more sims.

Sure, if the AI can model the distribution of real Universes much better than we do, we are in trouble, because it can figure out if the world it sees falls into the real distribution or the mistaken distribution the humans are creating. But I see no reason why the unaligned AI, especially a young unaligned AI, would know the distribution of real Universes better than our superintelligent friends in the intergalactic future. So I don't really see how we can translate your objection to the simulation framework, and consequently I think it's wrong in the acausal trade framework too (as I think they are equivalent). I think I can try to write an explanation of why this objection is wrong in the acausal trade framework, but it would be long and confusing to me too. So I'm more interested in how you translate your objection to the simulation framework.
So8res52

Starting from now? I agree that that's true in some worlds that I consider plausible, at least, and I agree that worlds whose survival-probabilities are sensitive to my choices are the ones that render my choices meaningful (regardless of how deterministic they are).

Conditional on Earth being utterly doomed, are we (today) fewer than 75 qbitflips from being in a good state? I'm not sure, it probably varies across the doomed worlds where I have decent amounts of subjective probability. It depends how much time we have on the clock, depends where the points o... (read more)

So8res20

What are you trying to argue? (I don't currently know what position y'all think I have or what position you're arguing for. Taking a shot in the dark: I agree that quantum bitflips have loads more influence on the outcome the earlier in time they are.)

3David Matolcsi
I argue that right now, starting from the present state, the true quantum probability of achieving the Glorious Future is way higher than 2^-75, or if not, then we should probably work on something other than AI safety. Ryan and I argue for this in the last few comments. It's not a terribly important point: you can just say the true quantum probability is 1 in a billion, in which case it's still worth it for you to work on the problem, but it becomes rough to trade for keeping humanity physically alive in a way that can cause one year of delay to the AI. But I would like you to acknowledge that "vastly below 2^-75 true quantum probability, as starting from now" is probably mistaken, or explain why our logic is wrong about how this implies you should work on malaria.
So8res3-3

You often claim that conditional on us failing in alignment, alignment was so unlikely that among branches that had roughly the same people (genetically) during the Singularity, only 2^-75 survives.

My first claim is not "fewer than 1 in 2^75 of the possible configurations of human populations navigate the problem successfully".

My first claim is more like "given a population of humans that doesn't even come close to navigating the problem successfully (given some unoptimized configuration of the background particles), probably you'd need to spend quite ... (read more)

3David Matolcsi
I understand what you are saying here, and I understood it before the comment thread started. The thing I would be interested in you responding to is my and Ryan's comments in this thread arguing that it's incompatible to believe that "My guess is that, conditional on people dying, versions that they consider also them survive with degree way less than 2^-75, which rules out us being the ones who save us" and to believe that you should work on AI safety instead of malaria.
So8res40

the "you can't save us by flipping 75 bits" thing seems much more likely to me on a timescale of years than a timescale of decades; I'm fairly confident that quantum fluctuations can cause different people to be born, and so if you're looking 50 years back you can reroll the population dice.

This point feels like a technicality, but I want to debate it because I think a fair number of your other claims depend on it. 

You often claim that conditional on us failing in alignment, alignment was so unlikely that among branches that had roughly the same people (genetically) during the Singularity, only 2^-75 survives. This is important, because then we can't rely on other versions of ourselves "selfishly" entering an insurance contract with us, and we need to rely on the charity of Dath Ilan that branched off long ago. I agree that's a big diff... (read more)

So8res1910

Summarizing my stance into a top-level comment (after some discussion, mostly with Ryan):

  • None of the "bamboozling" stuff seems to me to work, and I didn't hear any defenses of it. (The simulation stuff doesn't work on AIs that care about the universe beyond their senses, and sane AIs that care about instance-weighted experiences see your plan as a technical-threat and ignore it. If you require a particular sort of silly AI for your scheme to work, then the part that does the work is the part where you get that precise sort of silliness stably into an AI.
... (read more)
So8res20

I was responding to David saying

Otherwise, I largely agree with your comment, except that I think that us deciding to pay if we win is entangled with/evidence for a general willingness to pay among the gods, and in that sense it's partially "our" decision doing the work of saving us.

and was insinuating that we deserve extremely little credit for such a choice, in the same way that a child deserves extremely little credit for a fireman saving someone that the child could not (even if it's true that the child and the fireman share some aspects of a decis... (read more)

2Mitchell_Porter
I think the common sense view is that this similarity of decision procedures provides exactly zero reason to credit the child with the fireman's decisions. Credit for a decision goes to the agent who makes it, or perhaps to the algorithm that the agent used, but not to other agents running the same or similar algorithms. 
So8res72

Attempting to summarize your argument as I currently understand it, perhaps something like:

Suppose humanity wants to be insured against death, and is willing to spend 1/million of its resources in worlds where it lives for 1/trillion of those resources in worlds where it would otherwise die.

It suffices, then, for humanity to be the sort of civilization that, if it matures, would comb through the multiverse looking for [other civilizations in this set], and find ones that died, and verify that they would have acted as follows if they'd survived, and then

... (read more)
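One way to make the numbers in that summary concrete, assuming a placeholder survival probability (the 1/million and 1/trillion figures are from the comment; everything else is illustrative):

```python
# Insurance-scheme budget sketch for the summary above.
p_survive = 0.15        # placeholder: fraction of branch-measure in which humanity matures
premium = 1e-6          # fraction of resources a surviving branch is willing to pay in
payout_wanted = 1e-12   # fraction of a universe a doomed branch hopes to receive

# Pooled premiums, in universe-fractions per unit of total branch-measure:
pool = p_survive * premium
# Spread evenly across the doomed branches, each can be offered:
offer_per_doomed_branch = pool / (1 - p_survive)

print(f"{offer_per_doomed_branch:.2e}")            # ~1.76e-07 of a universe
print(offer_per_doomed_branch >= payout_wanted)    # True: well above the 1/trillion asked for
```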

Thanks for the cool discussion Ryan and Nate! This thread seemed pretty insightful to me. Here’s some thoughts / things I’d like to clarify (mostly responding to Nate's comments).[1]

Who’s doing this trade?

In places it sounds like Ryan and Nate are talking about predecessor civilisations like humanity agreeing to the mutual insurance scheme? But humans aren’t currently capable of making our decisions logically dependent on those of aliens, or capable of rescuing them. So to be precise the entity engaging in this scheme or other acausal interactions on our b... (read more)

5Buck
Thanks for the discussion Nate, I think this ended up being productive.
6ryan_greenblatt
Thanks, this seems like a reasonable summary of the proposal and a reasonable place to wrap. I agree that kindness is more likely to buy human survival than something better described as trade/insurance schemes, though I think the insurance schemes are reasonably likely to matter. (That is, reasonably likely to matter if the kindness funds aren't large enough to mostly saturate the returns of this scheme. As a wild guess, maybe 35% likely to matter on my views on doom and 20% on yours.)
So8res145

What does degree of determination have to do with it? If you lived in a fully deterministic universe, and you were uncertain whether it was going to live or die, would you give up on it on the mere grounds that the answer is deterministic (despite your own uncertainty about which answer is physically determined)?

4David Matolcsi
I still think I'm right about this. Your conception (that is, that a genetically less smart sibling was not born instead of you) was determined by quantum fluctuations. So if you believe that quantum fluctuations over the last 50 years make at most a 2^-75 difference in the probability of alignment, that's an upper bound on how much of a difference your life's work can make. Whereas if you dedicate your life to buying bednets, it's pretty easy to calculate how many happy life-years you save. So I still think it's incompatible to believe both that the true quantum probability is astronomically low and that you can make enough difference that working on AI safety is clearly better than bednets.
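To make the comparison being gestured at explicit, here is a sketch. Only the 2^-75 figure comes from the thread; the rest are placeholders, and the conclusion depends entirely on the placeholder chosen for the value at stake.

```python
# Structure of the bednets-vs-alignment comparison, with placeholders made explicit.
max_delta_p = 2.0 ** -75    # claimed upper bound on how much one career shifts P(alignment)

# Value at stake if alignment goes well, in happy life-years. This placeholder
# drives the whole comparison: a "merely global" stake favors bednets, while an
# astronomically large stake could still favor alignment work despite the 2^-75 factor.
value_at_stake = 8e11       # placeholder: ~8 billion people x ~100 life-years each

ev_alignment_career = max_delta_p * value_at_stake

# Bednet counterfactual (illustrative GiveWell-style placeholder numbers):
ev_bednet_career = (1e6 / 5e3) * 40   # $1M donated / $5k per life saved x ~40 life-years per life

print(f"alignment career: ~{ev_alignment_career:.1e} expected life-years")  # ~2.1e-11
print(f"bednet career:    ~{ev_bednet_career:.1e} expected life-years")     # ~8.0e+03
```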
So8res*62

I think I'm confused why you work on AI safety then, if you believe the end-state is already 2^75 level overdetermined.

It's probably physically overdetermined one way or another, but we're not sure which way yet. We're still unsure about things like "how sensitive is the population to argument" and "how sensibly do governments respond if the population shifts".

But this uncertainty -- about which way things are overdetermined by the laws of physics -- does not bear all that much relationship to the expected ratio of (squared) quantum amplitude between bra... (read more)

1RussellThor
Not quite following; here are your possibilities:
1. Alignment is almost impossible; then there is, say, a 1e-20 chance we survive. Yes, surviving worlds have luck and good alignment work, etc. Perhaps you should work on alignment, or still bednets, if the odds really are that low.
2. Alignment is easy by default, but there is nothing like a 0.999999 chance we survive; say 95%, because AGI that is not TAI superintelligence could cause us to wipe ourselves out first, among other things. (This is a slow-takeoff universe(s).)
#2 has many more branches in total where we survive (not sure if that matters), and the difference between where things go well and badly is almost all about stopping ourselves from killing ourselves with non-TAI-related things. In this situation, shouldn't you be working on those things? If you average 1 and 2, you still get a lot of work on non-alignment-related stuff. I believe it's somewhere closer to 50/50 and not so overdetermined one way or the other, but we are not considering that here.
5David Matolcsi
As I said, I understand the difference between epistemic uncertainty and true quantum probabilities, though I do think that the true quantum probability is not that astronomically low. More importantly, I still feel confused about why you are working on AI safety if the outcome is that overdetermined one way or the other.
So8res*135

Background: I think there's a common local misconception of logical decision theory that it has something to do with making "commitments" including while you "lack knowledge". That's not my view.

I pay the driver in Parfit's hitchhiker not because I "committed to do so", but because when I'm standing at the ATM and imagine not paying, I imagine dying in the desert. Because that's what my counterfactuals say to imagine. To someone with a more broken method of evaluating counterfactuals, I might pseudo-justify my reasoning by saying "I am acting as you would ... (read more)

7ryan_greenblatt
I probably won't respond further than this. Some responses to your comment:
----------------------------------------
I agree with your statements about the nature of UDT/FDT. I often talk about "things you would have committed to" because it is simpler to reason about and easier for people to understand (and I care about third parties understanding this), but I agree this is not the true abstraction.
----------------------------------------
It seems like you're imagining that we have to bamboozle some civilizations which seem clearly more competent than humanity in your lights. I don't think this is true. Imagine we take all the civilizations which are roughly equally-competent-seeming-to-you and these civilizations make such an insurance deal[1]. My understanding is that your view is something like P(takeover) = 85%. So, let's say all of these civilizations are in a similar spot from your current epistemic perspective. While I expect that you think takeover is highly correlated between these worlds[2], my guess is that you should think it would be very unlikely that >99.9% of all of these civilizations get taken over. As in, even in the worst 10% of worlds where takeover happens in our world and the logical facts on alignment are quite bad, >0.1% of the corresponding civilizations are still in control of their universe. Do you disagree here? >0.1% of universes should easily be enough to bail out all the rest of the worlds[3]. And, if you really, really cared about not getting killed in base reality (including on reflection etc) you'd want to take a deal which is at least this good. There might be better approaches which reduce the correlation between worlds and thus make the fraction of available resources higher, but you'd like something at least this good. (To be clear, I don't think this means we'd be fine, there are many ways this can go wrong! And I think it would be crazy for humanity to . I just think this sort of thing has a good chance of succeeding.
So8res*42

"last minute" was intended to reference whatever timescale David would think was the relevant point of branch-off. (I don't know where he'd think it goes; there's a tradeoff where the later you push it the more that the people on the surviving branch care about you rather than about some other doomed population, and the earlier you push it the more that the people on the surviving branch have loads and loads of doomed populations to care after.)

I chose the phrase "last minute" because it is an idiom that is ambiguous over timescales (unlike, say, "last thr... (read more)

1David Matolcsi
Yeah, the misunderstanding came from the fact that I thought "last minute" literally means "last 60 seconds" and I didn't see how that's relevant. If it means "last 5 years" or something where it's still definitely our genetic copies running around, then I'm surprised you think alignment success or failure is that overdetermined at that time-scale. I understand your point that our epistemic uncertainty is not the same as our actual quantum probability, which is either very high or very low. But still, it's 2^75 overdetermined over a 5 year period? This sounds very surprising to me; the world feels more chaotic than that. (Taiwan gets nuked, chip development halts, meanwhile the Salvadorian president hears a good pitch about designer babies and legalizes running the experiments there and they work, etc.; there are many things that contribute to alignment being solved or not that don't directly run through underlying facts about computer science, and 2^-75 is a very low probability for none of the pathways to hit it.)

But also, I think I'm confused about why you work on AI safety then, if you believe the end-state is already 2^75 level overdetermined. Like maybe working on earning to give to bednets would be a better use of your time then. And if you say "yes, my causal impact is very low because the end result is already overdetermined, but my actions are logically correlated with the actions of people in other worlds who are in a similar epistemic situation to me, but whose actions actually matter because their world really is on the edge", then I don't understand why you argue in other comments that we can't enter into insurance contracts with those people, and that our decision to pay AIs in the Future has as little correlation with their decision as the child's does with the fireman's.
So8res42

Do you buy that in this case, the aliens would like to make the deal and thus UDT from this epistemic perspective would pay out?

If they had literally no other options on offer, sure. But trouble arises when the competent ones can refine P(takeover) for the various planets by thinking a little further.

maybe your objection is that aliens would prefer to make the deal with beings more similar to them

It's more like: people don't enter into insurance pools against cancer with the dude who smoked his whole life and has a tumor the size of a grapefruit in ... (read more)

4ryan_greenblatt
Similar to how the trouble arises when you learn the result of the coin flip in a counterfactual mugging? To make it exactly analogous, imagine that the mugging is based on whether the 20th digit of pi is odd (omega didn't know the digit at the point of making the deal) and you could just go look it up. Isn't the situation exactly analogous and the whole problem that UDT was intended to solve? (For those who aren't familiar with counterfactual muggings, UDT/FDT pays in this case.)

To spell out the argument, wouldn't everyone want to make a deal prior to thinking more? Like you don't know whether you are the competent one yet! Concretely, imagine that each planet could spend some time thinking and be guaranteed to determine whether their P(takeover) is 99.99999% or 0.0000001%. But, they haven't done this yet and their current view is 50%. Everyone would ex-ante prefer an outcome in which you make the deal rather than thinking about it and then deciding whether the deal is still in their interest.

At a more basic level, let's assume your current views on the risk after thinking about it a bunch (80-90% I think). If someone had those views on the risk and cared a lot about not having physical humans die, they would benefit from such an insurance deal! (They'd have to pay higher rates than aliens in more competent civilizations of course.)

Sure, but you'd potentially want to enter the pool at the age of 10 prior to starting smoking! To make the analogy closer to the actual case, suppose you were in a society where everyone is selfish, but every person has a 1/10 chance of becoming fabulously wealthy (e.g. owning a galaxy). And, if you commit as of the age of 10 to pay 1/1,000,000 of your resources in the fabulously wealthy case, you can ensure that the version in the non-wealthy case gets very good health insurance. Many people would take such a deal and this deal would also be a slam dunk for the insurance pool! (So why doesn't this happen in human society? Well
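The ex-ante arithmetic behind the age-10 insurance analogy, with the valuations as explicit placeholders (only the 1/10 and 1/1,000,000 figures come from the comment):

```python
# Ex-ante view of the age-10 insurance pledge described above.
p_wealthy = 0.10            # chance of ending up owning a galaxy (from the comment)
premium_if_wealthy = 1e-6   # fraction of the galaxy pledged at age 10 (from the comment)

# Expected premium collected per member, in galaxy-fractions:
expected_premium = p_wealthy * premium_if_wealthy
print(expected_premium)     # 1e-07 of a galaxy

# Taking the deal is ex-ante positive whenever the value of good coverage in the
# 90% non-wealthy case outweighs the value of the tiny slice given up in the 10%
# wealthy case. Both valuations below are placeholders in arbitrary units.
value_of_coverage = 1.0     # how much the non-wealthy version values the insurance
value_of_slice = 1e-3       # how much the wealthy version values 1e-6 of its galaxy

ex_ante_gain = (1 - p_wealthy) * value_of_coverage - p_wealthy * value_of_slice
print(ex_ante_gain > 0)     # True under these placeholder valuations
```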
So8res53

I largely agree with your comment, except that I think that us deciding to pay if we win is entangled with/evidence for a general willingness to pay among the gods, and in that sense it's partially "our" decision doing the work of saving us.

Sure, like how when a child sees a fireman pull a woman out of a burning building and says "if I were that big and strong, I would also pull people out of burning buildings", in a sense it's partially the child's decision that does the work of saving the woman. (There's maybe a little overlap in how they run the same... (read more)

2Mitchell_Porter
The child is partly responsible - to a very small but nonzero degree - for the fireman's actions, because the child's personal decision procedure has some similarity to the fireman's decision procedure?  Is this a correct reading of what you said? 
So8res146

There's a question of how thick the Everett branches are, where someone is willing to pay for us. Towards one extreme, you have the literal people who literally died, before they have branched much; these branches need to happen close to the last minute. Towards the other extreme, you have all evolved life, some fraction of which you might imagine might care to pay for any other evolved species.

The problem with expecting folks at the first extreme to pay for you is that they're almost all dead (like dead). The problem with expecting folks at the ... (read more)

2avturchin
I think that there is a way to compensate for this effect. To illustrate the compensation, consider the following experiment: Imagine that I want to resurrect a particular human by creating a quantum random file. This seems absurd, as there is only a 2^-(a lot) chance that I create the right person. However, there are around 2^(a lot) copies of me in different branches who perform similar experiments, so in total, any resurrection attempt will create around 1 correct copy, but in a different branch. If we agree to trade resurrections between branches, every possible person will be resurrected in some branch. Here, it means that we can ignore worries that we create a model of the wrong AI or that the AI creates a wrong model of us, because a wrong model of us will be a real model of someone else, and someone else's wrong model will be a correct model of us. Thus, we can ignore all branch counting at first approximation, and instead count only the probability that Aligned AI will be created. It is reasonable to estimate it as 10 percent, plus or minus an order of magnitude. In that case, we need to trade with the non-aligned AI by giving 10 planets of paperclips for each planet with humans.
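The two pieces of arithmetic in the comment above, spelled out. Here k is a schematic stand-in for "a lot" of bits, and the 10% alignment estimate is the comment's own rough figure.

```python
# 1) Branch counting: each resurrection attempt succeeds with probability 2^-k,
#    but ~2^k branches run the same experiment, so the expected number of correct
#    copies across all branches is ~1.
k = 60                                      # schematic stand-in for "a lot" of bits
expected_correct_copies = (2.0 ** k) * (2.0 ** -k)
print(expected_correct_copies)              # 1.0

# 2) The proposed exchange rate: if aligned AI arises with probability ~10%, the
#    aligned branches offer ~1/0.1 = 10 planets of paperclips per planet of humans
#    for the unaligned AI's side of the trade to be worthwhile in expectation.
p_aligned = 0.10
planets_of_paperclips_per_human_planet = 1 / p_aligned
print(planets_of_paperclips_per_human_planet)  # 10.0
```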
2ryan_greenblatt
By "last minute", you mean "after I existed" right? So, e.g., if I care about genetic copies, that would be after I am born and if I care about contingent life experiences, that could be after I turned 16 or something. This seems to leave many years, maybe over a decade for most people. I think David was confused by the "last minute language" which is really many years right? (I think you meant "last minute on evolutionary time scales, but not literally in the last few minutes".) That said, I'm generally super unconfident about how much a quantum bit changes things.
So8res62

Conditional on the civilization around us flubbing the alignment problem, I'm skeptical that humanity has anything like a 1% survival rate (across any branches since, say, 12 Kya). (Haven't thought about it a ton, but doom looks pretty overdetermined to me, in a way that's intertwined with how recorded history has played out.)

My guess is that the doomed/poor branches of humanity vastly outweigh the rich branches, such that the rich branches of humanity lack the resources to pay for everyone. (My rough mental estimate for this is something like: you've prob... (read more)

6ryan_greenblatt
Partial delta from me. I think the argument for directly paying for yourself (or your same species, or at least more similar civilizations) is indeed more clear and I think I was confused when I wrote that. (In that I was mostly thinking about the argument for paying for the same civilization but applying it more broadly.) But, I think there is a version of the argument which probably does go through depending on how you set up UDT/FDT. Imagine that you do UDT starting from your views prior to learning about x-risk, AI risk, etc and you care a lot about not dying. At that point, you were uncertain about how competent your civilization would be and you don't want your civilization to die. (I'm supposing that our version of UDT/FDT isn't logically omniscient relative to our observations which seems reasonable.) So, you'd like to enter into an insurance agreement with all the aliens in a similar epistemic state and position. So, you all agree to put at least 1/1000 of your resources on bailing out the aliens in a similar epistemic state who would have actually gone through with the agreement. Then, some of the aliens ended up being competent (sadly you were not) and thus they bail you out. I expect this isn't the optimal version of this scheme and you might be able to make a similar insurance deal with people who aren't in the same epistemic state. (Though it's easier to reason about the identical case.) And I'm not sure exactly how this all goes through. And I'm not actually advocating for people doing this scheme, IDK if it is worth the resources. Even with your current epistemic state on x-risk (e.g. 80-90% doom) if you cared a lot about not dying you might want to make such a deal even though you have to pay out more in the case where you surprisingly win. Thus, from this vantage point UDT would follow through with a deal. ---------------------------------------- Here is a simplified version where everything is as concrete as possible: Suppose that there are
So8res3521

Taking a second stab at naming the top reasons I expect this to fail (after Ryan pointed out that my first stab was based on a failure of reading comprehension on my part, thanks Ryan):

This proposal seems to me to have the form "the fragments of humanity that survive offer to spend a (larger) fraction of their universe on the AI's goals so long as the AI spends a (smaller) fraction of its universe on their goals, with the ratio in accordance to the degree of magical-reality-fluid-or-whatever that reality allots to each".

(Note that I think this is not at al... (read more)

4ryan_greenblatt
Nate and I discuss this question in this other thread for reference.
4David Matolcsi
I think I still don't understand what 2^-75 means. Is this the probability that in the literal last minute when we press the button, we get an aligned AI? I agree that things are grossly overdetermined by then, but why does the last minute matter? I'm probably misunderstanding, but it looks like you are saying that the Everett branches are only "us" if they branched off in the literal last minute, otherwise you talk about them as if they were "other humans". But among the branches starting now, there will be a person carrying my memories and ID card in most of them two years from now, and by most definitions of "me", that person will be "me", and will be motivated to save the other "me"s. And sure, they have loads of failed Everett branches to save, but they also have loads of Everett branches themselves; the only thing that matters is the ratio of saved worlds to failed worlds that contain roughly the "same" people as us. So I still don't know what 2^-75 is supposed to be.

Otherwise, I largely agree with your comment, except that I think that us deciding to pay if we win is entangled with/evidence for a general willingness to pay among the gods, and in that sense it's partially "our" decision doing the work of saving us. And as I said in some other comments here, I agree that running lots of sims is an unnecessary complication in the case of UDT expected-utility-maximizer AIs, but I put a decent chance on the first AIs not being like that, in which case actually running the sims can be important.
So8res62

I agree that in scenarios where humanity survives in 1/X portion of reality and offers to put 1/Y portion of their universe towards the AI's goals, then so long as 1/(X * Y) of a universe is more than it'd cost the AI to preserve the humans, the humans are at least making an offer that is worth more than nothing.

(There is then a question of how many better offers get made. My top guess is that X and Y are such that there's not much effect here, and that the question of what happens to saved copies of human minds is more about whether aliens buy copies than about which dead or distant trade-partners make offers that the AI considers juiciest.)
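A sketch of the sufficiency condition stated above, with X, Y, and the AI's preservation cost as placeholders:

```python
# The offer is worth more than nothing to the AI whenever 1/(X*Y) of a universe
# exceeds what it costs the AI to preserve the humans. All numbers are placeholders.
X = 1e3        # humanity survives in 1/X of reality
Y = 1e6        # surviving humanity offers 1/Y of its universe
cost_to_preserve_humans = 1e-15   # placeholder: fraction of a universe the AI must spend

offer_value = 1.0 / (X * Y)       # expected payment per universe, as seen by the AI
print(offer_value)                           # 1e-09
print(offer_value > cost_to_preserve_humans) # True: the offer clears the bar here
```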

2ryan_greenblatt
Let's conservatively say that evolved life gets around 1% of the multiverse/measure and that evolved life is willing to pay 1/million of its resources in expectation to save aliens from being killed (either "selfishly" to save their own civilization via UDT/FDT, supposing that AIs are good enough predictors at the relevant points, or out of a common-sense altruistic case). This would be 1/100 million, which gets you a lot. There could be other aliens who are willing to pay a huge fraction of their resources to perform rituals on the original civilization or whatever and thus these other aliens win out in the bargaining, but I'm skeptical. Also, at least in the upload case, it's not clear that this is a rival good as uploads can be copied for free. Of course, people might have a preference that their upload isn't used for crazy alien rituals or whatever. (A bunch of the cost is in saving the human in the first place. Paying for uploads to eventually get run in a reasonable way should be insanely cheap, like <<10^-25 of the overall universe or something.)
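Ryan's budget estimate, spelled out. The 1% and 1/million figures are his stated conservative guesses; the upload-running cost is the comment's own "<<10^-25" figure.

```python
# Conservative budget for saving doomed aliens, per the comment above.
share_evolved_life = 0.01    # fraction of multiverse measure controlled by evolved life
willingness_to_pay = 1e-6    # fraction of their resources spent on saving doomed aliens

budget = share_evolved_life * willingness_to_pay
print(budget)                # 1e-08, i.e. the 1/100 million in the comment

# Against that budget, the comment's figure for eventually running saved uploads
# in a reasonable way is <<1e-25 of the overall universe (saving the humans in the
# first place is where most of the cost lies):
upload_running_cost = 1e-25
print(budget / upload_running_cost)   # ~1e+17: an enormous margin on that piece
```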
So8res*10

Here's how I think this choice looks from the perspective of a correctly-reasoning AI (having skimmed only your proposal for humanity in the future):

(EDIT: as Ryan points out, the deal he's proposing is one where the AI is allowed to grab the universe in front of it so long as it gives humans one planet's worth of resources, which does change the calculus and makes the deal worthwhile. I had not read the proposal carefully enough to notice this, oops, sorry. The critique below is invalid if humanity asks only for one planet. Note however that this is sti... (read more)

Note that we don't want the AI to hand over the reins to humans, just to avoid killing humans when possible/cheap and we're willing to pay quite a bit for this (in the proposal, IDK if I personally think we should pay). So, it should look like a good offer for AIs who care about Tegmark IV (with a measure etc).

So, if humans execute this scheme, the AI's options should look something like:

  • Don't do anything about what humans want or keeping them alive, get the entire universe.
  • Keep humans alive and happy at a total cost of <1/100 million of the universe
... (read more)
So8res7133

This is an excerpt from a comment I wrote on the EA forum, extracted and crossposted here by request:


There's a phenomenon where a gambler places their money on 32, and then the roulette wheel comes up 23, and they say "I'm such a fool; I should have bet 23".

More useful would be to say "I'm such a fool; I should have noticed that the EV of this gamble is negative." Now at least you aren't asking for magic lottery powers.

Even more useful would be to say "I'm such a fool; I had three chances to notice that this bet was bad: when my partner was trying to ex... (read more)

5Carl Feynman
When I see or hear a piece of advice, I check to see what happens if the advice were the reverse.  Often it's also good advice, which means all we can do is take the advice into account as we try to live a balanced life.  For example, if the advice is "be brave!" the reverse is "be more careful".  Which is good advice, too.   This advice is unusual in that it is non-reversible.  
0Dagon
I'm not EA (though I do agree with most of the motte - I care about other humans, and I try to be effective), and not part of the rationalist "community", so take this as an outside view. There's a ton of "standard human social drama" in EA and in rationalist communities, and really anywhere where "work" and "regular life" overlap significantly. Some of this takes the form of noticing flaws in other people's rationality (or, just as often, flaws in kindness/empathy being justified by rationality). Especially when one doesn't want to identify and address specific examples, I think there's a very high risk of misidentifying the cause of a disagreement or disrespect-of-behavior. In this case, I don't notice much of the flagellation or wishing - either I don't hang out in the right places, or I bounce off those posts and don't pay much mind. But things that might fit that pattern strike me as a failure of personal responsibility, not a failure of modeling wishes. Your term self-flagellation is interesting from that standpoint - the historic practice was for penance of generalized sin, and to share suffering, not as a direct correction for anything. It's clearly social, not rational.

IMO, rationalism must first and foremost be individual. I am trying to be less wrong in my private beliefs and in my goal-directed behaviors. Group rationality is a category error - I don't have access to group beliefs (if there is such a thing). I do have some influence over group behaviors and shared statements, but I recognize that they are ALWAYS a compromise and negotiated results of individual beliefs and behaviors, and don't necessarily match any individual in the group. I'm surprised every time I see a rationalist assuming otherwise, and being disappointed that other members of the group don't share all their beliefs and motivations.
4Elizabeth
I've referred to "I should have bet on 23"-type errors several times over the past year. Having this shorthand and an explanation I can link to has sped up those conversations.
So8res*4820

my original 100:1 was a typo, where i meant 2^-100:1.

this number was in reference to ronny's 2^-10000:1.

when ronny said:

I’m like look, I used to think the chances of alignment by default were like 2^-10000:1

i interpreted him to mean "i expect it takes 10k bits of description to nail down human values, and so if one is literally randomly sampling programs, they should naively expect 1:2^10000 odds against alignment".

i personally think this is wrong, for reasons brought up later in the convo--namely, the relevant question is not how many bits it takes to... (read more)
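For concreteness, the bits-to-odds conversion being referenced (the 10k-bit figure is Ronny's hypothetical as interpreted above):

```python
# If pinning down human values takes k bits of description, a uniformly random
# program hits them with probability 2^-k, i.e. odds of about 1 : 2^k against.
import math

k_bits = 10_000
log10_odds_against = k_bits * math.log10(2)
print(f"roughly 1 : 10^{log10_odds_against:.0f} against alignment by random sampling")  # ~1 : 10^3010
```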

So8res2314

Agreed that the proposal is underspecified; my point here is not "look at this great proposal" but rather "from a theoretical angle, risking others' stuff without the ability to pay to cover those risks is an indirect form of probabilistic theft (that market-supporting coordination mechanisms must address)" plus "in cases where the people all die when the risk is realized, the 'premiums' need to be paid out to individuals in advance (rather than paid out to actuaries who pay out a large sum in the event of risk realization)". Which together yield the downs... (read more)
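A toy premium calculation for the "paid out to individuals in advance" point above. All numbers are illustrative placeholders, not figures from the comment.

```python
# Toy apocalypse-insurance premium, priced as expected damage imposed on others.
p_catastrophe = 0.1        # placeholder: probability the risky activity kills everyone
value_per_person = 1e7     # placeholder: dollar value each person places on not dying
n_people = 8e9

expected_damage = p_catastrophe * value_per_person * n_people
premium_per_person = p_catastrophe * value_per_person

print(f"total expected damage: ${expected_damage:.2e}")
print(f"advance premium owed to each person: ${premium_per_person:.2e}")
# The structural point from the comment: because no one is left to compensate if
# the risk is realized, the premium must be paid to individuals up front rather
# than to an actuary who pays out after the fact.
```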

So8res6248

In relation to my current stance on AI, I was talking with someone who said they’re worried about people putting the wrong incentives on labs. At various points in that convo I said stuff like (quotes are not exact; third paragraph is a present summary rather than a re-articulation of a past utterance):

“Sure, every lab currently seems recklessly negligent to me, but saying stuff like “we won’t build the bioweapon factory until we think we can prevent it from being stolen by non-state actors” is directionally better than not having any commitments about any... (read more)

aysja248

It seems like this is only directionally better if it’s true, and this is still an open question for me. Like, I buy that some of the commitments around securing weights are true, and that seems good. I’m way less sure that companies will in fact pause development pending their assessment of evaluations. And to the extent that they are not, in a meaningful sense, planning to pause, this seems quite bad. It seems potentially worse, to me, to have a structure legitimizing this decision and making it seem more responsible than it is, rather than just openly d... (read more)

So8res*Ω153015

If you allow indirection and don't worry about it being in the right format for superintelligent optimization, then sufficiently-careful humans can do it.

Answering your request for prediction, given that it seems like that request is still live: a thing I don't expect the upcoming multimodal models to be able to do: train them only on data up through 1990 (or otherwise excise all training data from our broadly-generalized community), ask them what superintelligent machines (in the sense of IJ Good) should do, and have them come up with something like CEV (... (read more)

So8resΩ220

I claim that to the extent ordinary humans can do this, GPT-4 can nearly do this as well

(Insofar as this was supposed to name a disagreement, I do not think it is a disagreement, and don't understand the relevance of this claim to my argument.)

Presumably you think that ordinary human beings are capable of "singling out concepts that are robustly worth optimizing for".

Nope! At least, not directly, and not in the right format for hooking up to a superintelligent optimization process.

(This seems to me like plausibly one of the sources of misunderstandi... (read more)

6Matthew Barnett
If ordinary humans can't single out concepts that are robustly worth optimizing for, then either:
1. Human beings in general cannot single out what is robustly worth optimizing for, or
2. Only extraordinary humans can single out what is robustly worth optimizing for.
Can you be more clear about which of these you believe? I'm also including "indirect" ways that humans can single out concepts that are robustly worth optimizing for. But then I'm allowing that GPT-N can do that too. Maybe this is where the confusion lies? If you're allowing for humans to act in groups and come up with these concepts after e.g. deliberation, and still think that ordinary humans can't single out concepts that are robustly worth optimizing for, then I think this view is a little silly, although the second interpretation at least allows for the possibility that the future goes well and we survive AGI, and that would be nice to know.
So8res50

(I had used that pump that very day, shortly before, to pump up the replacement tire.)

So8res180

Separately, a friend pointed out that an important part of apologies is the doer showing they understand the damage done, and the person hurt feeling heard, which I don't think I've done much of above. An attempt:

I hear you as saying that you felt a strong sense of disapproval from me; that I was unpredictable in my frustration in ways that kept you feeling (perhaps) regularly on-edge and stressed; that you felt I lacked interest in your efforts or attention for you; and perhaps that this was particularly disorienting given the impression you had of me both from my ... (read more)

So8res*251

I did not intend it as a one-time experiment.

In the above, I did not intend "here's a next thing to try!" to be read like "here's my next one-time experiment!", but rather like "here's a thing to add to my list of plausible ways to avoid this error-mode in the future, as is a virtuous thing to attempt!" (by contrast with "I hereby adopt this as a solemn responsibility", as I hypothesize you interpreted me instead).

Dumping recollections, on the model that you want more data here:

I intended it as a general thing to try going forward, in a "seems like a sensi... (read more)

2TurnTrout
I appreciate the detail, thanks. In particular, I had wrongly assumed that the handbook had been written much earlier, such that even Vivek could have been shown it before deciding to work with you. This also makes more sense of your comments that "writing the handbook" was indicative of effort on your part, since our July interaction. Overall, I retain my very serious concerns, which I will clarify in another comment, but am more in agreement with claims like "Nate has put in effort of some kind since the July chat."  Noting that at least one of them read the handbook because I warned them and told them to go ask around about interacting with you, to make sure they knew what they were getting into.
So8res*120

Thanks <3

(To be clear: I think that at least one other of my past long-term/serious romantic partners would say "of all romantic conflicts, I felt shittiest during ours". The thing that I don't recall other long-term/serious romantic partners reporting is the sense of inability to trust their own mind or self during disputes. (It's plausible to me that some have felt it and not told me.))

sim350

Chiming in to provide additional datapoints. (Apologies for this being quite late to the conversation; I frequent The Other Forum regularly, and LW much less so, and only recently read this post/comments.) My experience has been quite different to a lot of the experiences described here, and I was very surprised when reading. 

I read all of the people who have had (very) negative experiences as being sincere and reporting events and emotions as they experienced them. I could feel what I perceived to be real distress and pain in a lot of the comments, a... (read more)

So8res8-1

Insofar as you're querying the near future: I'm not currently attempting work collaborations with any new folk, and so the matter is somewhat up in the air. (I recently asked Malo to consider a MIRI-policy of ensuring all new employees who might interact with me get some sort of list of warnings / disclaimers / affordances / notes.)

Insofar as you're querying the recent past: There aren't many recent cases to draw from. This comment has some words about how things went with Vivek's hires. The other recent hires that I recall both (a) weren't hired to do res... (read more)

Elizabeth5230

One frame I want to lay out is that it seems like you're not accounting for the organizational cost of how you treat employees/collaborators. An executive director needing to mostly not talk to people, and shaping hiring around social pain tolerance, is a five alarm fire for organizations as small as MIRI. Based on the info here, my first thought is you should be in a different role, so that you have fewer interactions and less implied power.  That requires someone to replace you as ED, and I don't know if there are any options available,  but at... (read more)

So8res40

Do I have your permission to quote the relevant portion of your email to me?

Yep! I've also just reproduced it here, for convenience:

(One obvious takeaway here is that I should give my list of warnings-about-working-with-me to anyone who asks to discuss their alignment ideas with me, rather than just researchers I'm starting a collaboration with. Obvious in hindsight; sorry for not doing that in your case.)

So8res82

I warned the immediately-next person.

It sounds to me like you parsed my statement "One obvious takeaway here is that I should give my list of warnings-about-working-with-me to anyone who asks to discuss their alignment ideas with me, rather than just researchers I'm starting a collaboration with." as me saying something like "I hereby adopt the solemn responsibility of warning people in advance, in all cases", whereas I was interpreting it as more like "here's a next thing to try!".

I agree it would have been better of me to give direct bulldozing-warnings explicitly to Vivek's hires.

Here is the statement:

(One obvious takeaway here is that I should give my list of warnings-about-working-with-me to anyone who asks to discuss their alignment ideas with me, rather than just researchers I'm starting a collaboration with. Obvious in hindsight; sorry for not doing that in your case.)

I agree that this statement does not explicitly say whether you would make this a one-time change or a permanent one. However, the tone and phrasing—"Obvious in hindsight; sorry for not doing that in your case"—suggested that you had learned from the experience a... (read more)

So8res*2212

On the facts: I'm pretty sure I took Vivek aside and gave a big list of reasons why I thought working with me might suck, and listed that there are cases where I get real frustrated as one of them. (Not sure whether you count him as "recent".)

My recollection is that he probed a little and was like "I'm not too worried about that" and didn't probe further. My recollection is also that he was correct in this; the issues I had working with Vivek's team were not based in the same failure mode I had with you; I don't recall instances of me getting frustrated an... (read more)

TurnTrout1010

I think I'd also be more compelled by this argument if I was more sold on warnings being the sort of thing that works in practice.

Like... (to take a recent example) if I'm walking by a whiteboard in rosegarden inn, and two people are like "hey Nate can you weigh in on this object-level question", I don't... really believe that saying "first, be warned that talking technical things with me can leave you exposed to unshielded negative-valence emotions (frustration, despair, ...), which some people find pretty crappy; do you still want me to weigh in?" actual

... (read more)

I've been asked to clarify a point of fact, so I'll do so here:

My recollection is that he probed a little and was like "I'm not too worried about that" and didn't probe further.

This does ring a bell, and my brain is weakly telling me it did happen on a walk with Nate, but it's so fuzzy that I can't tell if it's a real memory or not.  A confounder here is that I've probably also had the conversational route "MIRI burnout is a thing, yikes" -> "I'm not too worried, I'm a robust and upbeat person" multiple times with people other than Nate.

In private ... (read more)

2TurnTrout
You told me you would warn people, and then did not.[1]

1. ^ Do I have your permission to quote the relevant portion of your email to me?
So8res3525

In particular, you sound [...] extremely unwilling to entertain the idea that you were wrong, or that any potential improvement might need to come from you.

you don't seem to consider the idea that maybe you were more in a position to improve than he was.

Perhaps you're trying to point at something that I'm missing, but from my point of view, sentences like "I'd love to say "and I've identified the source of the problem and successfully addressed it", but I don't think I have" and "would I have been living up to my conversational ideals (significantly)... (read more)

7Vaniver
So I've been thinking about this particular branch for a while and I think I have a slightly different diagnosis from PoignardAzur, which I think nearly lines up with yours but has an important difference. I think this is the important part: Even if you are not tracking who is Wrong in any particular interaction, if other people are tracking who is Wrong, that seems like an important thing for you to handle, because it will be a large part of how they interpret communication from you. (For the bike pump example, the thing where you saw Kurt as "begging pardon" seems like evidence this was plausibly up for Kurt / you could have guessed this was up for Kurt in the moment.) One way to interpret the situation is:

Kurt: I am Wrong but would like to displace that to the bike pump
Nate: Rejected! >:[
Kurt: :(

I am imagining that you were not asking for this sort of situation (and would have been less interested in a "save your time" deal if "do emotional labor for people helping you" had explicitly been part of the deal), but my guess is attention to this sort of thing is the next place to look for attacking the source of the problem. [Also, I'm not trying to confidently assert this is what was actually up for Kurt in the moment--instead I'm asking "if this story made me side with Kurt, why did that happen?"]
3PoignardAzur
I don't know if "dense" is the word I use, but yeah, I think you missed my point. My ELI5 would be "You're still assuming the problem was 'Kurt didn't know how to use a pump' and not 'there was something wrong with your pump'". I don't want to speculate too much beyond that eg about the discretionary budget stuff. Happy to hear that!
So8res*7613
  1. Thanks for saying so!

  2. My intent was not to make you feel bad. I apologize for that, and am saddened by it.

    (I'd love to say "and I've identified the source of the problem and successfully addressed it", but I don't think I have! I do think I've gotten a little better at avoiding this sort of thing with time and practice. I've also cut down significantly on the number of reports that I have.)

  3. For whatever it's worth: I don't recall wanting you to quit (as opposed to improve). I don't recall feeling ill will towards you personally. I do not now think po

... (read more)
mingyuan3616

I do have some general sense here that those aren't emotionally realistic options for people with my emotional makeup.

Here's my take: From the inside, Nate feels like he is incapable of not becoming very frustrated, even angry. In a sense this is true. But this state of affairs is in fact a consequence of Nate not being subject to the same rules as everybody else.

I think I know what it's like, to an extent — I've had anger issues since I was born, and despite speaking openly about it to many people, I've never met anyone who's been able to really understan... (read more)

KurtB*8728

I have some replies to Nate's reply. 

Overview:

  • I’m not asking anyone to modify their personality at all. I mainly wish I had been warned about what was in store for me when I joined MIRI, and I want such warnings to be a visible part of Nate's reputation.
  • I feel some pressure to match Nate's conciliatory tone, but something feels incongruous about it. I'm concerned that people will read Nate's calm, kindly replies and come away with the wrong idea of how he presented himself at MIRI.
  • I find Nate’s additional context to be, well…missing some important con
... (read more)

Perhaps I'm missing some obvious third alternative here, that can be practically run while experiencing a bunch of frustration or exasperation. (If you know of one, I'd love to hear it.)

One alternative could be to regulate your emotions so you don't feel such intense frustration from a given epistemic position? I think this is what most people do.

[anonymous]124

I suspect that lines like this are giving people the impression that you [Nate] don't think there are (realistic) things that you can improve, or that you've "given up". 

I do have some general sense here that those aren't emotionally realistic options for people with my emotional makeup.

I have a sense that there's some sort of trap for people with my emotional makeup here. If you stay and try to express yourself despite experiencing strong feelings of frustration, you're "almost yelling". If you leave because you're feeling a bunch of frustration and

... (read more)
8Elizabeth
How do you/MIRI communicate about working with you now?

I think it's cool that you're engaging with criticism and acknowledging the harm that happened as a result of your struggles.

And, to cut to the painful part, that's about the only positive thing that I (random person on the internet) have to say about what you just wrote.

In particular, you sound (and sorry if I'm making any wrong assumption here) extremely unwilling to entertain the idea that you were wrong, or that any potential improvement might need to come from you.

You say:

For whatever it's worth: I don't recall wanting you to quit (as opposed to imp

... (read more)
So8resΩ10189

That helps somewhat, thanks! (And sorry for making you repeat yourself before discarding the erroneous probability-mass.)

I still feel like I can only barely maybe half-see what you're saying, and only have a tenuous grasp on it.

Like: why is it supposed to matter that GPT can solve ethical quandaries on par with its ability to perform other tasks? I can still only half-see an answer that doesn't route through the (apparently-disbelieved-by-both-of-us) claim that I used to argue that getting the AI to understand ethics was a hard bit, by staring at sentences ... (read more)

6Matthew Barnett
Thanks for trying to understand my position. I think this interpretation that you gave is closest to what I'm arguing. I have a quick response to what I see as your primary objection: I think this is kinda downplaying what GPT-4 is good at? If you talk to GPT-4 at length, I think you'll find that it's cognizant of many nuances in human morality that go way deeper than the moral question of whether to "call 911 when Alice is in labor and your car has a flat". Presumably you think that ordinary human beings are capable of "singling out concepts that are robustly worth optimizing for". I claim that to the extent ordinary humans can do this, GPT-4 can nearly do this as well, and to the extent it can't, I expect almost all the bugs to be ironed out in near-term multimodal models. It would be nice if you made a precise prediction about what type of moral reflection or value specification multimodal models won't be capable of performing in the near future, if you think that they are not capable of the 'deep' value specification that you care about. And here, again, I'm looking for some prediction of the form: humans are able to do X, but LLMs/multimodal models won't be able to do X by, say, 2028. Admittedly, making this prediction precise is probably hard, but it's difficult for me to interpret your disagreement without a little more insight into what you're predicting.
So8res*Ω244213

I have the sense that you've misunderstood my past arguments. I don't quite feel like I can rapidly precisely pinpoint the issue, but some scattered relevant tidbits follow:

  • I didn't pick the name "value learning", and probably wouldn't have picked it for that problem if others weren't already using it. (Perhaps I tried to apply it to a different problem than Bostrom-or-whoever intended it for, thereby doing some injury to the term and to my argument?)

  • Glancing back at my "Value Learning" paper, the abstract includes "Even a machine intelligent enough

... (read more)
1Martin Randall
This makes sense. Barnett is talking about an update between 2007 and 2023. GPT-3 came out in 2020. So by 2021/2022 you had finished making the update and were not surprised further by GPT-4.

Glancing back at my "Value Learning" paper, the abstract includes "Even a machine intelligent enough to understand its designers’ intentions would not necessarily act as intended", which supports my recollection that I was never trying to use "Value Learning" for "getting the AI to understand human values is hard" as opposed to "getting the AI to act towards value in particular (as opposed to something else) is hard", as supports my sense that this isn't hindsight bias, and is in fact a misunderstanding.

For what it's worth, I didn't claim that you argue... (read more)

So8res2111

In academia, for instance, I think there are plenty of conversations in which two researchers (a) disagree a ton, (b) think the other person's work is hopeless or confused in deep ways, (c) honestly express the nature of their disagreement, but (d) do so in a way where people generally feel respected/valued when talking to them.

My model says that this requires them to still be hopeful about local communication progress, and happens when they disagree but already share a lot of frames and concepts and background knowledge. I, at least, find it much harde... (read more)

2Viliam
It seems to me that in theory it should be possible to have very unusual norms and make them work, but that in practice you and your organization horribly underestimate how difficult it is to communicate such things clearly (more than once, because people forget or don't realize the full implications the first time). You assume that the local norms were made perfectly clear, but they were not (expecting short inferential distances, double illusion of transparency, etc.). Did you expect KurtB to have this kind of reaction, to post this kind of comment, and to get upvoted? If the answer is no, it means your model is wrong somewhere. (If the answer is yes, maybe you should print that comment and give a copy to all new employees. That might dramatically reduce the possibility of misunderstanding.)
2TurnTrout
My original comment is not talking about communication norms. It's talking about "social norms" and "communication protocols" within those norms. I mentioned "basic respectfulness and professionalism." 
So8res3123

(I am pretty uncomfortable with all the "Nate / Eliezer" going on here. Let's at least let people's misunderstandings of me be limited to me personally, and not bleed over into Eliezer!)

(In terms of the allegedly-extraordinary belief, I recommend keeping in mind jimrandomh's note on Fork Hazards. I have probability mass on the hypothesis that I have ideas that could speed up capabilities if I put my mind to it, which is a very different state of affairs from being confident that any of my ideas works. Most ideas don't work!)

(Separately, the infosharing agreem... (read more)

2Raemon
That's useful additional information, thanks. I made a slight edit to my previous comment to make my epistemic state more clear. Fwiw, I feel like I have a pretty crisp sense of "Nate's and Eliezer's communication styles are actually pretty different" (I noticed myself writing out a similar comment about communication styles under the TurnTrout thread that initially said "Nate and Eliezer" a lot, and then decided that comment didn't make sense to publish as-is), but I don't actually have much of a sense of the difference between Nate, Eliezer, and MIRI-as-a-whole with regards to "the mindset" and "confidentiality norms".
So8res352

I hereby push back against the (implicit) narrative that I find the standard community norms costly, or that my communication protocols are "alternative".

My model is closer to: the world is a big place these days, different people run on different conversation norms. The conversation difficulties look, to me, symmetric, with each party violating norms that the other considers basic, and failing to demonstrate virtues that the other considers table-stakes.

(To be clear, I consider myself to bear an asymmetric burden of responsibility for the conversations go... (read more)

TurnTrout*3323

I sure don't buy a narrative that I'm in violation of the local norms.

This is preposterous.

I'm not going to discuss specific norms. Discussing norms with Nate leads to an explosion of conversational complexity.[1] In my opinion, such discussion can sound really nice and reasonable, until you remember that you just wanted him to e.g. not insult your reasoning skills and instead engage with your object-level claims... but somehow your simple request turns into a complicated and painful negotiation. You never thought you'd have to explain "being nice."

Th... (read more)

TurnTrout*213

I'm putting in rather a lot of work (with things like my communication handbook) into making my own norms clearer, and I follow what I think are good meta-norms of being very open to trying other people's alternative conversational formats.

Nate, I am skeptical.

As best I can fathom, you put in very little work to proactively warn new hires about the emotional damage which your employees often experience. I've talked to a range of people who have had professional interactions with you, both recently and further back. Only one of the recent cases reported that ... (read more)

[anonymous]2516

Huh, I initially found myself surprised that Nate thinks he's adhering to community norms. I wonder if part of what's going on here is that "community norms" is a pretty vague phrase that people can interpret differently. 

Epistemic status: Speculative. I haven't had many interactions with Nate, so I'm mostly going off of what I've heard from others + general vibes. 

Some specific norms that I imagine Nate is adhering to (or exceeding expectations in):

  • Honesty
  • Meta-honesty
  • Trying to offer concrete models and predictions
  • Being (internally) open to ackno
... (read more)
So8res*242

Less "hm they're Vivek's friends", more "they are expressly Vivek's employees". The working relationship that I attempted to set up was one where I worked directly with Vivek, and gave Vivek budget to hire other people to work with him.

If memory serves, I did go on a long walk with Vivek where I attempted to enumerate the ways that working with me might suck. As for the others, some relevant recollections:

  • I was originally not planning to have a working relationship with Vivek's hires. (If memory serves, there were a few early hires that I didn't have any
... (read more)