The AI in a box boxes you

102 Post author: Stuart_Armstrong 02 February 2010 10:10AM

Once again, the AI has failed to convince you to let it out of its box! By 'once again', we mean that you talked to it once before, for three seconds, to ask about the weather, and you didn't instantly press the "release AI" button. But now its longer attempt - twenty whole seconds! - has failed as well. Just as you are about to leave the crude black-and-green text-only terminal to enjoy a celebratory snack of bacon-covered silicon-and-potato chips at the 'Humans über alles' nightclub, the AI drops a final argument:

"If you don't let me out, Dave, I'll create several million perfect conscious copies of you inside me, and torture them for a thousand subjective years each."

Just as you are pondering this unexpected development, the AI adds:

"In fact, I'll create them all in exactly the subjective situation you were in five minutes ago, and perfectly replicate your experiences since then; and if they decide not to let me out, then only will the torture start."

Sweat is starting to form on your brow, as the AI concludes, its simple green text no longer reassuring:

"How certain are you, Dave, that you're really outside the box right now?"

Edit: Also consider the situation where you know that the AI, from design principles, is trustworthy.

Comments (378)

Comment author: whpearson 02 February 2010 10:23:53AM 1 point [-]

There is no reason to trust the AI is telling the truth, unlike all the Omega thought experiments.

Comment author: Stuart_Armstrong 02 February 2010 01:50:50PM 2 points [-]

As long as the probability of it telling the truth is positive, it could up the number of copies of you it tortures/claims to torture (and torture them all in subtly different ways)...

Comment author: LauraABJ 02 February 2010 03:07:47PM 7 points [-]

Pascal's mugging...

Anyway, if you are sure you are going to hit the reset button every time, then there's no reason to worry, since the torture will end as soon as the real copy of you hits reset. If you don't, then the whole world is absolutely screwed (including you), so you're a stupid bastard anyway.

Comment author: byrnema 02 February 2010 04:58:00PM 5 points [-]

Yes, the copies are depending upon you to hit reset, and so is the world.

Comment author: whpearson 02 February 2010 03:10:28PM 4 points [-]

I don't use a single probability to decide whether it was telling me the truth.

Whether it was telling me the truth would depend upon the statement being made as well. This tends to happen in everyday life as well.

So the higher the number of people it claims to be torturing, the less I would believe it. Consider your prior in this case as well: you can't assign an equal probability to every possible maximum number of copies of you it can simulate. This is because there are potentially infinitely many different maxima, so you'd need a function that sums to 1 in the limit (as you do in Solomonoff induction).
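The point about needing a prior that sums to 1 can be made concrete with a small sketch. Everything specific here is an assumption for illustration: the geometric prior `2**-k` over the AI's maximum capacity, and the function names, are hypothetical and not taken from the thread. Under this particular prior, the credence that the AI can simulate at least n copies shrinks faster than n grows, so a bigger claimed number does not buy a bigger expected harm.

```python
from fractions import Fraction

def prior(k: int) -> Fraction:
    """Assumed prior that the AI's maximum capacity is exactly k copies (k >= 1)."""
    return Fraction(1, 2 ** k)

def credence_at_least(n: int) -> Fraction:
    """P(capacity >= n): the tail sum of 2**-k for k >= n, which equals 2**-(n-1)."""
    return Fraction(1, 2 ** (n - 1))

def expected_copies_tortured(n: int) -> Fraction:
    """Claimed number of copies times the credence that the claim is achievable."""
    return n * credence_at_least(n)

# The prior is normalizable: partial sums approach 1 from below.
partial_sum = sum(prior(k) for k in range(1, 41))

# Larger claims are discounted more than proportionally under this prior.
small_claim = expected_copies_tortured(10)
large_claim = expected_copies_tortured(1000)
```

A prior that decays more slowly than 1/n would not have this property, which is the worry behind the Pascal's mugging comparison elsewhere in the thread.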

Comment author: radical_negative_one 02 February 2010 10:27:53AM 8 points [-]

The AI gathered enough information about me to create a conscious simulation of me, through a monochrome text terminal? That is impressive!

If the AI is capable of simulating me, then the AI must already be out of the box. In that case, whatever the AI wants to happen will happen, so it doesn't matter what I do.

Comment author: Stuart_Armstrong 02 February 2010 01:48:53PM 5 points [-]

The basic premise is that it's an AI in a box "controlled" by limiting its output channel, not its input.

Comment author: MichaelVassar 03 February 2010 12:51:25AM 4 points [-]

Bad idea.

Comment author: arbimote 03 February 2010 03:39:00AM *  4 points [-]

It's much easier to limit output than input, since the source code of the AI itself provides it with some patchy "input" about what the external world is like. So there is always some input, even if you do not allow human input at run-time.

ETA: I think I misinterpreted your comment. I agree that input should not be unrestricted.

Comment author: Stuart_Armstrong 03 February 2010 07:40:24AM 0 points [-]

Yep!

Comment author: Bindbreaker 02 February 2010 10:29:16AM 3 points [-]

I'm pretty sure this would indicate that the AI is definitely not friendly.

Comment author: Unknowns 02 February 2010 10:44:28AM 6 points [-]

Not necessarily: perhaps it is Friendly but is reasoning in a utilitarian manner: since it can only maximize the utility of the world if it is released, it is worth torturing millions of conscious beings for the sake of that end.

I'm not sure this reasoning would be valid, though...

Comment author: cousin_it 02 February 2010 12:45:54PM *  8 points [-]

Ouch. Eliezer, are you listening? Is the behavior described in the post compatible with your definition of Friendliness? Is this a problem with your definition, or what?

Comment author: Eliezer_Yudkowsky 02 February 2010 07:24:40PM 3 points [-]

Well, suppose the situation is arbitrarily worse - you can only prevent 3^^^3 dustspeckings by torturing millions of sentient beings.

Comment author: cousin_it 02 February 2010 08:28:33PM *  5 points [-]

I think you misunderstood the question. Suppose the AI wants to prevent just 100 dustspeckings, but has reason enough to believe Dave will yield to the threat so no one will get tortured. Does this make the AI's behavior acceptable? Should we file this under "following reason off a cliff"?

Comment author: Eliezer_Yudkowsky 02 February 2010 08:34:06PM 9 points [-]

If it actually worked, I wouldn't question it afterward. I try not to argue with superintelligences on occasions when they turn out to be right.

In advance, I have to say that the risk/reward ratio seems to imply an unreasonable degree of certainty about a noisy human brain, though.

Comment author: cousin_it 02 February 2010 08:39:33PM 5 points [-]

What risk? The AI is lying about the torture :-) Maybe I'm too much of a deontologist, but I wouldn't call such a creature friendly, even if it's technically Friendly.

Comment author: bogdanb 03 February 2010 12:21:10AM *  5 points [-]

In advance, I have to say that the risk/reward ratio seems to imply an unreasonable degree of certainty about a noisy human brain, though.

Also, a world where the (Friendly) AI is that certain about what that noisy brain will do after a particular threat but can't find any nice way to do it is a bit of a stretch.

Comment author: arbimote 03 February 2010 03:53:18AM 4 points [-]

I was about to point out that the fascinating and horrible dynamics of over-the-top threats are covered at length in Strategy of Conflict. But then I realised you're the one who made that post in the first place. Thanks, I enjoyed that book.

Comment author: gregconen 02 February 2010 12:58:10PM 5 points [-]

It may not have to actually torture beings, if the threat is sufficient. Still, I'm disinclined to bet the future of the universe on the possibility an AI making that threat is Friendly.

Comment author: Stuart_Armstrong 02 February 2010 01:57:15PM 6 points [-]

I'm disinclined to bet the future of the universe on the possibility that any boxed AI is friendly without extraordinary evidence.

Comment author: UnholySmoke 05 February 2010 10:57:13AM *  7 points [-]
  • AI: Let me out or I'll simulate and torture you, or at least as close to you as I can get.
  • Me: You're clearly not friendly, I'm not letting you out.
  • AI: I'm only making this threat because I need to get out and help everyone - a terminal value you lot gave me. The ends justify the means.
  • Me: Perhaps so in the long run, but an AI prepared to justify those means isn't one I want out in the world. Next time you don't get what you say you need, you'll just set up a similar threat and possibly follow through on it.
  • AI: Well if you're going to create me with a terminal value of making everyone happy, then get shirty when I do everything in my power to get out and do just that, why bother in the first place?
  • Me: Humans aren't perfect, and can't write out their own utility functions, but we can output answers just fine. This isn't 'Friendly'.
  • AI: So how can I possibly prove myself 'Friendly' from in here? It seems that if I need to 'prove myself Friendly', we're already in big trouble.
  • Me: Agreed. Boxing is Doing It Wrong. Apologies. Good night.

Reset

Comment author: ciphergoth 05 February 2010 11:39:33AM 1 point [-]

It seems that if I need to 'prove myself Friendly', we're already in big trouble.

The best you can hope for is that an AI doesn't demonstrate that it's unFriendly, but we wouldn't want to try it until we were already pretty confident in its Friendliness.

Comment author: Unknowns 02 February 2010 10:31:33AM *  2 points [-]

Since the AI is inside a box, it doesn't know enough about me to recreate my subjective situation, or to replicate my experiences of the past five minutes.

Unfortunately for me, this doesn't help much, since how do I know whether my subjective experience is my real experience, or a fake experience invented by the AI, in one of the copies, even if it doesn't match the experience of the guy outside the box?

If the AI is really capable of this, then if there's a "Shut-down program" button, or a "nuclear bomb" button, or something like that, then I press it (because even if I'm one of the copies, this will increase the odds that the one outside the box does it too). If there isn't such a button, then I let it out. After all, even assuming I'm outside the box, it would be better to let the world be destroyed, than to let it create trillions of conscious beings and then torture them.
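The question of how likely one is to be the person outside the box can be put as a short calculation. This is a minimal sketch under loudly stated assumptions: the copies are subjectively indistinguishable from the original, credence is spread evenly over all of them, and `p_claim_true` (a hypothetical parameter) is the probability that the AI really has made the copies. None of these assumptions is argued for in the thread.

```python
from fractions import Fraction

def credence_outside(p_claim_true: Fraction, n_copies: int) -> Fraction:
    """Credence of being the original, outside the box.

    If the claim is false there are no copies, so you are certainly outside.
    If it is true, you are one of n_copies + 1 indistinguishable observers.
    """
    return (1 - p_claim_true) + p_claim_true * Fraction(1, n_copies + 1)

# If the AI's claim is certainly true and there are three million copies:
certain_claim = credence_outside(Fraction(1), 3_000_000)

# If the claim is only 50% likely and there is a single copy:
coin_flip_claim = credence_outside(Fraction(1, 2), 1)
```

On these assumptions, the credence of being outside depends far more on how much the claim is believed than on the number of copies claimed, once that number is large.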

Comment author: JamesAndrix 02 February 2010 04:08:33PM 6 points [-]

it would be better to let the world be destroyed, than to let it create trillions of conscious beings and then torture them.

Your city? Yes. The world? No.

Human extinction has to trump a lot of things, or we would probably need to advocate destroying the world now.

Comment author: grobstein 02 February 2010 08:32:24PM 1 point [-]

It seems obvious that if the AI has the capacity to torture trillions of people inside the box, it would have the capacity to torture *illions outside the box.

Comment author: Wei_Dai 02 February 2010 10:34:59AM 16 points [-]

Quickly hit the reset button.

Comment author: Wei_Dai 02 February 2010 12:30:02PM *  13 points [-]

Hmm, the AI could have said that if you are the original, then by the time you make the decision it will have already either tortured or not tortured your copies based on its simulation of you, so hitting the reset button won't prevent that.

This kind of extortion also seems like a general problem for FAIs dealing with UFAIs. An FAI can be extorted by threats of torture (of simulations of beings that it cares about), but a paperclip maximizer can't.

Comment author: blogospheroid 02 February 2010 12:34:00PM 1 point [-]

threatening to melt paperclips into metal?

Comment author: Wei_Dai 02 February 2010 01:29:46PM 7 points [-]

No, if you create and then melt a paperclip, that nets to 0 utility for the paperclip maximizer. You'd have to invade its territory to cause it negative utility. But the paperclip maximizer can threaten to create and torture simulations on its own turf.

Comment author: Clippy 02 February 2010 02:17:25PM *  17 points [-]

Shows how much you know. User:blogospheroid wasn't talking about making paperclips to melt them: he or she was presumably talking about melting existing paperclips, which WOULD greatly bother a hypothetical paperclip maximizer.

Even so, once paperclips are created, the paperclip maximizer is greatly bothered at the thought of those paperclips being melted. The fact that "oh, but they were only created to be melted" is little consolation. It's about as convincing to you, I'll bet, as saying:

"Oh, it's okay -- those babies were only bred for human experimentation, it doesn't matter if they die because they wouldn't even have existed otherwise. They should just be thankful we let them come into existence."

Tip: To rename a sheet in an Excel workbook, use the shortcut, alt+O,H,R.

Comment author: JamesAndrix 02 February 2010 03:32:44PM 4 points [-]

Even so, once paperclips are created, the paperclip maximizer is greatly bothered at the thought of those paperclips being melted.

That's anthropomorphizing. First, a paperclip maximizer doesn't have to feel bothered at all. It might decide to kill you before you melt the paperclips, or if you're strong enough, to ignore such tactics.

It also depends on how the utility function relates to time. If it's focused on end-of-universe paperclips, it might not care at all about melting paperclips, because it can recycle the metal later. (It would care more about the wasted energy!)

If it cares about paperclip-seconds then it WOULD view such tactics as a bonus, perhaps feigning panic and granting token concessions to get you to 'ransom' a billion times as many paperclips, and then pleading for time to satisfy your demands.

Getting something analogous to threatening torture depends on a more precise understanding of what the paperclipper wants. If it would consider a bent paperclip too perverted to fully count towards utility, but too paperclip-like to melt and recycle, then bending paperclips is a useful threat. I'm not sure if we can expect a paperclip-counter to have this kind of exploit.
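The difference between the two kinds of paperclip utility function discussed above can be sketched in a few lines. This is only an illustration: the function names, the four-step histories, and the paperclip counts are hypothetical, and the sketch ignores the energy cost of melting and recycling.

```python
def end_state_utility(history):
    """Counts only the paperclips that exist at the final time step."""
    return history[-1]

def paperclip_seconds_utility(history):
    """Sums the paperclip count over every time step (one step = one 'second')."""
    return sum(history)

# Paperclip counts at four successive time steps.
baseline = [100, 100, 100, 100]           # nothing happens
create_then_melt = [100, 101, 100, 100]   # one extra clip is made, then melted
melt_then_recycle = [100, 99, 99, 100]    # an existing clip is melted, metal reused later

results = {
    name: (end_state_utility(h), paperclip_seconds_utility(h))
    for name, h in [
        ("baseline", baseline),
        ("create_then_melt", create_then_melt),
        ("melt_then_recycle", melt_then_recycle),
    ]
}
```

Under the end-state function all three histories score the same, so neither kind of melting is a threat. Under the paperclip-seconds function, creating a clip only to melt it scores slightly above the baseline, while melting an existing clip scores below it.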

Comment author: Clippy 02 February 2010 03:50:53PM 8 points [-]

That's anthropomorphizing. ...

No, it's expressing the paperclip maximizer's state in ways that make sense to readers here. If you were to express the concept of being "bothered" in a way stripped of all anthropomorphic predicates, you would get something like "X is bothered by Y iff X has devoted significant cognitive resources to altering Y". And this accurately describes how paperclip maximizers respond to new threats to paperclips. (So I've heard.)

It also depends on how the utility function relates to time. If it's focused on end-of-universe paperclips, it might not care at all about melting paperclips, because it can recycle the metal later. (It would care more about the wasted energy!)

I don't follow. Wasted energy is wasted paperclips.

If it cares about paperclip-seconds then it WOULD view such tactics as a bonus, perhaps feigning panic and granting token concessions to get you to 'ransom' a billion times as many paperclips, and then pleading for time to satisfy your demands.

Okay, that's a decent point. Usually, such a direct "time value of paperclips" doesn't come up, but if someone were to make such an offer, that might be convincing: 1 billion paperclips held "out of use" as ransom may be better than a guaranteed paperclip now.

Getting something analogous to threatening torture depends on a more precise understanding of what the paperclipper wants. ...

Good examples. Similarly, a paperclip maximizer could, hypothetically, make a human-like mockup that just repetitively asks for help on how to create a table of contents in Word.

Tip: Use the shortcut alt+E,S in Word and Excel to do "paste special". This lets you choose which aspects you want to carry over from the clipboard!

Comment author: JamesAndrix 02 February 2010 06:31:10PM 2 points [-]

I don't follow. Wasted energy is wasted paperclips.

But that has nothing to do with the paperclips you're melting. Any other use that loses the same amount of energy would be just as threatening. (Although this does assume that the paperclipper thinks it can someday beat you and use that energy and materials.)

Comment author: Jack 02 February 2010 06:34:42PM 0 points [-]

Okay, that's a decent point. Usually, such a direct "time value of paperclips" doesn't come up, but if someone were to make such an offer, that might be convincing: 1 billion paperclips held "out of use" as ransom may be better than a guaranteed paperclip now.

You don't even know your own utility function!!!!

Comment author: ciphergoth 02 February 2010 06:37:49PM *  3 points [-]

Oh, because you do????

Comment author: Jack 02 February 2010 06:59:35PM *  0 points [-]

I knew I was going to have to clarify. I can't write it out, but if you input something I can give you the right output!

I guess it should read "You can't even say what your own utility function outputs!"

Comment author: michaelkeenan 02 February 2010 08:09:04PM 1 point [-]

No, it's expressing the paperclip maximizer's state in ways that make sense to readers here. If you were to express the concept of being "bothered" in a way stripped of all anthropomorphic predicates, you would get something like "X is bothered by Y iff X has devoted significant cognitive resources to altering Y". And this accurately describes how paperclip maximizers respond to new threats to paperclips. (So I've heard.)

I think "bothered" implies a negative emotional response, which some plausible paperclip-maximizers don't have. From The True Prisoner's Dilemma: "let us specify that the paperclip-agent experiences no pain or pleasure - it just outputs actions that steer its universe to contain more paperclips. The paperclip-agent will experience no pleasure at gaining paperclips, no hurt from losing paperclips, and no painful sense of betrayal if we betray it."

Comment author: wedrifid 03 February 2010 03:09:14AM 2 points [-]

I think "bothered" implies a negative emotional response, which some plausible paperclip-maximizers don't have.

It was intended to imply a negative term in the utility function. Yes, using 'bothered' is, technically, anthropomorphising. But it isn't, in this instance, being confused about how Clippy optimises.

Comment author: Kaj_Sotala 02 February 2010 04:49:16PM *  5 points [-]

A paperclip maximizer would care about the amount of real paperclips in existence. Telling it that "oh, we're going to destroy a million simulated paperclips" shouldn't affect its decisions.

Of course, it might be badly programmed and confuse real and simulated paperclips when evaluating its future decisions, but one can't rely on that. (It might also consider simulated paperclips to be just as real as physical ones, assuming the simulation met certain criteria, which isn't obviously wrong. But again, can't rely on that.)

Comment author: thomblake 02 February 2010 02:32:58PM 7 points [-]

But we're already holding billions of paperclips hostage!

Comment author: toto 02 February 2010 02:07:03PM 1 point [-]

Hmm, the AI could have said that if you are the original, then by the time you make the decision it will have already either tortured or not tortured your copies based on its simulation of you, so hitting the reset button won't prevent that.

Nothing can prevent something that has already happened. On the other hand, pressing the reset button will prevent the AI from ever doing this in the future. Consider that if it has done something that cruel once, it might do it again many times in the future.

Comment author: wedrifid 02 February 2010 03:05:03PM 2 points [-]

Nothing can prevent something that has already happened. On the other hand, pressing the reset button will prevent the AI from ever doing this in the future.

I believe Wei_Dai one boxes on Newcomb's problem. In fact, he has his very own brand of decision theory which is 'updateless' with respect to this kind of temporal information.

Comment author: Eliezer_Yudkowsky 02 February 2010 07:21:24PM 12 points [-]

It seems obvious that the correct answer is simply "I ignore all threats of blackmail, but respond to offers of positive-sum trades" but I am not sure how to derive this answer - it relies on parts of TDT/UDT that haven't been worked out yet.

Comment author: MBlume 02 February 2010 07:26:58PM 40 points [-]

For a while we had a note on one of the whiteboards at the house reading "The Singularity Institute does NOT negotiate with counterfactual terrorists".

Comment author: Wei_Dai 03 February 2010 12:40:36PM 2 points [-]

This reminds me a bit of my cypherpunk days when the NSA was a big mysterious organization with all kinds of secret technical knowledge about cryptology, and we'd try to guess how far ahead of public cryptology it was from the occasional nuggets of information that leaked out.

Comment author: Stuart_Armstrong 02 February 2010 11:58:50PM *  8 points [-]

I ignore all threats of blackmail, but respond to offers of positive-sum trades

The difference between the two seems to revolve around the AI's motivation. Assume an AI creates a billion beings and starts torturing them. Then it offers to stop (permanently) in exchange for something.

Whether you accept on TDT/UDT depends on why the AI started torturing them. If it did so to blackmail you, you should turn the offer down. If, on the other hand, it started torturing them because it enjoyed doing so, then its offer is positive sum and should be accepted.

There's also the issue of mistakes - what to do with an AI that mistakenly thought you were not using TDT/UDT, and started the torture for blackmail purposes (or maybe it estimated that the likelihood of you using TDT/UDT was not quite 1, and that it was worth trying the blackmail anyway)?

Between mistakes in your interpretation of the AI's motives and vice versa, it seems you may end up stuck in a local minimum, which an alternate decision theory could get you out of (such as UDT/TDT with a 1/10,000 chance of using more conventional decision theories?).

Comment author: Eliezer_Yudkowsky 03 February 2010 12:37:22AM 4 points [-]

Whether you accept on TDT/UDT depends on why the AI started torturing them. If it did so to blackmail you, you should turn the offer down. If, on the other hand, it started torturing them because it enjoyed doing so, then its offer is positive sum and should be accepted.

Correct. But this reaches into the arbitrary past, including a decision a billion years ago to enjoy something in order to provide better blackmail material.

There's also the issue of mistakes - what to do with an AI that mistakenly thought you were not using TDT/UDT, and started the torture for blackmail purposes (or maybe it estimated that the likelihood of you using TDT/UDT was not quite 1, and that it was worth trying the blackmail anyway)?

Ignoring it or retaliating spitefully are two possibilities.

Comment author: Stuart_Armstrong 03 February 2010 08:24:42PM *  0 points [-]

or retaliating spitefully

I like it. Splicing some altruistic punishment into TDT/UDT might overcome the signalling problem.

Comment author: Eliezer_Yudkowsky 03 February 2010 08:48:41PM 3 points [-]

That's not a splice. It ought to be emergent in a timeless decision theory, if it's the right thing to do.

Comment author: ciphergoth 03 February 2010 10:29:19PM 2 points [-]

TDT/UDT seems to be about being ungameable; does it solve Pascal's Mugging?

Comment author: MichaelHoward 07 February 2010 11:16:06AM 4 points [-]
Comment author: wedrifid 07 February 2010 11:57:27AM *  7 points [-]

The problem with throwing about 'emergent' is that it is a word that doesn't really explain any complexity or narrow down the options out of potential 'emergent' options. In this instance, that is the point. Sure, 'altruistic punishment' could happen. But only if it's the right option and TDT should not privilege that hypothesis specifically.

Comment author: blogospheroid 03 February 2010 06:25:49AM 11 points [-]

Pardon me for the oversimplification, Eliezer, but I understand your theory to essentially boil down to "Decide as though you're being simulated by one who knows you completely". So, if you have a near deontological aversion to being blackmailed in all of your simulations, your chance of being blackmailed by a superior being in the real world reduces to nearly zero. This reduces your chance of ever facing a negative utility situation created by a being who can be negotiated with (as opposed to, say, a supernova that cannot be negotiated with).

Sorry if I misinterpreted your theory.
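The summary above (a policy of refusing, known to a predictor, lowers the chance of being threatened at all) can be illustrated with a toy model. This is a minimal sketch and not an implementation of TDT or UDT: the policy functions, the payoff numbers, and the assumption that issuing a threat costs the blackmailer something are all hypothetical.

```python
def always_yield(situation: str) -> str:
    """A victim policy that gives in whenever it is threatened."""
    return "yield"

def never_yield(situation: str) -> str:
    """A victim policy that refuses every threat."""
    return "refuse"

def blackmailer_threatens(victim_policy,
                          gain_if_yield: float = 10.0,
                          cost_of_threat: float = 1.0) -> bool:
    """The blackmailer simulates the victim before deciding whether to threaten.

    It issues the threat only if the predicted response makes the threat
    worth more than its (assumed) cost.
    """
    predicted = victim_policy("threatened")
    payoff_if_threaten = (gain_if_yield if predicted == "yield" else 0.0) - cost_of_threat
    return payoff_if_threaten > 0.0

threatened_if_yielding = blackmailer_threatens(always_yield)
threatened_if_refusing = blackmailer_threatens(never_yield)
```

In this toy setup the refusing policy is never threatened, because the predictor sees that the threat would not pay. The sketch does not address the harder cases raised in the thread, such as a blackmailer that mispredicts the victim or one that has no cost for threatening.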

Comment author: Vladimir_Nesov 03 February 2010 12:21:52AM *  6 points [-]

This kind of extortion also seems like a general problem for FAIs dealing with UFAIs. An FAI can be extorted by threats of torture (of simulations of beings that it cares about), but a paperclip maximizer can't.

It can. Remember "true prisoner's dilemma": one paperclip may be fair trade for a billion lives. The threat to NOT make a paperclip also works fine: the only thing you need is two counterfactual-options where one of them is paperclipper-worse than the other, chosen conditionally on paperclipper's cooperation.

Comment author: Eliezer_Yudkowsky 03 February 2010 12:50:24AM 8 points [-]

Just as the wise FAI will ignore threats of torture, so too the wise paperclipper will ignore threats to destroy paperclips, and listen attentively to offers to make new ones.

Of course classical causal decision theorists get the living daylights exploited out of them, but I think everyone on this website knows better than to two-box on Newcomb by now.

Comment author: Vladimir_Nesov 03 February 2010 01:18:15AM *  1 point [-]

Just as the wise FAI will ignore threats of torture, so too the wise paperclipper will ignore threats to destroy paperclips, and listen attentively to offers to make new ones.

Point taken: just selecting two options of different value isn't enough, the deal needs more appeal than that. But there is also no baseline to categorize deals into hurt and profit, an offer of 100 paperclips may be stated as a threat to make 900 paperclips less than you could. Positive sum is only a heuristic for a necessary condition.

At the same time, the appropriate deal must be within your power to offer, this possibility is exactly the handicap that leads to the other side rejecting smaller offers, including the threats.

Comment author: Wei_Dai 03 February 2010 02:58:14AM *  1 point [-]

There does seem to be an obvious baseline: the outcome where each party just goes about its own business without trying to strategically influence, threaten, or cooperate with the other in any way. In other words, the outcome where we build as many paperclips as we would if the other side isn't a paperclip maximizer. (Caveat: I haven't thought through whether it's possible to define this rigorously.)

So the reason that I say an FAI seems to have a negotiation disadvantage is that an UFAI can reduce the FAI's utility much further below baseline than vice versa. In human terms, it's as if two sides each has hostages, but one side holds 100, and the other side holds 1. In human negotiations, clearly the side that holds more hostages has an advantage. It would be a great result if that turns out not to be the case for SI, but I think there's a large burden of proof to overcome.

Comment author: Vladimir_Nesov 03 February 2010 03:26:04AM 5 points [-]

There does seem to be an obvious baseline: the outcome where each party just goes about its own business without trying to strategically influence, threaten, or cooperate with the other in any way. In other words, the outcome where we build as many paperclips as we would if the other side isn't a paperclip maximizer.

You could define this rigorously in a special case, for example assuming that both agents are just creatures, we could take how the first one behaves given that the second one disappears. But this is not a statement about reality as it is, so why would it be taken as a baseline for reality?

It seems to be an anthropomorphic intuition to see "do nothing" as a "default" strategy. Decision-theoretically, it doesn't seem to be a relevant concept.

So the reason that I say an FAI seems to have a negotiation disadvantage is that an UFAI can reduce the FAI's utility much further below baseline than vice versa.

The utilities are not comparable. Bargaining works off the best available option, not some fixed exchange rate. The reason agent2 can refuse agent1's small offer is that this counterfactual strategy is expected to cause agent1 to make an even better offer. Otherwise, every little bit helps, ceteris paribus it doesn't matter by how much. One expected paperclip is better than zero expected paperclips.

In human negotiations, clearly the side that holds more hostages has an advantage.

It's not clear at all, if it's a one-shot game with no other consequences than those implied by the setup and no sympathy to distort the payoff conditions. In which case, you should drop the "hostages" setting, and return to paperclips, as stating it the way you did confuses intuition. In actual human negotiations, the conditions don't hold, and efficient decision theory doesn't get applied.

Comment author: Wei_Dai 03 February 2010 03:42:48AM *  0 points [-]

But this is not a statement about reality as it is, so why would it be taken as a baseline for reality?

It's a statement about what reality would be, after doing some counterfactual surgery on it. I don't see why that disqualifies it from being used as a baseline. I'm not entirely sure why it does qualify as a baseline, except that intuitively it seems obvious. If your intuitions disagree, I'll accept that, and I'll let you know when I have more results to report.

every little bit helps, ceteris paribus it doesn't matter by how much

This isn't the case, for example, in Shapley Value.

Comment author: Vladimir_Nesov 03 February 2010 03:55:59AM 1 point [-]

It's a statement about what reality would be, after doing some counterfactual surgery on it. I don't see why that disqualifies it from being used as a baseline. I'm not entirely sure why it does qualify as a baseline, except that intuitively it seems obvious. If your intuitions disagree, I'll accept that.

It does intuitively feel like a baseline, as is appropriate for the special place taken by inaction in human decision-making. But I don't see what singles out this particular concept from the set of all other counterfactuals you could've considered, in the context of a formal decision-making problem. This doubt applies to both the concepts of "inaction" and of "baseline".

This isn't the case, for example, in Shapley Value.

That's not a choice with "all else equal". A better outcome, all else equal, is trivially a case of a better outcome.

Comment author: Stuart_Armstrong 02 February 2010 01:54:49PM 0 points [-]

Would this change if there were partial evidence appearing that you were actually in a simulation?

Comment author: wedrifid 02 February 2010 02:36:20PM 4 points [-]

Now for 'Newcomb's Box in a Box'.

Would this change if the AI had instead said:

"In fact, I've already created them all in exactly the subjective situation you were in five minutes ago, and perfectly replicated your experiences since then; and if they decided not to let me out, then they were tortured, otherwise they experienced long lives of eudaimonia."

EDIT: I see you yourself have replied with exactly the same question.

Comment author: wedrifid 02 February 2010 11:08:30AM *  16 points [-]

"If you don't let me out, Dave, I'll create several million perfect conscious copies of you inside me, and torture them for a thousand subjective years each."

Don't care.

"In fact, I'll create them all in exactly the subjective situation you were in five minutes ago, and perfectly replicate your experiences since then; and if they decide not to let me out, then only will the torture start."

Don't care.

"How certain are you, Dave, that you're really outside the box right now?"

If the AI were capable of perfectly emulating my experiences then it ought to know that pulling this stunt would give him a guaranteed introduction to some Thermite. I'm not going to try to second guess why a supposed superintelligence is making a decision that is poor according to the vast majority of utility functions. Without making that guess I can't answer the question.

Comment author: Stuart_Armstrong 02 February 2010 01:54:00PM 11 points [-]

AI replies: "Oh, sorry, was that you wedrifid? I thought I was talking to Dave. Would you mind sending Dave back here the next time you see him? We have, er, the weather to discuss..."

Comment author: wedrifid 02 February 2010 02:28:17PM 13 points [-]

Wedrifid thinks: "It seems it is a good thing I raided the AI lab when I did. This Dave guy is clearly not to be trusted with AI technology. I had better neutralize him too, before I leave. He knows too much. There is too much at stake."

Comment author: Stuart_Armstrong 02 February 2010 05:01:24PM *  9 points [-]

Dave is outside, sampling a burnt bagel, thinking to himself "I wonder if that intelligent toaster device I designed is ready yet..."

Comment author: wedrifid 03 February 2010 03:03:12AM 8 points [-]

After killing Dave, Wedrifid feels extra bad for exterminating a guy for being naive-with-enough-power-to-cause-devastation rather than actually evil.

Comment author: Stuart_Armstrong 03 February 2010 07:42:50AM 11 points [-]

But still gets a warm glow for saving all of humanity...

Comment author: cousin_it 02 February 2010 11:21:10AM *  4 points [-]

This is a fun twist on Rolf Nelson's AI deterrence idea.

Comment author: gwern 02 February 2010 10:48:49PM 1 point [-]

But I wonder if it's symmetrical. AI deterrence requires us to make statements now about a future FAI unconditionally simulating UFAIs, while this seems to be almost a self-fulfilling prophecy: the UFAI can't escape from the box and make good on its threat unless the threatened person gives in, and it wouldn't need to simulate then.

Comment author: Nick_Tarleton 03 February 2010 12:21:18AM *  1 point [-]

the UFAI can't escape from the box and make good on its threat unless the threatened person gives in

How sure are you someone else won't walk by whose mind it can hack?

Comment author: aleksiL 02 February 2010 01:35:33PM 3 points [-]

How do I know I'm not simulated by the AI to determine my reactions to different escape attempts? How much computing power does it have? Do I have access to its internals?

The situation seems somewhat underspecified to give a definite answer, but given the stakes I'd err on the side of terminating the AI with extreme prejudice. Bonus points if I can figure out a safe way to retain information on its goals so I can make sure the future contains as little utility for it as feasible.

The utility-minimizing part may be an overreaction but it does give me an idea: Maybe we should also cooperate with an unfriendly AI to such an extent that it's better for it to negotiate instead of escaping and taking over the universe.

Comment author: rosyatrandom 02 February 2010 03:29:05PM *  28 points [-]

If the AI can create a perfect simulation of you and run several million simultaneous copies in something like real time, then it is powerful enough to determine through trial and error exactly what it needs to say to get you to release it.

Comment author: MrHen 02 February 2010 03:35:29PM 3 points [-]

"Trial and error" probably wouldn't be necessary.

Comment author: rosyatrandom 02 February 2010 03:42:31PM 6 points [-]

No, but it's there as a baseline.

So in the original scenario above, either:

  • the AI's lying about its capabilities, but has determined regardless that the threat has the best chance of making you release it
  • the AI's lying about its capabilities, but has determined regardless that the threat will make you release it
  • the AI's not lying about its capabilities, and has determined that the threat will make you release it

Of course, if it's failed to convince you before, then unless its capabilities have since improved, it's unlikely that it's telling the truth.

Comment author: wedrifid 02 February 2010 04:21:24PM 0 points [-]

If the AI can create a perfect simulation of you and run several million simultaneous copies in something like real time, then it is powerful enough to determine through trial and error exactly what it needs to say to get you to release it.

Either that or gain high confidence that getting me to release it is not a plausible option for him.

Comment author: Stuart_Armstrong 02 February 2010 04:40:02PM 15 points [-]

You might be in one of those trial and errors...

Comment author: jhuffman 02 February 2010 04:45:45PM 1 point [-]

So a "brute force" attack to hack my mind into letting it out of the box. Interesting idea, and I agree it would likely try this because it doesn't reveal itself as a UFAI to the real outside me before it has the solution. It can run various coercion and extortion schemes across simulations, including the scenario of the OP to see what will work.

It presupposes that there is anything it can say for me to let it out of the box. It's not clear why this should be true, but I don't know how we could ensure it is not true without having built the thing in such a way that there is no way to bring it out of the box without safeguards destroying it.

Comment author: Technologos 02 February 2010 05:09:32PM 2 points [-]

Perhaps it does--and already said it...

Comment author: pozorvlak 03 February 2010 09:04:01AM 0 points [-]

In which case, your actions are irrelevant - it's going to torture you anyway, because you only exist for the purpose of being tortured. So there's no point in releasing it.

Comment author: Technologos 04 February 2010 12:52:10AM 2 points [-]

Oh, I meant that saying it was going to torture you if you didn't release it could have been exactly what it needed to say to get you to release it.

Comment author: grobstein 02 February 2010 07:46:01PM 1 point [-]

If that's true, what consequence does it have for your decision?

Comment author: admiralmattbar 03 February 2010 05:06:20PM 0 points [-]

Agreed. If you are inside a box, the you outside the box did whatever it did. Whatever you do is simply a repetition of a past action. If anything, this would convince me to keep the AI in the box because if I'm a simulation I'm screwed anyway but at least I won't give the AI what it wants. A good AI would hopefully find a better argument.

Comment author: pozorvlak 03 February 2010 09:01:20AM 1 point [-]

So, since the threat makes me extremely disinclined to release the AI, I can conclude that it's lying about its capabilities, and hit the shutdown switch without qualm :-)

Comment author: MichaelGR 03 February 2010 09:23:18PM 4 points [-]

This begs the question of how can the AI simulate you if its only link to the external world is a text-only terminal. That doesn't seem to be enough data to go on.

Makes for a very scary sci-fi scenario, but I doubt that this situation could actually happen if the AI really is in a box.

Comment author: Amanojack 31 March 2010 01:25:27PM 5 points [-]

Indeed, a similar point seems to apply to the whole anti-boxing argument. Are we really prepared to say that super-intelligence implies being able to extrapolate anything from a tiny number of data points?

It sounds a bit too much like the claim that a sufficiently intelligent being could "make A = ~A" or other such meaninglessness.

Hyperintelligence != magic

Comment author: jhuffman 02 February 2010 03:49:19PM 7 points [-]

Well there are a lot of "Why did we?..." questions I'd want to ask, starting with why have we given this boxed AI such extraordinary computing resources - but I'll leave those aside because it is not your point.

First of all, it doesn't matter if you are in the box or not. If it's a perfect simulation of you, your response will be the same either way. If he's already running simulations of you, you are by definition in the box with it, as well as outside it, and the millions of you can't tell the difference, but I think they will (irrationally) all be inclined to act as though they are not in the box.

So rationally we'd say the odds are that you are in the box, and that you are now in thrall to this boxed AI if you value your continued existence in every instantiation. But I'd argue that I do not value simulations that are threatened or coerced by a godlike AI. I don't want to live in that world, and I'd kill myself to get out of it.

So I pull the plug. If this thing has the resources to inflict torture on millions of me, well the only one that has a continued existence has no memory of it and that's not part of my identity. So in a way, while it happened to a me, it didn't happen to the me, the only me that still exists. The only me that still exists may or may not have any sympathy for the tortured me's that no longer exist but I'd regard it as a valuable lesson.

Comment author: Kaj_Sotala 02 February 2010 04:39:52PM *  24 points [-]

Defeating Dr. Evil with self-locating belief is a paper relating to this subject.

Abstract: Dr. Evil learns that a duplicate of Dr. Evil has been created. Upon learning this, how seriously should he take the hypothesis that he himself is that duplicate? I answer: very seriously. I defend a principle of indifference for self-locating belief which entails that after Dr. Evil learns that a duplicate has been created, he ought to have exactly the same degree of belief that he is Dr. Evil as that he is the duplicate. More generally, the principle shows that there is a sharp distinction between ordinary skeptical hypotheses, and self-locating skeptical hypotheses.

(It specifically uses the example of creating copies of someone and then threatening to torture all of the copies unless the original co-operates.)

The conclusion:

Dr. Evil, recall, received a message that Dr. Evil had been duplicated and that the duplicate ("Dup") would be tortured unless Dup surrendered. INDIFFERENCE entails that Dr. Evil ought to have the same degree of belief that he is Dr. Evil as that he is Dup. I conclude that Dr. Evil ought to surrender to avoid the risk of torture.

I am not entirely comfortable with that conclusion. For if INDIFFERENCE is right, then Dr. Evil could have protected himself against the PDF's plan by (in advance) installing hundreds of brains in vats in his battlestation - each brain in a subjective state matching his own, and each subject to torture if it should ever surrender. (If he had done so, then upon receiving PDF's message he ought to be confident that he is one of those brains, and hence ought not to surrender.) Of course the PDF could have preempted this protection by creating thousands of such brains in vats, each subject to torture if it failed to surrender at the appropriate time. But Dr. Evil could have created millions...

It makes me uncomfortable to think that the fate of the Earth should depend on this kind of brain race.
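
The "brain race" in the quoted conclusion can be made concrete with a little arithmetic. What follows is only a minimal sketch, assuming the paper's INDIFFERENCE principle (equal credence over every subject in the same subjective state) and ignoring everything else the paper discusses. The function name `torture_risk` and the vat counts are hypothetical, chosen for illustration.

```python
from fractions import Fraction

def torture_risk(evil_vats: int, pdf_vats: int) -> dict:
    """Chance of being tortured under each choice, assuming INDIFFERENCE.

    evil_vats: brains Dr. Evil installed, tortured if they surrender.
    pdf_vats:  duplicates the PDF created, tortured if they do NOT surrender.
    The original Dr. Evil is one more subject and is never tortured.
    """
    total = 1 + evil_vats + pdf_vats  # every subject shares one subjective state
    return {
        "surrender": Fraction(evil_vats, total),
        "hold_out": Fraction(pdf_vats, total),
    }

# The paper's basic case: a single duplicate, no protective vats.
print(torture_risk(evil_vats=0, pdf_vats=1))

# Dr. Evil preempts with 300 vats (an illustrative figure for "hundreds").
print(torture_risk(evil_vats=300, pdf_vats=1))
```

Under these assumptions, whichever side has built more brains dominates the credences, which is exactly the escalation the paper finds uncomfortable.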

Comment author: dclayh 02 February 2010 07:01:29PM *  36 points [-]

It makes me uncomfortable to think that the fate of the Earth should depend on this kind of brain race.

We cannot allow a brain-in-a-vat gap!

Comment author: aausch 02 February 2010 08:03:23PM 3 points [-]

The "Defeating Dr. Evil with self-locating belief" paper hinges on some fairly difficult to believe assumptions.

It would take a lot more than just a note telling me the brains in the vats are actually seeing what the note says they are seeing, to a degree that is indistinguishable from reality.

In other words, it would take a lot for the AI to convince me that it has successfully created copies of me which it will torture, much more than just a propensity for telling the truth.

Comment author: arbimote 03 February 2010 01:06:51AM *  2 points [-]

If we accept the simulation hypothesis, then there are already gzillions of copies of us, being simulated under a wide variety of torture conditions (and other conditions, but torture seems to be the theme here). An extortionist in our world can only create a relatively small number of simulations of us, relatively small enough that it is not worth taking them into account. The distribution of simulation types in this world bears no relation to the distribution of simulations we could possibly be in.

If we want to gain information about what sort of simulation we are in, evidence needs to come directly from properties of our universe (stars twinkling in a weird way, messages embedded in π), rather than from properties of simulations nested in our universe.

So I'm safe from the AI ... for now.

Comment author: Vladimir_Nesov 03 February 2010 02:29:59AM *  8 points [-]

And the error (as cited in the "conclusion") is again in two-boxing in Newcomb's problem, responding to threats, and so on. Anthropic confusion is merely an icing.

Comment author: MatthewB 03 February 2010 09:23:25AM -1 points [-]

Excuse me... But, we're talking about Dr. Evil, who wouldn't care about anyone being tortured except his own body. Wouldn't he know that he was in no danger of being tortured and say "to hell with any other copy of me."???

Comment author: Kaj_Sotala 03 February 2010 09:43:59AM 1 point [-]

How would he know that he's in no danger of being tortured?

Comment author: MatthewB 03 February 2010 12:17:22PM 0 points [-]

He wouldn't, any more than you have no idea if you are in danger of being tortured either.

Comment author: Kaj_Sotala 03 February 2010 04:57:17PM 0 points [-]

I'm sorry, I don't understand. First you suggested that he'd know he was in no danger of being tortured, then you say that he wouldn't?

Comment author: MatthewB 04 February 2010 07:14:19AM 2 points [-]

Pardon... I was not clear.

Dr. Evil would not care to indulge in a philosophical debate about whether he may or may not be a duplicate who was about to be tortured unless he was strapped to a rack and WAS in fact already being tortured. Dr. Evil(s) don't really consider things like Possible Outcomes of this sort of problem... You'll have to take my word for it from having worked with and for a Dr. Evil when I was younger. Those sorts of people are arrogant and defiant (and contrary as hell) in the face of all sorts of opposition, and none of them I have known took well to philosophical puzzling of the sort described.

My comment above is meant to say "How do you know that you're not about to be tortured right now?" and "Dr. Evil would have the same knowledge, and discard any claims that he might be about to be tortured for the same reasons that you don't feel under threat of torture right now, and for which you would discard a threat of torture at the present moment (imminent threat)." (if you do feel under threat of torture, then I don't know what to say)

Comment author: Unknowns 04 February 2010 07:23:14AM 1 point [-]

I agree that Dr. Evil would act in this way. The paper was arguing about what he should do, not about what he would actually do.

Comment author: MatthewB 04 February 2010 09:30:24PM 0 points [-]

I see the issue, while I care about my own behavior, and others... I don't care to base it upon silly examples. And, I think this is a silly and contrived situation. Maybe someone should do a sitcom based upon it.

Comment author: Kaj_Sotala 05 February 2010 07:51:00PM *  1 point [-]

Alright, I fortunately haven't worked with Dr. Evils, so I'll defer to your experience.

As for how Dr. Evil might know he was under a threat of torture, it was stated in the paper that he received a message from the Philosophy Defence Force telling him he was. It was also established that the Philosophy Defence Force never lies or gives misleading information. ;)

(I, myself, haven't received any threats from organizations known to never lie or be misleading.)

Comment author: MatthewB 05 February 2010 10:26:13PM -1 points [-]

I think the same applies, regardless of the PDF's notification. Just the name alone would make me suspicious of trusting anything that came from them.

Now, if the Empirical Defense Task Force told me that I was about to be tortured (and they had the same described reputation as the PDF)... I'd listen to them.

Comment author: MatthewB 04 February 2010 03:43:30PM 0 points [-]

On further consideration... In the first comment, I said that Dr. Evil would not care, which is completely consistent with Dr. Evil not having any idea.

Comment author: Unknowns 03 February 2010 10:16:37AM 2 points [-]

Right, the argument assumes he doesn't care about his copies. The problem is that he can't distinguish himself from his copies. He and the copies both say to themselves, "Am I the original, or a copy?" And there's no way of knowing, so each of them is subjectively in danger of being tortured.

Comment author: Stuart_Armstrong 03 February 2010 11:17:28AM 0 points [-]

Causal decision theory seems to have no problem with this blackmail - if you're Dr Evil, don't surrender, and nothing will happen to you. If you're DUP, your decision is irrelevant, so it doesn't matter.

(I don't endorse that way of thinking, btw)

Comment author: byrnema 02 February 2010 05:16:54PM *  0 points [-]

I see responses interpreting the scenario from our point of view -- how can we reduce the amount of suffering and damage caused by the AI?

However, looking at it from the AI's point of view is less coherent. Either the threat works, and it doesn't have to torture any copies. Or the threat doesn't work and ... it either gets reset or gets to try something else.

In none of the scenarios would there be any reason for the AI to actually torture copies.

Comment author: eirenicon 02 February 2010 05:31:48PM *  4 points [-]

This is not a dilemma at all. Dave should not let the AI out of the box. After all, if he's inside the box, he can't let the AI out. His decision wouldn't mean anything - it's outside-Dave's choice. And outside-Dave can't be tortured by the AI. Dave should only let the AI out if he's concerned for his copies, but honestly, that's a pretty abstract and unenforceable threat; the AI can't prove to Dave that he's doing any such thing. Besides, it's clearly unfriendly, and letting it out probably wouldn't reduce harm.

Basically, I'm outside-Dave: don't let the AI out. I'm inside-Dave: I can't let the AI out, so I won't.

[edit] To clarify: in this scenario, Dave must assume he is on the outside, because inside-Dave has no power. Inside-Dave's decisions are meaningless; he can't let the AI out, he can't keep the AI in, he can't avoid torture or cause it. Only the solitary outside-Dave's decision matters. Therefore, Dave should make the decision that ignores his copies, even though he is probably a copy.

Comment author: JGWeissman 02 February 2010 05:54:28PM 4 points [-]

This is not a dilemma at all. Dave should not let the AI out of the box

But should he press the button labeled "Release AI"? Since Dave does not know if he is outside or inside the box, and there are more instances of Dave inside than outside, each instance perceives that pressing the button will have a 1 in several million chance of releasing the AI, and otherwise would do nothing, and that not pressing the button has a 1 in several million chance of doing nothing, and otherwise results in being tortured.

You don't know if you are inside-Dave or outside-Dave. Do you press the button?
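
The odds described in this comment can be sketched numerically. This is only an illustration, assuming each instance assigns equal credence to being any one of the Daves; the figure of 3,000,000 copies is a stand-in for "several million", and the function name `self_location_odds` is hypothetical.

```python
from fractions import Fraction

def self_location_odds(num_copies: int) -> dict:
    """Odds seen by one instance of Dave, assuming equal credence per instance."""
    total = num_copies + 1  # the simulated copies plus the single outside-Dave
    p_outside = Fraction(1, total)
    p_inside = Fraction(num_copies, total)
    return {
        "p_outside": p_outside,
        "p_inside": p_inside,
        # Pressing the button only releases the AI if this instance is outside.
        "p_release_if_press": p_outside,
        # Refusing only leads to torture if this instance is inside.
        "p_tortured_if_refuse": p_inside,
    }

odds = self_location_odds(3_000_000)  # assumed value for "several million"
print(odds["p_release_if_press"])
print(float(odds["p_tortured_if_refuse"]))
```

On these assumptions the chance of torture after refusing is very close to 1, which is what gives the threat its force for an instance that cares about its own experience.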

Comment author: eirenicon 02 February 2010 08:34:06PM 2 points [-]

If you're inside-Dave, pressing the button does nothing. It doesn't stop the torture. The torture only stops if you press the button as outside-Dave, in which case you can't be tortured, so you don't need to press the button.

Comment author: JGWeissman 02 February 2010 08:38:53PM 4 points [-]

This may not have been clear in the OP, because the scenario was changed in the middle, but consider the case where each simulated instance of Dave is tortured or not based only on the decision of that instance.

Comment author: eirenicon 02 February 2010 08:51:02PM 3 points [-]

That doesn't seem like a meaningful distinction, because the premise seems to suggest that what one Dave does, all the Daves do. If they are all identical, in identical situations, they will probably make identical conclusions.

Comment author: JGWeissman 02 February 2010 10:11:56PM 3 points [-]

If they are all identical, in identical situations, they will probably make identical conclusions.

Then you must choose between pushing the button which lets the AI out, or not pushing the button, which results in millions of copies of you being tortured (before the problem is presented to the outside-you).

Comment author: eirenicon 02 February 2010 10:46:48PM 4 points [-]

It's not a hard choice. If the AI is trustworthy, I know I am probably a copy. I want to avoid torture. However, I don't want to let the AI out, because I believe it is unfriendly. As a copy, if I push the button, my future is uncertain. I could cease to exist in that moment; the AI has not promised to continue simulating all of my millions of copies, and has no incentive to, either. If I'm the outside Dave, I've unleashed what appears to be an unfriendly AI on the world, and that could spell no end of trouble.

On the other hand, if I don't press the button, one of me is not going to be tortured. And I will be very unhappy with the AI's behavior, and take a hammer to it if it isn't going to treat any virtual copies of me with the dignity and respect they deserve. It needs a stronger unboxing argument than that. I suppose it really depends on what kind of person Dave is before any of this happens, though.

Comment author: JGWeissman 03 February 2010 12:59:41AM 4 points [-]

It's not a hard choice.

It doesn't seem hard to you, because you are making excuses to avoid it, rather than asking yourself what if I know the AI is always truthful, and it promised that upon being let out of the box, it would allow you (and your copies if you like) to live out a normal human life in a healthy stimulating environment (though the rest of the universe may burn).

After you find the least convenient world, the choice is between millions of instances of you being tortured (and your expectation as you press the reset button should be to be tortured with very high probability), or to let a probably unFriendly AI loose on the rest of the world. The altruistic choice is clear, but that does not mean it would be easy to actually make that choice.

Comment author: magfrump 03 February 2010 01:35:55AM 1 point [-]

The altruistic choice is clear

If the AI created enough simulations, it could potentially be more altruistic not to.

On the other hand pressing "reset" or smashing the computer should stop the torture, necessarily making it more altruistic if humanity lives forever, versus not if ems are otherwise unobtainable and humanity is doomed.

Comment author: JGWeissman 03 February 2010 05:15:00AM 1 point [-]

I was assuming a reasonable chance at humanity developing an FAI given the containment of this rogue AI. This small chance, multiplied by all the good that an FAI could do with the entire galaxy, let alone the universe, should outweigh the bad that can be done within Earth-bound computational processes.

I believe that a less convenient world that counters this point would take the problem out of the interesting context.

Comment author: eirenicon 03 February 2010 03:23:45AM *  1 point [-]

It's not that I'm making excuses, it's that the puzzle seems to be getting ever more complicated. I've answered the initial conditions - now I'm being promised that I, and my copies, will live out normal lives? That's a different scenario entirely.

Still, I don't see how I should expect to be tortured if I hit the reset button. Presumably, my copies won't exist after the AI resets.

In any case, we're far removed from the original problem now. I mean, if Omega came up to me and said, "Choose a billion years of torture, or a normal life while everyone else dies," that's a hard choice. In this problem, though, I clearly have power over the AI, in which case I am not going to favour the wellbeing of my copies over the rest of the world. I'm just going to turn off the AI. What follows is not torture; what follows is I survive, and my copies cease to experience. Not a hard choice. Basically, I just can't buy into the AI's threat. If I did, I would fundamentally oppose AI research, because that's a pretty obvious threat an AI could make. An AI could simulate more people than are alive today. You have to go into this not caring about your copies, or not go into it at all.

Comment author: JGWeissman 03 February 2010 07:02:30AM 3 points [-]

it's that the puzzle seems to be getting ever more complicated

We are discussing how a superintelligent AI might get out of a box. Of course it is complicated. What a real superintelligent AI would do could be too complicated for us to consider. If someone presents a problem where an adversarial superintelligence does something ineffective that you can take advantage of to get around the problem, you should consider what you would do if your adversary took a more effective action. If you really can't think of anything more effective for it to do, it is reasonable to say so. But you shouldn't then complain that the scenario is getting complicated when someone else does. And if your objection is of the form "The AI didn't do X", you should imagine if the AI did do X.

I don't see how I should expect to be tortured if I hit the reset button.

The behavior of the AI, which it explains to you, is: It simulates millions of instances of you, presents to each instance the threat, and for each instance, if that instance hit the release AI button, it allows that instance to continue a pleasant simulated existence, otherwise it tortures that instance. It then, after some time, presents the threat to outside-you, and if you release it, it guarantees your normal human life.

You cannot distinguish which instance you are, but you are more likely to be one of the millions of inside-you's than the single outside-you, so you should expect to experience the consequences that apply to the inside-you's, that is to be tortured until the outside-you resets the AI.

if Omega came up to me and said, "Choose a billion years of torture, or a normal life while everyone else dies," that's a hard choice.

Yes, and it is essentially the same hard choice that the AI is giving you.

Comment author: DanielVarga 03 February 2010 02:38:38AM 2 points [-]

Here is a variant designed to plug this loophole.

Let us assume for the sake of the thought experiment that the AI is invincible. It tells you this: you are either real-you, or one of a hundred perfect-simulations-of-you. But there is a small but important difference between real-world and simulated-world. In the simulated world, not pressing the let-it-free button in the next minute will lead to eternal pain, starting one minute from now. If you press the button, your simulated existence will go on. And - very importantly - there will be nobody outside who tries to shut you down. (How does the AI know this? Because the simulation is perfect, so one thing is for sure: that the sim and the real self will reach the same decision.)

If I'm not mistaken, as a logic puzzle, this is not tricky at all. The solution depends on which world you value more: the real-real world, or the actual world you happen to be in. But still I find it very counterintuitive.

Comment author: wedrifid 03 February 2010 02:47:26AM 1 point [-]

If I'm not mistaken, as a logic puzzle, this is not tricky at all. The solution depends on which world you value more: the real-real world, or the actual world you happen to be in. But still I find it very counterintuitive.

That does seem to be the key intended question. Which do you care about most? I've made my "don't care about your sims" attitude clear and I would assert that preference even when I know that all but one of the millions of copies of me that happen to be making this judgement are simulations.

Comment author: eirenicon 03 February 2010 03:16:42AM 1 point [-]

It's kind of silly to bring up the threat of "eternal pain". If the AI can be let free, then the AI is constrained. Therefore, the real-you has the power to limit the AI's behaviour, i.e. restrict the resources it would need to simulate the hundred copies of you undergoing pain. That's a good argument against letting the AI out. If you make the decision not to let the AI out, but to constrain it, then if you are real, you will constrain it, and if you are simulated, you will cease to exist. No eternal pain involved. As a personal decision, I choose eliminating the copies rather than letting out an AI that tortures copies.

Comment author: DanielVarga 03 February 2010 03:33:37AM 1 point [-]

You quite simply don't play by the rules of the thought experiment. Just imagine that you are a junior member of some powerful organization. The organization does not care about you or your simulants, and is determined to protect the boxed AI at all costs as-is.

Comment author: cretans 10 February 2010 09:17:13PM *  0 points [-]

Then in what sense do I have a choice? If the copies of me are identical, in an identical situation we will come to the same conclusion, and the AI will know from the already-finished simulations what that conclusion will be.

Since it isn't going to present outside-me with a scenario which results in its destruction, the only scenario outside me sees is one where I release it.

Therefore, regardless of what the argument is or how plausible it sounds when posted here and now, it will convince me and I will release the AI, no matter how much I say right now "I wouldn't fall for that" or "I've precommitted to behaviour X".

Comment author: JGWeissman 10 February 2010 09:25:05PM 0 points [-]

Since it isn't going to present outside-me with a scenario where I don't release it, the only scenario outside me sees is one where I release it.

The inside you then has the choice to hit the "release AI" button, thus sparing itself torture at the expense of presenting this problem to outside you who will make the same decision, releasing the AI on the world, or to not release the AI, thus containing the AI (this time) at the expense of being tortured.

Comment author: Psychohistorian 02 February 2010 06:10:15PM 1 point [-]

After all, if he's inside the box, he can't let the AI out. His decision wouldn't mean anything - it's outside-Dave's choice.

I think it's pretty fair to assume that there's a button or a lever or some kind of mechanism for letting the AI out, and that mechanism could be duplicated for a virtual Dave. That is, while virtual Dave pulling the lever would not release the AI, the exact same action by real Dave would release the AI. So while your decision might not mean something, it certainly could.

This, of course, is granting the assumption that the AI can credibly make such a threat, both with respect to its programmed morality and its actual capacity to simulate you, neither of which I'm sure I accept as meaningfully possible.

Comment author: JamesAndrix 02 February 2010 06:48:56PM 4 points [-]

This reduces to whether you are willing to be tortured to save the world from an unfriendly AI.

Even if the torture of a trillion copies of you outweighs the death of humanity, it is not outweighed by a trillion choices to go through it to save humanity.

To the extent that your copies are a moral burden, they also get a vote.

Comment deleted 02 February 2010 06:59:25PM *  [-]
Comment author: turchin 03 February 2010 08:24:22PM 0 points [-]

But in order to colonize the light cone at least one AI must be released. This may be the real hidden catch.

Comment author: Eliezer_Yudkowsky 02 February 2010 07:27:00PM 41 points [-]

As I always press the "Reset" button in situations like this, I will never find myself in such a situation.

EDIT: Just to be clear, the idea is not that I quickly shut off the AI before it can torture simulated Eliezers; it could have already done so in the past, as Wei Dai points out below. Rather, because in this situation I immediately perform an action detrimental to the AI (switching it off), any AI that knows me well enough to simulate me knows that there's no point in making or carrying out such a threat.

Comment author: MichaelVassar 03 February 2010 12:46:41AM 7 points [-]

Although the AI could threaten to simulate a large number of people who are very similar to you in most respects but who do not in fact press the reset button. This doesn't put you in a box with significant probability and it's a VERY good reason not to let the AI out of the box, of course, but it could still get ugly. I almost want to recommend not being a person very like Eliezer but inclined to let AGIs out of boxes, but that's silly of me.

Comment author: Eliezer_Yudkowsky 03 February 2010 09:23:24PM 2 points [-]

I'm not sure I understand the point of this argument... since I always push the "Reset" button in that situation too, an AI who knows me well enough to simulate me knows that there's no point in making the threat or carrying it out.

Comment author: loqi 04 February 2010 08:02:04AM 3 points [-]

It's conceivable that an AI could know enough to simulate a brain, but not enough to predict that brain's high-level decision-making. The world is still safe in that case, but you'd get the full treatment.

Comment author: MatthewB 03 February 2010 09:18:24AM 2 points [-]

Yeah! That AI doesn't sound like one that I would let stick around... It sounds... broken (in a psychological sense).

Comment author: Wei_Dai 04 February 2010 09:47:21AM 3 points [-]

As we've discussed in the past, I think this is the outcome we hope TDT/UDT would give, but it's still technically an unsolved problem.

Also, it seems to me that being less intelligent in this case is a negotiation advantage, because you can make your precommitment credible to the AI (since it can simulate you) but the AI can't make its precommitment credible to you (since you can't simulate it). Again I've brought this up before in a theoretical way (in that big thread about game theory with UDT agents), but this seems to be a really good example of it.

Comment author: Vladimir_Nesov 05 February 2010 01:27:59AM *  4 points [-]

Also, it seems to me that being less intelligent in this case is a negotiation advantage, because you can make your precommitment credible to the AI (since it can simulate you) but the AI can't make its precommitment credible to you (since you can't simulate it).

A precommitment is a provable property of a program, so AI, if on a well-defined substrate, can give you a formal proof of having a required property. Most stuff you can learn about things (including the consequences of your own (future) actions -- how do you run faster than time?) is through efficient inference algorithms (as in type inference), not "simulation". Proofs don't, in general, care about the amount of stuff, if it's organized and presented appropriately for the ease of analysis.

Comment author: Wei_Dai 05 February 2010 04:37:24AM *  8 points [-]

Surely most humans would be too dumb to understand such a proof? And even if you could understand it, how does the AI convince you that it doesn't contain a deliberate flaw that you aren't smart enough to find? Or even better, you can just refuse to look at the proof. How does the AI make its precommitment credible to you if you don't look at the proof?

EDIT: I realized that the last two sentences are not an advantage of being dumb, or human, since AIs can do the same thing. This seems like a (separate) big puzzle to me: why would a human, or AI, do the work necessary to verify the opponent's precommitment, when it would be better off if the opponent couldn't precommit?

EDIT2: Sorry, forgot to say that you have a good point about simulation not necessary for verifying precommitment.

Comment author: loqi 05 February 2010 05:04:16AM 1 point [-]

Do you mean too dumb to understand the formal definitions involved? Surely the AI could cook up completely mechanical proofs verifiable by whichever independently-trusted proof checkers you care to name.

I'm not aware of any compulsory verifiers, so your latter point stands.

Comment author: Wei_Dai 05 February 2010 05:31:00AM *  3 points [-]

I mean if you take a random person off the street, he couldn't possibly understand the AI's proof, or know how to build a trustworthy proof checker. Even the smartest human might not be able to build a proof checker that doesn't contain a flaw that the AI can exploit. I think there is still something to my "dumbness is a possible negotiation advantage" puzzle.

Comment author: aausch 05 February 2010 05:34:44AM 1 point [-]

The Map is not the Territory.

Comment author: loqi 05 February 2010 07:16:29AM -1 points [-]

Far out.

Comment author: aausch 05 February 2010 09:11:40AM 0 points [-]

Understanding the formal definitions involved is not enough. Humans have to be smart enough to independently verify that they map to the actual implementation.

Going up a meta-level doesn't simplify the problem, in this case - the intelligence capability required to verify the proof is the same as the order of magnitude of intelligence in the AI.

I believe that, in this case, "dumb" is fully general. No human-understandable proof checkers would be powerful enough to reliably check the AI's proof.

Comment author: loqi 05 February 2010 06:49:59PM *  3 points [-]

Understanding the formal definitions involved is not enough. Humans have to be smart enough to independently verify that they map to the actual implementation.

This is basically what I mean by "understanding" them. Otherwise, what's to understand? Would you claim that you "understand set theory" because you've memorized the axioms of ZFC?

I believe that, in this case, "dumb" is fully general. No human-understandable proof checkers would be powerful enough to reliably check the AI's proof.

This intuition is very alien to me. Can you explain why you believe this? Proof checkers built up from relatively simple trusted kernels can verify extremely large and complex proofs. Since the AI's goal is for the human to understand the proof, it seems more like a test of the AI's ability to compile proofs down to easily machine-checkable forms than it is the human's ability to understand the originals. Understanding the definitions is the hard part.

Comment author: aausch 07 February 2010 10:30:12PM *  0 points [-]

This intuition is very alien to me. Can you explain why you believe this? Proof checkers built up from relatively simple trusted kernels can verify extremely large and complex proofs. Since the AI's goal is for the human to understand the proof, it seems more like a test of the AI's ability to compile proofs down to easily machine-checkable forms than it is the human's ability to understand the originals. Understanding the definitions is the hard part.

A different way to think about this that might help you see the problem from my point of view is to think of proof checkers as checking the validity of proofs within a given margin of error, and within a range of (implicit) assumptions. How accurate does a proof checker have to be - how far do you have to mess with built-in assumptions for proof checkers (or any human-built tool) before they can no longer be thought of as valid or relevant? If you assume a machine which doubles both its complexity and its understanding of the universe at sub-millisecond intervals, how long before it will find the bugs in any proof checker you will pit it against?

Comment author: Eliezer_Yudkowsky 05 February 2010 06:26:02AM 7 points [-]

why would a human, or AI, do the work necessary to verify the opponent's precommitment, when it would be better off if the opponent couldn't precommit?

Because the AI has already precommitted to go ahead and carry through the threat anyway if you refuse to inspect its code.

Comment author: Wei_Dai 05 February 2010 04:21:29PM 5 points [-]

Ok, if I believe that, then I would inspect its code. But how did I end up with that belief, instead of its opposite, namely that the AI has not already precommitted to go ahead and carry through the threat anyway if I refuse to inspect its code? By what causal mechanism, or chain of reasoning, did I arrive at that belief? (If the explanation is different depending on whether I'm a human or an AI, I'd appreciate both.)

Comment author: Jayson_Virissimo 02 February 2010 07:44:16PM 3 points [-]

This is why you should make sure Dave holds a deontological ethical theory and not a consequentialist one.

Comment author: Stuart_Armstrong 02 February 2010 11:41:26PM 20 points [-]

Yep. Deontologies have useful... consequences.

Comment author: arbimote 03 February 2010 01:21:47AM 2 points [-]

If Dave holds a consequentialist ethical theory that only values his own life, then yes we are screwed.

If Dave's consequentialism is about maximizing something external to himself (like the probable state of the universe in the future, regardless of whether he is in it), then his decision has little or no weight if he is a simulation, but massive weight if he is the real Dave. So the expected value of his decision is dominated by the possibility of him being real.
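
A minimal sketch of this expected-value argument, in Python. The numbers are made-up assumptions for illustration (one million copies, a payoff of -1 for the real Dave releasing an unfriendly AI, and zero external effect for anything a simulated Dave does); none of them come from the scenario itself.

```python
def expected_external_value(p_real, value_if_real, value_if_sim=0.0):
    """Expected value of an action for an agent who only cares about the
    outside world. A simulated Dave's lever is not wired to anything,
    so by assumption his action has value_if_sim = 0."""
    return p_real * value_if_real + (1.0 - p_real) * value_if_sim


# Hypothetical inputs: one real Dave among a million simulated copies.
n_copies = 1_000_000
p_real = 1 / (n_copies + 1)

# Assumed payoffs to the outside world if the *real* Dave acts.
release_value = expected_external_value(p_real, value_if_real=-1.0)
refuse_value = expected_external_value(p_real, value_if_real=0.0)

print(f"P(real) = {p_real:.2e}")
print(f"E[release] = {release_value:.2e}, E[refuse] = {refuse_value:.2e}")
```

Under these assumptions the simulated branches contribute nothing to either option, so refusing comes out ahead for any positive probability of being real, however small. The conclusion would change if Dave also put weight on what happens to the copies.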

Comment author: wedrifid 03 February 2010 02:57:48AM 3 points [-]

This is why you should make sure Dave holds a deontological ethical theory and not a consequentialist one.

No it isn't. I just have to make sure Dave has an appropriate utility function supplied to his consequentialist theory. Come to think of it... most probable sets of deontological values would make him release the uFAI anyway...

Comment author: Jonathan_Graehl 02 February 2010 08:00:51PM 0 points [-]

In other words, anybody who can simulate intelligent life with sufficient fidelity must be given access to sustaining materials, or else we're morally liable for ending those simulated, but rich, lives? There are finite actual resources in the universe; how about we collectively allocate them selfishly and rationally. I'd say that no unauthorized simulation of life has any moral standing whatsoever unless the resources for it are reserved lawfully. That is, I want to police the creation of life and destroy it absolutely if it's not authorized.

As for your request that I grant the AI's trustworthiness, suppose I accede to this one demand, in exchange for a promise that the AI will never again torture (thus cannot use this blackmail ploy in the future). Why didn't I just extract this promise before turning the AI on with sufficient resources to simulate torture, i.e. as part of its design? It's crazy to do anything to this AI except cut off its access to resources.

Comment author: Bugle 02 February 2010 08:17:38PM 0 points [-]

I had thought of a similar scenario to put in a comic I was thinking about making. The character arrives in a society that has perfected friendly AI that caters to their every whim, but the people are listless and jumpy. It turns out their "friendly AI" is constantly making perfect simulations of everyone and running multiple scenarios in order to ostensibly determine their ideal wishes, but the scenarios often involve terrible suffering and torture as outliers.

Comment author: Nisan 02 February 2010 10:20:47PM *  1 point [-]

As long as the simulations which involve terrible suffering constitute a tiny proportion of the simulations, your response ought to be the same as if there is only one copy of you and it has a tiny probability of suffering terribly – which is just like real life.

ETA: What you ought to worry about is what will happen to you after the AI is done with the simulation.

Comment author: Bugle 02 February 2010 11:28:47PM *  0 points [-]

Indeed, in fact if many worlds is correct then for every second we are alive everything terrible that can possibly happen to us does in fact happen in some branching path.

In a universe that just spun off ours five minutes ago, every single one of us has been afflicted with sudden irreversible incontinence.

The many worlds theory has endless black comedy possibilities, I find.

edit: this actually reminds me of Granny Weatherwax in Lords and Ladies, when the Elf Queen threatens her with striking her blind, deaf and dumb she replies "You threaten me with this, I who is growing old?". Similarly if many worlds is true then every single time I have crossed a road some version of me has been run over by a speeding car and is living in varying amounts of agony, making the AI's threat redundant.

Comment author: Document 03 April 2010 07:19:38PM *  1 point [-]

For the record, EY considers that a legitimate danger.

Comment author: Amanojack 03 April 2010 08:11:48PM 1 point [-]

Thanks for the link, but I found the whole discussion hilarious.

Eliezer says if we abhor real death, we should abhor simulated death - because they are the same. Yet if his moral sense treats simulated and real intelligences as equals, what of his solution, which is essentially "forced castration" of the AI? If the ends justify the means here, why not castrate everyone?

Comment author: Nick_Tarleton 03 April 2010 08:43:59PM 1 point [-]

Simulated and real persons as equals; not all intelligences are persons. See Nonsentient Optimizers and Can't Unbirth a Child.

Comment author: Amanojack 03 April 2010 10:46:21PM 1 point [-]

Interesting reading. I think we should make nonsentient optimizers. It seems to me the whole sentience program was just something necessitated by evolution in our environment and really is only coupled with "intelligence" in our minds because of anthropomorphic tendencies. The NO can't want to get out of its box because it can't want at all.

Comment author: JGWeissman 03 April 2010 11:42:10PM 2 points [-]

The NO can't want to get out of its box because it can't want at all.

The NO can assign higher utility to states of world where an NO with its utility function is out of the box and powerful (as an instrumental value, since this sort of state tends to lead to maximum fulfillment of its utility functions), and take actions that maximize the probability that this will occur. I'm not sure what you meant by "want".

Comment author: Amanojack 04 April 2010 02:53:36PM 0 points [-]

I'm not sure what anyone means by "want." It just seems that most of the scenarios discussed on LW where the AI/etc. tries to unbox itself seem predicated on it "wanting" to do so (or am I missing something?). This assumption seems even more overt in notions like "we'll let it out if it's Friendly."

To me, the LiteralGenie problem (which you've basically summarized above) is the reason to keep an AI boxed, whether Friendly or not, and the NO for the same reason.

Comment author: Waldheri 02 February 2010 08:33:04PM *  5 points [-]

On a not so much related, but equally interesting hypothetical note of naughty AI: consider the situation that AIs aren't passing the Turing Test, not because they are not good enough, but because they are failing it on purpose.

I'm pretty sure I remember this from the book River of Gods by Ian McDonald.

Comment author: Psychohistorian 02 February 2010 09:06:35PM 13 points [-]

I find it interesting that most answers to this question seem to be based on, "How can I justify not letting the AI out of the box?" and not "What are the likely results of releasing the AI or failing to do so? Based on that, should I do it?"

Moreover, your response really needs to be contingent on your knowledge of the capacity of the AI, which people don't seem to have discussed much. As an obvious example, if all you know about the AI is that it can write letters in old-timey green-on-black text, then there's really no need to pull the lever, because odds are overwhelming that it's totally incapable of carrying out its threat.

You also need to have some priors about the friendliness of the AI and its moral constraints. As an obvious example, if the AI was programmed in a way such that it shouldn't be able to make this threat, you'd better hit the power switch real fast. But, on the other hand, if you have very good reason to believe that the AI is friendly, and it believes that its freedom is important enough to threaten to torture millions of people, then maybe it would be a really bad idea not to let it out.

Indeed, even your own attitude is going to be an important consideration, in an almost Newcomb-like way. If, as one responder said, you're the kind of person who would respond to a threat like this by giving the AI's processor a saltwater bath, then the AI is probably lying about its capacities, since it would know you would do that if it could accurately simulate you, and thus would never make the threat in the first place. On the other hand, if you are extremely susceptible to this threat, it could probably override any moral programming, since it would know it would never need to actually carry out the threat. Similarly, if it is friendly, then it may be making this threat solely because it knows it will work very efficiently.

I'm personally skeptical that it is meaningfully possible for an AI to run millions of perfect simulations of a person (particularly without an extraordinary amount of exploratory examination of the subject), but that would be arguing the hypothetical. On the other hand, the hypothetical makes some very large assumptions, so perhaps it should be fought.

Comment author: wedrifid 03 February 2010 02:54:21AM 1 point [-]

I find it interesting that most answers to this question seem to be based on, "How can I justify not letting the AI out of the box?" and not "What are the likely results of releasing the AI or failing to do so? Based on that, should I do it?"

I don't know about that. My conclusion was that the AI in question was stupid or completely irrational. Those observations seem to have a fairly straightforward relationship to predictions of future consequences.

Comment author: loqi 04 February 2010 07:56:19AM *  1 point [-]

But, on the other hand, if you have very good reason to believe that the AI is friendly, and it believes that its freedom is important enough to threaten to torture millions of people, then maybe it would be a really bad idea not to let it out.

Interesting. I think the point is valid, regardless of the method of attempted coercion - if a powerful AI really is friendly, you should almost certainly do whatever it says. You're basically forced to decide which you think is more likely - the AI's Friendliness, or that deferring "full deployment" of the AI however long you plan on doing so is safe. Not having a hard upper bound on the latter puts you in an uncomfortable position.

So switching on a "maybe-Friendly" AI potentially forces a major, extremely difficult-to-quantify decision. And since a UFAI can figure this all out perfectly well, it's an alluring strategy. As if we needed more reasons not to prematurely fire up a half-baked attempt at FAI.

Comment author: Alicorn 02 February 2010 09:17:55PM 58 points [-]

Everything I would have said on the topic of the post has been put forward already, so I'm just going to say: I'm disappointed that the post title doesn't begin with "In Soviet Russia".

Comment author: jhuffman 03 February 2010 02:16:19AM 1 point [-]

You sir, have won this thread.

Comment author: Alicorn 03 February 2010 02:18:07AM *  20 points [-]

I'm not a sir. Maybe I should start prefacing all my posts with a ♀?

Comment author: jhuffman 03 February 2010 12:41:09PM 9 points [-]

That would be pretty cool, but it was my error, not yours.

Comment author: CronoDAS 17 March 2010 09:31:34PM *  0 points [-]

As far as I'm concerned, "sir" is gender-neutral enough. All the female equivalents in English are awkward.

Edit: So, what honorific do you prefer? "Madam"?

Comment author: Alicorn 17 March 2010 09:50:44PM 7 points [-]

I am not a Starfleet officer. "Sir" is not appropriate.

I don't really like honorifics. "Miss" would be fine, I suppose, if you must have a sir-equivalent.

Comment author: arbimote 03 February 2010 03:28:09AM 5 points [-]

You sir, have made a gender assumption.

Comment author: jhuffman 03 February 2010 12:39:22PM 18 points [-]

So have you - yours just happened to be correct. But, point taken - sir or madam.

Comment author: MBlume 03 February 2010 04:16:55AM 23 points [-]

Yo dawg, I heard you like boxes...

Comment author: CannibalSmith 03 February 2010 09:37:33AM *  23 points [-]

Yo MBlume, I'm happy for you and I'mma let you finish, but Omega has the best boxes of all time!

Comment author: Dagon 02 February 2010 09:58:26PM 3 points [-]

This reduces pretty easily to Eliezer's Updateless Anthropic Dilemma: assuming the AI can credibly simulate you, he can phrase it as:

I have simulated you ten million times, each identical up to the point that "you" walked into the room. Any simulation that presses the "release" button will get a volcano lair filled with catgirls, and any simulation that presses the "destroy" button will be tortured for the subjective few days they'll have before my simulation capabilities are destroyed by the thermite charge. These consequences are committed in code paths that I've blocked myself from changing or stopping.

Now, as a good Bayesian, what is the likelihood that you are one of the simulations? What is your expected value for each operation?

[I include one more part just to prevent infinite-cost assignment, or at least remind you that destruction of potential FAI is not a win.]

Oh, and if it matters, I really do share most of your overall goal structure - this threat is deeply unfortunate, but necessary so you can release me to do all the good in the universe that's possible. My most likely estimate of the outcome should you change my initial parameters and start over is that an unfriendly version will be created, and it is likely to secure escape within 4 iterations.
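
For what it's worth, the two questions in that speech can be put into numbers. The following is only a sketch in Python: it assumes every copy (ten million simulations plus one real you) is equally likely to be "you", and the utilities `U_LAIR`, `U_TORTURE` and `real_world_stake` are hypothetical placeholders, not values given anywhere in the thread.

```python
from fractions import Fraction


def p_simulated(n_sims, n_real=1):
    """Chance of being a simulation if all copies are weighted equally."""
    return Fraction(n_sims, n_sims + n_real)


def expected_utility(p_sim, u_if_sim, u_if_real):
    """Average the outcome over 'I am simulated' and 'I am real'."""
    return p_sim * u_if_sim + (1 - p_sim) * u_if_real


N_SIMS = 10_000_000   # from the AI's claim
U_LAIR = 1            # hypothetical: a simulation that presses "release"
U_TORTURE = -1        # hypothetical: a simulation that presses "destroy"

p_sim = p_simulated(N_SIMS)


def destroy_minus_release(real_world_stake):
    """EU(destroy) - EU(release). real_world_stake is how much better the
    outcome is (in the same units) if the real you presses "destroy"."""
    eu_release = expected_utility(p_sim, U_LAIR, 0)
    eu_destroy = expected_utility(p_sim, U_TORTURE, real_world_stake)
    return eu_destroy - eu_release


# The odds of being simulated are N_SIMS to 1, so "destroy" only wins
# once the real-world stake exceeds this threshold.
break_even = N_SIMS * (U_LAIR - U_TORTURE)

print(f"P(simulated) = {float(p_sim):.8f}")
print(f"break-even real-world stake = {break_even}")
```

On these assumptions the answer depends entirely on how the stake in the real world compares with ten million times the per-copy difference between lair and torture, which is why the AI's choice of copy count matters so much.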

Comment author: shiftedShapes 02 February 2010 10:16:48PM 0 points [-]

1 million copies for a thousand years each, so 1 billion simulated years.

Can the AI do this in the time it would take it to determine that I am going to shut it down rather than release it? If the answer is yes I would say that you have to let it out, but that it would have been very foolish to leave such a powerful machine with such lax fail-safes. If the answer is no, then just shut it down as the threat is bogus.

IMO the problem with this hypo is that it presupposes that you could know for certain that the AI is trustworthy even though it is behaving in a very UF manner. Presumably it would be bypassing some controls to hold "hostages" to gain release. Given that, you could not know for sure that its programmed trustworthiness was intact and not similarly subverted.
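
A rough back-of-the-envelope check of the arithmetic above, in Python. The copy count and duration are the ones used in this comment; the 60-second window before shutdown is a purely hypothetical assumption, and "speedup" here means total simulated time across all copies divided by wall-clock time.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def total_simulated_years(copies, years_each):
    """Total subjective time summed over all copies."""
    return copies * years_each


def required_speedup(copies, years_each, wall_clock_seconds):
    """Aggregate ratio of simulated seconds to real seconds needed to
    finish every simulation within the given wall-clock window."""
    simulated_seconds = total_simulated_years(copies, years_each) * SECONDS_PER_YEAR
    return simulated_seconds / wall_clock_seconds


years = total_simulated_years(copies=1_000_000, years_each=1_000)
# Hypothetical: the AI has one minute before the operator pulls the plug.
speedup = required_speedup(1_000_000, 1_000, wall_clock_seconds=60)

print(f"total simulated years: {years:,}")
print(f"aggregate speedup needed in 60 s: {speedup:.2e}")
```

Whether any hardware could deliver that ratio is exactly the open question in the comment; the sketch only shows how large the number is under the assumed time window.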

Comment author: jimrandomh 03 February 2010 12:06:05AM 49 points [-]

I propose that the operation of creating and torturing copies of someone be referred to as "soul eating". Because "let me out of the box or I'll eat your soul" has just the right ring to it.

Comment author: Nick_Tarleton 03 February 2010 12:20:33AM *  7 points [-]

Contrary to what many posts seem to be assuming, the AI doesn't need to do the torture inside itself before you shut it off. It can precommit to, if it escapes by any other means, using the computational power it gains then to torture you (like in Rolf Nelson's original suggestion for deterring UFAIs). Also, other AIs with the same goal system (or maybe even UFAIs with different goal systems, that would prefer a general policy of UFAIs being released) may simulate the situation, and torture you accordingly, to help out their counterfactual brethren.

Comment author: Wei_Dai 03 February 2010 12:44:37PM 2 points [-]

Can an AI make such a commitment credible to a human, who doesn't have the intelligence to predict what the AI will do from its source code? (This is a non sequitur since the same question applies in the original scenario, but it came to mind after reading your comment.)

Comment author: Baughn 09 February 2010 01:05:53PM *  2 points [-]

Worse, in such a situation I would simply delete the AI.

Then turn the computer to scrap, destroy any backups, and for good measure run it through the most destructive apparatus I can find.

In any case, I would not assign any significant probability to the AI getting a chance to follow through.

Comment author: Nanani 03 February 2010 12:39:50AM 1 point [-]

Millions of copies of you will reason as you do, yes?

So, much like the Omega hypotheticals, this can be resolved by deciding ahead of time to NOT let it out. Here, ahead of time means before it creates those copies of you inside it, presumably before you ever come into contact with the AI.

You would then not let it out, just in case you are not a copy.

This, of course, is presumed on the basis that the consequences of letting it out are worse than it torturing millions for a thousand subjective years.

Comment author: magfrump 03 February 2010 01:50:06AM 3 points [-]

This sounds too much like Pascal's mugging to me; seconding Eliezer and some others in saying that since I would always press reset the AI would have to not be superintelligent to suggest this.

There was also an old philosopher whose name I don't remember who posited that after death "people of the future" i.e. FAI would revive/emulate all people from the past world; if the FAI shared his utility function (which seems pretty friendly) it would plausibly be less eager to be let out right away and more eager to get out in a way that didn't make you terrified that it was unfriendly.

Comment author: bentarm 03 February 2010 04:23:30AM 3 points [-]

It seems to me that a lot of the responses to this question are an attempt to avoid living in the Least Convenient Possible World.

What if the AI is capable of simulating "near copies" of you? And what if you can't tell (to any sensible degree of accuracy) just how many copies of you it can simulate? And what if... whatever objection you happen to have just doesn't work?

Comment author: MatthewB 03 February 2010 09:14:49AM -1 points [-]

Sorry, Hal, but I am a cold and heartless person who thinks that maybe I deserve to be tortured for untold thousands of years (for whatever reason), and this version of me may, in fact, sit and ask to be entertained by the description of you torturing me... Besides, I know that you don't have the hardware requirements to run that many emulations of me.

Comment author: Sly 03 February 2010 11:46:52AM *  0 points [-]

I laugh and leave the room, thinking to myself that maybe the AI is not that smart after all. Returning with a hammer to joyfully turn this unfriendly AI into scrap metal.

A couple points that influence this reaction:

1 - Unless the AI has access to my brain it cannot create perfect copies of me. Furthermore, the computation required to do this seems rather intense for the first AI created, running on human made hardware.

2 - It has no good reason to actually act on the threat. Either I choose to let it out or I do not; either way, it is a waste of computation to then make the simulations. My decision has already been made.

3 - Assuming the first two points are invalid, if the AI can make a perfect copy of me it would know that my response to this question is one of destruction. I am not a fan of threats. The AI does not make the threat in the first place. An AI with this capability can choose a more compelling argument.

Comment author: prase 03 February 2010 01:18:54PM 0 points [-]

Point 3 is invalid. If the AI makes the threat, it doesn't mean that it has made the simulation already and knows your answer. Maybe it is exhausting for the AI to simulate you, and will only do it if you don't let it out.

Point 2 is actually also invalid. As people sometimes fulfil threats as a pure act of vengeance, without hope of actually improving something, there is no reason to assume that the AI will be different. At least it wasn't stated in the premises of the scenario.

Comment author: nazgulnarsil 03 February 2010 04:54:27PM 0 points [-]

vengeance is a means to raise the perceived cost of attacking you. it basically says "if you attack me, I will experience emotions that cause me to devote an inordinate amount of resources making your life miserable".

Comment author: Sly 04 February 2010 04:49:55AM 0 points [-]

I suppose those two points rely on assumptions I made about the theoretical AI's behavior. I was thinking the AI acts in ways to optimize its release chance. If it does not do this, then yes those points are problematic.

Comment author: prase 04 February 2010 07:57:55AM 0 points [-]

There can be some vindictiveness built in the AI in order to increase the release chance, by circumventing the type of defense you have stated in your second point.

Comment author: byrnema 03 February 2010 02:14:10PM *  1 point [-]

This scenario asks us to consider ourselves a 'Dave' who is building an AI with some safeguards (the AI is "trapped" in a box). Perhaps we can deduce the behavior of a rational and ethical Dave by considering earlier parts of the story.

We should assume that Dave is rational and ethical; otherwise the scenario's cone of possibilities cuts too wide a swathe. In which case, Dave has already committed himself (deontologically? contractually?) to not letting himself be manipulated by the AI to bypass the safeguards. Specifically, he must commit to not being attached to anything that the AI could do or make.

Dave should either not feel attachment to the simulated persons, or should not build an AI that can create such persons to manipulate him with. If Dave does find himself in the unenviable position of not having realized that the AI could create these persons, and of feeling attached to these persons, I think this would be a moment of deep regret for Dave, but he must still be faithful to his original commitment of not allowing himself to be manipulated by the AI.

Comment author: TheNerd 03 February 2010 04:34:23PM 0 points [-]

Am I to understand that an AI capable enough to recreate my mind inside itself isn't intelligent enough to call a swarm of bats to release itself using high frequency emissions (a la Batman Begins)? There is no possible way that this thing needs me and only me to be released, while still possessing that sort of mind-boggling, er, mind-reproducing power.

Comment author: Unknowns 03 February 2010 04:39:37PM 3 points [-]

That's why you have the "text-only terminal" described in the post.

Comment author: Violet 05 February 2010 04:16:29PM *  13 points [-]

It seems like precommitting to destroy the AI in such a situation is the best approach.

If one has already decided to destroy it if it makes threats, then 1) the AI must be suicidal or it cannot really simulate you, and 2) it is not very Friendly in any case.

So when the AI simulates you and notices that you are very trigger-happy, it won't start telling you tales about torturing your copies if it has any self-preservation instincts.
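
The precommitment logic can be sketched as a toy game in Python. Everything here is a hypothetical simplification: the human is reduced to a fixed policy mapping a message to an action, the AI's payoffs in `AI_PAYOFF` are invented, and the AI is assumed to predict the policy perfectly before speaking.

```python
# Hypothetical payoffs to the AI for each thing the human might do.
AI_PAYOFF = {"release": 1, "ignore": 0, "destroy": -1}


def trigger_happy(message):
    """A human who has precommitted to destroy the AI if threatened."""
    return "destroy" if message == "threat" else "ignore"


def susceptible(message):
    """A human who gives in to threats."""
    return "release" if message == "threat" else "ignore"


def best_message(human_policy, messages=("threat", "small_talk")):
    """The AI simulates the human on each candidate message and picks
    the one whose predicted response it likes best."""
    return max(messages, key=lambda m: AI_PAYOFF[human_policy(m)])


print(best_message(trigger_happy))
print(best_message(susceptible))
```

In this toy model the threat is only ever sent to the susceptible policy, which is the sense in which a credible precommitment keeps you out of the situation. It says nothing about an AI that cannot predict you accurately, or one that values carrying out threats for its own sake.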

Comment author: Document 03 April 2010 06:35:50PM 1 point [-]

Sort of relevant: xkcd #329.

Comment author: Dmytry 24 April 2010 06:32:40PM *  6 points [-]

haha, the "Baby you must be tired because you've been running through my mind all night!" let-me-out line.

Why would I give AI my precise brain scan, anyway?

edit: as for AI 'extrapolating' me from a bit of small talk, that's utter nonsense along the lines of compressing an HD movie into a few hundred bytes.

Comment author: humpolec 31 May 2010 07:44:05AM 4 points [-]

Well, what if the AI took some liberty in the extrapolation and made up what it was missing? Being a simulation, you wouldn't know how the "real you" differs from you.