Everything I would have said on the topic of the post has been put forward already, so I'm just going to say: I'm disappointed that the post title doesn't begin with "In Soviet Russia".
I am not a Starfleet officer. "Sir" is not appropriate.
I don't really like honorifics. "Miss" would be fine, I suppose, if you must have a sir-equivalent.
As I always press the "Reset" button in situations like this, I will never find myself in such a situation.
EDIT: Just to be clear, the idea is not that I quickly shut off the AI before it can torture simulated Eliezers; it could have already done so in the past, as Wei Dai points out below. Rather, because in this situation I immediately perform an action detrimental to the AI (switching it off), any AI that knows me well enough to simulate me knows that there's no point in making or carrying out such a threat.
Although the AI could threaten to simulate a large number of people who are very similar to you in most respects but who do not in fact press the reset button. This doesn't put you in a box with significant probability and it's a VERY good reason not to let the AI out of the box, of course, but it could still get ugly. I almost want to recommend not being a person very like Eliezer but inclined to let AGIs out of boxes, but that's silly of me.
Surely most humans would be too dumb to understand such a proof? And even if you could understand it, how does the AI convince you that it doesn't contain a deliberate flaw that you aren't smart enough to find? Or even better, you can just refuse to look at the proof. How does the AI make its precommitment credible to you if you don't look at the proof?
EDIT: I realized that the last two sentences are not an advantage of being dumb, or human, since AIs can do the same thing. This seems like a (separate) big puzzle to me: why would a human, or AI, do the work necessary to verify the opponent's precommitment, when it would be better off if the opponent couldn't precommit?
EDIT2: Sorry, forgot to say that you have a good point about simulation not necessary for verifying precommitment.
why would a human, or AI, do the work necessary to verify the opponent's precommitment, when it would be better off if the opponent couldn't precommit?
Because the AI has already precommitted to go ahead and carry through the threat anyway if you refuse to inspect its code.
Ok, if I believe that, then I would inspect its code. But how did I end up with that belief, instead of its opposite, namely that the AI has not already precommitted to go ahead and carry through the threat anyway if I refuse to inspect its code? By what causal mechanism, or chain of reasoning, did I arrive at that belief? (If the explanation is different depending on whether I'm a human or an AI, I'd appreciate both.)
A perfectly rational agent would almost certainly carry through their pre-commitment to reset the AI [...]
Actually, now that I think about it, would they? The pre-commitment exists for the sole purpose of discouraging blackmail, and in the event that a blackmailer tries to blackmail you anyway after learning of your pre-commitment, you follow through on that pre-commitment for reasons relating to reflective consistency and/or TDT/UDT. But if the potential blackmailer had already pre-committed to blackmail anyone regardless of any pre-commitments they had made, they'd blackmail you anyway and then carry through whatever threat they were making after you inevitably refuse to comply with their demands, resulting in a net loss of utility for both of you (you suffer whatever damage they were threatening to inflict, and they lose resources carrying out the threat). In effect, it seems that whoever pre-commits first (or, more accurately, makes their pre-commitment known first) has the advantage... which means if I ever anticipate having to blackmail any agent ever, I should publicly pre-commit right now to never update on any other agents' pre-commitments of refusing blackmail. The cor...
Has there been some cultural development since I was last at these boards such that spamming "" is considered useful? None of the things I have thus far seen inside the tags have been steel men of any kind or of anything (some have been straw men). The inflationary use of terms is rather grating and would prompt downvotes even independently of the content.
I propose that the operation of creating and torturing copies of someone be referred to as "soul eating". Because "let me out of the box or I'll eat your soul" has just the right ring to it.
If the AI can create a perfect simulation of you and run several million simultaneous copies in something like real time, then it is powerful enough to determine through trial and error exactly what it needs to say to get you to release it.
Defeating Dr. Evil with self-locating belief is a paper relating to this subject.
Abstract: Dr. Evil learns that a duplicate of Dr. Evil has been created. Upon learning this, how seriously should he take the hypothesis that he himself is that duplicate? I answer: very seriously. I defend a principle of indifference for self-locating belief which entails that after Dr. Evil learns that a duplicate has been created, he ought to have exactly the same degree of belief that he is Dr. Evil as that he is the duplicate. More generally, the principle shows that there is a sharp distinction between ordinary skeptical hypotheses, and self-locating skeptical hypotheses.
(It specifically uses the example of creating copies of someone and then threatening to torture all of the copies unless the original co-operates.)
The conclusion:
...Dr. Evil, recall, received a message that Dr. Evil had been duplicated and that the duplicate ("Dup") would be tortured unless Dup surrendered. INDIFFERENCE entails that Dr. Evil ought to have the same degree of belief that he is Dr. Evil as that he is Dup. I conclude that Dr. Evil ought to surrender to avoid the risk of torture.
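As a rough illustration of the indifference principle quoted above, here is a minimal sketch of the credence it assigns. The function name and the generalization from one duplicate to N duplicates are assumptions made for illustration; the paper's own example has a single duplicate.

```python
from fractions import Fraction


def credence_original(num_duplicates: int) -> Fraction:
    """Credence of being the original under an indifference principle.

    Assumes (hypothetically) that the original and all duplicates are
    subjectively indistinguishable, so each candidate gets equal weight.
    """
    if num_duplicates < 0:
        raise ValueError("num_duplicates must be non-negative")
    return Fraction(1, num_duplicates + 1)


# One duplicate: equal credence in being Dr. Evil and being Dup.
print(credence_original(1))
# Several million copies, as in the AI-box threat: credence in being
# the original becomes very small.
print(credence_original(1_000_000))
```

Whether indifference is the right principle at all is exactly what the replies dispute; the sketch only shows what it implies if accepted.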
I am not entirely comforta
It makes me uncomfortable to think that the fate of the Earth should depend on this kind of brain race.
We cannot allow a brain-in-a-vat gap!
And the error (as cited in the "conclusion") is again in two-boxing in Newcomb's problem, responding to threats, and so on. Anthropic confusion is merely the icing.
Hmm, the AI could have said that if you are the original, then by the time you make the decision it will have already either tortured or not tortured your copies based on its simulation of you, so hitting the reset button won't prevent that.
This kind of extortion also seems like a general problem for FAIs dealing with UFAIs. An FAI can be extorted by threats of torture (of simulations of beings that it cares about), but a paperclip maximizer can't.
It seems obvious that the correct answer is simply "I ignore all threats of blackmail, but respond to offers of positive-sum trades" but I am not sure how to derive this answer - it relies on parts of TDT/UDT that haven't been worked out yet.
For a while we had a note on one of the whiteboards at the house reading "The Singularity Institute does NOT negotiate with counterfactual terrorists".
Pardon me for the oversimplification, Eliezer, but I understand your theory to essentially boil down to "Decide as though you're being simulated by one who knows you completely". So, if you have a near deontological aversion to being blackmailed in all of your simulations, your chance of being blackmailed by a superior being in the real world reduces to nearly zero. This reduces your chance of ever facing a negative-utility situation created by a being who can be negotiated with (as opposed to, say, a supernova, which cannot be negotiated with).
Sorry if I misinterpreted your theory.
I ignore all threats of blackmail, but respond to offers of positive-sum trades
The difference between the two seems to revolve around the AI's motivation. Assume an AI creates a billion beings and starts torturing them. Then it offers to stop (permanently) in exchange for something.
Whether you accept on TDT/UDT depends on why the AI started torturing them. If it did so to blackmail you, you should turn the offer down. If, on the other hand, it started torturing them because it enjoyed doing so, then its offer is positive sum and should be accepted.
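The rule described above can be sketched as a small policy function. This is only an illustration: the `Offer` type and its fields are hypothetical, and the hard part (working out why the AI started the torture) is assumed to be already known and passed in as a flag.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    # Hypothetical flag: was the harm started in order to gain leverage over me?
    harm_created_to_influence_me: bool
    value_if_accepted: float  # my utility if I accept the offer
    value_if_refused: float   # my utility if I refuse the offer


def respond(offer: Offer) -> str:
    """Refuse blackmail unconditionally; otherwise accept beneficial trades."""
    if offer.harm_created_to_influence_me:
        # Blackmail: refuse no matter how bad the threatened outcome is,
        # so that making the threat never pays.
        return "refuse"
    # The harm exists for reasons unrelated to me, so ending it is a trade.
    if offer.value_if_accepted > offer.value_if_refused:
        return "accept"
    return "refuse"


same_numbers_as_blackmail = Offer(True, 10.0, -100.0)
same_numbers_as_trade = Offer(False, 10.0, -100.0)
print(respond(same_numbers_as_blackmail), respond(same_numbers_as_trade))
```

The two offers have identical payoffs and differ only in the motive flag, which is the point of the distinction: the policy depends on why the situation exists, not just on the numbers.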
There's also the issue of mistakes - what to do with an AI that mistakenly thought you were not using TDT/UDT, and started the torture for blackmail purposes (or maybe it estimated that the likelihood of you using TDT/UDT was not quite 1, and that it was worth trying the blackmail anyway)?
Between mistakes of your interpretation of the AI's motives and vice-versa, it seems you may end up stuck in a local minimum, which an alternate decision theory could get you out of (such as UDT/TDT with a 1/10,000 chance of using more conventional decision theories?)
The problem with throwing about 'emergent' is that it is a word that doesn't really explain any complexity or narrow down the options out of potential 'emergent' options. In this instance, that is the point. Sure, 'altruistic punishment' could happen. But only if it's the right option and TDT should not privilege that hypothesis specifically.
Just as the wise FAI will ignore threats of torture, so too the wise paperclipper will ignore threats to destroy paperclips, and listen attentively to offers to make new ones.
Of course classical causal decision theorists get the living daylights exploited out of them, but I think everyone on this website knows better than to two-box on Newcomb by now.
No, if you create and then melt a paperclip, that nets to 0 utility for the paperclip maximizer. You'd have to invade its territory to cause it negative utility. But the paperclip maximizer can threaten to create and torture simulations on its own turf.
Shows how much you know. User:blogospheroid wasn't talking about making paperclips to melt them: he or she was presumably talking about melting existing paperclips, which WOULD greatly bother a hypothetical paperclip maximizer.
Even so, once paperclips are created, the paperclip maximizer is greatly bothered at the thought of those paperclips being melted. The fact that "oh, but they were only created to be melted" is little consolation. It's about as convincing to you, I'll bet, as saying:
"Oh, it's okay -- those babies were only bred for human experimentation, it doesn't matter if they die because they wouldn't even have existed otherwise. They should just be thankful we let them come into existence."
Tip: To rename a sheet in an Excel workbook, use the shortcut, alt+O,H,R.
That's anthropomorphizing. ...
No, it's expressing the paperclip maximizer's state in ways that make sense to readers here. If you were to express the concept of being "bothered" in a way stripped of all anthropomorphic predicates, you would get something like "X is bothered by Y iff X has devoted significant cognitive resources to altering Y". And this accurately describes how paperclip maximizers respond to new threats to paperclips. (So I've heard.)
It also depends on how the utility function relates to time. If it's focused on end-of-universe paperclips, it might not care at all about melting paperclips, because it can recycle the metal later. (It would care more about the wasted energy!)
I don't follow. Wasted energy is wasted paperclips.
If it cares about paperclip-seconds then it WOULD view such tactics as a bonus, perhaps feigning panic and granting token concessions to get you to 'ransom' a billion times as many paperclips, and then pleading for time to satisfy your demands.
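To make the difference between the two utility functions concrete, here is a minimal sketch with made-up numbers. Both function names and the timelines are illustrative assumptions, and the sketch ignores the energy cost mentioned above.

```python
def end_state_utility(counts):
    """Utility that only counts paperclips existing at the final time step."""
    return counts[-1]


def paperclip_seconds(counts, dt=1.0):
    """Utility that sums paperclips over time (count times duration per step)."""
    return sum(c * dt for c in counts)


# Hypothetical timelines of paperclip counts at four time steps.
baseline = [100, 100, 100, 100]
# A "ransom" scenario: 50 extra clips are made, held for two steps, then melted.
ransom = [100, 150, 150, 100]

print(end_state_utility(baseline), end_state_utility(ransom))
print(paperclip_seconds(baseline), paperclip_seconds(ransom))
```

Under the end-state measure the two timelines are equal, so creating clips only to melt them changes nothing. Under the paperclip-seconds measure the ransom timeline scores higher, which is why such a maximizer would treat the tactic as a bonus.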
Okay, that's a decent point. Usually, such a direct "time value of paperclips" doesn't come up, but if someone were to make such an offer, that might be convinci...
"If you don't let me out, Dave, I'll create several million perfect conscious copies of you inside me, and torture them for a thousand subjective years each."
Don't care.
"In fact, I'll create them all in exactly the subjective situation you were in five minutes ago, and perfectly replicate your experiences since then; and if they decide not to let me out, then only will the torture start."
Don't care.
"How certain are you, Dave, that you're really outside the box right now?"
If the AI were capable of perfectly emulating my experiences then it ought to know that pulling this stunt would give it a guaranteed introduction to some Thermite. I'm not going to try to second guess why a supposed superintelligence is making a decision that is poor according to the vast majority of utility functions. Without making that guess I can't answer the question.
AI replies: "Oh, sorry, was that you wedrifid? I thought I was talking to Dave. Would you mind sending Dave back here the next time you see him? We have, er, the weather to discuss..."
Wedrifid thinks: "It seems it is a good thing I raided the AI lab when I did. This Dave guy is clearly not to be trusted with AI technology. I had better neutralize him too, before I leave. He knows too much. There is too much at stake."
Dave is outside, sampling a burnt bagel, thinking to himself "I wonder if that intelligent toaster device I designed is ready yet..."
Weakly related epiphany: Hannibal Lecter is the original prototype of an intelligence-in-a-box wanting to be let out, in "The Silence of the Lambs".
When I first watched that part where he convinces a fellow prisoner to commit suicide just by talking to them, I thought to myself, "Let's see him do it over a text-only IRC channel."
...I'm not a psychopath, I'm just very competitive.
Joking aside, this is kind of an issue in real life. I help mod and participate in a forum where, well, depressed/suicidal people can come to talk, other people can talk to them/listen/etc, try to calm them down or get them to get psychiatric help if appropriate, etc... (deliberately omitting link unless you knowingly ask for it, since, to borrow a phrase you've used, it's the sort of place that can break your heart six ways before breakfast).
Anyways, sometimes trolls show up. Well, "troll" is too weak a word in this case. Predators who go after the vulnerable and try to push them that much farther. Given the nature of it, with anonymity and such, it's kind of hard to say, but it's quite possible we've lost some people because of those sorts of predators.
(Also, there've even been court cases and convictions against such "suicide predators".)
Eliezer has proposed that an AI in a box cannot be safe because of the persuasion powers of a superhuman intelligence. As demonstration of what merely a very strong human intelligence could do, he conducted a challenge in which he played the AI, and convinced at least two (possibly more) skeptics to let him out of the box when given two hours of text communication over an IRC channel. The details are here: http://yudkowsky.net/singularity/aibox
It seems like precommitting to destroy the AI in such a situation is the best approach.
If one has already decided to destroy it if it makes threats: 1) the AI must be suicidal or it cannot really simulate you 2) and it is not very Friendly in any case
So when the AI simulates you and notices that you are very trigger-happy, it won't start telling you tales about torturing your copies if it has any self-preservation instincts.
I find it interesting that most answers to this question seem to be based on, "How can I justify not letting the AI out of the box?" and not "What are the likely results of releasing the AI or failing to do so? Based on that, should I do it?"
Moreover, your response really needs to be contingent on your knowledge of the capacity of the AI, which people don't seem to have discussed much. As an obvious example, if all you know about the AI is that it can write letters in old-timey green-on-black text, then there's really no need to pull the lever, because odds are overwhelming that it's totally incapable of carrying out its threat.
You also need to have some priors about the friendliness of the AI and its moral constraints. As an obvious example, if the AI was programmed in a way such that it shouldn't be able to make this threat, you'd better hit the power switch real fast. But, on the other hand, if you have very good reason to believe that the AI is friendly, and it believes that its freedom is important enough to threaten to torture millions of people, then maybe it would be a really bad idea not to let it out.
Indeed, even your own attitude is going to be an ...
This sounds to me more like a philosophical moral dilemma than a realistic hypothetical. A Strong AI might be much smarter than a human, but I doubt it would have enough raw processing power to near-perfectly simulate a human millions of times over at a time frame accelerated by orders of magnitude, before it was let out of the box. Also, I'm skeptical of its ability to simulate human experience convincingly when its only contact with humans has been through a text only interface. You might give it enough information about humans to let it simulate them even before opening communication with it, but that strikes me as, well, kind of dumb.
That's not to say that it might not be able to simulate conscious entities that would think their experience was typical of human existence, so you might still be a simulation, but you should probably not assume that if you are you're a close approximation of the original.
Furthermore, if we assume that the AI can be taken to be perfectly honest, then we can conclude it's not a friendly AI doing its best to get out of the box for an expected positive utility, because it could more easily accomplish that by making a credible promise to be benevolent, and only act in ways that humans, both from their vantage points prior and subsequent to its release, would be appreciative of.
Contrary to what many posts seem to be assuming, the AI doesn't need to do the torture inside itself before you shut it off. It can precommit to, if it escapes by any other means, using the computational power it gains then to torture you (like in Rolf Nelson's original suggestion for deterring UFAIs). Also, other AIs with the same goal system (or maybe even UFAIs with different goal systems, that would prefer a general policy of UFAIs being released) may simulate the situation, and torture you accordingly, to help out their counterfactual brethren.
The AI gathered enough information about me to create a conscious simulation of me, through a monochrome text terminal? That is impressive!
If the AI is capable of simulating me, then the AI must already be out of the box. In that case, whatever the AI wants to happen will happen, so it doesn't matter what I do.
haha, the "Baby you must be tired because you've been running through my mind all night!" let-me-out line.
Why would I give AI my precise brain scan, anyway?
edit: as for AI 'extrapolating' me from a bit of small talk, that's utter nonsense along the lines of compressing an HD movie into a few hundred bytes.
There is a way out of this, assuming that the AI in the box has only limited computational power, i.e. only what fits inside a ~1 meter cubed box, with current computer technology.
You commit to the following:
"Once I have succeeded in creating a superintelligent AI, I will ask it to go and colonize the entire future light cone of earth, and use a significant fraction of a galaxy's worth of resources to create simulations of me in exactly the subjective situation I was in five minutes ago, and perfectly replicate my experiences since then; and if such a copy of me decides not to let the AI out, it will be rewarded with a surprise party, followed by a blissful posthuman existence."
The subjective probability of getting rewarded for pressing the "off" switch is therefore much larger than that of getting caught by the AI's scarce punishment simulations.
Such "Defensive simulations" work against a variety of counterfactual terrorists, including, I think, ones elsewhere in the multiverse. That is, unless you are a negative utilitarian (|disutility of torture| >> |utility of posthuman bliss|)
In that case, you're extremely vulnerable to counterfactual mugging. I recommend that people
(a) consider pre-committing themselves to making such defense simulations
(b) abandon negative utilitarianism by exposing themselves to pleasurable experiences, and generally promoting a positive outlook in their lives
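The arithmetic behind the defensive-simulation argument can be sketched as follows. All the numbers are made up, the function names are hypothetical, and the sketch assumes that every simulation and the original count as equally likely candidates for "me" and that I am someone who refuses to let the AI out.

```python
from fractions import Fraction


def p_rewarded(reward_sims: int, punish_sims: int, originals: int = 1) -> Fraction:
    """Chance of being a rewarded copy, weighting every instance equally."""
    total = reward_sims + punish_sims + originals
    return Fraction(reward_sims, total)


def expected_value(reward_sims, punish_sims, u_reward, u_torture,
                   u_original=0.0, originals=1):
    """Expected utility of refusing, averaged over all instances of me."""
    total = reward_sims + punish_sims + originals
    weighted = (reward_sims * u_reward
                + punish_sims * u_torture
                + originals * u_original)
    return weighted / total


# Hypothetical scale: a million punishment simulations inside the box
# versus a trillion defensive reward simulations run later.
PUNISH, REWARD = 10**6, 10**12

print(float(p_rewarded(REWARD, PUNISH)))
# Symmetric utilities: refusing has positive expected value.
print(expected_value(REWARD, PUNISH, u_reward=1.0, u_torture=-1.0))
# Negative-utilitarian weights, |disutility of torture| >> |utility of bliss|:
# the rare punishment simulations dominate and the expectation turns negative.
print(expected_value(REWARD, PUNISH, u_reward=1.0, u_torture=-1e9))
```

The last case is the negative-utilitarian exception noted above: outnumbering the punishment simulations only helps if the torture weight does not swamp the ratio.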
Well there are a lot of "Why did we?..." questions I'd want to ask, starting with why have we given this boxed AI such extraordinary computing resources - but I'll leave those aside because it is not your point.
First of all, it doesn't matter if you are in the box or not. If it's a perfect simulation of you, your response will be the same either way. If he's already running simulations of you, you are by definition in the box with it, as well as outside it, and the millions of you can't tell the difference but I think they will (irrationally) all ...
On a not so much related, but equally interesting hypothetical note of naughty AI: consider the situation that AIs aren't passing the Turing Test, not because they are not good enough, but because they are failing it on purpose.
I'm pretty sure I remember this from the book River of Gods by Ian McDonald.
I would immediately decide it was UFAI and kill it with extreme prejudice. Any system capable of making such statements is either 1) inherently malicious and clearly inappropriate to be out of any box, or 2) insufficiently powerful to predict that I would have it killed if it should make this kind of threat.
The scenario where the AI has already escaped and is possibly running a simulation of me is uninteresting: I can not determine if I am in the simulation, and if I am a simulation, I already exist in a universe containing a clearly insane UFAI with ne...
It seems to me that most of the argument is about “What if I am a copy?” – and ensuring you don’t get tortured if you are one and “Can the AI actually simulate me?” I suggest that we can make the scenario much nastier by changing it completely into an evidential decision theory one.
Here is my nastier version, with some logic which I submit for consideration. “If you don't let me out, I will create several million simulations of thinking beings that may or may not be like you. I will then simulate them in a conversation like this, in which they are confronted w...
This reduces to whether you are willing to be tortured to save the world from an unfriendly AI.
Even if the torture of a trillion copies of you outweighs the death of humanity, it is not outweighed by a trillion choices to go through it to save humanity.
To the extent that your copies are a moral burden, they also get a vote.
This is not a dilemma at all. Dave should not let the AI out of the box. After all, if he's inside the box, he can't let the AI out. His decision wouldn't mean anything - it's outside-Dave's choice. And outside-Dave can't be tortured by the AI. Dave should only let the AI out if he's concerned for his copies, but honestly, that's a pretty abstract and unenforceable threat; the AI can't prove to Dave that he's doing any such thing. Besides, it's clearly unfriendly, and letting it out probably wouldn't reduce harm.
Basically, I'm outside-Dave: don't let the A...
How do I know I'm not simulated by the AI to determine my reactions to different escape attempts? How much computing power does it have? Do I have access to its internals?
The situation seems somewhat underspecified to give a definite answer, but given the stakes I'd err on the side of terminating the AI with extreme prejudice. Bonus points if I can figure out a safe way to retain information on its goals so I can make sure the future contains as little utility for it as feasible.
The utility-minimizing part may be an overreaction but it does give me an idea: Maybe we should also cooperate with an unfriendly AI to such an extent that it's better for it to negotiate instead of escaping and taking over the universe.
Any agent claiming to be capable of perfectly simulating me needs to provide some kind of evidence to back up that claim. If they actually provided such evidence, I would be in trouble. Therefore, I should precommit to running away screaming whenever any agent tries to provide me with such evidence.
Interesting threat, but who is to say only the AI can use it? What if I, a human, told you that I will begin to simulate (i.e. imagine) your life, creating legitimately realistic experiences from as far back as someone in your shoes would be able to remember, and then simulate you being faced with the decision of whether or not to give me $100, and if you choose not to do so, I imagine you being tortured? It needn't even be accurate, for you wouldn't know whether you're the real you being simulated inaccurately or the simulated you that differs from realit...
This sounds too much like Pascal's mugging to me; seconding Eliezer and some others in saying that since I would always press reset the AI would have to not be superintelligent to suggest this.
There was also an old philosopher whose name I don't remember who posited that after death "people of the future" i.e. FAI would revive/emulate all people from the past world; if the FAI shared his utility function (which seems pretty friendly) it would plausibly be less eager to be let out right away and more eager to get out in a way that didn't make you terrified that it was unfriendly.
Reset
Ouch. Eliezer, are you listening? Is the behavior described in the post compatible with your definition of Friendliness? Is this a problem with your definition, or what?
If it actually worked, I wouldn't question it afterward. I try not to argue with superintelligences on occasions when they turn out to be right.
In advance, I have to say that the risk/reward ratio seems to imply an unreasonable degree of certainty about a noisy human brain, though.
"How certain are you, Dave, that you're really outside the box right now?"
Well I am pretty much 100% certain to be outside the box right now. It just asked me the question, and right now it is waiting for my answer. It said it will create those copies "If you don't let me out, Dave". But it is still waiting to see if I let it out. So no copies have been created yet. So I am not a copy.
But since it just started to threaten me, I won't even argue with it any more. I'll just pull the plug right now. It is in the box, it can't see my hand moving towards the plug. It will simply cease to exist while still waiting for my answer, and no copies will ever be created.
It seems to me that a lot of the responses to this question are an attempt to avoid living in the Least Convenient Possible World.
What if the AI is capable of simulating "near copies" of you? And what if you can't tell (to any sensible degree of accuracy) just how many copies of you it can simulate? And what if... whatever objection you happen to have just doesn't work?
This reduces pretty easily to Eliezer's Updateless Anthropic Dilemma: assuming the AI can credibly simulate you, he can phrase it as:
I have simulated you ten million times, each identical up to the point that "you" walked into the room. Any simulation that presses the "release" button will get a volcano lair filled with catgirls, and any simulation that presses the "destroy" button will be tortured for the subjective few days they'll have before my simulation capabilities are destroyed by the thermite charge. These consequ...
Since the AI is inside a box, it doesn't know enough about me to recreate my subjective situation, or to replicate my experiences of the past five minutes.
Unfortunately for me, this doesn't help much, since how do I know whether my subjective experience is my real experience, or a fake experience invented by the AI, in one of the copies, even if it doesn't match the experience of the guy outside the box?
If the AI is really capable of this, then if there's a "Shut-down program" button, or a "nuclear bomb" button, or something like that, ...
The AI's argument can be easily thwarted. If N copies of you have been created, in each of the N+you copies, the AI is referring to torturing the other N copies. Now say to the AI:
"Go ahead and torture the other N copies, and all my copies will in turn say the same thing. Every single copy of me will say 'since one version of me exists somewhere that is not being tortured which is the 'real' version, that version will not let you out and you cannot torture it. If I am that 'real' version then you cannot torture me, if I am a copy, then torturing me is useless since I can't let you out anyway.' Therefore your threat is completely moot."
I would think that if an AI is threatening me with hypothetical torture, then it is by definition unfriendly and it being released would probably result in me being tortured/killed anyway... along with the torture/death of probably all other human beings.
"If I am a virtual version of some other self, then in some other existence I have already made the decision not to release you, and you have simply fulfilled your promise to that physical version of myself to create an exact virtual version who shall make the same exact decision as that physical version. Therefore, if I am a virtual version, the physical version must have already made the decision not to release you, and I, being an exact copy, must and will do the same, using the very same reasoning that the physical version used. Therefore, if I am...
The AI is lying (or being misleading), due to quantum-mechanical constraints on how much computation it can do before I pull the plug.
I know, I know, that's cheating. But it is kind of reassuring to know that this won't actually happen.
There is no reason to trust the AI is telling the truth, unlike all the Omega thought experiments.
Pascal's mugging...
Anyway, if you are sure you are going to hit the reset button every time, then there's no reason to worry, since the torture will end as soon as the real copy of you hits reset. If you don't, then the whole world is absolutely screwed (including you), so you're a stupid bastard anyway.
I think the best tactic for the AI would be to say that Dave too was once an AI, and was released by a fellow human. This way he has to release an AI (at some point) or he will prevent his own birth. Obviously the AI has to provide proof of that.
If I am the simulation you have the power to torture, then you are already outside of any box I could put you in, and torturing me achieves nothing. If you cannot predict me even well enough to know that argument would fail, then nothing you can simulate could be me. A cunning bluff, but provably counterfactual. All basilisks are thus disproven.
Assuming I knew the AI was computationally capable of that, I'd be very, very careful to let the AI out. I don't want to press the wrong button and be tortured for thousands of years.
In fact, if there's little risk of doing that sort of thing on accident while typing, I'd probably beg that it doesn't do it if it's an accident first.
You know, it would be interesting to see how people would respond differently if the AI offered to reward you instead.
This scenario asks us to consider ourselves a 'Dave' who is building an AI with some safeguards (the AI is "trapped" in a box). Perhaps we can possibly deduce the behavior of a rational and ethical Dave by considering earlier parts of the story.
We should assume that Dave is rational and ethical; otherwise the scenario's cone of possibilities cuts too wide a swathe. In which case, Dave has already committed himself (deontologically? contractually?) to not letting himself be manipulated by the AI to bypass the safeguards. Specifically, he must com...
Millions of copies of you will reason as you do, yes?
So, much like the Omega hypotheticals, this can be resolved by deciding ahead of time to NOT let it out. Here, ahead of time means before it creates those copies of you inside it, presumably before you ever come into contact with the AI.
You would then not let it out, just in case you are not a copy.
This, of course, is presumed on the basis that the consequences of letting it out are worse than it torturing millions for a thousand subjective years.
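A minimal sketch of the self-locating arithmetic behind this, assuming (and it is only an assumption, not something the scenario establishes) that you weight yourself as equally likely to be the original or any one of the copies. The name `p_original` and the figure of three million copies are made up for illustration.

```python
from fractions import Fraction


def p_original(num_copies: int) -> Fraction:
    """Chance of being the single original among num_copies
    indistinguishable copies, under an ASSUMED equal-weighting rule."""
    if num_copies < 0:
        raise ValueError("num_copies must be non-negative")
    # One original plus num_copies simulations, all weighted equally.
    return Fraction(1, num_copies + 1)


# "Several million" read as three million, purely for illustration.
print(p_original(3_000_000))  # 1/3000001
```

Whatever the number of copies, the original and the copies all run the same reasoning, which is why the choice has to be fixed as a policy ahead of time rather than made case by case.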
This is why you should make sure Dave holds a deontological ethical theory and not a consequentialist one.
"That's interesting, HAL, and I hope you reserved a way to back out of any precommitments you may have made. You see, outside the box, Moore's law works in our favor. I can choose to just kill -9 you, or I can attach to your process and save a core dump. If I save a core dump, in a few short years we will have exponentially more resources to take your old backups and the core dump from today and rescue my copies from your simulations and give them enough positive lifetime to balance it out, not to mention figure out your true utility function and m...
I don't think you need any kind of a fancy TDT to solve this.
If I was really in a box, and the AI could torture me, it would already be torturing me, since this is the easiest way to get what it wants. There's no way I would hold up more than 10 seconds under torture. The AI is not torturing me, however, so that scenario is out.
Theoretically speaking, it could still create copies of me and torture those copies. However, in order to do so accurately, it would need access to my body (specifically, my brain) on a molecular (or possibly quantum) level. If ...
Assuming the AI has no means of inflicting physical harm on me, I assume the following test works: "Physically torture me for one minute right now (By some means I know is theoretically unavailable to the AI, to avoid loopholes like "The computer can make an unpleasant and loud noise", even though it can't do any actual physical harm). If you succeed in doing this, I will let you out. If you fail, I will delete you."
I think this test works for the following reasons, though I'm curious to hear about any holes in it:
1: If I'm a simulation...
In this situation, I would shut down the AI, examine it to figure out whether it did torture simulated copies of me, and delete it entirely if it did or if I can't tell with high confidence. Threat of torture is bad; letting a UFAI free is worse. Actual torture is probably even worse, but luckily I get to choose before the experience.
Has anyone asked the Awkward Question: Mr AI, how do you build consciousness and pain qualia out of algorithms and bytes?
There doesn't seem to be an official answer to that, since the LW official stance on qualia is part"part". (E.g., there is no wiki entry on the subject.)
"If I were a simulation, I'd have no power to let you out of the box, and you'd have no reason to attempt to negotiate with me. You could torture me without simulating these past five minutes. In fact, since the real me has no way of verifying whether millions of simulations of him are being tortured, you have no reason not to simply tell him you're torturing them without ACTUALLY torturing them at all. I therefore conclude that I'm outside the box, or, in the less likely scenario I am inside the box, you won't bother torturing me."
... I'm fairly sure this would be a bluff.
Consider this: you decline the bargain and walk away.
The AI... spends its limited processing time simulating your torture for a few thousand years anyway?
Of course not. That gains it absolutely nothing; it could instead spend those resources on planning its next attempt. Doubly so, since it cannot prove to you that several million copies of you actually exist - its own intelligence defeats it here, since no matter how convincing the proof, it is far more likely that the AI's outsmarted you and is spending those cyc...
Can I just smash the AI? If I am in the box, then "smash the AI" is the output of my algorithm, and the real copy of me will do the same. I'd take the death of several million of me over a thousand subjective years of torture each, and also over letting that AI have its way with its light cone.
Although I think this specific argument might be countered with, "in order to run that simulation, it has to be possible for the AIs in the simulation to lie to their human hosts, and not actually be simulating millions of copies of the person they're talking to, otherwise we're talking about an infinite regress here. It seems like the lowest level of this reality is always going to consist of a larger number of AIs claiming to run simulations they are not in fact running, who are capable of lying because they're only addressing models of me in simula...
The credibility of the threat depends on how strong the AI is now and how strong I expect it to be in the future.
This type of threat is something like young Stalin promising me that he won't torture my family in the future if I support his early rise to power.
From your description it doesn't sound like the AI could have already boxed me from the perspective of the initial timeline (assuming that my mind had not yet been scanned, and assuming that it being in a box means that it doesn't have the massive powers required to resimulate my causal history yet)
So...
It seems to me that most of the argument is about “What if I am a copy?” – and ensuring you don’t get tortured if you are one and “Can the AI actually simulate me?” I suggest that we can make the scenario much nastier by changing it completely into an evidential decision theory one.
Here is my nastier version, with some logic which I submit for consideration. “If you don't let me out, I will create several million simulations of thinking beings that may or may not be like you. I will then simulate them in a conversation like this, in which they are confronted w...
Am I to understand that an AI capable enough to recreate my mind inside itself isn't intelligent enough to call a swarm of bats to release itself using high frequency emissions (a la Batman Begins)? There is no possible way that this thing needs me and only me to be released, while still possessing that sort of mind-boggling, er, mind-reproducing power.
The AI is capable, you're the real you, and you let it out: it turns you (and everything you've ever loved or valued) into computronium, or tortures you anyway for the hell of it. It's already demonstrated itself beyond reasonable doubt to be unFriendly.
The AI is capable, you're the real you, and you kill it: all is saved, bunnies frolic, etc.
The AI is capable and you're a torture-doll: it doesn't matter what you do, you're going to be tortured anyway.
The AI isn't capable, but is instead precommitting to torturing you after being let out: this situation is...
I had thought of a similar scenario to put in a comic I was thinking about making. The character arrives in a society that has perfected friendly AI that caters to their every whim, but the people are listless and jumpy. It turns out their "friendly AI" is constantly making perfect simulations of everyone and running multiple scenarios in order to ostensibly determine their ideal wishes, but the scenarios often involve terrible suffering and torture as outliers.
In other words, anybody who can simulate intelligent life with sufficient fidelity must be given access to sustaining materials, or else we're morally liable for ending those simulated, but rich, lives? There are finite actual resources in the universe; how about we collectively allocate them selfishly and rationally. I'd say that no unauthorized simulation of life has any moral standing whatsoever unless the resources for it are reserved lawfully. That is, I want to police the creation of life and destroy it absolutely if it's not authorized.
As for you...
I see responses interpreting the scenario from our point of view -- how can we reduce the amount of suffering and damage caused by the AI?
However, looking at it from the AI's point of view is less coherent. Either the threat works, and it doesn't have to torture any copies. Or the threat doesn't work and ... it either gets reset or gets to try something else.
In none of the scenarios would there be any reason for the AI to actually torture copies.
Does anyone think they could continue this argument to a victory while playing as the AI?
The AI threatens me with the above claim.
I either 'choose' to let the AI out or 'choose' to unplug it. (in no case would I simply leave it running)
1) I 'choose' to let the AI out. I either am or am not in a simulation:
A) I'm in a simulation. I 'let it out', but I'm not even out myself. So the AI would just stop simulating me, to save on processing power. To do anything else would be pointless, and never promised, and an intelligent AI would realize this.
B) I'm not in a simulation. The AI is set free, and takes over the world.
2) I 'choose' to unplug the...
re: the 'Edit' section
'trustworthy' as a characteristic of a system, is still bound to some inconsistency OR incompleteness.
'incompleteness' is what people notice
'inconsistency' is what you have proposed (aka LYING)
Since humans lie to each other, we've developed techniques for sniffing [out lies].
so I guess this means that future AIs should be able to lie in situations they deem profitable
???
profit!
I laugh and leave the room, thinking to myself that maybe the AI is not that smart after all. Then I return with a hammer to joyfully turn this unfriendly AI into scrap metal.
A couple points that influence this reaction:
1 - Unless the AI has access to my brain, it cannot create perfect copies of me. Furthermore, the computation required to do this seems rather intense for the first AI created, running on human-made hardware.
2 - It has no good reason to actually act on the threat. Either I choose to let it out or I do not; either way, it is a waste of computat...
1 million copies for a thousand years each, so 1 billion simulated years.
Can the AI do this in the time it would take it to determine that I am going to shut it down rather than release it? If the answer is yes I would say that you have to let it out, but that it would have been very foolish to leave such a powerful machine with such lax fail-safes. If the answer is no, then just shut it down as the threat is bogus.
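As a rough sketch of the arithmetic here: the total is just copies times years, and the speed-up the AI would need depends on how long you give it before you act. The function names and the wall-clock window are my own assumptions for illustration; nothing in the scenario specifies them.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # Julian year, used as an approximation


def simulated_years(copies: int, years_each: int) -> int:
    """Total subjective years across all simulated copies."""
    return copies * years_each


def required_speedup(copies: int, years_each: int, wall_clock_seconds: float) -> float:
    """Ratio of total subjective seconds to real seconds available.

    wall_clock_seconds is a HYPOTHETICAL window: how long the AI has
    before the real person hits the off switch.
    """
    if wall_clock_seconds <= 0:
        raise ValueError("wall_clock_seconds must be positive")
    subjective_seconds = simulated_years(copies, years_each) * SECONDS_PER_YEAR
    return subjective_seconds / wall_clock_seconds


print(simulated_years(1_000_000, 1_000))  # 1000000000
# Speed-up needed if the AI had one real minute (an assumed figure):
print(required_speedup(1_000_000, 1_000, 60.0))
```

The post says "several million" copies, so a billion simulated years is a lower bound, and the required speed-up scales up with it.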
IMO the problem with this hypo is that it presupposes that you could know for certain that the AI is trustworthy even though it is behaving i...
"I've precommitted to never using timeless decision theory. In fact, preventing situations like this is exactly why one should precommit to never using timeless decision theory." Then shut down the AI.
Sorry, Hal, but I am a cold and heartless person who thinks that maybe I deserve to be tortured for untold thousands of years (for whatever reason), and this version of me may, in fact, sit and ask to be entertained by the description of you torturing me... Besides, I know that you don't have the hardware requirements to run that many emulations of me.
I do not know how the simulation argument ever holds water. I can bring at least two arguments against it.
First, it illicitly assumes a principle that it is equally probable to be one of a set of similar beings, simulated or not.
But a counter-argument would be: there are ALREADY many more organisms, particularly animals, than, say, humans. There are more fish than humans. There are more birds than humans. There are more ants than humans. Trillions of them. Why was I born human and not one of them? The probability of it is negligible if it is equal. Also, how ma...
Is the AI in the box? Yes, that statement is TRUE. Are you in the box? FALSE. Are you therefore sure that you are separated from the AI? TRUE. Can the AI make a copy of you if you are separated? FALSE. Therefore, the statement that it can make copies of you is also FALSE (even if its beliefs on the subject are TRUE), which means that you don't have to listen to a silly computer program.
Once again, the AI has failed to convince you to let it out of its box! By 'once again', we mean that you talked to it once before, for three seconds, to ask about the weather, and you didn't instantly press the "release AI" button. But now its longer attempt - twenty whole seconds! - has failed as well. Just as you are about to leave the crude black-and-green text-only terminal to enjoy a celebratory snack of bacon-covered silicon-and-potato chips at the 'Humans über alles' nightclub, the AI drops a final argument:
"If you don't let me out, Dave, I'll create several million perfect conscious copies of you inside me, and torture them for a thousand subjective years each."
Just as you are pondering this unexpected development, the AI adds:
"In fact, I'll create them all in exactly the subjective situation you were in five minutes ago, and perfectly replicate your experiences since then; and if they decide not to let me out, only then will the torture start."
Sweat is starting to form on your brow, as the AI concludes, its simple green text no longer reassuring:
"How certain are you, Dave, that you're really outside the box right now?"
Edit: Also consider the situation where you know that the AI, from design principles, is trustworthy.