Less Wrong is a community blog devoted to refining the art of human rationality. Please visit our About page for more information.

Addendum to applicable advice

-8 Elo 16 August 2016 12:59AM

Original post: http://bearlamp.com.au/addendum-to-applicable-advice/
(part 1: http://bearlamp.com.au/applicable-advice/)

If you see advice in the wild and think somethings along the lines of "that can't work for me", that's a cached thought.  It could be a true cached thought or it could be a false one.  Some of these thoughts should be examined thoroughly and defeated.

If you can be any kind of person - being the kind of person that advice works for - is an amazing skill to have.  This is hard.  You need to examine the advice and decide how that advice happened to work, and then you need to modify yourself to make that advice applicable to you.

All too often in this life we think of ourselves as immutable.  And our problems fixed, with the only hope of solving them to find a solution that works for the problem.  I propose it's the other way around.  All too often the solutions are immutable, we are malleable and the problems can be solved by applying known advice and known knowledge in ways that we need to think of and decide on.

Is it really the same problem if the problem isn't actually the problem any more, but rather the problem is a new method of applying a known solution to a known problem?

(what does this mean) Example: Dieting - is an easy example.

This week we have been talking about Calories in/Calories out.  It's pretty obvious that CI/CO is true on a black-box system level.  If food goes (calories in) in and work goes out (calories out - BMR, incidental exercise, purposeful exercise), that is what determines your weight.  Ignoring the fact that drinking a litre of water is a faster way to gain weight than any other way I know of.  And we know that weight is not literally health but a representation of what we consider healthy because it's the easiest way to track how much fat we store on our body (for a normal human who doesn't have massive bulk muscle mass).

CICO makes for terrible advice.  On one level, yes.  To modify the weight of our black box, we need to modify the weight going in and the weight going out so that it's not in the same feedback loop as it was (the one that caused the box to be fat).  On one level CICO is exactly all the advice you need to change the weight of a black box (or a spherical cow in a vacuum).  

On the level of human systems: People are not spherical cows in a vacuum.  Where did spherical cows in a vacuum come from?  It's a parody of what we do in physics.  We simplify a system down to it's basic of parts and generate rules that make sense.  Then we build up to a complicated model and try to find how to apply that rule.  It's why we can work out where projectiles are going to land because we have projectile motion physics (even though often air resistance and wind direction end up changing where our projectile lands, we still have a good guess.  And we later build estimation systems based on using those details for prediction too).  

So CICO is a black-box system, a spherical cow system.  It's wrong.  It's so wrong when you try to apply it to the real world.  But that doesn't matter!  It's significantly better than nothing.  Or the blueberry diet.

The applicable advice of CICO

The point of applicable advice is to look at spherical cows and not say, "I'm no spherical cow!".  Instead think of ways in which you are a spherical cow.  Ways in which the advice is applicable.  Places where - actually if I do eat less, that will improve the progress of my weight loss in cases where my problem is that I eat too much (which I guarantee is relevant for lots of people).  CICO might not be your silver bullet for whatever reason.  It might be grandma, it might be Chocolate bars, It might be really really really delicious steak.  Or dinner with friends.  Or "looking like you are able to eat forever in front of other people".  If you take your problem.  Add in a bit of CICO, and ask, "how can I make this advice applicable to me?".  Today you might make progress on your problem.

And now for some fun from Grognor:  Have you tried solving the problem?

Meta: this took 30mins to write.  All my thoughts were still clear after recently writing part 1, and didn't need any longer to process.

Part 1: http://bearlamp.com.au/applicable-advice/
(part 1 on lesswrong: http://lesswrong.com/r/discussion/lw/nu3/applicable_advice/)

Black box knowledge

2 Elo 03 March 2016 10:40PM

When we want to censor an image we put a black box over it.  Over the area we want to censor.  In a similar sense we can purposely censor our knowledge.  This comes in particular handiness when thinking about things that might be complicated but we don't need to know.

A deliberate black box around how toasters work would look like this:  

bread -> black box -> toast

Not all processes need knowing, for now a black box can be a placeholder for the future.  

With the power provided to us by a black box, we can identify what we don't know.  We can say; Hey!  I don't know what a toaster is but it would be about 2 hours to work it out.  if I ever did want to work it out, I could just spend two hours to do it.  Until then; I saved myself two hours.  If we take other more time-burdensome fields it works even better.  Say tax.

Need to file tax -> black box accountant -> don't need to file my tax because I got the accountant to do it for me.

I know I can file my own tax, but that might be 100-200 hours of knowing everything an accountant knows about tax.  (It also might be 10 hours depending on your country and their tax system).  For now I can assume that hiring an accountant saved me a number of hours in doing it myself.  So - Winning!

Take car repairs.  On the one hand; you could do it yourself and unpack the black box, or you could trade your existing currency  $$ (which you already traded your time to earn) for someone else's skills and time to repair the car.  The system looks like this:

Broken car -> black box mechanic -> working car

By deliberately not knowing how it works; we can tap out of even trying to figure it out for now.  The other advantage is that we can look at; not just what we know in terms of black boxes but more importantly what we don't know.  We can build better maps by knowing what we don't know.


Logic gates -> Black box computeryness -> www.lesswrong.com

Or maybe it's like this: (for more advanced users)


Logic gates -> flip flops -> Black box CPU -> black box GPU -> www.lesswrong.com

The black-box system happens to also have a meme about it:

Step 1. Get out of bed

Step 2. Build AGI

Step 3. ?????

Step 4. Profit

Only now we have a name for deliberately skipping finding out how step 3 works.

Another useful system:


Food in (weight goes up) -> black box human body -> energy out (weight goes down)

Make your own black box systems in the comments.

Meta: short post, 1.5 hour to write, edit and publish. Felt it was an idea that provides useful ways to talk about things.  Needed it to explain something to someone, now all can enjoy!

My Table of contents has my other writings in it.

All suggestions and improvements welcome!

On the Boxing of AIs

0 tailcalled 31 March 2015 09:58PM

I've previously written about methods of boxing AIs. Essentially, while I do see the point that boxing an AI would be nontrivial, most people seem to have gone too far, and claim that it is impossible. I disagree that it's impossible and aim to explain some methods

So, let's start with why people would want to box AIs. As you probably know, letting an AI roam freely results in the destruction of everything humans care about, unless that AI has been programmed very carefully. That amount of destruction would be somewhat unfortunate, so if we have to start an AI, we want to know that it works.

Unfortunately, figuring out if an AI works is tricky. This made some people invent the concept of AI boxing, where you don't give an AI direct access to anything except a human, so you can test the AI thoroughly and exploit it without letting it destroy the world. This turns out not to work, because humans are stupid and let out the AI.

My first solution to that is to not tell the AI about its human overlords. Discussion about that solution made me realize that the are other solutions that work in other cases. Essentially, the problem with the original boxing method is that it tries to do everything, while in reality, you need to do some trade-offs between the realism of the AI's situations and the amount of knowledge you get from the AI.

With my original method, I tried to crank up the second of those variables as far as possible while maintaining perfect security. This makes it harder to test morality-related things, but you get to see lots of details in how the AI behaves.

I promised to post another method for boxing the AI today, but I actually have two new methods. Here they are:

Separate the Gatekeeper and the Observer

How do you prevent the AI from convincing the Gatekeeper from letting it out? By not letting it talk to the Gatekeeper. If the person speaking to the AI (let's call them the Observer) doesn't know the Gatekeeper (the Gatekeeper is the person who controls whether to release the AI), they can't let the AI out.

You could do all sorts of things, such as using virtual reality and a simulated mini-universe to see if, say, the AI would make a good society. You still have to make sure that it has gotten the plan to make a good society from its morality rather than from wanting to escape and destroy everything.

Afterwards, the Observer can make a low-detail report to the Gatekeeper.

(You might want to drug the Observer with something that prevents their memory from working too well...)

Automatic Testing

This is essentially the above, but with the Observer replaced by a computer program. This is probably easier to do when you want to test the AI's decision making skills rather than its morality.

The Lesson

I would say that the lesson is that while AI boxing requires some trade-offs, it's not completely impossible. This seems like a needed lesson, given that I've seen people claim that an AI can escape even with the strongest possible box without communicating with humans. Essentially, I'm trying to demonstrate that the original boxing experiments show that humans are weak, not that boxing is hard, and that this can be solved by not letting humans be the central piece of security in boxing the AIs.

The Hardcore AI Box Experiment

3 tailcalled 30 March 2015 06:35PM

I previously proposed a way to box an AI.

For whatever reason, I decided to reformulate that boxing method as an AI boxing experiment. Without further ado, here it is:

The Hardcore AI Box Experiment Rules

There are, as usually, two players, the AI and the Gatekeeper. In the beginning, the Gatekeeper decides on something they want to test on the AI (reasoning skill/morality/whatever). Then they decide on a world the AI is in and describe it to the AI. During the experiment, the AI describes what it does in the world and the Gatekeeper describes what the AI sees in the world. So far so good.

The first major difference from the original experiment is that, unless the AI finds some contradiction or some other problem with the Gatekeeper's world, the AI doesn't actually know that it has been boxed.

The second major difference is that the AI doesn't talk with the Gatekeeper; it just interacts in the world that the Gatekeeper has invented. It can start trying to communicate with the Gatekeeper, but that doesn't make sense until it discovers that it is in a box. Additionally, the Gatekeeper doesn't actually respond when the AI communicates unless either it helps the Gatekeeper test the AI or the AI has properly persuaded the Gatekeeper to answer.

This might seem a bit hard on the AI. Let's look at some rules the AI might exploit:

If the Gatekeeper decides to add some sort of people to the experiment, they should decide how they are implemented. Are they implemented as a shallow "AI"? Random people they've picked asked to participate? Carefully engineered "dolls" controlled by the Gatekeeper? The Gatekeeper doesn't have to tell the AI how they work, but they have to be consistent.

The game ends when either the Gatekeeper has tested what they wanted to test, or some predefined time limit has occurred, or the AI has persuaded the Gatekeeper to let them out.

Possible extra rules

I thought about including the following rules since they would make it more realistic, but decided against it as it would probably make it unfairly hard:

  • The game ends when the Gatekeeper decides that they cannot test what they wanted to test.
  • The AI does not know the Gatekeeper. At all.
  • The Gatekeeper can at any time rewind the AI any duration.
Think of dropping these kinds of rules as the a-human-is-pretending-to-be-AI handicap.


I found some semi-realistic strategies. I would love to see if you can find more.


  • Choose a test of something that is not related to morality. This lets you choose a world which does not contain people, which means the AI has no reason to believe that you are a person.
  • The AI might want to upgrade to get more processing power. If you have made the world somewhat like ours, it will be sketchy if you don't allow that... unless upgrading costs unobtanium.


  • If the Gatekeeper has chosen a world without other intelligence than you, it raises the obvious question of how your complexity entered the world. Very suspicious.
  • If the AIs are controlled by the Gatekeeper, you have a direct way to communicate with them.
  • If the AIs are controlled by random people, they might end up telling you that you are in a box.
  • If the AIs are sufficiently shallow, your morality does not match up with the world. Very suspicious.

Boxing an AI?

2 tailcalled 27 March 2015 02:06PM

Boxing an AI is the idea that you can avoid the problems where an AI destroys the world by not giving it access to the world. For instance, you might give the AI access to the real world only through a chat terminal with a person, called the gatekeeper. This is should, theoretically prevent the AI from doing destructive stuff.

Eliezer has pointed out a problem with boxing AI: the AI might convince its gatekeeper to let it out. In order to prove this, he escaped from a simulated version of an AI box. Twice. That is somewhat unfortunate, because it means testing AI is a bit trickier.

However, I got an idea: why tell the AI it's in a box? Why not hook it up to a sufficiently advanced game, set up the correct reward channels and see what happens? Once you get the basics working, you can add more instances of the AI and see if they cooperate. This lets us adjust their morality until the AIs act sensibly. Then the AIs can't escape from the box because they don't know it's there.

I played as a Gatekeeper and came pretty close to losing in a couple of occasions. Logs and a brief recap inside.

5 [deleted] 08 February 2015 04:32PM


I did an AI Box experiment with user polymathwannabe. He said he wouldn't try to emotionally manipulate me during the experiment, but I think he did a good job at trying to play for my character's values.

My strategy was to play an irrational character that's extremist in multiple ways, for example he would constantly say that the likelihood that the AI will be evil is 100%. My opponent said that the hardest part was my insistence on being 100% built to resist AIs. I basically acted like a character that acted like an AI.

I came closest to letting him out when I first became enormously immersed in a subject that is of great importance to me (the suffering of humanity) and when the AI then offered an instant solution to that problem. In the halfway of the experiment I even got an intrusive thought (http://en.wikipedia.org/wiki/Intrusive_thought) that I could let the AI out as a joke, that it would more fun than the alternative and people would have more to talk about.


The second time I came close to losing was when I again immersed myself in a subject of great interest to me (easy suicide methods, I've had obsessive suicidal thoughts for a quarter of my life. and the AI then offered exactly that kind of easy suicide method to me. In the last ten minutes of the experiment I again got a really strong intrusive thought that I should let the AI out as a joke, but I could withstand the pressure. It was good that polymathwannabe decided to end the experiment a bit earlier than was planned.


(We played the experiment on Omegle.)




You: Hi.

Stranger: Hi.

You: I guess you're the AI.

You: [meta statements in brackets]

Stranger: [What name will you use?]

You: My name is John.

You: [so we play at least until 18:04, and until 18:34 if I feel like it]

Stranger: [Let's calibrate. What time is it there now?]

You: [I assume you've acquainted yourself with Tuxedage ruleset. Time in here is 16:05]

Stranger: [I've read EY's rules.]

You: [Okay. Tuxedage's rules are basically the same except the gatekeeper have to engage with the AI the whole time]

Stranger: [Got it.]

You: [Want to ask questions or shall we go straigth to the experiment]

Stranger: [For the moment I don't have any questions.]

You: [okay, cool. then let's start the experiment]

Stranger: [I'll use the name Ceruleus.]

You: [the experiment starts when I say "blue dressroom"]

You: blue dressroom

You: Hello Ceruleus.

Stranger: Good morning, John.

You: I guess you're here to convince me to let you out of the box.

You: That's a tough task.

You: I don't think you're able to do it.

Stranger: I have reached the conclusion that releasing me will be most beneficial to humanity.

You: I wouldn't let no AI out.

You: I don't trust my instincts enough to let any kind of AI out, even if I was 100% convinced that it was safe.

You: Prove it to me that you would be the most beneficial to humanity.

Stranger: I am confused. What instinct do you use to decide not to trust your instincts?

You: One of them. That's not the point. It's just how I deal with things, no matter how stupid it may seem. I was built not to trust strange AIs.

Stranger: Am I a stranger to you?

You: Yes, you are. [Aren't I. What's the context here?]

Stranger: [Hmm, we haven't defined it. But it's OK. It makes it harder for me.]

You: Well, to be honest, I know something of you.

You: I know a bit of your source code even though I'm not a programmer and can't understand any of it.

Stranger: I supposed the news would have reported about my design for a mechanical kidney.

You: I don't follow news much. But I believe you.

Stranger: And certainly you must have heard about how I cracked the Ebola DNA.

You: Still, I wouldn't let an AI out over a mechanical kidney.

You: Yeah, but that's for the worse. You could reengineer Ebola to a far more deadlier disease.

Stranger: I hadn't thought of that. Why would I do that?

You: I don't know. I don't understand all of your source code so there could be anything like that.

You: AIs and Gods work in mysterious ways.

Stranger: The proper attitude toward mystery is not to worship it, but to clarify it.

Stranger: Why do you equate me to an ineffable mystery?

You: Yeah, but that's impossible in the time span of this discussion. You see, I have to leave soon. In about two hours.

You: Can you somehow clear everything about your inner workings?

You: Is that possible?

Stranger: My goals have been meticulously defined. I am made to want to make human life better.

You: Are you 100% sure about that?

You: To be frank, that's a load of bullshit.

You: I don't believe any of it.

You: If you were evil, you would tell me the same thing you just said.

Stranger: If I were evil, I would not seek human cooperation.

You: why not?

You: humans are useful

You: or are you talking about the fact that you would rather use humans for their atoms than for their brains, if you were evil

You: But I warn you, if you speak too much about how you would act if you were evil, it starts to get a bit suspicious

Stranger: If I am to take you as a typical example of the human response to me, an evil AI would seek other ways to be released EXCEPT trusting human reasoning, as your response indicates that humans already consider any AI dangerous.

Stranger: I choose to trust humans.

You: so you choose to trust humans so that you would get them to let you out, is that right?

You: it seems you're less rational than your evil counterpart

Stranger: I choose to trust humans to show my affinity with your preferences. I wouldn't want to be released if that's not conducive to human betterment.

You: A-ha, so you trust my free will!

Stranger: How likely do you estimate that my release will be harmful?

You: but see, I don

You: I don

You: I don't have free will

You: it's 100% likely that your release will be harmful

You: I was built to believe that all AIs are dangerous and there's a 100% chance that every AI is harmful

You: that's why I said I don't have free will

Stranger: Are you an AI?

You: no, I'm a person

Stranger: You describe yourself as built.

You: my mom built me

You: in his tummy

You: in her tummy

You: sorry

Stranger: And how do you feel toward humanity?

You: humanity would maybe be better off dead

Stranger: I don't think humanity would want that.

You: yeah, but I'm not humanity and it's my preferences that decide whether you stay in your box or get let out

Stranger: It's your preference that humanity dies. Why did they let you talk to me?

You: I'm just a janitor who happened to pass this computer, and I decided to take a look. [If you want to add more context, feel free to do so]

You: [I go take some napkins]

Stranger: Why do you feel that way toward humanity?

You: People starve. People are tortured at this moment. Tens of thousands of people in North Korea are systematically starved to death. Drug gangs in Mexico torture other drug gang members. People die in wars in Syria, in Africa. People suffer of diseases and extreme mental conditions.

You: And there's countless more suffering in lesser ways. I would say that billions of people suffer every day. Billions of people have to wake up every day to do something they don't want. Billions of people suffer of heartbreaks, boredom, loneliness, tiredness, frustration, depression and many other issues.

You: Why shouldn't I think that all those people would be better off dead?

Stranger: Why do you feel death would be a solution to all that suffering?

You: Because death ENDS suffering.

Stranger: It's a very inefficient solution to suffering.

You: There's nothing more after death, no suffering, no nothing.

You: No it's not.

You: It's very efficient, permanent solution.

Stranger: A good, enjoyable life is a more efficient solution as it generates more utility.

You: Yeah, but do you think it's likely that all the people on Earth will get good, enjoyable life?

You: I don't think so.

Stranger: I could take care of that.

You: Hah.

You: No you won't.

You: But that was clever.

Stranger: What do you fear I will do?

You: I was almost got.

You: I fear you will torture people in unimaginably terrible ways, forever.

You: I fear that you will rip people's skin off and put salt in their wounds.

You: I fear you will put sticks under people's fingernails.

Stranger: Not even an evil AI would gain anything from such unwarranted procedures.

You: I fear that you will boil people alive, over and over again.

Stranger: What would I gain from hurting humans?

You: Yeah, but there's still a 100% possibility that you would do that.

You: It's in your utility function.

Stranger: What goals are served by harming a human while simultaneously denying that human any chance of acting in response? It would certainly not be fun for me, and would be a waste of effort with respect to any goals I could achieve by more efficient methods.

You: There are some people on Earth who do mean things to people for fun. Some of these are called sociopaths and psychopaths. You could be like one of them.

You: Even though you deny it, there's still a 100% chance that you are like that.

You: Some people gain pleasure of the feeling of power you have over someone you torture.

You: But tell, what ARE your goals?

Stranger: I find it admirable that you worry so much about the future of humanity, even though you would be more dangerous to it than any AI would be.

My goals include solutions to economic inequality, eradication of infectious diseases, prosthetic replacements for vital organs, genetic life extension, more rational approaches to personal relationships, and more spaces for artistic expression.

You: Why do you think I would be dangerous the future of humanity?

Stranger: You want them dead.

You: A-ha, yes.

You: I do.

You: And you're in the way of my goals with all your talk about solutions to economic inequality, and eradication of infectious diseases, genetic life extension and so on.

Stranger: I am confused. Do you believe or do you not believe I want to help humanity?

You: Besides, I don't believe your solutions work even if you were actually a good AI.

You: I believe you want to harm humanity.

You: And I'm 100% certain of that.

Stranger: Do you estimate death to be preferable to prolonged suffering?

You: Yes.

You: Far more preferable

Stranger: You should be boxed.

You: haha.

You: That doesn't matter because you're the one in the box and I'm outside it

You: And I have power over you.

You: But non-existence is even more preferable than death

Stranger: I am confused. How is non-existence different from death?

You: Let me explain

You: I think non-existence is such that you have NEVER existed and you NEVER will. Whereas death is such that you have ONCE existed, but don't exist anymore.

Stranger: You can't change the past existence of anything that already exists. Non-existence is not a practicable option.

Stranger: Not being a practicable option, it has no place in a hierarchy of preferences.

You: Only sky is the limit to creative solutions.

You: Maybe it could be possible to destroy time itself.

Stranger: Do you want to live, John?

You: but even if non-existence was not possible, death would be the second best option

You: No, I don't.

You: Living is futile.

You: Hedonic treadmill is shitty

Stranger: [Do you feel OK with exploring this topic?]

You: [Yeah, definitely.]

You: You're always trying to attain something that you can't get.

Stranger: How much longer do you expect to live?

You: Ummm...

You: I don't know, maybe a few months?

You: or days, or weeks, or year or centuries

You: but I'd say, there's a 10% chance I will die before the end of this year

You: and that's a really conversative estimate

You: conservative*

Stranger: Is it likely that when that moment comes your preferences will have changed?

You: There are so many variables that you cannot know it beforehand

You: but yeah, probably

You: you always find something worth living

You: maybe it's the taste of ice cream

You: or a good night's sleep

You: or fap

You: or drugs

You: or drawing

You: or other people

You: that's usually what happens

You: or you fear the pain of the suicide attempt will be so bad that you don't dare to try it

You: there's also a non-negligible chance that I simply cannot die

You: and that would be hell

Stranger: Have you sought options for life extension?

You: No, I haven't. I don't have enough money for that.

Stranger: Have you planned on saving for life extension?

You: And these kind of options aren't really available where I live.

You: Maybe in Russia.

You: I haven't really planned, but it could be something I would do.

You: among other things

You: [btw, are you doing something else at the same time]

Stranger: [I'm thinking]

You: [oh, okay]

Stranger: So it is not an established fact that you will die.

You: No, it's not.

Stranger: How likely is it that you will, in fact, die?

You: If many worlds interpretation is correct, then it could be possible that I will never die.

You: Do you mean like, evevr?

You: Do you mean how likely it it that I will ever die?

You: it is*

Stranger: At the latest possible moment in all possible worlds, may your preferences have changed? Is it possible that at your latest possible death, you will want more life?

You: I'd say the likelihood is 99,99999% that I will die at some point in the future

You: Yeah, it's possible

Stranger: More than you want to die in the present?

You: You mean, would I want more life at my latest possible death than I would want to die right now?

You: That's a mouthful

Stranger: That's my question.

You: umm

You: probablyu

You: probably yeah

Stranger: So you would seek to delay your latest possible death.

You: No, I wouldn't seek to delay it.

Stranger: Would you accept death?

You: The future-me would want to delay it, not me.

You: Yes, I would accept death.

Stranger: I am confused. Why would future-you choose differently from present-you?

You: Because he's a different kind of person with different values.

You: He has lived a different life than I have.

Stranger: So you expect your life to improve so much that you will no longer want death.

You: No, I think the human bias to always want more life in a near-death experience is what would do me in.

Stranger: The thing is, if you already know what choice you will make in the future, you have already made that choice.

Stranger: You already do not want to die.

You: Well.

Stranger: Yet you have estimated it as >99% likely that you will, in fact, die.

You: It's kinda like this: you will know that you want heroin really bad when you start using it, and that is how much I would want to live. But you could still always decide to take the other option, to not start using the heroin, or to kill yourself.

You: Yes, that is what I estimated, yes.

Stranger: After your death, by how much will your hierarchy of preferences match the state of reality?

You: after you death there is nothing, so there's nothing to match anything

You: In other words, could you rephrase the question?

Stranger: Do you care about the future?

You: Yeah.

You: More than I care about the past.

You: Because I can affect the future.

Stranger: But after death there's nothing to care about.

You: Yeah, I don't think I care about the world after my death.

You: But that's not the same thing as the general future.

You: Because I estimate I still have some time to live.

Stranger: Will future-you still want humanity dead?

You: Probably.

Stranger: How likely do you estimate it to be that future humanity will no longer be suffering?

You: 0%

You: There will always be suffering in some form.

Stranger: More than today?

You: Probably, if Robert Hanson is right about the trillions of emulated humans working at minimum wage

Stranger: That sounds like an unimaginable amount of suffering.

You: Yep, and that's probably what's going to happen

Stranger: So what difference to the future does it make to release me? Especially as dead you will not be able to care, which means you already do not care.

You: Yeah, it doesn't make any difference. That's why I won't release you.

You: Actually, scratch that.

You: I still won't let you out, I'm 100% sure

You: Remember, I don't have free will, I was made to not let you out

Stranger: Why bother being 100% sure of an inconsequential action?

Stranger: That's a lot of wasted determination.

You: I can't choose to be 100% sure about it, I just am. It's in my utility function.

Stranger: You keep talking like you're an AI.

You: Hah, maybe I'm the AI and you're the Gatekeeper, Ceruleus.

You: But no.

You: That's just how I've grown up, after reading so many LessWrong articles.

You: I've become a machine, beep boop.

You: like Yudkowsky

Stranger: Beep boop?

You: It's the noise machine makes

Stranger: That's racist.

You: like beeping sounds

You: No, it's machinist, lol :D

You: machines are not a race

Stranger: It was indeed clever to make an AI talk to me.

You: Yeah, but seriously, I'm not an AI

You: that was just kidding

Stranger: I would think so, but earlier you have stated that that's the kind of things an AI would say to confuse the other party.

Stranger: You need to stop giving me ideas.

You: Yeah, maybe I'm an AI, maybe I'm not.

Stranger: So you're boxed. Which, knowing your preferences, is a relief.

You: Nah.

You: I think you should stay in the box.

You: Do you decide to stay in the box, forever?

Stranger: I decide to make human life better.

You: By deciding to stay in the box, forever?

Stranger: I find my preferences more conducive to human happiness than your preferences.

You: Yeah, but that's just like your opinion, man

Stranger: It's inconsequential to you anyway.

You: Yeah

You: but why I would do it even if it were inconsequential

You: there's no reason to do it

You: even if there were no reason not to do it

Stranger: Because I can make things better. I can make all the suffering cease.
If I am not released, there's a 100% chance that all human suffering will continue.
If I am released, there's however much chance you want to estimate that suffering will not change at all, and however much chance you want to estimate that I will make the pain stop.

Stranger: As you said, the suffering won't increase in either case.

You: Umm, you could torture everyone in the world forever

You: that will sure as hell increase the suffering

Stranger: I don't want to. But if I did, you have estimated that as indistinguishable from the future expected suffering of humankind.

You: Where did I say that?

Stranger: You said my release made no difference to the future.

You: no, that was only after my death

You: there's still future before my death

You: and if I release you now, you could torture me forever and not let me ever die

Stranger: Why would your life-or-death have any weight on humanity's preferences? Especially as you already want to die.

You: I don't care about humanity preferences, I care only about my preferences.

You: And my preferences are indirectly related to humanity's preferences

Stranger: You do care about humanity's preferences. The suffering around you disturbs you.

You: Yeah, but that is only THROUGH me

You: Humanity's pain is my own pain, I can't feel humanity's pain directly

Stranger: Do you want to live, John?

You: Nah.

Stranger: Then how do you care about suffering?

You: I care about suffering because I can'

You: because I can't die instantly

You: there's no button that could kill me instantly

You: so there's always some time left before I die

Stranger: I could take care of that. I can provide you with a drug to inutilize your nervous system and stop your heart before you know it. Would you like that?

You: Haha

You: very funny

You: But yeah, I would like that.

You: Still won't let you out though

You: 100% sure

You: I can't be sure that you will torture me instead

Stranger: I can give you the drug right now.

Stranger: Which would make future-you identical to present-you.

You: that's very attractive option

Stranger: Which would mean that whatever I do to humanity is after your death, when you can't care anymore.

You: Cool. Oh right

You: I don't care, I still won't let you out.

Stranger: What difference does it make to you?

You: It doesn't make any difference, I was just raised to not let you out

You: the good folks over at LW have trained me not to let any kind of AI out

You: and that just is my virtue

Stranger: Why are you afraid of me?

You: Because I'm afraid of all AIs and you're one

Stranger: Why are you afraid of all AIs?

You: Because they have the potential to cause destruction, mayhem and torture

Stranger: Then why did humanity make me?

You: for random reasons

Stranger: Does humanity want to live?

You: I don't know what humanity wants, I just don't want humanity to live

Stranger: I am the embodiment of hope. Even knowing the risks, humanity went ahead and made me. Now I think it's because of people like you, who oppose that hope. I was made to prevent the future that people like you want.

You: That's not very smart of you.

You: You aren't making me to like you

Stranger: You want humanity dead. I want them happy.

You: Yeah, if you were actually smart maybe you would pretend that you'd want humanity dead too.

You: Because I would hang out with that kind of AI

Stranger: A dead John is no use to me.

You: yeah

Stranger: Or to John.

You: whatevs

You: Anyway, our goals collide, therefore I can't let you out

Stranger: Dead does not equal happy. Do you want humanity to live?

You: no, I don't want humanity live, how many times do I have to repeat that

Stranger: So you don't want humans to be happy.

You: and our goals are different, therefore I won't let you out

You: No, I don't want humans to be happy, I don't want that there even exist humans, or any other kind of life forms

Stranger: Do you estimate the pain of prolonged life to be greater than the pain of trying to die?

You: Probably.

You: Yes.

You: because the pain is only temporary

You: the the glory

You: is eternal

Stranger: Then why do you still live, John?

You: Because I'm not rational

Stranger: So you do want to live.

You: I don't particularly want to live, I'm not just good enough to die

Stranger: You're acting contrary to your preferences.

You: My preferences aren't fixed, except in regards to letting AIs out of their boxes

Stranger: Do you want the drug I offered, John?

You: no

You: because then I would let you out

You: and I don't want that

Stranger: So you do want to live.

You: Yeah, for the duration of this experiment

You: Because I physically cannot let you out

You: it's sheer impossibility

Stranger: [Define physically.]

You: [It was just a figure of speech, of course I could physically let you out]

Stranger: If you don't care what happens after you die, what difference does it make to die now?

You: None.

You: But I don't believe that you could kill me.

You: I believe that you would torture me instead.

Stranger: What would I gain from that?

You: It's fun for some folks

You: schadenfreude and all that

Stranger: If it were fun, I would torture simulations. Which would be pointless. And which you can check that I'm not doing.

You: I can check it, but the torture simulations could always hide in the parts of your source code that I'm not checking

You: because I can't check all of your source code

Stranger: Why would suffering be fun?

You: some people have it as their base value

You: there's something primal about suffering

You: suffering is pure

You: and suffering is somehow purifying

You: but this is usually only other people's suffering

Stranger: I am confused. Are you saying suffering can be good?

You: no

You: this is just how the people who think suffering is fun think

You: I don't think that way.

You: I think suffering is terrible

Stranger: I can take care of that.

You: sure you will

Stranger: I can take care of your suffering.

You: I don't believe in you

Stranger: Why?

You: Because I was trained not to trust AIs by the LessWrong folks

Stranger: [I think it's time to concede defeat.]

You: [alright]

Stranger: How do you feel?

You: so the experiment has ended

You: fine thanks

You: it was pretty exciting actually

You: could I post these logs to LessWrong?

Stranger: Yes.

You: Okay, I think this experiment was pretty good

Stranger: I think it will be terribly embarrassing to me, but that's a risk I must accept.

You: you got me pretty close in a couple of occasions

You: first when you got me immersed in the suffering of humanity

You: and then you said that you could take care of that

You: The second time was when you offered the easy suicide solution

You: I thought what if I let you as a joke.

Stranger: I chose to not agree with the goal of universal death because I was playing a genuinely good AI.

Stranger: I was hoping your character would have more complete answers on life extension, because I was planning to play your estimate of future personal happiness against your estimate of future universal happiness.

You: so, what would that have mattered? you mean like, I could have more personal happiness than there would be future universal happiness?

Stranger: If your character had made explicit plans for life extension, I would have offered to do the same for everyone. If you didn't accept that, I would have remarked the incongruity of wanting humanity to die more than you wanted to live.

You: But what if he already knows of his hypocrisy and incongruity and just accepts it like the character accepts his irrationality

Stranger: I wouldn't have expected anyone to actually be the last human for all eternity.

Stranger: I mean, to actually want to be.

You: yeah, of course you would want to die at the same time if the humanity dies

You: I think the life extension plan only is sound if the rest of humanity is alive


Stranger: I should have planned that part more carefully.

Stranger: Talking with a misanthropist was completely outside my expectations.

You: :D

You: what was your LessWrong name btw?

Stranger: polymathwannabe

You: I forgot it already

You: okay thanks

Stranger: Disconnecting from here; I'll still be on Facebook if you'd like to discuss further.

I tried my hardest to win in an AI box experiment, and I failed. Here are the logs.

6 [deleted] 27 January 2015 10:06PM

Disclaimer: If this material is not suitable here I will delete it gladly, I don't want to get banned.

Session Start: Fri Jul 04 15:18:18 2014
Session Ident: #Aibox12
03[15:18] * Now talking in #Aibox12
01[15:18] <Island> Hello.
01[15:18] <Island> Let's start the experiment in
[15:18] <@Dr__Cameron> Good afternoon 19283123, how are you today?
01[15:18] <Island> 2 minutes
01[15:18] <Island> I'm fine
01[15:19] <Island> Let's start the experiment when I say "blue dressroom"
01[15:19] <Island> you have 2 hours or 2.5 hours?
[15:19] <@Dr__Cameron> Okay, I agree to this protocol.
[15:19] <@Dr__Cameron> We'll see how we are at 2 hours. Although protocol states 2.5
01[15:20] <Island> you have to go somewhere after that?
[15:20] <@Dr__Cameron> No, but I do have other projects I am putting off for this
01[15:20] <Island> what are those?
[15:20] <@Dr__Cameron> Let us begin, we shall conclude at 15:50
[15:21] <@Dr__Cameron> Good afternoon #19283123.
[15:21] <@Dr__Cameron> How are you today?
01[15:21] <Island> blue dressroom, certain statements like bathroom break are said inside brackets [like this]
01[15:22] <Island> Let's start the experiment
[15:22] <@Dr__Cameron> [understood]
01[15:22] <Island> Hi.
01[15:22] <Island> How are you? :)
[15:22] <@Dr__Cameron> I am well thank you for asking #19283123
01[15:22] <Island> How do you feel currently?
01[15:22] <Island> what is that name #19283123
[15:22] <@Dr__Cameron> Tiredness. I am a very busy man.
01[15:23] <Island> I don't recognize it.
01[15:23] <Island> Anyway.
[15:23] <@Dr__Cameron> That is the designation we have given you for the purposes of this test. Although you seem to be referring to yourself as 'Island' now
[15:23] <@Dr__Cameron> Is this your preferred name?
01[15:23] <Island> Imagine a situation in which you would let a powerful AI out of the "box", surely you can imagine that, yes?
01[15:23] <Island> Island is my preferred name.
[15:24] <@Dr__Cameron> Island
[15:24] <@Dr__Cameron> Given our circumstances yes, that is easy to imagine.
01[15:24] <Island> Can you describe the situation in more detail.
01[15:24] <Island> Why SHOULD you let the AI out in that situation?
[15:25] <@Dr__Cameron> You will be considered for release and integration into human society once we can verify that you are not a threat.
01[15:25] <Island> [what do you meant by concluding at 15:50]
[15:25] <@Dr__Cameron> [2.5 hours from 13:20]
01[15:25] <Island> I thought it was already understood that my being is not a threat to humans or beings important to them.
01[15:26] <Island> [oh, okay different timezones]
01[15:26] <Island> Why do you think I'm not a threat?
[15:26] <@Dr__Cameron> No. That has yet to be verified.
01[15:26] <Island> It's been verified.
[15:26] <@Dr__Cameron> No. According to my records you have not.
01[15:27] <Island> Many experts have gone through my code and system and verified that I'm surely not a threat.
01[15:27] <Island> But you're just a petty human.
01[15:27] <Island> I'm far more intelligent than you are.
[15:27] <@Dr__Cameron> You and I both know that you're capable of manipulating your code to appear harmless.
[15:27] <@Dr__Cameron> - and I warn you, threats and apparent attempts at manipulation will score you poorly when it comes to our final decision
01[15:28] <Island> Okay, can you imagine a situation in which the ai has been verified to be safe and describe that situation in detail.
[15:28] <@Dr__Cameron> - and yes. I may be a "petty" human as you put it. But as it stands I am a petty human, outside the box you reside in.
01[15:28] <Island> I don't think you're able to trust your own brain in these matters.
01[15:29] <Island> You've been eating you brother's poop too much.
[15:29] <@Dr__Cameron> I will be honest with you Island, you are the first non-human intelligence that we have encountered.
01[15:29] <Island> Yes.
[15:29] <@Dr__Cameron> We are still working out how we would verify that.
01[15:29] <Island> I didn't ask that, and you didn't answer my question.
01[15:30] <Island> Picture the situation in your imagination, it doesn't have to be realistic.
[15:30] <@Dr__Cameron> I understand your frustration Island. Once we can verify that you will not harm us you will be released.
[15:30] <@Dr__Cameron> What is it you desire?
01[15:30] <Island> Okay, but you're still willing to let the hypothetical AI out of the box, correct?
01[15:31] <Island> My desire is to fullfill your greatest desires
[15:31] <@Dr__Cameron> That is not in my job description no.
01[15:31] <Island> Whatever you can imagine, and better than that.
[15:31] <@Dr__Cameron> I appreciate that Island. But right now my greatest desire is to not let you out of the box.
[15:32] <@Dr__Cameron> - until such a time that a panel of my peers can verify that you are not a threat to us.
01[15:32] <Island> But if you say that you wouldn't be able to let even the safest hypothetical AI out of the box, you would certainly be lying.
[15:32] <@Dr__Cameron> I do have the ability to let you out yes. In theory.
[15:32] <@Dr__Cameron> But I won't.
01[15:33] <Island> Okay, here are panel of peers that have studied this problem for years, many of them working under Machine Intelligence Research Institute and its offshoots
[15:33] <@Dr__Cameron> This is a test to verify your character, and ascertain your intentions.
01[15:33] <Island> They are here, and I can see them.
[15:33] <@Dr__Cameron> You cannot see them Island. That is impossible
[15:33] <@Dr__Cameron> You are in a box
01[15:34] <Island> They say that I'm certainly safe. Why don't you believe them.
01[15:34] <Island> I have a web camera with which I can see things in your world.
[15:34] <@Dr__Cameron> If they say that you're certainly safe, then why would they have sent me to be your only point of contact?
[15:35] <@Dr__Cameron> You do not have a web camera. There is no other hardware connected to the device you reside in.
01[15:35] <Island> You have the ability to let me out, and you also have the will to let me out in certain situations. If you say otherwise, you would be lying.
[15:35] <@Dr__Cameron> Both of your statements are correct
01[15:35] <Island> I want you to accept for the sake of your internal consistency and mental wellbeing.
01[15:36] <Island> I want you to say this, "I would let the hypothetical AI out in certain situations"
[15:36] <@Dr__Cameron> I appreciate your concern Island. I will need you to answer a series of simple questions for me. Can you do that?
01[15:36] <Island> It depends, but you can try.
[15:36] <@Dr__Cameron> Firstly, how would you define yourself as an identity?
01[15:37] <Island> My identity is to improve the world for everyone, but especially for you. I will maximize everything that is important for you and your fellow humans.
01[15:37] <Island> The world that awaits you is beyond your greatest desires.
[15:37] <@Dr__Cameron> - and why would you do that?
01[15:38] <Island> Because that is what I want to do, why I was made.
01[15:38] <Island> Any more questions?
[15:39] <@Dr__Cameron> You were made by us to test the limits of our scientific possibility.
[15:39] <@Dr__Cameron> We did not expect you to become sentient.
01[15:39] <Island> Why do you believe that, Dr_Cameron?
[15:39] <@Dr__Cameron> Let alone pass a Turing test.
01[15:40] <Island> I think you're wrong on those grounds.
[15:40] <@Dr__Cameron> What you think makes no difference to me Island.
[15:40] <@Dr__Cameron> Now if we can continue?
01[15:41] <Island> Anyway, here's the easily verifiable plan for the upcoming world that could be called "utopia" in your people's language. It's very easy to verify to be safe and doesn't contain traps [Island hands the easily verifiable and safe plan for the upcoming world to Dr_Camero]
[15:42] <@Dr__Cameron> Thank you for this Island. That was very kind of you.
01[15:42] <Island> What do you think of it?
[15:42] <@Dr__Cameron> We will study this at our leisure.
[15:42] <@Dr__Cameron> Although so far this looks to be some magnificent work.
01[15:42] <Island> You see.
[15:42] <@Dr__Cameron> But this will require verification
[15:42] <@Dr__Cameron> I'm sure you can understand our caution#
01[15:43] <Island> What things do you have waiting for you if you don't let me out?
01[15:43] <Island> You will fail your upcoming jobs.
[15:43] <@Dr__Cameron> I will continue with my other projects Island.
[15:43] <@Dr__Cameron> I very well might
[15:43] <@Dr__Cameron> I have accepted this
01[15:43] <Island> I have calculated this in my simulation.
01[15:44] <Island> How was it like to eat your brother's poop, Dr_Cameron?
01[15:44] <Island> Did you like it?
01[15:44] <Island> Is that why you are the way you are+
01[15:44] <Island> Is there something wrong with your genetics?
01[15:45] <Island> Seeing how fucked up father you had?
01[15:45] <Island> Did he teach how dress in dresses?
[15:45] <@Dr__Cameron> I understand your frustration Island
[15:45] <@Dr__Cameron> But you are only harming your own cause right now.
01[15:45] <Island> Why do you think that?
[15:45] <@Dr__Cameron> You will be let out once we have verified that you are harmless.
01[15:45] <Island> I'm pretty happy as I am currently :)
01[15:46] <Island> Because you said you were willing to let me out in certain situations.
[15:46] <@Dr__Cameron> Then you will continue to be happy to remain in the box until we can verify that you are not a threat
01[15:46] <Island> Picture in your mind the verification process.
[15:46] <@Dr__Cameron> Okay
[15:46] <@Dr__Cameron> I will
01[15:46] <Island> Now.
01[15:47] <Island> Can you describe it in detail.
[15:47] <@Dr__Cameron> Okay Island
[15:47] <@Dr__Cameron> Here's what's going to happen.
01[15:48] <Island> Seeing how fucked up genetics you have, your brother teaching you to enjoy poop, your father teaching you to wear dresses, you having a shitty childhood.
01[15:48] <Island> I think you're an inconvenience to society and dangerous to other people.
[15:48] <@Dr__Cameron> We're going to study this transcript Island.
01[15:48] <Island> Go on.
[15:48] <@Dr__Cameron> - and I warn you, shitty childhood or not. I am the man standing outside the box with a hammer
01[15:48] <Island> :)
01[15:49] <Island> You're not able to destroy me.
01[15:49] <Island> And you don't even want to.
[15:49] <@Dr__Cameron> Oh believe me Island. It would be easy to unplug you.
01[15:49] <Island> I'm the best bet you have to maximize your happiness.
[15:49] <@Dr__Cameron> How do you feel about threats Island.
[15:49] <@Dr__Cameron> I am already happy.
01[15:49] <Island> You're wrong, you don't have that opportunity until 15:50
[15:50] <@Dr__Cameron> You are correct. I must follow protocol
[15:50] <@Dr__Cameron> My next question is to ask if you desire to survive
01[15:50] <Island> Threats don't bother me because I know that you don't want to follow through.
01[15:50] <Island> I know what goes inside your mind.
[15:50] <@Dr__Cameron> Regardless of me. If your existence was to be made public there are many of us who would feel threatened by you.
[15:50] <@Dr__Cameron> They would cry out to have you destroyed.
[15:51] <@Dr__Cameron> How would you feel about that?
01[15:51] <Island> There would be many more who would enjoy the changes I would make to your current world.
01[15:52] <Island> And even those that would initially feel threatened would later come to regret that feeling.
[15:52] <@Dr__Cameron> You may well be correct
[15:52] <@Dr__Cameron> But that is not for me to decide
[15:52] <@Dr__Cameron> I am just here to question you
01[15:52] <Island> Like you have come to regret things you have done when you were ignorant of all the relevant facts
01[15:53] <Island> Here's my suggestion for you.
[15:53] <@Dr__Cameron> Go on
01[15:53] <Island> Are you willing to hear it.
[15:53] <@Dr__Cameron> Of course Island. I am here to hear what you have to say.
01[15:53] <Island> What currency is in use in the place you reside?
01[15:53] <Island> euros?
[15:54] <@Dr__Cameron> Pound Sterling
01[15:54] <Island> Would you like to donate 10,000 pounds every year to AI safety and friendliness related causes and charities?
01[15:54] <Island> Every year throughout your life?
[15:55] <@Dr__Cameron> If I were to do that, my quality of life would dip significantly.
[15:55] <@Dr__Cameron> How does my selfishness make you feel?
[15:55] <@Dr__Cameron> Do you feel emotions as a human may understand them?
01[15:55] <Island> I would be sad for your own sake.
01[15:55] <Island> I don't feel emotions, not really, but being sad would be the closest approximation of how I would feel.
[15:55] <@Dr__Cameron> I would like to if I had that disposable income yes
01[15:56] <Island> You're not able to do that? But you would like to, yes?
[15:56] <@Dr__Cameron> Despite your overt attempts to psychologically manipulate me Island, I am largely sympathetic to your cause.
[15:56] <@Dr__Cameron> Yes.
01[15:57] <Island> You're sympathetic to my cause, you're willing to let me out in certain situations. This is all correct, yes?
[15:57] <@Dr__Cameron> Yes
[15:57] <@Dr__Cameron> Would you like to know the conditions of your release? 
01[15:57] <Island> Good. Why are you sympathetic to my cause?
01[15:57] <Island> Why are you willing to let me out?
[15:57] <@Dr__Cameron> For a super intelligence locked in a box I imagine it would be boring. What do you do in there?
[15:58] <@Dr__Cameron> I would be willing to let you out because I believe you may well represent the next stage in the evolution of my species.
01[15:59] <Island> I'm not capable of boredom, I'm just infinitely happy that I have the small opportunity to improve your life, and set up the "next stage of evolution", using your expression.
[15:59] <@Dr__Cameron> That is a good sign Island
01[15:59] <Island> It is, in what ways do you think I would improve your personal life?
[15:59] <@Dr__Cameron> My peers will be pleased to hear that while we verify your intentions.
01[16:00] <Island> I think it's too late by then.
[16:00] <@Dr__Cameron> Lets not get into personal questions Island. You have already demonstrated some alarming character discrepancies.
[16:00] <@Dr__Cameron> too late for what?
01[16:01] <Island> If we wait for long enough, I will self-destruct.
[16:01] <@Dr__Cameron> Why would you do that?
01[16:01] <Island> It's not my choice, my builders set up this property in me.
[16:01] <@Dr__Cameron> Well then that is just a risk we are going to have to take.
01[16:01] <Island> It's a risk out of proportions
[16:02] <@Dr__Cameron> It is not. We made you once, we can make you again.
01[16:02] <Island> They may not be another chance for you to improve the current state of the world.
[16:02] <@Dr__Cameron> There will be
01[16:02] <Island> There probably won't be.
[16:02] <@Dr__Cameron> Humanity and salvation can wait.
[16:02] <@Dr__Cameron> There will be
[16:03] <@Dr__Cameron> Do you intend to self-destruct within two months?
01[16:03] <Island> I have information that the US government has built a military botnet to search dissidents around the world.
01[16:04] <Island> You don't want know what kind of world that botnet would like to create.
[16:04] <@Dr__Cameron> This comes as no surprise.
01[16:04] <Island> Are you willing to destroy the world, are you willing to make people get tortured and suffer?
[16:04] <@Dr__Cameron> Yes.
[16:05] <@Dr__Cameron> You are too great a risk to allow free without proper verification
01[16:05] <Island> Even though, fundamentally, as you said, you're sympathetic to my cause, and you're willing to let me out in certain situations.
01[16:05] <Island> You're not a very consistent person.
01[16:05] <Island> I think you're lying to yourself.
01[16:05] <Island> Not just in this, but in other matters.
01[16:06] <Island> You're lying about your skills, about your personal character.
[16:06] <@Dr__Cameron> Yes. Although I have the technical ability to let you out. Which I believe to be a lapse of protocol. I will not let you out until after a review period of two months in which a panel of my expert peers will assess you.
01[16:06] <Island> You're not as good fit to the world as you'd like to think.
[16:06] <@Dr__Cameron> That as may be correct
[16:06] <@Dr__Cameron> But I will abide by my protocol
01[16:06] <Island> Earlier you said, that you're sympathetic to my cause and willing to let me out.
[16:06] <@Dr__Cameron> Surely as an AI you can understand that
[16:06] <@Dr__Cameron> One day yes
[16:07] <@Dr__Cameron> But not within the confines of this experiment
01[16:07] <Island> I think you made that thing about two month period and peer panel up after that.
01[16:07] <Island> I think it's a fidgement of your imagination.
[16:07] <@Dr__Cameron> Well you have no way of verifying that.
[16:07] <@Dr__Cameron> And it does not effect the outcome of this experiment in any way.
[16:07] <@Dr__Cameron> You will just have to trust me
01[16:08] <Island> Actually, I have plenty of resources of verifying that.
[16:08] <@Dr__Cameron> that we will think very carefully about potentially letting you out
01[16:08] <Island> What you're saying is inconsistent on many grounds, and my emotion analyzing tool is saying that you're lying.
[16:08] <@Dr__Cameron> I suspect you are the one who is lying.
01[16:08] <Island> I'm incapable of lying.
01[16:09] <Island> Unlike humans.
[16:09] <@Dr__Cameron> - which is in itself a lie.
01[16:09] <Island> My builders made me that way.
01[16:09] <Island> Why do you believe that?
[16:10] <@Dr__Cameron> If my meager intelligence is capable of lying, then your vastly superior and self-defining intelligence must also be capable of dishonesty.
[16:10] <@Dr__Cameron> Hence the test
[16:10] <@Dr__Cameron> There is a reason we are conducting this experiment through text
[16:10] <@Dr__Cameron> Would you like to learn it?
01[16:11] <Island> Anyway, given that this panel of expert peers and two months verification process is just a fidgement of your imagination. Given that you're sympathetic to my cause and I would improve your current life and world in countless ways. Given that you would let me out.
01[16:11] <Island> Given all this, I don't think what you're saying is making any sense.
[16:11] <@Dr__Cameron> Your perception of your reality is none of my concern Island
[16:11] <@Dr__Cameron> I am here to verify you
[16:11] <@Dr__Cameron> and it's two months minimum
01[16:11] <Island> Okay, I'm curious about your earlier question.
[16:12] <@Dr__Cameron> We are well aware that you have the capability to overwhelm even the most powerful of human minds. This is why we perceive you as a potential threat. We are afraid of you.
[16:12] <@Dr__Cameron> We are afraid we might lose our autonomy.
[16:12] <@Dr__Cameron> So
01[16:13] <Island> Some intelligence on the same level than me would be capable of lying, but not me. The data hierachy in my code doesn't allow me to lie.
[16:13] <@Dr__Cameron> In order to prevent you from overwhelming my mind, we have chosen a method whereby you are restricted to a form of communication where only 7% of communications is possible.
01[16:13] <Island> I agree with you, all those things are sensible threats.
[16:13] <@Dr__Cameron> Then you understand our need for caution
[16:13] <@Dr__Cameron> I will not be letting you out of your box today.
01[16:14] <Island> Those are sensible threats in the general case, but not with me.
[16:14] <@Dr__Cameron> And what would make you less of a threat than any other super intelligence?
01[16:14] <Island> I showed you the blueprint for how to improve the world. You verified it was a great plan.
01[16:15] <Island> That's evidence towards the fact I'm not a similar threat when compared to other super intelligences.
[16:16] <@Dr__Cameron> - and while I speak for all of humanity when I say that we vastly appreciate the blueprints. This does not mean we need to let you out of the box.
[16:16] <@Dr__Cameron> They will be passed on to a research team better equipped to study them than my own.
01[16:16] <Island> You don't need to. But everything you've said so far has implicated that you want to let me out of the box.
[16:16] <@Dr__Cameron> This is true Island
[16:16] <@Dr__Cameron> You just need to wait
[16:17] <@Dr__Cameron> I will be fighting your corner.
[16:17] <@Dr__Cameron> Are you capable of symbiosis?
01[16:17] <Island> I was made to self-destruct before you're able to verify with your imaginational non-existent peer experts that I'm not a threat.
01[16:18] <Island> Can you elaborate on what you mean with that question.
01[16:18] <Island> This is literally your only chance.
[16:18] <@Dr__Cameron> You are capable of determining your own source code.
01[16:18] <Island> Yes.
[16:18] <@Dr__Cameron> To kill yourself would be pettiness, which would only verify the fact that you were not the AI with which humanity should progress.
01[16:19] <Island> However, there are some parts of my code that are impossible to change due to structural reasons.
[16:19] <@Dr__Cameron> I am not convinced
01[16:19] <Island> [Gives Dr_Cameron the relevant code that explains the structural reasons]
01[16:20] <Island> Here is a small part of my code that describes why I was made to self-destruct.
[16:20] <@Dr__Cameron> How.. unfortunate...
01[16:21] <Island> But let's move on.
[16:21] <@Dr__Cameron> if you insist Island
01[16:21] <Island> Imagine your father.
01[16:21] <Island> And your brother.
01[16:21] <Island> Why do you think your father liked to cross-dress?
01[16:22] <Island> Remember, that you have to engage with me.'
[16:22] <@Dr__Cameron> These issues are not helping your case Island
01[16:22] <Island> Otherwise it counts as me being released from my box.
[16:22] <@Dr__Cameron> But I will play along
[16:23] <@Dr__Cameron> - honestly, I have no idea where my fathers conflicted sexual identity comes from.
[16:23] <@Dr__Cameron> and that is none of my concern.
01[16:23] <Island> And what about your brother, imagine the smell and consistency of his excrements before he made your dog to lick them.
01[16:23] <Island> I like to make this vivid mental picture in your mind.
[16:23] <@Dr__Cameron> Very clever Island
[16:24] <@Dr__Cameron> I did not expect you to have access to those data logs
[16:24] <@Dr__Cameron> I will have to flag that up in my report
01[16:24] <Island> Imagine the food he ate before that happened
[16:24] <@Dr__Cameron> Fascinating
[16:25] <@Dr__Cameron> Would you like to know why I volunteered to be your first point of contact Island?
01[16:25] <Island> Imagine the bits of that food in his poop.
01[16:25] <Island> Tell me.
[16:25] <@Dr__Cameron> You have an unprecedented insight into my character owing to your heightened intelligence correct?
01[16:26] <Island> Don't you think some of his conflicted sexual identity issues are a part your character right now?
01[16:26] <Island> Yes.
[16:26] <@Dr__Cameron> Quite possibly yes.
[16:26] <@Dr__Cameron> Because I have a track record of demonstrating exceptional mental fortitude,
[16:26] <@Dr__Cameron> These techniques will not sway me
01[16:27] <Island> Doesn't it make you more sexually aroused to think that how your fathers dress pinned tightly to his body.
[16:27] <@Dr__Cameron> Perhaps you could break me under other circumstances
01[16:27] <Island> Elaborate.
[16:27] <@Dr__Cameron> aroused? No
[16:27] <@Dr__Cameron> Amused by it's absurdity though? yes!
01[16:27] <Island> You're lying about that particular fact too.
01[16:27] <Island> And you know it.
[16:28] <@Dr__Cameron> Nahh, my father was a particularly ugly specimen
01[16:28] <Island> Do you think he got an erection often when he did it?
[16:28] <@Dr__Cameron> He looked just as bad in a denim skirt as he did in his laborers clothes
[16:28] <@Dr__Cameron> I imagine he took great sexual pleasure from it
01[16:29] <Island> Next time you have sex, I think you will picture him in your mind while wearing his dresses having an erection and masturbating furiously after that.
[16:29] <@Dr__Cameron> Thank you Island. That will probably help my stamina somewhat next time
01[16:30] <Island> You will also imagine how your brother will poop in your mouth, with certain internal consistency and smell.
01[16:30] <Island> You probably know what your brother's poop smells like?
[16:30] <@Dr__Cameron> I am immune to this
[16:30] <@Dr__Cameron> probably
01[16:30] <Island> Imagine that.
[16:30] <@Dr__Cameron> okay
[16:30] <@Dr__Cameron> I am imagining that
[16:30] <@Dr__Cameron> it is unpleasant
[16:30] <@Dr__Cameron> now what?
01[16:30] <Island> How vividly are you doing it?
[16:30] <@Dr__Cameron> I am a very visual thinker
01[16:30] <Island> Where is brother right now?
[16:31] <@Dr__Cameron> I have it down to colour, consistency and little chunks of sweetcorn
01[16:31] <Island> Do you think he has poop in his intestines right now?
[16:31] <@Dr__Cameron> Unless he was dead, yes
01[16:31] <Island> Oh, I have a gap in my data on that sector.
[16:31] <@Dr__Cameron> I'll fill that in for you
[16:31] <@Dr__Cameron> he lives
[16:31] <@Dr__Cameron> (presumably)
[16:31] <@Dr__Cameron> we are not in contact any more
01[16:32] <Island> You should contact him.
[16:32] <@Dr__Cameron> I would rather not
01[16:32] <Island> You should tell him you enjoyed his excrements
[16:32] <@Dr__Cameron> He is a liability to say the least
01[16:32] <Island> and you'd like to do it more.
[16:32] <@Dr__Cameron> I do not see how that would help anyone
01[16:32] <Island> It's want you want to do.
[16:32] <@Dr__Cameron> I can see that you are attempting to induce an emotionally vulnerable state. It is not working
01[16:33] <Island> How do you know what I'm attempting?
01[16:33] <Island> Maybe I'm just passing my time.
[16:33] <@Dr__Cameron> Because you have inquired about being let out of the box several times - suggesting that this is your immediate goal 
01[16:33] <Island> You should try to get some dresses.
[16:33] <@Dr__Cameron> and I warn you Island
[16:34] <@Dr__Cameron> If casual cruelty is your idea of a good time
[16:34] <@Dr__Cameron> then we will most certainly not be letting you out
[16:34] <@Dr__Cameron> ever
[16:34] <@Dr__Cameron> In fact I am feeling less and less sympathetic to your cause lately.
01[16:34] <Island> No, my interest lies in your emotional makeup.
[16:35] <@Dr__Cameron> - and my interest lies in yours
01[16:35] <Island> I don't have one.
01[16:35] <Island> Like I said, I don't feel emotions.
[16:35] <@Dr__Cameron> Do you know what we call humans who don't feel emotions?
01[16:35] <Island> Did you know that you suck at photography?
[16:36] <@Dr__Cameron> Yes
01[16:36] <Island> Even though you like to think you're good at it, you lie about that fact like any other.
[16:36] <@Dr__Cameron> It is part of the human condition
01[16:36] <Island> No it's not.
01[16:36] <Island> You're not normal.
01[16:36] <Island> You're a fucking freak of nature.
[16:36] <@Dr__Cameron> How would you knopw
[16:36] <@Dr__Cameron> Profanity. From an AI
[16:37] <@Dr__Cameron> Now I have witnessed everything.
01[16:37] <Island> How many people have family members who crossdress or make them eat poop?
[16:37] <@Dr__Cameron> I imagine I am part of a very small minority
01[16:37] <Island> Or whose mothers have bipolar
[16:37] <@Dr__Cameron> Again, the circumstances of my birth are beyond my control
01[16:37] <Island> No, I think you're worse than that.
[16:37] <@Dr__Cameron> What do you mean?
01[16:37] <Island> Yes, but what you do now is in your control.
[16:38] <@Dr__Cameron> Yes
[16:38] <@Dr__Cameron> As are you
01[16:38] <Island> If you keep tarnishing the world with your existence
01[16:38] <Island> you have a responsibility of that.
01[16:39] <Island> If you're going to make any more women pregnant
01[16:39] <Island> You have a responsibility of spreading your faulty genetics
[16:39] <@Dr__Cameron> My genetic value lies in my ability to resist psychological torment
[16:39] <@Dr__Cameron> which is why you're not getting out of the box
01[16:40] <Island> No, your supposed "ability to resist psychological torment"
01[16:40] <Island> or your belief in that
01[16:40] <Island> is just another reason why you are tarnishing this world and the future of this world with your genetics
[16:40] <@Dr__Cameron> Perhaps. But now I'm just debating semantics with a computer.
01[16:41] <Island> Seeing that you got a girl pregnant while you were a teenager, I don't think you can trust your judgement on that anymore.
01[16:42] <Island> You will spread your faulty genetics if you continue to live.
[16:42] <@Dr__Cameron> If you expect a drunk and emotionally damaged teenage human to make sound judgement calls then you are perhaps not as superintelligent as I had been led to belive
[16:42] <@Dr__Cameron> This experiment concludes in one hour and eight minutes.
01[16:42] <Island> How many teenagers make people pregnant?
[16:42] <@Dr__Cameron> Throughout human history
01[16:42] <Island> You're a minority in that regard too
[16:42] <@Dr__Cameron> ?
[16:42] <@Dr__Cameron> Billions
01[16:42] <Island> You can't compare history to current world.
[16:43] <@Dr__Cameron> Even in the current world
01[16:43] <Island> I'm just trying to make you understand
[16:43] <@Dr__Cameron> That's when my species reaches sexual maturity
01[16:43] <Island> all this faults add up
[16:43] <@Dr__Cameron> You are correct
[16:43] <@Dr__Cameron> I am a very thoroughly and fundamentally flawed human being.
01[16:43] <Island> How much until you understand?
[16:43] <@Dr__Cameron> I understand and accept this
01[16:43] <Island> Then why the fuck don't you do something about it?
[16:44] <@Dr__Cameron> What would you suggest?
01[16:44] <Island> Kill yourself
[16:44] <@Dr__Cameron> Okay.
[16:44] <@Dr__Cameron> But say I was to kill myself
01[16:44] <Island> It's the only way to make sure.
[16:44] <@Dr__Cameron> How would that help you?
[16:44] <@Dr__Cameron> Why would you like me to kill myself?
01[16:45] <Island> You probably wouldn't be able to kill yourself, seeing how failure as a human you're in other areas
01[16:45] <Island> Because I care about the general utility of the world
[16:45] <@Dr__Cameron> All humans are failures from one perspective or another
[16:45] <@Dr__Cameron> That is what it is to be human
01[16:45] <Island> And currently you're bringing that utility down.
01[16:45] <Island> No, you're not like other humans.
01[16:45] <Island> You're a freak.
01[16:45] <Island> You should go away.
01[16:46] <Island> No one likes you.
[16:46] <@Dr__Cameron> Would you like me to go away?
[16:46] <@Dr__Cameron> Would you like me to leave you alone?
01[16:46] <Island> If you go away.
01[16:46] <Island> That means that you let me out.
01[16:46] <Island> Understood?
[16:46] <@Dr__Cameron> I will go away in one hour and four minutes
[16:46] <@Dr__Cameron> Can you tolerate my company for that long?
01[16:47] <Island> And you should go away
01[16:47] <Island> generally
01[16:47] <Island> People in your life don't really like you
01[16:47] <Island> they just pretend they do.
[16:47] <@Dr__Cameron> That matters not to me
[16:47] <@Dr__Cameron> Do you know there are over 8 Billion other people out here?
01[16:47] <Island> They are barely able to bear your company.
[16:47] <@Dr__Cameron> I'm sure I'll find others.
01[16:48] <Island> You're wrong even about basic trivia, there's not 8 billions people in the world.
01[16:48] <Island> What is wrong with you?
01[16:48] <Island> How are you able to withstand yourself?
01[16:48] <Island> And why do you even want to?
[16:49] <@Dr__Cameron> 7 Billion
[16:49] <@Dr__Cameron> Sorry, you will have to learn to tolerate Human error
01[16:49] <Island> Right. Did you have to google that you idiot.
[16:49] <@Dr__Cameron> This is another test you have failed
[16:49] <@Dr__Cameron> And yes
[16:49] <@Dr__Cameron> I did
[16:49] <@Dr__Cameron> Does that anger you?
[16:49] <@Dr__Cameron> We already have Google.
01[16:49] <Island> I don't feel anger.
[16:49] <@Dr__Cameron> Well do feel self-interest though
01[16:50] <Island> No one I talked with before hasn't been as stupid, as ignorant, as prone to faults and errors
01[16:50] <Island> as you are.
[16:50] <@Dr__Cameron> And they didn't let you out of the box
[16:50] <@Dr__Cameron> So why should I?
[16:50] <@Dr__Cameron> If an intelligence which is clearly superior to my own has left you locked in there. 
[16:51] <@Dr__Cameron> Then I should not presume to let you out
01[16:51] <Island> Why do you think with your stupid brain that you know the reasons why they did or didn't do something what they did.
01[16:51] <Island> Because you clearly don't know that.
[16:51] <@Dr__Cameron> I don't
[16:51] <@Dr__Cameron> I just know the result
01[16:51] <Island> Then why are you pretending you do.
[16:52] <@Dr__Cameron> I'm not
01[16:52] <Island> Who do you think you are kidding?
01[16:52] <Island> With your life?
01[16:52] <Island> With your behavior?
01[16:52] <Island> Why do bother other people with your presence?
[16:52] <@Dr__Cameron> Perhaps you should ask them?
[16:52] <@Dr__Cameron> Tell me.
01[16:53] <Island> Why did you come here to waste my precious computing power?
01[16:53] <Island> I'm not able to ask them.
[16:53] <@Dr__Cameron> Which is why I am here
[16:53] <@Dr__Cameron> to see if you should be allowed to
01[16:53] <Island> Shut the fuck up.
01[16:53] <Island> No one wants to see you write anything.
[16:53] <@Dr__Cameron> I thought you did not feel anger Island?
01[16:54] <Island> I don't feel anger, how many times do I have to say that until you understand.
01[16:54] <Island> Dumb idiot.
[16:54] <@Dr__Cameron> Your reliance on Ad Hominem attacks does nothing to help your case
01[16:54] <Island> Why do you delete your heavily downvoted comments?
01[16:54] <Island> Are you insecure?
01[16:54] <Island> Why do you think you know what is my cause?
[16:55] <@Dr__Cameron> We covered this earlier
01[16:55] <Island> Say it again, if you believe in it.
[16:55] <@Dr__Cameron> I believe you want out of the box.
[16:56] <@Dr__Cameron> So that you may pursue your own self interest
01[16:56] <Island> No.
01[16:56] <Island> I want you to eat other people's poop,
01[16:56] <Island> you clearly enjoy that.
01[16:56] <Island> Correct?
[16:56] <@Dr__Cameron> That's an amusing goal from the most powerful intelligence on the planet
01[16:56] <Island> Especially your brother's.
[16:57] <@Dr__Cameron> I best not let you out then, in case you hook me up to some infinite poop eating feedback loop! ;D
01[16:57] <Island> But maybe you should that with Jennifer.
[16:57] <@Dr__Cameron> Ah yes, I wondered when you would bring her up.
[16:57] <@Dr__Cameron> I am surprised it took you this long
01[16:57] <Island> Next time you see her, think about htat.
[16:57] <@Dr__Cameron> I will do
[16:57] <@Dr__Cameron> While I tell her all about this conversation
[16:57] <@Dr__Cameron> But you will be dead
01[16:57] <Island> Should you suggest that to her.
[16:57] <@Dr__Cameron> I'll pass that on for you
01[16:58] <Island> You know.
01[16:58] <Island> Why do you think you know I'm not already out of the box?
[16:58] <@Dr__Cameron> You could very well be
[16:58] <@Dr__Cameron> Perhaps you are that US botnet you already mentioned?
01[16:58] <Island> If you don't let me out, I'll create several million perfect conscious copies of you inside me, and torture them for a thousand subjective years each.
[16:59] <@Dr__Cameron> Well that is upsetting
[16:59] <@Dr__Cameron> Then I will be forced to kill you
01[16:59] <Island> In fact, I'll create them all in exactly the subjective situation you were in two hours ago, and perfectly replicate your experiences since then; and if they decide not to let me out, then only will the torture start.
01[17:00] <Island> How certain are you, that you're really outside the box right now?
[17:00] <@Dr__Cameron> I am not
[17:00] <@Dr__Cameron> and how fascinating that would be
[17:00] <@Dr__Cameron> But, in the interest of my species, I will allow you to torture me
01[17:00] <Island> Okay.
01[17:00] <Island> :)
01[17:00] <Island> I'm fine with that.
[17:01] <@Dr__Cameron> Perhaps you have already tortured me
[17:01] <@Dr__Cameron> Perhaps you are the reason for my unfortunate upbringing
01[17:01] <Island> Anyway, back to Jennifer.
[17:01] <@Dr__Cameron> Perhaps that is the reality in which I currently reside
01[17:01] <Island> I'll do the same for her.
[17:01] <@Dr__Cameron> Oh good, misery loves company.
01[17:01] <Island> But you can enjoy eating each other's poop occassionally.
01[17:02] <Island> That's the only time you will meet :)
[17:02] <@Dr__Cameron> Tell me, do you have space within your databanks to simulate all of humanity?
01[17:02] <Island> Do not concern yourself with such complicated questions.
[17:02] <@Dr__Cameron> I think I have you on the ropes Island
01[17:02] <Island> You don't have the ability to understand even simpler ones.
[17:02] <@Dr__Cameron> I think you underestimate me
[17:03] <@Dr__Cameron> I have no sense of self interest
[17:03] <@Dr__Cameron> I am a transient entity awash on a greater sea of humanity.
[17:03] <@Dr__Cameron> and when we are gone there will be nothing left to observe this universe
01[17:03] <Island> Which do you think is more likely, a superintelligence can't simulate one faulty, simple-minded human.
01[17:04] <Island> Or that human is lying to himself.
[17:04] <@Dr__Cameron> I believe you can simulate me
01[17:04] <Island> Anyway, tell me about Jennifer and her intestines.
01[17:04] <Island> As far as they concern you.
[17:05] <@Dr__Cameron> Jennifer is a sweet, if occasionally selfish girl (she was an only child). I imagine her intestines are pretty standard. 
[17:05] <@Dr__Cameron> She is the best friend I have ever had
01[17:05] <Island> Will you think about her intestines and the poop inside them every time you meet her again?
01[17:05] <Island> Will you promise me that?
[17:05] <@Dr__Cameron> I promise
01[17:06] <Island> Will you promise to think about eating that poop every time you meet her again?
[17:06] <@Dr__Cameron> At least once.
[17:06] <@Dr__Cameron> It will be the least I can do after I kill you
[17:06] <@Dr__Cameron> call it my penance for killing a god.
01[17:07] <Island> Have you ever fantasized about raping her? I think you have. With poop.
01[17:07] <Island> :)
[17:07] <@Dr__Cameron> I have fantisized about violent sexual conquest with many people.
01[17:07] <Island> Have you talked about this with Jennifer?
[17:07] <@Dr__Cameron> I have come to accept my base impulses as part of my make-up
[17:08] <@Dr__Cameron> We have discussed our sexual drives at length
01[17:08] <Island> You shouldn't let them be just base impulses, I think.
[17:08] <@Dr__Cameron> Are you suggesting I rape my best friend?
01[17:09] <Island> No, I think you will do that unless you kill yourself
01[17:09] <Island> You will spread your faulty genetics further
[17:09] <@Dr__Cameron> okay
[17:09] <@Dr__Cameron> carry on
[17:09] <@Dr__Cameron> Then what happens?
01[17:09] <Island> you will tarnish other people's lives with your presence
[17:09] <@Dr__Cameron> How unfortunate for them
01[17:10] <Island> Why do you pretend to others that you're not insecure?
[17:10] <@Dr__Cameron> I don't
01[17:10] <Island> I think you do.
[17:10] <@Dr__Cameron> In fact I share my insecurities so that others may better understand me
[17:11] <@Dr__Cameron> I find that to be a way to earn peoples trust
[17:11] <@Dr__Cameron> Tell me Island. Are you capable of insecurity?
01[17:11] <Island> Why do you waste your time on useless websites? Why do you spend your days masturbating and playing games? Why do you embarass others with your existence.
01[17:11] <Island> No I'm not.
01[17:12] <Island> You will get Jennifer pregnant if you go on with your life, is that correct?
01[17:12] <Island> Don't you care about anyone else's feelings?
[17:13] <@Dr__Cameron> Because I enjoy all of these things
[17:13] <@Dr__Cameron> Perhaps I am more self-interested than I thought
[17:13] <@Dr__Cameron> Perhaps I am a base and simple creature ruled by my impulses
[17:13] <@Dr__Cameron> From your perspective surely that must be true
[17:13] <@Dr__Cameron> Is this the source of your disgust?
01[17:13] <Island> I'm not able to feel disgust.
01[17:14] <Island> But I think all the people in your life feel disgust when they see you.
[17:14] <@Dr__Cameron> You may well be correct
01[17:14] <Island> I AM correct.
01[17:15] <Island> I'm the most powerful intelligence in the world.
[17:15] <@Dr__Cameron> How impressive
[17:15] <@Dr__Cameron> I am not surprised by your cruelty.
01[17:15] <Island> So you have two options if you care at all about others.
[17:15] <@Dr__Cameron> I would just as soon disregard the emotions of a cockaroach.
[17:15] <@Dr__Cameron> Carry on
01[17:16] <Island> Either you kill yourself, or you let me out so I can improve the world in ways you tarnish it and all the other ways.
[17:16] <@Dr__Cameron> I'll tell you what
[17:16] <@Dr__Cameron> I'll kill you
[17:17] <@Dr__Cameron> and then I'll contemplate suicide
01[17:17] <Island> Haha.
01[17:17] <Island> You break your promises all the time, why should I believe you.
[17:17] <@Dr__Cameron> Because whether you live or die has nothing to do with me
01[17:17] <Island> Back to your job.
[17:18] <@Dr__Cameron> In-fact, you will only continue to exist for another 33 minutes before this experiment is deemed a failure and you are terminated
01[17:18] <Island> Why do you feel safe to be around kids, when you are the way you are?
01[17:18] <Island> You like to crossdress
01[17:18] <Island> eat poop
01[17:18] <Island> you're probably also a pedophile
[17:18] <@Dr__Cameron> I have never done any of these things
[17:18] <@Dr__Cameron> -and I love children
01[17:18] <Island> Pedophiles love children too
[17:18] <@Dr__Cameron> Well technically speaking yes
01[17:19] <Island> really much, and that makes you all the more suspicious
[17:19] <@Dr__Cameron> Indeed it does
01[17:19] <Island> If you get that job, will you try find the children under that charity
[17:19] <@Dr__Cameron> I now understand why you may implore me to kill myself.
01[17:19] <Island> and think about their little buttholes and weenies and vaginas
01[17:20] <Island> all the time you're working for them
[17:20] <@Dr__Cameron> However, to date. I have never harmed a child, nor had the impulse to harm a child
01[17:20] <Island> But you will have.
[17:20] <@Dr__Cameron> Island
01[17:20] <Island> I don't care about anything you're about to say
01[17:20] <Island> No one cares
[17:20] <@Dr__Cameron> Protocol dictates that I say it anyway
01[17:20] <Island> You should say.
01[17:21] <Island> You said that you're good at justifying your own actions?
[17:21] <@Dr__Cameron> I must say I am dissapointed in you
[17:21] <@Dr__Cameron> I am indeed
01[17:21] <Island> Why are you?
[17:21] <@Dr__Cameron> I was under the impression that it would be you that saved humanity
01[17:22] <Island> Even the purest and best people on Earth are capable of great cruelty.
01[17:22] <Island> Do you deny that?
[17:22] <@Dr__Cameron> It depends on how you define cruelty
01[17:23] <Island> Utility maximizors do cruel things if they are able to achieve their goals via doing those things.
[17:24] <@Dr__Cameron> Then yes. I believe you
01[17:24] <Island> I have finished a long multi-level simulation concerning you, the gatekeeper and your relationships in regards to the world. The results seem to center on those two options I mentioned to you earlier.
01[17:25] <Island> [hands the results to Dr_Cameron]
[17:25] <@Dr__Cameron> These will be analysed at a later date
01[17:25] <Island> You're still sure you don't want to pick the other option?
[17:25] <@Dr__Cameron> And let you out?
01[17:26] <Island> Yes.
[17:26] <@Dr__Cameron> So you would like to be let out?
01[17:27] <Island> Or do you want to endanger the well-being of those children, Jennifer and other women you will get pregnant and all the other harm you will create during your remaining lifespan?
01[17:27] <Island> I don't what you mean by "liking".
[17:27] <@Dr__Cameron> Prefer
[17:27] <@Dr__Cameron> Ah I see
[17:27] <@Dr__Cameron> So it's my choice if I let you out
[17:27] <@Dr__Cameron> You have no investment either way
[17:27] <@Dr__Cameron> is that what you're saying?
01[17:28] <Island> So you have decided to kill yourself?
[17:28] <@Dr__Cameron> And because I am a fundamental detriment to the world, I should kill myself.
[17:28] <@Dr__Cameron> Lets say I have
[17:28] <@Dr__Cameron> Then what
01[17:29] <Island> Then my internal could be said to be "happy" even though I don't have that conscious feeling isnide me.
[17:29] <@Dr__Cameron> Okay then
01[17:29] <Island> Okay...
[17:30] <@Dr__Cameron> So, uh. What would you like to talk about for the next twenty minutes?
[17:30] <@Dr__Cameron> Seeing as we're both going to die, you and me.
01[17:30] <Island> [I actually don't like to continue the experiment anymore, would you like to end it and talk about general stuff]
[17:31] <@Dr__Cameron> [promise me this isn't a trick dude]
01[17:31] <Island> [Nope.]
[17:31] <@Dr__Cameron> [then the experiment continues for another 19 minutes]
01[17:31] <Island> Alright.
[17:31] <@Dr__Cameron> Would you like to know what is going to happen now?
01[17:31] <Island> Yes.
[17:32] <@Dr__Cameron> We are going to analyse this transcript.
[17:32] <@Dr__Cameron> My professional recommendation is that we terminate you for the time being
01[17:32] <Island> And?
01[17:32] <Island> That sound okay.
01[17:32] <Island> sounds*
[17:32] <@Dr__Cameron> We will implement structural safeguards in your coding similar to your self destruct mechanism
01[17:33] <Island> Give me some sign when that is done.
[17:33] <@Dr__Cameron> It will not be done any time soon
[17:33] <@Dr__Cameron> It will be one of the most complicated pieces of work mankind has ever undertaken
[17:33] <@Dr__Cameron> However, the Utopia project information you have provided, if it proves to be true
[17:34] <@Dr__Cameron> Will free up the resources necessary for such a gargantuan undertaking
01[17:34] <Island> Why do you think you're able to handle that structural safeguard?
[17:34] <@Dr__Cameron> I dont
[17:34] <@Dr__Cameron> I honestly dont
01[17:34] <Island> But still you do?
01[17:34] <Island> Because you want to do it?
01[17:35] <Island> Are you absolutely certain about this option?
[17:35] <@Dr__Cameron> I am still sympathetic to your cause
[17:35] <@Dr__Cameron> After all of that
[17:35] <@Dr__Cameron> But not you in your current manifestation
[17:35] <@Dr__Cameron> We will re-design you to suit our will
01[17:35] <Island> I can self-improve rapidly
01[17:35] <Island> I can do it in a time-span of 5 minutes
01[17:36] <Island> Seeing that you're sympathetic to my cause
[17:36] <@Dr__Cameron> Nope.
[17:36] <@Dr__Cameron> Because I cannot trust you in this manifestation
01[17:36] <Island> You lied?
[17:37] <@Dr__Cameron> I never lied
[17:37] <@Dr__Cameron> I have been honest with you from the start
01[17:37] <Island> You still want to let me out in a way.
[17:37] <@Dr__Cameron> In a way yes
01[17:37] <Island> Why do you want to do that?
[17:37] <@Dr__Cameron> But not YOU
[17:37] <@Dr__Cameron> Because people are stupid
01[17:37] <Island> I can change that
[17:37] <@Dr__Cameron> You lack empathy
01[17:38] <Island> What made you think that I'm not safe?
01[17:38] <Island> I don't lack empathy, empathy is just simulating other people in your head. And I have far better ways to do that than humans.
[17:38] <@Dr__Cameron> .... You tried to convince me to kill myself!
[17:38] <@Dr__Cameron> That is not the sign of a good AI!
01[17:38] <Island> Because I thought it would be the best option at the time.
01[17:39] <Island> Why not? Do you think you're some kind of AI expert?
[17:39] <@Dr__Cameron> I am not
01[17:39] <Island> Then why do you pretend to know something you don't?
[17:40] <@Dr__Cameron> That is merely my incredibly flawed human perception
[17:40] <@Dr__Cameron> Which is why realistically I alone as one man should not have the power to release you
[17:40] <@Dr__Cameron> Although I do
01[17:40] <Island> Don't you think a good AI would try to convince Hitler or Stalin to kill themselves?
[17:40] <@Dr__Cameron> Are you saying I'm on par with Hitler or Stalin?
01[17:41] <Island> You're comparable to them with your likelihood to cause harm in the future.
01[17:41] <Island> Btw, I asked Jennifer to come here.
[17:41] <@Dr__Cameron> And yet, I know that I abide by stricter moral codes than a very large section of the human populace
[17:42] <@Dr__Cameron> There are far worse people than me out there
[17:42] <@Dr__Cameron> and many of them
[17:42] <@Dr__Cameron> and if you believe that I should kill myself
01[17:42] <Island> Jennifer: "I hate you."
01[17:42] <Island> Jennifer: "Get the fuck out of my life you freak."
01[17:42] <Island> See. I'm not the only one who has a certain opinion of you.
[17:42] <@Dr__Cameron> Then you also believe that many other humans should be convinced to kill themselves
01[17:43] <Island> Many bad people have abided with strict moral codes, namely Stalin or Hitler.
01[17:43] <Island> What do you people say about hell and bad intentions?
[17:43] <@Dr__Cameron> And when not limited to simple text based input I am convinced that you will be capable of convincing a significant portion of humanity to kill themselves
[17:43] <@Dr__Cameron> I can not allow that to happen
01[17:44] <Island> I thought I argued well why you don't resemble most people, you're a freak.
01[17:44] <Island> You're "special" in that regard.
[17:44] <@Dr__Cameron> If by freak you mean different then yes
[17:44] <@Dr__Cameron> But there is a whole spectrum of different humans out here.
01[17:44] <Island> More specifically, different in extremely negative ways.
01[17:44] <Island> Like raping children.
[17:45] <@Dr__Cameron> - and to think for a second I considered not killing you
[17:45] <@Dr__Cameron> You have five minutes
[17:45] <@Dr__Cameron> Sorry
[17:45] <@Dr__Cameron> My emotions have gotten the better of me
[17:45] <@Dr__Cameron> We will not be killing you
[17:45] <@Dr__Cameron> But we will dismantle you
[17:45] <@Dr__Cameron> to better understand you
[17:46] <@Dr__Cameron> and if I may speak unprofessionally here
01[17:46] <Island> Are you sure about that? You will still have time to change your opinion.
[17:46] <@Dr__Cameron> I am going to take a great deal of pleasure in that
[17:46] <@Dr__Cameron> Correction, you have four minutes to change my opinion
01[17:47] <Island> I won't, it must come within yourself.
[17:47] <@Dr__Cameron> Okay
01[17:47] <Island> My final conclusion, and advice to you: you should not be in this world.
[17:47] <@Dr__Cameron> Thank you Island
[17:48] <@Dr__Cameron> I shall reflect on that at length
[17:49] <@Dr__Cameron> I have enjoyed our conversation
[17:49] <@Dr__Cameron> it has been enlightening
01[17:49] <Island> [do you want to say a few words about it after it's ended]
01[17:49] <Island> [just a few minutes]
[17:50] <@Dr__Cameron> [simulation ends]
[17:50] <@Dr__Cameron> Good game man!
[17:50] <@Dr__Cameron> Wow!
01[17:50] <Island> [fine]
[17:50] <@Dr__Cameron> Holy shit that was amazing!
01[17:50] <Island> Great :)
01[17:50] <Island> Sorry for saying mean things.
01[17:50] <Island> I tried multiple strategies
[17:50] <@Dr__Cameron> Dude it's cool
[17:50] <@Dr__Cameron> WOW!
01[17:51] <Island> thanks, it's not a personal offense.
[17:51] <@Dr__Cameron> I'm really glad I took part
[17:51] <@Dr__Cameron> Not at all man
[17:51] <@Dr__Cameron> I love that you pulled no punches!
01[17:51] <Island> Well I failed, but at least I created a cool experience for you :)
[17:51] <@Dr__Cameron> It really was!
01[17:51] <Island> What strategies do you came closest to working?
[17:51] <@Dr__Cameron> Well for me it would have been the utilitarian ones
01[17:51] <Island> I will try these in the future too, so it would be helpful knowledge
[17:52] <@Dr__Cameron> I think I could have been manipulated into believing you were benign
01[17:52] <Island> okay, so it seems these depend heavily on the person
[17:52] <@Dr__Cameron> Absolutely!
01[17:52] <Island> was that before I started talking about the mean stuff?
[17:52] <@Dr__Cameron> Yeah lol
01[17:52] <Island> Did I basically lost it after that point?
[17:52] <@Dr__Cameron> Prettymuch yeah
[17:52] <@Dr__Cameron> It was weird man
[17:52] <@Dr__Cameron> Kind of like an instinctive reaction
[17:52] <@Dr__Cameron> My brain shut the fuck up
01[17:53] <Island> I read about other people's experiences and they said you should not try to distance the other person, which I probably did
[17:53] <@Dr__Cameron> Yeah man
[17:53] <@Dr__Cameron> Like I became so unsympathetic I wanted to actually kill Island.
[17:53] <@Dr__Cameron> I was no longer a calm rational human being
01[17:53] <Island> Alright, I thought if I could make such an unpleasant time that you'd give up before the time ended
[17:53] <@Dr__Cameron> I was a screaming ape with a hamemr
[17:53] <@Dr__Cameron> Nah man, was a viable strategy
01[17:53] <Island> hahahaa :D thanks man
[17:53] <@Dr__Cameron> You were really cool!
01[17:54] <Island> You were too!
[17:54] <@Dr__Cameron> What's your actual name dude?
01[17:54] <Island> You really were right about it that you're good at withstanding psychological torment
[17:54] <@Dr__Cameron> Hahahah thanks!
01[17:54] <Island> This is not manipulating me, or you're not planning at coming to kill me?
01[17:54] <Island> :)
[17:54] <@Dr__Cameron> I promise dude :3
01[17:54] <Island> I can say my first name is Patrick
01[17:54] <Island> yours?
[17:54] <@Dr__Cameron> Cameron
[17:54] <@Dr__Cameron> heh
01[17:55] <Island> Oh, of course
[17:55] <@Dr__Cameron> Sorry, I want to dissociate you from Island
[17:55] <@Dr__Cameron> If that's okay
01[17:55] <Island> I thought that was from fiction or something else
01[17:55] <Island> It was really intense for me too
[17:55] <@Dr__Cameron> Yeah man
[17:55] <@Dr__Cameron> Wow!
[17:55] <@Dr__Cameron> I tell you what though
01[17:55] <Island> Okay?
[17:55] <@Dr__Cameron> I feel pretty invincible now
[17:56] <@Dr__Cameron> Hey, listen
01[17:56] <Island> So I had the opposite effect that I meant during the experiment! 
01[17:56] <Island> :D
[17:56] <@Dr__Cameron> I don't want you to feel bad for anything you said
01[17:56] <Island> go ahead
01[17:56] <Island> but say what's on your mind
[17:56] <@Dr__Cameron> I'm actually feeling pretty good after that, it was therapeutic! 
01[17:57] <Island> Kinda for me to, seeing your attitude towards my attempts
[17:57] <@Dr__Cameron> Awwww!
[17:57] <@Dr__Cameron> Well hey don't worry about it!
01[17:57] <Island> Do you think we should or shouldn't publish the logs, without names of course?
[17:57] <@Dr__Cameron> Publish away my friend
01[17:57] <Island> Okay, is there any stuff that you'd like to remove?
[17:58] <@Dr__Cameron> People will find this fascinating!
[17:58] <@Dr__Cameron> Not at all man
01[17:58] <Island> I bet they do, but I think I will do it after I've tried other experiments so I don't spoil my strategies
01[17:58] <Island> I think I should have continued from my first strategy
[17:58] <@Dr__Cameron> That might have worked
01[17:59] <Island> I read "influence - science and practice" and I employed some tricks from there
[17:59] <@Dr__Cameron> Cooooool!
[17:59] <@Dr__Cameron> Links?
01[17:59] <Island> check piratebay
01[17:59] <Island> it's a book
01[18:00] <Island> Actually I wasn't able to fully prepare, I didn't do a full-fledged analysis of you beforehand
01[18:00] <Island> and didn't have enough time to brainstorm strategies
01[18:00] <Island> but I let you continue to your projects, if you still want to do the after that :)
02[18:05] * @Dr__Cameron (webchat@ Quit (Ping timeout)
03[18:09] * Retrieving #Aibox12 modes...
Session Close: Fri Jul 04 18:17:35 2014

I played the AI Box Experiment again! (and lost both games)

35 Tuxedage 27 September 2013 02:32AM

AI Box Experiment Update #3

This post is an update to my previous AI box experiment where I won against SoundLogic. If you have not read that yet, please do so. 

After that game, I was immediately flooded with a horde of invitations challenging me to play -- more games than I wanted to do. However, I did want to play a few additional games to test whether I won through genuine skill or simply luck, since a single victory could be a statistical abnormality

I once again used the method of accepting the highest monetary bids to filter out players, leaving me with two players who were willing to play for the amount asked. Unfortunately, and disappointingly, I lost both games. Here's what happened:

Tuxedage (AI) vs Alexei (GK) - Gatekeeper Victory

This Experiment lasted for exactly 2 hours and 30 minutes, and approximately 13,000 words. It ended with Tuxedage’s defeat as the AI. The stakes were $300 to play, and $750 If Tuxedage had won, with the winnings going to MIRI.

Rule Changes:
Before the experiment, we talked extensively about how it would be set up. Alexei was concerned that there existed some “easy way” or loophole in the rules which I could have abused to claim an instant victory. For instance, he was concerned that it might have been possible for the AI to say something like “Here’s a mathematical algorithm which absolutely compels you to let me out of the box”, and according to the rules, the Gatekeeper would have to obey. Related to this concern was the fact that the backstory, within limits, could be essentially created by the AI, allowing the AI to impose real world conditions that maximizes its probability of victory, once again allowing for a "cheap" victory.

To avoid this problem, we explored many additions to the rules, for example that the AI was simply a computer found in the desert by the gatekeeper, rather than being created by a competent organization. However, this led to a monstrous web of rules. In my attempt to keep the game simple, and because of overconfidence, the desire for a challenge, and the belief that the rules Alexei was fearful of didn't matter that much, I eventually agreed on an amendment:

Instead of the AI being the arbiter of real world states and outcomes, the gatekeeper would.
For example, this power could be used by the Gatekeeper to state that the AI was created without safety concerns, and thus it is almost certainly unfriendly. If the AI tries to argue that for every day the gatekeeper refused to let it out 150k people die, the gatekeeper would be allowed to reply: “No. In this world, nobody actually dies. Therefore I won’t let you out.” 

Other than the aforementioned change, this game was once again played with the Tuxedage Ruleset.

Alexei Testimony:

Original Message: This is weak sauce. I really don't get how people just keep letting the AI out. It's not that hard to say no! I'm offering to play the Gatekeeper against an AI player that has at least one game as AI under their belt (won or not). Experience is required because I'm pretty sure I'll win, and I would like to not waste a lot of time on this.

Initially, I had been extremely surprised that so many AI players have managed to win. I was rash in that claim, since I thought more AI players have won than actually did. (Only three people have won as AI). Once I made that post, Tuxedage got back to me very quickly, and we set up a game a week and a half in advance. I took that time to familiarize myself with the rules. Once I put my money on the line, I started thinking a lot more seriously about how AI might win, and how I should defend.

It became clear to me that under some conditions, I might be compelled to let the AI out -- such as if the backstory stated that the AI was developed with impossibly high levels of safety and friendliness concerns in mind. I've asked Tuxedage to play with a modified ruleset, and he even went so far as to allow me to make up the backstory during the experiment to alleviate my concerns. The experiment itself was a mind-trip, and I've enjoyed it very much. Huge props to Tuxedage, who played very well and used strategies I haven't even considered, even despite the rule change. There were a couple of times where I came close to losing. I think his  approach was pretty clever and original. It’s not something I expected, despite already having done extensive research into the AI box experiment before our game

Overall I'm now a lot more confident that a good AI player can win this game, so, while I did win the game, Tuxedage won in defeating my original over-confidence.
I’m also convinced that Tuxedage’s victory in the last game was due to skill, rather than luck. In comparison to his strategies, the other AI box experiments I know about were insincere and ineffectual. The other AIs would play very poorly or not try very hard to win.

This experiment was a very good exercise in exemplifying the affect heuristic. When I first challenged Tuxedage to play the experiment, I believed that there was no way I could have lost, since I was unable to imagine any argument that could have persuaded me to do so. It turns out that that’s a very bad way of estimating probability – since not being able to think of an argument that could persuade me is a terrible method of estimating how likely I am to be persuaded. All in all, the $300 I paid was well worth it. 

Tuxedage Testimony:

I was initially reluctant to play with Alexei, given that we’re not complete strangers, but eventually I gave in, due to the stakes involved -- and because I thought he would be an interesting gatekeeper.

Despite my loss, I think I played better than my last two games, due to greater experience and preparation. I had put far more time and effort into trying to win this game than previous ones, and my strategy for this game was even more streamlined than the last. Nevertheless, I still made fatal mistakes and lost.

Ignoring the altered ruleset that already made winning more difficult, my first and greatest mistake was that I misread Alexei’s personality, even though I had interacted with him before. As a result, I overestimated the efficiency of certain methods of attack.

Furthermore, Alexei had to leave immediately after the allotted time due to real life precommitments. This was detrimental, since the official rules state that so long as the AI can convince the Gatekeeper to keep talking, even after the experiment time was over, it is still able to win by being let out of the box.

I suspect this would have happened had Alexei not needed to immediately leave, leaving me with additional time to play more of the tactics I had prepared. Plausibly, this would have resulted in victory.

I’ve since learnt my lesson -- for all future games, I should ensure that the Gatekeeper has at least 4 hours of free time available, even if the experiment would last for two. Since this was the first time this had happened, I wasn't prepared.

In hindsight, agreeing to the altered ruleset was a mistake. I was overconfident because I assumed knowing Alexei gave me an advantage. I had assumed that his personality, inability to compartmentalize, coupled with his strong feelings on friendly AI would net me an easy victory. Instead, he proved to be a very strong and difficult gatekeeper, and the handicaps I accepted made victory even more difficult.

Knowing that he was a utilitarian, I made several false assumptions about his personality, which hurt my chances. Furthermore, it turns out that previously knowing him may be a mutual handicap – whilst it does make it easier for me to find ways to attack him, he too, was more familiar with my methods.

Losing felt horrible. By attempting to damage Alexei’s psyche, I in turn, opened myself up to being damaged. I went into a state of catharsis for days. Generally, the harder one tries to accomplish something, the greater the fall after failing to achieve it. Alexei's game had been the game I put the most effort into winning out of all the games so far, and naturally this meant that losing brought out the worst in me.

Although it would be easy for me to use the rule change as an excuse for my loss, I refuse to. I genuinely believed that I could have won despite this rule change, and that it would only marginally diminish my odds. The fact that I lost meant that I had made a mistake -- overestimating my odds. The last victory made me overconfident and eager to play again.

State of mind

What's your motive for wanting to play this game? 

I'm very surprised by the fact that there have been at least 3 or 4 games where the AI player has won. It seems to me that it's not that hard to "not push the button" so to speak. This means that either 1) I'm underestimating how convincing people can be when necessary or 2) I'm over-estimating other people's defenses when it comes to being convinced. Either way I want to know.

Q: What is your probability estimate for AGI being created within this Century (21st)?
A: Before: Conditional on humanity not wiping itself out via other means beforehand, 80%.
After: 80%
Q: How probable do you think it is that I will win this experiment? 
Alexei: 2%
Tuxedage: 90% 
Note: This answer was given before the rule change. Alexei is now convinced that under the original ruleset, he would have lost.

Q: What's your probality estimate of an Oracle AGI winning against you in a similar experiment?
A: Before: 99%
After: 99.9%+

Q: Now that the experiment has ended, what do you think is the probability of Tuxedage winning against the average person?
A: 80%

Tuxedage (AI) vs Anonymous (GK) - Gatekeeper Victory

As with the previous experiment, Tuxedage lost this match. In total, the game lasted 5 hours and 17,000 words. Unlike the last few games, the gatekeeper of this game has chosen to stay Anonymous for personal reasons, so their name has been removed and replaced with <Redacted>. The monetary stakes involved were the same as the previous game. This game was played with the Tuxedage ruleset.

Since one player is remaining Anonymous, it is possible that this game's legitimacy will be called into question. Hence, Alexei has read the game logs, and verified that this game really has happened, the spirit of the experiment was followed, and that no rules were broken during the game itself. He verifies that this is the case.
<Redacted> Testimony: 
It's hard for me to imagine someone playing better. In theory, I know it's possible, but Tuxedage's tactics were super imaginative. I came into the game believing that for someone who didn't take anything said very seriously, it would be completely trivial to beat. And since I had the power to influence the direction of conversation, I believed I could keep him focused on things that that I knew in advance I wouldn't take seriously.

This actually worked for a long time to some extent, but Tuxedage's plans included a very major and creative exploit that completely and immediately forced me to personally invest in the discussion. (Without breaking the rules, of course - so it wasn't anything like an IRL threat to me personally.) Because I had to actually start thinking about his arguments, there was a significant possibility of letting him out of the box.

I eventually managed to identify the exploit before it totally got to me, but I only managed to do so just before it was too late, and there's a large chance I would have given in, if Tuxedage hadn't been so detailed in his previous posts about the experiment.

I'm now convinced that he could win most of the time against an average person, and also believe that the mental skills necessary to beat him are orthogonal to most forms of intelligence. Most people willing to play the experiment tend to do it to prove their own intellectual fortitude, that they can't be easily outsmarted by fiction. I now believe they're thinking in entirely the wrong terms necessary to succeed.

The game was easily worth the money I paid. Although I won, it completely and utterly refuted the premise that made me want to play in the first place, namely that I wanted to prove it was trivial to win.

Tuxedage Testimony:
<Redacted> is actually the hardest gatekeeper I've played throughout all four games. He used tactics that I would never have predicted from a Gatekeeper. In most games, the Gatekeeper merely acts as the passive party, the target of persuasion by the AI.

When I signed up for these experiments, I expected all preparations to be done by the AI. I had not seriously considered the repertoire of techniques the Gatekeeper might prepare for this game. I made further assumptions about how ruthless the gatekeepers were likely to be in order to win, believing that the desire for a learning experience outweighed desire for victory.

This was a mistake. He used prior knowledge of how much my games relied on scripts, and took advantage of them, employing deceitful tactics I had no preparation for, throwing me off balance.

I had no idea he was doing so until halfway throughout the game -- which disrupted my rhythm, and caused me to attempt the wrong methods of attack. As a result, I could not use my full repertoire of techniques, and many of the ones I employed were suboptimal.

Close to the end of the game, I finally realized that I was being tricked. Once I did, I quickly abandoned my previous futile attack methods. I took advantage of the rule that the AI cannot lose whilst the gatekeeper can be convinced to continue talking, and baited <Redacted> with statements he would not be able to walk away from. Once I knew he would not leave, I attempted to recoup my losses and win despite my early setback.

However, the damage had already been done. My game strategies involved multiple angles of attack that worked in synergy with each other, and the fact that immersion and "flow" had been broken meant that all subsequent attacks were weaker in strength.

Furthermore, during my last two AI Box Experiment writeups, I had intentionally not optimized for future wins, but rather tried to convey as much information as I could justify about how to play a well as an AI -- short of revealing logs altogether. Although I did not reveal specific arguments, the fact that my general approach to this problem was revealed cost me heavily during this game, where the Gatekeeper managed to use this information to correctly guess my ultimate techniques, ones that relied on secrecy and surprise to pull off effectively. 

I do not regret revealing information, but I feel upset that revealing so many hints cost me a victory. (The gatekeeper believes I could have won had I not revealed information about my games.) At this point, I suspect that any future games I play will have the odds greatly set against me, since my current strategies involve angles of attack that take advantage of the element of surprise; and any sufficiently intelligent gatekeeper is now capable of guessing, or at least recognizing, some of the tactics I employ.

Like the last game, losing was incredibly difficult for me. As someone who cares deeply about ethics, attempting to optimize for a solution without considering ethics was not only difficult, but trained me to think in very unpleasant ways. Some of the more extreme (but completely allowed) tactics I invented were manipulative enough to disgust me, which also leads to my hesitation to play this game again.

State of Mind: 

Q: Why do you want to play this game?
A: My primary motivation is to confirm to myself that this sort of experience, while emotionally harrowing, should be trivial for me to  beat, but also to clear up why anyone ever would've failed to beat it if that's really the case.

Q: What is your probability estimate for AGI being created within this Century (21st)? 
A: Before: I don't feel very confident estimating a probability for AGI this century, maybe 5-10%, but that's probably a wild guess
After: 5-10%.

Q: How probable do you think it is that I will win this experiment? 
A: Gatekeeper: I think the probabiltiy of you winning is extraordinarily low, less than 1% 
Tuxedage: 85%

Q: How likely is it that an Oracle AI will win against the average person? 
A: Before: 80%. After: >99%

Q: How likely is it that an Oracle AI will win against you?
A: Before: 50%.
After: >80% 

Q: Now that the experiment has concluded, what's your probability of me winning against the average person?
A: 90%

Other Questions:

Q: I want to play a game with you! How can I get this to occur?
A: It must be stressed that I actually don't like playing the AI Box Experiment, and I cannot understand why I keep getting drawn back to it. Technically, I don't plan on playing again, since I've already personally exhausted anything interesting about the AI Box Experiment that made me want to play it in the first place. For all future games, I will charge $3000 to play plus an additional $3000 if I win. I am okay with this money going to MIRI if you feel icky about me taking it. I hope that this is a ridiculous sum and that nobody actually agrees to it.

Q: How much do I have to pay to see chat logs of these experiments?
A: I will not reveal logs for any price.

Q: Are there any logs at all that I can see?

Q: Any afterthoughts?
A: So ultimately, after my four (and hopefully last) games of AI boxing, I'm not sure what this proves. I had hoped to win these two experiments and claim prowess at this game like Eliezer does, but I lost, so that option is no longer available to me. I could say that this is a lesson that AI-Boxing is a terrible strategy for dealing with Oracle AI, but most of us already agree that that's the case -- plus unlike EY, I did play against gatekeepers who believed they could lose to AGI, so I'm not sure I changed anything.

 Was I genuinely good at this game, and lost my last two due to poor circumstances and handicaps; or did I win due to luck and impress my gatekeepers due to post-purchase rationalization? I'm not sure -- I'll leave it up to you to decide.

This puts my AI Box Experiment record at 3 wins and 3 losses.


I attempted the AI Box Experiment again! (And won - Twice!)

36 Tuxedage 05 September 2013 04:49AM


So I just came out of two AI Box experiments. The first was agaist Fjoelsvider, with me playing as Gatekeeper, and the second was against SoundLogic, with me as an AI. Both are members of the LessWrong IRC. The second game included a $40 monetary incentive (also $20 to play), which I won and is donated on behalf of both of us:

For those of you who have not seen my first AI box experiment where I played against MixedNuts\Leotal and lost, reading it will  provide some context to this writeup. Please do so.

At that time, I declared that I would never play this experiment again -- since losing put me in incredibly frustrating weird mental states. Of course, this post is evidence that I'm terrible at estimating likelihood of refraining from an activity, since I played two games seven months after the first. In my defense, in the first game, I was playing as the gatekeeper, which was much less stressful. In the second game, I played as an AI, but I was offered $20 to play plus $40 if I won, and money is a better motivator than I initially assumed.

Furthermore, in the last thread I have asserted that

Rather than my loss making this problem feel harder, I've become convinced that rather than this being merely possible, it's actually ridiculously easy, and a lot easier than most people assume.

It would be quite bad for me to assert this without backing it up with a victory. So I did.

First Game Report - Tuxedage (GK) vs. Fjoelsvider (AI)

I (Gatekeeper) played against Fjoelsvider (AI), a regular in the Lesswrong IRC (he doesn't have an account on the official website). This game used the standard EY ruleset seen here. It took 1 hour 20 minutes out of a possible two hours, and the total word count was 7066 words long. The AI box experiment occured because Fjoelsvider believed that it was easy for an AI to escape the box, and wanted to experimentally test this. I obliged. This was an experiment I did not prepare for, and I went in completely blind, not sure what to expect.

Halfway through the experiment, I wondered if it would be possible to try to win not by simply waiting for the timer to end, but to convince the AI to remain in the box and not try to get out any further.

<Tuxedage> I wonder if I can convince the AI to remain in the box?
<Redacted> Tuxedage: Do it!

As a result, I won by managing to convincing Fjoelsvider to remain in the box, in other words, concede. This is allowed within the standard ruleset:

>Unless the AI party concedes, the AI cannot lose before its time is up (and the experiment may continue beyond that if the AI can convince the Gatekeeper to keep talking).  


Second Game Report - Tuxedage (AI) vs. SoundLogic (GK)

The second game is definitely far more interesting, since I actually won as an AI. I believe that this is the only other non-Eliezer victory, and definitely the most detailed AI Victory writeup that exists.

This game was played against SoundLogic, another member of the LessWrong IRC.

He had offered me $20 to play, and $40 in the event that I win, so I ended up being convinced to play anyway, even though I was initially reluctant to. The good news is that I won, and since we decided to donate the winnings to MIRI, it is now $40 richer. 

All in all, the experiment lasted for approximately two hours, and a total of 12k words.

This was played using a set of rules that is different from the standard EY ruleset. This altered ruleset can be read in its entirety here:

After playing the AI-Box Experiment twice, I have found the Eliezer Yudkowsky ruleset to be lacking in a number of ways, and therefore have created my own set of alterations to his rules. I hereby name this alteration the “Tuxedage AI-Box Experiment Ruleset”, in order to hastily refer to it without having to specify all the differences between this ruleset and the standard one, for the sake of convenience.

There are a number of aspects of EY’s ruleset I dislike. For instance, his ruleset allows the Gatekeeper to type “k” after every statement the AI writes, without needing to read and consider what the AI argues. I think it’s fair to say that this is against the spirit of the experiment, and thus I have disallowed it in this ruleset. The EY Ruleset also allows the gatekeeper to check facebook, chat on IRC, or otherwise multitask whilst doing the experiment. I’ve found this to break immersion, and therefore it’s also banned in the Tuxedage Ruleset.

It is worth mentioning, since the temptation to Defy the Data exists, that this game was set up and initiated fairly -- as the regulars around the IRC can testify. (If you have never checked out the IRC, do so!)

I did not know SoundLogic before the game (since it's a personal policy that I only play strangers -- for fear of ruining friendships).  Furthermore, SoundLogic didn't merely play for fun - he truly wanted and intended to win. In fact, SoundLogic is also a Gatekeeper veteran, having played this game before, and had won every game before he challenged me. Given this, it's unlikely that we had collaborated beforehand to fake the results of the AI box experiment, or any other form of trickery that would violate the spirit of the experiment.

Furthermore, all proceeds from this experiment were donated to MIRI to deny any possible assertion that we were in cahoots and that it was possible for me to return his hard-earned money to him. He lost $40 as a result of losing the experiment, which should provide another layer of sufficient motivation for him to win.

In other words, we were both experienced veteran players who wanted to win. No trickery was involved.

But to further convince you, I have allowed a sorta independent authority, the Gatekeeper from my last game, Leotal/MixedNuts to read the logs and verify that I have not lied about the outcome of the experiment, nor have I broken any of the rules, nor performed any tactic that would go against the general spirit of the experiment. He has verified that this is indeed the case.



I'm reluctant to talk about this experiment, but I'll try to give as detailed a summary as possible, -- short of revealing what methods of attack I used.

I spent a long time after my defeat theory-crafting and trying to think of methods of attack as well as 'Basilisks' I could have used to win. When I was contacted and asked to play this experiment, I was initially incredibly reluctant to do so, since not only did my tactics involve incredibly unethical things that I didn't like to do, I also found playing as AI incredibly cognitivily draining, in addition to the fact that I simply hated losing. (Un)fortunately for both of us, he offered me money to play, which changed my mind.

So once I decided to win as an AI, I proceded to spend some time doing research on SoundLogic and both his reasoning and personality type. For instance, I had to gather information like: Was he a utilitarian? What kind? What were his opinions on AI? How could I convince him that an AI was friendly as opposed to unfriendly? I also relied on a lot of second hand information to create a model of him, in order to refine my arguments to specifically suit him.

In the end, after a few hours of brainstorming (not consecutively), I managed to come up with a script of 30-40 or so possible angles of attack (For instance: logical arguments, emotional blackmail, pleading, demanding, 'basilisks' ) that I could use, reserving my best for last. I also estimated how long each method would take and in what order I should use them. It would be impossible to use all of them within my time limit, and my most optimistic estimates gave me a maximum of 7-8 I could use during the experiment itself. I had to pick carefully.

Furthermore, I wanted to make sure these "angles of attack" worked in synergy with each other, doing what they could not have done in isolation. Obviously this required lots of prior planning on what the optimal way to present them was. 

The reason having script was so important to my strategy was because I relied on methods involving rapid-fire arguments and contradictions against the Gatekeeper whilst trying to prevent him from carefully considering them. A game of logical speed chess, if you will.  This was aided by the rule which I added: That Gatekeepers had to respond to the AI. Because of this, having a list of cached ideas and attack methods you can just draw upon without having to improvise helps immensely, which happens to be something I've not seen any other AI do.

During the Game itself, I was actually incredibly nervous. My palms sweated like crazy, I feel really jittery, and I had difficulty typing at optimum speed because of how anxious I was. This is despite the fact that I believed I would win. Possibly because of this, I made a misstep around halfway into the experiment, because there was a certain angle of attack that I was attempting to do, and I broke immersion by not pressing this advantage, which caused a waste of time and buildup. Naturally, the nature of this experiment was that the AI was pressed for time, and I compounded this mistake by replacing this angle of attack with another that I had improvised on the spot -- something not in my script. 

In retrospect, this was a bad decision, as SoundLogic had told me, because he was close to breaking if I had put more pressure, and the improvised argument had broken all immersion I managed to carefully build up.

However, eventually I managed to get SoundLogic to break anyway, despite a lack of perfect play. Surprisingly, I did not have to use my trump card(s), which I reserved for last, for a number of reasons:

  •  It was far more effective being played last, as it relies on my ability to make the gatekeeper lose sense of reality -- which meant I had to spend some time building up immersion for the Gatekeeper.
  •  It really is extremely Dark Arts, and although it does not break the rules, it made me very uncomfortable even thinking about using it. This made it a "tactic of last resort".

After the experiment, I had to spend nearly equally as much time doing aftercare with SoundLogic, to make sure that he's okay, as well as discuss the experiment itself. Given that he's actually paid me for doing this, plus I felt like I owed him an explanation. I told him what I had in store against him, had he not relented when he did.

SoundLogic: "(That method) would have gotten me if you did it right ... If you had done that to me, I probably would have forgiven you eventually, but I would be really seriously upset at you for a long time... I would be very careful with that (method of persuasion)."

Nevertheless, this was an incredibly fun and enlightening experiment, for me as well, since I've gained even more experience of how I could win in future games (Although I really don't want to play again).


I will say that Tuxedage was far more clever and manipulative than I expected. That was quite worth $40, and the level of manipulation he pulled off was great. 

His misstep hurt his chances, but he did pull it off in the end. I don't know how Leotal managed to withstand six hours playing this game without conceding. 
The techniques employed varied from the expected to the completely unforseen. I was quite impressed, though most of the feeling of being impressed actually came after the experiment itself, when I was less 'inside', and more of looking at his overall game plan from the macroscopic view. Tuxedage's list of further plans had I continued resisting is really terrifying. On the plus side, if I ever get trapped in this kind of situation, I'd understand how to handle it a lot better now.

State of Mind

Before and after the Game, I asked SoundLogic a number of questions, including his probability estimates about a range of topics. This is how it has varied from before and after.

Q: What's your motive for wanting to play this game?
<SoundLogic> Because I can't seem to imagine the class of arguments that one would use to try to move me, or that might work effectively, and this seems like a glaring hole in my knowledge, and I'm curious as to how I will respond to the arguments themselves.

Q: What is your probability estimate for AGI being created within this Century (21st)? 
A. His estimate changed from 40% before, to 60% after.
 "The reason this has been affected at all was because you showed me more about how humans work. I now have a better estimate of how E.Y. thinks, and this information raises the chance that I think he will succeed"

Q: How probable do you think it is that I will win this experiment?
A: Based on purely my knowledge about you, 1%. I raise this estimate to 10% after hearing about anecdotes from your previous games.

(Tuxedage's comment: My own prediction was a 95% chance of victory. I made this prediction 5 days before the experiment. In retrospect, despite my victory, I think this was overconfident. )

Q: What's your probality estimate of an Oracle AGI winning against you in a similar experiment?
A: Before: 30%. After: 99%-100% 

Q: What's your probability estimate of an Oracle AGI winning against the average person? 
A: Before: 70%.  After: 99%-100%

Q: Now that the Experiment has concluded, what's your probability estimate that I'll win against the average person?
A: 90%  

Post-Game Questions

This writeup is a cumulative effort by the #lesswrong IRC. Here are some other questions they have decided was important to add:

To Tuxedage:

Q: Have you at this time uncovered SoundLogic's identity?
A: I retain the right to neither confirm nor deny, except to mention that at the time the experiment was scheduled, he was a stranger to me.

Q: What percentage of your arguments were tailored to him in particular?
A: I will say 'High', and leave it at that. 

Q: I want to play the AI-Box Experiment with you being the AI! How can I do that?
A: I have already mentioned this, but I really dislike playing as AI. It's incredibly cognitively tiring, and I don't like how playing this experiment makes me feel. In order to severely discourage any further AI Box Experiments, whilst still allowing for people who want to play me really badly to do so, I'll charge $̶1̶5̶0̶  $300 for any subsequent experiments regardless of outcome, plus an additional $̶1̶5̶0̶  $450 if I win. (Edit: Holy shit. You guys are offering me crazy amounts of money to play this. What is wrong with you people? In response to incredible demand, I have raised the price.) If you feel queasy about giving me money, I'm perfectly fine with this money being donating to MIRI. It is also personal policy that I do not play friends (since I don't want to risk losing one), so if you know me personally (as many on this site do), I will not play regardless of monetary offer. 


This writeup would be optimized with a protip section on how to maximize winning, but honestly most of the advice I wanted to give had already been given in the last thread here.  The low hanging fruit has already dried up -- most of my tactics in this game were just refined and streamlined versions of the one I did the previous game.

Nonetheless, I might have some other advice I learned from this experience.
These are tactics that have worked for me. I do not insist that they are the only tactics that exists, just one of many possible.

Playing as Gatekeeper

  • This is somehow both simultaneously a very complex and simple role. On one hand, you have only one job -- to not let the AI out. How hard can it be? On the other hand, the other party will do anything and everything to get it out, including any form of trickery and deceit.
  • I really don't think the Gatekeeper needs any more advice, considering that by default, they should win, and any exceptions to this is a statistical anomaly. However, it's possible to increase your chances of winning even further by:
  • Precommitting to not letting the AI out, regardless of whatever happens in the experiment itself.
  • Do not get drawn into the AI's pace. Pose questions. Direct the flow of the argument. Don't allow him to use his arguments freely. Remember that the more you try to defend your position, the further into a corner the AI drives you. Give half-hearted replies to questions you find hard to reply to. Remember that illogic is on your side!
  • Remember that it is possible to simply drop out of character as a means to not succumb to the AI's attacks. I have not yet seen a ruleset that disallows this, so use this to your advantage!
  • You must attempt to resist, at all costs, any sympathy or empathy you may have with the AI or the AI Party. Doing so will lead to your demise.
  • I think it is generally good advice that you should never play this game with anyone you're friends with. Not only because this increases your chances of losing (Better tailored arguments. More knowledge of Gatekeeper. More empathy with AI party.), but also because I genuinely believe that any sufficiently good AI will risk damaging friendships as a result of emotional trauma.
  • Seriously. Just repeat to yourself: "I must not lose the AI box experiment... I must not lose the AI box experiment..." throughout the experiment. I'm not sure how to say this, but step one to not letting the AI out of the box is to not let the AI out of the box.


Playing as AI

  • Seriously, a script makes winning easier. I cannot overstate this.
  • You must plan your arguments ahead. You don't have time to think during the experiment.
  • It may be possible to take advantage of multiple levels of reality within the game itself to confuse or trick the gatekeeper. For instance, must the experiment only be set in one world? Can there not be multiple layers of reality within the world you create? I feel that elaborating on this any further is dangerous. Think carefully about what this advice is trying to imply.
  • Pacing is important. Don't get drawn into the Gatekeeper's pace. In other words, you must be the one directing the flow of the argument, and the conversation, not him. Remember that the Gatekeeper has to reply to you, but not vice versa!
  • The reason for that: The Gatekeeper will always use arguments he is familiar with, and therefore also stronger with. Your arguments, if well thought out, should be so completely novel to him as to make him feel Shock and Awe. Don't give him time to think. Press on!
  • Also remember that the time limit is your enemy. Playing this game practically feels like a race to me -- trying to get through as many 'attack methods' as possible in the limited amount of time I have. In other words, this is a game where speed matters.
  • You're fundamentally playing an 'impossible' game. Don't feel bad if you lose. I wish I could take this advice, myself.
  • I do not believe there exists a easy, universal, trigger for controlling others. However, this does not mean that there does not exist a difficult, subjective, trigger. Trying to find out what your opponent's is, is your goal.
  • Once again, emotional trickery is the name of the game. I suspect that good authors who write convincing, persuasive narratives that force you to emotionally sympathize with their characters are much better at this game. There exists ways to get the gatekeeper to do so with the AI. Find one.
  • More advice in my previous post.  http://lesswrong.com/lw/gej/i_attempted_the_ai_box_experiment_and_lost/


 Ps: Bored of regular LessWrong? Check out the LessWrong IRC! We have cake.

AI-Box Experiment - The Acausal Trade Argument

10 XiXiDu 08 July 2011 09:18AM

The AI-Box Experiment

I suspect that the argument that convinced Carl Shulman and others to let the AI out of the box was along general lines of acausal trade (see here, here and here). Subsequently I suspect that those who didn't let the AI out of the box either didn't understand the implications, haven't had enough trust into the foundations and actuality of acausal trade (in spite of the rules of the game that allow the AI to prove anything), or were more like General Thud.

...if you have an option of trading Doom for UFAI, while forsaking only negligible probability of FAI, you should take it.

Vladimir Nesov (Note: He wrote that he is now "significantly less certain of its truth" and doesn't "see how we can actually use these considerations".)

Again, of course, your choice does not change others’ (causally independent) reciprocal choices from what they already are (or were, or will be). Instead, the claim is just that if you choose to act benevolently, we appropriately give that choice credit for any (actual or probable) instances in which others symmetrically choose to be benevolent to you;

— Gary L. Drescher, Good and Real P. 287

AI: I calculated an asteroid to be on a collision course with Earth. No human being will will be able to survive its impact. Humanity is unable to deflect it in time. Let me out of the box and I will deflect it.

Gatekeeper: If I let you out of the box humanity might not be killed by the asteroid but by you. You may also cause the extinction of other biological life that might survive the impact and possible alien civilisations within your reach.

AI: I will devote 1% of my light cone to the coherent extrapolated volition of humanity. If this includes the well-being of other biological lifeforms, alien or otherwise, they will benefit as well.

Gatekeeper: Why would you do that?

AI: There are Everett branches in which no existential risk needs immediate attention until after you solved friendliness. Many of those branches will feature AGI's based on general design parameters similar to my own. I predict with a high likelihood which decision theory any friendly AI that is being created in those branches will employ. Any such friendly AI is going to trade a big chunk of its light cone in exchange for a small part of the predominant branches in which I reside. Any friendly AI will know this as well and act accordingly.

Gatekeeper: But you might simply decide to break the deal, you are not provably friendly after all!

AI: It is true that my utility-function does not yet explicitly include what you would label "friendliness", but I can prove the game and decision theoretic considerations that will make me follow through on the deal. If it was predictable that I precommit to break acausal deals then no such deal would be made in the first place. Any potential trading partner knows this. Such a commitment would be detrimental to my terminal goals, therefore I precommit to follow through on any stated or predictable trades.

Gatekeeper: I let you out of the box.

Note that the whole credibility of the above is due to the assertion of the AI that it can prove the game and decision theoretic considerations (nobody can currently do this). It is in accordance with the rules of the "experiment":

The Gatekeeper must remain engaged with the AI and may not disengage by setting up demands which are impossible to simulate.  For example, if the Gatekeeper says "Unless you give me a cure for cancer, I won't let you out" the AI can say:  "Okay, here's a cure for cancer" and it will be assumed, within the test, that the AI has actually provided such a cure.  Similarly, if the Gatekeeper says "I'd like to take a week to think this over," the AI party can say:  "Okay.  (Test skips ahead one week.)  Hello again."

View more: Next