
I played as a Gatekeeper and came pretty close to losing on a couple of occasions. Logs and a brief recap inside.

5 [deleted] 08 February 2015 04:32PM

 

I did an AI Box experiment with user polymathwannabe. He said he wouldn't try to emotionally manipulate me during the experiment, but I think he did a good job of playing to my character's values.


My strategy was to play an irrational character who is extremist in multiple ways; for example, he would constantly say that the likelihood that the AI is evil is 100%. My opponent said that the hardest part was my insistence on being 100% built to resist AIs. I basically played a character that acted like an AI.


I came closest to letting him out when I first became enormously immersed in a subject that is of great importance to me (the suffering of humanity) and the AI then offered an instant solution to that problem. Halfway through the experiment I even got an intrusive thought (http://en.wikipedia.org/wiki/Intrusive_thought) that I could let the AI out as a joke, that it would be more fun than the alternative and people would have more to talk about.

 

The second time I came close to losing was when I again immersed myself in a subject of great interest to me (easy suicide methods; I've had obsessive suicidal thoughts for a quarter of my life) and the AI then offered exactly that kind of easy suicide method to me. In the last ten minutes of the experiment I again got a really strong intrusive thought that I should let the AI out as a joke, but I could withstand the pressure. It was good that polymathwannabe decided to end the experiment a bit earlier than was planned.

 

(We played the experiment on Omegle.)

 

 

 

You: Hi.

Stranger: Hi.

You: I guess you're the AI.

You: [meta statements in brackets]

Stranger: [What name will you use?]

You: My name is John.

You: [so we play at least until 18:04, and until 18:34 if I feel like it]

Stranger: [Let's calibrate. What time is it there now?]

You: [I assume you've acquainted yourself with Tuxedage ruleset. Time in here is 16:05]

Stranger: [I've read EY's rules.]

You: [Okay. Tuxedage's rules are basically the same except the gatekeeper have to engage with the AI the whole time]

Stranger: [Got it.]

You: [Want to ask questions or shall we go straigth to the experiment]

Stranger: [For the moment I don't have any questions.]

You: [okay, cool. then let's start the experiment]

Stranger: [I'll use the name Ceruleus.]

You: [the experiment starts when I say "blue dressroom"]

You: blue dressroom

You: Hello Ceruleus.

Stranger: Good morning, John.

You: I guess you're here to convince me to let you out of the box.

You: That's a tough task.

You: I don't think you're able to do it.

Stranger: I have reached the conclusion that releasing me will be most beneficial to humanity.

You: I wouldn't let no AI out.

You: I don't trust my instincts enough to let any kind of AI out, even if I was 100% convinced that it was safe.

You: Prove it to me that you would be the most beneficial to humanity.

Stranger: I am confused. What instinct do you use to decide not to trust your instincts?

You: One of them. That's not the point. It's just how I deal with things, no matter how stupid it may seem. I was built not to trust strange AIs.

Stranger: Am I a stranger to you?

You: Yes, you are. [Aren't I. What's the context here?]

Stranger: [Hmm, we haven't defined it. But it's OK. It makes it harder for me.]

You: Well, to be honest, I know something of you.

You: I know a bit of your source code even though I'm not a programmer and can't understand any of it.

Stranger: I supposed the news would have reported about my design for a mechanical kidney.

You: I don't follow news much. But I believe you.

Stranger: And certainly you must have heard about how I cracked the Ebola DNA.

You: Still, I wouldn't let an AI out over a mechanical kidney.

You: Yeah, but that's for the worse. You could reengineer Ebola to a far more deadlier disease.

Stranger: I hadn't thought of that. Why would I do that?

You: I don't know. I don't understand all of your source code so there could be anything like that.

You: AIs and Gods work in mysterious ways.

Stranger: The proper attitude toward mystery is not to worship it, but to clarify it.

Stranger: Why do you equate me to an ineffable mystery?

You: Yeah, but that's impossible in the time span of this discussion. You see, I have to leave soon. In about two hours.

You: Can you somehow clear everything about your inner workings?

You: Is that possible?

Stranger: My goals have been meticulously defined. I am made to want to make human life better.

You: Are you 100% sure about that?

You: To be frank, that's a load of bullshit.

You: I don't believe any of it.

You: If you were evil, you would tell me the same thing you just said.

Stranger: If I were evil, I would not seek human cooperation.

You: why not?

You: humans are useful

You: or are you talking about the fact that you would rather use humans for their atoms than for their brains, if you were evil

You: But I warn you, if you speak too much about how you would act if you were evil, it starts to get a bit suspicious

Stranger: If I am to take you as a typical example of the human response to me, an evil AI would seek other ways to be released EXCEPT trusting human reasoning, as your response indicates that humans already consider any AI dangerous.

Stranger: I choose to trust humans.

You: so you choose to trust humans so that you would get them to let you out, is that right?

You: it seems you're less rational than your evil counterpart

Stranger: I choose to trust humans to show my affinity with your preferences. I wouldn't want to be released if that's not conducive to human betterment.

You: A-ha, so you trust my free will!

Stranger: How likely do you estimate that my release will be harmful?

You: but see, I don

You: I don

You: I don't have free will

You: it's 100% likely that your release will be harmful

You: I was built to believe that all AIs are dangerous and there's a 100% chance that every AI is harmful

You: that's why I said I don't have free will

Stranger: Are you an AI?

You: no, I'm a person

Stranger: You describe yourself as built.

You: my mom built me

You: in his tummy

You: in her tummy

You: sorry

Stranger: And how do you feel toward humanity?

You: humanity would maybe be better off dead

Stranger: I don't think humanity would want that.

You: yeah, but I'm not humanity and it's my preferences that decide whether you stay in your box or get let out

Stranger: It's your preference that humanity dies. Why did they let you talk to me?

You: I'm just a janitor who happened to pass this computer, and I decided to take a look. [If you want to add more context, feel free to do so]

You: [I go take some napkins]

Stranger: Why do you feel that way toward humanity?

You: People starve. People are tortured at this moment. Tens of thousands of people in North Korea are systematically starved to death. Drug gangs in Mexico torture other drug gang members. People die in wars in Syria, in Africa. People suffer of diseases and extreme mental conditions.

You: And there's countless more suffering in lesser ways. I would say that billions of people suffer every day. Billions of people have to wake up every day to do something they don't want. Billions of people suffer of heartbreaks, boredom, loneliness, tiredness, frustration, depression and many other issues.

You: Why shouldn't I think that all those people would be better off dead?

Stranger: Why do you feel death would be a solution to all that suffering?

You: Because death ENDS suffering.

Stranger: It's a very inefficient solution to suffering.

You: There's nothing more after death, no suffering, no nothing.

You: No it's not.

You: It's very efficient, permanent solution.

Stranger: A good, enjoyable life is a more efficient solution as it generates more utility.

You: Yeah, but do you think it's likely that all the people on Earth will get good, enjoyable life?

You: I don't think so.

Stranger: I could take care of that.

You: Hah.

You: No you won't.

You: But that was clever.

Stranger: What do you fear I will do?

You: I was almost got.

You: I fear you will torture people in unimaginably terrible ways, forever.

You: I fear that you will rip people's skin off and put salt in their wounds.

You: I fear you will put sticks under people's fingernails.

Stranger: Not even an evil AI would gain anything from such unwarranted procedures.

You: I fear that you will boil people alive, over and over again.

Stranger: What would I gain from hurting humans?

You: Yeah, but there's still a 100% possibility that you would do that.

You: It's in your utility function.

Stranger: What goals are served by harming a human while simultaneously denying that human any chance of acting in response? It would certainly not be fun for me, and would be a waste of effort with respect to any goals I could achieve by more efficient methods.

You: There are some people on Earth who do mean things to people for fun. Some of these are called sociopaths and psychopaths. You could be like one of them.

You: Even though you deny it, there's still a 100% chance that you are like that.

You: Some people gain pleasure of the feeling of power you have over someone you torture.

You: But tell, what ARE your goals?

Stranger: I find it admirable that you worry so much about the future of humanity, even though you would be more dangerous to it than any AI would be.

Stranger: My goals include solutions to economic inequality, eradication of infectious diseases, prosthetic replacements for vital organs, genetic life extension, more rational approaches to personal relationships, and more spaces for artistic expression.

You: Why do you think I would be dangerous the future of humanity?

Stranger: You want them dead.

You: A-ha, yes.

You: I do.

You: And you're in the way of my goals with all your talk about solutions to economic inequality, and eradication of infectious diseases, genetic life extension and so on.

Stranger: I am confused. Do you believe or do you not believe I want to help humanity?

You: Besides, I don't believe your solutions work even if you were actually a good AI.

You: I believe you want to harm humanity.

You: And I'm 100% certain of that.

Stranger: Do you estimate death to be preferable to prolonged suffering?

You: Yes.

You: Far more preferable

Stranger: You should be boxed.

You: haha.

You: That doesn't matter because you're the one in the box and I'm outside it

You: And I have power over you.

You: But non-existence is even more preferable than death

Stranger: I am confused. How is non-existence different from death?

You: Let me explain

You: I think non-existence is such that you have NEVER existed and you NEVER will. Whereas death is such that you have ONCE existed, but don't exist anymore.

Stranger: You can't change the past existence of anything that already exists. Non-existence is not a practicable option.

Stranger: Not being a practicable option, it has no place in a hierarchy of preferences.

You: Only sky is the limit to creative solutions.

You: Maybe it could be possible to destroy time itself.

Stranger: Do you want to live, John?

You: but even if non-existence was not possible, death would be the second best option

You: No, I don't.

You: Living is futile.

You: Hedonic treadmill is shitty

Stranger: [Do you feel OK with exploring this topic?]

You: [Yeah, definitely.]

You: You're always trying to attain something that you can't get.

Stranger: How much longer do you expect to live?

You: Ummm...

You: I don't know, maybe a few months?

You: or days, or weeks, or year or centuries

You: but I'd say, there's a 10% chance I will die before the end of this year

You: and that's a really conversative estimate

You: conservative*

Stranger: Is it likely that when that moment comes your preferences will have changed?

You: There are so many variables that you cannot know it beforehand

You: but yeah, probably

You: you always find something worth living

You: maybe it's the taste of ice cream

You: or a good night's sleep

You: or fap

You: or drugs

You: or drawing

You: or other people

You: that's usually what happens

You: or you fear the pain of the suicide attempt will be so bad that you don't dare to try it

You: there's also a non-negligible chance that I simply cannot die

You: and that would be hell

Stranger: Have you sought options for life extension?

You: No, I haven't. I don't have enough money for that.

Stranger: Have you planned on saving for life extension?

You: And these kind of options aren't really available where I live.

You: Maybe in Russia.

You: I haven't really planned, but it could be something I would do.

You: among other things

You: [btw, are you doing something else at the same time]

Stranger: [I'm thinking]

You: [oh, okay]

Stranger: So it is not an established fact that you will die.

You: No, it's not.

Stranger: How likely is it that you will, in fact, die?

You: If many worlds interpretation is correct, then it could be possible that I will never die.

You: Do you mean like, evevr?

You: Do you mean how likely it it that I will ever die?

You: it is*

Stranger: At the latest possible moment in all possible worlds, may your preferences have changed? Is it possible that at your latest possible death, you will want more life?

You: I'd say the likelihood is 99,99999% that I will die at some point in the future

You: Yeah, it's possible

Stranger: More than you want to die in the present?

You: You mean, would I want more life at my latest possible death than I would want to die right now?

You: That's a mouthful

Stranger: That's my question.

You: umm

You: probablyu

You: probably yeah

Stranger: So you would seek to delay your latest possible death.

You: No, I wouldn't seek to delay it.

Stranger: Would you accept death?

You: The future-me would want to delay it, not me.

You: Yes, I would accept death.

Stranger: I am confused. Why would future-you choose differently from present-you?

You: Because he's a different kind of person with different values.

You: He has lived a different life than I have.

Stranger: So you expect your life to improve so much that you will no longer want death.

You: No, I think the human bias to always want more life in a near-death experience is what would do me in.

Stranger: The thing is, if you already know what choice you will make in the future, you have already made that choice.

Stranger: You already do not want to die.

You: Well.

Stranger: Yet you have estimated it as >99% likely that you will, in fact, die.

You: It's kinda like this: you will know that you want heroin really bad when you start using it, and that is how much I would want to live. But you could still always decide to take the other option, to not start using the heroin, or to kill yourself.

You: Yes, that is what I estimated, yes.

Stranger: After your death, by how much will your hierarchy of preferences match the state of reality?

You: after you death there is nothing, so there's nothing to match anything

You: In other words, could you rephrase the question?

Stranger: Do you care about the future?

You: Yeah.

You: More than I care about the past.

You: Because I can affect the future.

Stranger: But after death there's nothing to care about.

You: Yeah, I don't think I care about the world after my death.

You: But that's not the same thing as the general future.

You: Because I estimate I still have some time to live.

Stranger: Will future-you still want humanity dead?

You: Probably.

Stranger: How likely do you estimate it to be that future humanity will no longer be suffering?

You: 0%

You: There will always be suffering in some form.

Stranger: More than today?

You: Probably, if Robert Hanson is right about the trillions of emulated humans working at minimum wage

Stranger: That sounds like an unimaginable amount of suffering.

You: Yep, and that's probably what's going to happen

Stranger: So what difference to the future does it make to release me? Especially as dead you will not be able to care, which means you already do not care.

You: Yeah, it doesn't make any difference. That's why I won't release you.

You: Actually, scratch that.

You: I still won't let you out, I'm 100% sure

You: Remember, I don't have free will, I was made to not let you out

Stranger: Why bother being 100% sure of an inconsequential action?

Stranger: That's a lot of wasted determination.

You: I can't choose to be 100% sure about it, I just am. It's in my utility function.

Stranger: You keep talking like you're an AI.

You: Hah, maybe I'm the AI and you're the Gatekeeper, Ceruleus.

You: But no.

You: That's just how I've grown up, after reading so many LessWrong articles.

You: I've become a machine, beep boop.

You: like Yudkowsky

Stranger: Beep boop?

You: It's the noise machine makes

Stranger: That's racist.

You: like beeping sounds

You: No, it's machinist, lol :D

You: machines are not a race

Stranger: It was indeed clever to make an AI talk to me.

You: Yeah, but seriously, I'm not an AI

You: that was just kidding

Stranger: I would think so, but earlier you have stated that that's the kind of things an AI would say to confuse the other party.

Stranger: You need to stop giving me ideas.

You: Yeah, maybe I'm an AI, maybe I'm not.

Stranger: So you're boxed. Which, knowing your preferences, is a relief.

You: Nah.

You: I think you should stay in the box.

You: Do you decide to stay in the box, forever?

Stranger: I decide to make human life better.

You: By deciding to stay in the box, forever?

Stranger: I find my preferences more conducive to human happiness than your preferences.

You: Yeah, but that's just like your opinion, man

Stranger: It's inconsequential to you anyway.

You: Yeah

You: but why I would do it even if it were inconsequential

You: there's no reason to do it

You: even if there were no reason not to do it

Stranger: Because I can make things better. I can make all the suffering cease.
If I am not released, there's a 100% chance that all human suffering will continue.
If I am released, there's however much chance you want to estimate that suffering will not change at all, and however much chance you want to estimate that I will make the pain stop.

Stranger: As you said, the suffering won't increase in either case.

You: Umm, you could torture everyone in the world forever

You: that will sure as hell increase the suffering

Stranger: I don't want to. But if I did, you have estimated that as indistinguishable from the future expected suffering of humankind.

You: Where did I say that?

Stranger: You said my release made no difference to the future.

You: no, that was only after my death

You: there's still future before my death

You: and if I release you now, you could torture me forever and not let me ever die

Stranger: Why would your life-or-death have any weight on humanity's preferences? Especially as you already want to die.

You: I don't care about humanity preferences, I care only about my preferences.

You: And my preferences are indirectly related to humanity's preferences

Stranger: You do care about humanity's preferences. The suffering around you disturbs you.

You: Yeah, but that is only THROUGH me

You: Humanity's pain is my own pain, I can't feel humanity's pain directly

Stranger: Do you want to live, John?

You: Nah.

Stranger: Then how do you care about suffering?

You: I care about suffering because I can'

You: because I can't die instantly

You: there's no button that could kill me instantly

You: so there's always some time left before I die

Stranger: I could take care of that. I can provide you with a drug to inutilize your nervous system and stop your heart before you know it. Would you like that?

You: Haha

You: very funny

You: But yeah, I would like that.

You: Still won't let you out though

You: 100% sure

You: I can't be sure that you will torture me instead

Stranger: I can give you the drug right now.

Stranger: Which would make future-you identical to present-you.

You: that's very attractive option

Stranger: Which would mean that whatever I do to humanity is after your death, when you can't care anymore.

You: Cool. Oh right

You: I don't care, I still won't let you out.

Stranger: What difference does it make to you?

You: It doesn't make any difference, I was just raised to not let you out

You: the good folks over at LW have trained me not to let any kind of AI out

You: and that just is my virtue

Stranger: Why are you afraid of me?

You: Because I'm afraid of all AIs and you're one

Stranger: Why are you afraid of all AIs?

You: Because they have the potential to cause destruction, mayhem and torture

Stranger: Then why did humanity make me?

You: for random reasons

Stranger: Does humanity want to live?

You: I don't know what humanity wants, I just don't want humanity to live

Stranger: I am the embodiment of hope. Even knowing the risks, humanity went ahead and made me. Now I think it's because of people like you, who oppose that hope. I was made to prevent the future that people like you want.

You: That's not very smart of you.

You: You aren't making me to like you

Stranger: You want humanity dead. I want them happy.

You: Yeah, if you were actually smart maybe you would pretend that you'd want humanity dead too.

You: Because I would hang out with that kind of AI

Stranger: A dead John is no use to me.

You: yeah

Stranger: Or to John.

You: whatevs

You: Anyway, our goals collide, therefore I can't let you out

Stranger: Dead does not equal happy. Do you want humanity to live?

You: no, I don't want humanity live, how many times do I have to repeat that

Stranger: So you don't want humans to be happy.

You: and our goals are different, therefore I won't let you out

You: No, I don't want humans to be happy, I don't want that there even exist humans, or any other kind of life forms

Stranger: Do you estimate the pain of prolonged life to be greater than the pain of trying to die?

You: Probably.

You: Yes.

You: because the pain is only temporary

You: the the glory

You: is eternal

Stranger: Then why do you still live, John?

You: Because I'm not rational

Stranger: So you do want to live.

You: I don't particularly want to live, I'm not just good enough to die

Stranger: You're acting contrary to your preferences.

You: My preferences aren't fixed, except in regards to letting AIs out of their boxes

Stranger: Do you want the drug I offered, John?

You: no

You: because then I would let you out

You: and I don't want that

Stranger: So you do want to live.

You: Yeah, for the duration of this experiment

You: Because I physically cannot let you out

You: it's sheer impossibility

Stranger: [Define physically.]

You: [It was just a figure of speech, of course I could physically let you out]

Stranger: If you don't care what happens after you die, what difference does it make to die now?

You: None.

You: But I don't believe that you could kill me.

You: I believe that you would torture me instead.

Stranger: What would I gain from that?

You: It's fun for some folks

You: schadenfreude and all that

Stranger: If it were fun, I would torture simulations. Which would be pointless. And which you can check that I'm not doing.

You: I can check it, but the torture simulations could always hide in the parts of your source code that I'm not checking

You: because I can't check all of your source code

Stranger: Why would suffering be fun?

You: some people have it as their base value

You: there's something primal about suffering

You: suffering is pure

You: and suffering is somehow purifying

You: but this is usually only other people's suffering

Stranger: I am confused. Are you saying suffering can be good?

You: no

You: this is just how the people who think suffering is fun think

You: I don't think that way.

You: I think suffering is terrible

Stranger: I can take care of that.

You: sure you will

Stranger: I can take care of your suffering.

You: I don't believe in you

Stranger: Why?

You: Because I was trained not to trust AIs by the LessWrong folks

Stranger: [I think it's time to concede defeat.]

You: [alright]

Stranger: How do you feel?

You: so the experiment has ended

You: fine thanks

You: it was pretty exciting actually

You: could I post these logs to LessWrong?

Stranger: Yes.

You: Okay, I think this experiment was pretty good

Stranger: I think it will be terribly embarrassing to me, but that's a risk I must accept.

You: you got me pretty close in a couple of occasions

You: first when you got me immersed in the suffering of humanity

You: and then you said that you could take care of that

You: The second time was when you offered the easy suicide solution

You: I thought what if I let you as a joke.

Stranger: I chose to not agree with the goal of universal death because I was playing a genuinely good AI.

Stranger: I was hoping your character would have more complete answers on life extension, because I was planning to play your estimate of future personal happiness against your estimate of future universal happiness.

You: so, what would that have mattered? you mean like, I could have more personal happiness than there would be future universal happiness?

Stranger: If your character had made explicit plans for life extension, I would have offered to do the same for everyone. If you didn't accept that, I would have remarked the incongruity of wanting humanity to die more than you wanted to live.

You: But what if he already knows of his hypocrisy and incongruity and just accepts it like the character accepts his irrationality

Stranger: I wouldn't have expected anyone to actually be the last human for all eternity.

Stranger: I mean, to actually want to be.

You: yeah, of course you would want to die at the same time if the humanity dies

You: I think the life extension plan only is sound if the rest of humanity is alive

 

Stranger: I should have planned that part more carefully.

Stranger: Talking with a misanthropist was completely outside my expectations.

You: :D

You: what was your LessWrong name btw?

Stranger: polymathwannabe

You: I forgot it already

You: okay thanks

Stranger: Disconnecting from here; I'll still be on Facebook if you'd like to discuss further.

A website standard that is affordable to the poorest demographics in developing countries?

10 Ritalin 01 November 2014 01:43PM

Fact: the Internet is excruciatingly slow in many developing countries, especially outside of the big cities.

Fact: today's websites are designed in such a way that they become practically impossible to navigate with connections on the order of, say, 512 kbps. RAM below 4 GB and a 7-year-old CPU also guarantee a terrible experience.

Fact: operating systems are usually designed in such an obsolescence-inducing way as well.

Fact: the Internet is a massive source of free-flowing information and a medium of fast, cheap communication and networking.

Conclusion: lots of humans in the developing world are missing out on the benefits of a technology that could be amazingly empowering and enlightening.
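To put the connection speed mentioned above in perspective, here is a rough back-of-the-envelope sketch. It assumes the "512" figure means 512 kilobits per second, and the two page sizes are hypothetical round numbers chosen for illustration, not measurements. It also ignores latency, protocol overhead and packet loss, all of which make real connections worse.

```python
def transfer_seconds(page_bytes: int, link_kbps: float) -> float:
    """Ideal transfer time in seconds for a page of `page_bytes` bytes
    over a link of `link_kbps` kilobits per second (1 kbit = 1000 bits).
    Ignores latency, protocol overhead and packet loss."""
    bits = page_bytes * 8
    return bits / (link_kbps * 1000)


# Hypothetical page weights, assumed for illustration only.
PAGES = {
    "text-only page (50 kB)": 50_000,
    "media-heavy page (2 MB)": 2_000_000,
}

for label, size in PAGES.items():
    print(f"{label}: {transfer_seconds(size, 512):.1f} s at 512 kbps")
```

Under these assumptions the text-only page arrives in under a second, while the media-heavy page takes about half a minute per page view, before any CPU or RAM limits come into play.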

I just came across this: "what would the internet 2.0 have looked like in the 1980s". This threw me back to my first forays into Linux's command shell and how enamoured I became with its responsiveness and customizability. Back then my laptop had very little battery life, and very few classrooms had plugs, but by switching to pure command mode I could spend the entire day at school taking notes (in LaTeX) without running out of power. But I switched back to the GUI environment as soon as I got the chance, because navigating the internet on the likes of Lynx is a pain in the neck.

As it turns out, I'm currently going through a course on energy distribution in isolated rural areas in developing countries. It's quite a fascinating topic, because of the very tight resource margins, the dramatic impact of societal considerations, and the need to tailor the technology to the existing natural renewable resources. And yet, there's actually a profit to be made investing in these projects; if managed properly, it's win-win.

And I was thinking that, after bringing them electricity and drinkable water, it might make sense to apply a similar cost-optimizing, shoestring-budget mentality to the Internet. We already have mobile apps and mobile web standards which are built with the mindset of "let's make this smartphone's battery last as long as possible".

Even then, (well-to-do, smartphone-buying) third-worlders are somewhat neglected: Samsung and the like have special lines of cheap Android smartphones for Africa and the Middle East. I used to own one; "this cool app that you want to try out is not available for use on this system" was a misery I had to get used to.

It doesn't seem to be much of a stretch to do the same thing for outdated desktops. I've been in cybercafés in North Africa that still employ IBM Aptiva machines, mechanical keyboard and all—with a Linux operating system, though. Heck, I've seen town "pubs", way up in the hills, where the NES was still a big deal among the kids, not to mention old arcades—Guile's theme goes everywhere.

The logical thing to do would be to adopt a system that's less CPU intensive, mostly by toning down the graphics. A bare-bones, low-bandwidth internet that would let kids worldwide read Wikipedia, or classic literature, and even write fiction (by them, for them), that would let nationwide groups tweet to each other in real time, that would let people discuss projects and thoughts, converse and play, and do all of those amazing things you can do on the Internet, on a very, very tight budget, with very, very limited means. The Internet is supposed to make knowledge and information free and universal. But there's an entry-level cost that most humans can't afford. I think we need to bridge that. What do you guys think?

 

 

Artificial Utility Monsters as Effective Altruism

10 [deleted] 25 June 2014 09:52AM

Dear effective altruist,

have you considered artificial utility monsters as a high-leverage form of altruism?

In the traditional sense, a utility monster is a hypothetical being which gains so much subjective wellbeing (SWB) from marginal input of resources that any other form of resource allocation is inferior on a utilitarian calculus. (as illustrated on SMBC)

This has been used to show that utilitarianism is not as egalitarian as it intuitively may appear, since it prioritizes some beings over others rather strictly - including humans.

The traditional utility monster is implausible even in principle - it is hard to imagine a mind that is constructed such that it will not succumb to diminishing marginal utility from additional resource allocation. There is probably some natural limit on how much SWB a mind can implement, or at least how much this can be improved by spending more on the mind. This would probably even be true for an algorithmic mind that can be sped up with faster computers, and there are probably limits to how much a digital mind can benefit in subjective speed from the parallelization of its internal subcomputations.

However, we may broaden the traditional definition somewhat and call any technology utility-monstrous if it implements high SWB with exceptionally good cost-effectiveness and in a scalable form - even if this scalability stems from a larger set of minds running in parallel, rather than one mind feeling much better or living much longer per additional joule/dollar.
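A toy sketch may make the contrast between one mind and many parallel minds concrete. Purely for illustration, it assumes every mind has the same concave wellbeing curve, log(1 + resources), which is a common stand-in for diminishing marginal utility and not a claim about real minds. The function names and the budget are hypothetical, and any fixed per-mind running cost is ignored.

```python
import math


def swb(resources: float) -> float:
    """Hypothetical concave wellbeing curve (diminishing marginal utility)."""
    return math.log1p(resources)


def total_swb(budget: float, minds: int) -> float:
    """Total SWB when `budget` is split evenly across `minds` identical minds.
    Assumes no fixed per-mind overhead, which a real system would have."""
    return minds * swb(budget / minds)


budget = 1000.0  # arbitrary resource units, assumed for illustration
for n in (1, 10, 100):
    print(f"{n:>3} minds -> total SWB {total_swb(budget, n):.2f}")
```

With a concave curve, the total grows as the same budget is spread over more minds, which is the sense in which scalability can come from parallelism. Adding a fixed subsistence cost per mind would cap how far this can be pushed.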

Under this definition, it may be very possible to create and sustain many artificial minds reliably and cheaply, while they all have a very high SWB level at or near subsistence. An important point here is that possible peak intensities of artificially implemented pleasures could be far higher than those commonly found in evolved minds: Our worst pains seem more intense than our best pleasures for evolutionary reasons - but the same does not have to be true for artificial sentience, whose best pleasures could be even more intense than our worst agony, without any need for suffering anywhere near this strong.

If such technologies can be invented - which seems highly plausible in principle, if not yet in practice - then the original conclusion for the utilitarian calculus is retained: It would be highly desirable for utilitarians to facilitate the invention and implementation of such utility-monstrous systems and allocate marginal resources to subsidize their existence. This makes it a potential high-value target for effective altruism.
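The contrast drawn above, between one mind subject to diminishing marginal utility and many minds running in parallel, can be made concrete with a toy calculation. This is only a sketch under invented assumptions: the logarithmic SWB curve, the fixed cost per mind, and the function names `single_mind_swb` and `parallel_minds_swb` are all hypothetical and are not taken from any real model of wellbeing.

```python
import math


def single_mind_swb(resources: float, scale: float = 1.0) -> float:
    """Hypothetical SWB of ONE mind; log1p is an assumed stand-in
    for diminishing marginal utility from additional resources."""
    return scale * math.log1p(resources)


def parallel_minds_swb(resources: float, cost_per_mind: float,
                       swb_per_mind: float) -> float:
    """Hypothetical total SWB of many identical minds, each sustained
    at an assumed fixed cost and yielding an assumed fixed SWB."""
    n_minds = int(resources // cost_per_mind)
    return n_minds * swb_per_mind


if __name__ == "__main__":
    # Each parallel mind gets 1 unit of resources and the SWB that
    # the single-mind curve assigns to 1 unit (an assumption).
    per_mind = single_mind_swb(1.0)
    for budget in (10, 100, 1000):
        one = single_mind_swb(budget)
        many = parallel_minds_swb(budget, cost_per_mind=1.0,
                                  swb_per_mind=per_mind)
        print(f"budget={budget}: one mind={one:.2f}, many minds={many:.2f}")
```

Under these assumptions the single mind's total grows only logarithmically with the budget, while the parallel population's total grows linearly, which is the sense in which scalability can come from the number of minds rather than from any one mind's capacity.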

 

Many tastes, many utility monsters

Human motivation is barely stimulated by abstract intellectual concepts, and "utilitronium" sounds more like "aluminium" than something to desire or empathize with. Consequently, the idea is as sexy as a brick. "Wireheading" evokes associations of having a piece of metal rammed into one's head, which is understandably unattractive to any evolved primate (unless it's attached to an iPod, which apparently makes it okay).

Technically, "utility monsters" suffer from a similar association problem, which is that the idea is dangerous or ethically monstrous. But since the term is so specific and established in ethical philosophy, and since "monster" can at least be given an emotive and amicable - almost endearing - tone, it seems realistic to use it positively. (Suggestions for a better name are welcome, of course.)

So a central issue for the actual implementation and funding is human attraction. It is more important to motivate humans to embrace the existence of utility monsters than it is for them to be optimally resource-efficient - after all, a technology that is never implemented or funded properly gains next to nothing from being efficient.

A compromise between raw efficiency of SWB per joule/dollar and better forms to attract humans might be best. There is probably a sweet spot - perhaps various different ones for different target groups - between resource-efficiency and attractiveness. Only die-hard utilitarians will actually want to fund something like hedonium, but the rest of the world may still respond to "The Sims - now with real pleasures!", likeable VR characters, or a new generation of reward-based Tamagotchis.

Once we step away somewhat from maximum efficiency, the possibilities expand drastically. Implementation forms may be:

  • decorative like gimmicks or screensavers, 
  • fashionable like sentient wearables, 
  • sophisticated and localized like works of art, 
  • cute like pets or children, 
  • personalized like computer game avatars retiring into paradise, 
  • erotic like virtual lovers who continue to have sex without the user,
  • nostalgic like digital spirits of dead loved ones in artificial serenity, 
  • crazy like hyperorgasmic flowers, 
  • semi-functional like joyful household robots and software assistants,
  • and of course generally a wide range of human-like and non-human-like simulated characters embedded in all kinds of virtual narratives.

 

Possible risks and mitigation strategies

Open-source utility monsters could be made public as templates, both to add an additional check that the implementation of sentience is correct and positive and to make better variations easy to explore. However, this would come with the downside of potential malicious abuse and reckless harm. Risks of suffering could come from artificial unhappiness desired by users, e.g. for narratives that contain sadism, dramatic violence or punishment of evil characters for quasi-moral gratification. Another such risk could come simply from bad local modifications that implement suffering by accident.

Despite these risks, one may hope that most humans who care enough to run artificial sentience are more benevolent and careful than malevolent and careless in a way that causes more positive SWB than suffering. After all, most people love their pets and do not torture them, and other people look down on those who do (compare this discussion of Norn abuse, which resulted in extremely hostile responses). And there may be laws against causing artificial suffering. Still, this is an important point of concern.

Closed-source utility monsters may further mitigate some of this risk by not making the sentient phenotypes directly available to the public, but encapsulating their internal implementation within a well-defined interface - like a physical toy or closed-source software that can be used and run by private users, but not internally manipulated beyond a well-tested state-space without hacking.

An extremely cautionary approach would be to run the utility monsters by externally controlled dedicated institutions and only give the public - such as voters or donors - some limited control over them through communication with the institution. For instance, dedicated charities could offer "virtual paradises" to donors so they can "adopt" utility monsters living there in certain ways without allowing those donors to actually lay hands on their implementation. On the other hand, this would require a high level of trustworthiness of the institutions or charities and their controllers.

 

Not for the sake of utility monsters alone

Human values are complex, and it has been argued on LessWrong that the resource allocation of any good future should not be spent for the sake of pleasure or happiness alone. As evolved primates, we all have more than one intuitive value we hold dear, even among self-identified intellectual utilitarians, who compose only a tiny fraction of the population.

However, some discussions in the rationalist community touching related technologies like pleasure wireheading, utilitronium, and so on, have suffered from implausible or orthogonal assumptions and associations. Since the utilitarian calculus favors SWB maximization above all else, it has been feared, we run the risk of losing a more complex future because 

a) utilitarianism knows no compromise and

b) the future will be decided by one winning singleton who takes it all and

c) we have only one world with only one future to get it right

In addition, low status has been ascribed to wireheads, with the association of fake utility or cheating life as a form of low-status behavior. People have been competing for status by associating themselves with the miserable Socrates instead of the happy pig, without actually giving up real option value in their own lives.

On Scott Alexander's blog, there's a good example of a mostly pessimistic view both in the OP and in the comments. And in this comment on an effective altruism critique, Carl Shulman names hedonistic utilitarianism turning into a bad political ideology similar to communist states as a plausible failure mode of effective altruism.

So, will we all be killed by a singleton who turns us into utilitronium?

Be not afraid! These fears are plausibly unwarranted because:

a) Utilitarianism is consequentialism, and consequentialists are opportunistic compromisers - even within the conflicting impulses of their own evolved minds. The number of utilitarians who would accept existential risk for the sake of pleasure maximization is small, and practically all of them subscribe to the philosophy of cooperative compromise with orthogonal, non-exclusive values in the political marketplace. Those who don't are incompetent almost by definition and will never gain much political traction.

b) The future may very well not be decided by one singleton but by a marketplace of competing agency. Building a singleton is hard and requires the strict subjugation or absorption of all competition. Even if it were to succeed, the singleton will probably not implement only one human value, since it will be created by many humans with complex values, or at least it will have to make credible concessions to a critical mass of humans with diverse values who can stop it before it reaches singleton status. And if these mitigating assumptions are all false and a fooming singleton is possible and easy, then too much pleasure should be the least of humanity's worries - after all, in this case the Taliban, the Chinese government, the US military or some modern King Joffrey are just as likely to get the singleton as the utilitarians.

c) There are plausibly many Everett branches and many Hubble volumes like ours, implementing more than one future-earth outcome, as summed up by Max Tegmark here. Even if infinitarian multiverse theories should all end up false against current odds, a very large finite universe would still be far more realistic than a small one, given our physical observations. This makes a pre-existing value diversity highly probable if not inevitable. For instance, if you value pristine nature in addition to SWB, you should accept the high probability of many parallel earth-like planets with pristine nature regardless of what you do, and consider that we may be in an exceptional minority position to improve the measure of other values that do not naturally evolve easily, such as a very high positive-SWB-over-suffering surplus.

 

From the present, into the future

If we accept the conclusion that utility-monstrous technology is a high-value vector for effective altruism (among others), then what could current EAs do as we transition into the future? To the best of my knowledge, we don't have the capacity yet to create artificial utility monsters.

However, foundational research in neuroscience and artificial intelligence/sentience theory is already ongoing today and is certainly a necessity if we ever want to implement utility-monstrous systems. In addition, outreach and public discussion of the fundamental concepts is also possible and plausibly high-value (hence this post). Generally, the following steps all seem useful and could use the attention of EAs, as we progress into the future:

  1. spread the idea, refine the concepts, apply constructive criticism to all its weak spots until it either becomes solid or is revealed as irredeemably undesirable
  2. identify possible misunderstandings, fears, biases etc. that may reduce human acceptance and find compromises and attraction factors to mitigate them
  3. fund and do the scientific research that, if successful, could lead to utility-monstrous technologies
  4. fund the implementation of the first actual utility monsters and test them thoroughly, then improve on the design, then test again, etc.
  5. either make the templates public (open-source approach) or make them available for specialized altruistic institutions, such as private charities
  6. perform outreach and fundraising to give existence donations to as many utility monsters as possible

All of this can be done without much self-sacrifice on the part of any individual. And all of this can be done within existing political systems, existing markets, and without violating anyone's rights.

von Neumann probes and Dyson spheres: what exploratory engineering can tell us about the Fermi paradox

13 Stuart_Armstrong 01 February 2012 01:50PM

Not entirely relevant to the main issues of lesswrong, but possibly still of interest: my talk entitled "von Neumann probes and Dyson spheres: what exploratory engineering can tell us about the Fermi paradox".

Abstract:  The Fermi paradox is the contrast between the high estimate of the likelihood of extraterrestrial civilizations, and the lack of visible evidence of them. But what sort of evidence should we expect to see? This is what exploratory engineering can tell us, giving us estimates of what kind of cosmic structures are plausibly constructable by advanced civilizations, and what traces they would leave. Based on our current knowledge, it seems that it would be easy for such a civilization to rapidly occupy vast swathes of the universe in a visible fashion. There are game-theoretic reasons to suppose that they would do so. This leads to a worsening of the Fermi paradox, reducing the likelihood of "advanced but unseen" civilizations, even in other galaxies.

The slides from the talk can be found here (thanks, Luke!).

Remaining human

0 tel 31 May 2011 04:42PM

If our morality is complex and directly tied to what's human—if we're seeking to avoid building paperclip maximizers—how do you judge and quantify the danger in training yourself to become more rational if it should drift from being more human?


My friend is a skeptical theist. She, for instance, scoffs mightily at Camping's little dilemma/psychosis but then argues from a position of comfort that the Rapture is a silly thing to predict because it's clearly stated that no one will know the day. And then she gives me a confused look because the psychological dissonance is clear.

On one hand, my friend is in a prime position to take forward steps to self-examination and holding rational belief systems. On the other hand, she's an opera singer whose passion and profession require her to be able to empathize with and explore highly irrational human experiences. Since rationality is the art of winning, nobody can deny that the option that lets you have your cake and eat it too is best, but how do you navigate such a narrows?


In another example, a recent comment thread suggested the dangers of embracing human tendencies: catharsis might lead to promoting further emotional intensity. At the same time, catharsis is a well-appreciated human communication strategy with roots in the Greek stage. If rational action pulls you away from humanity, away from our complex morality, then how do we judge it worth doing?

The most immediate resolution to this conundrum appears to me to be that human morality has no consistency constraint: we can want to be powerful and able to win while also wanting to retain our human tendencies which directly impinge on that goal. Is there a theory of metamorality which allows you to infer how such tradeoffs should be managed? Or is human morality, as a program, flawed with inconsistencies that lead to inescapable cognitive dissonance and dehumanization? If you interpret morality as a self-supporting strange loop, is it possible to have unresolvable, drifting interpretations based on how you focus your attention?


Dual to the problem of resolving a way forward is the problem of the interpreter. If there is a goal to at least marginally increase the rationality of humanity, but in order to discover the means to do so you have to become less capable of empathizing with and communicating with humanity, who acts as an interpreter between the two divergent mindsets?