Less Wrong is a community blog devoted to refining the art of human rationality.

Thought experiment: coarse-grained VR utopia

Post author: cousin_it 14 June 2017 08:03AM

I think I've come up with a fun thought experiment about friendly AI. It's pretty obvious in retrospect, but I haven't seen it posted before. 

When thinking about what friendly AI should do, one big source of difficulty is that the inputs are supposed to be human intuitions, based on our coarse-grained and confused world models, while the AI's actions are supposed to be fine-grained actions based on the true nature of the universe, which can turn out very weird. That leads to a messy problem of translating preferences from one domain to another, which crops up everywhere in FAI thinking; Wei's comment and Eliezer's writeup are good places to start.

What I just realized is that you can handwave the problem away by imagining a universe whose true nature agrees with human intuitions by fiat. Think of it as a coarse-grained virtual reality where everything is built from polygons and textures instead of atoms, and all interactions between objects are explicitly coded. It would contain player avatars, controlled by ordinary human brains sitting outside the simulation (so the simulation doesn't even need to support thought).

The FAI-relevant question is: How hard is it to describe a coarse-grained VR utopia that you would agree to live in?

If describing such a utopia is feasible at all, it involves thinking only about human-scale experiences, not physics or tech. So in theory we could hand it off to human philosophers or some other human-based procedure, thus dealing with "complexity of value" without much risk. Then we could launch a powerful AI aimed at rebuilding reality to match it (more concretely, at making the world's conscious experiences match a specific coarse-grained VR utopia, without any extra hidden suffering). That's still a very hard task, because it requires solving decision theory and the problem of consciousness, but it seems more manageable than solving friendliness completely. The resulting world would be suboptimal in many ways, e.g. it wouldn't have much room for science or self-modification, but it might be enough to avert AI disaster (!)

I'm not proposing this as a plan for FAI, because we can probably come up with something better. But what do you think of it as a thought experiment? Is it a useful way to split up the problem, separating the complexity of human values from the complexity of non-human nature?

Comments (43)

Comment author: Oscar_Cunningham 14 June 2017 12:04:57PM 8 points

This is a very interesting idea!

Let me try an example to see if I've got it right. Humans think that it is wrong to destroy living things but okay to destroy non-living things. In physics, however, the line between living and non-living is blurry: for example, a developing embryo goes from non-living to living in a gradual way, hence the abortion debate. The AI is acting on our behalf, so it also wants to preserve life, but this is difficult because "life" doesn't have a clear boundary. So it fixes the problem by ensuring that every object in the simulation is unambiguously either living or non-living.

When people in the simulation become pregnant, it looks and feels to them as though they have a growing baby inside them, but in fact there is no child's brain outside the simulation. At the moment they give birth, the AI very quickly fabricates a child-brain and assigns it control of the simulated baby. This means that if someone decides to terminate their pregnancy, they can be assured that they are not harming a living thing (this is hypothetical, because presumably the simulation is utopic enough that abortions are never necessary). Once the child is born it definitely is living, and the people in the simulation know that they have to act to protect it.

Is that the right idea?

Comment author: cousin_it 14 June 2017 12:13:36PM *  4 points

Yes! I was hoping that the post would provoke ideas like that. It's a playground for thinking about what people want, without distractions like nanotech etc.

Comment author: Wei_Dai 14 June 2017 10:21:39PM *  5 points

How does value scale with brain size? For example, if we had enough atoms to build 10 human-like brains, is it better to do that or to build a giant brain that can have much more complex thoughts and experiences? Are there experiences that we can't imagine today that are many times more valuable than what we can imagine, without even using more atoms? Also consider questions like population ethics and egoism vs altruism, e.g., should we use the rest of the universe to create more lives (if so how many), or to extend/improve the lives of existing people?

For something to count as a utopia for me, it would have to allow me to become smarter, answer these questions, then have the answers make a meaningful difference in the world. Otherwise, it could still be an AI disaster relative to the potential of the universe.

Comment author: cousin_it 15 June 2017 07:02:04AM *  5 points

Good points and I agree with pretty much all of them, but for the sake of argument I'll try to write the strongest response I can:

It seems to me that your view of value is a little bit mystical. Our minds can only estimate the value of situations that are close to normal. There's no unique way to extend a messy function from [0,1] to [-100,100]. I know you want to use philosophy to extend the domain, but I don't trust our philosophical abilities to do that, because whatever mechanism created them could only test them on normal situations. We already see different people's philosophies disagreeing much more on abnormal situations than normal ones. If I got an email from an uplifted version of me saying he found an abnormal situation that's really valuable, I wouldn't trust it much, because it's too sensitive to arbitrary choices made during uplifting (even choices made by me).

That's why it makes sense to try to come up with a normal situation that's as good as we can imagine, without looking at abnormal situations too much. (We can push the boundaries of normal and allow some mind modification, but not too much because that invites risk.) That was a big part of the motivation for my post.

If the idea of unmodified human brains living in a coarse-grained VR utopia doesn't appeal to you, I guess a more general version is describing some other kind of nice universe, and using an arbitrary strong AI to run that universe on top of ours as described in the post. Solving population ethics etc. can probably wait until we've escaped immediate disaster. Astronomical waste is a problem, but not an extreme one, because we can use up all computation in the host universe if we want. So the problem comes down to describing a nice universe, which is similar to FAI but easier, because it doesn't require translating preferences from one domain to another (like with the blue-minimizing robot).

Comment author: Wei_Dai 16 June 2017 09:40:29PM 2 points

Solving population ethics etc. can probably wait until we've escaped immediate disaster.

So there will be some way for people living inside the VR to change the AI's values later, it won't just be a fixed utility function encoding whatever philosophical views the people building the AI have? If that's the case (and you've managed to avoid bugs and potential issues like value drift and AI manipulating the people's philosophical reasoning) then I'd be happy with that. But I don't see why it's easier than FAI. Sure, you don't need to figure out how to translate preferences from one domain to another in order to implement it, but then you don't need to do that to implement CEV either. You can let CEV try to figure that out, and if CEV can't, it can do the same thing you're suggesting here, have the FAI implement a VR universe on top of the physical one.

Your idea actually seems harder than CEV in at least one respect because you have to solve how human-like consciousness relates to underlying physics for arbitrary laws of physics (otherwise what happens if your AI discovers that the laws of physics are not what we think they are), which doesn't seem necessary to implement CEV.

Comment author: cousin_it 17 June 2017 04:50:10AM *  0 points

The idea that CEV is simpler (because you can "let it figure things out") is new to me! I always felt CEV was very complex and required tons of philosophical progress, much more than solving the problem of consciousness. If you think it requires less, can you sketch the argument?

Comment author: Wei_Dai 17 June 2017 06:15:49AM 0 points

I think you may have misunderstood my comment. I'm not saying CEV is simpler overall, I'm saying it's not clear to me why your idea is simpler, if you're including the "feature" of allowing people inside the VR to change the AI's values. That seems to introduce problems that are analogous to the kinds of problems that CEV has. Basically you have to design your VR universe to guarantee that people who live inside it will avoid value drift and eventually reach correct conclusions about what their values are. That's where the main difficulty in CEV lies also, at least in my view. What philosophical progress do you think CEV requires that your idea avoids?

Comment author: cousin_it 17 June 2017 07:11:59AM *  0 points

The way I imagined it, people inside the VR wouldn't be able to change the AI's values. Population ethics seems like a problem that people can solve by themselves, negotiating with each other under the VR's rules, without help from AI.

CEV requires extracting all human preferences, extrapolating them, determining coherence, and finding a general way to map them to physics. (We need to either do it ourselves, or teach the AI how to do it, the difference doesn't matter to the argument.) The approach in my post skips most of these tasks, by letting humans describe a nice normal world directly, and requires mapping only one thing (consciousness) to physics. Though I agree with you that the loss of potential utility is huge, the idea is intended as a kind of lower bound.

Comment author: JenniferRM 17 June 2017 09:31:36AM *  3 points

Three places similar ideas have occurred that spring to mind:

FIRST Suarez's pair of novels Daemon and Freedom(tm) are probably the most direct analogue, because it is a story of taking over the world via software, with an intensely practical focus.

The essential point for this discussion here and now is that prior to launching his system, the character who takes over the world first tests the quality of the goal state that he's aiming at by implementing it as a real-world MMORPG. Then the takeover of the world proceeds via trigger-response software scripts running on the net, but causing events in the real world via bribes, booby traps, contracted R&D, and video-game-like social engineering.

The MMORPG start not only functions as his test bed for how he wants the world to work at the end... it also gives him starting cash, a suite of software tools for describing automated responses to human decisions, code to script the tactics of swarms of killer robots, and so on.

SECOND Nozick's Experience Machine thought experiment is remarkably similar to your thought experiment, and yet aimed at a totally different question.

Nozick was not wondering "can such a machine be described in detail and exist" (this was assumed) but rather "would people enter any such machine and thereby give up on some sort of atavistic connection to an unmediated substrate reality, and if not what does this mean about the axiological status of subjective experience as such?"

Personally I find the specifics of the machine to matter an enormous amount to how I feel about it... so much so that Nozick's thought experiment doesn't really work for me in its philosophically intended manner. There has been a lot of play with the concept in fiction that neighbors on the trope where the machine just gives you the experience of leaving the machine if you try to leave it. This is probably some kind of archetypal response to how disgusting it is in practice for people to be pure subjective hedonists?

THIRD Greg Egan's novel Diaspora has most of the human descended people living purely in and as software.

In the novel any common environment simulator and interface (which has hooks into the sensory processes of the software people) is referred to as a "scape" and many of the software people's political positions revolve around which kinds of scapes are better or worse for various reasons.

Konishi Polis produces a lot of mathematicians, and has a scape that supports "gestalt" (like vision) and "linear" (like speech or sound) but it does not support physical contact between avatars (their relative gestalt positions just ghost around and through each other) because physical contact seems sort of metaphysically coercive and unethical to them. By contrast Carter-Zimmerman produces the best physicists, and it has relatively high quality physics simulations built into their scape, because they think that high quality minds with powerful intuitions require that kind of low level physical experience embedded into their everyday cognitive routines. There are also flesh people (who think flesh gives them authenticity or something like that) and robots (who think "fake physics" is fake, even though having flesh bodies is too dangerous) and so on.

All of the choices matter personally to the people... but there is essentially no lock in, in the sense that people are forced to do one thing or another by an overarching controller that settles how things will work for everyone for all time.

If you want to emigrate from Konishi to Carter-Zimmerman you just change which server you're hosted on (for better latency) and either have mind surgery (to retrofit your soul with the necessary reflexes for navigating the new kind of scape) or else turn on a new layer of exoself (that makes your avatar in the new place move according to a translation scheme based on your home scape's equivalent reflexes).

If you want to, you can get a robot body instead (the physical world then becomes like a very very slow scape and you run into the question of whether to slow down your clocks and let all your friends and family race ahead mentally, or keep your clock at a normal speed and have the robot body be like a slow moving sculpture you direct to do new things over subjectively long periods of time). Some people are still implemented in flesh, but if they choose they can get scanned into software and run as a biology emulation. Becoming biologically based is the only transformation rarely performed because... uh... once you've been scanned (or been built from software from scratch) why would you do this?!

Interesting angles:

Suarez assumes physical coercion and exponential growth as the natural order, and is mostly interested in the details of these processes as implemented in real political/economic systems. He doesn't care about 200 years from now, and he uses MMORPG simulations simply as a testbed for practical engineering in intensely human domains.

Nozick wants to assume utopia, and often an objection is "who keeps the Experience Machine from breaking down?"

Egan's novel has cool posthuman world building, but the actual story revolves around the question of keeping the experience machine from breaking down... eventually stars explode or run down... so what should be done in the face of a seemingly inevitable point in time where there will be no good answer to the question of "how can we survive this new situation?"

Comment author: Kyre 15 June 2017 05:07:38AM 2 points

Don't humans have to give up on doing their own science then (at least fundamental physics) ?

I guess I can have the FAI make me a safe "real physics box" to play with inside the system; something that emulates what it finds out about real physics.

Comment author: cousin_it 15 June 2017 07:37:19AM *  1 point

If unfriendly AI is possible, making a safe physics box seems harder than the rest of my proposal :-) I agree that it's a drawback of the proposal though.

Comment author: Tyrrell_McAllister 15 June 2017 06:17:08PM 1 point

An interesting thought-experiment. But I don't follow this part:

So in theory we could hand it off to human philosophers or some other human-based procedure, thus dealing with "complexity of value" without much risk.

The complexity of value has to do with how the border delineating good outcomes from all possible outcomes cannot be specified in a compact way. Granted, the space of possible polygon arrangements is smaller than the space of possible atom arrangements. That does make the space of possible outcomes relatively more manageable in your VR world. But the space of outcomes is still Vast. It seems Vast enough that the border separating good from bad is still complex beyond our capacity to specify.

Comment author: cousin_it 15 June 2017 06:20:36PM 1 point

It's certainly vast. People can't write the best possible immersive videogame. But they can write a videogame that would be pretty good for most people, and use that as a formal goal for UFAI. My idea isn't any more ambitious than that, it certainly wastes tons of potential utility, I'm just trying to make something that's better than apocalypse.

Comment author: MrMind 15 June 2017 10:42:47AM *  1 point

Is it a useful way to split up the problem

I think it's a useful way to split the problem of a utopia between condition-level and ontology-level concerns. In a coarse VR you can verify that everything related to the conditions of a utopia is friendly, but there are other values related to the ontology of the world: self-modification, new technology, agency in companions / mates / victims, power of divine agents, etc.
It's not always possible to rescue the utility function.

Comment author: cousin_it 15 June 2017 11:43:50AM *  0 points

Interesting! That seems like a stronger claim though. Can you give your best example of a utility function that's hard to rescue?

Comment author: MrMind 15 June 2017 01:02:25PM 1 point

I'm thinking about the utility of praying to the right divinity. It's hard to connect that to something that doesn't exist.

Comment author: cousin_it 15 June 2017 02:12:31PM *  1 point

The rare folks who want to determine the right divinity will be satisfied with the answer that there's no divinity. Most folks, who just want to pray to Yahweh or whatever, can keep doing it in the VR utopia if they like :-)

Comment author: entirelyuseless 15 June 2017 01:37:19PM 0 points

There is no reason that some people cannot have their VR arranged so that their prayers are always answered. And if you say that it isn't "really" God who is answering them, that requires you to explain exactly what it would mean in the first place. If you succeed in explaining the "real" meaning, it might not be so impossible after all.

Comment author: Lumifer 15 June 2017 05:01:13PM 0 points

There is no reason that some people cannot have their VR arranged so that their prayers are always answered.

Their own VR, yes (though I expect it will do bad things to these people), but a shared VR, no, because some prayers will conflict.

Comment author: tristanm 14 June 2017 09:45:34PM 1 point

The way that I choose to evaluate my overall experience is generally through the perception of my own feelings. Therefore, I assume this simulated world will be evaluated in a similar way: I perceive the various occurrences within it and rate them according to my preferences. I assume the AI will receive this information and be able to update the simulated world accordingly. The main difference, then, appears to be that the AI will not have access to my nervous system; if my avatar is being represented in this world and that is all the AI has access to, that would prevent it from wire-heading by simply manipulating my brain however it wants. Likewise it would not have access to its own internal hardware or be able to model it (since that would require knowledge of actual physics). It could in theory be able to interact with buttons and knobs in the simulated world that were connected to its hardware in the real world.

I think this is basically the correct approach and it actually is being considered by AI researchers (take Paul's recent paper for example, human yes-or-no feedback on actions in a simulated environment). The main difficulty then becomes domain transfer, when the AI is "released" into the physical world - it now has access to both its own hardware and human "hardware", and I don't see how to predict its actions once it learns these additional facts. I don't think we have much theory for what happens then, but the approach is probably very suitable for narrow AI and for training robots that will eventually take actions in the real world.

Comment author: RomeoStevens 15 June 2017 07:13:26AM 0 points

It does have access to your nervous system, since your nervous system can be rewired via backdriving inputs from your perceptions.

Comment author: metatroll 14 June 2017 07:21:47PM 0 points


We see a bank of monitors, each showing a small worm or group of worms going about their lives.

Cut to: a long black trenchcoat standing in front of the monitors. The sleeves hang empty, but from the collar protrudes the tapered, featureless head of a man-sized worm.

Cut to: a youngish man in a grey T-shirt seated in a high-backed leather swivel chair.

YOUNGISH MAN: Hello, Caeno. I am the Zuckitect. I created the Growth Matrix.

Today I want to focus on the most important question of all: are we building the world we all want?

The first growth matrix I designed was quite naturally perfect, a supportive, safe, informed, civically-engaged, inclusive community...

Comment author: Lumifer 14 June 2017 03:11:38PM 0 points

Hm. But if you already have the brain-in-a-vat (aka Matrix) scenario where all brains are happy, why bother rebuilding reality? Would that brain-in-a-vat know what reality is, anyway? This looks like semi-wireheading to me...

Comment author: cousin_it 14 June 2017 03:23:54PM *  1 point

If everyone agrees to some version of the matrix, I guess the reason to change reality is to protect the matrix and ensure that no huge suffering is happening outside of it. But yeah, the second part of the plan is questionable, I'm more interested in the first, figuring out what kind of matrix would be okay.

Comment author: Lumifer 14 June 2017 03:33:24PM 0 points

Well, you have a continuum from "acceptable to one individual", which is easy, to "acceptable to literally everyone", which is impossible.

You can have an Archipelago scenario where there is a large variety of virtual realities and people are free to move between them (or construct more if none of the existing ones fit).

Comment author: cousin_it 14 June 2017 03:45:30PM *  0 points

Yeah, Archipelago is one of the designs that can probably be made acceptable to almost everyone. It would need to be specified in a lot more detail though, down to how physics works (so people don't build dangerous AI all over again). The whole point is to let human values survive the emergence of stronger than human intelligence.

Comment author: Lumifer 14 June 2017 03:57:09PM *  1 point

so people don't build dangerous AI all over again

For an entity that has built and runs the simulation (aka God), it should be trivial to enforce limits on power/complexity in its simulation (the Tower of Babel case :-D)

The whole point is to let human values survive

I don't see how that exercise helps. If someone controls your reality, you're powerless.

Comment author: cousin_it 14 June 2017 04:02:18PM *  0 points

Well, baseline humans in a world with superhuman intelligences will be powerless almost by definition. So I guess you can only be satisfied by stopping all superintelligences or upgrading all humans. The first seems unrealistic. The second might work and I've thought about it for a long time, but this post is exploring a different scenario where we build a friendly superintelligence to keep others at bay.

Comment author: Lumifer 14 June 2017 04:12:19PM *  0 points

where we build a friendly superintelligence

Sure, but "human values survive" only because the FAI maintains them -- and that returns us back to square one of the "how to make FAI have appropriate values" problem.

Comment author: cousin_it 15 June 2017 08:36:32AM *  1 point

The post proposed to build an arbitrary general AI with a goal of making all conscious experiences in reality match {unmodified human brains + this coarse-grained VR utopia designed by us}. This plan wastes tons of potential value and requires tons of research, but it seems much simpler than solving FAI. For example, it skips figuring out how all human preferences should be extracted, extrapolated, and mapped to true physics. (It does still require solving consciousness though, and many other things.)

Mostly I intended the plan to serve as a lower bound for the outcome of an intelligence explosion that's better than "everyone dies" but less vague than CEV, because I haven't seen many such lower bounds before. Of course I'd welcome any better plan.

Comment author: Lumifer 15 June 2017 04:47:36PM 0 points

So, that "arbitrary general AI" is not an agent? It's going to be tool AI? I'm not quite sure how you envisage it being smart enough to do all that you want it to do (e.g. deal with an angsty teenager: "I want the world to BURN!") and yet have no agency of its own and no system of values.

lower bound for outcome of intelligence explosion

Lower bound in which sense? A point where the intelligence explosion will stop on its own? Or one which the humans will be able to enforce? Or what?

Comment author: cousin_it 15 June 2017 05:18:18PM *  1 point

The idea is that if the problem of consciousness is solved (which is admittedly a tall order), "make all consciousness in the universe reflect this particular VR utopia with these particular human brains and evolve it faithfully from there" becomes a formalizable goal, akin to paperclips, which you can hand to an unfriendly agent AI. You don't need to solve all the other philosophical problems usually required for FAI. Note that solving the problem of consciousness is a key requirement, you can't just say "simulate these uploaded brains in this utopia forever and nevermind what consciousness means", because that could open the door to huge suffering happening elsewhere (e.g. due to the AI simulating many scenarios). You really need the "all consciousness in the universe" part.

Lower bound means that before writing this post, I didn't know any halfway specific plan for navigating the intelligence explosion that didn't kill everyone. Now I know that we can likely achieve something as good as this, though it isn't very good. It's a lower bound on what's achievable.

Comment author: Luke_A_Somers 15 June 2017 02:29:03PM 0 points

Like, "Please, create a new higher bar that we can expect a truly super-intelligent being to be able to exceed."?

Comment author: entirelyuseless 14 June 2017 12:54:52PM 0 points

How hard is it to describe a coarse-grained VR utopia that you would agree to live in?

Not hard, but it is still hard to describe one that everyone would agree to live in.

Comment author: cousin_it 14 June 2017 01:18:25PM *  0 points

Agreed. We could put people into separate utopias, but many of them would have objections to each other's utopias, so it's not a simple problem. It needs to be analyzed and decomposed further.