I think I've come up with a fun thought experiment about friendly AI. It's pretty obvious in retrospect, but I haven't seen it posted before. 

When thinking about what friendly AI should do, one big source of difficulty is that the inputs are supposed to be human intuitions, based on our coarse-grained and confused world models, while the AI's actions are supposed to be fine-grained actions based on the true nature of the universe, which can turn out to be very weird. That leads to a messy problem of translating preferences from one domain to another, which crops up everywhere in FAI thinking; Wei's comment and Eliezer's writeup are good places to start.

What I just realized is that you can handwave the problem away by imagining a universe whose true nature agrees with human intuitions by fiat. Think of it as a coarse-grained virtual reality where everything is built from polygons and textures instead of atoms, and all interactions between objects are explicitly coded. It would contain player avatars, controlled by ordinary human brains sitting outside the simulation (so the simulation doesn't even need to support thought).
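For concreteness, here is a minimal sketch of what "all interactions are explicitly coded" could mean (Python; every name and rule below is hypothetical, purely to illustrate a world whose lowest level is a hand-written rule table rather than physics):

```python
# Toy sketch of a coarse-grained world: objects are opaque tokens, and the
# ONLY events that can ever happen are the interactions listed in RULES.
# There is no underlying physics to discover or exploit.

from dataclasses import dataclass, field

# Explicit interaction table: (actor kind, verb, target kind) -> outcome text.
# Anything not listed here simply cannot happen.
RULES = {
    ("avatar", "eat", "apple"): "the apple disappears; the avatar feels sated",
    ("avatar", "open", "door"): "the door is now open",
    ("avatar", "greet", "avatar"): "both avatars hear a friendly greeting",
}

@dataclass
class Obj:
    kind: str                                  # e.g. "apple" -- a label, not a physical model
    state: dict = field(default_factory=dict)

@dataclass
class Avatar(Obj):
    kind: str = "avatar"                       # driven from outside the simulation

def step(actor: Avatar, verb: str, target: Obj) -> str:
    """Apply one explicitly coded interaction, or refuse."""
    return RULES.get((actor.kind, verb, target.kind), "nothing happens")

def run(world, get_human_choice, ticks=100):
    """get_human_choice is supplied from OUTSIDE the simulation --
    a stand-in for a real human brain controlling an avatar."""
    for _ in range(ticks):
        actor, verb, target = get_human_choice(world)
        print(step(actor, verb, target))
```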

The FAI-relevant question is: How hard is it to describe a coarse-grained VR utopia that you would agree to live in?

If describing such a utopia is feasible at all, it involves thinking only about human-scale experiences, not physics or tech. So in theory we could hand it off to human philosophers or some other human-based procedure, thus dealing with "complexity of value" without much risk. Then we could launch a powerful AI aimed at rebuilding reality to match it (more concretely, making the world's conscious experiences match a specific coarse-grained VR utopia, without any extra hidden suffering). That's still a very hard task, because it requires solving decision theory and the problem of consciousness, but it seems more manageable than solving friendliness completely. The resulting world would be suboptimal in many ways, e.g. it wouldn't have much room for science or self-modification, but it might be enough to avert AI disaster (!)

I'm not proposing this as a plan for FAI, because we can probably come up with something better. But what do you think of it as a thought experiment? Is it a useful way to split up the problem, separating the complexity of human values from the complexity of non-human nature?


This is a very interesting idea!

Let me try an example to see if I've got it right. Humans think that it is wrong to destroy living things, but okay to destroy non-living things. But in physics the line between living and non-living is blurry. For example a developing embryo goes from non-living to living in a gradual way; hence the abortion debate. The AI is acting on our behalf and so it also wants to preserve life. But this is difficult because "life" doesn't have a clear boundary. So it fixes this problem by ensuring that every object in the simulation is either alive or non-alive. When people in the simulation become pregnant it looks and feels to them as though they have a growing baby inside them, but in fact there is no child's brain outside the simulation. At the moment they give birth the AI very quickly fabricates a child-brain and assigns it control of the simulated baby. This means that if someone decides to terminate their pregnancy then they can be assured that they are not harming a living thing (this is hypothetical because presumably the simulation is utopic enough that abortions are never necessary). After the child is born then it definitely is living and the people in the simulation know that they have to act to protect it.

Is that the right idea?

Yes! I was hoping that the post would provoke ideas like that. It's a playground for thinking about what people want, without distractions like nanotech etc.

How does value scale with brain size? For example, if we had enough atoms to build 10 human-like brains, is it better to do that or to build a giant brain that can have much more complex thoughts and experiences? Are there experiences that we can't imagine today that are many times more valuable than what we can imagine, without even using more atoms? Also consider questions like population ethics and egoism vs altruism, e.g., should we use the rest of the universe to create more lives (if so how many), or to extend/improve the lives of existing people?

For something to count as a utopia for me, it would have to allow me to become smarter, answer these questions, then have the answers make a meaningful difference in the world. Otherwise, it could still be an AI disaster relative to the potential of the universe.

Good points and I agree with pretty much all of them, but for the sake of argument I'll try to write the strongest response I can:

It seems to me that your view of value is a little bit mystical. Our minds can only estimate the value of situations that are close to normal. There's no unique way to extend a messy function from [0,1] to [-100,100]. I know you want to use philosophy to extend the domain, but I don't trust our philosophical abilities to do that, because whatever mechanism created them could only test them on normal situations. We already see different people's philosophies disagreeing much more on abnormal situations than normal ones. If I got an email from an uplifted version of me saying he found an abnormal situation that's really valuable, I wouldn't trust it much, because it's too sensitive to arbitrary choices made during uplifting (even choices made by me).

That's why it makes sense to try to come up with a normal situation that's as good as we can imagine, without looking at abnormal situations too much. (We can push the boundaries of normal and allow some mind modification, but not too much because that invites risk.) That was a big part of the motivation for my post.

If the idea of unmodified human brains living in a coarse-grained VR utopia doesn't appeal to you, I guess a more general version is describing some other kind of nice universe, and using an arbitrary strong AI to run that universe on top of ours as described in the post. Solving population ethics etc. can probably wait until we've escaped immediate disaster. Astronomical waste is a problem, but not an extreme one, because we can use up all computation in the host universe if we want. So the problem comes down to describing a nice universe, which is similar to FAI but easier, because it doesn't require translating preferences from one domain to another (like with the blue-minimizing robot).

Solving population ethics etc. can probably wait until we've escaped immediate disaster.

So there will be some way for people living inside the VR to change the AI's values later, rather than just a fixed utility function encoding whatever philosophical views the people building the AI have? If that's the case (and you've managed to avoid bugs and potential issues like value drift and the AI manipulating people's philosophical reasoning) then I'd be happy with that. But I don't see why it's easier than FAI. Sure, you don't need to figure out how to translate preferences from one domain to another in order to implement it, but then you don't need to do that to implement CEV either. You can let CEV try to figure that out, and if CEV can't, it can do the same thing you're suggesting here: have the FAI implement a VR universe on top of the physical one.

Your idea actually seems harder than CEV in at least one respect, because you have to solve how human-like consciousness relates to underlying physics for arbitrary laws of physics (otherwise, what happens if your AI discovers that the laws of physics are not what we think they are?), which doesn't seem necessary to implement CEV.

The idea that CEV is simpler (because you can "let it figure things out") is new to me! I always felt CEV was very complex and required tons of philosophical progress, much more than solving the problem of consciousness. If you think it requires less, can you sketch the argument?

I think you may have misunderstood my comment. I'm not saying CEV is simpler overall, I'm saying it's not clear to me why your idea is simpler, if you're including the "feature" of allowing people inside the VR to change the AI's values. That seems to introduce problems that are analogous to the kinds of problems that CEV has. Basically you have to design your VR universe to guarantee that people who live inside it will avoid value drift and eventually reach correct conclusions about what their values are. That's where the main difficulty in CEV lies also, at least in my view. What are some pieces of philosophical progress that CEV requires but your idea avoids?

The way I imagined it, people inside the VR wouldn't be able to change the AI's values. Population ethics seems like a problem that people can solve by themselves, negotiating with each other under the VR's rules, without help from AI.

CEV requires extracting all human preferences, extrapolating them, determining coherence, and finding a general way to map them to physics. (We need to either do it ourselves, or teach the AI how to do it, the difference doesn't matter to the argument.) The approach in my post skips most of these tasks, by letting humans describe a nice normal world directly, and requires mapping only one thing (consciousness) to physics. Though I agree with you that the loss of potential utility is huge, the idea is intended as a kind of lower bound.


Yeah. My post is trying to solve an easier problem: how to give many people mostly happy lives, with many dimensions of value protected from superintelligence. Haven't seen too many solutions to that. I very much agree that the harder problem of not wasting the universe's potential is also worth solving, and that it needs a different approach. Your approach of solving metaphilosophy seems like the most promising one to me now.

[This comment is no longer endorsed by its author]

Three places similar ideas have occurred that spring to mind:

FIRST Suarez's pair of novels Daemon and Freedom(tm) are probably the most direct analogue, because it is a story of taking over the world via software, with an intensely practical focus.

The essential point for this discussion here and now is that prior to launching his system, the character who takes over the world tests the quality of the goal state he's aiming at by implementing it first as a real-world MMORPG. Then the takeover of the world proceeds via trigger-response software scripts running on the net, but causing events in the real world via: bribes, booby traps, contracted R&D, and video-game-like social engineering.

The MMORPG start not only functions as his test bed for how he wants the world to work at the end... it also gives him starting cash, a suite of software tools for describing automated responses to human decisions, code to script the tactics of swarms of killer robots, and so on.

SECOND Nozick's Experience Machine thought experiment is remarkably similar to your thought experiment, and yet aimed at a totally different question.

Nozick was not wondering "can such a machine be described in detail and exist" (this was assumed) but rather "would people enter any such machine and thereby give up on some sort of atavistic connection to an unmediated substrate reality, and if not what does this mean about the axiological status of subjective experience as such?"

Personally I find the specifics of the machine to matter an enormous amount to how I feel about it... so much so that Nozick's thought experiment doesn't really work for me in its philosophically intended manner. There has been a lot of play with the concept in fiction that neighbors on the trope where the machine just gives you the experience of leaving the machine if you try to leave it. This is probably some kind of archetypal response to how disgusting it is in practice for people to be pure subjective hedonists?

THIRD Greg Egan's novel Diaspora has most of the human descended people living purely in and as software.

In the novel any common environment simulator and interface (which has hooks into the sensory processes of the software people) is referred to as a "scape" and many of the software people's political positions revolve around which kinds of scapes are better or worse for various reasons.

Konishi Polis produces a lot of mathematicians, and has a scape that supports "gestalt" (like vision) and "linear" (like speech or sound) but it does not support physical contact between avatars (their relative gestalt positions just ghost around and through each other) because physical contact seems sort of metaphysically coercive and unethical to them. By contrast Carter-Zimmerman produces the best physicists, and it has relatively high quality physics simulations built into their scape, because they think that high quality minds with powerful intuitions require that kind of low level physical experience embedded into their everyday cognitive routines. There are also flesh people (who think flesh gives them authenticity or something like that) and robots (who think "fake physics" is fake, even though having flesh bodies is too dangerous) and so on.

All of the choices matter personally to the people... but there is essentially no lock in, in the sense that people are forced to do one thing or another by an overarching controller that settles how things will work for everyone for all time.

If you want to emigrate from Konishi to Carter-Zimmerman you just change which server you're hosted on (for better latency) and either have mind surgery (to retrofit your soul with the necessary reflexes for navigating the new kind of scape) or else turn on a new layer of exoself (that makes your avatar in the new place move according to a translation scheme based on your home scape's equivalent reflexes).

If you want to, you can get a robot body instead (the physical world then becomes like a very very slow scape and you run into the question of whether to slow down your clocks and let all your friends and family race ahead mentally, or keep your clock at a normal speed and have the robot body be like a slow moving sculpture you direct to do new things over subjectively long periods of time). Some people are still implemented in flesh, but if they choose they can get scanned into software and run as a biology emulation. Becoming biologically based is the only transformation rarely performed because... uh... once you've been scanned (or been built from software from scratch) why would you do this?!

Interesting angles:

Suarez assumes physical coercion and exponential growth as the natural order, and is mostly interested in the details of these processes as implemented in real political/economic systems. He doesn't care about 200 years from now, and he uses MMORPG simulations simply as a testbed for practical engineering in intensely human domains.

Nozick wants to assume utopia, and often an objection is "who keeps the Experience Machine from breaking down?"

Egan's novel has cool posthuman world building, but the actual story revolves around the question of keeping the experience machine from breaking down... eventually stars explode or run down... so what should be done in the face of a seemingly inevitable point in time where there will be no good answer to the question of "how can we survive this new situation?"


Of course there are many examples of virtual reality in fiction! The goal of the post is dealing with superintelligence x-risk, by making UFAI build the VR in a particular way that prevents extra suffering and further intelligence explosions. All examples you gave are still vulnerable to superintelligence x-risk, as far as I can tell.

[This comment is no longer endorsed by its author]

Don't humans have to give up on doing their own science then (at least fundamental physics)?

I guess I can have the FAI make me a safe "real physics box" to play with inside the system; something that emulates what it finds out about real physics.

If unfriendly AI is possible, making a safe physics box seems harder than the rest of my proposal :-) I agree that it's a drawback of the proposal though.

Is it a useful way to split up the problem

I think it's a useful way to split the problem of a utopia between the condition level and the ontology level. In a coarse VR you can verify that anything related to the conditions of a utopia is friendly, but there are other values related to the ontology of the world: self-modification, new technology, agency in companions / mates / victims, power of divine agents, etc.
It's not always possible to rescue the utility function.

Interesting! That seems like a stronger claim though. Can you give your best example of a utility function that's hard to rescue?

I'm thinking about the utility of praying to the right divinity. It's hard to connect to something that doesn't exist.

The rare people who want to determine the right divinity will be satisfied with the answer that there's no divinity. Most people, who just want to pray to Yahweh or whatever, can keep doing it in the VR utopia if they like :-)

There is no reason that some people cannot have their VR arranged so that their prayers are always answered. And if you say that it isn't "really" God who is answering them, that requires you to explain exactly what it would mean in the first place. If you succeed in explaining the "real" meaning, it might not be so impossible after all.

There is no reason that some people cannot have their VR arranged so that their prayers are always answered.

Their own VR, yes (though I expect it will do bad things to these people), but a shared VR, no, because some prayers will conflict.


You're asking people to rescue their own utility functions or else. I wouldn't buy a FAI that worked like that.

[This comment is no longer endorsed by its author]

An interesting thought-experiment. But I don't follow this part:

So in theory we could hand it off to human philosophers or some other human-based procedure, thus dealing with "complexity of value" without much risk.

The complexity of value has to do with how the border delineating good outcomes from all possible outcomes cannot be specified in a compact way. Granted, the space of possible polygon arrangements is smaller than the space of possible atom arrangements. That does make the space of possible outcomes relatively more manageable in your VR world. But the space of outcomes is still Vast. It seems Vast enough that the border separating good from bad is still complex beyond our capacity to specify.

It's certainly vast. People can't write the best possible immersive videogame. But they can write a videogame that would be pretty good for most people, and use that as a formal goal for UFAI. My idea isn't any more ambitious than that, it certainly wastes tons of potential utility, I'm just trying to make something that's better than apocalypse.

The way that I choose to evaluate my overall experience is generally through the perception of my own feelings. Therefore, I assume this simulated world will be evaluated in a similar way: I perceive the various occurrences within it and rate them according to my preferences. I assume the AI will receive this information and be able to update the simulated world accordingly. The main difference, then, appears to be that the AI will not have access to my nervous system: if my avatar is all that the AI has access to in this world, it can't wirehead me by simply manipulating my brain however it wants. Likewise it would not have access to its own internal hardware or be able to model it (since that would require knowledge of actual physics). It could in theory interact with buttons and knobs in the simulated world that were connected to its hardware in the real world.

I think this is basically the correct approach, and it actually is being considered by AI researchers (take Paul's recent paper, for example: human yes-or-no feedback on actions in a simulated environment). The main difficulty then becomes domain transfer, when the AI is "released" into the physical world - it now has access to both its own hardware and human "hardware", and I don't see how to predict its actions once it learns these additional facts. I don't think we have much theory for what happens then, but the approach is probably very suitable for narrow AI and for training robots that will eventually take actions in the real world.
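As a concrete illustration of the yes-or-no feedback setup described above (a deliberately toy sketch, not the method from Paul's paper; the action names and the human-judge stand-in are made up):

```python
# Toy illustration of learning from yes-or-no human feedback on actions in a
# simulated environment. Minimal bandit-style sketch only.

import random

ACTIONS = ["wave", "fetch_apple", "clean_room", "poke_user"]

def human_approves(action: str) -> bool:
    """Stand-in for a human judge watching the simulated environment."""
    return action != "poke_user"

def train(steps: int = 1000, epsilon: float = 0.1) -> dict:
    """Estimate per-action approval rates from binary feedback."""
    approvals = {a: 0 for a in ACTIONS}
    tries = {a: 0 for a in ACTIONS}
    for _ in range(steps):
        if random.random() < epsilon:            # occasionally explore
            action = random.choice(ACTIONS)
        else:                                    # otherwise exploit the best estimate so far
            action = max(ACTIONS, key=lambda a: approvals[a] / max(tries[a], 1))
        tries[action] += 1
        approvals[action] += int(human_approves(action))
    return {a: approvals[a] / max(tries[a], 1) for a in ACTIONS}

print(train())  # "poke_user" should end up with a low approval estimate
```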

It does have access to your nervous system since your nervous system can be rewired via backdriving inputs from your perceptions.

WOW I think it's just amazing! "So in theory we could hand it off to human philosophers or some other human-based procedure, thus dealing with "complexity of value" without much risk." *That could simply be handled by producing more complexity than entropy in replication (pdf).

"Then we could launch a powerful AI aimed at rebuilding reality to match it (more concretely, making the world's conscious experiences match a specific coarse-grained VR utopia, without any extra hidden suffering). That's still a very hard task, because it requires solving decision theory and the problem of consciousness, but it seems more manageable than solving friendliness completely." *In a coarse-grained virtual reality where everything is built from polygons and textures consciousness could be quantized trough Stone's representation theorem or Heyting algebra as symbols that represent themselves by way of symmetry.

"The resulting world would be suboptimal in many ways, e.g. it wouldn't have much room for science or self-modification." *It will have room for science as the predictor of thermodynamic cost. Really the trick by which coarse-grained VR utopia might just work is because of a process called "refrigeration by randomizing a bit" and "adiabatic demagnetization".

In a nutshell,

  1. We can measure and predict the thermodynamic costs as a result of producing more complexity (a series of undeveloped modules that may or may not lead to functional specialization).
  2. If the "coarseness" of the VR is obtained by randomizing a bit and using that bit as a refrigeration, well then we can blow the second law of thermodynamics.
  3. Effectiveness of functional specialization in decreasing thermodynamic cost would be the golden rule that will avoid decision making or consciousness.

Screenplay: MISTER ELEGANS

We see a bank of monitors, each showing a small worm or group of worms going about their lives.

Cut to: a long black trenchcoat standing in front of the monitors. The sleeves hang empty, but from the collar protrudes the tapered, featureless head of a man-sized worm.

Cut to: a youngish man in a grey T-shirt seated in a high-backed leather swivel chair.

YOUNGISH MAN: Hello, Caeno. I am the Zuckitect. I created the Growth Matrix.

Today I want to focus on the most important question of all: are we building the world we all want?

The first growth matrix I designed was quite naturally perfect, a supportive, safe, informed, civically-engaged, inclusive community...

Hm. But if you already have the brain-in-a-vat (aka Matrix) scenario where all brains are happy, why bother rebuilding reality? Would that brain-in-a-vat know what reality is, anyway? This looks like semi-wireheading to me...

If everyone agrees to some version of the matrix, I guess the reason to change reality is to protect the matrix and ensure that no huge suffering is happening outside of it. But yeah, the second part of the plan is questionable, I'm more interested in the first, figuring out what kind of matrix would be okay.

Well, you have a continuum from "acceptable to one individual", which is easy, to "acceptable to literally everyone", which is impossible.

You can have an Archipelago scenario where there is a large variety of virtual realities and people are free to move between them (or construct more if none of the existing ones fit).

Yeah, Archipelago is one of the designs that can probably be made acceptable to almost everyone. It would need to be specified in a lot more detail though, down to how physics works (so people don't build dangerous AI all over again). The whole point is to let human values survive the emergence of stronger than human intelligence.

so people don't build dangerous AI all over again

For an entity that has built and runs the simulation (aka God) it should be trivial to enforce limits on power/complexity in its simulation (the Tower of Babel case :-D)

The whole point is to let human values survive

I don't see how that exercise helps. If someone controls your reality, you're powerless.

Well, baseline humans in a world with superhuman intelligences will be powerless almost by definition. So I guess you can only be satisfied by stopping all superintelligences or upgrading all humans. The first seems unrealistic. The second might work and I've thought about it for a long time, but this post is exploring a different scenario where we build a friendly superintelligence to keep others at bay.

where we build a friendly superintelligence

Sure, but "human values survive" only because the FAI maintains them -- and that returns us back to square one of the "how to make FAI have appropriate values" problem.

The post proposed to build an arbitrary general AI with a goal of making all conscious experiences in reality match {unmodified human brains + this coarse-grained VR utopia designed by us}. This plan wastes tons of potential value and requires tons of research, but it seems much simpler than solving FAI. For example, it skips figuring out how all human preferences should be extracted, extrapolated, and mapped to true physics. (It does still require solving consciousness though, and many other things.)

Mostly I intended the plan to serve as a lower bound on the outcome of an intelligence explosion that's better than "everyone dies" but less vague than CEV, because I haven't seen too many such lower bounds before. Of course I'd welcome any better plan.

So, that "arbitrary general AI" is not an agent? It's going to be tool AI? I'm not quite sure how do you envisage it being smart enough to do all that you want it to do (e.g. deal with an angsty teenager: "I want the world to BURN!") and yet have no agency of its own and no system of values.

lower bound for outcome of intelligence explosion

Lower bound in which sense? A point where the intelligence explosion will stop on its own? Or one that humans will be able to enforce? Or what?

The idea is that if the problem of consciousness is solved (which is admittedly a tall order), "make all consciousness in the universe reflect this particular VR utopia with these particular human brains and evolve it faithfully from there" becomes a formalizable goal, akin to paperclips, which you can hand to an unfriendly agent AI. You don't need to solve all the other philosophical problems usually required for FAI. Note that solving the problem of consciousness is a key requirement, you can't just say "simulate these uploaded brains in this utopia forever and nevermind what consciousness means", because that could open the door to huge suffering happening elsewhere (e.g. due to the AI simulating many scenarios). You really need the "all consciousness in the universe" part.

Lower bound means that before writing this post, I didn't know any halfway specific plan for navigating the intelligence explosion that didn't kill everyone. Now I know that we can likely achieve something as good as this, though it isn't very good. It's a lower bound on what's achievable.

Those are not potshots -- at a meta level what's happening is that your picture of this particular piece of the world doesn't quite match my picture and I'm trying to figure out where exactly the mismatch is and is it mostly a terms/definitions problem or there's something substantive there. That involves pointing at pieces which stick out or which look to be holes and asking you questions about them. The point is not to destroy the structure, but to make it coherent in my mind.

That said... :-)

which you can hand to an unfriendly agent AI

Isn't a major point of the Sequences that you can NOT hand anything to a UFAI because it will always find ways to fuck you over? Once you have a UFAI up and running, it's done, your goose is cooked.

But my point was different: before you get to your formalizable goal, you need to have that VR utopia up and running. Something will have to run it, which will include things like preventing some humans from creating virtual hells, waging war on their neighbours, etc. etc. That something will have to be an AI. You're implicitly claiming that this AI will not be an agent (is that so?) and therefore harmless. I am expressing doubt that you can have an AI with sufficient capabilities and have it be harmless at the same time.

As to the intelligence explosion, are you saying that its result will be the non-agent AI that handles the VR utopia? Most scenarios, I think, assume that the explosion itself will be uncontrolled: you create a seed and that seed recursively self-improves to become a god. Under these assumptions there is no lower bound.

And if there's not going to be recursive self-improvement, it's no longer an explosion -- the growth is likely to be slow, finicky, and incremental.

Isn't a major point of the Sequences that you can NOT hand anything to a UFAI because it will always find ways to fuck you over? Once you have a UFAI up and running, it's done, your goose is cooked.

The sequences don't say an AI will always fail to optimize a formal goal. The problem is more one of mismatch between the formal goal and what humans want. My idea tries to make that mismatch small, by making the goal say directly which conscious experiences should exist in the universe (isomorphic to a given set of unmodified human brains experiencing a given VR setup, sometimes creating new brains according to given rules, all of which was defined without AI involvement). Then we're okay with recursive self-improvement and all sorts of destruction in the pursuit of that goal. It can eat the whole universe if it likes.

Something will have to run it which will include things like preventing some humans from creating virtual hells, waging war on neighbours, etc. etc. That something will have to be an AI.

My idea was to make humans set all the rules, while defining the VR utopia, before giving it to the UFAI. It'd be like writing a video game. It seems possible to write a video game that doesn't let people create hells (including virtual hells, because the VR can be very coarse-grained). Similarly for the problem of pain: just give people some control buttons that other people can't take away (see the sketch below). I think hand-coding a toy universe that feels livable long term and has no sharp tools is well within mankind's ability.
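A toy sketch of what a "control button other people can't take away" could look like at the level of the VR's rule table (all names are hypothetical; the point is only that the engine, not anything inside the world, enforces ownership):

```python
# Toy sketch of "control buttons other people can't take away": each player's
# comfort settings can only ever be written by that player, and the check is
# enforced by the VR engine itself, not by in-world actions.

class ComfortSettings:
    def __init__(self, owner_id: str):
        self.owner_id = owner_id
        self._settings = {"pain_enabled": False, "block_list": set()}

    def set(self, requester_id: str, key: str, value) -> None:
        # No in-world action can bypass this check, because the rule table
        # is the lowest level of reality in the coarse-grained VR.
        if requester_id != self.owner_id:
            raise PermissionError("only the owner can change these settings")
        self._settings[key] = value

    def get(self, key: str):
        return self._settings[key]

alice = ComfortSettings("alice")
alice.set("alice", "pain_enabled", False)        # fine: owner request
try:
    alice.set("mallory", "pain_enabled", True)   # refused by the engine
except PermissionError as err:
    print(err)
```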

My idea tries to make that mismatch small, by making the goal say directly which conscious experiences should exist

You think you can formalize a goal which specifies which conscious experiences should exist? It looks to me to be equivalent to formalizing the human value system. And being isomorphic to a "set of unmodified human brains" just gives you the whole of humanity as it is: some people's fantasies involve rainbows and unicorns, and some -- pain and domination. There are people who do want hells, virtual or not -- so either you will have them in your utopia or you will have to filter such desires out, and that involves a value system to decide what's acceptable in the utopia and what's not.

My idea was to make humans set all the rules, while defining the VR utopia, before even starting the AI.

That's called politics and is equivalent to setting the rules for real-life societies on real-life Earth. I don't see why you would expect it to go noticeably better this time around -- you're still deciding on rules for reality, just with a detour through VR. And how would that work in practice? A UN committee or something? How will disagreements be resolved?

just give people some control buttons that other people can't take away

To take a trivial example, consider internet harassment. Everyone has a control button that online trolls cannot take away: the off switch on your computer (or even the little X in the top corner of your window). You think it works that well?

You think you can formalize a goal which specifies which conscious experiences should exist? It looks to me to be equivalent to formalizing the human value system.

The hope is that encoding the idea of consciousness will be strictly easier than encoding everything that humans value, including the idea of consciousness (and pleasure, pain, love, population ethics, etc). It's an assumption of the post.

That's called politics and is equivalent to setting the rules for real-life societies on real-life Earth.

Correct. My idea doesn't aim to solve all human problems forever. It aims to solve the problem that right now we're sitting on a powder keg, with many ways for smarter than human intelligences to emerge, most of which kill everyone. Once we've resolved that danger, we can take our time to solve things like politics, internet harassment, or reconciling people's fantasies.

I agree that defining the VR is itself a political problem, though. Maybe we should do it with a UN committee! It's a human-scale decision, and even if we get it wrong and a bunch of people suffer, that might still be preferable to killing everyone.

Once we've resolved that danger, we can take our time to solve things

I don't know -- I think that once you hand off the formalized goal to the UFAI, you're stuck: you snapshotted the desired state and you can't change anything any more. If you can change things, well, that UFAI will make sure things will get changed in the direction it wants.

I think it should be possible to define a game that gives people tools to peacefully resolve disagreements, without giving them tools for intelligence explosion. The two don't seem obviously connected.

So then, basically, the core of your idea is to move all humans to a controlled reality (first VR, then physical) where an intelligence explosion is impossible? It's not really supposed to solve any problems, just prevent the expected self-destruction?

Yeah. At quite high cost, too. Like I said, it's intended as a lower bound of what's achievable, and I wouldn't have posted it if any better lower bound was known.

Like, "Please, create a new higher bar that we can expect a truly super-intelligent being to be able to exceed."?


A bar like "whew, now we can achieve an outcome at least this good, instead of killing everyone. Let's think how we can do better."

[This comment is no longer endorsed by its author]

How hard is it to describe a coarse-grained VR utopia that you would agree to live in?

Not hard, but it is still hard to describe one that everyone would agree to live in.

Agreed. We could put people into separate utopias, but many of them would have objections to each other's utopias, so it's not a simple problem. It needs to be analyzed and decomposed further.