Discussion: Which futures are good enough?

WrongBot

Thirty years from now, a well-meaning team of scientists in a basement creates a superintelligent AI with a carefully hand-coded utility function. Two days later, every human being on earth is seamlessly scanned, uploaded and placed into a realistic simulation of their old life, such that no one is aware that anything has changed. Further, the AI had so much memory and processing power to spare that it gave every single living human being their own separate simulation.

Each person lives an extremely long and happy life in their simulation, making what they perceive to be meaningful accomplishments. For those who are interested in acquiring scientific knowledge and learning the nature of the universe, the simulation is accurate enough that everything they learn and discover is true of the real world. Every other pursuit, occupation, and pastime is equally fulfilling. People create great art, find love that lasts for centuries, and create worlds without want. Every single human being lives a genuinely excellent life, awesome in every way. (Unless you mind being simulated, in which case at least you'll never know.)

I offer this particular scenario because it seems conceivable that with no possible competition between people, it would be possible to avoid doing interpersonal utility comparison, which could make Mostly Friendly AI (MFAI) easier. I don't think this is likely or even worthy of serious consideration, but it might make some of the discussion questions easier to swallow.

1. Value is fragile. But is Eliezer right in thinking that if we get just one piece wrong the whole endeavor is worthless? (Edit: Thanks to Lukeprog for pointing out that this question completely misrepresents EY's position. Error deliberately preserved for educational purposes.)

2. Is the above scenario better or worse than the destruction of all earth-originating intelligence? (This is the same as question 1.)

3. Are there other values (besides affecting-the-real-world) that you would be willing to trade off?

4. Are there other values that, if we traded them off, might make MFAI much easier?

5. If the answers to 3 and 4 overlap, how do we decide which direction to pursue?

In terms of directions to pursue, it seems like the first thing you want to do is make sure the AI is essentially transparent and that we don't have much of an inferential gap with it. Otherwise, when we ask it for a solution to a values-and-tradeoffs problem, we may not get anywhere near what we want.

In essence, the AI should be able to look at all the problems facing Earth and say something like "I'm 97% sure our top priority is to build asteroid deflectors, based on these papers, calculations, and projections. The proposed plan of earthquake stabilizers is only 2% likely to be the best course of action, based on these other papers, calculations, and projections." If it doesn't take that kind of approach, there seem to be many ways that things can go horribly wrong.

Examples:

A: If the AI can build robotic earthquake stabilizers at essentially no cost and prevent children from being killed in earthquakes, or it can simulate everyone and have our simulations have that experience at essentially no cost, the AI should probably be aware that these are different things. Otherwise we say "Yes, build those earthquake stabilizers," it uploads everyone, and we say "That isn't what I meant!"

B: And the AI should definitely provide some kind of information about proposed plans and alternatives. If we say "Earthquake stabilizers save the most children, build those!" and the AI is aware that "Actually, asteroid deflectors save ten times more children," it shouldn't just go "Oh well, they SAID earthquake stabilizers, I'm not even going to mention the deflectors."

C: Or maybe: "I thought killing all children was the best way to stop children from suffering, and that this was trivially obvious, so of course you wanted me to make a child-killer plague. When you said 'Reduce children's suffering,' I made one and released it without telling you."

D: Or it could simulate everyone and say "Well, they never said to keep the simulation running after I simulated everyone, so time to shut down all simulations and save power for their next request."

Once you've got that settled, you can attempt to have the AI do other things, like assess anti-earthquake measures, asteroid deflection, and uploading, because you'll actually be able to ask it "Which of these are the right things to do, and why, based on these values and these value tradeoffs?" and get an answer that makes sense. You may not like or expect the answer, but at least you should be able to understand it given time.

For instance, going back to the sample problem, I don't mind that simulation that much, but only because I am assuming it works as advertised. If it has a problem like D that I didn't realize and the AI didn't think noteworthy, that's a problem. Also, for all I know, there is an even better possible life that the AI was aware of and didn't think to suggest, as in B.

Given a sufficiently clear AI, I'd imagine it could explain things well enough that there wouldn't even be a question of which values to trade off, because the solution would be clear. But for all I know, it might come up with "Well, about half of you want to live in a simulated utopia, and about half of you want to live in a real utopia, and this is unresolvable to me because of these factors unless you solve this value tradeoff problem."

It would still, however, have collected together all the reasons WHY it couldn't solve that value tradeoff problem, which would be a handy thing to have, since I don't have that right now.

Edit: Eek, I did not realize the "#" sign bolded things, extra bolds removed.
