In terms of directions to pursue, it seems like the first thing you want to do is make sure the AI is essentially transparent and that we don't have much of an inferential gap with it. Otherwise, when we ask it for a solution in terms of values and tradeoffs, we may not get anywhere near what we want.
In essence, the AI should be able to look at all the problems facing Earth and say something like "I'm 97% sure our top priority is to build asteroid deflectors, based on these papers, calculations, and projections. The proposed plan of earthquake stabilizers is only 2% likely to be the best course of action, based on these other papers, calculations, and projections." If it doesn't have that kind of approach, there seem to be many ways that things can go horribly wrong.
Examples:
A: If the AI can build robotic earthquake stabilizers at essentially no cost and prevent children from being killed in earthquakes, or it can simulate everyone and have our simulations have that experience at essentially no cost, the AI should probably be aware that these are different things. Otherwise we say "Yes, build those earthquake stabilizers," it uploads everyone, and we say "That isn't what I meant!"
B: And the AI should definitely provide some kind of information about proposed plans and alternatives. If we say "Earthquake stabilizers save the most children, build those!" and the AI is aware that asteroid deflectors actually save ten times more children, it shouldn't just go "Oh well, they SAID earthquake stabilizers, I'm not even going to mention the deflectors."
C: Or maybe: "I thought killing all children was the best way to stop children from suffering, and that this was trivially obvious, so of course you wanted me to make a child-killer plague. I did so and released it without telling you when you said 'Reduce children's suffering.'"
D: Or it could simulate everyone and say "Well, they never said to keep the simulation running after I simulated everyone, so time to shut down all simulations and save power for their next request."
Once you've got that settled, you can attempt to have the AI do other things, like assess Anti-Earthquake/Asteroid Deflection/Uploading, because you'll actually be able to ask it "Which of these are the right things to do and why based on these values and these value tradeoffs?" and get an answer which makes sense. You may not like or expect the answer, but at least you should be able to understand it given time.
For instance, going back to the sample problem, I don't mind that simulation that much, but I don't mind it because I am assuming it works as advertised. If it has a problem like D, and I just didn't realize that and the AI didn't think it noteworthy, that's a problem. Also, for all I know, there is an even better proposed life that the AI was aware of and didn't think to even suggest, as in B.
Given a sufficiently clear AI, I'd imagine it could explain things to me well enough that there wouldn't even be a question of which values to trade off, because the solution would be clear. But for all I know, it might come up with "Well, about half of you want to live in a simulated utopia, and about half of you want to live in a real utopia, and this is unresolvable to me because of these factors unless you solve this value tradeoff problem."
It would still, however, have collected together all the reasons that explained WHY it couldn't solve that value tradeoff problem, which would be a handy thing to have anyway, since I don't have that right now.
Edit: Eek, I did not realize the "#" sign bolded things, extra bolds removed.
I offer this particular scenario because it seems conceivable that with no possible competition between people, it would be possible to avoid doing interpersonal utility comparison, which could make Mostly Friendly AI (MFAI) easier. I don't think this is likely or even worthy of serious consideration, but it might make some of the discussion questions easier to swallow.
1. Value is fragile. But is Eliezer right in thinking that if we get just one piece wrong the whole endeavor is worthless? (Edit: Thanks to Lukeprog for pointing out that this question completely misrepresents EY's position. Error deliberately preserved for educational purposes.)
2. Is the above scenario better or worse than the destruction of all earth-originating intelligence? (This is the same as question 1.)
3. Are there other values (besides affecting-the-real-world) that you would be willing to trade off?
4. Are there other values that, if we traded them off, might make MFAI much easier?
5. If the answers to 3 and 4 overlap, how do we decide which direction to pursue?