The idea is that if the problem of consciousness is solved (which is admittedly a tall order), "make all consciousness in the universe reflect this particular VR utopia with these particular human brains and evolve it faithfully from there" becomes a formalizable goal, akin to paperclips, which you can hand to an unfriendly agent AI. You don't need to solve all the other philosophical problems usually required for FAI. Note that solving the problem of consciousness is a key requirement. You can't just say "simulate these uploaded brains in this utopia forever and never mind what consciousness means", because that could open the door to huge suffering happening elsewhere (e.g. due to the AI simulating many scenarios). You really need the "all consciousness in the universe" part.
Lower bound means that before writing this post, I didn't know any halfway specific plan for navigating the intelligence explosion that didn't kill everyone. Now I know that we can likely achieve something as good as this, though it isn't very good. It's a lower bound on what's achievable.
Those are not potshots -- at a meta level, what's happening is that your picture of this particular piece of the world doesn't quite match my picture, and I'm trying to figure out where exactly the mismatch is and whether it's mostly a terms/definitions problem or there's something substantive there. That involves pointing at pieces which stick out or which look to be holes and asking you questions about them. The point is not to destroy the structure, but to make it coherent in my mind.
That said... :-)
which you can hand to an unfriendly agent AI
Isn't a major p...
I think I've come up with a fun thought experiment about friendly AI. It's pretty obvious in retrospect, but I haven't seen it posted before.
When thinking about what friendly AI should do, one big source of difficulty is that the inputs are supposed to be human intuitions, based on our coarse-grained and confused world models, while the AI's actions are supposed to be fine-grained actions based on the true nature of the universe, which can turn out very weird. That leads to a messy problem of translating preferences from one domain to another, which crops up everywhere in FAI thinking; Wei's comment and Eliezer's writeup are good places to start.
What I just realized is that you can handwave the problem away by imagining a universe whose true nature agrees with human intuitions by fiat. Think of it as a coarse-grained virtual reality where everything is built from polygons and textures instead of atoms, and all interactions between objects are explicitly coded. It would contain player avatars, controlled by ordinary human brains sitting outside the simulation (so the simulation doesn't even need to support thought).
The FAI-relevant question is: How hard is it to describe a coarse-grained VR utopia that you would agree to live in?
If describing such a utopia is feasible at all, it involves thinking about only human-scale experiences, not physics or tech. So in theory we could hand it off to human philosophers or some other human-based procedure, thus dealing with "complexity of value" without much risk. Then we could launch a powerful AI aimed at rebuilding reality to match it (more concretely, making the world's conscious experiences match a specific coarse-grained VR utopia, without any extra hidden suffering). That's still a very hard task, because it requires solving decision theory and the problem of consciousness, but it seems more manageable than solving friendliness completely. The resulting world would be suboptimal in many ways, e.g. it wouldn't have much room for science or self-modification, but it might be enough to avert AI disaster (!)
I'm not proposing this as a plan for FAI, because we can probably come up with something better. But what do you think of it as a thought experiment? Is it a useful way to split up the problem, separating the complexity of human values from the complexity of non-human nature?