I think I've come up with a fun thought experiment about friendly AI. It's pretty obvious in retrospect, but I haven't seen it posted before.
When thinking about what friendly AI should do, one big source of difficulty is that the inputs are supposed to be human intuitions, based on our coarse-grained and confused world models, while the AI's actions are supposed to be fine-grained actions based on the true nature of the universe, which can turn out very weird. That leads to a messy problem of translating preferences from one domain to another, which crops up everywhere in FAI thinking; Wei's comment and Eliezer's writeup are good places to start.
What I just realized is that you can handwave the problem away by imagining a universe whose true nature agrees with human intuitions by fiat. Think of it as a coarse-grained virtual reality where everything is built from polygons and textures instead of atoms, and all interactions between objects are explicitly coded. It would contain player avatars, controlled by ordinary human brains sitting outside the simulation (so the simulation doesn't even need to support thought).
The FAI-relevant question is: How hard is it to describe a coarse-grained VR utopia that you would agree to live in?
If describing such a utopia is feasible at all, it involves thinking only about human-scale experiences, not physics or tech. So in theory we could hand it off to human philosophers or some other human-based procedure, thus dealing with "complexity of value" without much risk. Then we could launch a powerful AI aimed at rebuilding reality to match it (more concretely, making the world's conscious experiences match a specific coarse-grained VR utopia, without any extra hidden suffering). That's still a very hard task, because it requires solving decision theory and the problem of consciousness, but it seems more manageable than solving friendliness completely. The resulting world would be suboptimal in many ways, e.g. it wouldn't have much room for science or self-modification, but it might be enough to avert AI disaster (!)
I'm not proposing this as a plan for FAI, because we can probably come up with something better. But what do you think of it as a thought experiment? Is it a useful way to split up the problem, separating the complexity of human values from the complexity of non-human nature?
Good points and I agree with pretty much all of them, but for the sake of argument I'll try to write the strongest response I can:
It seems to me that your view of value is a little bit mystical. Our minds can only estimate the value of situations that are close to normal. There's no unique way to extend a messy function from [0,1] to [-100,100]. I know you want to use philosophy to extend the domain, but I don't trust our philosophical abilities to do that, because whatever mechanism created them could only test them on normal situations. We already see different people's philosophies disagreeing much more on abnormal situations than on normal ones. If I got an email from an uplifted version of me saying he found an abnormal situation that's really valuable, I wouldn't trust it much, because it's too sensitive to arbitrary choices made during uplifting (even choices made by me).
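As a toy sketch of that non-uniqueness (a made-up Python example with made-up numbers, nothing to do with the uplifting scenario specifically): fit two different curves to the same noisy observations on [0,1] and they track the data about equally well, yet give wildly different answers once you ask about -100 or 100.

```python
# Toy sketch: two "value functions" fit the same data on the normal
# range [0, 1], but extrapolate to completely different values on the
# abnormal range [-100, 100]. The data alone doesn't pick the extension.
import numpy as np

rng = np.random.default_rng(0)
x_normal = np.linspace(0.0, 1.0, 20)
value = np.sin(3 * x_normal) + rng.normal(scale=0.01, size=x_normal.size)

# Two fits of different flexibility to the same noisy observations.
fit_a = np.polynomial.Polynomial.fit(x_normal, value, deg=3)
fit_b = np.polynomial.Polynomial.fit(x_normal, value, deg=9)

# On the normal range the two fits roughly agree...
print(np.max(np.abs(fit_a(x_normal) - fit_b(x_normal))))  # small compared to what follows

# ...but on abnormal inputs they diverge by many orders of magnitude.
x_abnormal = np.array([-100.0, 100.0])
print(fit_a(x_abnormal))
print(fit_b(x_abnormal))
```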
That's why it makes sense to try to come up with a normal situation that's as good as we can imagine, without looking at abnormal situations too much. (We can push the boundaries of normal and allow some mind modification, but not too much because that invites risk.) That was a big part of the motivation for my post.
If the idea of unmodified human brains living in a coarse-grained VR utopia doesn't appeal to you, I guess a more general version is describing some other kind of nice universe, and using an arbitrary strong AI to run that universe on top of ours as described in the post. Solving population ethics etc. can probably wait until we've escaped immediate disaster. Astronomical waste is a problem, but not an extreme one, because we can use up all computation in the host universe if we want. So the problem comes down to describing a nice universe, which is similar to FAI but easier, because it doesn't require translating preferences from one domain to another (like with the blue-minimizing robot).
So there will be some way for people living inside the VR to change the AI's values later; it won't just be a fixed utility function encoding whatever philosophical views the people building the AI have? If that's the case (and you've managed to avoid bugs and potential issues like value drift and the AI manipulating the people's philosophical reasoning), then I'd be happy with that. But I don't see why it's easier than FAI. Sure, you don't need to figure out how to translate preferences from one domain to another...