A recent post on my blog may be of interest to LW. It is a high-level discussion of what precisely defined value extrapolation might look like. I wrote most of the essay while visiting FHI.
The basic idea is that we can define extrapolated values by just taking an emulation of a human, putting it in a hypothetical environment with access to powerful resources, and then adopting whatever values it eventually decides on. You might want some philosophical insight before launching into such a definition, but since we are currently laboring under the threat of catastrophe, it seems that there is virtue in spending our effort on avoiding death and delegating whatever philosophical work we can to someone on a more relaxed schedule.
You wouldn't want to run an AI with the values I lay out, but at least they are pinned down precisely. We can articulate objections relatively concretely, and hopefully begin to understand and address the difficulties.
(Posted at the request of cousin_it.)
(Also, "utility function" might be confusing, especially for outsiders who are used to it meaning a mapping from world states to utility values, whereas Paul uses it to mean a parameterless computation that returns a utility value.)
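To make the two senses of "utility function" concrete, here is a minimal Python sketch of the difference in type. Everything in it is an illustrative assumption rather than anything from Paul's post: `WorldState`, `paperclip_utility`, and `make_closed` are hypothetical names, and fixing the argument with a closure is only one way to show the contrast, not a claim about how the post constructs its definition.

```python
from typing import Callable, Dict

# Hypothetical stand-in for a description of a world state.
WorldState = Dict[str, float]

# Conventional sense: a mapping from world states to utility values.
StateUtility = Callable[[WorldState], float]

# Sense described in the comment above: a computation that takes
# no arguments and returns a single utility value.
ClosedUtility = Callable[[], float]


def paperclip_utility(state: WorldState) -> float:
    """Toy state-based utility: reads one illustrative quantity."""
    return state.get("paperclips", 0.0)


def make_closed(u: StateUtility, state: WorldState) -> ClosedUtility:
    """Fix the argument of u, yielding a parameterless computation."""
    def closed() -> float:
        return u(state)
    return closed


# A parameterless computation: calling it takes no world state.
closed_u = make_closed(paperclip_utility, {"paperclips": 3.0})
```

Here `paperclip_utility` needs a world state to be evaluated, while `closed_u` is called with no arguments and simply returns a number. That is the distinction in signature the comment is pointing at.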
I think Paul is thinking that the utility definition that the simulated humans come up with is not necessarily a definition of our actual values, but just something that causes the outer AGI to self-modify into an FAI, and for that purpose it might be enough to define it using a programming language.
I think Paul's intuition here is that the simulated humans (or enhanced humans and/or FAIs they build inside the simulation) may find it useful to "blur the lines". In other words, the distinction you draw is not a fundamental one but just a safety heuristic that the simulated researchers may decide to discard or modify once they become "powerful enough". For example, once they understand enough theory to do so safely, they may decide to partially simulate the outer AGI, or otherwise try to reason about what it might do given the various definitions of U' that the simulation might ultimately decide upon.