A recent post at my blog may be of interest to LW. It is a high-level discussion of what a precisely defined value extrapolation procedure might look like. I wrote most of the essay while visiting FHI.
The basic idea is that we can define extrapolated values by taking an emulation of a human, placing it in a hypothetical environment with access to powerful resources, and then adopting whatever values it eventually settles on. You might want some philosophical insight before committing to such a definition, but since we are currently laboring under the threat of catastrophe, there seems to be virtue in spending our effort on avoiding death and delegating whatever philosophical work we can to someone on a more relaxed schedule.
You wouldn't want to run an AI with the values I lay out, but at least they are pinned down precisely. We can articulate objections relatively concretely, and hopefully begin to understand and address the difficulties.
(Posted at the request of cousin_it.)
The outer AGI doesn't control the initial program if the initial program doesn't listen to the outer AGI. It's a kind of reverse AI box problem: the program that the AGI runs shouldn't let the AGI in. This argues that the initial program should take no input and output its result blindly. The requirement that it not run the outer AGI internally is then the same kind of AI safety consideration as the requirement that it not run any other UFAI internally, so it doesn't seem to pose an additional problem.
Of course, once you are powerful enough, you let the AGI in (or you define a utility function that invokes the AI, which is really no different), because this is how you control it.