The general "goal" of this system is to ensure that the world is controlled by the decisions of the program produced by the initial program, so simulating the initial program and yielding control to its output are subgoals of that.
My proposal seems like the default way to try to implement that. But I definitely agree that it's reasonable to think more about this aspect of the problem.
I think it's useful to separate the problem of pointing the external AGI to the output of a specific program from the problem of arranging the structure of the initial program so that it produces a desirable output. The structure of the initial program shouldn't be overengineered, since its role is to perform basic philosophical research that we don't understand how to do. The focus there should be mainly on safeguards that promote desirable research dynamics (and prevent UFAI risks inside the initial program).
On the other hand, the way in which AGI us...
A recent post on my blog may be of interest to LW. It is a high-level discussion of what precisely defined value extrapolation might look like. I wrote most of the essay while a visitor at FHI.
The basic idea is that we can define extrapolated values by just taking an emulation of a human, putting it in a hypothetical environment with access to powerful resources, and then adopting whatever values it eventually decides on. You might want some philosophical insight before launching into such a definition, but since we are currently laboring under the threat of catastrophe, it seems that there is virtue in spending our effort on avoiding death and delegating whatever philosophical work we can to someone on a more relaxed schedule.
You wouldn't want to run an AI with the values I lay out, but at least they are pinned down precisely. We can articulate objections relatively concretely, and hopefully begin to understand and address the difficulties.
(Posted at the request of cousin_it.)