A simplified version of this proposal that applies more generally is to implement a WBE-based FAI project using AGI, before normal WBE becomes available. This way, you only need to figure out how to build a "yield control to this-here program as it would be after N cycles" AGI, and the rest of the FAI project design can be left to the initial WBE-ed team. This would possibly have a side effect of initially destroying the rest of the human world, since the AGI won't be guided by our values before the N cycles of internal simulation complete (it would care about simulating the internal environment, not about saving external human lives; it might turn out to be FAI-complete to make it safe throughout), but if that internal environment can be guaranteed to be given control afterwards (when a FAI project inside it is complete), then eventually this is a plausible winning plan.
The "yield control" AGI seems problematic to define, much less to imagine existing. Do you think it is plausible?
This is certainly the line of thought that led me here, however.
A recent post at my blog may be interesting to LW. It is a high-level discussion of what precisely defined value extrapolation might look like. I mostly wrote the essay while a visitor at FHI.
The basic idea is that we can define extrapolated values by just taking an emulation of a human, putting it in a hypothetical environment with access to powerful resources, and then adopting whatever values it eventually decides on. You might want some philosophical insight before launching into such a definition, but since we are currently laboring under the threat of catastrophe, it seems that there is virtue in spending our effort on avoiding death and delegating whatever philosophical work we can to someone on a more relaxed schedule.
You wouldn't want to run an AI with the values I lay out, but at least it is pinned down precisely. We can articulate objections relatively concretely, and hopefully begin to understand/address the difficulties.
(Posted at the request of cousin_it.)