A recent post at my blog may be of interest to LW. It is a high-level discussion of what a precisely defined value extrapolation might look like. I wrote most of the essay while a visitor at FHI.
The basic idea is that we can define extrapolated values by just taking an emulation of a human, putting it in a hypothetical environment with access to powerful resources, and then adopting whatever values it eventually decides on. You might want some philosophical insight before launching into such a definition, but since we are currently laboring under the threat of catastrophe, it seems that there is virtue in spending our effort on avoiding death and delegating whatever philosophical work we can to someone on a more relaxed schedule.
You wouldn't want to run an AI with the values I lay out, but at least the definition is pinned down precisely. We can articulate objections relatively concretely, and hopefully begin to understand and address the difficulties.
(Posted at the request of cousin_it.)
I'm slightly worried that even formally specifying an "idealized and unbounded computer" will turn out to be Oracle-AI-complete. We don't need to worry about it converting something valuable into computronium, but we do need to ensure that it interacts with the simulated human(s) in a friendly way: it shouldn't modify the human to simplify the process of explaining something, the simulated human needs to be able to control what kinds of minds the computer creates in the course of its thinking (we may not care, but the human would), and the computer certainly shouldn't hack its way out of the hypothetical by virtue of being thought about by the FAI.
We are trying to formally specify the input-output behavior of an idealized computer running some simple program. The mathematical definition of a Turing machine with an input tape would suffice, as would a formal specification of a version of Python running with unlimited memory.
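To make concrete what kind of formal object is being pointed at, here is a minimal sketch of a single-tape Turing machine simulator in Python. The transition-table encoding and the toy example (a unary incrementer) are my own illustrative choices, not anything from the essay; the point is only that the input-output behavior of such a machine is mathematically pinned down.

```python
# A minimal single-tape Turing machine simulator (illustrative sketch only).
# The transition table maps (state, symbol) -> (new_symbol, move, new_state).

def run_tm(transitions, tape, start_state="q0", accept_state="halt", max_steps=10_000):
    tape = dict(enumerate(tape))   # sparse tape: position -> symbol
    head, state = 0, start_state
    for _ in range(max_steps):
        if state == accept_state:
            break
        symbol = tape.get(head, "_")          # "_" is the blank symbol
        new_symbol, move, state = transitions[(state, symbol)]
        tape[head] = new_symbol
        head += 1 if move == "R" else -1
    # Read off the tape contents in order of position.
    return "".join(tape[i] for i in sorted(tape)).strip("_")

# Hypothetical toy machine: append a single "1" to a unary string.
incrementer = {
    ("q0", "1"): ("1", "R", "q0"),   # scan right over the existing 1s
    ("q0", "_"): ("1", "R", "halt"), # write one more 1 at the end, then halt
}

print(run_tm(incrementer, "111"))  # -> "1111"
```

An actual definition would of course use an unbounded step count and an unbounded tape; the cap here only keeps the sketch runnable.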