A recent post on my blog may be of interest to LW. It is a high-level discussion of what a precisely defined version of value extrapolation might look like. I wrote most of the essay while a visitor at FHI.
The basic idea is that we can define extrapolated values by just taking an emulation of a human, putting it in a hypothetical environment with access to powerful resources, and then adopting whatever values it eventually decides on. You might want some philosophical insight before launching into such a definition, but since we are currently laboring under the threat of catastrophe, it seems that there is virtue in spending our effort on avoiding death and delegating whatever philosophical work we can to someone on a more relaxed schedule.
You wouldn't want to run an AI with the values I lay out, but at least the definition is pinned down precisely. We can articulate objections relatively concretely, and hopefully begin to understand and address the difficulties.
(Posted at the request of cousin_it.)
Cool! Thanks to you, we finally seem to have a viable attack on the problem of FAI, by defining goals in terms of hypothetical processes that could output a goal specification, like brain emulations with powerful computers. Everyone please help advance this direction of inquiry :-)
One potential worry is that the human subject must be above some minimal threshold of intelligence for this scheme to work. A village fool would fail. How do I convince myself that the threshold is below the "reasonably intelligent human" level?
(Note that the hypothetical process probably doesn't even output a goal specification; it just outputs a number, which the AI tries to control.)
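One way to make that note concrete (a rough sketch in my own notation; H and u_H are labels I'm introducing here, not anything pinned down in the post): let H be the mathematically specified hypothetical process (the emulation together with its powerful resources), and let u_H(w) be the number H outputs when asked to evaluate an outcome w. The AI is then simply an expected-u_H maximizer:

% Rough sketch, my notation rather than the post's:
% H  = the hypothetically defined process (emulation + powerful resources)
% u_H(w) = the number H would output when evaluating outcome w
a^{*} \;=\; \operatorname*{arg\,max}_{a} \; \mathbb{E}\bigl[\, u_H(W) \mid \text{the AI takes action } a \,\bigr]

Nothing in this sketch requires H to ever actually be run; all of the work is pushed into reasoning about what H would output.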
The hope is something like: "We can reason about the outputs of this process, so an AI as smart as us can reason about the outputs of this process (perhaps...