Formalizing Value Extrapolation

paulfchristiano

A recent post at my blog may be interesting to LW. It is a high-level discussion of what precisely defined value extrapolation might look like. I wrote most of the essay while visiting FHI.

The basic idea is that we can define extrapolated values by just taking an emulation of a human, putting it in a hypothetical environment with access to powerful resources, and then adopting whatever values it eventually decides on. You might want some philosophical insight before launching into such a definition, but since we are currently laboring under the threat of catastrophe, it seems that there is virtue in spending our effort on avoiding death and delegating whatever philosophical work we can to someone on a more relaxed schedule.

You wouldn't want to run an AI with the values I lay out, but at least the definition is pinned down precisely. We can articulate objections relatively concretely, and hopefully begin to understand and address the difficulties.

(Posted at the request of cousin_it.)

You kill the shell not when it is in an infinite loop, but when it takes more than a few seconds to run. We can set up a similar safety net that lets the human run anything that takes less than (say) a million years, without risk of crashing. This is the sort of thing I was referring to by "some care."
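Here is a minimal Python sketch of such a safety net. It assumes, as a modeling choice not taken from the post, that a subroutine is a generator that yields once per step of work. The names `run_with_budget`, `count_to`, and `loop_forever` are hypothetical. The runner never tries to decide whether a computation halts. It simply abandons anything that exceeds `max_steps` and hands control back to the caller.

```python
from typing import Generator, Optional, Tuple

# Hypothetical model: a subroutine is a generator that yields once per step
# of work and delivers its result through StopIteration.value.
Subroutine = Generator[None, None, object]

def run_with_budget(sub: Subroutine, max_steps: int) -> Tuple[bool, Optional[object], int]:
    """Run `sub` for at most `max_steps` steps; return (finished, result, steps_used)."""
    steps = 0
    while steps < max_steps:
        try:
            next(sub)  # perform one step of the subroutine
        except StopIteration as done:
            return True, done.value, steps
        steps += 1
    sub.close()  # budget exhausted: abandon it without deciding whether it would halt
    return False, None, steps

def count_to(n: int) -> Subroutine:
    # Terminates after n steps, returning 0 + 1 + ... + (n - 1).
    total = 0
    for i in range(n):
        total += i
        yield
    return total

def loop_forever() -> Subroutine:
    # Never terminates; the runner cuts it off at the budget.
    while True:
        yield
```

With this sketch, `run_with_budget(count_to(10), 100)` finishes with `(True, 45, 10)`. `run_with_budget(loop_forever(), 1000)` returns `(False, None, 1000)` without hanging.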

Ultimately we do want the human to be able to run arbitrarily expensive subroutines, which prohibits using any heuristic of the form "stop this computation if it goes on for more than N steps."

What if we keep this heuristic but also define T to have an instruction that is equivalent to calling a halting-problem oracle (with each call counting as one step)? Of course that makes it harder for the outer AGI to reason about how to maximize its utility, but the increase in difficulty doesn't seem very large relative to the difficulty in the original proposal.
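A halting oracle is uncomputable, so nothing below implements one. This is only a toy sketch of the proposed accounting. It assumes a hypothetical generator model in which a subroutine may yield `("halts?", q)` to query an injected `oracle` callable. Each query is charged as one step, no matter how long the queried computation would run. `run_with_oracle`, `decide`, and `stub_oracle` are illustrative names, and the stub only answers for two hard-coded labels.

```python
from typing import Callable, Generator, Optional, Tuple

# Hypothetical oracle type: given a description of a computation, says whether it halts.
Oracle = Callable[[object], bool]

def run_with_oracle(
    sub: Generator[object, object, object], oracle: Oracle, max_steps: int
) -> Tuple[bool, Optional[object], int]:
    """Step-bounded runner in which an oracle query costs exactly one step."""
    steps = 0
    reply = None
    while steps < max_steps:
        try:
            request = sub.send(reply)  # resume the subroutine, passing the last answer
        except StopIteration as done:
            return True, done.value, steps
        steps += 1  # ordinary work and oracle queries are charged identically
        if isinstance(request, tuple) and request[0] == "halts?":
            reply = oracle(request[1])
        else:
            reply = None
    sub.close()  # budget exhausted: abandon the subroutine
    return False, None, steps

def decide(query: str) -> Generator[object, object, str]:
    # Ask whether a possibly very expensive search halts before committing to it.
    will_halt = yield ("halts?", query)
    return "run it" if will_halt else "skip"

def stub_oracle(query: object) -> bool:
    # NOT a halting oracle: a lookup table standing in for one in this toy example.
    return {"search_A": True, "search_B": False}[query]
```

Under this accounting, `run_with_oracle(decide("search_B"), stub_oracle, 10)` returns `(True, "skip", 1)`. A single step of budget is enough to learn whether a computation of unbounded length halts. That is how the proposal lets a fixed step limit coexist with access to arbitrarily expensive results.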
