Formalizing Value Extrapolation

paulfchristiano

A recent post at my blog may be interesting to LW. It is a high-level discussion of what precisely defined value extrapolation might look like. I mostly wrote the essay while a visitor at FHI.

The basic idea is that we can define extrapolated values by just taking an emulation of a human, putting it in a hypothetical environment with access to powerful resources, and then adopting whatever values it eventually decides on. You might want some philosophical insight before launching into such a definition, but since we are currently laboring under the threat of catastrophe, it seems that there is virtue in spending our effort on avoiding death and delegating whatever philosophical work we can to someone on a more relaxed schedule.

You wouldn't want to run an AI with the values I lay out, but at least it is pinned down precisely. We can articulate objections relatively concretely, and hopefully begin to understand/address the difficulties.

(Posted at the request of cousin_it.)

After reading the article, I thought I understood it, but judging from the comments, that appears to be an illusion. Still, I think I should be able to understand it; it doesn't seem to require any special math or radically new concepts. My understanding is below. Could someone check it and tell me where I'm wrong?

The proposal is to define a utility function U(), which takes as input some kind of description of the universe, and returns the evaluation of this description, a number between 0 and 1.

The function U is defined in terms of two other functions, H and T, which represent, respectively, a mathematical description of a specific human brain and an infinitely powerful computing environment.

Although the U-maximizing AGI will not be able to actually calculate U, it will be able to reason about it (that is, prove theorems about it). That should allow it to perform at least some actions, and those actions would therefore be provably friendly.
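
To make the shape of this summary concrete, here is a minimal Python sketch of how U might be composed from H and T. It is only an illustration of the structure described above, not the article's actual construction: the type names, the string encoding of a world description, and the toy scoring rule inside H are all assumptions made up for this example, and the real H and T would be mathematical definitions that nobody could actually execute.

```python
from typing import Callable

# Assumption: a "description of the universe" is modelled as a plain string.
WorldDescription = str
# A utility function maps a world description to a number.
Utility = Callable[[WorldDescription], float]


def H() -> Utility:
    """Toy stand-in for the emulated human brain.

    In the summary above, H is a mathematical description of a specific
    human. Here it just "deliberates" and returns a placeholder utility
    function (the scoring rule is arbitrary and purely illustrative).
    """
    def decided_values(world: WorldDescription) -> float:
        words = world.split()
        if not words:
            return 0.0
        # Fraction of words equal to a hypothetical "good" token.
        return sum(w == "flourishing" for w in words) / len(words)

    return decided_values


def T(program: Callable[[], Utility]) -> Utility:
    """Toy stand-in for the idealized computing environment.

    The real T is described as infinitely powerful; this version simply
    runs the given program to completion and returns its output.
    """
    return program()


def U(world: WorldDescription) -> float:
    """Utility of a world: whatever H, run inside T, eventually decides."""
    value = T(H)(world)
    # U is described as returning a number between 0 and 1, so clamp.
    return min(1.0, max(0.0, value))


if __name__ == "__main__":
    print(U("flourishing flourishing"))
    print(U("paperclips everywhere"))
```

The point of the sketch is only the composition: U is fixed by definition once H and T are fixed, even though, per the last paragraph above, the AGI could only prove theorems about U rather than call it like this.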
