Formalizing Value Extrapolation

paulfchristiano

A recent post at my blog may be interesting to LW. It is a high-level discussion of what precisely defined value extrapolation might look like. I mostly wrote the essay while a visitor at FHI.

The basic idea is that we can define extrapolated values by just taking an emulation of a human, putting it in a hypothetical environment with access to powerful resources, and then adopting whatever values it eventually decides on. You might want some philosophical insight before launching into such a definition, but since we are currently laboring under the threat of catastrophe, it seems that there is virtue in spending our effort on avoiding death and delegating whatever philosophical work we can to someone on a more relaxed schedule.

You wouldn't want to run an AI with the values I lay out, but at least they are pinned down precisely. We can articulate objections relatively concretely, and hopefully begin to understand and address the difficulties.

(Posted at the request of cousin_it.)

Regarding the 100 years informal example at the beginning:

Let's say I kidnapped you, put you in a box, and told you that you would spend the rest of your life figuring out the answer to some obvious moral question ("Should we cure Brad of cancer?"). I could imagine you might become quite resentful, might suffer from some sort of mental illness after the first 10 years, and might give the wrong answer just out of spite.

Of course, if we take a snapshot of your brain and put it through the same experience in a simulation, it will feel exactly the same way.

Obviously you can't just cut out whatever parts of the brain might be responsible for resentment/boredom/mental illness. Specifying some sort of entertainment for you might cause you to ignore solving the problem to focus on the entertainment, or the entertainment might change your values. Terminating your simulation as soon as you found the answer to the question could cause you to focus on your impending death.

Maybe you could give us an informal summary of your proposal, to harvest cognitive surplus from those who don't want to read the entire thing and possibly find some relevant common sense sticking point that didn't occur to you? I suspect I would find an informal discussion where specifics were hashed out as necessary more interesting to read. Of course the eventual goal is formalization, but best practices for formalization may not be best practices for harvesting cognitive surplus.

To put it another way, you might wish to make sure you had something worth formalizing (through lots of informal discussion) before taking the trouble to formalize it.

"In(1)" looked like the natural log to me, for what it's worth.

I do give a (somewhat) concise overview, in the section headed 'The Proposal.'

The 100 years example is not quite right, in that in the real example we put you in an environment with unlimited computational power. One of the first things you are likely to do is create an extremely pleasant environment for yourself to work in (another is to create a community to work alongside you, either out of emulations of yourself, emulations of others, or reconstructed from simulations of worlds like Earth), while you figure out what should be done.

That said, there are...
