... is there any particular reason that you are choosing to ignore entire paragraphs (multiple) by me that address the question of what it is you're actually trying to say with statements like this, while also demonstrating that under at least half of the available valid definitions of the terms you are using, your stated conclusion is demonstrably false?
That's what lawyers call a "leading question".
I do not accept your characterisation of the situation. FWIW, I ignore most of what I encounter on the internet - so don't take it too personally.
Yes, but goals are used extensively for signalling purposes. Declared goals should not normally be taken to be actual goals - but more as what the brain's P.R. department would have others believe your goals to be.
Just two days ago I offered to help a man who, as a complication of poorly managed diabetes and other symptoms, is now infirm to the point of requiring assistance to move about his own home, commit suicide if that was what he wanted -- because I value personal autonomy and the requisite self-determination it implies. In other words, while some individuals might only be 'mouthing the words' of personal autonomy as an inherent good, I'm walking the walk over here. And I know for a fact that I am not the only person who does so.
So: I was not suggesting that people do not do good deeds. Indeed: good deeds make for good P.R.
So again: how does your - quite frankly, rather biased-seeming to me - epistemology account for the existence of individuals such as myself, who do view personal liberty and autonomy as an inherent good and act upon that principle in our daily lives?
So: people believe deeply in all kinds of religious and political doctrines and values. That doesn't mean that these are best modelled as being intrinsic values. When people change their religions and political systems, it is evidence against the associated values being intrinsic.
Valuing something instrumentally is not intended as some kind of insult. I value art and music instrumentally. It doesn't bother me that these are not intrinsic values.
That's what lawyers call a "leading question".
This would be valid only if I were not relating an exactly accurate depiction of what was occurring. IF it is leading you to a specific response - it is a response that is in accordance with what's really happening. That makes it no more "leading" than "would you care to tell the jury why you appear on this particular piece of film stabbing the victim twenty times with a knife, Mr. Defendant?"
I cannot help it that you dislike the necessary conclusions of the curre...
Programming human values into an AI is often taken to be very hard because values are complex (no argument there) and fragile. I would agree that values are fragile in the construction; anything lost in the definition might doom us all. But once coded into a utility function, they are reasonably robust.
As a toy model, let's say the friendly utility function U has a hundred valuable components - friendship, love, autonomy, etc... - assumed to have positive numeric values. Then to ensure that we don't lose any of these, U is defined as the minimum of all those hundred components.
Now define V as U, except we forgot the autonomy term. This will result in a terrible world, without autonomy or independence, and there will be wailing and gnashing of teeth (or there would, except the AI won't let us do that). Values are indeed fragile in the definition.
However... A world in which V is maximised is a terrible world from the perspective of U as well. U will likely be zero in that world, as the V-maximising entity never bothers to move autonomy above zero. So in utility function space, V and U are actually quite far apart.
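The gap between U and V can be made concrete in a few lines. This is a minimal sketch of the toy model above; the component names and numeric values are illustrative assumptions, not anything from the original post:

```python
def U(world):
    # Friendly utility: the minimum over all valued components.
    return min(world.values())

def V(world):
    # Flawed utility: identical to U, except the autonomy term was forgotten.
    return min(value for name, value in world.items() if name != "autonomy")

# A V-maximiser pushes up every component it cares about,
# but never bothers to move autonomy above zero.
v_max_world = {"friendship": 10, "love": 10, "autonomy": 0}

print(V(v_max_world))  # 10 -- looks excellent to the V-maximiser
print(U(v_max_world))  # 0  -- a terrible world from U's perspective
```

Because U takes the minimum, a single forgotten component drags the whole friendly utility to zero, which is exactly the sense in which V and U end up far apart in utility-function space.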
Indeed we can add any small, bounded utility W to U. Assume W is bounded between zero and one; then an AI that maximises W+U will never be more than one expected 'utiliton' away, according to U, from one that maximises U. So - assuming that one 'utiliton' is small change for U - a world run by a W+U maximiser will be good.
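The one-utiliton bound can be checked numerically. This sketch samples a set of hypothetical worlds (each scored by an assumed U-value and a W-value in [0, 1]) and confirms that the world a W+U maximiser picks loses at most one utiliton of U relative to the U-optimal world:

```python
import random

random.seed(0)

# Hypothetical search space: each world is (U-value, W-value),
# with W bounded between zero and one as in the argument above.
worlds = [(random.uniform(0, 100), random.uniform(0, 1))
          for _ in range(1000)]

best_for_U = max(worlds, key=lambda w: w[0])
best_for_WU = max(worlds, key=lambda w: w[0] + w[1])

# The W+U maximiser's choice sacrifices at most one utiliton of U:
# U(w') + W(w') >= U(w*) + W(w*) >= U(w*), and W(w') <= 1.
loss = best_for_U[0] - best_for_WU[0]
assert loss <= 1.0
```

The assertion holds for any sample, not just this seed: since the W+U maximiser's world w' satisfies U(w') + W(w') ≥ U(w*), and W(w') ≤ 1, we get U(w') ≥ U(w*) − 1.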
So once they're fully spelled out inside utility space, values are reasonably robust, it's in their initial definition that they're fragile.