
DeVliegendeHollander comments on Stupid Questions May 2015 - Less Wrong Discussion

10 Post author: Gondolinian 01 May 2015 05:28PM


Comment author: [deleted] 04 May 2015 09:42:28AM * 4 points

How can we have Friendly AI when we humans cannot even agree about our ethical values? This is an SQ because this was probably the first problem solved - it's just so obvious - yet I cannot find the answer.

I have not finished the sequences yet, but they sound a bit optimistic to me - as if basically everybody were a modern utilitarian and the rest of the people just didn't count. To ask the really dumb version of the question: what about religious folks? Is it just supposed to be a secular-values AI and they can go pound sand, or is some sort of agreement or compromise supposed to be drawn up with them and then implemented? Is some sort of generally agreed Human Values system a prerequisite?

My issue here is that if we want to listen to everybody, this will be a never-ending debate. And if you draw the line somewhere - e.g. include only people with reasonably utilitarian value systems - where exactly do you draw it?

Comment author: DanArmak 04 May 2015 05:51:37PM * 1 point

This is an SQ because this was probably the first problem solved - it's just so obvious - yet I cannot find the answer.

AFAIK it has not been solved, and if it has, I would love to hear about it too. I also believe, as you said, that while it's possible for humans to negotiate and agree, any such agreement would clearly be a compromise, not the simultaneous fulfillment of everyone's different values.

CEV has, IIRC, a lot of handwaving and unfounded assumptions about the existence and qualities of the One True Utility Function it's trying to build. Is there something better?

Comment author: hairyfigment 14 May 2015 05:47:24PM 0 points

As I told someone else, this pdf has preliminary discussion about how to resolve differences that persist under extrapolation.

The specific example of religious disagreements seems like a trivial problem to anyone who gets far enough to consider the question. Since there aren't any gods, the AI can ask what religious people would want if they accepted this fact. (This is roughly why I would oppose extrapolating only LWers rather than humanity as a whole.) But hey, maybe the question is more difficult than I think - we would have to specifically tell the AI to be an atheist if general rules of thinking did not suffice - or maybe this focus on surface claims hides some deeper disagreement that can't be so easily settled by probability.

Comment author: Ishaan 07 May 2015 09:08:52PM * 0 points

The "if we knew more, thought faster, were more the people we wished we were, had grown up farther together" CEV idea hopes that the disagreements are really just misunderstandings and mistakes in some sense. Otherwise, take some form of average or median, I guess?

Comment author: [deleted] 08 May 2015 08:33:43AM 0 points

Sorry, what is CEV?

Comment author: DanArmak 08 May 2015 10:27:06AM 0 points

Coherent Extrapolated Volition.

Or was that a snarky question referring to the fact that CEV is underspecified and may not exist?

Comment author: [deleted] 08 May 2015 10:33:23AM 1 point

No, it was a real question - I just haven't got that far in the book yet. Thanks.