Gunnar_Zarncke comments on Oracle AI: Human beliefs vs human values - Less Wrong Discussion

2 Post author: Stuart_Armstrong 22 July 2015 11:54AM

Comment author: Gunnar_Zarncke 22 July 2015 05:50:15PM 2 points [-]

It seems that if we can ever define the difference between human beliefs and values, we could program a safe Oracle

Why do you think so? I have some idea that it follows from your other posts on AI safety but I'm not clear how.

Or do you intend this as a post where the reader is asked to figure it out as an exercise?

Comment author: Stuart_Armstrong 23 July 2015 08:33:44AM 2 points [-]

One of the problems with Oracles is that they will mislead us (e.g. via social engineering or seduction) to achieve their goals. Most ways of addressing this either require strict accuracy in the response (which means the response might be incomprehensible or useless, e.g. if it's given in terms of atom positions) or some measure of the human reaction to the response (which brings back the risk of social engineering).

If we can program the AI to distinguish between human beliefs and values, we can give it goals like "ensure that human beliefs after the answer are more accurate, while their values are (almost?) unchanged". This solves the issue of defining an accurate answer and removes that part of the risk (it doesn't remove others).
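A minimal sketch of the proposed goal, assuming we somehow had an `accuracy` measure on belief states and a `value_distance` measure on value states (neither exists; defining them is exactly the open problem being discussed, and the `Human`, `accuracy`, and `value_distance` names here are hypothetical placeholders, not anything from the post):

```python
from dataclasses import dataclass

@dataclass
class Human:
    beliefs: float  # toy stand-in: a one-dimensional belief state
    values: float   # toy stand-in: a one-dimensional value state

def oracle_objective(before, after, accuracy, value_distance, value_penalty=1e6):
    """Score an answer by its effect on the human: reward gains in belief
    accuracy, heavily penalize any shift in the human's values."""
    belief_gain = accuracy(after.beliefs) - accuracy(before.beliefs)
    value_drift = value_distance(before.values, after.values)
    return belief_gain - value_penalty * value_drift

# Toy instantiation: beliefs scored by closeness to the "true" value 1.0,
# values compared directly.
accuracy = lambda b: 1.0 - abs(1.0 - b)
value_distance = lambda v1, v2: abs(v1 - v2)

# An informative answer that leaves values alone vs. one that also
# manipulates the human's values: the penalty dominates the second.
honest = oracle_objective(Human(0.2, 0.5), Human(0.9, 0.5), accuracy, value_distance)
manipulative = oracle_objective(Human(0.2, 0.5), Human(0.9, 0.8), accuracy, value_distance)
print(honest > manipulative)  # → True
```

The point of the large `value_penalty` is that any answer achieving accuracy via changing the human's values (the social-engineering route) scores worse than saying nothing at all.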