Gunnar_Zarncke comments on Oracle AI: Human beliefs vs human values - Less Wrong Discussion

2 Post author: Stuart_Armstrong 22 July 2015 11:54AM

Comment author: Gunnar_Zarncke 22 July 2015 05:50:15PM 2 points [-]

It seems that if we can ever define the difference between human beliefs and values, we could program a safe Oracle

Why do you think so? I have some idea that it follows from your other posts on AI safety but I'm not clear how.

Or do you intend this as a post where the reader is asked to figure it out as an exercise?

Comment author: Stuart_Armstrong 23 July 2015 08:33:44AM 2 points [-]

One of the problems with Oracles is that they will mislead us (e.g. via social engineering or seduction) to achieve their goals. Most ways of addressing this either require strict accuracy in the response (which means the response might be incomprehensible or useless, e.g. if it's given in terms of atom positions) or some measure of the human reaction to the response (which brings back the risk of social engineering).

If we can program the AI to distinguish between human beliefs and values, we can give it goals like "ensure that human beliefs after the answer are more accurate, while their values are (almost?) unchanged". This solves the issue of defining an accurate answer and removes that part of the risk (it doesn't remove others).
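A minimal sketch of the proposed goal, assuming we somehow had an `accuracy` measure on belief states and a `value_distance` measure on value states (neither exists; defining them is exactly the open problem being discussed, and the `Human`, `accuracy`, and `value_distance` names here are hypothetical placeholders, not anything from the post):

```python
from dataclasses import dataclass

@dataclass
class Human:
    beliefs: float  # toy stand-in: a one-dimensional belief state
    values: float   # toy stand-in: a one-dimensional value state

def oracle_objective(before, after, accuracy, value_distance, value_penalty=1e6):
    """Score an answer by its effect on the human: reward gains in belief
    accuracy, heavily penalize any shift in the human's values."""
    belief_gain = accuracy(after.beliefs) - accuracy(before.beliefs)
    value_drift = value_distance(before.values, after.values)
    return belief_gain - value_penalty * value_drift

# Toy instantiation: beliefs scored by closeness to the "true" value 1.0,
# values compared directly.
accuracy = lambda b: 1.0 - abs(1.0 - b)
value_distance = lambda v1, v2: abs(v1 - v2)

# An informative answer that leaves values alone vs. one that also
# manipulates the human's values: the penalty dominates the second.
honest = oracle_objective(Human(0.2, 0.5), Human(0.9, 0.5), accuracy, value_distance)
manipulative = oracle_objective(Human(0.2, 0.5), Human(0.9, 0.8), accuracy, value_distance)
print(honest > manipulative)  # → True
```

The point of the large `value_penalty` is that any answer achieving accuracy via changing the human's values (the social-engineering route) scores worse than saying nothing at all.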