Lumifer comments on An Oracle standard trick - Less Wrong

4 Post author: Stuart_Armstrong 03 June 2015 02:17PM




Comment author: Lumifer 05 June 2015 02:35:44PM 1 point

it also acts as if it believed the message wasn't read (note this doesn't mean that it believes it!)

So... you want to introduce, as a feature, the ability to believe one thing but act as if you believed something else? That strikes me as a remarkably bad idea. For one thing, people with such a feature tend to end up in psychiatric wards.

Comment author: gjm 05 June 2015 04:28:40PM 1 point

I haven't thought hard about Stuart's ideas, so this may or may not have any relevance to them; but it's at least arguable that divergence between explicit beliefs and actions is really common, even outside psychiatric wards. A standard example: many Christians overtly believe that when Christians die they enter into a state of eternal infinite bliss, and yet they treat other people's deaths as tragic and try to avoid dying themselves.

Comment author: Stuart_Armstrong 05 June 2015 04:17:49PM 0 points

Have you read the two articles I linked to, explaining the general principle?

Comment author: Lumifer 05 June 2015 04:53:01PM 0 points

Yes, though I have not thought deeply (hat tip to Jonah :-D) about them.

The idea of decoupling AI beliefs from AI actions looks bad to me on its face. I expect it to introduce a variety of unpleasant failure modes ("of course I fully believe in CEV, it's just that I'm going to act differently...") and general fragility. And even if one of the utility functions is "do not care about anything but miracles", I still think it's just going to lead to a catatonic state.