Stuart_Armstrong comments on An Oracle standard trick - Less Wrong

4 Post author: Stuart_Armstrong 03 June 2015 02:17PM



Comment author: Stuart_Armstrong 05 June 2015 11:53:38AM 0 points

The design I had in mind is: utility u causes the AI to want to send messages. This is modified to u' so that it also acts as if it believed the message wasn't read (note this doesn't mean that it believes it!). Then if u' remains stable under self-improvement, we have the same behaviour after self-improvement.
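The u → u' modification described above can be sketched in a toy two-action setting. This is only an illustrative sketch, not Armstrong's actual construction: the actions, payoffs, and function names here are all hypothetical. The key point it shows is that u' scores every action under the fixed counterfactual "the message is not read", while the agent's actual belief about the read probability is left untouched.

```python
# Toy sketch (all names and payoffs are hypothetical).
# u' evaluates actions as if the message is certainly unread,
# even though the agent's world-model says it will likely be read.

def u(action, message_read):
    """Original utility: the AI gains from sending a message."""
    if action == "send" and message_read:
        return 10.0  # extra payoff from the message influencing the world
    if action == "send":
        return 1.0   # message sent but unread
    return 0.0       # do nothing

def expected_u(action, p_read):
    """Expected utility under a given belief about the message being read."""
    return p_read * u(action, True) + (1 - p_read) * u(action, False)

def u_prime(action):
    """Modified utility u': act as if the message were certainly unread."""
    return expected_u(action, p_read=0.0)

belief_p_read = 0.99  # the agent still *believes* the message will be read

best_under_u = max(["send", "noop"], key=lambda a: expected_u(a, belief_p_read))
best_under_u_prime = max(["send", "noop"], key=u_prime)
```

In this toy setup, u' still motivates sending a message (the unread branch pays 1.0 versus 0.0 for doing nothing), but the read-branch payoff no longer enters its optimization, so the incentive to shape the message for its effect on readers disappears. The belief `belief_p_read` is never changed; only the scoring function is.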

Comment author: Lumifer 05 June 2015 02:35:44PM 1 point

it also acts as if it believed the message wasn't read (note this doesn't mean that it believes it!)

So... you want to introduce, as a feature, the ability to believe one thing but act as if you believe something else? That strikes me as a remarkably bad idea. For one thing, people with such a feature tend to end up in psychiatric wards.

Comment author: gjm 05 June 2015 04:28:40PM 1 point

I haven't thought hard about Stuart's ideas, so this may or may not have any relevance to them; but it's at least arguable that it's really common (even outside psychiatric wards) for explicit beliefs and actions to diverge. A standard example: many Christians overtly believe that when Christians die they enter into a state of eternal infinite bliss, and yet treat other people's deaths as tragic and try to avoid dying themselves.

Comment author: Stuart_Armstrong 05 June 2015 04:17:49PM 0 points

Have you read the two articles I linked to, explaining the general principle?

Comment author: Lumifer 05 June 2015 04:53:01PM 0 points

Yes, though I have not thought deeply (hat tip to Jonah :-D) about them.

The idea of decoupling AI beliefs from AI actions looks bad to me on its face. I expect it to introduce a variety of unpleasant failure modes ("of course I fully believe in CEV, it's just that I'm going to act differently...") and general fragility. And even if one of the utility functions is "do not care about anything but miracles", I still think it's just going to lead to a catatonic state, is all.