XiXiDu comments on Limits on self-optimisation - Less Wrong

6 Post author: RolfAndreassen 20 January 2012 09:58PM


Comment author: XiXiDu 21 January 2012 10:14:32AM 0 points [-]

A Friendly AI, optimising itself, must ensure that it remains Friendly after the modification;

Isn't this also true for unfriendly AI? Any AI has to ensure that improved versions of itself remain friendly with respect to its initial values. So for each modification, or successor, it has to find a proof that the new version will not only respect its values but will do so in a way that more effectively maximizes expected utility.
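The acceptance test described above could be sketched as follows (a toy illustration; all names and the crude "proof" stand-in are hypothetical, not anything from the thread):

```python
def utility(outcome):
    """Stand-in utility function shared by the agent and its successor."""
    return outcome  # assume outcomes are already numeric scores

def expected_utility(policy, scenarios):
    """Average utility of a policy's choices over sampled scenarios."""
    return sum(utility(policy(s)) for s in scenarios) / len(scenarios)

def preserves_values(policy, scenarios):
    """Crude stand-in for a proof of value preservation: the policy
    never chooses an outcome scored as negative."""
    return all(utility(policy(s)) >= 0 for s in scenarios)

def accept_successor(current, successor, scenarios):
    """Adopt the modification only if both conditions from the comment
    hold: values are respected AND expected utility strictly improves."""
    return (preserves_values(successor, scenarios)
            and expected_utility(successor, scenarios)
                > expected_utility(current, scenarios))

# Example: a successor that picks better outcomes from each option set
# is accepted; the reverse modification would be rejected.
current = lambda options: min(options)
successor = lambda options: max(options)
scenarios = [(1, 5), (2, 3), (0, 4)]
print(accept_successor(current, successor, scenarios))  # True
```

The real difficulty, of course, is that `preserves_values` here is a finite check over sampled scenarios, whereas the comment is asking for an actual proof quantified over all future behaviour.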

Comment author: RolfAndreassen 21 January 2012 09:50:43PM 2 points [-]

Ah no. Friendly AIs are a special category, and as such more restrictive: no AI whose output changes under optimisation can be Friendly, but an Unfriendly AI is still Unfriendly if its output changes.

Comment author: timtyler 21 January 2012 10:50:30PM 0 points [-]

Not really. For example, you could have a "sloppy" superintelligence that traded away the long-term future of the universe for short-term gain, because it was given a short planning horizon.
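The effect of a short planning horizon can be illustrated with a toy sketch (the reward numbers are made up for illustration):

```python
def plan_value(rewards, horizon):
    """Total reward counted only up to the planning horizon."""
    return sum(rewards[:horizon])

short_term_plan = [10, 10, 0, 0, 0, 0]      # quick payoff, nothing later
long_term_plan  = [0, 0, 0, 100, 100, 100]  # big payoff, but only later

# With a horizon of 2 steps, the "sloppy" agent prefers the quick payoff...
print(plan_value(short_term_plan, 2) > plan_value(long_term_plan, 2))  # True
# ...while a horizon covering the whole future reverses the preference.
print(plan_value(long_term_plan, 6) > plan_value(short_term_plan, 6))  # True
```

Such an agent optimises competently within its horizon; the "sloppiness" is entirely in what the horizon excludes.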

Comment author: TheOtherDave 21 January 2012 06:19:00PM 0 points [-]

The phrase "has to" is a little confusing here. Sure, any AI that doesn't reliably preserve its value structure under self-modification risks destroying value when it self-modifies. But something can be an AI without preserving its value structure, just like we can be NIs without preserving our value structures.