Stuart_Armstrong comments on Domesticating reduced impact AIs - Less Wrong

9 Post author: Stuart_Armstrong 14 February 2013 04:59PM

Comments (104)

Comment author: Eliezer_Yudkowsky 17 February 2013 04:56:44AM 0 points

Even invalidating a proof doesn't automatically mean the outcome is the opposite of the proof. The key question is whether there's a cognitive search process actively looking for a way to exploit the flaws in a cage. An FAI isn't looking for ways to stop being Friendly, quite the opposite. More to the point, it's not actively looking for a way to make its servers or any other accessed machinery disobey the previously modeled laws of physics in a way that modifies its preferences despite the proof system. Any time you have a system which sets that up as an instrumental goal you must've done the Wrong Thing from an FAI perspective. In other words, there's no super-clever being doing a cognitive search for a way to force an invalidating behavior - that's the key difference.

Comment author: Stuart_Armstrong 18 February 2013 01:14:34PM 0 points

The problem is that it's a utility maximiser. If the ontology crisis causes the FAI's goals to slide a bit in the wrong direction, it may end up optimising us out of existence (even if "happy humans with worthwhile and exciting lives" is still high in its preference ordering, it might not be at the top).

Comment author: Eliezer_Yudkowsky 18 February 2013 05:52:17PM 1 point

This is a uniform problem among all AIs. Avoiding it is very hard. That is why such a thing as the discipline of Friendly AI exists in the first place. You do, in fact, have to specify the preference ordering sufficiently well and keep it sufficiently stable.

Stepping down from maximization is also necessary, simply because actual maximization is undoable. But the stepped-down behavior then also has to be kept stable (satisficers may become maximizers, etc.), and if there's something above eudaimonia in its preference ordering, it might not take very much 'work' to bring it into existence.