somervta comments on The genie knows, but doesn't care - Less Wrong

54 Post author: RobbBB 06 September 2013 06:42AM


Comments (515)


Comment author: wedrifid 11 September 2013 02:13:53AM 7 points [-]

However, few existing algorithms, if any, have the failure modes you describe. They fail early, and they fail hard.

Yes, most algorithms fail early and fail hard. Most of my AI algorithms failed early with a SegFault, for instance. New, very similar algorithms were then designed with progressively more advanced bugs. But these are a separate consideration. What we are interested in here is the question "Given that an AI algorithm capable of recursive self-improvement is successfully created by humans, how likely is it to execute this kind of failure mode?" The "fail early, fail hard" cases are screened off. We're looking at the small set that is either damn close to a desired AI or actually a desired AI, and distinguishing between them.

Looking at the context to work out what 'failure mode' is being discussed, it seems to be the issue where an AI is programmed to optimise based on a feedback mechanism controlled by humans. When the AI in question is superintelligent, most failure modes tend to be variants of "conquer the future light cone, kill everything that is a threat and supply perfect feedback to self". Translated to the nearest analogous failure mode in a narrow AI algorithm of the kind we can design now, this seems to refer to the case where the AI optimises exactly what it is asked to optimise, but in a way that is a lost purpose. This is certainly what I had to keep in mind in my own research.

A popular example that springs to mind is the result of an AI algorithm designed by a military research agency. From memory, the task was to take a simplified simulation of naval warfare, with specifications for how much each aspect of ships, boats and weaponry cost, plus a budget, and to use this to design the optimal fleet given those resources. The task was undertaken by military officers and by a group which used an AI algorithm of some sort. The result was that the AI won easily, but did so in a way that led the overseers to dismiss it as a failure, because it optimised the problem specification as given, not the one 'common sense' led the humans to optimise. Rather than building any ships, the AI produced tiny unarmored dinghies with a single large cannon or missile attached. For whatever reason, the people running the game did not consider this an acceptable outcome. Their mistake was to supply a problem specification which did not match their actual preferences. They supplied a lost purpose.
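To make the mismatch concrete, here is a toy sketch of the kind of thing I mean. It is not the original program or the real game rules: the designs, costs, budget and both scoring functions are invented for illustration. It brute-forces every affordable fleet and compares the fleet that maximises the specification as written (total firepower) with the fleet that maximises a hypothetical stand-in for what the overseers 'really' wanted (firepower that is matched by armor).

```python
from itertools import product

# Hypothetical designs: (name, cost, firepower, armor). All numbers are invented.
DESIGNS = [
    ("battleship", 100, 40, 50),
    ("cruiser", 50, 18, 20),
    ("armed_dinghy", 10, 8, 0),
]
BUDGET = 200  # assumed budget, in arbitrary credits


def fleets(budget):
    """Yield every affordable fleet as a tuple of per-design counts."""
    ranges = [range(budget // cost + 1) for _, cost, _, _ in DESIGNS]
    for counts in product(*ranges):
        total_cost = sum(n * d[1] for n, d in zip(counts, DESIGNS))
        if total_cost <= budget:
            yield counts


def firepower(counts):
    """The specification as written: total firepower, nothing else."""
    return sum(n * d[2] for n, d in zip(counts, DESIGNS))


def armor(counts):
    """Total armor of a fleet; the written specification never looks at it."""
    return sum(n * d[3] for n, d in zip(counts, DESIGNS))


def intended(counts):
    """Assumed stand-in for 'common sense': firepower only counts if armored."""
    return min(firepower(counts), armor(counts))


def best_fleet(score, budget=BUDGET):
    """Exhaustively pick the affordable fleet with the highest score."""
    return max(fleets(budget), key=score)


def describe(counts):
    """Render a fleet as {design name: count}, omitting unused designs."""
    return {d[0]: n for n, d in zip(counts, DESIGNS) if n}


spec_optimum = best_fleet(firepower)
intended_optimum = best_fleet(intended)

print("optimum for the written spec:", describe(spec_optimum))
print("optimum for the intended goal:", describe(intended_optimum))
```

With these made-up numbers the written specification is best served by a swarm of armed dinghies with no armor at all, while the stand-in for the intended goal picks battleships. The optimiser did nothing wrong in either case; only the score function changed.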

When it comes to considering proposals for how to create friendly superintelligences, it becomes easy to spot notorious failure modes in what humans typically think is a clever solution. It happens to be the case that any solution based on an AI optimising for approval or carrying out the instructions it is given just results in Everybody Dies.

Where Eliezer suggests getting AI experience to get a feel for such difficulties, I suggest an alternative. Try being a D&D dungeon master in a group full of munchkins. Make note of every time that, for the sake of the game, you must use your authority to outlaw the use of a by-the-rules feature.

Comment author: somervta 11 September 2013 03:49:34AM 6 points [-]

A popular example that springs to mind is the result of an AI algorithm designed by a military research agency. From memory, the task was to take a simplified simulation of naval warfare, with specifications for how much each aspect of ships, boats and weaponry cost, plus a budget, and to use this to design the optimal fleet given those resources. The task was undertaken by military officers and by a group which used an AI algorithm of some sort. The result was that the AI won easily, but did so in a way that led the overseers to dismiss it as a failure, because it optimised the problem specification as given, not the one 'common sense' led the humans to optimise. Rather than building any ships, the AI produced tiny unarmored dinghies with a single large cannon or missile attached. For whatever reason, the people running the game did not consider this an acceptable outcome. Their mistake was to supply a problem specification which did not match their actual preferences. They supplied a lost purpose.

The AI in question was Eurisko, and it entered the Traveller Trillion Credit Squadron tournament in 1981 as described above. It was entered again the next year, after an extensive redesign of the rules, and won again. After this, the competition runners announced that if Eurisko won a third time the competition would be discontinued, so Lenat (the programmer) stopped entering.