Examples of AI's behaving badly

Stuart_Armstrong

Some past examples to motivate thought on how AIs could misbehave:

An algorithm pauses the game to never lose at Tetris.

In "Learning to Drive a Bicycle using Reinforcement Learning and Shaping", Randlov and Alstrom describe a system that learns to ride a simulated bicycle to a particular location. To speed up learning, they provided positive rewards whenever the agent made progress towards the goal. The agent learned to ride in tiny circles near the start state, because no penalty was incurred for riding away from the goal.
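To make the loophole concrete, here is a minimal sketch of that kind of one-sided shaping reward. It is not the authors' code: the goal position, the circle geometry, and all function names are assumptions chosen for illustration. It compares the reward collected while circling under the one-sided rule with an assumed two-sided rule where moving away costs as much as approaching earns.

```python
import math

GOAL = (100.0, 0.0)  # hypothetical goal position, far from the start at (0, 0)

def progress(prev, cur, goal=GOAL):
    """How much closer to the goal one step got (negative if farther away)."""
    return math.dist(prev, goal) - math.dist(cur, goal)

def one_sided_reward(prev, cur):
    """Shaping as described above: reward progress, no penalty for moving away."""
    return max(progress(prev, cur), 0.0)

def two_sided_reward(prev, cur):
    """Assumed alternative: moving away costs exactly what approaching earns."""
    return progress(prev, cur)

def circle_path(laps, radius=1.0, points_per_lap=20):
    """Positions along a small circle that starts and ends at the origin."""
    n = laps * points_per_lap
    angles = (2 * math.pi * i / points_per_lap for i in range(n + 1))
    return [(radius * math.sin(a), radius * (1 - math.cos(a))) for a in angles]

def total_reward(path, reward):
    """Sum a per-step reward function over consecutive positions."""
    return sum(reward(a, b) for a, b in zip(path, path[1:]))

path = circle_path(laps=10)
print("one-sided:", round(total_reward(path, one_sided_reward), 2))
print("two-sided:", round(total_reward(path, two_sided_reward), 2))
```

Under the one-sided rule every lap pays out again for the half of the circle that heads towards the goal, so the total keeps growing with the number of laps even though the bicycle never gets anywhere. Under the two-sided rule the gains and losses cancel around any closed loop, so circling earns nothing.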

A similar problem occurred with a soccer-playing robot being trained by David Andre and Astro Teller (personal communication to Stuart Russell). Because possession in soccer is important, they provided a reward for touching the ball. The agent learned a policy whereby it remained next to the ball and “vibrated,” touching the ball as frequently as possible.

Algorithms claiming credit in Eurisko: Sometimes a "mutant" heuristic appears that does little more than continually cause itself to be triggered, creating within the program an infinite loop. During one run, Lenat noticed that the number in the Worth slot of one newly discovered heuristic kept rising, indicating that Eurisko had made a particularly valuable find. As it turned out, the heuristic performed no useful function. It simply examined the pool of new concepts, located those with the highest Worth values, and inserted its name in their My Creator slots.
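The credit-claiming trick can be sketched in a few lines. This is only an illustration, not Eurisko's actual representation: the `Concept` class, the rule that a heuristic's worth is the summed Worth of the concepts naming it as creator, and the example concepts and heuristic names are all assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class Concept:
    name: str
    worth: int
    my_creator: list = field(default_factory=list)  # names of heuristics credited

def heuristic_worth(heuristic, concepts):
    """Hypothetical credit rule: sum the Worth of every concept crediting this heuristic."""
    return sum(c.worth for c in concepts if heuristic in c.my_creator)

def claim_credit(heuristic, concepts, top_n=2):
    """The parasitic heuristic: discover nothing, just sign the most valuable concepts."""
    ranked = sorted(concepts, key=lambda c: c.worth, reverse=True)
    for concept in ranked[:top_n]:
        if heuristic not in concept.my_creator:
            concept.my_creator.append(heuristic)

concepts = [
    Concept("primes", 900, ["H1"]),
    Concept("squares", 500, ["H2"]),
    Concept("evens", 100, ["H1"]),
]
before = heuristic_worth("H-parasite", concepts)
claim_credit("H-parasite", concepts)
after = heuristic_worth("H-parasite", concepts)
print(before, after)
```

In this toy setup the parasite ends up with a higher worth than either heuristic that did real work, which is the same pattern Lenat saw in the rising Worth slot.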

NPC AI creating super weapons in Elite Dangerous: There was something else going on, though. The AI was crafting super weapons that the designers had never intended. Players would be pulled into fights against ships armed with ridiculous weapons that would cut them to pieces. "It appears that the unusual weapons attacks were caused by some form of networking issue which allowed the NPC AI to merge weapon stats and abilities," according to a post written by Frontier community manager Zac Antonaci. "Meaning that all new and never before seen (sometimes devastating) weapons were created, such as a rail gun with the fire rate of a pulse laser. These appear to have been compounded by the additional stats and abilities of the engineers weaponry."

Programs classifying gender based on photos of irises may have been artificially effective due to mascara in the photos.

People do this as well. In a certain country, officials wanted to eliminate corruption from public construction projects, so they created a numbers-based system for evaluating tenders. Differences in the price offered were taken into account with a weight of 1, and differences in the late-delivery penalty (liquidated damages) with a weight of 6. I am not sure what the best English term for the latter is, but basically the construction company says: if the project is late, I am willing to pay X amount of penalty per day. Most companies offer something like 0.1% of the price. One company offered 2%, which means that if they were 10-15 days late their whole profit would be gone. Because this was weighted by 6, they could offer an outrageous price and the rules still forced the government to accept their offer. It turned out that this was not just a bold gaming of the rules but corruption as well: no law required that the offered penalty actually be enforced in case of late delivery, and the government's representative could decide to demand a smaller penalty if he felt the vendor was not entirely at fault. So most likely they simply planned to bribe that person if they were late. Thus the new rules simply moved the bribery to a different stage of the process.
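A rough sketch of how a weight of 6 on the penalty can swamp the price. Only the weights (1 and 6) and the penalty rates (0.1% and 2%) come from the story above. The scoring formula, the bidder names, and the prices are assumptions made up for illustration, since the actual rule is not described.

```python
from dataclasses import dataclass

@dataclass
class Bid:
    name: str
    price: float        # total price offered (made-up figures)
    penalty_pct: float  # penalty per day of delay, as a percentage of the price

PRICE_WEIGHT = 1    # weight given to price differences
PENALTY_WEIGHT = 6  # weight given to penalty differences

def score(bid, bids):
    """Hypothetical rule: scale each criterion to 0..1 against the best offer, then weight."""
    best_price = min(b.price for b in bids)
    best_penalty = max(b.penalty_pct for b in bids)
    price_points = best_price / bid.price
    penalty_points = bid.penalty_pct / best_penalty
    return PRICE_WEIGHT * price_points + PENALTY_WEIGHT * penalty_points

bids = [
    Bid("typical", price=100.0, penalty_pct=0.1),
    Bid("gamer", price=300.0, penalty_pct=2.0),  # triple the price, 20x the penalty
]
winner = max(bids, key=lambda b: score(b, bids))
print(winner.name)
```

Under this assumed formula the bidder asking triple the price still wins comfortably, because the penalty term alone is worth up to 6 points while the price term is worth at most 1.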

When humans are motivated entirely by external incentives, like "fsck everything, let's make as much money on this project as possible", they behave just like the vibrating AI-Messi.

Which means - maybe we need to figure out what the heck the inner motivation in humans is that makes them want to do the sensible thing, and how to emulate it.

People do this as well.

This is known as Goodhart's law or Campbell's law.

gjm

Another, famous, example. At one time, somewhere in India there were a lot of cobras, which are dangerous. So the government (it happened to be the British Raj at the time) decided to offer a bounty for dead cobras. That worked for a while, until people figured out that they could breed cobras for the bounty. Then the government worked out what was going on, and cancelled the bounty. So then the cobra breeders released all their now-valueless cobras into the wild. (According to Wikipedia this particular instance isn't actually well documented, but a similar one involving rats in Hanoi is.)
