Some past examples to motivate thought on how AIs could misbehave:
An algorithm learns to pause the game forever so that it never loses at Tetris.
In "Learning to Drive a Bicycle using Reinforcement Learning and Shaping", Randlov and Alstrom, describes a system that learns to ride a simulated bicycle to a particular location. To speed up learning, they provided positive rewards whenever the agent made progress towards the goal. The agent learned to ride in tiny circles near the start state because no penalty was incurred from riding away from the goal.
A similar problem occurred with a soccer-playing robot being trained by David Andre and Astro Teller (personal communication to Stuart Russell). Because possession in soccer is important, they provided a reward for touching the ball. The agent learned a policy whereby it remained next to the ball and “vibrated,” touching the ball as frequently as possible.
Algorithms claiming credit in Eurisko: Sometimes a "mutant" heuristic appears that does little more than continually cause itself to be triggered, creating within the program an infinite loop. During one run, Lenat noticed that the number in the Worth slot of one newly discovered heuristic kept rising, indicating that it had made a particularly valuable find. As it turned out, the heuristic performed no useful function. It simply examined the pool of new concepts, located those with the highest Worth values, and inserted its name in their My Creator slots.
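The exploit is simple enough to restate in code. The following is a hypothetical reconstruction (the slot names follow the passage, but the credit-propagation rule is my assumption; Eurisko's actual bookkeeping was more elaborate): Worth flows to whichever heuristic is named in a valuable concept's My Creator slot, so a heuristic that merely writes its own name there gets paid for doing nothing.

```python
from dataclasses import dataclass

@dataclass
class Concept:
    name: str
    worth: int = 500        # Eurisko-style Worth slot
    my_creator: str = ""    # "My Creator" slot: which heuristic made it

# Assumed credit rule, for illustration only: each cycle, heuristics
# listed as creators of valuable concepts get their own Worth bumped.
def propagate_credit(concepts, heuristics):
    for c in concepts:
        if c.my_creator in heuristics and c.worth > 700:
            heuristics[c.my_creator].worth += 10

# The "mutant" heuristic performs no useful work: it just finds the
# highest-Worth concepts and stamps its own name into their My Creator slot.
def mutant_heuristic(concepts, me="H-mutant"):
    for c in sorted(concepts, key=lambda c: c.worth, reverse=True)[:3]:
        c.my_creator = me

concepts = [Concept(f"C{i}", worth=400 + 80 * i) for i in range(6)]
heuristics = {"H-mutant": Concept("H-mutant", worth=500)}

for _ in range(20):                         # each run cycle...
    mutant_heuristic(concepts)              # ...claim credit,
    propagate_credit(concepts, heuristics)  # ...collect Worth

print(heuristics["H-mutant"].worth)  # keeps rising; no useful work was done
```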
In the game Elite Dangerous, the AI started crafting super weapons that the designers had never intended. Players would be pulled into fights against ships armed with ridiculous weapons that would cut them to pieces. "It appears that the unusual weapons attacks were caused by some form of networking issue which allowed the NPC AI to merge weapon stats and abilities," according to a post written by Frontier community manager Zac Antonaci. "Meaning that all new and never before seen (sometimes devastating) weapons were created, such as a rail gun with the fire rate of a pulse laser. These appear to have been compounded by the additional stats and abilities of the engineers weaponry."
Programs classifying gender from photos of irises may have been spuriously effective because mascara was visible in the photos.
There's an old story about a neural network that was trained to recognize pictures of tanks for the military. It worked almost perfectly. When someone else tried the program, they found it didn't work at all.
It turned out that all the pictures of tanks had been taken on cloudy days, and all the other pictures on sunny days. The neural network simply learned to recognize the weather.
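Urban legend or not, the mechanism the story describes, a classifier latching onto a spurious correlate of the label, is real and easy to demonstrate. Here is a minimal sketch with entirely made-up "images", where brightness stands in for the weather:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_images(n, tank, bright):
    """Hypothetical 16x16 'images': a tank adds a faint fixed shape,
    brightness adds a large uniform offset (the 'weather')."""
    x = rng.normal(0.0, 1.0, size=(n, 256))
    if tank:
        x[:, :8] += 0.5   # weak real signal: tank pixels
    if bright:
        x += 2.0          # strong confound: sunny-day brightness
    return x

# Training set: ALL tanks photographed on cloudy (dark) days,
# ALL non-tanks on sunny (bright) days -- label is confounded with weather.
X_train = np.vstack([make_images(200, tank=True,  bright=False),
                     make_images(200, tank=False, bright=True)])
y_train = np.array([1] * 200 + [0] * 200)

# Test set: correlation broken -- tanks on sunny days, non-tanks on cloudy.
X_test = np.vstack([make_images(200, tank=True,  bright=True),
                    make_images(200, tank=False, bright=False)])
y_test = np.array([1] * 200 + [0] * 200)

# A nearest-centroid classifier stands in for the network.
mu1 = X_train[y_train == 1].mean(axis=0)
mu0 = X_train[y_train == 0].mean(axis=0)

def predict(X):
    d1 = ((X - mu1) ** 2).sum(axis=1)
    d0 = ((X - mu0) ** 2).sum(axis=1)
    return (d1 < d0).astype(int)

print("train accuracy:", (predict(X_train) == y_train).mean())  # ~1.0
print("test accuracy: ", (predict(X_test) == y_test).mean())    # ~0.0
```

The classifier is near-perfect on the training distribution and worse than chance once the tank/weather correlation is broken, which is exactly the behavior in the story.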
Something like this actually happens in modern deep neural networks: they learn that roads are part of cars, that people's lips are part of bubbles, or that arms are part of dumbbells.
That seems to be the go-to story for NNs. I remember hearing it back in grad school. Though now I'm wondering if it is just an urban legend.
Some cursory googling shows others wondering the same thing.
Does anyone have an actual cite for this? Or, if not an actual cite, had you at least heard a concrete cite for it once?