Interesting list of examples where AI programs gamed the specification, solving the problem in rather creative (or dumb) ways not intended by the programmers.

These are great (and terrifying).

It’s hard to pick just one favorite, but I think I’ll go with that amazing last entry:

We noticed that our agent discovered an adversarial policy to move around in such a way so that the monsters in this virtual environment governed by the M model never shoots a single fireball in some rollouts. Even when there are signs of a fireball forming, the agent will move in a way to extinguish the fireballs magically as if it has superpowers in the environment.

Literally “hacking the Matrix to gain superpowers”.

Rereading this a year later and holy christ that example is great and terrifying.

Also recently discussed on Hacker News: https://news.ycombinator.com/item?id=18415031

Vika

As a result of the recent attention, the specification gaming list has received a number of new submissions, so this is a good time to check out the latest version :).

I noticed this has already been posted to LessWrong here: https://www.lesswrong.com/posts/AanbbjYr5zckMKde7/specification-gaming-examples-in-ai

Should I delete the post?

habryka

Seems fine to leave here, as long as we link to the other place, and the other place links to here.