Random, possibly stupid thought: what if we could create an AI capable of finding exploits in the rules of games? Not one that just Goodharts the rules, but one that explicitly outputs "hey, game designers, I think this is an exploit, it's against the spirit of the game". That might have something to do with alignment.
As I understand the linked text, EURISKO just played the game; it did not compare the spirit of the game with the rules as written. The latter would require general knowledge about the world at the level of current language models.
Even if an AI didn't explicitly search for exploits, if you just had it search for the best winning strategy, it's quite likely that it would hit on something that the people making the game would consider an exploit. EURISKO did it, evolutionary algorithms often do it, and communities dedicated to specific games also often find effective strategies that are considered "exploity". So if you just had an AI optimize for winning, you could probably find lots of exploits just by looking at what its best strategies are based on.
Yes, I understand. My whole idea is that this AI should, in some cases, explicitly output something like "I found this strategy, I think it is an exploit, and it should be fixed". For example, it might find a dominant strategy in a game that is primarily about trade negotiations, where that strategy lets you avoid trade entirely. Or it might find that in a game about air combat you can fly into terrain because of a bug in the game engine. In other cases (for example, chess or go) it should just be good at playing.
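To make the trade example a bit more concrete, here is a toy sketch, not a real proposal. Everything in it is made up for illustration: the three actions, the payoff rule in `score`, and the hand-written `CORE_MECHANICS` set that stands in for "the spirit of the game". It brute-forces the best plan and complains if that plan never touches a mechanic the designers consider central.

```python
from itertools import product

# Hypothetical toy game: each turn the player picks exactly one action.
ACTIONS = ("trade", "farm", "raid")
# Assumption: the designers tell us by hand what the game is "about".
CORE_MECHANICS = {"trade"}
TURNS = 4


def score(plan):
    """Made-up payoff rule with a deliberate loophole: raids escalate."""
    gold = 0
    raids = 0
    for action in plan:
        if action == "trade":
            gold += 3
        elif action == "farm":
            gold += 2
        else:  # "raid"
            raids += 1
            gold += 2 * raids  # pays 2, then 4, then 6, ...
    return gold


def best_plan():
    """Exhaustively search every sequence of actions for the top score."""
    return max(product(ACTIONS, repeat=TURNS), key=score)


def report(plan):
    """Flag the plan if it ignores any mechanic marked as core."""
    unused = CORE_MECHANICS - set(plan)
    if unused:
        return f"possible exploit: best plan {plan} never uses {sorted(unused)}"
    return f"best plan {plan} looks fine"


if __name__ == "__main__":
    print(report(best_plan()))
```

Here the search lands on raiding every turn and the report flags that trade is never used. Note that the hard part is hidden in `CORE_MECHANICS`: someone had to write down the spirit of the game by hand, and that is exactly the step that would need general knowledge about the world.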
Appendix
So anyway, what are game mechanics?