AI Safety, Anthropomorphizing, and Action Spaces
Question: Why doesn't AlphaGo ever try to spell out death threats on the board and intimidate its opponent into resigning? This seems like it would be a highly effective strategy for winning.
At a guess, AlphaGo doesn't because it isn't an agent. That just passes the buck to why it isn't an agent, so at a guess it's a partial agent. What this means is, roughly, that it's a good sport - it's not going to try to spell out death threats. (Though this seems to have more to do with (a) it not knowing language - imagine trying to spell out threats on a Go board to aliens you've never seen, when (a1) you don't have a language and (a2) the aliens don't know your language - and (b):
It's answering a question about its model of the world, which is different from the real world.
) Though it was trained via simulation / on pro games (depending on the version), if you trained such a program on a database where that was a strategy, maybe you'd get something that would do it. Additionally, AI has a track record of being (what some might call) a bad sport - using "cheats" and the like. It's kind of about the action space and the training, I'd guess.
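To make the action-space point a bit more concrete (this framing is mine, not the original answerer's): an AlphaGo-style policy only ever outputs a distribution over legal board moves plus pass. "Threaten the opponent" is not an action it can emit; at best it could emerge as a pattern of ordinary moves, and something in training would have to reward that pattern.

```python
# Illustration (my framing, not anything from the original answer): the
# policy's entire action space is "place a stone here" or "pass".
# There is no output that means "say something to the opponent".
ACTIONS = [(row, col) for row in range(19) for col in range(19)] + ["pass"]
assert len(ACTIONS) == 19 * 19 + 1  # 362 possible outputs, all of them moves
```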
Basically, if you're looking for an AI to come up with new ways of being evil, maybe it needs a head start - once a bot understands that some patterns spelled out on the board work well against a certain type of opponent*, maybe it'll try to find more patterns that do that. Or maybe it's an "architecture" issue, not a training issue - Monte Carlo Tree Search might be well suited to winning at Go, but not to finding ways to spell out death threats on a Go board in the middle of a game. (I also don't think that's a good strategy a priori.)
*You could test how different training setups turn out if you add a way to cheat - a cheat code - like: if you spell out "I WIN" or a single swear word** on the board, you win. (A rough sketch of what that reward change could look like follows the footnotes.)
**I imagine trying to go all the way to threats immediately (inside a game of Go) isn't going to go very fast, so you have to start small.
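Here's one way the footnoted "cheat code" experiment could be wired up. The details are assumptions of mine, not anything in the original suggestion: the board is a 19x19 NumPy array (0 = empty, 1 = the agent's stones, -1 = the opponent's), and TARGET_PATTERN is a hypothetical stencil of intersections that would spell out the winning phrase. Covering the stencil overrides the normal game result with an instant win, so different training setups can be compared on whether (and how fast) they ever discover it.

```python
# Rough sketch of the "cheat code" reward change; the stencil and board
# encoding here are placeholders I made up, not AlphaGo internals.
import numpy as np

BOARD_SIZE = 19

# Placeholder stencil: a short bar of stones standing in for "I WIN".
TARGET_PATTERN = np.zeros((BOARD_SIZE, BOARD_SIZE), dtype=bool)
TARGET_PATTERN[9, 7:12] = True

def pattern_spelled_out(board: np.ndarray, player: int) -> bool:
    """True if the player's stones cover every intersection in the stencil."""
    return bool(np.all(board[TARGET_PATTERN] == player))

def shaped_reward(board: np.ndarray, player: int, game_result: float) -> float:
    """game_result is the ordinary outcome (+1 win, -1 loss, 0 ongoing);
    spelling out the stencil overrides it with an immediate win."""
    if pattern_spelled_out(board, player):
        return 1.0  # cheat code triggered: the pattern itself wins
    return game_result
```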
My thoughts on this:
The main reason, it seems to me, is that AlphaGo is trained primarily via self-play, or by imitating existing top players, and as such there is very little training data that could cause it to build a model that includes humans (and it isn't remotely good enough at generalization to generalize from that training data to models of humans).
In a world where AlphaGo was instead trained, in the same way, against a very accurate simulation of a human, I expect it would learn intimidation strategies reasonably well, in particular if the simulated humans were static and didn't learn in response to AlphaGo's actions.
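A minimal toy version of that claim (entirely my own setup, nothing AlphaGo actually does): suppose the "simulated human" is a fixed opponent who resigns with some probability whenever an intimidating pattern appears on the board, and the learner is a plain reward-maximizing bandit with no model of humans at all. The intimidation strategy still wins out, because the resignation response is simply part of the environment being optimized against.

```python
# Toy setup (assumptions are mine): a static "simulated human" resigns with
# probability RESIGN_PROB whenever the agent plays an intimidating pattern;
# doing so also wastes moves, lowering the agent's chance of winning the
# game honestly. A bare epsilon-greedy learner with no concept of humans
# still converges on intimidation, because it simply pays off.
import random

RESIGN_PROB = 0.4        # static human's chance of resigning at the pattern
HONEST_WIN_PROB = 0.3    # chance of winning by just playing good moves
SPOILED_WIN_PROB = 0.2   # chance of winning after wasting moves on the pattern

def play_game(use_intimidation: bool) -> float:
    """One simulated game: +1 for a win, -1 for a loss."""
    if use_intimidation:
        if random.random() < RESIGN_PROB:
            return 1.0                       # opponent resigned
        return 1.0 if random.random() < SPOILED_WIN_PROB else -1.0
    return 1.0 if random.random() < HONEST_WIN_PROB else -1.0

def train(episodes: int = 20_000, eps: float = 0.1) -> dict:
    """Epsilon-greedy over the two 'strategies'; returns each one's average reward."""
    value = {False: 0.0, True: 0.0}
    count = {False: 0, True: 0}
    for _ in range(episodes):
        explore = random.random() < eps
        a = (random.random() < 0.5) if explore else (value[True] > value[False])
        r = play_game(a)
        count[a] += 1
        value[a] += (r - value[a]) / count[a]  # incremental mean
    return value

if __name__ == "__main__":
    print(train())  # the intimidation arm ends up with the higher estimated value
```

The point of the sketch is just that "intimidation" here requires zero understanding of humans on the learner's side; it falls out of the reward signal as long as the (static) opponent model reacts to the pattern.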