Our team of five neural networks, OpenAI Five, has started to defeat amateur human teams at Dota 2. While today we play with restrictions, we aim to beat a team of top professionals at The International in August, subject only to a limited set of heroes. We may not succeed: Dota 2 is one of the most popular and complex esports games in the world, with creative and motivated professionals who train year-round to earn part of Dota’s annual $40M prize pool (the largest of any esports game).
Commentary by Sam Altman: http://blog.samaltman.com/reinforcement-learning-progress
This is the game that, to me, feels closest to the real world and to complex decision making (combining strategy, tactics, coordination, and real-time action) of any game AI has made real progress against so far.
The agents we train consistently outperform their two-week-old selves with a win rate of 90-95%. We did this without training on human-played games; we did design the reward functions, of course, but the algorithm figured out how to play by training against itself.
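To make "training against itself" concrete, here is a minimal toy sketch of self-play: a softmax policy over a tiny zero-sum matrix game is improved with a REINFORCE-style update, playing only against frozen snapshots of its own past self, with a hand-designed reward (the game payoff). Everything here (the payoff matrix, `snapshot_every`, the evaluation against an early snapshot) is an illustrative assumption; the real system trains neural-network policies on the actual game at vastly larger scale.

```python
import numpy as np

rng = np.random.default_rng(0)

# Antisymmetric payoff matrix for a 3-action zero-sum game in which
# action 2 dominates, so self-play has something genuine to learn.
PAYOFF = np.array([[ 0., -1., -2.],
                   [ 1.,  0., -1.],
                   [ 2.,  1.,  0.]])

def sample_action(logits):
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs), probs

def play(logits_a, logits_b):
    a, probs_a = sample_action(logits_a)
    b, _ = sample_action(logits_b)
    return a, probs_a, PAYOFF[a, b]          # reward for player A

logits = np.zeros(3)           # current policy parameters
pool = [logits.copy()]         # pool of frozen past selves to train against
lr, snapshot_every = 0.05, 500

for step in range(10_000):
    opponent = pool[rng.integers(len(pool))]  # sample a past snapshot
    a, probs, reward = play(logits, opponent)
    # REINFORCE-style update: grad log pi(a) = one_hot(a) - probs.
    grad = -probs
    grad[a] += 1.0
    logits += lr * reward * grad
    if (step + 1) % snapshot_every == 0:
        pool.append(logits.copy())            # freeze a new "past self"

# Check that self-play made the policy stronger than its earliest snapshot
# (a toy analogue of comparing against an older version of the agent).
wins = sum(play(logits, pool[0])[2] > 0 for _ in range(2000))
print(f"win rate vs. earliest snapshot: {wins / 2000:.2f}")
```

The key design choice the sketch tries to capture is that the opponent pool grows over time, so the current policy must stay strong against a distribution of earlier strategies rather than overfitting to any single opponent.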
This is a big deal because it shows that deep reinforcement learning can solve extremely hard problems whenever you can throw enough compute at them and have a really good simulated environment that captures the problem you’re solving. We hope to use this same approach to solve very different problems soon. It's easy to imagine this being applied to environments that look increasingly like the real world.
I don't know how hard it would be to do a side-by-side "FLOPS" comparison of Dota 5v5 vs AlphaGo / AlphaZero, but they seem relatively similar in terms of the computational cost required to achieve something close to "human level". However, as has been noted by many, Dota is a game of vastly more complexity because of its continuous state, partial observability, large action space, and long time horizon. So what does it mean when it requires roughly similar orders of magnitude of compute to achieve the same level of ability as humans, using a fairly general architecture and learning algorithm?
Some responses to AlphaGo at the time were along the lines of "Don't worry too much about this; it looks very impressive, but the game still has a discrete action space and is fully observable, so that explains why this was easy."