How big a deal is this? What, if anything, does it signal about when we get smarter than human AI?
Thanks. Key quote:
What this indicates is not that deep learning in particular is going to be the Game Over algorithm. Rather, the background variables are looking more like "Human neural intelligence is not that complicated and current algorithms are touching on keystone, foundational aspects of it." What's alarming is not this particular breakthrough, but what it implies about the general background settings of the computational universe.
His argument proves too much.
You could easily transpose it for the time when Checkers or Chess programs beat professional players: back then the "keystone, foundational aspect" of intelligence was thought to be the ability to do combinatorial search in large solution spaces, and scaling up to AGI was "just" a matter of engineering better heuristics. Sure, it didn't work on Go yet, but Go players were not using a different cortical algorithm than Chess players, were they?
Or you could transpose it for the time when MCTS Go programs reached "dan" (advanced amateur) level. They still couldn't beat professional players, but professional players were not using a different cortical algorithm than advanced amateur players, were they?
AlphaGo succeeded at the current achievement by using artificial neural networks in a regime where they are known to do well. But this regime, and the type of games like Go, Chess, Checkers, Othello, etc., represents a small part of the range of human cognitive tasks. In fact, we probably find these kinds of board games fascinating precisely because they are so different from the usual cognitive stimuli we deal with in everyday life.
It...
It's a big deal for Go, but I don't think it's a very big deal for AGI.
Conceptually, Go is like Chess or Checkers: a fully deterministic, perfect-information two-player game.
Go is more challenging for computers because the search space (and in particular the average branching factor) is larger and known position-evaluation heuristics are not as good, so traditional alpha-beta minimax search becomes infeasible.
The first big innovation, already put into use by most Go programs for a decade (although the idea is older), was Monte Carlo tree search (MCTS), which addresses the high-branching-factor issue: while traditional search either does not expand a node at all or expands it and recursively evaluates all its children, MCTS stochastically evaluates nodes with a probability that depends on how promising they look according to some heuristic.
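To make that loop concrete, here is a minimal generic UCT-style sketch (illustrative only, not any particular engine's code; the game-specific helpers `legal_moves`, `apply_move`, and `rollout_result` are assumed to be supplied by the caller, and the exploration constant is conventional rather than tuned):

```python
import math
import random

class Node:
    def __init__(self, state, parent=None):
        self.state = state           # a game position
        self.parent = parent
        self.children = []           # expanded successors
        self.visits = 0
        self.total_value = 0.0       # sum of rollout results seen through here

    def uct_score(self, c=1.4):
        # Unvisited nodes are tried first; otherwise trade off the average
        # rollout result (exploitation) against an exploration bonus that
        # shrinks as a node is visited more often.
        if self.visits == 0:
            return float("inf")
        exploit = self.total_value / self.visits
        explore = c * math.sqrt(math.log(self.parent.visits) / self.visits)
        return exploit + explore

def mcts(root, n_simulations, legal_moves, apply_move, rollout_result):
    """Generic UCT loop: promising nodes get selected and deepened more
    often, instead of every child being expanded as in plain minimax.
    For simplicity this scores everything from one player's perspective;
    a real two-player engine would negate values at alternating depths."""
    for _ in range(n_simulations):
        # 1. Selection: descend by UCT score until reaching a leaf.
        node = root
        while node.children:
            node = max(node.children, key=Node.uct_score)
        # 2. Expansion: add the leaf's successors.
        for move in legal_moves(node.state):
            node.children.append(Node(apply_move(node.state, move), parent=node))
        if node.children:
            node = random.choice(node.children)
        # 3. Simulation: play out from here and score the result.
        value = rollout_result(node.state)
        # 4. Backpropagation: update statistics along the path to the root.
        while node is not None:
            node.visits += 1
            node.total_value += value
            node = node.parent
    return max(root.children, key=lambda n: n.visits)  # most-visited child
```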
DeepMind's innovation consists in using a neural network to learn a good position-evaluation heuristic in a supervised fashion from a large database of professional games, refining it with reinforcement learning in "greedy" self-play mode, and then using both the refined heuristic and the supervised heuristic in an MCTS engine.
Their approach essentially relies on big...
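The shape of that three-stage pipeline, as a toy sketch (the dictionary-based "networks" and the `play_game` callback are stand-ins assumed for illustration; the real system uses deep convolutional networks throughout):

```python
def train_supervised(pro_games):
    """Stage 1: learn move preferences by counting professional choices.
    `pro_games` is an iterable of (position, move) pairs."""
    policy = {}
    for position, move in pro_games:
        weights = policy.setdefault(position, {})
        weights[move] = weights.get(move, 0) + 1
    return policy

def refine_by_self_play(policy, play_game, n_games=1000):
    """Stage 2: greedy self-play with a win/loss signal, nudging the
    policy's move weights toward choices that led to wins."""
    for _ in range(n_games):
        trajectory, won = play_game(policy)   # ([(position, move), ...], bool)
        for position, move in trajectory:
            weights = policy.setdefault(position, {})
            weights[move] = max(1, weights.get(move, 1) + (1 if won else -1))
    return policy

def train_value_function(self_play_positions):
    """Stage 3: regress position -> expected outcome from self-play games,
    giving the search a learned leaf evaluator."""
    totals = {}
    for position, outcome in self_play_positions:
        n, s = totals.get(position, (0, 0.0))
        totals[position] = (n + 1, s + outcome)
    return {p: s / n for p, (n, s) in totals.items()}
```

Both policies plus the value function are then handed to the MCTS engine, which is where the playing strength actually comes from.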
How big a deal is this? What, if anything, does it signal about when we get smarter than human AI?
It shows that Monte Carlo tree search meshes remarkably well with neural-network-driven evaluation ("value networks") and decision pruning/policy selection ("policy networks"). This means that if you have a planning task to which MCTS can usefully be applied, sufficient data to train networks for state evaluation and policy selection, and substantial computational power (a distributed cluster, in AlphaGo's case), you can significantly improve performance on your task (from "strong amateur" to "human champion" level, in this instance). It's not an AGI-complete result, however, any more than Deep Blue or TD-Gammon were AGI-complete.
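Schematically, the policy network biases which branches the search explores, and the value network supplements rollouts when leaves are evaluated. A sketch of that coupling, modeled on the PUCT-style rule described in the AlphaGo paper (the node attributes, the `value_net`/`fast_rollout` callables, and both constants are assumptions for illustration):

```python
import math

def select_child(node, c_puct=1.0):
    """Descend by maximizing Q + U: Q is the averaged search value, and U is
    an exploration bonus scaled by the policy network's prior probability
    for the move, so high-prior branches get explored first."""
    def score(child):
        q = child.total_value / child.visits if child.visits else 0.0
        u = c_puct * child.prior * math.sqrt(node.visits) / (1 + child.visits)
        return q + u
    return max(node.children, key=score)

def evaluate_leaf(position, value_net, fast_rollout, mix=0.5):
    """Evaluate a leaf by blending the value network's prediction with a
    fast rollout result, rather than relying on random playouts alone."""
    return mix * value_net(position) + (1 - mix) * fast_rollout(position)
```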
The "training data" factor is a biggie; we lack this kind of data entirely for things like automated theorem proving, which would otherwise be quite amenable to this 'planning search + complex learned heuristics' approach. In particular, writing provably-correct computer code is a minor variation on automated theorem proving. (Neural networks can already write incorrect code, but this is not good enough if you want a provably Friendly AGI.)
I'm not going to argue that you should pay attention to EY. His arguments convince me, but if they don't convince you, I'm not gonna do any better.
What I'm trying to get at is, when you ask "is there any evidence that will result in EY ceasing to urgently ask for your money?"... I mean, I'm sure there is such evidence, but I don't wish to speak for him. But it feels to me that by asking that question, you possibly also think of EY as the sort of person who says: "this is evidence that AI risk is near! And this is evidence that AI risk is near! Everything is evidence that AI risk is near!" And I'm pointing out that no, that's not how he acts.
While we're at it, this exchange between us seems relevant. ("Eliezer has said that security mindset is similar, but not identical, to the mindset needed for AI design." "Well, what a relief!") You seem surprised, and I'm not sure what about it was surprising to you, but I don't think you should have been surprised.
Basically, even if you're right that he's wrong, I feel like you're wrong about how he's wrong. You seem to have a model of him which is very different from my model of him.
(Btw, his opinion seems to be that AlphaGo's methods are what makes it more of a leap than a self-driving car or than Deep Blue, not the results. Not sure that affects your position.)
I also think MIRI should stop hitting people up for money and get a normal funding stream going. You know, let their ideas of how to avoid UFAI compete in the normal marketplace of ideas.
Currently MIRI gets its funding from 1) donations and 2) grants. Isn't that exactly what the normal funding stream for a non-profit is?
I should say, getting this working is very impressive, and took an enormous amount of effort. +1 to the team!
An interesting comment:
...The European champion of Go is not the world champion, or even close. The BBC, for example, reported that “Google achieves AI ‘breakthrough’ by beating Go champion,” and hundreds of other news outlets picked up essentially the same headline. But Go is scarcely a sport in Europe; and the champion in question is ranked only #633 in the world. A robot that beat the 633rd-ranked tennis pro would be impressive, but it still wouldn’t be fair to say that it had “mastered” the game. DeepMind made major progress, but the Go journey is still [...]
This is a big deal, and it is another sign that AGI is near.
Intelligence boils down to inference. Go is an interesting case because good play, for both humans and bots like AlphaGo, requires two specialized types of inference operating over very different timescales: fast planning/search-style inference over candidate moves within a game, and much slower learning (2nd-order inference over circuit/program structure) accumulated across many games.
Machines have been strong in planning/search style inference for a while. It is only recently that the slower learning component (2nd order inference over circuit/program structure) is starting to approach and surpass human level.
Critics like to point out that DL requires tons of data, but so does the human brain. A more accurate comparison requires quantifying the dataset human pro go players train on.
A 30-year-old Asian pro will have perhaps 40,000 hours of playing experience (20 years × 50 weeks/year × 40 hours/week). The average game lasts perhaps an hour and consists of about 200 moves. In addition, pros (and even fans) study published games. Reading a game takes less time, perhaps as little as 5 minutes or so.
So we can estimate very roughly that a top pro will have absorbed between 100,000 and 1 million games, and between 20 and 200 million individual positions (around 200 moves per game).
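A quick back-of-envelope check of those numbers (the game counts are the comment's own rough bounds, not measured data):

```python
play_hours = 20 * 50 * 40        # 20 years x 50 weeks/year x 40 hours/week
print(play_hours)                # 40,000 hours -> ~40,000 one-hour games played
moves_per_game = 200
for total_games in (100_000, 1_000_000):   # played + studied, rough bounds
    print(total_games * moves_per_game)    # 20,000,000 to 200,000,000 positions
```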
AlphaGo was trained on the KGS dataset: 160,000 games and 29 million positions. So it did not train on significantly more data than a human pro; the data quantities are actually very similar.
Furthermore, the human pro's dataset is perhaps of higher quality, as a pro will be familiar mainly with pro-level games, whereas the AlphaGo dataset is mostly amateur-level.
The main difference is speed. The human brain's 'clockrate' or equivalent is about 100 Hz, whereas AlphaGo's various CNNs can run at roughly 1,000 Hz during training on a single machine, and at perhaps a 10,000 Hz equivalent distributed across hundreds of machines. 40,000 hours - a lifetime of experience - can be compressed 100x or more into just a couple of weeks for a machine. This is the key lesson here.
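That compression claim, written out (using the comment's own rate estimates):

```python
human_hours = 40_000             # a pro's lifetime of play and study
brain_hz = 100                   # rough 'clockrate' of biological neurons
cluster_hz = 10_000              # estimated distributed training equivalent
speedup = cluster_hz / brain_hz  # -> 100x
weeks = human_hours / speedup / (7 * 24)
print(speedup, round(weeks, 1))  # 100x -> ~2.4 weeks of wall-clock time
```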
The classification CNN trained on KGS was run for 340 million steps, which is about 10 iterations per unique position in the database.
The ANNs that AlphaGo uses are much, much smaller than a human brain, but the brain has to handle a huge number of other tasks, and also has to solve complex vision and motor problems just to play the game. AlphaGo's ANNs get to focus purely on Go.
A few hundred Titan Xs can muster perhaps a petaflop of compute. The high-end estimate of the brain is 10 petaflops (100 trillion synapses × 100 Hz max firing rate). The more realistic estimate is 100 teraflops (100 trillion synapses × 1 Hz average firing rate), and the low end is a tenth of that or less.
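Those estimates, written out (assuming, as the comment implicitly does, that one synaptic event per second ≈ one floating-point operation per second):

```python
synapses = 100e12            # ~100 trillion synapses
high = synapses * 100        # 100 Hz max rate -> 1e16 ops/s (~10 petaflops)
mid = synapses * 1           # 1 Hz average    -> 1e14 ops/s (~100 teraflops)
low = mid / 10               # pessimistic end -> ~1e13 ops/s (~10 teraflops)
print(f"{high:.0e} {mid:.0e} {low:.0e}")
```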
So why is this a big deal? Because it suggests that training a DL AI to master other economically important tasks, such as becoming an expert-level programmer, could be much closer than people think.
The techniques used here are nowhere near their optimal form yet in terms of efficiency. When Deep Blue beat Kasparov in 1997, it required a specialized supercomputer and a huge team. Ten years later, chess bots written by individual programmers and running on modest PCs soared past Deep Blue, thanks to more efficient algorithms and implementations.
A 30-year-old Asian pro will have perhaps 40,000 hours of playing experience (20 years × 50 weeks/year × 40 hours/week). The average game lasts perhaps an hour and consists of about 200 moves. In addition, pros (and even fans) study published games. Reading a game takes less time, perhaps as little as 5 minutes or so.
So we can estimate very roughly that a top pro will have absorbed between 100,000 and 1 million games, and between 20 and 200 million individual positions (around 200 moves per game).
I asked a pro player I know whether these numbers sounded reasonabl...
DeepMind's Go AI, called AlphaGo, has beaten the European champion with a score of 5-0. A match against top-ranked human Lee Se-dol is scheduled for March.
Games are a great testing ground for developing smarter, more flexible algorithms that have the ability to tackle problems in ways similar to humans. Creating programs that are able to play games better than the best humans has a long history
[...]
But one game has thwarted A.I. research thus far: the ancient game of Go.