
ShardPhoenix comments on AlphaGo versus Lee Sedol - Less Wrong Discussion

17 Post author: gjm 09 March 2016 12:22PM


Comments (183)


Comment author: Houshalter 09 March 2016 10:30:19PM 4 points [-]

Sure, you can model music composition as an RL task. The AI composes a song, then predicts how much a human will like it. It then tries to produce songs that are more and more likely to be liked.

Another interesting thing that AlphaGo did was start by predicting what moves a human would make, then switch to reinforcement learning. So for a music AI, you would start with one that can predict the next note in a song. Then you switch to RL and adjust its predictions so that it is more likely to produce songs humans like, and less likely to produce ones we don't like.
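The pretrain-then-fine-tune loop described here could be sketched roughly as follows. This is a toy model in pure Python, not anything resembling AlphaGo's actual networks: the corpus, note names, and reward values are all invented for illustration.

```python
from collections import defaultdict

# Hypothetical toy corpus: songs as sequences of note names.
corpus = [
    ["C", "E", "G", "E", "C"],
    ["C", "G", "E", "G", "C"],
]

# Phase 1: supervised pre-training -- count next-note frequencies,
# analogous to a policy network imitating human moves/notes.
counts = defaultdict(lambda: defaultdict(float))
for song in corpus:
    for prev, nxt in zip(song, song[1:]):
        counts[prev][nxt] += 1.0

def policy(prev):
    """Probability distribution over the next note given the previous one."""
    opts = counts[prev]
    total = sum(opts.values())
    return {note: w / total for note, w in opts.items()}

# Phase 2: crude REINFORCE-style update -- nudge weights toward
# transitions that appeared in a song that got a positive reward.
def reinforce(song, reward, lr=0.5):
    for prev, nxt in zip(song, song[1:]):
        counts[prev][nxt] += lr * reward

before = policy("C")["E"]
reinforce(["C", "E", "G"], reward=1.0)  # pretend a human liked this fragment
after = policy("C")["E"]
assert after > before  # the rewarded transition C->E became more probable
```

After pretraining, C is followed by E or G with equal probability; one positive reward on a fragment containing C→E tilts the policy toward E, which is the whole pretrain-then-RL idea in miniature.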

However, automated composition is something that a lot of people have experimented with before, and so far nothing works really well.

Comment author: ShardPhoenix 10 March 2016 12:15:16AM 6 points [-]

One difference is that you can't get feedback as fast when the signal is human judgement rather than win/lose in a game (where AlphaGo can play millions of games against itself).

Comment author: Houshalter 10 March 2016 04:52:16AM 3 points [-]

Yes, it would require a lot of human input.

However, the AI could learn to predict what humans like, and then use that prediction as its judge, trying to produce songs that it predicts humans will like. Then when it tests those songs on actual humans, it can see if its predictions were right and improve them.

This is also a domain with vast amounts of unsupervised data available. We've created millions of songs, which it can learn from. Out of the space of all possible sounds, we've decided that this tiny subset is pleasing to listen to. There's a lot of information in that.
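The "learned judge" idea could be sketched as a small preference model trained on human like/dislike labels and then used as a reward signal. Everything here is a made-up toy: songs are lists of MIDI note numbers, the single feature (how stepwise the melody is) and the labels are purely illustrative.

```python
import math

def features(song):
    # Crude hypothetical feature: fraction of stepwise moves (<= 2 semitones).
    steps = [abs(a - b) for a, b in zip(song, song[1:])]
    return [1.0, sum(1 for s in steps if s <= 2) / len(steps)]

def predict(w, song):
    """Estimated P(a human likes this song), via logistic regression."""
    z = sum(wi * xi for wi, xi in zip(w, features(song)))
    return 1.0 / (1.0 + math.exp(-z))

def train(data, epochs=200, lr=0.5):
    w = [0.0, 0.0]
    for _ in range(epochs):
        for song, liked in data:
            p = predict(w, song)
            for i, xi in enumerate(features(song)):
                w[i] += lr * (liked - p) * xi  # logistic gradient step
    return w

# Invented labels: suppose humans liked the smooth melody, not the jumpy one.
labeled = [
    ([60, 62, 64, 62, 60], 1),
    ([60, 72, 55, 71, 58], 0),
]
w = train(labeled)
reward = predict(w, [60, 61, 63, 64, 62])  # score a new candidate song
```

The composer would optimize against `reward`, and each round of real human feedback becomes new rows in `labeled` to retrain the judge, exactly the predict-then-verify loop described above.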

Comment author: gwern 10 March 2016 12:44:36AM *  3 points [-]

You can get fast feedback by reusing existing databases if your RL agent can do off-policy learning. (You can consider this what the supervised pre-training phase is 'really' doing.) Your agent doesn't have to take an action itself before it can learn from it. Consider experience replay buffers: you could imagine a song-writing RL agent with a huge experience replay buffer made just of fragments of songs you grabbed online (say, from the Touhou megatorrent with its 50k tracks).
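A minimal sketch of that replay-buffer idea, with invented data: the buffer is seeded entirely with transitions harvested from pre-existing songs (MIDI note numbers here), and a tabular value estimate is learned off-policy from samples the agent never generated itself. Assigning reward 1.0 to every human-written transition is a crude stand-in for a real reward signal, and there is no bootstrapping, so this is closer to a bandit-style update than full Q-learning.

```python
import random
from collections import deque

# Replay buffer pre-filled from existing songs, not from agent actions.
buffer = deque(maxlen=10000)
existing_songs = [[60, 62, 64, 65, 67], [67, 65, 64, 62, 60]]
for song in existing_songs:
    for prev, nxt in zip(song, song[1:]):
        buffer.append((prev, nxt, 1.0))  # hypothetical reward for human-written transitions

# Tabular value table over (state=current note, action=next note).
Q = {}
def update(state, action, reward, alpha=0.1):
    key = (state, action)
    Q[key] = Q.get(key, 0.0) + alpha * (reward - Q.get(key, 0.0))

# Off-policy learning: sample stored transitions instead of acting.
random.seed(0)
for _ in range(1000):
    s, a, r = random.choice(buffer)
    update(s, a, r)
```

After sampling, transitions that humans actually wrote (e.g. 60→62) carry values near 1.0, so the agent has learned something about human songs before ever composing one.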