
gjm comments on [LINK] Deep Learning Machine Teaches Itself Chess in 72 Hours - Less Wrong Discussion

Post author: ESRogs 14 September 2015 07:38PM




Comment author: gjm 18 September 2015 11:41:08AM 1 point

It looks to me as if they did the following:

  • Design the features:
    • Manually try various combinations of features. For each candidate feature-set, attempt to learn Stockfish's evaluation function.
  • Having chosen features, learn the weights:
    • Initialize weights via some kind of bootstrapping process using a manually-designed (but deliberately rather stupid) evaluator.
    • Optimize the weights by TD-Leaf(λ) self-play learning, using a large database of positions (from computer-computer games) as starting points for the self-play games.
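The stages above can be sketched in code. This is a minimal illustration, not the paper's implementation: the evaluator here is linear rather than a neural network, the feature vectors are random stand-ins, and the hyperparameters (`alpha`, `lam`) are invented for the example. Only the shape of the two training signals follows the description: supervised regression against a reference ("Stockfish") evaluation to judge a candidate feature set, then TD-Leaf(λ) weight updates computed from a self-play trajectory.

```python
import numpy as np

rng = np.random.default_rng(0)
N_FEATURES = 8  # size of the hand-designed feature vector (illustrative)

def evaluate(weights, features):
    """Position score for a linear evaluator: dot product of weights and features."""
    return weights @ features

# --- Step 1: judge a candidate feature set by regressing on a reference eval ---
X = rng.normal(size=(500, N_FEATURES))            # candidate feature vectors
true_w = rng.normal(size=N_FEATURES)
y = X @ true_w + rng.normal(scale=0.1, size=500)  # stand-in "Stockfish" scores

# Least-squares fit: how well can this feature set mimic the reference eval?
w_fit, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
fit_error = np.mean((X @ w_fit - y) ** 2)

# --- Step 3: one TD-Leaf(lambda) update from a self-play trajectory ---
def td_leaf_update(weights, positions, alpha=0.01, lam=0.7):
    """Apply one TD-Leaf(lambda) step to a linear evaluator.

    positions: feature vectors of the positions visited during one
    self-play game, in move order.
    """
    scores = np.array([evaluate(weights, p) for p in positions])
    diffs = scores[1:] - scores[:-1]  # temporal differences d_t
    w = weights.copy()
    for t in range(len(diffs)):
        # Discounted sum of future temporal differences.
        target = sum(lam ** (j - t) * diffs[j] for j in range(t, len(diffs)))
        # For a linear evaluator, the gradient w.r.t. the weights is
        # simply the feature vector of the position.
        w += alpha * target * positions[t]
    return w

game = rng.normal(size=(10, N_FEATURES))  # stand-in self-play trajectory
w_new = td_leaf_update(w_fit, game)
print("fit MSE:", fit_error)
```

In the paper's setting, `evaluate` would be a neural network and the gradient would come from backpropagation, but the structure of the update (propagating later evaluation changes back to earlier positions, discounted by λ) is the same.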

Comment author: Douglas_Knight 18 September 2015 04:36:57PM 1 point

That sounds correct, but in the first step I think what they are optimizing is not (just) the features but the representations of those features. The natural-language descriptions of the features are very simple ones that require no expertise, but some representations, and some ways of wiring them together, are more conducive both to learning and to building the higher-level features that Stockfish uses. But, again, it doesn't sound like they are applying much optimization power at this step.
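To make the feature-vs-representation distinction concrete, here is a toy sketch (the encodings are invented for illustration and are not the paper's): the same feature, the white king's square, admits several representations that carry identical information but differ in how easily downstream layers can build on them.

```python
import numpy as np

SQUARE = 36  # white king on e5, 0-indexed as 8 * rank + file

# Representation A: a single normalized scalar index.
rep_scalar = np.array([SQUARE / 63.0])

# Representation B: separate rank and file coordinates.
rep_coords = np.array([SQUARE // 8, SQUARE % 8]) / 7.0

# Representation C: a one-hot vector over all 64 squares.
rep_onehot = np.zeros(64)
rep_onehot[SQUARE] = 1.0

# All three encode the same fact, but a property like "king near the
# corner" is a linear function of C, a simple function of B, and a
# highly non-linear function of A -- so the choice of representation
# affects how much the network has to learn.
print(rep_scalar.size, rep_coords.size, int(rep_onehot.sum()))
```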

Also, one thing that I think is simply omitted is whether the network in the first step is the same as the network in the later steps, or whether the later steps introduce more layers to exploit the same features.

Comment author: gjm 18 September 2015 09:26:26PM 0 points

Features and representations: agreed. (I wasn't trying to be precise.)

I assumed the same network in the first step as later, but agree that it isn't made explicit in the paper.