V_V comments on [Link] AlphaGo: Mastering the ancient game of Go with Machine Learning

Post author: ESRogs, 27 January 2016 09:04PM

Comment author: V_V, 31 January 2016 08:23:08PM

> The best setting for that is probably only 3-5 characters, not 20.

In NLP applications where Markov language models are used, such as speech recognition and machine translation, the typical setting is an order of 3 to 5 words. Since the average English word is roughly 5 characters long once the separating space is counted, 20 characters correspond to about 4 words, so an order-20 character model falls within this range.
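To make the comparison concrete, here is a minimal sketch of a character-level order-n Markov model (hypothetical code, not from the post or the linked experiment; the `train` and `generate` names are mine): each length-n context is mapped to counts of the characters observed to follow it, and generation samples the next character from those counts.

```python
from collections import defaultdict, Counter
import random

def train(text, n):
    """Count, for every length-n context in the text, which characters follow it."""
    counts = defaultdict(Counter)
    for i in range(len(text) - n):
        counts[text[i:i + n]][text[i + n]] += 1
    return counts

def generate(counts, n, seed, length):
    """Extend `seed` (at least n characters) by sampling from the context counts."""
    out = list(seed)
    for _ in range(length):
        context = "".join(out[-n:])
        followers = counts.get(context)
        if not followers:
            break  # unseen context: an unsmoothed model has no continuation at all
        chars, weights = zip(*followers.items())
        out.append(random.choices(chars, weights=weights)[0])
    return "".join(out)

# Illustrative usage: model = train(corpus, 20); generate(model, 20, corpus[:20], 500)
```

With n as large as 20, most contexts occur only once in the training text, so the unsmoothed model can do little more than replay the continuation it happened to see, which is the memorization behaviour discussed next.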

Anyway, I agree that in this case the order-20 Markov model seems to overfit: Googling lines from the snippets in the post often locates them verbatim in an original source file, which happens less often with the RNN snippets. This may be due to the lack of regularization ("smoothing") in the probability estimation, and to the relatively small size of the training corpus: 474 MB, versus the >10 GB corpora typically used in NLP applications. Neural networks need lots of data, but still less than plain look-up tables.
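For reference, one of the simplest forms the missing smoothing could take is add-k (Laplace-style) smoothing; this is a sketch under that assumption, not a claim about what the experiment did, and the function name and the default k are illustrative:

```python
def smoothed_prob(counts, context, char, vocab, k=0.01):
    """Add-k smoothed probability of `char` following `context`.

    Every character in `vocab` receives a pseudo-count of k, so no
    continuation, seen or unseen, is ever assigned zero probability.
    Works with the `counts` structure from the sketch above.
    """
    followers = counts.get(context, {})
    total = sum(followers.values()) + k * len(vocab)
    return (followers.get(char, 0) + k) / total
```

Production n-gram systems use more elaborate schemes such as Kneser-Ney smoothing, but even add-k keeps the model from collapsing onto the single continuation it saw during training.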