DanielVarga comments on Statistical Prediction Rules Out-Perform Expert Human Judgments - Less Wrong
I second the advice.
Let me brag a bit. Once, in a friendly discussion, the following question came up: how do you predict whether an unknown first name is male or female? This was in the context of Hungarian names, as all of us were Hungarians. I had a list of Hungarian first names in digital format. The discussion turned into a bet: I said I could write a program in half an hour that tells, with at least 70% accuracy, the sex of a first name it had never seen before. I am quite fast at writing small scripts. It wasn't even close: It took me 9 minutes to
The model reached an accuracy of 90%. In retrospect, this is not surprising at all. Looking into the linear model, the most important feature it identified was whether the name ends with an 'a'. This trivial rule alone reaches some 80% accuracy for Hungarian names, so if I had known this in advance, I could have won the bet in 30 seconds instead of 9 minutes, with the sed command s/a$/a FEMALE/.
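For anyone who wants to try the trivial rule on their own list, here is a minimal Python sketch of the same idea as the sed one-liner. The function names and the five-name sample are made up for illustration; this is not the original script or the original data, and the sample is far too small to say anything about the real accuracy.

```python
def guess_sex(name: str) -> str:
    """Baseline rule from the comment: a name ending in 'a' is guessed FEMALE."""
    return "FEMALE" if name.strip().lower().endswith("a") else "MALE"


def rule_accuracy(labeled_names):
    """Fraction of (name, sex) pairs that the baseline rule gets right."""
    correct = sum(1 for name, sex in labeled_names if guess_sex(name) == sex)
    return correct / len(labeled_names)


# Hypothetical hand-picked sample (NOT the author's list), chosen to include
# names the rule gets wrong as well as names it gets right.
sample = [
    ("Anna", "FEMALE"),
    ("Katalin", "FEMALE"),  # female name not ending in 'a': the rule misses it
    ("Gabor", "MALE"),
    ("Attila", "MALE"),     # male name ending in 'a': the rule misses it
    ("Zsofia", "FEMALE"),
]

if __name__ == "__main__":
    for name, sex in sample:
        print(name, guess_sex(name), "(labeled", sex + ")")
    print("accuracy on this toy sample:", rule_accuracy(sample))
```

The point of measuring it this way is that the rule is scored on labeled names, exactly like the learned model, so the two numbers are comparable.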
These sound like powers I should acquire. Could you drop some further hints on:
I used Zhang Le's tool. Note that it is a rather obscure thing, not an industry standard like, say, the huge Weka and Mallet packages. It made the tasks you ask about very easy. Once I had the training and test data featurized,
built the model and
told me its accuracy on the test data.
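The featurize, train, and test loop described above can be sketched in a few lines of plain Python. To be clear about the assumptions: this does not use Zhang Le's tool or reproduce its commands; a simple perceptron is swapped in as the linear model, the name-suffix features are a guess at what a reasonable featurization might look like, and every function name and the tiny data set are hypothetical.

```python
from collections import defaultdict


def featurize(name, max_suffix=3):
    """Binary features: the last 1..max_suffix characters of the lowercased name."""
    n = name.strip().lower()
    return [f"suffix={n[-k:]}" for k in range(1, max_suffix + 1) if len(n) >= k]


def train_perceptron(examples, epochs=10):
    """Train a linear model on (name, label) pairs; label is 'FEMALE' or 'MALE'.

    A positive score means FEMALE. Weights change only on mistakes.
    """
    weights = defaultdict(float)
    for _ in range(epochs):
        for name, label in examples:
            feats = featurize(name)
            score = sum(weights[f] for f in feats)
            predicted = "FEMALE" if score > 0 else "MALE"
            if predicted != label:
                step = 1.0 if label == "FEMALE" else -1.0
                for f in feats:
                    weights[f] += step
    return weights


def predict(weights, name):
    """Classify a name the model may never have seen."""
    score = sum(weights.get(f, 0.0) for f in featurize(name))
    return "FEMALE" if score > 0 else "MALE"


def model_accuracy(weights, examples):
    """Fraction of held-out (name, label) pairs classified correctly."""
    return sum(predict(weights, n) == y for n, y in examples) / len(examples)


# Hypothetical toy data; a real run would use a full name list split into
# separate training and test sets.
train = [("Anna", "FEMALE"), ("Eva", "FEMALE"), ("Peter", "MALE"), ("Gabor", "MALE")]
test = [("Maria", "FEMALE"), ("Sandor", "MALE")]

if __name__ == "__main__":
    w = train_perceptron(train)
    print("held-out accuracy:", model_accuracy(w, test))
    # Inspecting the largest weights shows which features the model relies on.
    print(sorted(w.items(), key=lambda kv: -abs(kv[1]))[:5])
```

Printing the largest weights is the same kind of inspection that revealed the "ends with 'a'" feature in the story above.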