
Kawoomba comments on Open Thread, June 2-15, 2013 - Less Wrong Discussion

Post author: TimS 02 June 2013 02:22AM, 5 points




Comment author: sixes_and_sevens 04 June 2013 04:00:50PM, 9 points

I scraped the last few hundred pages of comments on Main and Discussion, and made a simple application for pulling the highest TF-IDF-scoring words for any given user.

I'll provide these values for the first ten respondents who want them. [Edit: that's ten]

EDIT: some meta-information: the corpus totals 23.8 MB and spans the past 400 comment pages on Main and Discussion (around six months and two and a half months of comments, respectively). The most prolific contributor is gwern, with ~780 kB of comments; Eliezer clocks in at ~280 kB.
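For readers unfamiliar with per-user TF-IDF, here is a minimal Python sketch of the idea. It is not the author's Java scorer: it assumes each user's comments are concatenated into one "document", uses relative term frequency times `log(N / df)` as the score, and uses a deliberately simple tokenizer. The function names and the toy corpus are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Deliberately simple tokenizer: lowercase, keep runs of letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def top_tfidf_words(texts_by_user, user, k=10):
    """Return up to k (word, score) pairs with the highest TF-IDF for `user`.

    Each user's comments are concatenated into one "document", so the
    document frequency of a word is the number of users who used it.
    """
    counts = {u: Counter(tokenize(" ".join(texts))) for u, texts in texts_by_user.items()}
    n_docs = len(counts)

    # Document frequency: how many users' documents contain each word.
    df = Counter()
    for c in counts.values():
        df.update(c.keys())

    user_counts = counts[user]
    total = sum(user_counts.values())
    if total == 0:
        return []

    # TF = relative frequency within the user's text; IDF = log(N / df).
    # A word used by every user gets IDF log(1) = 0.
    scores = {w: (n / total) * math.log(n_docs / df[w]) for w, n in user_counts.items()}

    # Highest score first; ties broken alphabetically so the output is deterministic.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


corpus = {
    "alice": ["Bayes theorem and priors", "priors matter"],
    "bob": ["the decision theory of priors"],
    "carol": ["spaced repetition and the theorem"],
}
print([w for w, _ in top_tfidf_words(corpus, "alice", k=3)])  # prints ['bayes', 'matter', 'priors']
```

Treating each user as one document is what makes the scores user-specific: "priors" appears twice for alice but is discounted because bob also uses it, while words only alice uses rank highest.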

Comment author: Kawoomba 06 June 2013 08:16:32PM, 2 points

Can you comment on your methodology: tools, wget scripts, or something else?

Comment author: sixes_and_sevens 06 June 2013 11:12:34PM, 1 point

Scraping is done with Python and lxml, and the scoring is done in Java. The project came about because I needed to brush up on my Java for work and was looking for something extensible.

I also didn't push it to my personal repo, so all requests will have to wait until I'm back at work.
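The scraping step (turning comment pages into per-user text for scoring) might look roughly like the Python sketch below. Two loud assumptions: it swaps in the standard library's `html.parser` for lxml so it runs without third-party packages, and the markup it expects (`class="author"` and `class="body"` elements) is hypothetical, not Less Wrong's actual HTML. `CommentParser` and `comments_by_user` are illustrative names.

```python
from collections import defaultdict
from html.parser import HTMLParser

# Void elements never get an end tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link", "wbr"}


class CommentParser(HTMLParser):
    """Collect (author, text) pairs from HYPOTHETICAL comment markup:
    <span class="author">NAME</span><div class="body">TEXT</div>
    """

    def __init__(self):
        super().__init__()
        self.comments = []
        self._field = None   # "author" or "body" while inside one of those elements
        self._depth = 0      # nesting depth within the current field element
        self._buf = []
        self._author = ""

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._field:
            self._depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        for name in ("author", "body"):
            if name in classes:
                self._field, self._depth, self._buf = name, 1, []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._field:
            return
        self._depth -= 1
        if self._depth == 0:
            text = "".join(self._buf).strip()
            if self._field == "author":
                self._author = text
            else:
                self.comments.append((self._author, text))
            self._field = None

    def handle_data(self, data):
        if self._field:
            self._buf.append(data)


def comments_by_user(html_pages):
    # Group every scraped comment's text under its author's name.
    grouped = defaultdict(list)
    for page in html_pages:
        parser = CommentParser()
        parser.feed(page)
        parser.close()
        for author, text in parser.comments:
            grouped[author].append(text)
    return dict(grouped)


sample_page = (
    '<div class="comment"><span class="author">gwern</span>'
    '<div class="body"><p>See the <a href="#">link</a>.</p><br></div></div>'
    '<div class="comment"><span class="author">Eliezer</span>'
    '<div class="body">Hi</div></div>'
)
print(comments_by_user([sample_page]))
```

The resulting dictionary of author to comment texts is the shape a per-user TF-IDF scorer needs; with real pages, the class names would have to be replaced by whatever the site's markup actually uses.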