dbaupp comments on What's going on here? - Less Wrong

15 Post author: RobertLumley 27 January 2012 01:38AM


Comments (61)


Comment author: dbaupp 27 January 2012 01:58:01PM 0 points

I'd be interested in PageRank on LW.

Or, failing that, just the most-linked-to posts. (It'd be cool if one of the Trike Apps people could run a regex (e.g. "lesswrong.com/[^ ]*") over the database (reddit_data_comment[key=data] and reddit_data_article[key=article], I believe) and publish a .txt dump of the results somewhere.)
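Given a dump like that, counting the most-linked-to posts takes only a few lines. This is a minimal sketch, assuming the comment and article bodies arrive as plain strings; `most_linked` is a hypothetical helper and the sample texts are made up. The pattern is the one suggested above, with the dot escaped and `\S` in place of `[^ ]` so that newlines also end a match, plus trimming of trailing punctuation such as the `)` of a Markdown link.

```python
import re
from collections import Counter

# The suggested pattern, with "." escaped and \S so a newline also ends the match.
LINK_RE = re.compile(r"lesswrong\.com/\S*")

def most_linked(texts, n=10):
    """Return the n most frequent LessWrong link targets across an iterable of bodies."""
    counts = Counter()
    for text in texts:
        for match in LINK_RE.findall(text):
            # Drop punctuation the greedy \S* swallows, e.g. "...)" or "...".
            counts[match.rstrip(").,;")] += 1
    return counts.most_common(n)

# Made-up sample bodies, for illustration only.
texts = [
    "See http://lesswrong.com/lw/abc/ and lesswrong.com/lw/def/.",
    "Also [this](http://lesswrong.com/lw/abc/)",
]
print(most_linked(texts))
# → [('lesswrong.com/lw/abc/', 2), ('lesswrong.com/lw/def/', 1)]
```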

Comment author: shokwave 27 January 2012 02:35:23PM 2 points

I'll be in the TrikeApps office about a week from now. I'll do my best to remember this and have something workable ready to offer them, though I can't promise they'll be excited about data-mining LessWrong.

Comment author: dbaupp 29 January 2012 11:21:07AM 1 point

Thanks! (I'm not getting my hopes up for it.)

I've knocked something together quickly. I have no idea how fast it will run over the ~250,000 comments (performance could probably be improved by replacing "for ... in cursor" with paged retrieval).
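Paged retrieval in place of one long "for ... in cursor" loop could look roughly like this. It is a sketch only: it uses Python's built-in `sqlite3` and a hypothetical `comments(id, body)` table as a stand-in, and does not model the real `reddit_data_comment` key/value layout. The function name `iter_in_pages` and the page size are illustrative.

```python
import sqlite3

def iter_in_pages(conn, page_size=1000):
    """Yield (id, body) rows from a hypothetical `comments` table, one page at a time.

    Keyset pagination: each query resumes after the last id already seen,
    so later pages never have to skip over earlier rows the way OFFSET does.
    Assumes ids are positive integers.
    """
    last_id = 0
    while True:
        rows = conn.execute(
            "SELECT id, body FROM comments WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, page_size),
        ).fetchall()
        if not rows:
            return
        yield from rows
        last_id = rows[-1][0]
```

Each page is a separate, bounded query, so memory use stays flat. Whether this is actually faster than iterating a single cursor depends on the database driver and the indexes available, so it would need measuring against the real data.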

I believe the script only keeps public posts (so no drafts or deleted posts). I'm less sure about the comments: I don't know whether comments on deleted articles are kept, or whether there is such a thing as a "private" comment that I'm not filtering out properly.

That script is a "best case" scenario, since it records the origin as well as the target of each link (plus the date and karma). If that data were published, I'd try some analysis (and maybe even write a proper article!).
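With origin/target pairs like the ones that script records, PageRank reduces to a short power iteration. This is a minimal sketch, not a tuned implementation. It assumes the dump has already been parsed into a list of `(origin, target)` pairs (for example, the page containing a link and the post it points to). The `pagerank` function, the 0.85 damping factor (the usual textbook value), and the fixed 50 iterations are illustrative choices, not anything taken from the script.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over a list of (origin, target) link pairs.

    Repeated pairs count as extra links; pages with no outgoing links
    ("dangling" pages) have their rank spread evenly over every page.
    """
    nodes = {node for edge in edges for node in edge}
    if not nodes:
        return {}
    out_links = {node: [] for node in nodes}
    for origin, target in edges:
        out_links[origin].append(target)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        # Every page gets the teleport share plus its slice of the dangling mass.
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        for origin, targets in out_links.items():
            if not targets:
                continue
            # Split this page's damped rank equally among the links it makes.
            share = damping * rank[origin] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank
```

The ranks always sum to 1, so `sorted(rank, key=rank.get, reverse=True)[:20]` gives a top-20 list directly. A fixed iteration count keeps the sketch simple; a real run would more likely stop once the ranks change by less than some tolerance.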