sixes_and_sevens comments on Open Thread, March 16-31, 2012 - Less Wrong Discussion

2 Post author: OpenThreadGuy 16 March 2012 04:53AM

Comment author: sixes_and_sevens 20 March 2012 01:45:24PM 0 points

My day job is DB admin and development. Even in the unlikely event that the LW back-end admin-types were comfortable running a query sent in by some dude off the site, I wouldn't be comfortable giving them one. The effort of due diligence on a foreign script is probably greater than the effort required to put the script together.

The data I want correspond to:

  • the IDs (i.e. primary key, not the username) of all the users
  • the IDs (PK) and authorship (user ID) of all posts and comments in a contiguous ~3 month period
  • the adjacency of users and posts as upvotes and downvotes over this period (I assume this is a single junction table)
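To make the shape of that request concrete, here is a rough sketch of the three extracts using Python's built-in sqlite3 module. The schema is entirely hypothetical: I don't know LW's actual tables, so `users`, `items`, `votes` and all column names are assumptions, as is the choice to keep only votes cast on items created inside the period.

```python
import sqlite3

# Hypothetical schema; the real LW table and column names are unknown.
SCHEMA = """
CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT);
CREATE TABLE items (            -- posts and comments together
    id INTEGER PRIMARY KEY,
    author_id INTEGER REFERENCES users(id),
    created TEXT                -- ISO date string, e.g. '2012-01-15'
);
CREATE TABLE votes (            -- assumed single junction table
    user_id INTEGER REFERENCES users(id),
    item_id INTEGER REFERENCES items(id),
    direction INTEGER,          -- +1 upvote, -1 downvote
    PRIMARY KEY (user_id, item_id)
);
"""

def extract(conn, start, end):
    """Return (user_ids, items, votes) for items created in [start, end)."""
    # All user primary keys, with no usernames attached.
    user_ids = [row[0] for row in conn.execute("SELECT id FROM users ORDER BY id")]
    # ID and authorship of every post/comment in the period.
    items = conn.execute(
        "SELECT id, author_id FROM items "
        "WHERE created >= ? AND created < ? ORDER BY id",
        (start, end),
    ).fetchall()
    # Votes cast on those items (assumption: filter by the item's date).
    votes = conn.execute(
        "SELECT v.user_id, v.item_id, v.direction "
        "FROM votes v JOIN items i ON i.id = v.item_id "
        "WHERE i.created >= ? AND i.created < ? "
        "ORDER BY v.user_id, v.item_id",
        (start, end),
    ).fetchall()
    return user_ids, items, votes
```

Something like `extract(conn, '2012-01-01', '2012-04-01')` would cover a contiguous three-month window.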

If I were providing this data, I would also scramble the IDs in some fashion while maintaining the underlying relationships, as consecutive IDs could provide some small clue as to the identity and chronology of users or posts. While this is pretty straightforward, the mechanism for such scrambling should not be known to recipients of the data.
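One way such scrambling could look, as a minimal sketch rather than a recommendation: give every original ID a distinct random pseudonym, apply the same mapping everywhere that ID appears, and throw the mapping away (or keep it private). The function names and the tuple layouts below are my own assumptions, and whoever actually prepared the data would want to pick their own mechanism, for the reason given above.

```python
import secrets

def make_scrambler(ids):
    """Map each original ID to a distinct random pseudonym."""
    ids = list(ids)
    rng = secrets.SystemRandom()  # OS-level randomness, not a seeded PRNG
    # Sample without replacement from a range larger than the ID count,
    # so pseudonyms are unique and carry no ordering information.
    pseudonyms = rng.sample(range(1, 10 * len(ids) + 1), len(ids))
    return dict(zip(ids, pseudonyms))

def scramble(user_ids, items, votes):
    """Replace user and item IDs while preserving who wrote and voted on what.

    items: (item_id, author_id) tuples; votes: (user_id, item_id, direction).
    """
    user_map = make_scrambler(user_ids)
    item_map = make_scrambler(item_id for item_id, _ in items)
    # Sort the output so row order does not leak the original ID order.
    new_users = sorted(user_map.values())
    new_items = sorted((item_map[i], user_map[a]) for i, a in items)
    new_votes = sorted((user_map[u], item_map[i], d) for u, i, d in votes)
    return new_users, new_items, new_votes
```

The mappings live only inside `scramble`, so recipients get the relationships but nothing that lets them walk back to the original keys.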