I was excited to find this site, so I wanted to know how many people had joined LessWrong. Was it what it seemed - that a lot of people had actually gathered around the theme of rational thought - or was that just wishful thinking about a site that a guy with a neat idea and his buddies put together? I couldn't find anything stating the number of members on LessWrong anywhere on the site or the internet, so I decided it would be a fun test of my search engine knowledge to nail jello to a tree and make my own.
Some argue that Google totals are completely meaningless. The real problem, however, is that search engines are very complicated, and if you don't know how they work, your likelihood of getting a usable number is low. I took the potential pitfalls into account when MacGyvering this figure out of Google. So far, no one has posted a significant flaw in my specific method. (I will change that statement if they do, once I've read their comment.) Also, I was right (Find in page: total).
Here is the query I constructed:
site:lesswrong.com/user -"submitted by" -"comments by"
(Translation provided at the end.)
This gets a similar result in Bing and Yahoo:
"lesswrong.com/user"
If this is correct, LessWrong has over 9,000 members. That's my claim: "LessWrong probably has over 9,000 members" not "LessWrong has exactly 9,000 members". My LessWrong population figure is likely to be low. (I explain this below.)
Why did I do this? I was really overjoyed to find this site and wanted to see whether it was somebody's personal site with just a few buddies, or if they actually managed to draw a significant gathering of people who are interested in rational thought. I was very happy to see that it looks much bigger than a personal site. Since it was so hard to find out how many users LessWrong has, I decided to share.
I think a lot of people make the hasty generalization that "all search engine totals are meaningless". If you're an average user just plugging in search terms with little understanding of how search engines work: yes, you should regard them as meaningless. However, if you know the limitations of a technique, and which parts of the system you're working within are consistent and which are not, I say it is possible to get some meaning within those limitations. Do I know all the limitations? Well, I assume there are things I am unaware of, so I won't say that. But I do know that so far nobody has proven this number or method wrong. If you want to prove me wrong, go for it. That would be fascinating. Remember that the claim is "LessWrong probably has over 9,000 members". The entire purpose of this was to get an "at least this many" figure for how many members LessWrong has. The inaccuracies I've already taken into consideration in order to compensate for the limits of this technique are listed below:
Why this is an "at least this many" figure, pitfalls I've avoided or addressed, and inaccuracies.
- Some users may not be included in Google's index yet. For instance, if they have never posted, there may be no link to their page (which is what I searched for - user pages), and the spider would not find them. This may be restricted to members that have actually commented, posted, or have been linked to in some way somewhere on the internet.
- Search engine caches are not in real time. There can be a lag of up to months, depending on how much the search engine "likes" the page.
- Former employees of a major search engine have reported that it uses crazy old computer equipment to store its caches. I've been told that it is common for sections of cache to be down for that reason.
- Search engines have restrictions in place to conserve resources. For instance, they won't let you peruse all of the results using the "next" button, and they don't total all of the results that they have when you first press "search". (You may see that number increase later if you continue to press "next" to see more pages of results.)
- It has been argued that Google doesn't interpret search terms the way you'd think. I knew that before I started. The query was designed with that in mind. I explain that here: http://lesswrong.com/r/discussion/lw/e4j/number_of_members_on_lesswrong/780g
- Some of the results in Bing and Yahoo were irrelevant, though I think I weeded them pretty thoroughly for Google if my random samples of results pages are a good indication of the whole.
- When you go to your user page, if you have more than 10 comments, a next link shows at the bottom and clicking it makes more pages appear. My understanding is that Google doesn't index these types of links - and they don't seem to be getting included. http://lesswrong.com/lw/e4j/number_of_members_on_lesswrong/7839
Go ahead and check it out - stick the query in Google and see how many LessWrong members it shows. You'll certainly get a more up-to-date total than I have posted here. ;)
Translation for those of you that don't know Google's codes:
site:lesswrong.com/user
"Search only lesswrong.com, only the user directory."
(The user directory is where each user's home page is, so I'm essentially telling it "find all the home page directories".)
-"submitted by" -"comments by"
Exclude any page in that directory with the exact text "submitted by" or "comments by"
(The submissions and comments pages use a URL in that directory, so they will show up in the results if I do not subtract them. Also, I used exact text specific to those pages, so that the text in the links on user home pages does not get user home pages omitted from the search.)
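For anyone who wants to reproduce the query without typing it by hand, here is a minimal Python sketch that assembles the same search string from its parts and URL-encodes it. The helper names `build_query` and `search_url` are made up for illustration, and the `https://www.google.com/search?q=` address is an assumption about where the query gets submitted; the sketch only builds the string and does not fetch or parse any result totals.

```python
from urllib.parse import urlencode


def build_query(site_path, excluded_phrases):
    """Combine a site: restriction with exact-phrase exclusions."""
    parts = [f"site:{site_path}"]
    # Each excluded phrase is quoted (exact match) and prefixed with "-".
    parts += [f'-"{phrase}"' for phrase in excluded_phrases]
    return " ".join(parts)


def search_url(query, base="https://www.google.com/search"):
    """URL-encode the query; the base address is an assumption."""
    return f"{base}?{urlencode({'q': query})}"


query = build_query("lesswrong.com/user", ["submitted by", "comments by"])
print(query)
print(search_url(query))
```

The first line printed is the query exactly as given at the top of the post; the second is a link that can be pasted into a browser.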
Note:
I realize this number isn't scientific proof of anything (we can't see Google's code, so that would be foolish), which is why I'm not attempting to use it to convince anyone of anything important.
In July 2012, more than 600 different users posted a comment;
since March 2012, about 1600 different users;
since October 2011 — 2300;
since May 2011 — 2900;
since December 2010 — 3400;
since May 2010 — 3900;
since August 2009 — 4400.
Since the beginning, including the comments imported from Overcoming Bias, with some duplicates (people sometimes re-registered with different usernames when moving to LW, and the same username on Overcoming Bias was imported as multiple different usernames on LW if it corresponded to different emails), comments were posted under about 7500 different usernames.
Of the 4400 users who commented since August 2009, 1390 have written at least 10 comments;
900 users — at least 25 comments;
630 users — at least 50 comments;
429 users — at least 100 comments;
225 users — at least 250 comments;
134 users — at least 500 comments;
57 users — at least 1000 comments;
13 users — at least 2500 comments.
Wedrifid has written more than 10000 comments.
(Based on a wget'ed dump of all LW comments.)
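The two kinds of tallies above (distinct commenters since a date, and commenters with at least N comments) can be sketched roughly as follows. This is not the script that produced the numbers: the format of the dump is not described here, so the sketch assumes the comments have already been parsed into `(username, date)` pairs, and the function names and toy data are hypothetical.

```python
from collections import Counter
from datetime import date


def commenters_since(comments, start):
    """Return the set of usernames with at least one comment on or after start."""
    return {user for user, day in comments if day >= start}


def users_with_at_least(comments, start, thresholds):
    """For each threshold, count users with that many comments since start."""
    counts = Counter(user for user, day in comments if day >= start)
    return {t: sum(1 for n in counts.values() if n >= t) for t in thresholds}


# Toy data standing in for the parsed dump (assumed format, not real LW data).
sample = [
    ("alice", date(2012, 7, 1)),
    ("alice", date(2012, 7, 2)),
    ("alice", date(2011, 1, 5)),
    ("bob", date(2012, 3, 9)),
    ("carol", date(2010, 6, 1)),
]

print(len(commenters_since(sample, date(2012, 3, 1))))           # 2
print(users_with_at_least(sample, date(2009, 8, 1), [1, 2, 3]))  # {1: 3, 2: 1, 3: 1}
```

Counting by username means re-registered or duplicated accounts are counted separately, which matches the caveat about duplicates in the imported Overcoming Bias comments.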
One flaw: You're not locating anywhere near all of the people that registered using this method because I bet a lot of people have never commented. In one website's database that I've got access to, almost 70% of the users register without ever doing the expected main activity. Unless you spider your copy of all the comments to cache home pages, follow the links off of friends lists and include other links to home pages around the internet (like Google does, which is why I chose Google instead of wget), you're probably missing a huge proportion of the pr...