Comment author: VipulNaik 25 December 2016 02:48:55PM 2 points [-]

Good point! This is something I thought a bit about but didn't get around to discussing in this post. The Slate Star Codex audience returned a total of 618 responses. I don't have a very good idea of how many people read the SSC blog carefully enough to go through all the links, but my best guess is that that number is in the low thousands. If that's the case, the response rate is 15% or higher. This is still low, but not that low.

Another way of framing this: how low would the response rate have to be for the true SSC readership to look like the SurveyMonkey Audience or Google Surveys audiences? Based on the numbers, the selection bias would have to be really strong for that to happen.
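To make the arithmetic concrete, here is a minimal sketch of both calculations: the implied response rate, and how lopsided response rates would have to be for selection to explain the results. Every number other than the 618 responses is an illustrative assumption, not a measured figure.

```python
# Back-of-envelope calculations; all inputs except the 618 responses are
# illustrative assumptions, not measured figures.

def implied_response_rate(responses, assumed_readership):
    """Response rate if assumed_readership people actually reached the link."""
    return responses / assumed_readership

def required_bias_ratio(observed_share, base_share):
    """How much more often heavy Wikipedia users would need to respond,
    relative to everyone else, for a population with base_share heavy users
    to produce observed_share heavy users among respondents.
    Derived from: observed = p*k / (p*k + (1 - p)), solved for k."""
    p = base_share
    return observed_share * (1 - p) / (p * (1 - observed_share))

print(implied_response_rate(618, 4000))   # 0.1545 -> ~15% if ~4000 careful readers
# Hypothetical shares: 70% heavy users among respondents vs. an assumed 40% base rate.
print(required_bias_ratio(0.70, 0.40))    # 3.5 -> heavy users would have to respond
                                          # 3.5x as often as everyone else
```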

So while I don't think selection for Wikipedia specifically is the driving factor here, it may make more sense to talk about "SSC readers who are devoted enough and curious enough to read through every link in the link roundup" rather than about SSC readership in general.

On a related note, effective response rates for on-site Wikipedia surveys (which we didn't discuss here, but which might be the subject of future posts) can be around 0.1% to 0.2%; see for instance Why We Read Wikipedia. (To get the response rate you would need to combine the survey's response counts with existing data on the number of pageviews to Wikipedia; I have emailed the researchers and confirmed that the response rate was in that ballpark.) Compared to that, the SSC response rate seems pretty high, and correspondingly more informative about the underlying population.

Comment author: VipulNaik 17 March 2017 10:00:37PM 0 points [-]

The 2017 SSC Survey had 5500 respondents. Presumably this survey was more widely visible and available than mine (which was one link in the middle of a long link list).

https://slatestarcodex.com/2017/03/17/ssc-survey-2017-results/

Comment author: hofmannsthal 04 January 2017 07:56:21AM 2 points [-]

Is there a shortlist of alternatives, if any, that people use instead of Wikipedia? I.e., for the people who just go to the first search result, or who avoid the wiki: where do they end up?

In my experience, the first result is typically Wikipedia if you search for an exact term rather than a question ("What is the capital of Italy" vs. "Italy capital"; the latter puts the wiki at the top for me).

Comment author: VipulNaik 07 January 2017 11:58:42PM 1 point [-]

Varies heavily by context. Typical alternatives:

(a) Google's own answers for simple questions.

(b) Transactional websites for search terms that denote possible purchase intent, or other websites that are action-oriented (e.g., Yelp reviews).

(c) More "user-friendly" explanation sites (e.g., for medical terminology, a website that explains it in a friendlier style, or WikiHow).

(d) Subject-specific references (some overlap with (c), but this could also include domain Wikias or other wikis).

(e) When the search term is trending because of a recent news item, links to that news item (even if the search query itself does not mention the news).

Comment author: chaosmage 28 December 2016 11:00:41PM 1 point [-]

My guess is there is a huge spread in how much people read. SSC and the LessWrong sequences are indigestible if you cannot comfortably stomach 20,000 or more words in a day. Lots of people read way less than that!

I suspect this is a big part of the reason we're such a high-IQ crowd: you have to be super verbal to absorb this stuff! Map and Territory, Consequentialism, and even AI risk aren't actually terribly complicated ideas, but we have a tradition of conveying them in long blog posts, and generally a culture of communication that optimizes for precision at the cost of conciseness.

I think you've discovered that Wikipedia is similarly more of a "verbal elite" thing. My prediction would be that the number of books read this year is very highly correlated with Wikipedia use, and the number of academic papers read even more so. And I would expect both of those to also be highly correlated with SSC / LW readership.
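As a thought experiment, here is a minimal sketch of how that prediction could be tested, assuming a survey with hypothetical columns for books, papers, and Wikipedia pages read. The column names and all values below are made up, not from any actual survey.

```python
# Hypothetical test of the predicted correlations; the column names and all
# values are made up, not from any actual survey.
import pandas as pd
from scipy.stats import spearmanr

df = pd.DataFrame({
    "books_read_this_year":  [0, 2, 5, 12, 30, 1, 8, 20],
    "papers_read_this_year": [0, 0, 3, 10, 50, 0, 5, 25],
    "wiki_pages_per_week":   [1, 2, 10, 25, 60, 0, 15, 40],
})

# Rank correlation, since reading counts tend to be heavily skewed.
for col in ["books_read_this_year", "papers_read_this_year"]:
    r, p = spearmanr(df[col], df["wiki_pages_per_week"])
    print(f"{col}: Spearman r = {r:.2f} (p = {p:.3f})")
```

Spearman rather than Pearson seems like the right choice here, since a few very heavy readers would otherwise dominate the estimate.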

I've sat through quite a number of academic presentations that were obviously heavily based on Wikipedia articles, though never acknowledged as such (this can be easy to tell in humanities subjects, especially if you're the person who wrote the article in question). I therefore suspect Wikipedia is the most-plagiarized source of information in the world. So I don't think it is that important whether people get information from Wikipedia directly: if they can get information from somebody who got it from Wikipedia, that should be enough.

Comment author: VipulNaik 28 December 2016 11:19:40PM 0 points [-]

Interesting. I suspect that even among verbal elites, there are further splits in the type of consumption. Some people are heavy on reading books, since they want a full, cohesive story of what's happening, whereas others consume information in smaller bits, building up knowledge across different domains. The latter would probably use Wikipedia more.

Similarly, some people like opinion-rich material whereas others prefer factual summaries. The factual-summary camp probably uses Wikipedia more.

However, I don't know of easy ways to segment users along these lines, i.e., websites or communities that are much more dominated by users who prefer longer content, or by users who prefer factual summaries.

Comment author: ChristianKl 25 December 2016 11:33:24PM 1 point [-]

It might be possible to get Scott to include "number of Wikipedia pages read per week" in his next census. That would give more accurate base rates.

Comment author: VipulNaik 26 December 2016 05:22:30AM 1 point [-]

Good idea, but I don't think he does the census that frequently. The most recent one I can find is from 2014: http://slatestarcodex.com/2015/11/04/2014-ssc-survey-results/

The annual LessWrong survey might be another place to consider putting it. I don't know who's responsible for doing it in 2017, but when I find out I'll ask them.

Comment author: gwern 25 December 2016 02:39:20PM 3 points [-]

I'm still a little surprised at the low effect sizes of demographic differences within the United States. Still, a lot of questions can be raised about the methodology. Other than gender, we didn't really collect large samples for anything.

Do you think you should've spent more for larger samples? $325 is really not that much money, especially considering how much time it takes to set up and analyze anything.

Comment author: VipulNaik 25 December 2016 03:00:31PM 2 points [-]

It's not too late, if I do so decide :). In other words, it's always possible to spend more later for larger samples, if that actually turns out to be something I want to do.

Right now, I think that:

  • It'll be pretty expensive: I'd probably want to run the survey through several different tools, since each has its own strengths and weaknesses (SurveyMonkey, Google Surveys, maybe Survata and Mechanical Turk as well). With each tool I'd need 1000+ responses to be able to regress against all variables and variable pairs, so the costs quickly add up to over a thousand dollars (see the cost sketch after this list).

  • I don't currently have that much uncertainty: larger samples might show that age and income actually do explain a little more of the variation than it seems right now (and that would be consistent with the Pew research). But I feel that we already have enough data to see that they don't have anywhere near the effect that SSC readership has.
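For concreteness, a rough version of the cost arithmetic. The per-response prices below are placeholder assumptions for illustration, not quoted vendor rates.

```python
# Rough cost estimate for a larger follow-up survey. Per-response prices are
# assumptions for illustration, not quoted vendor rates.
assumed_cost_per_response = {
    "SurveyMonkey Audience": 1.00,
    "Google Surveys":        0.30,
    "Survata":               1.00,
    "Mechanical Turk":       0.50,
}
responses_per_tool = 1000  # enough to regress against variables and variable pairs

total = sum(price * responses_per_tool for price in assumed_cost_per_response.values())
print(f"Estimated total: ${total:,.0f}")  # => $2,800 under these assumptions
```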

I'm open to arguments to convince me otherwise.

Comment author: Lumifer 15 July 2016 03:22:12PM 1 point [-]

Would you care to come to some conclusions on the basis of these surveys, maybe even speculate a bit? What did you find to be particularly interesting?

Comment author: VipulNaik 25 December 2016 02:50:17PM *  0 points [-]

I've published a new version of this post where the takeaways are more clearly highlighted (I think!). The post is longer, but the takeaways (which are summarized at the top) should be quick to browse if you're interested.

It's at http://lesswrong.com/r/discussion/lw/odb/wikipedia_usage_survey_results/

Comment author: ChristianKl 25 December 2016 02:34:09PM 2 points [-]

It seems to me like heavy users of Wikipedia are more likely to fill out a survey for Wikipedia users. On the other hand, there's no similar filter for the SurveyMonkey Audience.

Comment author: VipulNaik 25 December 2016 02:17:45PM 1 point [-]

Per the suggestion at Improve comments by tagging claims, here is a comment to collect discussion of the third takeaway:

The gap between elite samples of Wikipedia users and general United States Internet users is significantly greater than the gap between the different demographics within the United States that we measured. It is comparable to the gap between United States Internet users and Internet users in low-income countries.

I'm still a little surprised at the low effect sizes of demographic differences within the United States. Still, a lot of questions can be raised about the methodology. Other than gender, we didn't really collect large samples for anything. And Google Surveys uses inferred values for age and income for most respondents, so those variables are probably not that reliable.
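One way to see why the small subsamples limit the conclusions: the margin of error on a difference between two groups' proportions is wide at these sample sizes. The group sizes and proportions below are illustrative assumptions, not figures from the surveys.

```python
# Minimal sketch of the margin of error on a difference between two groups'
# proportions; sample sizes and proportions are illustrative assumptions.
import math

def diff_margin_95(p1, n1, p2, n2):
    """Approximate 95% margin of error for the difference (p1 - p2)."""
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return 1.96 * se

# Two age groups of 100 respondents each, 50% vs. 55% weekly Wikipedia use:
print(diff_margin_95(0.50, 100, 0.55, 100))  # ~0.14, swamping the 5-point gap
```

Under these assumptions, a modest demographic effect of a few percentage points simply cannot be distinguished from noise.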

The Pew Internet surveys offer some independent evidence of the strength of the correlation of Wikipedia use with gender, age, and income, but the questions there are too coarse (they just ask people whether they use Wikipedia at all).

Could there be other demographic variables that we didn't explore that could have higher predictive power?

Comment author: VipulNaik 25 December 2016 02:12:44PM 0 points [-]

Per the suggestion at Improve comments by tagging claims, here is a comment to collect discussion of the second takeaway:

we’ve revised upward our estimate of the impact per pageview, and revised downward our estimate of the broad appeal and reach of Wikipedia.

A lot of this comes down to whether the indicators we've identified for heavy Wikipedia use actually are things to be optimistic about. Is the typical SSC or LessWrong reader better able to use information gleaned from Wikipedia?

And what about the alleged downside that Wikipedia is being read by fewer people than we might think? How much does that cut into the value of writing pages with hopefully broad appeal?

Comment author: VipulNaik 25 December 2016 02:04:35PM 1 point [-]

Per the suggestion at Improve comments by tagging claims, here is a comment to collect discussion of the first takeaway:

Wikipedia consumption is heavily skewed toward a profile of “elite” people, and these people use the site in qualitatively different ways.

I didn't talk about it much in the post, since it would be too speculative, but I'm interested in more concrete thoughts on predicting which websites or online communities would have a high degree of Wikipedia use. The SurveyMonkey Audience and Google Surveys results plausibly show that crude demographic proxies such as intelligence, education, wealth, income, gender, and age have very little predictive power compared with something like "reads Slate Star Codex and is willing to click through to a survey link from there."
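Here is a minimal sketch of how one could quantify "predictive power" directly: fit the same model on demographics alone and on a community-membership indicator alone, and compare. The data is synthetic, and the effect sizes are assumptions baked into the simulation, not survey findings.

```python
# Synthetic illustration of comparing predictive power; feature names and
# effect sizes are assumptions baked into the simulation, not survey results.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 5000
age = rng.integers(18, 70, n)
log_income = rng.normal(10.5, 0.6, n)
reads_ssc = rng.random(n) < 0.05

# Assumed data-generating process: membership dominates, age contributes weakly.
logit = -2.0 + 3.0 * reads_ssc + 0.005 * (40 - age)
heavy_user = rng.random(n) < 1 / (1 + np.exp(-logit))

X_demo = np.column_stack([age, log_income])
X_comm = reads_ssc.astype(float).reshape(-1, 1)

for name, X in [("demographics only", X_demo), ("reads SSC only", X_comm)]:
    probs = LogisticRegression().fit(X, heavy_user).predict_proba(X)[:, 1]
    print(f"{name}: AUC = {roc_auc_score(heavy_user, probs):.2f}")
```

Under these assumptions the membership indicator wins by a wide margin, which mirrors the qualitative pattern described above.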

I wonder what sort of attributes might be most predictive of using Wikipedia a lot. I'd say it's something like "general intellectual curiosity": curiosity of an intellectual kind, but general, spanning domains, rather than narrowly tied to one domain where one can achieve enough mastery so as not to need Wikipedia. I do know of curious people who don't use Wikipedia much, because their curiosity is specific to domains where they have far surpassed Wikipedia, or which Wikipedia doesn't cover well.

I wonder what other websites similar to SSC might qualify. Would LessWrong? Wait But Why? EconLog? Overcoming Bias? XKCD? SMBC Comics?

I also wonder what friend networks or other online community filters would predict high Wikipedia use. Does being a Yudkowsky follower on Facebook predict high Wikipedia use? What about being in particular subreddits?

Comment author: VipulNaik 25 December 2016 02:06:39PM 0 points [-]

On a related note, one of the research suggestions from well-known LessWronger Carl Shulman mentions Wikipedia:

Try to get datasets (Wikipedia lists, World Bank info, USDA, etc.) as a primary step in thinking about a question.

From his research advice document
