In response to Linkposts now live!
Comment author: VipulNaik 28 September 2016 10:56:22PM *  4 points [-]

I'm unable to edit past posts of mine; it seems that this broke very recently and I'm wondering if it's related to the changes you made.

Specifically, when I click the Submit or the "Save and Continue" buttons after making an edit, it goes to lesswrong.com/submit with a blank screen. When I look at the HTTP error code it says it's a 404.

I also checked the post after that to see if the edit still went through, and it didn't. In other words, my edit did not get saved.

Do you know what's going on? There were a few corrections/expansions on past posts that I need to push live soon.

Comment author: ThisSpaceAvailable 21 August 2016 01:31:02AM *  4 points [-]

I suppose this might be a better place to ask than trying to resurrect a previous thread:

What kind of statistics can Signal offer on prior cohorts? E.g. percentage with jobs, percentage with jobs in data science field, percentage with incomes over $100k, median income of graduates, mean income of graduates, mean income of employed graduates, etc.? And how do the different cohorts compare? (Those are just examples; I don't necessarily expect to get those exact answers, but it would be good to have some data and have it be presented in a manner that is at least partially resistant to cherry picking/massaging, etc.) Basically, what sort of evidence E does Signal have to offer, such that I should update towards it being effective, given both E, and "E has been selected by Signal, and Signal has an interest in choosing E to be as flattering rather than as informative as possible" are true?

Also, the last I heard, there was a deposit requirement. What's the refund policy on that?

Comment author: VipulNaik 22 August 2016 11:48:33PM *  0 points [-]

One relevant consideration in such an evaluation is that Signal's policies with respect to various things (like percentage of income taken, initial deposit, length of program) may have changed since the program's inception. Of course, the program itself has changed since it started. Therefore, feedback or experiences from students in initial cohorts need to be viewed in that light.

Disclosure: I share an apartment with Jonah Sinick, co-founder of Signal. I have also talked extensively about Signal with Andrew J. Ho, one of its key team members, and somewhat less extensively with Bob Cordwell, the other co-founder. ETA: I also conducted a session on data science and machine learning engineering in the real world (drawing on my work experience) with Signal's third cohort on Saturday, August 20, 2016.

Comment author: gwern 15 July 2016 05:15:00PM 6 points [-]

Their motivation is public education & outreach:

Vipul and I ultimately want to get a better sense of the value of a Wikipedia pageview (one way to measure the impact of content creation), and one way to do this is to understand how people are using Wikipedia. As we focus on getting more people to work on editing Wikipedia – thus causing more people to read the content we pay and help to create – it becomes more important to understand what people are doing on the site.

This is a topic I've wondered about myself, as I occasionally spend substantial amounts of time trying to improve Wikipedia articles; most recently GCTA, liability threshold model, result-blind peer review, missing heritability problem, Tominaga Nakamoto, & debunking urban legends (Rutherford, Kelvin, Lardner, bicycle face, Feynman IQ, MtGox). Even though I've been editing WP since 2004, it can be deeply frustrating (look at the barf all over the result-blind peer review right now) and I'm never sure if it's worth the time.

Results:

  • most people in LW/SSC/WP/general college-educated SurveyMonkey population/Vipul Naik's social circles read WP regularly (with a skew to reading WP a huge amount), have some preference for it in search engines & sometimes search on WP directly, and every few months are surprised by a gap in WP which could be filled (sounding like a long tail of BLPs and foreign material; the latter being an area that the English WP has always been weak in)

    • reading patterns in the total sample match aggregate page-view statistics fairly well; respondents tend to have read the most popular WP articles
  • they primarily skim articles; reading usage tends to be fairly superficial, with occasional use of citations or criticism sections but not any more detailed evaluation of the page or editing process

At face value, this suggests that WP editing may not be that great a use of time. Most people do not read the articles carefully, and aggregate traffic suggests that the sort of niche topics I write on are not reaching all the people one might hope. For example, take threshold models & GCTA traffic statistics - 74/day and 35/day respectively, or maybe 39k page views a year total. (Assuming, of course, that my contributions don't get butchered.) This is not a lot in general - I get more like 1k page views a day on gwern.net. A blogpost making it to the front page of Hacker News frequently gets 20k+ page views within the first few days, for comparison.

I interpret this as implying that a case for WP editing can't be made based on just the traffic numbers. I may get 1k page views a day, but relatively little of that is to pages using GCTA or threshold models even in passing. It may be that writing those articles is highly effective because when someone does need to know about GCTA, they'll look it up on WP and read it carefully (even though they don't read most WP pages carefully), and over the years, it'll have a positive effect on the world that way. This is harder to quantify in a survey, since people will hardly remember what changed their beliefs (indeed, it sounds like most people find it hard to remember how they use WP at all, it's almost like asking how people use Google searches - it's so engrained).

My belief is that WP editing can have long-term effects like that, based primarily on my experiences editing Neon Genesis Evangelion and tracking down references and figuring out the historical context. I noticed that increasingly discussions of NGE online took on a much better informed hue, and in particular, the misguided obsession with the Christian & Kabbalic symbolism has died down a great deal, in part due to documenting staff quotes denying that the symbolism was important. On the downside, if you look through the edit history, you can see that a lot of terrific (and impeccably sourced) material I added to the article has been deleted over the years. So YMMV. Presumably working on scientific topics will be less risky.

Comment author: VipulNaik 17 July 2016 12:16:46AM *  3 points [-]

I think Issa might write a longer reply later, and also update the post with a summary section, but I just wanted to make a quick correction: the college-educated SurveyMonkey population we sampled in fact did not use Wikipedia a lot (in S2, CEYP had fewer heavy Wikipedia users than the general population).

It's worth noting that the general SurveyMonkey population as well as the college-educated SurveyMonkey population used Wikipedia very little, and one of our key findings was the extent to which usage is skewed to a small subset of the population that uses it heavily (although almost everybody has heard of it and used it at some point). Also, the responses to S1Q2 show that the general population rarely seeks Wikipedia actively, in contrast with the small subset of heavy users (including many SSC readers, people who filled my survey through Facebook).

Your summary of the post is an interesting take on it (and consistent with your perspective and goals) but the conclusions Issa and I drew (especially regarding short-term value) were somewhat different. In particular, both in terms of the quantity of traffic (over a reasonably long time horizon) and the quality and level of engagement with pages, Wikipedia does better than a lot of online content. Notably, it does best in terms of having sustained traffic, as opposed to a lot of "news" that trends for a while and then drops sharply (in marketing lingo, Wikipedia content is "evergreen").

Comment author: JonahSinick 12 April 2016 01:26:52AM *  1 point [-]

Hi Toggle,

Thanks for your question!

Most of our students have just started looking for jobs over the past ~2 weeks, and the job search process in the tech sector typically takes ~2 months, from sending out resumes to accepting offers (see, e.g. "Managing your time" in Alexei's post Maximizing Your Donations via a Job).

The feedback loop here is correspondingly longer than we'd like. We expect to have an answer to your question by the time we advertise our third cohort.

Comment author: VipulNaik 07 July 2016 05:51:45AM 1 point [-]

Following up!

Comment author: VipulNaik 16 April 2016 04:16:59PM 2 points [-]

[Comment cross-posted to the Effective Altruism Forum]

[I will use "Effective Altruists" or "EAs" to refer to the people who self-identify as members of the community, and "effective altruists" (without capitalization) for people to whom effectiveness matters a lot in altruism, regardless of whether they self-identify as EAs.]

I think this post makes some important and valuable points. Even if not novel, the concise summary here could make for a good WikiHow article on how to be a more effective fundraiser. However, I believe that this post falls short by failing to mention, let alone wrestle with, the tradeoffs involved with these strategies.

I don't believe there is a clear and obvious answer to the many tradeoffs involved with adopting various sales tactics that compromise epistemic value. I believe, however, that not even acknowledging these tradeoffs can lead to potentially worse decisions.

My points below overlap somewhat.

First, effective altruists in general, and EAs in particular, are a niche segment in the philanthropic community. The rules for selling to this niche can differ from the rules of selling to the general public. So much so that sales tactics that are considered good for the general public are actively considered bad when selling to this niche. Putting an identifiable victim may help with, say, 30% of potential donors in the general public, but alienate 80% of potential donors among effective altruists, because they have (implicitly or explicitly) learned to overcome the identifiable victim effect. In general, using messaging targeted at the public for a niche that is often based, implicitly or explicitly, on rejecting various aspects of such messaging, is a bad thing. A politician does not benefit from taking positions held by the majority of people all the time; rather, whereas some politicians are majoritarian moderates, others seek specific niches where their support is strong, often with the alienation of a majority as a clear consequence (for instance, a politician in one subregion of a country may adopt rhetoric and policies that make the politician unpopular countrywide but guarantee re-election in that subregion). Similarly, not every social network benefits from adopting Facebook's approach to partial openness and diversity of forms of expression. Snapchat, Pinterest, and Twitter have each carved a niche based on special features they have.

Second, in addition to the effect in rhetorical terms, it's also important to consider the effect in substantive terms on how the organizations involved spend their money and resources, and make decisions. Ideally, you can imagine a wall of separation: the organization focuses on being maximally effective, and a separate sales/fundraising group optimizes the message for the general public. However, many of the strategies suggested here actually affect the organization's core functions. Pairing donors with individual recipients significantly affects the organization's operations on the ground, raising costs. Could this in the long run lead to, e.g., organizations choosing to operate in areas where recipients have characteristics that make them more interesting for donors to communicate with (e.g., they are more familiar with the language of the donor's country)? I don't see a way of making overall effectiveness, in the way that many EAs care about, still the dominant evaluation criterion if fundraising success is tied heavily to other outreach strategies.

Third (building somewhat on the first), insofar as there is a tradeoff between being able to sell more to effective altruists versus appealing more to the general public, the sign of the financial effect is actually ambiguous. The number of donors in the general public is much larger, but the amount that they donate per capita tends to be smaller. One of the ingredients to EA success is that its strength lies not so much in its numbers but in the depth of convictions of many self-identified EAs, plus other effective altruists (such as GiveWell donors). People who might have previously donated a few hundred dollars a year for an identifiable victim may now be putting in tens of thousands of dollars because the large-scale statistics have touched them in a deeper way. GiveWell moved $103 million to its top charities in 2015, of which $70 million was from Good Ventures (that's giving away money from a Facebook co-founder) and another $20 million was from individual donors giving amounts in excess of $100,000 each. To borrow sales jargon, these deals are highly lucrative and took a long time to close. Closing them required a number of donors, many of whom were probably jaded by psychologically pitch-perfect campaigns, to have high confidence in the epistemic rigor involved. I'm not even saying that GiveWell's reviews are actually rigorous, but rather, that the perception of rigor surrounding them was a key aspect to many people donating to GiveWell-recommended charities.

Fourth, if the goal is to spread better, more rational giving habits, then caving in to sales tactics that exploit known forms of irrationality hampers that goal.

None of these imply that the ideas you suggest are inapplicable in the context of EA or for effective altruists in general. Nor am I suggesting that EAs (or effective altruists in general) are bias-free and rational demigods: I think many EAs have their own sets of biases that are more sophisticated than those of the general public but still real. I also think that many of the biases, such as the identifiable victim, can actually be epistemically justified somewhat, and you could make a good epistemic case for using individual case studies as not just a sales strategy but something that actually helps provide yet another sanity check (this is sort of what GiveWell tried to do by sponsoring field trips to the areas of operation of its top charities). You could also argue that the cost of alienating some people is a cost worth bearing in order to achieve a somewhat greater level of popularity, or that a wall of separation is not that hard to achieve.

But acknowledging these tradeoffs openly is a first step to letting others (including the orgs and fundraisers you are targeting) make a careful, informed decision. It can also help people figure out new, creative compromises. Perhaps, for instance, showing an identifiable victim and, after people are sort-of-sold, then pivoting to the statistics, provides the advantages of mass appeal and epistemic rigor. Perhaps there are ways to use charities' own survey data to create composite profiles of typical beneficiaries that can help inform potential donors as well as appeal to their desire for an identifiable victim. Perhaps, at the end of the day, raising money matters more than spreading ideas, and getting ten million people to donate a few hundred dollars a year is better than the current EA donor profile or the current GiveWell donor profile.

Comment author: Petter 30 March 2015 08:59:46PM 1 point [-]

Mobile is a larger platform than desktop in 2015. That fact and the knowledge graph seem like very plausible explanations.

Comment author: VipulNaik 31 March 2015 11:07:42PM 1 point [-]
Comment author: John_Maxwell_IV 30 March 2015 08:05:16AM 4 points [-]

Regression to the mean is a potential problem when you choose to examine the most extreme data points in a data set (highly viewed wikipedia pages in this case).

Comment author: VipulNaik 31 March 2015 11:06:04PM 2 points [-]

I didn't pick them as points that were most extreme as of earlier years, I picked them as generically popular topics. There should be no particular temporal directionality to view counts for such pages.

The great decline in Wikipedia pageviews (condensed version)

13 VipulNaik 27 March 2015 02:02PM

To keep this post manageable in length, I have only included a small subset of the illustrative examples and discussion. I have published a longer version of this post, with more examples (but the same intro and concluding section), on my personal site.

Last year, during the months of June and July, as my work for MIRI was wrapping up and I hadn't started my full-time job, I worked on the Wikipedia Views website, aimed at easier tabulation of the pageviews for multiple Wikipedia pages over several months and years. It relies on a statistics tool called stats.grok.se, created by Domas Mituzas, and maintained by Henrik.

One of the interesting things I noted as I tabulated pageviews for many different pages was that the pageview counts for many already popular pages were in decline. Pages of various kinds peaked at different historical points. For instance, colors have been in decline since early 2013. The world's most populous countries have been in decline since as far back as 2010!

Defining the problem

The first thing to be clear about is what these pageviews count and what they don't. The pageview measures are taken from stats.grok.se, which in turn uses the pagecounts-raw dump provided hourly by the Wikimedia Foundation's Analytics team, which in turn is obtained by processing raw user activity logs. The pagecounts-raw measure is flawed in two ways:

  • It only counts pageviews on the main Wikipedia website and not pageviews on the mobile Wikipedia website or through Wikipedia Zero (a pared down version of the mobile site that some carriers offer at zero bandwidth costs to their customers, particularly in developing countries). To remedy these problems, a new dump called pagecounts-all-sites was introduced in September 2014. We simply don't have data for views of mobile domains or of Wikipedia Zero at the level of individual pages from before then. Moreover, stats.grok.se still uses pagecounts-raw (this was pointed out to me in a mailing list message after I circulated an early version of the post).
  • The pageview count includes views by bots. The official estimate is that about 15% of pageviews are due to bots. However, the percentage is likely higher for pages with fewer overall pageviews, because bots have a minimum crawling frequency. So every page might have at least 3 bot crawls a day, resulting in a minimum of 90 bot pageviews a month even if there are only a handful of human pageviews.
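To make the bot point concrete, here is a minimal sketch of how a fixed crawl floor inflates the bot share on low-traffic pages. It assumes the illustrative floor of 3 bot crawls a day from the bullet above and a 30-day month; the function name and the human pageview figures are made up for illustration, not measured values.

```python
def bot_share(human_views_per_month, bot_crawls_per_day=3, days=30):
    """Fraction of a page's monthly pageviews that come from bots,
    assuming bots contribute only the minimum crawl floor."""
    bot_views = bot_crawls_per_day * days
    return bot_views / (bot_views + human_views_per_month)


# Hypothetical human traffic levels, from an obscure page to a popular one.
for human in (10, 100, 1_000, 100_000):
    print(f"{human:>7} human views/month -> bot share {bot_share(human):.1%}")
```

Under these assumptions the bot share is overwhelming for a page with a handful of human views and negligible for a popular one, which is why a single site-wide bot percentage can mislead for niche pages.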

Therefore, the trends I discuss will refer to trends in total pageviews for the main Wikipedia website, including page requests by bots, but excluding visits to mobile domains. Note that visits from mobile devices to the main site will be included, but mobile devices are by default redirected to the mobile site.

How reliable are the metrics?

As noted above, the metrics are unreliable because of the bot problem and the issue of counting only non-mobile traffic. German Wikipedia user Atlasowa left a message on my talk page pointing me to an email thread suggesting that about 40% of pageviews may be bot-related, and discussing some interesting examples.

Relationship with the overall numbers

I'll show that for many pages of interest, the number of pageviews as measured above (non-mobile) has declined recently, with a clear decline from 2013 to 2014. What about the total?

We have overall numbers for non-mobile, mobile, and combined. The combined number has largely held steady, whereas the non-mobile number has declined and the mobile number has risen.

What we'll find is that the decline for most pages that have been around for a while is even sharper than the overall decline. One reason overall pageviews haven't declined so fast is the creation of new pages. To give an idea, non-mobile traffic dropped by about 1/3 from January 2013 to December 2014, but for many leading categories of pages, traffic dropped by about 1/2-2/3.
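As a rough check on that comparison, the sketch below computes the fractional drop for the colors cluster using the monthly totals tabulated later in this post (691K in January 2013, 260K in December 2014) and sets it against the approximate 1/3 overall drop quoted above. The function name is just a label I'm using here, and the inputs are the rounded table values.

```python
def fractional_drop(start, end):
    """Share of the starting traffic that was lost between two periods."""
    return (start - end) / start


# Colors cluster totals, January 2013 vs. December 2014 (rounded table values).
colors_drop = fractional_drop(691_000, 260_000)

# Approximate overall non-mobile drop over the same period, as quoted in the text.
overall_drop = 1 / 3

print(f"colors cluster drop: {colors_drop:.0%}")
print(f"overall drop:        {overall_drop:.0%}")
print(f"ratio of the two:    {colors_drop / overall_drop:.2f}")
```

With these inputs the colors cluster falls inside the 1/2-2/3 range, close to twice the overall figure.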

Why is this important? First reason: better context for understanding trends for individual pages

People's behavior on Wikipedia is a barometer of what they're interested in learning about. An analysis of trends in the views of pages can provide an important window into how people's curiosity, and the way they satisfy this curiosity, is evolving. To take an example, some people have proposed using Wikipedia pageview trends to predict flu outbreaks. I myself have tried to use relative Wikipedia pageview counts to gauge changing interests in many topics, ranging from visa categories to technology companies.

My initial interest in pageview numbers arose because I wanted to track my own influence as a Wikipedia content creator. In fact, that was my original motivation for creating Wikipedia Views. (You can see more information about my Wikipedia content contributions on my site page about Wikipedia.)

Now, when doing this sort of analysis for individual pages, one needs to account for, and control for, overall trends in the views of Wikipedia pages that are occurring for reasons other than a change in people's intrinsic interest in the subject. Otherwise, we might falsely conclude from a pageview count decline that a topic is falling in popularity, whereas what's really happening is an overall decline in the use of (the non-mobile version of) Wikipedia to satisfy one's curiosity about the topic.
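One simple way to do this kind of control, sketched below, is to divide a page's views by overall traffic in each period and index the result to the first period. This is only an illustration of the idea, not the method used for the tables in this post; the function name and all numbers are hypothetical.

```python
def normalized_interest(page_views, overall_views):
    """Page views as a share of overall traffic, indexed so the first period is 1.0."""
    if len(page_views) != len(overall_views):
        raise ValueError("both series must cover the same periods")
    shares = [page / overall for page, overall in zip(page_views, overall_views)]
    return [share / shares[0] for share in shares]


# Hypothetical example: the page loses half its views while overall
# non-mobile traffic loses a third.
page = [100_000, 50_000]
overall = [9_000_000, 6_000_000]

index = normalized_interest(page, overall)
print([round(value, 2) for value in index])
```

In this made-up case the raw count halves, but relative to overall traffic the page only falls to about three quarters of its earlier share, so most of the apparent loss of interest is the site-wide trend.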

Why is this important? Second reason: a better understanding of the overall size and growth of the Internet.

Wikipedia has been relatively mature and has had the top spot as an information source for at least the last six years. Moreover, unlike almost all other top websites, Wikipedia doesn't try hard to market or optimize itself, so trends in it reflect a relatively untarnished view of how the Internet and the World Wide Web as a whole are growing, independent of deliberate efforts to manipulate and doctor metrics.

The case of colors

Let's look at Wikipedia pages on some of the most viewed colors (I've removed the 2015 and 2007 columns because we don't have the entirety of these years). Colors are interesting because the degree of human interest in colors in general, and in individual colors, is unlikely to change much in response to news or current events. So one would at least a priori expect colors to offer a perspective into Wikipedia trends with fewer external complicating factors. If we see a clear decline here, then that's strong evidence in favor of a genuine decline.

I've restricted attention to a small subset of the colors that includes the most common ones but isn't comprehensive. But it should be enough to get a sense of the trends. And you can add in your own colors and check that the trends hold up.

| Page name | Pageviews in year 2014 | Pageviews in year 2013 | Pageviews in year 2012 | Pageviews in year 2011 | Pageviews in year 2010 | Pageviews in year 2009 | Pageviews in year 2008 | Total | Percentage | Tags |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Black | 431K | 1.5M | 1.3M | 778K | 900K | 1M | 958K | 6.9M | 16.1 | Colors |
| Blue | 710K | 1.3M | 1M | 987K | 1.2M | 1.2M | 1.1M | 7.6M | 17.8 | Colors |
| Brown | 192K | 284K | 318K | 292K | 308K | 300K | 277K | 2M | 4.6 | Colors |
| Green | 422K | 844K | 779K | 707K | 882K | 885K | 733K | 5.3M | 12.3 | Colors |
| Orange | 133K | 181K | 251K | 259K | 275K | 313K | 318K | 1.7M | 4 | Colors |
| Purple | 524K | 906K | 847K | 895K | 865K | 841K | 592K | 5.5M | 12.8 | Colors |
| Red | 568K | 797K | 912K | 1M | 1.1M | 873K | 938K | 6.2M | 14.6 | Colors |
| Violet | 56K | 96K | 75K | 77K | 69K | 71K | 65K | 509K | 1.2 | Colors |
| White | 301K | 795K | 615K | 545K | 788K | 575K | 581K | 4.2M | 9.8 | Colors |
| Yellow | 304K | 424K | 453K | 433K | 452K | 427K | 398K | 2.9M | 6.8 | Colors |
| Total | 3.6M | 7.1M | 6.6M | 6M | 6.9M | 6.5M | 6M | 43M | 100 | -- |
| Percentage | 8.5 | 16.7 | 15.4 | 14 | 16 | 15.3 | 14 | 100 | -- | -- |

Since the decline appears to have happened between 2013 and 2014, let's examine the 24 months from January 2013 to December 2014:

| Month | Views of page Black | Views of page Blue | Views of page Brown | Views of page Green | Views of page Orange | Views of page Purple | Views of page Red | Views of page Violet | Views of page White | Views of page Yellow | Total | Percentage |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 201412 | 30K | 41K | 14K | 27K | 9.6K | 28K | 67K | 3.1K | 21K | 19K | 260K | 2.4 |
| 201411 | 36K | 46K | 15K | 31K | 10K | 35K | 50K | 3.7K | 23K | 22K | 273K | 2.5 |
| 201410 | 37K | 52K | 16K | 34K | 10K | 34K | 51K | 4.5K | 25K | 26K | 289K | 2.7 |
| 201409 | 37K | 57K | 16K | 35K | 9.9K | 37K | 45K | 4.8K | 27K | 29K | 298K | 2.8 |
| 201408 | 33K | 47K | 14K | 34K | 8.5K | 31K | 38K | 3.9K | 21K | 22K | 253K | 2.4 |
| 201407 | 33K | 47K | 14K | 30K | 9.3K | 31K | 37K | 4.2K | 22K | 22K | 250K | 2.3 |
| 201406 | 32K | 49K | 14K | 31K | 10K | 34K | 39K | 4.9K | 23K | 22K | 259K | 2.4 |
| 201405 | 44K | 55K | 17K | 37K | 10K | 51K | 42K | 5.2K | 26K | 26K | 314K | 2.9 |
| 201404 | 34K | 60K | 17K | 36K | 14K | 38K | 47K | 5.8K | 27K | 28K | 306K | 2.8 |
| 201403 | 37K | 136K | 19K | 51K | 14K | 123K | 52K | 5.5K | 30K | 31K | 497K | 4.6 |
| 201402 | 38K | 58K | 19K | 39K | 13K | 41K | 49K | 5.6K | 29K | 29K | 321K | 3 |
| 201401 | 40K | 60K | 19K | 36K | 14K | 40K | 50K | 4.4K | 27K | 28K | 319K | 3 |
| 201312 | 62K | 67K | 17K | 44K | 12K | 48K | 48K | 4.4K | 42K | 26K | 372K | 3.5 |
| 201311 | 141K | 96K | 20K | 65K | 11K | 68K | 55K | 5.3K | 71K | 34K | 566K | 5.3 |
| 201310 | 145K | 102K | 21K | 69K | 11K | 77K | 59K | 5.7K | 71K | 36K | 598K | 5.6 |
| 201309 | 98K | 80K | 17K | 60K | 11K | 53K | 51K | 4.9K | 45K | 30K | 450K | 4.2 |
| 201308 | 109K | 87K | 20K | 57K | 20K | 57K | 60K | 4.6K | 53K | 28K | 497K | 4.6 |
| 201307 | 107K | 92K | 21K | 61K | 11K | 66K | 65K | 4.6K | 61K | 30K | 520K | 4.8 |
| 201306 | 115K | 106K | 22K | 69K | 13K | 73K | 64K | 5.5K | 70K | 33K | 571K | 5.3 |
| 201305 | 158K | 122K | 24K | 79K | 14K | 83K | 69K | 11K | 77K | 39K | 677K | 6.3 |
| 201304 | 151K | 127K | 28K | 83K | 14K | 86K | 74K | 12K | 78K | 40K | 694K | 6.4 |
| 201303 | 155K | 135K | 31K | 92K | 15K | 99K | 84K | 12K | 80K | 43K | 746K | 6.9 |
| 201302 | 152K | 131K | 31K | 84K | 28K | 95K | 84K | 17K | 77K | 41K | 740K | 6.9 |
| 201301 | 129K | 126K | 32K | 81K | 19K | 99K | 84K | 9.6K | 70K | 42K | 691K | 6.4 |
| Total | 2M | 2M | 476K | 1.3M | 314K | 1.4M | 1.4M | 152K | 1.1M | 728K | 11M | 100 |
| Percentage | 18.1 | 18.4 | 4.4 | 11.8 | 2.9 | 13.3 | 12.7 | 1.4 | 10.2 | 6.8 | 100 | -- |
| Tags | Colors | Colors | Colors | Colors | Colors | Colors | Colors | Colors | Colors | Colors | -- | -- |

As we can see, the decline appears to have begun around March 2013 and then continued steadily till about June 2014, at which point numbers stabilized at their lower levels.

A few sanity checks on these numbers:

  • The trends appear to be similar for different colors, with the notable difference that the proportional drop was higher for the more viewed color pages. Thus, for instance, black and blue saw declines from 129K and 126K to 30K and 41K respectively (factors of four and three respectively) from January 2013 to December 2014. Orange and yellow, on the other hand, dropped by factors of close to two. The only color that didn't drop significantly was red (it dropped from 84K to 67K, as opposed to factors of two or more for other colors), but this seems to have been partly due to an unusually large amount of traffic in the end of 2014. The trend even for red seems to suggest a drop similar to that for orange.
  • The overall proportion of views for different colors comports with our overall knowledge of people's color preferences: blue is overall a favorite color, and this is reflected in its getting the top spot with respect to pageviews.
  • The pageview decline followed a relatively steady trend, with the exception of some unusual seasonal fluctuation (including an increase in October and November 2013).
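The drop factors in the first sanity check can be recomputed from the January 2013 and December 2014 rows of the monthly table. The sketch below does that; `parse_count` is a small helper I'm introducing here to read the abbreviated "K"/"M" values, and the results inherit the rounding of the table entries.

```python
def parse_count(text):
    """Convert an abbreviated count such as '9.6K' or '2M' to a float."""
    multipliers = {"K": 1_000, "M": 1_000_000}
    suffix = text[-1].upper()
    if suffix in multipliers:
        return float(text[:-1]) * multipliers[suffix]
    return float(text)


colors = ["Black", "Blue", "Brown", "Green", "Orange",
          "Purple", "Red", "Violet", "White", "Yellow"]

# Rows 201301 and 201412 of the monthly table, in the same column order.
jan_2013 = "129K 126K 32K 81K 19K 99K 84K 9.6K 70K 42K".split()
dec_2014 = "30K 41K 14K 27K 9.6K 28K 67K 3.1K 21K 19K".split()

# Factor by which each page's monthly views fell between the two months.
drop_factors = {
    color: parse_count(start) / parse_count(end)
    for color, start, end in zip(colors, jan_2013, dec_2014)
}

# Print from the steepest drop to the shallowest.
for color, factor in sorted(drop_factors.items(), key=lambda item: -item[1]):
    print(f"{color:<7} dropped by a factor of {factor:.1f}")
```

Swapping in other rows of the table (say, March 2013 and June 2014) is an easy way to check how sensitive the factors are to the choice of endpoints.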

One might imagine that this is due to people shifting attention from the English-language Wikipedia to other language Wikipedias, but most of the other major language Wikipedias saw a similar decline at a similar time. More details are in my longer version of this post on my personal site.

Geography: continents and subcontinents, countries, and cities

Here are the views of some of the world's most populated countries between 2008 and 2014, showing that the peak happened as far back as 2010:

| Page name | Pageviews in year 2014 | Pageviews in year 2013 | Pageviews in year 2012 | Pageviews in year 2011 | Pageviews in year 2010 | Pageviews in year 2009 | Pageviews in year 2008 | Total | Percentage | Tags |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| China | 5.7M | 6.8M | 7.8M | 6.1M | 6.9M | 5.7M | 6.1M | 45M | 9 | Countries |
| India | 8.8M | 12M | 12M | 11M | 14M | 8.8M | 7.6M | 73M | 14.5 | Countries |
| United States | 13M | 15M | 18M | 18M | 34M | 16M | 15M | 129M | 25.7 | Countries |
| Indonesia | 5.3M | 5.2M | 3.7M | 3.6M | 4.2M | 3.1M | 2.5M | 28M | 5.5 | Countries |
| Brazil | 4.8M | 4.9M | 5.3M | 5.5M | 7.5M | 4.9M | 4.3M | 37M | 7.4 | Countries |
| Pakistan | 2.9M | 4.5M | 4.4M | 4.3M | 5.2M | 4M | 3.2M | 28M | 5.7 | Countries |
| Bangladesh | 2.2M | 2.9M | 3M | 2.8M | 2.9M | 2.2M | 1.7M | 18M | 3.5 | Countries |
| Russia | 5.6M | 5.6M | 6.5M | 6.8M | 8.6M | 5.4M | 5.8M | 44M | 8.8 | Countries |
| Nigeria | 2.6M | 2.6M | 2.9M | 3M | 3.5M | 2.6M | 2M | 19M | 3.8 | Countries |
| Japan | 4.8M | 6.4M | 6.5M | 8.3M | 10M | 7.3M | 6.6M | 50M | 10 | Countries |
| Mexico | 3.1M | 3.9M | 4.3M | 4.3M | 5.9M | 4.7M | 4.5M | 31M | 6.1 | Countries |
| Total | 59M | 69M | 74M | 74M | 103M | 65M | 59M | 502M | 100 | -- |
| Percentage | 11.7 | 13.8 | 14.7 | 14.7 | 20.4 | 12.9 | 11.8 | 100 | -- | -- |

Of these countries, China, India and the United States are the most notable. China is the world's most populous. India has the largest population with some minimal English knowledge and legally (largely) unfettered Internet access to Wikipedia, while the United States has the largest population with quality Internet connectivity and good English knowledge. Moreover, in China and India, Internet use and access have been growing considerably in the last few years, whereas it has been relatively stable in the United States.

It is interesting that the year with the maximum total pageview count was as far back as 2010. In fact, 2010 was so significantly better than the other years that the numbers beg for an explanation. I don't have one, but even excluding 2010, we see a declining trend: gradual growth from 2008 to 2011, and then a symmetrically gradual decline. Both the growth trend and the decline trend are quite similar across countries.

We see a similar trend for continents and subcontinents, with the peak occurring in 2010. In contrast, the smaller counterparts, such as cities, peaked in 2013, similarly to colors, and the drop, though somewhat less steep than with colors, has been quite significant. For instance, a list for Indian cities shows that the total pageviews for these Indian cities declined from about 20 million in 2013 (after steady growth in the preceding years) to about 13 million in 2014.

Some niche topics where pageviews haven't declined

So far, we've looked at topics where pageviews have been declining since at least 2013, and some that peaked as far back as 2010. There are, however, many relatively niche topics where the number of pageviews has stayed roughly constant. But this stability itself is a sign of decay, because other metrics suggest that the topics have experienced tremendous growth in interest. In fact, the stability is even less impressive when we notice that it's a result of a cancellation between slight declines in views of established pages in the genre, and traffic going to new pages.

For instance, consider some charity-related pages:

| Page name | Pageviews in year 2014 | Pageviews in year 2013 | Pageviews in year 2012 | Pageviews in year 2011 | Pageviews in year 2010 | Pageviews in year 2009 | Pageviews in year 2008 | Total | Percentage | Tags |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Against Malaria Foundation | 5.9K | 6.3K | 4.3K | 1.4K | 2 | 0 | 0 | 18K | 15.6 | Charities |
| Development Media International | 757 | 0 | 0 | 0 | 0 | 0 | 0 | 757 | 0.7 | Pages created by Vipul Naik, Charities |
| Deworm the World Initiative | 2.3K | 277 | 0 | 0 | 0 | 0 | 0 | 2.6K | 2.3 | Charities, Pages created by Vipul Naik |
| GiveDirectly | 11K | 8.3K | 2.6K | 442 | 0 | 0 | 0 | 22K | 19.2 | Charities, Pages created by Vipul Naik |
| International Council for the Control of Iodine Deficiency Disorders | 1.2K | 1 | 2 | 2 | 0 | 1 | 2 | 1.2K | 1.1 | Charities, Pages created by Vipul Naik |
| Nothing But Nets | 5.9K | 6.6K | 6.6K | 5.1K | 4.4K | 4.7K | 6.1K | 39K | 34.2 | Charities |
| Nurse-Family Partnership | 2.9K | 2.8K | 909 | 30 | 8 | 72 | 63 | 6.8K | 5.9 | Pages created by Vipul Naik, Charities |
| Root Capital | 3K | 2.5K | 414 | 155 | 51 | 1.2K | 21 | 7.3K | 6.3 | Charities, Pages created by Vipul Naik |
| Schistosomiasis Control Initiative | 4K | 2.7K | 1.6K | 191 | 0 | 0 | 0 | 8.5K | 7.4 | Charities, Pages created by Vipul Naik |
| VillageReach | 1.7K | 1.9K | 2.2K | 2.6K | 97 | 3 | 15 | 8.4K | 7.3 | Charities, Pages created by Vipul Naik |
| Total | 38K | 31K | 19K | 9.9K | 4.6K | 5.9K | 6.2K | 115K | 100 | -- |
| Percentage | 33.4 | 27.3 | 16.3 | 8.6 | 4 | 5.1 | 5.4 | 100 | -- | -- |

For this particular cluster of pages, we see the totals growing robustly year-on-year. But a closer look shows that the growth isn't that impressive. Whereas earlier, views were doubling every year from 2010 to 2013 (this was the take-off period for GiveWell and effective altruism), the growth from 2013 to 2014 was relatively small. And about half the growth from 2013 to 2014 was powered by the creation of new pages (including some pages created after the beginning of 2013, so they had more months in a mature state in 2014 than in 2013), while the other half was powered by growth in traffic to existing pages.
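As a rough check on the "about half" claim, the 2013-to-2014 change can be split using the rounded figures from the charity table. This is only a sketch: the rule for what counts as a "new" page (fewer than 300 views in 2013) is an assumption made for illustration, and the inputs are the rounded table values rather than exact counts.

```python
# Rounded pageviews as (2014, 2013), copied from the charity table above.
PAGEVIEWS = {
    "Against Malaria Foundation": (5900, 6300),
    "Development Media International": (757, 0),
    "Deworm the World Initiative": (2300, 277),
    "GiveDirectly": (11000, 8300),
    "International Council for the Control of Iodine Deficiency Disorders": (1200, 1),
    "Nothing But Nets": (5900, 6600),
    "Nurse-Family Partnership": (2900, 2800),
    "Root Capital": (3000, 2500),
    "Schistosomiasis Control Initiative": (4000, 2700),
    "VillageReach": (1700, 1900),
}

# Assumption for this sketch: a page is "new" if it had under 300 views in 2013.
NEW_PAGE_THRESHOLD = 300


def split_growth(pageviews, threshold=NEW_PAGE_THRESHOLD):
    """Return (growth from new pages, growth from existing pages)."""
    new_growth = 0
    existing_growth = 0
    for views_2014, views_2013 in pageviews.values():
        change = views_2014 - views_2013
        if views_2013 < threshold:
            new_growth += change
        else:
            existing_growth += change
    return new_growth, existing_growth


new_growth, existing_growth = split_growth(PAGEVIEWS)
new_share = new_growth / (new_growth + existing_growth)
print(f"new pages: {new_growth}, existing pages: {existing_growth}, "
      f"share from new pages: {new_share:.0%}")
```

With these rounded inputs and this threshold, new pages account for roughly 55% of the growth, which is consistent with "about half". A different threshold would shift the split.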

The data for philanthropic foundations demonstrates a fairly slow and steady growth (about 5% a year), partly due to the creation of new pages. This 5% hides a lot of variation between individual pages:

Page name | Pageviews in year 2014 | Pageviews in year 2013 | Pageviews in year 2012 | Pageviews in year 2011 | Pageviews in year 2010 | Pageviews in year 2009 | Pageviews in year 2008 | Total | Percentage | Tags
Atlantic Philanthropies | 11K | 11K | 12K | 10K | 9.8K | 8K | 5.8K | 67K | 2.1 | Philanthropic foundations
Bill & Melinda Gates Foundation | 336K | 353K | 335K | 315K | 266K | 240K | 237K | 2.1M | 64.9 | Philanthropic foundations
Draper Richards Kaplan Foundation | 1.2K | 25 | 9 | 0 | 0 | 0 | 0 | 1.2K | 0 | Philanthropic foundations, Pages created by Vipul Naik
Ford Foundation | 110K | 91K | 100K | 90K | 100K | 73K | 61K | 625K | 19.5 | Philanthropic foundations
Good Ventures | 9.9K | 8.6K | 3K | 0 | 0 | 0 | 0 | 21K | 0.7 | Philanthropic foundations, Pages created by Vipul Naik
Jasmine Social Investments | 2.3K | 1.8K | 846 | 0 | 0 | 0 | 0 | 5K | 0.2 | Philanthropic foundations, Pages created by Vipul Naik
Laura and John Arnold Foundation | 3.7K | 13 | 0 | 1 | 0 | 0 | 0 | 3.7K | 0.1 | Philanthropic foundations, Pages created by Vipul Naik
Mulago Foundation | 2.4K | 2.3K | 921 | 0 | 1 | 1 | 10 | 5.6K | 0.2 | Philanthropic foundations, Pages created by Vipul Naik
Omidyar Network | 26K | 23K | 19K | 17K | 19K | 13K | 11K | 129K | 4 | Philanthropic foundations
Peery Foundation | 1.8K | 1.6K | 436 | 0 | 0 | 0 | 0 | 3.9K | 0.1 | Philanthropic foundations, Pages created by Vipul Naik
Robert Wood Johnson Foundation | 26K | 26K | 26K | 22K | 27K | 22K | 17K | 167K | 5.2 | Philanthropic foundations
Skoll Foundation | 13K | 11K | 9.2K | 7.8K | 9.6K | 5.8K | 4.3K | 60K | 1.9 | Philanthropic foundations
Smith Richardson Foundation | 8.7K | 3.5K | 3.8K | 3.6K | 3.7K | 3.5K | 2.9K | 30K | 0.9 | Philanthropic foundations
Thiel Foundation | 3.6K | 1.5K | 1.1K | 47 | 26 | 1 | 0 | 6.3K | 0.2 | Philanthropic foundations, Pages created by Vipul Naik
Total | 556K | 533K | 511K | 466K | 435K | 365K | 340K | 3.2M | 100 | --
Percentage | 17.3 | 16.6 | 15.9 | 14.5 | 13.6 | 11.4 | 10.6 | 100 | -- | --
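To see how much variation sits behind the aggregate growth figure, here is a small sketch that computes year-on-year growth for the Total row and for three individual pages. The numbers are the rounded values from the foundations table (in thousands), reordered from 2008 to 2014; the choice of which pages to include is arbitrary.

```python
# Rounded pageviews in thousands, ordered 2008..2014, from the foundations table.
YEARS = [2008, 2009, 2010, 2011, 2012, 2013, 2014]
SERIES = {
    "Total": [340, 365, 435, 466, 511, 533, 556],
    "Bill & Melinda Gates Foundation": [237, 240, 266, 315, 335, 353, 336],
    "Ford Foundation": [61, 73, 100, 90, 100, 91, 110],
    "Smith Richardson Foundation": [2.9, 3.5, 3.7, 3.6, 3.8, 3.5, 8.7],
}


def year_on_year_growth(values):
    """Fractional change between each pair of consecutive years."""
    return [(later - earlier) / earlier
            for earlier, later in zip(values, values[1:])]


for name, values in SERIES.items():
    growth = year_on_year_growth(values)
    pretty = ", ".join(f"{year}: {g:+.1%}" for year, g in zip(YEARS[1:], growth))
    print(f"{name}: {pretty}")
```

On these rounded values, the total grew by a little over 4% in each of the last two years, while the Gates Foundation page declined from 2013 to 2014 and the Smith Richardson Foundation page more than doubled.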

 

The dominant hypothesis: shift from non-mobile to mobile Wikipedia use

The dominant hypothesis is that pageviews have simply migrated from non-mobile to mobile. This is most closely borne out by the overall data: total pageviews have remained roughly constant, and the decline in total non-mobile pageviews has been roughly canceled by growth in mobile pageviews. However, the evidence for this substitution doesn't exist at the level of individual pages, because we don't have pageview data for the mobile domain before September 2014, and much of the decline occurred between March 2013 and June 2014.

What would it mean if there were an approximate one-to-one substitution from non-mobile to mobile for the page types discussed above? For instance, non-mobile traffic to colors dropped to somewhere between 1/3 and 1/2 of its original level between January 2013 and December 2014. Under full substitution, this would mean that somewhere between 1/2 and 2/3 of the original non-mobile traffic to colors has shifted to mobile devices. This theory should be at least partly falsifiable: if the sum of traffic to non-mobile and mobile platforms today for colors is less than non-mobile-only traffic in January 2013, then clearly substitution is only part of the story.
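The falsifiability test described above can be sketched as a small function. The function name and the example numbers are hypothetical; no real pageview data is used here. Like the test in the text, it compares today's combined traffic with the earlier non-mobile-only level.

```python
def substitution_check(baseline_non_mobile, current_non_mobile, current_mobile):
    """Compare combined current traffic with the earlier non-mobile-only level.

    If the combined current traffic falls short of the baseline, a shift to
    mobile cannot account for the whole decline.
    """
    combined_current = current_non_mobile + current_mobile
    return {
        "lost_non_mobile": baseline_non_mobile - current_non_mobile,
        "combined_current": combined_current,
        # Positive value: decline that mobile traffic does not make up for.
        "unexplained_decline": baseline_non_mobile - combined_current,
        "full_substitution_plausible": combined_current >= baseline_non_mobile,
    }


# Hypothetical monthly views for a cluster of pages (made-up numbers).
result = substitution_check(
    baseline_non_mobile=1_000_000,  # non-mobile views in the baseline month
    current_non_mobile=400_000,     # non-mobile views today
    current_mobile=350_000,         # mobile views today
)
print(result)
```

In this made-up case, mobile views recover 350,000 of the 600,000 lost non-mobile views, leaving 250,000 unexplained, so substitution would be only part of the story.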

Although the data is available, it's not currently in an easily computable form, and I don't currently have the time and energy to extract it. I'll update this once the data on all pageviews since September 2014 is available on stats.grok.se or a similar platform.

Other hypotheses

The following are some other hypotheses for the pageview decline:

  1. Google's Knowledge Graph: This is the hypothesis raised in Wikipediocracy, the Daily Dot, and the Register. The Knowledge Graph was introduced in 2012. Through 2013, Google rolled out snippets (called Knowledge Cards and Knowledge Panels) based on the Knowledge Graph in its search results. So if, for instance, you only wanted the birth date and nationality of a musician, Googling would show you that information right in the search results and you wouldn't need to click through to the Wikipedia page. I suspect that the Knowledge Graph played some role in the decline for colors seen between March 2013 and June 2014. On the other hand, many of the pages that saw a decline don't have any search snippets based on the Knowledge Graph, and therefore the decline for those pages cannot be explained this way.
  2. Other means of accessing Wikipedia's knowledge that don't involve viewing it directly: For instance, Apple's Siri tool uses data from Wikipedia, and people making queries to this tool may get information from Wikipedia without hitting the encyclopedia. The usage of such tools has increased greatly since late 2012. Siri itself first shipped with the iPhone 4S in October 2011 and came to the third-generation iPad with iOS 6 in September 2012. Since then, it has shipped with all of Apple's mobile devices and tablets.
  3. Substitution away from Wikipedia to other pages that are becoming more search-optimized and growing in number: For many topics, Wikipedia may have been clearly the best information source a few years back (as judged by Google), but the growth of niche information sources, together with better search methods, has displaced it from its undisputed leadership position. I think there's a lot of truth to this, but it's hard to quantify.
  4. Substitution away from coarser, broader pages to finer, narrower pages within Wikipedia: While this cannot directly explain an overall decline in pageviews, it can explain a decline in pageviews for particular kinds of pages. Indeed, I suspect that this is partly what's going on with the early decline of pageviews (e.g., the decline in pageviews of countries and continents starting around 2010, as people go directly to specialized articles related to the particular aspects of those countries or continents they are interested in).
  5. Substitution to Internet use in other languages: This hypothesis doesn't seem borne out by the simultaneous decline in pageviews for the English, French, and Spanish Wikipedia, as documented for the color pages.

It's still a mystery

I'd like to close by noting that the pageview decline is still very much a mystery as far as I am concerned. I hope I've convinced you that (a) the mystery is genuine, (b) it's important, and (c) although the shift to mobile is probably the most likely explanation, we don't yet have clear evidence. I'm interested in hearing whether people have alternative explanations, and/or whether they have more compelling arguments for some of the explanations proffered here.

Comment author: tog 16 February 2015 12:15:21PM 0 points [-]

Did you ever find the answer to this?

Comment author: VipulNaik 17 February 2015 11:08:49PM 0 points [-]

No

Tentative tips for people engaged in an exercise that involves some form of prediction or forecasting

5 VipulNaik 30 July 2014 05:24AM

Note: This is the concluding post of my LessWrong posts related to my forecasting work for MIRI. There are a few items related to forecasting that I didn't get time to look into and might return to later. I might edit this post to include references to those posts if I get to them later.

I've been looking at forecasting in different domains as part of work for the Machine Intelligence Research Institute (MIRI). I thought I'd draw on whatever I've learned to write up advice for people engaged in any activity that involves making forecasts. This could include a wide range of activities, including those that rely on improving the accuracy of predictions in highly circumscribed contexts (such as price forecasting or energy use forecasting) as well as those that rely on trying to determine the broad qualitative contours of possible scenarios.

The particular application of interest to MIRI is forecasting AI progress, leading up to (but not exclusively focused on) the arrival of AGI. I will therefore try to link my general tips with thoughts on how it applies to forecasting AI progress. That being said, I hope that what I say here will have wider interest and appeal.

If you're interested in understanding the state of the art with respect to forecasting AI progress specifically, consider reading Luke Muehlhauser's summary of the state of knowledge on when AI will be created. The post was written in May 2013, and there have been a couple of developments since then, including:

#1: Appreciate that forecasting is hard

It's hard to make predictions, especially about the future (see also more quotes here). Forecasting is a difficult job along many dimensions. Apart from being difficult, it's also a job where feedback is far from immediate. This holds more true as the forecasting horizon becomes wider (for lists of failed predictions made in the past, see here and here). Fortunately, a fair amount has been discovered about forecasting in general, and you can learn from the experience of people trying to make forecasts in many different domains.

Philip Tetlock's work on expert political judgment, whose conclusions he described here and which I discussed in my post on the historical evaluations of forecasting, shows that, at least in the domain of political forecasting, experts often don't do much better than random guessing, and even the experts who do well rarely do better than simple trend extrapolation. Not only do experts fail to do well, they are also poorly calibrated about the quality of their own forecasts.

Even in cases where experts are right about the median or modal scenario, they often fail to both estimate and communicate forecast uncertainty.
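To make "poorly calibrated" concrete: a forecaster is well calibrated if events given an 80% probability happen about 80% of the time. The sketch below uses made-up forecasts (not Tetlock's data) to compare stated probabilities with observed frequencies and to compute a Brier score, the mean squared difference between probability and outcome.

```python
from collections import defaultdict

# Hypothetical (stated probability, outcome) pairs; outcome is 1 if the event happened.
FORECASTS = [
    (0.9, 1), (0.9, 0), (0.9, 1), (0.9, 0),
    (0.6, 1), (0.6, 0), (0.6, 1), (0.6, 1),
    (0.1, 0), (0.1, 0), (0.1, 1), (0.1, 0),
]


def brier_score(forecasts):
    """Mean squared error between stated probability and outcome (lower is better)."""
    return sum((p - outcome) ** 2 for p, outcome in forecasts) / len(forecasts)


def calibration_table(forecasts):
    """Map each stated probability to the observed frequency of the event."""
    buckets = defaultdict(list)
    for p, outcome in forecasts:
        buckets[p].append(outcome)
    return {p: sum(outcomes) / len(outcomes)
            for p, outcomes in sorted(buckets.items())}


table = calibration_table(FORECASTS)
score = brier_score(FORECASTS)
for stated, observed in table.items():
    print(f"stated {stated:.0%} -> observed {observed:.0%}")
print(f"Brier score: {score:.3f}")
```

In this invented example, events given 90% came true only half the time, and the Brier score of about 0.277 is worse than the 0.25 earned by always saying 50%. That is the pattern of overconfidence the calibration findings describe.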

The point that forecasting is hard, and should be approached with humility, will be repeated throughout this post, in different contexts.

#2: Avoid the "not invented here" fallacy, and learn more about forecasting across a wide range of different domains

The not invented here fallacy refers to people's reluctance to use tools developed outside of their domain or organization. In the context of forecasting, it's quite common. For instance, climate scientists have been accused of not following forecasting principles. The reaction of some of them has been along the lines of "why should we listen to forecasters, when they don't understand any climate science?" (more discussion of that response here, see also a similar answer on Quora). Moreover, it's not enough to only listen to outsiders who treat you with respect. The point of listening to and learning from other domains isn't to be generous to people in those domains, but to understand and improve one's own work (in this case, forecasting work).

There are some examples of successful importation of forecasting approaches from one domain to another. One example is the ideas developed for forecasting rare events, as I discussed in this post. Power laws for some rare phenomena, such as earthquakes, have been around for a while. Aaron Clauset and his co-authors have recently applied the same mathematical framework of power laws to other types of rare events, including terrorist attacks.
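For readers unfamiliar with the power-law framework, here is a minimal sketch of its core step: estimating the exponent of a continuous power law from the sizes of observed events above a cutoff, using the standard maximum-likelihood formula alpha = 1 + n / sum(ln(x_i / x_min)). The data are synthetic, and the chosen exponent and cutoff are arbitrary assumptions; this is not a reproduction of any published analysis, which would also involve choosing the cutoff and testing goodness of fit.

```python
import math
import random


def sample_power_law(n, alpha, x_min, rng):
    """Draw n values with density proportional to x**(-alpha) for x >= x_min."""
    # Inverse-transform sampling; 1 - rng.random() lies in (0, 1].
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0))
            for _ in range(n)]


def estimate_alpha(samples, x_min):
    """Maximum-likelihood estimate of the exponent for values at or above x_min."""
    tail = [x for x in samples if x >= x_min]
    return 1.0 + len(tail) / sum(math.log(x / x_min) for x in tail)


def tail_probability(x, alpha, x_min):
    """P(X >= x) under the fitted power law, for x >= x_min."""
    return (x / x_min) ** (-(alpha - 1.0))


rng = random.Random(0)  # fixed seed so the sketch is repeatable
events = sample_power_law(n=20_000, alpha=2.5, x_min=1.0, rng=rng)
alpha_hat = estimate_alpha(events, x_min=1.0)
print(f"estimated exponent: {alpha_hat:.2f}")
print(f"P(event at least 100x the cutoff): {tail_probability(100.0, alpha_hat, 1.0):.5f}")
```

The point of the exercise is the last line: once an exponent is fitted, the same formula gives a probability for events far larger than most of those observed, which is what makes the approach portable from earthquakes to other rare events.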

Evaluating AI progress forecasting on this dimension: My rough impression is that AI progress forecasting tends to be insular, learning little from other domains. While I haven't seen a clear justification from AI progress forecasters, the typical arguments I've seen are the historical robustness of Moore's law and the idea that the world of technology is fundamentally different from the world of physical stuff.

I think that future work on AI progress forecasting should explicitly consider forecasting problems in domains other than computing, and explicitly explain what lessons cross-apply and what don't, and why. I don't mean that all future work should consider all other domains. I just mean that at least some future work should consider at least some other domains.

#3: Start by reading a few really good general-purpose overviews

Personally, I would highlight Nate Silver's The Signal and the Noise. Silver's book is quite exceptional in the breadth of topics it covers, the clarity of its presentation, and the easy toggling between general principles and specific instances. Silver's book comfortably combines ideas from statistics, data mining, machine learning, predictive analytics, and forecasting. Not only would I recommend reading it quickly when you're starting out, I would also recommend returning to specific chapters of the book later if they cover topics that interest you. I personally found the book a handy reference (and quoted extensively from it) when writing LessWrong posts about forecasting domains that the book has covered.

Other books commonly cited are Tetlock's Expert Political Judgment and the volume Principles of Forecasting edited by J. Scott Armstrong, and contributed to by several forecasters. I believe both these books are good, but I'll be honest: I haven't read them, although I have read summaries of the books and shorter works by the authors describing the main points. I believe that you can similarly get the bulk of the value of Tetlock's work by reading his article for Cato Unbound co-authored with Dan Gardner, that I discussed here. For the principles of forecasting, see #4 below.

Evaluating AI progress forecasting on this dimension: There seems to be a lot of focus on a few AI-related and computing-related futurists, such as Ray Kurzweil. I do think the focus should be widened, and getting an understanding of general challenges related to forecasting is a better starting point than reading The Singularity is Near. That said, the level of awareness among MIRI and LessWrong people about the work of Silver, Armstrong, and Tetlock definitely seems higher than among the general public or even among the intelligentsia. I should also note that Luke Muehlhauser was the person who first pointed me to J. Scott Armstrong, and he's referenced Tetlock's work frequently.

#4: Understand key concepts and distinctions in forecasting, and review the literature and guidelines developed by the general-purpose forecasting community

In this post, I provided an overview of different kinds of forecasting, and also included names of key people, key organizations, key journals, and important websites. I would recommend reading that to get a general sense, and then proceeding to the Forecasting Principles website (though, fair warning: the website's content management system is a mess, and in particular, you might find a lot of broken links). Here's their full list of 140 principles, along with discussion of the evidence base for each principle. However, see also point #5 below.

#5: Understand some alternatives to forecasting, specifically scenario analysis and futures studies

If you read the literature commonly classified as "forecasting" in academia, you will find very little mention of scenario analysis and futures studies. Conversely, the literature on scenario analysis and futures studies rarely cites the general-purpose forecasting literature. But the actual "forecasting" exercise you intend to engage in may be better suited to scenario analysis than to forecasting. Or you might find that the methods of futures studies are a closer fit for what you are trying to achieve. Or you might try to use a mix of techniques.

Broadly, scenario analysis becomes more important when there is more uncertainty, and when it's important to be prepared for a wider range of eventualities. This matters more as we move to longer time horizons for forecasting. I discussed scenario analysis in this post, where I also speculate on possible reasons for the lack of overlap with the forecasting community.

Futures studies is closely related to scenario analysis (in fact, scenario analysis can be considered a method of futures studies) but the futures studies field has a slightly different flavor. I looked at the field of futures studies in this post.

It could very well be the case that you find the ideas of scenario analysis and futures studies inappropriate for the task at hand. But such a decision should be made only after acquiring a reasonable understanding of the methods.

Some other domains that might be better suited to the problem at hand include predictive analytics, predictive modeling, data mining, machine learning, and risk analysis. I haven't looked into any of these in depth in connection with my MIRI project (I've been reading up on machine learning for other work, and have been and will be posting about it on LessWrong but that's independent of my MIRI work).

Evaluating AI progress forecasting on this dimension: I think a reasonable case can be made that the main goals of AI progress forecasting are better met through scenario analysis. I discussed this in detail in this post.

#6: Examine forecasting in other domains, including domains that do not seem to be related to your domain at the object level

This can be thought of as a corollary to #2. Chances are, if you have read Nate Silver and some of the other sources, your curiosity about forecasting in other domains has already been piqued. General lessons about human failure and error may cross-apply between domains, even if the object-level considerations are quite different.

In addition to Silver's book, I recommend taking a look at some of my own posts on forecasting in various domains. These posts are based on rather superficial research, so please treat them only as starting points.

General:

Some domain-specific posts:

I also did some additional posts on climate science as a case study in forecasting. I have paused the exercise due to time and ability limitations, but I think the posts so far might be useful:

 

#7: Consider setting up data collection using best practices early on

Forecasting works best when we have a long time series of data to learn from. So it's best to set up data collection as quickly as possible, and use good practices in setting it up. Data about the present or recent past may be cheap to collect now, but could be hard to collect a few decades from now. We don't want to be spending our time two decades later figuring out how to collect data (and adjudicating disputes about the accuracy of data) if we could collect and archive the data in a stable repository right now.

If your organization is too small to do primary data collection, find another organization that engages in the data collection activities, and make sure you archive the data they collect, so that the data is available to you even if that organization stops operating.

Evaluating AI progress forecasting on this dimension: I think that there are some benefits from creating standardized records and measurements of the current state of AI and the quality of the current hardware and software. That said, plenty of reasonably standardized measurements already exist in these domains. There is little danger of this information completely disappearing, so the project of combining and integrating these measurements into a big picture is important but not time-sensitive. Hardware progress and specs are already well-documented, and we can get time series at places such as the Performance Curve Database. Software progress and algorithmic progress have also been reasonably well-recorded, as described by Katja Grace in her review for MIRI of algorithmic progress in six domains.

#8: Consider recording forecasts and scenarios, and the full reasoning or supporting materials

It's not just useful to have data from the past, it's also useful to have forecasts made based on past data and see how they compared to what actually transpired. The problem with forecasts is even worse than with data: if two decades later we want to know what one would have predicted using the data that is available right now, we simply cannot do that unless we make and record the predictions now. (We could do it in principle by imagining that we don't have access to the intermediate data. But in practice, people can find it hard to avoid being influenced by their knowledge of what has transpired in the interim when they build and tune their models). Retrodictions and hindcasts are useful for analysis and diagnosis, but they ultimately do not provide a convincing independent test of the model being used to make forecasts.
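One possible shape for such a record is sketched below. The field names, the example question, and the archive path are all hypothetical; the only point is that the probability, the reasoning, and a pointer to the data available at the time are written down before the outcome is known, and scored afterwards.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class ForecastRecord:
    question: str
    made_on: str        # ISO date the forecast was recorded
    resolves_on: str    # ISO date by which the question can be judged
    probability: float  # stated probability that the answer is "yes"
    reasoning: str      # the argument as it stood when the forecast was made
    data_snapshot: str  # pointer to the archived data used (hypothetical path)
    outcome: Optional[bool] = None

    def resolve(self, outcome: bool) -> float:
        """Record the outcome and return the squared error (Brier contribution)."""
        self.outcome = outcome
        return (self.probability - float(outcome)) ** 2


record = ForecastRecord(
    question="Will benchmark X be exceeded by the end of 2020?",  # made-up question
    made_on="2014-07-30",
    resolves_on="2020-12-31",
    probability=0.3,
    reasoning="Trend extrapolation from the archived series, with wide error bars.",
    data_snapshot="archive/2014-07-30/benchmark_x.csv",
)

# One JSON object per line is easy to append to and hard to silently rewrite.
line = json.dumps(asdict(record), sort_keys=True)
print(line)
```

Storing the reasoning alongside the number matters because a later reader can then tell whether a forecast was right for the stated reasons or by luck.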

Evaluating AI progress forecasting on this dimension: See the link suggestions for recent work on AI progress forecasting at the beginning of the post.

The remaining points are less important and more tentative. I've included them for completeness' sake.

#9: Evaluate how much expertise the domain experts have in forecasting

In some cases, domain experts also have expertise in making forecasts. In other cases, the relationship between domain expertise and the ability to make forecasts, or even to calibrate one's own forecast accuracy, is tenuous. I discussed the issue of how much deference to give to domain experts in this post.

#10: Use best practices from statistical analysis, computer programming, software engineering, and economics

Wherever you use these disciplines, use them well. Statistical analysis arises in quantitative forecasting and prediction. Computer programming is necessary for setting up prediction markets or carrying out time series forecasting or machine learning with large data sets or computationally intensive algorithms. Software engineering is necessary once the computer programs exceed a basic level of complexity, or if they need to survive over the long term. Insights from economics and finance may be necessary for designing effective prediction markets or other tools to incentivize people to make accurate predictions and minimize their chances of gaming the system in ways detrimental to prediction accuracy.

The insularity critique of climate science basically accused the discipline of not doing this.

What if your project is too small and you don't have access to expertise in these domains? Often, a very cursory, crude analysis can be helpful in ballparking the situation. As I described in my historical evaluations of forecasting, the Makridakis Competitions provide evidence in favor of the hypothesis that simple models tend to perform quite well, although the correctly chosen complex models can outperform simple ones under special circumstances (see also here). So keeping it simple to begin with is fine. However, the following caveats should be noted:

  • Even "simple" models and setups can benefit from review by somebody with subject matter expertise. The reviews can be fairly quick, but they still help. For instance, after talking to a few social scientists, I realized the perils of using simple linear regression for time series data. This isn't a deep point, but it can elude even a smart and otherwise knowledgeable person who hasn't thought much about the specific tools.
  • The limitations of the model, and the uncertainty in the associated forecast, should be clearly noted (see my post on communicating forecast uncertainty).
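One well-known peril of applying simple regression or correlation to time series is spurious correlation between series that wander over time. This is an illustration chosen for this sketch, not necessarily the specific issue the social scientists raised. The simulation below compares the typical strength of correlation between pairs of independent random walks with that between pairs of independent white-noise series; all parameters are arbitrary.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def white_noise(n, rng):
    """Independent draws with no memory from one period to the next."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def random_walk(n, rng):
    """Cumulative sum of independent shocks, so each value builds on the last."""
    level, path = 0.0, []
    for _ in range(n):
        level += rng.gauss(0.0, 1.0)
        path.append(level)
    return path


def median_abs_correlation(make_series, trials, n, rng):
    """Median |correlation| between pairs of independently generated series."""
    return statistics.median(
        abs(pearson(make_series(n, rng), make_series(n, rng)))
        for _ in range(trials)
    )


rng = random.Random(0)  # fixed seed so the sketch is repeatable
noise_median = median_abs_correlation(white_noise, trials=300, n=200, rng=rng)
walk_median = median_abs_correlation(random_walk, trials=300, n=200, rng=rng)
print(f"independent white noise: median |r| = {noise_median:.2f}")
print(f"independent random walks: median |r| = {walk_median:.2f}")
```

The random walks are unrelated by construction, yet they typically show far stronger correlations than the white-noise pairs. A regression of one trending series on another can therefore look convincing without reflecting any real relationship, which is why a quick review by someone familiar with time series methods is worthwhile.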

Evaluating AI progress forecasting on this dimension: I think that AI progress forecasting is at too early a stage to get into using detailed statistical analysis or software, so using simple models and getting feedback from experts, while noting potential weaknesses, seems like a good strategy.

#11: Consider carefully the questions of openness of data, practices, supporting code, and internal debate

While confidentiality and anonymity are valuable in some contexts, openness and transparency are good antidotes to errors that arise due to insufficient knowledge and groupthink (such as the types of problems I noted in my post on the insularity critique of climate science).

#12: Consider ethical issues related to forecasting, such as the ways your forecasting exercise can influence real-world decisions and outcomes

This is a topic I intended to look into more but didn't get time to. I've collected a few links for interested parties:
