
Less Wrong is a community blog devoted to refining the art of human rationality. Please visit our About page for more information.

How does personality vary across US cities?

30 JonahSinick 20 December 2016 08:00AM

In 2007, psychology researchers Michal Kosinski and David Stillwell released a personality testing app on Facebook called myPersonality. The app ended up being used by 4 million Facebook users, most of whom consented to having their answers to the personality questions, along with some information from their Facebook profiles, used for research purposes.

The very large sample size and matching data from Facebook profiles make it possible to investigate many questions about personality differences that were previously inaccessible. Kosinski and Stillwell have used it in a number of interesting publications, which I highly recommend (e.g. [1], [2], [3]).

In this post, I focus on what the dataset tells us about how Big Five personality traits vary by geographic region in the United States.

continue reading »

The Adventure: a new Utopia story

23 Stuart_Armstrong 25 December 2016 11:51AM

For an introduction to this story, see here. For a previous utopian attempt, see here. This story only explores a tiny part of this utopia.

 

The Adventure

Hark! the herald daemons spam,

Glory to the newborn World,

Joyful, all post-humans, rise,

Join the triumph of the skies.


Veiled in wire the Godhead see,

Built that man no more may die,

Built to raise the sons of earth,

Built to give them second birth.

 

The cold cut him off from his toes, then fingers, then feet, then hands. Clutched in a grip he could not unclench, his phone beeped once. He tried to lift a head too weak to rise, to point ruined eyes too weak to see. Then he gave up.

So he never saw the last message from his daughter, reporting how she’d been delayed at the airport but would be there soon, promise, and did he need anything, lots of love, Emily. Instead he saw the orange of the ceiling become blurry, that particularly hateful colour filling what was left of his sight.

His world reduced to that orange blur, the eternally throbbing sore on his butt, and the crisp tick of a faraway clock. Orange. Pain. Tick. Orange. Pain. Tick.

He tried to focus on his life, gather some thoughts for eternity. His dry throat rasped - another flash of pain to mingle with the rest - so he certainly couldn’t speak words aloud to the absent witnesses. But he hoped that, facing death, he could at least put together some mental last words, some summary of the wisdom and experience of years of living.

But his memories were denied him. He couldn’t remember who he was - a name, Grant, was that it? How old was he? He’d loved and been loved, of course - but what were the details? The only thought he could call up, the only memory that sometimes displaced the pain, was of him being persistently sick in a broken toilet. Was that yesterday or seventy years ago?

Though his skin hung loose on nearly muscle-free bones, he felt it as if it grew suddenly tight, and sweat and piss poured from him. Orange. Pain. Tick. Broken toilet. Skin. Orange. Pain...

The last few living parts of Grant started dying at different rates.

*~*~*

Much later:

continue reading »

A review of cryonics/brain preservation in 2016

21 Andy_McKenzie 31 December 2016 06:19PM

Relevance to Less Wrong: Whether you think it is for better or worse, users on LW are about 50,000x more likely to be signed up for cryonics than the average person.

Disclaimer: I volunteer at the Brain Preservation Foundation, but I speak for myself in this post and I'm only writing about publicly available information.

In 2016, cryonics remains a fringe operation. When it is discussed in the news or on social media, many express surprise that cryonics is a "real thing" outside of science fiction. Many others who do know about cryonics tend to label it a pseudoscience. Brain preservation (BP) through non-conventional cryonics methods such as those using aldehyde fixation is even more fringe, with most people not aware of it, and others dismissing it because it uses "toxic" chemicals. 

Here's a rundown of some events important to cryonics/BP in 2016. 

Research progress

- The Brain Preservation Foundation prize was won in February by Robert McIntyre and Greg Fahy. Their winning technique uses glutaraldehyde fixation followed by glycerol cryoprotection (in addition to a step to improve blood-brain barrier permeability and several other components) and allows for the preservation of neural structure as verified by electron microscopy across the cortex. McIntyre has since started a company called Nectome in part to improve and refine this procedure.
- Aschwin de Wolf of Advanced Neural Biosciences announced in November at the CryoSuisse conference that Advanced Neural Biosciences has developed a method that reduces dehydration in rat brain vitrification by using "brain optimized cryoprotectants." There is no peer-reviewed data or more detailed procedure available as of yet, and viability of the tissue may be a concern. 

Legal progress

- In Canada, Keegan Macintosh and Carrie Wong are challenging the anti-cryonics laws in British Columbia.
- A right-to-die law passed in Colorado. Although not directly relevant to cryonics, it increases the number of locations where it might be possible to start brain preservation procedures in a more controlled manner by taking advantage of physician-assisted suicide in a terminally ill patient. This has been described as "cryothanasia" and is controversial both within the cryonics community and outside of it. 
- As far as I know, cryonics and brain preservation remain illegal in France, China, and many other areas. 

Current Cryonics Organizations 

- Alcor
- Cryonics Institute 
- KrioRus. They are planning on moving to Tver, which is a few hours west of Moscow (see Bloomberg profile). 
- Oregon Cryonics. This year, they put a hold on allowing new members to sign up through their member portal, with the organization pivoting towards "some critical cryonics research" intended to validate their methods. OC was profiled by Vice in March.
- TransTime. This small cryonics company in San Leandro is still active, and was profiled in a video by Fusion earlier this year.
- Osiris. This is a new, for-profit company in Florida that has so far been controversial within the cryonics community, and was recently profiled in the Miami New Times.  
- There are other organizations that only do standby and/or cryoprotectant perfusion. 

Essays about cryonics

- Tim Urban's post at Wait But Why about cryonics has wonderful diagrams explaining concepts such as why many people consider death to be a process, not an event. Like most everything Urban writes, it went viral and is still being posted on social media.  
- Corey Pein's article at The Baffler focuses primarily on critiques of Alcor and in particular Max More. 
- In April, an essay by Rachel Nuwer at BBC considered what would happen if cryonics worked. 
- Neuroscientist Clive Coen critiqued cryonics in an essay at New Humanist in November. 
- In January, PZ Myers critiqued aldehyde stabilized cryopreservation as "wishful thinking" because it is not yet possible to upload the memories/behaviors of even a simple organism based on information extracted post-fixation. 

Cryonics in the news

- In April, a profile of Elaine Walker, who is signed up with Alcor, on CNBC led to a moderately large amount of press for cryonics. 
- In August, a Rolling Stone profile of Steve Aoki, who is also signed up with Alcor, mentioned his plan to do cryonics.
- In November, by far the biggest news story of the year about cryonics (dominating almost all of the Google trends variance) was about a 14-year-old girl who wanted cryonics and who had to go to court to prevent her father from stopping it. The court allowed her to be cryopreserved following her legal death. This case and related issues were covered extensively in the Guardian and other British news outlets, sparking debate about cryonics generally in the UK. 

A quick note on weirdness points and Solstices [And also random other Solstice discussion]

19 Raemon 21 December 2016 05:29PM

Common knowledge is important. So I wanted to note:

Every year on Solstice feedback forms, I get concerns about songs like "The X days of X-Risk" or "When I Die" (featuring lines including 'they may freeze my body when I die'): that they are too weird and ingroupy and offputting to people who aren't super-nerdy transhumanists.

But I also get comments from people who know little about X-risk or cryonics or whatever who say "these songs are hilarious and awesome." Sunday Assemblies who have no connection to Less Wrong sing When I Die and it's a crowd favorite every year.

And my impression is that people are only really weirded out by these songs on behalf of other people who are only weirded out by them on behalf of other people. There might be a couple of people who are genuinely offput by the ideas, but if so it's not super clear to me. I take very seriously the notion of making Solstice inclusive while retaining its "soul", talk to lots of people about what they find alienating or weird, and try to create something that can resonate with as many people as possible.

So I want it to at least be clear: if you are personally actually offput by those songs for your own sake, that makes sense and I want to know about it, but if you're just worried about other people, I'm pretty confident you don't need to be. The songs are designed so you don't need to take them seriously if you don't want to.

-

Random note 1: I think the only line that's raised concern from some non-LW-ish people for When I Die is "I'd prefer to never die at all", and that's because it's literally putting words in people's mouths which aren't true for everyone. I mentioned that to Glen. We'll see if he can think of anything else.

Random note 2: Reactions to more serious songs like "Five Thousand Years" seem generally positive among non-transhumanists, although sometimes slightly confused. The new transhumanist-ish song this year, Endless Light, has gotten overall good reviews.

[Link] Dominic Cummings: how the Brexit referendum was won

16 The_Jaded_One 12 January 2017 09:26PM

Planning the Enemy's Retreat

15 Gram_Stone 11 January 2017 05:44AM

Related: Leave a Line of Retreat

When I was smaller, I was sitting at home watching The Mummy, with my mother, ironically enough. There's a character by the name of Bernard Burns, and you only need to know two things about him. The first thing you need to know is that the titular antagonist steals his eyes and tongue because, hey, eyes and tongues spoil after a while you know, and it's been three thousand years.

The second thing is that Bernard Burns was the spitting image of my father. I was terrified! I imagined my father, lost and alone, certain that he would die, unable to see, unable even to properly scream!

After this frightening ordeal, I had the conversation in which it is revealed that fiction is not reality, that actions in movies don't really have consequences, that apparent consequences are merely imagined and portrayed.

Of course I knew this on some level. I think the difference between the way children and adults experience fiction is a matter of degree and not kind. And when you're an adult, suppressing those automatic responses to fiction has itself become so automatic, that you experience fiction as a thing compartmentalized. You always know that the description of consequences in the fiction will not by magic have fire breathed into them, that Imhotep cannot gently step out of the frame and really remove your real father's real eyes.

So, even though we often use fiction to engage, to make things feel more real, in another way, once we grow, I think fiction gives us the chance to entertain formidable ideas at a comfortable distance.

A great user once said, "Vague anxieties are powerful anxieties." Related to this is the simple rationality technique of Leaving a Line of Retreat: before evaluating the plausibility of a highly cherished or deeply frightening belief, one visualizes the consequences of the highly cherished belief being false, or of the deeply frightening belief being true. We hope that it will thereby become just a little easier to evaluate the plausibility of that belief, for if we are wrong, at least we know what we're doing about it. Sometimes, if not often, what you'd really do about it isn't as bad as your intuitions would have you think.

If I had to put my finger on the source of that technique's power, I would name its ability to reduce the perceived hedonic costs of truthseeking. It's hard to estimate the plausibility of a charged idea because you expect your undesired outcome to feel very bad, and we naturally avoid this. The trick is in realizing that, in any given situation, you have almost certainly overestimated how bad it would really feel.

But Sun Tzu didn't just plan his own retreats; he also planned his enemies' retreats. What if your interlocutor has not practiced the rationality technique of Leaving a Line of Retreat? Well, Sun Tzu might say, "Leave one for them."

As I noted in the beginning, adults automatically compartmentalize fiction away from reality. It is simply easier for me to watch The Mummy than it was when I was eight. The formidable idea of my father having his eyes and tongue removed is easier to hold at a distance.

Thus, I hypothesize, truth in fiction is hedonically cheap to seek.

When you recite the Litany of Gendlin, you do so because it makes seemingly bad things seem less bad. I propose that the idea generalizes: when you're experiencing fiction, everything seems less bad than its conceivably real counterpart, it's stuck inside the book, and any ideas within will then seem less formidable. The idea is that you can use fiction as an implicit line of retreat, that you can use it to make anything seem less bad by making it make-believe, and thus, safe. The key, though, is that not everything inside of fiction is stuck inside of fiction forever. Sometimes conclusions that are valid in fiction also turn out to be valid in reality. 

This is hard to use on yourself, because you can't make a real scary idea into fiction, or shoehorn your scary idea into existing fiction, and then make it feel far away. You'll know where the fiction came from. But I think it works well on others.

I don't think I can really get the point across in the way that I'd like without an example. This proposed technique was an accidental discovery, like popsicles or the Slinky:

A history student friend of mine was playing Fallout: New Vegas, and he wanted to talk to me about which ending he should choose. The conversation seemed mostly optimized for entertaining one another, and, hoping not to disappoint, I tried to intertwine my fictional ramblings with bona fide insights. The student was considering giving power to a democratic government, but he didn't feel very good about it, mostly because this fictional democracy was meant to represent anything that anyone has ever said is wrong with at least one democracy, plausible or not.

"The question you have to ask yourself," I proposed to the student, "is 'Do I value democracy because it is a good system, or do I value democracy per se?' A lot of people will admit that they value democracy per se. But that seems wrong to me. That means that if someone showed you a better system that you could verify was better, you would say 'This is good governance, but the purpose of government is not good governance, the purpose of government is democracy.' I do, however, understand democracy as a 'current best bet' or local maximum."

I have in fact gotten wide-eyed stares for saying things like that, even granting the closing ethical injunction on democracy as local maximum. I find that unusual, because it seems like one of the first steps you would take towards thinking about politics clearly, not to conflate democracy with good governance. If you were further in the past and the fashionable political system were not democracy but monarchy, and you, like many others, considered democracy preferable to monarchy, then upon a future human revealing to you the notion of a modern democracy, you would find yourself saying, regrettably, "This is good governance, but the purpose of government is not good governance, the purpose of government is monarchy."

But because we were arguing for fictional governments, our autocracies, or monarchies, or whatever non-democratic governments heretofore unseen, could not by magic have fire breathed into them. For me to entertain the idea of a non-democratic government in reality would have solicited incredulous stares. For me to entertain the idea in fiction is good conversation.

The student is one of two people with whom I've had this precise conversation, and I do mean in the particular sense of "Which Fallout ending do I pick?" I snuck this opinion into both, and both came back weeks later to tell me that they spent a lot of time thinking about that particular part of the conversation, and that the opinion I shared seemed deep.

Also, one of them told me that they had recently received some incredulous stares.

So I think this works, at least sometimes. It looks like you can sneak scary ideas into fiction, and make them seem just non-scary enough for someone to arrive at an accurate belief about that scary idea.

I do wonder though, if you could generalize this even more. How else could you reduce the perceived hedonic costs of truthseeking?

[Link] The engineer and the diplomat

14 Benquo 27 December 2016 08:49PM

Triaging mental phenomena or: leveling up the stack trace skill

13 RomeoStevens 23 December 2016 12:15AM

Related: The 5-Second Level; Attention control is critical for increasing/altering motivation

Epistemic Status: sharing a hypothesis that has been slowly coalescing since a discussion with Eliezer at EAG and was catalyzed by Anna's latest LW post, along with an exercise I have been using. n=1

Mental phenomena (and thus rationality skills) can't be trained without a feedback loop that causes calibration in the relevant direction. One of my guesses for a valuable thing Eliezer did was habitual stack traces causing a leveling up of stack trace resolution, i.e. seeing more fine-grained detail in mental phenomena. This is related to 'catching flinches' as Anna describes, an example of a particularly useful phenomenon to be able to catch. In general, you can't tune black boxes; you need to be able to see individual steps.

How can you level up the stack trace skill? By triaging your unwillingness to do things, and we'll start with your unwillingness to practice the stack trace skill! I like 'triage' more than 'classify' because it imports some connotations about scope sensitivity.

In order to triage we need a taxonomy. Developing/hacking/modding your own is what ultimately works best, but you can use prebuilt ones as training wheels. Here are two possible taxonomies:

Note whether it is experienced as

  • Distracting Desire
  • Aversion
  • Laziness
  • Agitation/Annoyance
  • Ambiguity/Doubt

Note whether it is experienced as 

  • Mental talk
  • Mental images
  • Sensations in the body

Form the intention to practice the stack trace skill and then try to classify at least one thing that happens. If you feel good when you get a 'hit' you will be more likely to catch additional events.

You can try this on anything. The desire for an unhealthy snack, the unwillingness to email someone etc. Note that the exercise isn't about forcing yourself to do things you don't want to do. You just want to see more clearly your own objections to doing it. If you do it more, you'll start to notice that you can catch more 'frames' or multiple phenomena at the same time or in a row e.g. I am experiencing ambiguity as the mental talk "I'm not sure how to do that" and as a slightly slimy/sliding away sensation followed by aversion to feeling the slimy feeling and an arising distracting desire to check my email. Distinguishing between actual sensations in the body and things that only seem like they could maybe be described as sensations is mostly a distraction and not all that important initially.

These are just examples and finding nice tags in your own mentalese makes the thing run smoother. You can also use this as fuel for focusing for particularly interesting frames you catch e.g. when you catch a limiting belief. It's also interesting to notice instances of the 'to-be' verb form in mental talk as this is the source of a variety of map-territory distinction errors.

There is a specific failure worth mentioning: coming up with a story. If you ask yourself questions like "Why did I think that?" your brain is great at coming up with plausible sounding stories that are often bullshit. This is why, when practicing the skill, you have to prime the intention to catch specific things beforehand. Once the skill has been built up you can use it on arbitrary thoughts and have a sense for the difference between 'story' and actual frame catching.

If other people try this I'm curious for feedback. My experience so far has been that increasing the resolution on stack traces has made the practice of every other mental technique dramatically easier because the feedback loops are all tighter. Especially relevant to repairing a failed TAP. How much practice was involved? A few minutes a day for 3 weeks caused a noticeable effect that has endured. My models, plans, and execution fail less often. When they do I have a much better chance of catching the real culprit.

Improve comments by tagging claims

13 Benquo 20 December 2016 05:04PM

I used to think that comments didn’t matter. I was wrong. This is important because communities of discourse are an important source of knowledge. I’ll explain why I changed my mind, and then propose a simple mechanism for improving them, that can be implemented on any platform that allows threaded comments.

continue reading »

[Link] EA Has A Lying Problem

12 Benquo 11 January 2017 10:31PM

[Link] Yudkowsky's 'Four Layers of Intellectual Conversation'

12 Gram_Stone 08 January 2017 09:47PM

A different argument against Universal Basic Income

12 chaosmage 28 December 2016 10:35PM

I grew up in socialist East Germany. Like most of my fellow citizens, I was not permitted to leave the country. But there was an important exception: People could leave after retirement. Why? Because that meant they forfeited their retirement benefits. Once you took more from the state than you gave, you were finally allowed to leave. West Germany would generously take you in. My family lived near the main exit checkpoint for a while and there was a long line of old people most days.

And then there are Saudi Arabia and other rentier states. Rentier states (https://en.m.wikipedia.org/wiki/Rentier_state) derive most of their income from sources other than their population. The population gets a lot more wealth from the state than the state gets from the population. States like Saudi Arabia are therefore relatively independent of their population's consent to policy. A citizen who is unhappy is welcome to leave, or to retreat to their private sphere and live off benefits while keeping their mouth shut - neither of these options incurs a significant cost for the state.

I think these facts are instructive in thinking about Universal Basic Income. I want to make a point that I haven't seen made in discussions of the matter.

Most political systems (not just democracies) are built on an assumption that the state needs its citizens. This assumption is always a bit wrong - for example, no state has much need of the terminally ill, except to signal to its citizens that it cares for all of them. In the cases of East Germany and Saudi Arabia, this assumption is more wrong. And Universal Basic Income makes it more wrong as well.

From the point of view of a state, there are citizens who are more valuable (or who help in competition with other states) and ones who are more of a burden (who make competing with other states more difficult). Universal Basic Income massively broadens the part of society that is a net loss to the state.

Now obviously technological unemployment is likely to do that anyway. But there's a difference between answers to that problem that divide up the available work between the members of society and answers that divide up society into contributors and noncontributors. My intuition is that UBI is the second kind of solution, because states will be incentivized to treat contributors differently from noncontributors. The examples are to illustrate that a state can behave very differently towards citizens if it is fundamentally not interested in retaining them.

I go along with Harari's suggestion that the biggest purely political problem of the 21st century is the integration of the economically unnecessary parts of the population into society. My worry is that UBI, while helping with immediate economic needs, makes that problem worse in the long run. Others have already pointed out problems with UBI (such as that in a democracy it'll be impossible to get rid of if it is a failure) that gradual approaches like lower retirement age, later entry into the workforce and less work per week don't have. But I reckon that behind the immediate problems with UBI such as the amount of funding it needs and the question of what it does to the motivation to work, there's a whole class of problems that arise out of the changed relationships between citizens, states and economies. With complex networks of individuals and institutions responding intelligently to the changed circumstances, a state inviting its citizens to emigrate may not be the weirdest of unforeseen consequences.

Ideas for Next Generation Prediction Technologies

12 ozziegooen 20 December 2016 10:06PM

Prediction markets are powerful, but also still quite niche. I believe that part of this lack of popularity could be solved with significantly better tools. During my work with Guesstimate I’ve thought a lot about this issue and have some ideas for what I would like to see in future attempts at prediction technologies.

 

 

1. Machine learning for forecast aggregation

In financial prediction markets, the aggregation method is the market price. In non-market prediction systems, simple algorithms are often used. For instance, in the Good Judgement Project, the consensus trends displays “the median of the most recent 40% of the current forecasts from each forecaster.”[1] Non-financial prediction aggregation is a pretty contested field with several proposed methods.[2][3][4]
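For concreteness, here is a minimal Python sketch of that kind of rule, based only on my reading of the quoted description (not GJP's actual code, whose windowing may differ in detail): take each forecaster's current forecast, keep the most recent 40% of them, and report the median.

```python
# Minimal sketch of a "median of the most recent 40% of current
# forecasts" aggregation rule (hypothetical data, not GJP's code).
from statistics import median

def consensus(current_forecasts):
    """current_forecasts: one (timestamp, probability) pair per forecaster,
    i.e. each forecaster's latest forecast on the question."""
    ranked = sorted(current_forecasts, key=lambda tf: tf[0], reverse=True)
    k = max(1, int(0.4 * len(ranked)))      # keep the newest 40%
    return median(p for _, p in ranked[:k])

print(consensus([(1, 0.60), (2, 0.55), (3, 0.70), (4, 0.65), (5, 0.80)]))
# median of the two newest forecasts (0.80 and 0.65) -> 0.725
```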


I haven’t heard much about machine learning used for forecast aggregation. It would seem to me like many, many factors could be useful in aggregating forecasts. For instance, some elements of a person’s social media profile may be indicative of their forecasting ability. Perhaps information about the educational differences between multiple individuals could provide insight into how correlated their knowledge is.


Perhaps aggregation methods, especially with training data, could partially detect and offset predictable human biases. If it is well known that people making estimates of project timelines are overconfident, then this could be taken into account. For instance, someone enters in “I think I will finish this project in 8 weeks”, and the system can infer something like, “Well, given the reference class I have of similar people making similar calls, I’d expect it to take 12 weeks.”


A strong machine learning system would of course require a lot of sample data, but small strides may be possible with even limited data. I imagine that if data is needed, lots of people on platforms like Mechanical Turk could be sampled.
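To make the shape of such a learned aggregator concrete, here is a toy Python sketch. The features (a forecaster's historical Brier score, an "overconfidence" indicator) and the simulated training data are hypothetical placeholders; a real system would need genuine historical forecasts and outcomes, and likely a richer model.

```python
# Toy sketch of learned forecast aggregation: train a model on historical
# (forecast, forecaster-features) -> outcome pairs, then use it to
# re-weight and de-bias new forecasts. All features are hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1000
raw_forecast   = rng.uniform(0.05, 0.95, n)   # forecaster's stated probability
past_brier     = rng.uniform(0.10, 0.30, n)   # proxy for historical accuracy
overconfidence = rng.uniform(0.0, 1.0, n)     # some hypothetical bias indicator

# Simulated outcomes: forecasts are informative, more so for forecasters
# with better (lower) historical Brier scores.
signal = (raw_forecast - 0.5) * (0.30 - past_brier) * 10
outcome = (rng.uniform(0, 1, n) < 1 / (1 + np.exp(-signal))).astype(int)

X = np.column_stack([
    np.log(raw_forecast / (1 - raw_forecast)),  # logit of the stated forecast
    past_brier,
    overconfidence,
])
model = LogisticRegression().fit(X, outcome)

# To aggregate several forecasts on a new question, pass each through the
# model and average the calibrated probabilities.
new_X = np.array([[np.log(0.7 / 0.3), 0.12, 0.2],
                  [np.log(0.9 / 0.1), 0.28, 0.9]])
print(model.predict_proba(new_X)[:, 1].mean())
```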


2. Prediction interval input

The prediction tools I am familiar with focus on estimating the probabilities of binary events. This can be extremely limiting. For instance, instead of allowing users to estimate what Trump’s favorable rating would be, they instead have to bet on whether it will be over a specific amount, like “Will Trump’s favorable rate be at least 45.0% on December 31st?”[5]


It’s probably no secret that I have a love for probability densities. I propose that users should be able to enter probability densities directly. User-entered probability densities would require more advanced aggregation techniques, but this is doable.[6]
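As one concrete possibility for that aggregation step, here is a short sketch that pools user-entered distributions by averaging their quantile functions ("Vincentization"), a simple relative of the quantile-regression-averaging idea in [6]. The distributions and weights are placeholders.

```python
# Sketch: pool several user-entered probability densities by averaging
# their inverse CDFs (quantile functions) at a grid of probability levels.
import numpy as np
from scipy import stats

def pool_quantiles(dists, weights=None, qs=np.linspace(0.01, 0.99, 99)):
    w = np.ones(len(dists)) if weights is None else np.asarray(weights, float)
    w = w / w.sum()
    pooled = sum(wi * d.ppf(qs) for wi, d in zip(w, dists))
    return qs, pooled

qs, pooled = pool_quantiles([stats.norm(45, 3), stats.norm(50, 5)])
print(pooled[49])   # pooled median of the two forecasts, roughly 47.5
```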

Probability density inputs would also require additional understanding from users. While this could definitely be a challenge, many prediction markets already are quite complicated, and existing users of these tools are quite sophisticated.


I would suspect that using probability densities could simplify questions about continuous variables and also give much more useful information on their predictions. If there are tail risks these would be obvious; and perhaps more interestingly, probability intervals from prediction tools could be directly used in further calculations. For instance, if there were separate predictions about the population of the US and the average income, these could be multiplied to have an estimate of the total GDP (correlations complicate this, but for some problems may not be much of an issue, and in others perhaps they could be estimated as well).
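A rough sketch of that downstream use, assuming independence between the two forecasts and using purely hypothetical population and income distributions:

```python
# Sketch: propagate two density forecasts into an implied GDP distribution
# by Monte Carlo sampling and multiplying (independence assumed).
import numpy as np

rng = np.random.default_rng(0)
population = rng.normal(335e6, 5e6, 100_000)     # hypothetical US population forecast
avg_income = rng.normal(60_000, 8_000, 100_000)  # hypothetical average-income forecast
gdp = population * avg_income

print(np.percentile(gdp, [5, 50, 95]))  # rough 90% interval for the product
```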


Probability densities make less sense for questions with a discrete set of options, like predicting who will win an election. There are a few ways of dealing with these. One is to simply leave these questions to other platforms, or to resort back to the common technique of users estimating specific percentage likelihoods in these cases. Another is to modify some of these to be continuous variables that determine discrete outcomes; like the number of electoral college votes a U.S. presidential candidate will receive. Another option is to estimate the ‘true’ probability of something as a distribution, where the ‘true’ probability is defined very specifically. For instance, a group could make probability density forecasts for the probability that the blog 538 will give to a specific outcome on a specific date. In the beginning of an election, people would guess 538's percent probability for one candidate winning a month before the election.


3. Intelligent Prize Systems

I think the main reason why so many academics and rationalists are excited about prediction markets is because of their positive externalities. Prediction markets like InTrade seem to do quite well at predicting many political and future outcomes, and this information is very valuable to outside third parties.

I’m not sure how comfortable I feel about the incentives here. The fact that the main benefits come from externalities indicates that the main players in the markets aren’t exactly optimizing for these benefits. While users are incentivized to be correct and calibrated, they are not typically incentivized to predict things that happen to be useful for observing third parties.

I would imagine that the externalities created by prediction tools would be strongly correlated with the value of information to these third parties, which does rely on actionable and uncertain decisions. So if the value of information from prediction markets were to be optimized, it would make sense for these third parties to have some way of ranking what gets attention based on what their decisions are.

 

For instance, a whole lot of prediction markets and related tools focus heavily on sports forecasts. I highly doubt that this is why most prediction market enthusiasts get excited about these markets.


In many ways, promoting prediction markets for their positive externalities is a very strange endeavor. It’s encouraging the creation of a marketplace because of the expected creation of some extra benefit that no one directly involved in that marketplace really cares about. Perhaps instead there should be otherwise-similar ways for those who desire information from prediction groups to directly pay for that information.


One possibility that has been discussed is for prediction markets to be subsidized in specific ways. This obviously would have to be done carefully in order to not distort incentives. I don’t recall seeing this implemented successfully yet, just hearing it be proposed.


For prediction tools that aren’t markets, prizes can be given out by sponsoring parties. A naive system is for one large sponsor to sponsor a ‘category’, then the best few people in that category get the prizes. I believe something like this is done by Hypermind.


I imagine a much more sophisticated system could pay people as they make predictions. One could imagine a system that numerically estimates how much information was added to the new aggregate when a new prediction is made. Users with established backgrounds will influence the aggregate forecast significantly more than newer ones, and thus will be rewarded proportionally. A more advanced system would also take into account the supply of and demand for estimates; if there are some conditions where users particularly enjoy adding forecasts, they may not need to be compensated as much for these, despite the amount or value of information contributed.
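One simple payment rule along these lines, in the spirit of market scoring rules (a sketch of the idea, not a description of any existing platform): pay each forecaster the improvement their update made to the aggregate's log score, settled once the question resolves.

```python
# Sketch: reward each update by the change it made to the aggregate's
# log score, paid out when the question resolves. Names are hypothetical.
import math

def log_score(p, outcome):
    return math.log(p if outcome else 1 - p)

def payouts(aggregate_history, outcome, scale=10.0):
    """aggregate_history: chronological list of (user, aggregate_prob_after_update),
    beginning with an initial ("prior", p0) entry."""
    pays = {}
    for (_, prev_p), (user, new_p) in zip(aggregate_history, aggregate_history[1:]):
        delta = log_score(new_p, outcome) - log_score(prev_p, outcome)
        pays[user] = pays.get(user, 0.0) + scale * delta
    return pays

history = [("prior", 0.5), ("alice", 0.7), ("bob", 0.6), ("alice", 0.8)]
print(payouts(history, outcome=True))
# alice is paid for moving 0.5 -> 0.7 and 0.6 -> 0.8; bob loses for 0.7 -> 0.6
```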


On the prize side, a sophisticated system could allow various participants to pool money for different important questions and time periods. For instance, several parties put down a total of $10k on the question ‘what will the US GDP be in 2020’, to be rewarded over the period of 2016 to 2017. Participants who put money down could be rewarded by accessing that information earlier than others or having improved API access.


Using the system mentioned above, an actor could hypothetically build up a good reputation, and then use it to make a biased prediction in the expectation that it would influence third parties. While this would be very possible, I would expect it to require the user to generate more value than their eventual biased prediction would cost. So while some metrics may become somewhat biased, in order for this to happen many others would become improved. If this were still a problem, perhaps forecasters could make bets in order to demonstrate confidence (even if the bet were made in a separate application).


4. Non-falsifiable questions


Prediction tools are really a subset of estimation tools, where the requirement is that they estimate things that are eventually falsifiable. This is obviously a very important restriction, especially when bets are made. However, it’s not an essential restriction, and hypothetically prediction technologies could be used for much more general estimates.


To begin, we could imagine how very long-term ideas could be forecast. A simple model would be to have one set of forecasts for what the GDP will be in 2020, and another for what the system’s aggregate will think the GDP is in 2020, at the time of 2018. Then in 2018 everyone could be ranked, even though the actual event has not yet occurred.


In order for the result in 2018 to be predictive, it would obviously require that participants expect future forecasts to be predictive. If participants thought everyone else would be extremely optimistic, they would be encouraged to make optimistic predictions as well. This creates a feedback loop: the more accurate the system is thought to be, the more accurate it will be (approaching the accuracy of an immediately falsifiable prediction). If there is sufficient trust in a community and aggregation system, I imagine this system could work decently, but if there isn’t, then it won’t.


In practice I would imagine that forecasters would be continually judged as future forecasts are contributed that agree or disagree with them, rather than only when definitive events happen that prove or disprove their forecasts. This means that forecasters could forecast things that happen in very long time horizons, and still be ranked based on their ability in the short term.
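A minimal sketch of that interim ranking: score each 2016 forecast by its distance from the 2018 aggregate, standing in for accuracy about the still-unresolved 2020 quantity (the GDP figures here are hypothetical).

```python
# Sketch: rank long-horizon forecasters against a later aggregate rather
# than the not-yet-resolved outcome.
def interim_scores(forecasts_2016, aggregate_2018):
    """forecasts_2016: dict of user -> point forecast made in 2016."""
    return {user: -(f - aggregate_2018) ** 2
            for user, f in forecasts_2016.items()}

print(interim_scores({"alice": 21.5, "bob": 23.0}, aggregate_2018=22.0))
# alice: -0.25, bob: -1.0  (hypothetical GDP forecasts in trillions of USD)
```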


Going more abstract, there could be poll-like questions like, “How many soldiers died in war in WW2?” or “How many DALYs would donating $10,000 to the AMF create in 2017?”. For these, individuals could propose their estimates, then the aggregation system would work roughly like normal to combine these estimates. Even though these questions may never be known definitively, if there is built-in trust in the system, I could imagine that they could produce reasonable results.


One question here is how to evaluate the results of aggregation systems for non-falsifiable questions. I don’t imagine any direct way, but could imagine ways of approximating it by asking experts how reasonable the results seem to them. While methods to aggregate results for non-falsifiable questions are themselves non-falsifiable, the alternatives are also very lacking. Given how many of these questions exist, it seems to me that perhaps they should be dealt with; and perhaps they can use the results from communities and statistical infrastructure optimized in situations that do have answers.


Conclusion

Each one of the above features could be described in much more detail, but I think the basic ideas are quite simple. I’m very enthusiastic about these, and would be interested in talking with anyone interested in collaborating on or just talking about similar tools. I’ve been considering attempting a system myself, but first want to get more feedback.

 

  1. The Good Judgement Project FAQ, https://www.gjopen.com/faq

  2. Sharpening Your Forecasting Skills, Link

  3. IARPA Aggregative Contingent Estimation (ACE) research program https://www.iarpa.gov/index.php/research-programs/ace

  4. The Good Judgement Project: A Large Scale Test of Different Methods of Combining Expert Predictions
    Link

  5. “Will Trump’s favorable rate be at least 45.0% on December 31st?” on PredictIt (Link).

  6. I believe Quantile Regression Averaging is one way of aggregating prediction intervals https://en.wikipedia.org/wiki/Quantile_regression_averaging

  7. Hypermind (http://hypermind.com/)

Welcome to LessWrong (10th Thread, January 2017) (Thread A)

11 folkTheory 07 January 2017 05:43AM

(Thread B for January is here, created as a duplicate by accident)

Hi, do you read the LessWrong website, but haven't commented yet (or not very much)? Are you a bit scared of the harsh community, or do you feel that questions which are new and interesting for you could be old and boring for the older members?

This is the place for the new members to become courageous and ask what they wanted to ask. Or just to say hi.

The older members are strongly encouraged to be gentle and patient (or just skip the entire discussion if they can't).

Newbies, welcome!

 

The long version:

 

If you've recently joined the Less Wrong community, please leave a comment here and introduce yourself. We'd love to know who you are, what you're doing, what you value, how you came to identify as an aspiring rationalist or how you found us. You can skip right to that if you like; the rest of this post consists of a few things you might find helpful. More can be found at the FAQ.

 

A few notes about the site mechanics

To post your first comment, you must have carried out the e-mail confirmation: When you signed up to create your account, an e-mail was sent to the address you provided with a link that you need to follow to confirm your e-mail address. You must do this before you can post!

Less Wrong comments are threaded for easy following of multiple conversations. To respond to any comment, click the "Reply" link at the bottom of that comment's box. Within the comment box, links and formatting are achieved via Markdown syntax (you can click the "Help" link below the text box to bring up a primer).

You may have noticed that all the posts and comments on this site have buttons to vote them up or down, and all the users have "karma" scores which come from the sum of all their comments and posts. This immediate easy feedback mechanism helps keep arguments from turning into flamewars and helps make the best posts more visible; it's part of what makes discussions on Less Wrong look different from those anywhere else on the Internet.

However, it can feel really irritating to get downvoted, especially if one doesn't know why. It happens to all of us sometimes, and it's perfectly acceptable to ask for an explanation. (Sometimes it's the unwritten LW etiquette; we have different norms than other forums.) Take note when you're downvoted a lot on one topic, as it often means that several members of the community think you're missing an important point or making a mistake in reasoning— not just that they disagree with you! If you have any questions about karma or voting, please feel free to ask here.

Replies to your comments across the site, plus private messages from other users, will show up in your inbox. You can reach it via the little mail icon beneath your karma score on the upper right of most pages. When you have a new reply or message, it glows red. You can also click on any user's name to view all of their comments and posts.

All recent posts (from both Main and Discussion) are available here. At the same time, it's definitely worth your time commenting on old posts; veteran users look through the recent comments thread quite often (there's a separate recent comments thread for the Discussion section, for whatever reason), and a conversation begun anywhere will pick up contributors that way.  There's also a succession of open comment threads for discussion of anything remotely related to rationality.

Discussions on Less Wrong tend to end differently than in most other forums; a surprising number end when one participant changes their mind, or when multiple people clarify their views enough and reach agreement. More commonly, though, people will just stop when they've better identified their deeper disagreements, or simply "tap out" of a discussion that's stopped being productive. (Seriously, you can just write "I'm tapping out of this thread.") This is absolutely OK, and it's one good way to avoid the flamewars that plague many sites.

EXTRA FEATURES:
There's actually more than meets the eye here: look near the top of the page for the "WIKI", "DISCUSSION" and "SEQUENCES" links.
LW WIKI: This is our attempt to make searching by topic feasible, as well as to store information like common abbreviations and idioms. It's a good place to look if someone's speaking Greek to you.
LW DISCUSSION: This is a forum just like the top-level one, with two key differences: in the top-level forum, posts require the author to have 20 karma in order to publish, and any upvotes or downvotes on the post are multiplied by 10. Thus there's a lot more informal dialogue in the Discussion section, including some of the more fun conversations here.
SEQUENCES: A huge corpus of material mostly written by Eliezer Yudkowsky in his days of blogging at Overcoming Bias, before Less Wrong was started. Much of the discussion here will casually depend on or refer to ideas brought up in those posts, so reading them can really help with present discussions. Besides which, they're pretty engrossing in my opinion. They are also available in a book form.

A few notes about the community

If you've come to Less Wrong to  discuss a particular topic, this thread would be a great place to start the conversation. By commenting here, and checking the responses, you'll probably get a good read on what, if anything, has already been said here on that topic, what's widely understood and what you might still need to take some time explaining.

If your welcome comment starts a huge discussion, then please move to the next step and create a LW Discussion post to continue the conversation; we can fit many more welcomes onto each thread if fewer of them sprout 400+ comments. (To do this: click "Create new article" in the upper right corner next to your username, then write the article, then at the bottom take the menu "Post to" and change it from "Drafts" to "Less Wrong Discussion". Then click "Submit". When you edit a published post, clicking "Save and continue" does correctly update the post.)

If you want to write a post about a LW-relevant topic, awesome! I highly recommend you submit your first post to Less Wrong Discussion; don't worry, you can later promote it from there to the main page if it's well-received. (It's much better to get some feedback before every vote counts for 10 karma—honestly, you don't know what you don't know about the community norms here.)

Alternatively, if you're still unsure where to submit a post, whether to submit it at all, would like some feedback before submitting, or want to gauge interest, you can ask / provide your draft / summarize your submission in the latest open comment thread. In fact, Open Threads are intended for anything 'worth saying, but not worth its own post', so please do dive in! Informally, there is also the unofficial Less Wrong IRC chat room, and you might also like to take a look at some of the other regular special threads; they're a great way to get involved with the community!

If you'd like to connect with other LWers in real life, we have meetups in various parts of the world. Check the wiki page for places with regular meetups, or the upcoming (irregular) meetups page. There's also a Facebook group. If you have your own blog or other online presence, please feel free to link it.

If English is not your first language, don't let that make you afraid to post or comment. You can get English help on Discussion- or Main-level posts by sending a PM to one of the following users (use the "send message" link on the upper right of their user page). Either put the text of the post in the PM, or just say that you'd like English help and you'll get a response with an email address. 
Normal_Anomaly 
Randaly 
shokwave 
Barry Cotter

A note for theists: you will find the Less Wrong community to be predominantly atheist, though not completely so, and most of us are genuinely respectful of religious people who keep the usual community norms. It's worth saying that we might think religion is off-topic in some places where you think it's on-topic, so be thoughtful about where and how you start explicitly talking about it; some of us are happy to talk about religion, some of us aren't interested. Bear in mind that many of us really, truly have given full consideration to theistic claims and found them to be false, so starting with the most common arguments is pretty likely just to annoy people. Anyhow, it's absolutely OK to mention that you're religious in your welcome post and to invite a discussion there.

A list of some posts that are pretty awesome

I recommend the major sequences to everybody, but I realize how daunting they look at first. So for purposes of immediate gratification, the following posts are particularly interesting/illuminating/provocative and don't require any previous reading:

More suggestions are welcome! Or just check out the top-rated posts from the history of Less Wrong. Most posts at +50 or more are well worth your time.

Welcome to Less Wrong, and we look forward to hearing from you throughout the site!

Buckets and memetic immune disorders

11 Tyrrell_McAllister 03 January 2017 11:51PM

AnnaSalamon's recent post on "flinching" and "buckets" nicely complements PhilGoetz's 2009 post Reason as memetic immune disorder. (I'll be assuming that readers have read Anna's post, but not necessarily Phil's.) Using Anna's terminology, I take Phil to be talking about the dangers of merging buckets that started out as separate. Anna, on the other hand, is talking about how to deal with one bucket that should actually be several.

Phil argued (paraphrasing) that rationality can be dangerous because it leads to beliefs of the form "P implies Q". If you convince yourself of that implication, and you believe P, then you are compelled to believe Q. This is dangerous because your thinking about P might be infected by a bad meme. Now rationality has opened the way for this bad meme to infect your thinking about Q, too.

It's even worse if you reason yourself all the way to believing "P if and only if Q". Now any corruption in your thinking about either one of P and Q will corrupt your thinking about the other. In terms of buckets: If you put "Yes" in the P bucket, you must put "Yes" in the Q bucket, and vice versa. In other words, the P bucket and the Q bucket are now effectively one and the same.

In this sense, Phil was pointing out that rationality merges buckets. (More precisely, rationality creates dependencies among buckets. In the extreme case, buckets become effectively identical). This can be bad for the reasons that Anna gives. Phil argues that some people resist rationality because their "memetic immune system" realizes that rational thinking might merge buckets inappropriately. To avoid this danger, people often operate on the principle that it's suspect even to consider merging buckets from different domains (e.g., religious scripture and personal life).

This suggests a way in which Anna's post works at the meta-level, too.

Phil's argument is that people resist rationality because, in effect, they've identified the two buckets "Think rationally" and "Spread memetic infections". They fear that saying "Yes" to "Think rationally" forces them to say "Yes" to the dangers inherent to merged buckets.

But Anna gives techniques for "de-merging" buckets in general if it turns out that some buckets were inappropriately merged, or if one bucket should have been several in the first place.

In other words, Anna's post essentially de-merges the two particular buckets "Think rationally" and "Spread memetic infections". You can go ahead and use rational thinking, even though you will risk inappropriately merging buckets, because you now have techniques for de-merging those buckets if you need to.

In this way, Anna's post may diminish the "memetic immune system" obstacle to rational thinking that Phil observed.

Project Hufflepuff

10 Raemon 18 January 2017 06:57PM

(This is a crossposted FB post, so it might read a bit weird)

My goal this year (in particular, my main focus once I arrive in the Bay, but also my focus in NY and online in the meanwhile), is to join and champion the growing cause of people trying to fix some systemic problems in EA and Rationalsphere relating to "lack of Hufflepuff virtue".

I want Hufflepuff Virtue to feel exciting and important, because it is, and I want it to be something that flows naturally into our pursuit of epistemic integrity, intellectual creativity, and concrete action.

Some concrete examples:

- on the 5 second reflex level, notice when people need help or when things need doing, and do those things.

- have an integrated understanding that being kind to people is *part* of helping them (and you!) to learn more, and have better ideas.

(There are a bunch of ways to be kind to people that do NOT do this, e.g. politely agreeing to disagree. That's not what I'm talking about. We need to hold each other to higher standards, but not talk down to people in a fashion that gets in the way of understanding. There are tradeoffs and I'm not sure of the best approach, but there's a lot of room for improvement.)

- be excited and willing to be the person doing the grunt work to make something happen

- foster a sense that the community encourages people to try new events, and to actively take personal responsibility for noticing and fixing community-wide problems that aren't necessarily sexy.

- when starting new projects, try to have mentorship and teamwork built into their ethos from the get-go, rather than hastily tacked on later

I want these sorts of things to come easily to mind when the future people of 2019 think about the rationality community, and have them feel like central examples of the community rather than things that we talk about wanting-more-of.

Bioelectronic medicine: a very brief overview

10 sarahconstantin 22 December 2016 04:03AM

Drugs that affect the nervous system get administered systemically.  It's easy to imagine that we could do much more if we could stimulate one nerve at a time, and in patterns designed to have particular effects on the body.

"Neural coding" can detect the nerve impulses that indicate that a paralyzed person intends to move a limb, and build prosthetics that respond to the mind the way a real limb would.  A company called BrainGate is already making these.  You can see a paralyzed person using a robotic arm with her mind here.

A fair number of diseases that don't seem "neurological", like rheumatoid arthritis and ulcerative colitis, can actually be treated by stimulating the vagus nerve.  The nervous system is tightly associated with the immune and endocrine systems, which is probably why autoimmune diseases are so associated with psychiatric comorbidities; it also means that the nervous system might be an angle towards treating autoimmune diseases. There is a "cholinergic anti-inflammatory pathway", involving the vagus nerve, which inactivates macrophages when they're exposed to the neurotransmitter acetylcholine, and thus lessens the immune response.  Turning this pathway on electronically is thus a prospective treatment for autoimmune or inflammatory diseases.  Vagus nerve stimulation has been tested and found successful in rheumatoid arthritis patients, in rat models of inflammatory bowel disease, and in dog experiments on chronic heart failure; vagus nerve activity mediates pancreatitis in mice; and vagus nerve stimulation attenuates the inflammatory response (cytokine release and shock) to the bacterial poison endotoxin.

Here is a detailed roadmap from this Nature article about the research that would need to be done to make bioelectronic medicine a reality. 

We'd need much more detailed maps of where exactly nerves innervate various organs and which neurotransmitters they use; we'd need to record patterns of neural activity to detect which nerve signals modulate which diseases and experimentally determine causal relationships between neural signals and organ functions; we'd need to build small electronic interfaces (cuffs and chips) for use on peripheral nerves; we'd need lots of improvements in small-scale and non-invasive sensor technology (optogenetics, neural dust, ultrasound and electromagnetic imaging); and we'd need better tools for real-time, quantitative measurements of hormone and neurotransmitter release from nerves and organs.

A lot of this seems to clearly need hardware and software engineers, and signal-processing/image-processing/machine-learning people, in addition to traditional biologists and doctors. In the general case, neural modulation of organ function is Big Science in the way brain mapping or genomics is. You need to know where the nerves are, and what they're doing, in real time.  This is likely going to need specialized software which outpaces what labs are currently capable of.

Google is already on this; they recently announced a partnership with GlaxoSmithKline called Galvani Bioelectronics and they seem to be hiring.  

Theodore Berger, the scientist who created the first neural memory implant, has cofounded a company, Kernel, to develop neural prostheses for cognitive function.

Bioelectronics seems potentially important not just for disease treatment today, but for more speculative goals like brain uploads or intelligence enhancement.  It's a locally useful step along the path of understanding what the brain is actually doing, at a finer-grained level than the connectome alone can indicate, which may very well be relevant to AI.

It's tricky for non-academic software people (like myself and many LessWrong readers) to get involved in biomedical technology, but I predict that this is going to be one of the opportunities that needs us most, and if you're interested, it's worth watching this space to see when it gets out of the stage of university labs and DARPA projects and into commercialization.

Rationality Considered Harmful (In Politics)

9 The_Jaded_One 08 January 2017 10:36AM

Why you should be very careful about trying to openly seek truth in any political discussion


1. Rationality considered harmful for Scott Aaronson in the great gender debate

In 2015, complexity theorist and rationalist Scott Aaronson was foolhardy enough to step into the Gender Politics war on his blog, with a comment stating that the extreme feminism he had bought into made him hate himself and seek ways to chemically castrate himself. The feminist blogosphere got hold of this and crucified him for it, and he has written a few followup blog posts about it. Recently I saw this comment by him on his blog:

As the comment 171 affair blew up last year, one of my female colleagues in quantum computing remarked to me that the real issue had nothing to do with gender politics; it was really just about the commitment to truth regardless of the social costs—a quality that many of the people attacking me (who were overwhelmingly from outside the hard sciences) had perhaps never encountered before in their lives. That remark cheered me more than anything else at the time

 

2. Rationality considered harmful for Sam Harris in the islamophobia war

I recently heard a very angry, exasperated 2-hour podcast by the new atheist and political commentator Sam Harris about how badly he has been straw-manned, misrepresented and trash-talked by his intellectual rivals (whom he collectively refers to as the "regressive left"). Sam Harris likes to tackle hard questions such as when torture is justified, which religions are more or less harmful than others, defence of freedom of speech, etc. Several times, Harris goes to the meta-level and sees clearly what is happening:

Rather than a searching and beautiful exercise in human reason to have conversations on these topics [ethics of torture, military intervention, Islam, etc], people are making it just politically so toxic, reputationally so toxic to even raise these issues that smart people, smarter than me, are smart enough not to go near these topics

Everyone on the left at the moment seems to be a mind reader.. no matter how much you try to take their foot out of your mouth, the mere effort itself is going to be counted against you - you're someone who's in denial, or you don't even understand how racist you are, etc

 

3. Rationality considered harmful when talking to your left-wing friends about genetic modification

In the SlateStarCodex comments, I posted a complaint that many left-wing people were responding very personally (and negatively) to my political views.

One long-term friend openly and pointedly asked whether we should still be friends over the subject of eugenics and genetic engineering, for example altering the human germ-line via genetic engineering to permanently cure a genetic disease. When presented with a rational argument for why some modifications of the human germ line may in fact be a good thing, this friend responded that "(s)he was beginning to wonder whether we should still be friends".

A large comment thread ensued, but the best comment I got was this one:

One of the useful things I have found when confused by something my brain does is to ask what it is *for*. For example: I get angry, the anger is counterproductive, but recognizing that doesn’t make it go away. What is anger *for*? Maybe it is to cause me to plausibly signal violence by making my body ready for violence or some such.

Similarly, when I ask myself what moral/political discourse among friends is *for* I get back something like “signal what sort of ally you would be/broadcast what sort of people you want to ally with.” This makes disagreements more sensible. They are trying to signal things about distribution of resources, I am trying to signal things about truth value, others are trying to signal things about what the tribe should hold sacred etc. Feeling strong emotions is just a way of signaling strong precommitments to these positions (i.e. I will follow the morality I am signaling now because I will be wracked by guilt if I do not. I am a reliable/predictable ally.) They aren’t mad at your positions. They are mad that you are signaling that you would defect when push came to shove about things they think are important.

Let me repeat that last one: moral/political discourse among friends is for “signalling what sort of ally you would be / broadcasting what sort of people you want to ally with”. Moral/political discourse probably activates specially evolved brainware in human beings; that brainware has a purpose, and it isn't truthseeking. Politics is not about policy.

 

4. Takeaways

This post is already getting too long so I deleted the section on lessons to be learned, but if there is interest I'll do a followup. Let me know what you think in the comments!

Prediction-based medicine (PBM)

9 ChristianKl 29 December 2016 10:49PM

We need a new paradigm for doing medicine. I make the case by first speaking about the problems of our current paradigm of evidence-based medicine.


The status quo of evidence-based medicine


While biology moves forward and the cost of genetic sequencing has dropped a lot faster than Moore's law would predict, the opposite is true for the development of new drugs: under the current status quo, the cost of developing a new drug rises exponentially, following Eroom's law. And while average lifespan increased greatly over the last century, in Canada the average life span at age 90 increased only 1.9 years over that century. In 2008 the Centers for Disease Control and Prevention reported that life expectancy in the US declined from 77.9 to 77.8 years. According to World Bank data, Germany increased average lifespan by two years over the last decade, which is not enough for the dream of radical lifespan increases in our lifetime.


When it costs $80 million to test whether an intervention works, and most attempts show that the intervention doesn't work, we have a problem. We end up paying billions for every new intervention.


Eric Ries wrote "The Lean Startup". In it he argues that it's the job of a startup to produce validated learning. He proposes that companies that work with small batch sizes can produce more innovation because they can learn faster how to build good products. The existing process in medicine doesn't allow for small batch innovation because the measuring stick for whether an intervention works is too expensive.


In addition, the evidence-based approach rests on the assumption that we don't build bespoke interventions for every client. If a treatment doesn't generalize across multiple patients, it's not possible to test it with a trial. In principle, a double-blind trial can't give you evidence that a bespoke intervention targeting the specific DNA profile of a patient and his co-morbidities works.


The ideal of prediction-based medicine


The evidence-based approach also assumes that practitioners are exchangeable. It doesn't model the fact that different physical therapists or psychologists have different skill levels. It doesn't provide a mechanism to reward highly skilled practitioners; instead it treats every practitioner who uses the same treatment intervention the same way.


Its strong focus on asking whether a treatment beats a placebo in double-blind studies makes it hard to compare different treatments against each other. In the absence of an ability to predict the effect sizes of different drugs from the literature, the treatment that wins in the market is often the treatment that's best promoted by a pharmaceutical company.

How could a different system work? What's the alternative to making treatment decisions based on big and expensive studies that provide evidence?


I propose that a treatment provider should give the patient the credence that the provider estimates for the treatment outcomes that are of interest to the client.


If Bob wants to stop smoking and asks doctor Alice whether the treatment Alice provides will result in Bob not smoking in a year, Alice should provide him with her credence estimate. In addition, Alice’s credence estimates can be entered into a central database. This allows Bob to see Alice’s Brier score, which reflects Alice's ability to predict the effects of her treatment recommendations.
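To make the scoring concrete, here is a minimal sketch in Python of how such a database could compute Alice's Brier score; the stored records are assumed to be (stated credence, observed outcome) pairs, and the example numbers are invented:

    # Minimal sketch: Brier score over a provider's stored predictions.
    # Each record pairs the stated credence that the outcome occurs with
    # whether it actually occurred. The example records are invented.
    def brier_score(records):
        return sum((p - float(happened)) ** 2 for p, happened in records) / len(records)

    alice_records = [(0.8, True), (0.6, False), (0.9, True), (0.3, False), (0.7, True)]
    print(round(brier_score(alice_records), 3))   # 0.118; 0.0 is perfect

A lower score is better; always saying 50% yields 0.25, so a provider who consistently beats that is demonstrating real predictive skill.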


In this framework Alice’s expertise isn't backed up by having gotten an academic degree and recommending interventions that are studied with expensive gold-standard studies. Her expertise is backed by her track record.


This means that Alice can charge money based on the quality of her skills. If Alice is extremely good she can make a lot of money with her intervention without having to pay billions for running trials.


Why don't we pay doctors in the present system based on their skills? We can't measure their skills in the present paradigm, because we can't easily compare the outcomes of different doctors. Hard patients get sent to doctors with good reputations, and as a result every doctor has an excuse for bad outcomes: in the status quo he can simply assert that his patients were hard.


In prediction-based medicine a doctor can write down a higher credence for a positive treatment outcome for an easy patient than for a hard patient. Patients can ask multiple doctors and are given good data to choose the treatment that provides the best outcome for which they are willing to pay.


In addition to giving the patient a more informed choice among different treatment options, this process helps treatment providers increase their skills: they learn where they make errors in estimating treatment outcomes.


The provider can also innovate new treatments in small batches. Whenever he understands a treatment well enough to make predictions about its outcomes he's in business. He can easily iterate on his treatment and improve it.


The way to bring prediction-based medicine into reality


I don't propose to get rid of evidence-based medicine. It has its place and I don't have any problem with it for the cases where it works well.


It works quite poorly for bodywork interventions and psychological interventions that are highly skill-based. I have seen hypnosis achieve great effects, but at the same time there are also many hypnotists who don't achieve great effects. In the status quo a patient who seeks hypnosis treatment has no effective way to judge the quality of the treatment before buying.


A minimal viable product might be a website that's Uber for body workers and hypnotists. The website lists the treatment providers. The patient can enter his issue and every treatment provider can offer his credence of solving the issue of the patient and the price of his treatment.


Before getting shown the treatment providers, a prospective patient would take a standardized test to diagnose the illness. The information from the standardized test will allow the treatment providers to make better predictions about the likelihood that they can cure the patient. Other standardized tests that aren’t disease-specific, like the OCEAN personality index, can also be provided to the patient.


Following the ideas of David Burns's TEAM framework, the treatment provider can also tell the patient to take tests between treatment sessions to keep better track of the patient's progress.


When making the purchasing decision, the patient agrees to a contract under which he pays a fine if he doesn’t report the treatment outcome after 3 months, 6 months and 1 year. This produces a comprehensive database of claims that allows us to measure how well the treatment providers are calibrated.

Various Quantified Self gadgets can be used to gather data. Many countries have centralized electronic health records that could be linked to a user account.


The startup has a clear business model: it can take a cut of every transaction. It has strong network effects, and it's hard for a treatment provider to switch because his entire prediction track record is hosted on the website.

 

Thanks to various people from the Berlin Lesswrong crowd who gave valuable feedback for the draft of this article.

Exploration-Exploitation problems

9 Elo 28 December 2016 01:33AM

Original post: http://bearlamp.com.au/exploration-exploitation-problems/

I have been working on the assumption that knowledge of exploration-exploitation trade-offs was just common.  Unfortunately I did the smart thing and learned about them from a mathematician at a dojo in Melbourne, which means that no, not everyone knows about them.  I discovered that again today when I searched for a good quick explanation of the puzzle.  With that in mind, this post is about Exploration-Exploitation.


The classic Exploration-Exploitation problem in mathematics is the multi-armed bandit, which is a slang term for a bank of slot machines.  The player knows that each machine has a different payoff rate, and has a limited number of attempts before running out of money.  You want to balance trying out new machines with unknown payoffs against exploiting the knowledge you already have from the machines you tried earlier.

When you first start on new bandits, you really don't know which will pay out and at what rates.  So some exploration is necessary to know what your reward ratio in the territory will be.  As your knowledge grows, you get to know which bandits are likely to pay, and which are not, and this later informs your choices as to where to place your dollars.

Mathematicians love a well-specified problem like this because it allows us to build algorithmic models that return or guarantee rewards under certain circumstances.  (See also the secretary problem, which is similar, and where I showed how it applied to real-life dating.)

Some of the mathematical solutions to this problem look like:

Epsilon-greedy strategy: The best lever is selected for a proportion 1-ε of the trials, and a lever is selected at random (with uniform probability) for a proportion ε. A typical parameter value might be ε = 0.1, but this can vary widely depending on circumstances.

Epsilon-decreasing strategy: Similar to the epsilon-greedy strategy, except that the value of ε decreases as the experiment progresses, resulting in highly exploratory behaviour at the start and highly exploitative behaviour at the finish.
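To make the two strategies concrete, here is a minimal Python simulation sketch of epsilon-greedy (with an optional decreasing epsilon) on a three-armed bandit; the payout rates and parameter values are invented:

    # Minimal epsilon-greedy sketch on a 3-armed bandit with made-up payout rates.
    import random

    true_rates = [0.2, 0.5, 0.7]            # unknown to the player
    counts = [0, 0, 0]
    estimates = [0.0, 0.0, 0.0]

    def choose_arm(t, total_pulls, decreasing=False):
        eps = max(0.01, 1.0 - t / total_pulls) if decreasing else 0.1
        if random.random() < eps:                                       # explore
            return random.randrange(len(true_rates))
        return max(range(len(true_rates)), key=lambda a: estimates[a])  # exploit

    N = 1000
    total_reward = 0.0
    for t in range(N):
        arm = choose_arm(t, N)
        reward = 1.0 if random.random() < true_rates[arm] else 0.0
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]       # running mean
        total_reward += reward

    print(counts, total_reward / N)   # most pulls should end up on the 0.7 arm

Passing decreasing=True gives the epsilon-decreasing behaviour described above: heavy exploration early, near-pure exploitation at the end.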

Of course there are more strategies, and the context and nature of the problem matters.  If the machines suddenly one day in the future all change, you might have a strategy that would prepare for potential scenarios like that.  As you start shifting away from the hypothetical and towards real life your models need to increase complexity to cater to the details of the real world.

If this problem is more like real life (where we live and breathe), the possible variability of reality starts coming into play more and more.  In talking about this, I want to emphasise not the problem itself, but the solution of <sometimes explore> and <sometimes exploit> in specific ratios or for specific reasons.  The mathematical solutions to the multi-armed bandit problem are designed to balance not knowing enough against taking advantage of what you already know.

What supercharges this solution and how it can be applied to real life is value of information.

Value of Information (VoI) says that, in relation to making a decision, whatever informs that decision is worth something.  For expensive, risky, dangerous, highly lucrative, or particularly uncertain decisions, it's important to think about becoming more sure before you decide.

VoI suggests that any decision that is worth money (or worth something) can have information that informs it.  The value of that information can be worth up to the full value of the reward for making the decision correctly.  Of course, if you spend all the potential gains from the decision on getting perfect information, you lose the chance to make a profit.  However, usually a piece of information that is cheap (relative to the decision) exists that will inform the decision and help.
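As a toy worked example (every number here is invented): suppose a decision pays $1000 if you get it right and nothing otherwise, and without extra information you would guess right half the time.

    # Toy value-of-information calculation; every number here is invented.
    p_right_without_info = 0.5
    p_right_with_info = 0.9          # the information is imperfect, just better
    payoff_if_right = 1000.0

    ev_without = p_right_without_info * payoff_if_right   # 500.0
    ev_with = p_right_with_info * payoff_if_right         # 900.0
    print(ev_with - ev_without)                           # 400.0

Paying anything under $400 for that information is worthwhile; paying more than that eats the gains, which is the point about not spending everything on perfect information.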

How does this apply to exploration-exploitation?

The idea of VoI is well covered in the book How to Measure Anything.  While the book goes into detail and is excellent for applying to big decisions, the ideas can also be applied to our simple everyday problems.  With this in mind I propose a heuristic:

You want to explore just enough to increase your information about both the quality (and possible results) of the remaining exploration and the expected returns on your existing knowledge.


The next thing to supercharge our exploration-exploitation and VoI knowledge is Diminishing returns.

Diminishing returns on VoI: when you start out not knowing anything at all, adding a little bit of knowledge goes a long way.  As you keep adding more and more information, the return on each extra piece of knowledge diminishes.

Worked example:  Knowing the colour of the sky.

So you are blind and no one has ever told you what colour the sky is.  You can't really be sure what colour the sky is but generally if you ask enough people the consensus should be a good enough way to conclude the answer.

So one guy gave you your first inkling of what the answer is.  But can you really trust him?

Yea cool.  Ten people.  Probably getting sure of yourself now.

Really, what good are two thousand people after the first fifty?  Especially if they all agree.  There's got to be less value in the 2001st person telling you than there was in the 3rd person telling you.
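To put rough numbers on that intuition, here is a toy Bayesian sketch using a Beta(1, 1) prior, under the (unrealistic) assumption that every informant is independent and equally reliable:

    # Toy illustration of diminishing returns: posterior mean under a Beta(1, 1)
    # prior after n informants all report "blue". Assumes independent, equally
    # reliable informants, which real people are not.
    def posterior_mean(n_agree, n_disagree=0, prior_a=1, prior_b=1):
        return (prior_a + n_agree) / (prior_a + prior_b + n_agree + n_disagree)

    for n in (1, 3, 10, 50, 2000):
        print(n, round(posterior_mean(n), 4))
    # 1 -> 0.6667, 3 -> 0.8, 10 -> 0.9167, 50 -> 0.9808, 2000 -> 0.9995

The jump from 1 to 3 informants moves the posterior a lot; the jump from 50 to 2000 barely moves it at all.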


Going back to VoI, how valuable was the knowledge that the sky is blue?  Probably not very valuable, and this isn't a great way to gather knowledge in the long run.

The great flaw with this example is that if I asked you the question "What colour is the sky?", you could probably already hazard a confident guess.  If you are a well-calibrated human, you already know a little bit about everything, and the good news is that calibration is trainable.

With that in mind, if you want to play a calibration game there are plenty available on Google.

The great thing about calibration is that it seems to apply across all of your life, and all the things that you estimate; that is, once you are calibrated, you are calibrated across domains.  This means that if you become good at it in one area, you become better at it in other areas.  We're not quite talking about hitting the bullseye every time, but we are talking about being confident that the bullseye is over there in that direction, which is essentially the ability to predict the future within a reasonable set of likelihoods.


Once you are calibrated, you can use that calibration, together with VoI and diminishing returns, to supercharge your exploration-exploitation.  But we're not done.  What if we add in Bayesian statistics?  What if we can shape our predicted future and gradually update our beliefs based on tiny snippets of data that we gather over time, and purposefully, by thinking about VoI and the diminishing returns of information?

I don't want to cover Bayes because people far smarter than me have covered it very well.  If you are interested in learning Bayes I would suggest heading to Arbital for their excellent guides.

But we're not done at Bayes.  This all comes down to the idea of trade-offs.  Exploration vs exploitation is a trade-off of {time/energy} vs expected reward.


A classic example of a trade-off is the story of sharpening the saw (from the book The 7 Habits of Highly Effective People):

A woodcutter strained to saw down a tree.  A young man who was watching asked “What are you doing?”

“Are you blind?” the woodcutter replied. “I’m cutting down this tree.”

The young man was unabashed. “You look exhausted! Take a break. Sharpen your saw.”

The woodcutter explained to the young man that he had been sawing for hours and did not have time to take a break.

The young man pushed back… “If you sharpen the saw, you would cut down the tree much faster.”

The woodcutter said “I don’t have time to sharpen the saw. Don’t you see I’m too busy?”

The thing about life and trade offs is that all of life is trade-offs between things you want to do and other things you want to do.

Exploration and exploitation is a trade off between the value of what you know and the value of what you might know if you find out.


Try this:

  • Make a list of all the things you have done over the last 7 days.  (Use your diary and rough time chunking)
  • Sort them into exploration activities and exploitation activities.
    Answer this:
  • Am I exploring enough? (on a scale of 1-10)
  • Am I exploiting enough? (on a scale of 1-10)
  • Have I turned down any exploring opportunities recently?
  • Have I turned down any exploitation opportunities recently?
  • How could I expand any exploring I am already doing?
  • How could I expand any exploiting I am already doing?
  • How could I do more exploring?  How could I do less exploring?
  • How could I do more exploiting?  How could I do less exploiting?

There are two really important things to take away from the Exploration-Exploitation dichotomy:

  1. You probably make the most measurable and ongoing gains in the Exploitation phase.  I mean, let's face it: these are long-running, goal-seeking behaviours like sticking to an exercise routine.
  2. The exploration might seem more fun (finding exciting new hobbies), but are you sure that's what you want to be doing, in regard to point 1?

Meta: This is part 1 of a 4 part series.

This took in the order of 10-15 hours to finish because I was doing silly things like trying to fit 4 posts into 1 and stumbling over myself.

The challenge of writing Utopia

9 Stuart_Armstrong 24 December 2016 05:35PM

The story itself has been posted here.

Tomorrow, to celebrate a certain well-known event, I'll be posting another story of a Utopia. Unlike the previous attempt, this is utopia on hard mode.

What does that mean? Well, utopias are pretty hard to write anyway. Writing needs challenges for the characters, and that's trivially easy in a dystopia (everything is a challenge), a fake utopia (the challenge is to look beneath the facade, and fight the secret enemy), or even an imperfect utopia (the challenge is to solve the remaining problems). Iain M. Banks's Culture illustrates another way you can write about utopias and keep them interesting: by having an external foe as a challenge.

I avoided all those tricks. The challenge then was to write about a genuine utopia, one that people would enjoy living in, without any hidden flaws or enemies, internal or external. And these had to be real people doing things they wanted to do, rather than idealised people doing things they should do. Basically a real utopia has to contain internet trolls and various fanatics, and still be a great place for everyone.

The setting is a future Earth that is full-fledged techno-utopia, full of powerful artificial intelligences (with human-friendly goals, of course), uploads (human minds run on computers), massive technological developments, and the beginning of universal space colonisation.

In one sense, this made the story easier to write - nobody argues over the last leg of lamb needed to prevent starvation. In another sense, it made it much harder. Any human could desire to purge themselves of sinful thoughts, upgrade themselves to superintelligence, or copy themselves ten trillion times. And the AIs could perfectly grant them their wish - but should they? If so, do they let arbitrarily bad consequences happen? And if not, how do they go about forbidding things in a utopia? And what happens to disputes between humans - like when one person wants to join a group and the members of the group don't want to let them in? Can you prevent social nastiness - but then what about those people who want to be nasty?

You can read the story to see how well or badly I've answered these challenges. The Utopia was inspired a lot by Eliezer's fun sequence, Scott Alexander's Archipelago, and LARP. The general principles are that there has to be a functioning society behind everything, that people can become whatever they want to be (eventually, and after a lot of challenges, if need be), and that the good aspects of everything must be preserved, if possible.

To explain that last point: it's clear that tolerant liberal democracies are better places than repressive theocracies. But repressive theocracies will probably have certain positive aspects lacking in democracies (maybe a sense of place? an enjoyable resignation to fate or government?). The challenge is to take that positive aspect, fill it out, and make it available without the rest of the baggage. Similarly, the quote "death brings meaning to life" is nonsense, but there's something in that idea-space - something about contemplating the brevity of existence, and the perspective it gives - that is worth preserving. For some people or most people (or groups), if not necessarily for all people in all groups. Similarly, good outcomes often have bad aspects. So the engineering challenge is to separate the good aspects of all experiences from the bad, gaining the wisdom or experience without the intolerable pain and anxiety.

Since I tried to cram the maximum of ideas in, the story suffers from a certain degree of "tell, not show". Now, this is very much in the tradition of utopias (it's "Plato's Republic", not "Exciting Adventures in Plato's Republic (XXX-rated!!!!)"), but it is a narrative, and hopefully it's clear there's the potential for more - for much more.

In any case, I hope it works, and gives people something to aim for.

 

Thanks to all those, too numerous to mention, who have helped directly or indirectly with this. Have a great Holiday Festival!

[Link] Announcement: Intelligence in Literature Prize

8 Vaniver 04 January 2017 08:07PM

Throw a prediction party with your EA/rationality group

8 eukaryote 31 December 2016 11:02PM

TL;DR: Prediction & calibration parties are an exciting way for your EA/rationality/LessWrong group to practice rationality skills and celebrate the new year.

On December 30th, Seattle Rationality had a prediction party. Around 15 people showed up, brought snacks, brewed coffee, and spent several hours making predictions for 2017, and generating confidence levels for those predictions.

This was heavily inspired by Scott Alexander’s yearly predictions. (2014 results, 2015 results, 2016 predictions.) Our move was to turn this into a communal activity, with a few alterations to meet our needs and make it work better in a group.

Procedure:

  • Each person individually writes a bunch of predictions for the upcoming year. They can be about global events, people’s personal lives, etc.
    • If you use Scott Alexander’s system, create 5+ predictions each for fixed confidence levels (50%, 60%, 70%, 80%, 90%, 95%, etc.)
    • If you want to generate Brier scores or logarithmic scores, just do 30+ predictions at whatever confidence levels you believe.
  • Write down confidence levels for each prediction.
  • Save your predictions and put it aside for 12 months.
  • Open up your predictions and see how everyone did.

To make this work in a group, we recommend the following:

  • Don’t share your confidence intervals. Avoid anchoring by just not naming how likely or unlikely you think any prediction is.
  • Do share predictions. Generating 30+ predictions is difficult, and sharing ideas (without confidence levels) makes it way easier to come up with a bunch. We made a shared google doc, and everyone pasted some of their predictions into it.
  • Make predictions that, in a year, will verifiably have happened or not. (IE, not “the academic year will go well”, which is debatable, but “I will finish the year with a 3.5 GPA or above”.)
  • It’s convenient to assume that, unless stated otherwise, predictions resolve by the end of the next year (IE, "I will go to the Bay Area" means "I will go to the Bay Area at least once in 2017"). It’s also fine to make predictions that have other end dates (“I will go to EA Global this summer.”)
  • Make a bunch of predictions first without thinking too hard about how likely they are, then assign confidence levels. This post details why. You could also generate a group list of predictions, and everyone individually lists their own confidence levels.


This makes a good activity for rationality/EA groups for the following reasons:

  • Practicing rationality skills:
    • Making accurate predictions
    • Using confidence intervals
  • Accessibility
    • It’s open to many different knowledge levels. Even if you don’t know a thing about geopolitics, you can still give predictions and confidence intervals about media, sports, or your own life.
    • More free-form and less intimidating than using a prediction market. You do not have to know about the details of forecasting to try this.
  • Natural time and recurring activity
    • You could do this at any point during the year, but doing it at the start of the year seems appropriate for ringing in the new year.
    • In twelve months, you have an automatic new activity, which is coming back together and checking everybody’s predictions from last year. Then you make a new set of predictions for next year. (If this falls through for some reason, everyone can, of course, still check their predictions on their own.)
  • Fostering a friendly sense of competitiveness
    • Everyone wants to have the best calibration, or the lowest Brier score. Everyone wants to have the most accurate predictions!


Some examples of the predictions people used:

  • Any open challenges from the Good Judgment Project.
  • I will switch jobs.
  • I will make more than $1000 money in a way that is different from my primary job or stock.
  • I will exercise 3 or more times per week in October, November, December.
  • I’ll get another tattoo.
  • Gay marriage will continue to be legal in Washington state.
  • Gay marriage will continue to be legal in all 50 states.
  • I will try Focusing at least once.
  • I will go to another continent.
  • CRISPR clinical trials will happen on humans in the US.
  • A country that didn’t previously have nuclear weapons will acquire them.
  • I will read Thinking Fast and Slow.
  • I will go on at least 3 dates.

Also relevant:

  • 16 types of useful predictions
  • Brier values and graphs of ‘perfect’ vs. actual scores will give you different information. Yvain writes about the differences between these. Several of us did predictions last year using the Scott Alexander method (bins at fixed probabilities), although this year, everybody seems to have used continuous probabilities. The exact method by which we’ll determine how well-calibrated we were will be left to Seattle Rationality of 2018, but will probably include Brier values AND something to determine calibration.
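If your group wants both numbers at the end of the year, a minimal sketch (with invented example data) might look like this:

    # Minimal sketch: score a year of predictions stored as
    # (stated confidence that it happens, whether it happened). Data is invented.
    from collections import defaultdict

    predictions = [(0.6, True), (0.6, False), (0.8, True), (0.8, True),
                   (0.9, True), (0.9, False), (0.5, True), (0.5, False)]

    brier = sum((p - float(o)) ** 2 for p, o in predictions) / len(predictions)
    print(f"Brier score: {brier:.3f}")   # lower is better; 0.25 = always saying 50%

    buckets = defaultdict(list)
    for p, o in predictions:
        buckets[p].append(o)
    for level in sorted(buckets):
        outcomes = buckets[level]
        print(f"said {level:.0%}: happened {sum(outcomes)}/{len(outcomes)} of the time")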

Meetup Discussion

7 luminosity 08 January 2017 05:14AM

One of the most valuable services the Less Wrong community has to offer is the meetup groups. However, it strikes me that there isn't a lot of knowledge sharing between different meetup groups. Presumably there's a lot that the different groups could learn from each other -- things that can be done, experiments that have or haven't worked out, procedural and organisational tips. Hence this post. Please go ahead and write a summary of your local Less Wrong meetup below:

  • What meetups do you run?
  • What's worked?
  • What hasn't?
  • How is the group organised?

 

Wikipedia usage survey results

7 VipulNaik 25 December 2016 01:55PM

Contents

Summary

In 2016, Issa Rice and I conducted several surveys of Wikipedia usage. We collected survey responses from Slate Star Codex readers, Vipul’s Facebook friends, and a few United States audiences through SurveyMonkey Audience and Google Surveys (known at the time as Google Consumer Surveys). Our survey questions measured how heavily people use Wikipedia, what sort of pages they read or expected to find, the relation between their search habits and Wikipedia, and other actions they took within Wikipedia.

The surveys are part of our work to understand the impact of contributing to Wikipedia. Both of us regularly contribute to the site, and we are also getting more people to work on editing and adding content to Wikipedia. Therefore we want to understand how people use Wikipedia, how much they use Wikipedia, what types of people tend to use Wikipedia, and so on, so that we can direct efforts more strategically.

Our three main takeaways:

  • Wikipedia consumption is heavily skewed toward a profile of “elite” people, and these people use the site in qualitatively different ways. (More)

  • As a result, we’ve revised upward our estimate of the impact per pageview, and revised downward our estimate of the broad appeal and reach of Wikipedia. (More)

  • The gap between elite samples of Wikipedia users and general United States Internet users is significantly greater than the gap between the different demographics within the United States that we measured.
    It is comparable to the gap between United States Internet users and Internet users in low-income countries. (More)

This post goes over the survey questions, the responses of participants, and other survey data (specifically, data from the New Readers surveys by the Wikimedia Foundation) and then explains the takeaways.

Surveys

First SurveyMonkey survey (S1)

At the end of May 2016, Issa Rice and I created a Wikipedia usage survey on SurveyMonkey to gauge the usage habits of Wikipedia readers and editors.

Audiences for S1

SurveyMonkey allows the use of different “collectors” (i.e. survey URLs that keep results separate), so we circulated several different URLs among four locations to see how different audiences would respond.

The audiences were as follows:

  • SurveyMonkey’s United States audience with no demographic filters (62 responses, 54 of which are full responses). We will refer to this audience as SM, or S1SM if needed to avoid ambiguity.

    Acquisition cost: $100 ($2 per response for 50 responses, 4 extra responses given gratis)

  • My Facebook timeline (post asking people to take the survey; 70 responses, 69 of which are full responses). For background on the timeline audience, see my page on how I use Facebook. We will refer to this audience as “Vipul’s Facebook friends” or V for short.

    Acquisition cost: None

  • The Wikipedia Analytics mailing list (email linking to the survey; 7 responses, 6 of which are full responses). Note that due to the small size of this group, the results below should not be trusted, unless possibly when the votes are decisive. We will refer to this audience as AM.

    Acquisition cost: None

  • Slate Star Codex (post that links to the survey; 618 responses, 596 of which are full responses). We will refer to this audience as SSC.

    While Slate Star Codex isn’t the same as LessWrong, we think there is significant overlap in the two sites’ audiences (see e.g. the recent LessWrong diaspora survey results).

    Acquisition cost: None

  • In addition, although not an actual audience with a separate URL, several of the tables we present below will include an “H” group; this is the heavy users group of people who responded by saying they read 26 or more articles per week on Wikipedia. This group has 179 people: 164 from Slate Star Codex, 11 from Vipul’s timeline, and 4 from the Analytics mailing list.

We ran the survey from May 30 to July 9, 2016 (although only the Slate Star Codex survey had a response past June 1).

Questions for S1

For reference, here are the survey questions for the first survey. A dummy/mock-up version of the survey can be found here: https://www.surveymonkey.com/r/PDTTBM8.

The survey introduction said the following:

This survey is intended to gauge Wikipedia use habits. This survey has 3 pages with 5 questions total (3 on the first page, 1 on the second page, 1 on the third page). Please try your best to answer all of the questions, and make a guess if you’re not sure.

And the actual questions:

  1. How many distinct Wikipedia pages do you read per week on average?

    • less than 1
    • 1 to 10
    • 11 to 25
    • 26 or more
  2. On a search engine (e.g. Google) results page, do you explicitly seek Wikipedia pages, or do you passively click on Wikipedia pages only if they show up at the top of the results?

    • I explicitly seek Wikipedia pages
    • I have a slight preference for Wikipedia pages
    • I just click on what is at the top of the results
  3. Do you usually read a particular section of a page or the whole article? (Multiple options can be selected)

    • Particular section
    • Whole page
  4. How often do you do the following? (Choices: Several times per week, About once per week, About once per month, About once per several months, Never/almost never.)

    • Use the search functionality on Wikipedia
    • Be surprised that there is no Wikipedia page on a topic
  5. For what fraction of pages you read do you do the following? (Choices: For every page, For most pages, For some pages, For very few pages, Never. These were displayed in a random order for each respondent, but displayed in alphabetical order here.)

    • Check (click or hover over) at least one citation to see where the information comes from on a page you are reading
    • Check how many pageviews a page is getting (on an external site or through the Pageview API)
    • Click through/look for at least one cited source to verify the information on a page you are reading
    • Edit a page you are reading because of grammatical/typographical errors on the page
    • Edit a page you are reading to add new information
    • Look at the “See also” section for additional articles to read
    • Look at the editing history of a page you are reading
    • Look at the editing history solely to see if a particular user wrote the page
    • Look at the talk page of a page you are reading
    • Read a page mostly for the “Criticisms” or “Reception” (or similar) section, to understand different views on the subject
    • Share the page with a friend/acquaintance/coworker

For the SurveyMonkey audience, there were also some demographic questions (age, gender, household income, US region, and device type). These questions were not filled by respondents at the time of the survey, but rather, are filled in by respondents in order to be able to participate in these surveys. You can learn more on the SurveyMonkey Contribute page.

Second SurveyMonkey survey (S2)

After we looked at the survey responses on the first day, Issa and I decided to create a second survey to focus on the parts from the first survey that interested us the most.

Audiences for S2

The second survey was only circulated among SurveyMonkey’s audiences:

  • SurveyMonkey’s US audience with no demographic filters (54 responses).

    Acquisition cost: $50 ($1 per response for 50 responses, 4 extra responses given gratis)

  • SurveyMonkey’s US audience with the following filters: ages 18–29 with a college or graduate degree (50 responses).

    Acquisition cost: $125 ($2.50 per response for 50 responses)

We first ran the survey on the unfiltered audience again because the wording of our first question had changed and we wanted to have the new baseline. We then chose to filter for young college-educated people because our prediction was that more educated people would be more likely to read Wikipedia. The SurveyMonkey demographic data does not include education, and we hadn’t seen the Pew Internet Research surveys in the next section, so we were relying on our intuition and some demographic data from past surveys for the “college-educated” part. Our selection of the age group for survey 2 was based on the fact that young people in our first survey gave more informative free-form responses (SurveyMonkey’s demographic data does include age).

Questions for S2

For reference, here are the survey questions for the second survey. A dummy/mock-up version of the survey can be found here: https://www.surveymonkey.com/r/28BW78V.

The survey introduction said the following:

This survey is intended to gauge Wikipedia use habits. Please try your best to answer all of the questions, and make a guess if you’re not sure.

This survey has 4 questions across 3 pages.

In this survey, “Wikipedia page” refers to a Wikipedia page in any language (not just the English Wikipedia).

And the actual questions:

  1. How many distinct Wikipedia pages do you read (at least one sentence of) per week on average?

    • Fewer than 1
    • 1 to 10
    • 11 to 25
    • 26 or more
  2. Which of these articles have you read (at least one sentence of) on Wikipedia (select all that apply)? (These were displayed in a random order except the last option for each respondent, but displayed in alphabetical order except the last option here.)

    • Adele
    • Barack Obama
    • Bernie Sanders
    • China
    • Donald Trump
    • Google
    • Hillary Clinton
    • India
    • Japan
    • Justin Bieber
    • Justin Trudeau
    • Katy Perry
    • Taylor Swift
    • The Beatles
    • United States
    • World War II
    • None of the above
  3. What are some of the Wikipedia articles you have most recently read (at least one sentence of)? Feel free to consult your browser’s history.

  4. Recall a time when you were surprised that a topic did not have a Wikipedia page. What were some of these topics?

As with the SurveyMonkey Audience responses for S1, the responses for S2 also came with demographic information that the respondents had previously filled in.

Google Surveys survey (GS)

We ran a third survey on Google Surveys (known at the time as Google Consumer Surveys) with a single question that was a word-to-word replica of the first question from the second survey. The main motivation here was that on Google Surveys, a single-question survey costs only 10 cents per response, so it was possible to get to a large number of responses at relatively low cost, and achieve more confidence in the tentative conclusions we had drawn from the SurveyMonkey surveys.

Audiences for GS

We bought 500 responses at 10 cents per response, for a total acquisition cost of $50. The responses were from a general United States audience.

GS uses a “surveywall” methodology to collect survey responses: the survey questions are shown to people who want to access a piece of content (article or video) and they need to answer the question to access it.

Overall, Google Surveys in the United States is reasonably close to representative of the voting US population and the Internet-using population. It also had the largest sample size of our surveys. Therefore, among the surveys we did, this survey comes closest to approximating the behavior of the Internet-using population in the United States.

You can learn more at the Wikipedia page for Google Surveys.

Questions for GS

This survey had exactly one question. The wording of the question was exactly the same as that of the first question of the second survey.

  1. How many distinct Wikipedia pages do you read (at least one sentence of) per week on average?

    • Fewer than 1
    • 1 to 10
    • 11 to 25
    • 26 or more

One slight difference was that whereas in the second SurveyMonkey survey, the order of the options was fixed, the Google Surveys survey did a 50/50 split between that order and the exact reverse order. Such splitting is a best practice to deal with any order-related biases, while still preserving the logical order of the options.

You can read more on the questionnaire design page of the Pew Research Center.

The GS responses come with inferred demographic and geographic data (age, gender, income level, location). The geographic data is generally reliable because it is based on IP address, but inferred age and gender data is not as reliable as the self-reported data that we get from SurveyMonkey Audience. For more on the accuracy of the inferred data, see the Pew Research Center’s comparison.

Wikimedia Foundation New Readers survey (NR)

In late 2016, the Wikimedia Foundation’s Global Reach team published the results of New Readers phone surveys. The questions in these surveys have some overlap with the questions in our surveys, so we have updated our post to include a discussion of these surveys and how the results compare with ours.

Audiences for NR

The NR surveys were conducted in the following five countries: Nigeria (2768 responses), India (9235 responses), Mexico (2568 responses), Brazil (5343 responses), and Egypt (3976 responses). The surveys were conducted by phone.

For the first three countries (Nigeria, India, and Mexico), results of additional in-person surveys have also been published.

We were not able to find information on the cost of the surveys, but considering the large audience size (23,890 in total), the survey length, and the labor-intensive method of conducting the survey, we estimate that it cost tens of thousands of dollars. For comparison, an article about survey firm GeoPoll suggests that $10 per response is a fairly good rate for conducting surveys in some African countries.

Questions for NR

We will compare the results of our surveys with the results of the New Readers surveys. To shed light on this comparison, we include below the list of questions in the New Readers phone survey.

Not all questions were presented in all surveys. The Egypt survey, which is the more recent, had the longest list of questions, and we provide this list below. The numbering is mostly based on the Egypt survey, though off by one for later questions due to a question missing from the Egypt survey.

Our later analysis will focus on the first, fourth, and seventh question, which are together comparable against the first question of S1, S2, and GS.

  1. Do you use the Internet?

    • Yes
    • Said no, but uses Facebook
    • No
  2. What do you use the Internet for the most? (for those who said Yes to Q1)

    • Look up info
    • Social media
    • Entertainment
    • News
    • Others
  3. What’s the biggest reason you don’t use the Internet? (for those who said No to Q1)

    • Too expensive
    • Not sure it’s useful
    • Not sure what it is
    • Other
  4. Have you ever heard of Wikipedia?

    • Yes
    • No
  5. Where did you find out about Wikipedia?

    • Internet
    • School
    • Friends and family
    • Radio or TV
    • Not sure
  6. What do you use Wikipedia for?

    • School
    • Work
    • Entertainment
    • Other
  7. How often do you use Wikipedia?

    • Daily
    • Weekly
    • Monthly
    • Rarely
    • Never
  8. How interested are you in reading Wikipedia? (for those who answered “Rarely” or “Never” to the previous question)

    • Not interested
    • Somewhat
    • Very interested
  9. What’s the largest barrier keeping you from reading Wikipedia? (for those who answered “Very interested” to Q8)

    • Don’t trust content
    • Expensive data
    • Not interesting enough
    • Can’t find it
    • Other
  10. What would make you more likely to use Wikipedia? (for those who answered “Not interested” to Q8)

    • Trusted the content
    • Cheaper data
    • More interesting articles
    • Known how to find it
    • None
  11. Do you have a mobile phone? (This question was in some other country surveys though not in the Egypt one. Hence the numbering for later questions is one more than the numbering in the actual Egypt survey)

    • Yes
    • No
  12. Can you use the Internet with your phone?

    • Yes
    • No
  13. How do you access the Internet on your phone?

    • Cellular
    • Wifi and cell
    • Wifi only
    • No Internet
    • Not sure
  14. What is your usual network speed?

    • 2G / Edge
    • 3G
    • Better than 3G
    • Not sure
  15. Do you download and use Apps?

    • Yes
    • No
  16. What is your gender?

    • Male
    • Female
  17. What is your age?

    • Under 18
    • 19–31
    • 31–50
    • over 50
    • Prefer not to say
  18. What is your location?

    • Urban
    • Rural
    • Not sure
  19. What is your geographical zone? (options specific to Egypt)

Other surveys

Several demographic surveys regarding Wikipedia have been conducted, targeting both editors and users. The surveys we found most helpful were the following:

  • The 2010 Wikipedia survey by the Collaborative Creativity Group and the Wikimedia Foundation. The explanation before the bottom table on page 7 of the overview PDF has “Contributors show slightly but significantly higher education levels than readers”, which provides weak evidence that more educated people are more likely to engage with Wikipedia.
  • The Global South User Survey 2014 by the Wikimedia Foundation
  • Pew Internet Research’s 2011 survey: “Education level continues to be the strongest predictor of Wikipedia use. The collaborative encyclopedia is most popular among internet users with at least a college degree, 69% of whom use the site.” (page 3)
  • Pew Internet Research’s 2007 survey.

There is also the New Readers survey mentioned earlier, that we examine in detail in this post.

Motivation

Issa and I ultimately want to get a better sense of the value of a Wikipedia pageview (one way to measure the impact of content creation), and one way to do this is to understand how people are using Wikipedia. As we focus on getting more people to work on editing Wikipedia – thus causing more people to read the content we pay and help to create – it becomes more important to understand who is reading the content, and how they engage with it.

For some previous discussion, see also my answers to the following Quora questions:

Wikipedia allows relatively easy access to pageview data (especially by using tools developed for this purpose, including one that I made), and there are some surveys that provide demographic data (see “Other surveys” above). However, after looking around, we found that the kind of information our survey was designed to uncover was not available. This was before the New Readers survey results had been published.

Results

In this section we present the highlights from each of the survey questions. If you prefer to dig into the data yourself, there are also some exported PDFs below provided by SurveyMonkey. Most of the inferences can be made using these PDFs, but there are some cases where additional filters are needed to deduce certain percentages.

For the SurveyMonkey surveys, we use the notation “SnQm” to mean “survey n question m”. The Google Surveys survey question is referred to as GS, and the New Readers questions are referred to with the notation “NRQm” for question m of the survey.

S1Q1: number of Wikipedia pages read per week

Here is a table that summarizes the data for Q1. Note that SMM and SMF don’t add up to SM as some respondents did not specify their gender.

How many distinct Wikipedia pages do you read per week on average? SM = SurveyMonkey audience, V = Vipul Naik’s timeline, SSC = Slate Star Codex audience, AM = Wikipedia Analytics mailing list, SMM = SurveyMonkey males, SMF = SurveyMonkey females.
Response SM (N=62) V (N=70) SSC (N=618) AM (N=7) SMM (N=28) SMF (N=26)
less than 1 42% 1% 1% 0% 25% 58%
1 to 10 45% 40% 37% 29% 46% 42%
11 to 25 13% 43% 36% 14% 29% 0%
26 or more 0% 16% 27% 57% 0% 0%
pgs/wk lower 1.88 9.29 11.35 16.65 3.65 0.42
pgs/wk upper 8.17 22.76 26.21 34.90 12.10 4.70

The “pgs/wk lower” is obtained as the average pages read per week if everybody read at the lower end of their estimate (so the respective estimates are 0, 1, 11, and 26).

The “pgs/wk upper” is obtained as the average of pages read per week if everybody read at the upper end of their estimate, except the “26 or more” case where we assume a value of 50 (so the respective estimates are 1, 10, 25, and 50). We choose 50 as a reasonable upper bound on what the average person who views more than 26 pages likely views, rather than a strict bound on every individual.
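For concreteness, the SM column's bounds can be reproduced from its response fractions in a few lines (the fractions are the SM percentages from the table above; 50 is the assumed cap for the open-ended bucket):

    # Reproduce the SM column's pgs/wk bounds from its response fractions.
    fractions = [0.42, 0.45, 0.13, 0.00]   # "less than 1", "1 to 10", "11 to 25", "26 or more"
    lower_bounds = [0, 1, 11, 26]
    upper_bounds = [1, 10, 25, 50]         # 50 assumed for the open-ended top bucket

    lower = sum(f * b for f, b in zip(fractions, lower_bounds))
    upper = sum(f * b for f, b in zip(fractions, upper_bounds))
    print(round(lower, 2), round(upper, 2))   # 1.88 8.17, matching the table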

There are two reasons to compute the “pgs/wk lower” and “pgs/wk upper” numbers:

  • Having these ranges makes it easier to quickly compare different audiences.

  • The (very approximate) estimates of pages/week can be validated against known information about total pageviews.

The comments indicated that S1Q1 was flawed in several ways: we didn’t specify which language Wikipedias count nor what it meant to “read” an article (the whole page, a section, or just a sentence?).

One comment questioned the “low” ceiling of 26; however, the actual distribution of responses suggests that the ceiling wasn’t too low.

An interesting potential modification of the survey would be to ask further questions of people who selected an extreme response, to better bucket them.

S1Q2: affinity for Wikipedia in search results

We asked Q2, “On a search engine (e.g. Google) results page, do you explicitly seek Wikipedia pages, or do you passively click on Wikipedia pages only if they show up at the top of the results?”, to see to what extent people preferred Wikipedia in search results.

The main implication to this for people who do content creation on Wikipedia is that if people do explicitly seek Wikipedia pages (for whatever reason), it makes sense to give them more of what they want. On the other hand, if people don’t prefer Wikipedia, it makes sense to update in favor of diversifying one’s content creation efforts while still keeping in mind that raw pageviews indicate that content will be read more if placed on Wikipedia (see for instance Brian Tomasik’s experience, which is similar to my own, or gwern’s page comparing Wikipedia with other wikis).

The following table summarizes our results. Wikipedia has been shortened to WP to conserve column width.

On a search engine (e.g. Google) results page, do you explicitly seek Wikipedia pages, or do you passively click on Wikipedia pages only if they show up at the top of the results? SM = SurveyMonkey audience, V = Vipul Naik’s timeline, SSC = Slate Star Codex audience, AM = Wikipedia Analytics mailing list, H = heavy users (26 or more articles per week) of Wikipedia.
Response SM (N=62) V (N=70) SSC (N=618) AM (N=7) H (N=179) SMM (N=28) SMF (N=26)
Explicitly seek WP 19% 60% 63% 57% 79% 25% 12%
Slight preference for WP 29% 39% 34% 43% 20% 39% 23%
Just click on top results 52% 1% 3% 0% 1% 35% 65%

An oversight on our part was not to include an option for people who avoided Wikipedia or did something else. This became apparent from the comments. For this reason, the “Just click on top results” options might be inflated. In addition, some comments indicated a mixed strategy of preferring Wikipedia for general overviews while avoiding it for specific inquiries, so allowing multiple selections might have been better for this question.

S1Q3: section vs whole page

This question is relevant for us because the work we fund is mainly whole-page creation. If people are mostly reading the introduction or a particular section like the “Criticisms” or “Reception” section (see S1Q5), then that forces us to consider spending more time on those sections, or to strengthen those sections on weak existing pages.

Responses to this question were fairly consistent across different audiences, as can be seen in the following table.

Do you usually read a particular section of a page or the whole article? SM = SurveyMonkey audience, V = Vipul Naik’s timeline, SSC = Slate Star Codex audience, AM = Wikipedia Analytics mailing list, H = Heavy users (26 or more articles per week) of Wikipedia, SMM = SurveyMonkey males, SMF = SurveyMonkey females.
Response SM (N=62) V (N=70) SSC (N=618) AM (N=7) H (N=179) SMM (N=28) SMF (N=26)
Section 73% 80% 74% 86% 70% 68% 73%
Whole 34% 23% 33% 29% 37% 39% 31%

People were allowed to select more than one option for this question. The comments indicate that several people do a combination, where they read the introductory portion of an article, then narrow down to the section of their interest.

S1Q4: search functionality on Wikipedia and surprise at lack of Wikipedia pages

We asked about whether people use the search functionality on Wikipedia because we wanted to know more about people’s article discovery methods. The data is summarized in the following table.

How often do you use the search functionality on Wikipedia? SM = SurveyMonkey audience, V = Vipul Naik’s timeline, SSC = Slate Star Codex audience, AM = Wikipedia Analytics mailing list, H = heavy users (26 or more articles per week) of Wikipedia, SMM = SurveyMonkey males, SMF = SurveyMonkey females.
Response SM (N=62) V (N=69) SSC (N=613) AM (N=7) H (N=176) SMM (N=28) SMF (N=26)
Several times per week 8% 14% 32% 57% 55% 14% 0%
About once per week 19% 17% 21% 14% 15% 21% 19%
About once per month 15% 13% 14% 0% 3% 14% 12%
About once per several months 13% 12% 9% 14% 5% 7% 19%
Never/almost never 45% 43% 24% 14% 23% 43% 50%

Many people noted here that rather than using Wikipedia’s search functionality, they use Google with “wiki” attached to their query, DuckDuckGo’s “!w” expression, or some browser configuration to allow a quick search on Wikipedia.

To be more thorough about discovering people’s content discovery methods, we should have asked about other methods as well. We did ask about the “See also” section in S1Q5.

Next, we asked how often people are surprised that there is no Wikipedia page on a topic to gauge to what extent people notice a “gap” between how Wikipedia exists today and how it could exist. We were curious about what articles people specifically found missing, so we followed up with S2Q4.

How often are you surprised that there is no Wikipedia page on a topic? SM = SurveyMonkey audience, V = Vipul Naik’s timeline, SSC = Slate Star Codex audience, AM = Wikipedia Analytics mailing list, H = heavy users (26 or more articles per week) of Wikipedia, SMM = SurveyMonkey males, SMF = SurveyMonkey females.
Response SM (N=62) V (N=69) SSC (N=613) AM (N=7) H (N=176) SMM (N=28) SMF (N=26)
Several times per week 2% 0% 2% 29% 6% 4% 0%
About once per week 8% 22% 18% 14% 34% 14% 4%
About once per month 18% 36% 34% 29% 31% 18% 15%
About once per several months 21% 22% 27% 0% 19% 29% 15%
Never/almost never 52% 20% 19% 29% 10% 36% 65%

Two comments on this question (out of 59) – both from the SSC group – specifically bemoaned deletionism, with one comment calling deletionism “a cancer killing Wikipedia”.

S1Q5: behavior on pages

This question was intended to gauge how often people perform an action for a specific page; as such, the frequencies are expressed in page-relative terms.

The following table presents the scores for each response, which are weighted by the number of responses. The scores range from 1 (for every page) to 5 (never); in other words, the lower the number, the more frequently one does the thing.

For what fraction of pages you read do you do the following? Note that the responses have been shortened here; see the Questions for S1 section for the wording used in the survey. Responses are sorted by the values in the SSC column. SM = SurveyMonkey audience, V = Vipul Naik’s timeline, SSC = Slate Star Codex audience, AM = Wikipedia Analytics mailing list, H = heavy users (26 or more articles per week) of Wikipedia, SMM = SurveyMonkey males, SMF = SurveyMonkey females.
Response SM (N=54) V (N=69) SSC (N=596) AM (N=7) H (N=169) SMM (N=28) SMF (N=26)
Check ≥1 citation 3.57 2.80 2.91 2.67 2.69 3.43 3.73
Look at “See also” 3.65 2.93 2.92 2.67 2.76 3.43 3.88
Read mostly for “Criticisms” or “Reception” 4.35 3.12 3.34 3.83 3.14 4.32 4.38
Click through ≥1 source to verify information 3.80 3.07 3.47 3.17 3.36 3.86 3.73
Share the page 4.11 3.72 3.86 3.67 3.79 4.11 4.12
Look at the talk page 4.31 4.28 4.03 3.00 3.86 4.21 4.42
Look at the editing history 4.35 4.32 4.12 3.33 3.92 4.36 4.35
Edit a page for grammatical/typographical errors 4.50 4.41 4.22 3.67 4.02 4.54 4.46
Edit a page to add new information 4.61 4.55 4.49 3.83 4.34 4.57 4.65
Look at editing history to verify author 4.50 4.65 4.48 3.67 4.73 4.46 4.54
Check how many pageviews a page is getting 4.63 4.88 4.96 3.17 4.92 4.68 4.58

The table above provides a good ranking of how often people perform these actions on pages, but not the distribution information (which would require three dimensions to present fully). In general, the more common actions (scores of 2.5–4) had responses that clustered among “For some pages”, “For very few pages”, and “Never”, while the less common actions (scores above 4) had responses that clustered mainly in “Never”.

One comment (out of 43) – from the SSC group, but a different individual from the two in S1Q4 – bemoaned deletionism.

S2Q1: number of Wikipedia pages read per week

Note the wording changes from S1Q1: “less” was changed to “fewer”, the clarification “at least one sentence of” was added, and we explicitly allowed any language. (The explicit allowing of any language was in the introduction to the survey and not part of the question itself). We have also presented the survey 1 results for the SurveyMonkey audience in the corresponding rows, but note that because of the change in wording, the correspondence isn’t exact.

How many distinct Wikipedia pages do you read (at least one sentence of) per week on average? SM = SurveyMonkey audience with no demographic filters, CEYP = College-educated young people of SurveyMonkey, S1SM = SurveyMonkey audience with no demographic filters from the first survey, SMM = SurveyMonkey males, SMF = SurveyMonkey females, CEYPM = College-educated young males of SurveyMonkey, CEYPF = College-educated young females of SurveyMonkey.
Response SM (N=54) CEYP (N=50) S1SM (N=62) SMM (N=25) SMF (N=26) CEYPM (N=24) CEYPF (N=26)
Fewer than 1 37% 32% 42% 32% 42% 29% 35%
1 to 10 48% 64% 45% 40% 54% 67% 62%
11 to 25 7% 2% 13% 16% 0% 4% 0%
26 or more 7% 2% 0% 12% 4% 0% 4%
pgs/wk lower 3.07 1.38 1.88 5.28 1.58 1.11 1.66
pgs/wk upper 10.42 8.22 8.17 14.32 7.82 7.99 8.55

The “pgs/wk lower” is obtained as the average pages read per week if everybody read at the lower end of their estimate (so the respective estimates are 0, 1, 11, and 26). The “pgs/wk upper” is obtained as the average of pages read per week if everybody read at the upper end of their estimate, except the “26 or more” case where we assume a value of 50 (so the respective estimates are 1, 10, 25, and 50). For more, see the S1Q1 explanation.
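As a concrete illustration, here is a minimal Python sketch (mine, not part of the original analysis) that reproduces this weighting for the S2 SurveyMonkey column; the same approach applies to the other columns and to the Google Surveys table later in the post.

```python
# Reproduce the "pgs/wk lower" and "pgs/wk upper" bounds from the bucketed responses.
buckets = {               # response bucket: (lower value, assumed upper value)
    "Fewer than 1": (0, 1),
    "1 to 10":      (1, 10),
    "11 to 25":     (11, 25),
    "26 or more":   (26, 50),   # upper end capped at an assumed 50 pages/week
}
sm_shares = {"Fewer than 1": 0.37, "1 to 10": 0.48, "11 to 25": 0.07, "26 or more": 0.07}

lower = sum(share * buckets[name][0] for name, share in sm_shares.items())
upper = sum(share * buckets[name][1] for name, share in sm_shares.items())
print(f"pgs/wk range: {lower:.2f} to {upper:.2f}")  # approximately 3.07 to 10.42
```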

Comparing SM with S1SM, we see that probably because of the wording, the percentages have drifted in the direction of more pages read. It might be surprising that the young educated audience seems to have a smaller fraction of heavy users than the general population. However note that each group only had ~50 responses, and that we have no education information for the SM group.

S2Q2: multiple-choice of articles read

Our intention with this question was to see if people’s stated or recalled article frequencies matched the actual, revealed popularity of the articles. Therefore we present the pageview data along with the percentage of people who said they had read an article.

Which of these articles have you read (at least one sentence of) on Wikipedia (select all that apply)? SM = SurveyMonkey audience with no demographic filters, CEYP = College-educated young people of SurveyMonkey. Columns “2016” and “2015” are desktop pageviews in millions. Note that the 2016 pageviews only include pageviews through the end of June. The rows are sorted by the values in the CEYP column followed by those in the SM column.
Response SM (N=54) CEYP (N=50) 2016 2015
None 37% 40%
World War II 17% 22% 2.6 6.5
Barack Obama 17% 20% 3.0 7.7
United States 17% 18% 4.3 9.6
Donald Trump 15% 18% 14.0 6.6
Taylor Swift 9% 18% 1.7 5.3
Bernie Sanders 17% 16% 4.3 3.8
Japan 11% 16% 1.6 3.7
Adele 6% 16% 2.0 4.0
Hillary Clinton 19% 14% 2.8 1.5
China 13% 14% 1.9 5.2
The Beatles 11% 14% 1.4 3.0
Katy Perry 9% 12% 0.8 2.4
Google 15% 10% 3.0 9.0
India 13% 10% 2.4 6.4
Justin Bieber 4% 8% 1.6 3.0
Justin Trudeau 9% 6% 1.1 3.0

Below are four plots of the data. Note that r_s denotes Spearman’s rank correlation coefficient, which is used instead of Pearson’s r because it is less affected by outliers. Note also that the percentage of respondents who viewed a page counts each respondent once, whereas the number of pageviews does not have this restriction (i.e. duplicate pageviews count), so we wouldn’t expect the relationship to be entirely linear even if the survey audiences were perfectly representative of the general population.
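For readers who want to recompute r_s, here is a minimal Python sketch (mine, not part of the original analysis) using SciPy. The numbers are transcribed from the table above, so the result may differ slightly from the plotted values due to rounding.

```python
# Spearman's rank correlation between the share of SM respondents who read each
# article and its 2016 desktop pageviews (values transcribed from the table above).
from scipy.stats import spearmanr

sm_share = [17, 17, 17, 15, 9, 17, 11, 6, 19, 13, 11, 9, 15, 13, 4, 9]       # % of SM respondents
views_2016 = [2.6, 3.0, 4.3, 14.0, 1.7, 4.3, 1.6, 2.0, 2.8, 1.9, 1.4, 0.8, 3.0, 2.4, 1.6, 1.1]

r_s, p_value = spearmanr(sm_share, views_2016)
print(f"r_s = {r_s:.2f} (p = {p_value:.2f})")
```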

SM vs 2016 pageviews

SM vs 2015 pageviews

CEYP vs 2016 pageviews

CEYP vs 2015 pageviews

S2Q3: free response of articles read

The most common response was along the lines of “None”, “I don’t know”, “I don’t remember”, or similar. Among the more useful responses were:

S2Q4: free response of surprise at lack of Wikipedia pages

As with the previous question, the most common response was along the lines of “None”, “I don’t know”, “I don’t remember”, “Doesn’t happen”, or similar.

The most useful responses were classes of things: “particular words”, “French plays/books”, “Random people”, “obscure people”, “Specific list pages of movie genres”, “Foreign actors”, “various insect pages”, and so forth.

GS

The survey was circulated to a target size of 500 in the United States (no demographic filters), and received 501 responses.

Since there was only one question, but we obtained data filtered by demographics in many different ways, we present this table with the columns denoting responses and the rows denoting the audience segments.

We also include the S1Q1SM, S2Q1SM, and S2Q1CEYP responses for easy comparison. Note that S1Q1SM did not include the “at least one sentence of” caveat. We believe that adding this caveat would push people’s estimates upward.

If you view the Google Surveys results online you will also see the 95% confidence intervals for each of the segments. Note that percentages in a row may not add up to 100% due to rounding or due to people entering “Other” responses. For the entire GS audience, every pair of options had a statistically significant difference, but for some subsegments, this was not true.

How many distinct Wikipedia pages do you read (at least one sentence of) per week, on average? SM = SurveyMonkey Audience, GS = Google Surveys, SMM = SurveyMonkey males, SMF = SurveyMonkey females.
Audience segment Fewer than 1 1 to 10 11 to 25 26 or more pgs/wk range
S1Q1SM (N=62) 42% 45% 13% 0% 1.88–8.17
S1Q1SMM (N=28) 25% 46% 29% 0% 3.65–12.10
S1Q1SMF (N=26) 58% 42% 0% 0% 0.42–4.70
S2Q1SM (N=54) 37% 48% 7% 7% 3.07–10.42
S2Q1SMM (N=25) 32% 40% 16% 12% 5.28–14.32
S2Q1SMF (N=26) 42% 54% 0% 4% 1.58–7.82
S2Q1CEYP (N=50) 32% 64% 2% 2% 1.38–8.22
S2Q1CEYPM (N=24) 29% 67% 4% 0% 1.11–7.99
S2Q1CEYPF (N=26) 35% 62% 0% 4% 1.66–8.55
GS all (N=501) 47% 35% 12% 6% 3.23–9.73
GS male (N=205) 41% 38% 16% 5% 3.44–10.71
GS female (N=208) 52% 34% 10% 5% 2.74–8.92
GS 18–24 (N=54) 33% 46% 13% 7% 3.71–11.68
GS 25–34 (N=71) 41% 37% 16% 7% 3.95–11.61
GS 35–44 (N=69) 51% 35% 10% 4% 2.49–8.51
GS 45–54 (N=77) 46% 40% 12% 3% 2.50–8.96
GS 55–64 (N=69) 57% 32% 7% 4% 2.13–7.52
GS 65+ (N=50) 52% 24% 18% 4% 3.26–9.42
GS Urban (N=176) 44% 35% 14% 7% 3.71–10.94
GS Suburban (N=224) 50% 34% 10% 6% 3.00–9.40
GS Rural (N=86) 44% 35% 14% 6% 3.45–10.44
GS $0–24K (N=49) 41% 37% 16% 6% 3.69–11.11
GS $25–49K (N=253) 53% 30% 10% 6% 2.96–9.03
GS $50–74K (N=132) 42% 39% 13% 6% 3.38–10.57
GS $75–99K (N=37) 43% 35% 11% 11% 4.42–12.18
GS $100–149K (N=11) 9% 64% 18% 9% 4.78–15.49
GS $150K+ (N=4) 25% 75% 0% 0% 0.75–7.75

The “pgs/wk range” is obtained as follows. The lower bound is obtained as the average pages read per week if everybody read at the lower end of their estimate (so the respective estimates are 0, 1, 11, and 26). The upper bound is obtained as the average of pages read per week if everybody read at the upper end of their estimate, except the “26 or more” case where we assume a value of 50 (so the respective estimates are 1, 10, 25, and 50). For more, see the S1Q1 explanation.

We can see that the overall GS data vindicates the broad conclusions we drew from SurveyMonkey data. Moreover, most GS segments with a sufficiently large number of responses (50 or more) display a similar trend as the overall data. One exception is that younger audiences seem to be slightly less likely to use Wikipedia very little (i.e. fall in the “Fewer than 1” category), and older audiences seem slightly more likely to use Wikipedia very little.

Data validation using known total United States Wikipedia pageviews

Using the country breakdown data for traffic to Wikipedia, we see that Wikipedia received 3.54 billion views in the United States for a recent 30-day period, which translates to about 827 million weekly pageviews.

Estimates for the number of active Internet users in the United States vary, based on definition, between 150 million and 290 million. With these estimates, we get a range of 2.85–5.51 pageviews per week per United States Internet user. This range falls loosely within the ranges obtained from the SurveyMonkey surveys as well as Google Surveys. In other words, the survey data is loosely plausible and consistent with known facts.
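As a sanity check, here is a minimal Python sketch (mine) of the arithmetic just described.

```python
# Back-of-the-envelope check: weekly US pageviews divided by estimates of US Internet users.
monthly_us_pageviews = 3.54e9                        # Wikipedia pageviews from the US over 30 days
weekly_us_pageviews = monthly_us_pageviews * 7 / 30  # about 827 million

for internet_users in (290e6, 150e6):                # high and low estimates of US Internet users
    print(f"{weekly_us_pageviews / internet_users:.2f} pages/week per Internet user")
# Prints roughly 2.85 and 5.51, matching the range quoted above.
```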

NRQ1: Do you use the Internet? and NRQ4: Have you ever heard of Wikipedia?

Both questions were asked in the New Readers phone survey for all five countries. NRQ1 was the same across all countries (though for Egypt, the “No” responses were further split to separate people who used Facebook). NRQ4 was asked as Q3 in Nigeria, Mexico, and Brazil.

We additionally want to know the percentage of Internet users who have heard of Wikipedia, as this will be useful later when making estimates of total pages/week read by people. We don’t directly know this number. However, if we assume that the people who have heard of Wikipedia are a subset of the people who use the Internet, then we can compute this percentage as the ratio of the percentage of Yes responses to NRQ4 and NRQ1. This assumption is a reasonable proxy for reality, so we will use the ratio as a stand-in for the percentage of Internet users who have heard of Wikipedia.

New Readers question responses. NRQ1 = Do you use the Internet? NRQ4 = Have you heard of Wikipedia?
Country NRQ4 Yes NRQ1 Yes Ratio
Nigeria (N=2768) 23% 65% 35%
India (N=9235) 25% 64% 39%
Mexico (N=2568) 45% 80% 56%
Brazil (N=5343) 32% 77% 42%
Egypt (N=3976) 17% 59% 29%

An interesting note of comparison: for the surveys we circulated, we did not even ask people if they had heard of Wikipedia. The implicit assumption was that people had heard of Wikipedia. This assumption was probably reasonable in the contexts we operated in, and it didn’t make sense to waste a question (and the underlying survey costs) on getting that information.

NRQ7: How often do you use Wikipedia?

This question was in all country surveys, though at different positions.

The respondents to this question appear to have been restricted to those who had heard of Wikipedia.

How often do you use Wikipedia? N values represent respondents to the question.
Country Daily Weekly Monthly Rarely Never pgs/wk range
Nigeria (N=610) 20% 24% 15% 17% 24% 1.07–11.35
India (N=2270) 22% 26% 16% 20% 16% 1.17–12.46
Mexico (N=1169) 18% 33% 19% 17% 14% 1.09–10.84
Brazil (N=1736) 13% 33% 23% 20% 11% 0.89–8.38
Egypt (N=665) 11% 23% 23% 24% 19% 0.72–6.88

The pgs/wk range is calculated as follows. For daily use, we assume between 4 and 50 views a week. For weekly use, we assume between 1 and 5 views a week. For monthly use, we assume between 0.2 and 1 view a week. We do not count any contribution for “Rarely” and “Never”.

We also calculate the percentages relative to the set of all survey respondents (so that the denominator now includes people who have never heard of Wikipedia) and add all the ones who didn’t respond to the Never column:

How often do you use Wikipedia? N values represent respondents to the survey. Those who did not respond to the question are placed in the Never category.
Country Daily Weekly Monthly Rarely Never pgs/wk range
Nigeria (N=2768) 4.4% 5.3% 3.3% 3.7% 83.3% 0.24–2.50
India (N=9235) 5.4% 6.4% 3.9% 4.9% 79.3% 0.29–3.06
Mexico (N=2568) 8.2% 15.0% 8.7% 7.7% 61.0% 0.50–4.94
Brazil (N=5343) 4.2% 10.7% 7.5% 6.5% 71.1% 0.29–2.71
Egypt (N=3976) 1.8% 3.8% 3.8% 4.0% 86.5% 0.12–1.13

Next, we do the same calculation, but now use our denominator as the number of people who use the Internet. This is the closest in spirit to the audience for SurveyMonkey Audience and Google Surveys in the United States, though the selection dynamic does differ quite a bit.

How often do you use Wikipedia? N values represent respondents to the survey who use the Internet. Those who use the Internet but did not respond to this question are placed in the Never category.
Country Daily Weekly Monthly Rarely Never pgs/wk range
Nigeria 7.1% 8.5% 5.3% 6.0% 73.1% 0.38–4.03
India 8.6% 10.2% 6.3% 7.8% 67.3% 0.46–4.87
Mexico 10.1% 18.6% 10.7% 9.6% 51.6% 0.61–6.09
Brazil 5.4% 13.7% 9.6% 8.3% 63.0% 0.37–3.48
Egypt 3.2% 6.6% 6.6% 6.9% 76.7% 0.21–2.00
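To make the renormalization explicit, here is a minimal Python sketch (mine) for the Nigeria column. Because the published percentages are rounded (and the tables above may have used unrounded respondent counts), the outputs match the tables only approximately.

```python
# Rescale usage shares from "people who have heard of Wikipedia" to all survey
# respondents, and then to Internet users only (Nigeria, using rounded inputs).
heard_of_wikipedia = 0.23   # NRQ4 "Yes" as a share of all survey respondents
uses_internet = 0.65        # NRQ1 "Yes" as a share of all survey respondents

usage_among_aware = {"Daily": 0.20, "Weekly": 0.24, "Monthly": 0.15, "Rarely": 0.17}

per_respondent = {k: v * heard_of_wikipedia for k, v in usage_among_aware.items()}
per_internet_user = {k: v / uses_internet for k, v in per_respondent.items()}

print(per_respondent)     # Daily ~4.6% vs 4.4% in the table (rounding differences)
print(per_internet_user)  # Daily ~7.1%, Weekly ~8.5%, in line with the table above
```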

Comparison against United States audiences

The combined data above can be compared with S1Q1, S2Q1, and GS. However, the buckets presented to respondents were very different. The potential correspondence is below.

  1. How many distinct Wikipedia pages do you read (at least one sentence of) per week on average?

    • Fewer than 1: This corresponds to Monthly, Rarely, and Never.
    • 1 to 10: This corresponds to Weekly and a subset of Daily.
    • 11 to 25: This mostly corresponds to Daily.
    • 26 or more: This mostly corresponds to Daily.

The data show that the people surveyed read Wikipedia less than the SurveyMonkey Audience and Google Surveys audiences. The total of the Monthly, Rarely, and Never columns for each of the five countries is over 70%, and it is over 80% for all countries other than Mexico. The corresponding “Fewer than 1” percentage for each iteration of SurveyMonkey Audience and Google Surveys is less than 50%, and even on subsegments it is less than 60%.

In other words, the surveys suggest that Wikipedia use is lower in the five countries than in the United States.

Data validation against known total country traffic

We get the estimate for weekly traffic by scaling the country breakdown data from 30 days down to 7 days. Data was captured on December 23, 2016.

We get Internet-using population estimates from the Wikipedia page, which in turn relies on Internet Live Stats. Estimates were captured on December 23, 2016. We use this data rather than the data from stats.wikimedia.org since it is more up to date, and includes extrapolated estimates rather than only the most recent confirmed estimate.

Internet user and weekly pageview counts are in the millions. The range is the one computed based on the Internet-using population.

We see that the pgs/wk number falls below the estimated range for Nigeria and India, with the gap particularly large for Nigeria. Otherwise, however, the ranges are plausible, and so the pageview data loosely validates the survey results.

Comparison of known data on Internet users and Wikipedia pageviews against previous estimates of pages/week from survey
Country Internet users Weekly pageviews pgs/wk pgs/wk range
Nigeria 86.2 7.88 0.09 0.38–4.03
India 462.1 127.20 0.28 0.46–4.87
Mexico 68.3 71.01 1.04 0.61–6.09
Brazil 120.1 71.77 0.59 0.37–3.48
Egypt 42.3 9.59 0.23 0.21–2.00

SurveyMonkey allows exporting of response summaries. Here are the exports for each of the audiences.

The Google Surveys survey results are available online at https://www.google.com/insights/consumersurveys/view?survey=o3iworx2rcfixmn2x5shtlppci&question=1&filter=&rw=1.

Takeaway: Huge gap between heavy users and general US audience, plus predictors of heavy use

The most striking finding to us was just how wide the gap is between audiences such as Vipul’s Facebook friends and Slate Star Codex on the one hand, and general US Internet users (as measured through SurveyMonkey Audience and Google Surveys) on the other.

Confirming the gap with numbers

Here are three different ways to slice the data to confirm the gap between the audiences.

  • Percentage of respondents who view less than 1 Wikipedia page per week: For Vipul’s Facebook friends, Slate Star Codex, and the Analytics mailing list, this was 0% or 1%.

    In contrast, for all the SurveyMonkey Audience and Google Surveys segments considered, this was 25% or higher, with the most general US audiences and largest sample sizes giving numbers between 40% and 60%.

  • Estimated pages/week range: For Vipul’s friends, Slate Star Codex, and the Analytics mailing list, the lower end was 9 or higher, and the upper end was 19 or higher.

    In contrast, for all the SurveyMonkey Audience and Google Surveys segments considered, the lower end was less than 5 and the upper end was less than 15.

  • Percentage of respondents who view 26 or more Wikipedia pages per week: For Vipul’s friends, Slate Star Codex, and the Analytics mailing list, the number was 16%, 27%, and 57% respectively. In contrast, for all the SurveyMonkey Audience and Google Surveys segments, this percentage was less than 13%, and for most of the larger segments it was less than 7%.

Qualitative differences in other aspects of Wikipedia engagement

Through the additional questions in S1, we got evidence for these statements, true both for heavy users, and for audiences that have a larger proportion of heavy users:

  • They tend to explicitly seek Wikipedia in search results (S1Q2).

  • They are more likely to be surprised at the absence of a Wikipedia page (S1Q4).

  • They are more likely to use the search functionality within Wikipedia (S1Q4).

  • They are considerably more likely to engage with page content in various ways, including looking at the See Also section, sharing the page, focusing on Criticisms and Reception, checking citations, and checking the talk page (S1Q5).

However:

  • They are not too different from the general US audience in terms of the extent to which they read a section versus the whole page (S1Q3).

  • They are not noticeably more likely to engage in editing actions on Wikipedia (in other words, active Wikipedia editors constituted a small fraction of heavy users) (S1Q5).

Predictors of audiences with high proportions of heavy users

Of the three audiences with a high proportion of heavy Wikipedia users (Vipul’s Facebook friends, Slate Star Codex, and the Wikimedia Analytics mailing list), only the third has an obvious connection with Wikipedia. The first two audiences are not directly linked to Wikipedia, and this is evidenced somewhat by the low rate of Wikipedia editing in these audiences. This suggests that visiting a specific website or being in a specific friend group on social media can be a good predictor of heavy Wikipedia use without necessarily predicting Wikipedia editing.

It would be interesting to run this survey among audiences of different websites and people in different friend networks to get a better sense of what attributes predict high Wikipedia use.

Takeaway: Effect on impact estimates for pageviews

As described in the Motivation section, our interest in the topic stems partly from a desire to quantify the value of individual Wikipedia pageviews. The results we obtained caused us to revise our estimate upward, but with the important caveat of downgrading the reach of Wikipedia.

Upgrading estimate of impact based on reader quality

For some pages, the main way a page is impactful is if the right set of people reads it. For such a page, getting 1,000 pageviews from the right people (the ones with the information, authority, and skill to act on it) is more valuable than getting 1,000 pageviews from people who happen to visit the page accidentally.

The qualities we have identified in heavy Wikipedia users (their explicit seeking of Wikipedia, and their use of advanced features on Wikipedia to verify facts and learn more) give us a little more confidence that pages are being read by the right people, who are equipped to take action on them.

Additionally, other information we have about the audiences with high proportions of heavy users (specifically, that they are friends with Vipul, read Slate Star Codex, or are on the Analytics mailing list) also gives us reason to be optimistic about these readers relative to general Internet users.

Potentially downgrading estimate of impact through reach

For some pages and Wikipedia use cases, the impact pathway crucially depends on a lot of people in diverse contexts and life situations reading it. The results we have obtained suggest, very tentatively, that the views on a given page are likely to come from a less diverse audience than we might naively think.

For instance, let’s say we go on a spree to significantly improve Wikipedia’s coverage of 100 pages related to healthy living habits, and we then see that the pages we’ve improved got 10 million pageviews collectively.

Naively, we might have thought that we were reaching millions of users. However, if a lot of Wikipedia’s pageviews come from heavy users, there’s a good chance that those 10 million pageviews came from a few hundred thousand of these heavy users.

For any given page or set of pages, we can only speculate. Therefore, this downgrading is only potential, and is accompanied by considerable uncertainty.

Takeaway: Gap with elites is stronger than demographic gaps in US but comparable to gap between US and low-income countries

Gender within the United States

For S1, S2, and GS, males in the general US audience used Wikipedia a bit more than females. The biggest and clearest gap was in the percentage of people who view fewer than 1 Wikipedia page per week. The gaps were as follows:

  • S1Q1: 25% of males and 58% of females view less than 1 Wikipedia page per week.
  • S2Q1: 32% of males and 42% of females view fewer than 1 Wikipedia page per week.
  • GS: 41% of males and 52% of females view fewer than 1 Wikipedia page per week.

The gaps at the higher end of pages/week were less statistically robust because the percentages were too small, and therefore easily affected by outlier individuals. With that said, the overall pages/week ranges for men were mostly higher than those for women, as expected.

The gender gap in reading (which should be distinguished from the gender gap in editing) is consistent with past research on gender differences in Wikipedia reading and also with the surveys mentioned in the Other surveys section. It’s particularly interesting to compare it against the 2007 and 2011 Pew surveys, both of which target a United States audience.

  • In 2007, 39% of males and 34% of females answered Yes to the question “Do you ever use the Internet to look for information on Wikipedia?”

  • In 2011, 56% of males and 50% of females answered Yes to the same question.

While it’s a bit hard to sensibly compare these magnitudes with the magnitudes we obtained, the data does directionally support the idea of a gender gap in Wikipedia reading.

The gender gap in reading is small compared to the difference with Vipul’s Facebook friends, Slate Star Codex, and the Analytics Mailing List, all of which had 0% or 1% of people viewing less than 1 Wikipedia page per week.

Unfortunately, for the three audiences (Vipul’s Facebook friends, Slate Star Codex, and the Analytics mailing list), we do not have gender data for individual respondents. The audiences from which the respondents were drawn are between 60% and 80% male, but it’s plausible that the actual respondents had a gender proportion outside this range.

Age within the United States

For S1 and S2, the number of people within each age bucket was too small to draw any conclusion. We did notice that on S2, older people were less likely to enter optional comments, but we don’t know why (it could be because of greater difficulty typing rather than anything specific to Wikipedia).

For GS, we saw a clear age gradient. In particular, older people were more likely to select the “Fewer than 1” option for the number of Wikipedia pages they read per week. Here’s a snippet from the Google Surveys results table showing that.

How many distinct Wikipedia pages do you read (at least one sentence of) per week on average?
Audience segment Fewer than 1
GS 18–24 (N=54) 33%
GS 25–34 (N=71) 41%
GS 35–44 (N=69) 51%
GS 45–54 (N=77) 46%
GS 55–64 (N=69) 57%
GS 65+ (N=50) 52%

These age differences are broadly in line with common sense: the Internet is used more by people in younger age groups. For Wikipedia in particular, school and college motivate a lot of Wikipedia use. Hence the general summer dip in pageviews for many Wikipedia articles; see the American Civil War article as an example. The school/college motivation makes Wikipedia more useful to school- and college-age audiences.

Pew results from both 2007 and 2011 confirm the age pattern. The gradients for the two years on the question “Do you ever use the Internet to look for information on Wikipedia?” were as follows:

  • 2007: 18–29 at 44%, 30–49 at 38%, 50–64 at 31%, 65+ at 26%.

  • 2011: 18–29 at 62%, 30–49 at 52%, 50–64 at 49%, 65+ at 33%.

These age differences pale in comparison with the differences with Vipul’s Facebook friends, Slate Star Codex, and the Analytics mailing list, all of which had 1% or less of their users viewing fewer than 1 page per week.

Unfortunately, for the three audiences (Vipul’s Facebook friends, Slate Star Codex, and the Analytics mailing list), we do not have age data for individual respondents. The audiences from which the respondents were drawn are mostly in the 18–24, 25–34, and 35–44 age groups.

Cross-country comparison in perspective

Here’s an ordering by Wikipedia use:

Low-income countries (India, Nigeria, Brazil, Egypt) < Mexico < United States < Audiences such as Vipul’s Facebook friends, Slate Star Codex, Analytics mailing list < Heavy users

Here are the estimates:

  • Low-income countries: 0.05 to 0.6 pages/week per Internet user (based on actual pageview data)
  • Mexico: Around 1 page/week per Internet user (based on actual pageview data)
  • United States: Between 2.85 and 5.51 pages/week per Internet user (based on actual pageview data)
  • Vipul’s Facebook friends and Slate Star Codex: Between 9 and 26 pages/week per Internet user (inferred from survey responses)
  • Heavy users: At least 26 pages/week per Internet user (by definition)

Thus, we see that the gap from the United States average to a heavy user is about the same as the gap from a low-income country to the United States.

Here’s another way of thinking about it. Wikipedia as a whole got about 16 billion pageviews over a recent 30-day period. If Internet users everywhere used it as much as they do in the United States (even at current Internet penetration rates) Wikipedia would get around double that many pageviews, or about 32 billion a month. If Internet users everywhere used Wikipedia as much as Slate Star Codex readers, Wikipedia would get between 150 billion and 300 billion pageviews a month (a number comparable to the total number of Google searches performed worldwide). If everybody in the world had Internet connectivity and used Wikipedia as much as Slate Star Codex readers do, Wikipedia would get between 400 billion and 800 billion monthly pageviews.

Further reading

The making of this post

Document source

The document and all sources used to compile it are available as a GitHub Gist.

The document is also available as a PDF.

Original version and revision history

This post is a fork of an earlier post of Issa Rice available at http://lesswrong.com/r/discussion/lw/nru/wikipedia_usage_survey_results/ and as a PDF at https://files.issarice.com/wikipedia-survey-results.pdf

The source files used to compile the earlier document are available in a GitHub Gist.

The earlier post has the following major revision history:

  • 2016-07-14: Initial public version.
  • 2016-08-27: A summary is added to the top of the post.
  • 2016-10-05: Google Surveys (then Google Consumer Surveys) results are added.

The current version of the post has been written by me (Vipul Naik), with some feedback from Issa Rice. All errors and imperfections are mine.

The main differences with Issa Rice’s most recent public version are:

  • SurveyMonkey Audience responses are also reported by gender.
  • New Readers survey responses are discussed and compared with existing data.
  • Response data has been compared with known information on total pageviews by country.
  • Explicit takeaways have been added.

The reason for publishing this as a fresh post with different authorship is that the changes that were needed were fairly major, and Issa and I were not in full agreement about how to incorporate them into the existing post.

Survey cost

The survey response collection cost was $325, broken down as follows:

  • S1, to SurveyMonkey Audience: $100, for 50 responses at $2 per response
  • S2, to SurveyMonkey Audience: $50, for 50 responses at $1 per response
  • S2, to SurveyMonkey Audience with filters (college-educated, age 18–29): $125, for 50 responses at $1.25 per response
  • GS (Google Surveys): $50, for 500 responses at 10 cents per response

This does not include the cost of the New Readers surveys, which were not borne by us, but which are likely in the tens of thousands of dollars.

License

This document is released to the public domain. Linked and referenced material may be subject to its own copyright restrictions.

Expected Error, or how wrong you expect to be

7 ozziegooen 24 December 2016 10:49PM

Expected value commonly refers to the mean of a distribution that represents the expectations of a future event. It’s much more specific than “mathematical mean”. If one were to ask about the "mean" of a poker hand there would be confusion, but its ‘expected value’ is clear.

 

While expected value is a popular term, the fact that it describes one point value means a lot of useful information is excluded.

 

Say you have two analysts for your Californian flower selling empire. Both hand you forecasts for next year's revenue. One of them tells you that revenue will be between $8 and $12 Million, with an average of $10 million. The other tells you that it will be between -$50 and $70 million, with an average of $10 million. These both have expected values of $10 million, but I would guess that you would be very interested in the size of those ranges. The uncertainty matters.

 

One could of course use standard deviation, variance, or literally hundreds of other parameters to describe this uncertainty. But I would propose that these parameters be umbrellaed under the concept of “expected error.” Typically the expected value gets a lot of attention; after all, that is the term in this arena that we have a name for. So an intuitive counter to this focus is the “expected error,” or how much we expect the expected value to be incorrect. In a different sense, the expected error is the part of an estimate that’s not its expected value.


Expected Error: The expected difference between an expected value and an actual value.

Or, “How wrong do you think your best guess will be?”

 

 

Hypothetically, any measure of statistical dispersion could be used to describe expected error. Standard deviation, interquartile range, entropy, average absolute deviation, etc. Of these, I think that the mean absolute deviation is probably the most obvious measure to use for expected error when using continuous variables. Expected value uses a mean, so the expected error could be the "expected value" of the error between the actual value and the referenced expected value.


The mean absolute deviation could optionally be divided by the mean to get the mean absolute percentage deviation, in cases where the percentage is more useful than the absolute number. So one could say that a specific forecast has an expected error of “50” or “10%” (in the case of the expected value being 500).
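To make this concrete, here is a minimal Python sketch (mine; the normal forecast distribution is a made-up assumption chosen to roughly match the 500 / 10% example above):

```python
import numpy as np

rng = np.random.default_rng(0)
forecast_samples = rng.normal(loc=500, scale=60, size=100_000)  # hypothetical forecast distribution

expected_value = forecast_samples.mean()
expected_error = np.abs(forecast_samples - expected_value).mean()  # mean absolute deviation
expected_error_pct = expected_error / expected_value               # mean absolute percentage deviation

print(f"expected value ~ {expected_value:.0f}")
print(f"expected error ~ {expected_error:.0f} ({expected_error_pct:.0%})")  # roughly 48, or ~10%
```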


The most common way to currently describe expected errors is by using margins of error or confidence intervals. These can get much of the point across in many situations, but not all. Confidence intervals have difficulty representing distributions that aren’t very smooth. For instance, say you have two deals, both with a 98% chance of making $1 million. One of them has a 2% chance of making nothing; the other has a 2% chance of losing $50 million. A 95% confidence interval would treat these two identically. Mean absolute deviation handles this distinction.
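Here is a minimal Python sketch (mine, using the two deals just described) showing that the central 95% interval is identical for both deals while the expected error is very different:

```python
# Two deals: both have a 98% chance of paying $1M; the bad 2% outcome differs.
deals = {
    "deal_A": {1_000_000: 0.98, 0: 0.02},
    "deal_B": {1_000_000: 0.98, -50_000_000: 0.02},
}

def expected_value(dist):
    return sum(x * p for x, p in dist.items())

def expected_error(dist):
    """Mean absolute deviation from the expected value."""
    ev = expected_value(dist)
    return sum(abs(x - ev) * p for x, p in dist.items())

for name, dist in deals.items():
    print(name, round(expected_value(dist)), round(expected_error(dist)))
# Both deals have the same central 95% interval ($1M to $1M, since the bad outcome has
# only 2% probability), but deal_A's expected error is ~$39K while deal_B's is ~$2M.
```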


Estimate Comparison

 

Your two equally-trusted analysts are told to estimate your employee count next year, and return with the expected values of 20 and 43. At this point you can’t really compare them, except for giving each equal weight. With this information alone it’s not very obvious how much they actually agree with each other, or if one spent far longer on the analysis than the other.

Now imagine if the first presented 20 with an expected error of 5, and the other presented 43 with an expected error of 30. Here a story begins to develop. The first figured out some method that made them quite confident. The second wasn’t very sure; a true value of 20 could be reasonable according to the expected error of 30. In this case you’d probably lean closer to 20 than 43.

It’s often fair to say that expected error is negatively correlated with the amount of available information. Say you needed to estimate revenue for 2020. You make a forecast and it has a high expected error. A few years later you have more information, and you make a second forecast with lower expected error. When the time arrives, you may be able to make a measurement with no expected error.


GDP forecast from OECD

 

For instance, in many graphs of future projections, error bars (proportional to expected error) get larger as time goes further into the future.

 

This relationship between the amount of information and the expected error does not always hold. One obvious example is a case where one value seems obvious at first, but upon inspection is disproven, leaving several equally-unlikely options available with similar confidence. While new information should always eliminate possible worlds, probability distributions used in expected values act as very specific lenses on those possible worlds.


Propagation of Expected Error

 

Propagation of error is to propagation of uncertainty what expected value is to mean; it’s somewhat of a specific focus of that concept. The math to understand the propagation of expected error is mostly that for the propagation of uncertainty, but implementation strategy is different.


Most descriptions of the propagation of uncertainty involve understanding how specific margins of error on inputs correspond to margins of error on outputs. The mathematics relating inputs to outputs are well understood, so the main question is how to propagate the error through them.


In cases where expected error is calculated, the specific model used to determine an output may be up for consideration. There could be multiple ways to estimate the same output using a set of inputs. In these cases, propagation of expected error can be used as part of the modeling process to determine which ways are the most preferred.


For instance, say you are attempting to estimate the number of piano tuners in Boston. One approach involves an intuitive guess of the number of people who own pianos. A second uses known piano tuner populations in other cities together with a linear regression. These approaches will result in different expected values and also different expected errors; we could expect the expected error of the regression to be much less than that of the more intuitive and uncertain approach. Discovering this information as part of the modeling process could be used in iterating to find optimal mathematical models.
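A minimal Monte Carlo sketch of this idea follows; all of the distributions and numbers are made-up assumptions, chosen only to show how the expected error of each approach can be compared.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Approach 1: intuitive guess about piano ownership, with wide input uncertainty.
pianos = rng.lognormal(mean=np.log(30_000), sigma=0.8, size=n)  # pianos in Boston (assumed)
tunings_per_year = rng.uniform(0.5, 1.5, size=n)                # tunings per piano per year
jobs_per_tuner = rng.uniform(500, 1_500, size=n)                # jobs one tuner handles per year
tuners_guess = pianos * tunings_per_year / jobs_per_tuner

# Approach 2: regression-style estimate from other cities, with narrower uncertainty.
tuners_regression = rng.normal(loc=40, scale=8, size=n)

def expected_error(samples):
    return np.abs(samples - samples.mean()).mean()  # mean absolute deviation

for name, s in [("guess-based", tuners_guess), ("regression-based", tuners_regression)]:
    print(f"{name}: expected value ~ {s.mean():.0f}, expected error ~ {expected_error(s):.0f}")
```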


Expected Error in Communication

 

In the cases described above, expected values and expected errors referred to specific future outcomes. They can also be applied to one’s expectation of how an agent will understand some communicated information.


Say you consider ‘a few apples’ to be a distribution between 2 and 5. You tell someone else that you have ‘a few apples.’ You probably expect that their definition of ‘a few apples’ will have a somewhat different distribution than yours. This expected difference between their distribution and yours can be considered the expected error of this aspect of the communication.

"General communications system" in Communication in the Presence of Noise by Claude E. Shannon 

 

There is a significant body of work in communication theory about expectations of how noise sources will randomly distort intended signals. In the field of analogue communication, noise can slightly change the resulting signals according to normal or other simple error distributions. This is very similar to the concept of expected error, and the concepts of expected error can be used here.


One could imagine an agent making a forecast with expected error E1, then communicating that over a noisy channel with expected error E2, then that information may be misinterpreted with expected error E3. Here it would be useful to treat each expected error term as being represented in similar ways, so that mathematical assumptions could be made to cover the entire pipeline. This may be a bit of a contrived example, but the point is that these different types of error are often handled differently and discussed using very different terminology, and if that could be changed interesting combinations may emerge.
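As a sketch of how such a pipeline might be modeled (with made-up noise levels, and assuming the three error sources are independent and additive):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
true_value = 100.0

forecast    = true_value  + rng.normal(0, 10, n)  # E1: forecasting error
transmitted = forecast    + rng.normal(0, 3, n)   # E2: noisy channel
interpreted = transmitted + rng.normal(0, 5, n)   # E3: receiver's misinterpretation

for label, x in [("E1 only", forecast), ("E1+E2", transmitted), ("E1+E2+E3", interpreted)]:
    print(label, round(np.abs(x - true_value).mean(), 2))
# For independent normal noise the standard deviations combine in quadrature, so the
# pipeline's overall expected error grows more slowly than the sum of the three parts.
```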


Expected Errors without Expected Values


In communication theory, specific information transferred between two locations isn’t as important as the total noise between the two. Communication systems are optimized to reduce noise without much attention or care about the specific messages transferred. Likewise forecasting and estimation systems could focus on minimizing expected errors.


For instance, someone may be interested in general forecast accuracy, so they may take a survey of the expected errors of a class of estimates of similar complexity within an organization. In other situations they could create grading rubrics or ontologies so as to minimize the expected error.


Comparisons to Risk and Uncertainty


The concepts of risk and uncertainty are similar to expected error, so I would like to highlight the differences. First, the terms risk and uncertainty are both used for many different purposes with slight variations, and have very confusing sets of opposing popular definitions. They both have long histories that tie them to conceptual baggage. As a new term, expected error would have none of that and can be defined separately from existing expectations.


According to some definitions of risk, risk can be used for both positive and negative outcomes that are uncertain. However, it still strongly implies predictions of future things of consequential impact. If you estimated the number of piano tuners in Boston to be between 5 and 500 for a fun stats problem, I imagine you wouldn’t label that answer as being ‘high risk.’


Uncertainty is closer to the concept, but in some cases is awkward. First, it should be mentioned that there is significant literature that assumes that uncertainty is defined as being unquantifiable. Second, in discussions of communication, there is no expected error at the point of a sender, only for receivers. If uncertainty were to be used, it would have to be understood that it exists isolated to specific agents, which I imagine could be a bit counter-intuitive to some. Perhaps ‘expected error’ can be described as analogous to ‘perspective uncertainty’ or similar narrowed concepts.


Conclusion

 

While I am reluctant to propose a new term like expected error, I must say that I’ve personally experienced great frustration discussing related concepts without it. In my own thinking, expected error has relevance in such fields as taxonomy, semantics, mathematical modeling, philosophy, and many others.

 

Thanks to Pepe Swer and Linchuan Zhang for offering feedback on an early draft of this.

Custom games that involve skills related to rationality

7 Alexander230 22 December 2016 09:03PM

There are some custom games created by me and some other members of the Moscow LW community. These games involve skills related to rationality, like fallacy detection, or inductive rule-guessing where you try to falsify your hypothesis. We often play these games at our meetups.

I've translated these games to English, so anyone can print the rules and game materials and play.

1. Fallacymania. The main goals of this game are to help people notice fallacies in arguments and, of course, to have fun. The game requires 3-20 players (recommended 4-12), and some materials: printed A3 sheets with fallacies (5-10 sheets), a card deck with fallacies (you can cut one A3 sheet into cards, or print stickers and put them on common playing cards), pens and empty sheets, and 1 card deck of any type with at least 50 cards (optional, for counting guessing attempts). The rules of the game are explained here:

https://drive.google.com/open?id=0BzyKVqP6n3hKY3lQTVBuODRjRU0

This is the sheet of fallacies, you can download it and print on A3 or A2 sheet of paper:

https://drive.google.com/open?id=0BzyKVqP6n3hKRXZ5N2tZcDVlMW8

Also you can use this sheet to create playing cards for debaters.

Here is my github repo for Fallacymania; it contains script that generates fallacy sheets and cards (both English and Russian versions) from text file with fallacies and their descriptions:

https://github.com/Alexander230/fallacymania

And there is electronic version of Fallacymania for Tabletop Simulator:

http://steamcommunity.com/sharedfiles/filedetails/?id=723941480

2. Tower of Chaos. This is a party game where you guess a secret rule by performing experiments with people on a Twister mat. The game requires 2-15 players (recommended 4-7). This game is not as casual as it seems: you have to thoroughly test your hypothesis before stating it if you want to win. Rules are here:

https://drive.google.com/open?id=0BzyKVqP6n3hKYU41VG9nMVpmM1k

3. Scientific Discovery. This is a modification of Zendo, but with simultaneous turns. It also has more focus on good hypothesis testing, and hiding hypotheses and intentions from other players is less important than in the original Zendo.

https://drive.google.com/open?id=0BzyKVqP6n3hKTXlpU0RRTkozd00

You can use any items (coins, paperclips, etc.) to play it as an ordinary tabletop game. Of course, in the tabletop version you will have to disassemble old combinations when you run out of items.

There is electronic version for Tabletop Simulator:

http://steamcommunity.com/sharedfiles/filedetails/?id=820813131

Rules for electronic version:

https://drive.google.com/open?id=0BzyKVqP6n3hKMTdFa1hobE11U1E

 

How to talk rationally about cults

6 Viliam 08 January 2017 08:12PM

In 1978, several hundred people in Jonestown drank poison and died, because their leader told them to. Not everyone who died there was a volunteer; some members of the group objected, and they were killed first by the more dedicated members, who killed themselves later. Also, many children were killed by their parents. Still, hundreds of people died voluntarily, including the leader of the group, Jim Jones.

This is an extreme case. There are many more groups that create a comparable level of devotion in their members, but use it to extract money and services from them. Groups where new members change their personalities, becoming almost "clones" of their leaders; then they break contact with their families and former friends, usually after trying to recruit them for the group, or at least trying to extract from them as many resources as possible for the group. The new personality typically has black-and-white thinking, responds to questions with thought-terminating clichés, and doesn't care much about things unrelated to the group. Sometimes membership in the group is long-term, but commonly the members are worn out and leave the group after a few years, replaced by the fresh members they helped to recruit, so even though the individuals change, the group remains.

continue reading »

[Link] Against Compromise, or, Deciding as a Team without Succombing to Entropy

6 Alexandros 08 January 2017 01:22PM

[Link] 50 things I learned at NIPS AI and machine learning conference 2016

6 morganism 26 December 2016 08:50PM

[Link] Are you being p-hacked? Time to hack back.

6 Jacobian 20 December 2016 04:57PM

Actually Practicing Rationality and the 5-Second Level

5 lifelonglearner 06 January 2017 06:50AM

[I first posted this as a link to my blog post, but I'm reposting as a focused article here that trims some fat of the original post, which was less accessible]


I think a lot about heuristics and biases, and I admit that many of my ideas on rationality and debiasing get lost in the sea of my own thoughts.  They’re accessible, if I’m specifically thinking about rationality-esque things, but often invisible otherwise.  

That seems highly sub-optimal, considering that the whole point of having usable mental models isn’t to write fancy posts about them, but to, you know, actually use them.

To that end, I’ve been thinking about finding some sort of systematic way to integrate all of these ideas into my actual life.  

(If you’re curious, here’s the actual picture of what my internal “concept-verse” (w/ associated LW and CFAR memes) looks like)

 

MLU Mind Map v1.png


Open Image In New Tab for all the details

So I have all of these ideas, all of which look really great on paper and in thought experiments.  Some of them even have some sort of experimental backing.  Given this, how do I put them together into a kind of coherent notion?

Equivalently, what does it look like if I successfully implement these mental models?  What sorts of changes might I expect to see?  Then, knowing the end product, what kind of process can get me there?

One way of looking at it would be to say that if I implemented the techniques well, then I’d be better able to tackle my goals and get things done.  Maybe my productivity would go up.  That sort of makes sense.  But this tells us nothing about how I’d actually go about using such skills.

We want to know how to implement these skills and then actually utilize them.

Yudkowsky gives a highly useful abstraction when he talks about the five-second level.  He gives some great tips on breaking down mental techniques into their component mental motions.  It’s a step-by-step approach that really goes into the details of what it feels like to undergo one of the LessWrong epistemological techniques.  We’d like our mental techniques to be actual heuristics that we can use in the moment, so having an in-depth breakdown makes sense.

Here’s my attempt at a 5-second-level breakdown for Going Meta, or "popping" out of one's head to stay mindful of the moment:

  1. Notice the feeling that you are being mentally “dragged” towards continuing an action.
    1. (It can feel like an urge, or your mind automatically making a plan to do something.  Notice your brain simulating you taking an action without much conscious input.)
  2. Remember that you have a 5-second-level series of steps to do something about it.
  3. Feel aversive towards continuing the loop.  Mentally shudder at the part of you that tries to continue.
  4. Close your eyes.  Take in a breath.
  5. Think about what 1-second action you could take to instantly cut off the stimulus from whatever loop you’re stuck in. (EX: Turning off the display, closing the window, moving to somewhere else).
  6. Tense your muscles and clench, actually doing said action.
  7. Run a search through your head, looking for an action labeled “productive”.  Try to remember things you’ve told yourself you “should probably do” lately.  
    1. (If you can’t find anything, pattern-match to find something that seems “productive-ish”.)
  8. Take note of what time it is.  Write it down.
  9. Do the new thing.  Finish.
  10. Note the end time.  Calculate how long you did work.

Next, the other part is actually accessing the heuristic in the situations where you want it.  We want it to be habitual.

After doing some quick searches on the existing research on habits, it appears that many of the links go to Charles Duhigg, author of The Power of Habit, or B J Fogg of Tiny Habits. Both models focus on two things: Identifying the Thing you want to do.  Then setting triggers so you actually do It.  (There’s some similarity to CFAR’s Trigger Action Plans.)  

B J’s approach focuses on scaffolding new habits into existing routines, like brushing your teeth, which are already automatic.  Duhigg appears to be focused more on reinforcement and rewards, with several nods to Skinner.  CFAR views actions as self-reinforcing, so the reward isn’t even necessary— they see repetition as building automation.

Overlearning the material also seems to be useful in some contexts, for skills like acquiring procedural knowledge.  And mental notions do seem to be more like procedural knowledge.

For these mental skills specifically, we’d want them to go off irrespective of time, so anchoring them to an existing routine might not be best.  Having them trigger as a response to an internal state (EX: “When I notice myself being ‘dragged’ into a spiral, or automatically making plans to do a thing”) may be more useful.


(Follow-up post forthcoming on concretely trying to apply habit research to implementing heuristics.)

The time you have

5 Elo 05 January 2017 02:13AM

Original post: http://bearlamp.com.au/the-time-you-have/


Part 1: Exploration-Exploitation

Part 2: Bargaining Trade-offs to your brain.

Part 2a: Empirical time management

Part 3: The time that you have


There is a process called The Immunity to Change by Robert Kegan.  The process is designed to deal with personal problems that are stubborn.  The first step in the process is to make a list of all the things that you are doing or not doing that do not contribute to the goal.  As you go through the process you analyse why you do these things based on what it feels like to do them.

The process is meant to be done with structure but can be done simply by asking.  Yesterday I asked someone who said he ate sugar, ate carbs, and didn't exercise.  Knowing this alone doesn't solve the problem but it helps.

The ITC process was generated by observing patients and therapists for thousands of hours across thousands of cases.  Kegan observed what seemed to be effective in bringing about change in people, and generated this process to assist in doing so.  The ITC hits on a fundamental universal.  If you read my brief guide on Empirical time management, as well as part 1 - exploration-exploitation of this series, both speak to this universal: namely, whatever we do with our time trades off against everything we are choosing not to do with our time.  It's a trade-off between our values, and in ITC the work is often about discovering the hidden counter-commitments to the goals.


The interesting thing about what you end up doing with your time is that these are the things that form your revealed preferences.  Revealed preference theory is an economic theory that differentiates between people's stated preferences and their actual actions and behaviours.  It's all good and well to say that your preferences are one thing, but if you never end up acting on them, your revealed preferences are in fact something entirely different.

For example - if you say you want to be a healthy person, and yet you never find yourself doing the things that you say you want to do in order to be healthy, your revealed preferences suggest that you are in fact not acting like a healthy person.  If you live to the ripe old age of 55 and the heavy weight of 130kg, and you never end up exercising several times a week or eating healthy food, that means your health goals were a rather weak preference compared to the things you actually ended up doing (eating plenty and not keeping fit).

It's important to note that revealed preferences are distinct from stated preferences; they are their own subset.  Revealed preferences are just another description that informs the map of "me as a person".  In many ways, a revealed preference is much more real than a simple preference that does not actually come about.  On a philosophical level, imagine we have a LoudMouthBot, and all it does is declare its preference for things: "I want everyone to be friends", "you need to be friends with me".  However it never does anything.  You can log into the bot's IRC channel and see it declaring preferences, day in, day out, hour after hour, and yet never acting on those preferences.  He's just a bot, spitting out words that are preferences (almost analogous to a p-zombie).  You could look at LoudMouthBot from the outside and say, "all it does is spew text into a text chat", and that would be an observation which for all purposes can be taken as true.  In contrast, AgentyBot doesn't just declare preferences; AgentyBot knows the litany of truth.

If the sky is blue

I desire to believe that the sky is blue,

If the sky is not blue

I desire to believe that the sky is not blue.

Or for this case; a litany of objectivity,

If my revealed preferences show that I desire this goal

I desire to know that is my goal,

If my revealed preferences show that I do not desire this goal

I desire to know that is not my goal.


Revealed preferences work in two directions.  On the one hand you can discover your revealed preferences and let that inform your future judgements and future actions.  On the other hand you can make your revealed preferences show that they line up with your goal.

A friend asked me how she should find her purpose.  Easier said than done, right?  That's why I suggested an exercise that does the first of the two.  In contrast, if you already know your goals, you want to take stock of what you are doing and align it with those goals.

How?

I already covered how to empirically assess your time.  That is the first step in taking stock of what you are doing.

The second step is to figure out your desired goals.  Unfortunately, how to do that is not always obvious.  Some people can literally take five minutes and a piece of paper and list off their goals.  For everyone else I have some clues in the form of the list of common human goals.  By going down the list of goals that people commonly pursue, you can cue your sense of which things you care about and figure out which ones matter to you.  There are other exercises, but I take it as read that knowing what your goals are is important.  Once you have your list of goals, consider estimating what fraction of your time you want to devote to each of them.

The third step is one that I have yet to write about in detail.  Your job is to compare your list of goals against your record of time use, and work out which object-level tasks move you towards your goals and which of your current activities do not.

Everything that you do takes time.  Any goal you want to move towards takes time; if you spend your time on a task serving one goal and not on a task serving another goal, you are preferring the task you are doing over the other.
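Since the third step is mostly bookkeeping, here is a minimal sketch of what that comparison could look like in Python.  Everything in it is hypothetical: the goal names, desired fractions and logged hours are placeholders you would swap for your own goal list and time log.

```python
# A minimal sketch of comparing desired time fractions with actual time use.
# All goal names and numbers below are made-up placeholders for illustration.

desired = {            # fraction of time you say you want to give each goal
    "health": 0.20,
    "career": 0.40,
    "relationships": 0.25,
    "learning": 0.15,
}

actual_hours = {       # hours logged against each goal this week
    "health": 2,
    "career": 45,
    "relationships": 5,
    "learning": 1,
    "untagged": 30,    # time that served none of your stated goals
}

total = sum(actual_hours.values())

for goal, want in sorted(desired.items(), key=lambda kv: -kv[1]):
    got = actual_hours.get(goal, 0) / total
    print(f"{goal:<14} wanted {want:4.0%}  revealed {got:4.0%}  gap {got - want:+5.0%}")

# Whatever sits in 'untagged' is also a revealed preference: time you
# preferred over every goal on your list.
print(f"{'untagged':<14} wanted {0:4.0%}  revealed {actual_hours['untagged'] / total:4.0%}")
```

The gap column just makes the trade-offs visible; a large negative gap on a goal you claim to care about is your revealed preferences talking.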

If these are your revealed preferences, what do you reveal that you care about?


I believe that each of us has potential.  That word is an applause light; on its own, "potential" doesn't really mean anything yet.  I believe that each of us could:

  1. Define what we really care about.
  2. Define what results we think we can aim for within what we really care about
  3. Define what actions we can take to yield a trajectory towards those results
  4. Stick to it because it's what we really want to do.

That's what's important, right?  Doing the work you value because it leads towards your goals (which are the things you care about).

If you are not doing that, then your revealed preferences are showing that you are not being very strategic.  If you find parts of your brain doing what they want to the detriment of your other goals, you need to reason with them.  Use the power of VoI, treat this as an exploration-exploitation problem, and run some experiments (post coming soon).

This whole process - defining what you really care about and then heading towards it - needs doing now; until you do it, you are making bad trade-offs.


Meta: this is part 3 of 4 of this series.

Meta: this took 5+ hours to piece together.  I am not yet very good at staying on task when I don't know how to put the right words in the right order.  I guess I need more practice.  What I usually do is take small breaks and come back to it.

[Link] Mysterious Go Master Blitzes Competition, Rattles Game Community

5 scarcegreengrass 04 January 2017 05:18PM

Progress and Prizes in AI Alignment

5 Jacobian 03 January 2017 10:15PM

Edit: In case it's not obvious, I have done limited research on AI alignment organizations and the goal of my post is to ask questions from the point of view of someone who wants to contribute and is unsure how. Read down to the comments for some great info on the topic.

I was introduced to the topic of AI alignment when I joined this very forum in 2014. Two years and one "Superintelligence" later, I decided that I should donate some money to the effort. I knew about MIRI, and I looked forward to reading some research comparing their work to the other organizations working in this space. The only problem is... there really aren't any.

MIRI recently announced a new research agenda focused on "agent foundations". Yet even the Open Philanthropy Project, made up of people who at least share MIRI's broad worldview, can't decide whether that research direction is promising or useless. The Berkeley Center for Human-Compatible AI doesn't seem to have a specific research agenda beyond Stuart Russell. The AI100 Center at Stanford is just kicking off. That's it.

I think that there are two problems here:

 

  1. There's no way to tell which current organization is going to make the most progress towards solving AI alignment.
  2. These organizations are likely to be very similar to each other, not least because they practically share a zipcode. I don't think that MIRI and the academic centers will do the exact same research, but in the huge space of potential approaches to AI alignment they will likely end up pretty close together. Where's the group of evo-psych savvy philosophers who don't know anything about computer science but are working to spell out an approximation of universal human moral intuitions?

It seems like there's a meta-question that needs to be addressed, even before any work is actually done on AI alignment itself:

 

How to evaluate progress in AI alignment?

Any answer to that question, even if not perfectly comprehensive or objective, will enable two things. First of all, it will allow us to direct money (and the best people) to the existing organizations where they'll make the most progress.

More importantly, it will enable us to open up the problem of AI alignment to the world and crowdsource it. 

For example, the XPrize Foundation is a remarkable organization that creates competitions around achieving goals beneficial to humanity, from lunar rovers to ecological monitoring. The prizes have two huge benefits over direct investment in solving an issue:

 

  1. They usually attract a lot more effort than what the prize money itself would pay for. Competitors often spend in aggregate 2-10 times the prize amount in their efforts to win the competition.
  2. The XPrizes attract a wide variety of creative entrants from around the world, because they only describe what needs to be done, not how.

So, why isn't there an XPrize for AI safety? You need very clear guidelines to create an honest competition, like "build the cheapest spaceship that can take 3 people to 100km and be reused within 2 weeks". It doesn't seem like we're close to being able to formulate anything similar for AI alignment. It also seems that if anyone will have good ideas on the subject, it will be the people on this forum. So, what do y'all think?

Can we come up with creative ways to objectively measure some aspect of progress on AI safety, enough to set up a competition around it?

 

[Link] Metaknowledge - Improving on the Wisdom of Crowds

5 Houshalter 01 January 2017 09:25PM

[Link] I'm Not An Effective Altruist Because I Prefer...

5 ozymandias 28 December 2016 10:39PM

Guilt vs Shame, Pride vs Joy?

5 ProofOfLogic 19 December 2016 08:00PM

[Epistemic status: speculative. My goal here is only to articulate a plausible hypothesis given the data.]

The Moral Economy, which I mentioned in my post about my recent reading list, talks quite a bit about the crowding-out effect, in which placing external incentives on behavior decreases intrinsic drive for that behavior. This can cause incentives to have a smaller effect on behavior than classical economic theory would predict, or even have the complete opposite effect.

The author's main source of data is a large number of experiments in which participants across the world play simple games such as prisoner's dilemma, with modifications such as allowing players to punish each other or give bonuses for good behavior. The crowding-out effect is the most frequently observed phenomenon, but it isn't a predictable thing by a long shot: he observes a wide variety of behavior, even including what he calls "crowding in" (incentives increasing intrinsic pro-social motivation). So, there's one puzzle: what creates the different response patterns?

Another puzzle: the crowding-out effect suggests, heuristically, that capitalism would decrease pro-social motives. Incentivising behaviors with money made people more selfish in the experiments. Other priming experiments suggest that people get more selfish with the mere mention of money. Furthermore, experiments show that there is a "spillover" effect: playing games which crowd out non-selfish motives makes people play more selfishly in subsequent games. So, one might suspect that long-term exposure to capitalism would result in long-term crowding out, creating more selfish behavior in the games.

The opposite is true: exposure to capitalism highly correlates with altruistic behavior in these games. Hunter-gatherer societies which are based on mutual sharing of food are among the least cooperative, when it comes to the games. Why?

The book speculates that advanced economies engender a much higher degree of trust in strangers. If you regularly have economic interactions with strangers without being conned, you come to expect (and provide) a level of common decency.

Experiments also show that crowding-in, rather than crowding-out, can be engendered by allowing participants to discuss what actions should be punished before they start play, rather than making them anonymously dole out punishment without group consent. People from market economies are more likely to act like such an agreement already exists, though, while people from less market-based societies are likely to treat punishments as an insult and react by cooperating less.

It sounds to me like people from free-market societies, or people allowed to discuss the system of punishments before the game, internalize the incentives: they see it as consistent with their intrinsic motives, and so they feel guilty if they are punished, and respond positively by shifting their behavior to be more pro-social. (This is borne out by the data.)

When I say "guilty", I have in mind the technical distinction between guilt and shame; guilt is about bad actions, whereas shame is a feeling that you yourself are bad. Shame is low self-esteem.

The research in Self Theories (also discussed in my earlier post) suggests that a shame response is connected to goals such as impressing others, getting good grades, and looking smart. In other words, extrinsic motivators. The students in those studies who reacted well to failure didn't so much have high self-esteem; rather, they were not focused on self-esteem. They were focused on intrinsic motivators. This allowed them to take the feedback provided by failure well, converting it into information about what behaviors don't work (so, more in the realm of guilt) rather than information about them being good or bad (shame).

Both guilt and shame are internalizations of negative reinforcement. However, shame is more likely to lead to learned helplessness, while guilt is more likely to lead to adaptive behaviors.

This reminds me of Goals as Excuses or Guides by Fishbach & Dhar. They talk about a similar distinction with respect to positive rather than negative reinforcement. When we succeed, our subsequent behavior depends on what we think that success meant. If we interpret it as progress on a goal, moral self-licensing is more likely to occur. This is a phenomenon where you perceive yourself as good, and therefore, allow yourself to do more bad things. A classic example would be someone trying to quit smoking who feels good about going for a day without smoking, and rewards themselves with a smoke "because they deserve it". Fishbach & Dhar find that re-framing the achievement in terms of showing commitment to values, rather than progress toward goals, has a tendency to reinforce the behavior rather than the paradoxical self-licensing effect.

To make an analogy to the guilt vs shame distinction: moral self-licensing happens when we interpret our actions as meaning we are good, whereas reinforcement of the behavior happens when we think the actions are good. Again we can see the connection to self-esteem. High self-esteem seems as problematic as low self-esteem: it causes us to paradoxically work against ourselves. Concentrating on how our actions relate to what we value (IE, concentrating on our intrinsic motives) makes self-esteem less relevant and our actions less self-contradictory.

Human motivation seems to be really complicated! To summarize my hypothesis: extrinsic motivations crowd out intrinsic motivations, which can cause incentives to paradoxically have the opposite of the intended effect. However, this is not always observed in the experiments in The Moral Economy because sometimes people internalize incentives. It seems incentives can be internalized in at least two different ways: they can be connected to actions, or they can be connected to self-esteem. Connecting negative reinforcement to self-esteem is called shame in the psychological literature, and leads to learned helplessness. Internalizing negative reinforcement via a connection to actions is called guilt, and seems to be more adaptive. Connecting positive reinforcement to self-esteem could be called pride. It tends to make us work against our own goals via moral self-licensing. For the sake of rounding out the set of terms, I'll call the internalization of positive reinforcement via a connection between our actions and our values joy. Joy reinforces successful behaviors, making our actions tend to be more consistent with our values.

Stupid Questions December 2016

5 pepe_prime 19 December 2016 02:41PM

This thread is for asking any questions that might seem obvious, tangential, silly or what-have-you. Don't be shy, everyone has holes in their knowledge, though the fewer and the smaller we can make them, the better.

Please be respectful of other people's admitting ignorance and don't mock them for it, as they're doing a noble thing.

To any future monthly posters of SQ threads, please remember to add the "stupid_questions" tag.

[Link] Honesty and perjury

4 Benquo 17 January 2017 08:08AM
