
Less Wrong is a community blog devoted to refining the art of human rationality. Please visit our About page for more information.

[moderator action] Eugine_Nier is now banned for mass downvote harassment

98 Kaj_Sotala 03 July 2014 12:04PM

As previously discussed, on June 6th I received a message from jackk, a Trike Admin. He reported that the user Jiro had asked Trike to carry out an investigation into the retributive downvoting that Jiro had been subjected to. The investigation revealed that the user Eugine_Nier had downvoted over half of Jiro's comments, amounting to hundreds of downvotes.

I asked the community for guidance on dealing with the issue, and while the matter was being discussed, I also reviewed previous discussions about mass downvoting and looked for other people who mentioned being the victims of it. I asked Jack to compile reports on several other users who mentioned having been mass-downvoted, and it turned out that Eugine was also overwhelmingly the biggest downvoter of users David_Gerard, daenarys, falenas108, ialdabaoth, shminux, and Tenoke. As this discussion was going on, it turned out that user Ander had also been targeted by Eugine.

I sent two messages to Eugine, requesting an explanation. I received a response today. Eugine admitted his guilt, expressing the opinion that LW's karma system was failing to carry out its purpose of keeping out weak material and that he was engaged in a "weeding" of users who he did not think displayed sufficient rationality.

Needless to say, it is not the place of individual users to unilaterally decide that someone else should be "weeded" out of the community. The Less Wrong content deletion policy contains this clause:

Harassment of individual users.

If we determine that you're e.g. following a particular user around and leaving insulting comments to them, we reserve the right to delete those comments. (This has happened extremely rarely.)

Although the wording does not explicitly mention downvoting, harassment by downvoting is still harassment. Several users have indicated that they have experienced considerable emotional anguish from the harassment, and have in some cases been discouraged from using Less Wrong at all. This is not a desirable state of affairs, to say the least.

I was originally given my moderator powers on a rather ad-hoc basis, with someone awarding mod privileges to the ten users with the highest karma at the time. The original purpose for that appointment was just to delete spam. Nonetheless, since retributive downvoting has been a clear problem for the community, I asked the community for guidance on dealing with the issue. The rough consensus of the responses seemed to authorize me to deal with the problem as I deemed appropriate.

The fact that Eugine remained quiet about his guilt until directly confronted with the evidence, despite several public discussions of the issue, is indicative of him realizing that he was breaking prevailing social norms. Eugine's actions have worsened the atmosphere of this site, and that atmosphere will remain troubled for as long as he is allowed to remain here.

Therefore, I now announce that Eugine_Nier is permanently banned from posting on LessWrong. This decision is final and will not be changed in response to possible follow-up objections.

Unfortunately, it looks like while a ban prevents posting, it does not actually block a user from casting votes. I have asked jackk to look into the matter and find a way to actually stop the downvoting. Jack indicated earlier on that it would be technically straightforward to apply a negative karma modifier to Eugine's account, and wiping out Eugine's karma balance would prevent him from casting future downvotes. Whatever the easiest solution is, it will be applied as soon as possible.

EDIT 24 July 2014: Banned users are now prohibited from voting.

[meta] Future moderation and investigation of downvote abuse cases, or, I don't want to deal with this stuff

44 Kaj_Sotala 17 August 2014 02:40PM

Since the episode with Eugine_Nier, I have received three private messages from different people asking me to investigate various cases of suspected mass downvoting. And to be quite honest, I don't want to deal with this. Eugine's case was relatively clear-cut, since he had engaged in systematic downvoting on a massive scale, but the new situations are a lot fuzzier and I'm not sure what exactly the rules should be (what counts as a permitted use of the downvote system and what doesn't?).

At least one person has also privately contacted me and offered to carry out moderator duties if I don't want them, but even if I told them yes (on what basis? why them and not someone else?), I don't know what kind of policy I should tell them to enforce. I only happened to be appointed a moderator because I was on the list of the top 10 posters at a particular time, and I don't feel like I should have any particular authority to make the rules. Nor do I feel like I have any good idea of what the rules should be, or who would be the right person to enforce them.

In any case, I don't want to be doing this job, nor do I particularly feel like being responsible for figuring out who should, or how, or what the heck. I've already started visiting LW less often because I dread having new investigation requests to deal with. So if you folks could be so kind as to figure it out without my involvement? If there's a clear consensus that someone in particular should deal with this, I can give them mod powers, or something.

False Friends and Tone Policing

43 palladias 18 June 2014 06:20PM

TL;DR: It can be helpful to reframe arguments about tone, trigger warnings, and political correctness as concerns about false cognates/false friends. You may be saying something that sounds innocuous to you, but translates to something much stronger/more vicious to your audience. Cultivating a debating demeanor that invites people to raise tone concerns can give you more information about the best way to avoid distractions and have a productive dispute.


When I went on a two-week exchange trip to China, it was clear the cultural briefing was informed by whatever mistakes or misunderstandings had occurred on previous trips, recorded and relayed to us so that we wouldn't think, for example, that our host siblings were hitting on us if they took our hands while we were walking.

But the most memorable warning had to do with Mandarin filler words.  While English speakers cover gaps with "uh" "um" "ah" and so forth, the equivalent filler words in Mandarin had an African-American student on a previous trip pulling aside our tour leader and saying he felt a little uncomfortable since his host family appeared to be peppering all of their comments with "nigga, nigga, nigga..."

As a result, we all got warned ahead of time.  The filler word (那个 - nèige) was a false cognate that, although innocuous to the speaker, sounded quite off-putting to us.  It helped to be warned, but it still required some deliberate, cognitive effort to remind myself that I wasn't actually hearing something awful and to rephrase it in my head.

When I've wound up in arguments about tone, trigger warnings, and taboo words, I'm often reminded of that experience in China.  Limiting language can prompt suspicion of closing off conversations, but in a number of cases, when my friends have asked me to rephrase, it's because the word or image I was using was as distracting (however well meant) as 那个 was in Beijing.

It's possible to continue a conversation with someone whose every statement is laced with "nigga", but it takes effort. And no one is obligated to expend their energy on having a conversation with me if I'm making it painful or difficult for them, even if it's as the result of a false cognate (or, as the French would say, false friend) that sounds innocuous to me but awful to my interlocutor. If I want to have a debate at all, I need to stop doing the verbal equivalent of assaulting my friend to make any progress.

It can be worth it to pause and reconsider your language even if the offensiveness of a word or idea is exactly the subject of your dispute.  When I hosted a debate on "R: Fire Eich" one of the early speakers made it clear that, in his opinion, opposing gay marriage was logically equivalent to endorsing gay genocide (he invoked a slippery slope argument back to the dark days of criminal indifference to AIDS).

Pretty much no one in the room (whatever their stance on gay marriage) agreed with this equivalence, but we could all agree it was pretty lucky that this person had spoken early in the debate, so that we understood how he was hearing our speeches. If every time someone said "conscientious objection," this speaker was appending "to enable genocide," the fervor and horror with which he questioned us made a lot more sense, and didn't feel like personal viciousness. Knowing how high the stakes felt to him made it easier to have a useful conversation.

This is a large part of why I objected to PZ Myers's deliberate obtuseness during the brouhaha he sparked when he asked readers to steal him a consecrated Host from a Catholic church so that he could desecrate it. PZ ridiculed Catholics for getting upset that he was going to "hurt" a piece of bread, even though the Eucharist is a fairly obvious example of a false cognate that is heard/received differently by Catholics and atheists. (After all, if it weren't holy to someone, he wouldn't be able to profane it.) In PZ's incident, it was as though we had informed our Chinese hosts about the 那个/nigga confusion, and they had started using it more boisterously, so that it would be clearer to us that they didn't find it offensive.

We were only able to defuse the awkwardness in China for two reasons.

  1. The host family was so nice, aside from this one provocation, that the student noticed he was confused and sought advice.
  2. There was someone on hand who understood both groups well enough to serve as an interpreter.

In an ordinary argument (especially one that takes place online) it's up to you to be visibly virtuous enough that, if you happen to be using a vicious false cognate, your interlocutor will find that odd, not of a piece with your other behavior.

That's one reason my debating friend did bother explaining explicitly the connection he saw between opposition to gay marriage and passive support of genocide -- he trusted us enough to think that we wouldn't endorse the implications of our arguments if he made them obvious. In the PZ dispute, when Catholic readers found him as a result of the stunt, they didn't have any such trust.

It's nice to work to cultivate that trust, and to be the kind of person your friends do approach with requests for trigger warnings and tone shifts.  For one thing, I don't want to use emotionally intense false cognates and not know it, any more than I would want to be gesticulating hard enough to strike my friend in the face without noticing.  For the most part, I prefer to excise the distraction, so it's easier for both of us to focus on the heart of the dispute, but, even if you think that the controversial term is essential to your point, it's helpful to know it causes your friend pain, so you have the opportunity to salve it some other way.  


P.S. Arnold Kling's The Three Languages of Politics is a short read and a nice introduction to what political language you're using that sounds like horrible false cognates to people rooted in different ideologies.

P.P.S. I've cross-posted this on my usual blog, but am trying out cross-posting to Discussion sometimes.

[Meta] The Decline of Discussion: Now With Charts!

42 Gavin 04 June 2014 10:02PM

[Based on Alexandros's excellent dataset.]

I haven't done any statistical analysis, but looking at the charts I'm not sure it's necessary. The discussion section of LessWrong has been steadily declining in participation. My fairly messy spreadsheet is available if you want to check the data or do additional analysis.

Enough talk, you're here for the pretty pictures.

The number of posts has been steadily declining since 2011, though the trend over the last year is less clear. Note that I have excluded all posts with 0 or negative Karma from the dataset.


The total Karma given out each month has similarly been in decline.

Is it possible that there have been fewer posts, but of a higher quality?

No, at least under initial analysis the average Karma seems fairly steady. My prior here is that we're just seeing fewer visitors overall, which leads to fewer votes being distributed among fewer posts for the same average value. I would have expected the average karma to drop more than it did--to me that means that participation has dropped more steeply than mere visitation. Looking at the point values of the top posts would be helpful here, but I haven't done that analysis yet.
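As a minimal sketch of the kind of aggregation behind these charts (the spreadsheet's actual layout is not described here, so the `(month, karma)` record format, the `monthly_stats` helper and the toy numbers are all assumptions for illustration), the per-month post count, total karma and average karma can be computed like this, dropping posts with 0 or negative karma as described above:

```python
from collections import defaultdict


def monthly_stats(posts):
    """Aggregate hypothetical (month, karma) records into per-month stats.

    Returns {month: (post_count, total_karma, average_karma)}, sorted by month.
    Posts with 0 or negative karma are skipped, mirroring the exclusion above.
    """
    counts = defaultdict(int)
    totals = defaultdict(int)
    for month, karma in posts:
        if karma <= 0:
            continue  # excluded from the dataset
        counts[month] += 1
        totals[month] += karma
    return {
        month: (counts[month], totals[month], totals[month] / counts[month])
        for month in sorted(counts)
    }


# Hypothetical toy data, NOT taken from the real dataset.
posts = [
    ("2011-01", 10), ("2011-01", 6), ("2011-01", -2),
    ("2014-01", 8), ("2014-01", 0),
]

for month, (n, total, avg) in monthly_stats(posts).items():
    print(month, n, total, round(avg, 1))
```

In the toy data the later month has fewer posts and less total karma but the same average per post, which is the shape of the pattern described in the paragraph above.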

These charts are very disturbing to me, as someone who has found LessWrong both useful and enjoyable over the past few years. They raise several questions:


  1. What should the purpose of this site be? Is it supposed to be building a movement or filtering down the best knowledge?
  2. How can we encourage more participation?
  3. What are the costs of various means of encouraging participation--more arguing, more mindkilling, more repetition, more off-topic threads, etc?


Here are a few strategies that come to mind:

Idea A: Accept that LessWrong has fulfilled its purpose and should be left to fade away, or allowed to serve as a meetup coordinator and repository of the highest quality articles. My suspicion is that without strong new content and an online community, the strength of the individual meetup communities may wane as fewer new people join them. This is less of an issue for established communities like Berkeley and New York, but more marginal ones may disappear.

Idea B: Allow and encourage submission of articles from elsewhere related to rationalism, artificial intelligence, transhumanism, etc., possibly as a separate category. This is how a site like Hacker News maintains high engagement, even though many of its discussions are endless loops of the same discussion. It can be annoying for the old-timers, but new generations may need to discover things for themselves. Sometimes "put it all in one big FAQ" isn't the most efficient method of teaching.

Idea C: Allow and encourage posts on "political" topics in Discussion (but probably NOT Main). The dangers here might be mitigated by a ban on discussion of current politicians, governments, and issues. "Historians need to have had a decade to mull it over before you're allowed to introduce it as evidence" could be a good heuristic. Another option would be a ban on specific topics that cause the worst mindkilling. Obviously this is overall a dangerous road.

Idea D: Get rid of Open Threads and create a new norm that a discussion post as short as a couple of sentences is acceptable. Open threads get stagnant within a day or two, and are harder to navigate than the discussion page. Moving discussion from the Open Threads to the Discussion section would increase participation if users could be convinced that it was okay to post questions and partly-formed ideas there.

The challenge with any of these ideas is that they will require strong moderation. 

At any rate, this data is enough to convince me that some sort of change is going to be needed in order to put the community on a growth trajectory. That is not necessarily the goal, but at its core LessWrong seems like it has the potential to be a powerful tool for the spreading of rational thought. We just need to figure out how to get it started into its next evolution.

Downvote stalkers: Driving members away from the LessWrong community?

39 Ander 02 July 2014 12:40AM

Last month I saw this post addressing whether the discussion on LessWrong was in decline. As a relatively new user who had only just started to post comments, my reaction was: “I hope that LessWrong isn’t in decline, because the sequences are amazing, and I really like this community. I should try to write a couple articles myself and post them! Maybe I could do an analysis/summary of certain sequences posts, and discuss how they had helped me to change my mind”. I started working on writing an article.

Then I logged into LessWrong and saw that my Karma value was roughly half of what it had been the day before.   Previously I hadn’t really cared much about Karma, aside from whatever micro-utilons of happiness it provided to see that the number slowly grew because people generally liked my comments.   Or at least, I thought I didn’t really care, until my lizard brain reflexes reacted to what it perceived as an assault on my person.


Had I posted something terrible and unpopular that had been massively downvoted during the several days since my previous login?  No, in fact my ‘past 30 days’ Karma was still positive.  Rather, it appeared that everything I had ever posted to LessWrong now had a -1 on it instead of a 0. Of course, my loss probably pales in comparison to that of other, more prolific posters who I have seen report this behavior.

So what controversial subject must I have commented on in order to trigger this assault? Well, let’s see: in the past week I had asked whether anyone had opinions on good interview questions I could ask a software engineering candidate. I posted that I was happy not to have children. And finally, in what appears to me to be by far the most promising candidate, I replied to a comment about global warming data, stating that I routinely saw headlines about data supporting global warming.


Here is our scenario: A new user is attempting to participate on a message board that values empiricism and rationality, and posts that evidence supports that climate change is real. (Wow, really rocking the boat here!) Then, apparently in an effort to ‘win’ this discussion by silencing opposition, someone went and downvoted every comment this user had ever made on the site. Apparently they would like to see LessWrong be a bastion of empiricism and rationality and climate change denial instead? And the way to achieve this is not to have a fair and rational discussion of the existing empirical data, but rather to simply Karmassassinate anyone who would oppose them?


Here is my hypothesis: The continuing problem of karma downvote stalkers is contributing to the decline of discussion on the site. I definitely feel much less motivated to try to contribute anything now, and several other people at LessWrong meetings have told me things such as “I used to post a lot on LessWrong, but then I posted X and got mass downvoted, so now I only comment on Yvain’s blog”. These anecdotes are, of course, only very weak evidence for my claim. I wish I could provide more, but I will have to defer to any readers who can supply it.


Perhaps this post will simply trigger more retribution, or maybe it will trigger an outpouring of support, or perhaps it will just be dismissed by people saying I should’ve posted it to the weekly discussion thread instead. Whatever the outcome, rather than meekly leaving LessWrong and letting my 'stalker' win, I decided to open a discussion about the issue. Thank you!

Against utility functions

38 Qiaochu_Yuan 19 June 2014 05:56AM

I think we should stop talking about utility functions.

In the context of ethics for humans, anyway. In practice I find utility functions to be, at best, an occasionally useful metaphor for discussions about ethics but, at worst, an idea that some people start taking too seriously and which actively makes them worse at reasoning about ethics. To the extent that we care about causing people to become better at reasoning about ethics, it seems like we ought to be able to do better than this.

The funny part is that the failure mode I worry the most about is already an entrenched part of the Sequences: it's fake utility functions. The soft failure is people who think they know what their utility function is and say bizarre things about what this implies that they, or perhaps all people, ought to do. The hard failure is people who think they know what their utility function is and then do bizarre things. I hope the hard failure is not very common. 

It seems worth reflecting on the fact that the point of the foundational LW material discussing utility functions was to make people better at reasoning about AI behavior and not about human behavior. 

Mathematics as a lossy compression algorithm gone wild

35 shminux 06 June 2014 11:53PM

This is yet another half-baked post from my old draft collection, but feel free to Crocker away.


There is an old adage from Eugene Wigner known as the "Unreasonable Effectiveness of Mathematics". Wikipedia:

the mathematical structure of a physical theory often points the way to further advances in that theory and even to empirical predictions.

The way I interpret it is that it is possible to find an algorithm to compress a set of data points in a way that is also good at predicting other data points, not yet observed. In yet other words, a good approximation is, for some reason, sometimes also a good extrapolation. The rest of this post elaborates on this anti-Platonic point of view.
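To make the "compression that also predicts" reading concrete, here is a toy sketch (an illustration added here, not something from the original post): four hypothetical measurements of a falling rock are compressed into a single fitted coefficient of the model d = c·t², and that one number is then used to extrapolate to a time that was never observed. The compression is lossy because the residuals between the data and the fitted curve are thrown away. The measurements and the `fit_coefficient` helper are assumptions for illustration only.

```python
def fit_coefficient(times, distances):
    """Least-squares fit of the one-parameter model d = c * t**2.

    Minimizing sum((d - c * t**2) ** 2) over c gives
    c = sum(d * t**2) / sum(t**4).
    """
    num = sum(d * t**2 for t, d in zip(times, distances))
    den = sum(t**4 for t in times)
    return num / den


# Hypothetical measurements (seconds, metres), roughly consistent with g ~ 9.8 m/s^2.
times = [0.5, 1.0, 1.5, 2.0]
distances = [1.2, 4.9, 11.1, 19.6]

c = fit_coefficient(times, distances)  # four data points compressed into one number
predicted = c * 3.0**2                 # extrapolate to an unobserved time, t = 3 s
print(f"c = {c:.3f}, predicted distance at 3 s = {predicted:.1f} m")
```

The fitted c comes out close to g/2, so the compressed description happens to extrapolate well, which is the sense of "a good approximation is sometimes also a good extrapolation" used above.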

Now, this point of view is not exactly how most people see math. They imagine it as some near-magical thing that transcends science and reality and, when discovered, learned and used properly, gives one limited powers of clairvoyance. While only a select few wizards have the power to discover new spells (they are known as scientists), the rank and file can still use some of the incantations to make otherwise impossible things happen (they are known as engineers).

This metaphysical view is colorfully expressed by Stephen Hawking:

What is it that breathes fire into the equations and makes a universe for them to describe? The usual approach of science of constructing a mathematical model cannot answer the questions of why there should be a universe for the model to describe. Why does the universe go to all the bother of existing?

Should one interpret this as presuming that math, in the form of "the equations", comes first, and only then is there a physical universe for math to describe (for some values of "first" and "then", anyway)? Platonism seems to reach roughly the same conclusion:

Wikipedia defines platonism as

the philosophy that affirms the existence of abstract objects, which are asserted to "exist" in a "third realm distinct both from the sensible external world and from the internal world of consciousness", and is the opposite of nominalism

In other words, math would have "existed" even if there were no humans around to discover it. In this sense, it is "real", as opposed to "imagined by humans". Wikipedia on mathematical realism:

mathematical entities exist independently of the human mind. Thus humans do not invent mathematics, but rather discover it, and any other intelligent beings in the universe would presumably do the same. In this point of view, there is really one sort of mathematics that can be discovered: triangles, for example, are real entities, not the creations of the human mind.

Of course, the debate on whether mathematics is "invented" or "discovered" is very old. Eliezer-2008 chimes in:

To say that human beings "invented numbers" - or invented the structure implicit in numbers - seems like claiming that Neil Armstrong hand-crafted the Moon.  The universe existed before there were any sentient beings to observe it, which implies that physics preceded physicists. 

and later:

The amazing thing is that math is a game without a designer, and yet it is eminently playable.

In the above, I assume that what Eliezer means by physics is not the science of physics (a human endeavor), but the laws according to which our universe came into existence and evolved. These laws are not the universe itself (which would make the statement "physics preceded physicists" simply "the universe preceded physicists", a vacuous tautology), but some separate laws governing it, out there to be discovered. If only we knew them all, we could create a copy of the universe from scratch, if not "for real", then at least as a faithful model. This universe-making recipe is then what physics (the laws, not science) is.

And these laws apparently require mathematics to be properly expressed, so mathematics must "exist" in order for the laws of physics to exist.

Is this the only way to think of math? I don't think so. Let us suppose that the physical universe is the only "real" thing, none of those Platonic abstract objects. Let us further suppose that this universe is (somewhat) predictable. Now, what does it mean for the universe to be predictable to begin with? Predictable by whom or by what? Here is one approach to predictability, based on agency: a small part of the universe (you, the agent) can construct/contain a model of some larger part of the universe (say, the earth-sun system, including you) and optimize its own actions (to, say, wake up the next morning just as the sun rises).

Does waking up on time count as doing math? Certainly not by the conventional definition of math. Do migratory birds do math when they migrate thousands of miles twice a year, successfully predicting that there will be food sources and warm weather once they get to their destination? Certainly not by the conventional definition of math. Now, suppose a ship captain lays a course to follow the birds, using maps, tables and calculations. Does this count as doing math? Why, certainly the captain would say so, even if the math in question is relatively simple. Sometimes the inputs the birds and the humans use are the same: sun and star positions at various times of the day and night, the magnetic field direction, the shape of the terrain.

What is the difference between what the birds are doing and what humans are doing? Certainly both make predictions about the universe and act on them. But birds do this instinctively and humans consciously, by "applying math". That is a statement about differences in cognition, not about some Platonic mathematical objects. One can even say that birds perform the relevant math instinctively. But this is a rather slippery slope: by this definition, amoebas solve the diffusion equation when they move along the sugar gradient toward a food source. While this view has merits, the mathematicians analyzing certain aspects of the Navier-Stokes equations might not take kindly to being compared to a protozoan.

So, just as JPEG is a lossy compression algorithm for the part of the universe that creates an image on our retina when we look at a picture, Newton's laws are a lossy compression algorithm describing how a thrown rock falls to the ground, or how planets go around the Sun. In both cases we, a tiny part of the universe, are able to model and predict a much larger part, albeit with some loss of accuracy.

What would it mean then for a Universe to not "run on math"? In this approach it means that in such a universe no subsystem can contain a model, no matter how coarse, of a larger system. In other words, such a universe is completely unpredictable from the inside. Such a universe cannot contain agents, intelligence or even the simplest life forms. 

Now, to the "gone wild" part of the title. This is where the traditional applied math, like counting sheep, or calculating how many cannons you can arm a ship with before it sinks, or how to predict/cause/exploit the stock market fluctuations, becomes "pure math", or math for math's sake, be it proving the Pythagorean theorem or solving a Millennium Prize problem. At this point the mathematician is no longer interested in modeling a larger part of the universe (except insofar as she predicts that it would be a fun thing to do for her, which is probably not very mathematical).

Now, there is at least one serious objection to this "math is jpg" epistemology. It goes as follows: "in any universe, no matter how convoluted, 1+1=2, so clearly mathematics transcends the specific structure of a single universe". I am skeptical of this logic, since to me 1, +, = and 2 are semi-intuitive models running in our minds, which evolved to model the universe we live in. I can certainly imagine a universe where none of these concepts would be useful in predicting anything, and so they would never evolve in the "mind" of whatever entity inhabits it. To me mathematical concepts are no more universal than moral concepts: sometimes they crystallize into useful models, and sometimes they do not. Just as the human concept of honor would not be useful to spiders, the concept of numbers (which probably is useful to spiders) would not be useful in a universe where size is not a well-defined concept (like something based on a Conformal Field Theory).

So the "Unreasonable Effectiveness of Mathematics" is not at all unreasonable: it reflects the predictability of our universe. Nothing "breathes fire into the equations and makes a universe for them to describe", the equations are but one way a small part of the universe predicts the salient features of a larger part of it. Rather, an interesting question is what features of a predictable universe enable agents to appear in it, and how complex and powerful can these agents get.


LW client-side comment improvements

34 Bakkot 07 August 2014 08:40PM

All of these things I mentioned in the most recent open thread, but since the first one is directly relevant and the comment where I posted it somewhat hard to come across, I figured I'd make a post too.


Custom Comment Highlights

NOTE FOR FIREFOX USERS: this contained a bug, now squashed, that caused the list of comments not to be populated automatically (depending on your version of Firefox). I suggest reinstalling. Sorry, no automatic updates unless you use the Chrome extension (though with >50% probability there will be no further updates).

You know how the highlight for new comments on Less Wrong threads disappears if you reload the page, making it difficult to find those comments again? Here is a userscript you can install to fix that (provided you're on Firefox or Chrome). Once installed, you can set the date after which comments are highlighted, and easily scroll to new comments. See screenshots. Installation is straightforward (especially for Chrome, since I made an extension as well).

Bonus: works even if you're logged out or don't have an account, though you'll have to set the highlight time manually.
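The core idea, highlighting every comment posted after a date you choose rather than relying on the site's one-off "new" marker, can be sketched in a few lines. This is a hypothetical Python illustration of the filtering logic only, not the actual userscript (which is JavaScript running in the browser); the `Comment` record, its fields and the `comments_to_highlight` helper are assumptions.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Comment:
    comment_id: str      # hypothetical identifier for a comment on the page
    posted_at: datetime  # time the comment was posted


def comments_to_highlight(comments, cutoff):
    """Return IDs of comments posted after `cutoff`, oldest first.

    Because the cutoff is chosen by the user, the result does not depend on
    any per-visit "new" marker, so it is the same after a page reload. The
    oldest-first order is what a "scroll to next new comment" button would
    step through.
    """
    newer = [c for c in comments if c.posted_at > cutoff]
    newer.sort(key=lambda c: c.posted_at)
    return [c.comment_id for c in newer]


# Hypothetical comments and cutoff, for illustration only.
comments = [
    Comment("c1", datetime(2014, 8, 6, 12, 0)),
    Comment("c2", datetime(2014, 8, 7, 9, 30)),
    Comment("c3", datetime(2014, 8, 7, 8, 15)),
]
cutoff = datetime(2014, 8, 7, 0, 0)  # e.g. the time of your last visit
print(comments_to_highlight(comments, cutoff))
```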

Delay Before Commenting

Another script to add a delay and checkbox reading "In posting this, I am making a good-faith contribution to the collective search for truth." before allowing you to comment. Made in response to a comment by army1987.

Slate Star Codex Comment Highlighter

Edit: You no longer need to install this, since Scott's added it to his blog. Unless you want the little numbers in the title bar.

Yet another script, to make finding recent comments over at Slate Star Codex a lot easier. Also comes in Chrome extension flavor. See screenshots. Not directly relevant to Less Wrong, but there's a lot of overlap in readership, so you may be interested.

Note for LW Admins / Yvain
These would be straightforward to make available to all users (on sufficiently modern browsers), since they're just a bit of JavaScript getting injected. If you'd like to, feel free, and message me if I can be of help.

This is why we can't have social science

34 Costanza 13 July 2014 09:04PM

Jason Mitchell is [edit: has been] the John L. Loeb Associate Professor of the Social Sciences at Harvard. He has won the National Academy of Sciences' Troland Award as well as the Association for Psychological Science's Janet Taylor Spence Award for Transformative Early Career Contribution.

Here, he argues against the principle of replicability of experiments in science. Apparently, it's disrespectful, and presumptively wrong.

Recent hand-wringing over failed replications in social psychology is largely pointless, because unsuccessful experiments have no meaningful scientific value.

Because experiments can be undermined by a vast number of practical mistakes, the likeliest explanation for any failed replication will always be that the replicator bungled something along the way. Unless direct replications are conducted by flawless experimenters, nothing interesting can be learned from them.

Three standard rejoinders to this critique are considered and rejected. Despite claims to the contrary, failed replications do not provide meaningful information if they closely follow original methodology; they do not necessarily identify effects that may be too small or flimsy to be worth studying; and they cannot contribute to a cumulative understanding of scientific phenomena.

Replication efforts appear to reflect strong prior expectations that published findings are not reliable, and as such, do not constitute scientific output.

The field of social psychology can be improved, but not by the publication of negative findings. Experimenters should be encouraged to restrict their “degrees of freedom,” for example, by specifying designs in advance.

Whether they mean to or not, authors and editors of failed replications are publicly impugning the scientific integrity of their colleagues. Targets of failed replications are justifiably upset, particularly given the inadequate basis for replicators’ extraordinary claims.

This is why we can't have social science. Not because the subject is not amenable to the scientific method -- it obviously is. People are conducting controlled experiments and other people are attempting to replicate the results. So far, so good. Rather, the problem is that at least one celebrated authority in the field hates that, and would prefer much, much more deference to authority.

[LINK] Claustrum Stimulation Temporarily Turns Off Consciousness in an otherwise Awake Patient

34 shminux 04 July 2014 08:00PM

This paper, or more often the New Scientist's exposition of it, is being discussed online and is rather topical here. In a nutshell, stimulating one small but central area of the brain reversibly rendered one epilepsy patient unconscious without disrupting wakefulness. Impressively, this phenomenon has apparently been hypothesized before, just never tested (because it's hard and usually unethical). A quote from the New Scientist article (emphasis mine):

One electrode was positioned next to the claustrum, an area that had never been stimulated before.

When the team zapped the area with high frequency electrical impulses, the woman lost consciousness. She stopped reading and stared blankly into space, she didn't respond to auditory or visual commands and her breathing slowed. As soon as the stimulation stopped, she immediately regained consciousness with no memory of the event. The same thing happened every time the area was stimulated during two days of experiments (Epilepsy and Behavior,

To confirm that they were affecting the woman's consciousness rather than just her ability to speak or move, the team asked her to repeat the word "house" or snap her fingers before the stimulation began. If the stimulation was disrupting a brain region responsible for movement or language she would have stopped moving or talking almost immediately. Instead, she gradually spoke more quietly or moved less and less until she drifted into unconsciousness. Since there was no sign of epileptic brain activity during or after the stimulation, the team is sure that it wasn't a side effect of a seizure.

If confirmed, this hints at several interesting points. For example, a complex enough brain is not sufficient for consciousness; a sort of command-and-control structure is required as well, even if it is relatively small. The low-consciousness state of late-stage dementia sufferers might be due to damage specifically to the claustrum, not just to overall brain deterioration. The researchers speculate that stimulating the area in vegetative-state patients might help "push them out of this state". From an AI research perspective, understanding the difference between wakefulness and consciousness might be interesting, too.


[LINK] Speed superintelligence?

33 Stuart_Armstrong 14 August 2014 03:57PM

From Toby Ord:

Tool assisted speedruns (TAS) are when people take a game and play it frame by frame, effectively providing super reflexes and forethought, where they can spend a day deciding what to do in the next 1/60th of a second if they wish. There are some very extreme examples of this, showing what can be done if you really play a game perfectly. For example, this video shows how to win Super Mario Bros 3 in 11 minutes. It shows how different optimal play can be from normal play. In particular, on level 8-1, it gains 90 extra lives by a sequence of amazing jumps.

Other TAS runs get more involved and start exploiting subtle glitches in the game. For example, this page talks about speed running NetHack, using a lot of normal tricks, as well as luck manipulation (exploiting the RNG) and exploiting a dangling pointer bug to rewrite parts of memory.

Though there are limits to what AIs could do with sheer speed, it's interesting that great performance can be achieved with speed alone, that this allows different strategies from usual ones, and that it allows the exploitation of otherwise unexploitable glitches and bugs in the setup.

What resources have increasing marginal utility?

33 Qiaochu_Yuan 14 June 2014 03:43AM

Most resources you might think to amass have decreasing marginal utility: for example, a marginal extra $1,000 means much more to you if you have $0 than if you have $100,000. That means you can safely apply the 80-20 rule to most resources: you only need to get some of the resource to get most of the benefits of having it.
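
As a concrete illustration of decreasing marginal utility, the Python sketch below assumes a logarithmic utility of wealth, log(1 + wealth); that functional form is my assumption (a common textbook choice), not something the post specifies. It compares the utility gained from the same extra $1,000 at $0 and at $100,000.

```python
import math


def log_utility(wealth: float) -> float:
    """Assumed concave utility of wealth: log(1 + wealth), so $0 is well defined."""
    return math.log1p(wealth)


def marginal_gain(wealth: float, extra: float = 1_000) -> float:
    """Utility gained by adding `extra` dollars on top of `wealth`."""
    return log_utility(wealth + extra) - log_utility(wealth)


gain_from_zero = marginal_gain(0)          # extra $1,000 starting from $0
gain_from_rich = marginal_gain(100_000)    # extra $1,000 starting from $100,000
print(f"gain starting at $0:       {gain_from_zero:.4f}")
print(f"gain starting at $100,000: {gain_from_rich:.4f}")
print(f"ratio: {gain_from_zero / gain_from_rich:.0f}x")
```

Under this assumed log utility, the first $1,000 is worth several hundred times as much as the $1,000 added at $100,000; any strictly concave utility function shows the same qualitative pattern.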

At the most recent CFAR workshop, Val dedicated a class to arguing that one resource in particular has increasing marginal utility, namely attention. Initially, efforts to free up your attention have little effect: the difference between juggling 10 things and 9 things is pretty small. But once you've freed up most of your attention, the effect is larger: the difference between juggling 2 things and 1 thing is huge. Val also argued that because of this funny property of attention, most people likely undervalue freeing up attention by orders of magnitude.

During a conversation later in the workshop I suggested another resource that might have increasing marginal utility, namely trust. A society where people abide by contracts 80% of the time is not 80% as good as a society where people abide by contracts 100% of the time; most of the societal value of trust (e.g. decreasing transaction costs) doesn't seem to manifest until people are pretty close to 100% trustworthy. The analogous way to undervalue trust is to argue that e.g. cheating on your spouse is not so bad, because only one person gets hurt. But cheating on spouses in general undermines the trust that spouses should have in each other, and the cumulative impact of even 1% of spouses cheating on the institution of marriage as a whole could be quite negative. (Lots of things about the world make more sense from this perspective: for example, it seems like one of the main practical benefits of religion is that it fosters trust.) 
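
A toy model (my own assumption, not Val's or the post's argument) shows one way trust could have increasing marginal utility: suppose a venture only pays off if n independent agreements are all honored, each with probability p. Its success probability is then p**n, which is convex in p, so the last few percentage points of trustworthiness buy the most value. The number of agreements below is hypothetical.

```python
def venture_success(p: float, n: int) -> float:
    """Toy model: probability that all n independent agreements are honored."""
    return p ** n


N_AGREEMENTS = 10  # hypothetical number of agreements a venture depends on

for trust in (0.80, 0.90, 0.95, 0.99, 1.00):
    value = venture_success(trust, N_AGREEMENTS)
    print(f"trust {trust:.2f} -> venture succeeds with probability {value:.3f}")
```

With ten agreements and 80% trust, the venture fails most of the time, which is one way to read the claim that an 80%-trustworthy society is not 80% as good as a fully trustworthy one.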

What other resources have increasing marginal utility? How undervalued are they? 

Six Plausible Meta-Ethical Alternatives

32 Wei_Dai 06 August 2014 12:04AM

In this post, I list six metaethical possibilities that I think are plausible, along with some arguments or plausible stories about how/why they might be true, where that's not obvious. A lot of people seem fairly certain in their metaethical views, but I'm not and I want to convey my uncertainty as well as some of the reasons for it.

  1. Most intelligent beings in the multiverse share similar preferences. This came about because there are facts about what preferences one should have, just like there exist facts about what decision theory one should use or what prior one should have, and species that manage to build intergalactic civilizations (or the equivalent in other universes) tend to discover all of these facts. There are occasional paperclip maximizers that arise, but they are a relatively minor presence or tend to be taken over by more sophisticated minds.
  2. Facts about what everyone should value exist, and most intelligent beings have a part of their mind that can discover moral facts and find them motivating, but those parts don't have full control over their actions. These beings eventually build or become rational agents with values that represent compromises between different parts of their minds, so most intelligent beings end up having shared moral values along with idiosyncratic values.
  3. There aren't facts about what everyone should value, but there are facts about how to translate non-preferences (e.g., emotions, drives, fuzzy moral intuitions, circular preferences, non-consequentialist values, etc.) into preferences. These facts may include, for example, what is the right way to deal with ontological crises. The existence of such facts seems plausible because if there were facts about what is rational (which seems likely) but no facts about how to become rational, that would seem like a strange state of affairs.
  4. None of the above facts exist, so the only way to become or build a rational agent is to just think about what preferences you want your future self or your agent to hold, until you make up your mind in some way that depends on your psychology. But at least this process of reflection is convergent at the individual level so each person can reasonably call the preferences that they endorse after reaching reflective equilibrium their morality or real values.
  5. None of the above facts exist, and reflecting on what one wants turns out to be a divergent process (e.g., it's highly sensitive to initial conditions, like whether or not you drank a cup of coffee before you started, or to the order in which you happen to encounter philosophical arguments). There are still facts about rationality, so at least agents that are already rational can call their utility functions (or the equivalent of utility functions in whatever decision theory ends up being the right one) their real values.
  6. There aren't any normative facts at all, including facts about what is rational. For example, it turns out there is no one decision theory that does better than every other decision theory in every situation, and there is no obvious or widely-agreed-upon way to determine which one "wins" overall.

(Note that for the purposes of this post, I'm concentrating on morality in the axiological sense (what one should value) rather than in the sense of cooperation and compromise. So alternative 1, for example, is not intended to include the possibility that most intelligent beings end up merging their preferences through some kind of grand acausal bargain.)

It may be useful to classify these possibilities using labels from academic philosophy. Here's my attempt: 1. realist + internalist 2. realist + externalist 3. relativist 4. subjectivist 5. moral anti-realist 6. normative anti-realist. (A lot of debates in metaethics concern the meaning of ordinary moral language, for example whether they refer to facts or merely express attitudes. I mostly ignore such debates in the above list, because it's not clear what implications they have for the questions that I care about.)

One question LWers may have is, where does Eliezer's metaethics fall into this schema? Eliezer says that there are moral facts about what values every intelligence in the multiverse should have, but only humans are likely to discover these facts and be motivated by them. To me, Eliezer's use of language is counterintuitive, and since it seems plausible that there are facts about what everyone should value (or how each person should translate their non-preferences into preferences) that most intelligent beings can discover and be at least somewhat motivated by, I'm reserving the phrase "moral facts" for these. In my language, I think 3 or maybe 4 is probably closest to Eliezer's position.

Look for the Next Tech Gold Rush?

32 Wei_Dai 19 July 2014 10:08AM

In early 2000, I registered my personal domain name, along with a couple others, because I was worried that the small (sole-proprietor) ISP I was using would go out of business one day and break all the links on the web to the articles and software that I had published on my "home page" under its domain. Several years ago I started getting offers, asking me to sell the domain, and now they're coming in almost every day. A couple of days ago I saw the first six figure offer ($100,000).

In early 2009, someone named Satoshi Nakamoto emailed me personally with an announcement that he had published version 0.1 of Bitcoin. I didn't pay much attention at the time (I was more interested in Less Wrong than Cypherpunks at that point), but then in early 2011 I saw a LW article about Bitcoin, which prompted me to start mining it. I wrote at the time, "thanks to the discussion you started, I bought a Radeon 5870 and started mining myself, since it looks likely that I can at least break even on the cost of the card." That approximately $200 investment (plus maybe another $100 in electricity) is also worth around six figures today.

Clearly, technological advances can sometimes create gold rush-like situations (i.e., first-come-first-served opportunities to make truly extraordinary returns with minimal effort or qualifications). And it's possible to stumble into them without even trying. Which makes me think, maybe we should be trying? I mean, if only I had been looking for possible gold rushes, I could have registered a hundred domain names optimized for potential future value, rather than the few that I happened to personally need. Or I could have started mining Bitcoins a couple of years earlier and now be a thousand times richer.

I wish I was already an experienced gold rush spotter, so I could explain how best to do it, but as indicated above, I participated in the ones that I did more or less by luck. Perhaps the first step is just to keep one's eyes open, and to keep in mind that tech-related gold rushes do happen from time to time and they are not impossibly difficult to find. What other ideas do people have? Are there other past examples of tech gold rushes besides the two that I mentioned? What might be some promising fields to look for them in the future?

Flowers for Algernon

32 Anatoly_Vorobey 18 June 2014 09:16AM

Daniel Keyes, the author of the short story Flowers for Algernon and of the novel of the same title that expands it, died three days ago.

Keyes wrote many other books in the last half-century, but none achieved nearly as much prominence as the original short story (published in 1959) or the novel (published in 1966).

It's probable that many or even most regulars here at Less Wrong read Flowers for Algernon: it's a very famous SF story, it's about enhanced intelligence, and it's been a middle/high school literature class staple in the US. But most != all, and past experience showed me that assumptions of cultural affinity are very frequently wrong. So in case you haven't read the story, I'd like to invite you explicitly to do so. It's rather short, and available at this link:

Flowers for Algernon

(I was surprised to find out that the original story is not available on Amazon. The expanded novelization is. If you wonder which version is better to read, I have no advice to offer.)

(I will edit this post in a week or so to remove the link to the story and this remark)


Hal Finney has just died.

30 cousin_it 28 August 2014 07:39PM

Change Contexts to Improve Arguments

30 palladias 08 July 2014 03:51PM

On a recent trip to Ireland, I gave a talk on tactics for having better arguments (video here).  There's plenty in the video that's been discussed on LW before (Ideological Turing Tests and other reframes), but I thought I'd highlight one other class of trick I use to have more fruitful disagreements.

It's hard, in the middle of a fight, to remember, recognize, and defuse common biases, rhetorical tricks, emotional triggers, etc.  I'd rather cheat than solve a hard problem, so I put a lot of effort into shifting disagreements into environments where it's easier for me and my opposite-number to reason and argue well, instead of relying on willpower.  Here's a recent example of the kind of shift I like to make:

A couple months ago, a group of my friends were fighting about the Brendan Eich resignation on Facebook. The posts were showing up fast; everyone was, presumably, on the edge of their seats, fueled by adrenaline, and alone at their various computers. It’s a hard place to have a charitable, thoughtful debate.

I asked my friends (since they were mostly DC based) if they’d be amenable to pausing the conversation and picking it up in person.  I wanted to make the conversation happen in person, not in front of an audience, and in a format that let people speak for longer and ask questions more easily. If they agreed, I promised to bake cookies for the ultimate donnybrook.

My friends probably figured that I offered cookies as a bribe to get everyone to change venues, and they were partially right. But my cookies had another strategic purpose. When everyone arrived, I was still in the process of taking the cookies out of the oven, so I had to recruit everyone to help me out.

“Alice, can you pour milk for people?”

“Bob, could you pass out napkins?”

“Eve, can you greet people at the door while I’m stuck in the kitchen with potholders on?”

Before we could start arguing, people on both sides of the debate were working on taking care of each other and asking each others’ help. Then, once the logistics were set, we all broke bread (sorta) with each other and had a shared, pleasurable experience. Then we laid into each other.

Sharing a communal experience of mutual service didn’t make anyone pull their intellectual punches, but I think it made us more patient with each other and less anxiously fixated on defending ourselves. Sharing food and seating helped remind us of the relationships we enjoyed with each other, and why we cared about probing the ideas of this particular group of people.

I prefer to fight with people I respect, who I expect will fight in good faith.  It's hard to remember that's what I'm doing if I argue with them in the same forums (comment threads, fb, etc.) where I usually see bad fights.  An environment shift and other compensatory gestures make it easier to leave habituated errors and fears at the door.


Crossposted/adapted from my blog.

Fighting Biases and Bad Habits like Boggarts

29 palladias 21 August 2014 05:07PM

TL;DR: Building humor into your habits for spotting and correcting errors makes the fix more enjoyable, easier to talk about and receive social support, and limits the danger of a contempt spiral. 


One of the most reliably bad decisions I've made on a regular basis is the choice to stay awake (well, "awake") and on the internet past the point where I can get work done, or even have much fun.  I went through a spell where I even fell asleep on the couch more nights than not, unable to muster the will or judgement to get up and go downstairs to bed.

I could remember (even sometimes in the moment) that this was a bad pattern, but, the more tired I was, the more tempting it was to think that I should just buckle down and apply more willpower to be more awake and get more out of my computer time.  Going to bed was a solution, but it was hard for it not to feel (to my sleepy brain and my normal one) like a bit of a cop out.

Only two things helped me really keep this failure mode in check.  One was setting a hard bedtime (and beeminding it) as part of my sacrifice for Advent.   But the other key tool (which has lasted me long past Advent) is the gif below.

sleep eating ice cream

The poor kid struggling to eat his ice cream cone, even in the face of his exhaustion, is hilarious.  And not too far off the portrait of me around 2am scrolling through my Feedly.

Thinking about how stupid or ineffective or insufficiently strong-willed I'm being makes it hard for me to do anything that feels like a retreat from my current course of action.  I want to master the situation and prove I'm stronger.  But catching on to the fact that my current situation (of my own making or not) is ridiculous makes it easier to laugh, shrug, and move on.

I think the difference is that it's easy for me to feel contemptuous of myself when frustrated, and easy to feel fond when amused.

I've tried to strike the new emotional tone when I'm working on catching and correcting other errors.  (e.g "Stupid, you should have known to leave more time to make the appointment!  Planning fallacy!"  becomes "Heh, I guess you thought that adding two "trivially short" errands was a closed set, and must remain 'trivially short.'  That's a pretty silly error.")

In the first case, noticing and correcting an error feels punitive, since it's quickly followed by a hefty dose of flagellation, but the second comes with a quick laugh and an easier shift to a growth-mindset framing.  Funny stories about errors are also easier to tell, increasing the chance my friends can help catch me out next time, or that I'll be better at spotting the error just by keeping it fresh in my memory. Not to mention, in order to get the joke, I tend to look for a more specific cause of the error than stupid/lazy/etc.

As far as I can tell, it also helps that amusement is a pretty different feeling than the ones that tend to be active when I'm falling into error (frustration, anger, feeling trapped, impatience, etc).  So, for a couple of seconds at least, I'm out of the rut and now need to actively return to it to stay stuck. 

The heat of the moment (anger, akrasia, etc.) is a bad time to figure out what's funny, but if you're reflecting on your errors after the fact, in a moment of consolation, it's easier to go back armed with a helpful reframing, ready to cast Riddikulus!


Crossposted from my personal blog, Unequally Yoked.

Quantified Risks of Gay Male Sex

29 pianoforte611 18 August 2014 11:55PM

If you are a gay male then you’ve probably worried at one point about sexually transmitted diseases. Indeed, men who have sex with men have some of the highest prevalence rates of many of these diseases. And if you’re not a gay male, you’ve probably still thought about STDs at one point. But how much should you worry? There are many organizations and resources that will tell you to wear a condom, but very few will tell you the relative risks of wearing a condom vs. not. I’d like to provide a concise summary of the risks associated with gay male sex and the extent to which these risks can be reduced. (See Mark Manson’s guide for a similar resource for heterosexual sex.) I will do so by first giving some information about each disease, including its prevalence among gay men. Most of this data will come from the US, but the US actually has an unusually high prevalence for many diseases. Certainly HIV is much less common in many parts of Europe. I will end with a case study of HIV, which will include an analysis of the probabilities of transmission broken down by the nature of the sex act and a discussion of risk reduction techniques.

When dealing with risks associated with sex, there are a few relevant parameters. The most commonly reported is the prevalence: the proportion of people in the population who have the disease. Since you can only get a disease from someone who has it, the prevalence is arguably the most important statistic. There are two more relevant statistics: the per-act infectivity (the chance of contracting the disease after having sex once) and the per-partner infectivity (the chance of contracting the disease after having sex with one partner for the duration of the relationship). As it turns out, the latter two probabilities are very difficult to calculate, and I only obtained those values for HIV. It is especially difficult to determine per-act risks for specific types of sex acts, since many men who have sex with men (MSM) engage in a variety of acts with multiple partners. Nevertheless, estimates do exist and will be explored in detail in the HIV case study section.


HIV

Prevalence: Between 13% and 28%. My guess is about 13%.

The most infamous of the STDs. There is no cure, but it can be managed with anti-retroviral therapy. A commonly reported statistic is that 19% of MSM (men who have sex with men) in the US are HIV positive (1). For black MSM, this number was 28%, and for white MSM it was 16%. This is likely an overestimate, however, since the sample used was gay men who frequent bars and clubs. My estimate of 13% comes from the CDC's estimate of 590,000 gay men living with HIV (2) and their data suggesting that MSM comprise 2.9% of men in the US (3).



Gonorrhea

Prevalence: Between 9% and 15% in the US

This disease affects the throat and the genitals but it is treatable with antibiotics. The CDC estimates 15.5% prevalence (4). However, this is likely an overestimate since the sample used was gay men in health clinics. Another sample (in San Francisco health clinics) had a pharyngeal gonorrhea prevalence of 9% (5).



Syphilis

Prevalence: 0.825% in the US

My estimate was calculated in the same manner as my estimate for HIV, using the CDC's data (6). Syphilis is transmittable by oral and anal sex (7) and causes genital sores that may look harmless at first (8). Syphilis is curable with penicillin; however, the presence of sores increases the infectivity of HIV.


Herpes (HSV-1 and HSV-2)

Prevalence: HSV-2 - 18.4% (9); HSV-1 - ~75% based on Australian data  (10)

This disease is mostly asymptomatic and can be transmitted through oral or anal sex. Sometimes sores will appear and they will usually go away with time. For the same reason as syphilis, herpes can increase the chance of transmitting HIV. The estimate for HSV-1 is probably too high. Snowball sampling was used and most of the men recruited were heavily involved in organizations for gay men and were sexually active in the past 6 months. Also half of them reported unprotected anal sex in the past six months. The HSV-2 sample came from a random sample of US households (11).



Chlamydia

Prevalence: Rectal 0.5% to 2.3%; Pharyngeal 3.0% to 10.5% (12)

Like herpes, it is often asymptomatic: perhaps as few as 10% of infected men report symptoms. It is curable with antibiotics.



HPV

Prevalence: 47.2% (13)

This disease is incurable (though a vaccine exists for men and women) but usually asymptomatic. It is capable of causing cancers of the penis, throat, and anus. Oddly, there are no common tests for HPV, in part because there are many strains (over 100), most of which are relatively harmless. Sometimes it goes away on its own (14). The prevalence rate was oddly difficult to find; the number I cited came from a sample of men from Brazil, Mexico, and the US.


Case Study of HIV transmission; risks and strategies for reducing risk

IMPORTANT: None of the following figures should be generalized to other diseases. Many of these numbers are not even the same order of magnitude as the numbers for other diseases. For example, HIV is especially difficult to transmit via oral sex, but herpes is easily transmitted that way.

Unprotected oral sex, per-act risk (with a positive partner or a partner of unknown serostatus):

Non-zero but very small. Best guess 0.03% without a condom (15)

Unprotected anal sex, per-act risk (with a positive partner):

Receptive: 0.82% to 1.4% (16) (17)

Insertive, circumcised: 0.11% (18)

Insertive, uncircumcised: 0.62% (18)

Protected anal sex, per-act risk (with a positive partner):

Estimates range from 2 times lower to 20 times lower (16) (19), and the risk is highly dependent on the slippage and breakage rate.

Contracting HIV from oral sex is very rare. In one study, 67 men reported performing oral sex on at least one HIV positive partner, and none were infected (20). However, transmission is possible (15). Because instances of oral transmission of HIV are so rare, the risk is hard to calculate, and the estimate should be taken with a grain of salt. The number cited was obtained from a group of individuals who were either HIV positive or at high risk for HIV, so the per-act risk with a known positive partner is probably somewhat higher.

Note that different HIV positive men have different levels of infectivity; hence the wide range of values for the per-act probability of transmission. Some men with high viral loads (the amount of HIV in the blood) may have an infectivity of greater than 10% per unprotected anal sex act (17).
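
The per-act figures above can be turned into a rough cumulative risk with 1 - (1 - p)^n. This Python sketch rests on a strong simplifying assumption of mine, not the post's: every act is an independent trial with the same per-act probability p. Since infectivity actually varies between partners, treat the output as an illustration of how quickly small per-act risks add up, not as a per-partner estimate.

```python
def cumulative_risk(per_act: float, acts: int) -> float:
    """P(at least one transmission) over `acts` independent acts, each with probability `per_act`."""
    return 1 - (1 - per_act) ** acts


# Per-act estimates quoted in the post, written as fractions.
PER_ACT = {
    "receptive anal, unprotected (low estimate)": 0.0082,
    "receptive anal, unprotected (high estimate)": 0.014,
    "insertive anal, uncircumcised": 0.0062,
    "insertive anal, circumcised": 0.0011,
    "oral (best guess)": 0.0003,
}

for label, p in PER_ACT.items():
    risks = ", ".join(f"{n} acts: {cumulative_risk(p, n):.1%}" for n in (1, 10, 100))
    print(f"{label}: {risks}")
```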


Risk reducing strategies

 Choosing sex acts that have a lower transmission rate (oral sex, protected insertive anal sex, non-insertive) is one way to reduce risk. Monogamy, testing, antiretroviral therapy, PEP and PrEP are five other ways.


Testing Your Partner / Monogamy

If your partner tests negative, they are very unlikely to have HIV. There is about a 0.048% chance that they are HIV positive if they tested negative with a blood test, and a 0.29% chance if they tested negative with an oral test. If they took further tests, the chance is even lower. (See the section after the next paragraph for how these numbers were calculated.)

So if your partner tests negative, the real danger is not the test giving an incorrect result. The danger is that your partner was exposed to HIV before the test, but his body had not started to make antibodies yet. Since this can take weeks or months, it is possible for a partner who tested negative to still have HIV even if you are both completely monogamous.


For tests, the sensitivity (the probability that an HIV positive person will test positive) is 99.68% for blood tests (21) and 98.03% for oral tests. The specificity (the probability that an HIV negative person will test negative) is 99.91% for blood tests and 99.74% for oral tests. Hence the probability that a person who tested negative is actually positive is:

P(Positive | tested negative) = P(Positive)*(1-sensitivity)/(P(Negative)*specificity + P(Positive)*(1-sensitivity)) = 0.048% for the blood test, 0.29% for the oral test

where P(Positive) is the prevalence of HIV, which I estimated to be 13%, and P(Negative) = 1 - P(Positive).

However, according to a writer for (22), a doctor who works with HIV, testing often involves multiple tests, which drive the sensitivity up to 99.997%.


Home Testing

OraQuick is an HIV test that you can purchase online and do yourself at home. It costs $39.99 for one kit. The sensitivity is 93.64% and the specificity is 99.87% (23). Assuming a 13% prevalence of HIV, the probability that someone who tested negative is actually HIV positive is 0.94%. The same danger mentioned above applies: if the infection occurred recently, the test would not detect it.
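
The Bayes calculation for the lab and home tests can be reproduced in a few lines. This Python sketch plugs in the sensitivities, specificities, and 13% prevalence quoted above. The last line, for the 99.997% multiple-test sensitivity, reuses the blood-test specificity; that pairing is my assumption, since no specificity is given for the multiple-test case.

```python
PREVALENCE = 0.13  # the post's estimate of HIV prevalence among US MSM


def p_positive_given_negative(prevalence: float, sensitivity: float, specificity: float) -> float:
    """Bayes' rule: probability that someone is HIV positive despite a negative test."""
    false_negatives = prevalence * (1 - sensitivity)   # positive people who test negative
    true_negatives = (1 - prevalence) * specificity     # negative people who test negative
    return false_negatives / (false_negatives + true_negatives)


# (sensitivity, specificity) pairs quoted in the post.
TESTS = {
    "lab blood test": (0.9968, 0.9991),
    "lab oral test": (0.9803, 0.9974),
    "OraQuick home test": (0.9364, 0.9987),
}

for name, (sens, spec) in TESTS.items():
    risk = p_positive_given_negative(PREVALENCE, sens, spec)
    print(f"{name}: {risk:.3%} chance of HIV despite a negative result")

# Multiple tests: 99.997% sensitivity, with blood-test specificity assumed.
multi = p_positive_given_negative(PREVALENCE, 0.99997, 0.9991)
print(f"multiple tests: {multi:.5%}")
```

The result depends strongly on the assumed prevalence, so it is worth rerunning with your own estimate in place of 13%.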


 Anti-Retroviral therapy

Highly active anti-retroviral therapy (HAART), when successful, can reduce the viral load (the amount of HIV in the blood) to low or undetectable levels. Baggaley et al. (17) notes that models relating viral load to infectivity have been developed for heterosexual couples. She applies these models to MSM and reports that the per-act risk for unprotected anal sex with a positive partner should be 0.061%. However, she notes that different models produce very different results; thus, this number should be taken with a grain of salt.


 Post-Exposure Prophylaxis (PEP)

A last resort if you think you were exposed to HIV is to undergo post-exposure prophylaxis within 72 hours. Antiretroviral drugs are taken for about a month in the hope of preventing the HIV from infecting any cells. In one case-control study, some health care workers who were exposed to HIV were given PEP and some were not (this was not under the control of the experimenters). Workers who contracted HIV were less likely to have been given PEP, with an odds ratio of 0.19 (24). I don’t know whether PEP is equally effective at mitigating risk from other sources of exposure.


 Pre-Exposure Prophylaxis (PrEP)

This is a relatively new risk reduction strategy. Instead of taking anti-retroviral drugs after exposure, you take anti-retroviral drugs every day in order to prevent HIV infection. I could not find a per-act risk, but in a randomized controlled trial, MSM who took PrEP were less likely to become infected with HIV than men who did not (relative reduction: 41%). The average number of sex partners was 18. For men who were more consistent and had a 90% adherence rate, the relative reduction was better: 73% (25) (26).

[link] Why Psychologists' Food Fight Matters

28 Pablo_Stafforini 01 August 2014 07:52AM

Why Psychologists’ Food Fight Matters: “Important findings” haven’t been replicated, and science may have to change its ways. By Michelle N. Meyer and Christopher Chabris. Slate, July 31, 2014. [Via Steven Pinker's Twitter account, where he adds: "Lesson for sci journalists: Stop reporting single studies, no matter how sexy (these are probably false). Report lit reviews, meta-analyses."] Some excerpts:

Psychologists are up in arms over, of all things, the editorial process that led to the recent publication of a special issue of the journal Social Psychology. This may seem like a classic case of ivory tower navel gazing, but its impact extends far beyond academia. The issue attempts to replicate 27 “important findings in social psychology.” Replication—repeating an experiment as closely as possible to see whether you get the same results—is a cornerstone of the scientific method. Replication of experiments is vital not only because it can detect the rare cases of outright fraud, but also because it guards against uncritical acceptance of findings that were actually inadvertent false positives, helps researchers refine experimental techniques, and affirms the existence of new facts that scientific theories must be able to explain.

One of the articles in the special issue reported a failure to replicate a widely publicized 2008 study by Simone Schnall, now tenured at Cambridge University, and her colleagues. In the original study, two experiments measured the effects of people’s thoughts or feelings of cleanliness on the harshness of their moral judgments. In the first experiment, 40 undergraduates were asked to unscramble sentences, with one-half assigned words related to cleanliness (like pure or pristine) and one-half assigned neutral words. In the second experiment, 43 undergraduates watched the truly revolting bathroom scene from the movie Trainspotting, after which one-half were told to wash their hands while the other one-half were not. All subjects in both experiments were then asked to rate the moral wrongness of six hypothetical scenarios, such as falsifying one’s résumé and keeping money from a lost wallet. The researchers found that priming subjects to think about cleanliness had a “substantial” effect on moral judgment: The hand washers and those who unscrambled sentences related to cleanliness judged the scenarios to be less morally wrong than did the other subjects. The implication was that people who feel relatively pure themselves are—without realizing it—less troubled by others’ impurities. The paper was covered by ABC News, the Economist, and the Huffington Post, among other outlets, and has been cited nearly 200 times in the scientific literature.

However, the replicators—David Johnson, Felix Cheung, and Brent Donnellan (two graduate students and their adviser) of Michigan State University—found no such difference, despite testing about four times more subjects than the original studies. [...]

The editor in chief of Social Psychology later agreed to devote a follow-up print issue to responses by the original authors and rejoinders by the replicators, but as Schnall told Science, the entire process made her feel “like a criminal suspect who has no right to a defense and there is no way to win.” The Science article covering the special issue was titled “Replication Effort Provokes Praise—and ‘Bullying’ Charges.” Both there and in her blog post, Schnall said that her work had been “defamed,” endangering both her reputation and her ability to win grants. She feared that by the time her formal response was published, the conversation might have moved on, and her comments would get little attention.

How wrong she was. In countless tweets, Facebook comments, and blog posts, several social psychologists seized upon Schnall’s blog post as a cri de coeur against the rising influence of “replication bullies,” “false positive police,” and “data detectives.” For “speaking truth to power,” Schnall was compared to Rosa Parks. The “replication police” were described as “shameless little bullies,” “self-righteous, self-appointed sheriffs” engaged in a process “clearly not designed to find truth,” “second stringers” who were incapable of making novel contributions of their own to the literature, and—most succinctly—“assholes.” Meanwhile, other commenters stated or strongly implied that Schnall and other original authors whose work fails to replicate had used questionable research practices to achieve sexy, publishable findings. At one point, these insinuations were met with threats of legal action. [...]

Unfortunately, published replications have been distressingly rare in psychology. A 2012 survey of the top 100 psychology journals found that barely 1 percent of papers published since 1900 were purely attempts to reproduce previous findings. Some of the most prestigious journals have maintained explicit policies against replication efforts; for example, the Journal of Personality and Social Psychology published a paper purporting to support the existence of ESP-like “precognition,” but would not publish papers that failed to replicate that (or any other) discovery. Science publishes “technical comments” on its own articles, but only if they are submitted within three months of the original publication, which leaves little time to conduct and document a replication attempt.

The “replication crisis” is not at all unique to social psychology, to psychological science, or even to the social sciences. As Stanford epidemiologist John Ioannidis famously argued almost a decade ago, “Most research findings are false for most research designs and for most fields.” Failures to replicate and other major flaws in published research have since been noted throughout science, including in cancer research, research into the genetics of complex diseases like obesity and heart disease, stem cell research, and studies of the origins of the universe. Earlier this year, the National Institutes of Health stated “The complex system for ensuring the reproducibility of biomedical research is failing and is in need of restructuring.”

Given the stakes involved and its centrality to the scientific method, it may seem perplexing that replication is the exception rather than the rule. The reasons why are varied, but most come down to the perverse incentives driving research. Scientific journals typically view “positive” findings that announce a novel relationship or support a theoretical claim as more interesting than “negative” findings that say that things are unrelated or that a theory is not supported. The more surprising the positive finding, the better, even though surprising findings are statistically less likely to be accurate. Since journal publications are valuable academic currency, researchers—especially those early in their careers—have strong incentives to conduct original work rather than to replicate the findings of others. Replication efforts that do happen but fail to find the expected effect are usually filed away rather than published. That makes the scientific record look more robust and complete than it is—a phenomenon known as the “file drawer problem.”
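The claim that surprising findings are statistically less likely to be accurate can be made concrete with the positive-predictive-value argument from the Ioannidis paper mentioned above. A simplified sketch: treat "surprising" as low prior odds that the tested effect is real, and ignore bias and p-hacking entirely. The power of 0.8 and significance level of 0.05 are conventional defaults, not figures from the article.

```python
def positive_predictive_value(prior_odds, power=0.8, alpha=0.05):
    """Share of statistically significant findings that reflect a real effect.

    prior_odds is the ratio of true to false hypotheses among those tested.
    Ignores bias, p-hacking and publication filtering.
    """
    # Both counts are per (1 + prior_odds) hypotheses tested; that factor cancels.
    true_positives = power * prior_odds  # real effects that reach significance
    false_positives = alpha              # null effects that reach significance by chance
    return true_positives / (true_positives + false_positives)

for odds in (1.0, 0.1, 0.01):
    print(f"prior odds {odds}: PPV = {positive_predictive_value(odds):.2f}")
```

With these defaults, the PPV falls from about 0.94 at even prior odds to about 0.62 at odds of 1:10 and about 0.14 at 1:100.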

The emphasis on positive findings may also partly explain the fact that when original studies are subjected to replication, so many turn out to be false positives. The near-universal preference for counterintuitive, positive findings gives researchers an incentive to manipulate their methods or poke around in their data until a positive finding crops up, a common practice known as “p-hacking” because it can result in p-values, or measures of statistical significance, that make the results look stronger, and therefore more believable, than they really are. [...]
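One common form of p-hacking is testing many outcomes and reporting whichever reaches significance. A small sketch, assuming every tested hypothesis is actually null and the tests are independent (real outcomes are usually correlated, which changes the numbers):

```python
import random

def chance_of_false_positive(num_tests, alpha=0.05):
    """Closed form: P(at least one p < alpha) across independent tests of true nulls."""
    return 1 - (1 - alpha) ** num_tests

def simulate(num_tests, alpha=0.05, trials=20_000, seed=0):
    """Monte Carlo check: under a true null, a valid p-value is uniform on [0, 1]."""
    rng = random.Random(seed)
    hits = sum(
        any(rng.random() < alpha for _ in range(num_tests))
        for _ in range(trials)
    )
    return hits / trials

for k in (1, 5, 20):
    print(k, round(chance_of_false_positive(k), 3), simulate(k))
```

With 20 independent looks at pure noise, the chance of at least one "significant" result is about 64%, and the simulation lands close to the closed-form value.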

The recent special issue of Social Psychology was an unprecedented collective effort by social psychologists to [rectify this situation]—by altering researchers’ and journal editors’ incentives in order to check the robustness of some of the most talked-about findings in their own field. Any researcher who wanted to conduct a replication was invited to preregister: Before collecting any data from subjects, they would submit a proposal detailing precisely how they would repeat the original study and how they would analyze the data. Proposals would be reviewed by other researchers, including the authors of the original studies, and once approved, the study’s results would be published no matter what. Preregistration of the study and analysis procedures should deter p-hacking, guaranteed publication should counteract the file drawer effect, and a requirement of large sample sizes should make it easier to detect small but statistically meaningful effects.
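Why large samples matter for detecting small effects can be seen from the standard normal-approximation sample-size formula for comparing two group means. A sketch using Python's standard library; the effect sizes (Cohen's d), the 5% significance level and the 80% power are illustrative choices, not values used in the special issue.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(effect_size_d, alpha=0.05, power=0.8):
    """Approximate subjects per group for a two-sided, two-sample comparison of means.

    Uses the normal approximation n = 2 * (z_{1-alpha/2} + z_{power})^2 / d^2;
    an exact t-test calculation gives slightly larger numbers.
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return ceil(2 * (z_alpha + z_power) ** 2 / effect_size_d ** 2)

for d in (0.8, 0.5, 0.2):
    print(f"d = {d}: about {n_per_group(d)} per group")
```

Because n scales with 1/d², shrinking the effect from d = 0.5 to d = 0.2 raises the requirement from about 63 to about 393 subjects per group.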

The results were sobering. At least 10 of the 27 “important findings” in social psychology were not replicated at all. In the social priming area, only one of seven replications succeeded. [...]

One way to keep things in perspective is to remember that scientific truth is created by the accretion of results over time, not by the splash of a single study. A single failure-to-replicate doesn’t necessarily invalidate a previously reported effect, much less imply fraud on the part of the original researcher—or the replicator. Researchers are most likely to fail to reproduce an effect for mundane reasons, such as insufficiently large sample sizes, innocent errors in procedure or data analysis, and subtle factors about the experimental setting or the subjects tested that alter the effect in question in ways not previously realized.

Caution about single studies should go both ways, though. Too often, a single original study is treated—by the media and even by many in the scientific community—as if it definitively establishes an effect. Publications like Harvard Business Review and idea conferences like TED, both major sources of “thought leadership” for managers and policymakers all over the world, emit a steady stream of these “stats and curiosities.” Presumably, the HBR editors and TED organizers believe this information to be true and actionable. But most novel results should be initially regarded with some skepticism, because they too may have resulted from unreported or unnoticed methodological quirks or errors. Everyone involved should focus their attention on developing a shared evidence base that consists of robust empirical regularities—findings that replicate not just once but routinely—rather than of clever one-off curiosities. [...]

Scholars, especially scientists, are supposed to be skeptical about received wisdom, develop their views based solely on evidence, and remain open to updating those views in light of changing evidence. But as psychologists know better than anyone, scientists are hardly free of human motives that can influence their work, consciously or unconsciously. It’s easy for scholars to become professionally or even personally invested in a hypothesis or conclusion. These biases are addressed partly through the peer review process, and partly through the marketplace of ideas—by letting researchers go where their interest or skepticism takes them, encouraging their methods, data, and results to be made as transparent as possible, and promoting discussion of differing views. The clashes between researchers of different theoretical persuasions that result from these exchanges should of course remain civil; but the exchanges themselves are a perfectly healthy part of the scientific enterprise.

This is part of the reason why we cannot agree with a more recent proposal by Kahneman, who had previously urged social priming researchers to put their house in order. He contributed an essay to the special issue of Social Psychology in which he proposed a rule—to be enforced by reviewers of replication proposals and manuscripts—that authors “be guaranteed a significant role in replications of their work.” Kahneman proposed a specific process by which replicators should consult with original authors, and told Science that in the special issue, “the consultations did not reach the level of author involvement that I recommend.”

Collaboration between opposing sides would probably avoid some ruffled feathers, and in some cases it could be productive in resolving disputes. With respect to the current controversy, given the potential impact of an entire journal issue on the robustness of “important findings,” and the clear desirability of buy-in by a large portion of psychology researchers, it would have been better for everyone if the original authors’ comments had been published alongside the replication papers, rather than left to appear afterward. But consultation or collaboration is not something replicators owe to original researchers, and a rule to require it would not be particularly good science policy.

Replicators have no obligation to routinely involve original authors because those authors are not the owners of their methods or results. By publishing their results, original authors state that they have sufficient confidence in them that they should be included in the scientific record. That record belongs to everyone. Anyone should be free to run any experiment, regardless of who ran it first, and to publish the results, whatever they are. [...]

Some critics of replication drives have been too quick to suggest that replicators lack the subtle expertise to reproduce the original experiments. One prominent social psychologist has even argued that tacit methodological skill is such a large factor in getting experiments to work that failed replications have no value at all (since one can never know if the replicators really knew what they were doing, or knew all the tricks of the trade that the original researchers did), a surprising claim that drew sarcastic responses. [See LW discussion.] [...]

Psychology has long been a punching bag for critics of “soft science,” but the field is actually leading the way in tackling a problem that is endemic throughout science. The replication issue of Social Psychology is just one example. The Association for Psychological Science is pushing for better reporting standards and more study of research practices, and at its annual meeting in May in San Francisco, several sessions on replication were filled to overflowing. International collaborations of psychologists working on replications, such as the Reproducibility Project and the Many Labs Replication Project (which was responsible for 13 of the 27 replications published in the special issue of Social Psychology) are springing up.

Even the most tradition-bound journals are starting to change. The Journal of Personality and Social Psychology—the same journal that, in 2011, refused to even consider replication studies—recently announced that although replications are “not a central part of its mission,” it’s reversing this policy. We wish that JPSP would see replications as part of its central mission and not relegate them, as it has, to an online-only ghetto, but this is a remarkably nimble change for a 50-year-old publication. Other top journals, most notable among them Perspectives in Psychological Science, are devoting space to systematic replications and other confirmatory research. The leading journal in behavior genetics, a field that has been plagued by unreplicable claims that particular genes are associated with particular behaviors, has gone even further: It now refuses to publish original findings that do not include evidence of replication.

A final salutary change is an overdue shift of emphasis among psychologists toward establishing the size of effects, as opposed to disputing whether or not they exist. The very notion of “failure” and “success” in empirical research is urgently in need of refinement. When applied thoughtfully, this dichotomy can be useful shorthand (and we’ve used it here). But there are degrees of replication between success and failure, and these degrees matter.

For example, suppose an initial study of an experimental drug for cardiovascular disease suggests that it reduces the risk of heart attack by 50 percent compared to a placebo pill. The most meaningful question for follow-up studies is not the binary one of whether the drug’s effect is 50 percent or not (did the first study replicate?), but the continuous one of precisely how much the drug reduces heart attack risk. In larger subsequent studies, this number will almost inevitably drop below 50 percent, but if it remains above 0 percent for study after study, then the best message should be that the drug is in fact effective, not that the initial results “failed to replicate.”

[meta] Policy for dealing with users suspected/guilty of mass-downvote harassment?

28 Kaj_Sotala 06 June 2014 05:46AM

Below is a message I just got from jackk. Some specifics have been redacted, 1) so that we can discuss general policy rather than the details of this specific case, and 2) because of the presumption of innocence, in case there happens to be an innocuous explanation for this.

Hi Kaj_Sotala,

I'm Jack, one of the Trike devs. I'm messaging you because you're the moderator who commented most recently. A while back the user [REDACTED 1] asked if Trike could look into retributive downvoting against his account. I've done that, and it looks like [REDACTED 2] has downvoted at least [over half of REDACTED 1's comments, amounting to hundreds of downvotes] ([REDACTED 1]'s next-largest downvoter is [REDACTED 3] at -15).

What action to take is a community problem, not a technical one, so we'd rather leave that up to the moderators. Some options:

1. Ask [REDACTED 2] for the story behind these votes
2. Use the "admin" account (which exists for sending scripted messages, &c.) to apply an upvote to each downvoted post
3. Apply a karma award to [REDACTED 1]'s account. This would fix the karma damage but not the sorting of individual comments
4. Apply a negative karma award to [REDACTED 2]'s account. This makes him pay for false downvotes twice over. This isn't possible in the current code, but it's an easy fix
5. Ban [REDACTED 2]

For future reference, it's very easy for Trike to look at who downvoted someone's account, so if you get questions about downvoting in the future I can run the same report.

If you need to verify my identity before you take action, let me know and we'll work something out.

-- Jack

So... thoughts? I have mod powers, but when I was granted them I was basically just told to use them to fight spam; there was never any discussion of any other policy, and I don't feel like I have the authority to decide on the suitable course of action without consulting the rest of the community.

Changes to my workflow

25 paulfchristiano 26 August 2014 05:29PM

About 18 months ago I made a post here on my workflow. I've received a handful of requests for follow-up, so I thought I would make another post detailing changes since then. I expect this post to be less useful than the last one.

For the most part, the overall outline has remained pretty stable and feels very similar to 18 months ago. Things not mentioned below have mostly stayed the same. I believe that the total effect of further changes has been continued but much smaller improvement, though it is hard to tell (unlike the previous changes, which were more clearly improvements).

Based on comparing time logging records I seem to now do substantially more work on average, but there are many other changes during this period that could explain the change (including changes in time logging). Changes other than work output are much harder to measure; I feel like they are positive but I wouldn't be surprised if this were an illusion.

Splitting days:

I now regularly divide my day into two halves, and treat the two halves as separate units. I plan each separately and reflect on each separately. I divide them by an hour-long period of reflecting on the morning, relaxing for 5-10 minutes, napping for 25-30 minutes, processing my emails, and planning the evening. I find that this generally makes me more productive and happier about the day. Splitting my days is often difficult due to engagements in the middle of the day, and I don't have a good solution to that.


I have longstanding objections to explicitly rationing internet use (since it seems either to indicate a broader problem that should be resolved directly, or else to serve a useful function that would be unwise to remove). That said, I now use the extension WasteNoTime to limit my consumption of blogs, webcomics, Facebook, news sites, browser games, etc., to 10 minutes each half-day. This has cut the time I spend browsing the internet from an average of 30-40 minutes to an average of 10-15 minutes. It doesn't seem to have been replaced by lower-quality leisure, but by a combination of work and higher-quality leisure.

Similarly, I turned off the newsfeed in Facebook, which I found improved the quality of my internet time in general (the primary issue was that I would sometimes be distracted by the newsfeed while sending messages over Facebook, which wasn't my favorite way to use up WasteNoTime minutes).

I also tried StayFocusd, but ended up adopting WasteNoTime because of the ability to set limits per half-day (via "At work" and "not at work" timers) rather than per-day. I find that the main upside is cutting off the tail of derping (e.g. getting sucked into a blog comment thread, or looking into a particularly engrossing issue), and for this purpose per half-day timers are much more effective.

Email discipline:

I set Gmail to archive all emails on arrival and assign them the special label "In." This lets me search for emails and compose emails, using the normal Gmail interface, without being notified of new arrivals. I process the items labeled "In" (typically turning emails into todo items to be processed by the same system that deals with other todo items) at the beginning of each half-day. Each night I scan my email quickly for items that require urgent attention.

Todo lists / reminders:

I continue to use todo lists for each half day and for a range of special conditions. I now check these lists at the beginning of each half day rather than before going to bed.

I also maintain a third list of "reminders." These are things that I want to be reminded of periodically, organized by day; each morning I look at the day's reminders and think about them briefly. Each of them is then copied and filed under a future day: if I feel like I remember a thing well I file it far in the future, and if I feel like I don't remember it well I file it in the near future.

Over the last month most of these reminders have migrated to the form "If X, then Y," e.g. "If I agree to do something for someone, then pause, say 'actually I should think about it for a few minutes to make sure I have time,' and set a 5 minute timer that night to think about it more clearly." These are designed to fix problems that I notice when reflecting on the day. This is a recommendation from CFAR folks, which seems to be working well, though it is the newest and least tested part of the system.
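The reminder filing described above behaves a bit like a simple spaced-repetition scheduler. A hypothetical sketch in Python: the post only distinguishes "far" from "near" future, so the rule here (double the gap when recall feels solid, otherwise reset it to one day) is an assumption, as are the class and method names.

```python
from collections import defaultdict
from datetime import date, timedelta

class ReminderFile:
    """Hypothetical day-indexed reminder list."""

    def __init__(self):
        # Maps a date to a list of (reminder_text, current_interval_days).
        self.by_day = defaultdict(list)

    def add(self, text, day, interval=1):
        self.by_day[day].append((text, interval))

    def review(self, today, remembered_well):
        """Refile each of today's reminders; remembered_well maps text -> bool."""
        for text, interval in self.by_day.pop(today, []):
            # Assumed rule: double the gap if recall feels solid, otherwise reset to 1 day.
            new_interval = interval * 2 if remembered_well.get(text, False) else 1
            self.add(text, today + timedelta(days=new_interval), new_interval)

today = date(2014, 8, 26)
reminders = ReminderFile()
reminders.add("If I agree to do something, pause first", today, interval=4)
reminders.review(today, {"If I agree to do something, pause first": True})
print(dict(reminders.by_day))
```

Each morning's review removes that day's reminders and refiles them, so a reminder that keeps feeling familiar drifts further out while a shaky one comes back the next day.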

Isolating "todos":

I now attempt to isolate things that probably need doing, but don't seem maximally important; I aim to do them only on every 5th day, and only during one half-day. If I can't finish them in this time, I will typically delay them 5 days. When they spill over to other days, I try to at least keep them to one half-day or the other. I don't know if this helps, but it feels better to have unproductive-feeling blocks of time isolated rather than scattered throughout the week.

I don't do this very rigidly. I expect the overall level of discipline I have about it is comparable to or lower than a normal office worker who has a clearer division between their personal time and work time.


I now use Toggl for detailed time tracking. Katja Grace and I experimented with about half a dozen other systems (Harvest, Yast, Klok, Freckle, Lumina, and I expect others I'm forgetting) before settling on Toggl. It has a depressing number of flaws, but ends up winning for me by making it very fast to start and switch timers, which is probably the most important criterion for me. It also offers reviews that work well with what I want to look at.

I find the main value adds from detailed time tracking are:

1. Knowing how long I've spent on projects, especially long-term projects. My intuitive estimates are often off by more than a factor of 2, even for things taking 80 hours; this can lead me to significantly underestimate the costs of taking on some kinds of projects, and it can also lead me to think an activity is unproductive instead of productive by overestimating how long I've actually spent on it.

2. Accurate breakdowns of time in a day, which guide efforts at improving my day-to-day routine. They probably also make me feel more motivated about working, and improve focus during work.

Reflection / improvement:

Reflection is now a smaller fraction of my time, down from 10% to 3-5%, based on diminishing returns to finding stuff to improve. Another 3-5% is now redirected into longer-term projects to improve particular aspects of my life (I maintain a list of possible improvements, roughly sorted by goodness). Examples: buying new furniture, improvements to my diet (Holden's powersmoothie is great), improvements to my sleep (low doses of melatonin seem good). At the moment the list of possible improvements is long enough that adding to the list is less valuable than doing things on the list.

I have equivocated a lot about how much of my time should go into this sort of thing. My best guess is the number should be higher.


I don't use pomodoros at all any more. I still have periods of uninterrupted work, often of comparable length, for individual tasks. This change wasn't very carefully considered; it mostly just happened. I find that explicit time logging (such that I must consciously change the timer before changing tasks) works as a substitute in many cases. I also maintain the habit of writing down candidate distractions and then attending to them later (if at all).

For larger tasks I find that I often prefer longer blocks of unrestricted working time. I continue to use Alinof timer to manage these blocks of uninterrupted work.


Catch disappeared, and I haven't found a replacement that I find comparably useful. (It's also not that high on the list of priorities.) I now just send emails to myself, but I do it much less often.


I no longer use Beeminder. This again wasn't super-considered, though it was based on a very rough impression that the overhead was larger than the short-term gains. I think Beeminder was helpful for setting up a number of habits which have persisted (especially with respect to daily routine and regular focused work), and my long-term averages continue to satisfy my old Beeminder goals.

Project outlines:

I now organize notes about each project I am working on in a more standardized way, with "Queue of todos," "Current workspace," and "Data" as the three subsections. I'm not thrilled by this system, but it seems to be an improvement over the previous informal arrangement. In particular, having a workspace into which I can easily write thoughts without thinking about where they fit, and only later sorting them into the data section once it's clearer how they fit in, decreases the activation energy of using the system. I now use Toggl rather than maintaining time logs by hand.

Randomized trials:

As described in my last post I tried various randomized trials (esp. of effects of exercise, stimulant use, and sleep on mood, cognitive performance, and productive time). I have found extracting meaningful data from these trials to be extremely difficult, due to straightforward issues with signal vs. noise. There are a number of tests which I still do expect to yield meaningful data, but I've increased my estimates for the expensiveness of useful tests substantially, and they've tended to fall down the priority list. For some things I've just decided to do them without the data, since my best guess is positive in expectation and the data is too expensive to acquire.


Announcing The Effective Altruism Forum

25 RyanCarey 24 August 2014 08:07AM

The Effective Altruism Forum will be launched on September 10, British time.

Now seems like a good time to discuss why we might need an Effective Altruism Forum, and how it might compare to LessWrong.

About the Effective Altruism Forum

The motivation for the Effective Altruism Forum is to improve the quality of effective altruist discussion and coordination. A big part of this is to give many of the useful features of LessWrong to effective altruists, including:


  • Archived, searchable content (this will begin with archived content from
  • Meetups
  • Nested comments
  • A karma system
  • A dynamically updated list of external effective altruist blogs
  • Introductory materials (this will begin with these articles)


The Effective Altruism Forum has been designed by Mihai Badic. Over the last month, it has been developed by Trike Apps, who have built the new site using the LessWrong codebase. I'm glad to report that it is now basically ready, looks nice, and is easy to use.

I expect that at the new forum, as on the effective altruist Facebook and Reddit pages, people will want to discuss which intellectual procedures to use to pick effective actions. I also expect some proposals of effective altruist projects, and offers of resources. So users of the new forum will share LessWrong's interest in instrumental and epistemic rationality. On the other hand, I expect that few of its users will want to discuss the technical aspects of artificial intelligence, anthropics or decision theory, and to the extent that they do so, they will want to do it at LessWrong. As a result, I expect the new forum to cause:


  • A bunch of materials on effective altruism and instrumental rationality to be collated for new effective altruists
  • Discussion of old LessWrong materials to resurface
  • A slight increase to the number of users of LessWrong, possibly offset by some users spending more of their time posting at the new forum.


At least initially, the new forum won't have a wiki or a Main/Discussion split and won't have any institutional affiliations.

Next Steps:

It's really important to make sure that the Effective Altruism Forum is established with a beneficial culture. If people want to help that process by writing some seed materials, to be posted around the time of the site's launch, then they can contact me at ry [dot] duff [at] Alternatively, they can wait a short while until they automatically receive posting privileges.

It's also important that the Effective Altruism Forum helps the shared goals of rationalists and effective altruists, and has net positive effects on LessWrong in particular. Any suggestions for improving the odds of success for the effective altruism forum are most welcome.

Causal Inference Sequence Part 1: Basic Terminology and the Assumptions of Causal Inference

25 Anders_H 30 July 2014 08:56PM

(Part 1 of the Sequence on Applied Causal Inference)


In this sequence, I am going to present a theory on how we can learn about causal effects using observational data. As an example, we will imagine that you have collected information on a large number of Swedes - let us call them Sven, Olof, Göran, Gustaf, Annica, Lill-Babs, Elsa and Astrid. For every Swede, you have recorded data on their gender, whether they smoked or not, and whether they got cancer during the 10 years of follow-up. Your goal is to use this dataset to figure out whether smoking causes cancer.

We are going to use the letter A as a random variable to represent whether they smoked. A can take the value 0 (did not smoke) or 1 (smoked).  When we need to talk about the specific values that A can take, we sometimes use lower case a as a placeholder for 0 or 1.    We use the letter Y as a random variable that represents whether they got cancer, and L to represent their gender. 

The data-generating mechanism and the joint distribution of variables

Imagine you are looking at this data set:

[Table: one row per Swede, recording L (sex), A ("Did they smoke?") and Y ("Did they get cancer?")]

This table records information about the joint distribution of the variables L, A and Y.  By looking at it, you can tell that 1/4 of the Swedes were men who smoked and got cancer, 1/8 were men who did not smoke and got cancer, 1/8 were men who did not smoke and did not get cancer etc.  

You can make all sorts of statistics that summarize aspects of the joint distribution.  One such statistic is the correlation between two variables.  If "sex" is correlated with "smoking", it means that if you know somebody's sex, this gives you information that makes it easier to predict whether they smoke.   If knowing about an individual's sex gives no information about whether they smoked, we say that sex and smoking are independent.  We use the symbol ∐ to mean independence. 

When we are interested in causal effects, we are asking what would happen to the joint distribution if we intervened to change the value of a variable.  For example, how many Swedes would get cancer in a hypothetical world where you intervened to make sure they all quit smoking?  

In order to answer this, we have to ask questions about the data generating mechanism. The data generating mechanism is the algorithm that assigns values to the variables, and therefore creates the joint distribution. We will think of the data as being generated by three different algorithms: one for L, one for A and one for Y. Each of these algorithms takes the previously assigned variables as input, and then outputs a value.

Questions about the data generating mechanism include “Which variable has its value assigned first?”, “Which variables from the past (observed or unobserved) are used as inputs?” and “If I change whether someone smokes, how will that change propagate to other variables that have their value assigned later?” The last of these questions can be rephrased as “What is the causal effect of smoking?”

The basic problem of causal inference is that the relationship between the set of possible data generating mechanisms, and the joint distribution of variables, is many-to-one:   For any correlation you observe in the dataset, there are many possible sets of algorithms for L, A and Y that could all account for the observed patterns. For example, if you are looking at a correlation between cancer and smoking, you can tell a story about cancer causing people to take up smoking, or a story about smoking causing people to get cancer, or a story about smoking and cancer sharing a common cause.  
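To make the many-to-one point concrete, here is a minimal Python sketch with two hypothetical mechanisms for A and Y (L is left out for brevity). In the first, smoking causes cancer; in the second, a background variable u acts as a common cause of both, and smoking has no effect. The function names and the fair-coin background variable are illustrative assumptions, not part of the original example. Both mechanisms produce exactly the same joint distribution of (A, Y), yet they disagree about what happens if we intervene to make everyone not smoke.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def smoking_causes_cancer(u, do_a=None):
    """Hypothetical mechanism 1: A is set by background noise u, and Y copies A."""
    a = u if do_a is None else do_a  # an intervention overrides the algorithm for A
    y = a
    return a, y

def common_cause(u, do_a=None):
    """Hypothetical mechanism 2: u sets both A and Y; A has no effect on Y."""
    a = u if do_a is None else do_a
    y = u
    return a, y

def distribution(mechanism, do_a=None):
    """Exact distribution of (A, Y) when u is a fair coin taking the values 0 or 1."""
    dist = {}
    for u in (0, 1):
        outcome = mechanism(u, do_a)
        dist[outcome] = dist.get(outcome, 0) + HALF
    return dist

def pr_cancer(dist):
    """Pr(Y = 1) under the given distribution of (A, Y)."""
    return sum(p for (a, y), p in dist.items() if y == 1)

# Observationally, the two mechanisms are indistinguishable...
print(distribution(smoking_causes_cancer) == distribution(common_cause))  # True
# ...but they disagree about the effect of making everyone quit smoking.
print(pr_cancer(distribution(smoking_causes_cancer, do_a=0)))  # 0
print(pr_cancer(distribution(common_cause, do_a=0)))           # 1/2
```

No amount of extra data drawn from either mechanism would separate them, because the data only ever reveal the shared joint distribution.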

An important thing to note is that even if you had data on absolutely everyone, you still would not be able to distinguish between the possible data generating mechanisms. The problem is not that you have a limited sample, so this is not a statistical problem. What you need to answer the question is not more people in your study, but a priori causal information. The purpose of this sequence is to show you how to reason about what prior causal information is necessary, and how to analyze the data if you have measured all the necessary variables.

Counterfactual Variables and "God's Table":

The first step of causal inference is to translate the English language research question «What is the causal effect of smoking» into a precise, mathematical language.  One possible such language is based on counterfactual variables.  These counterfactual variables allow us to encode the concept of “what would have happened if, possibly contrary to fact, the person smoked”.

We define one counterfactual variable called Ya=1 which represents the outcome in the person if he smoked, and another counterfactual variable called Ya=0 which represents the outcome if he did not smoke. Counterfactual variables such as Ya=0 are mathematical objects that represent part of the data generating mechanism: The variable tells us what value the mechanism would assign to Y, if we intervened to make sure the person did not smoke. These variables are columns in an imagined dataset that we sometimes call “God’s Table”:

[Table: God’s Table, one row per Swede, with counterfactual columns Ya=1 (“Whether they would have got cancer if they smoked”) and Ya=0 (“Whether they would have got cancer if they didn't smoke”)]

Let us start by making some points about this dataset.  First, note that the counterfactual variables are variables just like any other column in the spreadsheet.   Therefore, we can use the same type of logic that we use for any other variables.  Second, note that in our framework, counterfactual variables are pre-treatment variables:  They are determined long before treatment is assigned. The effect of treatment is simply to determine whether we see Ya=0 or Ya=1 in this individual.

If you had access to God's Table, you would immediately be able to look up the average causal effect, by comparing the column Ya=1 to the column Ya=0. However, the most important point about God’s Table is that we cannot observe Ya=1 and Ya=0. We only observe the joint distribution of observed variables, which we can call the “Observed Table”:

[Table: the Observed Table, with the observed variables L, A and Y for each Swede; the counterfactual columns Ya=1 and Ya=0 are unknown]

The goal of causal inference is to learn about God’s Table using information from the observed table (in combination with a priori causal knowledge).  In particular, we are going to be interested in learning about the distributions of Ya=1 and Ya=0, and in how they relate to each other.  
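As a minimal sketch of what "looking up" the average causal effect in God's Table would mean, the Python below uses a small God's Table whose values are invented for illustration. The average causal effect is the mean of the Ya=1 column minus the mean of the Ya=0 column. In real data this computation is impossible, precisely because those two columns are not observed.

```python
from fractions import Fraction

# Hypothetical God's Table, one row per person: (L = sex, A = smoked, Y^{a=1}, Y^{a=0}).
# All values are invented for illustration.
GODS_TABLE = [
    ("M", 1, 1, 1),
    ("M", 1, 1, 0),
    ("M", 0, 1, 1),
    ("M", 0, 0, 0),
    ("F", 1, 1, 0),
    ("F", 0, 0, 0),
    ("F", 0, 1, 1),
    ("F", 0, 0, 0),
]

def average_causal_effect(table):
    """E[Y^{a=1}] - E[Y^{a=0}], computed by comparing the two counterfactual columns."""
    n = len(table)
    mean_y1 = Fraction(sum(y1 for _, _, y1, _ in table), n)
    mean_y0 = Fraction(sum(y0 for _, _, _, y0 in table), n)
    return mean_y1 - mean_y0

print(average_causal_effect(GODS_TABLE))  # 1/4
```

Note that the A column plays no role in this comparison: with God's Table in hand, nobody's actual treatment matters.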


Randomized Trials

The “Gold Standard” for estimating the causal effect is to run a randomized controlled trial where we randomly assign the value of A. This study design works because you select one random subset of the study population where you observe Ya=0, and another random subset where you observe Ya=1. You therefore have unbiased information about the distribution of both Ya=0 and Ya=1.

An important thing to point out at this stage is that it is not necessary to use an unbiased coin to assign treatment, as long as you use the same coin for everyone. For instance, the probability of being randomized to A=1 can be 2/3. You will still see randomly selected subsets of the distribution of both Ya=0 and Ya=1; you will just have a larger number of people where you see Ya=1. Usually, randomized trials use unbiased coins, but this is simply done because it increases the statistical power.
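The claim that a biased coin still gives unbiased results can be checked with a small simulation. This Python sketch assumes invented cancer risks of 30% if a person smokes and 10% if not, assigns A=1 with probability 2/3 using the same coin for everyone, and compares the difference in observed means with the true average causal effect in the simulated God's Table.

```python
import random

def simulate_trial(n, p_treat, seed=0):
    """Simulate a trial where everyone is assigned A=1 with the same probability p_treat.

    Returns (difference in observed means, true average causal effect in the sample)."""
    rng = random.Random(seed)
    y1_total = y0_total = 0
    treated, untreated = [], []
    for _ in range(n):
        # One row of a hypothetical God's Table: invented risks of 0.3 if smoked, 0.1 if not.
        y1 = 1 if rng.random() < 0.3 else 0
        y0 = 1 if rng.random() < 0.1 else 0
        y1_total += y1
        y0_total += y0
        if rng.random() < p_treat:  # the same biased coin for everyone
            treated.append(y1)      # we see Y^{a=1} in the treated
        else:
            untreated.append(y0)    # we see Y^{a=0} in the untreated
    estimate = sum(treated) / len(treated) - sum(untreated) / len(untreated)
    truth = (y1_total - y0_total) / n
    return estimate, truth

estimate, truth = simulate_trial(n=200_000, p_treat=2 / 3)
print(f"estimate {estimate:.3f}, true effect {truth:.3f}")
```

With this many simulated people the two numbers should agree closely; changing p_treat to 0.5 alters the precision of the estimate, not what it is estimating.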

Also note that it is possible to run two different randomized controlled trials:  One in men, and another in women.  The first trial will give you an unbiased estimate of the effect in men, and the second trial will give you an unbiased estimate of the effect in women.  If both trials used the same coin, you could think of them as really being one trial. However, if the two trials used different coins, and you pooled them into the same database, your analysis would have to account for the fact that in reality, there were two trials. If you don’t account for this, the results will be biased.  This is called “confounding”. As long as you account for the fact that there really were two trials, you can still recover an estimate of the population average causal effect. This is called “Controlling for Confounding”.
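The following Python sketch simulates the two-trial situation under invented numbers. In both sexes smoking raises cancer risk by 0.2, but the trial in men assigns A=1 with probability 0.9 while the trial in women uses probability 0.1. Pooling the data and comparing smokers with non-smokers targets a difference of 0.44 rather than 0.2, because the smokers are mostly men, who have a higher baseline risk; analyzing each trial separately recovers the effect.

```python
import random

# Invented numbers: cancer risk by (sex, smoked). Smoking adds 0.2 in both sexes.
RISK = {("M", 1): 0.5, ("M", 0): 0.3, ("F", 1): 0.2, ("F", 0): 0.0}
# Each sex is enrolled in its own trial, with its own biased coin for Pr(A=1).
COIN = {"M": 0.9, "F": 0.1}

def simulate_two_trials(n_per_sex, seed=0):
    """Return pooled rows (sex, A, Y) from the two trials."""
    rng = random.Random(seed)
    rows = []
    for sex in ("M", "F"):
        for _ in range(n_per_sex):
            a = 1 if rng.random() < COIN[sex] else 0
            # Y is drawn from the risk for the treatment actually received.
            y = 1 if rng.random() < RISK[(sex, a)] else 0
            rows.append((sex, a, y))
    return rows

def difference_in_means(rows):
    treated = [y for _, a, y in rows if a == 1]
    untreated = [y for _, a, y in rows if a == 0]
    return sum(treated) / len(treated) - sum(untreated) / len(untreated)

rows = simulate_two_trials(n_per_sex=100_000)
naive = difference_in_means(rows)  # wrongly treats the pooled data as one trial
per_trial = {sex: difference_in_means([r for r in rows if r[0] == sex]) for sex in ("M", "F")}
print(f"pooled: {naive:.2f}, men: {per_trial['M']:.2f}, women: {per_trial['F']:.2f}")
```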

In general, causal inference works by specifying a model that says the data came from a complex trial, ie, one where nature assigned treatment using a biased coin whose bias depends on the observed past. For such a trial, there will exist a valid way to recover the overall causal results, but it will require us to think carefully about what the correct analysis is.

Assumptions of Causal Inference

We will now go through in some more detail why randomized trials work, ie, which aspects of this study design allow us to infer causal relationships, or facts about God’s Table, using information about the joint distribution of observed variables.

We will start with an “observed table” and build towards “reconstructing” parts of God’s Table. To do this, we will need three assumptions: positivity, consistency and (conditional) exchangeability.

Positivity is the assumption that any individual has a positive probability of receiving all values of the treatment variable: Pr(A=a) > 0 for all values of a. When we condition on confounders L, this must hold within every stratum: Pr(A=a | L=l) > 0 for all values of a and l. In other words, you need to have both people who smoke and people who don't smoke. If positivity does not hold, you will not have any information about the distribution of Ya for that value of a, and will therefore not be able to make inferences about it.

We can check whether this assumption holds in the sample by checking whether there are people who are treated and people who are untreated. If you observe that in every stratum there are individuals who are treated and individuals who are untreated, you know that positivity holds.

If we observe a stratum where no individuals are treated (or no individuals are untreated), this can be either for statistical reasons (by chance, you did not sample any) or for structural reasons (individuals with these covariates are deterministically never treated). As we will see later, our models can handle random violations, but not structural violations.

In a randomized controlled trial, positivity holds because you will use a coin that has a positive probability of assigning people to either arm of the trial.
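Checking positivity in a sample amounts to checking that every observed stratum contains both treated and untreated individuals. A minimal Python sketch, assuming the data are given as (L, A) pairs (a hypothetical format chosen for illustration):

```python
from collections import defaultdict

def positivity_violations(rows):
    """Return the strata of L in which only one treatment value was observed.

    rows is a list of (l, a) pairs with a in {0, 1}. An empty result means that every
    observed stratum contains both treated and untreated individuals."""
    arms_seen = defaultdict(set)
    for l, a in rows:
        arms_seen[l].add(a)
    return sorted(l for l, arms in arms_seen.items() if arms != {0, 1})

# Hypothetical sample: nobody in the "F" stratum smoked.
sample = [("M", 1), ("M", 0), ("M", 1), ("F", 0), ("F", 0)]
print(positivity_violations(sample))  # ['F']
```

As discussed above, a check like this cannot tell a random violation from a structural one, and it says nothing about strata that never appear in the sample at all.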


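The next assumption we are going to make is consistency: if an individual happens to receive treatment (A=1), we will observe the counterfactual variable Ya=1 in this individual, and if an individual is untreated (A=0), we will observe Ya=0. This is the observed table after we make the consistency assumption:

[Table: the Observed Table after the consistency assumption, with each Swede’s observed Y entered in the counterfactual column for the treatment they actually received]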

 Making the consistency assumption got us half the way to our goal.  We now have a lot of information about Ya=1 and Ya=0. However, half of the data is still missing.
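Mechanically, the consistency assumption tells us which cell of God's Table the observed outcome Y fills in. The Python sketch below uses a tiny God's Table with invented values and hides every counterfactual that consistency does not reveal (marked as None). Exactly one of the two counterfactual cells per person stays unknown, which is the half of the data that is still missing.

```python
# Hypothetical God's Table (invented values): (L, A, Y^{a=1}, Y^{a=0}).
GODS_TABLE = [
    ("M", 1, 1, 1),
    ("M", 0, 1, 1),
    ("F", 1, 1, 0),
    ("F", 0, 0, 0),
]

def observed_table(gods_table):
    """Return rows (L, A, Y, seen Y^{a=1}, seen Y^{a=0}), with None for unobserved cells."""
    rows = []
    for sex, a, y1, y0 in gods_table:
        y = y1 if a == 1 else y0          # consistency: Y = Y^{a} when A = a
        seen_y1 = y1 if a == 1 else None  # Y^{a=1} is revealed only in the treated
        seen_y0 = y0 if a == 0 else None  # Y^{a=0} is revealed only in the untreated
        rows.append((sex, a, y, seen_y1, seen_y0))
    return rows

observed = observed_table(GODS_TABLE)
missing = sum(cell is None for row in observed for cell in row[3:])
print(missing, "of", 2 * len(observed), "counterfactual cells are still unknown")  # 4 of 8 ...
```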

Although consistency seems obvious, it is an assumption, not something that is true by definition.  We can expect the consistency assumption to hold if we have a well-defined intervention (ie, the intervention is a well-defined choice, not an attribute of the individual), and there is no causal interference (one individual’s outcome is not affected by whether another individual was treated).

Consistency may not hold if you have an intervention that is not well-defined: For example, there may be multiple types of cigarettes. When you measure Ya=1 in people who smoked, it will actually be a composite of multiple counterfactual variables: one for people who smoked regular cigarettes (let us call that Ya=1*) and another for people who smoked e-cigarettes (let us call that Ya=1#). Since you failed to specify whether you are interested in the effect of regular cigarettes or e-cigarettes, the construct Ya=1 is a composite without any meaning, and people will be unable to use your results to predict the consequences of their actions.


To complete the table, we require an additional assumption on the nature of the data. We call this assumption “Exchangeability”.  One possible exchangeability assumption is “Ya=0 ∐ A and Ya=1 ∐ A”.   This is the assumption that says “The data came from a randomized controlled trial”. If this assumption is true, you will observe a random subset of the distribution of Ya=0 in the group where A=0, and a random subset of the distribution of Ya=1 in the group where A=1.

Exchangeability is a statement about two variables being independent of each other. This means that having information about either one of the variables will not help you predict the value of the other. Sometimes, variables which are not independent are "conditionally independent". For example, it is possible that knowing somebody's race helps you predict whether they enjoy eating Hakarl, an Icelandic form of rotting fish. However, it is also possible that this is just a marker for whether they were born in the ethnically homogeneous Iceland. In such a situation, it is possible that once you already know whether somebody is from Iceland, also knowing their race gives you no additional clues as to whether they will enjoy Hakarl. In this case, the variables "race" and "enjoying Hakarl" are conditionally independent, given nationality.
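The Hakarl example can be made concrete with an invented joint distribution. In this Python sketch every probability is a hypothetical number, the two race categories are placeholders ("R1" and "R2"), and enjoyment of Hakarl depends only on whether someone is from Iceland. Race then predicts Hakarl enjoyment marginally, but not once nationality is known.

```python
from fractions import Fraction
from itertools import product

# Invented probabilities, for illustration only.
P_ICELAND = Fraction(1, 2)                                  # Pr(from Iceland)
P_RACE_R1 = {True: Fraction(9, 10), False: Fraction(1, 5)}  # Pr(race = "R1" | from Iceland?)
P_ENJOY = {True: Fraction(1, 2), False: Fraction(1, 10)}    # Pr(enjoys Hakarl | from Iceland?)

def joint():
    """Joint distribution over (from Iceland?, race, enjoys Hakarl?)."""
    dist = {}
    for iceland, race, enjoy in product((True, False), ("R1", "R2"), (True, False)):
        p_n = P_ICELAND if iceland else 1 - P_ICELAND
        p_r = P_RACE_R1[iceland] if race == "R1" else 1 - P_RACE_R1[iceland]
        p_h = P_ENJOY[iceland] if enjoy else 1 - P_ENJOY[iceland]  # race is not used here
        dist[(iceland, race, enjoy)] = p_n * p_r * p_h
    return dist

def pr_enjoy(dist, **given):
    """Pr(enjoys Hakarl | given), where given may fix 'iceland' and/or 'race'."""
    names = ("iceland", "race", "enjoy")

    def matches(outcome):
        values = dict(zip(names, outcome))
        return all(values[k] == v for k, v in given.items())

    denominator = sum(p for o, p in dist.items() if matches(o))
    numerator = sum(p for o, p in dist.items() if matches(o) and o[2])
    return numerator / denominator

d = joint()
# Marginally, race carries information about enjoying Hakarl...
print(pr_enjoy(d, race="R1"), pr_enjoy(d, race="R2"))  # 47/110 13/90
# ...but within each nationality it carries none.
for iceland in (True, False):
    print(iceland, pr_enjoy(d, iceland=iceland, race="R1"), pr_enjoy(d, iceland=iceland, race="R2"))
# True 1/2 1/2
# False 1/10 1/10
```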

The reason we care about conditional independence is that sometimes you may be unwilling to assume that marginal exchangeability Ya=1 ∐ A holds, but you are willing to assume conditional exchangeability Ya=1 ∐ A  | L.  In this example, let L be sex.  The assumption then says that you can interpret the data as if it came from two different randomized controlled trials: One in men, and one in women. If that is the case, sex is a "confounder". (We will give a definition of confounding in Part 2 of this sequence. )

If the data came from two different randomized controlled trials, one possible approach is to analyze these trials separately. This is called “stratification”. Stratification gives you effect measures that are conditional on the confounders: you get one measure of the effect in men, and another in women. Unfortunately, in more complicated settings, such as time-varying treatments where the confounders are themselves affected by earlier treatment, stratification-based methods (including regression) are generally biased. In those situations, it is necessary to focus the inference on the marginal distribution of Ya.


If marginal exchangeability holds (ie, if the data came from a marginally randomized trial), making inferences about the marginal distribution of Ya is easy: You can just estimate E[Ya] as E[Y|A=a].

However, if the data came from a conditionally randomized trial, we will need to think a little bit harder about how to say anything meaningful about E[Ya]. This process is the central idea of causal inference. We call it “identification”:  The idea is to write an expression for the distribution of a counterfactual variable, purely in terms of observed variables.  If we are able to do this, we have sufficient information to estimate causal effects just by looking at the relevant parts of the joint distribution of observed variables.

The simplest example of identification is standardization.  As an example, we will show a simple proof:

Begin by using the law of total expectation to factor out the confounder, in this case L:

$$E(Y^{a}) = \sum_{l} E(Y^{a} \mid L=l)\,\Pr(L=l)$$

We do this because we need to introduce L behind the conditioning sign in order to use our exchangeability assumption in the next step. Then, because Ya ∐ A | L, we are allowed to introduce A=a behind the conditioning sign:

$$E(Y^{a}) = \sum_{l} E(Y^{a} \mid A=a, L=l)\,\Pr(L=l)$$

Finally, use the consistency assumption: because we are in the stratum where A=a in all individuals, we can replace Ya by Y. (Positivity guarantees that this stratum is non-empty, so that the conditional expectation is defined for every l.)

$$E(Y^{a}) = \sum_{l} E(Y \mid A=a, L=l)\,\Pr(L=l)$$


We now have an expression for the counterfactual in terms of quantities that can be observed in the real world, ie, in terms of the joint distribution of A, Y and L. In other words, we have linked the data generating mechanism with the joint distribution: we have “identified” E(Ya). We can therefore estimate E(Ya).
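The identifying expression translates directly into a plug-in estimator: estimate E(Y | A=a, L=l) by the mean outcome among people with A=a in stratum l, estimate Pr(L=l) by the stratum's share of the sample, and sum. The Python sketch below assumes data given as (L, A, Y) rows and uses an invented dataset in which men both smoke more and have a higher baseline risk, while smoking has no effect in either sex. The naive comparison E(Y | A=1) - E(Y | A=0) suggests an effect of 1/4; standardization recovers the null effect. This is a sketch of point-treatment standardization only, not a general implementation of the g-formula.

```python
from collections import Counter, defaultdict
from fractions import Fraction

def standardized_mean(rows, a):
    """Plug-in estimate of E(Y^a) = sum over l of E(Y | A=a, L=l) * Pr(L=l)."""
    n = len(rows)
    stratum_sizes = Counter(l for l, _, _ in rows)
    outcomes = defaultdict(list)  # outcomes[l] = Y values of people with A=a in stratum l
    for l, treatment, y in rows:
        if treatment == a:
            outcomes[l].append(y)
    total = Fraction(0)
    for l, n_l in stratum_sizes.items():
        ys = outcomes[l]
        if not ys:  # E(Y | A=a, L=l) is undefined without positivity
            raise ValueError(f"no one with A={a} in stratum L={l!r}")
        total += Fraction(sum(ys), len(ys)) * Fraction(n_l, n)
    return total

def naive_mean(rows, a):
    """E(Y | A=a), ignoring L entirely."""
    ys = [y for _, treatment, y in rows if treatment == a]
    return Fraction(sum(ys), len(ys))

# Invented data: (L = sex, A = smoked, Y = cancer).
rows = (
    [("M", 1, 1)] * 3 + [("M", 1, 0)] * 3 + [("M", 0, 1)] + [("M", 0, 0)]
    + [("F", 1, 0)] * 2 + [("F", 0, 0)] * 6
)
print(naive_mean(rows, 1) - naive_mean(rows, 0))                # 1/4
print(standardized_mean(rows, 1) - standardized_mean(rows, 0))  # 0
```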

This identifying expression is valid only if L is sufficient to achieve conditional exchangeability, that is, only if there are no confounders beyond those included in L. If we had not observed sufficient variables to obtain conditional exchangeability, it would not be possible to identify the distribution of Ya: there would be intractable confounding.

Identification is the core concept of causal inference: It is what allows us to link the data generating mechanism to the joint distribution, to something that can be observed in the real world. 


The difference between epidemiology and biostatistics

Many people see Epidemiology as «Applied Biostatistics».  This is a misconception. In reality, epidemiology and biostatistics are completely different parts of the problem.  To illustrate what is going on, consider this figure:



The data generating mechanism first creates a joint distribution of observed variables. Then, we sample from the joint distribution to obtain data. Biostatistics asks: If we have a sample, what can we learn about the joint distribution? Epidemiology asks: If we have all the information about the joint distribution, what can we learn about the data generating mechanism? This is a much harder problem, but it can still be analyzed with some rigor.

Epidemiology without Biostatistics is always impossible:  It would not be possible to learn about the data generating mechanism without asking questions about the joint distribution. This usually involves sampling.  Therefore, we will need good statistical estimators of the joint distribution.

Biostatistics without Epidemiology is usually pointless: The joint distribution of observed variables is simply not interesting in itself. You could claim that randomized trials are an example of biostatistics without epidemiology. However, the epidemiology is still there; it is just not necessary to think about it, because the epidemiologic part of the analysis is trivial.

Note that the word “bias” means different things in Epidemiology and Biostatistics.  In Biostatistics, “bias” is a property of a statistical estimator:  We talk about whether ŷ is a biased estimator of E(Y |A).   If an estimator is biased, it means that when you use data from a sample to make inferences about the joint distribution in the population the sample came from, there will be a systematic source of error.

In Epidemiology, “bias” means that you are estimating the wrong thing:  Epidemiological bias is a question about whether E(Y|A) is a valid identification of E(Ya).   If there is epidemiologic bias, it means that you estimated something in the joint distribution, but that this something does not answer the question you were interested in.    

These are completely different concepts. Both are important and can lead to your estimates being wrong. It is possible for a statistically unbiased estimator to be biased in the epidemiologic sense, and vice versa. For your results to be valid, your estimator must be unbiased in both senses.


Sequence Announcement: Applied Causal Inference

24 Anders_H 30 July 2014 08:55PM

Applied Causal Inference for Observational Research

This sequence is an introduction to basic causal inference.  It was originally written as auxiliary notes for a course in Epidemiology, but it is relevant to almost any kind of applied statistical research, including econometrics, sociology, psychology, political science etc.  I would not be surprised if you guys find a lot of errors, and I would be very grateful if you point them out in the comments. This will help me improve my course notes and potentially help me improve my understanding of the material. 

For mathematically inclined readers, I recommend skipping this sequence and instead reading Pearl's book on Causality. There is also a lot of good material on causal graphs on Less Wrong itself. Also, note that my thesis advisor is writing a book that covers the same material in more detail; the first two parts are available for free at his website.

Pearl's book, Miguel's book and Eliezer's writings are all more rigorous and precise than my sequence.  This is partly because I have a different goal:  Pearl and Eliezer are writing for mathematicians and theorists who may be interested in contributing to the theory.  Instead,  I am writing for consumers of science who want to understand correlation studies from the perspective of a more rigorous epistemology.  

I will use Epidemiological/Counterfactual notation rather than Pearl's notation. I apologize if this is confusing. The two approaches refer to the same mathematical objects; only the notation differs. Whereas Pearl would use the "Do-Operator" E[Y|do(a)], I use counterfactual variables E[Ya]. Instead of using Pearl's "Do-Calculus" for identification, I use Robins' G-Formula, which will give the same results.

For all applications, I will use the letter "A" to represent "treatment" or "exposure" (the thing we want to estimate the effect of),  Y to represent the outcome, L to represent any measured confounders, and U to represent any unmeasured confounders. 

Outline of Sequence:

I hope to publish one post every week.  I have rough drafts for the following eight sections, and will keep updating this outline with links as the sequence develops:

Part 0:  Sequence Announcement / Introduction (This post)

Part 1:  Basic Terminology and the Assumptions of Causal Inference

Part 2:  Graphical Models

Part 3:  Using Causal Graphs to Understand Bias

Part 4:  Time-Dependent Exposures

Part 5:  The G-Formula

Part 6:  Inverse Probability Weighting

Part 7:  G-Estimation of Structural Nested Models and Instrumental Variables

Part 8:  Single World Intervention Graphs, Cross-World Counterfactuals and Mediation Analysis


 Introduction: Why Causal Inference?

The goal of applied statistical research is almost always to learn about causal effects. However, causal inference from observational data is hard, to the extent that it is usually not even possible without strong, almost heroic assumptions. Because of the inherent difficulty of the task, many old-school investigators were trained to avoid making causal claims. Words like “cause” and “effect” were banished from polite company, and the slogan “correlation does not imply causation” became an article of faith which, when said loudly enough, seemingly absolved the investigators from the sin of making causal claims.

However, readers were not fooled:  They always understood that epidemiologic papers were making causal claims.  Of course they were making causal claims; why else would anybody be interested in a paper about the correlation between two variables?   For example, why would anybody want to know about the correlation between eating nuts and longevity, unless they were wondering if eating nuts would cause them to live longer?

When readers interpreted these papers causally, were they simply ignoring the caveats, drawing conclusions that were not intended by the authors?   Of course they weren’t.  The discussion sections of epidemiologic articles are full of “policy implications” and speculations about biological pathways that are completely contingent on interpreting the findings causally. Quite clearly, no matter how hard the investigators tried to deny it, they were making causal claims. However, they were using methodology that was not designed for causal questions, and did not have a clear language for reasoning about where the uncertainty about causal claims comes from. 

This was not sustainable, and inevitably led to a crisis of confidence, which culminated when some high-profile randomized trials showed completely different results from the preceding observational studies.  In one particular case, when the Women’s Health Initiative trial showed that post-menopausal hormone replacement therapy increases the risk of cardiovascular disease, the difference was so dramatic that many thought-leaders in clinical medicine completely abandoned the idea of inferring causal relationships from observational data.

It is important to recognize that the problem was not that the results were wrong. The problem was that there was uncertainty that was not taken seriously by the investigators. A rational person who wants to learn about the world will be willing to accept that studies have margins of error, but only as long as the investigators make a good-faith effort to examine what the sources of error are, and communicate clearly about this uncertainty to their readers. Old-school epidemiology failed at this. We are not going to make the same mistake. Instead, we are going to develop a clear, precise language for reasoning about uncertainty and bias.

In this context, we are going to talk about two sources of uncertainty – “statistical” uncertainty and “epidemiological” uncertainty. 

We are going to use the word “Statistics” to refer to the theory of how we can learn about correlations from limited samples.  For statisticians, the primary source of uncertainty is sampling variability. Statisticians are very good at accounting for this type of uncertainty: Concepts such as “standard errors”, “p-values” and “confidence intervals” are all attempts at quantifying and communicating the extent of uncertainty that results from sampling variability.

The old school of epidemiology would tell you to stop after you had found the correlations and accounted for the sampling variability. They believed going further was impossible. However, correlations are simply not interesting. If you truly believed that correlations tell you nothing about causation, there would be no point in doing the study.

Therefore, we are going to use the terms “Epidemiology” or “Causal Inference” to refer to the next stage in the process:  Learning about causation from correlations.  This is a much harder problem, with many additional sources of uncertainty, including confounding and selection bias. However, recognizing that the problem is hard does not mean that you shouldn't try, it just means that you have to be careful. As we will see, it is possible to reason rigorously about whether correlation really does imply causation in your particular study: You will just need a precise language. The goal of this sequence is simply to give you such a language.

In order to teach you the logic of this language, we are going to make several controversial statements such as «The only way to estimate a causal effect is to run a randomized controlled trial». You may not be willing to believe this at first, but in order to understand the logic of causal inference, it is necessary that you are at least willing to suspend your disbelief and accept it as true within the course.

It is important to note that we are not just saying this to try to convince you to give up on observational studies in favor of randomized controlled trials.   We are making this point because understanding it is necessary in order to appreciate what it means to control for confounding: It is not possible to give a coherent meaning to the word “confounding” unless one is trying to determine whether it is reasonable to model the data as if it came from a complex randomized trial run by nature. 



When we say that causal inference is hard, we do not mean that it is difficult to learn the basic concepts of the theory. What we mean is that even if you fully understand everything that has ever been written about causal inference, it is going to be very hard to infer a causal relationship from observational data, and there will always be uncertainty about the results. This is why this sequence is not going to be a workshop that teaches you how to apply magic causal methodology. What we are interested in is developing your ability to reason honestly about where uncertainty and bias come from, so that you can communicate this to the readers of your studies. What we want to teach you about is the epistemology that underlies epidemiological and statistical research with observational data.

Although insisting on only using randomized trials may seem attractive to a purist, it does not take much imagination to see that there are situations where it is important to predict the consequences of an action, but where it is not possible to run a trial. In such situations, there may be Bayesian evidence to be found in nature. This evidence comes in the form of correlations in observational data. When we are stuck with this type of evidence, it is important that we have a clear framework for assessing its strength.




I am publishing Part 1 of the sequence at the same time as this introduction. I would be very interested in hearing feedback, particularly about whether people feel this has already been covered in sufficient detail on Less Wrong.  If there is no demand, there won't really be any point in transforming the rest of my course notes to a Less Wrong format. 

Thanks to everyone who had a look at this before I published, including paper-machine and Vika, Janos, Eloise and Sam from the Boston Meetup group. 

The Correct Use of Analogy

24 SilentCal 16 July 2014 09:07PM

In response to: Failure by Analogy, Surface Analogies and Deep Causes

Analogy gets a bad rap around here, and not without reason. The kinds of argument from analogy condemned in the above links fully deserve the condemnation they get. Still, I think it's too easy to read them and walk away thinking "Boo analogy!" when not all uses of analogy are bad. The human brain seems to have hardware support for thinking in analogies, and I don't think this capability is a waste of resources, even in our highly non-ancestral environment. So, assuming that the linked posts do a sufficient job detailing the abuse and misuse of analogy, I'm going to go over some legitimate uses.


The first thing analogy is really good for is description. Take the plum pudding atomic model. I still remember this falsified proposal of negative 'raisins' in positive 'dough' largely because of the analogy, and I don't think anyone ever attempted to use it to argue for the existence of tiny subnuclear particles corresponding to cinnamon. 

But this is only a modest example of what analogy can do. The following is an example that I think starts to show the true power: my comment on Robin Hanson's 'Don't Be "Rationalist"'. To summarize, Robin argued that since you can't be rationalist about everything you should budget your rationality and only be rational about the most important things; I replied that maybe rationality is like weightlifting, where your strength is finite yet it increases with use. That comment is probably the most successful thing I've ever written on the rationalist internet in terms of the attention it received, including direct praise from Eliezer and a shoutout in a Scott Alexander (yvain) post, and it's pretty much just an analogy.

Here's another example, this time from Eliezer. As part of the AI-Foom debate, he tells the story of Fermi's nuclear experiments, and in particular his precise knowledge of when a pile would go supercritical.

What do the above analogies accomplish? They provide counterexamples to universal claims. In my case, Robin's inference that rationality should be spent sparingly proceeded from the stated premise that no one is perfectly rational about anything, and weightlifting was a counterexample to the implicit claim 'a finite capacity should always be directed solely towards important goals'. If you look above my comment, anon had already said that the conclusion hadn't been proven, but without the counterexample this claim had much less impact.

In Eliezer's case, "you can never predict an unprecedented unbounded growth" is the kind of claim that sounds really convincing. "You haven't actually proved that" is a weak-sounding retort; "Fermi did it" immediately wins the point. 

The final thing analogies do really well is crystallize patterns. For an example of this, let's turn to... Failure by Analogy. Yep, the anti-analogy posts are themselves written almost entirely via analogy! Alchemists who glaze lead with lemons and would-be aviators who put beaks on their machines are invoked to crystallize the pattern of 'reasoning by similarity'. The post then makes the case that neural-net worshippers are reasoning by similarity in just the same way, making the same fundamental error.

It's this capacity that makes analogies so dangerous. Crystallizing a pattern can be so mentally satisfying that you don't stop to question whether the pattern applies. The antidote to this is the question, "Why do you believe X is like Y?" Assessing the answer and distinguishing deep similarities from superficial ones may not always be easy, but just by asking you'll catch the cases where there is no justification at all.

Separating the roles of theory and direct empirical evidence in belief formation: the examples of minimum wage and anthropogenic global warming

24 VipulNaik 25 June 2014 09:47PM

I recently asked two questions on Quora with similar question structures, and the similarities and differences between the responses were interesting.

Question #1: Anthropogenic global warming, the greenhouse effect, and the historical weather record

I asked the question here. Question statement:

If you believe in Anthropogenic Global Warming (AGW), to what extent is your belief informed by the theory of the greenhouse effect, and to what extent is it informed by the historical temperature record?

In response to some comments, I added the following question details:

Due to length limitations, the main question is a bit simplistically framed. But what I'm really asking for is the relative importance of theoretical mechanisms and direct empirical evidence. Theoretical mechanisms are of course also empirically validated, but the empirical validation could occur in different settings.

For instance, the greenhouse effect is a mechanism, and one may get estimates of the strength of the greenhouse effect based on an understanding of the underlying physics or by doing laboratory experiments or simulations.

Direct empirical evidence is evidence that is as close to the situation we are trying to predict as possible. In this case, it would involve looking at the historical records of temperature and carbon dioxide concentrations, and perhaps some other confounding variables whose role needs to be controlled for (such as solar activity).

Saying that your belief is largely grounded in direct empirical evidence is basically saying that just looking at the time series of temperature, carbon dioxide concentrations and the other variables can allow one to say with fairly high confidence (starting from very weak priors) that increased carbon dioxide concentrations, due to human activity, are responsible for temperature increases. In other words, if you ran a regression and tried to do the usual tricks to infer causality, carbon dioxide would come out as the culprit.

Saying that your belief is largely grounded in theory is basically saying that the science of the greenhouse effect is sufficiently convincing that the historical temperature and weather record isn't an important factor in influencing your belief: if it had come out differently, you'd probably just have thought the data was noisy or wrong and wouldn't update away from believing in the AGW thesis.

I also posted to Facebook here asking my friends about the pushback to my use of the term "belief" in my question.

Question #2: Effect of increase in the minimum wage on unemployment

I asked the question here. Question statement:

If you believe that raising the minimum wage is likely to increase unemployment, to what extent is your belief informed by the theory of supply and demand and to what extent is it informed by direct empirical evidence?

I added the following question details:

By "direct empirical evidence" I am referring to empirical evidence that directly pertains to the relation between minimum wage raises and employment level changes, not empirical evidence that supports the theory of supply and demand in general (because transferring that to the minimum wage context would require one to believe the transferability of the theory).

Also, when I say "believe that raising the minimum wage is likely to increase unemployment" I am talking about minimum wage increases of the sort often considered in legislative measures, and by "likely" I just mean that it's something that should always be seriously considered whenever a proposal to raise the minimum wage is made. The belief would be consistent with believing that in some cases minimum wage raises have no employment effects.

I also posted the question to Facebook here.

Similarities between the questions

The questions are structurally similar, and belong to a general question type of considerable interest to the LessWrong audience. The common features of the questions:

  • In both cases, there is a theory (the greenhouse effect for Question #1, and supply and demand for Question #2) that is foundational to the domain and is supported through a wide range of lines of evidence.
  • In both cases, the quantitative specifics of the extent to which the theory applies in the particular context are not clear. There are prima facie plausible arguments that other factors may cancel out the effect and there are arguments for many different effect sizes.
  • In both cases, people who study the broad subject (climate scientists for Question #1, economists for Question #2) are more favorably disposed to the belief than people who do not study the broad subject.
  • In both cases, a significant part of the strength of belief of subject matter experts seems to be their belief in the theory. The data, while consistent with the theory, does not seem to paint a strong picture in isolation. For the minimum wage, consider the Card and Krueger study. Bryan Caplan discusses how Bayesian reasoning with strong theoretical priors can lead one to continue believing that minimum wage increases cause unemployment to rise, without addressing Card and Krueger at the object level. For the case of anthropogenic global warming, consider the draft by Kesten C. Green (addressing whether a warming-based forecast has higher forecast accuracy than a no-change forecast) or the paper AGW doesn't cointegrate by Beenstock, Reingewertz, and Paldor (addressing whether, looking at the data alone, we can get good evidence that carbon dioxide concentration increases are linked with temperature increases).
  • In both cases, outsiders to the domain who nonetheless have expertise in other areas that one might expect to give them insight into the question are often more skeptical of the belief. A number of weather forecasters, physicists, and forecasting experts are skeptical of long-range climate forecasting or of confident assertions about anthropogenic global warming. A number of sociologists, lawyers, and politicians are disparaging of the belief that minimum wage increases cause unemployment levels to rise. The criticism in both cases is similar: namely, that a basically correct theory is being overstretched or incorrectly applied to a situation that is too complex.
  • In both cases, the debate is somewhat politically charged, largely because one's beliefs here affect one's views of proposed legislation (climate change mitigation legislation and minimum wage increase legislation). The anthropogenic global warming belief is more commonly associated with environmentalists, social democrats, and progressives, and (in the United States) with Democrats, whereas opposition to it is more common among conservatives and libertarians. The minimum wage belief is more commonly associated with free market views and (in the United States) with conservatives and Republicans, and opposition to it is more common among progressives and social democrats.
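The Caplan point in the list above, that a strong theoretical prior can survive a study pointing the other way, can be illustrated with the odds form of Bayes' theorem. This is a minimal sketch, not Caplan's own calculation: the prior of 0.9 and the likelihood ratio of 1/3 assigned to a single contrary study are hypothetical numbers chosen only to show the arithmetic.

```python
def update(prior: float, likelihood_ratio: float) -> float:
    """Return the posterior probability of a hypothesis H.

    Uses the odds form of Bayes' theorem:
    posterior odds = prior odds * P(evidence | H) / P(evidence | not H).
    """
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)


# Hypothetical numbers: a strong theory-based prior (0.9) meets one study
# whose result is three times likelier if H is false.
strong_prior = update(0.9, 1 / 3)
# The same study, read by someone with no theoretical prior (0.5).
neutral_prior = update(0.5, 1 / 3)
print(f"strong prior: {strong_prior:.2f}, neutral prior: {neutral_prior:.2f}")
```

With these assumed numbers the same study leaves the first reader at 75% and the second at 25%, so two people can look at identical data and still disagree sharply because of how much weight they put on the theory beforehand.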

Looking for help

I'm interested in thoughts from the people here on these questions:

  • Thoughts on the specifics of Question #1 and Question #2.
  • Other possible questions in the same reference class (where a belief arises from a mix of theory and data, and the theory plays a fairly big role in driving the belief, while the data on its own is very ambiguous).
  • Other similarities between Question #1 and Question #2.
  • Ways that Question #1 and Question #2 are disanalogous.
  • General thoughts on how this relates to Bayesian reasoning and other modes of belief formation based on a combination of theory and data.


"Follow your dreams" as a case study in incorrect thinking

23 cousin_it 20 August 2014 01:18PM

This post doesn't contain any new ideas that LWers don't already know. It's more of an attempt to organize my thoughts and have a writeup for future reference.

Here's a great quote from Sam Hughes, giving some examples of good and bad advice:

"You and your gaggle of girlfriends had a saying at university," he tells her. "'Drink through it'. Breakups, hangovers, finals. I have never encountered a shorter, worse, more densely bad piece of advice." Next he goes into their bedroom for a moment. He returns with four running shoes. "You did the right thing by waiting for me. Probably the first right thing you've done in the last twenty-four hours. I subscribe, as you know, to a different mantra. So we're going to run."

The typical advice given to young people who want to succeed in highly competitive areas, like sports, writing, music, or making video games, is to "follow your dreams". I think that advice is up there with "drink through it" in terms of sheer destructive potential. If it were replaced with "don't bother following your dreams" every time it was uttered, the world might become a happier place.

The amazing thing about "follow your dreams" is that thinking about it uncovers a sort of perfect storm of biases. It's fractally wrong, like PHP, where the big picture is wrong and every small piece is also wrong in its own unique way.

The big culprit is, of course, optimism bias due to perceived control. I will succeed because I'm me, the special person at the center of my experience. That's the same bias that leads us to overestimate our chances of finishing the thesis on time, or having a successful marriage, or any number of other things. Thankfully, we have a really good debiasing technique for this particular bias, known as reference class forecasting, or inside vs outside view. What if your friend Bob was a slightly better guitar player than you? Would you bet a lot of money on Bob making it big like Jimi Hendrix? The question is laughable, but then so is betting the years of your own life, with a smaller chance of success than Bob.

That still leaves many questions unanswered, though. Why do people offer such advice in the first place, why do other people follow it, and what can be done about it?

Survivorship bias is one big reason we constantly hear successful people telling us to "follow our dreams". Successful people don't really know why they are successful, so they attribute it to their hard work and not giving up. The media amplifies that message, while millions of failures go unreported because they're not celebrities, even though they try just as hard. So we hear about successes disproportionately, in comparison to how often they actually happen, and that colors our expectations of our own future success. Sadly, I don't know of any good debiasing techniques for this error, other than just reminding yourself that it's an error.

When someone has invested a lot of time and effort into following their dream, it feels harder to give up due to the sunk cost fallacy. That happens even with very stupid dreams, like the dream of winning at the casino, that were obviously installed by someone else for their own profit. So when you feel convinced that you'll eventually make it big in writing or music, you can remind yourself that compulsive gamblers feel the same way, and that feeling something doesn't make it true.

Of course there are good dreams and bad dreams. Some people have dreams that don't tease them for years with empty promises, but actually start paying off in a predictable time frame. The main difference between the two kinds of dream is the difference between positive-sum games, a.k.a. productive occupations, and zero-sum games, a.k.a. popularity contests. Sebastian Marshall's post Positive Sum Games Don't Require Natural Talent makes the same point, and advises you to choose a game where you can be successful without outcompeting 99% of other players.

The really interesting question to me right now is, what sets someone on the path of investing everything in a hopeless dream? Maybe it's a small success at an early age, followed by some random encouragement from others, and then you're locked in. Is there any hope for thinking back to that moment, or set of moments, and making a little twist to put yourself on a happier path? I usually don't advise people to change their desires, but in this case it seems to be the right thing to do.

Multiple Factor Explanations Should Not Appear One-Sided

22 Stefan_Schubert 07 August 2014 02:10PM

In Policy Debates Should Not Appear One-Sided, Eliezer Yudkowsky argues that arguments on questions of fact should be one-sided, whereas arguments on policy questions should not:

On questions of simple fact (for example, whether Earthly life arose by natural selection) there's a legitimate expectation that the argument should be a one-sided battle; the facts themselves are either one way or another, and the so-called "balance of evidence" should reflect this.  Indeed, under the Bayesian definition of evidence, "strong evidence" is just that sort of evidence which we only expect to find on one side of an argument.

But there is no reason for complex actions with many consequences to exhibit this onesidedness property.

The reason for this is primarily that natural selection has caused all sorts of observable phenomena. With a bit of ingenuity, we can infer that natural selection has caused them, and hence they become evidence for natural selection. The evidence for natural selection thus has a common cause, which means that we should expect the argument to be one-sided.

In contrast, even if a certain policy, say lower taxes, is the right one, the rightness of this policy does not cause its evidence (or the arguments for this policy, which is a more natural expression), the way natural selection causes its evidence. Hence there is no common cause of all of the valid arguments of relevance for the rightness of this policy, and hence no reason to expect that all of the valid arguments should support lower taxes. If someone nevertheless believes this, the best explanation of their belief is that they suffer from some cognitive bias such as the affect heuristic.
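The Bayesian claim quoted above, that strong evidence is the kind we expect to find on only one side, has a standard formal counterpart: if observations really are generated under a hypothesis H, the average log-likelihood ratio in favor of H equals the Kullback-Leibler divergence between the two likelihood distributions, which is never negative (Gibbs' inequality). The sketch below is an illustration of that fact added here, not part of the original argument; the two-outcome likelihoods are hypothetical.

```python
from math import log


def expected_log_lr(p_given_h: list[float], p_given_not_h: list[float]) -> float:
    """Average of log(P(e|H) / P(e|not H)) when each outcome e is drawn under H.

    This is the KL divergence KL(P(.|H) || P(.|not H)). It is >= 0, and it is 0
    only when both hypotheses assign the same probability to every outcome.
    Outcomes with P(e|H) = 0 never occur under H, so they are skipped.
    """
    return sum(p * log(p / q) for p, q in zip(p_given_h, p_given_not_h) if p > 0)


# Hypothetical two-outcome observation, e.g. "phenomenon present" / "absent".
informative = expected_log_lr([0.9, 0.1], [0.2, 0.8])
uninformative = expected_log_lr([0.5, 0.5], [0.5, 0.5])
print(round(informative, 3), uninformative)
```

Because each observation's expected contribution points toward the hypothesis that actually generated it, many independent observations tend to accumulate on one side. Individual observations can still point the wrong way; it is only the average that is guaranteed to favor the true hypothesis.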

(In passing, I might mention that I think that the fact that moral debates are not one-sided indicates that moral realism is false, since if moral realism were true, moral facts should provide us with one-sided evidence on moral questions, just like natural selection provides us with one-sided evidence on the question how Earthly life arose. This argument is similar to, but distinct from, Mackie's argument from relativity.)

Now consider another kind of factual issue: multiple factor explanations. These are explanations which refer to a number of factors to explain a certain phenomenon. For instance, in his book Guns, Germs and Steel, Jared Diamond explains the fact that agriculture first arose in the Fertile Crescent by reference to no less than eight factors. I'll just list these factors briefly without going into the details of how they contributed to the rise of agriculture. The Fertile Crescent had, according to Diamond (ch. 8):

  1. big seeded plants, which were
  2. abundant and occurring in large stands whose value was obvious,
  3. and which were to a large degree hermaphroditic "selfers".
  4. It had a higher percentage of annual plants than other Mediterranean climate zones.
  5. It had a higher diversity of species than other Mediterranean climate zones.
  6. It had a wider range of elevations than other Mediterranean climate zones.
  7. It had a great number of domesticable big mammals.
  8. The hunter-gatherer lifestyle was not that appealing in the Fertile Crescent.

(Note that all of these factors have to do with geographical, botanical and zoological facts, rather than with facts about the humans themselves. Diamond's goal is to prove that agriculture arose in Eurasia due to geographical luck rather than because Eurasians are biologically superior to other humans.)

Diamond does not mention any mechanism that would make it less likely for agriculture to arise in the Fertile Crescent. Hence the score of pro-agriculture vs anti-agriculture factors in the Fertile Crescent is 8-0. Meanwhile no other area in the world has nearly as many advantages. Diamond does not provide us with a definite list of how other areas of the world fared, but no non-Eurasian alternative seems to score better than about 5-3 (he is primarily interested in comparing Eurasia with other parts of the world).

Now suppose that we didn't know anything about the rise of agriculture, but that we knew that there were eight factors which could influence it. Since these factors would not be caused by the fact that agriculture first arose in the Fertile Crescent, the way the evidence for natural selection is caused by natural selection, there would be no reason to believe that these factors were on average positively probabilistically dependent on each other. Under these conditions, one area having all the advantages and the next best lacking three of them would be a highly surprising distribution of advantages. On the other hand, this is precisely the pattern that we would expect given the hypothesis that Diamond suffers from confirmation bias or a related bias. His theory is "too good to be true", which lends support to the hypothesis that he is biased.
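The "too good to be true" inference can be made a little more concrete. The sketch below assumes, purely for illustration, that each of n_regions candidate regions independently has each of the eight factors with probability p, and computes the chance that exactly one region has all eight while every other region has at most five. The independence assumption, the value of p, and the number of regions are all hypothetical; this is a crude formalization added here, not anything taken from Diamond's book.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def prob_lopsided(n_regions: int, n_factors: int = 8, p: float = 0.5,
                  runner_up_max: int = 5) -> float:
    """Chance that one region has every factor and all others have <= runner_up_max.

    Assumes every factor in every region is an independent coin flip with
    probability p. The events "region i is the one with all factors" are
    mutually exclusive, so their probabilities can be summed over regions.
    """
    p_all = binom_pmf(n_factors, n_factors, p)
    p_others = sum(binom_pmf(k, n_factors, p) for k in range(runner_up_max + 1))
    return n_regions * p_all * p_others ** (n_regions - 1)


print(f"{prob_lopsided(10):.4f}")
```

With ten hypothetical regions and p = 0.5 the probability comes out just under 1%. Positive dependence between the factors would raise this number, which is the objection the next paragraph takes up.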

In this particular case, some of the factors Diamond lists presumably are positively dependent on each other. Now suppose that someone argues that all of the factors are in fact strongly positively dependent on each other, so that it is not very surprising that they all co-occur. This only pushes the problem back, however, because now we want an explanation of a) what the common cause of all of these dependencies is (it being very improbable that they all would correlate in the absence of such a common cause) and b) how it could be that this common cause increases the probability of the hypothesis via eight independent mechanisms, and doesn't decrease it via any mechanism. (This argument is complicated and I'd be happy to get any input concerning it.)

Single-factor historical explanations are often criticized as being too "simplistic" whereas multiple factor explanations are standardly seen as more nuanced. Many such explanations are, however, one-sided in the way Diamond's explanation is, which indicates bias and dogmatism rather than nuance. (Another salient example I'm presently studying is taken from Steven Pinker's The Better Angels of Our Nature. I can provide you with the details on demand.*) We should get much better at detecting this kind of bias, since for the most part it goes unnoticed at present.

Generally, the sort of "too good to be true" arguments to infer bias discussed here are strongly under-utilized. As our knowledge of the systematic and predictable ways our thought goes wrong increases, it becomes easier to infer bias from the structure or pattern of people's arguments, statements and beliefs. What we need is to explicate clearly, preferably using probability theory or other formal methods, what factors are relevant for deciding whether some pattern of arguments, statements or beliefs most likely is the result of biased thought-processes. I'm presently doing research on this and would be happy to discuss these questions in detail, either publicly or via pm.

*Edit: Pinker's argument. Pinker's goal is to explain why violence has declined throughout history. He lists the following five factors in the last chapter:

  • The Leviathan (the increasing influence of the government)
  • Gentle commerce (more trade leads to less violence)
  • Feminization
  • The expanding (moral) circle
  • The escalator of reason
He also lists some "important but inconsistent" factors:
  • Weaponry and disarmament (he claims that there are no strong correlations between weapon developments and numbers of deaths)
  • Resources and power (he claims that there is little connection between resource distributions and wars)
  • Affluence (tight correlations between affluence and non-violence are hard to find)
  • (Fall of) religion (he claims that atheist countries and people aren't systematically less violent)
This case is interestingly different from Diamond's. Firstly, it is not entirely clear to what extent these five mechanisms are actually different. It could be argued that "the escalator of reason" is a common cause of the other ones: it causes us to have better self-control, which brings out the better angels of our nature (essentially feminization and the expanding circle), and which leads to better control over the social environment (the Leviathan), which in turn leads to more trade.

Secondly, the expression "inconsistent" suggests that the four latter factors are composed of different sub-mechanisms that pull in different directions. This is most evident regarding weaponry and disarmament. Clearly, more efficient weapons lead to more deaths when they are used. That is an important reason why World War II was so comparatively bloody. But they also lower the chance that the weapons are actually used. The terrifying power of nuclear weapons is an important reason why they've only been used twice in war. Hence we have two different mechanisms pulling in different directions.

I do think that "the escalator of reason" is a fundamental cause behind the other mechanisms. But it presumably also has some effects which increase the level of violence. For one thing, more rational people are more effective at what they do, which means they can kill more people if they want to. (It is just that normally, they don't want to do it as often as irrational people.) (We thus have the same structure that we had regarding weaponry.)

Also, in traditional societies, pro-social behaviour is often underwritten by mythologies which have no basis in fact. When these mythologies were dissolved by reason, many feared that chaos would ensue ("when God is dead, everything is permitted"). This did not happen. But it is hard to deny that such mythologies can lead to less violence, and that therefore their dissolution through reason can lead to more violence.

We shouldn't get too caught up in the details of this particular case, however. What is important is, again, that there is something suspicious with only listing mechanisms that play in the one direction. In this case, it is not even hard to find important mechanisms that play in the other direction. In my view, putting them in the other scale, as it were, leads to a better understanding of how the history of violence has unfolded. That said, I find DavidAgain's counterarguments below interesting.


Politics is hard mode

22 RobbBB 21 July 2014 10:14PM

Summary: I don't think 'politics is the mind-killer' works well rhetorically. I suggest 'politics is hard mode' instead.


Some people in and catawampus to the LessWrong community have objected to "politics is the mind-killer" as a framing (/ slogan / taunt). Miri Mogilevsky explained on Facebook:

My usual first objection is that it seems odd to single politics out as a “mind-killer” when there’s plenty of evidence that tribalism happens everywhere. Recently, there has been a whole kerfuffle within the field of psychology about replication of studies. Of course, some key studies have failed to replicate, leading to accusations of “bullying” and “witch-hunts” and what have you. Some of the people involved have since walked their language back, but it was still a rather concerning demonstration of mind-killing in action. People took “sides,” people became upset at people based on their “sides” rather than their actual opinions or behavior, and so on.

Unless this article refers specifically to electoral politics and Democrats and Republicans and things (not clear from the wording), “politics” is such a frightfully broad category of human experience that writing it off entirely as a mind-killer that cannot be discussed or else all rationality flies out the window effectively prohibits a large number of important issues from being discussed, by the very people who can, in theory, be counted upon to discuss them better than most. Is it “politics” for me to talk about my experience as a woman in gatherings that are predominantly composed of men? Many would say it is. But I’m sure that these groups of men stand to gain from hearing about my experiences, since some of them are concerned that so few women attend their events.

In this article, Eliezer notes, “Politics is an important domain to which we should individually apply our rationality — but it’s a terrible domain in which to learn rationality, or discuss rationality, unless all the discussants are already rational.” But that means that we all have to individually, privately apply rationality to politics without consulting anyone who can help us do this well. After all, there is no such thing as a discussant who is “rational”; there is a reason the website is called “Less Wrong” rather than “Not At All Wrong” or “Always 100% Right.” Assuming that we are all trying to be more rational, there is nobody better to discuss politics with than each other.

The rest of my objection to this meme has little to do with this article, which I think raises lots of great points, and more to do with the response that I’ve seen to it — an eye-rolling, condescending dismissal of politics itself and of anyone who cares about it. Of course, I’m totally fine if a given person isn’t interested in politics and doesn’t want to discuss it, but then they should say, “I’m not interested in this and would rather not discuss it,” or “I don’t think I can be rational in this discussion so I’d rather avoid it,” rather than sneeringly reminding me “You know, politics is the mind-killer,” as though I am an errant child. I’m well-aware of the dangers of politics to good thinking. I am also aware of the benefits of good thinking to politics. So I’ve decided to accept the risk and to try to apply good thinking there. [...]

I’m sure there are also people who disagree with the article itself, but I don’t think I know those people personally. And to add a political dimension (heh), it’s relevant that most non-LW people (like me) initially encounter “politics is the mind-killer” being thrown out in comment threads, not through reading the original article. My opinion of the concept improved a lot once I read the article.

In the same thread, Andrew Mahone added, “Using it in that sneering way, Miri, seems just like a faux-rationalist version of ‘Oh, I don’t bother with politics.’ It’s just another way of looking down on any concerns larger than oneself as somehow dirty, only now, you know, rationalist dirty.” To which Miri replied: “Yeah, and what’s weird is that that really doesn’t seem to be Eliezer’s intent, judging by the eponymous article.”

Eliezer replied briefly, to clarify that he wasn't generally thinking of problems that can be directly addressed in local groups (but happen to be politically charged) as "politics":

Hanson’s “Tug the Rope Sideways” principle, combined with the fact that large communities are hard to personally influence, explains a lot in practice about what I find suspicious about someone who claims that conventional national politics are the top priority to discuss. Obviously local community matters are exempt from that critique! I think if I’d substituted ‘national politics as seen on TV’ in a lot of the cases where I said ‘politics’ it would have more precisely conveyed what I was trying to say.

But that doesn't resolve the issue. Even if local politics is more instrumentally tractable, the worry about polarization and factionalization can still apply, and may still make it a poor epistemic training ground.

A subtler problem with banning “political” discussions on a blog or at a meet-up is that it’s hard to do fairly, because our snap judgments about what counts as “political” may themselves be affected by partisan divides. In many cases the status quo is thought of as apolitical, even though objections to the status quo are ‘political.’ (Shades of Pretending to be Wise.)

Because politics gets personal fast, it’s hard to talk about it successfully. But if you’re trying to build a community, build friendships, or build a movement, you can’t outlaw everything ‘personal.’

And selectively outlawing personal stuff gets even messier. Last year, daenerys shared anonymized stories from women, including several that discussed past experiences where the writer had been attacked or made to feel unsafe. If those discussions are made off-limits because they relate to gender and are therefore ‘political,’ some folks may take away the message that they aren’t allowed to talk about, e.g., some harmful or alienating norm they see at meet-ups. I haven’t seen enough discussions of this failure mode to feel super confident people know how to avoid it.

Since this is one of the LessWrong memes that’s most likely to pop up in cross-subcultural dialogues (along with the even more ripe-for-misinterpretation “policy debates should not appear one-sided”…), as a first (very small) step, my action proposal is to obsolete the ‘mind-killer’ framing. A better phrase for getting the same work done would be ‘politics is hard mode’:

1. ‘Politics is hard mode’ emphasizes that ‘mind-killing’ (= epistemic difficulty) is quantitative, not qualitative. Some things might instead fall under Middlingly Hard Mode, or under Nightmare Mode…

2. ‘Hard’ invites the question ‘hard for whom?’, more so than ‘mind-killer’ does. We’re used to the fact that some people and some contexts change what’s ‘hard’, so it’s a little less likely we’ll universally generalize.

3. ‘Mindkill’ connotes contamination, sickness, failure, weakness. In contrast, ‘Hard Mode’ doesn’t imply that a thing is low-status or unworthy. As a result, it’s less likely to create the impression (or reality) that LessWrongers or Effective Altruists dismiss out-of-hand the idea of hypothetical-political-intervention-that-isn’t-a-terrible-idea. Maybe some people do want to argue for the thesis that politics is always useless or icky, but if so it should be done in those terms, explicitly — not snuck in as a connotation.

4. ‘Hard Mode’ can’t readily be perceived as a personal attack. If you accuse someone of being ‘mindkilled’, with no context provided, that smacks of insult — you appear to be calling them stupid, irrational, deluded, or the like. If you tell someone they’re playing on ‘Hard Mode,’ that’s very nearly a compliment, which makes your advice that they change behaviors a lot likelier to go over well.

5. ‘Hard Mode’ doesn’t risk bringing to mind (e.g., gendered) stereotypes about communities of political activists being dumb, irrational, or overemotional.

6. ‘Hard Mode’ encourages a growth mindset. Maybe some topics are too hard to ever be discussed. Even so, ranking topics by difficulty encourages an approach where you try to do better, rather than merely withdrawing. It may be wise to eschew politics, but we should not fear it. (Fear is the mind-killer.)

7. Edit: One of the larger engines of conflict is that people are so much worse at noticing their own faults and biases than noticing others'. People will be relatively quick to dismiss others as 'mindkilled,' while frequently flinching away from or just-not-thinking 'maybe I'm a bit mindkilled about this.' Framing the problem as a challenge rather than as a failing might make it easier to be reflective and even-handed.

This is not an attempt to get more people to talk about politics. I think this is a better framing whether or not you trust others (or yourself) to have productive political conversations.

When I playtested this post, Ciphergoth raised the worry that 'hard mode' isn't scary-sounding enough. As dire warnings go, it's light-hearted—exciting, even. To which I say: good. Counter-intuitive fears should usually be argued into people (e.g., via Eliezer's politics sequence), not connotation-ninja'd or chanted at them. The cognitive content is more clearly conveyed by 'hard mode,' and if some group (people who love politics) stands to gain the most from internalizing this message, the message shouldn't cast that very group (people who love politics) in an obviously unflattering light. LW seems fairly memetically stable, so the main issue is what would make this meme infect friends and acquaintances who haven't read the sequences. (Or Dune.)

If you just want a scary personal mantra to remind yourself of the risks, I propose 'politics is SPIDERS'. Though 'politics is the mind-killer' is fine there too.

If you and your co-conversationalists haven’t yet built up a lot of trust and rapport, or if tempers are already flaring, conveying the message ‘I’m too rational to discuss politics’ or ‘You’re too irrational to discuss politics’ can make things worse. In that context, ‘politics is the mind-killer’ is the mind-killer. At least, it’s a needlessly mind-killing way of warning people about epistemic hazards.

‘Hard Mode’ lets you speak as the Humble Aspirant rather than the Aloof Superior. Strive to convey: ‘I’m worried I’m too low-level to participate in this discussion; could you have it somewhere else?’ Or: ‘Could we talk about something closer to Easy Mode, so we can level up together?’ More generally: If you’re worried that what you talk about will impact group epistemology, you should be even more worried about how you talk about it.

A Parable of Elites and Takeoffs

22 gwern 30 June 2014 11:04PM

Let me tell you a parable of the future. Let’s say, 70 years from now, in a large Western country we’ll call Nacirema.

One day far from now: scientific development has continued apace, and a large government project (with, unsurprisingly, a lot of military funding) has taken the scattered pieces of cutting-edge research and put them together into a single awesome technology, which could revolutionize (or at least, vastly improve) all sectors of the economy. Leading thinkers had long forecast that this area of science’s mysteries would eventually yield to progress, despite theoretical confusion and perhaps-disappointing initial results and the scorn of more conservative types and the incomprehension (or outright disgust, for ‘playing god’) of the general population, and at last - it had! The future was bright.

Unfortunately, it was hurriedly decided to use an early prototype outside the lab in an impoverished foreign country. Whether out of arrogance, bureaucratic inertia, overconfidence on the part of the involved researchers, condescending racism, or the need to justify the billions of grant-dollars that had cumulatively gone into the project over the years by showing some use of it, the reasons no longer mattered after the final order was signed. The technology was used, but the consequences turned out to be horrific: over a brief period of what seemed like mere days, entire cities collapsed and scores - hundreds - of thousands of people died. (Modern economies are extremely interdependent and fragile, and small disruptions can have large consequences; more people died in the chaos of the evacuation of the areas around Fukushima than will die of the radiation.)


Announcing the 2014 program equilibrium iterated PD tournament

21 tetronian2 31 July 2014 12:24PM

Last year, AlexMennen ran a prisoner's dilemma tournament with bots that could see each other's source code, which was dubbed a "program equilibrium" tournament. This year, I will be running a similar tournament. Here's how it's going to work: Anyone can submit a bot that plays the iterated PD against other bots. Bots can not only remember previous rounds, as in the standard iterated PD, but also run perfect simulations of their opponent before making a move. Please see the github repo for the full list of rules and a brief tutorial.

There are a few key differences this year:

1) The tournament is in Haskell rather than Scheme.

2) The time limit for each round is shorter (5 seconds rather than 10) but the penalty for not outputting Cooperate or Defect within the time limit has been reduced.

3) Bots cannot directly see each other's source code, but they can run their opponent, specifying the initial conditions of the simulation, and then observe the output.
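Rule 3 (running the opponent under chosen initial conditions instead of reading its source) can be illustrated with a minimal Python sketch. This is not the tournament's Haskell API, which is documented in its GitHub repo; the bot signature, the bot names, and the payoff values (the textbook 5/3/1/0) are all assumptions made for illustration. The hypothetical `simulating_bot` runs its opponent from the current position but hands it a plain cooperator as its opponent, so the simulation cannot recurse back into itself.

```python
from typing import Callable, List, Tuple

Move = str  # "C" = cooperate, "D" = defect
# Hypothetical bot signature: (own past moves, opponent's past moves,
# opponent bot to simulate) -> next move.
Bot = Callable[[List[Move], List[Move], Callable], Move]

# Assumed textbook PD payoffs; the real tournament's scoring may differ.
PAYOFF = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def cooperate_bot(mine: List[Move], theirs: List[Move], opponent: Callable) -> Move:
    return "C"


def defect_bot(mine: List[Move], theirs: List[Move], opponent: Callable) -> Move:
    return "D"


def tit_for_tat(mine: List[Move], theirs: List[Move], opponent: Callable) -> Move:
    # Cooperate on the first round, then copy the opponent's previous move.
    return theirs[-1] if theirs else "C"


def simulating_bot(mine: List[Move], theirs: List[Move], opponent: Callable) -> Move:
    # Simulate the opponent from the current position (histories swapped to
    # its point of view), facing a plain cooperator so the simulation halts.
    predicted = opponent(theirs, mine, cooperate_bot)
    return "C" if predicted == "C" else "D"


def play_match(bot_a: Bot, bot_b: Bot, rounds: int = 10) -> Tuple[int, int]:
    hist_a: List[Move] = []
    hist_b: List[Move] = []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = bot_a(hist_a, hist_b, bot_b)
        move_b = bot_b(hist_b, hist_a, bot_a)
        pay_a, pay_b = PAYOFF[(move_a, move_b)]
        score_a += pay_a
        score_b += pay_b
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b
```

Over ten rounds, `simulating_bot` cooperates with bots that would cooperate and defends itself against `defect_bot`, scoring 10 to 10 instead of being exploited. The real rules add time limits, which this sketch ignores.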

All submissions should be emailed or PM'd to me here on LessWrong by September 15th, 2014. LW users with 50+ karma who want to participate but do not know Haskell can PM me with an algorithm/pseudocode, and I will translate it into a bot for them. (If there is a flood of such requests, I would appreciate some volunteers to help me out.)

Jokes Thread

21 JosephY 24 July 2014 12:31AM

This is a thread for rationality-related or LW-related jokes and humor. Please post jokes (new or old) in the comments.


Q: Why are Chromebooks good Bayesians?

A: Because they frequently update!


A super-intelligent AI walks out of a box...


Q: Why did the psychopathic utilitarian push a fat man in front of a trolley?

A: Just for fun.

A Story of Kings and Spies

21 Joshua_Blaine 11 June 2014 11:54PM

There exists an old Kingdom with a peculiar, but not altogether uncommon, trait. It is overwhelmingly defensible given adequate forewarning. Its fields are surrounded by rivers on 3 sides and an impassable mountain to the South. The series of bridges commonly used by merchants and farmers to pass over the river can be completely removed by an impressive feat of engineering, unrivaled by any other kingdom, involving elaborate systems of levers and pulleys and large crews of men. This retracting, given the co-operation of all able men, can be done in the time of a single day across the entire length of river. The water is also deep, chilled, and very fast moving all throughout, making crossing without the bridges all but impossible. Fortifications on the inner banks of the river exist for archers and catapults to loose barrages against any foe that dares approach their land. It is this challenge that the enemies of the kingdom try to find a way to overcome.

It is acknowledged by both the King and his enemies that a surprise attack, one with so little warning that the bridges remain in place, would be successful against what is otherwise a poorly defensible region. Even a force of only moderate size could slaughter anyone within the rivers with ease. With this in mind, the King and his cabinet maintain a large espionage network that has infiltrated every major kingdom's decision-making process. Their spies are expected to notify the King long before any attack, and many times in the past they have done so, allowing defenses to be raised and victory to be assured. The King is very happy with his spies. They've never once failed to bring advance notice of any attack, and his network of informants has proven itself resilient against counter-infiltration. He is, however, a very paranoid king, and wishes there were some way to be even more certain of his kingdom's safety. He is, as he sits upon his throne, ruminating on some such plans when a man of small stature is brought before him by some guards. The little man is wearing mostly simple clothes, but with some vibrant accents in the trimming.

"Why have you brought this fellow before me?" asked the King of his guards.

"He claims to have word of an attack on the kingdom, sire." A guard said.

"He seems believable enough, sire, that we thought it best to bring him before you instead of merely dismissing him. You have had more training in detecting the truth of matters." The second guard said.

"Very well." He said, gesturing for the guards to relax. "Sir, may I have your name?" The King spoke directly to his small guest.

"Orin Eldirh, my king." He barely manages to say as he stammers on, "I've been told by a f-friend... a very close friend in-indeed... a t-t-trustworthy sort of fellow, you know... the kind who'd n-never lie, you see... And he says, and he's the employee of a very well off member of the Northern Kingdom's leadership, s-so I trust this information is accurate... He says th-that his boss was part of a meeting to plan a surprise attack on our kingdom. And very soon, I might add. He said the meeting was a pre-planned sort of thing, was going to be on a random day, so our spies wouldn't have time to figure things out, and that they'd have an army ready in less than a day! A day, sire! They're surely marching here now, as I speak."

The king quietly held the man's gaze for several moments before speaking. "And, holding what you've said is what you've heard, why would a noble betray his kingdom by speaking of such a secret meeting?"

"Is that important, sire? We have such little time to prepare for the invasion." Orin says. "Surely what I've said is enough to warrant removing the bridges, whatever his motivations." He paused uncertainly as he looked upon his king. "Isn't it?"

The King heaved a sigh before responding, "No. It really isn't." The King began to elaborate, "You see, removing those bridges costs more than you may realize. It takes every able man in the kingdom to work as fast as you claim we need to. That's an entire day's worth of labor used up. With the bridges up, that's maybe a week's worth of trade and messages that won't be coming or going, seeing as the men won't work themselves so hard for two days in a row to put things back. What you personally lose may well be small, but it will make our kingdom and its stores suffer."

"But what are those costs to the lives of those people, those women, those children, lost to an attack?" Orin admonished.

"There is more at work here than you think, Orin." The King firmly answered, "You do not know how much thought I have put into the defense of my people." Orin's outrage slowly began diminishing as he took on a sheepish posture. "Imagine, if you will, that I heed the word of every beggar and peasant who claimed some terrible force was underway. It's a much more common experience than you seem to think. Not a week goes by without someone offering their wisdom of an attack that my spies have somehow missed hearing of. The people of the kingdom would spend more time cowering in fear of an impending attack than doing anything else if I listened to every such piece of obvious paranoia or subterfuge. My people would tire of removing the bridges. Traders would tire of so frequent delays in their travels. It would spell our eventual doom, I'm sure of it." The King took a deep breath and frowned, calmly continuing, "And yet, how could I ever forgive myself if I left us undefended from a legitimate attack? My spies are not perfect. Such a random meeting as you described may elude them, if we were unlucky and it was well implemented." The King pinched the bridge of his nose and closed his eyes before continuing, "I have to determine, to the best of my abilities, whether or not this threat is legitimate. So I'll ask you again, as I must know, why might this noble betray his kingdom?"

Orin swallowed and said, "If what my friend says is to be believed, and I consider it so, then this noble is not motivated by loyalty for his kingdom. I was told that he was not born into his position, but bought it himself. He has quite a fortune from his ownership of many kinds of businesses and guilds. War hurts him more than it helps the businesses of his kingdom, I've been told he believes. He'd wish to avoid starting any kind of fight, I'd think, if this were true."

"I know of a man of the Northern Kingdom who fits that description. It's possible, not likely, but possible, he's heard things our spies have not." The King said, "And that he might also decide to warn us if he heard such a thing. But there's still the matter of *your* trustworthiness. How should I know that you are not a lying spy, sent from the North to deceive?"

Orin's eyes grew wide with fear as he attempted to speak, "ple-please, s-s-sire, I w-would n-n-never be-betray my kingdom!"

"So a spy would say." Orin opened his mouth to protest but the King interrupted, "No, nothing you can say will persuade me you aren't just a well trained spy." The King smiled, "But I have been giving thought to how I may judge your information's usefulness. You are an artisan of some skill, right? You're better dressed than a peasant can afford."

Orin spoke, "A potter, sir." After a pause he then bashfully admits, "Of kinds both functional and beautiful, as I've been told by my more affluent clients."

The King smiled wider, "Then you are well off, yes? How much would you consider your current wares and savings worth?"

Nervous about the King's sudden eagerness, Orin hesitantly replies, "700 coins, but p-perhaps even 800 c-coins... If I sold my s-shop and everything w-within."

"Very well. I propose a wager. 20 to 1 against this invasion being real." The king laughed as he saw Orin's shocked face. "What, surprised your King is a betting man? If you'd like to convince me you're not lying, then put your money where your mouth is, I say. If you'd also like to convince me you're right about this, you'll have to bet big. If you're willing to put up 1000 coins I'm willing to call for the removal of the bridges." Orin just stood there silently, jaw agape. The King continued speaking, "That's 200 coins of debt if you're wrong about this, a lifetime of payments for someone of your skills. If you're right, however, you will be rewarded handsomely. 20,000 coins is enough to keep you from working the rest of your life, if you'd like. I think that's fair compensation for saving the kingdom."

The King just smiled as he waited for Orin to speak.

Orin remained silent as he fervently thought about his options. He swallowed several times and wrung his hands together. After several minutes of silence he took a breath and spoke, "I'll take it." The King's smile grew as Orin spoke, "I'll take the bet. After considering his trustworthiness, it seems like my friend is right. I am willing to risk myself for his word. I am not willing to risk my kingdom."

"Very well." The King said before looking towards the two men standing to Orin's side, "Guards, one of you notify the city that an attack is impending. We have 2 days at most before the Northern Kingdom is here." The left one nodded at once and left the chamber. "Orin, I hope you understand why you should stay here for the night. We can't have you running off." Orin nodded stiffly in understanding. Looking at the remaining guard, the King said, "Orin here is your responsibility. Keep him occupied and within the castle until you have my word to release him. You may send out someone to notify his family of the circumstances surrounding his stay. They are welcome to come visit as soon as their duties for preparation are complete." After a short thought he said, "Orin is a guest here, not a prisoner, so treat him as such."

Standing up from his throne, the King walked over to where Orin was standing, petrified by what was happening around him. The King, towering over the small man, said, "If you are right about this, I am incredibly grateful that you came to me." The King reached out and grabbed Orin's shoulder, looking into his eyes with his own, and smiled wide. He then released him and returned to his seat. "You may go, I'm soon to be swamped by my bureaucracy for the coming hours as we prepare for this fight." Orin and his escort made their way from the room. As the door closed behind them another one opened as several official-looking men rushed in, chatting loudly. The King straightened his posture and forced a smile as he prepared himself for dealing with his government for the next several days.
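The King's wager has a simple break-even point. Here is a minimal Python sketch of it, assuming Orin is a risk-neutral bettor (the story does not say so) and using hypothetical function names: staking 1000 coins to win 20,000 has positive expected value for Orin only if he believes the invasion is more likely than 1 in 21, about 4.8%. The 200-coin debt the King mentions is the stake minus Orin's estimated 800-coin worth.

```python
from fractions import Fraction


def breakeven_probability(stake: int, payout: int) -> Fraction:
    # Expected gain is p * payout - (1 - p) * stake, which is zero
    # exactly when p = stake / (stake + payout).
    return Fraction(stake, stake + payout)


def expected_gain(p: Fraction, stake: int, payout: int) -> Fraction:
    # Win `payout` with probability p; lose `stake` otherwise.
    return p * payout - (1 - p) * stake


def debt_if_wrong(stake: int, net_worth: int) -> int:
    # Whatever part of the stake the bettor cannot cover becomes debt.
    return max(0, stake - net_worth)


print(breakeven_probability(1000, 20000))  # 1/21
print(debt_if_wrong(1000, 800))            # 200
```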

Funding cannibalism motivates concern for overheads

20 Thrasymachus 30 August 2014 12:42AM

Summary: Overhead expenses (CEO salary, percentage spent on fundraising) are often deemed a poor measure of charity effectiveness by Effective Altruists, and so they disprefer means of charity evaluation which rely on these. However, 'funding cannibalism' suggests that these metrics (and the norms that engender them) have value: if fundraising is broadly a zero-sum game between charities, then there's a commons problem where all charities could spend less money on fundraising and all do more good, but each is locally incentivized to spend more. Donor norms against increasing spending on zero-sum 'overheads' might be a good way of combating this. This valuable collective action of donors may explain the apparent underutilization of fundraising by charities, and perhaps should make us cautious in undermining it.

The EA critique of charity evaluation

Pre-Givewell, the common means of evaluating charities (Guidestar, Charity Navigator) used a mixture of governance checklists and 'overhead indicators'. Charities would gain points both for having features associated with good governance (being transparent in the right ways, balancing budgets, the right sorts of corporate structure) and for spending their money on programs while avoiding 'overhead expenses' like administration and (especially) fundraising. For shorthand, call this 'common sense' evaluation.

The standard EA critique is that common sense evaluation doesn't capture what is really important: outcomes. It is easy to imagine charities that look really good to common sense evaluation yet have negligible (or negative) outcomes. In the case of overheads, it becomes unclear whether these are even proxy measures of efficacy. Any fundraising that still 'turns a profit' looks like a good deal, whether it comprises five percent of a charity's spending or fifty.

In summary, the EA critique of common sense evaluation is that its myopic focus on these metrics gives pathological incentives, as these metrics frequently lie anti-parallel to maximizing efficacy. To score well on these evaluations, charities may be encouraged to raise less money, hire less able staff, and cut corners in their own management, even if doing these things would be false economies.


Funding cannibalism and commons tragedies

In the wake of the ALS 'Ice bucket challenge', Will MacAskill suggested there is considerable 'funding cannibalism' in the non-profit sector. Instead of the Ice bucket challenge 'raising' money for ALS, it has taken money that would have been donated to other causes instead, cannibalizing other causes. Rather than each charity raising funds independently of one another, they compete for a fairly fixed pie of aggregate charitable giving.

The 'cannibalism' thesis is controversial, but looks plausible to me, especially when looking at 'macro' indicators: proportion of household charitable spending looks pretty fixed whilst fundraising has increased dramatically, for example.

If true, cannibalism is important. As MacAskill points out, the tens of millions of dollars raised for ALS are no longer an untrammelled good, alloyed as they are with the opportunity cost of whatever other causes they have cannibalized (q.v.). There's also a more general consideration: if there is a fixed pot of charitable giving insensitive to aggregate fundraising, then fundraising becomes a commons problem. If all charities could spend less on their fundraising, none would lose out, so all could spend more of their funds on their programs. However, for any alone to spend less on fundraising allows the others to cannibalize it.
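The commons problem can be made concrete with a toy model in Python. The model itself is an assumption for illustration, not something the post specifies: a fixed pot of donations is split among charities in proportion to what each spends on fundraising, and whatever a charity receives beyond that spend goes to its programs. With three charities and a pot of 300, cutting fundraising together raises every charity's program spending, yet a charity that cuts alone loses out, and one that escalates alone gains.

```python
from typing import List


def program_spending(pot: float, fundraising: List[float]) -> List[float]:
    # Hypothetical contest model: each charity's share of the fixed pot is
    # proportional to its fundraising spend (assumes the total is positive);
    # the share minus that spend is what reaches its programs.
    total = sum(fundraising)
    return [pot * f / total - f for f in fundraising]


print(program_spending(300, [20, 20, 20]))  # [80.0, 80.0, 80.0]  everyone spends 20
print(program_spending(300, [10, 10, 10]))  # [90.0, 90.0, 90.0]  everyone cuts: all gain
print(program_spending(300, [10, 20, 20]))  # [50.0, 100.0, 100.0]  one cuts alone: it loses
```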


Civilizing Charitable Cannibals, and Metric Meta-Myopia

Coordination among charities to avoid this commons tragedy is far-fetched. Yet coordination of donors on shared norms about 'overhead ratio' can help. By penalizing a charity for spending too much on zero-sum games with other charities like fundraising, donors can stop a race-to-the-bottom fundraising free-for-all and the burning of the charitable commons that it implies. The apparently high marginal return to fundraising might suggest this is already in effect (and effective!)

The contrarian take would be that it is the EA critique of charity evaluation which is myopic, not the charity evaluation itself - by looking at the apparent benefit for a single charity of more overhead, the EA critique ignores the broader picture of the non-profit ecosystem, and their attack undermines a key environmental protection of an important commons - further, one which the right tail of most effective charities benefit from just as much as the crowd of 'great unwashed' other causes. (Fundraising ability and efficacy look like they should be pretty orthogonal. Besides, if they correlate well enough that you'd expect the most efficacious charities would win the zero-sum fundraising game, couldn't you dispense with Givewell and give to the best fundraisers?)

The contrarian view probably goes too far. Although there's a case for communally caring about fundraising overheads, as cannibalism leads us to guess it is zero-sum, parallel reasoning is hard to apply to administration overhead: charity X doesn't lose out if charity Y spends more on management, but charity Y is still penalized by common sense evaluation even if its overall efficacy increases. I'd guess that features like executive pay lie somewhere in the middle: non-profit executives could be poached by for-profit industries, so it is not as simple as donors prodding charities to coordinate to lower executive pay; but donors can prod charities not to throw away whatever 'non-profit premium' they do have in competing with one another for top talent (cf.). If so, we should castigate people less for caring about overhead, even if we still want to encourage them to care about efficacy too.

The invisible hand of charitable pan-handling

If true, it is unclear whether the story that should be told is 'common sense was right all along and the EA movement overconfidently criticised' or 'A stopped clock is right twice a day, and the generally wrong-headed common sense had an unintended feature amongst the bugs'. I'd lean towards the latter, simply because the advocates of the common sense approach have not (to my knowledge) articulated these considerations themselves.

However, many of us believe the implicit machinery of the market can turn without many of the actors within it having any explicit understanding of it. Perhaps the same applies here. If so, we should be less confident in claiming the status quo is pathological and we can do better: there may be a rationale eluding both us and its defenders.

Questioning and Respect

20 jkaufman 10 June 2014 10:52AM
A: [Surprising fact]
B: [Question]

When someone has a claim questioned, there are two common responses. One is to treat the question as a challenge, intended as an insult or indicating a lack of trust. If you have this model of interaction you think people should take your word for things, and feel hurt when they don't. Another response is to treat the question as a signal of respect: they take what you're saying seriously and are trying to integrate it into their understanding of the world. If you have this model of interaction then it's the people who smile, nod, and give no indication of their disagreement that are being disrespectful.

Within either of these groups you can just follow the social norm, but it's harder across groups. Recently I was talking to a friend who claimed that in their state income taxes per dollar went down as you earned more. This struck me as really surprising and kind of unlikely: usually it goes the other way around. [1] I'm very much in the latter group described above, while I was pretty sure my friend was in the former. Even though I suspected they would treat it as disrespectful if I asked for details and tried to confirm their claim, it would have felt much more disrespectful for me to just pretend to accept it and move on. What do you do in situations like this?

(Especially given that I think the "disagreement as respect" version builds healthier communities...)

[1] Our tax system does have regressive components, where poor people sometimes pay a higher percentage of their income as tax than richer people, but it's things like high taxes on cigarettes (which rich people don't consume as much), sales taxes (rich people spend less of their income), and a lower capital gains tax rate (poorer people earn way less in capital gains). I tried to clarify to see if this is what my friend meant, but they were clear that they were talking about "report your income to the state, get charged a higher percentage as tax if your income is lower".

I also posted this on my blog.

Moloch: optimisation, "and" vs "or", information, and sacrificial ems

19 Stuart_Armstrong 06 August 2014 03:57PM

Go read Yvain/Scott's Meditations On Moloch. It's one of the most beautiful, disturbing, poetical looks at the future that I've ever seen.

Go read it.

Don't worry, I can wait. I'm only a piece of text, my patience is infinite.

De-dum, de-dum.

You sure you've read it?

Ok, I believe you...


I hope you wouldn't deceive an innocent and trusting blog post? You wouldn't be monster enough to abuse the trust of a being as defenceless as a constant string of ASCII symbols?

Of course not. So you'd have read that post before proceeding to the next paragraph, wouldn't you? Of course you would.


Academic Moloch

Ok, now to the point. The "Moloch" idea is very interesting, and, at the FHI, we may try to do some research in this area (naming it something more respectable/boring, of course, something like "how to avoid stable value-losing civilization attractors").

The project hasn't started yet, but a few caveats to the Moloch idea have already occurred to me. First of all, it's not obligatory for an optimisation process to trample everything we value into the mud. This is likely to happen with an AI's motivation, but it's not obligatory for an optimisation process.

One way of seeing this is the difference between "or" and "and". Take the democratic election optimisation process. It's clear, as Scott argues, that this optimises badly in some ways. It encourages appearance over substance, some types of corruption, etc... But it also optimises along some positive axes, with some clear, relatively stable differences between the parties which reflect some voters' preferences, and punishment for particularly inept behaviour from leaders (I might argue that the main benefit of democracy is not the final vote between the available options, but the filtering out of many pernicious options because they'd never be politically viable). The question is whether these two strands of optimisation can be traded off against each other, or if a minimum of each is required. So can we make a campaign that is purely appearance based, without any substantive position ("or": maximum on one axis is enough), or do you need a minimum of substance and a minimum of appearance to buy off different constituencies ("and": you need some achievements on all axes)? And no, I'm not interested in discussing current political examples.
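A toy Python sketch can make the "or"/"and" distinction concrete. The scoring rules are assumptions for illustration, not anything from the post: a fixed budget of effort is split between appearance and substance, "or" scores a campaign by its strongest axis (`max`), and "and" scores it by its weakest (`min`). Optimising the "or" score pushes all effort onto a single axis, while optimising the "and" score keeps a minimum of both.

```python
from typing import Callable, List, Tuple


def best_splits(score: Callable[[int, int], int], budget: int = 10) -> List[Tuple[int, int]]:
    # Enumerate every integer split of `budget` effort units into
    # (appearance, substance) and keep the splits that maximise `score`.
    splits = [(a, budget - a) for a in range(budget + 1)]
    best = max(score(a, s) for a, s in splits)
    return [(a, s) for a, s in splits if score(a, s) == best]


def or_score(appearance: int, substance: int) -> int:
    # "or": the strongest axis alone determines success.
    return max(appearance, substance)


def and_score(appearance: int, substance: int) -> int:
    # "and": success is capped by the weakest axis.
    return min(appearance, substance)


print(best_splits(or_score))   # [(0, 10), (10, 0)]  all effort on one axis
print(best_splits(and_score))  # [(5, 5)]  both axes kept
```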

Another example Scott gave was of the capitalist optimisation process, and how it in theory matches customers' and producers' interests, but could go very wrong:

Suppose the coffee plantations discover a toxic pesticide that will increase their yield but make their customers sick. But their customers don't know about the pesticide, and the government hasn't caught up to regulating it yet. Now there's a tiny uncoupling between "selling to [customers]" and "satisfying [customers'] values", and so of course [customers'] values get thrown under the bus.

This effect can be combated to some extent with extra information. If the customers (or journalists, bloggers, etc...) know about this, then the coffee plantations will suffer. "Our food is harming us!" isn't exactly a hard story to publicise. This certainly doesn't work in every case, but increased information is something that technological progress would bring, and this needs to be considered when asking whether optimisation processes will inevitably tend to a bad equilibrium as technology improves. An accurate theory of nutrition, for instance, would have great positive impact if its recommendations could be measured.

Finally, Zack Davis's poem about the em stripped of (almost all) humanity got me thinking. The end result of that process is tragic for two reasons: first, the em retains enough humanity to have curiosity, only to get killed for this. And secondly, that em once was human. If the em was entirely stripped of human desires, the situation would be less tragic. And if the em was further constructed in a process that didn't destroy any humans, this would be even more desirable. Ultimately, if the economy could be powered by entities developed non-destructively from humans, and which were clearly not conscious or suffering themselves, this would be no different than powering the economy with the non-conscious machines we use today. This might happen if certain pieces of a human-em could be extracted, copied and networked into an effective, non-conscious entity. In that scenario, humans and human-ems could be the capital owners, and the non-conscious modified ems could be the workers. The connection of this with the Moloch argument is that it shows that certain nightmare scenarios could in some circumstances be adjusted to much better outcomes, with a small amount of coordination.


The point of the post

The reason I posted this is to get people's suggestions about ideas relevant to a "Moloch" research project, and what they thought of the ideas I'd had so far.

An Experiment In Social Status: Software Engineer vs. Data Science Manager

19 JQuinton 15 July 2014 08:24PM

Here is an interesting blog post about a guy who ran a resume experiment with two positions that, he argues, call for identical experience but occupy different "social status" positions in tech: a software engineer and a data science manager.

Interview A: as Software Engineer

Bill faced five hour-long technical interviews. Three went well. One was so-so, because it focused on implementation details of the JVM, and Bill’s experience was almost entirely in C++, with a bit of hobbyist OCaml. The last interview sounds pretty hellish. It was with the VP of Data Science, Bill’s prospective boss, who showed up 20 minutes late and presented him with one of those interview questions where there’s “one right answer” that took months, if not years, of in-house trial and error to discover. It was one of those “I’m going to prove that I’m smarter than you” interviews...

Let’s recap this. Bill passed three of his five interviews with flying colors. One of the interviewers, a few months later, tried to recruit Bill to his own startup. The fourth interview was so-so, because he wasn’t a Java expert, but came out neutral. The fifth, he failed because he didn’t know the in-house Golden Algorithm that took years of work to discover. When I asked that VP/Data Science directly why he didn’t hire Bill (and he did not know that I knew Bill, nor about this experiment) the response I got was “We need people who can hit the ground running.” Apparently, there’s only a “talent shortage” when startup people are trying to scam the government into changing immigration policy. The undertone of this is that “we don’t invest in people”.

Or, for a point that I’ll come back to, software engineers lack the social status necessary to make others invest in them.

Interview B: as Data Science Manager

A couple weeks later, Bill interviewed at a roughly equivalent company for the VP-level position, reporting directly to the CTO.

Worth noting is that we did nothing to make Bill more technically impressive than for Company A. If anything, we made his technical story more honest, by modestly inflating his social status while telling a “straight shooter” story for his technical experience. We didn’t have to cover up periods of low technical activity; that he was a manager, alone, sufficed to explain those away.

Bill faced four interviews, and while the questions were behavioral and would be “hard” for many technical people, he found them rather easy to answer with composure. I gave him the Golden Answer, which is to revert to “There’s always a trade-off between wanting to do the work yourself, and knowing when to delegate.” It presents one as having managerial social status (the ability to delegate) but also a diligent interest in, and respect for, the work. It can be adapted to pretty much any “behavioral” interview question...

Bill passed. Unlike for a typical engineering position, there were no reference checks. The CEO said, “We know you’re a good guy, and we want to move fast on you”. As opposed to the 7-day exploding offers typically served to engineers, Bill had 2 months in which to make his decision. He got a fourth week of vacation without even having to ask for it, and genuine equity (about 75% of a year’s salary vesting each year)...

It was really interesting, as I listened in, to see how different things are once you’re “in the club”. The CEO talked to Bill as an equal, not as a paternalistic, bullshitting, “this is good for your career” authority figure. There was a tone of equality that a software engineer would never get from the CEO of a 100-person tech company.

The author concludes that positions labeled as code-monkey-like are low status, while positions labeled as managerial are high status, even if they involve "essentially" the same sort of work.

Not sure about this methodology, but it's food for thought.

Robin Hanson's "Overcoming Bias" posts as an e-book.

18 ciphergoth 31 August 2014 01:26PM

At Luke Muehlhauser's request, I wrote a script to scrape all of Robin Hanson's posts to Overcoming Bias into an e-book; here's a first beta release. Please comment here with any problems—posts in the wrong order, broken links, bad formatting, missing posts. Thanks!



Fifty Shades of Self-Fulfilling Prophecy

18 PhilGoetz 24 July 2014 12:17AM

The official story: "Fifty Shades of Grey" was a Twilight fan-fiction that had over two million downloads online. The publishing giant Vintage Press saw that number and realized there was a huge, previously-unrealized demand for stories like this. They filed off the Twilight serial numbers, put it in print, marketed it like hell, and now it's sold 60 million copies.

The reality is quite different.

