...which was first discussed by name in The Importance of Goodhart's Law.
The issue has also come up in various other posts listed here in order of their current karma:
32 - Is Google Paperclipping the Web? The Perils of Optimization by Proxy in Social Systems
29 - The Shabbos goy
3 - Imperfect Levers
There was a forward link in the comments from Eliezer's Resist the Happy Death Spiral to Goodhart's Law as well.
I suspect that many rationalists independently figure this issue out before learning of older (better?) names for it. In a discussion group in college we used the term "optimized signifier" for the core concept while talking about ways that groups could go off course in the pursuit of honestly valuable goals that were rather nebulous in their "authentic" form. Our standard examples of optimized signifiers included weight as a proxy for health and grades as a proxy for learning.
Note – When I first started planning this article I was hoping for more down-to-earth examples, but I struggled to find any.
This is a sign that you may be beating up a straw man (which is fun to write, but not as much fun to read). If your big insight doesn't cash out in direct practical advice or illumination of previously confusing phenomena, be very suspicious.
Furthermore, I think you've chosen poor examples in the "perfect Bayesians do X" category. The reference to Aumann's Agreement Theorem in the Bayesian Judo post was a joke, and the example from the comment wasn't suggesting you naively implement it in real life.
Finally, you should be aware of the prior discussion here on this topic.
These bugs aren't fatal, but they're good examples of why one's first post ought to be published in the Discussion section rather than the top level (and promoted later, if everything checks out).
I actually deleted this post about 3 seconds after I first published it, and only put it back up here after a number of people asked me to. That post also had a number of people point out some more near-mode examples.
Ben gave three common far-mode fallacies that people make; I am not sure I agree with the one on changing our minds, but he makes valid points there, and his other two far-mode examples are pretty spot-on.
I suspect that the majority of fallacies that LWers commit are far-mode, simply because humans are naturally bad at far-mode thinking and LWers should be no different. The difference is that, unlike everyone else, LWers fail to compartmentalize and therefore flawed far-mode thinking has a higher potential to be dangerous.
So, real-world examples of far-mode fallacies: good, and something I would like to see more of; this is distinct from non-real-world examples of fallacies (either near-mode or far-mode).
One of the examples I brought up in the depublished post was of motivation. That is, people investigate motivated people - and they can't find any motivated people who don't have methods. So, naturally, they go out to the (book)store and purchase some methods, expecting that wearing and using these methods will make them a motivated person. This seems like a case of the dressing like a winner fallacy that isn't really a strawman, and does clear up some confusion regarding akrasia and the like.
(I did also recommend removing the mathematically rational section; I had concerns that it didn't fit as well as the other two.)
Richard Feynman gives many examples in his famous essay "Cargo Cult Science": http://www.lhup.edu/~DSIMANEK/cargocul.htm
Significant discoveries have small p-values, so clearly, if you do something and run some numbers and a small value pops up, your experiment is significant, right?
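To make the p-value point concrete, here is a minimal sketch (my own toy setup, not something from Feynman's essay). It assumes each "experiment" draws 30 samples from a standard normal distribution, so there is no real effect to find, and runs a two-sided z-test with known unit variance. The function names and parameters are hypothetical, chosen only for illustration.

```python
import math
import random


def two_sided_p_value(sample):
    """Two-sided z-test p-value for 'mean is 0', assuming known unit variance."""
    n = len(sample)
    z = (sum(sample) / n) * math.sqrt(n)
    # For a standard normal z, P(|Z| >= |z|) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


def fraction_significant(n_experiments=2000, n_per_experiment=30,
                         alpha=0.05, seed=0):
    """Run experiments with NO real effect; return the share with p < alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_experiments):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n_per_experiment)]
        if two_sided_p_value(sample) < alpha:
            hits += 1
    return hits / n_experiments


if __name__ == "__main__":
    # Roughly alpha of the null experiments look "significant" by chance alone.
    print(f"{fraction_significant():.3f}")
```

Under these assumptions, about one null experiment in twenty produces a small p-value, so the small number by itself is the gold medal, not the race.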
David Pearce frequently argues that happy people do better - and so we should engineer in happiness genes. I think that is likely to be mostly Dressing Like a Winner.
That sounds so obviously like it that I'd expect anybody making that argument to have noticed, rational or not. Who is so blind to the dangers of correlation vs causation that they wouldn't suspect that doing well makes people happy? (Among those intelligent enough to have anything worthwhile to say about genetic engineering.)
Part of the problem is that it may be true. I give it about a 50% chance of being true - on the grounds that we are not adapted to our modern environment, and chances are that innate happiness levels are one of the ways. Also, self-improvement gurus tell us to fake happiness and smiling - so we generate self-fulfilling prophecies. It isn't that implausible that more genetic happiness might actually be good for us.
On the other hand, our ancestors were mostly miserable. We are happier than they - so assuming that nature tuned their happiness levels "correctly" - modern man's happiness may need taking down a notch.
On the other hand, our ancestors were mostly miserable.
Our ancestors mostly lived in conditions that you and I would find miserable. What is the evidence that they were mostly miserable? I thought the opposite, but I don't know real facts.
I am not sure I can summarise briefly. That the link between GDP per capita and self-reported happiness is positive and strong is one of the more compelling pieces of evidence for me:
http://www.willwilkinson.net/flybottle/2009/09/09/standing-up-for-gdp/
Another factor is that life was violent and short - which was probably not much fun:
http://www.edge.org/3rd_culture/pinker07/pinker07_index.html
That the link between GDP per capita and self-reported happiness is positive and strong is one of the more compelling pieces of evidence for me
Quick correction: that's satisfaction. One of the odd things about mood research is that satisfaction and happiness are distinct things measured distinctly; essentially, happiness is near mode and satisfaction is far mode. When you ask questions like "how many times did you smile in the last week?", which measure near-mode happiness, you generally get no correlation between happiness and income in one country*, and a negative correlation between countries (i.e. Nigerians are happier than Americans). When you ask people how satisfied they are with their lives (sorry, but I don't remember an example offhand), then you get a pretty strong relationship between log(income) and satisfaction, both within countries and between countries.
* Research in America shows happiness increases up until a pretty low income, then a flatline.
The second link, again, is conditions that we would find miserable, but some people like to fight.
But thanks for the first link. This indeed "flies against everything we thought we knew ten years ago", or at least everything that I thought I knew, so apparently my understanding (that happiness has more to do with one's relative status within society than anything absolute) is ten years out of date.
However, most of this (all but one paragraph of the second link) is still comparing within civilised societies, whereas
assuming that nature tuned their happiness levels "correctly"
would be for uncivilised people. So what I'd really like to see is a debunking of the "original affluent society"-like claim that hunter-gatherers have happy lives. (I know that Sahlins's research doesn't really hold up, but that just brings me back to simply not knowing the relative happiness levels.)
Here's another possible example of the DLAW fallacy:
Suppose an engineer wishes to reverse-engineer a philosopher's brain, and create a working software model of it. The engineer will consider it a success if the model can experience qualia.
Having read and agreed with Eliezer's argument about zombies, the engineer decides that if the model can write at least a fairly decent paper about qualia, then that model is more probably capable of experiencing qualia than one that cannot. The engineer implements this as a utility function for a genetic algorithm that generates variations on a simple early implementation (which submits the papers to respected philosophical journals to determine their legitimacy), starts it running, then leaves for a well-earned vacation.
6 months later, the engineer returns to find that the GA has broken the firewall on its host computer and produced an amazingly competent English-language construction system that randomly looks up philosophical articles, then reprocesses their statements into a new article, with references to the originals and a title that always follows the pattern "Analyses of [X], [Y], and [Z]: A Retrospective".
The fallacy was that a metric of qualia-experiencing (the ability to produce coherent English sentences describing qualia) was confused with the thing being measured.
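The same failure can be shown in a much smaller toy than a genetic algorithm. The sketch below is only an illustration under made-up assumptions: an optimizer splits a fixed budget of 100 effort units between real work and gaming the metric, only real work counts toward the true goal, and the proxy (hypothetically) rewards gaming three times as much per unit. All names and numbers are invented for the example.

```python
BUDGET = 100  # total effort units, split between real work and gaming


def true_quality(real_work, gaming):
    """The thing we actually care about: only real work counts (assumption)."""
    return real_work


def proxy_score(real_work, gaming):
    """The measurable metric: gaming pays 3x per unit (made-up weight)."""
    return real_work + 3 * gaming


def hill_climb(score, steps=100):
    """Greedy local search over how much of the budget goes to real work."""
    real_work = BUDGET // 2  # start with an even split
    for _ in range(steps):
        candidates = [real_work]
        for delta in (1, -1):
            moved = real_work + delta
            if 0 <= moved <= BUDGET:
                candidates.append(moved)
        # Keep whichever neighbouring allocation scores highest.
        real_work = max(candidates, key=lambda r: score(r, BUDGET - r))
    return real_work, BUDGET - real_work


if __name__ == "__main__":
    print("optimizing the proxy:", hill_climb(proxy_score))
    print("optimizing the goal: ", hill_climb(true_quality))
```

With these weights, climbing the proxy moves every unit of effort into gaming and leaves true quality at zero, while climbing the true goal does the opposite. The optimizer is not malfunctioning in either case; it is doing exactly what the score asked for.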
5 seconds later, the running system achieves recursive self-improvement and tiles the solar system with MLA-compliant bibliographies referencing each other.
That reminds me of something my sister did at her last job. Every time there was trouble she would get an email about it. Clearly a problem appearing and receiving mail were correlated, so her solution was to terminate her mail account. She called it the Schrödinger's cat approach to life: as long as you aren't sure there is a problem, there is a chance there isn't.
Imagine you are a sprinter, and your one goal in life is to win the 100m sprint in the Olympics. Naturally, you watch the 100m sprint winners of the past in the hope that you can learn something from them, and it doesn't take you long to spot a pattern.
Every one of them can be seen wearing a gold medal around their neck. Not only is there a strong correlation, but when you examine the rules of the Olympics you find that 100% of winners must wear a gold medal at some point; there is no way that someone could win and never wear a gold medal. So you go out and buy a gold medal from a shop, put it around your neck, and sit back, satisfied.
For another example, imagine that you are now in charge of running a large oil rig. Unfortunately, some of the drilling equipment is old and rusty, and every few hours a siren goes off alerting the divers that they need to go down again and repair the damage. This is clearly not an acceptable state of affairs, so you start looking for solutions.
You think back to a few months ago, before things got this bad, and you remember how the siren barely ever went off at all. In fact, from your knowledge of how the equipment works, if there were no problems, the siren couldn't go off. Clearly the solution to the problem is to unplug the siren.
(I would like to apologise in advance for my total ignorance of how oil rigs actually work; I just wanted an analogy.)
Both these stories demonstrate a mistake which I call 'Dressing Like a Winner' (DLAW). The general form of the error is when an indicator of success gets treated as an instrumental value, and then sometimes as a terminal value which completely subsumes the thing it was supposed to indicate. As someone noted, this can also be seen as a sub-case of the correlative fallacy. This mistake is so obviously wrong that it is pretty much non-existent in near mode, which is why the above stories seem utterly ridiculous. However, once we switch into the more abstract far mode, even the most ridiculous errors become dangerous. In the rest of this post I will point out three places where I think this error occurs.
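The siren story can be written down as a tiny simulation, which makes the direction of the causal arrow explicit. This is a rough sketch under invented assumptions: faults occur independently each hour with a made-up probability of 0.2, and the siren sounds exactly when there is a fault and it is plugged in. The function name and parameters are hypothetical.

```python
import random


def simulate(hours, siren_plugged_in, fault_rate=0.2, seed=1):
    """Return (number of faults, number of alarms) over the given hours."""
    rng = random.Random(seed)
    faults = 0
    alarms = 0
    for _ in range(hours):
        fault = rng.random() < fault_rate   # cause: the equipment breaks
        siren = fault and siren_plugged_in  # effect: the siren reports it
        faults += fault
        alarms += siren
    return faults, alarms


if __name__ == "__main__":
    print("siren plugged in:", simulate(1000, siren_plugged_in=True))
    print("siren unplugged: ", simulate(1000, siren_plugged_in=False))
```

In this model, silence is perfect evidence of no fault while the siren is plugged in. Unplugging it removes every alarm and leaves the number of faults exactly the same, because the intervention acts on the indicator rather than on the thing it indicates.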
Changing our minds
In a debate between two people, it is usually the case that whoever is right is unlikely to change their mind. This is not only an empirically observable correlation, but it's also intuitively obvious: would you change your mind if you were right?
At this point, our fallacy steps in with a simple conclusion: "refusing to change your mind will make you right". As we all know, this could not be further from the truth; changing your mind is the only way to become right, or at any rate less wrong. I do not think this realization is unique to this community, but it is far from universal (and it is a lot harder to practice than to preach, suggesting it might still hold on in the subconscious).
At this point a lot of people will probably have noticed that what I am talking about bears a close resemblance to signalling, and some of you are probably thinking that that is all there is to it. While I will admit that DLAW and signalling are easy to confuse, I do think they are separate things, and that there is more than just ordinary signalling going on in the debate.
One piece of evidence for this is the fact that my unwillingness to change my mind extends even to opinions I have admitted to nobody. If I were only interested in signalling, surely I would want to change my mind in that case, since it would reduce the risk of being humiliated once I do state my opinion. Another reason to believe that DLAW exists is the fact that not only do debaters rarely change their minds, those that do are often criticised, sometimes quite brutally, for 'flip-flopping', rather than being praised for becoming smarter and for demonstrating that their loyalty to truth is higher than their ego.
So I think DLAW is at work here, and since I have chosen a fairly uncontroversially bad thing to start off with, I hope you can now agree with me that it is at least slightly dangerous.
Consistency
It is an accepted fact that any map which completely fits the territory would be self-consistent. I have not seen many such maps, but I will agree with the argument that they must be consistent. What I disagree with is the claim that this means we should be focusing on making our maps internally consistent, and that once we have done this we can sit back because our work is done.
This idea is so widely accepted and so tempting, especially to those with a mathematical bent, that I believed it for years before noticing the fallacy that led to it. Most reasonably intelligent people have gotten over one half of the toxic meme, in that few of them believe consistency is good enough (with the one exception of ethics, where it still seems to apply in full force). However, as with the gold medal, not only is it a mistake to be satisfied with it, but it is a waste of time to aim for it in the first place.
In Robin Hanson's article beware consistency we see that the consistent subjects actually do worse than the inconsistent ones, because they are consistently impatient or consistently risk averse. I think this problem is even more general than his article suggests, and represents a serious flaw in our whole epistemology, dating back to the Ancient Greek era.
Suppose that one day I notice an inconsistency in my own beliefs. Conventional wisdom would tell me that this is a serious problem, and I should discard one of the beliefs as quickly as possible. All else being equal, the belief that gets discarded will probably be the one I am less attached to, which will probably be the one I acquired more recently, which is probably the one which is actually correct, since the other may well date back to long before I knew how to think critically about an idea.
Richard Dawkins gives a good example of this in his book 'The God Delusion'. Kurt Wise was a brilliant young geologist raised as a fundamentalist Christian. Realising the contradiction between his beliefs, he took a pair of scissors to the bible and cut out every passage he would have to reject if he accepted the scientific world-view. After realizing his bible was left with so few pages that the poor book could barely hold itself together, he decided to abandon science entirely. Dawkins uses this to make an argument for why religion needs to be removed entirely, and I cannot necessarily say I disagree with him, but I think a second moral can be drawn from this story.
How much better off would Kurt have been if he had just shrugged his shoulders at the contradiction and continued to believe both? How much worse off would we be if Robert Aumann had abandoned the study of rationality when he noticed it contradicted Orthodox Judaism? It's easy to say that Kurt was right to abandon one belief and simply abandoned the wrong one, but from inside Kurt's mind I'm not sure it was obvious to him which belief was right.
I think a better policy for dealing with contradictions is to put both beliefs 'on notice': be cautious before acting upon either of them, and wait for more evidence to decide between them. If nothing else, we should admit more than two possibilities: they could actually be compatible, or they could both be wrong, or one or both of them could be badly confused.
To put this in one sentence: "don't strive for consistency, strive for accuracy and consistency will follow".
Mathematical arguments about rationality
In this community, I often see mathematical proofs that a perfect Bayesian would do something. These proofs are interesting from a mathematical perspective, but since I have never met a perfect Bayesian I am sceptical of their relevance to the real world (perhaps they are useful to AI, someone more experienced than me should either confirm or deny that).
The problem comes when we are told that since a perfect Bayesian would do X, then we imperfect Bayesians should do X as well in order to better ourselves. A good example of this is Aumann's Agreement Theorem, which shows that not agreeing to disagree is a consequence of perfect rationality, being treated as an argument for not agreeing to disagree in our quest for better rationality. The fallacy is hopefully clear by now: we have been given no reason to believe that copying this particular by-product of success will bring us closer to our goal. Indeed, in our world of imperfect rationalists, some of whom are far more imperfect than others, an argument against disagreement seems like a very dangerous thing.
Eliezer has already argued against this specific mistake, but since he went on to commit it a few articles later I think it bears mentioning again.
Another example of this mistake is this post (my apologies to the poster, this is not meant as an attack, you just provided a very good example of what I am talking about). The post provides a mathematical argument (a model rather than a proof) that we should be more sceptical of evidence that goes against our beliefs than evidence for them. To be more exact, it gives an argument why a perfect Bayesian, with no human bias and mathematically precise calibration should be more sceptical of evidence going against its beliefs than evidence for them.
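For readers who want a feel for the kind of argument involved, here is a minimal toy model. It is my own illustration and not necessarily the model used in the linked post. It assumes a study is either sound, in which case it reports the truth, or flawed, in which case it reports either answer with probability 0.5. The function and parameter names are hypothetical.

```python
def prob_study_flawed(prior_belief, prob_sound, study_agrees):
    """Posterior probability that a study is flawed, given what it reported.

    prior_belief: prior probability that hypothesis H is true.
    prob_sound:   prior probability that the study is sound (assumption).
    study_agrees: True if the study supports H, False if it contradicts H.
    """
    # A sound study reports the truth, so it agrees with H exactly when H holds.
    p_result_if_sound = prior_belief if study_agrees else 1.0 - prior_belief
    # A flawed study is modelled as a coin flip, whatever the truth is.
    p_result_if_flawed = 0.5
    joint_sound = prob_sound * p_result_if_sound
    joint_flawed = (1.0 - prob_sound) * p_result_if_flawed
    return joint_flawed / (joint_sound + joint_flawed)


if __name__ == "__main__":
    print(prob_study_flawed(0.9, 0.8, study_agrees=True))
    print(prob_study_flawed(0.9, 0.8, study_agrees=False))
```

With a prior of 0.9 and an 80% chance that the study is sound, the posterior probability of a flaw is roughly 0.12 when the study agrees and roughly 0.56 when it disagrees. Note that the whole calculation takes the 0.9 prior at face value, which is precisely the assumption a biased human reasoner cannot rely on.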
The argument is, as far as I can tell, mathematically flawless. However, it doesn't seem to apply to me at all, if for no other reason than that I already have a massive bias overdoing that job, and my role is to counteract it.
In fact, I would say that in general our willingness to give numerical estimates is an example of this fallacy. The Cox theorems prove that any perfect reasoning system is isomorphic to Bayesian probability, but since my reasoning system is not perfect, I get the feeling that saying "80%" instead of "reasonably confident" is just making a mockery of the whole process.
This is not to say I totally reject the relevance of mathematical models and proofs to our pursuit. All else being equal, if a perfect Bayesian does X, it is evidence that X is good for an imperfect Bayesian. It's just not overwhelmingly strong evidence, and shouldn't be treated as if it puts a stop to all debate and decides the issue one way or the other (unlike other fields where mathematical arguments can do this).
How to avoid it
I don't think DLAW is particularly insidious as mistakes go, which is why I called it a fallacy rather than a bias. The only advice I would give is to be careful when operating in far mode (which you should do anyway), and always make sure the causal link between your actions and your goals is pointing in the right direction.
If anyone has any other examples they can think of, please post them. Thanks to those who have already pointed some out, particularly the point about akrasia and motivation.