
Comment author: BiasedBayes 15 September 2015 02:03:59PM *  0 points [-]

Thanks for the post. I love it.

My comments:

First, a side note: don't assume that if something is a heuristic, it is automatically a wrong way of thinking. (Sorry if I misinterpret this, because you don't explicitly say this at all :) In some situations, simple heuristics will outperform regression analysis, for example.

But on to your main point. If I understood correctly, this is actually a problem of violating the so-called "ratio rule".

(1) The degree to which c is representative of S is indicated by the conditional probability p(c|S) — that is, the probability that members of S have characteristic c.

(2) The probability that the characteristic c implies membership in S is given by p(S|c). (Like you write.)

(3) p(c|S) / p(S|c) = p(c) / p(S)

This is the Ratio Rule: the ratio of inverse probabilities equals the ratio of simple probabilities. So equating the two probabilities p(c|S) and p(S|c) without ALSO equating the simple probabilities is just wrong and bad thinking.

Representative thinking does not reflect the difference between p(c|S) and p(S|c), and introduces a symmetry in the map (thought) that does not exist in the world.

For example: "Home is the most dangerous place in the world because most accidents happen at home. So stay away from home!!!" --> This confuses the probability of an accident given being home with the probability of being home given an accident.
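A quick numeric sketch of this confusion, with numbers invented purely for illustration (you spend most of your time at home, so most accidents happen there even if home is quite safe); it also checks the ratio rule:

```python
# Made-up numbers: S = "being home", c = "having an accident".
p_home = 0.7                # P(S): you are home 70% of the time
p_acc_given_home = 0.02     # P(c|S)
p_acc_given_away = 0.01     # P(c|~S)

# Total probability and Bayes' theorem
p_acc = p_acc_given_home * p_home + p_acc_given_away * (1 - p_home)
p_home_given_acc = p_acc_given_home * p_home / p_acc

# Ratio rule: p(c|S) / p(S|c) == p(c) / p(S)
assert abs(p_acc_given_home / p_home_given_acc - p_acc / p_home) < 1e-12

print(round(p_home_given_acc, 2))  # 0.82: "most accidents happen at home"...
print(p_acc_given_home)            # ...yet P(accident | home) is still only 2%
```

So P(home | accident) is high while P(accident | home) is low, and equating them is exactly the error above.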

Comment author: BT_Uytya 16 September 2015 03:41:48PM *  0 points [-]

Thank you. English isn't my first language, so for me feedback means a lot. Especially positive :)

My point was that the representativeness heuristic makes two errors: first, it violates the "ratio rule" (= equates P(S|c) and P(c|S)), and second, it sometimes replaces P(c|S) with something else. That means that the popular idea "well, just treat it as P(c|S) instead of P(S|c); if you add P(c|~S) and P(S), then everything will be OK" doesn't always work.

The main point of our disagreement seems to be this:

(1) The degree to which c is representative of S is indicated by the conditional probability p(c|S) — that is, the probability that members of S have characteristic c.

1) Think about stereotypes. They "represent" their classes well, yet it's extremely unlikely to actually meet the Platonic Ideal of a Jew.

(also, sometimes there is an incentive for members of an ethnic group to hide their lineage; if so, then P(stereotypical characteristics | member of group) is extremely low, yet the degree of resemblance is very high)

(this somewhat reminds me of the section about the Planning Fallacy in my earlier post).

2) I think that it can be argued that the degree of resemblance should involve P(c|~S) in some way. If it's very low, then c is very representative of S, even if P(c|S) isn't high.

Overall, inferential distances got me this time; I'm probably going to rewrite this post. If you have some ideas about how this text could be improved, I will be glad to hear them.

Comment author: redding 14 September 2015 12:49:20AM 0 points [-]

Just to clarify, I feel that what you're basically saying is that often what is called the base-rate fallacy is actually the result of P(E|!H) being too high.

I believe this is why Bayesians usually talk not in terms of P(H|E) but instead use Bayes Factors.

Basically, to determine how strongly ufo-sightings imply ufos, don't look at P(ufos | ufo-sightings) alone. Instead, look at P(ufo-sightings | ufos) / P(ufo-sightings | no-ufos).

This likelihood ratio is the Bayes factor; multiplying the prior odds by it gives the posterior odds.
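A minimal sketch of the odds-form update, with invented numbers: since the Bayes factor P(E|H) / P(E|!H) is barely above 1 here, the sightings are weak evidence no matter how much they "resemble" a UFO-filled world.

```python
# Invented numbers: sightings occur almost as often in an alienless world
# as in a world with UFOs, so the evidence is weak.
prior_ufos = 0.001                 # P(ufos): prior probability
p_e_given_h = 0.9                  # P(sightings | ufos)
p_e_given_not_h = 0.8              # P(sightings | no ufos)

prior_odds = prior_ufos / (1 - prior_ufos)
bayes_factor = p_e_given_h / p_e_given_not_h   # likelihood ratio
posterior_odds = prior_odds * bayes_factor

print(round(bayes_factor, 3))      # 1.125: barely moves the needle
```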

Comment author: BT_Uytya 15 September 2015 09:08:37PM 0 points [-]

Thank you for your feedback.

Yes, I'm aware of likelihood ratios (and they're awesome, especially as log-odds). An earlier draft of this post ended at "the correct method for answering this query involves imagining world-where-H-is-true, imagining world-where-H-is-false and comparing the frequency of E between them", but I decided against it. And well, if some process involves X and Y, then it is correct (but maybe misleading) to say that it involves just X.

My point was that "what does it resemble?" (a process where you go E -> H) is fundamentally different from "how likely is that?" (a process where you go H -> E). If you calculate the likelihood ratio using the degree-of-resemblance instead of the actual P(E|H), you will get a wrong answer.

(Or maybe thinking about likelihood ratios will force you to snap out of the representativeness heuristic, but I'm far from sure about that.)

I think that I misjudged the level of my audience (this post is an expansion of a /r/HPMOR comment) and hadn't made my point (that probabilistic thinking is more correct when you go H -> E instead of vice versa) visible enough. Also, I was going to blog about likelihood ratios later (in terms of H -> E and !H -> E) — so again, wrong audience.

I now see some ways in which my post is a debacle, and maybe it makes sense to completely rewrite it. So thank you for your feedback again.

The Heuristic About Representativeness Heuristic

2 BT_Uytya 12 September 2015 11:15PM

(x-posted from my blog)


Some people think that the problem with the representativeness heuristic is base rate neglect. I hold that this is incorrect: the problem is deeper than that, and simply using the base rate isn't going to fix it. This makes the advice "look at the base rate!" a heuristic as well.

The thing is, there is a fundamental difference between "how strongly E resembles H" and "how strongly H implies E". The latter question is about P(E|H), and this number can be used in Bayesian reasoning, if you add P(E|!H) and P(H)[1]. The former question — the question humans actually answer when asked to judge whether something is likely — sometimes just cannot be saved at all.

Several examples to get the point across:

1) Conspiracy theorists / ufologists: naively, their existence strongly points to a world where UFOs exist. But really, their existence is very weak evidence for UFOs (human psychology suggests that ufologists could exist in a perfectly alienless world), and it could even be evidence against them: if a Secret World Government were real, we would expect it to be very good at hiding, and therefore any voices who got close to the truth would be quickly silenced.

So, the answer to "how strongly does E resemble H?" is very different from the answer to "how high is P(E|H)?". No amount of accounting for the base rate is going to fix this.

2) Suppose that some analysis comes out too strongly in favor of some hypothesis.

Maybe some paper argues that leaded gasoline accounts for 90% of the variation in violent crime (credit for this example goes to /u/yodatsracist on reddit). Or some ridiculously simple school intervention is claimed to have a gigantic effect size.

Let's take leaded gasoline, for example. On the surface, this data strongly "resembles" a world where leaded gasoline is indeed causing violence, since 90% suggests that the effect is very large and very unlikely to be a fluke. On the other hand, this effect is too large: 10% left over for "other factors" (including but not limited to abortion rate, economic situation, police budget, alcohol consumption, and imprisonment rate) is too small a percentage.

The decline we would expect in a world of harmful leaded gasoline is more like 10% than 90%, so this model is too good to be true; instead of being very strong evidence in favor, this analysis could either be irrelevant (just a random botched analysis with faked data, nothing to see here) or offer some evidence against (for reasons related to conservation of expected evidence, for example).

So, how should it be done? Remember that P(E|H) would be written as P(H -> E), were the notation a bit saner. P(E|H) is "to what degree does H imply E?", so the correct method for answering this query involves imagining the world-where-H-is-true and asking yourself "how often does E occur here?", instead of answering the question "which world comes to my mind after seeing E?".
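This recipe can be sketched numerically (all numbers invented for illustration): imagine both worlds, ask how often E occurs in each, then combine with the base rate. Even when E strongly "resembles" H, the posterior barely moves if E is nearly as common in the world where H is false.

```python
# Invented numbers: E "resembles" H, yet occurs almost as often without H.
p_h = 0.01             # base rate of the hypothesis, P(H)
p_e_given_h = 0.6      # world-where-H-is-true: how often does E occur?
p_e_given_not_h = 0.5  # world-where-H-is-false: how often does E occur?

# Total probability, then Bayes' theorem
p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)
p_h_given_e = p_e_given_h * p_h / p_e

print(round(p_h_given_e, 3))   # 0.012: barely above the 1% prior
```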

[1] And often just using the base rate is good enough, but that is another, even less correct heuristic. See: Prosecutor's Fallacy.
Comment author: Yvain 14 October 2010 07:03:21PM *  79 points [-]

On any task more complicated than sheer physical strength, there is no such thing as inborn talent or practice effects. Any non-retarded human could easily do as well as the top performers in every field, from golf to violin to theoretical physics. All supposed "talent differential" is unconscious social signaling of one's proper social status, linked to self-esteem.

A young child sees how much respect a great violinist gets, knows she's not entitled to as much respect as that violinist, and so does badly at violin to signal cooperation with the social structure. After practicing for many years, she thinks she's signaled enough dedication to earn some more respect, and so plays the violin better.

"Child prodigies" are autistic types who don't understand the unspoken rules of society and so naively use their full powers right away. They end up as social outcasts not by coincidence but as unconscious social punishment for this defection.

Comment author: BT_Uytya 10 April 2015 10:04:54PM *  2 points [-]

It's interesting to note that this is almost exactly how it works in some role-playing games.

Suppose that we have Xandra the Rogue, who went into a dungeon, killed a hundred rats, got a level-up and is now able to bluff better and pick locks faster, despite those things having almost no connection to rat-killing.

My favorite explanation of this phenomenon was that "experience" is really a "self-esteem" stat which can be increased via success of any kind, and as the character becomes more confident in herself, her performance in unrelated areas improves too.

Meetup : Moscow: bayes, language and psychology, now with homework

1 BT_Uytya 02 January 2015 11:46PM

Discussion article for the meetup : Moscow: bayes, language and psychology, now with homework

WHEN: 04 January 2015 03:00:00PM (+0400)

WHERE: Moscow, ulitsa L'va Tolstogo 16

Hello there.

If you don't have plans for Sunday but are aware of Bayes' theorem in odds form, you will certainly be welcome here.

Our plan for the 4th is:

  • Victor will try to explain the device of imaginary results by I. J. Good, himself gaining a deeper understanding of it in the process (yup, it's the same guy who posted this. I'm still a bit confused about Caesar, but I think I understand the big picture now. Let's tackle this together; it should be interesting and educational);

  • Varman will give a short talk on exosemantics. Surprisingly, the usual lack of political correctness isn't included (but counter-signalling and in-group bashing are still here, of course)

  • Natalia will tell you why you need to know the very basics of behaviorism to properly analyze things like social interactions, and how exactly the use of more complex models could lead you into trouble.

Please do your homework and make sure you are familiar with the odds form of Bayes' theorem and know something about biofeedback.

PS: No, we aren't implying that behaviorism is enough to understand things. But it is necessary.


Comment author: cameroncowan 03 September 2014 09:32:58PM 1 point [-]

I like your example, but there is additional evidence that could be gathered to refine your premise. You can check the traffic situation along your route and make estimates about travel time. So there is a chance, given additional tools, to raise the odds of "everything is fine" being the more likely scenario. I think this is especially true for those of us who drive cars. If you and I decide to go to the Denver Art Museum, and you are coming from a hotel in downtown Denver while I'm driving from my house out of town, whether I'm going to be on time or not depends on all the factors you mentioned. However, I can mitigate some of those factors by adding data. I can do the same thing for you by empowering you with a map, or by guiding you towards a tool like Google Maps to get you from your hotel to the museum more efficiently. I think when you live someplace for a time and make a trip regularly, you get used to certain ideas about your journey, which is why "everything is fine" is usually picked by people. Trying to compensate for every eventuality is mind-numbing. However, I think making proper use of tools to make things as efficient as possible is also a good idea.

However, I am very much in favor of this line of thinking.

Comment author: BT_Uytya 04 September 2014 09:00:39PM *  1 point [-]

Making sure I understood you: you are saying that people sometimes pick "everything is fine" because:

1) they are confident that if anything goes wrong, they would be able to fix it, so everything is fine once again

2) they are so confident in this that they don't make specific plans, believing that they will be able to fix everything on the spur of the moment

aren't you?

Looks plausible, but something must be wrong there, because the planning fallacy:

a) exists (so people aren't evaluating their abilities well)

b) exists even when people aren't familiar with the situation they are predicting (here, people have no grounds for the "ah, I'm able to fix anything anyway" effect)

c) exists even in people with low confidence (though maybe the effect is weaker here; it's an interesting theory to test)

I blame overconfidence and similar self-serving biases.

Comment author: BT_Uytya 26 August 2014 05:57:42PM 6 points [-]

So, I made two posts sharing potentially useful heuristics from Bayesianism. So what?

Should I move one of them to Main? On the one hand, these posts "discuss core Less Wrong topics". On the other, I'm honestly not sure that this stuff is awesome enough. But I feel like I should do something so these things aren't lost (I tried to give a talk on "which useful principles can be reframed in Bayesian terms" at a Moscow meetup once, and learned that those things weren't very easy to find using site-wide search).

Maybe we need a wiki page with a list of relevant lessons from probability theory, which can be kept up-to-date?

Comment author: BT_Uytya 02 September 2014 09:47:44PM 1 point [-]

(decided to move everything to Main)

Bayesianism for humans: prosaic priors

22 BT_Uytya 02 September 2014 09:45PM


There are two insights from Bayesianism which occurred to me and which I hadn't seen anywhere else before.
I like the lists in the two posts linked above, so for the sake of completeness, I'm going to add my two cents to the public domain. This post is about the second penny; the first one is here.

Prosaic Priors

The second insight can be formulated as «dull explanations are more likely to be correct because they tend to have high prior probability».

Why is that? 

1) Almost by definition! Some property X is 'banal' if X applies to a lot of people in a disappointingly mundane way, without any redeeming features which would make it rarer (and, hence, interesting).

In other words, X is banal iff the base rate of X is high. Or, you could say, the prior probability of X is high.

1.5) Because of Occam's Razor and burdensome details. One way to make something boring more exciting is to add interesting details: some special features which make sure that this explanation is about you, as opposed to 'about almost anybody'.

This can work the other way around: sometimes an explanation feels unsatisfying exactly because it has been shaved of any unnecessary and (ultimately) burdensome details.

2) Often, the alternative to a mundane explanation is something unique and custom-made to fit the case you are interested in. And anybody familiar with overfitting and the conjunction fallacy (and the fact that people tend to love coherent stories with blinding passion1) should be very suspicious of such things. So there can be a strong bias against stale explanations, which should be countered.

* * *

I fully grokked this while in the process of CBT-induced soul-searching; usage in this context still looks the most natural to me, but I believe that the area of application of this heuristic is wider.


1) I'm fairly confident that I'm an introvert. Still, sometimes I can behave like an extrovert. I was interested in the causes of this "extroversion activation", as I called it2. I suspected that I really had two modes of functioning (with "introversion" being the default one), and that some events — for example, mutual interest (when I am interested in the person I'm talking to, and xe is interested in me) or feeling high-status — made me switch between them.

Or, you know, it could just be a reduction in social anxiety, which makes people more communicative. Increased anxiety wasn't a new element to be postulated; I already knew I had it, yet I was tempted to make up new mental entities, and the prosaic explanation about anxiety managed to evade me for a while.

2) I find it hard to do something I consider worthwhile while on spring break, despite having lots of free time. I tend to make grandiose plans — I should meet new people! I should be more involved in sports! I should start using Anki! I should learn Lojban! I should practice meditation! I should read these textbooks, including doing most of the exercises! — and then fail to do almost anything. Yet I manage to do some impressive stuff during the academic term, despite having less time and more commitments.

This paradoxical situation calls for an explanation.

The first hypothesis that came to my mind was about activation energy. It takes effort to go from "procrastinating" to "doing something"; speaking more generally, you could say that it takes effort to go from a "lazy day" to a "productive day". During the academic term, I am forced to make most of my days productive: I have to attend classes, do homework, etc. And, having already done something good, I can do something else as well. During spring break, I am deprived of that natural structure, and hence I am on my own when it comes to starting something I find worthwhile.

The alternative explanation: I was tired. Because, you know, vacation comes right after midterms, and I tend to go all out while preparing for midterms. I am exhausted, my energy and willpower are scarce, so it's no wonder I have trouble making use of my free time.

(I don't really believe in the latter explanation (I think that my situation is caused by several factors, including two outlined above), so it is also an example of descriptive "probable enough" hypothesis)

3) This example comes from Slate Star Codex. Nerds tend to find aversive many group bonding activities that usual people supposedly enjoy, such as patriotism, prayer, team sports, and pep rallies. Supposedly, they should feel (with the tear-jerking passion of a thousand exploding suns) a great unity with their fellow citizens, church-goers, teammates or pupils respectively, but instead they feel nothing.

Might it be that nerds are unable to enjoy these activities because something is broken inside their brains? One could be tempted to construct an elaborate argument involving the autism spectrum and a mild case of schizoid personality disorder. In other words, this calls for postulating a rare form of autism which affects only some types of social behaviour (perception of group activities), leaving other types unchanged.

Or, you know, maybe nerds just don't like the groups they are supposed to root for. Maybe nerds don't feel unity and a relationship to The Great Whole because they don't feel like they truly belong there.

As Scott put it, "It’s not that we lack the ability to lose ourselves in an in-group, it’s that all the groups people expected us to lose ourselves in weren’t ones we could imagine as our in-group by any stretch of the imagination"3.

4) This example comes from this short comic titled "Sherlock Holmes in real life".

5) Scott Aaronson uses something similar to Hanlon's Razor to explain that the lack of practical expertise among CS theorists isn't caused by arrogance or anything like that:

"If theorists don’t have as much experience building robots as they should have, don’t know as much about large software projects as they should  know, etc., then those are all defects to add to the long list of their other, unrelated defects.  But it would be a mistake to assume that they failed to acquire this knowledge because of disdain for practical people, rather than for mundane reasons like busyness or laziness."

* * *

...and after this, the word "prosaic" quickly turned into an awesome compliment. Like "so, this hypothesis explains my behaviour well; but is it boring enough?", or "your claim is refreshingly dull; I like it!".

1. If you have read Thinking, Fast and Slow, you probably know what I mean. If you haven't, you can look up the narrative fallacy to get a general idea.
2. Which was, as I now realize, an excellent way to deceive myself via using word with a lot of hidden assumptions. Taboo your words, folks!
3. As a side note, my friend proposed an alternative explanation: the thing is, nerds are often defined as "the sort of people who dislike pep rallies". So, naturally, we have "usual people" who like pep rallies and "nerds" who avoid them. And then "nerds dislike pep rallies" is a tautology rather than something to be explained.

Bayesianism for humans: "probable enough"

38 BT_Uytya 02 September 2014 09:44PM

There are two insights from Bayesianism which occurred to me and which I hadn't seen anywhere else before.
I like the lists in the two posts linked above, so for the sake of completeness, I'm going to add my two cents to the public domain. The second penny is here.

"Probable enough"

When you have eliminated the impossible, whatever remains is often more improbable than your having made a mistake in one of your impossibility proofs.

The Bayesian way of thinking introduced me to the idea of a "hypothesis which probably isn't true, but is probable enough to rise to the level of conscious attention" — in other words, to the situation where P(H) is notable but less than 50%.

Looking back, I think that the notion of taking seriously something you don't think is true was alien to me. Hence, everything was either probably true or probably false; things in the former category were over-confidently certain, and things in the latter category were barely worth thinking about.

This model was correct, but only in a formal sense.

Suppose you are living in Gotham, a city famous for its crime rate and its masked (and well-funded) vigilante, Batman. Recently you read The Better Angels of Our Nature: Why Violence Has Declined by Steven Pinker, and according to some theories described there, Batman isn't good for Gotham at all.

Now you know, for example, the theory of Donald Black that "crime is, from the point of view of the perpetrator, the pursuit of justice". You know about the idea that in order for the crime rate to drop, people should perceive their legal system as legitimate. You suspect that criminals beaten by Bats don't perceive the act as a fair and regular punishment for something bad, or as an attempt to defend them from injustice; instead, the act is perceived as a round of bad luck. So the criminals are busy plotting their revenge, not internalizing civil norms.

You believe that if you send your copy of the book (with key passages highlighted) to a person connected to Batman, Batman will change his ways and Gotham will become much nicer in terms of homicide rate.

So you are trying to find out Batman's secret identity, and there are 17 possible suspects. Derek Powers looks like a good candidate: he is wealthy, and has a long history of secretly delegating illegal-violence-involving tasks to his henchmen; however, his motivation is far from obvious. You estimate P(Derek Powers employs Batman) at 20%. You have very little information about the other candidates, like Ferris Boyle, Bruce Wayne, Roland Daggett, Lucius Fox or Matches Malone, so you assign an equal 5% to each of them.

In this case you should pick Derek Powers as your best guess when forced to name only one candidate (for example, if you are forced to send the book to someone today), but you should also be aware that your guess is 80% likely to be wrong. When making expected utility calculations, you should take Derek Powers more seriously than Lucius Fox, but only by 15 percentage points.

In other words, you should take the maximum a posteriori probability hypothesis into account while not deluding yourself into thinking that you now understand everything, or nothing at all. The Derek Powers hypothesis probably isn't true; but it is useful.
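The suspect arithmetic above can be sketched in a few lines (the "Suspect N" names are placeholders for the candidates the text doesn't name):

```python
# Derek Powers at 20%; the sixteen other suspects at 5% each.
named = ["Ferris Boyle", "Bruce Wayne", "Roland Daggett",
         "Lucius Fox", "Matches Malone"]
suspects = {"Derek Powers": 0.20}
for name in named + [f"Suspect {i}" for i in range(1, 12)]:
    suspects[name] = 0.05

# 17 suspects; the distribution sums to 1
assert abs(sum(suspects.values()) - 1.0) < 1e-9

# MAP hypothesis: the best single guess when you must name one candidate...
map_suspect = max(suspects, key=suspects.get)
p_wrong = 1 - suspects[map_suspect]
print(map_suspect, p_wrong)   # Derek Powers 0.8: probably wrong, still useful
```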

Sometimes I find it easier to reframe the question from "which hypothesis is true?" to "which hypothesis is probable enough?". Now it's totally okay that your pet theory isn't probable but still probable enough, so doubting becomes easier. Also, you are aware that your pet theory is likely to be wrong (and this is nothing to be sad about), so the alternatives come to mind more naturally.

These "probable enough" hypotheses can serve as very concise summaries of the state of your knowledge: you simultaneously outline the general sort of evidence you've observed, and stress that you aren't really sure. I like to think of it as a rough, qualitative, more System-1-friendly variant of likelihood ratio sharing.

Planning Fallacy

The original explanation of the planning fallacy (proposed by Kahneman and Tversky) is that people focus on the most optimistic scenario when asked about the typical one (instead of trying to take the Outside View). If you keep the distinction between "probable" and "probable enough" in mind, you can see this claim in a new light.

Because the most optimistic scenario is the most probable and the most typical one, in a certain sense.

The illustration, with numbers pulled out of thin air, goes like this: so, you want to visit a museum.

The first thing you need to do is get dressed and grab your keys and stuff. Usually (with 80% probability) you do this very quickly, but there is a weak possibility of your museum ticket having been devoured by the entropy monster living on your computer table.

The second thing is to catch the bus. Usually (p = 80%) the bus is on schedule, but sometimes it is too early or too late. After this, the bus could (20%) or could not (80%) get stuck in a traffic jam.

Finally, you need to find the museum building. You've been there once before, so you sort of remember your route, yet you could still get lost with 20% probability.

And there you have it: P(everything is fine) = 40%, while the probability of any other single scenario is 10% or less. "Everything is fine" is probable enough, yet likely to be false. Supposedly, humans pick the MAP hypothesis and then forget about every other scenario in order to save computation.
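The arithmetic can be checked in a few lines, enumerating all sixteen scenarios from the four independent steps above (the numbers are the same ones pulled out of thin air in the text):

```python
from itertools import product

# Four independent steps, each going fine with probability 0.8.
steps = ["get ready", "bus on time", "no traffic jam", "find the building"]
p_fine = 0.8

# Probability of each of the 2**4 = 16 scenarios
scenario_probs = {}
for outcome in product((True, False), repeat=len(steps)):
    p = 1.0
    for step_ok in outcome:
        p *= p_fine if step_ok else 1 - p_fine
    scenario_probs[outcome] = p

p_all_fine = scenario_probs[(True,) * len(steps)]
print(round(p_all_fine, 2))      # 0.41: the single most likely scenario (MAP)
print(round(1 - p_all_fine, 2))  # 0.59: yet "something goes wrong" is more likely
```

So "everything is fine" is the most probable single scenario, while still being more likely false than true.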

Also, "everything is fine" is a good description of your plan. If your friend asks, "so how are you planning to get to the museum?", and you answer "well, I catch the bus, get stuck in a traffic jam for 30 agonizing minutes, and then just walk from there", your friend is going to get a completely wrong idea about the dangers of your journey. So, in a certain sense, "everything is fine" is the typical scenario.

Maybe it isn't the human inability to pick the most likely scenario which should be blamed. Maybe it is the false assumption that "most likely == likely to be correct" which contributes to this ubiquitous error.

In this case you would be better off picking "something will go wrong, and I will be late" instead of "everything will be fine".

So, sometimes you are interested in the best specimen from your hypothesis space, sometimes you are interested in the most likely scenario (no matter how vague it is), and sometimes there are no shortcuts, and you have to do an actual expected utility calculation.
