The Heuristic About Representativeness Heuristic

2 BT_Uytya 12 September 2015 11:15PM

(x-posted from my blog)

 

Some people think that the problem with the representativeness heuristic is base rate neglect. I hold that this is incorrect: the problem is deeper than that, and simply using a base rate isn't going to fix it. This makes the advice "look at the base rate!" a heuristic as well.

The thing is, there is a fundamental difference between "How strongly does E resemble H?" and "How strongly does H imply E?". The latter question is about P(E|H), and this number can be used in Bayesian reasoning if you add P(E|!H) and P(H)[1]. The former question (the one humans actually answer when asked to judge whether something is likely) sometimes cannot be salvaged at all.
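A minimal sketch of how the three numbers combine, with all values made up purely for illustration. It shows that a high P(E|H) (strong "resemblance") does very little when P(E|!H) is nearly as high:

```python
def posterior(prior, p_e_given_h, p_e_given_not_h):
    """Return P(H|E) from P(H), P(E|H) and P(E|!H) via Bayes' theorem."""
    joint_h = prior * p_e_given_h
    joint_not_h = (1.0 - prior) * p_e_given_not_h
    return joint_h / (joint_h + joint_not_h)


# Hypothetical numbers: E is very likely under H, but almost as likely
# when H is false, so the evidence barely moves the estimate.
weak = posterior(prior=0.01, p_e_given_h=0.9, p_e_given_not_h=0.8)

# Same prior and same P(E|H), but now E is rare when H is false.
strong = posterior(prior=0.01, p_e_given_h=0.9, p_e_given_not_h=0.01)

print(f"E almost as likely without H: P(H|E) = {weak:.4f}")
print(f"E rare without H:             P(H|E) = {strong:.4f}")
```

In both calls P(E|H) and the base rate are identical; only P(E|!H) changes, and that is what decides how much E is worth.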

Several examples to get the point across:

1) Conspiracy theorists / ufologists: naively, their existence strongly points to a world where UFOs exist, but really, their existence is very weak evidence of UFOs (human psychology suggests that ufologists would exist in a perfectly alienless world), and could even be evidence against them: if a Secret World Government were real, we would expect it to be very good at hiding, so any voices that got close to the truth would be quickly silenced.

So, the answer to "how strongly E resembles H?" is very different from "how much is P(E|H)?". No amount of accounting for base rate is going to fix this.

2) Suppose that some analysis comes out too strongly in favor of some hypothesis.

Maybe some paper argues that leaded gasoline accounts for 90% of the variation in violent crime (credit for this example goes to /u/yodatsracist on reddit). Or some ridiculously simple school intervention is claimed to have a gigantic effect size.

Let's take leaded gasoline, for example. On the surface, this data strongly "resembles" a world where leaded gasoline is indeed causing violence, since 90% suggests that the effect is very large and very unlikely to be a fluke. On the other hand, this effect is too large, and the 10% left for "other factors" (including but not limited to: abortion rate, economic situation, police budget, alcohol consumption, imprisonment rate) is too small a share.

The decline we expect in a world of harmful leaded gasoline is more like 10% than 90%, so this model is too good to be true; instead of being very strong evidence in favor, this analysis could be either irrelevant (just a random botched analysis with faked data, nothing to see here) or offer some evidence against (for reasons related to the conservation of expected evidence, for example).

So, how should it be done? Remember that P(E|H) would be written as P(H -> E), were the notation a bit saner. P(E|H) answers "to what degree does H imply E?", so the correct method for answering this query involves imagining a world-where-H-is-true and asking yourself "how often does E occur here?" instead of answering the question "which world comes to my mind after seeing E?".
 
--------------

[1] And often just using the base rate is good enough, but this is another, even less correct heuristic. See: Prosecutor's Fallacy.

Meetup : Moscow: bayes, language and psychology, now with homework

1 BT_Uytya 02 January 2015 11:46PM

Discussion article for the meetup : Moscow: bayes, language and psychology, now with homework

WHEN: 04 January 2015 03:00:00PM (+0400)

WHERE: Moscow, ulitsa L'va Tolstogo 16

Hello there.

If you don't have plans for Sunday but are familiar with Bayes' theorem in odds form, you will certainly be welcome here.

Our plan for the 4th is:

  • Victor will try to explain the device of imaginary results by I J Good, himself gaining a deeper understanding of it in the process (yup, it's the same guy who posted this. I'm still a bit confused about Caesar, but I think I understand the big picture now. Let's tackle this together, it should be interesting and educational);

  • Varman will give a short talk on exosemantics. Surprisingly, the usual lack of political correctness isn't included (but counter-signalling and in-group bashing are still here, of course)

  • Natalia will tell you why you need to know the very basics of behaviorism to properly analyze things like social interactions, and how exactly the use of more complex models could lead you into trouble.

Please do your homework and make sure you are familiar with the odds form of Bayes' theorem and know something about biofeedback.

PS: No, we aren't implying that behaviorism is enough to understand things. But it is necessary.


Bayesianism for humans: prosaic priors

22 BT_Uytya 02 September 2014 09:45PM

 

There are two insights from Bayesianism which occurred to me and which I hadn't seen anywhere else before.
I like lists in the two posts linked above, so for the sake of completeness, I'm going to add my two cents to the public domain. This post is about the second penny; the first one is here.


Prosaic Priors

The second insight can be formulated as «the dull explanations are more likely to be correct because they tend to have high prior probability.»

Why is that? 

1) Almost by definition! Some property X is 'banal' if X applies to a lot of people in a disappointingly mundane way, not having any redeeming features which would make it more rare (and, hence, interesting).

In other words, X is banal iff the base rate of X is high. Or, you can say, the prior probability of X is high.

1.5) Because of Occam's Razor and burdensome details. One way to make something boring more exciting is to add interesting details: some special features which will make sure that this explanation is about you as opposed to 'about almost anybody'.

This could work the other way around: sometimes the explanation feels unsatisfying exactly because it was shaved of any unnecessary and (ultimately) burdensome details.

2) Often, the alternative to a mundane explanation is something unique and custom made to fit the case you are interested in. And anybody familiar with overfitting and the conjunction fallacy (and the fact that people tend to love coherent stories with blinding passion[1]) should be very suspicious about such things. So, there could be a strong bias against stale explanations, which should be countered.

* * *

I fully grokked this while in the process of CBT-induced soul-searching; usage in this context still looks the most natural to me, but I believe that the area of application of this heuristic is wider.

Examples

1) I'm fairly confident that I'm an introvert. Still, sometimes I can behave like an extrovert. I was interested in the causes of this "extroversion activation", as I called it[2]. I suspected that I really had two modes of functioning (with "introversion" being the default one), and that some events, such as mutual interest (when I am interested in the person I am talking to, and xe is interested in me) or feeling high-status, made me switch between them.

Or, you know, it could just be a reduction in social anxiety, which makes people more communicative. Heightened anxiety wasn't a new element to be postulated; I already knew I had it, yet I was tempted to make up new mental entities, and the prosaic explanation about anxiety managed to elude me for a while.

2) I find it hard to do something I consider worthwhile while on spring break, despite having lots of free time. I tend to make grandiose plans (I should meet new people! I should be more involved in sports! I should start using Anki! I should learn Lojban! I should practice meditation! I should read these textbooks, including doing most of the exercises!) and then fail to do almost anything. Yet I manage to do some impressive stuff during the academic term, despite having less time and more commitments.

This paradoxical situation calls for explanation.

The first hypothesis that came to my mind was about activation energy. It takes effort to go from "procrastinating" to "doing something"; speaking more generally, you can say that it takes effort to go from "lazy day" to "productive day". During the academic term, I am forced to make most of my days productive: I have to attend classes, do homework, etc. And, already having done something good, I can do something else as well. During spring break, I am deprived of that natural structure and, hence, I am on my own when it comes to starting something I find worthwhile.

The alternative explanation: I was tired. Because, you know, vacation comes right after midterms, and I tend to go all out while preparing for midterms. I am exhausted, my energy and willpower are scarce, so it's no wonder I am having trouble utilizing it.

(I don't really believe in the latter explanation (I think that my situation is caused by several factors, including the two outlined above), so it is also an example of a descriptive "probable enough" hypothesis.)

3) This example comes from Slate Star Codex. Nerds tend to be averse to many group bonding activities that ordinary people supposedly enjoy, such as patriotism, prayer, team sports, and pep rallies. Supposedly, they should feel (with the tear-jerking passion of a thousand exploding suns) a great unity with their fellow citizens, church-goers, teammates or pupils respectively, but instead they feel nothing.

Might it be that nerds are unable to enjoy these activities because something is broken inside their brains? One could be tempted to construct an elaborate argument involving autism spectrum and a mild case of schizoid personality disorder. In other words, this calls for postulating a rare form of autism which affects only some types of social behaviour (perception of group activities), leaving other types unchanged.

Or, you know, maybe nerds just don't like the group they are supposed to root for. Maybe nerds don't feel unity and relationship to The Great Whole because they don't feel like they truly belong here.

As Scott put it, "It’s not that we lack the ability to lose ourselves in an in-group, it’s that all the groups people expected us to lose ourselves in weren’t ones we could imagine as our in-group by any stretch of the imagination"[3].

4) This example comes from this short comic titled "Sherlock Holmes in real life".

5) Scott Aaronson uses something similar to Hanlon's Razor to explain that the lack of practical expertise among CS theorists isn't caused by arrogance or anything like that:

"If theorists don’t have as much experience building robots as they should have, don’t know as much about large software projects as they should  know, etc., then those are all defects to add to the long list of their other, unrelated defects.  But it would be a mistake to assume that they failed to acquire this knowledge because of disdain for practical people, rather than for mundane reasons like busyness or laziness."

* * *

...and after this the word "prosaic" quickly turned into an awesome compliment. Like, "so, this hypothesis explains my behaviour well; but is it boring enough?", or "your claim is refreshingly dull; I like it!".


1. If you have read Thinking, Fast and Slow, you probably know what I mean. If you haven't, you can look at the narrative fallacy in order to get a general idea.
2. Which was, as I now realize, an excellent way to deceive myself by using a word with a lot of hidden assumptions. Taboo your words, folks!
3. As a side note, my friend proposed an alternative explanation: the thing is, nerds are often defined as "the sort of people who dislike pep rallies". So, naturally, we have "usual people" who like pep rallies and "nerds" who avoid them. And then "nerds dislike pep rallies" is a tautology rather than something to be explained.

Bayesianism for humans: "probable enough"

38 BT_Uytya 02 September 2014 09:44PM

There are two insights from Bayesianism which occurred to me and which I hadn't seen anywhere else before.
I like lists in the two posts linked above, so for the sake of completeness, I'm going to add my two cents to the public domain. The second penny is here.



"Probable enough"

When you have eliminated the impossible, whatever remains is often more improbable than your having made a mistake in one of your impossibility proofs.


The Bayesian way of thinking introduced me to the idea of a "hypothesis which probably isn't true, but is probable enough to rise to the level of conscious attention"; in other words, to the situation when P(H) is notable but less than 50%.

Looking back, I think that the notion of taking seriously something which you don't think is true was alien to me. Hence, everything was either probably true or probably false; things from the former category were over-confidently certain, and things from the latter category were barely worth thinking about.

This model was correct, but only in a formal sense.

Suppose you are living in Gotham, a city famous for its crime rate and its masked (and well-funded) vigilante, Batman. You have recently read The Better Angels of Our Nature: Why Violence Has Declined by Steven Pinker, and according to some theories described there, Batman isn't good for Gotham at all.

Now you know, for example, Donald Black's theory that "crime is, from the point of view of the perpetrator, the pursuit of justice". You know about the idea that in order for the crime rate to drop, people should perceive their legal system as legitimate. You suspect that criminals beaten by Bats don't perceive the act as a fair and regular punishment for something bad, or an attempt to defend them from injustice; instead the act is perceived as a round of bad luck. So, the criminals are busy plotting their revenge, not internalizing civil norms.

You believe that if you send your copy of the book (with key passages highlighted) to the person connected to Batman, Batman will change his ways and Gotham will become a much nicer place in terms of homicide rate.

So you are trying to find out Batman's secret identity, and there are 17 possible suspects. Derek Powers looks like a good candidate: he is wealthy, and has a long history of secretly delegating illegal-violence-including tasks to his henchmen; however, his motivation is far from obvious. You estimate P(Derek Powers employs Batman) as 20%. You have very little information about other candidates, like Ferris Boyle, Bruce Wayne, Roland Daggett, Lucius Fox or Matches Malone, so you assign an equal 5% to everyone else.

In this case you should pick Derek Powers as your best guess when forced to name only one candidate (for example, if you are forced to send the book to someone today), but you should also be aware that your guess is 80% likely to be wrong. When making expected utility calculations, you should take Derek Powers more seriously than Lucius Fox, but only by 15 percentage points.

In other words, you should take the maximum a posteriori hypothesis into account while not deluding yourself into thinking that you now understand everything, or nothing at all. The Derek Powers hypothesis probably isn't true, but it is useful.
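The arithmetic of the Gotham example can be sketched as follows. The probabilities are the ones given above; the placeholder names for the sixteen other suspects are only there to fill the table:

```python
P_POWERS = 0.20   # estimate for Derek Powers, from the text
P_OTHER = 0.05    # estimate for each of the other suspects, from the text
N_SUSPECTS = 17

beliefs = {"Derek Powers": P_POWERS}
for i in range(1, N_SUSPECTS):          # 16 remaining suspects
    beliefs[f"other suspect {i}"] = P_OTHER

total = sum(beliefs.values())           # should be (approximately) 1
best_guess = max(beliefs, key=beliefs.get)
p_best_guess_wrong = 1.0 - beliefs[best_guess]
gap = P_POWERS - P_OTHER                # difference in probability
ratio = P_POWERS / P_OTHER              # how many times more likely

print(f"best single guess: {best_guess}")
print(f"probability the best guess is wrong: {p_best_guess_wrong:.0%}")
print(f"Powers vs. any other suspect: +{gap:.0%} in probability, {ratio:.0f}x as likely")
```

The best single guess and the probability of that guess being wrong are two separate outputs of the same belief table, which is the whole point of "probable enough".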

Sometimes I find it easier to reframe the question from "which hypothesis is true?" to "which hypothesis is probable enough?". Now it's totally okay that your pet theory isn't probable but is still probable enough, so doubt becomes easier. Also, you are aware that your pet theory is likely to be wrong (and this is nothing to be sad about), so the alternatives come to mind more naturally.

These "probable enough" hypotheses can serve as very concise summaries of the state of your knowledge: you simultaneously outline the general sort of evidence you've observed and stress that you aren't really sure. I like to think of this as a rough, qualitative and more System 1-friendly variant of likelihood ratio sharing.

Planning Fallacy

The original explanation of the planning fallacy (proposed by Kahneman and Tversky) is that people focus on the most optimistic scenario when asked about a typical one (instead of trying to take the Outside View). If you keep the distinction between "probable" and "probable enough" in mind, you can see this claim in a new light.

Because the most optimistic scenario is the most probable and the most typical one, in a certain sense.

The illustration, with numbers pulled out of thin air, goes like this: so, you want to visit a museum.

The first thing you need to do is to get dressed and take your keys and stuff. Usually (with 80% probability) you do this very quickly, but there is a small chance of your museum ticket having been devoured by an entropy monster living on your computer table.

The second thing is to catch the bus. Usually (p = 80%), the bus is on schedule, but sometimes it can be too early or too late. After this, the bus could (20%) or could not (80%) get stuck in a traffic jam.

Finally, you need to find the museum building. You've been there once before, so you sorta remember your route, yet you could still get lost with 20% probability.

And there you have it: P(everything is fine) is about 40%, and the probability of every other scenario is about 10% or even less. "Everything is fine" is probable enough, yet likely to be false. Supposedly, humans pick the MAP hypothesis and then forget about every other scenario in order to save computation.
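The museum numbers can be checked by enumerating every scenario. This is a sketch under the assumption, implicit in the illustration, that the four steps succeed or fail independently, each with the 80% success chance given above:

```python
from itertools import product

STEPS = ["get ready quickly", "bus on schedule", "no traffic jam", "find the museum"]
P_OK = 0.8  # chance that each step goes fine, as in the illustration

# Probability of every combination of successes (True) and failures (False),
# assuming the steps are independent.
scenarios = {}
for outcome in product([True, False], repeat=len(STEPS)):
    p = 1.0
    for step_ok in outcome:
        p *= P_OK if step_ok else 1.0 - P_OK
    scenarios[outcome] = p

all_fine = (True,) * len(STEPS)
map_scenario = max(scenarios, key=scenarios.get)   # single most probable scenario
p_all_fine = scenarios[all_fine]
runner_up = max(p for outcome, p in scenarios.items() if outcome != all_fine)
p_something_wrong = 1.0 - p_all_fine

print(f"P(everything is fine)      = {p_all_fine:.4f}")
print(f"P(most likely alternative) = {runner_up:.4f}")
print(f"P(something goes wrong)    = {p_something_wrong:.4f}")
```

"Everything is fine" is the single most probable scenario, yet the failure scenarios taken together outweigh it.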

Also, "everything is fine" is a good description of your plan. If your friend asks you, "so how are you planning to get to the museum?", and you answer "well, I catch the bus, get stuck in a traffic jam for 30 agonizing minutes, and then just walk from there", your friend is going to get a completely wrong idea about the dangers of your journey. So, in a certain sense, "everything is fine" is a typical scenario.

Maybe it isn't the human inability to pick the most likely scenario which should be blamed. Maybe it is the false assumption that "most likely == likely to be correct" which contributes to this ubiquitous error.

In this case you would be better off picking "something will go wrong, and I will be late" instead of "everything will be fine".

So, sometimes you are interested in the best specimen out of your hypothesis space, sometimes you are interested in a most likely thingy (and it doesn't matter how vague it would be), and sometimes there are no shortcuts, and you have to do an actual expected utility calculation.

I need help: Device of imaginary results by I J Good

5 BT_Uytya 05 April 2013 07:49PM

In chapter 5 of Probability Theory: The Logic of Science you can read about the so-called device of imaginary results, which seems to go back to I J Good's book Probability and the Weighing of Evidence.

The idea is simple and fascinating:

1) You want to estimate your probability of something, and you know that this probability is very, very far from 0.5. For the sake of simplicity, let's assume that it's some hypothesis A and P(A|X) << 0.5

2) You imagine a situation where A and some well-posed alternative ~A are the only possibilities.

(For example, A = "Mr Smith has extrasensory perception and can guess the number you've written down" and ~A = "Mr Smith can guess your number purely by luck". Maybe Omega told you that the room where the experiment is located makes it impossible for Smith to secretly look at your paper, and you are totally safe from every other form of deception.)

3) You imagine the evidence which would convince you otherwise: P(E|A,X) ~ 1 and P(E|~A,X) is small (you should select E and ~A in such a way that it's possible to evaluate P(E|~A,X))

4) After a while, you feel that you are truly in doubt about A: P(A|E1,E2,..., X) ~ 0.5

5) And now you can backtrack all the way to your prior P(A|X), since you know every P(E|A) and P(E|~A).
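The backtracking in step 5 can be sketched in odds form: posterior odds equal prior odds times the product of the likelihood ratios, so the prior odds are the posterior odds divided by that product. The function name and the numbers below are hypothetical, chosen only to illustrate the Mr Smith setup (each imagined correct guess of a digit from 0 to 9 has P(E|A) = 1 and P(E|~A) = 0.1):

```python
import math


def implied_prior(likelihood_ratios, posterior=0.5):
    """Recover P(A|X) from the imagined evidence that would move you to `posterior`.

    Each likelihood ratio is P(E_i|A,X) / P(E_i|~A,X); the pieces of evidence
    are assumed to be independent given A and given ~A.
    """
    posterior_odds = posterior / (1.0 - posterior)
    prior_odds = posterior_odds / math.prod(likelihood_ratios)
    return prior_odds / (1.0 + prior_odds)


# Hypothetical: six correct guesses in a row would leave you truly in doubt.
ratios = [1.0 / 0.1] * 6
prior = implied_prior(ratios)
print(f"implied prior P(A|X) = {prior:.2e}")
```

The output is only as good as the introspection in step 4: the sketch turns "how much evidence would it take?" into a number, it does not tell you whether your answer to that question was honest.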

 

After this explanation with the example about Mr Smith's telepathic powers, Jaynes gives reader the following exercise:

Exercise 5.1. By applying the device of imaginary results, find your own strength of
belief in any three of the following propositions: (1) Julius Caesar is a real historical
person (i.e. not a myth invented by later writers); (2) Achilles is a real historical person;
(3) the Earth is more than a million years old; (4) dinosaurs did not die out; they are
still living in remote places; (5) owls can see in total darkness; (6) the configuration of
the planets influences our destiny; (7) automobile seat belts do more harm than good;
(8) high interest rates combat inflation; (9) high interest rates cause inflation.

I have trouble tackling the first two propositions and would be glad to hear your thoughts about the other seven. Anybody care to help me?

(I decided not to share details of my attempt to solve this exercise unless asked. I don't think that my perspective is so valuable and anchoring would be bad.)

 

UPD: here is my attempt to solve the Julius Caesar problem.

Nate Silver will do an AMA on Reddit on Tuesday

2 BT_Uytya 07 January 2013 09:08AM

http://www.reddit.com/r/IAmA/comments/163nqk/nate_silver_is_doing_an_ama_tuesday_at_2_pm/

I'm really excited to see this. Nate Silver might be the most famous present day Bayesian statistician.

UPD: It appears that the author of the Reddit post deleted it for some reason. The link still works, but it makes sense to post the link to Nate Silver's blog with his original announcement, just in case: http://fivethirtyeight.blogs.nytimes.com/2013/01/06/ask-nate-anything/

A presentation about Cox's Theorem made for my English class

3 BT_Uytya 15 October 2012 07:51PM

In my English class everybody was supposed to make a short presentation on a subject of their choice. I decided to tell people about Cox's Theorem (heavily based on the introduction in "Probability theory: The Logic Of Science" by E T Jaynes and "Constructing a logic of plausible inference: a guide to Cox's theorem" by Kevin S. Van Horn). Thought someone might find that interesting or useful.

 

Make sure that you have speaker notes visible.

https://docs.google.com/file/d/0BwJocL_GupTsNnMtdWFLT3RYWGs/edit

What about a line of retreat for the psychologists?

-11 BT_Uytya 16 July 2012 08:03PM

The road to the truth is paved with revelations; sometimes those revelations are uncomfortable. Also, you can never go back; it is impossible to unlearn something.

The problem is, if you go too far, those who fell behind will stop hearing your voice. You want them to be closer, and sometimes the only way to achieve it is to guide them to the truth. But the road is paved with uncomfortable revelations. Oops.

So far, I remember two big uncomfortable revelations: the first is that we live beyond the reach of god, and the second is that psychology isn't nearly as effective as everybody thinks.

If I were completely honest with the people around me, I would tell them about House of Cards. But I'm not. It is too cruel to say "Hey, your world-view is wrong and your competence is just an illusion" to psychologists and soon-to-be-psychologists. Hence I'm afraid to say even an innocent "Hey, I read a very interesting book yesterday" to my fellow CS students (because I don't know whether they have psychologist relatives).

This situation seems very wrong to me, but I understand that reality is unfair, and I'm lucky that I can be an atheist without fear of alienation, unlike the poor souls living in the Bible Belt. I'm just going to be very careful with my words concerning psychology. Of course I should be more cautious and patient while talking with strangers in *Guardian Of The Truth* mode, no surprise here.

But still.

Sometimes I wonder what I'm going to do if I really need to tell somebody that psychology is very often useless and sometimes even dangerous. What should I do to prevent their flinching from the truth? How can I make reality look more comfortable to them?

Meetup : Moscow 11 February meetup

2 BT_Uytya 09 February 2012 06:39PM

Discussion article for the meetup : Moscow 11 February meetup

WHEN: 11 February 2012 06:00:00PM (+0400)

WHERE: Mayakovskaya (Subway Station), Moscow, Russia

Despite the map (I have no idea why, but it shows the wrong place), the location of the meetup is Metro Mayakovskaya (in the center of the hall). For further information, see turchin.livejournal.com


Moscow meetup: Saturday 6 PM

3 BT_Uytya 08 February 2012 09:25PM

WHEN: 11 February 2012 06:00:00PM (Moscow time)

WHERE: Place is yet to be determined; I hope this issue will be dealt with tomorrow.

 

UPD.: So it is Metro Mayakovskaya; Meetup link: http://lesswrong.com/meetups/6x
