
Comment author: Lumifer 24 July 2015 04:11:50PM *  -1 points [-]

This is a badly written wall of text which isn't improved by pictures. Moreover, it starts by confusing frequency with probability (let me quote Andrew Gelman when faced with the same error: "Fuuuuuuuuuuuuuuuck. No no no no no").

Comment author: Satoshi_Nakamoto 25 July 2015 02:58:32AM 0 points [-]

Ok. Thanks for letting me know. I have removed the first example. I was thinking that it would make it simpler if I started out with an example that didn't look at evidence, but I think it is better without it.

If anyone wants to know the difference between frequency and probability, see the quote below:

“A probability is something that we assign, in order to represent a state of knowledge, or that we calculate from previously assigned probabilities according to the rules of probability theory. A frequency is a factual property of the real world that we measure or estimate. [...] The fundamental, inescapable distinction between probability and frequency lies in this relativity principle: probabilities change when we change our state of knowledge; frequencies do not. It follows that the probability p(E) that we assign to an event E can be equal to its frequency f (E) only for certain particular states of knowledge. Intuitively, one would expect this to be the case when the only information we have about E consists of its observed frequency.” Jaynes, E. (2003), Probability Theory: The Logic of Science, New York, Cambridge University Press, pg. 292

Comment author: Stingray 24 July 2015 09:52:10AM 4 points [-]

Or you can use a Venn diagram.

Comment author: Satoshi_Nakamoto 24 July 2015 11:51:25AM 3 points [-]

Yes you can. See this site for what I think is a good example of visualizing Bayes' theorem with Venn diagrams.

Bayesian Reasoning - Explained Like You're Five

5 Satoshi_Nakamoto 24 July 2015 03:59AM

(This post is not an attempt to convey anything new, but is instead an attempt to convey the concept of Bayesian reasoning as simply as possible. There have been other elementary posts that have covered how to use Bayes’ theorem: here, here, here and here)

 

Bayes’ theorem is about the probability that something is true given some piece or pieces of evidence. In a really simple form it is basically the equation below:
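probability that it's true = expected number of times it's true / (expected number of times it's true + expected number of times it's false)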


This will be explained using the following coin flipping scenario:

Someone has two coins: one fair and one biased (it has heads on both sides). They pick one of the coins and flip it. Given that you know the result of the flip was heads, what is the probability that the coin flipped was the fair one?

 

Let’s figure this out by listing out the potential states using a decision tree:
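In text form, the branches of the tree look like this (first the coin that was flipped, then the face that can come up):

fair coin   -> heads
fair coin   -> tails
biased coin -> heads (first side)
biased coin -> heads (second side)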

 

We know that the tails state did not happen, because the result of the flip was heads. So, let's update the decision tree:

 

 

The decision tree now lists all of the possible states given that the result was heads. 

Let's now plug the values into the formula. We know that there are three potential states: one in which the coin is fair and two in which it is biased. Let's assume that each state is equally likely.

So, the result is 1 / (1 + 2), which is 1 / 3, which equals about 33%. Using the formula we have found out that there is a 33% chance that the coin flipped was the fair one, given that we already know that the result of the flip was heads.
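If you like, you can check this result by brute force. Below is a minimal Python sketch (an illustration only, assuming the coin is picked at random and treating each face of the biased coin as its own equally likely state) that simply counts the heads states:

from fractions import Fraction

# (coin, result) pairs: the fair coin can land heads or tails, while the
# biased coin shows heads on either of its two sides.
states = [("fair", "heads"), ("fair", "tails"),
          ("biased", "heads"), ("biased", "heads")]

heads_states = [s for s in states if s[1] == "heads"]
fair_given_heads = Fraction(sum(1 for coin, _ in heads_states if coin == "fair"),
                            len(heads_states))
print(fair_given_heads)  # 1/3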

 

At this point you may be wondering what any of this has to do with Bayesian reasoning. Well, the relation is that the above formula is pretty much the same as Bayes' theorem, which in its explicit form is:
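P(A|B) = (P(B|A) * P(A)) / (P(B|A) * P(A) + P(B|~A) * P(~A))

(where ~A means "not A")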

 

You can see that P(B|A) * P(A) appears on both the top and the bottom of the equation. It represents the "expected number of times it's true" in the generic formula above. P(B|~A) * P(~A) represents the "expected number of times it's false".

 

You don’t need to worry about what the whole formula means yet, as this post is just about how to use Bayesian reasoning and why it is useful. If you want to find out how to deduce Bayes' theorem, check out this post. If you want some examples of how to use Bayes' theorem, see one of these posts: 1, 2, 3 and 4.


Let’s now continue. This time we will go through a totally different example, one that demonstrates what it is like to use Bayesian reasoning.

Imagine a scenario with a teacher and a normally diligent student. The student tells the teacher that they have not completed their homework because their dog ate it. Take note of the following:

  • H stands for the hypothesis, which is that the student did their homework. This is possible, but the teacher does not think that it is very likely. The teacher only has the evidence of the student’s diligence to back up this hypothesis, which does affect the probability that the hypothesis is correct, but not by much.
  • ~H stands for the opposite hypothesis, which is that the student did not do their homework. The teacher thinks that this is likely and also believes that the evidence (no extra evidence backing up the student’s claim and a cliché excuse) points towards this opposite hypothesis.

 

Which do you think is more probable: H or ~H? If you look at how typical ~H is and how likely the evidence is if ~H is correct, then I believe that we must see ~H (the student did not do their homework) as more probable. The picture below demonstrates this. Please note that higher probability is represented as being heavier, i.e. lower in the weight-scale pictures below.

 

The teacher is using Bayesian reasoning, so they don’t actually take ~H (student did not do their homework) as being true. They take it as being probable given the available evidence. The teacher knows that if new evidence is provided, then this could make H more probable and ~H less probable. So, knowing this, the teacher tells the student that if they bring in their completed homework tomorrow and provide some new evidence, then they will not get a detention.

 

Let’s assume that the next day the student does bring in their completed homework and they also bring in the remains of the original homework that looks like it has been eaten by a dog. Now, the teacher, since they have received new evidence, must update the probabilities of the hypotheses. The teacher also remembers the original evidence (the student’s diligence). When the teacher updates the probabilities of the hypotheses, H (student did their homework) becomes more probable and ~H (student did not do their homework) becomes less probable, but note that it is not considered impossible. After updating the probabilities of the hypotheses the teacher decides to let the student out of the detention. This is because the teacher now sees H as being the best hypothesis that is able to explain the evidence.
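To make the updating concrete, here is a minimal Python sketch of the teacher's two updates. The prior and the likelihoods are made-up numbers chosen purely for illustration (the post does not specify any); the point is only that the same formula is applied each time new evidence arrives.

def update(prior_h, likelihood_e_given_h, likelihood_e_given_not_h):
    """Return P(H | E) using Bayes' theorem in its explicit form."""
    numerator = likelihood_e_given_h * prior_h
    denominator = numerator + likelihood_e_given_not_h * (1 - prior_h)
    return numerator / denominator

# Day 1: the only evidence is the student's general diligence (hypothetical numbers).
p_h = update(prior_h=0.2,                   # assumed prior that H is true
             likelihood_e_given_h=0.9,      # diligent students usually did the work
             likelihood_e_given_not_h=0.5)  # but diligence is also common under ~H
print(round(p_h, 2))  # ~0.31: H gets a small boost, but ~H is still more probable

# Day 2: completed homework plus dog-chewed remains are presented.
p_h = update(prior_h=p_h,
             likelihood_e_given_h=0.8,       # such evidence is likely if H is true
             likelihood_e_given_not_h=0.05)  # and unlikely, though not impossible, if ~H
print(round(p_h, 2))  # ~0.88: H is now the better explanation; ~H improbable, not impossible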

 

The below picture demonstrates the updated probabilities.

 

 

If your reasoning is similar to the teacher’s, then congratulations, because this means that you are using Bayesian reasoning. Bayesian reasoning involves incorporating conditional probabilities and updating these probabilities when new evidence is provided.

 

You may be looking at this and wondering what all the fuss is over Bayes’ Theorem. You might be asking yourself: why do people think this is so important? Well, it is true that the actual process of weighing evidence and changing beliefs is not a new practice, but the importance of the theorem does not actually come from the process. It comes from the fact that this process has been quantified, i.e. made into an expressible equation (Bayes’ Theorem).

 

Overall, the theorem and its related reasoning are useful because they take into account alternative explanations and how likely they are given the evidence that you are seeing. This means that you can’t just take a theory to be true because it fits the evidence. You need to also look at alternative hypotheses and see if they explain the evidence better. This leads you to start thinking about all hypotheses in terms of probabilities rather than certainties. It also leads you to think about beliefs in terms of evidence. If we follow Bayes’ Theorem, then nothing is just true. Things are instead only probable because they are backed up by evidence. A corollary of this is that different evidence leads to different probabilities.

An example demonstrating how to deduce Bayes' Theorem

3 Satoshi_Nakamoto 24 July 2015 03:58AM

(This post is not an attempt to convey anything new, but is instead just an attempt to provide background context on how  Bayes' theorem works by describing how it can be deduced. This is not meant to be a formal proof. There have been other elementary posts that have covered how to use Bayes’ theorem: here, here, here and here)

 

Consider the following example

Imagine that your friend has a bowl that contains cookies in two varieties: chocolate chip and white chip macadamia nut. You think to yourself: “Yum. I would really like a chocolate chip cookie”. So you reach for one, but before you can pull one out your friend lets you know that you can only pick one, that you cannot look into the bowl and that all the cookies are either fresh or stale. Your friend also tells you that there are 80 fresh cookies, 40 chocolate chip cookies, 15 stale white chip macadamia nut cookies and 100 cookies in total. What is the probability that you will pull out a fresh chocolate chip cookie?

 

To figure this out we will create a contingency table (a two-way table of counts). If we fill in the values that we do know, we end up with the table below. The cell whose value we want to find is marked with a question mark.

 

        | Chocolate Chip | White Chip Macadamia Nut | Total
Fresh   |        ?       |                          |    80
Stale   |                |            15            |
Total   |       40       |                          |   100

 

If we look at the above table we can notice that, like in Sudoku, there are some values that we can fill in based on the information that we already know. These values are:

  • The number of stale cookies. We know that 80 cookies are fresh and that there are 100 cookies in total, so this means that there must be 20 stale cookies.
  • The number of white chip macadamia nut cookies. We know that there are 40 chocolate chip cookies and 100 cookies in total, so this means that there must be 60 white chip macadamia nut cookies.

 

If we fill in both these values we end up with the below table:

 

        | Chocolate Chip | White Chip Macadamia Nut | Total
Fresh   |                |                          |    80
Stale   |                |            15            |    20
Total   |       40       |            60            |   100

 

If we look at the table now, we can see that there are two more values that can be filled in. These values are:

  • The number of fresh white chip macadamia nut cookies. We know that there are 60 white chip macadamia nut cookies and that 15 of these are stale, so this means that there must be 45 fresh white chip macadamia nut cookies.
  • The number of stale chocolate chip cookies. We know that there are 20 stale cookies and that 15 of these are white chip macadamia nut, so this means that there must be 5 stale chocolate chip cookies.

 

If we fill in both these values we end up with the below table:

 

        | Chocolate Chip | White Chip Macadamia Nut | Total
Fresh   |                |            45            |    80
Stale   |        5       |            15            |    20
Total   |       40       |            60            |   100

 

We can now find out the number of fresh chocolate chip cookies. It is important to note that there are two ways in which we can do this; the two ways are inverses of each other (this will be used later):

  • Using the filled in row values. We know that there are 80 fresh cookies and that 45 of these are white chip macadamia nut, so this means that there must be 35 fresh chocolate chip cookies.
  • Using the filled in column values. We know that there are 40 chocolate chip cookies and that 5 of these are stale, so this means that there must be 35 fresh chocolate chip cookies.

 

 If we fill in the last value in the table we end up with the below table:

 

        | Chocolate Chip | White Chip Macadamia Nut | Total
Fresh   |       35       |            45            |    80
Stale   |        5       |            15            |    20
Total   |       40       |            60            |   100

 

We can now find out the probability of choosing a fresh chocolate chip cookie by dividing the number of fresh chocolate chip cookies (35) by the total number of cookies (100). This is 35 / 100 which is 35%. We now have the probability of choosing a fresh chocolate chip cookie (35%).

 

To get to Bayes' theorem I will need to reduce the terms to a simpler form.

  • P(A) = the probability of finding some observation A. You can think of this as the probability of the picked cookie being chocolate chip.
  • P(B) = the probability of finding some observation B. You can think of this as the probability of the picked cookie being fresh. Please note that A is what we want to find given B. If we wished, A could instead be fresh and B chocolate chip.
  • P(~A) = the probability of not finding observation A. You can think of this as the probability of the picked cookie not being chocolate chip, i.e. being white chip macadamia nut instead.
  • P(~B) = the probability of not finding observation B. You can think of this as the probability of the picked cookie not being fresh, i.e. being stale instead.
  • P(A∩B) = the probability of both A and B. You can think of this as the probability of the picked cookie being both chocolate chip and fresh.
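As a sanity check, here is a small Python sketch that rebuilds the filled-in table as counts and computes each of these probabilities from it (the variable names are my own, used only for illustration):

from fractions import Fraction

# Counts from the filled-in contingency table (variety x freshness).
counts = {("chocolate chip", "fresh"): 35,
          ("chocolate chip", "stale"): 5,
          ("white chip macadamia nut", "fresh"): 45,
          ("white chip macadamia nut", "stale"): 15}
total = sum(counts.values())  # 100

p_A = Fraction(sum(n for (variety, _), n in counts.items()
                   if variety == "chocolate chip"), total)        # P(A)   = 40/100
p_B = Fraction(sum(n for (_, freshness), n in counts.items()
                   if freshness == "fresh"), total)               # P(B)   = 80/100
p_not_A = 1 - p_A                                                 # P(~A)  = 60/100
p_not_B = 1 - p_B                                                 # P(~B)  = 20/100
p_A_and_B = Fraction(counts[("chocolate chip", "fresh")], total)  # P(A∩B) = 35/100

print(p_A, p_B, p_not_A, p_not_B, p_A_and_B)  # 2/5 4/5 3/5 1/5 7/20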

 

Now things will get a bit more complicated as we move towards the basis of Bayes’ Theorem. Let’s go through another example based on the original.

Let’s assume that before you pull out a cookie you notice that it is fresh. Can you then figure out the likelihood of it being chocolate chip before you pull it out? The answer is yes.

 

We will find this out using the table that we filled in previously. The important row is the Fresh row.

 

        | Chocolate Chip | White Chip Macadamia Nut | Total
Fresh   |       35       |            45            |    80
Stale   |        5       |            15            |    20
Total   |       40       |            60            |   100

 

Since we already know that the cookie is fresh, we can say that the likelihood of it being a chocolate chip cookie is equal to the number of fresh chocolate chip cookies (35) divided by the total number of fresh cookies (80). This is 35 / 80 which is 43.75%.

 

In a simpler form this is:

  • P(A|B) - The probability of A given B. You can think of this as the probability of the picked cookie being chocolate chip if you already know that it is fresh.

If we look at the table again, we can discover some extra important information about P(A|B): it is equal to P(A∩B) / P(B). You can think of this as follows: the probability of the picked cookie being chocolate chip if you know that it is fresh (35 / 80) is equal to the probability of the picked cookie being fresh and chocolate chip (35 / 100) divided by the probability of it being fresh (80 / 100). This is P(A|B) = (35 / 100) / (80 / 100), which becomes 0.35 / 0.8, which is the same as the answer we found above (43.75%). Take note of the fact that P(A|B) = P(A∩B) / P(B), as we will use this later.
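Continuing the counting approach, this identity can be checked in a couple of lines of Python (a sketch only):

from fractions import Fraction

fresh_chocolate_chip = 35
fresh_total = 80
total = 100

p_A_given_B = Fraction(fresh_chocolate_chip, fresh_total)  # 35/80
p_A_and_B = Fraction(fresh_chocolate_chip, total)          # 35/100
p_B = Fraction(fresh_total, total)                         # 80/100

print(p_A_given_B == p_A_and_B / p_B)  # True
print(float(p_A_given_B))              # 0.4375, i.e. 43.75%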

 

Let’s now return to the inverse idea that was raised previously. If we want to know the probability of the picked cookie being fresh and chocolate chip, i.e. P(A∩B), then we can use either the Fresh row or the Chocolate Chip column of the filled-in contingency table.

 

        | Chocolate Chip | White Chip Macadamia Nut | Total
Fresh   |       35       |            45            |    80
Stale   |        5       |            15            |    20
Total   |       40       |            60            |   100

If we use the Fresh row of the table above (i.e. we know that the cookie is fresh), then we can find out that P(A∩B) = P(A|B) * P(B). This means that the probability of the picked cookie being fresh and chocolate chip (35 / 100, remembering that there were 100 cookies in total) is equal to the probability of it being chocolate chip given that you know that it is fresh (35 / 80) times the probability of it being fresh (80 / 100). So, we end up with P(A∩B) = (35 / 80) * (80 / 100), which becomes 35%, which is the same as 35 / 100, which we know is the right answer.

 

Alternatively, since we know that we can convert P(A|B) to P(A∩B) / P(B) (we found this out previously), we can check that P(A∩B) = P(A|B) * P(B) is consistent with it by using the following method:

  1. Assume P(A∩B) = P(A|B) * P(B).
  2. Convert P(A|B) to P(A∩B) / P(B), so we get P(A∩B) = (P(A∩B) * P(B)) / P(B).
  3. Notice that P(B) is on both the top and bottom of the right-hand side, which means that it can be crossed out.
  4. Cross out P(B) to give P(A∩B) = P(A∩B), which is trivially true.

 

The inverse situation is when you know that the cookie is chocolate chip, as in the Chocolate Chip column of the table above. Using that column we can find out that P(A∩B) = P(B|A) * P(A). This means that the probability of the picked cookie being fresh and chocolate chip (35 / 100) is equal to the probability of it being fresh given that you know it is chocolate chip (35 / 40) times the probability of it being chocolate chip (40 / 100). This is P(A∩B) = (35 / 40) * (40 / 100), which becomes 35%, which we know is the right answer.
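Both routes can be checked numerically with a short sketch (again using the table's counts):

from fractions import Fraction

p_A = Fraction(40, 100)          # P(A): chocolate chip
p_B = Fraction(80, 100)          # P(B): fresh
p_A_given_B = Fraction(35, 80)   # P(A|B): chocolate chip given fresh
p_B_given_A = Fraction(35, 40)   # P(B|A): fresh given chocolate chip

print(p_A_given_B * p_B)  # 7/20, i.e. 35/100
print(p_B_given_A * p_A)  # 7/20 again: both routes give the same P(A∩B)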

 

Now, we have enough information to deduce the simple form of Bayes’ Theorem.

Let’s first recount what we know:

  1. P(A|B) = P(A∩B) / P(B)
  2. P(A∩B) = P(B|A) * P(A)

By taking the first fact, P(A|B) = P(A∩B) / P(B), and using the second fact to convert P(A∩B) to P(B|A) * P(A), you end up with P(A|B) = (P(B|A) * P(A)) / P(B), which is Bayes' Theorem in its simple form.
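Plugging the cookie numbers into the simple form gives the same 43.75% as before (a quick sketch):

from fractions import Fraction

p_A = Fraction(40, 100)          # P(A): chocolate chip
p_B = Fraction(80, 100)          # P(B): fresh
p_B_given_A = Fraction(35, 40)   # P(B|A): fresh given chocolate chip

p_A_given_B = (p_B_given_A * p_A) / p_B
print(p_A_given_B, float(p_A_given_B))  # 7/16 0.4375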

 

From the simple form of Bayes' Theorem, there is one more conversion that we need to make to derive the explicit form, which is the one we are trying to deduce.

 

To get to the explicit form version we need to first find out that P(B) = P(A) * P(B|A) + P(~A) * P(B|~A).

To do this let’s refer to the table again:

 

        | Chocolate Chip | White Chip Macadamia Nut | Total
Fresh   |       35       |            45            |    80
Stale   |        5       |            15            |    20
Total   |       40       |            60            |   100

 

We can see that the probability that the picked cookie is fresh (80 / 100) is equal to the probability that it is fresh and chocolate chip (35 / 100) plus the probability that it is fresh and white chip macadamia nut (45 / 100). So, we can find out that P(B) (the cookie is fresh) is equal to 35 / 100 + 45 / 100, which is 0.8 or 80%, which we know is the answer. This gives the formula: P(B) = P(A∩B) + P(~A∩B).

 

We know that P(A∩B) = P(B|A) * P(A), as we found this out earlier. Similarly, we can find out that P(~A∩B) = P(~A) * P(B|~A). This means that the probability of the picked cookie being fresh and white chip macadamia nut (45 / 100) is equal to the probability of it being white chip macadamia nut (60 / 100) times the probability of it being fresh given that you know that it is white chip macadamia nut (45 / 60). This is (60 / 100) * (45 / 60), which is 45%, which we know is the answer.
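Putting these pieces together, the decomposition of P(B) can also be checked with the cookie numbers (a sketch):

from fractions import Fraction

p_A = Fraction(40, 100)              # P(A): chocolate chip
p_not_A = Fraction(60, 100)          # P(~A): white chip macadamia nut
p_B_given_A = Fraction(35, 40)       # P(B|A): fresh given chocolate chip
p_B_given_not_A = Fraction(45, 60)   # P(B|~A): fresh given white chip macadamia nut

p_B = p_A * p_B_given_A + p_not_A * p_B_given_not_A
print(p_B)  # 4/5, i.e. 80/100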

 

Using this information, we can now get to the explicit form of Bayes' Theorem:

  1. We know the simple form of Bayes' Theorem: P(A|B) = (P(B|A) * P(A)) / P(B)
  2. We can convert P(B) to P(A∩B) + P(~A∩B) to get P(A|B) = (P(B|A) * P(A)) / (P(A∩B) + P(~A∩B))
  3. We can convert P(A∩B) to P(A) * P(B|A) to get P(A|B) = (P(B|A) * P(A)) / (P(A) * P(B|A) + P(~A∩B))
  4. We can convert P(~A∩B) to P(~A) * P(B|~A) to get P(A|B) = (P(B|A) * P(A)) / (P(A) * P(B|A) + P(~A) * P(B|~A))

Congratulations, we have now reached the explicit form of Bayes' Theorem:
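P(A|B) = (P(B|A) * P(A)) / (P(A) * P(B|A) + P(~A) * P(B|~A))

With the cookie numbers this is (0.875 * 0.4) / (0.4 * 0.875 + 0.6 * 0.75) = 0.35 / 0.8 = 43.75%, the same answer we got directly from the table.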


Comment author: Jiro 16 July 2015 03:57:14PM *  6 points [-]

Remember reason as memetic immune disorder.

"Be reasonable" is often a type of cultural immunity to crazy ideas. And someone who wants to "be rational" instead of "being reasonable" may actually be in a position where his "rationality" is bypassing this cultural immunity.

It could be that you're just rational enough to understand that "be reasonable" isn't a rational argument, but you're not rational enough to explicitly figure out what's wrong with this specific idea now that you've bypassed the general immunity to bad ideas.

Comment author: Satoshi_Nakamoto 17 July 2015 03:30:38AM 1 point [-]

Good point. Would you say that this is the problem: when you are rational, you deem your conclusions more valuable than those of non-rational people. This can end up being a problem, as you are less likely to update your beliefs when they are opposed. This adds the risk that if you adopt one false belief and then rationally deduce a plethora of others from it, you will be less likely to update any erroneous conclusions.

I think that the predicament highlights the fact that going against what is reasonable is not something that you should do lightly. Maybe, I should make this more explicit.

If you are going against the crowd, then there is a good chance that you have made a mistake somewhere in your reasoning and that your conclusion is crazy or does not work. Reasonable things are not normally like this because they need to be circulated and disseminated. If they were crazy or didn't work, then this could not happen. But, this doesn't mean that they are optimal or that they are right.

If you are going against what is reasonable, then this is a serious reason to doubt your beliefs. It is not, in and of itself, a reason to believe that something is untrue or irrational.

What do you think is the best way to overcome this problem? This is from the post:

How can you tell when you have removed one set of blind spots from your reasoning without removing its counterbalances? One heuristic to counter this loss of immunity, is to be very careful when you find yourself deviating from everyone around you. I deviate from those around me all the time, so I admit I haven't found this heuristic to be very helpful. Another heuristic is to listen to your feelings. If your conclusions seem repulsive to you, you may have stripped yourself of cognitive immunity to something dangerous.

I would add that it is a good idea to try and explain your beliefs to other people, preferably someone you believe is rational and the more people the better. Try to seriously doubt your beliefs and to see them anew. If other people reach the same conclusion, then you can become more sure of your beliefs.

Comment author: welp 16 July 2015 10:16:04PM 2 points [-]

It strikes me as strange to designate this as "rational" rather than say, "moral", and then use this as the example of the difference between "rational" and "reasonable". If this is considered rational simply because it's a direct, one-step application of your moral values, then the real difference here lies between your terminal values and the terminal values of the general population; both you and the general population are acting rationally. There are surely better examples to use, where your terminal values coincide with society, and your actions optimize them while societal norms do not. Charity for instance.

Comment author: Satoshi_Nakamoto 17 July 2015 02:54:05AM *  1 point [-]

I agree that this is probably not the best example. The scrub one is better.

I think that "moral" is similar to "reasonable" in that it is based on intuition rather than argument and rationality. People have seen slavery as being "moral" in the past. Some of the reasons for this are false beliefs, like the belief that it's natural that some people are slaves, that slaves are inferior beings and that slavery is good for slaves.

I guess I was thinking about it from two points of view:

  • Is it rational to have the moral belief that there should be slaves? A rational person would look at all the supporting beliefs and see if they are themselves rational. For example, are slaves inferior beings? The answer, as we know, is no. In terms of the mass slavery of large portions of people, this has often been based on some characteristic, like high levels of melanin for the slaves in America. These characteristics don't make people inferior and they sure don't make people inhuman.
  • With the system set up the way it was, was the alternative to slavery inferior? I am not an expert on this, but I was thinking that the alternative was not inferior. Perhaps it would have been slower in terms of growth, but America still could have thrived as a nation if the South had abolished slavery without war.

Comment author: Unknowns 16 July 2015 04:17:48AM 1 point [-]

If "being rational" means choosing the best option, you never have to choose between "being reasonable" and "being rational," because you should always choose the best option. And sometimes the best option is influenced by what other people think of what you are doing; sometimes it's not.

Comment author: Satoshi_Nakamoto 16 July 2015 12:35:23PM *  0 points [-]

I agree that rationality and reasonableness can be similar, but they can also be different. See this post for what I mean by rationality. The idea of it being choosing the best option is too vague.

Some factors that may lead to what others think is reasonable being different from what is the most rational are: the continued use of old paradigms that are known to be faulty, pushing your views as being what is reasonable as a method of control and status quo bias.

Here are two more examples of the predicament:

  • Imagine that you are in a family that is heavily religious and you decide that you are an atheist. If you tell anyone in your family, you are likely to get chastised for this, making it an example of the just-be-reasonable predicament.
  • Imagine that you are a jury member and you are the cause of a hung jury. They tell you: “the guy obviously did it. He is a bad man anyway. How much evidence do you need? Just be reasonable about this so that we can go home”. Now, you may actually be being irrationally underconfident, or perhaps you are not. The post was about what you should do in this situation. I consider it a predicament because people find it hard to do what they think is the right thing when they are uncertain and when it will cause them social disapproval.

Also, I have updated the below:

The just-be-reasonable predicament occurs when in order to be seen as being reasonable you must do something irrational or non-optimal.

To this, in order to express what I meant more clearly:

The just-be-reasonable predicament occurs when you are chastised for doing something that you believe to be more rational and/or optimal than the norm or what is expected or desired. The chastiser has either not considered, cannot fathom, or does not care that what you are doing or want to do might be more rational and/or optimal than the default course of action. The predicament is similar to the one described in lonely dissent, in that you must choose between taking what you believe to be the most rational and/or optimal course of action and taking the one that will be met with the least amount of social disapproval.

Comment author: VoiceOfRa 16 July 2015 06:40:41AM 3 points [-]

You missed one important case: sometimes the right solution is to continue being rational and not care what the "reasonable" person thinks of you. In particular just because you're rational doesn't mean you'll be able to change everyone's mind.

Comment author: Satoshi_Nakamoto 16 July 2015 12:18:52PM *  1 point [-]

I don't think I was very clear. I meant for this case to be covered under "avoid the issue". As by avoiding the issue you just continue whatever course of action or behaviour you were previously undertaking. I have edited the post to make this a bit clearer.

I thought about this later and think you were right. I have updated the process in the picture.

Comment author: Gunnar_Zarncke 16 July 2015 07:54:03AM 1 point [-]

The scrubs you mention seem comparable to bruce.

Comment author: Satoshi_Nakamoto 16 July 2015 12:15:12PM *  1 point [-]

Yes. They seem pretty close to me, though I think it is a bit different. I think the Bruce article was trying to convey the idea that Bruce was a kind of gaming masochist. That is, he wanted to lose.

An example quote is:

If he would hit a lucky streak and pile up some winnings he would continue to play until the odds kicked in as he knew they always would thus he was able to jump into the pit of despair and self-loathing head first. Because he needed to. And Bruce is just like that.

The difference, as I see it, is that Bruce loses through self-sabotage because of unresolved issues in his psyche, while the scrub loses through self-sabotage because they are too pedantic.

Comment author: Elo 16 July 2015 08:18:57AM *  3 points [-]

“why can’t you just be a good [insert relevant religion here] boy or girl”

could probably be improved to read:

"Why can't you just be a good conformist"

or

"Why can't you just conform to my belief of what is the best course of action for you here"

Other than that, I like it. I believe this does a good job of explaining a process that probably comes naturally to a lot of people, which makes it hard to describe. It would come in handy for conditions such as Asperger's, where the natural socially conditioned process is not always automatic.

Comment author: Satoshi_Nakamoto 16 July 2015 12:14:04PM 0 points [-]

Good idea. I replaced it with "Why can't you just conform to my belief of what is the best course of action for you here". Thanks.
