Conditional probability: Refresher

Written by So8res, et al. last updated 11th Oct 2016

$P(\text{left} \mid \text{right})$ is the probability that the thing on the left is true assuming the thing on the right is true, and it's defined as $\frac{P(\text{left} \land \text{right})}{P(\text{right})}.$

Thus, $P(\text{yellow} \mid \text{banana})$ is the probability that a banana is yellow ("the probability of yellowness given banana"), while $P(\text{banana} \mid \text{yellow})$ is the probability that a yellow object is a banana ("the probability of banana, given yellowness").^[1]

$P(x \land y)$ is used to denote the probability of both $x$ and $y$ being true simultaneously (according to some probability distribution $P$). $P(x \mid y)$, pronounced "the conditional probability of x, given y", is defined to be the quantity

$\frac{P(x \land y)}{P(y)}.$

For example, in the Diseasitis problem, $P(\text{sick} \mid \text{positive})$ is the probability that a patient is sick given a positive test result. It's calculated by taking the 18% of patients who are sick and have positive test results, and dividing by the 42% of patients who got positive test results. That is, $P(\text{sick} \mid \text{positive}) = \frac{P(\text{sick} \land \text{positive})}{P(\text{positive})} = \frac{0.18}{0.42} \approx 0.43.$
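The same calculation can be sketched in Python. This is a minimal sketch rather than a definitive implementation: only the 18% sick-and-positive figure and the 42% positive total come from the text above. The other two cells of the joint table are assumptions inferred from the 20 sick patients and 24 healthy positives mentioned elsewhere on this page, and the names `joint`, `probability`, and `conditional` are illustrative.

```python
from fractions import Fraction

# Joint distribution over (health, test result), as fractions of the population.
# Only the 18% cell and the 42% positive total are stated in the text; the
# remaining cells are assumptions that fill out the table so it sums to 1.
joint = {
    ("sick", "positive"): Fraction(18, 100),
    ("sick", "negative"): Fraction(2, 100),
    ("healthy", "positive"): Fraction(24, 100),
    ("healthy", "negative"): Fraction(56, 100),
}


def probability(joint, event):
    """Total probability of all outcomes for which event(outcome) is true."""
    return sum(p for outcome, p in joint.items() if event(outcome))


def conditional(joint, x, y):
    """P(x | y) = P(x and y) / P(y); undefined when P(y) is zero."""
    p_y = probability(joint, y)
    if p_y == 0:
        raise ValueError("P(y) is zero, so P(x | y) is undefined")
    return probability(joint, lambda o: x(o) and y(o)) / p_y


def is_sick(outcome):
    return outcome[0] == "sick"


def is_positive(outcome):
    return outcome[1] == "positive"


p_sick_given_positive = conditional(joint, is_sick, is_positive)
print(p_sick_given_positive)         # 3/7
print(float(p_sick_given_positive))  # roughly 0.43
```

Using `Fraction` keeps the arithmetic exact, so the result is 3/7 rather than a rounded float.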

Using a frequency diagram, we can visualize $P(\text{sick} \mid \text{positive})$ as the probability of drawing a $\text{sick}$ patient from a bag containing only those people in the population who got a $\text{positive}$ result.

[Figure: Diseasitis frequency diagram]

[Figure: bag of 18 and 24 patients]

The "given" operator in $P(x \mid y)$ tells us to assume that $y$ is true, to restrict our attention to only those possible cases where $y$ is true, and then to ask about the probability of $x$ within those cases.
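The "restrict our attention" reading can be made concrete by building the bag explicitly. The sketch below assumes a population of 100 patients with the counts implied by the Diseasitis figures on this page: 18 sick and 24 healthy patients with positive results, plus 2 sick and 56 healthy patients with negative results, which are inferred rather than stated. The tuple layout and variable names are illustrative.

```python
from fractions import Fraction

# A hypothetical population of 100 patients, one (health, result) tuple each.
population = (
    [("sick", "positive")] * 18
    + [("sick", "negative")] * 2
    + [("healthy", "positive")] * 24
    + [("healthy", "negative")] * 56
)

# "Given positive": keep only the cases where the condition y is true.
bag = [patient for patient in population if patient[1] == "positive"]

# Within that restricted bag, count how often x (being sick) holds.
sick_in_bag = sum(1 for patient in bag if patient[0] == "sick")

print(len(bag))                         # 42
print(Fraction(sick_in_bag, len(bag)))  # 3/7
```

Counting within the filtered list gives the same answer as dividing $P(x \land y)$ by $P(y)$, because both counts share the same population size, which cancels.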

Note that $P(\text{positive} \mid \text{sick})$ is not the same as $P(\text{sick} \mid \text{positive}).$ To find the probability that a patient has a positive result given that they're sick, we can visualize taking the 20 sick patients and putting them in a group, and then asking the probability that a randomly selected one will have a positive result, which will be $\frac{18}{20} = 0.9.$ So $P(\text{positive} \mid \text{sick}) = 90\%,$ while $P(\text{sick} \mid \text{positive}) \approx 43\%.$

Mixing up which one is which is an unfortunate source of many practical errors when you're trying to do these calculations using only the formal notation, at least until you get used to it. Just remember that $P(\text{left} \mid \text{right})$ is the probability of the thing on the left given that the thing on the right is true.
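As a quick check of the two directions, here is a minimal sketch using patient counts. It assumes a population of 100 patients, so that the 18% and 42% figures correspond to 18 and 42 patients; the variable names are illustrative.

```python
from fractions import Fraction

# Patient counts out of an assumed population of 100.
sick_and_positive = 18
sick_total = 20
positive_total = 42

# Same numerator, different denominators: the group we condition on changes.
p_positive_given_sick = Fraction(sick_and_positive, sick_total)
p_sick_given_positive = Fraction(sick_and_positive, positive_total)

print(f"P(positive | sick) = {float(p_positive_given_sick):.0%}")  # 90%
print(f"P(sick | positive) = {float(p_sick_given_positive):.0%}")  # 43%
```

Both quantities share the numerator of 18 patients who are sick and positive. Only the denominator changes, from the 20 sick patients to the 42 positive ones, which is why swapping left and right changes the answer.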

^[1] In general, $P(v)$ is an abbreviation of $P(V = v)$ for some variable $V$, which is assumed to be known from the context. For example, $P(\text{yellow})$ might stand for $P(\text{ColorOfNextObjectInBag} = \text{yellow})$, where $\text{ColorOfNextObjectInBag}$ is a variable in our probability distribution $P$, and $\text{yellow}$ is one of the values that that variable can take on.
