Likelihood function

Let's say you have a piece of evidence $e$ and a set of hypotheses $H$. Each $H_{i} \in H$ assigns some likelihood to $e$. The function $L_{e}$ that reports this likelihood $L_{e} (H_{i})$ for each $H_{i} \in H$ is known as a "likelihood function."

For example, let's say that the evidence is $e_{c}$ = "Mr. Boddy was killed with a candlestick," and the hypotheses are $H_{S}$ = "Miss Scarlett did it," $H_{M}$ = "Colonel Mustard did it," and $H_{P}$ = "Mrs. Peacock did it." Suppose further that if Miss Scarlett was the murderer, she's 20% likely to have used a candlestick. By contrast, if Colonel Mustard did it, he's 10% likely to have used a candlestick, and if Mrs. Peacock did it, she's only 1% likely to have used a candlestick. In this case, the likelihood function is

$L_{e_{c}} (h) = \begin{cases} 0.2 & \text{if } h = H_{S} \\ 0.1 & \text{if } h = H_{M} \\ 0.01 & \text{if } h = H_{P} \end{cases}$
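As a rough sketch, a discrete likelihood function like this one can be written as a simple lookup table. The dictionary keys and the function name `likelihood_candlestick` are illustrative labels chosen here, not notation from the text above.

```python
# Likelihood that each hypothesis assigns to the evidence
# e_c = "Mr. Boddy was killed with a candlestick".
# The string keys are hypothetical labels for H_S, H_M, and H_P.
CANDLESTICK_LIKELIHOODS = {
    "H_S": 0.2,   # Miss Scarlett did it
    "H_M": 0.1,   # Colonel Mustard did it
    "H_P": 0.01,  # Mrs. Peacock did it
}


def likelihood_candlestick(hypothesis):
    """Return L_{e_c}(h): how likely the candlestick evidence is under h."""
    return CANDLESTICK_LIKELIHOODS[hypothesis]


for h in CANDLESTICK_LIKELIHOODS:
    print(h, likelihood_candlestick(h))
```

The function takes a hypothesis as its argument and holds the evidence fixed, which is the defining feature of a likelihood function.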

For an example using a continuous function, consider a possibly-biased coin whose bias $b$ to come up heads on any particular coinflip might be anywhere between $0$ and $1$. Suppose we observe the coin come up heads, tails, and tails. We will denote this evidence $e_{H T T}$. The likelihood function over the hypotheses $H_{b}$ = "the coin is biased to come up heads a fraction $b$ of the time" for $b \in [0, 1]$ is:

$L_{e_{H T T}} (H_{b}) = b \cdot (1 - b) \cdot (1 - b) .$
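The continuous case can be sketched the same way, with the hypothesis indexed by the bias $b$. The function name `likelihood_htt` and the sample values of $b$ are assumptions made for illustration only.

```python
def likelihood_htt(b):
    """Return L_{e_HTT}(H_b) = b * (1 - b) * (1 - b) for a bias b in [0, 1]."""
    if not 0.0 <= b <= 1.0:
        raise ValueError("bias b must lie in [0, 1]")
    # One head (probability b) followed by two tails (probability 1 - b each).
    return b * (1.0 - b) * (1.0 - b)


# Evaluate the likelihood at a few illustrative biases.
for b in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"b = {b:.2f}  likelihood = {likelihood_htt(b):.4f}")
```

A coin that always lands heads ($b = 1$) or always lands tails ($b = 0$) assigns zero likelihood to this evidence, since the observation contains both outcomes.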

There's no reason to normalize likelihood functions so that they sum to 1. They aren't probability distributions; they're functions expressing each hypothesis's propensity to yield the observed evidence. For example, if the evidence is really obvious ($e_{s}$ = "the sun rose this morning"), it might be the case that almost all hypotheses assign it a very high likelihood, in which case the sum of the likelihood function will be much more than 1.
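A quick numerical check makes the point concrete. The candlestick and coin likelihoods below come from the examples above; the sunrise likelihoods of 0.99 are made-up values for illustration, and the midpoint-rule integrator is just one simple way to approximate the area under the coin's likelihood curve.

```python
# Candlestick likelihoods from the example above.
candlestick = {"H_S": 0.2, "H_M": 0.1, "H_P": 0.01}
candlestick_total = sum(candlestick.values())  # well below 1

# Hypothetical likelihoods for e_s = "the sun rose this morning":
# every suspect hypothesis predicts this evidence almost perfectly.
sunrise = {"H_S": 0.99, "H_M": 0.99, "H_P": 0.99}
sunrise_total = sum(sunrise.values())  # well above 1


def likelihood_htt(b):
    """Likelihood of heads, tails, tails for a coin with heads-bias b."""
    return b * (1.0 - b) * (1.0 - b)


def midpoint_integral(f, lo, hi, steps=10_000):
    """Approximate the integral of f over [lo, hi] with the midpoint rule."""
    width = (hi - lo) / steps
    return sum(f(lo + (i + 0.5) * width) for i in range(steps)) * width


# The exact area under b * (1 - b)^2 on [0, 1] is 1/12, not 1.
htt_area = midpoint_integral(likelihood_htt, 0.0, 1.0)

print(candlestick_total, sunrise_total, htt_area)
```

None of the three totals equals 1, and nothing in the definition of a likelihood function requires that they should.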

Likelihood functions carry absolute likelihood information, so they contain information that relative likelihoods do not. In particular, absolute likelihoods can be used to check a hypothesis for strict confusion.