Ethan (EJ) Watkins

Comments

By the law of large numbers, $\frac{1}{n}\sum_{i=1}^{n} \log q_\theta(x_i) \to \mathbb{E}_{x \sim p}[\log q_\theta(x)]$ almost surely. This is the cross entropy of $p$ and $q_\theta$. Also note that if we subtract this from the entropy of $p$, we get $D_{\mathrm{KL}}(p \,\|\, q_\theta)$. So minimising the cross entropy over $\theta$ is equivalent to maximising $D_{\mathrm{KL}}(p \,\|\, q_\theta)$.

I think the cross entropy of $p$ and $q_\theta$ is actually $H(p, q_\theta) = -\mathbb{E}_{x \sim p}[\log q_\theta(x)]$ (note the negative sign), and the entropy of $p$ is $H(p) = -\mathbb{E}_{x \sim p}[\log p(x)]$. The KL divergence is then actually the cross entropy minus the entropy, not the other way around. So minimising the cross entropy over $\theta$ will minimise (not maximise) the KL divergence.
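
Written out with the standard definitions (using $p$ for the data distribution and $q_\theta$ for the model, as above), the decomposition is:

$$
D_{\mathrm{KL}}(p \,\|\, q_\theta)
= \mathbb{E}_{x \sim p}\!\left[\log \frac{p(x)}{q_\theta(x)}\right]
= \underbrace{-\mathbb{E}_{x \sim p}[\log q_\theta(x)]}_{H(p,\, q_\theta)} \;-\; \underbrace{\big(-\mathbb{E}_{x \sim p}[\log p(x)]\big)}_{H(p)}
= H(p, q_\theta) - H(p).
$$

Since $H(p)$ does not depend on $\theta$, minimising $H(p, q_\theta)$ over $\theta$ is the same as minimising $D_{\mathrm{KL}}(p \,\|\, q_\theta)$.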

I believe the next paragraph is still correct: the maximum likelihood estimator $\hat{\theta}$ is the parameter which maximises the log-likelihood $\frac{1}{n}\sum_{i=1}^{n} \log q_\theta(x_i)$, which minimises the cross-entropy, which minimises the KL divergence.
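
As a quick numerical sanity check of that chain of equivalences, here is a minimal sketch in Python (the particular categorical data distribution, the softmax model family standing in for $q_\theta$, and all names in it are illustrative assumptions, not anything from the original post):

```python
import numpy as np

rng = np.random.default_rng(0)

# True data distribution p over 3 outcomes, and i.i.d. samples from it.
p = np.array([0.2, 0.5, 0.3])
samples = rng.choice(3, size=10_000, p=p)

# A one-parameter model family q_theta over the same 3 outcomes
# (purely illustrative; any smooth family would do).
def q(theta):
    logits = np.array([0.0, theta, theta / 2])
    e = np.exp(logits - logits.max())
    return e / e.sum()

def avg_log_likelihood(theta):
    # Sample average of log q_theta(x_i).
    return np.log(q(theta)[samples]).mean()

def cross_entropy(theta):
    # H(p, q_theta) = -E_{x~p}[log q_theta(x)]
    return -(p * np.log(q(theta))).sum()

def kl(theta):
    # D_KL(p || q_theta) = H(p, q_theta) - H(p)
    return cross_entropy(theta) - (-(p * np.log(p)).sum())

# The maximiser of the sample log-likelihood and the minimisers of the
# cross entropy / KL divergence should (approximately) coincide.
thetas = np.linspace(-3, 3, 601)
print(thetas[np.argmax([avg_log_likelihood(t) for t in thetas])])
print(thetas[np.argmin([cross_entropy(t) for t in thetas])])
print(thetas[np.argmin([kl(t) for t in thetas])])
```

The cross-entropy and KL minimisers agree exactly (they differ by the constant $H(p)$), and the log-likelihood maximiser agrees up to sampling noise.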

Apologies if any of what I've said above is incorrect; I'm not an expert on this.

I think there is a mistake in this equation.  and  are the wrong way round. It should be: