From what I understand, the Kolmogorov axioms make no mention of conditional probability. That is simply defined. If I really want to show how probability actually works, I'm not going to argue "by definition". Does anyone know a modified form that uses simpler axioms than P(A|B) = P(A∩B)/P(B)?
I agree with the OP: simply defining a probability concept doesn't by itself map it to our intuitions about it. For example, if we defined P(A|B) = P(AB) / (2P(B)), writing AB for A∩B, it wouldn't correspond to our intuitions, and here's why.
Intuitively, P(A|B) is the probability of A happening if we know that B already happened. In other words, the only elementary outcomes we now take into consideration are those that correspond to B. Of those remaining elementary outcomes, the only ones that can lead to A are those that lie in AB. Their measure in absolute terms is equal to P(AB); however, their measure relative to the elementary outcomes in B is equal to P(AB)/P(B).
Thus, P(A|B) is P(A) as it would be if the only elementary outcomes in existence were those yielding B. P(B) here is a normalizing coefficient: dividing by it rescales the reduced outcome space so that B itself has probability 1. This is the same renormalization done in Bayesian reasoning, where the conditional probability of A is evaluated against a set of exhaustive and mutually exclusive experimental outcomes once B is fixed.
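To make the "shrink the outcome space to B" picture concrete, here is a minimal sketch that checks it by brute-force counting. The two-dice setup, the events A and B, and the helper names (OMEGA, prob, is_a, is_b) are my own illustrative choices, not anything from the posts above. It computes P(A|B) once by the ratio formula and once by literally throwing away every outcome outside B, and it also evaluates the variant with 2P(B) in the denominator at A = B.

```python
from fractions import Fraction
from itertools import product

# Hypothetical sample space: ordered rolls of two fair dice, all equally likely.
OMEGA = list(product(range(1, 7), repeat=2))

def prob(event, space=OMEGA):
    """Probability of `event` (a predicate on outcomes) under the uniform measure on `space`."""
    favourable = sum(1 for w in space if event(w))
    return Fraction(favourable, len(space))

def is_a(w):
    # A: the two dice sum to 8
    return w[0] + w[1] == 8

def is_b(w):
    # B: the first die is even
    return w[0] % 2 == 0

# Route 1: the ratio formula P(AB) / P(B).
by_formula = prob(lambda w: is_a(w) and is_b(w)) / prob(is_b)

# Route 2: discard every outcome outside B, then compute P(A) in the reduced space.
b_only = [w for w in OMEGA if is_b(w)]
by_restriction = prob(is_a, space=b_only)

# The variant with 2P(B) in the denominator, evaluated at A = B.
halved_b_given_b = prob(is_b) / (2 * prob(is_b))

print(by_formula, by_restriction, halved_b_given_b)
```

Both routes give the same number, which is the renormalization argument in miniature. The halved variant assigns B only probability 1/2 given that B happened, which is one way to see why it clashes with intuition.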
Basically, P(A|B) = 0 when A and B are disjoint, and P(A|C)/P(B|C) = P(A)/P(B) when A and B are subsets of C?
It's better, but it's still not that good. I have a sneaking suspicion that that's the best I can do, though.
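For what it's worth, those two properties do seem to pin down the usual formula, provided one also assumes that P(·|C) is itself a finitely additive probability measure with P(C|C) = 1, and that P(C) > 0. Those extra assumptions are additions of mine, not something stated above. A sketch of the argument for an arbitrary event A:

```latex
\begin{align*}
P(A \mid C)
  &= P(A \cap C \mid C) + P(A \setminus C \mid C)
     && \text{(additivity of } P(\cdot \mid C)\text{)} \\
  &= P(A \cap C \mid C)
     && \text{(} A \setminus C \text{ and } C \text{ are disjoint)} \\
  &= P(C \mid C)\,\frac{P(A \cap C)}{P(C)}
     && \text{(ratio property applied to } A \cap C \subseteq C \text{ and } C \subseteq C\text{)} \\
  &= \frac{P(A \cap C)}{P(C)}
     && \text{(} P(C \mid C) = 1\text{)}
\end{align*}
```

So under those assumptions the ratio formula is forced rather than merely stipulated, though whether the assumptions count as "simpler axioms" is a judgment call.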