My knowledge of probability theory is based mostly on reading E.T. Jaynes’ Probability Theory book, Andrew Gelman’s blog, and various LessWrong posts. I now want to get a strong grasp of the central limit theorem(s), but YouTube videos and googled pages speak so much in the language of sampling from a population and random variables that it’s hard to be sure what they’re saying, given that my background doesn’t really include those ideas. I’m especially interested in the different kinds of CLTs and related results, like the Lyapunov CLT, the Berry-Esseen theorem, and so on. I often have a tough time diving right into algebra - something like http://personal.psu.edu/drh20/asymp/fall2002/lectures/ln04.pdf gives me terrible trouble. Given all these constraints, does anyone know of good resources from which I can gain a strong grasp of the CLTs?
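For concreteness, these are the statements I keep bumping into when I google (transcribed from what I’ve found, so they may well be slightly off):

```latex
% Classical (Lindeberg--Levy) CLT: X_1, X_2, \dots i.i.d. with mean \mu and variance \sigma^2 < \infty
\frac{\sqrt{n}\,(\bar{X}_n - \mu)}{\sigma} \;\xrightarrow{d}\; \mathcal{N}(0,1)

% Berry--Esseen: a rate for that convergence, assuming \rho = E|X_1 - \mu|^3 < \infty
% (C is a universal constant)
\sup_x \left| P\!\left(\frac{\sqrt{n}\,(\bar{X}_n - \mu)}{\sigma} \le x\right) - \Phi(x) \right|
  \;\le\; \frac{C\,\rho}{\sigma^3 \sqrt{n}}

% Lyapunov CLT: independent but not identically distributed X_i with means \mu_i,
% variances \sigma_i^2, and s_n^2 = \sum_{i=1}^n \sigma_i^2; if for some \delta > 0
% \lim_{n\to\infty} \frac{1}{s_n^{2+\delta}} \sum_{i=1}^n E|X_i - \mu_i|^{2+\delta} = 0, then
\frac{1}{s_n} \sum_{i=1}^n (X_i - \mu_i) \;\xrightarrow{d}\; \mathcal{N}(0,1)
```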
Some things I am confused about after googling so far:
Do distributions converge to gaussians, or do means converge to the mean of a gaussian? Is the former a more difficult convergence to achieve, or are they actually the very same condition?
Is the CLT even about means? Does it say anything about the variance or skewness of the resulting distribution?
Is it actually necessary to be sampling from a population, or does the CLT apply to taking the means of arbitrary distributions, regardless of where they were obtained?
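To make these questions concrete, here is a quick simulation sketch of the picture I currently have in my head (assuming numpy/scipy are fine to use; the exponential starting distribution and the sample size are arbitrary choices) - please tell me if this is the wrong picture:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

n = 50           # observations per sample mean
trials = 10_000  # number of sample means to collect

# Start from something clearly non-Gaussian: an exponential (skewed, mean 1, variance 1).
samples = rng.exponential(scale=1.0, size=(trials, n))

# The CLT, as I currently understand it, is about the distribution of the
# standardized sample mean: sqrt(n) * (mean - mu) / sigma should look more and
# more like N(0, 1) as n grows, even though the exponential itself never does.
mu, sigma = 1.0, 1.0
z = np.sqrt(n) * (samples.mean(axis=1) - mu) / sigma

# Compare the empirical distribution of z against a standard normal.
ks = stats.kstest(z, "norm")
print(f"skewness of the raw exponential draws: {stats.skew(samples.ravel()):.3f}")
print(f"skewness of the standardized means:    {stats.skew(z):.3f}")
print(f"KS distance from N(0,1):               {ks.statistic:.4f}")
```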
Any form of media is OK for recommendations - no preference. Please feel free to suggest things even if you’re not sure they’re what I’m looking for - you are probably better than Google!
If you don't have a given joint probability space, you implicitly construct one (for example, by saying the RVs are independent, you implicitly construct a product space). Generally, the fact that you sometimes talk about X living on one space (on its own) and other times on another (jointly with some Y) doesn't really matter, because in most situations probability theory is specifically about the properties of random variables that are independent of the underlying spaces (although sometimes it does matter).
In your example, by definition, P = Prob(X = 6ft AND Y = raining) = mu{t: X(t) = 6ft and Y(t) = raining}. You have to assume a joint probability space for them. For example, maybe they are independent, and then P = Prob(X = 6ft) \* Prob(Y = raining); or maybe Y = (if X = 6ft then raining else not raining), and then P = Prob(X = 6ft).
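To make that concrete, here is a minimal sketch of the two constructions, with made-up numbers (the heights, weather outcomes, and probabilities are all invented for illustration):

```python
# Two marginal distributions, one for height X and one for weather Y
# (all of these outcomes and numbers are invented for illustration).
p_height = {"6ft": 0.2, "not 6ft": 0.8}
p_weather = {"raining": 0.3, "not raining": 0.7}

# Case 1: assume X and Y are independent. That implicitly builds the product
# space, whose outcomes are pairs (height, weather) with probability equal to
# the product of the marginals.
product_space = {
    (h, w): p_height[h] * p_weather[w]
    for h in p_height
    for w in p_weather
}
print(product_space[("6ft", "raining")])  # 0.2 * 0.3 = 0.06

# Case 2: assume Y is a deterministic function of X ("raining exactly when
# X = 6ft"). The joint space then only puts mass on outcomes consistent with
# that rule, and Prob(X = 6ft AND Y = raining) collapses to Prob(X = 6ft).
dependent_space = {
    (h, "raining" if h == "6ft" else "not raining"): p_height[h]
    for h in p_height
}
print(dependent_space.get(("6ft", "raining"), 0.0))  # 0.2 = Prob(X = 6ft)
```

The marginal distribution of X is the same in both cases; only the assumed joint space changes, which is the whole point.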