Thank you! That's very kind.
I got curious and asked Claude to explain the difference between regressing X-onto-Y and Y-onto-X and it did a really good job---which I found somewhat distressing. Is my blog post even providing any value when an LLM can reproduce 80-90% of the insight in literally a 1000th of the time?
But maybe there's still value in writing up the blog post because it's non-trivial to know what the right questions are to ask. I wrote this blog post because I knew that (a) understanding the difference between the two regression lines was impor...
Yes, that's an important clarification. Markov's inequality is tight on the space of all non-negative random variables (the inequality becomes an equality with the two-point distribution shown in the final state of the proof). But it's not constructed to be tight with respect to a generic distribution.
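To make the tightness claim concrete (the notation $a$ and $\mu$ below is mine, not from the proof): take a threshold $a > 0$ and a mean $0 < \mu \le a$, and let

$$P(X = a) = \frac{\mu}{a}, \qquad P(X = 0) = 1 - \frac{\mu}{a}.$$

Then $\mathbb{E}[X] = \mu$ and $P(X \ge a) = \mu/a = \mathbb{E}[X]/a$, so Markov's bound $P(X \ge a) \le \mathbb{E}[X]/a$ holds with equality.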
I'm pretty new to these sorts of tail-bound proofs that you see a lot in, e.g., high-dimensional probability theory. But in general, understanding under what circumstances a bound is tight has been one of the best ways to intuitively understand how a given bound works.
For the first part about "Ap being a formal maneuver"--I don't disagree with the comment as stated, nor with what Jaynes did in a technical sense. But I'm trying to imbue the proposition with a "physical interpretation" when I identify it with an infinite collection of evidences. There is a subtlety with my original statement that I didn't expand on, but which I've been thinking about ever since I read the post: "infinitude" is probably best understood as a relative term. Maybe the simplest way to think about this is that, as I understand it, if you condition o
Thanks for the reference. You and the other commentator both seem to be saying the same thing: that there isn't much of a use case for the Ap distribution, as Bayesian statisticians have other frameworks for thinking about these sorts of problems. It seems important that I acquaint myself with the basic tools of Bayesian statistics to better contextualize Jaynes' contribution.
This intuition--that the KL is a metric-squared--is indeed important for understanding the KL divergence. It's a property that all divergences have in common. Divergences can be thought of as generalizations of the squared Euclidean distance where you replace the quadratic--which is in some sense the Platonic convex function--with a convex function of your choice.
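One way to make that precise is via Bregman divergences (my choice of formalization; the claim above is broader): for a strictly convex, differentiable $\phi$,

$$D_\phi(x, y) = \phi(x) - \phi(y) - \langle \nabla \phi(y),\, x - y \rangle.$$

With $\phi(x) = \|x\|^2$ this recovers the squared Euclidean distance $\|x - y\|^2$, and with $\phi(p) = \sum_i p_i \log p_i$ (negative entropy, restricted to the probability simplex) it recovers the KL divergence $D_\phi(p, q) = \sum_i p_i \log(p_i / q_i)$.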
This intuition is also important for understanding Talagrand's T2 inequality which says that, under certain conditions like strong log-concavity of the reference measure q, the Wasserstein-2 distance (which...
Thanks for the feedback.
...What you are showing with the coin is a hierarchical model over multiple coin flips, and doesn't need new probability concepts. Let $F$, $G$ be the flips. All you need in life is the distribution $P(F, G)$. You can decide to restrict yourself to distributions of the form $\int_0^1 \mathrm{d}p_{\text{coin}}\, P(F, G \mid p_{\text{coin}})\, p(p_{\text{coin}})$. In practice, you start out thinking about $p_{\text{coin}}$ as a variable atop all the flips in a graph, and then think in terms of $P(F, G \mid p_{\text{coin}})$ and $p(p_{\text{coin}})$ separately, because that's more intuitive. This is the standard way of doing things. All you do
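For what it's worth, here is a minimal numerical sketch of the construction in the quoted comment (the Beta(2, 2) prior, the grid integration, and the numpy usage are my own illustrative choices, not part of the original comment):

```python
import numpy as np

# Hierarchical model for two coin flips F and G:
# given p_coin, F and G are independent Bernoulli(p_coin) draws, and
# p_coin itself has a prior density p(p_coin) on [0, 1].
# The marginal over the flips integrates p_coin out:
#   P(F, G) = \int_0^1 dp_coin  P(F | p_coin) P(G | p_coin) p(p_coin)

def prior(p):
    """Beta(2, 2) density, 6 p (1 - p) -- my illustrative choice of prior."""
    return 6.0 * p * (1.0 - p)

def bernoulli(f, p):
    """P(flip = f | p) for f in {0, 1}."""
    return p ** f * (1.0 - p) ** (1 - f)

def marginal(f, g, n_grid=10_000):
    """P(F = f, G = g) via a midpoint-rule approximation of the integral."""
    dp = 1.0 / n_grid
    p = (np.arange(n_grid) + 0.5) * dp  # midpoints of a uniform grid on [0, 1]
    return np.sum(bernoulli(f, p) * bernoulli(g, p) * prior(p)) * dp

# The four joint probabilities sum to 1. Note that F and G are *not*
# independent under the marginal: integrating out p_coin couples them
# (here P(F=1, G=1) = 0.3 > 0.25 = P(F=1) * P(G=1)).
table = {(f, g): marginal(f, g) for f in (0, 1) for g in (0, 1)}
print(table, sum(table.values()))
```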
Thank you!
I'm not an expert on this topic, but my impression is that linear regression is useful when you are trying to fit a function from inputs to an output (e.g. imagine you have the alleles at various loci as your inputs and you want to predict some phenotype as your output; that's the type of problem well-suited for high-dimensional linear regression). Principal component analysis, on the other hand, is mainly used as a dimensionality-reduction technique (so using PCA in the two-dimensional case, as I did in this post, is a bit of overkill).
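To illustrate the contrast, here is a rough sketch using scikit-learn (the library choice and the synthetic data are mine, purely for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Supervised setting: predict a phenotype-like output from many "loci" inputs.
X = rng.normal(size=(500, 50))              # 500 samples, 50 input features
w = rng.normal(size=50)                      # true (unknown) effect sizes
y = X @ w + rng.normal(scale=0.5, size=500)  # noisy linear phenotype

reg = LinearRegression().fit(X, y)           # fit a function from inputs to output
print("R^2 on training data:", reg.score(X, y))

# Unsupervised setting: compress the same inputs; no output is involved.
pca = PCA(n_components=5).fit(X)             # keep the top 5 directions of variance
X_reduced = pca.transform(X)                 # 500 x 5 representation
print("explained variance ratio:", pca.explained_variance_ratio_)
```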