In response to falenas108's "Ask an X" thread. I have a PhD in experimental particle physics; I'm currently working as a postdoc at the University of Cincinnati. Ask me anything, as the saying goes.
This is an experiment. There's nothing I like better than talking about what I do; but I usually find that even quite well-informed people don't know enough to ask questions sufficiently specific that I can answer any better than the next guy. What goes through most people's heads when they hear "particle physics" is, judging by experience, string theory. Well, I dunno nuffin' about string theory - at least not any more than the average layman who has read Brian Greene's book. (Admittedly, neither do string theorists.) I'm equally ignorant about quantum gravity, dark energy, quantum computing, and the Higgs boson - in other words, the big theory stuff that shows up in popular-science articles. For that sort of thing you want a theorist, and not just any theorist at that, but one who works specifically on that problem. On the other hand I'm reasonably well informed about production, decay, and mixing of the charm quark and charmed mesons, but who has heard of that? (Well, now you have.) I know a little about CP violation, a bit about detectors, something about reconstructing and simulating events, a fair amount about how we extract signal from background, and quite a lot about fitting distributions in multiple dimensions.
I don't understand how you get more than two dimensions out of data points that are either 0 or 1 (unless perhaps the votes were accompanied by data on age, sex, politics?) and anyway what I usually think of as 'dimension' is just the number of entries in each data point, which is fixed. It seems to me that this is perhaps a term of art which your friend is using in a specific way without explaining that it's jargon.
However, on further thought I think I can bridge the gap. If I understand your explanation correctly, your friend is looking for the minimum set of variables which explains the distribution. I think this has to mean that there is more data than yes-or-no; suppose there are also age and gender, and everyone above thirty votes yes and everyone below thirty votes no. Then you could have had dimensionality two, if some combination of age and gender were required to predict the vote; but in fact age predicts it perfectly and you can just throw out gender, so the actual dimensionality is one.
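To make that concrete, here is a minimal sketch in Python, with entirely made-up survey data, just to illustrate the point: age alone accounts for every vote, so gender can be dropped and the effective dimensionality is one.

```python
# Toy version of the age/gender/vote example, with made-up data.
import numpy as np

rng = np.random.default_rng(0)
ages = rng.integers(18, 70, size=1000)
genders = rng.integers(0, 2, size=1000)   # recorded, but irrelevant by construction
votes = (ages > 30).astype(int)           # yes if and only if age is above thirty

# Age alone predicts every vote, so gender can be thrown out:
print(np.all(votes == (ages > 30)))       # True - dimensionality one, not two
```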
So what we are looking for is the number of parameters in the model that explains the data, as opposed to the number of observables in the data. In physics, however, we generally have a fairly specific model in mind before gathering the data. Let me first give a trivial example: Suppose you have some data that you believe is generated by a Gaussian distribution with mean 0, but you don't know the sigma. Then you do the following: Assume some particular sigma, and for each event, calculate the probability of seeing that event. Multiply the probabilities. (In fact, for practical purposes we take the log-probability and add, avoiding some numerical issues on computers, but obviously this is isomorphic.) Now scan sigma and see which value maximises the probability of your observations; that's your estimate for sigma, with errors given by the values at which the log-probability drops by 0.5. (It's a bit involved to derive, but basically this corresponds to the frequentist 68%-confidence limits, assuming the log-probability function is symmetric around the maximum.)
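In code, the toy fit looks something like this - a minimal sketch, with generated toy events (true sigma of 2) standing in for real measurements:

```python
# Maximum-likelihood scan for sigma of a mean-zero Gaussian, as described above.
import numpy as np

rng = np.random.default_rng(1)
events = rng.normal(0.0, 2.0, size=5000)   # toy "measurements", true sigma = 2

def log_likelihood(sigma):
    # Sum of log-probabilities of the events under a Gaussian with mean 0.
    return np.sum(-0.5 * (events / sigma) ** 2
                  - np.log(sigma * np.sqrt(2.0 * np.pi)))

sigmas = np.linspace(1.5, 2.5, 1001)
ll = np.array([log_likelihood(s) for s in sigmas])

best = sigmas[np.argmax(ll)]
# Values of sigma where the log-likelihood is within 0.5 of the maximum
# bracket the one-sigma (~68%) confidence interval.
inside = sigmas[ll >= ll.max() - 0.5]
print(f"sigma = {best:.3f}  +{inside.max() - best:.3f}  -{best - inside.min():.3f}")
```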
Now, the LessWrong-trained eye can, presumably, immediately see the underlying Bayes-structure here. We are finding the set of parameters that maximises the probability of our data given those parameters - which, with a flat prior, is also the set with the highest posterior probability. In my toy example you can just scan the parameter space, point by point. For realistic models with, say, forty parameters - as was the case in my thesis - you have to be a bit more clever and use some sort of search algorithm that doesn't rely on brute force. (With forty parameters, even if you take only 10 points in each, you instantly have 10^40 points to evaluate - that is, at each point you calculate the probability for, say, half a million events with what may be quite a computationally expensive function. Not practical.)
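In practice that means handing the negative log-likelihood to a numerical minimiser rather than scanning. Here is a sketch using scipy's general-purpose minimiser as a stand-in for the dedicated fitters used in practice, with just two parameters to keep it short:

```python
# Same idea with a numerical minimiser instead of a brute-force scan.
# Two parameters here (mean and sigma); with forty the search is harder,
# but the structure is identical. Toy data again.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
events = rng.normal(0.3, 2.0, size=5000)

def negative_log_likelihood(params):
    mu, sigma = params
    if sigma <= 0:
        return np.inf   # keep the minimiser out of unphysical territory
    return -np.sum(-0.5 * ((events - mu) / sigma) ** 2
                   - np.log(sigma * np.sqrt(2.0 * np.pi)))

result = minimize(negative_log_likelihood, x0=[0.0, 1.0], method="Nelder-Mead")
print(result.x)   # roughly [0.3, 2.0]
```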
The above is what I think of when I say "fitting a distribution". Now let me try to bring it back into contact with the finding-the-dimensions problem. The difference is that your friend is dealing with a set of variables such that some of them may directly account for others, as in my age/vote toy example. But in the models we fit to physics distributions, not all the parameters are necessarily directly observed in the event. An obvious example is the time resolution of the detector; this is not a property of the event (at least not solely of the event - some events are better measured than others) and anyway you can't really say that the resolution 'explains' the value of the time (and note that decay times are continuous, not multiple-choice as in most survey data). Rather, the observed distribution of the time is generated by the true distribution convolved with the resolution - you have to do a convolution integral. If you measure a long (and therefore unlikely, since we're dealing with exponential decay) time, it may be that you really have an unusual event, or it may be that you have a common event with a bad resolution that happened to fluctuate up. The point, however, is that there's no single discrete-valued resolution variable that accounts for a discrete-valued time variable; it's all continuous distributions, derived quantities, and convolution integrals.
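For the curious, the convolution of an exponential decay with a Gaussian resolution actually has a closed form; here is a sketch with illustrative values of the lifetime and resolution (not numbers from any real analysis):

```python
# Observed decay-time PDF: true exponential (lifetime tau) convolved with a
# Gaussian resolution of width sigma. Closed form of the convolution integral.
import numpy as np
from scipy.special import erfc
from scipy.integrate import trapezoid

def smeared_decay_pdf(t, tau, sigma):
    # Integral over true decay times s >= 0 of
    #   (1/tau) * exp(-s/tau) * Gaussian(t - s; mean 0, width sigma)
    return (1.0 / (2.0 * tau)) * np.exp(sigma**2 / (2.0 * tau**2) - t / tau) \
           * erfc((sigma**2 / tau - t) / (sigma * np.sqrt(2.0)))

t = np.linspace(-3.0, 20.0, 4000)               # negative measured times are allowed
pdf = smeared_decay_pdf(t, tau=1.5, sigma=0.3)  # illustrative lifetime and resolution
print(trapezoid(pdf, t))                        # ~1: still a normalised distribution
```

A long measured time can then come either from the exponential tail or from a resolution fluctuation, and the fit has to weigh both possibilities.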
So, we do not treat our data sets in the way you describe, looking for the true dimensionality. Instead we assume some physics model with a fixed number of parameters and seek the probability-maximising value of those parameters. Obviously this approach has its disadvantages compared to the more data-driven method you describe, but basically this is forced upon us by the shape of the problem. It is common to try several different models and report the variation in the results as a systematic error.
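Schematically, that model-variation systematic looks like this; the alternative models and fitted numbers below are purely hypothetical:

```python
# Assign a systematic error by refitting under alternative models and
# quoting the spread of the results. All numbers here are hypothetical.
import numpy as np

fitted_lifetime = {
    "single-Gaussian resolution": 1.512,
    "double-Gaussian resolution": 1.509,
    "resolution + outlier component": 1.517,
}
values = np.array(list(fitted_lifetime.values()))
central = fitted_lifetime["single-Gaussian resolution"]   # nominal model
systematic = values.max() - values.min()                  # conventions vary (max spread, RMS, ...)
print(f"lifetime = {central:.3f}, model systematic = {systematic:.3f}")
```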
So, to get back to Lie groups, Weyl integration, and representation theory: None of the above. :)
I definitely agree that the type of analysis I originally had in mind is totally different than what you are describing.
Thinking about distributions without thinking about Lie groups makes my brain hurt, unless the distributions you're discussing have no symmetries or continuous properties at all--my guess is that they're there but for your purposes they're swept under the rug?
But yeah in essence the "fitting a distribution" I was thinking is far less constrained I think--you have no idea a priori what the distribution is, so you first attempt to...