
gwern comments on Stupid Questions August 2015 - Less Wrong Discussion

Post author: Grothor 01 August 2015 11:08PM, 7 points


Comments (129)


Comment author: gwern 12 August 2015 12:57:15AM, 3 points

Yes, you can estimate the bias of each measurer much more efficiently if you have them all measure the same samples, analogous to a crossover design: the differences between measurements are then due less to the wide diversity of the sampled population and more to the particular measurer.

(To put it a little more mathily: when each measurer measures different samples, the measurements will be spread very widely, because their variance is Var(measurer-bias) + Var(population); but if the measurers all measure the same sample, Var(population) drops out of the comparison and there's just Var(measurer-bias). If I measure a sample and get 2.9 and you measure it as well and get 3.1, then probably the sample is really ~3.0, my bias is -0.1 and your bias is +0.1. If I measure one sample and get 2.9 and you measure a different sample and get 3.1, then my bias and your bias are... ???)
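A minimal simulation of the point above, assuming normally distributed sample values, measurer biases and per-reading noise (all parameter values are illustrative, not from the thread). It compares the variance of the difference between two measurers' readings when they read the same sample versus different samples, and redoes the 2.9/3.1 arithmetic under the added assumption that the biases average to zero across measurers.

```python
import random
import statistics


def reading_differences(same_sample, n_trials=20000, mean=3.0,
                        sd_population=1.0, sd_bias=0.1, sd_noise=0.05,
                        seed=0):
    """Differences between two measurers' readings over many trials.

    Each trial draws a fresh bias for each measurer, either one shared
    true sample value or two independent ones, and a little per-reading noise.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_trials):
        bias_me, bias_you = rng.gauss(0.0, sd_bias), rng.gauss(0.0, sd_bias)
        sample_me = rng.gauss(mean, sd_population)
        sample_you = sample_me if same_sample else rng.gauss(mean, sd_population)
        x_me = sample_me + bias_me + rng.gauss(0.0, sd_noise)
        x_you = sample_you + bias_you + rng.gauss(0.0, sd_noise)
        diffs.append(x_me - x_you)
    return diffs


def biases_from_shared_sample(readings):
    """Estimate each measurer's bias from readings of ONE shared sample,
    assuming the biases average to zero, so the consensus is the mean."""
    consensus = statistics.fmean(readings.values())
    return consensus, {who: x - consensus for who, x in readings.items()}


var_same = statistics.variance(reading_differences(same_sample=True))
var_diff = statistics.variance(reading_differences(same_sample=False))
# Theory: same sample -> 2 * (sd_bias^2 + sd_noise^2) = 0.025
#         different samples -> that plus 2 * sd_population^2 = 2.025
print(f"same sample: {var_same:.3f}, different samples: {var_diff:.3f}")

consensus, biases = biases_from_shared_sample({"me": 2.9, "you": 3.1})
print(consensus, {who: round(b, 2) for who, b in biases.items()})
```

The population term is the large one here, so comparing readings of a shared sample shrinks the spread by roughly two orders of magnitude under these made-up settings.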

The classic example for multilevel models (MLMs) is that you have n classrooms' test scores and want to figure out the teachers' effects. This is hard because the classrooms' average scores will differ a lot on their own. It is analogous to your original description: each measurer gets their own batch of samples. But what if you had a crossed design, in which a single classroom is tested after being taught by each teacher in turn? Then much of the difference in average score will be due to the particular effect of each teacher, and those effects will be much easier to estimate.
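A rough simulation of the nested-versus-crossed contrast, assuming normally distributed classroom baselines, teacher effects and student noise, and (unrealistically) that one classroom can be taught by every teacher with no order or carry-over effects. All numbers are hypothetical. Each teacher's effect is estimated as the mean score under that teacher minus the grand mean, and the function returns the mean squared error of those centred estimates.

```python
import random
import statistics


def teacher_effect_mse(crossed, n_teachers=5, n_students=30,
                       sd_classroom=5.0, sd_teacher=2.0, sd_student=10.0,
                       n_reps=2000, seed=1):
    """Mean squared error of estimated (centred) teacher effects.

    crossed=False: each teacher gets a different classroom (nested design).
    crossed=True:  one classroom is taught by every teacher in turn.
    """
    rng = random.Random(seed)
    sq_errors = []
    for _ in range(n_reps):
        effects = [rng.gauss(0.0, sd_teacher) for _ in range(n_teachers)]
        shared_baseline = rng.gauss(70.0, sd_classroom)
        class_means = []
        for effect in effects:
            baseline = shared_baseline if crossed else rng.gauss(70.0, sd_classroom)
            scores = [baseline + effect + rng.gauss(0.0, sd_student)
                      for _ in range(n_students)]
            class_means.append(statistics.fmean(scores))
        grand_mean = statistics.fmean(class_means)
        mean_effect = statistics.fmean(effects)
        for class_mean, effect in zip(class_means, effects):
            estimate = class_mean - grand_mean   # estimated centred effect
            truth = effect - mean_effect         # true centred effect
            sq_errors.append((estimate - truth) ** 2)
    return statistics.fmean(sq_errors)


nested = teacher_effect_mse(crossed=False)
crossed = teacher_effect_mse(crossed=True)
# Roughly (1 - 1/5) * (25 + 100/30) ~ 22.7 versus (1 - 1/5) * 100/30 ~ 2.7
print(f"nested MSE: {nested:.1f}, crossed MSE: {crossed:.1f}")
```

In the nested design the classroom-to-classroom variance (here 25) is folded into every teacher estimate; in the crossed design it cancels, leaving only student-level noise.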

So if we look at the difference in differences between x1 and x2, and it is greater for some middle latent variables (ways of staining) than for others, can we use it as a measure of 'the overall variability of the measuring method'? Say, if we have ten measurers and four measuring methods...

I guess. From a factor analysis perspective, you just want to pick the method with the highest loading on X (the latent variable you are actually trying to measure), I think.
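A sketch of "pick the one with the highest loading", assuming a one-factor model with exactly three methods measuring the same samples, where each method's reading is the latent X plus independent method-specific noise (the noise levels below are made up). With three indicators and uncorrelated errors, r_ij = λ_i·λ_j, so the standardized loading of method i can be recovered as sqrt(r_ij · r_ik / r_jk). With four methods, any three give such an estimate, though a full factor-analysis fit would use all of them.

```python
import math
import random
import statistics


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def three_indicator_loadings(m0, m1, m2):
    """Standardized one-factor loadings for exactly three indicators."""
    r01, r02, r12 = pearson(m0, m1), pearson(m0, m2), pearson(m1, m2)
    return [math.sqrt(r01 * r02 / r12),
            math.sqrt(r01 * r12 / r02),
            math.sqrt(r02 * r12 / r01)]


rng = random.Random(2)
noise_sds = [0.3, 0.6, 1.0]  # hypothetical noise level of each method
x_latent = [rng.gauss(0.0, 1.0) for _ in range(5000)]
methods = [[x + rng.gauss(0.0, sd) for x in x_latent] for sd in noise_sds]

loadings = three_indicator_loadings(*methods)
best = max(range(3), key=lambda i: loadings[i])
# True standardized loadings are 1 / sqrt(1 + sd^2): about 0.96, 0.86, 0.71
print([round(v, 2) for v in loadings], "best method:", best)
```

Under these assumptions the highest loading simply picks out the least noisy method; a constant bias would not show up in a loading at all, so it has to be handled separately.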

Comment author: Romashka 17 August 2015 12:12:21PM, 1 point

Huh. Your answer was even more useful to me than I expected. My 'secret agenda' is to put forward another mounting medium, which might have advantages over the one in use, but I will have to show that the two do not differ in preparation quality. I think I am going to do a 2-by-2 crossover.

So - thank you! Analogies for the win!

Comment author: Romashka 12 August 2015 03:49:21AM, 1 point

The problem is that whichever one I find most desirable, other people will continue using the methods they are good at. And I will have to somehow compare x(A)1, x(B)32 and x(C)3...

And this is a relatively straightforward situation; in environmental science, things are often much less clear even at the level of methodology.

Comment author: gwern 12 August 2015 03:34:48PM, 2 points

The problem is that whichever one I find most desirable, other people will continue using the methods they are good at. And I will have to somehow compare x(A)1, x(B)32 and x(C)3...

I don't really understand the problem. Yes, maybe you can't control them and get everyone using the same method. But I've already explained how you deal with that, given you the relevant keywords to search for, like 'measurement error', and given you example R code implementing several approaches.

They all take the same basic approach: treat the data as measurements which load on a latent variable for each method, and each method's latent variable in turn loads on the latent variable you actually want; then you can infer whatever you need. The first level of latent variables helps you estimate the bias of each method (some methods may be much less biased than others), and then you use them collectively to estimate the final latent variable. Now you have a principled way to unify all your data from disparate methods, which measure the variable you care about in similar but not identical ways. If someone else comes up with a new method, it can be incorporated like the rest.
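A much simpler stand-in for the latent-variable models described above, not the R code gwern mentions, and resting on stated assumptions: three methods (hypothetically named A, B and C) each measure the same calibration samples; each reading is the sample's true value plus a constant method bias plus independent method-specific noise; and biases are only identified relative to one another, so they are constrained to sum to zero. Noise variances come from variances of pairwise differences, where the true sample values cancel (a classic method-of-moments trick for three instruments), and bias-corrected readings are then pooled with inverse-variance weights.

```python
import random
import statistics


def calibrate(readings):
    """Estimate per-method bias and noise variance from shared samples.

    readings: dict mapping method name -> list of readings, where index s
    is the same sample for every method (exactly three methods assumed).
    """
    names = sorted(readings)
    assert len(names) == 3, "the pairwise-difference trick needs three methods"
    means = {m: statistics.fmean(readings[m]) for m in names}
    centre = statistics.fmean(means.values())
    bias = {m: means[m] - centre for m in names}  # relative biases, sum to zero

    def diff_var(a, b):
        # Var(x_a - x_b) = var_a + var_b: the true sample values cancel
        return statistics.variance([x - y for x, y in zip(readings[a], readings[b])])

    noise_var = {}
    for m in names:
        a, b = [n for n in names if n != m]
        est = (diff_var(m, a) + diff_var(m, b) - diff_var(a, b)) / 2
        noise_var[m] = max(est, 1e-9)  # guard against negative estimates
    return bias, noise_var


def pool(sample_readings, bias, noise_var):
    """Inverse-variance weighted estimate of one sample's true value,
    using whichever methods happened to measure it."""
    weights = {m: 1.0 / noise_var[m] for m in sample_readings}
    total = sum(weights.values())
    return sum(weights[m] * (x - bias[m]) for m, x in sample_readings.items()) / total


rng = random.Random(3)
true_bias = {"A": 0.5, "B": 0.0, "C": -0.5}  # hypothetical, sums to zero
true_sd = {"A": 0.1, "B": 0.3, "C": 0.6}     # hypothetical noise levels
truth = [rng.gauss(3.0, 1.0) for _ in range(3000)]
readings = {m: [t + true_bias[m] + rng.gauss(0.0, true_sd[m]) for t in truth]
            for m in true_bias}

bias, noise_var = calibrate(readings)
pooled = [pool({m: readings[m][s] for m in readings}, bias, noise_var)
          for s in range(len(truth))]
naive = [statistics.fmean(readings[m][s] for m in readings) for s in range(len(truth))]


def rmse(estimates):
    return statistics.fmean((e - t) ** 2 for e, t in zip(estimates, truth)) ** 0.5


print({m: round(b, 2) for m, b in bias.items()})
print({m: round(v ** 0.5, 2) for m, v in noise_var.items()})
print(f"pooled RMSE {rmse(pooled):.3f} vs naive average {rmse(naive):.3f}")
```

Once calibrated, a sample read by only one method, e.g. `pool({"C": 2.4}, bias, noise_var)`, reduces to subtracting that method's estimated bias, which is one way readings like x(A)1 and x(B)32 could be put on a common scale. A proper hierarchical model would also handle partial overlap and more methods.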

Comment author: Romashka 12 August 2015 04:22:01PM, 0 points

Right, sorry, melting brain. (Also, it had just occurred to me that the assumed 10% difference between two measurers has not, in fact, been rigorously established, and that derailed the still-solid part of my brain...)