ricketson comments on Simpson's Paradox - Less Wrong

68 Post author: bentarm 12 January 2011 11:01PM

Comment author: wnoise 13 January 2011 09:22:06PM * 5 points

Partitioning may reverse the correlation or it may not; either way, it provides a more accurate model.

Usually. But partitioning reduces the number of samples within each partition, and can thus increase the effects of chance. This is even worse if you have a lot of variables floating around that you could partition on. At some point it becomes easy to choose a partition that, purely by coincidence, looks very predictive on this data set but actually has no causal role.
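To make that concrete, here is a rough simulation sketch. Everything in it is made up for illustration: the outcome is a fair coin flip, the candidate partitioning variables are independent coin flips, and the names `rate_gap` and `simulate` and the sample sizes are my own assumptions. No variable has any real relationship to the outcome, yet the best-looking one usually shows a sizeable gap in outcome rates between its two groups on the data used to pick it. On held-out samples the gap for that same variable is just another noise draw, so it is typically much smaller, though any single run can differ.

```python
import random


def rate_gap(outcome, group):
    """Absolute difference in outcome rate between the two sides of a partition."""
    side_a = [o for o, g in zip(outcome, group) if g]
    side_b = [o for o, g in zip(outcome, group) if not g]
    if not side_a or not side_b:
        return 0.0  # degenerate partition: everything landed on one side
    return abs(sum(side_a) / len(side_a) - sum(side_b) / len(side_b))


def simulate(n_samples=40, n_vars=200, seed=0):
    """Pick the most 'predictive' of many pure-noise partitions, then re-check it."""
    rng = random.Random(seed)
    total = 2 * n_samples  # first half is used for selection, second half is held out
    # The outcome is a fair coin flip, so nothing can truly predict it.
    outcome = [rng.random() < 0.5 for _ in range(total)]
    # Each candidate variable is an independent coin flip with no causal role.
    variables = [[rng.random() < 0.5 for _ in range(total)] for _ in range(n_vars)]

    train, test = slice(0, n_samples), slice(n_samples, total)
    train_gaps = [rate_gap(outcome[train], v[train]) for v in variables]
    best = max(range(n_vars), key=train_gaps.__getitem__)

    return {
        "best_train_gap": train_gaps[best],
        "median_train_gap": sorted(train_gaps)[n_vars // 2],
        "best_test_gap": rate_gap(outcome[test], variables[best][test]),
    }


if __name__ == "__main__":
    print(simulate())
```

Raising `n_vars` or lowering `n_samples` should make the selected gap look even more impressive, which is the problem: the more partitions you can choose from and the fewer samples each one holds, the easier it is to find one that fits by coincidence.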

RobinZ is that P(R|G,T) might overfit the data: the accuracy improvement achieved by including G might not justify the increase in model complexity.

Exactly.