I'd really like to see the follow-up on how to decide which data to actually use. Right now, it's pretty unsatisfactory and I'm left quite confused.
(Unless this was an elaborate plot to get me to read Judea Pearl, whose book I just picked up, in which case, gratz.)
Right, so the challenge is to incorporate as much auxiliary information as possible without overfitting. That's what AdaBoost does - if you run it for T rounds, the complexity of the model you get is linear in T, not exponential as you would get from fitting the model to the finest partitions.
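To make the "linear in T" point concrete, here is a minimal from-scratch AdaBoost sketch on a made-up 1-D toy dataset (the data, the stump construction, and the round count are all illustrative assumptions, not anything from the post). After T rounds the fitted model is just a list of T weighted stumps, so its size grows linearly with T even though the stumps jointly carve the input into many regions:

```python
import math

# Hypothetical toy 1-D dataset: points and +/-1 labels.
X = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
y = [+1, +1, -1, -1, +1, -1, +1, -1]

def stump(threshold, sign):
    """Weak learner: predict `sign` if x < threshold, else -sign."""
    return lambda x: sign if x < threshold else -sign

def best_stump(w):
    """Pick the stump with the lowest weighted training error."""
    candidates = [stump(t, s) for t in X for s in (+1, -1)]
    def werr(h):
        return sum(wi for wi, xi, yi in zip(w, X, y) if h(xi) != yi)
    return min(candidates, key=werr)

def adaboost(T):
    """Run T boosting rounds; the model is a list of (alpha, stump)
    pairs, so its complexity is linear in T."""
    w = [1.0 / len(X)] * len(X)
    model = []
    for _ in range(T):
        h = best_stump(w)
        err = sum(wi for wi, xi, yi in zip(w, X, y) if h(xi) != yi)
        err = min(max(err, 1e-10), 1 - 1e-10)  # numerical guard
        alpha = 0.5 * math.log((1 - err) / err)
        # Reweight: misclassified points gain weight for the next round.
        w = [wi * math.exp(-alpha * yi * h(xi))
             for wi, xi, yi in zip(w, X, y)]
        Z = sum(w)
        w = [wi / Z for wi in w]
        model.append((alpha, h))
    return model

def predict(model, x):
    """Sign of the alpha-weighted vote of all T stumps."""
    return 1 if sum(a * h(x) for a, h in model) >= 0 else -1

model = adaboost(10)
print(len(model))  # 10 weak learners after 10 rounds
```

The model stored is only the T (weight, stump) pairs; nothing exponential in T is ever fit or stored.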
This is in general one of the advantages of Bayesian statistics: you don't have to draw a hard line between aggregated and fully separated data, because hierarchical models give you partial pooling and information sharing between the levels of the analysis automatically. (See pretty much anything written by Andrew Gelman; Bayesian Data Analysis is a great book for Gelman's whole perspective.)
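A tiny sketch of what partial pooling does, using the standard normal-normal shrinkage formula (the groups, data values, and the within/between-group variances here are all made-up assumptions for illustration, not estimates from any real model): each group's estimate is a precision-weighted compromise between its own mean and the grand mean, so small groups borrow strength from the rest of the data while large groups mostly keep their own estimate.

```python
# Hypothetical grouped data: group C has lots of data, group B almost none.
groups = {
    "A": [2.1, 2.5, 1.9],
    "B": [3.0],
    "C": [2.2, 2.4, 2.6, 2.8, 2.0, 2.3],
}
sigma2 = 0.25  # assumed within-group variance
tau2 = 0.10    # assumed between-group variance

# Grand mean across all observations (the fully pooled estimate).
all_xs = [x for xs in groups.values() for x in xs]
grand = sum(all_xs) / len(all_xs)

def partial_pool(xs):
    """Normal-normal partial pooling of one group's mean.

    Weight on the group's own mean is its data precision (n / sigma2)
    relative to the prior precision (1 / tau2): sparse groups shrink
    heavily toward the grand mean, data-rich groups barely move.
    """
    n = len(xs)
    ybar = sum(xs) / n
    w = (n / sigma2) / (n / sigma2 + 1 / tau2)
    return w * ybar + (1 - w) * grand

for g, xs in sorted(groups.items()):
    print(g, round(partial_pool(xs), 3))
```

Running this, the single-observation group B gets pulled most of the way toward the grand mean, while group C's estimate stays almost exactly at its raw mean, which is the "information sharing between levels" in miniature.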