Gelman wants to throw everything he can into his models -- and then use multilevel (a.k.a. hierarchical) models to share information between exchangeable (or conditionally exchangeable) batches of parameters. The key concept: multilevel model structure makes the "effective number of parameters" become a quantity that is itself inferred from the data. So he can afford to take his "against parsimony" stance (which is really a stance against leaving potentially useful predictors out of his models) because his default model choice will induce parsimony just when the data warrant it.
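To make the "effective number of parameters" idea concrete, here is a minimal sketch of my own, not anything from Gelman's posts. It assumes the simplest normal hierarchical model: each observed group mean is ybar_j ~ Normal(theta_j, se^2) with a known standard error, and theta_j ~ Normal(mu, tau^2). The function names, the method-of-moments estimate of tau^2, and the plug-in (empirical Bayes) shortcut are all illustrative assumptions; a full Bayesian fit would put a prior on tau and average over it instead.

```python
import random
import statistics


def effective_params(group_means, se):
    """Plug-in estimate of the effective number of parameters.

    Assumed model (illustrative only):
        ybar_j  ~ Normal(theta_j, se^2)   (se known)
        theta_j ~ Normal(mu, tau^2)
    """
    J = len(group_means)
    # Method-of-moments estimate of the between-group variance tau^2,
    # truncated at zero (an assumption; not a full Bayesian treatment).
    tau2 = max(0.0, statistics.variance(group_means) - se ** 2)
    # Pooling factor: how far each group estimate is pulled toward the
    # grand mean (1 = complete pooling, 0 = no pooling).
    pooling = se ** 2 / (se ** 2 + tau2)
    # Trace of the linear smoother that maps the observed group means to
    # the partially pooled estimates: one parameter for the grand mean,
    # plus a fraction (1 - pooling) of each of the other J - 1.
    return 1 + (J - 1) * (1 - pooling)


def simulate(tau, J=50, se=1.0, seed=0):
    """Draw J noisy group means whose true effects have spread tau."""
    rng = random.Random(seed)
    thetas = [rng.gauss(0.0, tau) for _ in range(J)]
    return [rng.gauss(theta, se) for theta in thetas]


if __name__ == "__main__":
    for tau in (0.0, 1.0, 10.0):
        p_eff = effective_params(simulate(tau), se=1.0)
        print(f"true tau = {tau:5.1f}  effective parameters = {p_eff:.1f} of 50")
```

The model always carries all 50 group parameters, but when the groups look alike the estimated tau^2 is near zero and they collapse toward a single shared mean; when the groups genuinely differ, the count climbs toward 50. That is the sense in which the model buys parsimony only when the data warrant it.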
I think one of Gelman's comments in the first link is helpful:
...In principle, models (at least for social-science phenomena) should be ever-expanding flowers that have within them the capacity to handle small data sets (in which case, inferences will be pulled toward prior knowledge) or large data sets (in which case, the model will automatically unfold to allow the data to reveal more about the phenomenon under study). A single model will have zillions of parameters, most of which will barely be "activated" if sample size is not large.
In two posts, Bayesian stats guru Andrew Gelman argues against parsimony, though parsimony seems to be favored 'round these parts, in particular in the form of Solomonoff Induction and BIC as imperfect formalizations of Occam's Razor.
Gelman says: