I've just skimmed Shalizi's paper, so I might be wrong, but it seems to me his argument can be summarized as follows:
If we suppose that entropy is a measure of subjective uncertainty, then it would only increase if the subject lost information about the state of the system as it evolves. If the dynamical laws governing the microscopic evolution of the system are information-preserving, then this loss of information can only come from the way in which the subject updates his/her beliefs about the system's state. But if the subject updates by simply conditionalizing on the system's new macroscopic state, then this cannot happen. Bayesian conditionalization can only add information; it cannot subtract information. So, generically, updating one's beliefs about the system by conditionalization will lead to a decrease in uncertainty about the system and therefore a decrease in the system's entropy.
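To make the conditionalization step concrete, here is a minimal sketch on a toy system. The four microstates, the prior, and the macrostate map are all made up for illustration; none of it comes from Shalizi's paper. It checks the standard information-theoretic fact that conditioning on the observed macrostate lowers Shannon entropy on average, by exactly the entropy of the macrostate variable. (The guarantee is about the average; a particular observation can still leave you with a higher-entropy posterior than the prior.)

```python
import math

def entropy(p):
    """Shannon entropy, in bits, of a probability vector."""
    return -sum(x * math.log2(x) for x in p if x > 0)

# Hypothetical toy system: a prior over 4 microstates, and a map saying
# which macroscopic cell each microstate belongs to.
prior = [0.4, 0.1, 0.25, 0.25]
macro = [0, 0, 1, 1]

def conditionalize(prior, macro, observed):
    """Bayesian update on learning that the macrostate is `observed`."""
    z = sum(p for p, m in zip(prior, macro) if m == observed)
    return [p / z if m == observed else 0.0 for p, m in zip(prior, macro)]

def macro_distribution(prior, macro):
    """Probability of each macroscopic cell under the prior."""
    cells = sorted(set(macro))
    return [sum(p for p, m in zip(prior, macro) if m == c) for c in cells]

def expected_posterior_entropy(prior, macro):
    """Average entropy of the posterior, weighted by how likely each
    macroscopic observation is under the prior."""
    total = 0.0
    for cell, p_cell in zip(sorted(set(macro)), macro_distribution(prior, macro)):
        if p_cell > 0:
            total += p_cell * entropy(conditionalize(prior, macro, cell))
    return total

h_prior = entropy(prior)
h_macro = entropy(macro_distribution(prior, macro))
h_post = expected_posterior_entropy(prior, macro)
print(h_prior, h_macro, h_post)
```

In this toy case the drop in expected entropy equals the entropy of the macrostate variable, which is just the chain rule H(X) = H(Y) + H(X|Y) when Y is a function of X.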
I don't think points (1) and (3) in Eliezer's comment are an adequate response to this argument. Point (1) says that when the observer measures the system in order to conditionalize, the entropy of the observer's memory registers increases, which I guess is supposed to compensate for the decrease in system entropy induced by measurement. But this is a non-response. When we do statistical mechanics, we are not usually interested in the entropy of the system plus the observer; we are just interested in the entropy of the system, and it is this entropy that is observed to increase. Also, the response seems to beg the question. On what grounds does Eliezer claim that measurement increases the entropy of the observer's memory? Couldn't Shalizi's argument just be re-applied at this level?
Eliezer's point 3 (as far as I can make sense of it) is that in a quantum universe, from a within-a-branch perspective, the system evolution will not be unitary (and therefore not information-preserving) because the system will have decohered. This is the same point jimrandomh makes here. This is fair enough, but I don't think the Bayesian should be happy attributing entropy increase solely to quantum world-splitting. Statistical mechanics originated with the assumption that the underlying laws are classical, and in the majority of applications this assumption is retained for computational convenience. If the Bayesian position amounts to a rejection of a majority of the work done in statistical mechanics, it seems a pretty big bullet to bite.
Eliezer's point 2 is ultimately where I think the action's at. We don't update statistical distributions simply by conditionalization. Every statistical mechanics text points out that there is a coarse-graining step. When we update our distribution, we coarse-grain over the fine details of the distribution, "smoothing" it out. It is this step that accounts for entropy increase. Now Shalizi's response is that if you are a Bayesian then adding this non-Bayesian step is epistemically incoherent. One way to respond to this is as Eliezer does: Yup, none of us are perfect Bayesians. We are not even close to logically omniscient, so we are doomed to incoherence.
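Here is a rough sketch of how the coarse-graining step, and not the dynamics, produces the entropy increase. Everything in it is a toy assumption of mine: a 64-cell discrete "phase space", an affine map mod 64 standing in for information-preserving dynamics (it is a bijection because gcd(5, 64) = 1), and macroscopic cells of 8 microstates each.

```python
import math

N = 64      # number of microstates (hypothetical toy phase space)
BLOCK = 8   # microstates per macroscopic cell

def entropy(p):
    """Shannon entropy, in bits, of a probability vector."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def evolve(p, a=5, b=3):
    """Information-preserving dynamics: the bijection x -> (a*x + b) mod N.
    It only relabels microstates, so it cannot change the entropy."""
    q = [0.0] * N
    for x, px in enumerate(p):
        q[(a * x + b) % N] = px
    return q

def coarse_grain(p):
    """Smooth the distribution: inside each macroscopic cell, replace the
    probabilities by their average over that cell."""
    q = []
    for start in range(0, N, BLOCK):
        mass = sum(p[start:start + BLOCK])
        q.extend([mass / BLOCK] * BLOCK)
    return q

# Start out knowing only that the system is in the first macroscopic cell.
p0 = [1.0 / BLOCK if x < BLOCK else 0.0 for x in range(N)]

fine = evolve(p0)               # exact evolution of the distribution
smoothed = coarse_grain(fine)   # the extra smoothing step

h_initial = entropy(p0)
h_fine = entropy(fine)
h_smoothed = entropy(smoothed)
print(h_initial, h_fine, h_smoothed)
```

The exactly evolved distribution keeps its 3 bits of entropy, while the smoothed one has more. Block-averaging is a doubly stochastic operation, so it can never lower the Shannon entropy.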
I think there's another response, which is that the best way to think about the probability distributions in statistical mechanics is not as accurate representations of our degrees of belief. The distributions are constructed to remove distinctions between microscopic states that are irrelevant to our macroscopic interactions with the system. Suppose I pour a blob of milk into a cup of coffee on the right side of the cup and then stir. Eventually the milk will be completely mixed with the coffee. If I had poured the blob on the left side of the cup, the milk would also eventually have ended up in a mixed state. Now, technically, my state of knowledge about the microstate of the mixed cup is different in these two cases. In the first case I know that the microstate must be one that evolves from the milk being poured on the right. In the second case I know it must be one that evolves from the milk being poured on the left. If the dynamics of the cup are information-preserving, then these are disjoint subsets of phase space. If I was updating as a Bayesian, the distributions would be totally different from one another.
But the thing is, the original position of the blob of milk makes no difference to my practical ability to interact with the milk and coffee system now that the milk is mixed. I might remember this original position, but I cannot now use that information to extract work from the system. My causal capacities are not sufficiently fine-grained to allow me to do that. So the information is irrelevant to how I now treat the system, from a thermodynamic point of view. To conserve computational resources, I might as well pick a distribution that ignores this information. That distribution will not be the distribution that best represents my knowledge of the system, but it will be the distribution that most effectively allows me to plan interactions with the system.
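The milk example can be mimicked in a toy model, with the caveat that the setup is entirely my own invention: 64 microstates, an affine bijection mod 64 as the dynamics, and 8-state macroscopic cells. "Poured on the left" and "poured on the right" are two disjoint starting regions. The exactly evolved distributions never overlap, so their total variation distance stays at 1, while the smoothed distributions already start to overlap after a single step.

```python
N = 64      # number of microstates (hypothetical toy phase space)
BLOCK = 8   # microstates per macroscopic cell

def evolve(p, a=5, b=3):
    """Information-preserving dynamics: the bijection x -> (a*x + b) mod N."""
    q = [0.0] * N
    for x, px in enumerate(p):
        q[(a * x + b) % N] = px
    return q

def coarse_grain(p):
    """Average the probabilities within each macroscopic cell."""
    q = []
    for start in range(0, N, BLOCK):
        mass = sum(p[start:start + BLOCK])
        q.extend([mass / BLOCK] * BLOCK)
    return q

def total_variation(p, q):
    """Total variation distance: 0 for identical, 1 for disjoint support."""
    return 0.5 * sum(abs(x - y) for x, y in zip(p, q))

def uniform_on(cells):
    """Uniform distribution over the given set of microstates."""
    cells = set(cells)
    return [1.0 / len(cells) if x in cells else 0.0 for x in range(N)]

left = uniform_on(range(0, BLOCK))        # "milk poured on the left"
right = uniform_on(range(N - BLOCK, N))   # "milk poured on the right"

left_exact, right_exact = evolve(left), evolve(right)
tv_exact = total_variation(left_exact, right_exact)
tv_coarse = total_variation(coarse_grain(left_exact), coarse_grain(right_exact))
print(tv_exact, tv_coarse)
```

A bijection maps disjoint sets to disjoint sets, which is why the exact distributions stay maximally distinguishable no matter how long the system evolves. Only the smoothed description forgets where the blob started.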
So I guess ultimately I agree with Shalizi. Thinking of thermodynamic entropy as the same thing as subjective uncertainty is wrong. This doesn't mean it doesn't have a lot to do with subjective uncertainty, though, since our uncertainty about systems is a very important constraint on our ability to interact with them.
Link to the Question
I haven't gotten an answer on this yet and I set up a bounty; I figured I'd link it here too in case any stats/physics people care to take a crack at it.