I think you're ignoring the difference between the Boltzmann and Gibbs entropy, both here and in your original comment. This is going to be long, so I apologize in advance.
Gibbs entropy is a property of ensembles, so it doesn't change when there is a spontaneous fluctuation towards order of the type you describe. As long as the gross constraints on the system remain the same, the ensemble remains the same, so the Gibbs entropy doesn't change. And it is the Gibbs entropy that is most straightforwardly associated with the Shannon entropy. If you interpret the ensemble as a probability distribution over phase space, then the Gibbs entropy of the ensemble is just the Shannon entropy of the distribution (ignoring some irrelevant and anachronistic constant factors). Everything you've said in your comments is perfectly correct, if we're talking about Gibbs entropy.
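To make the identification concrete, here are the two formulas side by side. The notation is my own choice for illustration, not anything from the thread: \rho is the ensemble density over phase-space points \Gamma, k_B is Boltzmann's constant, and p_i is a discrete probability distribution.

```latex
S_G[\rho] = -k_B \int \rho(\Gamma)\,\ln \rho(\Gamma)\,d\Gamma,
\qquad
H[p] = -\sum_i p_i \log_2 p_i .
```

If you chop phase space into cells of equal volume and set p_i to the probability of cell i, the two agree up to the factor k_B \ln 2 (units and base of the logarithm) and an additive constant set by the cell size. Those are the constant factors being ignored above.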
Boltzmann entropy, on the other hand, is a property of regions of phase space, not of ensembles or distributions. The famous Boltzmann formula equates entropy with the logarithm of the volume of a region in phase space. Now, it's true that corresponding to every phase space region there is an ensemble/distribution whose Shannon entropy is identical to the Boltzmann entropy, namely the distribution that is uniform in that region and zero elsewhere. But the converse isn't true. If you're given a generic ensemble or distribution over phase space and also some partition of phase space into regions, it need not be the case that the Shannon entropy of the distribution is identical to the Boltzmann entropy of any of the regions.
So I don't think it's accurate to say that Boltzmann and Shannon entropy are the same concept. Gibbs and Shannon entropy are the same, yes, but Boltzmann entropy is a less general concept. Even if you interpret Boltzmann entropy as a property of distributions, it is only identical to the Shannon entropy for a subset of possible distributions, those that are uniform in some region and zero elsewhere.
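A small numerical sketch of the point, under assumptions of my own: phase space is discretized into 8 equal-volume cells, k_B is set to 1, and the two-region partition and both distributions are made up for illustration.

```python
import math

def shannon_entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution over phase-space cells."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def boltzmann_entropy(num_cells):
    """Boltzmann entropy (k_B = 1) of a region made of num_cells equal-volume cells."""
    return math.log(num_cells)

# Hypothetical partition of an 8-cell phase space into two regions (cell counts).
partition = {"A": 2, "B": 6}
region_entropies = {name: boltzmann_entropy(n) for name, n in partition.items()}

# Distribution that is uniform on region B and zero on region A.
uniform_on_B = [0.0, 0.0] + [1 / 6] * 6

# A generic distribution spread unevenly over all 8 cells.
skewed = [0.5, 0.1, 0.1, 0.1, 0.05, 0.05, 0.05, 0.05]

# The uniform-on-a-region case reproduces that region's Boltzmann entropy.
matches_B = math.isclose(shannon_entropy(uniform_on_B), region_entropies["B"])

# The generic case does not reproduce the Boltzmann entropy of any region.
matches_any = any(
    math.isclose(shannon_entropy(skewed), s) for s in region_entropies.values()
)

print("uniform on B matches region B:", matches_B)
print("skewed matches some region:", matches_any)
```

The uniform distribution recovers log(6) exactly, while the skewed one lands strictly between the two regions' entropies, so no region in this partition has it as its Boltzmann entropy.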
As for the question of whether Boltzmann entropy can decrease spontaneously in a closed system -- it really depends on how you partition phase space into Boltzmann macro-states (which are just regions of phase space, as opposed to Gibbs macro-states, which are ensembles). If you define the regions in terms of the gross experimental constraints on the system (e.g. the volume of the container, the external pressure, the external energy function, etc.), then it will indeed be true that the Boltzmann entropy can't change without some change in the experimental constraints. Trivially true, in fact. As long as the constraints remain constant, the system remains within the same Boltzmann macro-state, and so the Boltzmann entropy must remain the same.
However, this wasn't how Boltzmann himself envisioned the partitioning of phase space. In his original "counting argument" he partitioned phase space into regions based on the collective properties of the particles themselves, not the external constraints. So from his point of view, the particles all being scrunched up in one corner of the container is not the same macro-state as the particles being uniformly spread throughout the container. It is a macro-state (region) of smaller volume, and therefore of lower Boltzmann entropy. So if you partition phase space in this manner, the entropy of a closed system can decrease spontaneously. It's just enormously unlikely. It's worth noting that subsequent work in the Boltzmannian tradition, ranging from the Ehrenfests to Penrose, has more or less adopted Boltzmann's method of delineating macro-states in terms of the collective properties of the particles, rather than the external constraints on the system.
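Here is a toy version of that kind of counting, and I should stress it is only a toy and not Boltzmann's actual construction. I'm assuming N distinguishable particles, a container split into a left and right half, a macro-state labelled only by how many particles sit in the left half, every left/right assignment equally likely, and k_B = 1. The function names are mine.

```python
import math

def multiplicity(n_total, n_left):
    """Number of left/right assignments with exactly n_left particles on the left."""
    return math.comb(n_total, n_left)

def macrostate_entropy(n_total, n_left):
    """Boltzmann entropy (k_B = 1): log of the number of microstates in the macro-state."""
    return math.log(multiplicity(n_total, n_left))

N = 100  # hypothetical particle count, tiny compared with a real gas

# Macro-state with the particles spread evenly between the two halves.
s_spread = macrostate_entropy(N, N // 2)

# Macro-state with every particle crowded into the left half.
s_crowded = macrostate_entropy(N, N)

# Chance of finding the crowded macro-state if all assignments are equally likely.
p_crowded = multiplicity(N, N) / 2 ** N

print("entropy, evenly spread:", s_spread)
print("entropy, all on one side:", s_crowded)
print("probability of all on one side:", p_crowded)
```

The crowded macro-state contains a single assignment, so its entropy is zero, and its probability is 2^-N. Nothing forbids the system from wandering into it; the odds just shrink exponentially with the number of particles.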
Boltzmann's manner of talking about entropy and macro-states seems necessary if you want to talk about the entropy of the universe as a whole increasing, which is something Carroll definitely wants to talk about. The increase in the entropy of the universe is a consequence of spontaneous changes in the configuration of its constituent particles, not a consequence of changing external constraints (unless you count the expansion of the universe, but that is not enough to fully account for the change in entropy on Carroll's view).
This is going to be a somewhat technical reply, but here goes anyway.
Boltzmann entropy, on the other hand, is a property of regions of phase space, not of ensembles or distributions. The famous Boltzmann formula equates entropy with the logarithm of the volume of a region in phase space. Now, it's true that corresponding to every phase space region there is an ensemble/distribution whose Shannon entropy is identical to the Boltzmann entropy, namely the distribution that is uniform in that region and zero elsewhere.
You cannot calculate the Shannon entro...
Sean Carroll et al. posted a preprint with the above title. Sean also discusses it on his blog.
While I am a physicist by training, statistical mechanics and thermodynamics are not my strong suit, and I hope someone with expertise in the area can give their perspective on the paper. For now, here is my summary; apologies for any errors:
There is a tension between different definitions of entropy. Boltzmann entropy, which counts macroscopically indistinguishable microstates, essentially always increases, with only extremely rare decreases. Gibbs/Shannon entropy, which measures our uncertainty about a system, can decrease if an observer examines the system and learns something new about it. Jaynes had a paper on that topic, Eliezer discussed this in the Sequences, and spxtr recently wrote a post about it. Now Carroll and collaborators propose the "Bayesian Second Law" that quantifies this decrease in Gibbs/Shannon entropy due to a measurement:
[...] we derive the Bayesian Second Law of Thermodynamics, which relates the original (un-updated) distribution at initial and final times to the updated distribution at initial and final times. That relationship makes use of the cross entropy between two distributions [...]
[...] the Bayesian Second Law (BSL) tells us that this lack of knowledge — the amount we would learn on average by being told the exact state of the system, given that we were using the un-updated distribution — is always larger at the end of the experiment than at the beginning (up to corrections because the system may be emitting heat)
This last point seems to resolve the tension between the two definitions of entropy, and has applications to non-equilibrium processes, where an observer is replaced with an outcome of some natural process, such as RNA self-assembly.
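For anyone unfamiliar with the cross entropy mentioned in the quotes, here is a minimal sketch of that one quantity. It does not implement the Bayesian Second Law itself: there is no time evolution and no heat term, and the two four-state distributions are invented for illustration.

```python
import math

def entropy(p):
    """Shannon entropy (nats) of a discrete distribution p."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    """Average surprise, scored with q, when the state is actually drawn from p."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

def kl_divergence(p, q):
    """Relative entropy D(p || q); requires q > 0 wherever p > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical un-updated distribution over four states (before any measurement).
un_updated = [0.25, 0.25, 0.25, 0.25]

# Hypothetical updated distribution after conditioning on a measurement outcome.
updated = [0.7, 0.2, 0.1, 0.0]

h_cross = cross_entropy(updated, un_updated)
h_updated = entropy(updated)
gap = kl_divergence(updated, un_updated)

print("cross entropy H(updated, un-updated):", h_cross)
print("entropy of updated distribution:", h_updated)
print("relative entropy:", gap)
```

The cross entropy equals the entropy of the updated distribution plus the relative entropy between the two, so it is never smaller than the updated distribution's own entropy. That matches the quoted description of it as what you would learn on average from being told the exact state while still using the un-updated distribution.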