This is going to be a somewhat technical reply, but here goes anyway.
Boltzmann entropy, on the other hand, is a property of regions of phase space, not of ensembles or distributions. The famous Boltzmann formula equates entropy with the logarithm of the volume of a region in phase space. Now, it's true that corresponding to every phase space region there is an ensemble/distribution whose Shannon entropy is identical to the Boltzmann entropy, namely the distribution that is uniform in that region and zero elsewhere.
You cannot calculate the Shannon entropy of a continuous distribution, so this doesn't make sense. However, I see what you're getting at here: if we assume that all parts of the phase space have equal probability of being visited, then the 'size' of the phase space can be taken as proportional to the 'number' of microstates (this is studied under ergodic theory). But to make this argument work for actual physical systems, where we want to calculate real quantities from theoretical considerations, the phase space must be 'discretized' in some way. A very simple way of doing this is the Sackur-Tetrode formulation, which discretizes a continuous space using the Heisenberg uncertainty principle (HUP). ('Discretize' is the best word I can come up with here; what I mean is not listing the microstates, but expressing the volume of the phase space in units of some definite elementary volume.) But there's a catch. To be able to use the HUP, you have to formulate the phase space in terms of complementary parameters, for instance position and momentum, or time and energy.
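For concreteness, here is a sketch of the Sackur-Tetrode formula, S/(N k_B) = ln[V/(N λ³)] + 5/2, with thermal wavelength λ = h/√(2π m k_B T). Planck's constant h is what sets the elementary phase-space cell. The choice of helium at 298.15 K and 1 bar is only an illustrative assumption; under it the result should land close to the tabulated standard molar entropy of helium, roughly 126 J/(mol·K).

```python
import math

# Exact SI / CODATA constants
H = 6.62607015e-34       # Planck constant, J s
K_B = 1.380649e-23       # Boltzmann constant, J/K
N_A = 6.02214076e23      # Avogadro constant, 1/mol
AMU = 1.66053906660e-27  # atomic mass unit, kg

def sackur_tetrode_molar_entropy(mass_amu, temperature, pressure):
    """Molar entropy in J/(mol K) of a monatomic ideal gas.

    Phase space is measured in cells of volume h^3 per particle, which is
    what turns the continuous phase-space volume into a finite count.
    """
    m = mass_amu * AMU
    # Thermal de Broglie wavelength
    lam = H / math.sqrt(2 * math.pi * m * K_B * temperature)
    # Ideal gas: volume per particle V/N = k_B T / P
    volume_per_particle = K_B * temperature / pressure
    s_per_particle = math.log(volume_per_particle / lam**3) + 2.5  # units of k_B
    return N_A * K_B * s_per_particle

s_he = sackur_tetrode_molar_entropy(4.002602, 298.15, 1.0e5)
print(f"{s_he:.1f} J/(mol K)")  # roughly 126 J/(mol K)
```

Because h enters only through λ³, changing the assumed cell volume would shift the entropy per particle by a constant.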
However, this wasn't how Boltzmann himself envisioned the partitioning of phase space. In his original "counting argument" he partitioned phase space into regions based on the collective properties of the particles themselves, not the external constraints.
My previous point illustrates why this naive view is not physical: you can't discretize just any kind of system. With some systems, like a box full of particles that can have arbitrary position and momentum, you get infinite (non-physical) values for entropy. It's easy to see why you can then get a fluctuation in entropy: infinity 'minus' some number is still infinity!
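A sketch of that divergence, assuming the elementary cell size is left as a free parameter (`cell_size` and the other names are hypothetical): for a fixed region of a d-dimensional phase space the cell count is W = V / cell_size^d, so ln W grows without bound as the cell shrinks toward a true continuum. Here d = 2, i.e. one position and one momentum coordinate for a single particle in a one-dimensional box; the region's volume is arbitrary.

```python
import math

def cell_count_entropy(region_volume, cell_size, dim):
    """ln W in units of k_B, where W = region_volume / cell_size**dim cells."""
    return math.log(region_volume / cell_size**dim)

REGION_VOLUME = 1.0  # hypothetical phase-space volume, arbitrary units
DIM = 2              # one position + one momentum coordinate

for cell_size in (1e-3, 1e-6, 1e-9, 1e-12):
    s = cell_count_entropy(REGION_VOLUME, cell_size, DIM)
    print(f"cell_size = {cell_size:.0e}: ln W = {s:.2f}")
# Each 1000-fold shrink of the cell adds 2 * ln(1000), about 13.82,
# so ln W has no finite limit as cell_size -> 0.
```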
I tried rewording this argument several times, but I'm still not satisfied with my explanation. Nevertheless, this is how it is. Modeling entropy in terms of the collective properties of particles may be theoretically interesting, but it is not always a physically realistic way to calculate the entropy of a system. If you go the Sackur-Tetrode route, though, you see that Boltzmann entropy is the same thing as Shannon entropy.
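Once a phase-space region has been divided into W equal cells, the correspondence is easy to check numerically: the uniform distribution over those cells has Shannon entropy ln W, which is the Boltzmann entropy S/k_B = ln W, while any non-uniform distribution on the same cells has less. W = 1000 and the skewed distribution are arbitrary illustrative choices.

```python
import math

def shannon_entropy(probs):
    """Shannon entropy in nats; zero-probability cells contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def boltzmann_entropy(num_cells):
    """Boltzmann entropy S / k_B = ln W for a region of W equal cells."""
    return math.log(num_cells)

W = 1000  # hypothetical number of cells in the region
uniform = [1.0 / W] * W
print(shannon_entropy(uniform), boltzmann_entropy(W))  # both about 6.9078

# Put half the probability in one cell: same support, lower Shannon entropy.
skewed = [0.5] + [0.5 / (W - 1)] * (W - 1)
print(shannon_entropy(skewed) < boltzmann_entropy(W))  # True
```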
Boltzmann's original combinatorial argument already presumed a discretization of phase space, derived from a discretization of single-molecule phase space, so we don't need to incorporate quantum considerations to "fix" it. The combinatorics relies on dividing single-particle state space into tiny discrete boxes, then looking at the number of different ways in which particles could be distributed among those boxes, and observing that there are more ways for the particles to be spread out evenly among the boxes than for them to be clustered. Witho...
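The counting described above can be sketched with a multinomial coefficient: the number of ways to place N distinguishable particles into boxes with occupation numbers n_1, ..., n_k is N!/(n_1! ... n_k!). Twenty particles in four boxes is an arbitrary illustrative choice.

```python
from math import factorial, log, prod

def num_arrangements(occupations):
    """Multinomial coefficient: ways to put sum(occupations) distinguishable
    particles into boxes with the given occupation numbers."""
    total = sum(occupations)
    return factorial(total) // prod(factorial(n) for n in occupations)

even = [5, 5, 5, 5]        # spread out evenly
lopsided = [14, 2, 2, 2]   # mostly clustered
clustered = [20, 0, 0, 0]  # all in one box

for name, occ in [("even", even), ("lopsided", lopsided), ("clustered", clustered)]:
    w = num_arrangements(occ)
    print(f"{name:>9}: W = {w:,}, ln W = {log(w):.2f}")
```

The even arrangement wins by a wide margin (about 1.2 × 10^10 ways versus a single way for the fully clustered one), which is the sense in which spread-out configurations dominate.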
Sean Carroll et al. posted a preprint with the above title. Sean also discusses it on his blog.
While I am a physicist by training, statistical mechanics and thermodynamics are not my strong suit, and I hope someone with expertise in the area can give their perspective on the paper. For now, here is my summary; apologies for any errors:
There is a tension between different definitions of entropy. Boltzmann entropy, which counts macroscopically indistinguishable microstates, almost always increases; decreases are possible but extremely rare. Gibbs/Shannon entropy, which quantifies our uncertainty about a system, can decrease if an observer examines the system and learns something new about it. Jaynes had a paper on the topic, Eliezer discussed it in the Sequences, and spxtr recently wrote a post about it. Now Carroll and collaborators propose the "Bayesian Second Law", which quantifies this decrease in Gibbs/Shannon entropy due to a measurement:
[...] we derive the Bayesian Second Law of Thermodynamics, which relates the original (un-updated) distribution at initial and final times to the updated distribution at initial and final times. That relationship makes use of the cross entropy between two distributions [...]
[...] the Bayesian Second Law (BSL) tells us that this lack of knowledge — the amount we would learn on average by being told the exact state of the system, given that we were using the un-updated distribution — is always larger at the end of the experiment than at the beginning (up to corrections because the system may be emitting heat)
This last point seems to resolve the tension between the two definitions of entropy, and has applications to non-equilibrium processes, where an observer is replaced with an outcome of some natural process, such as RNA self-assembly.
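A toy illustration of the ingredients, with every number chosen for illustration only: an observer starts from a uniform (un-updated) distribution over 4 microstates, and a measurement rules out two of them. Bayes' rule gives the updated distribution, whose Shannon entropy is lower, which is the kind of decrease described above. The cross entropy H(p, q) = -Σ p(x) log q(x) is read here with p as the updated and q as the un-updated distribution (an interpretation of the quoted description, not the paper's notation); it equals H(p) + D_KL(p‖q), so it is never smaller than the updated distribution's own entropy. This does not implement the Bayesian Second Law itself.

```python
import math

def entropy(p):
    """Shannon entropy in bits; zero-probability states contribute nothing."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    """H(p, q) = -sum p log2 q: average surprisal, under q, of states drawn from p."""
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)

def kl_divergence(p, q):
    """D_KL(p || q) in bits; assumes q > 0 wherever p > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def bayes_update(prior, likelihood):
    """Posterior P(x | outcome) proportional to P(outcome | x) P(x), normalized."""
    unnormalized = [lk * pr for lk, pr in zip(likelihood, prior)]
    evidence = sum(unnormalized)
    return [u / evidence for u in unnormalized]

# Toy system with 4 microstates; the observer starts out knowing nothing.
unupdated = [0.25, 0.25, 0.25, 0.25]
# Hypothetical measurement outcome that is only possible in states 0 and 1.
likelihood = [1.0, 1.0, 0.0, 0.0]
updated = bayes_update(unupdated, likelihood)  # [0.5, 0.5, 0.0, 0.0]

print(entropy(unupdated))                 # 2.0: uncertainty before measuring
print(entropy(updated))                   # 1.0: the measurement lowered it
print(cross_entropy(updated, unupdated))  # 2.0
print(kl_divergence(updated, unupdated))  # 1.0, i.e. 2.0 - 1.0
```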