There is so much confusion surrounding the topic of entropy, which is somewhat sad, since it's fundamentally a very well-defined and useful concept. Entropy is my strong suit, and I'll try to help.
There are no 'different definitions' of entropy. Boltzmann and Shannon entropy are the same concept. The problem is that information theory by itself doesn't give the complete physical picture of entropy. Shannon entropy only tells you what the entropy of a given distribution is, but it doesn't tell you what the distribution of states for a physical system is. This is the root of the 'tension' that you're describing. Many of the problems in reconciling information theory with statistical mechanics arise because we often don't have a clear idea what the distribution of states of a given system is.
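To make the "same concept" claim concrete, here is a minimal sketch in Python. It assumes units where Boltzmann's constant is 1 and uses made-up distributions over eight microstates; the function name `shannon_entropy` is just an illustrative choice. When every microstate is equally likely, the Shannon formula reduces to the logarithm of the number of microstates, which is the Boltzmann count.

```python
import math

def shannon_entropy(probs):
    """Shannon/Gibbs entropy -sum(p * ln p) in nats; p == 0 terms contribute 0."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# Hypothetical example: W equally likely microstates.
W = 8
uniform = [1.0 / W] * W

# For a uniform distribution the Shannon entropy equals ln W,
# i.e. Boltzmann's S = k ln W with k set to 1 (an assumption of this sketch).
print(shannon_entropy(uniform), math.log(W))

# A non-uniform distribution over the same states has lower entropy.
# These probabilities are invented for illustration and sum to 1.
skewed = [0.5, 0.2, 0.1, 0.1, 0.05, 0.03, 0.01, 0.01]
print(shannon_entropy(skewed))
```

The formula is the easy part. As said above, the physics lies in deciding which distribution to feed into it.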
which counts macroscopically indistinguishable microstates always increases, except for extremely rare decreases.
The 2nd law is never violated, not even a little. Unfortunately, the idea that entropy itself can decrease in a closed system is a misconception that has become very widespread. Disorder can sometimes decrease in a closed system, but disorder has nothing to do with entropy!
Gibbs/Shannon entropy, which counts our knowledge of a system, can decrease if an observer examines the system and learns something new about it.
This is exactly the same as Boltzmann entropy. This is the origin of Maxwell's demon, and it doesn't violate the 2nd law.
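As a rough illustration of the observer's entropy dropping after a measurement, here is a small Python sketch. The four-state system, the noisy "left half" detector, and its 0.9/0.1 reliabilities are all hypothetical numbers chosen for this example. It only tracks the observer's distribution over the system and does not model the demon's own memory.

```python
import math

def shannon_entropy(probs):
    """Shannon/Gibbs entropy in nats; p == 0 terms contribute 0."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def bayes_update(prior, likelihood):
    """Posterior proportional to prior * likelihood, then normalized."""
    unnormalized = [p * l for p, l in zip(prior, likelihood)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]

# Observer starts out ignorant of which of four microstates is occupied.
prior = [0.25, 0.25, 0.25, 0.25]

# Hypothetical noisy detector: it reports "left" with probability 0.9
# when the system is in state 0 or 1, and with probability 0.1 otherwise.
likelihood_left = [0.9, 0.9, 0.1, 0.1]

# The observer sees "left" and updates.
posterior = bayes_update(prior, likelihood_left)

print(shannon_entropy(prior))      # ln 4
print(shannon_entropy(posterior))  # smaller: the observer learned something
```

With a non-uniform prior, one particular outcome can raise the entropy. Averaged over all possible outcomes, though, conditioning never increases it.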
the Bayesian Second Law (BSL) tells us that this lack of knowledge — the amount we would learn on average by being told the exact state of the system, given that we were using the un-updated distribution — is always larger at the end of the experiment than at the beginning (up to corrections because the system may be emitting heat)
This is precisely correct and is the proper way to view entropy. Ideas similar to this have been floating around for quite some time, and this work doesn't seem to be anything fundamentally new. It just seems to be a rephrasing of existing ideas. However, if it can help people understand entropy, then I think it's quite a valuable rephrasing.
I was thinking about writing a series of blog posts explaining entropy in a rigorous yet simple way, and got to the draft level before real-world commitments caught up with me. But if anyone is interested and knows about the subject and is willing to offer their time to proofread, I'm willing to have a go at it again.
this work doesn't seem to be anything fundamentally new. It just seems to be a rephrasing of existing ideas. However, if it can help people understand entropy, then I think it's quite a valuable rephrasing.
Sean Carroll seems to think otherwise, judging by the abstract:
We derive a generalization of the Second Law of Thermodynamics that uses Bayesian updates to explicitly incorporate the effects of a measurement of a system at some point in its evolution.
[...]
...We also derive refined versions of the Second Law that bound the entropy increase from below by
Sean Carroll et al. posted a preprint with the above title. Sean also has a discussion of it on his blog.
While I am a physicist by training, statistical mechanics and thermodynamics are not my strong suit, and I hope someone with expertise in the area can give their perspective on the paper. For now, here is my summary, with apologies for any errors:
There is a tension between different definitions of entropy: Boltzmann entropy, which counts macroscopically indistinguishable microstates, always increases, except for extremely rare decreases. Gibbs/Shannon entropy, which counts our knowledge of a system, can decrease if an observer examines the system and learns something new about it. Jaynes had a paper on that topic, Eliezer discussed this in the Sequences, and spxtr recently wrote a post about it. Now Carroll and collaborators propose the "Bayesian Second Law" that quantifies this decrease in Gibbs/Shannon entropy due to a measurement:
[...] we derive the Bayesian Second Law of Thermodynamics, which relates the original (un-updated) distribution at initial and final times to the updated distribution at initial and final times. That relationship makes use of the cross entropy between two distributions [...]
[...] the Bayesian Second Law (BSL) tells us that this lack of knowledge — the amount we would learn on average by being told the exact state of the system, given that we were using the un-updated distribution — is always larger at the end of the experiment than at the beginning (up to corrections because the system may be emitting heat)
This last point seems to resolve the tension between the two definitions of entropy, and has applications to non-equilibrium processes, where an observer is replaced with an outcome of some natural process, such as RNA self-assembly.
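For readers unfamiliar with the cross entropy mentioned in the quotes above, here is a minimal Python sketch of the quantity itself. The two distributions are invented for illustration. Taking the average over the updated distribution while scoring with the un-updated one is my reading of "the amount we would learn on average [...] given that we were using the un-updated distribution", not something checked against the paper's notation. The sketch does not implement the Bayesian Second Law or its heat correction.

```python
import math

def shannon_entropy(p):
    """H(p) = -sum p ln p, in nats."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    """H(p, q) = -sum p ln q: average surprise when outcomes follow p
    but the observer scores them with q. Requires q > 0 wherever p > 0."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

def kl_divergence(p, q):
    """D(p || q) = sum p ln(p / q), the extra surprise from using q instead of p."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical distributions over four microstates.
updated = [0.45, 0.45, 0.05, 0.05]    # after taking a measurement into account
unupdated = [0.40, 0.30, 0.20, 0.10]  # what we would believe without it

h_cross = cross_entropy(updated, unupdated)
h_self = shannon_entropy(updated)
d_kl = kl_divergence(updated, unupdated)

# Standard identity: H(p, q) = H(p) + D(p || q), so H(p, q) >= H(p).
print(h_cross, h_self + d_kl)
```

The identity in the last comment shows that the cross entropy is the ordinary entropy of the updated distribution plus a non-negative penalty for still using the un-updated one.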