Glad to see engagement on this. I should probably respond to some of these points, but before doing so, I want to point to where I've already done work on this, since much of that work either concedes your points or addresses them.
First, I think you should read the paper I wrote with Scott that extended the thoughts from his post. It certainly doesn't address all of this, but we were very clear that adversarial Goodhart was less well understood than the other modes and needed further work. We also drew the connection to "tails come apart" more clearly, and clarified some of the sub-cases of both extremal and causal Goodhart. Following that, I wrote another post on the topic, trying to expand on the points made in the paper, but specifically excluding multi-agent issues, because they were hard and I wasn't clear enough about how they worked.
I tried to do a bit of that work in a paper, Multiparty Dynamics and Failure Modes for Machine Learning and Artificial Intelligence. This attempts to provide a categorization for multi-agent cases similar to the one made in Scott's post. It made a few key points that I think need further discussion about the relationship to embedded agents, and other issues. I was less successful than I hoped at cutting through the confusion, but a key point it does make is that all multi-agent failures are actually single-agent failure modes that are caused by misaligned goals or coordination failures. (And these aren't all principal-agent issues, though I agree that many are. For instance, some cases are tragedies of the commons, and others are more direct corruption of the other agents.) I also summarized the paper a bit and expanded on certain key points in another LessWrong post.
And since I'm giving a reading list, I also think my even more recent, but only partially completed, sequence of posts on optimization and selection versus control (in the single-agent cases) might further clarify some of the points about Regressional versus Extremal Goodhart. Post one of that sequence is here.
Some time ago, I mentioned in a comment some issues I had with the Goodhart Taxonomy:
After thinking about things more, I'd like to expand on these points:
"Regressional Goodhart" isn't Goodhart
I am not 100% sure I correctly understand what is meant by "Regressional Goodhart". But looking at the example from the post:
We see that, if optimizing for basketball ability (V) given only height (U), we won't get the best basketball player, simply because they are imperfectly correlated.
If "Regressional Goodhart" is simply the observation that we won't perfectly optimize something we can't perfectly measure, then not only is this obvious to anyone (even if they haven't heard of Goodhart's Law), it also doesn't seem to match the definition of "Goodhart", which is usually formulated in one of two ways:
G1) When a measure becomes a target, it ceases to be a good measure
G2) Any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes
Suppose U,V are imperfectly correlated, but their relationship is monotonic, and the other 3 categories of Goodhart don't apply (e.g. the "straightforward" basketball example, where a 7 foot tall player is better on average than one who's merely 6'3'').
Then if you optimize for V through U, the latter won't "cease to be a good measure" (as in G1): it was a decent but imperfect measure before, and it didn't become any worse as a measure at the upper end. Similarly, the statistical regularity between U and V will continue to hold in this region (unlike G2).
(Technically, the correlation coefficient between two variables usually does decrease under range restriction, but I don't think this is the phenomenon that the OP had in mind).
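The distinction in the last two paragraphs can be checked with a small simulation. This is only a sketch under assumptions the post does not make: U and V are taken to be standard normal with correlation `rho`, and all names (`select_on_proxy`, `pearson`, `slope`) and parameter values are hypothetical. It selects the top 5% by U and compares the U-V relationship in that group with the full population.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient, standard library only."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    return cov / var_x


def select_on_proxy(n=20_000, rho=0.7, top_fraction=0.05, seed=0):
    """Draw (U, V) pairs with correlation rho, then keep the top slice by U."""
    rng = random.Random(seed)
    noise_sd = (1 - rho ** 2) ** 0.5  # keeps Var(V) = 1 (illustrative assumption)
    u = [rng.gauss(0, 1) for _ in range(n)]
    v = [rho * ui + noise_sd * rng.gauss(0, 1) for ui in u]
    ranked = sorted(zip(u, v), reverse=True)  # highest U first
    top = ranked[: int(n * top_fraction)]
    top_u = [pair[0] for pair in top]
    top_v = [pair[1] for pair in top]
    return {
        "corr_all": pearson(u, v),
        "corr_top": pearson(top_u, top_v),  # shrinks under range restriction
        "slope_all": slope(u, v),
        "slope_top": slope(top_u, top_v),   # the regularity itself
        "v_of_top_u": ranked[0][1],         # V of the single highest-U pick
        "best_v": max(v),                   # best V available in the population
    }


if __name__ == "__main__":
    for name, value in select_on_proxy().items():
        print(f"{name}: {value:.3f}")
```

Under these assumptions, the slope of V on U in the selected group should stay close to `rho` while the correlation coefficient drops, which is the range-restriction effect from the parenthetical rather than a breakdown of the relationship. The gap between `v_of_top_u` and `best_v` is the irreducible error.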
So at this point, we can simply say this sense of "Regressional Goodhart" isn't Goodhart, and reduce the taxonomy to the other 3 categories. Or, if we think it is pointing at a phenomenon similar to the other 3 categories, we can redefine "Goodhart" slightly more broadly to include it. I have two candidates:
G3) Given a goal V and an imperfect but correlated proxy U, attempts to optimize V given only access to U will often fail
G4) Given a goal V and an imperfect but correlated proxy U, attempts to optimize V given only access to U will often fail, over and above the obvious "irreducible error" arising from the imperfection in the correlation
Personally, I favor (G4), since the term "Goodhart" for me carries some connotation of "unexpected things going wrong", in which case "Regressional Goodhart" seems to just point at the irreducible optimization error that's beside the point of the Law.
On the other hand, the Tails Come Apart phenomenon is arguably unintuitive. So if "Regressional Goodhart" stipulates that "highly" correlated variables will have more irreducible optimization error than one naively expects, then that could be a sensible definition. In any case, I feel this should be more explicit.
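One way to see how unintuitive this can be is to estimate how often the individual with the highest U is also the one with the highest V. The sketch below is illustrative only: it assumes standard normal U and V with correlation `rho`, and the function name `top_match_rate`, the population size, and the trial count are all hypothetical choices.

```python
import random


def top_match_rate(rho=0.9, population=500, trials=300, seed=1):
    """Return (fraction of trials where argmax U is also argmax V,
    mean shortfall in V from picking the top-U individual)."""
    rng = random.Random(seed)
    noise_sd = (1 - rho ** 2) ** 0.5  # keeps Var(V) = 1 (illustrative assumption)
    matches = 0
    total_shortfall = 0.0
    for _ in range(trials):
        u = [rng.gauss(0, 1) for _ in range(population)]
        v = [rho * ui + noise_sd * rng.gauss(0, 1) for ui in u]
        pick = max(range(population), key=u.__getitem__)  # optimize the proxy
        best_v = max(v)
        if v[pick] == best_v:
            matches += 1
        total_shortfall += best_v - v[pick]
    return matches / trials, total_shortfall / trials


if __name__ == "__main__":
    for rho in (0.5, 0.9, 0.99):
        rate, shortfall = top_match_rate(rho=rho)
        print(f"rho={rho}: match rate {rate:.2f}, mean shortfall {shortfall:.2f}")
```

If the match rate at a correlation like 0.9 comes out much lower than a reader would guess, that supports reading "Regressional Goodhart" as a claim about underestimated irreducible error rather than about its mere existence.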
Further Goodhart Variants
[epistemic status: this section still feels very half-baked, mostly including it because "why not"]
On reflection, I do think "Adversarial" is a good category of Goodhart to have, but it nevertheless seems extremely broad compared to the others, and worth trying to further subdivide. Here are some tentative stabs at doing so, by trying to answer the question:
Misshapen gradients vs. unforeseen maxima
When specifying a reward function, we can err by having its global optimum be misaligned with what we actually want. This has been described and exemplified here.
On the other hand, even if the global optima match up, we can still run into problems if the incentive gradients on our reward function are locally messed up, e.g. taxing all income at 15% if one earns more than $K, but at 10% if one earns $K or less, which sets up bad incentives for those with an income near $K. [ETA] Or this example:
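Going back to the tax threshold, the local distortion is easy to make concrete. The sketch below assumes, as the wording suggests, that the rate applies to the whole income rather than only to the portion above the threshold. The threshold value of 50,000 and the function names are hypothetical.

```python
def net_income(gross, threshold=50_000.0, low_rate=0.10, high_rate=0.15):
    """Net income when the rate on ALL income depends on which side
    of the threshold the gross income falls (a whole-income 'cliff')."""
    rate = high_rate if gross > threshold else low_rate
    return gross * (1.0 - rate)


def break_even_gross(threshold=50_000.0, low_rate=0.10, high_rate=0.15):
    """Gross income above the threshold at which net income catches up
    with the net income earned exactly at the threshold."""
    return threshold * (1.0 - low_rate) / (1.0 - high_rate)


if __name__ == "__main__":
    for gross in (49_000, 50_000, 50_001, 52_000, 53_000):
        print(f"gross {gross:>6}: net {net_income(gross):.2f}")
    print(f"break-even gross: {break_even_gross():.2f}")
```

With these illustrative numbers, anyone earning between the threshold and roughly 52,941 takes home less than someone earning exactly 50,000, so the incentive gradient points the wrong way in that band even though earning far more is still better.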
Corrupted information
On a first take, it seems like one could define wireheading as "Adversarial Goodhart where the agent has enough influence over the reporting that it's easier for it to deceive the principal than to actually achieve the goal."
Compare with examples like this, from Wikipedia:
How different are these? Should they be put in the same bucket?
Simple project idea
It could make for an interesting project if someone simply took a bunch of examples of Goodhart's Law, and put them into the Goodhart Taxonomy. Examples can come from:
This could be instructive for testing the Goodhart Taxonomy (and my proposed modifications to it) or revealing new and better categories.