I really enjoyed Brandon Sanderson's Secret Project #3 and I recommend it to everyone. Without spoiling anything, here is a fun fact: in it, people stack pebbles into heaps, much like in Sorting Pebbles Into Correct Heaps, a text from this community I still think about semi-frequently (another is The Virtue of Silence). So if you take recommendations from random lesswrong users, give it a try!
Let P be a proxy and V a value. Based on past observations, P is correlated with V.
Increase P! (Either directly, or by rewarding the agents inside the system for increasing P; who cares.)
Two cases:
P does not cause V
P causes V
Case 1: Wow, Goodhart is a genius! Even though I had a correlation, I increased one variable and the other did not increase!
Case 2: Wow, you are pedantic. Obviously, if the relationship between the variables is so special that P causes V, Goodhart's law won't apply. If I increase the amount of weight lifted (proxy), then obviously I will get visibly bigger muscles (value). Booring! (Also, I'm really good at seeing causal relationships even when they don't exist (human universal), so I will basically never feel surprise when I actually find one. That will be the expected outcome, so I will look strangely at anyone trying to test Goodhart's law on any pair of variables that has even a sliver of a chance of being in a causal relationship.)
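Case 1 can be made concrete with a toy simulation (all numbers are made up for illustration): P and V are correlated because a hidden common cause C drives both, so selecting for high P predicts high V, yet forcing P high does nothing to V.

```python
# A minimal sketch of case 1: P and V are correlated through a
# common cause C, but P does not cause V.
import random

random.seed(0)

def observe(n=10_000):
    """Passively observe the system: C drives both P and V."""
    data = []
    for _ in range(n):
        c = random.gauss(0, 1)           # hidden common cause
        p = c + random.gauss(0, 0.5)     # proxy tracks C
        v = c + random.gauss(0, 0.5)     # value also tracks C
        data.append((p, v))
    return data

def intervene_on_p(n=10_000, boost=2.0):
    """Set P directly (cutting its link to C); V is untouched."""
    data = []
    for _ in range(n):
        c = random.gauss(0, 1)
        p = boost                        # intervention: force P high
        v = c + random.gauss(0, 0.5)
        data.append((p, v))
    return data

def mean_v(data):
    return sum(v for _, v in data) / len(data)

obs = observe()
high_p_obs = [row for row in obs if row[0] > 1.0]
print(f"observed mean V when P is high: {mean_v(high_p_obs):.2f}")        # clearly above 0
print(f"mean V after forcing P high:    {mean_v(intervene_on_p()):.2f}")  # near 0
```

Case 2 corresponds to replacing the intervention line with a model where V actually depends on P; then forcing P high drags V up with it, and no surprise occurs.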
You are right, that is also a possibility. I only considered cases with one intervention, because the examples I've heard given for Goodhart's law only contain one (I'm thinking of UK monetary policy, the Soviet nail factory, and other cases where some "manager" introduces an incentive toward a proxy into the system). However, multiple-intervention cases can also be interesting. Do you know of a real-world example where the first intervention on the proxy raised the target value, but the second, more extreme one, did not (or vice versa)? My intuition suggests that in the real world that type of causal influence is rare, and also, I don't think we can say that "P causes V" in those cases. Do you think that is too narrow a definition?
> Do you know of a real world example where the first intervention on the proxy raised the target value, but the second, more extreme one, did not (or vice versa)?
Here's a fictional story:
You decide to study more. Your grades go up. You like that, so you decide to study really really hard. You get burnt out. Your grades go down. (There's also an argument here that the metric - grades - isn't necessarily ideal, but that's a different thing.)*
*There might be a less extreme version involving 'you stay up late studying', and 'because you get less sleep it has less effect (memory stuff)'.
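The story above can be sketched as a toy response curve (the function and numbers are invented for illustration): grades rise with study hours at first, then fall once a burnout penalty dominates, so a moderate intervention on the proxy helps while an extreme one hurts.

```python
# A toy sketch of the study/grades story; all numbers are made up.
def expected_grade(hours_per_day):
    """Hypothetical concave response: a linear benefit from studying,
    minus a burnout penalty that grows quickly past ~6 hours/day."""
    benefit = 10 * hours_per_day
    burnout = 3 * max(0, hours_per_day - 6) ** 2
    return 40 + benefit - burnout

baseline = expected_grade(2)   # study a little
moderate = expected_grade(5)   # study more: grades go up
extreme  = expected_grade(12)  # study really really hard: grades go down

print(baseline, moderate, extreme)
```

The point is only the shape: the first intervention (2 → 5 hours) raises the outcome, while the second (5 → 12 hours) drops it below even the baseline.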
This isn't meant as an unsolvable problem - it's just that:
and
are both true.
Maybe this style of mechanism, or 'causal influence', is rare. But its (biological) nature arguably characterizes a whole domain (life). So in that area at least, it's worth taking note of.
I guess I'm saying: if you want to know whether you have to be worried about Goodhart's Law in general, I think it depends. Just spend time optimizing your metric, and spend time optimizing for your metric, and see what happens. If you want more specific feedback, I think you'll probably have to be more specific.
Even in your weightlifting example, there is a point where adding more weight no longer improves your outcome.
3 (not so easy) steps to understand consciousness:
epistemic status: layman, so this is unlikely to have any value to those well-versed in philosophy, but their input is ofc appreciated if given
1. Understand what difficult words like consciousness and qualia point to. This is hard because most of our words point to objects/relations in the physical world, and "what it is like to be someone/something"/"the blueness of blue" does not. I've seen people first getting acquainted with these words have trouble disentangling these concepts from things in the physical world, e.g., signals travelling through nerves. However, these people usually aren't that interested in philosophy of mind anyway. The weirdness of consciousness is what makes it interesting, and without noticing the weirdness, why would they be interested in it rather than in, say, the workings of the liver?
2. Understand the extreme success of physicalism (i.e., the belief that everything is physical) through the history of humanity. Ghosts, gods, and vis vitalis are the examples usually cited, but of course we could consider every phenomenon which was initially unexplained and later explained by science as a victory of the physicalist worldview. On lesswrong it's imo unlikely that there are many people who would have trouble with this point, but given broader society (religion, astrology, occult, etc.) I do consider this point difficult.
After these two steps, quick-thinking people might notice the tension between points 1 and 2: how come every time in the history of humanity we thought we had a non-physical object, we turned out to be mistaken, yet consciousness seems clearly non-physical?
There are (imo) many wrong ways to resolve this tension. The correct way, however, is one with which I believe many lesswrongers (at least if they're similar to me and dissimilar to the ideal rationalist in this respect) would have some trouble:
3. Humility. You have to conceive of the possibility that you're mistaken about an experience which is, in some ways, closest to your perception: that there is something it is like to be you. I'm not saying (at least at this point) to accept this, just that you should consider the possibility (similarly to how you would entertain a given mathematical statement being true and then being false, even though it can logically only be one of the two) or, as people around here usually put it, do a Bayesian calculation! It's important that when I say Bayesian calculation I strictly mean the calculation and not any phenomenal part of it: we want your possible counterpart in the alternate, possibly not-actual, but conceivable world where there is no phenomenal consciousness to also be able to execute the Bayesian calculation! So what is the Bayesian calculation, in detail, exactly? There are two possible worlds/possibilities whose odds we are curious about:
W1. The folk conception of non-physical consciousness exists; there is something it is like to be me.
W2. The folk conception of non-physical consciousness does not exist, there is nothing it is like to be me, BUT the world is such that my physical brain encodes the statement: "I have first-hand, direct access to my own non-physical consciousness."
In turn we have two pieces of (Bayesian) evidence:
E1. Point 2 above: the previous track record of the physicalist worldview.
E2. My immediate access to my consciousness: my belief that there is something it is like to be me and that I can't be mistaken about this.
The key here is to notice that E2 is predicted both in W1 AND in W2. We have a specific expression for this type of evidence! That's right: not evidence! (An observation predicted equally well by both hypotheses has a likelihood ratio of 1, so it leaves the odds between them unchanged.)
Therefore, once we update on E1 (with E2 contributing nothing), we end up much, much more likely to be in W2 than in W1. Sure, there is the question of why such a weird belief is encoded in our brains. Is it society? Is it biology? I don't know. But who cares? Notice that that is a question purely about the physical world: what is the causal chain leading to my incorrect belief that I have immediate access to my consciousness? Nothing to do with the hard problem.
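The calculation can be written out in odds form (the specific likelihood numbers below are illustrative placeholders; only the structure matters): posterior odds are prior odds times the likelihood ratio of each piece of evidence, and E2's ratio is 1, so it cancels out.

```python
# A sketch of the Bayesian calculation in the text. The numbers are
# illustrative; the structural point is that E2 has the same
# probability under W1 and W2, so it drops out of the odds.
def posterior_odds(prior_odds_w1_w2, lr_e1, lr_e2):
    """Odds of W1:W2 after updating on E1 and E2.
    Each lr is P(evidence | W1) / P(evidence | W2)."""
    return prior_odds_w1_w2 * lr_e1 * lr_e2

prior = 1.0      # start agnostic: W1 and W2 equally likely
lr_e1 = 1 / 100  # E1 (physicalism's track record) strongly favors W2
lr_e2 = 1.0      # E2 is predicted by BOTH worlds: likelihood ratio 1

odds = posterior_odds(prior, lr_e1, lr_e2)
print(f"posterior odds W1:W2 = {odds}")  # 0.01 — W2 is 100x more likely
```

Because `lr_e2` is exactly 1, the posterior is driven entirely by E1: the felt certainty of E2 does no Bayesian work, whatever prior you started with.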
Now, I would like to talk about an (imo) wrong resolution which might be common here: after someone understands points 1 and 2, they might try to resolve the tension by insisting that even though consciousness seems non-physical, it IS physical, or at least "supervenes" on the physical. These people are usually in the process of noticing their confusion, so I urge them to take the plunge, conceive of the possibility that they're wrong, do the Bayesian calculation, and not redefine words (similarly: ghosts are by definition non-physical even though they don't exist)!
Feedback appreciated!