Thank you for your input, I found it very informative!
I agree with your point that any aligned AI will be fully on board with avoiding value drift, which certainly takes some of the pressure off of us when it comes to researching this. I also agree that it would be best to sidestep this scenario entirely by not having a self-improving AI touch its value function at all.
In cases where a self-improving AI can alter its values, I don’t entirely agree that this would only be a concern at subhuman levels of intelligence. It seems plausible to me that an AI of human-level intelligence, or maybe slightly higher, could judge that marginally adjusting a value for improved performance is safe, only to be wrong about that. From a human perspective, I find it very difficult to reason through how slightly altering one of my values would affect my reflective reasoning about that value's importance and the acceptable range it could take. A self-improving agent would also have to make this prediction about a more intelligent version of itself, with the added complication of anticipating the impact on future iterations as well. It’s possible that an agent of human-level intelligence could do this easily, but I’m not entirely confident of that.
The main reason I bring up the scenario of a self-improving AI with access to its own values is that I see it as a clear path to performance improvement that might seem deceptively safe to some organizations conducting general AI research in the future, especially where external incentives (such as an international general AI arms race) push researchers to take risks they normally wouldn’t in order to beat the competition. If a general AI were properly aligned, I could see certain organizations allowing it to improve itself by marginally altering its values, out of fear that a rival organization would do the same.
I’m going to reflect on what you said in more depth, though. Since I’m still new to all of this, it’s very possible that there is relevant information I’m missing or not considering thoroughly.
A new study was published today with results that contradict those found in the UCSF study that you've written about.
Human Hippocampal Neurogenesis Persists Throughout Aging
The Sorrells et al. study is directly addressed twice in this paper. The first claim is that the study failed to account for medication and drug use, both of which can affect adult hippocampal neurogenesis.
The second, more interesting claim is that the Sorrells et al. study only examined very thin sections of the dentate gyrus region of the hippocampus, and that the preservation conditions of these samples likely impaired the ability to detect neurogenesis.
According to this article in today's LA Times, the UCSF group responded to the study.
The Times article goes on to describe Boldrini's response, which touches on the parts of her paper that I cited earlier.