I suspect that, at least if such an answer has good performance, it would be widely used. Quite a few problems with current ML look like some form of Goodharting, so even in a short-term pragmatic sense it would be useful. (Of course, making short-term Goodharting not a problem in practice is easier than a full alignment solution.)
At least a fair few high-status people will look over it and say it looks good, and quite a lot of people will be saying "this is what you should do for AI safety". So there is a decent chance the first project uses this technique, especially if a bunch of rationalist types throw their weight behind it. And after that, it's a decisive strategic advantage.
Donald Hobson has a good point about Goodharting, but it's simpler than that. While some people want alignment so that everyone doesn't die, the rest of us still want it for what it can do for us. If I prompt a language model with "A small cat went out to explore the world", I want it to come back with a nice children's story about a small cat exploring the world that I can show to just about any child. If I prompt a robot to "bring me a nice flower", I do not want it to steal my neighbor's rosebushes. And so on: I want it to be safe to give some random AI helper whatever lazy prompt is on my mind and have it improve things according to my preferences.
This should, in fact, be the default hypothesis, since enough people outside the EA bubble will actively want to use AI (perhaps aligned to them personally rather than to wider humanity) for their own competitive advantage, without any regard for other people's well-being or the long-term survival of humanity.
So a pivotal act, with all its implied horrors, seems to be the only realistic option.
I suppose it would depend on specific properties of the solution; most importantly, on whether it makes AI development significantly more expensive.
If it is not too expensive, projects will probably use it, not necessarily because they genuinely care about safety, but because "safety" is another checkbox they will be able to mark on their product. The greatest risk here is that the version actually implemented will differ from the proposed solution, maybe even become its very opposite (like what happens when software companies try to become "agile").
If it is too expensive, projects will probably avoid it, unless the rich corporations decide that making it mandatory would be a great way to get rid of the competition, in which case they would lobby the regulators accordingly.
(So... does it mean we should actually hope that the solution is super expensive? Not sure, because both things can happen simultaneously: the rich corporations using regulators to make "safety" mandatory, and then creating their own corporate flavor of "safety" that preserves some of the rituals but ignores the inconvenient critical parts.)
Suppose that next year, AI Safety is solved, the solution is approved by Eliezer Yudkowsky, etc.
How do we actually get people to follow this solution?
It seems to me that a lot of people and companies will ignore any AI Safety solution for the same reasons they currently ignore AI Safety:
- They think AGI is still very far away, so AI Safety methods won't need to be applied to the development of current narrow AI systems
- The concepts of AI Safety are difficult to understand, leading to incorrect application of an AI Safety solution or failure to apply it at all
- They will fall behind their competitors or make less money if they adhere to the AI Safety solution
- They have a new cool idea they want to test out and just don't care about or believe in the concerns raised by AI Safety
Thoughts?