If the process of self-improving AIs as described in a simple article by Tim Urban (linked below) is mastered, then the AI alignment problem is solved: "The idea is that we’d build a computer whose two-THREE major skills would be doing research on AI, ON ETHICS, and coding changes into itself—allowing it to not only learn but to improve its own architecture. We’d teach computers to be computer scientists so they could bootstrap their own development. And that would be their main job—figuring out how to make themselves smarter and ALIGNED"
(In caps: the parts added for alignment.)
Link to the article: https://waitbutwhy.com/2015/01/artificial-intelligence-revolution-1.html
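For concreteness, here is a minimal toy sketch of the loop the quoted passage describes, with the capitalized additions rendered as an explicit value-preservation gate. Every name and number below is a hypothetical placeholder, not a real API:

```python
# Toy sketch of the self-improvement loop from the quote above.
# All names and numbers are hypothetical placeholders, not a real API.

class SelfImprovingSystem:
    def __init__(self):
        self.capability = 1.0
        self.values = "initial values"

    def propose_self_modification(self):
        # Placeholder for the two-three major skills: doing research on AI,
        # ON ETHICS, and coding changes into itself.
        return {"capability_gain": 0.1, "preserves_values": True}

    def step(self):
        patch = self.propose_self_modification()
        # The capitalized ALIGNED addition: apply a modification only if
        # it both improves capability and preserves the current values.
        if patch["preserves_values"] and patch["capability_gain"] > 0:
            self.capability += patch["capability_gain"]


system = SelfImprovingSystem()
for _ in range(10):
    system.step()
print(system.capability)  # capability grows only via value-preserving patches
```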
It's not clear whether this would work as intended, but there are proposals to that effect.
For example, "Safety without alignment" (https://arxiv.org/abs/2303.00752) proposes exploring a path closely related to what you are suggesting.
(It would be helpful to have a link to Tim Urban's article.)
Thanks for including the link in your edit.
One important factor to consider is how likely a goal or value is to persist through self-improvements (which might end up being quite radical, and also fairly rapid).
An arbitrary goal or value is unlikely to persist (this is why the "classical formulation of the alignment problem" is so difficult; the difficulties come from many directions, but the most intractable one is how to ensure that the desired properties are preserved through radical self-modifications). That's the main obstacle to…
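A toy way to see the force of this point (the per-step probability here is an illustrative assumption, not a measured quantity): if each radical self-modification independently preserves a given goal with probability p < 1, the chance it survives n modifications is p^n, which decays exponentially.

```python
# Illustrative only: assume each self-modification independently preserves
# an arbitrary goal with probability p. Survival over n steps is p ** n.
p = 0.99  # assumed per-modification preservation probability
for n in (10, 100, 1000):
    print(f"n={n}: survival probability = {p ** n:.2e}")
# n=10: ~9.04e-01, n=100: ~3.66e-01, n=1000: ~4.32e-05
# Even a 1% per-step loss rate makes long-run persistence very unlikely,
# unless preservation of the goal is made a structural invariant.
```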