What ethics? What is ethics? Does the machine mind settle on an ethics that insists on wiping out humanity? I think the intuition of looking for a stable attractor state for the AI to pursue is a good one. But "ethics" is not a sufficiently coherent and objective concept to serve as that target.
If the process of self-improving AIs like the one described in a simple article by Tim Urban (below) is mastered, then the AI alignment problem is solved.
I would say this has the causality backwards. In other words, one of the ways of solving the AI alignment problem is figuring out how to master the plausibly extremely complex process needed to successfully implement a strategy that can be pointed to in a simple article.
research on AI, ON ETHICS, and coding changes into itself
As I understand it, the vast majority of the difficulty is in figuring out what the second goal in that list actually is, and how to make an AI care about it. Keep in mind that in so many cases we humans are still arguing about the same questions, answers, and frameworks that we've been debating for millennia.
This overall tactic can work well for problems that are difficult to solve, but for which it is easy (or at least possible) to test a solution.
I don't think alignment is such a thing. At least I haven't seen any proposals for measuring "how aligned" a system is.
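To make the "hard to solve, easy to test" distinction concrete, here is a minimal sketch of a propose-and-test loop. It assumes a toy problem (finding a nontrivial factor of a small integer) where a cheap, trustworthy verifier exists. The names `search_with_verifier` and `is_nontrivial_factor` are hypothetical and chosen only for illustration. The loop works only because `verify` is available, and no comparable check for "how aligned" a system is appears in this discussion.

```python
import random
from typing import Callable, Optional


def is_nontrivial_factor(n: int, d: int) -> bool:
    """Cheap test: d divides n and is neither 1 nor n itself."""
    return 1 < d < n and n % d == 0


def search_with_verifier(
    propose: Callable[[], int],
    verify: Callable[[int], bool],
    max_tries: int,
) -> Optional[int]:
    """Keep proposing candidates until one passes the verifier.

    Returns the first verified candidate, or None if the budget runs out.
    """
    for _ in range(max_tries):
        candidate = propose()
        if verify(candidate):
            return candidate
    return None


# Toy instance (an assumption for illustration): factor n = 91 = 7 * 13
# by blind guessing, relying entirely on the verifier to recognise success.
n = 91
rng = random.Random(0)  # fixed seed so the run is repeatable
factor = search_with_verifier(
    propose=lambda: rng.randrange(2, n),
    verify=lambda d: is_nontrivial_factor(n, d),
    max_tries=10_000,
)
print(factor)
```

Even a very weak proposer succeeds here because every guess can be checked exactly. Without such a check, the same loop has no way of knowing when to stop or whether a change was an improvement.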
I have seen many. the only ones that seem to have any chance of, after heavy modification, becoming a seed of something that holds up, are QACI and open agency+boundaries. both have big holes that make attempting to implement them as-is guaranteed to fail.
It's not clear if this ends up working as intended, but there are proposals to that effect.
For example, "Safety without alignment" (https://arxiv.org/abs/2303.00752) proposes exploring a path that is closely related to what you are suggesting.
(It would be helpful to have a link to Tim Urban's article.)
Thanks for including the link in your edit.
One factor which is important to consider is how likely a goal or a value is to persist during self-improvements (those self-improvements might end up being quite radical, and also fairly rapid).
An arbitrary goal or value is unlikely to persist (this is why the "classical formulation of the alignment problem" is so difficult: the difficulties come from many directions, but the most intractable one is how to ensure that the desired properties are preserved during radical self-modifications). That's the main obstacle t...
If the process of self-improving AIs like the one described in a simple article by Tim Urban (below) is mastered, then the AI alignment problem is solved: "The idea is that we’d build a computer whose two-THREE major skills would be doing research on AI, ON ETHICS, and coding changes into itself—allowing it to not only learn but to improve its own architecture. We’d teach computers to be computer scientists so they could bootstrap their own development. And that would be their main job—figuring out how to make themselves smarter and ALIGNED"
In caps: the parts I added for alignment.
Link to the article: https://waitbutwhy.com/2015/01/artificial-intelligence-revolution-1.html