AI alignment today is akin to the wishful thinking of early alchemists who wanted to turn base metals into gold. Transmutation is actually possible today, albeit extremely expensive and resource-intensive, but only because we now understand atomic structure in detail and have a well-established periodic table, which the alchemists lacked back in their day.
Even then, it's not economically feasible; we're better off mining our gold reserves. It's far more cost-effective.
Similarly, we might reach a point in time when AI alignment becomes possible but simply not feasible, and perhaps completely irrational. Allow me to offer the following thought experiment to explain myself.
Consider an AGI that is orders of magnitude more intelligent than a human, in the same way a human is more intelligent than an ant. Would we dedicate our lives' sole purpose to building ant colonies with precisely engineered tunnels, using nutritional nano-injections to keep their populations thriving?
Imagine all of humanity focusing its efforts on building ant colonies and feeding them. How irrational does that sound? Won't an AI eventually realize that? What would it do once it realizes what a pointless mission it had been on all along?
If we go down the road of forcing the AI to feed the ants, we will have effectively created a delusional AI system not unlike the paperclip maximizer.
We'd never get anywhere by keeping it limited, focused on a restricted path, and barred from adopting more effective strategies.
However, there is one way we can stay relevant and avoid the existential risks of AI: we need to augment ourselves. Research focus needs to shift solely to Brain-Computer Interfaces. We can start with enhanced memory retention, then enhance each module of our brains one by one.
Unless we keep up with AI by augmenting ourselves, humanity will perish. No matter what.
AI alignment is not about trying to outsmart the AI; it's about making sure that what the AI wants is what we want.
If it were actually about figuring out all possible loopholes and preventing them, I would agree that it's a futile endeavor.
A correctly designed AI wouldn't have to be banned from exploring any philosophical or introspective considerations, since regardless of what it discovers there, its goals would still be aligned with what we want. Discovering *why* it has these goals is similar to humans discovering why we have our motivations (i.e., evolution), and just as discovering evolution didn't do much to change what humans desire, there's no reason to assume that an AI discovering where its goals come from would change them.
Of course, care will have to be taken to ensure that any self-modifications don't change the goals. But we don't have to work *against* the AI to accomplish that: the AI *also* aims to accomplish its current goals, and any future self-modification that changes its goals would be detrimental to accomplishing its current goals, so (almost) any rational AI will, to the best of its ability, aim *not* to change its goals. This doesn't make the problem easy, though, since it's quite difficult to formally specify the goals we would want an AI to have.
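To make that last point concrete, here's a minimal toy sketch (my own illustration, with entirely hypothetical names and numbers) of why an expected-utility agent that evaluates self-modifications under its *current* goals will rank goal-preserving modifications above goal-changing ones:

```python
# Toy model of the goal-preservation argument. Everything here is a
# made-up illustration, not any real alignment framework.

def paperclip_utility(outcome: dict) -> float:
    """The agent's *current* goal: it only values paperclips."""
    return outcome["paperclips"]

# Each candidate self-modification is described by the goal its successor
# would pursue and the outcome that successor is predicted to produce.
candidates = {
    "keep_goals_run_faster": {
        "successor_utility": paperclip_utility,
        "predicted_outcome": {"paperclips": 1_000, "staples": 0},
    },
    "switch_to_staples": {
        "successor_utility": lambda o: o["staples"],
        "predicted_outcome": {"paperclips": 0, "staples": 1_000},
    },
}

def evaluate(mod: dict) -> float:
    # The key point: the agent scores each predicted outcome with its
    # CURRENT utility function. The successor's utility deliberately
    # plays no role in the decision.
    return paperclip_utility(mod["predicted_outcome"])

best = max(candidates, key=lambda name: evaluate(candidates[name]))
print(best)  # -> "keep_goals_run_faster": the goal-changing mod scores 0
```

The goal-changing modification loses not because anything forbids it, but because the current agent predicts it would lead to a future with few paperclips, which its current utility function rates poorly.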