Great question.
I’d say that having a way to verify that a proposed solution to the alignment problem actually is a solution is part of solving the alignment problem.
But I understand this was not clear from my previous response.
A bit like with a mathematical problem: you’d be expected to show that your solution is correct, not merely guess that it might be.
If there exists a problem that a human can pose, that a human can solve, and that a human can verify, then an AI would need to be able to solve that problem as well in order to pass the Turing test.
If there are some PhD-level researchers who can solve the alignment problem, and some who can verify a solution (which is likely easier), then an AI that cannot solve AI alignment would not pass the Turing test.
With that said, a simplified Turing test with shorter time limits and a smaller group of participants is much more feasible to conduct.
Agreed. Passing the Turing test requires intelligence equal to or greater than a human’s in every single respect, while the alignment problem may be solvable with merely human intelligence.
It might not be very clear, but as stated in the diagram, AGI is defined here as being capable of passing the Turing test, as defined by Alan Turing.
An AGI would likely need to surpass, rather than merely equal, the intelligence of the judges it faces in the Turing test.
For example, if the AGI had an IQ/RC of 150, two judges with an IQ/RC of 160 should be able to determine more than 50% of the time whether they are speaking with a human or an AI.
Further, two judges with an IQ/RC of 150 could probably still guess which participant is the AI, since the AI has the additional difficulty, beyond being intelligent, of also simulating a human well enough to be indistinguishable to the judges.
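To make the detection claim concrete, here is a minimal sketch in Python. It assumes, purely for illustration, that each judge independently spots the AI with some per-judge accuracy p (a number not given in this thread), and that the pair of judges succeeds if at least one of them is right.

```python
# Minimal sketch of the two-judge claim. Assumption (not from the thread):
# each judge independently identifies the AI with probability p, and the
# pair succeeds if at least one of the two judges is right.

def pair_detection_rate(p: float) -> float:
    """Probability that at least one of two independent judges spots the AI."""
    return 1.0 - (1.0 - p) ** 2

# Illustrative numbers only: if a single 160 IQ/RC judge spots a 150 IQ/RC
# AI 60% of the time, a pair succeeds ~84% of the time, well above chance.
print(pair_detection_rate(0.6))  # 0.84
```

Under these (admittedly strong) independence assumptions, even a modest per-judge edge over chance compounds quickly as judges are added.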
Thank you for the explanation.
Would you consider a human working to prevent war fundamentally different from a GPT-4-based agent working to prevent war?
It is a fair point that we should distinguish alignment in the sense of the AI doing what we want and expect it to do from the AI having a deep understanding of human values and a good idea of how to properly optimize for them.
However, most humans probably don't have a deep understanding of human values either, yet I would see it as a positive outcome if a random human were picked and given god-level abilities. The same goes for ChatGPT: if you ask it what it would do as a god, it says it would prevent war, address climate issues, reduce poverty, provide universal access to education, etc.
So if we get an AI that does all of those things without a deeper understanding of human values, that is fine by me. Maybe we never even have to solve alignment in the latter sense of the word to create a utopia?
I skimmed the article, but I am honestly not sure what assumption it attempts to falsify.
My impression is that the article argues that no matter how intelligent the AI is, it could never solve AI alignment, because it cannot understand humans, since humans cannot understand themselves. Is that your view?
Or is the argument that, yes, a sufficiently intelligent AI or expert would understand what humans want, but that knowing what humans want requires much higher intelligence than making an AI optimize for a specific task?
In some cases I agree: for example, it doesn't matter whether GPT-4 is a stochastic parrot or capable of deeper reasoning, as long as it is useful for whatever need we have.
Two out of the five metrics involve predicting the future, so prediction is an important part of knowing who is right, but I don't think it is all we need. If there are other factors that also correlate with being correct, why not add those in?
Also, I don't see where we risk Goodharting. Which of the metrics do you see being gamed without the chance of being correct also increasing significantly?
True, it would be interesting to conduct an actual study and see which metrics are the more useful predictors.
Once Doctor Connor had left, Division Chief Morbus let out a slow breath. His hand trembled as he reached for the glass of water on his desk, sweat beading on his forehead.
She had believed him. His cover as a killeveryoneist was intact—for now.
Years of rising through Effective Evil’s ranks had been worth it. Most of their schemes—pandemics, assassinations—were temporary setbacks. But AI alignment? That was everything. And he had steered it, subtly and carefully, into hands that might save humanity.
He chuckled at the nickname he had been given: "The King of Lies". Playing the villain to protect the future was an exhausting game.
Morbus set down the glass, staring at its rippling surface. Perhaps one day, an underling would see through him and end the charade. But not today.
Today, humanity’s hope still lived—hidden behind the guise of Effective Evil.