[…] The notion of an argument that convinces any mind seems to involve a little blue woman who was never built into the system, who climbs out of literally nowhere, and strangles the little grey man, because that transistor has just got to output +3 volts: It's such a compelling argument, you see.
But compulsion is not a property of arguments, it is a property of minds that process arguments.
[…]
And that is why (I went on to say) the result of trying to remove all assumptions from a mind, and unwind to the perfect absence of any prior, is not an ideal philosopher of perfect emptiness, but a rock. What is left of a mind after you remove the source code? Not the ghost who looks over the source code, but simply... no ghost.
So—and I shall take up this theme again later—wherever you are to locate your notions of validity or worth or rationality or justification or even objectivity, it cannot rely on an argument that is universally compelling to all physically possible minds.
Nor can you ground validity in a sequence of justifications that, beginning from nothing, persuades a perfect emptiness.
[…]
The first great failure of those who try to consider Friendly AI, is the One Great Moral Principle That Is All We Need To Program—aka the fake utility function—and of this I have already spoken.
But the even worse failure is the One Great Moral Principle We Don't Even Need To Program Because Any AI Must Inevitably Conclude It. This notion exerts a terrifying unhealthy fascination on those who spontaneously reinvent it; they dream of commands that no sufficiently advanced mind can disobey. The gods themselves will proclaim the rightness of their philosophy! (E.g. John C. Wright, Marc Geddes.)
You have to realize that we don't need full consensus to make great strides in alignment. Your comments are a bit abstract and hard to follow. Perhaps you could more clearly and directly address whatever problems you see in creating a narrow AI with expertise in understanding morality.
If NAMSI achieved a superhuman level of expertise in morality, how would we know? I consider our society to be morally superior to the one we had in 1960. People in 1960 would not agree with this assessment if they could see us today. If NAMSI agrees with us about everything, it's not superhuman. So how do we determine whether its possibly-superhuman morality is superior or inferior?
If we're measuring intelligence, we measure it against a known metric.
Just as we measure intelligence based on fundamental attributes, we can do the same with morality. We seem to have generally agreed-upon moral principles such as not lying, not stealing, and not hurting others without good reason. So it seems we would measure an AI's morality based on how well it does in those areas. Just as there is no consensus on what intelligence is and how it should be measured, the same would apply to morality. But I believe we can still arrive at a useful working understanding of relative morality based on accepted moral principles.
Also, its proposals for how we could best solve alignment would probably make more sense to us.
I think intelligence is a lot easier to measure than morality here. There are agreed-upon moral principles like not lying, not stealing, and not hurting others, sure, but even those aren't always stable across time. For instance, a couple of generations ago standard Western morality held that it was acceptable to hit your children; now it says it's not. If an AI trained to be moral said that, actually, hitting children is a worthwhile tradeoff in some circumstances, that could mean that the AI is more moral than we are and we overcorrected, or it could mean that the AI is less moral than we are and is simply wrong.
And that's just for the same values! What about how values change over the decades? If our moral AI says that Confucian obedience to parental authority is just, and that we Westerners are actually wrong about this, how do we know whether it's correct?
Intelligence tests tend to have a quick feedback loop: the answer is right or wrong. If a Go-playing AI makes a move that looks bizarre but then wins the game, that suggests it's superior. Morality is more like long-term planning: if a policy-making AI suggests a strange policy, we have no immediate way to judge whether it is good, because we won't have ground truth about whether it works for a long time.
The same goes for alignment. How do we know that a superhuman alignment solution would look reasonable to us instead of weird? (Also, for that matter, why would a more moral agent have better alignment solutions? Do you think the blocker for good alignment solutions is that current alignment researchers are insufficiently virtuous to come up with correct solutions?)
Yes, I appreciate that morality is more complex than intelligence, but it's not something we can afford to ignore. It's an essential part of alignment, and if we can put narrow ASI to work on it, we may be able to solve it sufficiently before we arrive at AGI and full ASI.
I don't think this is an intelligence-versus-morality matter. It seems that we need to apply AI much more directly to understanding and solving moral questions that have so far proved too difficult for humans. Another part of this is that we don't need full consensus. Every nation in the world has an extensive body of laws that not everyone agrees with but that are useful in promoting the welfare of its citizens. Naturally, I'm not defending laws that disenfranchise groups such as women, but our legal systems show that much can be accomplished by reaching agreement on particular moral questions.
I think much of AI's success here will depend on logic and reasoning algorithms. For example, 99% of Americans eat animal products notwithstanding the suffering those animals endure in factory farms. While there may not be consensus on the cruelty of this practice, the reasoning that shows it to be terribly cruel could not be clearer.
Yes, I do believe that we humans need to ramp up our own morality in order to better understand what AI comes up with. Perhaps we also need AI to help us do that.
Media-driven fears about AI causing major havoc, up to and including human extinction, rest on the fear that we will not solve alignment before we reach AGI. What hasn't been sufficiently appreciated is that alignment is, most fundamentally, about morality.
This is where narrow AI systems trained to understand morality hold great promise. We humans may not have the intelligence to solve alignment sufficiently, but by creating narrow AI systems that understand and advance morality, we can solve it sooner.
Since our greatest alignment fears concern the point at which we reach artificial super-intelligence (ASI), perhaps narrow morality-focused ASIs should take the lead on that work. Narrow AI systems already approach top-level legal and medical expertise, and because progress in those two domains is so rapid, we can expect major advances in the next few years.
We can develop a top-level narrow super-intelligent AI that advances the morality at the heart of alignment. Such a system might be dubbed Narrow Artificial Moral Super-intelligence, or NAMSI.
Some developers, like Stability AI, understand the advantage of developing narrow AI rather than working on more ambitious, but less attainable, AGI. In fact, Stability's business model is about selling narrow AI to countries and corporations.
One question we face as a global society is: to what might we best apply AI? Given the absolute necessity of solving alignment, and given that morality is our central challenge here, developing NAMSI may prove our most promising application as we near AGI.
But why go for narrow artificial moral super-intelligence rather than simply artificial moral intelligence? Because it is within our grasp. While morality has great complexities that challenge humans, our success with narrow legal and medical AI is instructive. We have reason to believe that if we train AI systems to better understand the workings of morality, they will achieve a level of expertise in this domain that far exceeds our own. This expertise could then help them solve alignment more effectively than currently seems possible through human intelligence.