“How do we make AI benevolent?” is a badly formulated problem. In its very asking, it ascribes agency to the AI that we don’t have to give it.
Yes, we don't have to, but people are already trying to give GPT agency (by calling it in a loop and telling it to prepare plans for its future calls), so someone will do this unless we actively try to prevent it.
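For concreteness, here is a minimal sketch of the loop pattern described above. The `call_llm` helper is a hypothetical stand-in for whatever chat-completion API one wires in; it is not part of any specific library.

```python
# Minimal sketch of giving a stateless model "agency" by calling it in a loop.
# `call_llm` is hypothetical: connect it to whatever model API you actually use.

def call_llm(prompt: str) -> str:
    """Send `prompt` to a language model and return its reply (stub)."""
    raise NotImplementedError("wire this to a real chat-completion API")

def agent_loop(goal: str, steps: int = 5) -> list[str]:
    """Ask the model to plan its next action, feeding each answer back in."""
    actions: list[str] = []
    context = f"Goal: {goal}\n"
    for _ in range(steps):
        # The model's own output becomes part of its next input; this
        # feedback loop is what turns one-shot completions into "plans".
        step = call_llm(context + "What is the single next action? Be brief.")
        actions.append(step)
        context += f"Done so far: {step}\n"
    return actions
```

Nothing in this sketch is sophisticated, which is exactly the point: the barrier to building it is low.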
As someone whose job it is to examine and improve the structure of agency and clarify values, I can say with confidence that as a culture we have only a very primitive understanding of either.
100% agree. But that's exactly the point. MIRI is trying to solve alignment not because they believe it is easy, but because they believe it is hard, so someone had better start working on it as soon as possible.
The hope is that “we” (meaning, someone) can somehow tell AI the final answer about what we should want, or get it to tell us the final answer about what we should want, and then leave it to execute on our behalf all of the weighty decisions we are not competent to make ourselves. We should be very wary of a project to save ourselves, or even “empower” ourselves, that is premised on the belief that humans essentially suck.
I read the news about the war in Ukraine, or Israel and Palestine, and it seems to me that humans suck. Not all of them, of course, but even the remaining ones suck at coordination; either way, the results are often bad.
The final answer we tell the AI could include things like "take care of X, but leave us free to decide Y". Maybe: don't let people murder or torture each other, but otherwise let them do whatever they wish? (But even to achieve this seemingly moderate goal, the AI needs more power than any human or group of humans who would prefer to murder or torture others.)
Yes, there is a risk that instead of this laissez-faire approach, someone will tell the AI to implement bureaucratic rules that strangle all human freedom and progress, essentially freezing us in the situation we have today, or in someone's idea of a utopian society (one that is dystopian from everyone else's perspective). However, if such a thing is technically possible, then exactly the same outcome can happen as a result of someone acting unilaterally in a world where everyone else has decided not to use AI this way.
Again, it seems to me like the proposal is "there is a button that will change the world, so we should not press it", which is nice, but it ignores the fact that the button is still there, and more people are getting access to it.
It would be better to de-fixate on the arms race, and instead imagine applications that are built to help ground people in reality, to explore where and why they respond to which sensations and drives, to know themselves better and give themselves more grace.
I 100% agree with the idea of using AI for self-improvement.
A practical problem I have with this is knowing that current "AI therapists" have zero confidentiality and report everything to their corporate masters, who will probably use this knowledge to increase their profits. They will probably also nudge the "AI therapist" to plant certain ideas in my head, or to keep certain ideas away from me. Thus a Microsoft-sponsored therapist might tell me that Linux is a waste of time, and explain that not trusting our corporate overlords is just a part of teenage rebellion I should be mature enough to have overcome; a Meta-sponsored therapist would encourage me to develop more contacts with people through social networks; and a Google-sponsored therapist would encourage me to buy whatever the highest bidder wants me to buy. The information they gather about my weaknesses and worries would let them do this more effectively, while they explain to me that my lack of trust is just a childhood trauma I need to overcome.
But setting this aside: if I could believe that the AI was impartial and kept our discussions confidential, of course I would use it, among other things, as a therapist and a self-help coach.
But even if 99% of people use it this way, that does not remove the problem of existing dictators and wannabe dictators using it to increase their power instead, automating whatever they can.
Continually updated digital backups of people (regardless of whether people operate as computations or remain material) make many familiar concerns, such as war or murder, fundamentally change or go away. Given this, I don't quite understand claims of wars continuing in a post-AGI world: even if true, what would that even mean? Wars without casualties are not centrally wars.
If this is true, then a benevolent ruler AI would immediately build, and hand power over to, a condition of high-agency transhumanism; and a coordinated center* of mostly non-human decisionmaking probably is, in fact, the only practical way to distribute the instruments for such a thing fairly, equally, and peacefully across the globe. Does the author seem to have considered this?
But if the benevolent ruler AI is necessarily self-invalidating, it seems likely that most attempts to align one don't actually align it, and instead produce a not-actually-benevolent ruler AI. If you want to make a benevolent AI, never designing it to be a ruler in the first place seems simply better.
Do you expect there to be parties who would try to align it towards having the intuitive character of a dictator? I don't. I've been expecting alignment like "be good". You'd still get a (momentary) prepotent singleton, but I don't see that as being the alignment target.
This kind of question, the unnecessary complication of the alignment target, has become increasingly relevant. It's not just mathy pluralistic sci-fi readers who are in this any more...
If we don't have Ruler-level coordination to avoid it, we fall either to Moloch or the next Black Marble.
If the aggregate agency of life on Earth doesn't have sufficient coordination to avoid it, perhaps. But it seems to me that centralization-first plans have no way to durably guarantee the agency of the people at the edges. I'd hope to design an AI that is structurally inclined to provide the guarantees you want without needing to be a ruler: for example, offering a copy of itself to everyone, being strongly auditable, and allowing people to link up into mesh coordination networks that can peer-reinforce against Moloch.
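To make the mesh idea concrete, here is a toy sketch; the peer names, the "commitment" strings, and the majority-quorum rule are all illustrative assumptions, not anything from the comment. Peers hold copies of a shared commitment and audit one another, so defection is detected without any central ruler.

```python
# Toy sketch of a peer-audit mesh: every peer checks every other peer's
# claimed commitment, so a defector is flagged by the honest majority
# without any single trusted center. All names here are illustrative.

from dataclasses import dataclass, field

@dataclass
class Peer:
    name: str
    commitment: str                               # the rule this peer claims to follow
    observed: dict = field(default_factory=dict)  # name -> commitment seen

    def audit(self, other: "Peer") -> bool:
        """Record what `other` claims and report whether it matches ours."""
        self.observed[other.name] = other.commitment
        return other.commitment == self.commitment

def mesh_consensus(peers: list[Peer]) -> list[str]:
    """Every peer audits every other; return names flagged by a majority."""
    flags = {p.name: 0 for p in peers}
    for auditor in peers:
        for other in peers:
            if other is not auditor and not auditor.audit(other):
                flags[other.name] += 1
    quorum = len(peers) // 2
    return [name for name, count in flags.items() if count > quorum]

peers = [Peer("a", "no coercion"), Peer("b", "no coercion"), Peer("c", "defect")]
print(mesh_consensus(peers))  # ['c'] -- the defector is visible to the whole mesh
```

The design point is that detection power lives at the edges: any majority of honest peers can flag a defector, and no single node has to be trusted.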
"Benevolent [Ruler] AI is a bad idea" and a suggested alternative
Thought on seeing the title: ... Is it going to be Malevolent AI?
Despite the title, this reads to me like an interesting overview of how we'd want a good benevolent AI to work, in fact: it needs to help us be curious about our own wants and values and help us defend against things that would decrease our agency.
AI summary via claude2:
Here are 30 key points from the article: