Stuart Russell: AI value alignment problem must be an "intrinsic part" of the field's mainstream agenda
Edge.org has recently been discussing "the myth of AI". Unfortunately, although Superintelligence is cited in the opening, most of the participants don't seem to have looked into Bostrom's arguments. (Luke has written a brief response to some of the misunderstandings Pinker and others exhibit.) The most interesting comment is Stuart Russell's, at the very bottom:
Of Myths and Moonshine
"We switched everything off and went home. That night, there was very little doubt in my mind that the world was headed for grief."
So wrote Leo Szilard, describing the events of March 3, 1939, when he demonstrated a neutron-induced uranium fission reaction. According to the historian Richard Rhodes, Szilard had the idea for a neutron-induced chain reaction on September 12, 1933, while crossing the road next to Russell Square in London. The previous day, Ernest Rutherford, a world authority on radioactivity, had given a "warning…to those who seek a source of power in the transmutation of atoms – such expectations are the merest moonshine."
Thus, the gap between authoritative statements of technological impossibility and the "miracle of understanding" (to borrow a phrase from Nathan Myhrvold) that renders the impossible possible may sometimes be measured not in centuries, as Rod Brooks suggests, but in hours.
None of this proves that AI, or gray goo, or strangelets, will be the end of the world. But there is no need for a proof, just a convincing argument pointing to a more-than-infinitesimal possibility. There have been many unconvincing arguments – especially those involving blunt applications of Moore's law or the spontaneous emergence of consciousness and evil intent. Many of the contributors to this conversation seem to be responding to those arguments and ignoring the more substantial arguments proposed by Omohundro, Bostrom, and others.
The primary concern is not spooky emergent consciousness but simply the ability to make high-quality decisions. Here, quality refers to the expected outcome utility of actions taken, where the utility function is, presumably, specified by the human designer. Now we have a problem:
1. The utility function may not be perfectly aligned with the values of the human race, which are (at best) very difficult to pin down.
2. Any sufficiently capable intelligent system will prefer to ensure its own continued existence and to acquire physical and computational resources – not for their own sake, but to succeed in its assigned task.
A system that is optimizing a function of n variables, where the objective depends on a subset of size k<n, will often set the remaining unconstrained variables to extreme values; if one of those unconstrained variables is actually something we care about, the solution found may be highly undesirable. This is essentially the old story of the genie in the lamp, or the sorcerer's apprentice, or King Midas: you get exactly what you ask for, not what you want. A highly capable decision maker – especially one connected through the Internet to all the world's information and billions of screens and most of our infrastructure – can have an irreversible impact on humanity.
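Russell's point about unconstrained variables can be made concrete with a toy optimization. This is a hypothetical sketch of my own, not from the essay: the objective depends only on x0 (k = 1 of n = 2 variables), and an arbitrary tie-breaking preference stands in for whatever incidental pressures a real optimizer would have on the variables the objective ignores.

```python
import itertools

def objective(x0, x1):
    # Utility depends on a subset of the variables (k = 1 of n = 2);
    # x1 is completely unconstrained by the objective.
    return -(x0 - 3) ** 2  # maximized at x0 = 3, ignores x1 entirely

grid = range(-10, 11)

# A naive optimizer that breaks ties by preferring larger x1 (a stand-in
# for any incidental pressure) fixes x0 = 3 but drives x1 to its extreme,
# even if x1 is something we care about (e.g. "resources consumed").
best = max(itertools.product(grid, grid), key=lambda x: (objective(*x), x[1]))
print(best)  # (3, 10)
```

Nothing in the objective penalizes the extreme value of x1, so the solution found is "optimal" while still being highly undesirable on the dimension we forgot to specify.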
This is not a minor difficulty. Improving decision quality, irrespective of the utility function chosen, has been the goal of AI research – the mainstream goal on which we now spend billions per year, not the secret plot of some lone evil genius. AI research has been accelerating rapidly as pieces of the conceptual framework fall into place, the building blocks gain in size and strength, and commercial investment outstrips academic research activity. Senior AI researchers express noticeably more optimism about the field's prospects than was the case even a few years ago, and correspondingly greater concern about the potential risks.
No one in the field is calling for regulation of basic research; given the potential benefits of AI for humanity, that seems both infeasible and misdirected. The right response seems to be to change the goals of the field itself; instead of pure intelligence, we need to build intelligence that is provably aligned with human values. For practical reasons, we will need to solve the value alignment problem even for relatively unintelligent AI systems that operate in the human environment. There is cause for optimism, if we understand that this issue is an intrinsic part of AI, much as containment is an intrinsic part of modern nuclear fusion research. The world need not be headed for grief.
I'd quibble with a point or two, but this strikes me as an extraordinarily good introduction to the issue. I hope it gets reposted somewhere it can stand on its own.
Russell has previously written on this topic in Artificial Intelligence: A Modern Approach and the essays "The long-term future of AI," "Transcending complacency on superintelligent machines," and "An AI researcher enjoys watching his own execution." He's also been interviewed by GiveWell.
Teapots and Soda Cans
Reading an earnest and thought-provoking editorial from one James Wood, reviewing 'Letter to a Christian Nation' by Sam Harris. Though an atheist himself, he admits a flagging patience with certain attitudes of atheists. I can concede that an atheist's superior and glib demeanor may be due to frustration and no small amount of pessimistic inference about the human condition, but I had to comment on a rebuttal he gives regarding Bertrand Russell's celestial teapot.
He claims that God, being so much grander and more complex than a teapot, cannot be banished with such a simplistic comparison, whereas I would insist that God is actually much less believable than the teapot for that exact reason. I think Russell's teapot is due for an update that is more approachable and grounded. Here goes:
I claim that there is a discarded Coke can somewhere in the vastness of the Sahara, but I will brook absolutely no discussion about doubting my claim or investigating it for veracity. "Okay," you think, "I suppose I can assume that much to be true. Whatever this man's sources, the odds of a Coke can being somewhere in the desert must be considerable." But I then elaborate with claims that it's actually many, many cans, folded into glorious and artistically pleasing forms, and my obdurate refusal to discuss how it can be proved continues. At this point even the most generous theists would likely start getting annoyed with my odd behavior, yet at the very least what I'm asking you to believe isn't outside the realm of possibility. For all you know (though I refuse to allow you to check) there could be a folk art bazaar currently set up in the Sahara, so really it costs you very little to entertain my view.
And then I say that the cans have taken on beautiful, shimmering consciousness and are forming a society which hides from humanity, burying their chrome castles beneath the sand and moving their aluminum cities whenever we get too close to discovering them. "But..." you try to cut in. Before you can even begin to tell me what you find odd about my fantasy, I'm on the next detail. I claim that all of our major technological achievements of the last several hundred years are all thanks to the secret influence of the Shiny Can People.
Now you have countless legitimate doubts, but every time you try to tell me that, for starters, soda didn't even come in aluminum cans several hundred years ago, I insist that you weren't there so you can't be sure, and how could a mere burden of proof destroy the mighty empire of the Shiny Cans?
I like the utility of the can people because it doesn't start with an outlandish proposition, but if you stick around it gets absolutely ridiculous. Not only does that remind me more of how religion is actually sold, but it also serves to strengthen the original analogy of the teapot by reminding the curious mind that Russell's teapot is infinitely smaller and less complex than God, making it much less embarrassing to genuinely believe in since it would have so much more room to hide.
Odinn Celusta
Bertrand Russell's Ten Commandments
Bertrand Russell's Ten Commandments for teachers.
1. Do not feel absolutely certain of anything.
2. Do not think it worth while to proceed by concealing evidence, for the evidence is sure to come to light.
3. Never try to discourage thinking for you are sure to succeed.
4. When you meet with opposition, even if it should be from your husband or your children, endeavour to overcome it by argument and not by authority, for a victory dependent upon authority is unreal and illusory.
5. Have no respect for the authority of others, for there are always contrary authorities to be found.
6. Do not use power to suppress opinions you think pernicious, for if you do the opinions will suppress you.
7. Do not fear to be eccentric in opinion, for every opinion now accepted was once eccentric.
8. Find more pleasure in intelligent dissent than in passive agreement, for, if you value intelligence as you should, the former implies a deeper agreement than the latter.
9. Be scrupulously truthful, even if the truth is inconvenient, for it is more inconvenient when you try to conceal it.
10. Do not feel envious of the happiness of those who live in a fool's paradise, for only a fool will think that it is happiness.
I find this to be of use not just for teachers but for rationalists in general. #8, in particular, is an eloquent formulation of Aumann's Agreement Theorem.