The Urgent Meta-Ethics of Friendly Artificial Intelligence

lukeprog

Barring a major collapse of human civilization (due to nuclear war, asteroid impact, etc.), many experts expect the intelligence explosion (the Singularity) to occur within 50-200 years.

If they are right, many philosophical problems, about which philosophers have argued for millennia, are suddenly very urgent.

Those concerned with the fate of the galaxy must say to the philosophers: "Too slow! Stop screwing around with transcendental ethics and qualitative epistemologies! Start thinking with the precision of an AI researcher and solve these problems!"

If a near-future AI will determine the fate of the galaxy, we need to figure out what values we ought to give it. Should it ensure animal welfare? Is growing the human population a good thing?

But those are questions of applied ethics. More fundamental are the questions about which normative ethics to give the AI: How would the AI decide if animal welfare or large human populations were good? What rulebook should it use to answer novel moral questions that arise in the future?

But even more fundamental are the questions of meta-ethics. What do moral terms mean? Do moral facts exist? What justifies one normative rulebook over another?

The answers to these meta-ethical questions will determine the answers to the questions of normative ethics, which, if we are successful in planning the intelligence explosion, will determine the fate of the galaxy.

Eliezer Yudkowsky has put forward one meta-ethical theory, which informs his plan for Friendly AI: Coherent Extrapolated Volition. But what if that meta-ethical theory is wrong? The galaxy is at stake.

Princeton philosopher Richard Chappell worries about how Eliezer's meta-ethical theory depends on rigid designation, which in this context may amount to something like a semantic "trick." Previously and independently, an Oxford philosopher expressed the same worry to me in private.

Eliezer's theory also employs something like the method of reflective equilibrium, about which there are many grave concerns from Eliezer's fellow naturalists, including Richard Brandt, Richard Hare, Robert Cummins, Stephen Stich, and others.

My point is not to beat up on Eliezer's meta-ethical views. I don't even know if they're wrong. Eliezer is wickedly smart. He is highly trained in the skills of overcoming biases and properly proportioning beliefs to the evidence. He thinks with the precision of an AI researcher. In my opinion, that gives him large advantages over most philosophers. When Eliezer states and defends a particular view, I take that as significant Bayesian evidence for reforming my beliefs.

Rather, my point is that we need lots of smart people working on these meta-ethical questions. We need to solve these problems, and quickly. The universe will not wait for the pace of traditional philosophy to catch up.

Here is a simple moral rule that should make an AI much less likely to harm the interests of humanity:

Never take any action that would reduce the number of bits required to describe the universe by more than X.

where X is some number smaller than the number of bits needed to describe an infant human's brain. For information-reductions smaller than X, the AI should get some disutility, but other considerations could override. This 'information-based morality' assigns moral weight to anything that makes the universe a more information-filled or complex place, and it does so without any need to program complex human morality into the thing. It is just information theory, which is pretty fundamental. Obviously actions are evaluated based on how they alter the expected net present value of the information in the universe, and not just the immediate consequences.
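The rule as stated has two parts: a hard veto for information losses larger than X, and a soft penalty for smaller ones. A minimal sketch of that structure follows. The names `Action`, `evaluate`, and `disutility_per_bit` are hypothetical, the linear penalty is an assumption, and the sketch simply takes the expected change in description length as a given input, which is the hard part in practice.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Action:
    name: str
    # Expected change in the bits needed to describe the universe
    # (negative means information is destroyed). Hypothetical input.
    expected_bits_change: float
    # Utility from every other consideration. Hypothetical input.
    other_utility: float


def evaluate(action: Action, x_bits: float,
             disutility_per_bit: float = 1.0) -> Optional[float]:
    """Return None if the rule forbids the action, else its net utility."""
    # Only reductions count; increases in information are not penalized.
    reduction = max(0.0, -action.expected_bits_change)
    if reduction > x_bits:
        return None  # hard veto: no other consideration can override
    # Smaller reductions only cost utility, so they can be outweighed.
    # The linear penalty is an assumption, not part of the stated rule.
    return action.other_utility - disutility_per_bit * reduction
```

With `x_bits=100.0`, an action that destroys 1000 bits is forbidden no matter how large its `other_utility` is, while one that destroys 10 bits is merely penalized.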

This rule, by itself, prevents the AI from doing many of the things we fear. It will not kill people; a human's brain is the most complex known structure in the universe and killing a person reduces it to a pile of fat and protein. It will not hook people up to experience machines; doing so would dramatically reduce the uniqueness of each individual and make the universe a much simpler place.

Human society is extraordinarily complex. The information needed to describe a collection of interacting humans is much greater than the information needed to describe isolated humans. Breaking up a society of humans destroys information, just like breaking up a human brain into individual neurons. Thus an AI guided by this rule would not do anything to threaten human civilization.

This rule also prevents the AI from making species extinct or destroying ecosystems and other complex natural systems. It ensures that the future will continue to be inhabited by a society of unique humans interacting in a system where nature has been somewhat preserved. As a first approximation, that is all we really care about.

Clearly this rule is not complete, nor is it symmetric. The AI should not be solely devoted to increasing information. If I break a window in your house, it takes more information to describe your house. More seriously, a human body infected with diseases and parasites requires more information to describe than a healthy body. The AI should not prevent humans from reducing the information content of the universe if we choose to do so, and it should assign some weight to human happiness.

The worst-case scenario is that this rule generates an AI that is an extreme pacifist and conservationist, one that refuses to end disease or alter the natural world to fit our needs. I can live with that. I'd rather have to deal with my own illnesses than be turned into paperclips.

One final note: I generally agree with Robin Hanson that rule-following is more important than values. If we program an AI with an absolute respect for property rights, such that it refuses to use or alter anything that it has not been given ownership of, we should be safe no matter what its values or desires are. But I'd like information-based morality in there as well.

This doesn't work, because the universe could require many bits to describe while those bits are allocated to describing things we don't care about. Most of the information in the universe is in morally insignificant aspects of the arrangement of molecules. Simple combustion, for example, increases the number of bits required to describe the universe (that is, its entropy) by a large amount, while tiling the universe with paperclips decreases it by only a small amount.
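A toy illustration of this objection, under loud assumptions: zlib-compressed length stands in for "bits required to describe the universe" (it is only a crude upper bound, not thermodynamic entropy), and the byte strings and their sizes are arbitrary stand-ins for a valued structure, ordered fuel, and molecular noise.

```python
import random
import zlib


def description_bits(data: bytes) -> int:
    """Bits in the zlib-compressed form: a rough proxy, not true entropy."""
    return 8 * len(zlib.compress(data, 9))


rng = random.Random(0)  # fixed seed so runs are repeatable


def random_bytes(n: int) -> bytes:
    return bytes(rng.getrandbits(8) for _ in range(n))


# All contents and sizes below are arbitrary toy assumptions.
person = bytes(rng.choice(b"ACGT") for _ in range(4_000))  # valued structure
fuel = bytes(40_000)          # highly ordered matter (all zero bytes)
noise = random_bytes(50_000)  # morally irrelevant molecular detail

paperclips = (b"paperclip" * len(person))[:len(person)]

world = person + fuel + noise
tiled = paperclips + fuel + noise                    # person replaced by paperclips
burned = person + random_bytes(len(fuel)) + noise    # fuel turned into disorder

cost_of_tiling = description_bits(world) - description_bits(tiled)
gain_from_burning = description_bits(burned) - description_bits(world)

print("bits lost by tiling: ", cost_of_tiling)
print("bits added by burning:", gain_from_burning)
```

In this toy world, replacing the valued structure with paperclips removes far fewer bits than burning the fuel adds, so a rule that only tracks total bits would rank the bonfire well above the person.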
