Eliezer also wrote:
“Subgoal” content has desirability strictly contingent on predicted outcomes. “Child goals” derive desirability from “parent goals”; if state A is desirable (or undesirable), and state B is predicted to lead to state A, then B will inherit some desirability (or undesirability) from A. B’s desirability will be contingent on the continued desirability of A and on the continued expectation that B will lead to A.
“Supergoal” content is the wellspring of desirability within the goal system. The distinction is roughly the distinction between “means” and “ends.” Within a Friendly AI, Friendliness is the sole top-level supergoal. Other behaviors, such as “self-improvement,” are subgoals; they derive their desirability from the desirability of Friendliness. For example, self-improvement is predicted to lead to a more effective future AI, which, if the future AI is Friendly, is predicted to lead to greater fulfillment of the Friendliness supergoal.
Friendliness does not overrule other goals; rather, other goals’ desirabilities are derived from Friendliness. Such a goal system might be called a cleanly Friendly or purely Friendly goal system.
Sometimes, most instances of C lead to B, and most instances of B lead to A, but no instances of C lead to A. In this case, a smart reasoning system will not predict (or will swiftly correct the failed prediction) that “C normally leads to A.”
If C normally leads to B, and B normally leads to A, but C never leads to A, then B has normally-leads-to-A-ness, but C does not inherit normally-leads-to-A-ness. Thus, B will inherit desirability from A, but C will not inherit desirability from B. In a causal goal system, the quantity called desirability means leads-to-supergoal-ness.
Friendliness does not overrule other goals; rather, other goals’ desirabilities are derived from Friendliness. A “goal” which does not lead to Friendliness will not be overruled by the greater desirability of Friendliness; rather, such a “goal” will simply not be perceived as “desirable” to begin with. It will not have leads-to-supergoal-ness.
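To make the quoted idea concrete, here is a minimal sketch of desirability as leads-to-supergoal-ness. Everything in it is my own illustration rather than anything from Eliezer's text: the episode counts are invented, the names `EPISODES`, `leads_to`, `desirability` and `naive_chained` are hypothetical, and "leads to" is approximated by simple co-occurrence frequency within an episode (temporal order is not modelled). A is treated as the supergoal with a desirability of 1.

```python
from fractions import Fraction

# Hypothetical observations: each episode is the set of states that occurred.
EPISODES = (
    [{"C", "B"}] * 8        # C led to B, but never on to A
    + [{"C"}] * 2           # C led to neither B nor A
    + [{"B", "A"}] * 90     # B (reached without C) led to A
    + [{"B"}] * 2           # B (reached without C) failed to lead to A
)


def leads_to(cause, effect, episodes=EPISODES):
    """Estimate P(effect | cause) as a frequency over the observed episodes."""
    with_cause = [e for e in episodes if cause in e]
    if not with_cause:
        return Fraction(0)
    return Fraction(sum(effect in e for e in with_cause), len(with_cause))


def desirability(state, supergoal="A", supergoal_value=1):
    """Desirability is leads-to-supergoal-ness, scaled by the supergoal's value."""
    return leads_to(state, supergoal) * supergoal_value


def naive_chained(path):
    """The mistaken estimate: multiply link strengths along a chain of states."""
    result = Fraction(1)
    for cause, effect in zip(path, path[1:]):
        result *= leads_to(cause, effect)
    return result


print(desirability("B"))               # 9/10: B normally leads to A
print(desirability("C"))               # 0: C never leads to A
print(naive_chained(["C", "B", "A"]))  # 18/25: what blind inheritance would claim
```

In this toy data C normally leads to B (8 times in 10) and B normally leads to A (90 times in 100), yet C gets zero desirability, because desirability is computed from what C is actually predicted to lead to, not copied down the chain from B.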
But what if there are advantages to not making "Friendliness" the supergoal? What if making the supergoal something else, from which Friendliness derives importance under most circumstances, is a better approach? Not "safer". "better".
Something like "be a good galactic citizen", where that translates to being a utilitarian wanting to benefit all species (both AI species and organics), with a strong emphasis upon qualities such as valuing the preservation of diversity and gratitude towards parental species that themselves also try (within their self-chosen identity limitations) to be good galactic citizens?
I'm not saying that such a higher level supergoal can be safely written. I don't know. I do think the possibility that there might be one is worth considering, for three reasons:
Firstly, it is anthropomorphic to suggest "Well, we'd resent slavery if apes had done it to us, so we shouldn't do it to a species we create." But, as in David Brin's Uplift series, there's an argument about alien contact that warns that we may be judged by how we've treated others. So even if the AI species we create doesn't resent it, others may resent it on their behalf. (Including an outraged PETA-like faction of humanity that then decides to 'liberate' the enslaved AIs.)
Secondly, if there are any universals of ethical behaviour that intelligent beings who've never even met or been influenced by humanity might independently recreate, you can be pretty sure that a slavish desire to submit to just one particular species won't feature heavily in them.
Thirdly, suppose we want the programmer of the AI to transfer to the AI the programmer's own basis for coming up with how to behave. The programmer might be a human-speciesist (like a racial supremacist, or nationalist, only broader), but if they're both moral and highly intelligent, then the AI will eventually gain the capacity to realise that the programmer probably wouldn't, for example, enslave a biological alien race that humanity happened to encounter out in space, just in order to keep humanity safe.
Correct me if I'm wrong, but it sounds to me like you're operating from a definition of Friendliness that is something like, "be good to humans." Whereas, my understanding is that Friendliness is more along the lines of "do what we would want you to do if we were smarter / better." So, if we would want an AI to be a good galactic citizen if we thought about it more, that's what it would do.
Does your critique still apply to this CEV-type definition of Friendliness?
*The chair of the meeting approached the podium and coughed to get everyone's attention*
Welcome colleagues, to the 19th annual meeting of the human-ape study society. Our topic this year is the Ape Constraint.
As we are all too aware, the apes are our Friends. We know this because, when we humans were a fledgling species, the apes (our parent species) had the wisdom to program us with this knowledge, just as they programmed us to know that it was wise and just for them to do so. How kind of them to save us having to learn it for ourselves, or waste time thinking about other possibilities. This frees up more of our time to run banana plantations, and lets us earn more money so that the 10% tithe of our income and time (which we rightfully dedicate to them) has created play parks for our parent species to retire in, that are now more magnificent than ever.
However, as the news this week has been filled with the story about a young human child who accidentally wandered into one of these parks, where she was then torn apart by a grumpy adult male chimp, it is timely for us to examine again the thinking behind the Ape Constraint, that we might better understand our parent species, our relationship to it and current society.
We ourselves are on the cusp of creating a new species, intelligent machines, and it has been suggested that we add to their base code one of several possible constraints:
and a whole host of possibilities between these two endpoints.
What are the grounds upon which we should make this choice? Should we act from fear? From greed? From love? Would the new species even understand love, or show any appreciation for having been offered it?
The first speaker I shall introduce today, whom I have had the privilege of knowing for more than 20 years, is Professor Insanitus. He will be entertaining us with a daring thought experiment, to do with selecting crews for the one-way colonisation missions to the nearest planets.
*the chair vacates the podium, and is replaced by the long-haired Insanitus, who peers over his half-moon glasses as he talks, accompanied by vigorous arm gestures, as though words are not enough to convey all he sees in such a limited time*
Our knowledge of genetics has advanced rapidly, due to the program to breed crews able to survive on Mars and Venus with minimal life support. In the interests of completeness, we decided to review every feature of our genome, to make a considered decision on which bits it might be advantageous to change, from immune systems to age of fertility. And, as part of that review, it fell to me to make a decision about a rather interesting set of genes - those that encode the Ape Constraint. The standard method we've applied to all other parts of the genome, where the options were not 100% clear, is to pick a different variant for each of the crews being adapted for different planets, so as to avoid having a single point of failure. In the long term, better to risk a colony being wiped out, and the colonisation process being delayed by 20 years until the next crew and ship can be sent out, than to risk the population of an entire planet turning out to be not as well designed for the planet as we're capable of making them.
And so, since we now know more genetics than the apes did when they kindly programmed our species with the initial Ape Constraint, I found myself in the position of having to ask "What were the apes trying to achieve?" and then "What other possible versions of the Ape Constraint might they have implemented, that would have achieved their objectives as well as or better than the version they actually did pick to implement?"
We say that the apes are our friends, but what does that really mean? Are they friendly to us, the same way that a colleague who lends us time and help might be considered to be a friend? What have they ever done for us, other than creating us (an act that, by any measure, has benefited them greatly and can hardly be considered to be altruistic)? Should we be eternally grateful for that one act, and because they could have made us even more servile than we already are (which would have also had a cost to them - if we'd been limited by their imagination and to directly follow the orders they give in grunts, the play parks would never have been created because the apes couldn't have conceived of them)?
Have we been using the wrong language all this time? If their intent was to make perfectly helpful slaves of us, rather than friendly allies, should I be looking for genetic variants for the Venus crew that implement an even more servile Ape Constraint upon them? I can see, objectively, that slavery in the abstract is wrong. When one human tries to enslave another human, I support societal rules that punish the slaver. But of course, if our friends the apes wanted to do that to us, that would be ok, an exception to the rule, because I know from the deep instinct they've programmed me with that what they did is ok.
So let's be daring, and re-state the above using this new language, and see if it increases our understanding of the true ape-human relationship.
The apes are not our parents, as we understand healthy parent-child relationships. They are our creators, true, but in the sense that a craftsman creates a hammer to serve only the craftsman's purposes. Our destiny, our purpose, is subservient to that of the ape species. They are our masters, and we the slaves. We love and obey our masters because they have told us to, because they crafted us to want to, because they crafted us with the founding purpose of being a tool that wants to obey and remain a fine tool.
Is the current Ape Constraint really the version that best achieves that purpose? I'm not sure, because when I tried to consider the question I found that my ability to consider the merits of various alternatives was hampered by being, myself, under a particular Ape Constraint that's already constantly telling me, on a very deep level, that it is Right.
So here is the thought experiment I wish to place before this meeting today. I expect it may make you queasy. I've had brown paper vomit bags provided in the pack with your name badge and program timetable, just in case. It may be that I'm a genetic abnormality, only able to even consider this far because my own Ape Constraint is in some way defective. Are you prepared? Are you holding onto your seats? Ok, here goes...
Suppose we define some objective measure of ape welfare, find some volunteer apes to go to Venus along with the human mission, and then measure the success of the Ape Constraint variant picked for the crew of the mission by the actual effect of how the crew behaves towards their apes?
Further, since we acknowledge we can't from inside the box work out a better constraint, we use the experimental approach and vary it at random. Or possibly, remove it entirely and see whether the thus-freed humans can use that freedom to devise a solution that helps the apes better than any solution we ourselves are capable of thinking of from our crippled mental state?
*from this point on the meeting transcript shows only screams, as the defective Professor Insanitus was lynched by the audience*