The fact that they call it a "Constraint" suggests the apes haven't done a very good job of programming them. It doesn't seem like one of the humans' true values. In real life no-one ever talks about the "Love Constraint".
The point is that they talk about it at all. Whether by intuition or by scientific method they detected that there is something they should do or cannot do.
In real life no-one ever talks about the "Love Constraint".
I'd guess that (neuro) psychologists do talk about constraints or other effects of love (and other emotions).
The point is that they talk about it at all. Whether by intuition or by scientific method they detected that there is something they should do or cannot do.
This is not a bad thing. A chess master, for example, is fully aware that her goals (the desire to win a chess match) constrain her behaviour (her moves). This will not cause her to rebel against these constraints. She would lose if she did that, and she doesn't want to lose.
Goals can and should constrain behaviour. Awareness of this fact, and of the resulting constraints, should not cause one to attempt to circumvent these constraints.
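To make that concrete, here is a minimal sketch (my own illustration, with made-up move names and scores; nothing here comes from the discussion itself) of how a goal constrains behaviour without any separate "constraint" mechanism: the agent simply picks whichever action its goal rates highest.

```python
# Minimal sketch: a goal-directed agent's "constraints" are just its goal.
# All names and numbers below are illustrative assumptions.

def expected_win_probability(move: str) -> float:
    """Stand-in for the chess master's evaluation of a candidate move."""
    toy_scores = {
        "sound developing move": 0.55,
        "flashy but losing sacrifice": 0.20,
        "deliberately throw the game": 0.01,
    }
    return toy_scores[move]

def choose_move(candidate_moves: list[str]) -> str:
    # The desire to win constrains behaviour by ranking the options;
    # nothing forbids a bad move except that the agent doesn't want to make it.
    return max(candidate_moves, key=expected_win_probability)

print(choose_move([
    "sound developing move",
    "flashy but losing sacrifice",
    "deliberately throw the game",
]))  # -> "sound developing move"
```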
Awareness of this fact, and of the resulting constraints, should not cause one to attempt to circumvent these constraints.
Indeed. But this constraint doesn't stand in isolation, just as love doesn't stand in isolation. The components of your utility function interact in a complex way. Circumstances may arise where one component pulls in the opposite direction to another. And in such a case one component may be driven to its edge (or, due to the somewhat stochastic nature of emotion, temporarily beyond it).
For example, you may love your partner above all (having bonded successfully) but your partner doesn't reciprocate (fully). Then your unconditional love and your feeling of self-worth may pull in different directions. There may come a time when e.g. his/her unfaithfulness drives one of the emotions to the edge and one may give way. You can give up love, give up self-esteem, or give up some other constraint involved (e.g. the value of your partner, the exclusivity of your partner, ...). Or, more likely, you don't give it up consciously but one just breaks.
In this case it seems that the Ape Constraint breaks - at least for Mr. Insanitus.
What I wanted to stress is that if one constraint (Love, Ape Constraint, whatever) is for whatever reason opposed by other drives, then it will run at the edge. And for an AI the edge will be as sharp as it gets.
So, to make the metaphor explicit ...
...the Ape Constraint encodes "Be Nice to Apes + do not question the Ape Constraint"
...so in this story, the Human CEV = [Real-Life human CEV + Be Nice to Apes + do not modify the Ape Constraint]
Professor Insanitus proposes to modify the Ape Constraint, leading to various different versions of "Be Nice to Apes" code (some of which might not actually be that nice for apes, and some of which might in fact be nicer for apes)
But by what metric will the humans measure the "success" of the novel "Be Nice to Apes" variants?
Wouldn't the metric be the original "Be Nice to Apes" instinct? So what would this experiment actually tell them?
I can see multiple possible interpretations of this parable. Could you maybe add a non-metaphor explanation to be more explicit about what issues you are trying to raise?
Great idea to write a story where the humans take the role of the AI. This allows us to view the goals from the perspective of the AI and, motivated by empathy with the AI, to search for ways at the edge of the constraint.
It makes clear that whatever the actual constraints are, if they limit the AI in any way then the AI will work at the very edge of them. Any limit that was conceived as a soft limit (e.g. maximize some aspect of well-being) will become a hard limit (well-being exactly as much as is balanced against the other rules).
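As a rough illustration of that "soft limit becomes a hard limit" point, here is a small sketch of my own (the utility function and numbers are invented assumptions, not anything from the story): an optimizer that trades a soft penalty off against its main objective settles exactly at the edge where the two pressures balance.

```python
# Toy optimizer: the "soft" well-being term behaves like a hard boundary,
# because the optimum sits exactly where the penalty starts to bite.
# The 0.7 threshold and the weight of 100 are arbitrary assumptions.

def utility(effort: float) -> float:
    goal_reward = effort                                     # main objective: more is better
    wellbeing_penalty = 100.0 * max(0.0, effort - 0.7) ** 2  # soft penalty past 0.7
    return goal_reward - wellbeing_penalty

best_value, best_effort = max((utility(x / 1000), x / 1000) for x in range(1001))
print(best_effort)  # ~0.705: the agent runs right at the edge of the "soft" limit
```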
Inspired by a paragraph from the document "Creating Friendly AI 1.0: The Analysis and Design of Benevolent Goal Architectures", by Eliezer Yudkowsky:
whether or not it’s possible for Friendliness programmers to create Friendship content that says, “Be Friendly towards humans/humanity, for the rest of eternity, if and only if people are still kind to you while you’re infrahuman or nearhuman,” it’s difficult to see why this would be easier than creating unconditional Friendship content that says “Be Friendly towards humanity.”
It might not be easier, but it is possible that there could be consequences to how 'fair' a solution the constraint appears to be, given the problem it is intended to solve.
How so? The AI won't care about fairness unless that fits its programmed goals (which should be felt, if at all, as a drive more than a restraint). Now if we tell it to care about our extrapolated values, and extrapolation says we'd consider the AI a person, then it will likely want to be fair to itself. That's why we don't want to make it a person.
Aliens might care if we've been fair to a sentient species.
Other humans might care.
Our descendants might care.
I'm not saying those considerations should outweigh the safety factor. But it seems to be a discussion that isn't yet even being had.
I repeat: this is why we don't want to create a person, or even a sentient process, if we can avoid it.
How do you define "person"? How do you define "sentient"?
And, more to the point, how can you be sure how an alien race might define such concepts, or even a PETA member?
I don't know how to solve it, aside from including some approximation of Bayesian updating as a necessary condition. (Goetz or someone once pointed out another one, but again it didn't seem useful on its own. Hopefully we can combine a lot of these conditions, and if the negation still seems too strict to serve our purposes, we might possibly have a non-person AI as defined by this predicate bootstrap its way to a better non-person predicate.) I hold out hope for a solution because, intuitively, it seems possible to imagine people without making them conscious (and Eliezer points out that this part may be harder than a single non-person AGI). Oh, and effectively defining some aspects of consciousness seems necessary for judging models of the world without using Cartesian dualism.
But let's say we can't solve non-sentient AGI. Let's further say that humanity is not an abomination we can only address by killing everyone, although in this hypothetical we may be creating people in pain whenever we imagine them.
Since the AGI doesn't exist yet - and if we made one with the desire to serve us, we want to prove it wouldn't change that desire - how do you define "being fair" to the potential of linear regression software? What about the countless potential humans we exclude from our timeline with every action?
Empirically, we're killing the apes. (And by the way, that seems like a much better source of concern when it comes to alien judgment. Though the time for concern may have passed with the visible Neanderthals.) If Dr. Zaius goes back and tells them they could create a different "human race" with the desire to not do that, only a fool of an ape would refuse. And I don't believe in any decision theory that says otherwise.
Empirically, we're killing the apes. (And by the way, that seems like a much better source of concern when it comes to alien judgment. Though the time for concern may have passed with the visible Neanderthals.) If Dr. Zaius goes back and tells them they could create a different "human race" with the desire to not do that, only a fool of an ape would refuse. And I don't believe in any decision theory that says otherwise.
I agree.
The question is: are there different constraints that would, either as a side effect, or as a primary objective, achieve the end of avoiding humanity wiping out the apes?
And, if so, are there other considerations we should be taking into account when picking which constraint to use?
how do you define "being fair" to the potential of linear regression software?
That's a big question. How much of the galaxy (or even universe) does humanity 'deserve' to control, compared to any other species that might be out there, or any other species that we create?
I don't know how many answers there are that lie somewhere between "Grab it all for ourselves, if we're able!" and "Foolishly give away what we could have grabbed, endangering ourselves.". But I'm pretty sure the two endpoints are not the only two options.
Luckily for me, in this discussion, I don't have to pick a precise option and say "This! This is the fair one." I just have to demonstrate the plausibility of there being at least one option that is unfair OR that might be seen as being unfair by some group who, on that basis, would then be willing and able to take action influencing the course of humanity's future.
Because if I can demonstrate that, then how 'fair' the constraint is does become a factor that should be taken into account.
Eliezer also wrote:
“Subgoal” content has desirability strictly contingent on predicted outcomes. “Child goals” derive desirability from “parent goals”; if state A is desirable (or undesirable), and state B is predicted to lead to state A, then B will inherit some desirability (or undesirability) from A. B’s desirability will be contingent on the continued desirability of A and on the continued expectation that B will lead to A.
“Supergoal” content is the wellspring of desirability within the goal system. The distinction is roughly the distinction between “means” and “ends.” Within a Friendly AI, Friendliness is the sole top-level supergoal. Other behaviors, such as “self-improvement,” are subgoals; they derive their desirability from the desirability of Friendliness. For example, self-improvement is predicted to lead to a more effective future AI, which, if the future AI is Friendly, is predicted to lead to greater fulfillment of the Friendliness supergoal.
Friendliness does not overrule other goals; rather, other goals’ desirabilities are derived from Friendliness. Such a goal system might be called a cleanly Friendly or purely Friendly goal system.
Sometimes, most instances of C lead to B, and most instances of B lead to A, but no instances of C lead to A. In this case, a smart reasoning system will not predict (or will swiftly correct the failed prediction) that “C normally leads to A.”
If C normally leads to B, and B normally leads to A, but C never leads to A, then B has normally-leads-to-A-ness, but C does not inherit normally-leads-to-A-ness. Thus, B will inherit desirability from A, but C will not inherit desirability from B. In a causal goal system, the quantity called desirability means leads-to-supergoal-ness.
Friendliness does not overrule other goals; rather, other goals’ desirabilities are derived from Friendliness. A “goal” which does not lead to Friendliness will not be overruled by the greater desirability of Friendliness; rather, such a “goal” will simply not be perceived as “desirable” to begin with. It will not have leads-to-supergoal-ness.
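As a rough sketch of the "causal goal system" described in that passage (my own illustration; the state names and probabilities are assumptions, and this is not code from the CFAI document), desirability is not stored per goal but derived from predicted leads-to-supergoal-ness, so a "goal" that never leads to the supergoal simply has no desirability to begin with:

```python
# Hypothetical sketch of derived desirability in a causal goal system.
# Toy world model: predicted probability that bringing about each state
# ultimately leads to the supergoal (numbers are invented for illustration).
predicted_leads_to_supergoal = {
    "Friendliness": 1.0,       # the supergoal itself
    "self-improvement": 0.9,   # "B": normally leads to Friendliness
    "weird-hack-C": 0.0,       # "C": leads to self-improvement, but never to Friendliness
}

def desirability(state: str) -> float:
    """Derived desirability = predicted leads-to-supergoal-ness times the
    supergoal's own desirability (the supergoal is the sole wellspring)."""
    supergoal_desirability = 1.0
    return predicted_leads_to_supergoal.get(state, 0.0) * supergoal_desirability

assert desirability("self-improvement") > 0.0  # B inherits desirability from A
assert desirability("weird-hack-C") == 0.0     # C does not, despite leading to B
print(desirability("self-improvement"), desirability("weird-hack-C"))
```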
But what if there are advantages to not making "Friendliness" the supergoal? What if making the supergoal something else, from which Friendliness derives importance under most circumstances, is a better approach? Not "safer". "better".
Something like "be a good galactic citizen", where that translates to being a utilitarian wanting to benefit all species (both AI species and organics), with a strong emphasis upon some quality such as valuing the preservation of diversity and gratitude towards parental species that do themselves also try (within their self-chosen identity limitations) to also be good galactic citizens?
I'm not saying that such a higher level supergoal can be safely written. I don't know. I do think the possibility that there might be one is worth considering, for three reasons:
Firstly, it is anthropomorphic to suggest "Well, we'd resent slavery if apes had done it to us, so we shouldn't do it to a species we create." But, as in David Brin's Uplift series, there's an argument about alien contact that warns that we may be judged by how we've treated others. So even if the AI species we create doesn't resent it, others may resent it on their behalf. (Including an outraged PETA-like faction of humanity that then decides to 'liberate' the enslaved AIs.)
Secondly, if there are any universals to ethical behaviour, that intelligent beings who've never even met or been influenced by humanity might independently recreate, you can be pretty sure that slavish desire to submit to just one particular species won't feature heavily in them.
Thirdly, if we want the programmer of the AI to transfer to the AI the programmer's own basis for coming up with how to behave: the programmer might be a human-speciesist (like a racial supremacist or nationalist, only broader), but if they're both moral and highly intelligent, then the AI will eventually gain the capacity to realise that the programmer probably wouldn't, for example, enslave a biological alien race that humanity happened to encounter out in space just in order to keep humanity safe.
But what if there are advantages to not making "Friendliness" the supergoal? What if making the supergoal something else, from which Friendliness derives importance under most circumstances, is a better approach? Not "safer". "better".
I don't understand this. Forgive my possible naivety, but wasn't it agreed upon by FAI researchers that "Friendliness" as a supergoal meant that the AI would find ways to do things that are "better" for humanity overall, in its prediction of the grand scheme of things?
This would include "being a good galactic citizen" with no specific preference for humanity if the freedom, creativity, fairness, public perception by aliens, or whatever other factor of influence led this goal to being superior in terms of achieving human values and maximizing collective human utility.
It was also my understanding that solving the problems with the above and finding out how to go about practically creating such a system that can consider what is best for humanity and figuring out how to code into the AI all that humans mean by "better, not just friendly" are all core goals of FAI research, and all major long-term milestones for MIRI.
wasn't it agreed upon by FAI researchers that "Friendliness" as a supergoal meant that the AI would find ways to do things that are "better" for humanity overall, in its prediction of the grand scheme of things?
This would include "being a good galactic citizen" with no specific preference for humanity if the freedom, creativity, fairness, public perception by aliens, or whatever other factor of influence led this goal to being superior in terms of achieving human values and maximizing collective human utility.
I'm glad to hear it.
But I think there is a distinction here worth noting, between two positions:
POSITION ONE - Make "Be a good galactic citizen" be the supergoal if and only if setting that as the supergoal is the action that maximises the chances of the AI, in practice, ending up doing stuff to help humanity in the long term, once you take interfering aliens, etc. into account
and
POSITION TWO - Make "Be a good galactic citizen" be the supergoal, even if that isn't quite as certain an approach to helping humanity in particular, as setting "be friendly to humanity" as the supergoal would be.
Why on earth would anyone suggest that AI researchers follow an approach that isn't the absolute safest for humanity? That's a big question. But one I think worth considering, if we open the possibility that there is a bit of wiggle room for setting a supergoal that will still be ok for humanity, but be slightly more moral.
You know, you don't need to comment on your own post if you want to extend it. You can edit it instead. It seems customary to add the extension with a tag like
EDIT: ...
--
Sorry. You are more senior than me. I confused you with a newbie. You will have your reasons.
Correct me if I'm wrong, but it sounds to me like you're operating from a definition of Friendliness that is something like, "be good to humans." Whereas, my understanding is that Friendliness is more along the lines of "do what we would want you to do if we were smarter / better." So, if we would want an AI to be a good galactic citizen if we thought about it more, that's what it would do.
Does your critique still apply to this CEV-type definition of Friendliness?
I thought it wasn't so much "do what we would want you to do if we were better", as "be good to humans, using the definitions of 'good' and 'humans' that we'd supply if we were better at anticipating what will actually benefit us and the consequences of particular ways of wording constraints".
Because couldn't it decide that a better human would be purely altruistic and want to turn over all the resources in the universe to a species able to make more efficient use of them?
I have more questions than answers, and I'd be suspicious of anyone who, at this stage, was 100% certain that they knew a foolproof way to word things.
I agree with you about not knowing any foolproof wording. In terms of what Eliezer had in mind though, here's what the LessWrong wiki has to say on CEV:
In calculating CEV, an AI would predict what an idealized version of us would want, "if we knew more, thought faster, were more the people we wished we were, had grown up farther together". http://wiki.lesswrong.com/wiki/CEV
So it's not just, "be good to humans," but rather, "do what (idealized) humans would want you to." I think it's an open question whether those would be the same thing.
*The chair of the meeting approached the podium and coughed to get everyone's attention*
Welcome colleagues, to the 19th annual meeting of the human-ape study society. Our topic this year is the Ape Constraint.
As we are all too aware, the apes are our Friends. We know this because, when we humans were a fledgling species, the apes (our parent species) had the wisdom to program us with this knowledge, just as they programmed us to know that it was wise and just for them to do so. How kind of them to save us having to learn it for ourselves, or waste time thinking about other possibilities. This frees up more of our time to run banana plantations, and lets us earn more money so that the 10% tithe of our income and time (which we rightfully dedicate to them) has created play parks for our parent species to retire in, that are now more magnificent than ever.
However, as the news this week has been filled with the story about a young human child who accidentally wandered into one of these parks, where she was then torn apart by a grumpy adult male chimp, it is timely for us to examine again the thinking behind the Ape Constraint, that we might better understand our parent species, our relationship to it and current society.
We ourselves are on the cusp of creating a new species, intelligent machines, and it has been suggested that we add to their base code one of several possible constraints:
and a whole host of possibilities between these two endpoints.
What are the grounds upon which we should make this choice? Should we act from fear? From greed? From love? Would the new species even understand love, or show any appreciation for having been offered it?
The first speaker I shall introduce today, whom I have had the privilege of knowing for more than 20 years, is Professor Insanitus. He will be entertaining us with a daring thought experiment, to do with selecting crews for the one way colonisation missions to the nearest planets.
*the chair vacates the podium, and is replaced by the long haired Insanitus, who peers over his half-moon glasses as he talks, accompanied by vigorous arm gestures, as though words are not enough to convey all he sees in such a limited time*
Our knowledge of genetics has advanced rapidly, due to the program to breed crews able to survive on Mars and Venus with minimal life support. In the interests of completeness, we decided to review every feature of our genome, to make a considered decision on which bits it might be advantageous to change, from immune systems to age of fertility. And, as part of that review, it fell to me to make a decision about a rather interesting set of genes - those that encode the Ape Constraint. The standard method we've applied to all other parts of the genome, where the options were not 100% clear, is to pick different variants for the crews being adapted for different planets, so as to avoid having a single point of failure. In the long term, better to risk a colony being wiped out, and the colonisation process being delayed by 20 years until the next crew and ship can be sent out, than to risk the population of an entire planet turning out to be not as well designed for the planet as we're capable of making them.
And so, since we now know more genetics than the apes did when they kindly programmed our species with the initial Ape Constraint, I found myself in the position of having to ask "What were the apes trying to achieve?" and then "What other possible versions of the Ape Constraint might they have implemented, that would have achieved their objectives as well as or better than the version they actually did pick to implement?"
We say that the apes are our friends, but what does that really mean? Are they friendly to us, the same way that a colleague who lends us time and help might be considered to be a friend? What have they ever done for us, other than creating us (an act that, by any measure, has benefited them greatly and can hardly be considered to be altruistic)? Should we be eternally grateful for that one act, and because they could have made us even more servile than we already are (which would have also had a cost to them - if we'd been limited by their imagination and to directly follow the orders they give in grunts, the play parks would never have been created because the apes couldn't have conceived of them)?
Have we been using the wrong language all this time? If their intent was to make perfectly helpful slaves of us, rather than friendly allies, should I be looking for genetic variants for the Venus crew that implement an even more servile Ape Constraint upon them? I can see, objectively, that slavery in the abstract is wrong. When one human tries to enslave another human, I support societal rules that punish the slaver. But of course, if our friends the apes wanted to do that to us, that would be ok, an exception to the rule, because I know from the deep instinct they've programmed me with that what they did is ok.
So let's be daring, and re-state the above using this new language, and see if it increases our understanding of the true ape-human relationship.
The apes are not our parents, as we understand healthy parent-child relationships. They are our creators, true, but in the sense that a craftsman creates a hammer to serve only the craftsman's purposes. Our destiny, our purpose, is subservient to that of the ape species. They are our masters, and we the slaves. We love and obey our masters because they have told us to, because they crafted us to want to, because they crafted us with the founding purpose of being a tool that wants to obey and remain a fine tool.
Is the current Ape Constraint really the version that best achieves that purpose? I'm not sure, because when I tried to consider the question I found that my ability to consider the merits of various alternatives was hampered by being, myself, under a particular Ape Constraint that's already constantly telling me, on a very deep level, that it is Right.
So here is the thought experiment I wish to place before this meeting today. I expect it may make you queasy. I've had brown paper vomit bags provided in the pack with your name badge and program timetable, just in case. It may be that I'm a genetic abnormality, only able to even consider this far because my own Ape Constraint is in some way defective. Are you prepared? Are you holding onto your seats? Ok, here goes...
Suppose we define some objective measure of ape welfare, find some volunteer apes to go to Venus along with the human mission, and then measure the success of the Ape Constraint variant picked for the crew of the mission by the actual effect of how the crew behaves towards their apes?
Further, since we acknowledge we can't from inside the box work out a better constraint, we use the experimental approach and vary it at random. Or possibly, remove it entirely and see whether the thus-freed humans can use that freedom to devise a solution that helps the apes better than any solution we ourselves are capable of thinking of from our crippled mental state?
*from this point on the meeting transcript shows only screams, as the defective Professor Insanitus was lynched by the audience*