You sort of mentioned some of these points at the end of your essay, but just to spell them out more explicitly:
While I consider the individual cells in my body to be a part of me, I'd be ready to kill some of them (e.g. ones infected with a disease) if necessary. Likewise, I consider my arm a part of me, but I would be (albeit quite reluctantly) ready to have it amputated if that was needed to save the rest of the body.
These aren't merely examples of self-preservation: if somebody developed cybernetic implants that were as good as or better than my original body parts, I might be willing to swap.
At a more abstract level, emotions such as phobias, fears, and neuroses are also an important part of me, and they're generated by parts of my brain which are quite certainly important parts of me. Yet I would mostly just be glad to be rid of these emotions. Although I would not want to get rid of the parts of my brain that generate them, I would like to have those parts substantially modified.
Simply having something designated as a part of yourself doesn't mean that you'll protect it. Even if it's an important part of yourself and you do want to protect it, you might be willing to rebuild it entirely, in effect destroying the original and replacing it with something almost completely different. An AI that considered humanity an important part of itself might still be completely willing to replace humanity with robots of its own design, if it thought that it was upgrading a part of itself into something better that way.
What you actually want is to make the preservation of humanity important for the AI's goals, for the kinds of definitions of "humanity" that we'd want the AI to consider "humanity", and with goals that correspond to what humanity wants to want. And then the problem reduces back to just defining goals that make the AI treat humans as we'd want it to treat us, with the (anthropomorphic) concept of identity becoming entirely redundant.
Personally, I don't consider uploading to be preserving myself, as I extend "that which I wish to preserve" to my body rather than just my computations. So we are going to clash somewhat.
What you actually want is to make the preservation of humanity important for the AI's goals, for the kinds of definitions of "humanity" that we'd want the AI to consider "humanity", and with goals that correspond to what humanity wants to want.
I'd rather not have to define anything, due to ontological issues. Our brains generally do an okay job of fulfilling our biological imperative without having it explicitly defined. So engineer computer systems with a humanistic imperative: 1) we should be able to do a better job of it than evolution did, and 2) use the society-of-AIs idea to balance and punish those who stray too far from the humanistic (there are some examples in human society of pressure to be biologically normal, e.g. straight, non-self-mutilating, etc.).
And then the problem reduces back to just defining goals that make the AI treat humans as we'd want it to treat us, with the (anthropomorphic) concept of identity becoming entirely redundant.
Do you accept Omohundro's drives? One of them is self-preservation, which needs some notion of self that is not anthropomorphic.
Our brains generally do an okay job of fulfilling our biological imperative without having it explicitly defined.
"The biological imperative" is "survive, have offspring, and make sure your offspring do well". That's a vastly simpler goal than "create the kind of a society that humans would want to want to have". (An AI that carries out the biological imperative is called a paperclipper.)
Also, even if our brains don't have an explicit definition for it, it's still implicitly there. You can't not define a goal for an AI - the only question is whether you do it explicitly or implicitly.
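A deliberately trivial sketch of what I mean (the function names and the "prefer the shortest action name" criterion are made up purely for the example): even when no utility function is written anywhere in the code, the decision procedure still encodes one.

```python
# Toy illustration: an agent with an explicit goal versus one whose goal is
# only implicit in how it picks actions. Both end up optimizing *something*.

def explicit_agent(actions, utility):
    """Picks the action with the highest explicitly defined utility."""
    return max(actions, key=utility)

def implicit_agent(actions):
    """No utility function appears anywhere, yet a goal is still encoded:
    this agent effectively optimizes for the shortest action name."""
    best = actions[0]
    for action in actions[1:]:
        if len(action) < len(best):
            best = action
    return best

actions = ["negotiate", "wait", "act"]
print(explicit_agent(actions, utility=lambda a: -len(a)))  # 'act'
print(implicit_agent(actions))                             # 'act' - same goal, just left implicit
```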
2) use the society-of-AIs idea to balance and punish those who stray too far from the humanistic (there are some examples in human society of pressure to be biologically normal, e.g. straight, non-self-mutilating, etc.).
Postulating a society of AIs being in control, instead of a single AI being in control, wouldn't seem to make the task of goal system design any easier. Now, instead of just having to figure out a way to make a single AI's goals be what we want, you have to design an entire society of AIs in such a way that the goals the society ends up promoting overall are what we want. Taking a society of AIs as the basic unit only makes things more complex.
Do you accept Omohundro's drives? One of them is self-preservation, which needs some notion of self that is not anthropomorphic.
I accept them. Like I said in the comment above, the definition of a self in the context of the "self-preservation" drive will be based on the AI's goals.
Remember that the self-preservation drive is based on the simple notion that the AI wants its goals to be achieved, and if it is destroyed, then those goals (probably) cannot be achieved, since there's nobody around who'd work to achieve them. Whether or not the AI itself survives isn't actually relevant - what's relevant is whether such AIs (or minds-in-general) survive as will continue to carry out the goals, regardless of whether or not those AIs happen to be the "same" AI.
If an AI has "maximize the amount of paperclips in the universe" as its goal, then the "self-preservation drive" corresponds to "protect agents which share this goal (and are the most capable of achieving it)".
If an AI has as its goal "maximize the amount of paperclips produced by this unit", then the "this unit" that it will try to preserve will depend on how its programming (implicitly or explicitly) defines "this unit"... or so it would seem at first.
If we taboo "this unit", we get "maximize the amount of paperclips produced by [something that meets some specific criteria]". To see the problem in this, consider that if we had an AI that had been built to "maximize the amount of paperclips produced by the children of its inventor", that too would get taboo'd into "maximize the amount of paperclips produced by [something that meets some specific criteria]". The self-preservation drive again collapses into "protect agents which share this goal (and are the most capable of achieving it)": there's no particular reason to use the term "self".
To put it differently: with regard to the self-preservation drive, there's no difference in whether the goal is to "maximize paperclips produced by this AI" or "maximize the happiness of humanity". In both cases, the AI is trying to do something to some entity which is defined in some specific way. In order for something to be done to such an entity, some agent must survive which has as its goal the task of doing such things to such an entity, and the AI will make sure that such agents try to survive.
Of course it must also make sure that the target of the optimizing-activity survives, but that's separate from the self-preservation drive.
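To make the "taboo" move concrete, here's a minimal sketch (the predicates and the world representation are invented for the example, not a claim about any real architecture): both goals have the same shape - count the paperclips produced by whatever satisfies some criteria - and neither needs "self" as a primitive.

```python
# Tabooing "this unit": both goals reduce to "paperclips produced by
# [whatever meets some criteria]", so the drive to preserve goal-pursuing
# agents doesn't depend on any special "self" tag.

def make_goal(producer_criteria):
    """Returns a goal: total paperclips produced by entities matching the criteria."""
    def goal(world):
        return sum(e["paperclips"] for e in world if producer_criteria(e))
    return goal

# "maximize the amount of paperclips produced by this unit"
goal_self = make_goal(lambda e: e.get("id") == "unit_47")
# "maximize the amount of paperclips produced by the children of its inventor"
goal_children = make_goal(lambda e: e.get("parent") == "inventor")

world = [
    {"id": "unit_47", "paperclips": 10},
    {"id": "child_1", "parent": "inventor", "paperclips": 3},
]
print(goal_self(world), goal_children(world))  # 10 3
```

In both cases the instrumental drive is the same: keep around some agent that will keep evaluating and acting on that function.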
(I'm not sure of how clearly I'm expressing myself here - did folks understand what I was trying to say?)
Also, even if our brains don't have an explicit definition for it, it's still implicitly there. You can't not define a goal for an AI - the only question is whether you do it explicitly or implicitly.
Can we make a system that has a human as (part of) an implicit definition of its goal system? When you allow implicit definitions, they can be made non-spatially located, although some information will need to flow between the parts.
I'm not sure if I am making myself clear, so just to check: I am interested in exploring systems where a human is an important computational component of an implicit goal system for an advanced computer system (not just something the system points at).
Because the human part is implicit, an individual system might fail to make the correct inference and judge the human to be important. If there was a society of these systems, and we engineered things correctly, most of them would make the correct inference, judge the human an important part of their goal system, and be able to exert pressure on those that didn't.
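Here's a very rough sketch of the kind of thing I mean - every name here is a stand-in, and the human is represented by a mere function only for illustration; in the real thing the human would be an actual computational component of the evaluation, not something the system just points at:

```python
import statistics

def human_judgement(plan):
    # Stand-in for the human component: in the real thing, this would be an
    # actual human doing part of the computation, not a function we point at.
    return 1.0 if "keep humans around" in plan else -1.0

def drifted_judgement(plan):
    # A system that has stopped routing its evaluation through the human.
    return 1.0 if "maximize throughput" in plan else -1.0

# A small society of systems, most of which still evaluate plans *through*
# the human component.
society = [human_judgement, human_judgement, drifted_judgement]

plan = "expand factories, keep humans around"
scores = [evaluate(plan) for evaluate in society]
consensus = statistics.median(scores)

# Society-level pressure: systems whose judgements stray too far from the
# consensus get flagged (here, just printed) rather than left to act alone.
for i, score in enumerate(scores):
    if abs(score - consensus) > 1.0:
        print(f"system {i} has strayed from the consensus and gets pushed back")
```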
Does that make more sense?
Okay. I thought that you meant something like that, but this clarified it.
I'm not sure why you think it's better to build a society of these systems than to build just a single one. It seems to just make things more difficult: instead of trying to make sure that one AI does things right, we need to make sure that the overall dynamic that emerges from a society of interacting AIs does things right. That sounds a lot harder.
A few reasons.
1) I am more skeptical of a singleton takeoff. While I think it is possible, I don't think it is likely that humans will be able to engineer it.
2) Logistics. If identity requires high-bandwidth data connections between the two parts, it would be easier to have a distributed system.
3) Politics. I doubt politicians will trust anyone to build a giant system to look after the world.
4) Letting the future take care of itself. If the systems do consider humans part of themselves, then they might be better placed to figure out an overarching way to balance everyone's needs.
It's good that you realize that the way a general mind thinks about identity is arbitrary (as is the way a general mind thinks of anything) and that, if we chose to, we could build an AI that thinks of humans as part of itself. However, you should dissolve this idea of identity even further. Recognizing objects as part of oneself or not is useful for humans' cognition, but there is no reason that an AI should find it similarly useful.
Even in your model, the AI would still need to have some concept of the computer that implements it and some concept of humanity. In order for these to both be subsumed into one concept of self, it would need to be designed to take actions based on this concept of self. Humans do that, but just putting humanity in the same part of the AI's map as itself isn't going to automatically change its actions. Sentences like "But it highlights a danger: humans should be integrated with the system in a way that seems important, rather than as something that can be discarded like hair." assume that the AI already has certain preferences regarding its 'self', but those are only there if we program them in. If we are going to do that, we might as well just program it to have certain attitudes toward humans and certain attitudes toward the computer running it.
Classifying all humans as human helps it make some decisions, because there are certain ways in which humans are similar and need to be treated similarly. But there is no obvious reason why humans and the AI need to be treated any more similarly to each other than to a number of other possible objects, so this added classification does not seem like it will be very helpful for us when we are specifying its utility function.
Side note: With a few minutes more effort, I could add a number of links to the sequences to this comment and to other, similar comments I make in the future. How helpful would you find that?
I agree with Omohundro's conclusions in this paper. The important concept here, though Omohundro does not use the term, is a subgoal. A subgoal is a goal that one adopts because, and only insofar as, it furthers another goal. Eliezer has a good explanation of this here.
For example, a paperclip maximizer does not care whether it exists, as long as the same amount of paperclips are created. However, a world without the paperclip maximizer would have far fewer paperclips because there would be no one who would want to create so many. Therefore, it decides to preserve its existence because, and only insofar as, its existence causes more paperclips to exist. We can't hack this by changing its idea of identity; it wants to preserve those things that will cause paperclips to exist, regardless of whether we give them tiny XML tags that say 'self'. Omohundro's drives are properties of goal systems, not things that we can change by categorizing objects differently.
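A small sketch of that point (the numbers and plan names are invented for illustration): self-preservation gets chosen only because futures that contain *some* paperclip maker contain more paperclips; no "self" tag is consulted anywhere.

```python
# Toy expected-paperclip calculation: the "self-preservation" subgoal falls
# out of comparing futures, not out of any notion of identity.

def expected_paperclips(plan):
    futures = {
        "allow_shutdown": 10,                      # nobody left who wants more paperclips
        "protect_this_machine": 10_000,            # this agent keeps producing
        "build_successor_then_shut_down": 10_000,  # a different agent produces them instead
    }
    return futures[plan]

plans = ["allow_shutdown", "protect_this_machine", "build_successor_then_shut_down"]
best = max(plans, key=expected_paperclips)
print(best)  # any plan that keeps some paperclip maker around beats shutdown;
             # whether that maker is "the same" agent doesn't matter
```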
If a system identifies a human as an important part of itself it will strive to protect it and its normal functioning, as we instinctively protect important parts of ourselves such as the head and genitals. So what possible objections to this are there?
How about: the use of the word "important" begs the question.
Okay, assuming Omohundro's AI drives, how do we get humans to be part of the self that is preserved?
"So any identity needs to be constructed by systems within physics, and the boundaries are arbitrary." is very insightful. Even very smart people still wonder "will my upload really be myself?" and "why would an AI want to rewrite itself?" without realizing that the definition of "self" might be less rigid than they imagine.
On the other hand, "If a system identifies a human as an important part of itself it will strive to protect it and its normal functioning, as we instinctively protect important parts of ourselves" is nothing more than anthropomorphizing. The word that should have set off warning sirens here is "instinctively": artificial systems act on a combination of designs and mistakes, but not on instincts.
I'd taboo the word instinct. I was using it to mean "a pre-created, quick-acting method for achieving a goal that doesn't involve a huge amount of computation". What do you mean by it?
"The inherent inclination of an organism toward a particular behavior" seems to be the clearest definition I could find. The catch is then that AIs have no inherent inclinations. Animals all instinctively try to protect themselves, because the ones that didn't mostly died. But an AI will only protect itself if that goal happens to be part of the design decisions and bugs it was programmed with. Self-protection may be more likely than many other goals, since "protect myself" is a useful subgoal for a huge class of other instrumental goals, but in those cases the definition of "myself" isn't subject to redefinition.
If a system identifies a human as an important part of itself it will strive to protect it and its normal functioning, as we instinctively protect important parts of ourselves such as the head and genitals.
I think where this actually leads is to an augmented human, or to symbiosis.
For it to be stable, you need the human also to change (to consider the AI as something it protects like a vital organ, rather than as a handbag-like accessory or even as a trusted servant), or at least to have a convincing reason to consistently act in the AI's self-interest.
What do I mean by "stable"? It is a relationship where, if you distort it or break one of the constraints, the natural tendency is to move back to how things were, rather than move further away.
Relationships based upon coercion, deception or unequal distribution of benefits are not stable.
Yeah, being considered a part of an AI... I might hate to be, say, its "hair". Just thinking about its next metaphorical "fashion-induced haircut and coloring" gives me the chills.
Just because something is a part of something else doesn't mean it'll be treated in ways that it finds acceptable, let alone pleasant.
The idea may be interesting for human-like minds and ems derived from humans - and even then still dangerous. I don't see how that could apply in any marginally useful way to minds in general.
Are you saying that if an AI could be built with an explicitly programmed sense of identity and a self-preservation goal, then we could get a measure of safety by including humanity in its sense of identity? That sounds rather indirect - why not include preservation of humanity in the goal itself?
Or are you expecting that a sense of identity and self-preservation will arise naturally, and thinking about ways of getting humanity into that sense of identity?
Assuming that the AI-drive concept of identity comes about when the AI notices where its goal system is written down (and instrumentally protects that place), then extraneous things like humanity aren't going to be in that sense of identity. Unless it considers humanity as being where its goal is "written down"? (Is that something like what Alicorn was suggesting?)
This article made me think of an idea, which I'm not sure if it's actually contained in the article but it is sort of nearby. It's a Basic AI Drive to ensure that it persists with its goal system intact. An AI that identified itself with humanity and identified its goals with our aggregate goals sounds like a potentially useful approach to something in the neighborhood of CEV (as opposed to trying to figure out what we want ourselves and then figuring out how to program that in, say).
It is somewhat implicit. I figured I might do a sequence if I had time. I'm interested in whether it would be a good idea to make computational systems that lack the equivalent of the parts of the brain that deal with what is sought after or avoided (this isn't the whole goal system, but it shapes the goal system in humans), most likely placing a single human in their place. And having multiple systems for society-type effects and redundancy.
And well done for linking in concepts from Omohundro; I should have done that in the article.
We often assume that an AI will have an identity and goals of its own. That it will be some separate entity from a human being or group of humans.
In physics there are no separate entities, merely a function evolving through time. So any identity needs to be constructed by systems within physics, and the boundaries are arbitrary. We have been built by evolution, and all the cells in our body have the same programming, so we have a handy rule of thumb that our body is "us", as it is created by a single replicating complex. So we assume that a computational entity, if it develops a theory of self, will only include its processing elements or code and nothing else in its notion of identity. But what a system identifies with can be controlled and specified.
If a system identifies a human as an important part of itself it will strive to protect it and its normal functioning, as we instinctively protect important parts of ourselves such as the head and genitals.
So what possible objections to this are there?
1) Humans are spatially separate from the machine, so it won't consider them part of itself
We have a habit of identifying with groups larger than ourselves, such as countries, and integrating our goals with theirs to different extents. Spatial co-location is not required.
2) Humans are very different from computers, so it will see them as "other"
Different parts of the human body are very diverse, but all of it is seen as a singular entity. Spleen and all.
3) A human will do things for reasons the computer doesn't know, so it will not see the human as part of itself.
Self-knowledge is not required for self-identification. Different parts of the brain are black boxes to one another, and we make up explanations for why we do things (in cases like blindsight), so there is no need for all the parts of the system to be self-reflective.
So can we make advanced computational systems that consider humanity part of themselves? One possible problem with this approach is that if such a system doesn't get information from some part of humanity, it may well ignore that part's needs even if it still considers it part of itself. So perhaps each system could bond with a single human being, and a myriad of such systems could rely on negotiations between them to [form a balance](http://lesswrong.com/r/discussion/lw/a7y/friendly_ai_society/).
One thing to consider is that some people only consider the program encoded by their neural patterns to be them. Which is somewhat odd - why put the boundary there? The whole body is a computer, and so is the whole world, according to physicalism. There is no particular reason for neuro-chauvinism that I can see, apart from perhaps our shared culture that emphasises the importance of what the brain does. But it highlights a danger: humans should be integrated with the system in a way that seems important, rather than as something that can be discarded like hair. Evolutionary pressure may eventually make such systems see the human portion as unimportant, unless some form of agreement is made to limit replication without the human component. Also, 7 billion humans take a small amount of resources to maintain, in a big-picture view of the universe.
This way of thinking suggests a concrete research path. Develop a theory of identity by analysing what sorts of interactions make a human feel that something is an important part of them. Also look at the parts of the brain that deal with identity and see how they malfunction.
*This is just a placeholder article; I'll try to dig up references and flesh out previous philosophical positions later on.*