Funny timing! Or, good Baader-Meinhoffing :P
Although selfishness w.r.t. copies is a totally okay preference structure, rational agents (with a world-model like we have, and no preferences explicitly favoring conflict between copies) will want to precommit or self-modify so that their causal descendants will cooperate non-selfishly.
In fact, if there is a period where the copies don't have distinguishing indexical information that greatly uncorrelates their decision algorithm, copies will even do the precommitting themselves.
Therefore, upon waking up and learning that I am a copy, but before learning much more, I will attempt to sign a contract with a bystander stating that if I do not act altruistically towards my other copies who have signed similar contracts, I have to pay them my life savings.
If signing a contract was all that we needed to coordinate well, we would already be coordinating as much as is useful now. We already have good strong reasons to want to coordinate for mutual benefit.
This is a good point - one could already sign a future-altruism contract with someone, and it would already be an expected gain if it worked. But we only see approximations to this, like insurance or marriage. So unless my copies are sufficiently more trustworthy and considerate of me than other people are to make this work, maybe something on the more efficacious self-modification end of the spectrum is actually necessary.
We don't have the identity and value overlap with others that we'd have with copies; the contract would just be formalizing that overlap, and I think it's a silly way of formalizing it. I respect my copies' right to drift differently than I do, and would then cease cooperating as absolutely. I certainly don't want to lose all of my assets in that case!
Moreover, when copying becomes the primary means of reproduction, caring for one's copies becomes the ultimate in kin selection. That puts a lot of evolutionary pressure toward favoring copy-cooperation. Imagine how siblings would care for each other if identical twins (triplets, N-tuples) were the norm.
I think my copies and I would cooperate well, and we would get a lot of utility from our cooperation.
Philosophically, I would want to value each of my copies equally, and I suspect that initially, my copies would be pretty altruistic towards each other. Using some mechanism to keep it that way, as Manfred suggests, seems appealing to me, but it isn't clear how feasible it would be. I would expect that absent some such mechanism, I would gradually become less altruistic towards copies for psychological reasons: if I benefited another copy of myself at my own expense, I would remember the expense and not the benefit, so even though I would endorse it as good for me in aggregate (if the benefit outweighed the expense), I would be trained not to do that via reinforcement learning. I expect that I would remain able to cooperate with copies pretty well for quite a long time, in the sense of coordinating for mutual benefit, since I would trust myself and there would therefore be lower enforcement costs to contracts, but I might fairly quickly stop being any more altruistic towards copies, in the sense of willingness to help them at my expense without an expectation of return, than I am towards close friends.
I would like to think that I would cooperate reasonably with my copies, especially when there is a strong reason to prioritize global values over selfish values.
However, in practice I would also expect that System 1 would still see copies as separate but related individuals rather than as myself, and this would limit the amount of cooperation that occurs. I might have to engage in some self-deceptive reasoning to accomplish selfishness, but the human brain is good at that ("I've been working harder than my copies - I deserve a little extra!")
I've written about this previously on LessWrong:
He isn't me. He is a separate person who just happens to share all of the same memories and motivations that I have. I want to say that I wouldn't even give this copy of me the time of day, but that would be rhetorical. In some ventures he would be my greatest friend, in others my worst enemy. (Interestingly, I could accurately tell which right now by applying decision theory to variants of the prisoner's dilemma.) But even when I choose to interfere in his affairs, it is not for directly self-serving reasons - I help him for the same reason I'd help a really close friend, I hurt him for the same reason I'd hinder a competitor.
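To make the decision-theory aside concrete, here is a minimal sketch (in Python, with hypothetical payoff numbers) of the reasoning that applies against an exact copy: if the copy runs the same decision algorithm as I do, our choices are effectively correlated, so only the mutual-cooperate and mutual-defect outcomes are reachable and cooperation wins; against an uncorrelated stranger, defection looks better under the same payoffs.

```python
# Illustrative payoffs for a one-shot prisoner's dilemma (row player's payoff).
# These numbers are hypothetical; only their ordering matters.
PAYOFF = {
    ("C", "C"): 3, ("C", "D"): 0,
    ("D", "C"): 5, ("D", "D"): 1,
}

def best_move(opponent_is_my_copy: bool) -> str:
    moves = ("C", "D")
    if opponent_is_my_copy:
        # Correlated case: whatever I choose, my copy chooses too,
        # so compare only the diagonal outcomes (C,C) vs (D,D).
        return max(moves, key=lambda m: PAYOFF[(m, m)])
    # Uncorrelated case: pick the move with the better worst-case payoff
    # (with these payoffs, defection also strictly dominates).
    return max(moves, key=lambda m: min(PAYOFF[(m, other)] for other in moves))

print(best_move(opponent_is_my_copy=True))   # "C": cooperate with an exact copy
print(best_move(opponent_is_my_copy=False))  # "D": defect against a stranger
```

The point is just that which side of the friend/enemy line a copy falls on can be read off the payoff structure and the degree of correlation between the two decision algorithms, rather than from any felt sense of shared identity.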
My first thought (in response to the second question) is 'immediately terminate myself, leaving the copy as the only valid continuation of my identity'.
Of course, it is questionable whether I would have the willpower to go through with it. I believe that my copy's mind would constitute just as 'real' a continuation of my consciousness as would my own mind following a procedure that removed the memories of the past few days (or however long since the split) whilst leaving all else intact (which is of course just a contrived-for-the-sake-of-the-thought-experiment variety of the sort of forgetting that we undergo all the time), but I have trouble alieving it.
This is a much more interesting response if you would also agree with Lalartu in the more general case.
Kind of. I wouldn't defect against my copy without his consent, but I would want the pool trimmed down to only a single version of myself (ideally whichever one had the highest expected future utility, all else equal). The copy, being a copy, should want the same thing. The only time I wouldn't be opposed to the existence of multiple instances of myself would be if those instances could regularly synchronize their memories and experiences (and thus constitute more a single distributed entity with mere synchronization delays than multiple diverging entities).
Really? Can you say a little more about why you think you have that value? I guess I'm not convinced that it's really a terminal value if it varies so widely across people of otherwise similar beliefs. Presumably that's what lalartu meant as well, but I just don't get it. I like myself, so I'd like more of myself in the world!
I think a big part of it is that I don't really care about other people except instrumentally. I care terminally about myself, but only because I experience my own thoughts and feelings first-hand. If I knew I were going to be branched, then I'd care about both copies in advance as both are valid continuations of my current sensory stream. However, once the branch had taken place, both copies would immediately stop caring about the other (although I expect they would still practice altruistic behavior towards each other for decision-theoretic reasons). I suspect this has also influenced my sense of morality: I've never been attracted to total utilitarianism, as I've never been able to see why the existence of X people should be considered superior to the existence of Y < X equally satisfied people.
So yeah, that's part of it, but not all of it (if that were the extent of it, I'd be indifferent to the existence of copies, not opposed to it). The rest is hard to put into words, and I suspect that even were I to succeed in doing so I'd only have succeeded in manufacturing a verbal rationalization. Part of it is instrumental, each copy would be a potential competitor, but that's insufficient to explain my feelings on the matter. This wouldn't be applicable to, say, the Many-Worlds Interpretation of quantum mechanics, and yet I'm still bothered by that interpretation as it implies constant branching of my identity. So in the end, I think that I can't offer a verbal justification for this preference precisely because it's a terminal preference.
Communism and the Internet seem like the most likely candidates for approaching the ability to form a singleton. But a singleton by definition can stop evolutionary processes at all levels of complexity and system dynamics below itself. Neither the Internet nor communism can sustain sufficient levels of coordination to stymie evolution.
Some of the most awesome superorganisms have specialized sub-classes of individuals (workers, soldiers, and queens; leafcutter ants, for instance, have three distinct sizes and 'chefs'). Instead of assuming that a superorganism needs to be made of hard-to-distinguish entities, we should consider entities that have trigger mechanisms for becoming a specific specialized kind of organism, fine-tuned for the necessary activities. Ants use pheromonal control and other biochemical mechanisms to determine who does what when, and epigenetics to determine who develops into which kind. Superorganisms made of emulations may do similar things without being made of copies or near-copies. Nature is cleverer than you are.
I keep returning to one gnawing question that haunts the whole idea of Friendly AI: how do you program a machine to "care"? I can understand how a machine can appear to "want" something, favoring a certain outcome over another. But to talk about a machine "caring" is ignoring a very crucial point about life: that as clever as intelligence is, it cannot create care. We tend to love our kid more than someone else's. So you could program a machine to prefer another in which it recognizes a piece of its own code. That may LOOK like care but it's really just an outcome. How could you replicate, for example, the love a parent shows for a kid they didn't produce? What if that kid were humanity? So too with Coherent Extrapolated Volition, you can keep refining the resolution of an outcome, but I don't see how any mechanism can actually care about anything but an outcome.
While "want" and "prefer"may be useful terms, such terms as "care", "desire", "value" constitute an enormous and dangerous anthropomorphizing. We cannot imagine outside our own frame, and this is one place where that gets us into real trouble. Once someone creates a code that will recognize something truly metaphysical I would be convinced that FAI is possible. Even whole brain emulation assumes that both that our thoughts are nothing but code and a brain with or without a body is the same thing. Am I missing something?
Leaving aside other matters, what does it matter if an FAI 'cares' in the sense that humans do so long as its actions bring about high utility from a human perspective?
Because what any human wants is a moving target. As soon as someone else delivers exactly what you ask for, you will be disappointed unless you suddenly stop changing. Think of the dilemma of eating something you know you shouldn't. Whatever you decide, as soon as anyone (AI or human) takes away your freedom to change your mind, you will likely rebel furiously. Human freedom is a huge value that any FAI of any description will be unable to deliver until we are no longer free agents.
What would an AI that 'cares' in the sense you spoke of be able to do to address this problem that a non-'caring' one wouldn't?
How can you distinguish "recognizing something truly metaphysical" from (1) "falsely claiming to recognize something truly metaphysical" and (2) "sincerely claiming to recognize something truly metaphysical, but wrong because actually the thing in question isn't real or is very different from what seems to have been recognized"?
Perhaps "caring" offers a potential example of #1. A machine says it cares about us, it consistently acts in ways that benefit us, it exhibits what look like signs of distress and gratification when we do ill or well -- but perhaps it's "just an outcome" (whatever exactly that means). How do you tell? (I am worried that the answer might be "by mere prejudice: if it's a machine putatively doing the caring, then of course it isn't real". I think that would be a bad answer.)
Obvious example of #2: many people have believed themselves to be in touch with gods, and I guess communion with a god would count as "truly metaphysical". But given what those people have thought about the gods they believed themselves to be in touch with, it seems fairly clear that most of them must have been wrong (because where a bunch of people have mutually-fundamentally-incompatible ideas about their gods, at most one can be right).
This is Yudkowsky's Hidden Complexity of Wishes problem from the human perspective. The concept of "caring" is rooted so deeply (in our flesh, I insist) that we cannot express it. Getting across the idea to AI that you care about your mother is not the same as asking for an outcome. This is why the problem is so hard. How would you convince the AI, in your first example, that your care was real? Or in your #2, that your wish was different from what it delivered? And how do you tell, you ask? By being disappointed in the result! (For instance in Yudkowsky's example, when the AI delivers Mom out of the burning building as you requested, but in pieces.)
My point is that value is not a matter of cognition of the brain, but caring from the heart. When AI calls your insistence that it didn't deliver what you wanted "prejudice", I don't think you'd be happy with the above defense.
[Ref: http://lesswrong.com/lw/ld/the_hidden_complexity_of_wishes/]
What I wrote wasn't intended as a defense of anything; it was an attempt to understand what you were saying. Since you completely ignored the questions I asked (which is of course your prerogative), I am none the wiser.
I think you may have misunderstood my conjecture about prejudice; if an AI professes to "care" but doesn't in fact act in ways we recognize as caring, and if we conclude that actually it doesn't care in the sense we meant, that's not prejudice. (But it is looking at "outcomes", which you disdained before.)
This is part of a weekly reading group on Nick Bostrom's book, Superintelligence. For more information about the group, and an index of posts so far see the announcement post. For the schedule of future topics, see MIRI's reading guide.
Welcome. This week we discuss the nineteenth section in the reading guide: post-transition formation of a singleton. This corresponds to the last part of Chapter 11.
This post summarizes the section, and offers a few relevant notes, and ideas for further investigation. Some of my own thoughts and questions for discussion are in the comments.
There is no need to proceed in order through this post, or to look at everything. Feel free to jump straight to the discussion. Where applicable and I remember, page numbers indicate the rough part of the chapter that is most related (not necessarily that the chapter is being cited for the specific claim).
Reading: “Post-transition formation of a singleton?” from Chapter 11
Summary
Another view
Many of the ideas around superorganisms come from Carl Shulman's paper, Whole Brain Emulation and the Evolution of Superorganisms. Robin Hanson critiques it:
Notes
1. The natural endpoint
Bostrom says that a singleton is the natural conclusion of a long-term trend toward larger scales of political integration (p176). It seems helpful here to be more precise about what we mean by singleton. Something like a world government does seem to be a natural conclusion to long-term trends. However, this seems different to the kind of singleton I took Bostrom to previously be talking about. A world government would by default only make a certain class of decisions, for instance about global-level policies. There has been a long-term trend for the largest political units to become larger; however, there have always been smaller units as well, making different classes of decisions, down to the individual. I'm not sure how to measure the mass of decisions made by different parties, but it seems like individuals may be making more decisions more freely than ever, and the large political units have less ability than they once did to act against the will of the population. So the long-term trend doesn't seem to point to an overpowering ruler of everything.
2. How value-aligned would emulated copies of the same person be?
Bostrom doesn't say exactly how 'emulations that were wholly altruistic toward their copy-siblings' would emerge. It seems to be some combination of natural 'altruism' toward oneself and selection for people who react to copies of themselves with extreme altruism (confirmed by a longer interesting discussion in Shulman's paper). How easily one might select for such people depends on how humans generally react to being copied. In particular, whether they treat a copy like part of themselves, or merely like a very similar acquaintance.
The answer to this doesn't seem obvious. Copies seem likely to agree strongly on questions of global values, such as whether the world should be more capitalistic, or whether it is admirable to work in technology. However I expect many—perhaps most—failures of coordination come from differences in selfish values—e.g. I want me to have money, and you want you to have money. And if you copy a person, it seems fairly likely to me the copies will both still want the money themselves, more or less.
From other examples of similar people—identical twins, family, people and their future selves—it seems people are unusually altruistic to similar people, but still very far from 'wholly altruistic'. Emulation siblings would be much more similar than identical twins, but who knows how far that would move their altruism?
Shulman points out that many people hold views about personal identity that would imply that copies share identity to some extent. The translation between philosophical views and actual motivations is not always complete however.
3. Contemporary family clans
Family-run firms are a place to get some information about the trade-off between reducing agency problems and having access to a wide range of potential employees. From a brief perusal of the internet, it seems ambiguous whether they do better. One could try to separate out the factors that help them do better or worse.
4. How big a problem is disloyalty?
I wondered how big a problem insider disloyalty really was for companies and other organizations. Would it really be worth all this loyalty testing? I can't find much about it quickly, but 59% of respondents to a survey apparently said they had some kind of problem with insiders. The same report suggests that a bunch of costly initiatives such as intensive psychological testing are currently on the table to address the problem. Also, apparently it's enough of a problem for someone to be trying to solve it with mind-reading, though that probably doesn't say much.
5. AI already contributing to the surveillance-secrecy arms race
Artificial intelligence will help with surveillance sooner and more broadly than just in observing people's motives, e.g. here and here.
6. SMBC is also pondering these topics this week
In-depth investigations
If you are particularly interested in these topics, and want to do further research, these are a few plausible directions, some inspired by Luke Muehlhauser's list, which contains many suggestions related to parts of Superintelligence. These projects could be attempted at various levels of depth.
How to proceed
This has been a collection of notes on the chapter. The most important part of the reading group though is discussion, which is in the comments section. I pose some questions for you there, and I invite you to add your own. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!
Next week, we will talk about the 'value loading problem'. To prepare, read “The value-loading problem” through “Motivational scaffolding” from Chapter 12. The discussion will go live at 6pm Pacific time next Monday 26 January. Sign up to be notified here.