timeholmes comments on Superintelligence 19: Post-transition formation of a singleton - Less Wrong

Post author: KatjaGrace 20 January 2015 02:00AM




Comment author: timeholmes 23 January 2015 01:13:05AM 2 points

This is Yudkowsky's Hidden Complexity of Wishes problem from the human perspective. The concept of "caring" is rooted so deeply (in our flesh, I insist) that we cannot express it. Conveying to an AI that you care about your mother is not the same as asking it for an outcome. This is why the problem is so hard. How would you convince the AI, in your first example, that your care was real? Or, in your #2, that your wish was different from what it delivered? And how do you tell, you ask? By being disappointed in the result! (For instance, in Yudkowsky's example, the AI delivers Mom out of the burning building as you requested, but in pieces.)

My point is that value is not a matter of cognition in the brain, but of caring from the heart. When the AI dismisses as "prejudice" your insistence that it didn't deliver what you wanted, I don't think you'd be happy with the above defense.

[Ref: http://lesswrong.com/lw/ld/the_hidden_complexity_of_wishes/]

Comment author: gjm 23 January 2015 11:21:00AM 1 point

What I wrote wasn't intended as a defense of anything; it was an attempt to understand what you were saying. Since you completely ignored the questions I asked (which is of course your prerogative), I am none the wiser.

I think you may have misunderstood my conjecture about prejudice: if an AI professes to "care" but doesn't in fact act in ways we recognize as caring, and we conclude that it doesn't actually care in the sense we meant, that's not prejudice. (But it is looking at "outcomes", which you disdained before.)