I remain worried about what I posted last week: the fragility of human minds. The more I learn about social endocrinology, glands, neurotransmitters and cognitive neuroscience, the more I notice something about the alleged "robustness" of cognition, which is usually attributed to a combination of plasticity and redundancy. It is only robust to the sorts of challenges an animal brain may encounter: firmly memorizing the name of your significant other, hunting in different environments, getting old, and internal bleeding.
But if we had emulations, the range of possible shifts, tweaks and twists would be far larger than that, and they could well act on subsets of the mind that have no robustness at all, to Minsky's dismay. The fear is that evolution selected for some kinds of robustness and ignored others entirely, and that we will soon be able to modify minds along those unprotected dimensions, perhaps inadvertently.
For the second time while going through these posts: the more I think about Superintelligence and delve into its literature, the more skeptical I become that we will make it through. Everything seems so brittle.
To check that I understand: you are saying lack of robustness will make it easy to modify minds a lot?
I'm saying that the kind of robustness which minds/brains are famous for is not sufficient once you have a digital version of the brain, where the things you will change are of a different nature.
So current squishy minds and brains are robust, but they would not be robust once virtually implemented.
Responding to Paul's related skepticism in my other post:
But that seems to make it easier to specify a person precisely, not harder. The differences in observations allow someone to quickly rule out alternative models by observations of people. Indeed, the way in which a human brain is physically represented makes little difference to the kind of predictions someone would make.
There are many ways of creating something inside a virtual black box that does, as seen from the outside, what my brain does here on Earth. Let's go through a few and see where their robustness fails:
1) Scan copy my brain and put it in there.
Failures:
a) You may choose to scan at the wrong level of granularity: say, synapses instead of granule cells, neural columns instead of 3D voxels, or molecular gates instead of quantum data relevant to microtubule distribution.
b) You may scan at the right level of granularity (hoping there is only one!) and still fail to find a translation scheme for transducers and effectors, the ways in which the box interacts with the outer world.
2) Use an indirect method, similar to those Paul described, that vastly constrains the output a human can generate (like a keyboard that ignores everything else). Build a model of the human from that output, and when the difference between the doppelganger model and the actor falls below some epsilon, consider the model equivalent to the human and use it.
Failures:
a) It may turn out that a brain-like neural network or Markov network is actually the most predictive way of simulating a human, so you'd end up with a model that looks and behaves like an embedded cognition, physically distributed in space, with all the perks and perils that carries. Tamper with the virtual adrenal glands, and who knows what happens.
b) It may also be that a model vastly different from the way our brains work would produce similar behavior. Then the virtual entity we are dealing with would be a whole different realm of completely unexplored confounds: polysemic and synonymic IF-THEN gates and chain reactions we never even had the chance to glimpse. This would make me much less comfortable turning on this emulation than turning on a brain-based one. It seems far less predictable (despite having matched the behavior of that human up to that point) once its environment, embedding and inner structure change.
It is worth keeping in mind that we are comparing the robustness of these minds against the tweaks available in the virtual world with the robustness of the alternatives, one of which is motivational scaffolding and concept teaching.
We should consider whether teaching language, reference, and moral systems might be easier than simulating a mind without distorting its morals.
You'd have to push a treatise on morality through a few Google Translate round trips to turn it into a treatise on immorality, but (to exapt an old Yudkowskian example) you only have to give Gandhi one or two pills, or a tumor smaller than his toe, to completely change his moral stance on pacifism.
How well do you think the earlier methods apply in multipolar outcomes? How well do this week's methods apply to unipolar outcomes? Is value loading easier for multipolar or unipolar outcomes?
Do you think the described emulation institution would be feasible in an applicable future scenario?
This week's methods are more designed for messy, multipolar outcomes. Are there other safety methods that are especially applicable there?
The reason "the task of designing values and institutions is complicated by selection effects" is that such design is not very effective. Everyone makes this way too complicated. Life is a complex adaptive system: a few simple initial conditions iterating over time with feedback. The more integrated things are, the more emergent properties arise, and the more effective they are. As Alex Wissner-Gross and others suggest, you don't really design for value: large value is an emergent property. Design the initial conditions. But we don't have to do that: it's already been done! All we have to do is recognize, then codify, evolution's initial conditions: Private property. Connections that are both tangible and intangible. Classification: everything has a scope of relationships, and it's the classification that holds all the metadata. And add value first: Iteration http://wp.me/p4neeB-4Y
This is part of a weekly reading group on Nick Bostrom's book, Superintelligence. For more information about the group, and an index of posts so far see the announcement post. For the schedule of future topics, see MIRI's reading guide.
Welcome. This week we discuss the twenty-second section in the reading guide: Emulation modulation and institutional design.
This post summarizes the section, and offers a few relevant notes, and ideas for further investigation. Some of my own thoughts and questions for discussion are in the comments.
There is no need to proceed in order through this post, or to look at everything. Feel free to jump straight to the discussion. Where applicable and I remember, page numbers indicate the rough part of the chapter that is most related (not necessarily that the chapter is being cited for the specific claim).
Reading: “Emulation modulation” through “Synopsis” from Chapter 12.
Summary
The chapter synopsis includes a good summary of all of the value-loading techniques, which I'll remind you of here instead of re-summarizing too much:
Another view
Robin Hanson also favors institution design as a method of making the future nice, though as an alternative to worrying about values:
Hanson engages in more debate with David Chalmers' paper on related matters.
Notes
1. A relatively large amount has been said about how the organization and values of brain emulations might evolve naturally, as we saw earlier. This should remind us that the task of designing values and institutions is complicated by selection effects.
2. It seems strange to me to talk about the 'emulation modulation' method of value loading alongside the earlier, less messy methods, because they seem to be aiming at radically different levels of precision (unless I misunderstand how well something like drugs can manipulate motivations). For the synthetic AI methods, it seems we were concerned about subtle differences in values that would lead to the AI behaving badly in unusual scenarios, or seeking out perverse instantiations. Are we to expect there to be a virtual drug that changes a human-like creature from desiring some manifestation of 'human happiness' which is not really what we would want to optimize on reflection, to a truer version of what humans want? It seems to me that if the answer is yes at the point when human-level AI is developed, then we very likely have a great understanding of specifying values in general, and this whole issue is not much of a problem.
3. Brian Tomasik discusses the impending problem of programs experiencing morally relevant suffering in an interview with Dylan Matthews of Vox. (p202)
4. If you are hanging out for a shorter (though still not actually short) and amusing summary of some of the basics in Superintelligence, Tim Urban of WaitButWhy just wrote a two part series on it.
5. At the end of this chapter about giving AI the right values, it is worth noting that it is mildly controversial whether humans constructing precise and explicitly understood AI values is the key issue for the future turning out well. A few alternative possibilities:
In-depth investigations
If you are particularly interested in these topics, and want to do further research, these are a few plausible directions, some inspired by Luke Muehlhauser's list, which contains many suggestions related to parts of Superintelligence. These projects could be attempted at various levels of depth.
How to proceed
This has been a collection of notes on the chapter. The most important part of the reading group though is discussion, which is in the comments section. I pose some questions for you there, and I invite you to add your own. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!
Next week, we will start talking about how to choose what values to give an AI, beginning with 'coherent extrapolated volition'. To prepare, read “The need for...” and “Coherent extrapolated volition” from Chapter 13. The discussion will go live at 6pm Pacific time next Monday 16 February. Sign up to be notified here.