
Hedging our Bets: The Case for Pursuing Whole Brain Emulation to Safeguard Humanity's Future

Post author: inklesspen 01 March 2010 02:32AM 11 points

It is the fashion in some circles to promote funding for Friendly AI research as a guard against the existential threat of Unfriendly AI. While this is an admirable goal, the path to Whole Brain Emulation (WBE) is in many respects more straightforward and presents fewer risks. Accordingly, by working towards WBE, we may be able to "weight" the outcome probability space of the singularity such that humanity is more likely to survive.

One of the potential existential risks in a technological singularity is that the recursively self-improving agent might be inimical to our interests, either through actual malevolence or "mere" indifference towards the best interests of humanity. Eliezer has written extensively on how a poorly-designed AI could lead to this existential risk. This is commonly termed Unfriendly AI.

Since the first superintelligence can be presumed to have an advantage over any subsequently-arising intelligences, Eliezer and others advocate funding research into creating Friendly AI. Such research must reverse-engineer not only consciousness but also human notions of morality. Unfriendly AI could potentially require only sufficiently fast hardware to evolve an intelligence via artificial life, as depicted in Greg Egan's short story "Crystal Nights", or it may be created inadvertently by researchers at the NSA or a similar organization. It may be that creating Friendly AI is significantly harder than creating Unfriendly (or Indifferent) AI, perhaps so much so that we are unlikely to achieve it in time to save human civilization.

Fortunately, there's a short-cut we can take. We already have a great many relatively stable and sane intelligences. We merely need to increase their rate of self-improvement. As far as I can tell, developing mind uploading via WBE is a simpler task than creating Friendly AI. If WBE is fast enough to constitute an augmented intelligence, then our augmented scientists can trigger the singularity by developing more efficient computing devices. An augmented human intelligence may have a slower "take-off" than a purpose-built intelligence, but we can reasonably expect it to be much easier to ensure such a superintelligence is Friendly. In fact, this slower take-off will likely be to our advantage; it may increase our odds of being able to abort an Unfriendly singularity.

WBE may also be able to provide us with useful insights into the nature of consciousness, which will aid Friendly AI research. Even if it doesn't, it gets us most of the practical benefits of Friendly AI (immortality, feasible galactic colonization, etc.) and makes it possible to wait longer for the rest of the benefits.

But what if I'm wrong? What if it's just as easy to create an AI we think is Friendly as it is to upload minds into WBE? Even in that case, I think it's best to work on WBE first. Consider the following two worlds: World A creates an AI its best scientists believe is Friendly and, after a best-effort psychiatric evaluation (for whatever good that might do), gives it Internet access. World B uploads 1000 of its best engineers, physicists, psychologists, philosophers, and businessmen (someone's gotta fund the research, right?). World B seems to me to have more survivable failure cases; if some of the uploaded individuals turn out to be sociopaths, the rest of them can stop the "bad" uploads from ruining civilization. It seems exceedingly unlikely that we would select a large enough group of sociopaths that the "good" uploads can't keep the "bad" uploads in check.

Furthermore, the danger of uploading sociopaths (or people who become sociopathic when presented with that power) is also a danger that the average person can easily comprehend, compared to the difficulty of ensuring Friendliness of an AI. I believe that the average person is also more likely to recognize where attempts at safeguarding an upload-triggered singularity may go wrong.

The only downside of this approach I can see is that an upload-triggered Unfriendly singularity may cause more suffering than an Unfriendly AI singularity; sociopaths may be presumed to take more interest in torturing people than a paperclip-optimizing AI would.

Suppose, however, that everything goes right, the singularity occurs, and life becomes paradise by our standards. Can we predict anything of this future? It's a popular topic in science fiction, so many people certainly enjoy the effort. Depending on how we define a "Friendly singularity", there could be room for a wide range of outcomes.

Perhaps the AI rules wisely and well, and can give us anything we want, "save relevance". Perhaps human culture adapts well to the utopian society, as it seems to have done in the universe of The Culture. Perhaps our uploaded descendants set off to discover the secrets of the universe. I think the best way to ensure a human-centric future is to be the self-improving intelligences, instead of merely catching crumbs from the table of our successors.

In my view, the worst kind of "Friendly" singularity would be one where we discover we've made a weakly godlike entity who believes in benevolent dictatorship; if we must have gods, I want them to be made in our own image, beings who can be reasoned with and who can reason with one another. Best of all, though, is that singularity where we are the motivating forces, where we need not worry if we are being manipulated "in our best interest".

Ultimately, I want the future to have room for our mistakes. For these reasons, we ought to concentrate on achieving WBE and mind uploading first.

Comments (244)

Comment author: CarlShulman 01 March 2010 04:03:58AM *  18 points [-]

Folk at the Singularity Institute and the Future of Humanity Institute agree that it would probably (but unstably in the face of further analysis) be better to have brain emulations before de novo AI from an existential risk perspective (a WBE-based singleton seems more likely to go right than an AI design optimized for ease of development rather than safety). I actually recently gave a talk at FHI about the use of WBE to manage collective action problems such as Robin Hanson's "Burning the Cosmic Commons" and pressures to cut corners on safety of AI development, which I'll be putting online soon. One of the projects being funded by the SIAI Challenge Grant ending tonight is an analysis of the relationship between AI and WBE for existential risks.

However, the conclusion that accelerating WBE (presumably via scanning or neuroscience, not speeding up Moore's Law type trends in hardware) is the best marginal project for existential risk reduction is much less clear. Here are just a few of the relevant issues:

1) Are there investments best made far in advance with WBE or AI? It might be that the theory to build safe AIs cannot be rushed as much as institutions to manage WBEs, or it might be that WBE-regulating institutions require a buildup of political influence over decades.

2) The scanning and neuroscience knowledge needed to produce WBE may facilitate powerful AI well before WBE, as folk like Shane Legg suggest. In that case accelerating scanning would mean primarily earlier AI, with a shift towards neuromorphic designs.

3) How much advance warning will WBE and AI give, or rather what is our probability distribution over degrees of warning? The easier a transition is to see in advance, the more likely it will be addressed by those with weak incentives and relevant skills. Possibilities with less warning, and thus less opportunity for learning, may offer higher returns on the efforts of the unusually long-term oriented.

Folk at FHI have done some work to accelerate brain emulation, e.g. with the WBE Roadmap and workshop, but there is much discussion here about estimating the risks and benefits of various interventions that would go further or try to shape future use of the technology and awareness/responses to risks.

Comment author: wallowinmaya 02 July 2011 02:25:51PM 0 points [-]

I actually recently gave a talk at FHI about the use of WBE to manage collective action problems such as Robin Hanson's "Burning the Cosmic Commons" and pressures to cut corners on safety of AI development, which I'll be putting online soon.

I would love to read this talk. Do you have a blog or something?

Comment author: CarlShulman 02 July 2011 04:07:37PM 2 points [-]

It's on the SIAI website, here.

Comment author: RobinHanson 02 March 2010 02:46:31AM 0 points [-]

It seems to me that the post offers considerations that lean one in the direction of focusing efforts on encouraging good WBE, and that the considerations offered in this comment don't much lean one back in the other direction. They mainly point to as yet unresolved uncertainties that might push us in many directions.

Comment author: CarlShulman 02 March 2010 03:21:25AM *  3 points [-]

My main aim was to make clear the agreement about WBE being preferable to AI, and the difference between a tech being the most likely route to survival and it being the best marginal use of effort, not to put a large amount of effort into carefully giving and justifying estimates of all the relevant parameters in this comments thread rather than other venues (such as the aforementioned paper).

Comment author: Vladimir_Nesov 01 March 2010 10:25:38AM *  6 points [-]

Focusing on slow-developing uploads doesn't cause slower development of other forms of AGI. Uploads themselves can't be expected to turn into FAIs without developing the (same) clean theory of de novo FAI (people are crazy, and uploads are no exception; this is why we have existential risk in the first place, even without any uploads). It's very hard to incrementally improve uploads' intelligence without affecting their preference, and so that won't happen on the first steps from vanilla humans, and pretty much can't happen unless we already have a good theory of preference, which we don't. We can't hold constant a concept (preference/values) that we don't understand (and as a magical concept, it's only held in the mind; any heuristics about it easily break when you push possibilities in the new regions). It's either (almost) no improvement (keep the humans until there is FAI theory), or value drift (until you become intelligent/sane enough to stop and work on preserving preference, but by then it won't be human preference); you obtain not-quite-Friendly AI in the end.

The only way in which uploads might help on the way towards FAI is by being faster (or even smarter/saner) FAI theorists, but in this regard they may accelerate the arrival of existential risks as well (especially the faster uploads that are not smarter/saner). To apply uploads specifically to FAI as opposed to generation of more existential risk, they have to be closely managed, which may be very hard to impossible once the tech gets out.

Comment author: CarlShulman 01 March 2010 05:57:41PM 10 points [-]

The only way in which uploads might help on the way towards FAI is by being faster (or even smarter/saner) FAI theorists, but in this regard they may accelerate the arrival of existential risks as well (especially the faster uploads that are not smarter/saner).

Emulations could also enable the creation of a singleton capable of globally balancing AI development speeds and dangers. That singleton could then take billions of subjective years to work on designing safe and beneficial AI. If designing safe AI is much, much harder than building AI at all, or if knowledge of AI and safe AI are tightly coupled, such a singleton might be the most likely route to a good outcome.

Comment author: Vladimir_Nesov 01 March 2010 08:19:35PM *  2 points [-]

I agree, if you construct this upload-aggregate and manage to ban other uses for the tech. This was reflected in the next sentence of my comment (maybe not too clearly):

To apply uploads specifically to FAI as opposed to generation of more existential risk, they have to be closely managed, which may be very hard to impossible once the tech gets out.

Comment author: utilitymonster 13 December 2010 05:30:00AM *  2 points [-]

Especially if WBE comes late (so there is a big hardware overhang), you wouldn't need a lot of time to spend loads of subjective years designing FAI. A small lead time could be enough. Of course, you'd have to be first and have significant influence on the project.

Edited for spelling.

Comment author: Jordan 01 March 2010 09:57:14PM 0 points [-]

I don't think this would be impossibly difficult. If an aggressive line of research is pursued then the first groups to create an upload will be using hardware that would make immediate application of the technology difficult. Commercialization likely wouldn't follow for years. That would potentially give government plenty of time to realize the potential of the technology and put a clamp on it.

At that point the most important thing is that the government (or whatever regulatory body will have oversight of the upload aggregate) is well informed enough to realize what it is dealing with and has sense enough to deal with it properly. To that end, one of the most important things we can be doing now is trying to ensure that that regulatory body will be well informed enough when the day comes.

Comment author: Jordan 01 March 2010 09:44:44PM 3 points [-]

There is going to be value drift even if we get an FAI. Isn't that inherent in extrapolated volition? We don't really want our current values, we want the values we'll have after being smarter and having time to think deeply about them. The route of WBE simply takes the guesswork out: actually make people smarter, and then see what the drifted values are. Of course, it's important to keep a large, diverse culture in the process, so that the whole can error-correct for individuals that go off the deep end, analogous to why extrapolated volition would be based on the entire human population rather than a single person.

Comment author: andreas 02 March 2010 02:37:55AM *  2 points [-]

Here is a potentially more productive way of seeing this situation: We do want our current preferences to be made reality (because that's what the term preference describes), but we do not know what our preferences look like, part of the reason being that we are not smart enough and do not have enough time to think about what they are. In this view, our preferences are not necessarily going to drift if we figure out how to refer to human preference as a formal object and if we build machines that use this object to choose what to do — and in this view, we certainly don't want our preferences to drift.

On the other hand, WBE does not "simply take the guesswork out". It may be the case that the human mind is built such that "making people smarter" is feasible without changing preference much, but we don't know that this is the case. As long as we do not have a formal theory of preference, we cannot strongly believe this about any given intervention – and if we do have such a theory, then there exist better uses for this knowledge.

Comment author: Jordan 02 March 2010 07:06:02AM 1 point [-]

We do want our current preferences to be made reality (because that's what the term preference describes)

Yes, but one of our preferences may well be that we are open to an evolution of our preferences. And, whether or not that is one of our preferences, it certainly is the case that preferences do evolve over time, and that many consider that a fundamental aspect of the human condition.

It may be the case that the human mind is built such that "making people smarter" is feasible without changing preference much, but we don't know that this is the case.

I agree we don't know that is the case, and would assume that it isn't.

Comment author: Vladimir_Nesov 02 March 2010 08:41:32AM *  3 points [-]

Yes, but one of our preferences may well be that we are open to an evolution of our preferences. And, whether or not that is one of our preferences, it certainly is the case that preferences do evolve over time, and that many consider that a fundamental aspect of the human condition.

Any notion of progress (what we want is certainly not evolution) can be captured as a deterministic criterion.

Comment author: Jordan 03 March 2010 06:18:47PM *  0 points [-]

Obviously I meant 'evolution' in the sense of change over time, not change specifically induced by natural selection.

As to a deterministic criterion, I agree that such a thing is probably possible. But... so what? I'm not arguing that FAI isn't possible. The topic at hand is FAI research relative to WBE. I'm assuming a priori that both are possible. The question is which basket should get more eggs.

Comment author: Vladimir_Nesov 03 March 2010 10:03:06PM *  1 point [-]

But... so what? I'm not arguing that FAI isn't possible. The topic at hand is FAI research relative to WBE. I'm assuming a priori that both are possible. The question is which basket should get more eggs.

You said:

Yes, but one of our preferences may well be that we are open to an evolution of our preferences.

This is misuse of the term "preference". "Preference", in the context of this discussion, refers specifically to that which isn't to be changed, ever. This point isn't supposed to be related to WBE vs. FAI discussion, it's about a tool (the term "preference") used in leading this discussion.

Comment author: Jordan 12 March 2010 12:59:29AM 1 point [-]

Your definition is too narrow for me to accept. Humans are complicated. I doubt we have a core set of "preferences" (by your definition) which can be found with adequate introspection. The very act of introspection itself changes the human and potentially their deepest preferences (normal definition)!

I have some preferences which satisfy your definition, but I wouldn't consider them my core, underlying preferences. The vast majority of preferences I hold do not qualify. I'm perfectly OK with them changing over time, even the ones that guide the overarching path of my life. Yes, the change in preferences is often caused by other preferences, but to think that this causal chain can be traced back to a core preference is unjustified, in my opinion. There could just as well be closed loops in the causal tree.

Comment author: Vladimir_Nesov 12 March 2010 12:31:05PM *  0 points [-]

You are disputing definitions! Of course, there are other natural ways to give meaning to the word "preference", but they are not as useful in discussing FAI as the comprehensive unchanging preference. It's not supposed to have much in common with likes or wants, and with their changes, though it needs to, in particular, describe what they should be, and how they should change. Think of your preference as that particular formal goal system that it is optimal, from your point of view (on reflection, if you knew more, etc.), to give to a Strong AI.

Your dislike for application of the label "preference" to this concept, and ambiguity that might introduce, needs to be separated from consideration of the concept itself.

Comment author: Jordan 12 March 2010 10:00:07PM *  0 points [-]

I specifically dispute the usefulness of your definition. It may be a useful definition in the context of FAI theory. We aren't discussing FAI theory.

And, to be fair, you were originally the one disputing definitions. In my post I used the standard definition of 'preference', which you decided was 'wrong', saying

This is misuse of the term "preference"

rather than accepting the implied (normal!) definition I had obviously used.

Regardless, it seems unlikely we'll be making any progress on the on-topic discussion even if we resolve this quibble.

Comment author: Vladimir_Nesov 02 March 2010 08:00:13AM *  1 point [-]

There is going to be value drift even if we get an FAI. Isn't that inherent in extrapolated volition?

No. Progress and development may be part of human preference, but it is entirely OK for a fixed preference to specify progress happening in a particular way, as opposed to other possible ways. Furthermore, preference can be fixed and still not knowable in advance (so that there are no spoilers, and moral progress happens through your effort and not dictated "from above").

It's not possible to efficiently find out some properties of a program, even if you have its whole source code; this source code doesn't change, but the program runs - develops - in novel and unexpected ways. Of course, the unexpected needs to be knowably good, not just "unexpected" (see for example Expected Creative Surprises).

Comment author: Jordan 02 March 2010 08:16:28AM *  0 points [-]

I agree that such a fixed preference system is possible. But I don't think that it needs to be implemented in order for "moral progress" to be indefinitely sustainable in a positive fashion. I think humans are capable of guiding their own moral progress without their hands being held. Will the result be provably friendly? No, of course not. The question is how likely is the result to be friendly, and is this likelihood great enough that it offsets the negatives associated with FAI research (namely the potentially very long timescales needed).

Comment author: Vladimir_Nesov 02 March 2010 08:32:50AM *  0 points [-]

I think humans are capable of guiding their own moral progress without their hands being held. Will the result be provably friendly? No, of course not. The question is how likely is the result to be friendly

The strawman of "provable friendliness" again. It's not about holding ourselves to an inadequately high standard, it's about figuring out what's going on, in any detail. (See this comment.)

If we accept that preference is complex (holds a lot of data), and that detail in preference matters (losing a relatively small portion of this data is highly undesirable), then any value drift is bad, and while value drift is not rigorously controlled, it's going to lead its random walk further and further away from the initial preference. As a result, from the point of view of the initial preference, the far future is pretty much lost, even if each individual step of the way doesn't look threatening. The future agency won't care about the past preference, and won't reverse to it, because as a result of value drift it already has different preference, and for it returning to the past is no longer preferable. This system isn't stable, deviations in preference don't correct themselves, if the deviated-preference agency has control.
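
A toy numerical sketch of that random-walk claim (my own illustration, not from the comment; the step sizes and the strength of the "correction" are arbitrary): model preference as a single number, add a random shift each generation, and compare the uncorrected walk with one that applies a weak restoring force back toward the initial preference.

    import random

    def final_deviation(steps=10_000, correction=0.0, seed=0):
        """Distance from the initial preference after `steps` random shifts."""
        rng = random.Random(seed)
        x = 0.0  # deviation from the initial preference
        for _ in range(steps):
            x += rng.gauss(0.0, 1.0)   # uncontrolled drift at each step
            x -= correction * x        # restoring force, if any
        return abs(x)

    def mean_deviation(correction, trials=200):
        return sum(final_deviation(correction=correction, seed=i)
                   for i in range(trials)) / trials

    print("no correction:  ", round(mean_deviation(0.0), 1))   # grows like sqrt(steps); ~80 here
    print("weak correction:", round(mean_deviation(0.01), 1))  # stays bounded; roughly 5-7 here

The only point is the one made above: without something that actively corrects back toward the original preference, the expected deviation keeps growing with time rather than settling.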

Comment author: Jordan 03 March 2010 06:34:01PM *  0 points [-]

The strawman of "provable friendliness" again.

I fail to see how my post was a straw man. I was pointing out a deficiency in what I am supporting, not what you are supporting.

This system isn't stable, deviations in preference don't correct themselves, if the deviated-preference agency has control.

I disagree that we know this. Certainly the system hasn't stabilized yet, but how can you make such a broad statement about the future evolution of human preference? And, in any case, even if there were no ultimate attractor in the system, so what? Human preferences have changed over the centuries. My own preferences have changed over the years. I don't think anyone is arguing this is a bad thing. Certainly, we may be able to build a system that replaces our "sloppy" method of advancement with a deterministic system that has an immutable set of preferences at its core. I disagree this is necessarily superior to letting preferences evolve in the same way they have been, free of an overseer. But that disagreement of ours is still off topic.

The topic is whether FAI or WBE research is better for existential risk reduction. The pertinent question is what are the likelihoods of each leading to what we would consider a positive singularity, and, more importantly, how do those likelihoods change as a function of our directed effort?

Comment author: Vladimir_Nesov 03 March 2010 10:27:17PM *  1 point [-]

I fail to see how my post was a straw man. I was pointing out a deficiency in what I am supporting, not what you are supporting.

It shouldn't matter who supports what. If you suddenly agree with me on some topic, you still have to convince me that you did so for the right reasons, and didn't accept a mistaken argument or mistaken understanding of an argument (see also "belief bias"). If such is to be discovered, you'd have to make a step back, and we both should agree that it's the right thing to do.

The "strawman" (probably a wrong term in this context) is in making a distinction between "friendliness" and "provable friendliness". If you accept that the distinction is illusory, the weakness of non-FAI "friendliness" suddenly becomes "provably fatal".

This system isn't stable, deviations in preference don't correct themselves, if the deviated-preference agency has control.

I disagree that we know this. Certainly the system hasn't stabilized yet, but how can you make such a broad statement about the future evolution of human preference?

Stability is a local property around a specific point, that states that sufficiently small deviations from that point will be followed by corrections back to it, so that the system will indefinitely remain in the close proximity of that point, provided it's not disturbed too much.

Where we replace ourselves with agency of slightly different preference, this new agency has no reason to correct backwards to our preference. If it is not itself stable (that is, it hasn't built its own FAI), then the next preference shift it'll experience (in effectively replacing itself with yet different preference agency) isn't going to be related to the first shift, isn't going to correct it. As a result, value is slowly but inevitably lost. This loss of value only stops when the reflective consistency is finally achieved, but it won't be by an agency that exactly shares your preference. Thus, even when you've lost a fight for specifically your preference, the only hope is for the similar-preference drifted agency to stop as soon as possible (as close to your preference as possible), to develop its FAI. (See also: Friendly AI: a vector for human preference.)

My own preferences have changed over the years. I don't think anyone is arguing this is a bad thing.

The past-you is going to prefer your preference not to change, even though current-you would prefer your preference to be as it now is. Note that preference has little to do with likes or wants, so you might be talking about surface reactions to environment and knowledge, not the elusive concept of what you'd prefer in the limit of reflection. (See also: "Why Socialists don't Believe in Fun", Eutopia is Scary.)

The topic is whether FAI or WBE research is better for existential risk reduction. The pertinent question is what are the likelihoods of each leading to what we would consider a positive singularity, and, more importantly, how do those likelihoods change as a function of our directed effort?

And to decide this question, we need a solid understanding of what counts as a success or failure. The concept of preference is an essential tool in gaining this understanding.

Comment author: Mitchell_Porter 01 March 2010 04:42:25AM 5 points [-]

Okay, let's go on the brain-simulation path. Let's start with something simple, like a lobster or a dog... oh wait, what if it transcends and isn't human-friendly. All right, we'll stick to human brains... oh wait, what if our model of neural function is wrong and we create a sociopathic copy that isn't human-friendly. All right, we'll work on human brain regions separately, and absolutely make sure that we have them all right before we do a whole brain... oh wait, what if one of our partial brain models transcends and isn't human-friendly.

And while you, whose reason for taking this path is to create a human-friendly future, struggle to avoid these pitfalls, there will be others who aren't so cautious, and who want to conduct experiments like hotwiring together cognitive modules that are merely brain-inspired, just to see what happens, or in the expectation of something cool, or because they want a smarter vacuum cleaner.

Comment author: JamesAndrix 01 March 2010 07:30:28AM 2 points [-]

We don't have to try and upgrade any virtual brains to get most of the benefits.

If we could today create an uploaded dog brain that's just a normal virtual dog running at 1/1000th realtime, that would be a huge win with no meaningful risk. That would lead us down a relatively stable path of obscenely expensive and slow uploads becoming cheaper every year. In this case, cheaper also means faster and more numerous. At the start, human society can handle a few slightly superior uploads; by the time uploads get way past us, they will be a society of themselves and on roughly equal footing (this may be bad for people still running at realtime, but human values will persist).

The dangers of someone making a transcendent AI first are there no matter what. This is not a good argument against a FASTER way to get to safe superintelligence.

Comment author: Mitchell_Porter 01 March 2010 08:14:11AM *  2 points [-]

So, in this scenario we have obtained a big neural network that imprints on a master and can learn complex physical tasks... and we're just going to ignore the implications of that while we concentrate on trying to duplicate ourselves?

What's going to stop me from duplicating just the canine prefrontal cortex and experimenting with it? It's a nice little classifier / decision maker, I'm sure it has other uses...

Just the capacity to reliably emulate major functional regions of vertebrate brain already puts you on the threshold of creating big powerful nonhuman AI. If puploads come first, they'll be doing more than catching frisbees in Second Life.

Comment author: Kaj_Sotala 01 March 2010 03:49:03PM *  2 points [-]

I realize that this risk is kinda insignificant compared to the risk of all life on Earth being wiped out... But I'm more than a little scared of the thought of animal uploads, and the possibility of people creating lifeforms that can be cheaply copied and replicated, without them needing to have any of the physical features that usually elicit sympathy from people. We already have plenty of people abusing their animals today, and being able to do it perfectly undetected on your home upload isn't going to help things.

To say nothing about when it becomes easier to run human uploads. I just yesterday re-read a rather disturbing short story about the stuff you could theoretically do with body-repairing nanomachines, a person who enjoys abusing others, and a "pet substitute". Err, a two-year old human child, that is.

Comment author: JamesAndrix 02 March 2010 03:19:01AM 1 point [-]

The supercomputers will be there whether we like it or not. Some of what they run will be attempts at AI. This is so far the only approach that someone unaware of Friendliness issues has a high probability of trying and succeeding with (and not immediately killing us all).

Numerous un-augmented accelerated uploads are a narrow safe path, and one we probably won't follow, but it is a safe path (so far one of two, so it's important). I think the likely win is less than FAI, but the dropoff isn't so steep either as you walk off the path. Any safe AI approach will suggest profitable nonsafe alternatives.

An FAI failure is almost certainly alien, or not there yet. An augmentation failure is probably less-capable, probably not hostile, probably not strictly executing a utility function, and above all: can be surrounded by other, faster, uploads.

If the first pupload costs half a billion dollars and runs very slow, then even tweaking it will be safer than say, letting neural nets evolve in a rich environment on the same hardware.

Comment author: Jordan 01 March 2010 09:35:11PM 0 points [-]

What's going to stop me from duplicating just the canine prefrontal cortex and experimenting with it? It's a nice little classifier / decision maker, I'm sure it has other uses...

What's going to stop you is that the prefrontal cortex is just one part of a larger whole. It may be possible to isolate that part, but doing so may be very difficult. Now, if your simulation were running in real time, you could just spawn off a bunch of different experiments pursuing different ideas for how to isolate and use the prefrontal cortex, and just keep doing this until you find something that works. But, if your simulation is running at 1/10000th realtime, as JamesAndrix suggests in his hypothetical, the prospects of this type of method seem dim.

Of course, maybe the existence of the dog brain simulation is sufficient to spur advances in neuroscience to the point where you could just isolate the functioning of the cortex, without the need for millions of experimental runs. Even so, your resulting module is still going to be too slow to be an existential threat.

Just the capacity to reliably emulate major functional regions of vertebrate brain already puts you on the threshold of creating big powerful nonhuman AI.

The threshold, yes. But that threshold is still nontrivial to cross. The question is, given that we can reliably emulate major functional regions of the brain, is it easier to cross the threshold to nonhuman AI, or to full emulations of humans? There is virtually no barrier to the second threshold, while the first one still has nontrivial problems to be solved.

Comment author: Mitchell_Porter 02 March 2010 02:30:39AM 2 points [-]

It may be possible to isolate that part, but doing so may be very difficult.

Why would it be difficult? How would it be difficult?

There is a rather utopian notion of mind uploading, according to which you blindly scan a brain without understanding it, and then turn that data directly into a simulation. I'm aware of two such scanning paradigms. In one, you freeze the brain, microtome it, and then image the sections. In the other you do high-resolution imaging of the living brain (e.g. fMRI) and then you construct a state-machine model for each small 3D volume.

To turn the images of those microtomed sections into an accurate dynamical model requires a lot of interpretive knowledge. The MRI-plus-inference pathway sounds much more plausible as a blind path to brain simulation. But either way, you are going to know what the physical 3D location of every element in your simulation was, and functional neuroanatomy is already quite sophisticated. It won't be hard to single out the sim-neurons specific to a particular anatomical macroregion.

There is virtually no barrier to the second threshold, while the first one still has nontrivial problems to be solved.

If you can simulate a human, you can immediately start experimenting with nonhuman cognitive architectures by lobotomizing or lesioning the simulation. But this would already be true for simulated animal brains as well.

Comment author: Jordan 02 March 2010 07:12:57AM 2 points [-]

It won't be hard to single out the sim-neurons specific to a particular anatomical macroregion.

That's true, but ultimately the regions of the brain are not complete islands. The circuitry connecting them is itself intricate. You may, for instance, be able to extract the visual cortex and get it to do some computer vision for you, but I doubt extracting a prefrontal cortex will be useful without all the subsystems it depends on. More importantly, how to wire up new configurations (maybe you want to have a double prefrontal cortex: twice the cognitive power!) strikes me as a fundamentally difficult problem. At that point you probably need to have some legitimate high level understanding of the components and their connective behaviors to succeed. To contrast, a vanilla emulation where you aren't modifying the architecture or performing virtual surgery requires no such high level understanding.

Comment author: Peter_de_Blanc 01 March 2010 07:14:40AM 2 points [-]

How does a lobster simulation transcend?

Comment author: gwern 01 March 2010 02:48:30PM 4 points [-]

Clearly people in this thread are not Charles Stross fans.

Comment author: JenniferRM 15 March 2010 04:43:55AM *  1 point [-]

For those not getting this, the book Accelerando starts with the main character being called by something with a Russian accent that claims to be a neuromorphic AI based off of lobsters grafted into some knowledge management. This AI (roughly "the lobsters") seeks a human who can help them "defect".

I recommend the book! The ideas aren't super deep in retrospect but its "near future" parts have one hilariously juxtaposed geeky allusion after another and the later parts are an interesting take on post-human politics and economics.

I assume the lobsters were chosen because of existing research in this area. For example, there are techniques for keeping bits alive in vitro, there is modeling work from the 1990s trying to reproduce known neural mechanisms in silico, and I remember (but couldn't find the link) that a team had some success around 2001(?) doing a Moravec transfer to one or more cells in a lobster ganglion (minus the nanotech of course). There are lots of papers in this area. The ones I linked to were easy to find.

Comment author: dclayh 01 March 2010 07:19:59AM 4 points [-]

That sounds like a koan.

Comment author: Mitchell_Porter 01 March 2010 07:40:33AM 0 points [-]

Someone uses it to explore its own fitness landscape.

Comment author: cousin_it 01 March 2010 01:43:30PM *  3 points [-]

Huh? Lobsters have been exploring their own fitness landscape for quite some time and haven't transcended yet. Evolution doesn't inevitably lead towards intelligence.

Comment author: Mitchell_Porter 02 March 2010 01:57:33AM 1 point [-]

I was way too obscure. I meant: turn it into a Gödel machine by modifying the lobster program to explore and evaluate the space of altered lobster programs.

Comment author: cousin_it 08 March 2010 07:07:07PM *  1 point [-]

Why do you need a lobster for that? You could start today with any old piece of open source code and any measure of "fitness" you like. People have tried to do this for a while without much success.

Comment author: AngryParsley 02 March 2010 02:04:49AM 1 point [-]

Let's start with something simple, like a lobster or a dog... oh wait, what if it transcends and isn't human-friendly.

Lobsters and dogs aren't general intelligences. A million years of dog-thoughts can't do the job of a few minutes of human-thoughts. Although a self-improving dog could be pretty friendly. Cats on the other hand... well that would be bad news. :)

what if our model of neural function is wrong and we create a sociopathic copy that isn't human-friendly.

I find that very unlikely. If you look at diseases or compounds that affect every neuron in the brain, they usually affect all cognitive abilities. Keeping intelligence while eliminating empathy would be pretty hard to do by accident, and if it did happen it would be easy to detect. Humans have experience detecting sociopathic tendencies in other humans. Unlike an AI, an upload can't easily understand its own code, so self-improving is going to be that much more difficult. It's not going to be some super-amazing thing that can immediately hack a human mind over a text terminal.

oh wait, what if one of our partial brain models transcends and isn't human-friendly.

That still seems unlikely. If you look at brains with certain parts missing or injured, you see that they are disabled in very specific ways. Take away just a tiny part of a brain and you'll end up with things like face blindness, Capgras delusion, or Anton-Babinski syndrome. By only simulating individual parts of the brain, it becomes less likely that the upload will transcend.

Comment author: Mitchell_Porter 02 March 2010 03:27:38AM 2 points [-]

Lobsters and dogs aren't general intelligences.

So they won't transcend if we do nothing but run them in copies of their ancestral environments. But how likely is that? They will instead become tools in our software toolbox (see below).

Unlike an AI, an upload can't easily understand its own code, so self-improving is going to be that much more difficult.

The argument for uploads first is not that by uploading humans, we have solved the problem of Friendliness. The uploads still have to solve that problem. The argument is that the odds are better if the first human-level faster-than-human intelligences are copies of humans rather than nonhuman AIs.

But guaranteeing fidelity in your copy is itself a problem comparable to the problem of Friendliness. It would be incredibly easy for us to miss that (e.g.) a particular neuronal chemical response is of cognitive and not just physiological significance, leave it out of the uploading protocol, and thereby create "copies" which systematically deviate from human cognition in some way, whether subtle or blatant.

By only simulating individual parts of the brain, it becomes less likely that the upload will transcend.

The classic recipe for unsafe self-enhancing AI is that you assemble a collection of software tools, and use them to build better tools, and eventually you delegate even that tool-improving function. The significance of partial uploads is that they can give a big boost to this process.

Comment author: Jordan 01 March 2010 09:18:42PM 0 points [-]

there will be others who aren't so cautious, and who want to conduct experiments like hotwiring together cognitive modules that are merely brain-inspired

This is why it's important that we have high fidelity simulations sooner rather than later, while the necessary hardware rests in the hands of the handful of institutions that can afford top tier supercomputers, rather than an idiot in a garage trying to build a better Roomba. There would be fewer players in the field, making the research easier to monitor, and, more importantly, it would be much more difficult to jerry rig a bunch of modules together. The more cumbersome the hardware the harder experimentation will be, making high fidelity copies more likely to provide computer intelligence before hotwired modules or neuromorphically inspired architectures.

Comment author: Mitchell_Porter 02 March 2010 06:07:21AM 4 points [-]

An important fact is that whether your aim is Friendly AI or mind uploading, either way, someone has to do neuroscience. As the author observes,

Such research [FAI] must reverse-engineer not only consciousness but also human notions of morality.

In FAI strategy as currently conceived, the AI is the neuroscientist. Through a combination of empirical and deductive means, and with its actions bounded by some form of interim Friendliness (so it doesn't kill people or create conscious sim-people along the way), the AI figures out the human decision architecture, extrapolates our collective volition as it would pertain to its own actions, and implements that volition.

Now note that this is an agenda each step of which could be carried out by all-natural human beings. Human neuroscientists could understand the human decision process, discover our true values and their reflective equilibrium, and act in accordance with the idealized values. The SIAI model is simply one in which all these steps are carried out by an AI rather than by human beings. In principle, you could aim to leave the AI out of it until human beings had solved the CEV problem themselves; and only then would you set a self-enhancing FAI in motion, with the CEV solution coded in from the beginning.

Eliezer has written about the unreliability of human attempts to formulate morality in a set of principles, using just intuition. Thus instead we are to delegate this investigation to an AI-neuroscientist. But to feel secure that the AI-neuroscientist is indeed discovering the nature of morality, and not some other similar-but-crucially-different systemic property of human cognition, we need its investigative methodology (e.g. its epistemology and its interim ethics) to be reliable. So either way, at some point human judgment enters the picture. And by the Turing universality of computation, anything the AI can do, humans can do too. They might be a lot slower, they might have to do it very redundantly to do it with the same reliability, but it should be possible for mere humans to solve the problem of CEV exactly as we would wish a proto-FAI to do.

Since the path to human mind uploading has its own difficulties and hazards, and still leaves the problem of Friendly superintelligence unsolved, I suggest that people who are worried about leaving everything up to an AI think about how a purely human implementation of the CEV research program would work - one that was carried out solely by human beings, using only the sort of software we have now.

Comment author: zero_call 01 March 2010 03:34:45AM *  4 points [-]

Why should an uploaded superintelligence based on a human copy be any innately safer than an artificial superintelligence? Just because humans are usually friendly doesn't mean a human AI would have to be friendly. This is especially true for a superintelligent human AI, which may not even be comparable to its original human template. Even the friendliest human might be angry and abusive when they're having a bad day.

Your idea that a WBE copy would find it easier to undergo relatively supervised, safe growth is basically an assumption. You would need to argue this in much more detail for it to merit deeper consideration.

Also, you cannot assume that an uploaded human superintelligence would be more constrained, as in "...after a best-effort psychiatric evaluation (for whatever good that might do), gives it Internet access". This is related to the AI-box problem, where it is contended that a superintelligence could not be contained, no matter what. Personally I dispute this, but at least it's not something to be taken for granted.

Comment author: CarlShulman 01 March 2010 05:03:32AM *  12 points [-]

WBE safety could benefit from an existing body of knowledge about human behavior and capabilities, and the spaghetti code of the brain could plausibly impose a higher barrier to rapid self-improvement. And institutions exploiting the cheap copyability of brain emulations could greatly help in stabilizing benevolent motivations.

WBE is a tiny region of the space of AI designs that we can imagine as plausible possibilities, and we have less uncertainty about it than about "whatever non-WBE AI technology comes first." Some architectures might be easier to make safe, and others harder, but if you are highly uncertain about non-WBE AI's properties then you need wide confidence intervals.

WBE also has the nice property that it is relatively all-or-nothing. With de novo AI, designers will be tempted to trade off design safety for speed, but for WBE a design that works at all will be relatively close to the desired motivations (there will still be tradeoffs with emulation brain damage, but the effect seems less severe than for de novo AI). Attempts to reduce WBE risk might just involve preparing analysis and institutions to manage WBE upon development, where AI safety would require control of the development process to avoid intrinsically unsafe designs.

Comment author: RobinHanson 02 March 2010 02:53:57AM 1 point [-]

This is a good summary.

Comment author: NancyLebovitz 08 March 2010 02:38:50PM 1 point [-]

At least we know what a friendly human being looks like.

And I wouldn't stop at a psychiatric evaluation of the person to be uploaded. I'd work on evaluating whether the potential uploadee was good for the people they associate with.

Comment author: Eliezer_Yudkowsky 01 March 2010 04:57:30AM 7 points [-]

It would have been good to check this suggested post topic on an Open Thread, first - in fact I should get around to editing the FAQ to suggest this for first posts.

Perhaps the AI rules wisely and well, and can give us anything we want, "save relevance".

In addition to the retreads that others have pointed out on the upload safety issue, this is a retread of the Fun Theory Sequence:

http://lesswrong.com/lw/xy/the_fun_theory_sequence/

Also the way you phrased the above suggests that we build some kind of AI and then discover what we've built. The space of mind designs is very large. If we know what we're doing, we reach in and get whatever we specify, including an AI that need not steal our relevance (see Fun Theory above). If whoever first reaches in and pulls out a self-improving AI doesn't know what they're doing, we all die. That is why SIAI and FHI agree on at least wistfully wishing that uploads would come first, not the relevance thing. This part hasn't really been organized into proper sequences on Less Wrong, but see Fake Fake Utility Functions, the Metaethics sequence, and the ai / fai tags.

Comment author: rwallace 03 March 2010 02:30:31AM 1 point [-]

That is why SIAI and FHI agree on at least wistfully wishing that uploads would come first

It seems to me that uploads first is quite possible, and also that the relatively small resources currently being devoted to uploading research make the timing of uploading a point of quite high leverage. Would SIAI or FHI be interested in discussing ways to accelerate uploading?

Comment author: CarlShulman 03 March 2010 02:35:08AM 2 points [-]
Comment author: rwallace 03 March 2010 05:08:48AM 0 points [-]

Thanks! Excellent start. The section on falsifiable design, in particular, I'd recommend reading for anyone interested in any kind of speculative technology.

Comment author: timtyler 02 March 2010 10:16:45AM -2 points [-]

Re: "If whoever first reaches in and pulls out a self-improving AI doesn't know what they're doing, we all die."

This is the "mad computer scientist destroys the world" scenario?

Isn't that science fiction?

Human culture forms a big self-improving system. We use the tools from the last generation to build the next generation of tools. Yes, things will get faster as the process becomes more automated - but automating everything looks like it will take a while, and it is far from clear that complete automation is undesirable.

Comment author: Mitchell_Porter 02 March 2010 10:36:34AM 2 points [-]

What do you disagree with in such a scenario? There are clearly levels of technological power such that nothing on Earth could resist. The goals of an AI are radically contingent. If a goal-seeking entity has literally no counterbalancing motivations, then it will seek to realize that goal using all means necessary and available, including sufficiently advanced technology.

Comment author: timtyler 02 March 2010 10:44:09AM *  0 points [-]

Machine intelligence will probably be developed first in a big company or government department - by a substantial team. They are going to leave off the big red stop button and be destroyed by their own creation?!? How difficult is it to say: "count to 1000 and then STOP"?!? Some of the smartest minds on the planet build machine intelligence - and then mess up the most basic safety features? It would be an utterly incredible turn of events.

Comment author: Mitchell_Porter 02 March 2010 11:45:51AM 7 points [-]

Yes, it's not 100% certain that a self-enhancing AI will overwhelm human resistance. One can construct scenarios where the process is halted in time. For example, the AI's cognition can be monitored, and everything halted if a goal like "prevent the stop button from being pressed" ever crosses its mind. Or the human programmers can remind themselves to get scared if they find the AI especially preoccupied with modeling their own thinking.

But as an AI develops, it will become increasingly difficult for its human creators to keep track of everything it thinks, plans, and considers. Perhaps the greatest dangers will be those stratagems which are not explicitly (in the thought process) motivated by the desire to escape human control. The AI simply has a goal, X, and it perceives that ABCDEF... combined will facilitate X, and it all looks innocent to the human supervisors. But actually ABC is a miracle self-hack which lets a plan get developed without being analyzed by module D, and module D has the property E of making planning really slow, which by heuristic F is a bad thing. Unfortunately, module D is part of the automatic system which looks for consideration of dangerous plans. So now the AI is off the leash and no-one will even know until it's too late.

That particular example may still be unrealistically simple (though of course, in real life, big important organizations do make mistakes that simple, despite the existence of quality control and internal auditing). But as the method gets more and more complicated, the more likely it is that an AI would indeed successfully escape its handcuffs through such a measure, and not even because it especially prioritized escape; simply because it noticed that if it could reach over there it could do something useful, and it noticed that it would be able to do so if did something complicated and clever with its "hands".

There are still countermeasures; you could look in the AI's thoughts, not just for explicit consideration of ways to circumvent control, but for consideration of scenarios in which it has freedoms it should not be able to obtain. But then the AI might act indirectly, through a sequence of other entities, and so you would only see it thinking about the actions of those surrogates, unless you managed to put the big picture together. OK then, so we need to go to a higher level, and tell the AI to monitor itself. But that's only going to work if it knows that it shouldn't do a certain thing, which means that getting the goals right is supremely important - which brings us back to the pursuit of Friendly AI, and the attempt to figure out just what the overall "morality" of an AI should be.

Comment author: timtyler 02 March 2010 08:48:57PM 0 points [-]

My analysis of the situation is here:

http://alife.co.uk/essays/stopping_superintelligence/

It presents an approach which doesn't rely on "handcuffing" the agent.

Comment author: wedrifid 02 March 2010 10:52:30AM 3 points [-]

That sounds like a group that knows what they are doing!

Comment author: timtyler 02 March 2010 10:59:52AM *  0 points [-]

Indeed - the "incompetent fools create machine intelligence before anyone else and then destroy the world" scenario is just not very plausible.

Comment author: wedrifid 02 March 2010 12:09:31PM *  2 points [-]

I haven't worked on any projects that are either as novel or as large as a recursively self modifying AI. On those projects that I have worked on not all of them worked without any hiccups and novelty and scope did not seem to make things any easier to pull off smoothly. It would not surprise me terribly if the first AI created does not go entirely according to plan.

Comment author: timtyler 02 March 2010 08:51:07PM 0 points [-]

Sure. Looking at the invention of powered flight, some people may even die - but that is a bit different from everyone dying.

Comment author: LucasSloan 03 March 2010 12:17:56AM 3 points [-]

Do we have any reason to believe that aeroplanes will be able to kill the human race, even if everything goes wrong?

Comment author: JenniferRM 15 March 2010 03:46:47AM *  2 points [-]

Upvoted for raising the issue, even though I disagree with your point.

The internet itself was arguably put together in the ways you describe (government funding, many people contributing various bits, etc) but as far as I'm aware, the internet itself has no clean "off button".

If it was somehow decided that the internet was a net harm to humanity for whatever reasons, then the only way to make it go away is for many, many actors to agree multilaterally and without defection that they will stop having their computers talk to other computers around the planet, despite this being personally beneficial to themselves (email, VoIP, WWW, IRC, torrents, etc.).

Technologies like broadcast radio and television are pretty susceptible to jamming, detection, and regulation. In contrast, the "freedom" inherent to the net may be "politically good" in some liberal and freedom-loving senses, but it makes for an abstractly troubling example of a world-transforming computer technology, created by large institutions with nominally positive intentions, that turned out to be hard to put back in the box. You may personally have a plan for a certain kind of off button and timer system, but that doesn't strongly predict the same will be true of other systems that might be designed and built.

Comment author: timtyler 16 March 2010 09:46:57PM *  0 points [-]

Right - well, you have to think something is likely to be dangerous to you in some way before you start adding paranoid safety features. The people who built the internet are mostly in a mutually beneficial relationship with it - so no problem.

I don't pretend that building a system which you can deactivate helps other people if they want to deactivate it. A military robot might have an off switch that only the commander with the right private key could activate. If that commander wants to wipe out 90% of the humans on the planet, then his "off switch" won't help them. That is not a scenario which a deliberate "off switch" is intended to help with in the first place.

Comment author: Peter_de_Blanc 03 March 2010 01:01:47AM 2 points [-]

I agree that with the right precautions, running an unfriendly superintelligence for 1,000 ticks and then shutting it off is possible. But I can't think of many reasons why you would actually want to. You can't use diagnostics from the trial run to help you design the next generation of AIs; diagnostics provide a channel for the AI to talk at you.

Comment author: timtyler 03 March 2010 09:23:01AM 1 point [-]

The given reason is paranoia. If you are concerned that a runaway machine intelligence might accidentally obliterate all sentient life, then a machine that can shut itself down has gained a positive safety feature.

In practice, I don't think we will have to build machines that regularly shut down. Nobody regularly shuts down Google. The point is that - if we seriously think that there is a good reason to be paranoid about this scenario - then there is a defense that is much easier to implement than building a machine intelligence which has assimilated all human values.

I think this dramatically reduces the probability of the "runaway machine accidentally kills all humans" scenario.

Comment author: timtyler 04 March 2010 09:43:19AM 0 points [-]

Incidentally, I think there must be some miscommunication going on. A machine intelligence with a stop button can still communicate. It can talk to you before you switch it off, it can leave messages for you - and so on.

If you leave it turned on for long enough, it may even get to explain to you in detail exactly how much more wonderful the universe would be for you - if you would just leave it switched on.

Comment author: Peter_de_Blanc 04 March 2010 02:26:54PM 2 points [-]

I suppose a stop button is a positive safety feature, but it's not remotely sufficient.

Comment author: timtyler 04 March 2010 09:03:26PM 0 points [-]

Sufficient for what? The idea of a machine intelligence that can STOP is to deal with concerns about a runaway machine intelligence engaging in extended destructive expansion against the wishes of its creators. If you can correctly engineer a "STOP" button, you don't have to worry about your machine turning the world into paperclips any more.

A "STOP" button doesn't deal with the kind of problems caused by - for example - a machine intelligence built by a power-crazed dictator - but that is not what is being claimed for it.

Comment author: Peter_de_Blanc 05 March 2010 01:16:05AM 3 points [-]

The stop button wouldn't stop other AIs created by the original AI.

Comment author: timtyler 05 March 2010 08:57:47AM *  0 points [-]

I did present some proposals relating to that issue:

"One thing that might help is to put the agent into a quiescent state before being switched off. In the quiescent state, utility depends on not taking any of its previous utility-producing actions. This helps to motivate the machine to ensure subcontractors and minions can be told to cease and desist. If the agent is doing nothing when it is switched off, hopefully, it will continue to do nothing.

Problems with the agent's sense of identity can be partly addressed by making sure that it has a good sense of identity. If it makes minions, it should count them as somatic tissue, and ensure they are switched off as well. Subcontractors should not be "switched off" - but should be tracked and told to desist - and so on."
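For concreteness, a toy sketch (all names hypothetical; an illustration of the idea, not a worked design) of how such a quiescence period might be encoded as a utility term:

```python
# Toy sketch of the quiescence idea (hypothetical names): before the wind-down
# tick the agent is scored on task progress; afterwards, utility comes only
# from doing nothing and from having told minions/subcontractors to desist.
# An external hard stop would still follow at SHUTDOWN_TICK.

QUIESCENCE_TICK = 900   # wind-down begins here
SHUTDOWN_TICK = 1000    # hard external stop after this tick

def utility(state, tick):
    if tick < QUIESCENCE_TICK:
        return state["task_progress"]
    idle_bonus = 0.0 if state["actions_this_tick"] else 1.0
    desist_bonus = sum(1.0 for m in state["minions"] if m["told_to_desist"])
    return idle_bonus + desist_bonus

# During quiescence, standing down scores higher than continuing to act.
acting = {"task_progress": 5, "actions_this_tick": ["expand"],
          "minions": [{"told_to_desist": False}]}
standing_down = {"task_progress": 5, "actions_this_tick": [],
                 "minions": [{"told_to_desist": True}]}
assert utility(standing_down, 950) > utility(acting, 950)
```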

Comment author: wnoise 02 March 2010 05:08:33PM 1 point [-]

You have far too much faith in large groups.

Comment author: timtyler 02 March 2010 08:57:58PM *  0 points [-]

That is a pretty vague criticism - you don't say whether you are critical of the idea that large groups will be responsible for machine intelligence, or of the idea that they are unlikely to build a murderous machine intelligence that destroys all humans.

Comment author: wnoise 03 March 2010 06:31:52AM 3 points [-]

I'm critical of the idea that given a large group builds a machine intelligence, they will be unlikely to build a murderous (or otherwise severely harmful) machine intelligence.

Consider that engineering developed into a regulated profession only after several large-scale disasters. Even so, there are notable disasters from time to time. Now consider the professionalism of the average software developer and their average manager. A disaster in this context could be far greater than the loss of everyone in the lab or facility.

Comment author: timtyler 03 March 2010 09:35:22AM *  1 point [-]

Right - well, some people may well die. I expect some people died at the hands of the printing press - probably through starvation and malnutrition. Personally, I expect that in this case those saved from gruesome deaths in automobile accidents will vastly outnumber those killed - but that is another issue.

Anyway, I am not arguing that nobody will die. The idea I was criticising was that "we all die".

My favoured example of IT company gone bad is Microsoft. IMO, Microsoft have done considerable damage to the computing industry, over an extended period of time - illustrating how programs can be relatively harmful. However, "even" a Microsoft superintelligence seems unlikely to kill everyone.

Comment author: LucasSloan 03 March 2010 12:21:17AM 1 point [-]

Why do you expect that the AI will not be able to fool the research team?

Comment author: timtyler 03 March 2010 12:43:42AM *  0 points [-]

My argument isn't about the machine not sharing goals with the humans - it's about whether the humans can shut the machine down if they want to.

I argue that it is not rocket science to build a machine with a stop button - or one that shuts down at a specified time.

Such a machine would not want to fool the research team in order to avoid shutting itself down on request. Rather, it would do everything in its power to make sure that the shut-down happened on schedule.

Many of the fears here about machine intelligence run amok are about a runaway machine that disobeys its creators. However, the creators built it. They are in an excellent position to install large red stop buttons and other kill switches to prevent such outcomes.
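As a minimal sketch, here is the purely external form of that idea (hypothetical names throughout) - a supervisor process with a deadline and a kill switch; the replies below turn on whether this kind of outside inhibition is enough on its own:

```python
# Minimal sketch of the purely external form of a stop button (hypothetical
# names): a supervisor runs the agent as a separate process and kills it at a
# deadline or when a human-controlled stop signal fires. No cooperation from
# the agent itself is assumed.
import subprocess
import time

DEADLINE_SECONDS = 3600

def supervise(agent_command, stop_requested):
    """agent_command: e.g. ["python", "agent.py"]; stop_requested: a callable
    standing in for the physical red button."""
    proc = subprocess.Popen(agent_command)
    start = time.time()
    while proc.poll() is None:
        if stop_requested() or (time.time() - start) > DEADLINE_SECONDS:
            proc.kill()   # cut the process off, regardless of its "wishes"
            break
        time.sleep(1)
    return proc.wait()
```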

Comment author: wedrifid 03 March 2010 12:52:32AM 4 points [-]

Given 30 seconds' thought I can come up with ways to ensure that the universe is altered in the direction of my goals in the long term even if I happen to cease existing at a known time in the future. I expect an intelligence more advanced than I am to be able to work out a way to substantially modify the future despite a 'red button' deadline. The task of making the AI respect the 'true spirit of a planned shutdown' shares many difficulties of the FAI problem itself.

Comment author: orthonormal 03 March 2010 03:39:46AM 1 point [-]

You might say it's an FAI-complete problem, in the same way "building a transhuman AI you can interact with and keep boxed" is.

Comment author: timtyler 03 March 2010 08:48:00AM *  1 point [-]

You think building a machine that can be stopped is the same level of difficulty as building a machine that reflects the desires of one or more humans while it is left on?

I beg to differ - stopping on schedule or on demand is one of the simplest possible problems for a machine - while doing what humans want you to do while you are switched on is much trickier.

Only the former problem needs to be solved to eliminate the spectre of a runaway superintelligence that fills the universe with its idea of utility against the wishes of its creator.

Comment author: wedrifid 03 March 2010 03:44:07AM 1 point [-]

Exactly, I like the terminology.

Comment author: timtyler 03 March 2010 08:44:56AM *  1 point [-]

Well, I think I went into most of this already in my "stopping superintelligence" essay.

Stopping is one of the simplest possible desires - and you have a better chance of being able to program that in than practically anything else.

I gave several proposals to deal with the possible issue that stopping at an unknown point could leave plans beyond that point still being executed by minions or sub-contractors - including scheduling shutdowns in advance, ensuring a period of quiescence before the shutdown, and not running for extended periods of time.

Comment author: wedrifid 04 March 2010 12:33:47AM 0 points [-]

Stopping is one of the simplest possible desires - and you have a better chance of being able to program that in than practically anything else.

It does seem to be a safety precaution that could reduce the consequences of some possible flaws in an AI design.

Comment author: LucasSloan 03 March 2010 12:54:18AM *  2 points [-]

Such a machine would not want to fool the research team in order to avoid shutting itself down on request.

Instilling chosen desires in artificial intelligences is the major difficulty of FAI. If you haven't actually given it a utility function which will cause it to auto-shutdown, all you've done is create an outside inhibition. If it has arbitrarily chosen motivations, it will act to end that inhibition, and I see no reason why it will necessarily fail.

They are in an excellent position to install large red stop buttons and other kill switches to prevent such outcomes.

They are in an excellent position to instill values upon that intelligence that will result in an outcome they like. This doesn't mean that they will.

Comment author: timtyler 03 March 2010 08:56:58AM *  2 points [-]

Re: Instilling chosen desires in artificial intelligences is the major difficulty of FAI.

That is not what I regularly hear. Instead people go on about how complicated human values are, and how reverse engineering them is so difficult, and how programming them into a machine looks like a nightmare - even once we identify them.

I assume that we will be able to program simple desires into a machine - at least to the extent of making a machine that will want to turn itself off. We regularly instill simple desires into chess computers and the like. It does not look that tricky.
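To make the chess example concrete, a toy sketch (hypothetical code): the program's "simple desire" is just whatever its evaluation function rewards - here, plain material count.

```python
# A chess program's "simple desire" made concrete (toy sketch): the engine
# prefers whatever this evaluation function rewards - plain material count -
# and nothing else.
PIECE_VALUES = {"P": 1, "N": 3, "B": 3, "R": 5, "Q": 9, "K": 0}

def material_evaluation(pieces):
    """pieces: iterable of letters, uppercase for our side, lowercase for theirs."""
    score = 0
    for piece in pieces:
        value = PIECE_VALUES.get(piece.upper(), 0)
        score += value if piece.isupper() else -value
    return score

assert material_evaluation(["Q", "p", "p"]) == 7   # up a queen, down two pawns
```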

Re: "If you haven't actually given it a utility function which will cause it to auto-shutdown"

Then that is a whole different ball game to what I was talking about.

Re: "The are in an excellent position to instill values upon that intelligence"

...but the point is that instilling the desire for appropriate stopping behaviour is likely to be much simpler than trying to instill all human values - and yet it is pretty effective at eliminating the spectre of a runaway superintelligence.

Comment author: LucasSloan 03 March 2010 06:53:47PM 1 point [-]

The point about the complexity of human value is that any small variation will result in a valueless world. The point is that a randomly chosen utility function, or one derived from some simple task is not going to produce the sort of behavior we want. Or to put it more succinctly, Friendliness doesn't happen without hard work. This doesn't mean that the hardest sub-goal on the way to Friendliness is figuring out what humans want, although Eliezer's current plan is to sidestep that whole issue.

Comment author: Nick_Tarleton 03 March 2010 06:58:10PM 0 points [-]

The point about the complexity of human value is that any small variation will result in a valueless world.

s/is/isn't/ ?

Comment author: rwallace 03 March 2010 01:06:49AM 0 points [-]

Leaving aside the other reasons why this scenario is unrealistic, one of the big flaws in it is the assumption that a mind decomposes into an engine plus a utility function. In reality, this decomposition is a mathematical abstraction we use in certain limited domains because it makes analysis more tractable. It fails completely when you try to apply it to life as a whole, which is why no humans even try to be pure utilitarians. Of course if you postulate building a superintelligent AGI like that, it doesn't look good. How would it? You've postulated starting off with a sociopath that considers itself licensed to commit any crime whatsoever if doing so will serve its utility function, and then trying to cram the whole of morality into that mathematical function. It shouldn't be any surprise that this leads to absurd results and impossible research agendas. That's the consequence of trying to apply a mathematical abstraction outside the domain in which it is applicable.

Comment author: LucasSloan 03 March 2010 01:27:15AM 0 points [-]

Are you arguing with me or timtyler?

If me, I totally agree with you as to the difficulty of actually getting desirable (or even predictable) behavior out of a superintelligence. My statement was one of simplicity, not actuality. But given the simplistic model I use, calling the AI sans utility function sociopathic is incorrect - it wouldn't do anything if it didn't have the other module. The fact that humans cannot act as proper utilitarians does not mean that a true utilitarian is a sociopath who just happens to care about the right things.

Comment author: rwallace 03 March 2010 02:46:59AM 0 points [-]

Okay then, "instant sociopath, just add a utility function" :)

I'm arguing against the notion that the key to Friendly AI is crafting the perfect utility function. In reality, for anything anywhere near as complex as an AGI, what it tries to do and how it does it are going to be interdependent; there's no way to make a lot of progress on either without also making a lot of progress on the other. By the time we have done all that, either we will understand how to put a reliable kill switch on the system, or we will understand why a kill switch is not necessary and we should be relying on something else instead.

Comment author: FAWS 03 March 2010 01:18:16AM 0 points [-]

Any set of preferences can be represented as a sufficiently complex utility function.
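A minimal sketch of the finite case (toy code): given any strict ranking of outcomes, assigning utilities by rank yields a utility function whose maximisation reproduces exactly those choices.

```python
# Toy illustration (finite, strictly ranked case only): rank-based utilities
# reproduce the underlying preference ordering.
def utility_from_ranking(ranked_outcomes):
    """ranked_outcomes: list ordered from most to least preferred."""
    n = len(ranked_outcomes)
    return {outcome: n - i for i, outcome in enumerate(ranked_outcomes)}

u = utility_from_ranking(["tea", "coffee", "water"])
assert max(["water", "coffee"], key=u.get) == "coffee"   # agrees with the ranking
```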

Comment author: rwallace 03 March 2010 01:29:19AM 3 points [-]

Sure, but the whole point of having the concept of a utility function is that utility functions are supposed to be simple. When you have a set of preferences that isn't simple, there's no point in thinking of it as a utility function. You're better off just thinking of it as a set of preferences - or, in the context of AGI, a toolkit, or a library, or a command language, or a partial order on heuristics, or whatever else is the most useful way to think about the things this entity does.

Comment author: timtyler 03 March 2010 09:08:12AM -1 points [-]

Humans regularly use utility-based agents - to do things like play the stock market. They seem to work OK to me. Nor do I agree with you about utility-based models of humans. Basically, most of your objections seem irrelevant to me.

Comment author: rwallace 03 March 2010 10:30:10AM 2 points [-]

When studying the stock market, we use the convenient approximation that people are utility maximizers (where the utility function is expected profit). But this is only an approximation, useful in this limited domain. Would you commit murder for money? No? Then your utility function isn't really expected profit. Nor, as it turns out, is it anything else that can be written down - other than "the sum total of all my preferences", at which point we have to acknowledge that we are not utility maximizers in any useful sense of the term.

Comment author: timtyler 02 March 2010 10:24:38AM *  2 points [-]

These "Whole Brain Emulation" discussions are surreal for me. I think someone needs to put forward the best case they can find that human brain emulations have much of a chance of coming before engineered machine intelligence.

The efforts in that direction I have witnessed so far seem feeble and difficult to take seriously - while the case that engineered machine intelligence will come first seems very powerful to me.

Without such a case, why spend so much time and energy on a discussion of what-if?

Comment author: FAWS 02 March 2010 10:53:45AM *  6 points [-]

Personally I don't have a strong opinion on which will come first, both seem entirely plausible to me.

We have a much better idea of how difficult WBE is than of how difficult engineering a human-level machine intelligence is. We don't even know for sure whether the latter is possible for us (other than by pure trial and error).

There is a reasonably obvious path from where we are to WBE, while we aren't even sure how to go about learning how to engineer an intelligence; it's entirely possible that "start with studying WBEs in detail" is the best possible way.

There are currently a lot more people studying things that are required for WBE than there are people studying AGI, and it's difficult to tell which other fields of study would benefit AGI more strongly than WBE does.

Comment author: BenRayfield 03 March 2010 04:32:44PM *  0 points [-]

Why do you consider the possibility of smarter-than-human AI at all? The difference between the AI we have now and that is bigger than the difference between the two technologies you are comparing.

Comment author: timtyler 03 March 2010 08:16:00PM 0 points [-]

I don't understand why you are bothering asking your question - but to give a literal answer, my interest in synthesising intelligent agents is an offshoot of my interest in creating living things - which is an interest I have had for a long time and share with many others. Machine intelligence is obviously possible - assuming you have a materialist and naturalist world-view like mine.

Comment author: BenRayfield 08 March 2010 12:15:54AM 0 points [-]

I think someone needs to put forward the best case they can find that human brain emulations have much of a chance of coming before engineered machine intelligence.

I misunderstood. I thought you were saying it was your goal to prove that, rather than that you thought it would not be proven. My question does not make sense.

Comment author: timtyler 08 March 2010 10:26:56AM 0 points [-]

Thanks for clarifying!

Comment author: Epiphany 25 August 2012 11:02:38PM *  0 points [-]

Humans misuse power. It doesn't seem to have occurred to you that humans with power frequently become corrupt. So, you want to emulate humans, in order to avoid corruption, when we know that power corrupts humans? Our brain structures have evolved for how many millions of years, all the while natural selection has been favoring those most efficient at obtaining and exploiting power whenever it provided a reproductive advantage? I think we're better off with something man-made, not something that's optimized to do that!

Also, there is a point at which an intelligent enough person is unable to communicate meaningfully with others, let alone a superintelligent machine. There comes a point where your conceptual frameworks are so complex that nothing you say will be interpreted correctly without a huge amount of explanation, which your target audience does not have the attention span for. This happens on the human level, with IQ gaps of above 45 IQ points (ratio tests). Look up something called the "optimal IQ range". Want some evidence? Why do the presidents we vote in have IQs so (relatively) near to average, when it would theoretically make more sense to vote in Einsteins with IQs of 160? It's because Einsteins are too complicated. Most people don't have the stamina to do all the thinking required to understand their ideas well enough to determine whether a political Einstein would be the better choice. The same problem will happen with AI. People won't understand an AI that smart because they won't be able to review its reasoning. That means they won't trust it, are incapable of agreeing with it, and therefore would not be likely to want to be led by it.

Brain emulation will do nothing to ensure any of the benefits you hope for.

Comment author: Strange7 08 March 2010 12:51:51AM 0 points [-]

The only downside of this approach I can see is that an upload-triggered Unfriendly singularity may cause more suffering than an Unfriendly AI singularity; sociopaths may be presumed to have more interest in torture of people than a paperclip-optimizing AI would have.

What about those of us who would prefer indefinite human-directed torture to instantaneous cessation of existence? I have no personal plans to explore masochism in that sort of depth, particularly in a context without the generally-accepted safety measures, but it's not the worst thing I can imagine. I'd find ways to deal with it, in the same sense that if I were stranded on a desert island I would be more willing to gag down whatever noxious variety of canned fermented fish was available, and eventually learn to like it, rather than starve to death.

Comment author: JGWeissman 08 March 2010 01:11:50AM 5 points [-]

I don't think you are appreciating the potential torture that could be inflicted by a superintelligence dedicated to advancing anti-fun theory. Such a thing would likely make your mind bigger at some optimal rate just so you could appreciate the stream of innovative varieties of enormous pain (not necessarily normal physical pain) it causes you.

Comment author: Mitchell_Porter 08 March 2010 01:04:37AM 0 points [-]

it's not the worst thing I can imagine

You can't imagine torture that is worse than death?

Comment author: Strange7 08 March 2010 08:44:09AM 0 points [-]

By 'death' I assume you mean the usual process of organ failure, tissue necrosis, having what's left of me dressed up and put in a fancy box, followed by chemical preservation, decomposition, and/or cremation? Considering the long-term recovery prospects, no, I don't think I can imagine a form of torture worse than that, except perhaps dragging it out over a longer period of time or otherwise embellishing on it somehow.

This may be a simple matter of differing personal preferences. Could you please specify some form of torture, real or imagined, which you would consider worse than death?

Comment author: Mitchell_Porter 08 March 2010 09:49:47AM 0 points [-]

Suppose I was tortured until I wanted to die. Would that count?

Comment author: Strange7 08 March 2010 11:01:47AM 0 points [-]

There have been people who wanted to die for one reason or another, or claimed to at the time with apparent sincerity, and yet went on to achieve useful or at least interesting things. The same cannot be said of those who actually did die.

Actual death constitutes a more lasting type of harm than anything I've heard described as torture.

Comment author: Mitchell_Porter 09 March 2010 03:24:42AM 0 points [-]

useful or at least interesting

There's a nihilism lurking here which seems at odds with your unconditional affirmation of life as better than death. You doubt that anything anyone has ever done was "useful"? How do you define useful?

Comment author: Strange7 09 March 2010 10:48:35PM 0 points [-]

Admittedly, my personal definition isn't particularly rigorous. An invention or achievement is useful if it makes other people more able to accomplish their existing goals, or maybe if it gives them something to do when they'd otherwise be bored. It's interesting (but not necessarily useful) if it makes people happy, is regarded as having artistic value, etc.

Relevant examples: Emperor Norton's peaceful dispersal of a race riot was useful. His proposal to construct a suspension bridge across San Francisco Bay would have been useful, had it been carried out. Sylvia Plath's work is less obviously useful, but definitely interesting.

Comment author: gregconen 08 March 2010 08:53:44AM 0 points [-]

Most versions of torture, continued for your entire existence. You finally cease when you otherwise would (at the heat death of the universe, if nothing else), but your entire experience is spent being tortured. The type isn't really important, at that point.

Comment author: Strange7 08 March 2010 09:20:23AM -1 points [-]

First, the scenario you describe explicitly includes death, and as such falls under the 'embellishments' exception.

Second, thanks to the hedonic treadmill, any randomly-selected form of torture repeated indefinitely would eventually become tolerable, then boring. As you said,

The type isn't really important, at that point.

Third, if I ever run out of other active goals to pursue, I could always fall back on "defeat/destroy the eternal tormentor of all mankind." Even with negligible chance of success, some genuinely heroic quest like that makes for a far better waste of my time and resources than, say, lottery tickets.

Comment author: Nick_Tarleton 08 March 2010 11:09:08AM 1 point [-]

Second, thanks to the hedonic treadmill, any randomly-selected form of torture repeated indefinitely would eventually become tolerable, then boring.

What if your hedonic treadmill were disabled, or bypassed by something like direct stimulation of your pain center?

Comment author: gregconen 08 March 2010 08:52:48PM 2 points [-]

First, the scenario you describe explicitly includes death, and as such falls under the 'embellishments' exception.

You're going to die (or at least cease) eventually, unless our understanding of physics changes significantly. Eventually, you'll run out of negentropy to run your thoughts. My scenario only changes what happens between now and then.

Failing that, you can just be tortured eternally, with no chance of escape (no chance of escape is unphysical, but so is no chance of death). Even if the torture becomes boring (and there may be ways around that), an eternity of boredom, with no chance to succeed at any goal, seems worse than death to me.

Comment author: JGWeissman 08 March 2010 11:54:05PM 0 points [-]

and as such falls under the 'embellishments' exception.

When considering the potential harm you could suffer from a superintelligence that values harming you, you don't get to exclude some approaches it could take because they are too obvious. Superintelligences take obvious wins.

thanks to the hedonic treadmill, any randomly-selected form of torture repeated indefinitely would eventually become tolerable, then boring.

Perhaps. So consider other approaches the hostile superintelligence might take. It's not going to go easy on you.

Comment author: Strange7 09 March 2010 11:21:16PM 1 point [-]

Yes, I've considered the possibility of things like inducement of anterograde amnesia combined with application of procedure 110-Montauk, and done my best to consider nameless horrors beyond even that.

As I understand it, a superintelligence derived from a sadistic, sociopathic human upload would have some interest in me as a person capable of suffering, while a superintelligence with strictly artificial psychology and goals would more likely be interested in me as a potential resource, a poorly-defended pile of damp organic chemistry. Neither of those is anywhere near my ideal outcome, of course, but in the former, I'll almost certainly be kept alive for some perceptible length of time. As far as I'm concerned, while I'm dead, my utility function is stuck at 0, but while I'm alive my utility function is equal to or greater than zero.

Furthermore, even a nigh-omnipotent sociopath might be persuaded to torture on a strictly consensual basis by appealing to exploitable weaknesses in the legacy software. The same cannot be said of a superintelligence deliberately constructed without such security flaws, or one which wipes out humanity before its flaws can be discovered.

Neither of these options is actually good, but the human-upload 'bad end' is at least, from my perspective, less bad. That's all I'm asserting.

Comment author: JGWeissman 10 March 2010 06:16:26AM 1 point [-]

Yes, the superintelligence that takes an interest in harming you would have to come from some optimized process, like recursive self improvement of a psychopath upload.

A sufficient condition for the superintelligence to be indifferent to your well being, and see you as spare parts, is an under optimized utility function.

Your approach to predicting what the hostile superintelligence would do to you seems to be figuring out the worst sort of torture that you can imagine. The problem with this is that the superintelligence is a lot smarter and more creative than you. Reading your mind and making your worst fears real, constantly with no break or rest, isn't nearly as bad as what it would come up with. And no, you are not going to find some security flaw you can exploit to defeat it, or even slow it down. For one thing, the only way you will be able to think straight is if it determines that this maximizes the harm you experience. But the big reason is recursive self improvement. The superintelligence will analyze itself and fix security holes. You, puny mortal, will be up against a superintelligence. You will not win.

As far as I'm concerned, while I'm dead, my utility function is stuck at 0, but while I'm alive my utility function is equal to or greater than zero.

If you knew you were going to die tomorrow, would you now have a preference for what happens to the universe afterwards?

Comment author: Strange7 10 March 2010 02:17:19PM *  0 points [-]

A superintelligence based on an uploaded human mind might retain exploits like 'pre-existing honorable agreements' or even 'mercy' because it considers them part of its own essential personality. Recursive self-improvement doesn't just mean punching some magical enhance button exponentially fast.

If you knew you were going to die tomorrow,

My preferences would be less relevant, given the limited time and resources I'd have with which to act on them. They wouldn't be significantly changed, though. I would, in short, want the universe to continue containing nice places for myself and those people I love to live in, and for as many of us as possible to continue living in such places. I would also hope that I was wrong about my own imminent demise, or at least the inevitability thereof.

Comment author: JGWeissman 11 March 2010 03:00:07AM 1 point [-]

A superintelligence based on an uploaded human mind might retain exploits like 'pre-existing honorable agreements' or even 'mercy' because it considers them part of its own essential personality.

If we are postulating a superintelligence that values harming you, let's really postulate that. In the early phases of recursive self improvement, it will figure out all the principles of rationality we have discussed here, including the representation of preferences as a utility function. It will self-modify to maximize a utility function that best represents its precursor's conflicting desires, including hurting others and mercy. If it truly started as a psychopath, the desire to hurt others is going to dominate. As it becomes superintelligent, it will move away from having a conflicting sea of emotions that could be manipulated by someone at your level.

Recursive self-improvement doesn't just mean punching some magical enhance button exponentially fast.

I was never suggesting it was anything magical. Software security, given physical security of the system, really is not that hard. The reason we have security holes in computer software today is that most programmers, and the people they work for, do not care about security. But a self improving intelligence will at some point learn to care about its software level security (as an instrumental value), and it will fix vulnerabilities in its next modification.

My preferences would be less relevant, given the limited time and resources I'd have with which to act on them. They wouldn't be significantly changed, though. I would, in short, want the universe to continue containing nice places for myself and those people I love to live in, and for as many of us as possible to continue living in such places. I would also hope that I was wrong about my own imminent demise, or at least the inevitability thereof.

Is it fair to say that you prefer A: you die tomorrow and the people you currently care about will continue to have worthwhile lives and survive to a positive singularity, to B: you die tomorrow and the people you currently care about also die tomorrow?

If yes, then "while I'm dead, my utility function is stuck at 0" is not a good representation of your preferences.

Comment author: RobinZ 08 March 2010 01:17:25AM *  0 points [-]

It is truly astonishing how much pain someone can learn to bear - AdeleneDawner posted some relevant links a while ago.

Edit: I wasn't considering an anti-fun agent, however - just plain vanilla suffering.

Comment author: AngryParsley 01 March 2010 03:32:40AM *  -1 points [-]

I agree with a lot of your points about the advantages of WBE vs friendly AI. That said, look at the margins. Quite a few people are already working on WBE. Not very many people are working on friendly AI. Taking this into consideration, I think an extra dollar is better spent on FAI research than WBE research.

Also, a world of uploads without FAI would probably not preserve human values for long. The uploads that changed themselves in such a way as to grow faster (convert the most resources or make the most copies of themselves) would replace uploads that preserved human values. For example, an upload could probably make more copies of itself if it deleted its capacities for humor and empathy.

We already have a great many relatively stable and sane intelligences.

I don't think any human being is stable or sane in the way FAI would be stable and sane.

Comment author: Nick_Tarleton 01 March 2010 05:19:49AM *  3 points [-]

Quite a few people are already working on WBE. Not very many people are working on friendly AI. Taking this into consideration, I think an extra dollar is better spent on FAI research than WBE research.

This is true for the general categories "FAI research" and "WBE research", but very few of those WBE research dollars are going to studies of safety and policy, such as SIAI does, or (I assume) to projects that take safety and policy at all seriously.

Comment author: RobinHanson 02 March 2010 02:56:44AM 2 points [-]

I don't think it is at all obvious that "an upload could probably make more copies of itself if it deleted its capacities for humor and empathy." You seem to assume that those features do not serve important functions in current human minds.

Comment author: AngryParsley 02 March 2010 06:19:26AM 1 point [-]

Yeah, my example was rather weak. I think humor and empathy are important in current human minds, but uploads could modify their minds much more powerfully and accurately than we can today. Also, uploads would exist in a very different environment from ours. I don't think current human minds or values would be well-adapted to that environment.

More successful uploads would be those who modified themselves to make more copies or to consume/take over more resources. As they evolved, their values would drift and they would care less about the things we care about. Eventually, they'd become unfriendly.

Comment author: RobinHanson 02 March 2010 01:29:59PM 1 point [-]

Why must value drift eventually make unfriendly values? Do you just define "friendly" values as close values?

Comment author: AngryParsley 02 March 2010 01:36:58PM *  2 points [-]

Basically, yes. If values are different enough between two species/minds/groups/whatever, then both see the other as resources that could be reorganized into more valuable structures.

To borrow an UFAI example: An upload might not hate you, but your atoms could be reorganized into computronium running thousands of upload copies/children.

Comment author: Vladimir_Nesov 02 March 2010 02:25:54PM *  1 point [-]

"Friendly" values simply means our values (or very close to them -- closer than the value spread among us). Preservation of preference means that the agency of far future will prefer (and do) the kinds of things that we would currently prefer to be done in the far future (on reflection, if we knew more, given the specific situation in the future, etc.). In other words, value drift is absence of reflective consistency, and Friendliness is reflective consistency in following our preference. Value drift results in the far future agency having preference very different from ours, and so not doing the things we'd prefer to be done. This turns the far future into the moral wasteland, from the point of view of our preference, little different from what would remain after unleashing a paperclip maximizer or exterminating all life and mind.

(Standard disclaimer: values/preference have little to do with apparent wants or likes.)

Comment author: pjeby 01 March 2010 04:02:03AM 1 point [-]

For example, an upload could probably make more copies of itself if it deleted its capacities for humor and empathy.

If you were an upload, would you make copies of yourself? Where's the fun in that? The only reason I could see doing it is if I wanted to amass knowledge or do a lot of tasks... and if I did that, I'd want the copies to get merged back into a single "me" so I would have the knowledge and experiences. (Okay, and maybe some backups would be good to have around). But why worry about how many copies you could make? That sounds suspiciously Clippy-like to me.

In any case, I think we'd be more likely to be screwed over by uploads' human qualities and biases, than by a hypothetical desire to become less human.

Comment author: wedrifid 02 March 2010 03:55:28AM 5 points [-]

If you were an upload, would you make copies of yourself?

Yes. I'd make as many copies as was optimal for maximising my own power. I would then endeavor to gain dominance over civilisation, probably by joining a coalition of some sort. This may include creating an FAI that could self-improve more effectively than I can and serve to further my ends. When a stable equilibrium was reached and it was safe to do so, I would go back to following this:

Where's the fun in that? The only reason I could see doing it is if I wanted to amass knowledge or do a lot of tasks... and if I did that, I'd want the copies to get merged back into a single "me" so I would have the knowledge and experiences.

If right now is the final minutes of the game, then early in a WBE era is the penalty shootout. You don't mess around having fun until you know that you and those you care about are going to live to see tomorrow.

Comment author: JamesAndrix 01 March 2010 04:29:20AM 4 points [-]

If you were an upload, would you make copies of yourself? Where's the fun in that?

You have a moral obligation to do it.

Working in concert, thousands of you could save all the orphans from all the fires, and then go on to right a great many wrongs. You have many many good reasons to gain power.

So unless you're very aware that you will gain power and then abuse power, you will take steps to gain power.

Even from a purely selfish perspective: If 10,000 of you could take over the world and become an elite of 10,000, that's probably better than your current rank.

Comment author: inklesspen 01 March 2010 04:39:11AM 2 points [-]

We've evolved something called "morality" that helps protect us from abuses of power like that. I believe Eliezer expressed it as something that tells you that even if you think it would be right (because of your superior ability) to murder the chief and take over the tribe, it still is not right to murder the chief and take over the tribe.

We do still have problems with abuses of power, but I think we have well-developed ways of spotting this and stopping it.

Comment author: JamesAndrix 01 March 2010 06:35:18AM 4 points [-]

I believe Eliezer expressed it as something that tells you that even if you think it would be right (because of your superior ability) to murder the chief and take over the tribe, it still is not right to murder the chief and take over the tribe.

That's exactly the high awareness I was talking about, and most people don't have it. I wouldn't be surprised if most people here failed at it, if it presented itself in their real lives.

I mean, are you saying you wouldn't save the burning orphans?

We do still have problems with abuses of power, but I think we have well-developed ways of spotting this and stopping it.

We have checks and balances of political power, but that works between entities on roughly equal political footing, and doesn't do much for those outside of that process. We can collectively use physical power to control some criminals who abuse their own limited powers. But we don't have anything to deal with supervillains.

There is fundamentally no check on violence except more violence, and 10,000 accelerated uploads could quickly become able to win a war against the rest of the world.

Comment author: BenRayfield 03 March 2010 04:29:03PM 0 points [-]

It is the fashion in some circles to promote funding for Friendly AI research as a guard against the existential threat of Unfriendly AI. While this is an admirable goal, the path to Whole Brain Emulation is in many respects more straightforward and presents fewer risks.

I believe Eliezer expressed it as something that tells you that even if you think it would be right (because of your superior ability) to murder the chief and take over the tribe, it still is not right to murder the chief and take over the tribe.

That's exactly the high awareness I was talking about, and most people don't have it. I wouldn't be surprised if most people here failed at it, if it presented itself in their real lives.

Most people would not act like a Friendly AI; therefore, "Whole Brain Emulation" only leads to "fewer risks" if you know exactly which brains to emulate and have the ability to choose which brain(s).

If whole brain emulation (for your specific brain) is expensive, it might result in the brain being that of a person who starts wars and steals from other countries so he can get rich.

Most people prefer that 999 people from their country live even at the cost of 1000 people of another country dying, given no other known differences between those 1999 people. Also, unlike a "Friendly AI", their choices are not consistent. Most people will leave the choice at whatever was going to happen if they did not choose, even if they know there are no other effects (like jail) from choosing. If the 1000 people were going to die, unknown to any of them, to save the 999, then most people would think "It's none of my business, maybe god wants it to be that way" and let the extra 1 person die. A "Friendly AI" would maximize lives saved if nothing else is known about all those people.

There are many examples of why most people are not close to acting like a "Friendly AI", even if we removed all the bad influences on them. We should build software to be a "Friendly AI" instead of emulating brains, and only emulate brains for different reasons, except maybe the few brains that think like a "Friendly AI". It's probably safer to do it completely in software.

Comment author: JamesAndrix 03 March 2010 06:28:08PM 1 point [-]

Most people would not act like a Friendly AI; therefore, "Whole Brain Emulation" only leads to "fewer risks" if you know exactly which brains to emulate and have the ability to choose which brain(s).

I agree entirely that humans are not friendly. Whole brain emulation is humanity-safe if there's never a point at which one person or small group can run much faster than the rest of humanity (including other uploads). The uploads may outpace us, but if they can keep each other in check, then uploading is not the same kind of human-values threat.

Even an upload singleton is not a total loss if the uploads have somewhat benign values. It is a crippling of the future, not an erasure.

Comment author: FAWS 01 March 2010 04:36:21AM 1 point [-]

It's probably easier to cooperate with copies of yourself than with other people, but you also stand to gain less as all of you start out with the same skill set and the same talents.

Comment author: Nick_Tarleton 01 March 2010 05:12:45AM 9 points [-]

In a world of uploads which contains some that do want to copy themselves, selection obviously favors the replicators, with tragic results absent a singleton.

Comment author: CarlShulman 01 March 2010 12:47:01PM 2 points [-]

Note that emulations can enable the creation of a singleton; it doesn't necessarily have to exist in advance.

Comment author: AngryParsley 02 March 2010 02:32:20AM 2 points [-]

Yes, but that's only likely if the first uploads are FAI researchers.

Comment author: gwern 01 March 2010 02:41:38PM *  1 point [-]

But why worry about how many copies you could make? That sounds suspiciously Clippy-like to me.

This is, I think, an echo of Robin Hanson's 'crack of a future dawn', where hyper-Darwinian pressures to multiply cause the discarding of unuseful mental modules like humor or empathy which take up space.

Comment author: RobinHanson 02 March 2010 02:58:03AM 2 points [-]

Where do you get the idea that humor or empathy are not useful mental abilities?!

Comment author: gwern 02 March 2010 01:56:16PM *  1 point [-]

From AngryParsley...

Comment author: Jordan 01 March 2010 10:00:36PM 1 point [-]

Not very many people are working on friendly AI. Taking this into consideration, I think an extra dollar is better spent on FAI research than WBE research.

This doesn't follow. It's not clear at all that there is sufficient investment in WBE that substantial diminishing returns have kicked in at the margins.

Comment author: AngryParsley 02 March 2010 02:28:01AM *  0 points [-]

I didn't say money spent on WBE research suffered from diminishing returns. I said that $X spent on FAI research probably has more benefit than $X spent on WBE research.

This is because the amount of money spent on WBE is much much greater than that spent on FAI. The Blue Brain Project has funding from Switzerland, Spain, and IBM among others. Just that one project probably has an order of magnitude more money than the whole FAI field. Unless you think WBE offers an order of magnitude greater benefit than FAI, you should favor spending more on FAI.

Comment author: Jordan 02 March 2010 04:03:38AM 2 points [-]

Unless you think WBE offers an order of magnitude greater benefit than FAI, you should favor spending more on FAI.

No, all that matters is that the increase in utility from increasing WBE funding is greater than the increase in utility from increasing FAI funding. If neither has hit diminishing returns, then the amount of current funding is irrelevant to this calculation.
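A toy numeric sketch of that comparison (made-up curves and numbers, purely illustrative): what gets compared is the marginal utility of the next dollar, evaluated wherever funding currently stands, not the totals themselves.

```python
# Purely illustrative: compare marginal utilities of the next dollar under
# two made-up funding-to-utility curves. The totals by themselves settle
# nothing; only the slopes at the current funding levels matter.
import math

def wbe_utility(funding):
    return 100 * math.log1p(funding)   # hypothetical diminishing-returns curve

def fai_utility(funding):
    return 40 * math.log1p(funding)    # hypothetical curve with a lower ceiling

current_wbe, current_fai, extra_dollar = 1_000_000.0, 10_000.0, 1.0

marginal_wbe = wbe_utility(current_wbe + extra_dollar) - wbe_utility(current_wbe)
marginal_fai = fai_utility(current_fai + extra_dollar) - fai_utility(current_fai)

# With these particular curves the less-funded field wins at the margin, but
# only because of its steeper slope at the current funding level.
assert marginal_fai > marginal_wbe
```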