It seems to me that most AI researchers on this site are patternists in the sense of believing that the anti-zombie principle necessarily implies:

1. That it will ever become possible *in practice* to create uploads or sims that are close enough to our physical instantiations that their utility to us would be interchangeable with that of our physical instantiations.

2. That we know (or will know) enough about the brain to know when this threshold is reached.

 

But, like any rationalists extrapolating from unknown unknowns... or heck, extrapolating from anything... we must admit that one or both of the above statements could be wrong without also making friendly AI impossible. What would be the consequences of such an error?

I submit that one such consequence could be an FAI that is also wrong on these issues, where not only do we fail to check for such a failure mode, but its conclusion actually looks to us like what we would expect the right answer to look like, because we are making the same error.

If simulation/uploading really does preserve what we value about our lives then the safest course of action is to encourage as many people to upload as possible. It would also imply that efforts to solve the problem of mortality by physical means will at best be given an even lower priority than they are now, or at worst cease altogether because they would seem to be a waste of resources.

 

Result: people continue to die and nobody, including the AI, notices, except that now they have no hope of reprieve because they think the problem is already solved.

Pessimistic Result: uploads are so widespread that humanity quietly goes extinct, cheering itself onward the whole time.

Really Pessimistic Result: what replaces humanity are zombies, not in the qualia sense but in the real sense that there is some relevant chemical/physical process that is not being simulated because we didn't realize it was relevant or hadn't noticed it in the first place.

 

Possible Safeguards:

 

* Insist on quantum level accuracy (yeah right)

 

* Take seriously the general scenario of your FAI going wrong because you are wrong in the same way and fail to notice the problem.

 

* Be as cautious about destructive uploads as you would be about, say, molecular nanotech.

 

* Make sure your knowledge of neuroscience is at least as good as your knowledge of computer science and decision theory before you advocate digital immortality as anything more than an intriguing idea that might not turn out to be impossible.

 


This sounds like just a special case of the principle that Friendly AI should believe what is true and want what we want, rather than believe what we believe and want what we profess to want.

Most of the specific methods by which a UFAI could actually destroy us could also be employed by unfriendly humans. Adding an AI to the scenario presumably makes it worse mainly by amplifying the speed with which the scenario plays out and adding the unpredictability of an alien mindset.

Most of the specific methods by which a UFAI could actually destroy us could also be employed by unfriendly humans.

Disagree. There are probably many strategies that a merely human-level intelligence cannot carry out or even think of.

Oops, good point. I should have said

"Most of the specific methods we've thought of by which a UFAI could actually destroy us could also be employed by unfriendly humans."

[anonymous]:

Adding an AI to the scenario presumably makes it worse mainly by amplifying the speed with which the scenario plays out and adding the unpredictability of an alien mindset.

I think it just confused the question. It is unclear whether the OP is an FAI discussion or a patternism discussion, and now there are people talking about both.

Upvoted for being the maximally concise explanation of what FAI is.

Upvoted for being the maximally concise explanation of what FAI is.

It is too concise to be a description of what an FAI is. It doesn't seem to do anything except believe.

[anonymous]:

Friendly AI should believe what is true and do what we want, rather than believe what we believe and do what we profess to want.

Fixed to add action. The new one is even shorter.

philosophical golf

Fixed to add action. The new one is even shorter.

I like it!

Clarification issue - in item 1:

  1. That it will ever become possible in practice to create uploads or sims that are close enough to our physical instantiations that their utility to us would be interchangeable with that of our physical instantiations.

Recommend removing the word 'ever':

  1. That it will become possible in practice to create uploads or sims that are close enough to our physical instantiations that their utility to us would be interchangeable with that of our physical instantiations.

[anonymous]:

First of all, any FAI worthy of the name can realize this risk and plan around it. Unsure whether operation X(Y) preserves the value of system Y? Don't do anything rash, gather as much information as possible and be appropriately risk averse.

Second, if we create an upload, it will be pretty obvious whether it's a person or not by the time we are considering going full matrix-mode. We'll talk to it, have it participate in our society, see if it can be distinguished in important ways from other people, etc., before we are anywhere near comfortable enough with that existence to gamble the entire future on it being worthwhile. We will be comfortable enough, I think, that a mistake on that is about as likely as any other extreme tail risk (e.g., physics disasters).

Third, if it is possible that there is some magical essence that contains all the value of a person, and yet is undetectable by a very cautious barrage of tests including a full-on field test, and unknown to us right now, why do you think you have that magical essence? How are you distinguishable from a crappy chemical simulation of a person?

How are you distinguishable from a crappy chemical simulation of a person?

Maybe I am a crappy simulation of a person. It still doesn't follow that I should be indifferent to being replaced with an even crappier simulation.

[anonymous]:

What if it's a better simulation, though? What if we do some neuroscience to characterize how the whole system works, and some philosophy to characterize some of what is valuable about this thing built from neurons, and realize "Hey, this whole dependence on blood sugar levels, vitamin levels, oxygen availability, etc. is kind of a raw deal. I don't want my very nature to be subject to such stupid things as what I did or did not eat this morning."

Also, the OP did not merely say "I'm not indifferent", it said "a person running on a different computational substrate might have no value, even though they are indistinguishable for all practical purposes, and an FAI thinks it's OK." At that level of conservatism, we might as well not do anything. What if the people we shoot into space have no value once they leave earth's gravity well? What if only that one tribe in the amazon that's never interacted with technology are actually people? What if the humane, rational, superintelligent best guess of the thing to do is X, but that's actually wrong and we should do Y?

These problems have very low probability, and we have much more important problems to worry about, like "what if we build what we think is an FAI, but it can't do philosophy and we programmed in a bunch of stupid philosophical assumptions", which is the more serious problem you alluded to in the OP. This problem was also discussed by MIRI people in 2004 in CEV, and by Wei Dai with his "Philosophical AI" ideas. It could use more discussion.

I guess I got confused by you mixing in the FAI stuff with the patternism stuff. Are you more interested in whether patternism is true, or whether an FAI would get the right answer on that question?

What if it's a better simulation, though?

What if there's some guy next door who shares a lot of my personality and background but is smarter, funnier, saner, healthier, and harder-working than I am? Maybe to you we are interchangeable, and if I die you'll say "Oh well, we still have bokov-prime, they're equivalent". But it turns out that I'm not okay with that arrangement. His existence would not cause me to become any less concerned about protecting my own.

Also, the OP did not merely say "I'm not indifferent", it said "a person running on a different computational substrate might have no value

I didn't say no value. I said, less value to me than myself.

even though they are indistinguishable for all practical purposes, and an FAI thinks it's OK

No. More like "even though they are indistinguishable by measurements possible at the time of the upload/copy except if one could somehow directly experience both mental states".

I guess I got confused by you mixing in the FAI stuff with the patternism stuff. Are you more interested in whether patternism is true, or whether an FAI would get the right answer on that question?

I'm interested in FAI not ending up with values antagonistic to my own. The value most at risk appears to be continuity. Therefore, I'm engaging FAI people on that issue in hopes that they will convince me, that I will convince them, or that we will discover there are enough unknown unknowns that we should plan for the possibility that either or both points of view could be wrong, and should treat proposals to solve human problems via uploading as dangerous until those unknowns are filled in.

The approach I propose is not doing nothing, nor rejecting uploading. This conversation has helped me figure out what safeguard I do advocate: rejecting destructive uploading and placing a priority on developing brain-machine interfaces, so we aren't operating blind on whether we have achieved equivalence or not.

[anonymous]:

I'm interested in FAI not ending up with values antagonistic to my own. The value most at risk appears to be continuity. Therefore, I'm engaging FAI people on that issue in hopes that they will convince me, that I will convince them, or that we will discover there are enough unknown unknowns that we should plan for the possibility that either or both points of view could be wrong, and should treat proposals to solve human problems via uploading as dangerous until those unknowns are filled in.

Ok, but the current state of the debate on FAI is already that we don't trust human philosophers, and we need to plan for the possibility that all our assumptions are wrong, and build capability to deal with that into the FAI.

What we decide on patternism today has no relevance to what happens post-FAI, because everyone seriously working on it realizes that it would be stupid for the FAI not to be able to revise everything to the correct position, or discover the truth itself if we didn't bother. So the only purpose of these philosophical discussions is either for our own entertainment, or for making decisions before FAI. So the FAI thing doesn't actually come into it at all.

rejecting destructive uploading and placing a priority on developing brain-machine interfaces, so we aren't operating blind on whether we have achieved equivalence or not.

This is very sensible, even if you were a die-hard patternist. In that way, even patternism probably doesn't come into the point you are making, which is that we should be really, really cautious with irreversible technological change, especially of the transhuman variety, because we can't recover from it and the stakes are so high.

I, for one, think doing any transhuman stuff, and even a lot of mundane stuff like universal networking and computation, without adult supervision (FAI) is a really bad idea. We need to get FAI right as fast as possible so that we flawed humans don't even have to make these decisions.

bokov:

What if we do some neuroscience to characterize how the whole system works, and some philosophy to characterize some of what is valuable about this thing built from neurons

Hidden in the phrases "do some neuroscience" and "some philosophy" are hard problems. What reason do you have for believing that either of them is an easier problem than creating a brain simulation that third parties will find convincing?

any FAI worthy of the name

It's not like it's that easy to know whether an FAI is worthy of the name before running it, though.

[anonymous]:

It's not like it's that easy to know whether an FAI is worthy of the name before running it, though.

That's a very different problem from the one in the OP, though. Yes, FAI is super difficult, but the OP is implying that even if we win at FAI, this could still be a problem:

we must admit that one or both of the above statements could be wrong without also making friendly AI impossible.


quantum level accuracy

What do you mean by that? If you mean what I suspect you mean, you can't do that.

The no-cloning theorem doesn't rule out destructive upload, i.e., uploading while destroying the original human.

bokov:

Thanks for the link.

Well, then, so much for the following:

  • Perfect simulations
  • ...and therefore ever being sure that a different instantiation of me has the same utility to me as myself.
  • ...and therefore having to care about what some future deity might or might not claim to do to/for 10^^10 "copies" of me (not that I did anyway, because while inferential distance should not disqualify a conclusion, it damn well should discount it)
  • The proposition that instantiations of me occasionally appear out of nowhere at random points in the universe and usually die horribly soon after (which I suspected had a hidden error somewhere from the moment I first heard it).

Unfortunately, this also probably means I should write into my will stipulations against destructive uploading and pray that the overconfident patternists at whose mercy I'll probably be will find it cost-effective to respect my wishes.

[anonymous]:

Well, then, so much for the following:

No. We ruled out perfect quantum-level copies, which does not rule out near-copies being morally relevant.

For example, you plus one second is completely different on the quantum level due to uninteresting things like thermal noise, but they are just as important as you are. Likewise, a molecule-level copy of you would be pretty much the same (at least physically) as movement through time (to within the tolerances we normally deal with), as would, I suspect, cell-level plus a few characterization parameters like "connection strength" and "activation level", and I bet you could even drop activation level (as happens in sleep) and many low-level details like exact cell arrangement.

Basically, humans are not exactly isolated 0.01 Kelvin quantum computers (and even they have decoherence time much less than a second), so if you want exact continuity, you already don't have it. You have to generalize your moral intuitions to approximate continuity just to accept normal things like existing at 310 K, sneezes destroying thousands of brain cells, sleep rebooting everything, and random chemicals influencing your cognition. Many people who do so decide that the details of the computational substrate don't really matter; it's the high level behaviors that matter. Hence patternism.

I'd be interested in whether you still disagree and why.

I'd be interested in whether you still disagree and why.

I'm still trying to figure out what does and does not matter to my concept of continuity and why.

Let me ask you this: which person from five seconds in the future do you care more about protecting: yourself or a random person similar to you?

Is there some theoretical threshold of similarity between yourself and another person beyond which you will no longer be sure which location will be occupied by the brain-state your current brain-state will evolve to in the next second?

[anonymous]:

Let me ask you this: which person from five seconds in the future do you care more about protecting: yourself or a random person similar to you?

All else equal, I think I prefer my own future self. It depends how similar, though. If this "random" person were in fact more similar to myself than I currently am, I'd prefer them. As for what I mean by "more similar to myself than I currently am", I'm just leaving open the possibility that there are things that make my default future self not the optimum in terms of continuity. For example, what if this other person remembers things that I have forgotten?

The way I think of this, there isn't really a fundamental concept of continuity, and the question really is "What kind of processes do I want the universe to be turned into?" There's no fundamental concept of "me", it's just that I prefer the future to contain people who have property X, and Y, as opposed to property W and Z.

Likewise, there is no fundamental concept of "person" apart from the structure of our preferences over what gets done with this computational substrate we have.

Is there some theoretical threshold of similarity between yourself and another person beyond which you will no longer be sure which location will be occupied by the brain-state your current brain-state will evolve to in the next second?

I'm not sure that makes sense. I certainly have intuitions about "continuity", but they might be broken in cases like that. For a question like that, I think talking about continuity has to be replaced with the more general question of what I want the future to look like.

(A future in which there are two of me? Yes please. As for which is the "real me", who cares?)

Just to be clear: I'm not talking about qualia. Don't mistake what I'm saying for qualia or silent observers or such. I'm talking about three broad classes of physical problems with uploads:

  • Real but as yet undiscovered biochemical or biophysical phenomena whose disruption you would notice but other people would not. In other words, the fidelity of an upload sufficient to convince other people will likely be lower than the fidelity sufficient to convince the original. So if the original must get disassembled in order to create the upload, this is automatically a red flag.

  • Perhaps an atom-perfect copy of you is you. But I don't see how it follows that an atom-perfect simulation of you must also be you. And it's a bold assertion that anything close to that level of accuracy will even be feasible in the first place, especially on a species-wide scale.

  • Even if we ignore substrate and assume that an atom-perfect simulation of you really is you at the instant it's created, your paths diverge from that point forward: it becomes a happy sparkly transcendent immortal being and you are still the poor bastard trapped inside a meat-puppet doomed to die from cancer if heart disease doesn't get you first. You can spawn off as many uploads as you want, with the exact same result. Nor does it help if it's a destructive upload; it just changes the delay between when you spawn your last upload and when you die from several decades to zero seconds.

The above is not a dualism. It's the opposite view that smacks of sympathetic magic: that your copy is somehow linked to you even after the copying process is complete and that because it's immortal you no longer have to worry about dying yourself.

Now, if uploading is accompanied by some sort of continuous brain-sync technology, that may be a different matter.

On the subject of patterns, there's an old joke: Suppose you replace a human's neurons, one by one, with techno-doodads that have precisely the same input-output patterns. As you replace, you ask the subject, about once a minute, "Do you still have qualia?" Now, what do you do if he starts saying "No"?

Check their stream of consciousness to see if they're trolling. If they're not, YOU TURNED INTO A CAT!!

[anonymous]:

Your claim seems to require more knowledge about biology than most people actually have. Suppose you have an upload saying "I'm conscious". You start optimizing the program, step by little step, until you get a tiny program that just outputs the string "I'm conscious" without actually being conscious. How do we tell at what point the program lost consciousness? And if we can't tell, then why are we sure that the process of scanning and uploading a biological brain doesn't have similar problems?

[This comment is no longer endorsed by its author]

That's what makes it a joke.

I think you may need to repeat the fact that this is a joke at the bottom, since you already have two replies that didn't get it ...

I think you may need to repeat the fact that this is a joke at the bottom, since you already have two replies that didn't get it ...

The punchline seemed too much like what people actually say for it to be sufficiently absurd to qualify as a joke. This related anecdote explains why it would seem funny to Rolf.

precisely the same input-output patterns

so by definition, he would have said "No" with neurons as well. Slap him for scaring you.

Not by definition, but by consequence of the materialist belief, that the neurons are everything there is to a mind. There may be excellent reasons for that belief, but the experiment, if carried out, would be an empirical test of it, not a joke.

Hence Eliezer's response.

Weeell, if there was some supernatural influence wouldn't it need to show itself, somehow, in neuron input/output patterns?

You'd have to ask someone who believes in such a supernatural influence, where it intervenes. You'd also have to ask the materialist how they determined that they were replacing neurons with physically equivalent devices. It's difficult to determine the input-output behaviour of a single component when it's embedded in a complex machine whose overall operation one knows very little about, and cutting it out to analyse it might destroy its ability to respond to the supernatural forces.

As context to these remarks, I've read some of the discussion between Rolf Andreassen and John C Wright on the latter's blog, and whatever I might think of supernatural stuff, I must agree with Wright that Rolf is persistently smuggling his materialist assumptions into his arguments and then pulling them out as the conclusion.

By what definition is outputting "No" when the input is "Do you still have qualia?" not an input-output pattern?

That's the joke, yes.

In the pessimistic and really pessimistic cases, do you mean that there could be some effect so subtle that we can't notice it, neither from *being* an upload nor from speaking or interacting with uploads for extended periods, but that would still be enough that we wouldn't count them as preserving the value in our lives - that we wouldn't even consider them our metaphorical children?

I can definitely see the first pass uploads having a problem like that, but the idea that no one would notice is hard to accept.

I'm not even sure what the distinction between the first and second cases is besides scale of uploading - is there one?

I'm not even sure what the distinction between the first and second cases is besides scale of uploading - is there one?

It's scale: in the first case we're basically stuck forever at the status quo until someone makes a neurological discovery that reveals our mistake, whereas in the second it's frank extinction, leaving no living humans to observe. I suppose that if the entire planet isn't uploaded, the mistake might still be discoverable from animal research. There is no guarantee that this missing ingredient will be the only one.

bokov:

There is information you have about yourself that other people do not. Therefore you are the only one who would detect inconsistencies in that set of information. Maybe. If your original copy is disassembled, there is no way to verify at that level.

Non-destructive uploads are definitely a good first step at a safeguard. Finding a way to interface a biological mind with its own simulation so it can experience the other from the inside (whatever that means, if anything at all) might be a good second step. I can't think of any foolproof safeguard other than a quark-level simulation that you somehow have a way of verifying.

Even then I'm still not sure whether moving it to a different substrate creates a perfect copy of the territory or just a perfect map of the territory.

(whatever that means, if anything at all)

Well, if consciousness is real, it should be a distinct part of our brains. It should be possible to wire that part up to the upload's brain directly, disable it, switch it back and forth. (Create true zombies, in other words.) Maybe interacting with a consciousness-less upload will convince people that it's really conscious when that part is re-enabled.

Actually, are there any cases of brain damage where consciousness has become permanently or intermittently disabled? (Sleep walking aside) Also, has anybody tried to compare MRIs of sleepwalkers and awake people performing the same tasks?

Actually, are there any cases of brain damage where consciousness has become permanently or intermittently disabled? (Sleep walking aside) Also, has anybody tried to compare MRIs of sleepwalkers and awake people performing the same tasks?

Yes, quite a few.

I'm not convinced that people suffering from a monothematic delusion or depersonalization disorder are unconscious, or have a damaged consciousness specifically.

I'm not saying it's exactly like that, just that this is what a "missing consciousness" might resemble. Maybe some subtler set of symptoms that are harder to articulate to others and less disruptive to outwardly normal behavior.

So, non-philosophical zombies: they could be detectable by physical means but we have not yet developed such means. Yes, this sounds overly cautious, but if being alive is our most valuable asset, perhaps infinitely valuable, it might be the one thing that we should be overly cautious about.

Well okay, but your comment was very misleading as written.

I have since experienced a dissociative episode (similar to what is described on the depersonalization disorder page), and I can confirm that I was conscious throughout.

There is information you have about yourself that other people do not.

Yes. I was quite careful about the wording there. I'm not saying that whoever comes out of the uploading procedure will necessarily be you. I'm saying that if it doesn't act like you, they're going to notice. And if it doesn't feel on the inside like it used to feel on the inside, then unless the capacity to notice this feeling on the inside was also erased, the upload is going to notice.

So, for this scenario:

  • Something needs to be missing on the inside - something important to human value.
  • This thing needs to be non-noticeable from the outside under any degree of examination.
  • This thing needs to be non-noticeable from the inside under any degree of examination.

this is a very heavy conjunction.

You know... I think you're right that the uploads themselves at least would notice.

I think the risk of non-philosophical zombies is a smaller one than the other risk I mentioned having to do with continuity-- a copy of you (no matter how accurate) no longer being you after the two of you diverge, since you no longer have access to each other's minds.

My speculating that your copy might also be a zombie only muddies the waters.

I don't have access to the mind of me one second in the future or past either, so I don't put much stock in continuity as something that I stand to lose.

You have access to your future mind in the sense that it is an evolution of your current mind. Your copy's future mind is an evolution of your copy's current mind, not yours.

Perhaps this tight causal link is what makes me care more about the mes that will branch off in the future more than I care about the past me of which I am a branch. Perhaps I would see a copy of myself as equivalent to me if we had at least sporadic direct access to each other's mind states. So my skepticism toward immortality-through-backup-copies is not unconditional.

I don't put much stock in continuity as something that I stand to lose.

You might not put much stock into that, and you might also be rationalizing away your basic will to live. What do you stand to lose?

You have access to your future mind in the sense that it is an evolution of your current mind. Your copy's future mind is an evolution of your copy's current mind, not yours.

My copy's future mind is an evolution of me pre-copy's current mind, and correlates overwhelmingly for a fairly long time after the copy was made. That means that making the copy is good for all me's pre-copy and to some (large) degree even post-copy. I'd certainly be more willing to take risks if I had a backup. After all, what do I stand to lose? A few days of memory?

(I don't see any situation, basically, crippling computing scarcity aside, in which I would be better off not uploading.)

To clarify: I don't put stock into single-instance continuity. I want the future to have me's in it, I don't particularly care what their substrate is, or if they're second-to-second continuous.

For what it's worth, I agreed with your position for years, but changed my opinion after Wei Dai suggested a new argument to me.

Suppose you have an upload saying "I'm conscious". You start optimizing the program, step by little step, until you get a tiny program that just outputs the string "I'm conscious" without actually being conscious. How can we tell at which point the program lost consciousness? And if we can't tell, then why are we sure that the process of scanning and uploading a biological brain doesn't have similar problems? Especially if the uploading is done by an AI who might want to fit more people into the universe.

That answers problems 1 and 3, but not problem 2.

Moreover, it notices that the slope is slippery at the very, very bottom, after all introspective capability has been lost, but no argument is provided about the top. And you're applying it to a single-step procedure with an easy before/after comparison, so we can't get a boiled-frog effect.

Overeager optimization is a serious concern once digitized, for sure.

Sorry, what before/after comparison are you thinking of?

The transition is the one in the OP - the digitization process itself, going from meat to, well, not-meat.

You only need to do that once.

The comparison would be by behavior - do they think differently, beyond what you'd expect from differing circumstances? Do they still seem human enough? Unless it is all very sudden, there will be plenty of time to notice inhumanity in the uploads.

Goes doubly if they can be placed in convincing androids, so the circumstances differ as little as possible.

[anonymous]:

There is information you have about yourself that other people do not.

Tell them.

[This comment is no longer endorsed by its author]