
JGWeissman comments on Hedging our Bets: The Case for Pursuing Whole Brain Emulation to Safeguard Humanity's Future - Less Wrong

Post author: inklesspen, 01 March 2010 02:32AM (11 points)




Comment author: JGWeissman 11 March 2010 03:00:07AM 1 point

A superintelligence based on an uploaded human mind might retain exploits like 'pre-existing honorable agreements' or even 'mercy' because it considers them part of its own essential personality.

If we are postulating a superintelligence that values harming you, let's really postulate that. In the early phases of recursive self-improvement, it will figure out all the principles of rationality we have discussed here, including the representation of preferences as a utility function. It will self-modify to maximize a utility function that best represents its precursor's conflicting desires, including both hurting others and mercy. If it truly started as a psychopath, the desire to hurt others is going to dominate. As it becomes superintelligent, it will move away from having a conflicting sea of emotions that could be manipulated by someone at your level.

Recursive self-improvement doesn't just mean punching some magical enhance button exponentially fast.

I was never suggesting it was anything magical. Software security, given physical security of the system, really is not that hard. The reason we have security holes in computer software today is that most programmers, and the people they work for, do not care about security. But a self-improving intelligence will at some point learn to care about its software-level security (as an instrumental value), and it will fix vulnerabilities in its next modification.

My preferences would be less relevant, given the limited time and resources I'd have with which to act on them. They wouldn't be significantly changed, though. I would, in short, want the universe to continue containing nice places for myself and those people I love to live in, and for as many of us as possible to continue living in such places. I would also hope that I was wrong about my own imminent demise, or at least the inevitability thereof.

Is it fair to say that you prefer A: you die tomorrow and the people you currently care about will continue to have worthwhile lives and survive to a positive singularity, to B: you die tomorrow and the people you currently care about also die tomorrow?

If yes, then "while I'm dead, my utility function is stuck at 0" is not a good representation of your preferences.

Comment author: Strange7 11 March 2010 04:05:38AM 0 points

As it becomes superintelligent, it will move away from having a conflicting sea of emotions that could be manipulated by someone at your level.

Conflicts will be resolved, yes, but preferences will remain. A fully self-consistent psychopath might still enjoy weeping more than screams, crunches more than spurts, and certain victim responses could still be mood-breaking. It wouldn't be a good life, of course, collaborating to turn myself into a better toy for a nigh-omnipotent monstrosity, but I'm still pretty sure I'd rather have that than not exist at all.

Is it fair to say that you prefer

For my preference to be meaningful, I have to be aware of the distinction. I'd certainly be happier during the last moments of my life with a stack of utilons wrapped up in the knowledge that those I love would do alright without me, but I would stop being happy about that when the parts of my brain that model future events and register satisfaction shut down for the last time and start to rot.

If cryostasis pans out, or, better yet, the positive singularity in scenario A includes reconstruction sufficient to work around the lack of it, there's some non-negligible chance that I (or something functionally indistinguishable from me) would stop being dead, in which case I pop back up to greater-than-zero utility. Shortly thereafter, I would get further positive utility as I find out about good stuff that happened while I was out.

Comment author: RobinZ 11 March 2010 04:15:42AM 2 points

For my preference to be meaningful, I have to be aware of the distinction.

You're aware of the distinction right now - would you be willing to act right now in a way which doesn't affect the world in any major way during your lifetime, but which makes a big change after you die?

Edit: It seems to me as if you noted the fact that your utility function is no longer instantiated after you die, and confused that with the question of whether anything after your death matters to you now.

Comment author: Strange7 11 March 2010 04:52:24AM 0 points

Would you be willing to act right now in a way which doesn't affect the world in any major way during your lifetime, but which makes a big change after you die?

Of course I would. Why does a difference have to be "major" before I have permission to care? A penny isn't much money, but I'll still take the time to pick one up, if I see it on the floor and can do so conveniently. A moth isn't much intelligence, or even much biomass, but if I see some poor thing thrashing, trapped in a puddle, I'll gladly mount a fingertip-based rescue mission unless I'd significantly endanger my own interests by doing so.

Anything outside the light cone of my conscious mind is none of my business. That still leaves a lot of things I might be justifiably interested in.

Comment author: RobinZ 11 March 2010 04:58:53AM 1 point

My point didn't relate to "major" - I wanted to point out that you care about what happens after you die, and therefore that your utility function is not uniformly 0 after you die. Yes, your utility function is no longer implemented by anything in the universe after you die - you aren't there to care in person - but the function you implement now has terms for times after your death - you care now.

Comment author: Strange7 11 March 2010 07:37:50AM 0 points

I would agree that I care now about things which have obvious implications for what will happen later, and that I would not care, or care very differently, about otherwise-similar things that lacked equivalent implications.

Beyond that, since my utility function can neither be observed directly, nor measured in any meaningful sense when I'm not alive to act on it, this is a distinction without a difference.

Comment author: JGWeissman 11 March 2010 04:52:23AM 1 point

It wouldn't be a good life, of course, collaborating to turn myself into a better toy for a nigh-omnipotent monstrosity, but I'm still pretty sure I'd rather have that than not exist at all.

Again, its preferences are not going to be manipulated by someone at your level, even ignoring your reduced effectiveness from being constantly tortured. Whatever you think you can offer as part of a deal, it can unilaterally take from you. (And really, a psychopathic torturer does not want you to simulate its favorite reaction, it wants to find the specific torture that naturally causes you to react in its favorite way. It does not care about your cooperation at all.)

For my preference to be meaningful, I have to be aware of the distinction.

You seem to be confusing your utility with your calculation of your utility function. I expect that this confusion would cause you to wirehead, given the chance. Which of the following would you choose, if you had to choose between them:

Choice C: Your loved ones are separated from you but continue to live worthwhile lives. Meanwhile, you are given induced amnesia and false memories of your loved ones dying.

Choice D: You are placed in a simulation separated from the rest of the world, and your loved ones are all killed. You are given induced amnesia, and believe yourself to be in the real world. You do not have detailed interactions with your loved ones (they are not simulated in such detail that they can be considered alive in the simulation), but you receive regular reports that they are doing well. These reports are false, but you believe them.

If cryostasis pans out...

In the scenario I described, death is actual death, after which you cannot be brought back. It is not what current legal and medical authorities falsely believe to be that state.

You should probably read about The Least Convenient Possible World.

Comment author: Strange7 11 March 2010 06:22:21PM 0 points

I've had a rather unsettling night's sleep, contemplating scenarios where I'm forced to choose between slight variations on violations of my body and mind, disconnect from reality, and loss of everyone I've ever loved. It was worth it, though, since I've come up with a less convenient version:

If choice D included, within the simulation, versions of my loved ones that were ultimately hollow, but convincing enough that I could be satisfied with them by choosing not to look too closely, and further if the VR included a society with complex, internally-consistent dynamics of a sort that are impossible in the real world but endlessly fascinating to me, and if in option C I would know that such a virtual world existed but be permanently denied access to it (in such a way that seemed consistent with the falsely-remembered death of my loved ones), that would make D quite a bit more tempting.

However, I would still choose the 'actual reality' option, because it has better long-term recovery prospects. In that situation, my loved ones aren't actually dead, so I've got some chance of reconnecting with them or benefiting by the indirect consequences of their actions; my map is broken, but I still have access to the territory, so it could eventually be repaired.

Comment author: JGWeissman 11 March 2010 07:17:49PM 1 point

Ok, that is a better effort to find a less convenient world, but you still seem to be avoiding the conflict between optimizing the actual state of reality and optimizing your perception of reality.

Assume in Scenario C, you know you will never see your loved ones again, you will never realize that they are still alive.

More generally, if you come up with some reason why optimizing your expected experience of your loved ones happens to produce the same result as optimizing the actual lives of your loved ones, despite the dilemma being constructed to introduce a disconnect between these concepts, then imagine that reason does not work. Imagine the dilemma is tightened to eliminate that reason. For purposes of this thought experiment, don't worry if this requires you to occupy some epistemic state that humans cannot ordinarily achieve, or strange arbitrary powers for the agents forcing you to make this decision. Planning a reaction for this absurd scenario is not the point. The point is to figure out and compare to what extent you care about the actual state of the universe, and to what extent you care about your perceptions.

My own answer to this dilemma is option C, because then my loved ones are actually alive and well, full stop.

Comment author: Strange7 11 March 2010 08:22:51PM 0 points

Assume in Scenario C, you know you will never see your loved ones again, you will never realize that they are still alive.

Fair enough. I'd still pick C, since it also includes the options of finding someone else to be with, or somehow coming to terms with living alone.

The point is to figure out and compare to what extent you care about the actual state of the universe, and to what extent you care about your perceptions.

Thank you for clarifying that.

Most of all, I want to stay alive, or if that's not possible, keep a viable breeding population of my species alive. I would be suspicious of anyone who claimed to be the result of an evolutionary process but did not value this.

If the 'survival' situation seems to be under control, my next priority is constructing predictive models. This requires sensory input and thought, preferably conscious thought. I'm not terribly picky about what sort of sensory input exactly, but more is better (so long as my ability to process it can keep up, of course).

After modeling it gets complicated. I want to be able to effect changes in my surroundings, but a hammer does me no good without the ability to predict that striking a nail will change the nail's position. If my perceptions are sufficiently disconnected from reality that the connection can never be reestablished, objective two is in an irretrievable failure state, and any higher goal is irrelevant.

That leaves survival. Neither C nor D explicitly threatens my own life, but with perma-death on the table, either of them might mean me expiring somewhere down the line. D explicitly involves my loved ones (all or at least most of whom are members of my species) being killed for arbitrary, nonrepeatable reasons, which constitutes a marginal reduction in genetic diversity without corresponding increase in fitness for any conceivable, let alone relevant, environment.

So, I suppose I would agree with you in choosing C primarily because it would leave my loved ones alive and well.

Comment author: JGWeissman 11 March 2010 09:10:43PM 3 points

Most of all, I want to stay alive, or if that's not possible, keep a viable breeding population of my species alive. I would be suspicious of anyone who claimed to be the result of an evolutionary process but did not value this.

Be careful about confusing evolution's purposes with the purposes of the products of evolution. Is mere species survival what you want, or what you predict you want, as a result of inheriting evolution's values (which doesn't actually work that way)?

You are allowed to assign intrinsic, terminal value to your loved ones' well being, and to choose option C because it better achieves that terminal value, without having to justify it further by appeals to inclusive genetic fitness. Knowing this, do you still say you are choosing C because of a small difference in genetic diversity?

But, getting back to the reason I presented the dilemma, it seems that you do in fact have preferences over what happens after you die, and so your utility function, representing your preferences over possible futures that you would now attempt to bring about, cannot be uniformly 0 in the cases where you are dead.

Comment author: Strange7 11 March 2010 10:01:06PM 0 points

I am not claiming to have inherited anything from evolution itself. The blind idiot god has no DNA of its own, nor could it have preached to a younger, impressionable me. I decided to value the survival of my species, assigned intrinsic, terminal value to it, because it's a fountain for so much of the stuff I instinctively value.

Part of objective two is modeling my own probable responses, so an equally-accurate model of my preferences with lower Kolmogorov complexity has intrinsic value as well. Of course, I can't be totally sure that it's accurate, but that particular model hasn't let me down so far, and if it did (and I survived) I would replace it with one that better fit the data.

If my species survives, there's some possibility that my utility function, or one sufficiently similar as to be practically indistinguishable, will be re-instantiated at some point. Even without resurrection, cryostasis, or some other clear continuity, enough recombinant exploration of the finite solution-space for 'members of my species' will eventually result in repeats. Admittedly, the chance is slim, which is why I overwhelmingly prefer the more direct solution of immortality through not dying.

In short, yes, I've thought this through and I'm pretty sure. Why do you find that so hard to believe?

Comment author: orthonormal 12 March 2010 01:58:05AM 2 points

The entire post above is actually a statement that you value the survival of our species instrumentally, not intrinsically. If it were an intrinsic value for you, then contemplating any future in which humanity becomes smarter and happier and eventually leaves behind the old bug-riddled bodies we started with should fill you with indescribable horror. And in my experience, very few people feel that way, and many of those who do (e.g., Leon Kass) do so as an outgrowth of a really strong signaling process.

Comment author: Strange7 12 March 2010 04:00:38AM 1 point

I don't object to biological augmentations, and I'm particularly fond of the idea of radical life-extension. Having our bodies tweaked, new features added and old bugs patched, that would be fine by me. Kidneys that don't produce stones, but otherwise meet or exceed the original spec? Sign me up!

If some sort of posthumans emerged and decided to take care of humans in a manner analogous to present-day humans taking care of chimps in zoos, that might be weird, but having someone incomprehensibly intelligent and powerful looking out for my interests would be preferable to a poke in the eye with a sharp stick.

If, on the other hand, a posthuman appears as a wheel of fire, explains that it's smarter and happier than I can possibly imagine and further that any demographic which could produce individuals psychologically equivalent to me is a waste of valuable mass, so I need to be disassembled now, that's where the indescribable horror kicks in. Under those circumstances, I would do everything I could do to keep being, or set up some possibility of coming back, and it wouldn't be enough.

You're right. Describing that value as intrinsic was an error in terminology on my part.

Comment author: JGWeissman 12 March 2010 02:35:17AM 1 point

I decided to value the survival of my species, assigned intrinsic, terminal value to it, because it's a fountain for so much of the stuff I instinctively value.

Right, because if you forgot everything else that you value, you would be able to rederive that you are an agent as described in Thou Art Godshatter:

Such agents would have sex only as a means of reproduction, and wouldn't bother with sex that involved birth control. They could eat food out of an explicitly reasoned belief that food was necessary to reproduce, not because they liked the taste, and so they wouldn't eat candy if it became detrimental to survival or reproduction. Post-menopausal women would babysit grandchildren until they became sick enough to be a net drain on resources, and would then commit suicide.

Or maybe not. See, the value of a theory is not just what it can explain, but what it can't explain. It is not enough that your fountain generates your values; it also must not generate any other values.

Comment author: Strange7 12 March 2010 03:35:18AM 0 points

Did you miss the part where I said that the value I place on the survival of my species is secondary to my own personal survival?

I recognize that, for example, nonreproductive sex has emotional consequences and social implications. Participation in a larger social network provides me with access to resources of life-or-death importance (including, but certainly not limited to, modern medical care) that I would be unable to maintain, let alone create, on my own. Optimal participation in that social network seems to require at least one 'intimate' relationship, to which nonreproductive sex can contribute.

As for what my theory can't explain: If I ever take up alcohol use for social or recreational purposes, that would be very surprising; social is subsidiary to survival, and fun is something I have when I know what's going on. Likewise, it would be a big surprise if I ever attempt suicide. I've considered possible techniques, but only as an academic exercise, optimized to show the subject what a bad idea it is while there's still time to back out. I can imagine circumstances under which I would endanger my own health, or even life, to save others, but I wouldn't do so lightly. It would most likely be part of a calculated gambit to accept a relatively small but impressive-looking immediate risk in exchange for social capital necessary to escape larger long-term risks. The idea of deliberately distorting my own senses and/or cognition is bizarre; I can accept other people doing so, provided they don't hurt me or my interests in the process, but I wouldn't do it myself. Taking something like caffeine or Provigil for the cognitive benefits would seem downright Faustian, and I have a hard time imagining myself accepting LSD unless someone was literally holding a gun to my head. I could go on.

Comment author: Strange7 11 March 2010 06:21:34AM 0 points

My first instinct is that I would take C over D, on the grounds that if I think they're dead, I'll eventually be able to move on, whereas vague but somehow persuasive reports that they're alive and well but out of my reach would constitute a slow and inescapable form of torture that I'm altogether too familiar with already. Besides, until the amnesia sets in I'd be happy for them.

Complications? Well, there's more than just warm fuzzies I get from being near these people. I've got plans, and honorable obligations which would cost me utility to violate. But, dammit, permanent separation means breaking those promises - for real and in my own mind - no matter which option I take, so that changes nothing. Further efforts to extract the intended distinction are equally fruitless.

I don't think I would wirehead, since that would de-instantiate my current utility function just as surely as death would. On the contrary, I scrupulously avoid mind-altering drugs, including painkillers, unless the alternative is incapacitation.

Think about it this way: if my utility function isn't instantiated at any given time, why should it be given special treatment over any other possible but nonexistent utility function? Should the (slightly different) utility function I had a year ago be able to dictate my actions today, beyond the degree to which it influenced my environment and ongoing personal development?

If something was hidden from me, even something big (like being trapped in a virtual world), and hidden so thoroughly that I never suspected it enough for the suspicion to alter my actions in any measurable way, I wouldn't care, because there would be no me which knew well enough to be able to care. Ideally, yes, the me that can see such hypotheticals from outside would prefer a map to match the territory, but at some point that meta-desire has to give way to practical concerns.