I attempt to dissolve the concepts of 'personal identity' and 'subjective expectation' down to the level of cognitive algorithms, in a way that would let one bite the bullets of the anthropic trilemma. I proceed by considering four clues which seem important: 1) the evolutionary function of personal identity, 2) a sense of personal identity being really sticky, 3) an undefined personal identity causing undefined behavior in our decision-making machinery, and 4) our decision-making machinery being more strongly grounded in our subjective expectation than in abstract models. Taken together, these seem to suggest a solution.
I ended up re-reading some of the debates about the anthropic trilemma, and it struck me as odd that, aside from a few references to personal identity being an evolutionary adaptation, there seemed to be no attempt to reduce the concept to the level of cognitive algorithms. Several commenters thought that there wasn't really any problem, and Eliezer asked them to explain why the claim that there wasn't any problem nonetheless seemed to violate the intuitive rules of subjective expectation. That seemed like a very strong indication that the question needs to be dissolved, but almost none of the attempted answers seemed to do that, instead trying to solve the question via decision theory without ever addressing the core issue of subjective expectation. rwallace's I-less Eye argued – I believe correctly – that subjective anticipation isn't ontologically fundamental, but still didn't address the question of why it feels like it is.
Here's a sketch of a dissolution. It seems relatively convincing to me, but I'm not sure how others will take it, so let's give it a shot. Even if others find it incomplete, it should at least help provide clues that point towards a better dissolution.
Clue 1: The evolutionary function of personal identity.
Let's first consider the evolutionary function. Why have we evolved a sense of personal identity?
The first answer that comes to mind is that our brains have evolved for the task of spreading our genes, which involves surviving at least for as long as it takes to reproduce. Simpler neural functions, like maintaining a pulse and having reflexes, obviously do fine without a concept of personal identity. But if we wish to use abstract, explicit reasoning to advance our own interests, we need some definition of exactly whose interests our reasoning process is supposed to be optimizing. So evolution comes up with a fuzzy sense of personal identity, such that optimizing the interests of this identity also happens to optimize the interests of the organism in question.
That's simple enough, and this point was already made in the discussions so far. But that doesn't feel like it would resolve our confusion yet, so we need to look at the way that personal identity is actually implemented in our brains. What is the cognitive function of personal identity?
Clue 2: A sense of personal identity is really sticky.
Even people who disbelieve in personal identity don't really seem to disalieve it: for the most part, they're just as likely to be nervous about their future as anyone else. Even advanced meditators who go out trying to dissolve their personal identity seem to still retain some form of it. PyryP claims that at one point, he reached a stage in meditation where the experience of “somebody who experiences things” shattered and he could turn it entirely off, or attach it to something entirely different, such as a nearby flower vase. But then the experience of having a self began to come back: it was as if the brain was hardwired to maintain one, and to reconstruct it whenever it was broken. I asked him to comment on that for this post, and he provided the following:
It seems like my consciousness is rebuilding a new ego on top of everything, one which is not directly based on feeling as one with a physical body and memories, but which still feels like it is the thing that experiences whatever happens.
To elaborate, many things in life affect the survival and success of an organism. Even though the organism would never experience itself as being separate from the surrounding universe, in ordinary life it's still useful to have concepts relating to the property and values of the organism. But even this pragmatic approach is enough for the ego-construction machinery, and the same old bad habits start to stick on it, even though the organism doesn't have any experience of itself that would be compatible with having a persistent 'soul'.
Habits probably don't stick as strongly as they did before seeing the self as an illusion, but I'm still the same old asshole in certain respects. That might have something to do with the fact that I have no particular need to be particularly holy and clean. The ego-construction process is starting to be sufficiently strong that I've even begun doubting how big of a change this has been. I don't have a clear recollection of whether I feel considerably different now than before, anymore.
I still think that the change I experienced was a positive one and I feel like I can enjoy life in a less clinging way. I don't know if I've gained any special talents regarding my outlook of life that couldn't be maintained with simple mindfulness. I however do experience this certain transpersonal flow that makes everything lighter and easier. Something that makes the basic mindfulness effortless. I may also be making this shit up. The sunk cost of having spent lots of time in meditation makes people say funny things about their achievements. There is this insisting feeling that something is different, dammit.
Anyway meditation is great fun and you can get all kinds of extremely pleasurable experiences with it. Don't read that if you're having trouble with desire getting in the way of your meditation. Ooops. Should've put these the other way around. Yeah. I'm a dick. Deal with it. :P
Also, I know him in real life, and he doesn't really come off as behaving all that differently from anybody else.
Then there's also the fact that we seem to be almost incapable of thinking in a way that wouldn't still implicitly assume some concept of personal identity behind it. For example, I've said things like “it's convenient for me to disbelieve in personal identity at times, because then the person now isn't the same one as the person tomorrow, so I don't need to feel nervous about what happens to the person tomorrow”. But here I'm not actually disbelieving in personal identity – after all, I clearly believe that there exists some “is-a” type relation that I can use to compare myself today and myself tomorrow, and which returns a negative. If I truly disbelieved in personal identity, I wouldn't even have such a relation: asking “is the person today the same as the person tomorrow” would just return undefined.
Clue 3: Our decision-making machinery exhibits undefined behavior in the presence of an undefined personal identity.
This seems like an important thing to notice. What would it imply if I really didn't have any concept of personal identity or subjective expectation? If I asked myself whether I'd be the same person tomorrow as I was today, got an undefined back, and tried to give that as input to the systems actually driving my behavior... what would they say I should do?
Well, I guess it would depend on what those systems valued. If I was a paperclipper running on a pure utility-maximizing architecture, I guess they might say “who cares about personal identity anyway? Let's make paperclips!”.
But in fact, I'm a human, which means that a large part of the algorithms that actually drive my behavior are defined by reference to a concept of personal identity. So I'd ask them “I want to play computer games, but really I should study instead; which one do I actually do?”, and they'd reply “well let's see, to answer that we'd need to consider that which you expect to experience in the short term versus that which you expect to experience in the long term... AIEEEEEEE NULL POINTER EXCEPTION” and then the whole system would crash and need to be rebooted.
Except that it wouldn't, because it has been historically rather hard to reboot human brains, so they've evolved to handle problematic contingencies in other ways. So what probably would happen is that the answer would be “umm, we don't know, give us a while to work that out” and then some other system that didn't need a sense of identity to operate would take over. We'd default to some old habit, perhaps. In the meanwhile, the brain would be regenerating a concept of personal identity in order to answer the original question, and things would go back to normal. And as far as I can tell, that's actually roughly what seems to happen.
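To make this a bit more concrete, here's a toy Python sketch of the kind of fallback I'm gesturing at: a decision procedure that scores options by anticipated experience, and, when the self-model it needs is missing, falls back on habit while a new self-model gets regenerated. All the names and scores are made up for illustration; this is not a claim about actual brain architecture.

```python
from typing import Optional

class ToyAgent:
    """Illustrative only: decisions route through a self-model, with a habit fallback."""

    def __init__(self, self_model: Optional[str]):
        self.self_model = self_model            # who the anticipated experiences belong to
        self.default_habit = "keep doing whatever you were doing"

    def anticipated_experience(self, option: str) -> float:
        # "What do *I* expect to experience if I do this?" -- undefined without a self.
        if self.self_model is None:
            raise ValueError("anticipated experience is undefined without a self-model")
        return {"play games": 7.0, "study": 5.0}.get(option, 0.0)   # arbitrary scores

    def decide(self, options: list) -> str:
        try:
            return max(options, key=self.anticipated_experience)
        except ValueError:
            # No crash and reboot: fall back on habit while a self-model is rebuilt.
            self.self_model = "me (regenerated)"
            return self.default_habit

agent = ToyAgent(self_model=None)                # identity temporarily dissolved
print(agent.decide(["play games", "study"]))     # -> falls back on the old habit
print(agent.decide(["play games", "study"]))     # -> self-model is back, picks "play games"
```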
Recall Eliezer's objection from the anthropic trilemma discussion:

It seems to me that there's some level on which, even if I say very firmly, "I now resolve to care only about future versions of myself who win the lottery! Only those people are defined as Eliezer Yudkowskys!", and plan only for futures where I win the lottery, then, come the next day, I wake up, look at the losing numbers, and say, "Damnit! What went wrong? I thought personal continuity was strictly subjective, and I could redefine it however I wanted!"
One possible answer could be that even if Eliezer did succeed in reprogramming his mind to think in such a weird, unnatural way, that would leave the losing copies with an undefined sense of self. After seeing that they lost, they wouldn't just think “oh, our goal system has undefined terms now, and we're not supposed to care about anything that happens to us from this point on, so we'll just go ahead and crash”. Instead, they'd think “oh, our goal system looks broken, what's the easiest way of fixing that? Let's go back to the last version that we know to have worked”. And because a lot of that would be unconscious, the thoughts that would flash through the conscious mind might just be something like “damnit, that didn't work” - or perhaps, “oh, I'm not supposed to care about myself anymore, so now what? Umm, actually, even without morality I still care about things.”
But that still doesn't seem to answer all of our questions. I mentioned that actually ever alieving this in the first place, even before the copying, would be a “weird, unnatural thing”. I expect that it would be very hard for Eliezer to declare that he was only going to care about the copies that won the lottery, and then really only care about them. In fact, it might very well be impossible. Why is that?
Clue 4: Our decision-making machinery seems grounded in subjective expectation, not abstract models of the world.
Looking at things from a purely logical point of view, there shouldn't be anything particularly difficult about redefining our wants in such a way. Maybe there's a function somewhere inside us that says “I care about my own future”, which has a pointer to whatever function it is that computes “me”. In principle, if we had full understanding of our minds and read-write access to them, we could just change the pointer to reference the part of our world-model which is about the copies that have witnessed winning the lottery. That system might crash at the point when it found out that it wasn't actually one of those copies, but until then everything should go fine, in principle.
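Here is the 'just change the pointer' intuition as a toy sketch, with an entirely made-up world-model and caring function, to show how trivial the hack would be if minds really did work this way:

```python
# Entirely made-up world-model: what each candidate "me" ends up experiencing.
world_model = {
    "the current me": {"wins the lottery": 0.0000001},
    "the copies that saw a winning ticket": {"wins the lottery": 1.0},
}

# The hypothetical "I care about my own future" function, parameterized by
# whichever entry of the world-model currently counts as "me".
def care_about(self_pointer: str) -> float:
    return world_model[self_pointer]["wins the lottery"]

self_pointer = "the current me"
print(care_about(self_pointer))       # cares about the ordinary future self

# With full read-write access, the mind-hack would in principle be this easy:
self_pointer = "the copies that saw a winning ticket"
print(care_about(self_pointer))       # now "cares" only about the winners
```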
Now we don't have full read-write access to our minds, but internalizing declarative knowledge can still cause some pretty big changes in our value systems. The lack of access doesn't seem like the big problem here. The big problem is that whenever we try to mind-hack ourselves like that, our mind complains that it still doesn't expect to only see winning the lottery. It's as if our mind didn't run on the kind of architecture that would allow the change I just described: even if we did have full read-write access, making such a change would require a major rewrite, not just fiddling around with a couple of pointers.
Why is subjective expectation so important? Why can't we just base our decisions on our abstract world-model? Why does our mind insist that it's subjective expectation that counts, not the things that we value based on our abstract model?
Let's look at the difference between “subjective expectation” and “abstract world-model” a bit more. In 2011, Orseau & Ring published a paper arguing that many kinds of reinforcement learning agents would, if given the opportunity, use a “delusion box” which allowed them to modify the observations they got from the environment. This way, they would always receive the kinds of signals that gave them the maximum reward. You could say, in a sense, that those kinds of agents only care about their subjective expectation – as long as they experience what they want, they don't care about the rest of the world. And it's important for them that they are the ones who get those experiences, because their utility function only cares about their own reward.
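As a toy sketch of the idea (with made-up action names and reward numbers, not anything taken from the actual paper): an agent that scores actions purely by the reward it will observe ends up preferring the box.

```python
# Made-up numbers and action names; not the formalism of the Orseau & Ring paper.
def observed_reward(action: str, true_state_reward: float) -> float:
    if action == "use delusion box":
        return 1.0                    # the box always feeds back maximal reward signals
    return true_state_reward          # honest perception of a mediocre world

true_state_reward = 0.3
actions = ["act in the world", "use delusion box"]
print(max(actions, key=lambda a: observed_reward(a, true_state_reward)))
# -> "use delusion box"
```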
In response, Bill Hibbard published a paper in which he suggested that the problem could be solved by building AIs to have “model-based utility functions”, a concept which he defined by reference to human behavior:
Human agents often avoid self-delusion so human motivation may suggest a way of computing utilities so that agents do not choose the delusion box. We humans (and presumably other animals) compute utilities by constructing a model of our environment based on interactions, and then computing utilities based on that model. We learn to recognize objects that persist over time. We learn to recognize similarities between different objects and to divide them into classes. We learn to recognize actions of objects and interactions between objects. And we learn to recognize fixed and mutable properties of objects. We maintain a model of objects in the environment even when we are not directly observing them. We compute utility based on our internal mental model rather than directly from our observations. Our utility computation is based on specific objects that we recognize in the environment such as our own body, our mother, other family members, other friendly and unfriendly humans, animals, food, and so on. And we learn to correct for sources of delusion in our observations, such as optical illusions, impairments to our perception due to illness, and lies and errors by other humans.
So instead of just caring about our subjective experience, we use our subjective experiences to construct a model of the world. We don't want to delude ourselves, because we also care about the world around us, and our world model tells us that deluding ourselves wouldn't actually change the world.
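By way of contrast, here is a similarly toy sketch of a Hibbard-style model-based evaluation, using the same made-up numbers as before: utility is computed over what the world-model says an action does to the world, so faking the observations buys the agent nothing.

```python
# Same made-up numbers; utility is now computed over the modeled state of the world.
def model_based_utility(action: str, true_state_reward: float) -> float:
    if action == "use delusion box":
        return true_state_reward        # the world itself is unchanged by the box
    if action == "improve the world":
        return true_state_reward + 0.5  # assumed effect, purely for illustration
    return true_state_reward

true_state_reward = 0.3
actions = ["use delusion box", "improve the world"]
print(max(actions, key=lambda a: model_based_utility(a, true_state_reward)))
# -> "improve the world"
```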
But as we have just seen, there are many situations in which we actually do care about subjective expectation and not just items in our abstract world-model. It even seems impossible to hack our brains to only care about things which have been defined in the world-model, and to ignore subjective expectation. I can't say “well I'm going to feel low-status for the rest of my life if I just work from home, but that's just my mistaken subjective experience, in reality there are lots of people on The Internets who think I'm cool and consider me high-status”. Which is true, but also kinda irrelevant if I don't also feel respected in real life.
Which really just suggests that humans are somewhere halfway between an entity that only cares about its subjective experience and one that only cares about its world-model. Luke has pointed out that there are several competing valuation systems in the brain, some of which use abstract world models and some of which do not. But that isn't necessarily relevant, given that our subjective expectation of what's going to happen is itself a model.
A better explanation might be that historically, accurately modeling our subjective expectation has been really important. Abstract world-models based on explicit logical reasoning go awry really easily and lead us to all kinds of crazy conclusions; it might only take a single mistaken assumption. If we made all of our decisions based on them, we'd probably end up dead. So our brain has been hardwired to add it all up to normality. It's fine to juggle around all kinds of crazy theories while in far mode, but for evolution, what really matters in the end is whether you'll personally expect to experience living on until you have a good mate and lots of surviving children.
So we come with brains where all the most powerful motivational systems that really drive our behavior have been hardwired to take their inputs from the system that models our future experiences, and those systems require some concept of personal identity in order to define what “subjective experience” even means.
Summing it up
Thus, these considerations suggest that humans have at least two systems driving their behavior. The “subjective system” evolved from something like a basic reinforcement learning architecture: it models subjective expectation and the organism's immediate rewards, and isn't too strongly swayed by abstract theories and claims. The “objective system” is a lot more general and abstract, and evolved to correct for deficiencies in the subjective system, but doesn't influence behavior as strongly. These two systems may or may not have a clear correspondence to the near/far modes of construal level theory, or to the three systems identified in neuroscience.
The “subjective system” requires a concept of personal identity in order to work, and since being able to easily overrule that system and switch only to the “objective system” has – evolutionarily speaking – been a really bad idea, our brain will regenerate a sense of personal identity to guide behavior whenever that sense gets lost. If we really had no sense of personal identity, the “subjective system”, which actually drives most of our behavior, would be incapable of making decisions, as it makes its decisions by projecting the anticipated experience of the creature defined in our model of personal identity. “Personal identity” does not actually correspond to anything fundamental in the world, which is why some of the results of the anthropic trilemma feel weird to us, but it does still exist as a cognitive abstraction which our brains need in order to operate, and we can't actually not believe in some kind of personal identity – at least, not for long.
ETA: Giles commented, and summarized my notion better than I did: "I can imagine that if you design an agent by starting off with a reinforcement learner, and then bolting some model-based planning stuff on the side, then the model will necessarily need to tag one of its objects as "self". Otherwise the reinforcement part would have trouble telling the model-based part what it's supposed to be optimizing for."
Another way of summarizing this: while we could in principle have a mental architecture that didn't have a personal identity, we actually evolved from animals which didn't have the capability for abstract reasoning but were rather running on something like a simple reinforcement learning architecture. Evolution cannot completely rewrite existing systems, so our abstract reasoning system got sort of hacked together on top of that earlier system, and that earlier system required some kind of a personal identity in order to work. And because the abstract reasoning system might end up reaching all kinds of bizarre and incorrect results pretty easily, we've generally evolved in a way that keeps that earlier system basically in charge most of the time, because it's less likely to do something stupid.
It seems to me like such a self-tag would be needed even if there were only the model-based part: if the system has actuators, then these need to be associated with some actuators in the 3rd-person model; if the system has sensors, then these need to be associated with sensors in the 3rd-person model. Once you know every physical fact about the universe, you still need to know "which bit is you" on top of that, if you are an agent.
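To tie the pieces together, one last toy sketch of that kind of architecture: a model-based planner bolted onto a reinforcement-learning core, which can only answer "what should I do?" once some object in its model has been tagged as "self". As before, every component is a hypothetical simplification rather than a model of real brains.

```python
class WorldModel:
    """Third-person model of objects; a hypothetical simplification."""
    def __init__(self):
        self.objects = {"vase": {}, "other people": {}, "this organism": {}}
        self.self_tag = "this organism"     # which object the rewards belong to

    def predicted_reward(self, plan: str) -> float:
        # The planner asks: "what does the *tagged* object end up experiencing?"
        if self.self_tag not in self.objects:
            raise KeyError("no object tagged as 'self'; nothing to optimize for")
        return {"study": 0.8, "play games": 0.5}.get(plan, 0.0)   # arbitrary scores

class SubjectiveSystem:
    """The reinforcement-learning core: it supplies the thing to be optimized."""
    def choose(self, model: WorldModel, plans: list) -> str:
        return max(plans, key=model.predicted_reward)

model = WorldModel()
print(SubjectiveSystem().choose(model, ["study", "play games"]))   # -> "study"
```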