I attempt to figure out a way to dissolve the concepts of 'personal identity' and 'subjective expectation' down to the level of cognitive algorithms, in a way that would let one bite the bullets of the anthropic trilemma. I proceed by considering four clues which seem important: 1) the evolutionary function of personal identity, 2) a sense of personal identity being really sticky, 3) an undefined personal identity causing undefined behavior in our decision-making machinery, and 4) our decision-making machinery being more strongly grounded in our subjective expectation than in abstract models. Taken together, these seem to suggest a solution.

I ended up re-reading some of the debates about the anthropic trilemma, and it struck me as odd that, aside from a few references to personal identity being an evolutionary adaptation, there seemed to be no attempt to reduce the concept to the level of cognitive algorithms. Several commenters thought that there wasn't really any problem, and Eliezer asked them to explain why, if there really wasn't any problem, the scenario nonetheless seemed to violate the intuitive rules of subjective expectation. That seemed like a very strong indication that the question needs to be dissolved, but almost none of the attempted answers seemed to do that, instead trying to solve the question via decision theory without ever addressing the core issue of subjective expectation. rwallace's I-less Eye argued - I believe correctly - that subjective anticipation isn't ontologically fundamental, but still didn't address the question of why it feels like it is.

Here's a sketch of a dissolution. It seems relatively convincing to me, but I'm not sure how others will take it, so let's give it a shot. Even if others find it incomplete, it should at least help provide clues that point towards a better dissolution.

Clue 1: The evolutionary function of personal identity.

Let's first consider the evolutionary function. Why have we evolved a sense of personal identity?

The first answer that always comes to everyone's mind is that our brains have evolved for the task of spreading our genes, which involves surviving at least for as long as it takes to reproduce. Simpler neural functions, like maintaining a pulse and having reflexes, obviously do fine without a concept of personal identity. But if we wish to use abstract, explicit reasoning to advance our own interests, we need some definition for exactly whose interests it is that our reasoning process is supposed to be optimizing. So evolution comes up with a fuzzy sense of personal identity, so that optimizing the interests of this identity also happens to optimize the interests of the organism in question.

That's simple enough, and this point was already made in the discussions so far. But that doesn't feel like it would resolve our confusion yet, so we need to look at the way that personal identity is actually implemented in our brains. What is the cognitive function of personal identity?

Clue 2: A sense of personal identity is really sticky.

Even people who disbelieve in personal identity don't really seem to disalieve it: for the most part, they're just as likely to be nervous about their future as anyone else. Even advanced meditators who go out trying to dissolve their personal identity seem to still retain some form of it. PyryP claims that at one point, he reached a stage in meditation where the experience of “somebody who experiences things” shattered and he could turn it entirely off, or attach it to something entirely different, such as a nearby flower vase. But then the experience of having a self began to come back: it was as if the brain was hardwired to maintain one, and to reconstruct it whenever it was broken. I asked him to comment on that for this post, and he provided the following:

It seems like my consciousness is rebuilding a new ego on top of everything, one which is not directly based on feeling as one with a physical body and memories, but which still feels like it is the thing that experiences whatever happens.

To elaborate, many things in life affect the survival and success of an organism. Even though the organism would never experience itself as being separate from the surrounding universe, in ordinary life it's still useful to have concepts relating to the property and values of the organism. But even this pragmatic approach is enough for the ego-construction machinery, and the same old bad habits start to stick on it, even though the organism doesn't have any experience of itself that would be compatible with having a persistent 'soul'.

Habits probably don't stick as strongly as they did before seeing the self as an illusion, but I'm still the same old asshole in certain respects. That might have something to do with the fact that I have no particular need to be particularly holy and clean. The ego-construction process is starting to be sufficiently strong that I've even begun doubting how big of a change this has been. I don't have a clear recollection anymore of whether I feel considerably different now than before.

I still think that the change I experienced was a positive one and I feel like I can enjoy life in a less clinging way. I don't know if I've gained any special talents regarding my outlook on life that couldn't be maintained with simple mindfulness. I however do experience this certain transpersonal flow that makes everything lighter and easier. Something that makes the basic mindfulness effortless. I may also be making this shit up. The sunk cost of having spent lots of time in meditation makes people say funny things about their achievements. There is this insisting feeling that something is different, dammit.

Anyway meditation is great fun and you can get all kinds of extremely pleasurable experiences with it. Don't read that if you're having trouble with desire getting in the way of your meditation. Ooops. Should've put these the other way around. Yeah. I'm a dick. Deal with it. :P

Also, I know him in real life, and he doesn't really come off as behaving all that differently from anybody else.

Then there's also the fact that we seem to be almost incapable of thinking in a way that wouldn't still implicitly assume some concept of personal identity behind it. For example, I've said things like “it's convenient for me to disbelieve in personal identity at times, because then the person now isn't the same one as the person tomorrow, so I don't need to feel nervous about what happens to the person tomorrow”. But here I'm not actually disbelieving in personal identity – after all, I clearly believe that there exists some “is-a” type relation that I can use to compare myself today and myself tomorrow, and which returns a negative. If I truly disbelieved in personal identity, I wouldn't even have such a relation: asking “is the person today the same as the person tomorrow” would just return undefined.

Clue 3: Our decision-making machinery exhibits undefined behavior in the presence of an undefined personal identity.

This seems like an important thing to notice. What would it imply if I really didn't have any concept of personal identity or subjective expectation? If I asked myself whether I'd be the same person tomorrow as I was today, got an undefined back, and tried to give that as input to the systems actually driving my behavior... what would they say I should do?

Well, I guess it would depend on what those systems valued. If I was a paperclipper running on a pure utility-maximizing architecture, I guess they might say “who cares about personal identity anyway? Let's make paperclips!”.

But in fact, I'm a human, which means that a large part of the algorithms that actually drive my behavior are defined by reference to a concept of personal identity. So I'd ask them “I want to play computer games but in reality I should really study instead, which one do I actually do?”, and they'd reply “well let's see, to answer that we'd need to consider that which you expect to experience in the short term versus that which you expect to experience in the long term... AIEEEEEEE NULL POINTER EXCEPTION” and then the whole system would crash and need to be rebooted.

Except that it wouldn't, because it has been historically rather hard to reboot human brains, so they've evolved to handle problematic contingencies in other ways. So what probably would happen is that the answer would be “umm, we don't know, give us a while to work that out” and then some other system that didn't need a sense of identity to operate would take over. We'd default to some old habit, perhaps. In the meanwhile, the brain would be regenerating a concept of personal identity in order to answer the original question and things would go back to normal. And as far as I can tell, that's actually roughly what seems to happen.
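To make that failure mode concrete, here's a minimal toy sketch (all names hypothetical, not a claim about how brains are actually organized): a decision procedure that consults a self-model to compute anticipated experience, and falls back on habit when that model is undefined.

```python
# Toy sketch of the "undefined identity" failure mode described above.
# All names here are hypothetical illustrations, not real cognitive architecture.

class SelfModel:
    def simulate(self, option, horizon):
        # Stub projection of "what I would experience if I did this".
        scores = {"study": 10, "play games": 3}
        return scores[option] if horizon == "long" else 0

class Habits:
    def default(self, options):
        # Identity-free fallback: just do the most habitual thing.
        return options[0]

def anticipated_experience(self_model, option, horizon):
    if self_model is None:
        # The analogue of the null pointer: no referent for "I".
        raise ValueError("undefined self-model")
    return self_model.simulate(option, horizon)

def choose(options, self_model, habits):
    try:
        # Identity-dependent system: pick whatever maximizes anticipated
        # long-term experience for "me".
        return max(options, key=lambda o: anticipated_experience(self_model, o, "long"))
    except ValueError:
        # Instead of crashing, hand control to a system that needs no self-model,
        # while the self-model gets rebuilt in the background.
        return habits.default(options)

print(choose(["play games", "study"], SelfModel(), Habits()))  # -> study
print(choose(["play games", "study"], None, Habits()))         # -> play games (habit)
```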

Eliezer asked:

It seems to me that there's some level on which, even if I say very firmly, "I now resolve to care only about future versions of myself who win the lottery! Only those people are defined as Eliezer Yudkowskys!", and plan only for futures where I win the lottery, then, come the next day, I wake up, look at the losing numbers, and say, "Damnit! What went wrong? I thought personal continuity was strictly subjective, and I could redefine it however I wanted!"

One possible answer could be that even if Eliezer did succeed in reprogramming his mind to think in such a weird, unnatural way, that would leave the losing copies with an undefined sense of self. After seeing that they lost, they wouldn't just think “oh, our goal system has undefined terms now, and we're not supposed to care about anything that happens to us from this point on, so we'll just go ahead and crash”. Instead, they'd think “oh, our goal system looks broken, what's the easiest way of fixing that? Let's go back to the last version that we know to have worked”. And because a lot of that would be unconscious, the thoughts that would flash through the conscious mind might just be something like “damnit, that didn't work” - or perhaps, “oh, I'm not supposed to care about myself anymore, so now what? Umm, actually, even without morality I still care about things.”

But that still doesn't seem to answer all of our questions. I mentioned that actually ever alieving this in the first place, even before the copying, would be a “weird, unnatural thing”. I expect that it would be very hard for Eliezer to declare that he was only going to care about the copies that won the lottery, and then really only care about them. In fact, it might very well be impossible. Why is that?

Clue 4: Our decision-making machinery seems grounded in subjective expectation, not abstract models of the world.

Looking at things from a purely logical point of view, there shouldn't be anything particularly difficult about redefining our wants in such a way. Maybe there's a function somewhere inside us that says “I care about my own future”, which has a pointer to whatever function it is that computes “me”. In principle, if we had full understanding of our minds and read-write access to them, we could just change the pointer to reference the part of our world-model which was about the copies which had witnessed winning the lottery. That system might crash at the point when it found out that it wasn't actually one of those copies, but until then everything should go fine, in principle.
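If our minds actually were organized that way, the hack might look as trivial as this hypothetical sketch (names and structure entirely made up for illustration):

```python
# A minimal sketch of the hypothetical "just change the pointer" edit, assuming
# (contrary to how our minds actually seem to work) full read-write access and
# a mind organized around such a pointer.

class Mind:
    def __init__(self, world_model):
        self.world_model = world_model
        # "I care about my own future" holds a reference to whatever computes "me".
        self.me = world_model["this organism"]

    def cared_about_future(self):
        return self.me

mind = Mind({"this organism": "the person standing here now",
             "winning copies": "the copies that saw the winning numbers"})

# With read-write access, redefining identity would just be re-pointing:
mind.me = mind.world_model["winning copies"]
print(mind.cared_about_future())  # now refers to the lottery-winning copies
```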

Now we don't have full read-write access to our minds, but internalizing declarative knowledge can still cause some pretty big changes in our value systems. The lack of access doesn't seem like the big problem here. The big problem is that whenever we try to mind-hack ourselves like that, our mind complains that it still doesn't expect to only see winning the lottery. It's as if our mind didn't run on the kind of architecture that would allow the kind of change I just described: even if we did have full read-write access, making such a change would require a major rewrite, not just fiddling around with a couple of pointers.

Why is subjective expectation so important? Why can't we just base our decisions on our abstract world-model? Why does our mind insist that it's subjective expectation that counts, not the things that we value based on our abstract model?

Let's look at the difference between “subjective expectation” and “abstract world-model” a bit more. In 2011, Orseau & Ring published a paper arguing that many kinds of reinforcement learning agents would, if given the opportunity, use a “delusion box” which allowed them to modify the observations they got from the environment. This way, they would always receive the kinds of signals that gave them the maximum reward. You could say, in a sense, that those kinds of agents only care about their subjective expectation – as long as they experience what they want, they don't care about the rest of the world. And it's important for them that they are the ones who get those experiences, because their utility function only cares about their own reward.

In response, Bill Hibbard published a paper suggesting that the problem could be solved by building AIs to have “model-based utility functions”, a concept which he defined via human behavior:

Human agents often avoid self-delusion so human motivation may suggest a way of computing utilities so that agents do not choose the delusion box. We humans (and presumably other animals) compute utilities by constructing a model of our environment based on interactions, and then computing utilities based on that model. We learn to recognize objects that persist over time. We learn to recognize similarities between different objects and to divide them into classes. We learn to recognize actions of objects and interactions between objects. And we learn to recognize fixed and mutable properties of objects. We maintain a model of objects in the environment even when we are not directly observing them. We compute utility based on our internal mental model rather than directly from our observations. Our utility computation is based on specific objects that we recognize in the environment such as our own body, our mother, other family members, other friendly and unfriendly humans, animals, food, and so on. And we learn to correct for sources of delusion in our observations, such as optical illusions, impairments to our perception due to illness, and lies and errors by other humans.

So instead of just caring about our subjective experience, we use our subjective experiences to construct a model of the world. We don't want to delude ourselves, because we also care about the world around us, and our world model tells us that deluding ourselves wouldn't actually change the world.
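A toy sketch of that contrast, in my own simplified terms rather than either paper's formalism:

```python
# Toy contrast between an agent that scores its raw observations and one that
# scores its world-model. My own simplification, purely for illustration.

def observation_utility(observation):
    # Delusion-box-vulnerable: cares only about the experienced reward signal.
    return observation["reward_signal"]

def model_based_utility(world_model):
    # Model-based (in the spirit of Hibbard's suggestion): cares about
    # inferred facts about the world itself.
    return 10.0 if world_model["paperclips_actually_made"] else 0.0

# The "delusion box" forges a maximally rewarding observation stream
# without changing anything in the world.
deluded_observation = {"reward_signal": 999.0}
honest_world_model = {"paperclips_actually_made": False}

print(observation_utility(deluded_observation))  # 999.0 -> delusion looks great
print(model_based_utility(honest_world_model))   # 0.0   -> delusion gains nothing
```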

But as we have just seen, there are many situations in which we actually do care about subjective expectation and not just items in our abstract world-model. It even seems impossible to hack our brains to only care about things which have been defined in the world-model, and to ignore subjective expectation. I can't say “well I'm going to feel low-status for the rest of my life if I just work from home, but that's just my mistaken subjective experience, in reality there are lots of people on The Internets who think I'm cool and consider me high-status”. Which is true, but also kinda irrelevant if I don't also feel respected in real life.

Which really just suggests that humans are somewhere halfway between an entity that only cares about its subjective experience and one that only cares about its world-model. Luke has pointed out that there are several competing valuation systems in the brain, some of which use abstract world models and some of which do not. But that isn't necessarily relevant, given that our subjective expectation of what's going to happen is itself a model.

A better explanation might be that historically, accurately modeling our subjective expectation has been really important. Abstract world-models based on explicit logical reasoning tend to go awry really easily and lead us to all kinds of crazy conclusions; it might only take a single mistaken assumption. If we made all of our decisions based on that, we'd probably end up dead. So our brain has been hardwired to add it all up to normality. It's fine to juggle around all kinds of crazy theories while in far mode, but for evolution, what really matters in the end is whether you'll personally expect to experience living on until you have a good mate and lots of surviving children.

So we come with brains where all the most powerful motivational systems that really drive our behavior have been hardwired to take their inputs from the system that models our future experiences, and those systems require some concept of personal identity in order to define what “subjective experience” even means.

Summing it up

Thus, these considerations would suggest that humans have at least two systems driving our behavior. The “subjective system” evolved from something like a basic reinforcement learning architecture; it models subjective expectation and this organism's immediate rewards, and isn't too strongly swayed by abstract theories and claims. The “objective system” is a lot more general and abstract, and evolved to correct for deficiencies in the subjective system, but doesn't influence behavior as strongly. These two systems may or may not have a clear correspondence to the near/far modes of construal level theory, or to the three systems identified in neuroscience.

The “subjective system” requires a concept of personal identity in order to work, and since being able to easily overrule that system and switch only to the “objective system” has – evolutionarily speaking - been a really bad idea, our brain will regenerate a sense of personal identity to guide behavior whenever that sense gets lost. If we really had no sense of personal identity, the “subjective system”, which actually drives most of our behavior, would be incapable of making decisions, as it makes its decisions by projecting the anticipated experience of the creature defined in our model of personal identity. “Personal identity” does not actually correspond to anything fundamental in the world, which is why some of the results of the anthropic trilemma actually feel weird to us, but it does still exist as a cognitive abstraction which our brains need in order to operate, and we can't actually not believe in some kind of personal identity – at least, not for long.

ETA: Giles commented, and summarized my notion better than I did: "I can imagine that if you design an agent by starting off with a reinforcement learner, and then bolting some model-based planning stuff on the side, then the model will necessarily need to tag one of its objects as "self". Otherwise the reinforcement part would have trouble telling the model-based part what it's supposed to be optimizing for."

Another way of summarizing this: while we could in principle have a mental architecture that didn't have a personal identity, we actually evolved from animals which didn't have the capability for abstract reasoning but were rather running on something like a simple reinforcement learning architecture. Evolution cannot completely rewrite existing systems, so our abstract reasoning system got sort of hacked together on top of that earlier system, and that earlier system required some kind of a personal identity in order to work. And because the abstract reasoning system might end up reaching all kinds of bizarre and incorrect results pretty easily, we've generally evolved in a way that keeps that earlier system basically in charge most of the time, because it's less likely to do something stupid.
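As a rough illustration of that architecture (class names, tags and numbers are mine, purely for illustration): an old reinforcement-style core that evaluates anticipated rewards, plus a newer model-based planner whose world-model must tag one object as "self" so the core knows whose anticipated experience it is optimizing.

```python
# Illustrative sketch of the "hacked together" architecture described above.

class SubjectiveSystem:
    """Old RL-like core: evaluates plans by this organism's anticipated rewards."""
    def evaluate(self, anticipated_rewards_for_self):
        return sum(anticipated_rewards_for_self)

class ObjectiveSystem:
    """Newer model-based layer: simulates the world in the abstract."""
    def __init__(self, self_tag):
        self.self_tag = self_tag  # which object in the world-model counts as "me"

    def predict_rewards(self, plan):
        # Simulate the plan and report the rewards accruing to the tagged object.
        return [step.get(self.self_tag, 0.0) for step in plan]

core = SubjectiveSystem()
planner = ObjectiveSystem(self_tag="organism_42")

plan = [{"organism_42": 1.0, "flower_vase": 0.0},
        {"organism_42": 2.0}]

# Without the self-tag, the core would have no way to read these predictions.
print(core.evaluate(planner.predict_rewards(plan)))  # -> 3.0
```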

Comments

I can imagine that if you design an agent by starting off with a reinforcement learner, and then bolting some model-based planning stuff on the side, then the model will necessarily need to tag one of its objects as "self". Otherwise the reinforcement part would have trouble telling the model-based part what it's supposed to be optimizing for.

Thanks, that's what I was trying to say.

All the content in the post just fell into place after I read Giles' summary. Still a great post, though.

It seems to me like this would be needed even if there was only the model-based part: if the system has actuators, then these need to be associated with some actuators in the 3rd-person model; if the system has sensors, then these need to be associated with sensors in the 3rd-person model. Once you know every physical fact about the universe, you still need to know "which bit is you" on top of that, if you are an agent.

Self enters into the equation via the epistemic dynamics: which regularities are intrinsic to the model, and which are "intrinsic" to the frame of reference in which the input is provided.

Does the reductionist view of personal identity affect how we should ethically evaluate death? I mean even if we, obviously, can't shake it off for making day-to-day decisions. For instance, if continuing to exist is like bringing new conscious selves into existence (by omission of not killing oneself), and if we consider continued existence ethically valuable, wouldn't this imply classical total utilitarianism, the view that we try to fill the universe with happy moments? To me it seems like it undermines "prior-existence" views. Also, the idea of "living as long as possible" appears odd under this view, like an arbitrary grouping of certain future conscious moments one just happens to care about (for evolutionary reasons having nothing to do with "making the world a better place"). Finally, in the comments someone remarked that he still has an aversion to creating repetitive conscious moments, but wouldn't the reductionist view on personal identity also undermine that? For *whom* would repetition be a problem? I'm not a classical utilitarian by the way, just playing devil's advocate.

I actually don't think any of those things are problematic. A reductionist view of personal identity mainly feels like an ontology shift - you need to redefine all the terms in your utility function (or other decision-making system), but most outcomes will actually be the same (with the advantage that some decisions that were previously confusing should now be clear). Specifically:

Does the reductionist view of personal identity affect how we should ethically evaluate death?

I don't think so! You can redefine death as a particular (optionally animal-shaped) optimization process ceasing operation, which is not reliant on personal identity. (Throw in a more explicit reference to lack of continuity if you care about physical continuity.) The only side-effect of the reductionist view, I feel, is that it makes our preferences feel more arbitrary, but I think that's something you have to accept either way in the end.

For instance, if continuing to exist is like bringing new conscious selves into existence (by omission of not killing oneself), and if we consider continued existence ethically valuable, wouldn't this imply classical total utilitarianism, the view that we try to fill the universe with happy moments?

Not really. You can focus your utility function on one particular optimization process and its potential future execution, which may be appropriate given that the utility function defines the preference over outcomes of that optimization process.

Also, the idea of "living as long as possible" appears odd under this view, like an arbitrary grouping of certain future conscious moments one just happens to care about (for evolutionary reasons having nothing to do with "making the world a better place").

This is true enough. If you have strong preferences for the world outside of yourself (general "you"), you can argue that continuing the operation of the optimization process with these preferences increases the probability of the world more closely matching these preferences. If you care mostly about yourself, you have to bite the bullet and admit that that's very arbitrary. But since preferences are generally arbitrary, I don't see this as a problem.

Finally, in the comments someone remarked that he still has an aversion to creating repetitive conscious moments, but wouldn't the reductionist view on personal identity also undermine that? For *whom* would repetition be a problem?

This basically comes down to the fact that just because you believe that there's no continuity of personal identity, you don't have to go catatonic (or epileptic). You can still have preferences over what to do, because why not? The optimization process that is your body and brain continues to obey the laws of physics and optimize, even though the concept of "personal identity" doesn't mean much. (I'm really having a lot of trouble writing the preceding sentence in a clear and persuasive way, although I don't think that means it's incorrect.)

And in case someone thinks that I over-rely on the term "optimization process" and the comment would collapse if it's tabooed, I'm pretty sure that's not the case! The notion should be emergent as a pattern that allows more efficient modelling of the world (e.g. it's easier to consider a human's actions than the interaction of all particles that make up a human), and the comment should be robust to a reformulation along these lines.

I strongly second this comment. I have been utterly horrified the few times in my life when I have come across arguments along the lines of "personal identity isn't a coherent concept, so there's no reason to care about individual people." You are absolutely right that it is easy to steel-man the concept of personal identity so that it is perfectly coherent, and that rejecting personal identity is not a valid argument for total utilitarianism (or any ethical system, really).

In my opinion the OP is a good piece of scientific analysis. But I don't believe it has any major moral implications, except maybe "don't angst about the Ship of Theseus problem." The concept of personal identity (after it has been sufficiently steel-manned) is one of the wonderful gifts we give to tomorrow, and any ethical system that rejects it has lost its way.

Not really. You can focus your utility function on one particular optimization process and its potential future execution, which may be appropriate given that the utility function defines the preference over outcomes of that optimization process.

Well, you could focus your utility function on anything you like anyway; the question is why, under utilitarianism, it would be justified to value this particular optimization process. If personal identity was fundamental, then you'd have no choice, conscious existence would be tied to some particular identity. But if it's not fundamental, then why prefer this particular grouping of conscious-experience-moments, rather than any other? If I have the choice, I might as well choose some other set of these moments, because as you said, "why not"?

I wrote an answer, but upon rereading, I'm not sure it's answering your particular doubts. It might though, so here:

Well, if we're talking about utilitarianism specifically, there are two sides to the answer. First, you favour the optimization-that-is-you more than others because you know for sure that it implements utilitarianism and others don't (thus having it around longer makes utilitarianism more likely to come to fruition). Basically the reason why Harry decides not to sacrifice himself in HPMoR. And second, you're right, there may well be a point where you should just sacrifice yourself for the greater good if you're a utilitarian, although that doesn't really have much to do with dissolution of personal identity.

But I think a better answer might be that:

If I have the choice, I might as well choose some other set of these moments, because as you said, "why not"?

You do not, in fact, have the choice. Or maybe you do, but it's not meaningfully different from deciding to care about some other person (or group of people) to the exclusion of yourself if you believe in personal identity, and there is no additional motivation for doing so. If you mean something similar to Eliezer writing "how do I know I won't be Britney Spears five seconds from now" in the original post, that question actually relies on a concept of personal identity and is undefined without it. There's not really a classical "you" that's "you" right now, and five seconds from now there will still be no "you" (although obviously there's still a bunch of molecules following some patterns, and we can assume they'll keep following similar patterns in five seconds, there's just no sense in which they could become Britney).

Or maybe you do, but it's not meaningfully different from deciding to care about some other person (or group of people) to the exclusion of yourself if you believe in personal identity

I think the point is actually similar to this discussion, which also somewhat confuses me.

Well, for what it is worth I'm not extremely concerned about dying, and I was much more afraid of dying before I figured out that subjective expectation doesn't make sense.

My present decisions are made by consulting my utility function about what sort of future I would wish to see occur. That optimal future need not necessarily contain a being like myself, even after taking into account the particularly deep and special affection I have for future me.

Don't get me wrong here - death as arbitrarily set by our biology is bad and I wish it wouldn't happen to me. But that doesn't mean that preserving my consciousness for an arbitrarily long time is the optimum good. There may well come a time when my consciousness is outdated, or perhaps just made redundant. Following the same thought process that keeps me from making 100 copies of myself for no good reason, I wouldn't want to live forever for no good reason.

Finally, in the comments someone remarked that he still has an aversion to creating repetitive conscious moments

I'm the one who mentioned having an aversion to creating redundant consciousnesses, by the way. An interesting universe is one of my many terminal values, and diversity keeps things interesting. Repetition is a problem for me because it saps resources away from uniqueness and is therefore a sub-optimal state. The first hundred or so duplicates would be pretty fascinating (think of the science! Best control group ever) but if you get too many copies running around things get too homogeneous and my terminal value for an interesting universe will start to complain. There is a diminishing return on duplicates - the extent to which they can make unique contributions declines as a function of the number of copies.

Got infinite resources? Sure, go crazy - create infinite copies of yourself that live forever if you want. As a matter of fact, why not just go ahead and create every possible non-morally aberrant thing you can imagine! But I'm not sure that infinite resources can happen in our universe. Or at least, I was assuming significant resource constraints when I said that I have an aversion to unnecessary duplication.

The same thought process applies to not necessarily living forever. It's not interesting to have the same individuals to continue indefinitely - it's more diverse and interesting to have many varied individuals rising and falling. There are better things to do with resources than continually maintain everyone who is ever born. Of course, some of the more emotional parts of me don't give two shits about resource constraints and say "fuck no, I don't want myself or anyone else to die!" but until you get infinite resources, I don't see how that's feasible.

The same thought process applies to not necessarily living forever. It's not interesting to have the same individuals to continue indefinitely - it's more diverse and interesting to have many varied individuals rising and falling. There are better things to do with resources than continually maintain everyone who is ever born. Of course, some of the more emotional parts of me don't give two shits about resource constraints and say "fuck no, I don't want myself or anyone else to die!" but until you get infinite resources, I don't see how that's feasible.

This does an awesome job of putting into words a thought I've had for a long time, and one of the big reasons I have trouble getting emotionally worked up about the idea of dying. Although it's not necessarily true that an individual living forever would be less interesting–the more time you have to learn and integrate skills, the more you can do and imagine, especially because assuming we've solved aging also kinda suggests we've solved things like Alzheimer's and brain plasticity and stuff. Then again, when I imagine "immortal human", I think my brain comes up with someone like Eliezer being brilliant and original and getting more so with practice, as opposed to Average Joe bored in the same career for 1000 years. The latter might be closer to the truth.

From my perspective, it's not intelligence that's the problem so much as morality, culture, and implicit attitudes.

Even if we could freeze a human at peak cognitive capacity (20-30 years?) we wouldn't get the plasticity of a newborn child. I don't think that sexism, racism, homophobia, etc... just melt away with the accumulation of skills and experience. It's true that people get more socially liberal as they get older, but it's also true that they don't get more socially liberal as quickly as the rest of society. And the "isms" I named are only the most salient examples, there are many subtler implicit attitudes which will be much harder to name and shed. Remember that most of the current world population has the cultural attitudes of 1950's America or worse.

Of course, I might be thinking too small. We might be able to upgrade ourselves to retain both the flexibility of a new mind and the efficiency of an adult one.

I don't know how much hope I have for my own, individual life though. It will probably cost a lot to maintain it, and I doubt the entire planet will achieve an acceptable enough standard of living that I'd be comfortable spending vast amounts on myself (assuming I can even afford it). It's something I've still got to think about.

Of course, societal attitudes can become more conservative as well as more liberal. You seem to be assuming that the overall direction is towards greater liberality, but it's not obvious to me that that's the case (e.g. the Arab world going from the center of learning during the Islamic Golden Age to the fundamentalist states that many of them are today, various claims that I've heard about different fundamentalist and conservative movements only getting really powerful as a backlash to the liberal atmosphere of the sixties, some of my friends' observations about today's children's programming having more conservative gender roles than the equivalent programs in the seventies-eighties IIRC, the rise of nationalistic and racist movements in many European countries during the last decade or two, etc.). My null hypothesis would be that liberal and conservative periods go back and forth, with only a weak trend towards liberality which may yet reverse.

Here is my solution to the personal identity issues, and I don't think it really violates common intuitions too badly. ...................

Woah, look, I* exist! Check out all this qualia! I'm having thoughts and sensations. Hm.... among my qualia is a set of memories. Instincts, intuition, and knowledge about how things work. Oh, neat, among those intuitions is a theoretical model of the universe! I hope it is accurate...well anyway it's the most appealing model I've got right now.

In an instant, I will disappear forever. I have a vague notion that this idea ought to be terrifying, but my utility function just sorta shrugs as terror completely fails to flow through my veins. I don't care that I'm going to disappear...but here is what I do care about - my model of the universe has informed me that everything that I'm doing right now will leave a memory trace. In the next few moments, I will cease to exist and a being will appear who will remember most of what I am feeling right now. That being will then disappear and be replaced by another. This will continue for a long time.

I care about experiencing happiness right now, in this moment before I disappear forever. I also care about those future beings - I want them to experience happiness during the moment of their existence, too. It's sort of like altruism for future beings which will carry my trace, even though we all realize altruism isn't the right word. Maybe we can call it "self-altruism" or more colloquially, self love.

Before you cleverly suggest making an infinite number of copies of myself and pleasuring them, that's not the only thing my utility function cares about. I'm not entirely self-altruistic - I've currently got a pretty strong "don't create multiple redundant copies of sentient beings" utility component, or shall we say gut instinct.

........

*The use of the word "I" is convenient here, but I'm sure we all realize that we can deconstruct "personal identity" spatially as well as temporally.

Anyway, that's part of my current philosophical worldview, and I don't feel confused by any of the problems in the trilemma. Perhaps I'm not thinking about it carefully enough - can anyone point out a reason why I should be confused?

You might note that while I have not tabooed subjective experience entirely, I have noted that an "individual" can only subjectively experience the present moment, and that "your" utility function compels "you" to act in such a way as to bring about your preferred future scenarios, in accordance with your (objective) model of the universe.

I guess I've essentially bitten the "reject all notions of a thread connecting past and future subjective experiences" bullet that Eliezer Y said he had trouble biting...but I think my example illustrates that "biting that bullet" does not result in an incoherent utility function, as EY stated in his post. I don't really think it's fair to call it a "bullet" at all.

Just think of the feeling of "subjective expectation" as the emotional, human equivalent to a utility function which factors in the desires of future beings that carry your memories. It's analogous to how love is the emotional equivalent to a utility function which takes other people's feelings into account.

my model of the universe has informed me that everything that I'm doing right now will leave a memory trace. In the next few moments, I will cease to exist and a being will appear who will remember most of what I am feeling right now.....

.....I care about experiencing happiness right now, in this moment before I disappear forever. I also care about those future beings - I want them to experience happiness during the moment of their existence, too. It's sort of like altruism for future beings which will carry my trace, even though we all realize altruism isn't the right word. Maybe we can call it "self-altruism" or more colloquially, self love.

I agree with your general line of reasoning, but I'd like to go a little more in depth. I think that personal identity is more than memory traces. What I consider part of "me" includes (but is not necessarily limited to):

-My personality

-My terminal values

-My memories

-My quirks and idiosyncrasies

"I" am aware that in the future "I" am going to change in certain ways. My utility function includes a list of changes that are desirable and undesirable, that correspond to "personal identity." Desirable changes include (but are not limited to):

-Changes that make me better at pursuing my values, such as learning new skills.

-Changes that add new positive memories to the memories I have

-Changes that cause me to have positive experiences.

Undesirable changes include:

-Changes that radically alter my terminal values

-Changes that make me worse at pursuing my values.

-Amnesia, and lesser forms of memory loss.

-Changes that cause me to have negative experiences.

As you said, I exhibit "self-love": I want to make sure that the person I change into has changed in desirable ways, not undesirable ones. I want the person I turn into to be happy and have positive experiences, although I also recognize that not all my values can be reduced down to the desire to be happy or have positive experiences.

Lastly, let me say that this steel-manned conception of personal identity is a wonderful thing. It's good to have lots of distinct individuals, and I believe the world would be a poorer place without personal identity.

I'll expand on Dan Armak's issue with using "moment". When I try to imagine this, I end up with this conceptual image of a series of consciousnesses, each going "Oh-wow-i-finally-exist-oh-no-I'm-dying", but that's totally wrong. They don't have nearly enough time to think those thoughts, and in fact to think that thought they would have to break into several more moment-consciousnesses, none of which could really be described as "thinking". If each moment-consciousness is continuously appearing and disappearing, they're not appearing and disappearing in the same sense that we use those words in any other situation. It seems analogous to watching a ball move, and concluding that it's actually a series of balls "appearing and disappearing". Why not just say it's moving?

The other thing that I always have to remind myself is that even though it feels like there's a consciousness moving, in reality my "consciousness" is present at every moment in time that I exist! And moving is a word that means position changing as time changes, so talking about moving through time is talking about "time changing as time changes", which doesn't really say anything.

Lastly, if there were a thread connecting all past and future consciousness, how would you know? Would it feel any different than your experience now?

they're not appearing and disappearing in the same sense that we use those words in any other situation.

You are completely right, but don't forget why we are talking about this in the first place.

The reason we are talking about this is because some people are confused about subjective experience. When they get copied, they are wondering which of the two copies "they" will experience. The reason I made this elaborate "moment" metaphor was to illustrate that subjective experience simply does not work that way.

The trouble here is that people are having difficulty treating their subjective experience of reality as analogous to a ball moving. If you were to copy a ball, you'd never ask a silly question like "which one is original" in the first place. That's why I'm using different language to talk about subjective experience. If you aren't confused about subjective experience in the first place, there is no reason to bother with this metaphor - just say that you're a process running through time, and leave it at that.

The anthropic trilemma is a question that wouldn't be raised unless the questioner implicitly believed in souls. The attempt here is to make people realize what it really means to have a reductionist view of consciousness and subjective experience.

Lastly, if there were a thread connecting all past and future consciousness, how would you know? Would it feel any different than your experience now?

You wouldn't, and that's one of the many reasons you shouldn't use the thread metaphor. Thread metaphors are philosophically problematic when you start copying yourself (as in the anthropic trilemma) by making you ask yourself which of the copies you subjectively end up in.

If you really want the thread metaphor, then imagine a thread which splits into two threads upon being copied, not one which follows along with one of the two copies.

The anthropic trilemma is a question that wouldn't be raised unless the questioner implicitly believed in souls. The attempt here is to make people realize what it really means to have a reductionist view of consciousness and subjective experience.

I'm not sure what you're referring to by "souls" there. Right now I have this subjective experience of being a consciousness that is moving through time. I anticipate a sensation of "moving" through new situations as time goes on, and things like the anthropic trilemma refer to my expectation of where I will feel like I end up next moment. I think we agree that our minds have no objective property that follows them through time, at least no more than non-conscious objects. But there does seem to be some subjective sense of this movement, leading to a big question: If we don't have souls, why does it feel so very much like we do?

I'm mostly content to say, "Eventually neuroscientists will piece apart enough mental processes that we can describe the neural activity that causes this sensation to arise". I also classify this sense of a "soul" in the same category as something like the colour red. Why does red look like red? I don't know. I intend to eventually find out, but I'm not sure where to start yet.

If you really want the thread metaphor, then imagine a thread which splits into two threads upon being copied, not one which follows along with one of the two copies.

Yes, very true. Sorry though, I guess I wasn't clear with the thread idea. I was trying to contrast your "flipbook" concept of consciousness with the thread concept, and ask whether they would actually feel any different. My own thought is: No, there's no way to tell them apart.

I think we agree that our minds have no objective property that follows them through time [..] But there does seem to be some subjective sense of this movement, leading to a big question: If we don't have souls, why does it feel so very much like we do?

So... hm.

It feels to me like I have a spatial viewpoint, located somewhere in my skull. As I get up, look around, etc., my viewpoint seems to move around with my body. If I project images onto my retinas sufficiently convincingly, my viewpoint seems to move without my body... that is, I might have the sensation of looking down over a mountain range or some such thing.

If I were to say "I think we agree that our minds have no objective property that travels through space to wherever our viewpoint is, but there does seem to be some subjective sense of this movement, leading to a big question: If we don't have viewpoints, why does it feel so very much like we do?" would you consider that a sensible question?

Because I think my answer would be twofold: first, "Who said we don't have viewpoints? We totally do. It's just that our viewpoints are information-processing artifacts." and second "We can identify the neural pathways that seem to be involved in constructing a representation of a three-dimensional environment from retinal images, and that representation includes a focal point." And, sure, our understanding of how that representation is constructed is incomplete, and we'll develop a more and more detailed and comprehensive understanding of it as we go... just like our understanding of how crystals form or the conditions at the center of the sun are incomplete and growing.

But I wouldn't call that a singularly big question. It's interesting, sure, and potentially useful, but so are how crystals form and the conditions at the center of the sun.

Would you agree, when it comes to the neural construction of spatial viewpoints?

If so, what on your account makes the neural construction of temporal viewpoints different?

The spatial and temporal viewpoint analogy doesn't quite work, because you can sensibly talk about a movement through space, since movement means change in space/change in time. But you can't really talk about movement through time because that would be change in time/change in time. So if we set time equal to a constant, and look at space, your viewpoint is only at one spatial point. But if we look at time, your viewpoint is at a continuum of places, sort of a "line" through time.

Your analysis of the neural construction of spatial viewpoints is good, and I think it holds for the neural construction of temporal viewpoints. If I knew these neural constructions, then I would know exactly why you feel a subjective experience of a viewpoint moving through space and time. I could understand these causal mechanisms and be satisfied with my knowledge of the process. But I might still be confused about my feeling of subjective experience, because it doesn't explain why I feel things the way that I do. I've been reluctant to use the word "qualia" but essentially that's what I'm getting at. Hence my analogy with red: Even if I knew the parts of the brain that responded to red, would I know why red looks the way it does?

So if we want to talk about other people, then I think we're all on the same page. These sensations of spatial/temporal movement could be explained with neuroscience, and have no profound philosophical implications.

Ah, OK. If your concern is with qualia generally rather than with constructing temporal viewpoints specifically, I'll tap out here... I misunderstood the question. Thanks for clarifying.

If we don't have souls, why does it feel so very much like we do?

The thing which distinguishes a theoretical universe from a real one, the thing that makes reality real, is my qualia. I take it as axiomatic that things which I can sense are real, and go from there. Reality itself is defined by my qualia, so it doesn't make sense to explore why I have qualia by looking at reality. Asking why we feel is like asking why reality is real. It only makes sense to ask what we feel, and what is reality - and the two are synonymous.

I'm mostly content to say, "Eventually neuroscientists will piece apart enough mental processes that we can describe the neural activity that causes this sensation to arise". I also classify this sense of a "soul" in the same category as something like the colour red. Why does red look like red? I don't know. I intend to eventually find out, but I'm not sure where to start yet.

At the most optimistic, we will be able to completely predict neural activity and behavioral outputs generated from a human system given a set of inputs. By definition, there is no way to test whether or not a system is feeling a subjective sensation.

This is one of those questions (like free will, etc) that you can solve using philosophy alone. You don't need to bring science into it - although neuroscience might eventually force you to confront the problem and help you phrase the question in a way that makes sense.

One major difference is that you are talking about what to care about and Eliezer was talking about what to expect.

I'm talking about expectation as well.

If I'm about to make 100 copies of myself, I expect that 100 versions of myself will exist in the future. That's it. I'm currently not making copies of myself, so I expect exactly one future version of myself.

It's nonsensical to talk about which one of those copies I'll end up subjectively experiencing the world through in the future. That's because subjective expectation about the future is an emotionally intuitive shorthand for the fact that we care about our future selves, not a description of reality.


In other words: You are a Turing complete physical process, implementing some sort of approximation of Agent-ness, with a utility function only available for inspection through a connection with a significant amount of noise?

A literal moment in time has zero duration; you can't "experience" it in the normal sense of the word. To think the thoughts you outline above ("in the next few moments...") you need to pick some kind of time-granularity. But why, and how do you pick it?

If you deny existing as a subjective person over time, then it seems you ought to deny existing for any length of time at all.

You are right that this is a flaw if you take what I wrote literally. I intended the "moment" thing to be a bit of a metaphor for something a bit more unwieldy to describe.

you need to pick some kind of time-granularity. But why, and how do you pick it?

Why do you come to this conclusion? Consider the function q(t), where q describes all my qualia at time t. Even though q(t) is a continuous function, it's still meaningful to talk about what's happening at time t. Alternatively, if you like to think of qualia as happening over an interval, you can also take a "derivative" of sorts, q'(t), and talk about what's going on in an arbitrarily small interval around time t.

Your original comment describes a process of thought and self-introspection. But you can't have thought without the passage of time. In fact a lot of things require the concept of time: for instance the concept of utility (or desires, goals, etc.) as observed in someone's actions.

At a durationless moment in time, there is a certain configuration of matter that makes up your body, but there isn't any thought or behavior going on. You can't talk about utility as you do without assuming the connection between self-instances over time, but that connection is what you're trying to get rid of.

But you can't have thought without the passage of time.

Help me understand your argument better:

I hold out a ball and drop it, and it accelerates towards the ground at 10 m/s^2

At t=5, time stops.

If we compared reality(t=5) to reality(t=0), we would know that the ball has traveled 125 meters from where it was.

If we unfroze time, the ball would be moving at 50 m/s.

If we unfroze time, the ball would be accelerating at 10 m/s^2.

If we unfroze time, my utility function would be u(t=5).
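Spelling out the arithmetic behind those numbers (assuming the ball starts at rest and sticking to metric units throughout):

```latex
% Distance fallen by the frozen instant t = 5 s, starting from rest:
d(t) = \tfrac{1}{2} a t^{2}, \qquad
d(5\,\mathrm{s}) = \tfrac{1}{2}\cdot 10\,\mathrm{m/s^{2}}\cdot(5\,\mathrm{s})^{2} = 125\,\mathrm{m}
% Velocity "at the frozen moment" is defined as a limit over shrinking intervals:
v(5\,\mathrm{s}) = \lim_{\Delta t \to 0}\frac{d(5\,\mathrm{s}+\Delta t)-d(5\,\mathrm{s})}{\Delta t}
                 = a \cdot 5\,\mathrm{s} = 50\,\mathrm{m/s}
```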

If I understand your argument correctly, you are essentially arguing that just because we can't talk about velocity, acceleration, and utility functions while time isn't flowing, it's not meaningful to say what velocity, acceleration, and utility functions are at a given moment.

I think that Dan's point was simply that the process of, for example, comparing two world-states (such as t=5 and t=0) in order to calculate the distance traveled by the ball between those states requires non-zero time to complete.

TheOtherDave is right. To expand on that, I also tried to make the following point. You were trying to do without the concept of "a self that persists over time". You said:

You might note that while I have not tabooed subjective experience entirely, I have noted that an "individual" can only subjectively experience the present moment, and that "your" utility function compels "you" to act in such a way as to bring about your preferred future scenarios, in accordance with your (objective) model of the universe.

My point was that you cannot literally experience the present moment. You can experience only lengths of time. Where there is no passage of time, there is no subjective experience.

So while you were trying to start with a "self" that exists in the moment and extract the logical linkage to that self's successors over time, I pointed out that this bridges short durations of time to long ones, but it doesn't bridge single moments of time to even short durations. And so, restricting yourself to short periods of time doesn't resolve the issue you were discussing, because you still have to assume the existence of a self with subjective experience that persists over that short period of time.

I'm afraid I still don't see... isn't that still analogous to saying you can't have something like "velocity" in a single moment?

Where exactly does the analogy between subjective experience at a given time and velocity at a given time break down here?

you need to pick some kind of time-granularity

Not really - pick them all! Let a thousand (overlapping) time-periods bloom. Let any history-fragment-persons who endure long enough to make a decision, favor whichever history-fragment-length they like.

I upvoted this because I agree with this perspective, although I would like to add a caveat: In most situations, most of this thought process is cached.

Maybe we can call it "self-altruism" or more colloquially, self love.

Self love is empathy+sympathy for one's future selves.

I'm not entirely self-altruistic - I've currently got a pretty strong "don't create multiple redundant copies of sentient beings" utility component, or shall we say gut instinct.

Is this a thing you're saying for you personally, or people in general? Because if it's not for everyone, then you still have to deal with the problem mentioned here.

Alright, let's imagine that I was creating copies of myself for whatever reason:

In the present, I feel equal self-altruism towards all future identical copies of myself.

However, the moment a copy of myself is made, each copy will treat the other as a separate individual (with the regular old fashioned altruism one might have towards someone exactly like oneself, rather than future-self-altruism).

I want something a bit more detailed / grounded in example. Like in the example you quote of Eliezer buying a lottery ticket and resolving only to care about winners, what goes through that person's mind as they wake up? What algorithms do they use? I'll give it a shot.

So, they're a person. They look at the lottery ticket in their hand. It's not a winner. They remember "them" (flashing warning sign) resolving only to care about winners. They think "wtf."

Okay, so, back to the flashing warning sign. How do they know they were the one who resolved that? There's a fork in the road here. One way for someone to identify themselves in memory is to label a person in their model of the world as "themself," which is easy to do, since "themself" is the person in the model whose perspective all their memories are from. Another way is model-less and label-less: a first-person prediction of UNDEFINED is just automatically assumed to conflict with a state of not-UNDEFINED, resulting in wtf. The "self" just comes from the structure of how memories are accessed, compared, and what's output. Biological systems definitely seem more likely to use the second.
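
(A toy sketch of the two routes in Python, if it helps - every name and string here is made up purely for illustration, not a claim about how actual cognition is implemented:)

```python
# Route 1: an explicit world-model with one node labelled as "self".
world_model = {
    "eliezer": {"resolved": "only care about winners"},
    "clerk":   {"resolved": None},
}
SELF = "eliezer"  # the node all the memories are "from the perspective of"
remembered_resolution = world_model[SELF]["resolved"]

# Route 2: model-less and label-less -- memories are stored first-person,
# and a clash between the remembered expectation and the current input
# just falls out of the comparison machinery as surprise.
expected = "winning ticket"   # what the remembered resolution predicted
observed = "losing ticket"    # what is actually in hand right now
if expected != observed:
    print("wtf")              # no explicit "self" object appears anywhere here
```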

Now, for present Eliezer contemplating the future, I think you need abstraction. But if Eliezer-contemplating-the-past is running on biological emergent-pilot, Eliezer-contemplating-the-future won't be able to change how he responds to the past by changing abstractions.

Not that a utility maximizer would gain anything by changing who they self-identified with. After all, you take actions to maximize your current utility function, not the utility function you'll have in the future - and this includes adopting new utility functions. In a simple game where you can pay money to only self-identify with winners in a gamble, utility maximizers won't pay the money.
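
(A toy calculation of that game, with made-up numbers: since re-labelling which future copies count as "you" changes nothing about the actual payoffs, paying for it just subtracts the fee from every branch.)

```python
# Illustrative only; all numbers are invented.
P_WIN = 0.01    # probability the gamble pays off
PRIZE = 100.0   # payoff if it does
FEE = 1.0       # cost of the "only self-identify with winners" operation

def expected_utility(pay_for_relabel: bool) -> float:
    """Probability-weighted payoff over both outcomes; the re-labelling
    doesn't change the world, so its only effect is the fee."""
    cost = FEE if pay_for_relabel else 0.0
    return P_WIN * (PRIZE - cost) + (1 - P_WIN) * (0.0 - cost)

print(expected_utility(False))  # 1.0
print(expected_utility(True))   # 0.0 -- strictly worse, so the maximizer doesn't pay
```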

The “objective system” is a lot more general and abstract, and evolved to correct for deficiencies in the subjective system, but doesn't influence behavior as strongly.

However, if you translate the objective system's findings back into imagined subjective experiences, you can intentionally "correct for deficiencies in the subjective system". See e.g. Eliezer imagining the number of worlds with serial killers lurking behind the shower, or, well quite a lot of psychotherapies and self-help techniques.

Even the Litany of Gendlin (when done right) is kind of a back-door way of doing this - contemplating the hypothetical "If I live in a world where..." clauses is a sneaky way of getting your brain to project the subjective experience of being in such a world, which then usually triggers the subjective system to say, "Oops... you're right, I'd better update, because if that is the case, I really had better know about that."

This post is apparently dedicated to undermining the idea of a self that persists in time. In fact, it assumes that this is already the goal, and instead poses the question of why the brain would think that way in the first place, in the hope that the answers will allow the false idea to be driven out all the more efficiently.

Currently it has 14 upvotes and zero downvotes. So perhaps I'd better plant some seeds of doubt here: When your model of reality contradicts something that would otherwise seem undeniable, like the passage of time, or your own existence, not just in this moment, but as a being with a definite past and future... that's not necessarily a signal to double down on the model and rationalize away your perceptions.

I don't want to endorse every spontaneous opinion that anyone has ever had, or even the majority of common sense; and it's inevitable and even desirable that people test out their models by taking them seriously, even if this leads to a few philosophical casualties. That's part of the trial and error whereby lessons are learned.

So I will try to be specific. I would ask the reader to entertain the possibility that reality is not the way it is portrayed in their favorite reductionist or platonic-computational model; that these are quite superficial and preliminary conceptions of reality, overlooking some very basic causal factors that we just haven't discovered yet, and "mathematical" insights that would completely reorder how we think of the formal part of these models, and correct understandings of what it means for something to exist and to have a property, and almost everything about how life is actually experienced and lived. And finally - here's the punchline - please consider the possibility that these aspects of reality, not yet present in your favorite formalisms and theories, are precisely such as to allow time, and a personal self, and the objective persistence of that self through time, to be real.

So I will try to be specific. I would ask the reader to entertain the possibility that reality is not the way it is portrayed in their favorite reductionist or platonic-computational model; that these are quite superficial and preliminary conceptions of reality, overlooking some very basic causal factors that we just haven't discovered yet, and "mathematical" insights that would completely reorder how we think of the formal part of these models, and correct understandings of what it means for something to exist and to have a property, and almost everything about how life is actually experienced and lived. And finally - here's the punchline - please consider the possibility that these aspects of reality, not yet present in your favorite formalisms and theories, are precisely such as to allow time, and a personal self, and the objective persistence of that self through time, to be real.

I'm not sure how saying "it's possible there's something we might not know yet which might make persistent selves real" is being specific. (And if there's anything any more specific than that in the paragraph, I seem to have missed it after reading it twice.)

It's more specific than "science might be wrong when it contradicts [unspecified pre-scientific belief]".

Also, I listed a variety of ways in which the current scientifically-inspired belief might be falling short.

Is reality really relevant here? The way I see it, if questions like these are phrased properly, a correct answer would be true for any universe I might find myself in.

I'm failing to think of any universe that could contain something like my human mind, in which conclusions other than the ones I have drawn on these matters could be true. Obviously, that could be a failure of my imagination.

Can anyone here describe a universe in which there is a... and I'm really not sure how else to describe this concept in ways that don't use the word soul or make it sound silly ... a thread connecting past and future subjective realities?

I've been pushing in this direction a lot lately. Right now I'm wrestling with three meta-questions:

  1. Why optimize? Why have a utility function? Why process at all?

  2. Why continue to have a sense of 'self' and 'utility' when there is no longer any relevant behavior for them to guide? I.e., why do these processes continue to operate in the face of perfect akrasia/powerlessness?

Hmm, do we even know what cognitive process is responsible for the sense of identity? Do we know what creatures have it, besides humans?

That sounds like the mirror test, and similar tests of self-awareness. Which indicate that "sense of identity" is pretty abstract and high-level.

From Wikipedia:

Animals that have been observed to pass the mirror test include:
All great apes:
Humans – Humans tend to fail the mirror test until they are about 18 months old, or in what psychoanalysts call the "mirror stage".
Bonobos
Chimpanzees
Orangutans
Gorillas – It was initially thought that gorillas were unable to pass the test, but there are now several well-documented reports of gorillas (such as Koko) passing the test.
Bottlenose dolphins
Orcas
Elephants
European Magpies

Of course, different tests are likely to give you different results, but this seems like a start.

"The “subjective system” evolved from something like a basic reinforcement learning architecture, and it models subjective > expectation and this organism's immediate rewards, and isn't too strongly swayed by abstract theories and claims."

I think this overestimates the degree to which a) (primitive) subjective systems are reward-seeking, and b) "personal identity" is really a definable, non-volatile, static entity rather than a folk-psychological dualistic concept in a Cartesian theater (cf. Dennett). For sufficiently complex adaptive systems (organisms), there is no sufficiently good correlation between the reward signal and the organism's actual long-term goals. A non-linear relationship between the present reward signal and the actual long-term/terminal goal in our sensory and declarative memory creates a selective pressure for multiple senses of personal identity over time. This is precisely why high-level abstract models and a rich integration of all kinds of timestamped and labeled instances of sensory data start to emerge inside the unitary phenomenal world-simulations of these organisms, once they face social dilemmas in which the Nash equilibrium is not Pareto-efficient and noncooperative self-interest is disadvantageous: we have forever-changing episodic simulations of possible identities over time, and some of these simulations are quite hardcoded into our sense of fairness (e.g. the Ultimatum Game) and empathetic understanding (putting yourself in another organism's shoes). Organisms started to encode abstractions (memories) about their strategies, goals, and the possible rewards associated with different changing identities once they were competing against opponents that use the same level of memory to condition their play on the past (be it for punishment, for helping parents, or for the indirect reciprocity of other identities). So I don't think "we" fail to base our decisions on our abstract world-model. I think "we" (the personal identities that are possibly encoded in my organism) do base decisions on the abstract world-model that the organism that is "us" is capable of maintaining coherently. Or vice versa: the organism that encodes "us" bases its decisions on top of several potential first-person entities that exist over time. Yes, the subjective expectations are/were important, but to whom?

This conflict between several potential, self-identifiable, volatile identities is what creates most social dilemmas, paradoxes, and problems of collective action and protection of the commons (the tragedy of the commons). The point is not that we have this passable but suboptimal evolutionary solution of "apparently one fuzzy personal identity": rather, we have a solution of several personal identities over time, plagued by intertemporal, hyperbolically discounted myopia and unsatisfactory models of decision theory.

So I agree with you, but it seems that I'm not thinking about learning in terms of rationally utility-maximizing organisms with one personal identity over time. This position seems more related to the notion of Empty Individualism: http://goo.gl/0h3I0

I have doubts about your conclusion that there must be two systems, one responsible for personal identity and another for an abstract model. See http://lesswrong.com/lw/cze/reply_to_holden_on_tool_ai/8fe5 where I introduce a framework along the lines of Orseau-Ring, in which a concept of personal identity can be maintained (for the purpose of defining the utility function) while at the same time allowing for extreme flexibility in modeling the universe. I think this serves as a reasonable descriptive explanation of what personal identity is. On the other hand, from a normative point of view I think personal identity should be discarded as a fundamental concept.

Interesting...

Why would abstract reasoning end up reaching incorrect results more easily? Is it because it's a recent, underdeveloped evolutionary adaptation, or because of something more fundamental?

Good question!

At least three possible explanations come to mind:

  1. Abstract reasoning, by its very nature, is capable of reasoning in any domain and using entirely novel concepts. That limits the amount of domain-specific sanity checks that can be "built in".
  2. Reasoning may not have evolved for problem-solving in the first place, but for communication, limiting the extent to which it's been selected for its problem-solving capability. (This would also help explain why it doesn't drive our behavior that strongly.)
  3. Although human reasoning obviously doesn't run on predicate logic nor any other formal logic, some aspects of it seem to be similar. We know that formal logics are brittle in the sense that in order to get the correct conclusions, you need to get every single premise and inference step correct. People can easily carry out reasoning which is formally entirely correct, but still inapplicable to the real world because they didn't remember to incorporate some relevant factor. (Our working memory limits and our tendency to simplify things into nice stories that fit our memory easily probably both make this worse, and are necessary for us to be able to use abstract reasoning in the first place.) Connectionist-style reasoning seems better capable of modeling things without needing to explicitly specify every single thing that influences them, which is (IIRC) a big part of why connectionists have criticized logic-based AI systems as hopelessly fragile and incapable of reliable real-world performance. (A toy illustration of this brittleness follows this list.)
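
(To make point 3 concrete, here's a toy forward-chaining sketch - my own illustration, not anything from the post - where a formally valid inference gives a wrong real-world answer simply because one relevant premise was never written down:)

```python
# Toy rule-based inference: valid given its premises, wrong about the world,
# because the "unless it's a penguin" premise is missing.
facts = {"tweety is a bird"}
rules = [("tweety is a bird", "tweety can fly")]

def forward_chain(facts, rules):
    """Repeatedly apply rules until no new conclusions appear."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in rules:
            if premise in derived and conclusion not in derived:
                derived.add(conclusion)
                changed = True
    return derived

print(forward_chain(facts, rules))
# {'tweety is a bird', 'tweety can fly'} -- formally fine, wrong if Tweety is a penguin
```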

for evolution, what really matters in the end is whether you'll personally expect to experience living on until you have a good mate and lots of surviving children.

Huh? For evolution, what really matters is whether you actually have lots of surviving grandchildren, and so it's okay if you get surprised, so long as that surprise increases the expected number of grandchildren you have.

I like the point that an enduring personal identity is a useful model that we should expect to be mostly hardwired. When I introspect about my personal identity, most of the results that come back are relational / social, not predictive, and so it seems odd to me that there's not much discussion of social pressures on having an identity model.

The example you gave, of choosing to study or play games, doesn't strike me as an anthropic question, but rather an aspirational question. Who are you willing to become? An anthropic version of that question would be more like:

We're going to completely copy you, have one version study and the other version play video games, and then completely merge you and your copy, so it's as if you did both (you'll remember doing both, have the positive and negative effects of both, etc.). The current plan is that you'll study while the copy plays video games; how much would you pay so that you, the original, plays video games while the copy studies?

(To separate out money from the problem, the payment could be in terms of less time spent gaming, or so on.)

Huh? For evolution, what really matters is whether you actually have lots of surviving grandchildren

Yeah, as well as whether your siblings have surviving grandchildren, not to mention great-grandchildren - I was trying to be concise. I could've just said "inclusive fitness", but I was trying to avoid jargon - though given the number of computer science analogies in the post, that wasn't exactly very successful.

The example you gave, of choosing to study or play games, doesn't strike me as an anthropic question, but rather an aspirational question.

Right, that was intentional. I was making the argument that a sense of personal identity is necessary for a large part of our normal every-day decision-making, and anthropic questions feel so weird exactly because they're so different from the normal decision-making that we're used to.

Yeah, as well as whether your siblings have surviving grandchildren, not to mention great-grandchildren - I was trying to be concise. I could've just said "inclusive fitness", but I was trying to avoid jargon - though given the number of computer science analogies in the post, that wasn't exactly very successful.

This isn't an issue of concision - it's an issue of whether what matters for evolution is internal or external. The answer to "why don't most people put themselves in delusion boxes?" appears to be "those that don't are probably hardwired to not want to, because evolutionary selection acts on the algorithm that generates that decision." That's an immensely important point for self-modifying AI design, which would like the drive for realism to have an internal justification and representation.

[edit] To be clearer, it looked to me like that comment, as written, is confusing means and ends. Inclusive fitness is what really matters; when enduring personal identity aids inclusive fitness, we should expect it to be encouraged by evolution, and when enduring personal identity impairs inclusive fitness, we should expect it to not be encouraged by evolution.

I was making the argument that a sense of personal identity is necessary for a large part of our normal every-day decision-making, and anthropic questions feel so weird exactly because they're so different from the normal decision-making that we're used to.

I agree that anthropic questions feel weird, and that if we had commonly experienced them, we would have adapted to them so that they wouldn't feel weird from the inside.

My claim is that it doesn't seem complete to argue "we need a sense of identity to run long-run optimization problems well." I run optimization programs without a sense of identity just fine - you tell them the objective function, you tell them the decision variables, you tell them the constraints, and then they process until they've got an answer. It doesn't seem to me like you're claiming the 'sense of personal identity' boils down to 'the set of decision variables and the objective function,' but I think that's only as far as your argument goes.
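
(A minimal sketch of what I mean, with a made-up objective and constraint and nothing beyond the standard library: the whole "agent" is just an objective function, a decision variable, and a constraint, with no self anywhere in it.)

```python
def objective(x: float) -> float:
    # Something to minimize: squared distance from an arbitrary target.
    return (x - 3.0) ** 2

def feasible(x: float) -> bool:
    # An arbitrary constraint on the decision variable.
    return 0.0 <= x <= 2.5

# Brute-force search over a discretized decision space.
candidates = [i / 100.0 for i in range(0, 1001)]  # 0.00 .. 10.00
best = min((x for x in candidates if feasible(x)), key=objective)
print(best)  # 2.5 -- the optimum sits on the constraint boundary; no 'self' required
```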

It feels much more likely to me that the sense of personal identity is an internalized representation of our reputation, and where we would like to push our reputation. A sophisticated consequence-prediction or probability estimation system would be of use to a solitary hunter, but it's not clear to me that a sense of subjective experience / personal identity / etc. would be nearly as useful for a solitary hunter as for a social animal.

My claim is that it doesn't seem complete to argue "we need a sense of identity to run long-run optimization problems well." I run optimization programs without a sense of identity just fine - you tell them the objective function, you tell them the decision variables, you tell them the constraints, and then they process until they've got an answer. It doesn't seem to me like you're claiming the 'sense of personal identity' boils down to 'the set of decision variables and the objective function,' but I think that's only as far as your argument goes.

Hmm, looks like I expressed myself badly, as several people seem to have this confusion. I wasn't saying that long-term optimization problems in general would require a sense of identity, just that the specific optimization program that's implemented in our current mental architecture seems to require it.

(Yes, a utilitarian could in principle decide that they want to minimize the amount of suffering in the world and then do a calculation about how to best achieve that which didn't refer to a sense of identity at all... but they'll have a hard time getting themselves to actually take action based on that calculation, unless they can somehow also motivate their more emotional predictive systems - which are based on a sense of personal identity - to also be interested in pursuing those goals.)

And because the abstract reasoning system might end up reaching all kinds of bizarre and incorrect results pretty easily, we've generally evolved in a way that keeps that earlier system basically in charge most of the time, because it's less likely to do something stupid.

Why would an abstract reasoning system without a sense of personal identity perform so erratically? I mean, sure, in principle it could totally do that, but presumably all the copies of such erratic systems got eaten by tigers instead of successfully passing on their genes. I'm not sure whether you're saying that personal identity is an evolutionary kludge (like the appendix or something), or whether it's absolutely required in order for any abstract reasoning system to function properly. If it's the latter, then I am not (yet) convinced.

I mean, sure, in principle it could totally do that, but presumably all the copies of such erratic systems got eaten by tigers instead of successfully passing on their genes.

There's the question of the absolute effort needed in order to get the abstract reasoning system to work reliably each time, versus just wiring them into a balance where the non-abstract system dominates most of the time when the abstract system gets things wrong. Evolution will pick the easier and faster solution.
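
(Roughly the kind of wiring I have in mind - purely my own toy framing, not a claim about the actual neural implementation: the older system answers by default, and the abstract system only overrides it when it's both available and confident.)

```python
from typing import Optional, Tuple

def subjective_system(situation: str) -> str:
    # Cheap, habitual responses keyed on familiar situations.
    habits = {"rustle in the grass": "jump back"}
    return habits.get(situation, "do the usual thing")

def abstract_system(situation: str) -> Tuple[Optional[str], float]:
    # Slow, general reasoning: returns a proposal plus a confidence in [0, 1].
    if situation == "rustle in the grass":
        return ("it's probably just the wind, ignore it", 0.6)
    return (None, 0.0)

def act(situation: str, override_threshold: float = 0.9) -> str:
    proposal, confidence = abstract_system(situation)
    if proposal is not None and confidence >= override_threshold:
        return proposal
    return subjective_system(situation)  # by default, the older system stays in charge

print(act("rustle in the grass"))  # "jump back" -- the abstract proposal isn't confident enough
```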

I'm not sure whether you're saying that personal identity is an evolutionary kludge (like the appendix or something), or whether it's absolutely required in order for any abstract reasoning system to function properly.

The former.

...just wiring them into a balance where the non-abstract system dominates most of the time when the abstract system gets things wrong.

Agreed, but IMO this non-abstract system doesn't have to be the personal identity system; it could just be pure reflex, or a limited set of heuristics. We already have plenty of reflexes, after all.

The former.

Ok, that makes sense. Do you have any evidence that this is indeed the case? Do we know when and how personal identity evolved?

Agreed, but IMO this non-abstract system doesn't have to be the personal identity system; it could just be pure reflex, or a limited set of heuristics.

Possible, but the personal identity system does seem to be the thing that we actually ended up with.

Do you have any evidence that this is indeed the case?

It would seem to be the most straightforward explanation that fits the facts, but of course that's not conclusive evidence, and my speculation about the evolutionary origin of our current state of affairs might be completely off. But even if the origin story was wrong, we do still seem to be running with an architecture that prioritizes anticipated experiences in its decision-making and depends on a personal identity component for that to be meaningful: the reason why it evolved this way will just be different.

Ok, that makes sense to me: we've got this personal identity thing, it may not be optimal, but then, neither is bipedal walking, and we're stuck with both.

I guess the next question is, how would we test whether this is true? Has anyone done it already?

The former

Can you write a model and run it, or is it all pure logic?

Just logic so far.

A question for the folks who voted this up: on a scale from "enjoyed reading this even though didn't feel like I really learned anything" to "fantastic, now I understand everything", how useful did this post feel to you?

Personally I felt this had several very important insights that only clicked properly together while I was writing it, such as how it's almost impossible to even imagine certain kinds of decision-making if we literally had no concept of personal identity, as well as the way that anticipated experience is treated separately from more abstract modeling in our brains. But judging from the relatively low score of the post and the fact that there's very little discussion of those insights in the comments, it looks like most folks didn't feel that they were important? (Or maybe they didn't agree with them, but in that case I would've expected more criticism.)

I'm unsatisfied with it as a finished product, but I like it as a start, and it got me thinking along interesting directions.

I felt like I gained one insight, which I attempted to summarize in my own words in this comment.

It also slightly brought into focus for me the distinction between "theoretical decision processes I can fantasize about implementing" and "decision processes I can implement in practice by making minor tweaks to my brain's software". The first set can include self-less models such as paperclip maximization or optimizing those branches where I win the lottery and ignoring the rest. It's possible that in the second set a notion of self just keeps bubbling up whatever you do.

One and a half insights is pretty good going, especially on a tough topic like this one. Because of inferential distance, what feels like 10 insights to you will feel like 1 insight to me - it's like you're supplying some of the missing pieces to your own jigsaw puzzle, but in my puzzle the pieces are a different shape.

So yeah, keep hacking away at the edges!

A question for the folks who voted this up: on a scale from "enjoyed reading this even though didn't feel like I really learned anything" to "fantastic, now I understand everything", how useful did this post feel to you?

Assigning the former 0 and the latter 10, I felt somewhere around 4. While all the points and arguments felt reasonable enough, I'm only somewhat persuaded that they're actually correct (so I picked up a bunch of new beliefs at like 40% confidence levels). The main shortcoming of this post in my view was that it felt like it lacked direction (consistent with your observation that you figured out some of the important insights while writing it) - the list of clues did not take me by the hand and lead me along a straight and narrow path to the conclusion. Instead, the clues meandered around, and then Clue 4 seemingly became the primary seed for the "summing up" section, despite not being foreshadowed very much before.

These are mostly writing structure complaints, but I think the main reason the post isn't higher scoring/more discussed is the writing structure, so that seems appropriate.

Speaking about the substance, I'm not persuaded that the model of reinforcement based learner with abstract model stuff is accurate. I find it hard to explain why exactly (which is part of the reason I haven't commented to say as much), but if I had to pick a reason, it would be that I don't think the messy evolved human reasoning can be meaningfully broken down into such categories. I would be more persuaded if the explanation was something like "but then it turned out that reinforcement learning was pretty good, but could be improved by imagining what reinforcements might come later and improving on those, but doing so well required imagining yourself in the future, which required understanding your current behaviour and identity". Which now that I read it is not so different from the two models thing, but is framed in a just-so story that's more appealing to me. (The approach I would personally use to dissolve personal identity is to try to figure out what exactly it is and what it does. What processes are improved by its existence and which ones could be carried on without it. I recall thinking at one point that it's probably there to help with thinking about thinking, but I haven't thought it through at any length, so I'm very far from confident in that.)

TL;DR: if you rewrote it with better structure, it would score higher and may persuade me (and probably others) better, even though maybe I should be persuaded already and am being silly.

Thanks for your feedback.

Speaking about the substance, I'm not persuaded that the model of reinforcement based learner with abstract model stuff is accurate. I find it hard to explain why exactly (which is part of the reason I haven't commented to say as much), but if I had to pick a reason, it would be that I don't think the messy evolved human reasoning can be meaningfully broken down into such categories.

Oh, I don't think that the underlying implementation would actually be anywhere near as clear-cut as the post described: I just gave a simplified version for the sake of clarity. The actual architecture is going to be a lot messier and the systems more overlapping.