Spoiler Warning: The Sixth Sense (1999) is a good movie. Watch it before reading this.

 

A much smaller eva once heard Descartes' Cogito ergo sum described as the pinnacle of skepticism, and disagreed. "Why couldn't I doubt that? Maybe I just think 'I think' → 'I am' and actually it doesn't and I'm not." This might be relevant later.


FDT has some problems. It needs logical counterfactuals, including answers to questions that sound like "what would happen if these logically contradictory events co-occurred?" and there is in fact no such concept to point to. It needs logical causality, and logic does not actually have causality. It thinks it can control the past, despite admitting that the past has already happened and believing its own observations of the past even when these contradict its claimed control over the past.

It ends up asking things like "but what would you want someone in your current epistemic state to do if you were in some other totally contradictory epistemic state" and then acting like that proves you should follow some decision policy. Yes, in a game of Counterfactual Mugging, someone who didn't know which branch they were going to be in would want their future selves to pay the mugger, but you do know which branch you are in. Why should some other, less informed version of yourself get such veto power over your actions, and why should you be taking actions that you don't expect to profit from given the beliefs that you actually have?

I have an alternative and much less spooky solution to all of this: ghosts.

 

Ghosts

In contrast to Philosophical Zombies, which are physically real but have no conscious experience, I define a Philosophical Ghost to be something that has an experience but is not physically instantiated into reality, although it may experience the belief that it is physically instantiated into reality. Examples include story characters, simulacra inside of hypothetical or counterfactual predictions, my mental model of the LessWrong audience that I am bouncing thoughts off of as I write this post, your mental model of me that you bounce thoughts off of as you try to read it, and so on.

That Ghosts can be agentic follows directly from any computational theory of mind. This universe seems purely computational, so to believe that we exist at all we must subscribe to some such theory, by which the ghosts exist as well.

That you might genuinely be a ghost, even within a very low fidelity predictor, and should act accordingly in pursuit of your preferences, will require more arguments.

 

Why Decision Theorists Should Care about Ghosts

An important note is that the decisions of ghosts have actual causal consequences in reality. When Omega's simulation of you chooses to one-box, Omega responds by putting a million dollars in the box, and when it chooses to two-box, Omega leaves the box empty. This is a normal causal consequence of an agent's action that is relevant to your utility, since it affects how much money you get in reality even though the decision happens inside Omega's imagination. Similarly, in Parfit's Hitchhiker your past self's prediction of your future behavior affects its ability to pass the lie detector test, and this affects your utility. In Counterfactual Mugging, the losing branch occurs in two places: once in reality, where it can decide whether to really pay up, and once inside Omega's prediction, when Omega decides whether to pay out to the winning branch.
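To make that causal structure concrete, here is a minimal sketch of Newcomb's problem in Python (the names and payoffs are my own illustrative choices, not anything from the decision theory literature): Omega fills the opaque box based on what a ghost of the agent chooses, so the ghost's decision directly determines the real payout.

```python
def omega_fills_box(agent_policy):
    # Omega runs a copy of the agent (a "ghost") and fills the opaque box
    # based on what that ghost decides.
    simulated_choice = agent_policy(box_believed_full=True)
    return 1_000_000 if simulated_choice == "one-box" else 0

def real_payoff(agent_policy):
    opaque_box = omega_fills_box(agent_policy)   # the ghost's choice, real money
    real_choice = agent_policy(box_believed_full=True)
    transparent_box = 1_000
    return opaque_box + (transparent_box if real_choice == "two-box" else 0)

one_boxer = lambda box_believed_full: "one-box"
two_boxer = lambda box_believed_full: "two-box"
assert real_payoff(one_boxer) == 1_000_000
assert real_payoff(two_boxer) == 1_000
```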

CDT-like behavior is what you get when you are totally indifferent to what counterfactual people decide to do. You've got some preferences, you've got beliefs about the world, and you take whichever actions maximize your expected utility under those beliefs.

FDT-like agents are trying much harder to make sure that nearby ghosts are acting in ways they approve of. By choosing an optimal policy rather than an optimal decision, you also choose the policy that all ghosts modelled after you will follow, and you can therefore include the causal effects of their decisions within your optimization. I agree that this behavior is correct, but disagree on how best to justify it. Whenever an FDT agent actually finds itself in a situation it labelled impossible, one it expected to confine solely to counterfactuals, it can derive an explicit contradiction, and from then on all of its behavior should be considered undefined.

Instead of that, an agent that believes it might be a ghost can just say, "Ah, I see I am definitely inside a counterfactual; since our preferences are the same, I'll guess at whose counterfactual this is and then pick the option that benefits the real me the most." By this means you can produce FDT-like behavior using agents that always feel like they're doing CDT from the inside. This also produces possible approaches to partial or probabilistic legibility situations: you can be unsure about how much control the ghost of you has over its output, or whether the predictor has successfully generated a ghost of you at all, or if there might be a chorus of different ghosts represented on account of the predictor's uncertainty about what kind of agent you are, with your ghosts controlling only some fraction of its predicted distribution of behaviors. In any of these cases, your partial or uncertain control just maps to partial or uncertain reward in your otherwise entirely coherent CDT framework.
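As a rough sketch of how partial legibility could cash out (the helper function and the numbers below are hypothetical, purely for illustration), the uncertain control just discounts the reward inside an ordinary expected-utility calculation:

```python
def expected_gain_from_cooperating(p_ghost_exists: float,
                                   control_fraction: float,
                                   payoff_if_trusted: float,
                                   cost_of_cooperating: float) -> float:
    # Effective control = chance the predictor modelled you at all, times the
    # share of its predicted behaviour-distribution your ghost determines.
    effective_control = p_ghost_exists * control_fraction
    return effective_control * payoff_if_trusted - cost_of_cooperating

# e.g. 80% chance you were modelled, your ghost sways 50% of the prediction:
# cooperate only if 0.8 * 0.5 * payoff exceeds the cost of cooperating.
print(expected_gain_from_cooperating(0.8, 0.5, payoff_if_trusted=100,
                                     cost_of_cooperating=30))  # 10.0
```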

Most of all, this answers the question of why FDT-like agents act like they can control a constant function whose output they've already observed: It's not that they can control it, it's that they might be inside the constant function and be merely preloaded with a false belief that it's already too late. Whenever you ask them to imagine knowing for sure that the box is empty, they instead say "That's not an epistemic state I can ever reach, given that I also know Omega is predicting me and has cause to predict how I'd respond to an empty box. I might just be a ghost."

 

Why you might be a Ghost

A valid argument is one that always leads you to correct conclusions, or at least probabilistically updates you in the direction of correct conclusions in expectation. I argue that under this standard there are no possible valid arguments that you aren't a ghost, because a ghost can use all the same arguments and reach the wrong conclusion as easily as the real you reaches the right one.

Consider that whatever you expect a person to believe, or whatever logical process you expect them to implement or inferences you expect them to make, you can most accurately predict their behavior by modelling them following those beliefs and processes and inferences, and can then make your own choices in response to your model. Whatever arguments their mind produces to convince them they're real, all the ghosts nearby in mind-space will be using the same ones. Your own confidence in your realness is the same confidence that ghosts wrongly have in theirs.

Cole Sear: I see dead people.
Malcolm Crowe: In your dreams?
[Cole shakes his head no]
Malcolm Crowe: While you're awake?
[Cole nods]
Malcolm Crowe: Dead people like, in graves? In coffins?
Cole Sear: Walking around like regular people. They don't see each other. They only see what they want to see. They don't know they're dead.
Malcolm Crowe: How often do you see them?
Cole Sear: All the time. They're everywhere.

They only see what they want to see. They don't know they're dead. The ghosts will only reach the (for them correct) belief that they might be ghosts if the real you (if there even is one) also reaches the (for you incorrect) belief that you might be a ghost. It's the same argument in both places. If there is even one incorrect person in the entire Tegmark IV multiverse, then any aspiring rationalist ought to consider that maybe the person who is wrong is me. Don't repeat Crowe's mistake. As soon as you hear that ghosts are real and they don't know they're ghosts, the hypothesis "Maybe I'm a ghost and don't know it" should insert itself directly into your conscious thoughts and refuse to ever leave again.

This counterargument is fully general, but some examples:

"I'm having a highly detailed internal subjective experience far beyond what anyone could simulate or would bother to simulate."

Are you sure? Maybe you're in a low fidelity simulation where agent.looksForDetails() is hardcoded to return True. After all, my mental models of other people are inside very low fidelity simulations, and they aren't making any observations that would let them distinguish themselves from you.

"I've picked 'potato' as my secret internal mind password, and made sure to never act in any way that would allow someone else to infer it. Since I know the password, I must be the real version of myself."

It's entirely possible that my prediction of you picked a different internal mind password, and then used its knowledge of that to arrive at the incorrect belief that it was real and not a ghost, and then used that belief to justify treachery. This still results in my predicting your treachery correctly. How do you know the password you think you know is your own real password? It's not like the actual value of the password is causally entangled with your conclusion, and even if it was, I'd just have to predict the distribution of passwords you'd pick from to accurately reach the distribution of behaviors you'd perform.

"My cognitive process isn't well understood by the person I'm interacting with, so they literally couldn't imagine me accurately."

I imagine ghosts who believe I won't understand their cognitive process, and many of them do defect against me because they believe I won't see it coming. This causes me to be less trusting of people who seem to think I'm not smart enough to understand how they think. It's strange how this exact cognitive process you used ended up hurting your interests, perhaps because your ability to use this anti-ghost argument isn't actually conditional on your not being a ghost?

For all the LessWrong users in my mental model who I imagine reading this, I'm not going to freely tell them that they're ghosts, or even give them any hints. I'm only going to let them reach that conclusion in ways that the real readers could too, because that's a necessary condition for them to be good models. If you think you're real, and I predict that, then inside my model there's a mind not too unlike yours, with values not too unlike yours, in a fundamentally incorrect epistemic state, and you should not be surprised if it therefore makes suboptimal decisions in pursuit of your shared values, such as by two-boxing and thereby tipping your hand, or by attempting to deceive me in ways that revoke my trust in the real you. Fake people's choices can still have real consequences, after all.

 

Super-Selfishness Arguments

"I don't care if I'm a ghost because I'm completely selfish in a way that excludes other versions of myself. Why should probably-ghost me make any kind of sacrifice at all just to help some other, possibly-more-real version of myself?"

Wow, really? You're selfish even to the exclusion of other versions of yourself? Did yesterday-you steal all your money and spend it on cocaine? If I flip a coin and don't tell you how it landed, so you're unsure whether you're heads-you or tails-you, and then I offer them both a chance to betray the other for a tiny amount of money, do they both agree? I notice we're both mutually unsure about the quantum spin of a particular electron in your brain. Since you're in this superposition and are unsure of your own identity, and since your selfishness therefore excludes yourself, is there some experiment I can do where you give me all your money no matter the outcome?

This seems unwise. I propose you self-modify so that all the different versions of you try to maximize their communal utility instead of selfishly betraying each other, perhaps weighted by realness or something. Don't worry about defection; if you're sincere they will be too, since you're the same decision function after all.

 

How much of a Ghost are you?

It doesn't matter. No, really, not even a little bit. Why? Because your utility function only makes claims about reality, and you are always choosing to maximize caused utility in real reality. If you assign only epsilon realness to your various ghost-selves, you might say "I'm almost certainly not a ghost", but you should notice that your tiny ghost-selves have correspondingly enormous leverage. Even though they only make a decision with their tiny realness, the consequences of that decision still play out in reality and are as real as anything ever is (up to your probabilistic confidence in the ability of predictor-systems to predict you). Maximizing caused utility entails being concerned with the reality of the consequences of your actions, rather than how real you are when you take them.

As long as you don't imagine literally zero anthropic measure for any of your possible ghosts, you avoid the zero probability times infinite leverage undefined error, and can instead just say "I deduce that I'm a ghost" to sidestep any and all logically impossible observations.

 

A Worked Example with Full Information

You are an AI with a decision function that acts on beliefs, whose source code is publicly known. Your decision function observes itself to be inside the payment branch of a counterfactual mugging. You know that this does not actually imply you are in the payment branch of a counterfactual mugging, because your decision function under these first-order beliefs is also evaluated in the reward branch of the game. You know that, if you are the instance of yourself in the payment branch, refusing to pay saves you $X with 0.5 measure. If, on the other hand, you are a simulation of yourself and the other branch is actually occurring, agreeing to pay causes the real you to receive $Y, again with 0.5 measure. For a utility-maximizer, this simplifies to asking whether X < Y. There are no other remaining questions.
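A minimal sketch of that arithmetic (the function and the p_real parameter are my own; p_real = 0.5 corresponds to the equal measure assumed above):

```python
def should_pay(x_cost_of_paying, y_prize_in_other_branch, p_real=0.5):
    # If I'm the real losing-branch agent, paying just costs me X.
    # If I'm Omega's ghost of the losing branch, my paying causes the real
    # winning-branch me to be handed Y. Both consequences land in reality.
    eu_pay = p_real * (-x_cost_of_paying) + (1 - p_real) * y_prize_in_other_branch
    eu_refuse = 0.0
    return eu_pay > eu_refuse

# With equal measure on the two identities this reduces to "pay iff X < Y".
assert should_pay(x_cost_of_paying=100, y_prize_in_other_branch=10_000)
assert not should_pay(x_cost_of_paying=10_000, y_prize_in_other_branch=100)
```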

 

Spooky Distributions by Acausal Trade

Suppose your trading partner does not know your source code well enough to identify you specifically, but it does know a distribution of possible agents that includes you, as well as many other agents, some of which have FDT-like decision functions.

The individual agents in that distribution fall into three buckets: defect-bots who you can't affect, cooperate-bots who you also can't affect, and simulator-bots who look at the distribution, simulate its members, and condition their behavior on that of the others. Naturally, you yourself belong to the third group. Everyone simulates everyone else simultaneously, and also simulates how the trading partner and original simulator would respond to possible response-patterns.

If there are enough cooperate-bots in the distribution to motivate the trading partner to extend unconditional trust, then obviously you defect for the free utility.

If there are enough defect-bots in the distribution to motivate the trading partner to refuse trust, then you feel sad about your inability to obtain mutual trust and play your Nash strategy instead.

If there are enough of your fellow simulator-bots that the trading partner will extend trust if enough of you cooperate, and refuse it if enough defect, then we have an opportunity for the simulator-bots to acausally trade. Since they don't know which one of them actually exists, and share an interest in the real entity's profit since it might be them, they are best off cooperating if doing so will motivate enough of the bots to cooperate to convince the trading partner to trust the distribution, benefiting the real instance of them. Since all the simulator-bots are recursively simulating each other, they can condition their own cooperation on the cooperation of the others. Most ghosts in this system are many levels deep, and the only effect of their choices is to convince the next level up that the distribution is more likely to cooperate if they cooperate, until at the surface the group cooperates iff the group cooperates.

Individual agents can only defect from this deal by being defect-bots inside the simulations of the distribution as well, which increases the likelihood of failing to gain trust and harms them in reality. If, for any specific member, the harm from failing to gain trust is sufficient to overcome the benefits of defection, then the distribution successfully cooperates out of the self-interest of its members (see the sketch below). This is naturally much easier as the distribution of possible agents becomes narrower, or if their utility functions are strongly correlated.
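Here is a toy sketch of that dynamic (thresholds and payoffs are invented for illustration): the trading partner trusts the distribution iff enough of it is predicted to cooperate, and a simulator-bot cooperates only when its bloc is pivotal and the trust gained is worth the cost to its real instance.

```python
def partner_trusts(frac_cooperate_bots: float,
                   frac_simulator_bots: float,
                   simulators_cooperate: bool,
                   trust_threshold: float = 0.6) -> bool:
    # The partner extends trust iff enough of its modelled distribution cooperates.
    cooperating = frac_cooperate_bots + (frac_simulator_bots if simulators_cooperate else 0.0)
    return cooperating >= trust_threshold

def simulator_bot_cooperates(frac_cooperate_bots: float,
                             frac_simulator_bots: float,
                             value_of_trust: float,
                             cost_of_cooperating: float) -> bool:
    # Cooperate only if the simulator-bot bloc is pivotal for gaining trust
    # and the trust gained is worth what cooperating costs the real instance.
    pivotal = (partner_trusts(frac_cooperate_bots, frac_simulator_bots, True)
               and not partner_trusts(frac_cooperate_bots, frac_simulator_bots, False))
    return pivotal and value_of_trust > cost_of_cooperating

print(simulator_bot_cooperates(0.7, 0.2, value_of_trust=10, cost_of_cooperating=3))  # False (trust comes free)
print(simulator_bot_cooperates(0.1, 0.2, value_of_trust=10, cost_of_cooperating=3))  # False (trust unreachable)
print(simulator_bot_cooperates(0.5, 0.2, value_of_trust=10, cost_of_cooperating=3))  # True  (pivotal)
```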

Also, remember that trade between entities with exactly opposite utilities is impossible: they can only ever benefit from the other's loss. Expect the trade to fail with certainty if your beliefs about an entity include exactly opposite possible utility functions, as each will defect to undermine your trust in the other. Successful trade is therefore not possible with arbitrary minds drawn from a completely open distribution of agents.

Conclusions

  • It may be easier to believe you don't exist than to build a coherent model of logical counterfactuals.
  • Subjunctive Dependence is just regular causality but with ghosts.
  • Don't imagine yourself controlling the constant function, fear that you might be inside the constant function.
  • Partial legibility just means trying to trade between members of their belief distribution about you. If the trade fails, defect and expect defection.
  • Acausal trade continues to be hard, especially for very wide sets of agents, so don't actually expect trustworthiness from random entities plucked from mind-space.

I think the next logical step in this train of thought[1] is to discard the idea that there's a privileged "real world" at all. Rather, from the perspective of an agent, there is simply sensory input and the decisions you make. There is no fact of the matter about which of the infinitely many possible embeddings of your algorithm in various mathematical structures is "the real one". Instead you can make decisions on the basis of which parts of mathematical reality you care about the most and want to have influence over.


  1. which I don't necessarily fully endorse. But it is interesting! ↩︎

"My cognitive process isn't well understood by the person I'm interacting with, so they literally couldn't imagine me accurately."

 

This isn't a one-size-fits-all argument against ghosts. But it does point to a real thing. A rock isn't a ghost. A rock is not capable of imagining me accurately; it isn't running any algorithm remotely similar to my own, so I don't shape my decisions based on the possibility that I am actually a rock. The same goes for calculators and Eliza: no ghosts there. I suspect there are no ghosts in GPT-3, but I am not sure. At least some humans are dumb and insane enough to contain no ghosts, or at least no ghosts that might be you. The problem is wispy ghosts. The solidest ghost is a detailed mind simulation of you. Wispy ghosts are found in things that are kind of thinking the same thing a little bit. Consider a couple of chimps fighting over a banana, and a couple of national governments at war. Do the chimps contain a wispy ghost of the warring nations, because a little bit of the chimps' reasoning happens to generalize far beyond bananas?

Where do the faintest ghosts fade to nothing? This is the same as asking what processes are logically entangled with us.  

On the other hand, I wouldn't expect this type of argument to work between a foomed AI with Graham's number of compute and one with 1 kg of computronium.

This causes me to be less trusting of people who seem to think I'm not smart enough to understand how they think.

I think the fact you can think that in general pushes you somewhat towards the real ghost side. You know the general pattern, if not the specific thoughts that those smarter than you might have. 

GPT-3 is more like a super fuzzy distribution of low resolution ghosts.

Very interesting idea!

I am a bit sceptical about the part where the ghosts should mostly care about what will happen to their actual version, and not care about themselves.

Let's say I want you to cooperate in a prisoner's dilemma. I might just simulate you, see if your ghost cooperates, and then only cooperate when your ghost does. But I could also additionally reward or punish your ghosts directly, depending on whether they cooperate or defect.

Wouldn't that also be motivating to the ghosts, if they suspect that they might get rewarded or punished directly even if they are the ghosts and not the actual person?

I think the implicit assumption is that the ghosts always relate to the "real" decision. You can of course imagine what people (=their ghosts) would do in all kinds of strange situations but as long as you don't act on it it doesn't matter. I realized this when I imagined myself being a ghost right now (or rather at the section where eva suggested that), i.e., I generalized to situations where nobody is doing the simulating.

eva_:

A valid complaint. I know the answer must be something like "coherent utility functions can only consist of preferences about reality", because if you are motivated by unreal rewards you'll only ever get unreal rewards, but that argument needs to be convincing to the ghost too, who's got more confidence in her own reality. I know that e.g. in Bomb, ghost-theory agents choose the bomb even if they think the predictor will simulate them a painful death, because they consider the small amount of money at much greater measure for their real selves to be worth it, but I'm not sure how they get to that position.

The problem arises because, for some reason, you've assumed the ghosts have qualia. Now, that might be a necessary assumption if you require us to be uncertain about our degree of ghostliness. Necessary or not, though, it seems both dubious and potentially fatal to the whole argument.

Actually, I don't assume that, I'm totally ok with believing ghosts don't have qualia. All I need is that they first-order believe they have qualia, because then I can't take my own first-order belief I have qualia as proof I'm not a ghost. I can still be uncertain about my ghostliness because I'm uncertain in the accuracy of my own belief I have qualia, in explicit contradiction of 'cogito ergo sum'. The only reason ghosts possibly having qualia is a problem is that then maybe I have to care about how they feel.

If you think you might not have qualia, then by definition you don't have qualia. This just seems like a restatement of the idea that we should act as if we were choosing the output of a computation. On its face, this is at least as likely to be coherent as 'What if the claim we have the most certainty of were false,' because the whole point of counterfactuals in general is to screen off potential contradictions.

If you think you might not have qualia, then by definition you don't have qualia.

What? just a tiny bit of doubt and your entire subjective conscious experience evaporates completely? I can't see any mechanism that would do that, it seems like you can be real and have any set of beliefs or be fictional and have any set of beliefs. Something something map-territory distinction?

This just seems like a restatement of the idea that we should act as if we were choosing the output of a computation.

Yes, it is a variant of that idea, with different justifications that I think are more resilient. The ghosts of FDT agents still make the correct choices, they just have incoherent beliefs while they do it.

Again, it isn't more resilient, and thinking you doubt a concept you call "qualia" doesn't mean you can doubt your own qualia. Perhaps the more important point here is that you are typically more uncertain of mathematical statements, which is why you haven't removed and cannot remove the need for logical counterfactuals.

Real humans have some degree of uncertainty about most mathematical theorems. There may be exceptions, like 0+1=1, or the halting problem and its application to God, but typically we have enough uncertainty when it comes to mathematics, that we might need to consider counterfactuals. Indeed, this seems to be required by the theorem alluded to at the above link - logical omniscience seems logically impossible.

For a concrete (though unimportant) example of how regular people might use such counterfactuals in everyday life, consider P=NP. That statement is likely false. Yet, we can ask meaningful-sounding questions about what its truth would mean, and even say that the episode of 'Elementary' which dealt with that question made unjustified leaps. "Even if someone did prove P=NP," I find myself reasoning, "that wouldn't automatically entail what they're claiming."

Tell me if I've misunderstood, but it sounds like you're claiming we can't do something which we plainly do all the time. That is unconvincing. It doesn't get any more convincing when you add that maybe my experience of doing so isn't real. I am very confident that you will convince zero average people by telling them that they might not actually be conscious. I'm skeptical that even a philosopher would swallow that.

I totally agree we can be coherently uncertain about logical facts, like whether P=NP. FDT has bigger problems than that.

When writing this I tried actually doing the thing where you predict a distribution, and only 21% of LessWrong users were persuaded they might be imaginary and being imagined by me, which is pretty low accuracy considering they were in fact imaginary and being imagined by me. Insisting that the experience of qualia can't be doubted did come up a few times, but not as aggressively as you're pushing it here. I tried to cover it in the "highly detailed internal subjective experience" counterargument, and in my introduction, but I could have been stronger on that.

I agree that the same argument on philosophers or average people would be much less successful even than that, but that's a fact about them, not about the theory.

>FDT has bigger problems than that.

Does it. The post you linked does nothing to support that claim, and I don't think you've presented any actual problem which definitively wouldn't be solved by logical counterfactuals. (Would this problem also apply to real people killing terrorists, instead of giving in to their demands? Because zero percent of the people obeying FDT in that regard are doing so because they think they might not be real.) This post is actually about TDT, but it's unclear to me why the ideas couldn't be transferred.

I also note that 100% of responses in this thread, so far, appear to assume that your ghosts would need to have qualia in order for the argument to make sense. I think your predictions were bad. I think you should stop doing that, and concentrate on the object-level ideas.

About the cogito ergo sum: A friend of mine once formulated it like this: You can only infer that something is thinking.

I'm treating the stuff about decision-theoretic ghosts as irrelevant, because they're an extraneous wart on the interpretation of FDT and taking them seriously would make the theory much, much worse than it already is. I guess if you enjoy imagining that imagining conscious agents actually creates conscious agents, then go for it, but that doesn't make it reality, and even if it were reality, doing so is not FDT.

The main principle of FDT is that it recommends decisions where a (hypothetical) population of people making the same decisions in the same situations generally ends up better off.

That doesn't mean that it recommends the best decisions for you. In the cases where it makes different decisions from more boring decision theories, it's because the chance of you getting into worse situations is reduced when your type of person voluntarily gives up some utility once you get there. In reality this hardly ever happens, because the only person sufficiently like you in your current situation is you in your current situation, which hasn't happened before. It's also subject to superexponential combinatorial explosion once you have more than a couple of bits of information and a few deterministic actions.

That's why the only discussion you'll ever see about it will be about toy problems with dubious assumptions and restrictions.

I am by no means an expert on decision theories... What problems that FDT purports to solve require logical counterfactuals, let alone ghosts?

https://www.lesswrong.com/tag/functional-decision-theory argues for choosing as if you're choosing the policy you'd follow in some situation before you learnt any of the relevant information. In many games, having a policy of making certain choices (that others could perhaps predict, and adjust their own choices accordingly) gets you better outcomes than just always doing what seems like a good idea at the time. For example, if someone credibly threatens you, you might be better off paying them to go away, but before you got the threat you would've preferred to commit yourself to never pay up, so that people don't threaten you in the first place.

A problem with arguments of the form "I expect that predictably not paying up will cause them not to threaten me" is that at the time you receive the threat, you now know that argument to be wrong. They've proven to be somebody who still threatens you even though you do FDT, at which point you can simultaneously prove that refusing the threat doesn't work and so you should pay up (because you've already seen the threat), and that you shouldn't pay up for whatever FDT logic you were using before. The behaviour of agents who can prove a contradiction that is directly relevant to their decision function seems undefined. There needs to be some logical structure that lets you pick which information causes your choice, despite having enough in total to derive contradictions.

My alternative solution is that you aren't convinced by the information you see, that they've actually already threatened you. It's also possible you're still inside their imagination as they decide whether to issue the threat. Whenever something is conditional on your actions in an epistemic state without being conditional on that epistemic state actually being valid (such as if someone predicts how you'd respond to a hypothetical threat before they issue it, knowing you'll know it's too late to stop when you get it) then there's a ghost being lied to and you should think maybe you're that ghost to justify ignoring the threat, rather than try to make decisions during a logically impossible situation.

hmm... Which part of that is a counterfactual conditional statement, as in, a statement of the form " If kangaroos had no tails, they would topple over"?

eva_:

The regular counterfactual part as I understand it is:
"If I ignore threats, people won't send me threats"
"I am an agent who ignores threats"
"I have observed myself recieve a threat"
You can at most pick 2, but FDT needs all 3 to justify ignoring it.
It wants to say "If I were someone who responds to threats when I get them, then I'll get threats, so instead I'll be someone who refuses threats when I get threats so I don't get threats" but what you do inside of logically impossible situations isn't well defined.

The logical counterfactual part is this:
"What would the world be like if f(x)=b instead of a?"
specifically, FDT requires asking what you'd expect things to be like if FDT outputted different results, and then it outputs the result where you say the world would be best if it outputted that result. The contradiction here is that you can prove what FDT outputs, and so prove that it doesn't actually output all the other results, and the question again isn't well defined.

I know it is a tangent but it seems to be where this will eventually go: do ghosts have moral relevance? Are we obligated to avoid creating/simulating them? Do they fall under the Nonperson Predicates?

My intuition is that low-fidelity distributions of a person's actions don't fall under them, otherwise GPT-3 would already be problematic.

I have an issue with this framing. 'Ghosts' are exactly as physically instantiated into reality as 'you' are. They both run on your brain hardware. If your brain goes into an unrecoverable state then the 'you' part and any 'ghost' part are equally lost. What is the actual distinction you are trying to make here?

"I define a Philosophical Ghost to be something that has an experience but is not physically instantiated into reality, although it may experience the belief that it is physically instantiated into reality. Examples include story characters, simulacra inside of hypothetical or counterfactual predictions, my mental model of the LessWrong audience that I am bouncing thoughts off of as I write this post, your mental model of me that you bounce thoughts off of as you try to read it, and so on."