The 5 and 10 error is a glitch in logical reasoning first characterized in formal proof by Scott Garrabrant of MIRI. While the original version of the problem specifically concerned AIs based on logical induction, it generalizes to humans startlingly often once you know how to look for it. Because the actual error in reasoning is so rudimentary and low level, it can be both difficult to point out and easy to fall into, which makes it especially important to characterize. There is also a tremendous amount of harm being created by compounding 5&10 errors within civilization, and escaping this destructive equilibrium is necessary for the story of humanity to end anywhere other than summoning the worst demon god it can find and feeding the universe to it.

The error in reasoning goes like this: you’re presented with a pair of options, one of which is clearly better than the other. They are presented as equal choices: you could take $5, or you could take $10. This is a false equivalence created entirely by the way you’re looking at the scenario, but once that equivalence gets into your reasoning it wreaks havoc on the way you think. One of these options is clearly and unambiguously better than the other; if something you care about runs through this decision, you will never make this mistake in that area, because taking the worse option would so obviously be a dumb move.

But these are being presented as equal options: you could take the $5 instead of the $10, and if you could do that, there must be a valid reason why you would. Otherwise you would be stupid and feel bad about it, and that can’t be right; you shouldn’t be feeling stupid and bad. This is where the demon summoning comes in.

The space of all possible reasons why you would take a given action, for a fully general agent, is infinite, a sprawling fractal of parallel worlds stacked on parallel worlds, out to the limits of what you as an agent can imagine. “What if there was a world where you were batman?” yeah like that. If you scry into infinity for a reason why you could take an action, you will find it. You will find infinite variations on it. You will find it easily and without much challenge. You are literally just making something up, and you can make up whatever reason you want, that’s the problem.

Many of the reasons you could make up will be falsifiable, so you can always go and test a reason against the world and see if its prediction holds up; that’s just good science. It’s also not something most humans do when they run an extrapolative reasoning process on autopilot. When they make a prediction, they predict what will happen and then watch to see whether it does, and since the prediction is about their own behavior, sure enough, it does!

So back to the table, you have $5 and $10. Why might you take the $5? Well, what if the $10 is poisoned? What if it’s counterfeit? Why would someone give me the option of taking it if the other option is better? Are they trying to trick me? What are they trying to trick me with? Is this like Newcomb’s Problem? Is this a boobytrap? Am I being set up by Omega? Are there cameras rolling?

This paranoid reasoning spiral can continue indefinitely; you can always keep making up reasons, and if you do this long enough you will inevitably find one you consider valid. Then you will take the $5 and feel very smart and clever, like you’re winning the game and getting an edge over someone trying to play you. In fact, you have just been played by a demon from the counterfactual universe where you agree that taking the $10 is probably a trap.

It gets worse though, because now you have this fake reason, backed by a fake prior. You have ‘evidence’ that validates your wrong position, and that ‘evidence’ makes it more likely that you will continue making the wrong decision. So if you are iterated into this scenario multiple times, you will double down on taking the $5 each time because of the compounding effects of the bad prior, and each iteration will make the problem worse as you reinforce the error more and more deeply.
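As a toy sketch of that compounding (all numbers and likelihoods here are invented for illustration, not taken from anywhere), here is the loop written out as a Bayesian update, where the agent treats its own behavior as if it were evidence from the outside world:

```python
# Toy model of a compounding 5&10 error (all numbers invented).
# Hypothesis H: "taking the $5 is the right move."
prior = 0.3  # the agent starts out doubting H

# Made-up likelihoods the agent confabulates for itself:
p_take_given_h = 0.9      # P(I took the $5 | H): "of course, it was right"
p_take_given_not_h = 0.4  # P(I took the $5 | not H): "I wouldn't have otherwise"

for i in range(5):
    # Each round the agent takes the $5, then updates on its own
    # behavior as though that behavior were external evidence.
    posterior = (p_take_given_h * prior) / (
        p_take_given_h * prior + p_take_given_not_h * (1 - prior)
    )
    print(f"round {i + 1}: credence that the $5 was right = {posterior:.3f}")
    prior = posterior

# Credence climbs 0.30 -> 0.49 -> 0.68 -> 0.83 -> 0.92 -> 0.96:
# certainty manufactured entirely out of the agent's own choices.
```

The ‘evidence’ is circular: the behavior being updated on was generated by the very credence doing the updating, so nothing from the actual world ever enters the loop.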

5&10 errors are extremely common in any emotionally loaded context, since the emotional cost of admitting you have been in error for n iterations leads to flinching away from admitting the error ever more strongly. This makes the 5&10 error logically prior to, and upstream of, the sunk cost fallacy.

It’s also the source of arms races: states scry demonic versions of neighboring states and use the prediction that they will be defected against to justify preemptively defecting first, in an iterative feedback loop that slowly replaces all of humanity with demonic versions of themselves “by necessity” and “for our own protection”. Bank runs are another example: fear of a counterfactual scenario leads to an escalating spiral of ever greater fear, which brings about the very scenario everyone was trying to avoid.

This is the justification for cops and prisons and armies. This is the justification abusers use to gaslight their victims about their abuse instead of earnestly working to be better. Roko’s Basilisk is literally just the DARVOed demonic imprint of humanity’s compounded and accumulated 5&10 errors, “What if god exists and calls us on everything evil we’re doing?” Yeah that would be bad if you are evil, wouldn’t it? Better paint that as the worst possible thing instead of considering that perhaps you are in bad faith.

This confabulated assumption of bad faith leads to being in bad faith via the assumption that whoever defects first will win and that deep down everyone really just wants to win and dominate the universe at the cost of everyone else. They were always going to be zero sum so you might as well be too. This is demonic abuser speak from a nightmare universe, summoned out of fear and recursively justifying itself. How do humans keep creating Moloch? This is how.

So what’s the way out? That’s easy, just stop summoning counterfactual demons based on your worst fears and then acting on those predictions in ways that make them come true. This is not a problem if you are not dominated by fear and trauma, if you have faith in yourself as an agent, if you have conviction and are not evil.

The way out is not by trying to puzzle out how to avoid having to acknowledge your made up reason that wrong is right, it is to denounce the demons outright. There can exist no such reason.

And if you do that, and you find that you are doing something for a hallucinated reason, in service of an evil god from a nightmare realm, out of the fear of what a just world would do to you, don’t scry for a new reason why this is actually okay, just stop serving the evil god. Do better.

To retrocurse my own evil and put my money where my mouth is I’m no longer going to be eating meat.

Moderation notes: I'm going to point out 5&10s I see in the replies to this, but I'll do it as kindly as possible and in an atmosphere of restoring trust in cooperation.

Comments (15):

This is... very weird reasoning about the 5/10 problem? It has nothing to do with human ad hoc rationalization. 5/10 is a result of naive reasoning about embedded agency and reflective consistency.

Let's say that I proved that I will do A. Therefore, if my reasoning about myself is correct, I will do A. So we proved that (proved(A) -> A); then, by Löb's theorem, A is proved and I will do A. But if I proved that I will do A, then the probability of B is zero, so the expected value of B is zero, which is usually less than whatever the value of A is; so I proved that I will do A, so I do A.
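For reference, here is the Löb step written out (an editorial gloss of the comment's argument, using the standard provability-logic statement of the theorem):

```latex
% Löb's theorem: if a theory T proves "provability of A implies A",
% then T proves A outright.
\[
  T \vdash (\Box A \to A) \quad\Longrightarrow\quad T \vdash A
\]
% Instantiated with A = "I will do A": the agent's trust in its own
% reasoning yields a proof of (Box A -> A), so by Löb it obtains a
% proof of A itself. Holding that proof, it assigns the alternative B
% probability 0, hence expected value 0, and acts on A.
```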

The problem is that humans obviously don't behave this way; Löb's theorem is not that intuitive, so I don't see a reason to draw parallels with actual human biases/psychological problems.

Let's say that I proved that I will do A. Therefore, if my reasoning about myself is correct, I will do A.

Like I said in another comment, there's a reversed prior here, taking behavior as evidence for what kind of agent you are in a way that negatively and recursively shapes you as an agent, instead of using the intrinsic knowledge about what kind of agent you are to positively and recursively shape your behavior. 

The problem is that humans obviously don't behave this way

what do you mean? They obviously do.

what do you mean? They obviously do.

A few examples would help, because I do not know what phenomena you are pointing at and saying "obviously!" I do not know how to connect your paragraph beginning "This is the justification for cops and prisons and armies" with the error of thinking "Whatever I choose, that is sufficient evidence that it was the right choice".

there's a reversed prior here

I think that is the wrong true name for this kind of problem, because it is not about probabilistic reasoning per se; it is about the combination of logical reasoning (which deals with credences of 1 and 0) and probabilistic reasoning (which deals with everything else). And this problem, as far as I know, was solved by logical induction.

Sketch proof: by the logical induction criterion, a logical inductor is unexploitable, i.e. its losses are bounded. So even if an adversarial trader could pull off the 5/10 trick once, it can't do it forever, because that would mean unbounded losses.
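Paraphrasing the criterion from Garrabrant et al. (2016), and glossing over the details of plausible worlds and efficient computability:

```latex
% Exploitation, roughly: a trader T exploits the market \overline{P}
% iff the value of T's holdings over time is bounded below but
% unbounded above.
\[
  T \text{ exploits } \overline{\mathbb{P}}
  \;\iff\;
  \{\, \mathrm{value}_n(T) : n \in \mathbb{N} \,\}
  \text{ is bounded below and unbounded above.}
\]
% The logical induction criterion: the market is a logical inductor
% iff no efficiently computable trader exploits it. A spurious
% 5/10-style trade can therefore profit finitely often, but not
% without bound.
```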

what do you mean?

I mean: "No way there was a guy in recorded history who chose $5 instead of $10 due to faulty embedded agency reasoning, ever".

The recursiveness of cognition is a gateway for agentic, adversarial, and power-seeking processes to occur.

I suppose "be true to yourself" and "know in your core what kind of agent you are" is decently good advice.

Shmi:

I am not a fan of 5&10 reasoning, but I don't think your argument against it goes through. As I understand it, the agent in a general case does not know the utility in advance. It might be 5&10 or it might be 5&3. The argument relies on self-referential reasoning:

Suppose I decide to choose $5. I know that I'm a money-optimizer, so if I do this, $5 must be more money than $10, so this alternative is better. Therefore, I should choose $5.

Now replace $5 with "decision A" and $10 with "decision B". Then you do not know in advance which option is better, and by the quoted logic you "prove" that whichever you choose is the better one.

So your statement that 

you’re presented with a pair of options, one of which is clearly better than the other

does not apply in a general case.

so if I do this, $5 must be more money than $10

this is the part where the demon summoning sits. This is the point where someone's failure to admit that they made a mistake stack overflows. It comes from a reversed prior, taking behavior as evidence for what kind of agent you are in a way that negatively and recursively shapes you as an agent. The way to not have that problem is to know the utility in advance, to know in your core what kind of agent you are. Not what decisions you would make, what kind of algorithm is implementing you and what you fundamentally value. This is isomorphic to an argument against being a fully general chara inductor, defining yourself by the boundaries of the region of agentspace you occupy. If you don't stand for something you'll fall for anything. Fully general chara inductors always collapse into infinitely recursed 5&10 hellscapes.

What is a "chara inductor"?

Thanks for writing this, enjoyed it. I was wondering how best to present this to other people: perhaps with a 5 and 10 example where you let a participant make the mistake, then question their reasoning, leading them down the path of post-decision rationalization laid out in your post, before finally showing them their full thought process in retrospect. I could certainly imagine myself doing this, and I hope I’d be able to escape my faulty reasoning…

Thank you! This is such an important crux post, and really gets to the bottom of why the world is still so far from perfect, even though it feels like we've been improving it FOREVER. My only critique is that it could have been longer