# Re-understanding Robin Hanson’s “Pre-Rationality”

I’ve read Robin’s paper “Uncommon Priors Require Origin Disputes” several times over the years, and I’ve always struggled to understand it. Each time I would think that I did, but then I would forget my understanding, and some months or years later, find myself being puzzled by it all over again. So this time I’m going to write down my newly re-acquired understanding, which will let others check that it is correct, and maybe help people (including my future selves) who are interested in Robin's idea but find the paper hard to understand.

Here’s the paper’s abstract, in case you aren’t already familiar with it.

In standard belief models, priors are always common knowledge. This prevents such models from representing agents’ probabilistic beliefs about the origins of their priors. By embedding standard models in a larger standard model, however, *pre-priors* can describe such beliefs. When an agent’s prior and pre-prior are mutually consistent, he must believe that his prior would only have been different in situations where relevant event chances were different, but that variations in other agents’ priors are otherwise completely unrelated to which events are how likely. Due to this, Bayesians who agree enough about the origins of their priors must have the same priors.

I think my main difficulty with understanding the paper is the lack of a worked-out example. So I’ll take a simplified version of an example given in the paper and try to work out how it should be treated under the proposed formalism. Quoting the paper:

For example, if there were such a thing as a gene for optimism versus pessimism, you might believe that you had an equal chance of inheriting your mother’s optimism gene or your father’s pessimism gene.

Instead of talking about optimism vs pessimism in general, I’ll use the example of an AI which has a prior on just the outcome of one coin toss (A) which will occur after it is created. The AI programmer will program it with one of two priors. The “optimistic” prior O says that the coin will land heads with probability .6, and the “pessimistic” prior P says that the coin will land heads with probability .4. For some reason, the programmer has decided to choose the prior based on an independent coin toss (B), which corresponds to the random Mendelian inheritance in the original example.
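To make the setup concrete, here is a minimal Python simulation of the programmer's procedure. It assumes coin B is fair, as the post implies, and treats the true bias of coin A as a hypothetical parameter `p_a_heads` (default 0.5), since the post never states it. Because the two tosses are independent, the frequency of A=heads comes out about the same whichever prior the AI received.

```python
import random

def simulate(n_trials: int, p_a_heads: float = 0.5, seed: int = 0) -> dict:
    """Simulate the programmer's procedure many times.

    Coin B (fair) picks the prior: heads -> "O", tails -> "P".
    Coin A is tossed independently with chance p_a_heads (hypothetical value;
    the post never states A's true bias).
    Returns the empirical frequency of A=heads among runs with each prior.
    """
    rng = random.Random(seed)
    counts = {"O": [0, 0], "P": [0, 0]}  # label -> [A=heads count, number of runs]
    for _ in range(n_trials):
        label = "O" if rng.random() < 0.5 else "P"  # coin B chooses the prior
        a_heads = rng.random() < p_a_heads          # coin A, independent of coin B
        counts[label][0] += int(a_heads)
        counts[label][1] += 1
    return {label: heads / runs for label, (heads, runs) in counts.items()}

freqs = simulate(100_000)
print(freqs)  # both values land near 0.5: the assigned prior says nothing about A
```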

Suppose an “optimistic” AI wakes up and then reads Robin’s paper. How would it reason? First, it needs a pre-prior (denoted p~ [EDIT: actually denoted q, as Hal pointed out in a comment] in the paper, but I’ll use r here) that explains how it got its prior. So it asks the programmer how it got its prior, and the programmer tells it about coin toss B. (I’m using the AI as an explanatory prop here, not saying that an actual AI would reason this way.) One plausible pre-prior at this point might be:

- r(p=O) = r(B=heads) = 0.5
- r(A=heads) = 0.6
- r(p=O AND A=heads) = 0.3
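Assuming, as the programmer says, that the prior assignment and coin A are independent, the marginals above are all one needs to fill in the full joint pre-prior. A small sketch using exact fractions (the helper name `joint_independent` is illustrative, not from the paper):

```python
from fractions import Fraction

# Marginals from the pre-prior r listed above (exact fractions avoid float noise).
r_prior_is_O = Fraction(1, 2)  # r(p=O) = r(B=heads)
r_A_heads = Fraction(3, 5)     # r(A=heads)

def joint_independent(r_O, r_A):
    """Joint pre-prior over (prior label, A outcome), assuming p and A are independent."""
    return {
        ("O", "H"): r_O * r_A,
        ("O", "T"): r_O * (1 - r_A),
        ("P", "H"): (1 - r_O) * r_A,
        ("P", "T"): (1 - r_O) * (1 - r_A),
    }

r = joint_independent(r_prior_is_O, r_A_heads)
print(r[("O", "H")])  # 3/10, i.e. r(p=O AND A=heads) = 0.3
```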

But unfortunately, this pre-prior doesn’t satisfy Robin’s *pre-rationality* condition (equation 2 on page 4), which when applied to this example says that

- O(A=heads) = r(A=heads | p=O) and
- P(A=heads) = r(A=heads | p=P)

The first equality holds, but the second one doesn’t, because P(A=heads) = 0.4, and r(A=heads | p=P) = 0.6.
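The check can be done mechanically: compute r(A=heads | p) from the joint table for each prior label and compare it with what that prior itself says. A sketch, with the joint table hard-coded from the independent pre-prior (r(p=O) = 0.5, r(A=heads) = 0.6) and hypothetical helper names:

```python
from fractions import Fraction

# The two candidate priors, as probabilities of A=heads.
priors = {"O": Fraction(3, 5), "P": Fraction(2, 5)}

# Independent pre-prior r: r(p=O) = 1/2, r(A=heads) = 3/5.
r = {
    ("O", "H"): Fraction(3, 10), ("O", "T"): Fraction(1, 5),
    ("P", "H"): Fraction(3, 10), ("P", "T"): Fraction(1, 5),
}

def conditional_A_heads(joint, label):
    """r(A=heads | p=label), computed from a joint table."""
    return joint[(label, "H")] / (joint[(label, "H")] + joint[(label, "T")])

def pre_rationality_report(joint, priors):
    """For each label, pair prior(A=heads) with r(A=heads | p=label)."""
    return {label: (prob, conditional_A_heads(joint, label)) for label, prob in priors.items()}

for label, (own, cond) in pre_rationality_report(r, priors).items():
    print(label, own, cond, own == cond)
# O 3/5 3/5 True
# P 2/5 3/5 False
```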

And actually there seems to be no way to construct any pre-prior that satisfies pre-rationality, unless the coin tosses A and B are correlated in the pre-prior (I think this is what Robin meant in the paper by “track truth”). But the programmer insists that the two coin tosses are not correlated, so that’s that. What should the AI do at this point? Robin’s paper doesn’t say. It assumes pre-rationality and derives consequences from it, but doesn’t talk about what we should do if we find ourselves lacking in pre-rationality, except that we might want to do something to correct this defect.
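To see the "unless" clause at work: if A is allowed to depend on the prior assignment, a pre-rational pre-prior is easy to build by setting r(A=heads | p=O) = 0.6 and r(A=heads | p=P) = 0.4 by construction. The sketch below (hypothetical helper `build_pre_prior`, with r(p=O) = 0.5 as in the example) shows that the result makes A and the prior assignment, and hence A and coin B, correlated, which is exactly what the programmer rules out.

```python
from fractions import Fraction

priors = {"O": Fraction(3, 5), "P": Fraction(2, 5)}  # each prior's P(A=heads)

def build_pre_prior(r_O, priors):
    """Joint over (label, A) whose conditionals on A match each prior by construction."""
    weight = {"O": r_O, "P": 1 - r_O}
    joint = {}
    for label, a in priors.items():
        joint[(label, "H")] = weight[label] * a
        joint[(label, "T")] = weight[label] * (1 - a)
    return joint

q = build_pre_prior(Fraction(1, 2), priors)
q_A_heads = q[("O", "H")] + q[("P", "H")]
q_O = q[("O", "H")] + q[("O", "T")]
print(q_A_heads)                         # 1/2
print(q[("O", "H")] == q_O * q_A_heads)  # False: A and the prior assignment are correlated
```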

One obvious solution here is for the AI *programmer* to not have picked different priors for the AI based on an independent coin toss in the first place, and perhaps it could be argued that it was irrational, according to ordinary rationality, for the programmer to have done that. If it had been the case that O=P, then the AI could easily have constructed a pre-rational pre-prior. But our own priors depend partly on our genes, which were picked by evolution, so this solution doesn’t seem to apply to us. And if we create any Bayesian AIs, the priors of those AIs will also be inevitably influenced (indirectly via us) by the randomness inherent in evolution.
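The O=P case in code: with A independent of the prior assignment, the condition can hold only if the two possible priors agree about A. A minimal sketch (the function name is hypothetical; it assumes 0 < r(p=O) < 1 so both conditionals are defined, and uses 0.6 as an illustrative common value):

```python
from fractions import Fraction

def independent_pre_prior_is_pre_rational(priors, r_O, r_A):
    """Pre-rationality check for a pre-prior in which the prior label and A are independent.

    priors: label -> that prior's probability of A=heads.
    r_O: pre-prior probability of being assigned prior "O" (assumed 0 < r_O < 1).
    r_A: pre-prior probability of A=heads.
    """
    weight = {"O": r_O, "P": 1 - r_O}
    for label, own in priors.items():
        joint_heads = weight[label] * r_A
        joint_tails = weight[label] * (1 - r_A)
        if own != joint_heads / (joint_heads + joint_tails):  # r(A=heads | p=label)
            return False
    return True

half, three_fifths, two_fifths = Fraction(1, 2), Fraction(3, 5), Fraction(2, 5)
print(independent_pre_prior_is_pre_rational({"O": three_fifths, "P": two_fifths}, half, three_fifths))    # False
print(independent_pre_prior_is_pre_rational({"O": three_fifths, "P": three_fifths}, half, three_fifths))  # True
```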

So what should we (or our AIs) do? I think I have some ideas about that, but first, is my understanding of pre-rationality correct?

## Comments (18)

Yes, you seem to understand the idea, at least as far as in what you've written above. Remember that pretty much all of the ways we express what it is to be rational are via constraints, e.g., probs sum to one, update over time via conditionalization, and so on. If you once satisfied the constraints but find you no longer do, well then the obvious plan would be to move your beliefs back to what they would be had you not made whatever errors led you to violate the constraints. In this case, if your pre-prior is consistent and reasonable, but doesn't satisfy the pre-rationality condition relative to your prior, the obvious plan is to update your prior to be whatever the pre-rationality condition says it should be (treating prior "P" as just a label).

That doesn't seem to work in the specific example I gave. If the "optimistic" AI updates its prior to be whatever the pre-rationality condition says it should be, it will just get back the same prior O, because according to its pre-prior (denoted r in my example), its actual prior O is just fine, and the reason it's not pre-rational is that in the counterfactual case where the B coin landed tails, it would have gotten assigned the prior P.

Or am I misinterpreting your proposed solution? (ETA: Can you make your solution formal?)
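Wei's objection can be checked directly. Reading Robin's suggestion as "replace each prior with r(A=heads | p) computed from the pre-prior" (an interpretation for illustration, not a formal statement from the paper), the optimistic AI's prior comes back unchanged, and only the counterfactual pessimistic prior would move:

```python
from fractions import Fraction

# Independent pre-prior r from the post: r(p=O) = 1/2, r(A=heads) = 3/5.
r = {
    ("O", "H"): Fraction(3, 10), ("O", "T"): Fraction(1, 5),
    ("P", "H"): Fraction(3, 10), ("P", "T"): Fraction(1, 5),
}

def prior_implied_by_pre_prior(joint, label):
    """What the pre-rationality condition says prior `label` should assign to A=heads."""
    heads, tails = joint[(label, "H")], joint[(label, "T")]
    return heads / (heads + tails)

actual = {"O": Fraction(3, 5), "P": Fraction(2, 5)}
for label in ("O", "P"):
    implied = prior_implied_by_pre_prior(r, label)
    print(label, actual[label], "->", implied)
# O 3/5 -> 3/5   (the actual AI's prior is untouched)
# P 2/5 -> 3/5   (only the counterfactual prior would change)
```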

Indeed, worked examples are your friend.

Slightly more plausibly, suppose that a set of AI Construction Kits are distributed by lottery, three base-ten digits per ticket. An ACK ends up in the hands of an OB/LW reader, Rational Riana, who constructs the AI to believe that the probability of any lottery ticket winning is 1/1000, and that this probability is independent of the retrospective event of Riana winning.

But Riana believes, and so the AI believes as well, and indeed it happens to be true, that *if* the lottery had come out differently, the ACK would have ended up in the hands of Superstitious Sally, who believes that lottery tickets in her hand are much more likely than average to win; and Sally's AI would have believed that the chance of Sally's *next* lottery ticket winning was 1/10. (Furthermore, Sally's AI might believe that Sally winning the previous lottery was additional evidence to this effect, but we can leave out that fillip for now.)

It seems to me that it is quite rational for Riana's AI to believe that the subjunctive Sally's AI it *could* have been - if, indeed, one's reference class is such as to treat this counterfactual entity as an alternative instance of "me" - is merely irrational.

Does this mean that Riana's AI isn't pre-rational? Or that Riana's AI isn't pre-rational with respect to the lottery ticket? Can Riana's AI and Sally's AI agree on the causal circumstances that led to their existence, while still disagreeing on the probability that Sally's AI's lottery ticket will win?

I similarly suspect that if I had been born into the Dark Ages, then "I" would have made many far less rational probability assignments; but I think this "alternative" me would have been simply mistaken due to being raised in an even crazier environment, rather than coherently updating a coherent pre-prior based on different data. Am I not pre-rational with respect to my birth date?
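Riana's lottery example has the same shape as the coin example. A hedged sketch: the event is "Sally's next ticket wins", Riana's AI treats it as independent of who ended up with the ACK, and the weight `w_riana` on Riana having won the ACK is a hypothetical placeholder (any value strictly between 0 and 1 gives the same conditionals):

```python
from fractions import Fraction

# Each possible AI's prior probability that Sally's next lottery ticket wins.
priors = {"Riana": Fraction(1, 1000), "Sally": Fraction(1, 10)}

def pre_prior_conditionals(w_riana, chance_next_win):
    """Riana's AI's pre-prior: who built the AI is independent of whether Sally's next ticket wins.

    Returns q(next ticket wins | builder) for each possible builder.
    """
    weight = {"Riana": w_riana, "Sally": 1 - w_riana}
    joint = {}
    for builder, w in weight.items():
        joint[(builder, "win")] = w * chance_next_win
        joint[(builder, "lose")] = w * (1 - chance_next_win)
    return {b: joint[(b, "win")] / (joint[(b, "win")] + joint[(b, "lose")]) for b in weight}

conds = pre_prior_conditionals(Fraction(1, 2), Fraction(1, 1000))  # w_riana = 1/2 is a placeholder
for builder, own in priors.items():
    print(builder, own == conds[builder])
# Riana True
# Sally False
```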

Yes, someone who reasonably believes "If I'd have been programmed by a crazy person, I'd have crazy beliefs" is not pre-rational as I defined it. My main purpose is to support my claim that a set of non-crazy people with common belief that they are not crazy do not agree to disagree. People often respond with the claim that non-crazy people can reasonably have different priors - this paper was an attempt to cut off that option.

I presume this only applies when full disclosure and trust are present?

According to my understanding of Robin's definition, yes.

I don't think Robin defined what it would mean for someone to be pre-rational "with respect" to something. You're either pre-rational, or not.

I'm not totally sure what you're asking here. Do you mean can they, assuming they are pre-rational, or just can they in general? I think the answers are no and yes, respectively.

I think the point you're making is that just saying Riana's AI and Sally's AI are both lacking pre-rationality isn't very satisfactory, and that perhaps we need some way to conclude that Riana's AI is rational while Sally's AI is not.

That would be one possible approach to answering the "what to do" question that I asked at the end of my post. Another approach I was thinking about is to apply Nesov's "trading across possible worlds" idea to this. Riana's AI could infer that if it were to change its beliefs to be more like Sally's AI, then due to the symmetry in the situation, Sally's AI would (counterfactually) change its beliefs to be more like Riana's AI. This could in some (perhaps most?) circumstances make both of them better off according to their own priors.

This example is not directly analogous to the previous one, because the medieval you might agree that the current you is the more rational one, just like the current you might agree that a future you is more rational.

Wei, I understand the paper probably less well than you do, but I wanted to comment that p~, which you call r, is not what Robin calls a pre-prior. He uses the term pre-prior for what he calls q. p~ is simply a prior over an expanded state space created by taking into consideration all possible prior assignments. Now equation 2, the rationality condition, says that q must equal p~ (at least for some calculations), so maybe it all comes out to the same thing.

Equation 1 defines p~ in terms of the conventional prior p. Suppressing the index i since we have only one agent in this example, it says that p~(E|p) = p(E). The only relevant event E is A=heads, and p represents the prior assignment. So we have the two definitions for p~.

p~(A=heads | p=O) = O(A=heads)

p~(A=heads | p=P) = P(A=heads)

The first equals 0.6 and the second equals 0.4.

Then the rationality condition, equation 2, says

q(E | p) = p~(E | p)

and from this, your equations follow, with r substituted for q:

q(A=heads | p=O) = p~(A=heads | p=O) = O(A=heads)

q(A=heads | p=P) = p~(A=heads | p=P) = P(A=heads)

As you conclude, there is no way to satisfy these equations with the assumptions you have made on q, namely that the A event and the p-assigning events are independent, since the values of q in the two equations will be equal, but the RHS's are 0.6 and 0.4 respectively.
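The step behind "the values of q in the two equations will be equal" is just the definition of independence. Written out, assuming q(p=O) and q(p=P) are both positive:

$$
\begin{aligned}
q(A{=}\mathrm{heads} \mid p{=}O) &= \frac{q(A{=}\mathrm{heads},\ p{=}O)}{q(p{=}O)} = \frac{q(A{=}\mathrm{heads})\, q(p{=}O)}{q(p{=}O)} = q(A{=}\mathrm{heads}), \\
q(A{=}\mathrm{heads} \mid p{=}P) &= q(A{=}\mathrm{heads}) \quad \text{(by the same argument)}, \\
\text{so equation 2 would require}\quad O(A{=}\mathrm{heads}) &= q(A{=}\mathrm{heads}) = P(A{=}\mathrm{heads}),
\end{aligned}
$$

which is impossible when O(A=heads) = 0.6 and P(A=heads) = 0.4.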

I think you're right that the descriptive (as opposed to prescriptive) result in this case demonstrates that the programmer was irrational. Indeed it doesn't make sense to program his AI that way, not if he wants it to "track truth".

Yes, it looks like I got a bit confused about the notation. Thanks for the correction, and for showing how the mathematical formalism works in detail.

Is it equivalent to state that learning about the origins of your priors "screens off" the priors themselves?

If I understand correctly, priors are "beliefs we have whose causes we don't understand". Does it seem only to me, then, that if you have theories about the origins of your priors, then those cease to be your priors? Your real priors, instead, are now your theories about the origins of your "priors".

Hanson writes: "This paper thereby shows that agents who agree enough about the origins of their priors must have the same prior."

Isn't this really the same as saying: agents that have the same priors (= theories about the origins of their "priors") ought to reach the same conclusions given the same information - just as per Aumann's agreement theorem?

No, priors are not "beliefs whose causes we don't understand", and no this result doesn't reduce to common priors implies no agreeing to disagree.

You're saying P ⇒ Q, Robin is saying weaker-version-of-Q ⇒ P. I think.

Theorem 1 in the paper gives the essential conditional independence relation: given a prior assignment to yourself, any ordinary event E is independent of any other priors assigned to anyone else. Your prior screens events from all other prior assignments.
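The screening-off relation can be tested on a finite joint distribution. The sketch below is a generic conditional-independence check (hypothetical helper name, not code from the paper) over a joint table q(p1, p2, E) for two agents' prior assignments: it asks whether q(E | p1, p2) = q(E | p1) for every combination. It assumes every value of p1 has positive probability, and reuses the 0.6/0.4 numbers from the coin example purely for illustration.

```python
from fractions import Fraction
from itertools import product

def conditionally_independent(joint):
    """Return True if q(E | p1, p2) == q(E | p1) wherever q(p1, p2) > 0.

    `joint` maps (p1, p2, e) -> probability; e is True when event E occurs.
    Assumes every p1 value present in `joint` has positive total probability.
    """
    p1_values = {key[0] for key in joint}
    p2_values = {key[1] for key in joint}
    for a in p1_values:
        mass_a = sum(v for (x, _, _), v in joint.items() if x == a)
        e_given_a = sum(v for (x, _, e), v in joint.items() if x == a and e) / mass_a
        for b in p2_values:
            e_mass = joint.get((a, b, True), 0)
            mass_ab = e_mass + joint.get((a, b, False), 0)
            if mass_ab and e_mass / mass_ab != e_given_a:
                return False
    return True

chance = {"O": Fraction(3, 5), "P": Fraction(2, 5)}  # chance of E under each prior label
cells = list(product("OP", "OP", (True, False)))

# E depends only on agent 1's prior, so agent 2's prior is screened off.
screened = {(a, b, e): Fraction(1, 4) * (chance[a] if e else 1 - chance[a]) for a, b, e in cells}
# E depends on agent 2's prior instead, so p2 still carries information given p1.
leaky = {(a, b, e): Fraction(1, 4) * (chance[b] if e else 1 - chance[b]) for a, b, e in cells}

print(conditionally_independent(screened))  # True
print(conditionally_independent(leaky))     # False
```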

I'm very busy today, but I'll look this over carefully tomorrow morning and reply then.

I find it puzzling that my post has been upvoted to 10, but nobody has explicitly confirmed that my understanding is correct. Should I take the votes as implicit confirmations, or are the upvoters just indicating that they're interested in posts like this?

In my case, I'm just interested in these kinds of posts.

I haven't read Robin's paper. But this setup, and the difficulties you point out, remind me of the "centered" credulity functions that Nick Bostrom discusses in his Sleeping Beauty (SB) paper. (ETA: Here's a link to that paper.)

I haven't thought much about this, so maybe the similarity is superficial. Or I might be misremembering Bostrom's paper in some crucial way. But, as I recall, he gives one credulity function P to SB-before-she-learns-it's-Monday, but he gives another credulity function P+ to SB-after-she-learns-it's-Monday.

The key is that, in general, P+(*X*) does *not* equal P(*X* | MONDAY). That is, you don't get P+ by just using P and conditioning on MONDAY. This is the crucial ingredient that Bostrom uses to evade the usual paradoxes that arise when you naively apply Bayesianism to SB's situation.

Hanson's equation

P(A=heads) = r(A=heads | p=P)

seems analogous to forcing

P+(*X*) = P(*X* | MONDAY),

so I wonder if there is some analogy between the difficulties that forcing either one engenders.

ETA: . . . and hence an analogy between their solutions.

I don't know if you understand pre-rationality correctly because I can't parse Robin's paper either, but your setup looks like another of those unstoppable force, unmovable object mindfucks. If an agent's beliefs about the source of their prior don't satisfy the pre-rationality condition together with the prior, you've got an agent with inconsistent beliefs about the world, plain and simple. It can be Dutch booked, etc. Which way to resolve the inconsistency is entirely up to programmer ingenuity: you can privilege the prior above the pre-prior, or the other way round, or use some other neat trick. Of course this doesn't apply to humans at all.