Re-understanding Robin Hanson’s “Pre-Rationality”

Wei Dai

3rd Nov 2009

I’ve read Robin’s paper “Uncommon Priors Require Origin Disputes” several times over the years, and I’ve always struggled to understand it. Each time I would think that I did, but then I would forget my understanding, and some months or years later, find myself being puzzled by it all over again. So this time I’m going to write down my newly re-acquired understanding, which will let others check that it is correct, and maybe help people (including my future selves) who are interested in Robin's idea but find the paper hard to understand.

Here’s the paper’s abstract, in case you aren’t already familiar with it.

In standard belief models, priors are always common knowledge. This prevents such models from representing agents’ probabilistic beliefs about the origins of their priors. By embedding standard models in a larger standard model, however, pre-priors can describe such beliefs. When an agent’s prior and pre-prior are mutually consistent, he must believe that his prior would only have been different in situations where relevant event chances were different, but that variations in other agents’ priors are otherwise completely unrelated to which events are how likely. Due to this, Bayesians who agree enough about the origins of their priors must have the same priors.

I think my main difficulty with understanding the paper is the lack of a worked-out example. So I’ll take a simplified version of an example given in the paper and try to work out how it should be treated under the proposed formalism. Quoting the paper:

For example, if there were such a thing as a gene for optimism versus pessimism, you might believe that you had an equal chance of inheriting your mother’s optimism gene or your father’s pessimism gene.

Instead of talking about optimism vs pessimism in general, I’ll use the example of an AI which has a prior on just the outcome of one coin toss (A) which will occur after it is created. The AI programmer will program it with one of two priors. The “optimistic” prior O says that the coin will land heads with probability .6, and the “pessimistic” prior P says that the coin will land heads with probability .4. For some reason, the programmer has decided to choose the prior based on an independent coin toss (B), which corresponds to the random Mendelian inheritance in the original example.

Suppose an “optimistic” AI wakes up and then reads Robin’s paper. How would it reason? First, it needs a pre-prior (denoted p~ [EDIT: actually denoted q, as Hal pointed out in a comment] in the paper, but I’ll use r here) that explains how it got its prior. So it asks the programmer how it got its prior, and the programmer tells it about coin toss B. (I’m using the AI as an explanatory prop here, not saying that an actual AI would reason this way.) One plausible pre-prior at this point might be:

r(p=O) = r(B=heads) = 0.5
r(A=heads) = 0.6
r(p=O AND A=heads) = 0.3

But unfortunately, this pre-prior doesn’t satisfy Robin’s pre-rationality condition (equation 2 on page 4), which when applied to this example says that

O(A=heads) = r(A=heads | p=O) and
P(A=heads) = r(A=heads | p=P)

The first equality holds, but the second one doesn’t, because P(A=heads) = 0.4, and r(A=heads | p=P) = 0.6.
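The check above can be reproduced mechanically. The following is only a minimal sketch, assuming the pre-prior r is stored as a joint table over (installed prior, outcome of A) and that the prior choice and A are independent under r, as in the three equations given earlier. The names `priors`, `r`, `r_heads_given` and `pre_rational` are illustrative, not taken from Robin's paper.

```python
from fractions import Fraction as F

# Priors the programmer may install: probability that coin A lands heads.
priors = {"O": F(6, 10), "P": F(4, 10)}

# Candidate pre-prior r from the post: r(p=O) = 0.5, r(A=heads) = 0.6,
# with the installed prior and A treated as independent.
r_prior = {"O": F(1, 2), "P": F(1, 2)}
r_a_heads = F(6, 10)
r = {
    (p, a): r_prior[p] * (r_a_heads if a == "heads" else 1 - r_a_heads)
    for p in priors
    for a in ("heads", "tails")
}

def r_heads_given(p):
    """r(A=heads | p), computed from the joint table."""
    return r[(p, "heads")] / (r[(p, "heads")] + r[(p, "tails")])

def pre_rational():
    """For each prior, does prior(A=heads) equal r(A=heads | prior)?"""
    return {p: priors[p] == r_heads_given(p) for p in priors}

result = pre_rational()
print(result)  # {'O': True, 'P': False}
```

Exact fractions are used instead of floats so that the equality tests are not disturbed by rounding.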

And actually there seems to be no way to construct any pre-prior that satisfies pre-rationality, unless the coin tosses A and B are correlated in the pre-prior (I think this is what Robin meant in the paper by “track truth”). The condition fixes r(A=heads | p=O) = 0.6 and r(A=heads | p=P) = 0.4, and since the prior is chosen by B, these two values can differ only if A depends on B. But the programmer insists that the two coin tosses are not correlated, so that’s that. What should the AI do at this point? Robin’s paper doesn’t say. It assumes pre-rationality and derives consequences from it, but doesn’t talk about what we should do if we find ourselves lacking in pre-rationality, except that we might want to do something to correct this defect.
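To see why correlation is forced, one can build the pre-prior that pre-rationality would require and test it for independence. This is a sketch under two stated assumptions: that p=O exactly when B=heads (suggested by r(p=O) = r(B=heads) above), and that r(B=heads) = 0.5. The variable names are hypothetical.

```python
from fractions import Fraction as F

priors = {"O": F(6, 10), "P": F(4, 10)}
r_b_heads = F(1, 2)  # assumed: r(B=heads) = r(p=O) = 0.5

# Pre-rationality pins down r(A=heads | p) for each p, so once the
# marginal over p is given, the whole joint table is determined.
joint = {}
for name, p_heads in priors.items():
    weight = r_b_heads if name == "O" else 1 - r_b_heads
    joint[(name, "heads")] = weight * p_heads
    joint[(name, "tails")] = weight * (1 - p_heads)

# Marginal and conditional chance of A=heads under this pre-prior.
r_a_heads = joint[("O", "heads")] + joint[("P", "heads")]
r_a_heads_given_o = joint[("O", "heads")] / r_b_heads

# Independence would need r(p=O AND A=heads) = r(p=O) * r(A=heads).
independent = joint[("O", "heads")] == r_b_heads * r_a_heads
print(r_a_heads, r_a_heads_given_o, independent)  # 1/2 3/5 False
```

Under these assumptions the only pre-rational pre-prior gives r(A=heads) = 0.5 but r(A=heads | B=heads) = 0.6, so A and B cannot be independent in it unless O and P assign the same probability to heads.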

One obvious solution here is for the AI programmer to not have picked different priors for the AI based on an independent coin toss in the first place, and perhaps it could be argued that it was irrational, according to ordinary rationality, for the programmer to have done that. If it had been the case that O=P, then the AI could easily have constructed a pre-rational pre-prior. But our own priors depend partly on our genes, which were picked by evolution, so this solution doesn’t seem to apply to us. And if we create any Bayesian AIs, the priors of those AIs will also inevitably be influenced (indirectly via us) by the randomness inherent in evolution.

So what should we (or our AIs) do? I think I have some ideas about that, but first, is my understanding of pre-rationality correct?

Comments

Eliezer Yudkowsky

Indeed, worked examples are your friend.

Slightly more plausibly, suppose that a set of AI Construction Kits are distributed by lottery, three base-ten digits per ticket. An ACK ends up in the hands of an OB/LW reader, Rational Riana, who constructs the AI to believe that the probability of any lottery ticket winning is 1/1000, and that this probability is independent of the retrospective event of Riana winning.

But Riana believes, and so the AI believes as well, and indeed it happens to be true, that if the lottery had come out differently, the ACK would have ended up in the hands of Superstitious Sally, who believes that lottery tickets in her hand are much more likely than average to win; and Sally's AI would have believed that the chance of Sally's next lottery ticket winning was 1/10. (Furthermore, Sally's AI might believe that Sally winning the previous lottery was additional evidence to this effect, but we can leave out that fillip for now.)

It seems to me that it is quite rational for Riana's AI to believe that the subjunctive Sally's AI it could have been - if, indeed, one's reference class is such as to treat this counterfactual entity as an alternative instance of "me" - is merely irrational.

Does this mean that Riana's AI isn't pre-rational? Or that Riana's AI isn't pre-rational with respect to the lottery ticket? Can Riana's AI and Sally's AI agree on the causal circumstances that led to their existence, while still disagreeing on the probability that Sally's AI's lottery ticket will win?

I similarly suspect that if I had been born into the Dark Ages, then "I" would have made many far less rational probability assignments; but I think this "alternative" me would have been simply mistaken due to being raised in an even crazier environment, rather than coherently updating a coherent pre-prior based on different data. Am I not pre-rational with respect to my birth date?

RobinHanson

Yes, someone who reasonably believes "If I'd have been programmed by a crazy person, I'd have crazy beliefs" is not pre-rational as I defined it. My main purpose is to support my claim that a set of non-crazy people with common belief that they are not crazy do not agree to disagree. People often respond with the claim that non-crazy people can reasonably have different priors - this paper was an attempt to cut off that option.

Wei Dai

According to my understanding of Robin's definition, yes. [...] I don't think Robin defined what it would mean for someone to be pre-rational "with respect" to something. You're either pre-rational, or not. [...] I'm not totally sure what you're asking here. Do you mean can they, assuming they are pre-rational, or just can they in general? I think the answers are no and yes, respectively. I think the point you're making is that just saying Riana's AI and Sally's AI are both lacking pre-rationality isn't very satisfactory, and that perhaps we need some way to conclude that Riana's AI is rational while Sally's AI is not. That would be one possible approach to answering the "what to do" question that I asked at the end of my post. Another approach I was thinking about is to apply Nesov's "trading across possible worlds" idea to this. Riana's AI could infer that if it were to change its beliefs to be more like Sally's AI, then due to the symmetry in the situation, Sally's AI would (counterfactually) change its beliefs to be more like Riana's AI. This could in some (perhaps most?) circumstances make both of them better off according to their own priors. [...] This example is not directly analogous to the previous one, because the medieval you might agree that the current you is the more rational one, just like the current you might agree that a future you is more rational.
