# Epistemic vs. Instrumental Rationality: Approximations

What is the probability that my apartment will be struck by a meteorite tomorrow? Based on the information I have, I might say something like 10^{-18}. Now suppose I wanted to approximate that probability with a different number. Which is a better approximation: 0 or 1/2?

The answer depends on what we mean by "better," and this is a situation where epistemic (truthseeking) and instrumental (useful) rationality will disagree.

As an epistemic rationalist, I would say that 1/2 is a better approximation than 0, because the Kullback-Leibler Divergence is (about) 1 bit for the former, and infinity for the latter. This means that my expected Bayes Score drops by one bit if I use 1/2 instead of 10^{-18}, but it drops to minus infinity if I use 0, and any probability conditional on a meteorite striking my apartment would be undefined; if a meteorite did indeed strike, I would instantly fall to the lowest layer of Bayesian hell. This is too horrible a fate to imagine, so I would have to go with a probability of 1/2.
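The two divergences quoted above can be checked numerically. This is a minimal sketch, assuming a two-outcome distribution {meteorite, no meteorite} and base-2 logarithms; the function name `kl_divergence_bits` is my own and does not come from any library.

```python
import math

def kl_divergence_bits(p, q):
    """KL divergence D(P || Q) in bits for discrete distributions given as lists."""
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0.0:
            continue  # by convention, a zero-probability outcome contributes nothing
        if q_i == 0.0:
            return math.inf  # P allows an outcome that Q rules out entirely
        total += p_i * math.log2(p_i / q_i)
    return total

p_true = 1e-18  # the probability assumed in the post
true_belief = [p_true, 1.0 - p_true]

print(kl_divergence_bits(true_belief, [0.5, 0.5]))  # very close to 1 bit
print(kl_divergence_bits(true_belief, [0.0, 1.0]))  # inf
```

The infinite result comes entirely from the single outcome that the approximation rules out, no matter how improbable that outcome is under the true belief.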

As an instrumental rationalist, I would say that 0 is a better approximation than 1/2. Even if a meteorite does strike my apartment, I will suffer only a finite amount of harm. If I'm still alive, I won't lose all of my powers as a predictor, even if I assigned a probability of 0; I will simply rationalize some other explanation for the destruction of my apartment. Assigning a probability of 1/2 would force me to actually plan for the meteorite strike, perhaps by moving all of my stuff out of the apartment. This is a totally unreasonable price to pay, so I would have to go with a probability of 0.

I hope this can be a simple and uncontroversial example of the difference between epistemic and instrumental rationality. While the normative theory of probabilities is the same for any rationalist, the sorts of approximations a bounded rationalist would prefer can differ very much.

## Comments (25)

While KL divergence is a very *natural* measure of the "goodness of approximation" of a probability distribution, which happens not to talk about the utility function, there is still a strong sense in which only an instrumental rationalist can speak of a "better approximation", because only an instrumental rationalist can say the word "better".

KL divergence is an attempt to use a default sort of metric of goodness of approximation, without talking about the utility function, or while knowing as little as possible about the utility function; but in fact, in the absence of a utility function, you actually just can't say the word "better", period.

To the extent that this is true, perhaps the very notion of an epistemic rationalist (perhaps also of epistemic rationality) is incoherent. ("Epistemic rationality means acting so as to maximize one's accuracy." "Ah, but hidden in that word *accuracy* is some sort of evaluation, which you aren't allowed to have.") But it sure seems like a useful notion.

I propose that there *is* at least one useful notion of epistemic rationality; in fact, there's one for each viable notion of what counts as better accuracy; since real people have utility functions, calling a real person an epistemic rationalist is really shorthand for "has a utility function that highly values accuracy-in-some-particular-sense"; that one can usefully talk about epistemic rationality in general, meaning something like "things that are true about anyone who's an epistemic rationalist in any of that term's many specific senses"; and that it's at least a defensible claim that something enough like K-L divergence to make Peter's argument go through is likely to be part of any viable notion of accuracy.

If epistemic rationalists can't speak of a "better approximation," then how can an epistemic rationalist exist in a universe with finite computational resources?

*Pure* epistemic rationalists with no utility function? Well, they can't, really. That's part of the problem with the Oracle AI scenario.

They can speak of a “closer approximation” instead. (But that still needs a metric.)

This is basically right, but I guess I think of it in slightly different terms. The KL divergence embodies a particular, *implicit* utility function, which just happens to be wrong lots of the time. So it can make sense to speak of "better_KL", it's just not something that's necessarily very useful.

Note also that alternative divergence measures, embodying different implicit utility functions, could give different answers. For example, Jensen-Shannon divergence would agree with instrumental rationality here, no? (Though you could obviously construct examples where it too would diverge from our actual utility functions.)
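The question about Jensen-Shannon divergence can be checked directly. This is a rough sketch, assuming the standard definition (the average KL divergence of each distribution from their equal-weight mixture) and the post's figure of 10^-18; the helper names are hypothetical.

```python
import math

def kl_bits(p, q):
    """D(P || Q) in bits; outcomes with p_i == 0 contribute nothing."""
    return sum(p_i * math.log2(p_i / q_i) for p_i, q_i in zip(p, q) if p_i > 0.0)

def js_divergence_bits(p, q):
    """Jensen-Shannon divergence in bits: mean KL of P and Q from their mixture."""
    mixture = [(p_i + q_i) / 2.0 for p_i, q_i in zip(p, q)]
    return 0.5 * kl_bits(p, mixture) + 0.5 * kl_bits(q, mixture)

true_belief = [1e-18, 1.0 - 1e-18]
js_to_zero = js_divergence_bits(true_belief, [0.0, 1.0])
js_to_half = js_divergence_bits(true_belief, [0.5, 0.5])

print(js_to_zero < js_to_half)  # True: under this measure, 0 is the closer approximation
```

Because the mixture is nonzero wherever either distribution is, this measure stays finite (at most 1 bit), so ruling out the meteorite entirely is no longer penalized infinitely.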

How is this train of thought "instrumental"? You aren't making any choices or decisions outside of your own brain.

To make it a real instrumental example, consider whether or not you should go buy a meteorite shield. Let's say the shield costs S, the meteorite costs you M if it hits, and the true probability of the strike is p. So buying the shield is best if pM > S.

Now if you go with 0, you'll never buy the shield, so if pM > S you have an expected loss of (pM - S) due to your approximation.

If you go with 1/2 then you'll buy the shield if M/2 > S. If M/2 > S and pM <= S then you bought the shield when you shouldn't have, and you lose an expected (S - pM).

So you see, it all depends on how big M is compared to S:

M < S/p : 0 is the better instrumental approximation

M > S/p : 1/2 is better

In other words, if the risks (or payoffs) are small compared to the probabilities involved and the costs of shields, round to 0. Otherwise round to 1/2.
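The shield comparison above can be sketched in a few lines. The agent chooses its action using the approximate probability, while the cost of that action is evaluated under the true probability p. The function names and the example figures (S = 100, M = 10^6 or 10^21) are illustrative assumptions, not from the original comment.

```python
def buys_shield(believed_p, meteorite_cost, shield_cost):
    """The agent buys the shield when its believed expected damage exceeds the price."""
    return believed_p * meteorite_cost > shield_cost

def expected_cost(buy, true_p, meteorite_cost, shield_cost):
    """Expected cost under the true probability: pay S if shielded, else risk p*M."""
    return shield_cost if buy else true_p * meteorite_cost

def approximation_loss(approx_p, true_p, meteorite_cost, shield_cost):
    """Extra expected cost from acting on approx_p instead of true_p."""
    best = expected_cost(buys_shield(true_p, meteorite_cost, shield_cost),
                         true_p, meteorite_cost, shield_cost)
    actual = expected_cost(buys_shield(approx_p, meteorite_cost, shield_cost),
                           true_p, meteorite_cost, shield_cost)
    return actual - best

p, S = 1e-18, 100.0
for M in (1e6, 1e21):  # below and above the threshold S/p = 1e20
    print(M, approximation_loss(0.0, p, M, S), approximation_loss(0.5, p, M, S))
```

With M below S/p, rounding to 0 costs nothing and rounding to 1/2 wastes roughly the price of the shield; with M above S/p, the situation reverses.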

This OB post covered similar ground. What I took away from that post was that log odds are the natural units for converting evidence into beliefs, and probabilities are the natural units for converting beliefs into actions.

I don't like it very much. Instrumental rationality is the art and science of pursuing goals, and epistemic rationality is the special case of that where the goal is truth-seeking.

Part of your post assumes a contradiction. If forced to choose between 1/2 and zero then zero no longer means can't possibly happen and 1/2 no longer means will happen 50% of the time.

The only way your analysis works is if you are forced to choose between zero and 1/2 knowing that in the future you will forget that your choices were limited to zero and 1/2.

When I'm choosing between approximations, I haven't actually started using the approximation yet. I'm predicting, based on the full knowledge I have now, the cost of replacing that full knowledge with an approximation.

So to calculate the expected utility of changing my beliefs (to the approximation), I use the approximation to calculate my hypothetical actions, but I use my current beliefs as probabilities for the expected utility calculation.

So you are assuming that in the future you will be forced to act on the belief that the probability can't be something other than 0 or 1/2 even though in the future you will know that the probability will almost certainly be something other than 0 or 1/2.

But isn't this the same as assuming that in the future you will forget that your choices had been limited to zero and 1/2?

Hrm, I think you might be ignoring the cost of actually doing the calculations, unless I'm missing something. The value of simplifying assumptions comes from how much easier it makes a situation to model. I guess the question would be, is the effort saved in modeling this thing with an approximation rather than exact figures worth the risks of modeling this thing with an approximation rather than exact figures? Especially if you have to do many models like this, or model a lot of other factors as well. Such as trying to sort out what are the best ways to spend your time overall, including possibly meteorite preparations.

It seems to me you use the wrong wording. In contrast to the epistemic rationalist, the instrumental rationalist does not "gain" any "utility" from changing his beliefs. He gains utility from changing his actions. Since he can either prepare or not prepare for a meteoritic catastrophe and not "half prepare", I think the numbers you should choose are 0 and 1 and not 0 and 0.5. I'm not entirely sure what different numbers it will yield, but I think it's worth mentioning.

Why does it sound more like 1 than .5? If I believed the probability of my home getting struck by a meteorite was as high as .5, I would definitely make preparations.

This is a bit tangential, but perhaps a bounded rationalist should represent his beliefs by a family of probability functions, rather than by an approximate probability function. When he needs to make a decision, he can compute upper and lower bounds on the expected utilities of each choice, and then either make the decision based on the beliefs he has, or decide to seek out or recall further information if the upper and lower expected utilities point to different choices, and the bounds are too far apart compared to the cost of getting more information.

I found one decision theory that uses families of probability functions like this (page 35 of http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.37.1906), although the motivation is different. I wonder if such decision systems have been considered for the purpose of handling bounded rationality.
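The family-of-probability-functions idea might look something like the following sketch. It assumes a finite family of candidate values for P(meteorite) and two-outcome utilities; every name and number here is hypothetical, and the comparison against the cost of gathering more information is left out.

```python
def expected_utility_bounds(utilities, probability_family):
    """Lower and upper expected utility over a family of candidate P(event) values.

    utilities is (utility_if_event, utility_if_no_event).
    """
    u_event, u_no_event = utilities
    values = [p * u_event + (1.0 - p) * u_no_event for p in probability_family]
    return min(values), max(values)

def decide(actions, probability_family):
    """Return an action whose lower bound beats every rival's upper bound, else None."""
    bounds = {name: expected_utility_bounds(u, probability_family)
              for name, u in actions.items()}
    for name, (low, _) in bounds.items():
        rivals = [high for other, (_, high) in bounds.items() if other != name]
        if all(low >= high for high in rivals):
            return name
    return None  # bounds overlap: consider seeking or recalling more information

actions = {
    "buy_shield": (-100.0, -100.0),  # pay 100 whether or not the meteorite hits
    "no_shield": (-1e6, 0.0),        # lose 1e6 only if it hits
}
print(decide(actions, [1e-20, 1e-18, 1e-16]))  # no_shield
print(decide(actions, [1e-18, 1e-3]))          # None
```

When the family is narrow, the bounds settle the choice; when it is wide enough that the intervals overlap, the function declines to choose, which is the point at which the commenter suggests weighing the cost of further information.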

I admit that I've learned about the KL divergence just now and through the wiki-link, and that my math in general is not so profound. But as it's not about calculation but about the reasoning behind the calculation, I suppose I can have my say:

The wiki-entry mentions that

So P here is 10^-18 and Q is either 0 or 0.5.

What your epistemic rationalist has done seems like falling prey to the bias of anchoring and adjusting. The use of mathematical equations just makes the anchoring mistake look more formal; it's not less wrong in any way. So while the instrumental rationalist might have a reason to choose the arbitrary figure of 1/2 (it makes his decisions simpler, for example), the epistemic rationalist *does not*.

If the epistemic rationalist is shown the two figures of 0 and 1/2 and is asked which approximation is "better", he would probably say 0, and for several reasons. First of all, if he is an epistemic rationalist and thus truth-seeking, he wouldn't use the KL equation at all. The KL equation takes something accurate (or true), P, and makes it less accurate (or less true), KLD, and that's exactly against what he is seeking: more accurate and true results. But you tell me he has to choose between either "0" or "1/2". Well, if he has to choose one of these numbers, he will still not choose to use the KL equation. The wiki mentions that the Q in the equation typically stands for "... a theory, model, description, or approximation of P", while the number "1/2" in your example is none of these but an arbitrary number, so this equation does not fit the situation. He will use a different mathematical method, let's say subtraction, and see which absolute difference is smaller, in which case it will be 0's. Also, since 1/2 and 0 are arbitrary numbers, an epistemic rationalist would know better than to use either of them in any equation, since it would produce a result just as accurate as if he used any other two arbitrary numbers. He would know that he should do his own calculations, ignoring the numbers 0 and 1/2, and then compare his result to the numbers he is "offered" (0 and 1/2) and choose the number closest to his own calculation. Since he knows that the "true" probability is 10^-18, he will choose the closest number to his result, which seems to be 0.

Of course, everything that I said about "1/2" above holds true about "0".

(I'm sorry in advance if my mathematical explanations are unclear or clumsy. If I explain arguments through math badly, then I explain arguments through math in English much worse, as I studied mathematics in a different language.)

Reading the comments so far, I think Peter wasn't as clear as he had hoped (or this is all jumping to disagree too quickly). As I see it, the point is that an epistemic rationalist, a completely abstract mathematical construct to the best of our knowledge of the physical world, would make a choice that is at odds with an instrumental rationalist, i.e. a real person who's trying to win in real life. Having bounded resources, there is some threshold below which a physically existing rationalist will treat probabilities as equivalent to zero, i.e. will choose not to expend any resources on preparing for such a situation.

A meteorite makes a bad example because it's easy to imagine it happening. Suppose we consider the probability of a three layer chocolate cake spontaneously appearing in the passenger seat of our car during the drive home this afternoon. Yes, the probability must be nonzero, but it's so small as to not be worth considering. All those events with probabilities so small they aren't worth any resources are the ones you never even think about, so they are equivalent to having a probability of zero for the bounded rationalist.

... Ah, if *only* that were so.

(But I take it you mean "the ones you never even think about if you are an optimized bounded rationalist", in which case I think you're right.)

So what lesson does a rationalist draw from this? What is best for the Bayesian mathematical model is not best in practice? Conserving information is not always "good"?

Also,

This seems distinctly contrary to what an instrumental rationalist would do. It seems more likely he'd say "I was wrong, there was actually an infinitesimal probability of a meteorite strike that I previously ignored because of incomplete information/negligence/a rounding error."

I say this with trepidation, since Peter and Eliezer have both already read this, but...

(If the probability distribution peaked at 1/2, it would be not-completely-unreasonable to use a flat distribution, and express a probability as a fixed-point number between 0 and 1. In that case, it would take 60 bits to express 10^-18. With floating point, you'd get a good approximation with 7 bits.)

But you're not really making a fair comparison. You're comparing "probability distribution centered on 1/2" with "0, no probability distribution". If the "centered on 0" choice doesn't get to have a distribution, neither should the "centered on 1/2" choice. Then both give you a divergence of infinity.

The KL-divergence comparison assumes use of a probability distribution. The probability distribution that peaks at zero is going to be able to represent 1E-18 with many fewer bits than the one that peaks at 1/2. So zero wins in both cases, and there is no demonstrated conflict between epistemic and instrumental rationality.
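The bit counts in the parenthetical above can be reproduced under one reading of them. This sketch assumes that "fixed-point" means counting binary fractional digits until 10^-18 is resolvable, and that the "7 bits" refers to a signed binary exponent with the mantissa ignored. Both interpretations, and the function names, are my assumptions rather than anything the comment states.

```python
import math

def fixed_point_bits(p):
    """Fractional bits needed before a fixed-point number in [0, 1) can resolve p."""
    return math.ceil(-math.log2(p))

def exponent_bits(p):
    """Bits for the signed binary exponent of p alone (mantissa ignored)."""
    _, exponent = math.frexp(p)  # p == mantissa * 2**exponent, 0.5 <= mantissa < 1
    return abs(exponent).bit_length() + 1  # magnitude bits plus one sign bit

print(fixed_point_bits(1e-18))  # 60
print(exponent_bits(1e-18))     # 7
```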

I was talking about a discrete probability distribution over two possible states: {meteorite, no meteorite}. You seem to be talking about something else.

Okay. I thought you were talking about real-valued probability distributions from 0 to 1. But I don't know if you can claim to draw significant conclusions about epistemic rationality from using the wrong type of probability distribution.

What do you mean by "the wrong type of probability distribution"?

It might clarify things to note the connection between Kullback-Leibler divergence and communication theory. The Kullback-Leibler divergence is the utility function to use when minimizing the expected length of the signal encoding (i.e., recording or communicating) what actually happened. The choice of "1/2" or "0" is equivalent to constraining the agent to choose between using one bit or an infinite number of bits to record/communicate the state of "improbable event did (not) occur".

In short, KL divergence isn't about truth-seeking *per se*. It's about the resources necessary to encode signals -- definitely an instrumental question.
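The coding interpretation can be made concrete with a small sketch. It assumes an idealized code that spends -log2(q) bits on an outcome the encoder believes has probability q, and it uses the post's figure of 10^-18; the function names are hypothetical.

```python
import math

def code_length_bits(q_outcome):
    """Ideal code length in bits for an outcome believed to have probability q_outcome."""
    return math.inf if q_outcome == 0.0 else -math.log2(q_outcome)

def expected_code_length_bits(p, q):
    """Expected bits per message when outcomes follow P but the code is built for Q."""
    return sum(p_i * code_length_bits(q_i) for p_i, q_i in zip(p, q) if p_i > 0.0)

true_belief = [1e-18, 1.0 - 1e-18]  # [meteorite, no meteorite]

print(expected_code_length_bits(true_belief, [0.5, 0.5]))   # 1 bit per day recorded
print(expected_code_length_bits(true_belief, [0.0, 1.0]))   # inf: no codeword for a strike
print(expected_code_length_bits(true_belief, true_belief))  # tiny: the entropy of P
```

The KL divergence is the gap between each of the first two figures and the third, which is why it comes out as about one bit for 1/2 and infinite for 0.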