It seems like in order to go from P(H) to P(H|E) you have to become certain that E. Am I wrong about that? 

Say you have the following joint distribution:

P(H&E) = a
P(~H&E) = b
P(H&~E) = c
P(~H&~E) = d

Where a, b, c, and d are each greater than 0.

So P(H|E) = a/(a+b). It seems like what we're doing is going from assigning ~E some positive probability to assigning it probability 0. Is there another way to think about it? Is there something special about evidential statements that justifies changing their probabilities without having updated on something else?
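(For concreteness, here is a minimal Python sketch, with made-up values for a, b, c, and d: conditionalizing on E amounts to discarding the ~E cells and renormalizing what is left.)

    # Made-up values for the joint distribution (illustrative only)
    a = 0.2  # P(H & E)
    b = 0.1  # P(~H & E)
    c = 0.3  # P(H & ~E)
    d = 0.4  # P(~H & ~E)

    p_H = a + c                # prior P(H)
    p_E = a + b                # P(E)
    p_H_given_E = a / (a + b)  # drop the ~E cells and renormalize

    print(p_H, p_E, p_H_given_E)  # roughly 0.5, 0.3, 0.67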


Updating by Bayesian conditionalization does assume that you are treating E as if its probability is now 1. If you want an update rule that is consistent with maintaining uncertainty about E, one proposal is Jeffrey conditionalization. If P1 is your initial (pre-evidential) distribution, and P2 is the updated distribution, then Jeffrey conditionalization says:

P2(H) = P1(H | E) P2(E) + P1(H | ~E) P2(~E).

Obviously, this reduces to Bayesian conditionalization when P2(E) = 1.
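A minimal numeric sketch of that rule (the joint values and the new P2(E) below are made-up assumptions):

    # Illustrative initial joint distribution P1 (assumed values)
    a, b, c, d = 0.2, 0.1, 0.3, 0.4   # P1(H&E), P1(~H&E), P1(H&~E), P1(~H&~E)

    p1_H_given_E    = a / (a + b)     # P1(H | E)
    p1_H_given_notE = c / (c + d)     # P1(H | ~E)

    p2_E = 0.8                        # new credence in E, still short of certainty
    p2_H = p1_H_given_E * p2_E + p1_H_given_notE * (1 - p2_E)

    print(p2_H)   # Jeffrey update; setting p2_E = 1 recovers P1(H|E)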

Yeah, the problem I have with that, though, is that I'm left asking: why did I change my probability in that? Is it because I updated on something else? Was I certain of that something else? If not, then why did I change my probability of that something else, and on we go down the rabbit hole of an infinite regress.

The infinite regress is anticipated in one of your priors.

You're playing a game. Variant A of an enemy attacks high most of the time; variant B attacks low some of the time; the rest of the time they both do forward attacks. We have priors, which we can arbitrarily set at any value. The enemy does a forward attack; here, we assign 100% probability to our observation of the forward attack. But let's say we see it out of the corner of our eye; in that case, we might assign 60% probability to the forward attack, but we still have 100% probability on the observation itself. Add an unreliable witness recounting the attack they saw out of the corner of their eye; we might assign 50% probability that they're telling the truth, but 100% probability that we heard them. Add in a hearing problem; now we might assign 90% probability that we heard them correctly, but 100% probability that we heard them at all.
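One way to make the layering concrete is the following toy model (my own simplification, not part of the comment above): treat each unreliable link as a mixture in which, with some probability, the signal is informative and, otherwise, you fall back on your prior. Each added layer drags the estimate back toward the prior:

    # Toy model (an assumption of this sketch): if a link in the chain is bad,
    # the report carries no information and we fall back on the prior.
    prior_forward = 0.3   # made-up prior that the enemy does a forward attack

    def soften(belief_if_link_good, prior, p_link_good):
        # Mixture over "link is good" vs "link is bad"
        return belief_if_link_good * p_link_good + prior * (1 - p_link_good)

    belief = 1.0                                  # a forward attack, seen clearly
    belief = soften(belief, prior_forward, 0.6)   # only glimpsed out of the corner of the eye
    belief = soften(belief, prior_forward, 0.5)   # relayed by an unreliable witness
    belief = soften(belief, prior_forward, 0.9)   # possibly misheard
    print(belief)                                 # each layer pulls the estimate toward 0.3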

We can keep adding levels of uncertainty, true. Eventually we will arrive at the demon-that-is-deliberately-deceiving-us thing Descartes talks about, at which point we can't be certain of anything except our own existence.

Infinite regress results in absolutely no certainty. But infinite regress isn't useful; lack of certainty isn't useful. We can't prove the existence of the universe, but we can see, quite obviously, the usefulness of assuming the universe does exist. Which is to say, probability doesn't exist in a vacuum; it serves a purpose.

Or, to approach it another way: Gödel. We can't be absolutely certain of our probabilities because at least one of our probabilities must be axiomatic.

why did I change my probability in that?

Presumably because you got some new information. If there is no information, there is no update. If the information is uncertain, make appropriate adjustments. The "infinite regress" would either converge to some limit or you'll end up, as OrphanWilde says, with Descartes' deceiving demon, at which point you don't know anything and just stand there slack-jawed till someone runs you over.

Note that you can show, for every E, that P(E|E) = 1 (the proof is left as an exercise). This means that yes, whatever you have to the right of the | sign is taken to be certain. Why is this so?
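(Spelling the exercise out, straight from the definition of conditional probability: P(E|E) = P(E & E) / P(E) = P(E) / P(E) = 1, provided P(E) > 0.)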

The main reason is that updating on evidence, in a sense, means translating your entire probability model to a possible world where that evidence is true. This is usually justified because you treat sensory data (readings from a gauge, the color of a ball extracted from an urn, etc.) as certainties. But nothing limits you to doing this only for evidence in the sensory meaning. You can also entertain the idea that a memory or a fictional fact is true and update your model accordingly.
By the same theorem, you can move in and out of that possible world: everything is controlled by P(E). If you divide by P(E), you move in; if you multiply by P(E), you move out.
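A small sketch of the "move in / move out" picture, again with made-up joint values: dividing the E-cells by P(E) gives the distribution inside the E-world, and multiplying back by P(E) restores their share of the full model:

    # Made-up joint distribution over H x E (illustrative assumption)
    joint = {("H", "E"): 0.2, ("~H", "E"): 0.1, ("H", "~E"): 0.3, ("~H", "~E"): 0.4}

    p_E = sum(p for (h, e), p in joint.items() if e == "E")

    # "Moving in": condition on E by dividing the E-cells by P(E)
    inside_E = {h: p / p_E for (h, e), p in joint.items() if e == "E"}

    # "Moving out": multiplying by P(E) recovers the original E-cells of the joint
    back_out = {h: p * p_E for h, p in inside_E.items()}

    print(p_E, inside_E, back_out)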

Focus on the question you want to solve. Then decide what heuristics are appropriate to deal with the issue.

For any bit of evidence E0, you can always back it out based on E1, E2, ..., as P(E0|E1,E2,...).

Analysis starts in some context of belief. You can question bits of "evidence", and do a finer analysis, but the goal is usually to get a good answer, not reevaluate all your premises.

It seems like in order to go from P(H) to P(H|E) you have to become certain that E.

You need to be certain enough for the analysis at hand.

Is there something special about evidential statements that justifies changing their probabilities without having updated on something else?

Linguistically, "evidence" is derived from Latin "videre" which means 'to see'. The archetypal form of getting evidence is seeing something with your own eyes.

You change your probability of E because you see E. Why do you believe that you see E? Your visual neurons were changed in response to the signals from your retina, which is an involuntary process.

The correct answer to this is that mathematically, E has to be certain. But mathematics (at least the mathematics which we currently use) does not correspond precisely to the reality of how beliefs work, but only approximately. In reality, it is possible for E (and everything that changed our mind about E or about H) to be somewhat uncertain. That simply says that reality is a bit more than math.

Consider P(E) = 1/3. We can consider three worlds, W1, W2, and W3, all with the same probability, with E being true in W3 only. Placing yourself in W3, you can evaluate the probability of H while setting P(E) = 1 (because you're placing yourself in the world where E is true with certainty).

In the same way, by placing yourself in W1 and W2, you evaluate H with P(E) = 0.

The thing is, you're "updating" on a hypothetical fact. You're not certain of being in W1, W2, or W3. So you're not actually updating; you're artificially considering a world where the probabilities are shifted to 0 or 1, and weighting the outcomes by the probability of that world being the actual one.
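A sketch of that weighting (the conditional credences for H inside each kind of world are made up):

    # Three equally likely worlds; E is true only in W3 (as in the comment above).
    p_world = 1 / 3
    h_if_E    = 0.9   # made-up credence in H inside the world where E holds (W3)
    h_if_notE = 0.2   # made-up credence in H inside the worlds where E fails (W1, W2)

    # Weight each hypothetical world by the probability of being in it
    p_H = h_if_E * p_world + h_if_notE * (2 * p_world)
    print(p_H)   # 0.9/3 + 0.2*2/3, roughly 0.43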

When you update, you're not simply imagining what you would believe in a world where E was true, you're changing your actual beliefs about this world. The point of updates is to change your behavior in response to evidence. I'm not going to change my behavior in this world simply because I'm imagining what I would believe in a hypothetical world where E is definitely true. I'm going to change my behavior because observation has led me to change the credence I attach to E being true in this world.

There's a labeling problem here. E is an event. The extra information you're updating on, the evidence, the thing that you are certain of, is not "E is true". It's "E has probability p". You can't actually update until you know the probability of E.

What the joint probabilities give you is by how much you have to update your credence in H, given E. Without P(E), you can't actually update.

P(H|E) tells you "OK, if E is certain, my new probability for H is P(H|E)". P(H|~E) tells you "OK, if E is impossible, my new probability for H is P(H|~E)". In the case of P(E) = 0.5, I will update by taking the mean of both.

Updating, proper updating, will only happen when you are certain of the probability of E (this is different from "being certain of E"), and the formulas will tell you by how much. Your joint probabilities are information themselves: they tell you how E relates to H. But you can't update on H until you have evidence about E.
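To make that concrete with a joint distribution like the one in the question (values made up), the extra datum "E has probability p" is exactly what lets the update be computed:

    a, b, c, d = 0.2, 0.1, 0.3, 0.4   # assumed values of P(H&E), P(~H&E), P(H&~E), P(~H&~E)

    def update_H(p_E_new):
        # New credence in H once you learn "E has probability p_E_new"
        return (a / (a + b)) * p_E_new + (c / (c + d)) * (1 - p_E_new)

    print(update_H(1.0))   # E certain: ordinary conditionalization, P(H|E)
    print(update_H(0.0))   # E ruled out: P(H|~E)
    print(update_H(0.5))   # the mean of the two, as described above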