Tyrrell_McAllister comments on Take heed, for it is a trap - Less Wrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
I don't see how the claim is "sophisticated and true". Let P and Q be statements. You cannot simultaneously assign 50% prior probability to each of the following three statements:
This remains true even if you don't know the complexities of these statements.
See here.
I think that either you are making a use-mention error, or you are confusing syntax with semantics.
Formally speaking, the expression "p(A)" makes sense only if A is a sentence in some formal system.
I can think of three ways to try to understand what's going on in your dialogue, but none leads to your conclusion. Let Alice and Bob be the first and second interlocutor, respectively. Let p be Bob's probability function. My three interpretations of your dialogue are as follows:

1. Alice and Bob are using different formal systems. In this case, Bob cannot use Alice's utterances; he can only mention them.

2. Alice and Bob are both using the same formal system, so that A, B, and C are sentences—e.g., atomic proposition letters—for both Alice and Bob.

3. Alice is talking about Bob's formal system. She somehow knows that Bob's model-theoretic interpretations of the sentences C and A&B are the same, even though [C = A&B] isn't a theorem in Bob's formal system. (So, in particular, Bob's formal system is not complete.)
Under the first interpretation, Bob cannot evaluate expressions of the form "p(A)", because "A" is not a sentence in his formal system. The closest he can come is to evaluate expressions like "p(Alice was thinking of a true proposition when she said 'A')". If Bob attends to the use-mention distinction carefully, he cannot be trapped in the way that you portray. For, while C = A & B may be a theorem in Alice's system,

[Alice was thinking of a true proposition when she said "C"] = [Alice was thinking of a true proposition when she said "A"] & [Alice was thinking of a true proposition when she said "B"]

is not (we may suppose) a theorem in Bob's formal system. (If, by chance, it is a theorem in Bob's formal system, then the essence of the remarks below applies.)
Now consider the second interpretation. Then, evidently, C = A & B is a theorem in Alice and Bob's shared formal system. (Otherwise, Alice would not be in a position to assert that C = A & B.) But then p, by definition, will respect logical connectives so that, for example, if p(B & ~A) > 0, then p(C) < p(B). This is true even if Bob hasn't yet worked out that C = A & B is in fact a consequence of his axioms. It just follows from the fact that p is a coherent probability function over propositions.
This means that, if the algorithm that determines how Bob answers a question like "What is p(A)?" is indeed an implementation of the probability function p, then he simply will not in all cases assert that p(A) = 0.5, p(B) = 0.5, and p(C) = 0.5.
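This coherence constraint can be illustrated with a small sketch (a toy model of my own, not from the thread): represent Bob's p as a distribution over the four truth assignments to the atomic sentences A and B, so that p(C) = p(A & B) is forced rather than freely assignable.

```python
# Toy model of a coherent probability function: a distribution over the
# four truth assignments (possible worlds) for the atomic sentences A and B.
# Since C = A & B, coherence forces p(C) = p(A and B).
world_probs = {(True, True): 0.4, (True, False): 0.1,
               (False, True): 0.1, (False, False): 0.4}

pA = sum(q for (a, b), q in world_probs.items() if a)
pB = sum(q for (a, b), q in world_probs.items() if b)
pC = sum(q for (a, b), q in world_probs.items() if a and b)

# Here p(B & ~A) = 0.1 > 0, so p(C) < p(B); the three values cannot all be 0.5.
print(pA, pB, pC)  # 0.5 0.5 0.4
```

Any implementation of p as a distribution over worlds behaves this way: p(C) can equal 0.5 along with p(A) and p(B) only if no mass sits on worlds where A and B disagree.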
Finally, under the third interpretation, Bob did not say that p(A|B) = 1 when he said that p(C)/p(B) = 1, because A&B is not syntactically equivalent to C under Bob's formal system. So again Alice's trap fails to spring.
How does it make sense, then? Quite a bit more would need to be assumed and specified.
Hence the "only if". I am stating a necessary, but not sufficient, condition. Or do I miss your point?
Well, we could also assume and specify additional things that would make "p(A)" make sense even if "A" is not a statement in some formal system. So I don't see how your remark is meaningful.
Do you mean, for example, that p could be a measure and A could be a set? Since komponisto was talking about expressions of the form p(A) such that A can appear in expressions like A&B, I understood the context to be one in which we were already considering p to be a function over sentences or propositions (which, following komponisto, I was equating), and not, for example, sets.
Do you mean that "p(A)" can make sense in some case where A is a sentence, but not a sentence in some formal system? If so, would you give an example? Do you mean, for example, that A could be a statement in some non-formal language like English?
Or do you mean something else?
In my own interpretation, A is a hypothesis -- something that represents a possible state of the world. Hypotheses are of course subject to Boolean algebra, so you could perhaps model them as sentences or sets.
You have made a number of interesting comments that will probably take me some time to respond to.
I've been trying to develop a formal understanding of your claim that the prior probability of an unknown arbitrary hypothesis A makes sense and should equal 0.5. I'm not there yet, but I have a couple of tentative approaches. I was wondering whether either one looks at all like what you are getting at.
The first approach is to let the sample space Ω be the set of all hypotheses, endowed with a suitable probability distribution p. It's not clear to me what probability distribution p you would have in mind, though. Presumably it would be "uniform" in some appropriate sense, because we are supposed to start in a state of complete ignorance about the elements of Ω.
At any rate, you would then define the random variable v : Ω → {True, False} that returns the actual truth value of each hypothesis. The quantity "p(A), for arbitrary unknown A" would be interpreted to mean the value of p(v = True). One would then show that half of the hypotheses in Ω (with respect to p-measure) are true. That is, one would have p(v = True) = 0.5, yielding your claim.
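A minimal sketch of this first approach, under the illustrative assumptions (not supplied by the approach itself) that the hypothesis space is finite and p is uniform over it:

```python
# Hypothetical finite hypothesis space; each hypothesis comes tagged with
# its actual truth value in the territory -- which is exactly the
# territory-dependence worried about here.
hypotheses = {"H1": True, "H2": False, "H3": True, "H4": False}

# v returns each hypothesis's actual truth value; with a uniform p,
# p(v = True) is just the fraction of hypotheses that are true.
p_v_true = sum(1 for truth in hypotheses.values() if truth) / len(hypotheses)
print(p_v_true)  # 0.5 here only because this toy space happens to be half true
```

Note that the 0.5 comes from the toy data, not from any theorem: nothing about the construction guarantees that half the hypotheses are true.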
I have two difficulties with this approach. First, as I mentioned, I don't see how to define p. Second, as I mentioned in this comment, "the truth of a binary string is a property involving the territory, while prior probability should be entirely determined by the map." (ETA: I should emphasize that this second difficulty seems fatal to me. Defining p might just be a technicality. But making probability a property of the territory is fundamentally contrary to the Bayesian Way.)
The second approach tries to avoid that last difficulty by going "meta". Under this approach, you would take the sample space Ω to be the set of logically consistent possible worlds. More precisely, Ω would be the set of all valuation maps v : {hypotheses} → {True, False} assigning a truth value to every hypothesis. (By calling a map v a "valuation map" here, I just mean that it respects the usual logical connectives and quantifiers. E.g., if v(A) = True and v(B) = True, then v(A & B) = True.) You would then endow Ω with some appropriate probability distribution p. However, again, I don't yet see precisely what p should be.
Then, for each hypothesis A, you would have a random variable V_A : Ω → {True, False} that equals True on precisely those valuation maps v such that v(A) = True. The claim that "p(A) = 0.5 for arbitrary unknown A" would unpack as the claim that, for every hypothesis A, p(V_A = True) = 0.5 — that is, that each hypothesis A is true in exactly half of all possible worlds (with respect to p-measure).
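A finite sketch of this second approach, assuming two atomic hypotheses and (as a stand-in for the unspecified p) the uniform distribution over the 2^2 valuation maps:

```python
from itertools import product

atoms = ["A", "B"]
# Each world is a valuation map assigning a truth value to every atom.
worlds = [dict(zip(atoms, vals)) for vals in product([True, False], repeat=len(atoms))]

def p(event):
    """Uniform p-measure of the set of worlds where `event` holds."""
    return sum(1 for v in worlds if event(v)) / len(worlds)

# An atomic hypothesis is true in exactly half of the valuation maps...
print(p(lambda v: v["A"]))             # 0.5
# ...but a logically compound hypothesis need not be:
print(p(lambda v: v["A"] and v["B"]))  # 0.25
```

So even under the uniform assumption, "p(A) = 0.5 for every A" can hold at most for logically independent hypotheses, not for all of them at once.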
Do either of these approaches look to you like they are on the right track?
ETA: Here's a third approach which combines the previous two: When you're asked "What's p(A), where A is an arbitrary unknown hypothesis?", and you are still in a state of complete ignorance, then you know neither the world you're in, nor the hypothesis A whose truth in that world you are being asked to consider. So, let the sample space Ω be the set of ordered pairs (v, A), where v is a valuation map and A is a hypothesis. You endow Ω with some appropriate probability distribution p, and you have a random variable V : Ω → {True, False} that maps (v, A) to True precisely when v(A) = True — i.e., when A is true under v. You give the response "0.5" to the question because (we suppose) p(V = True) = 0.5.
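A finite sketch of this combined approach, again assuming atomic hypotheses and a uniform p over the pairs (v, A) (the definition of p being precisely what remains open):

```python
from itertools import product

atoms = ["H1", "H2", "H3"]
worlds = [dict(zip(atoms, vals)) for vals in product([True, False], repeat=len(atoms))]

# Sample points are pairs (world, hypothesis); V maps (v, A) to v(A),
# the truth of hypothesis A in world v.
pairs = [(v, h) for v in worlds for h in atoms]
p_V_true = sum(1 for v, h in pairs if v[h]) / len(pairs)
print(p_V_true)  # 0.5, since each atom is true in half of the worlds
```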
But I still don't see how to define p. Is there a well-known and widely-agreed-upon definition for p? On the one hand, p is a probability distribution over a countably infinite set (assuming that we identify the set of hypotheses with the set of sentences in some formal language). [ETA: That was a mistake. The sample space is countable in the first of the approaches above, but there might be uncountably many logically consistent ways to assign truth values to hypotheses.] On the other hand, it seems intuitively like p should be "uniform" in some sense, to capture the condition that we start in a state of total ignorance. How can these conditions be met simultaneously?
I think the second approach (and possibly the third also, but I haven't yet considered it as deeply) is close to the right idea.
It's pretty easy to see how it would work if there are only a finite number of hypotheses, say n: in that case, Ω is basically just the collection of binary strings of length n (assuming the hypothesis space is carved up appropriately), and each map V_A is evaluation at a particular coordinate. Sure enough, at each coordinate, half the elements of Ω evaluate to 1, and half to 0!
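This finite case is easy to check directly; a quick sketch with n = 4:

```python
from itertools import product

n = 4
omega = list(product([0, 1], repeat=n))  # binary strings of length n

# V_A is evaluation at coordinate A; at every coordinate, exactly half of
# the 2^n strings carry a 1 and half carry a 0.
for coord in range(n):
    ones = sum(string[coord] for string in omega)
    assert ones == len(omega) // 2
print("each coordinate is 1 in exactly half of omega")
```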
More generally, one could imagine a probability distribution on the hypothesis space controlling the "weighting" of elements of Ω. For instance, if hypothesis #6 gets its probability raised, then those mappings v in Ω such that v(6) = 1 would be weighted more than those such that v(6) = 0. I haven't checked that this type of arrangement is actually possible, but something like it ought to be.
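One concrete way such a weighting could work (my guess at the arrangement, not something verified in the thread) is a product measure: give each hypothesis an independent marginal probability and weight each valuation v by the product of its coordinates' probabilities.

```python
from itertools import product

# Marginal probability for each of three hypotheses; the last one has had
# its probability "raised" from 0.5 to 0.9.
q = [0.5, 0.5, 0.9]
omega = list(product([0, 1], repeat=len(q)))

def weight(v):
    """Product-measure weight of the valuation v."""
    w = 1.0
    for qi, bit in zip(q, v):
        w *= qi if bit else 1.0 - qi
    return w

total = sum(weight(v) for v in omega)              # the weights sum to 1
p_raised = sum(weight(v) for v in omega if v[2])   # measure of {v : v(last) = 1}
print(round(total, 10), round(p_raised, 10))       # 1.0 0.9
```

With all marginals at 0.5 this recovers the uniform weighting of the finite case above; raising one marginal shifts mass toward the valuations that make that hypothesis true, which is exactly the arrangement described.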