Hello, everyone.
I'm relatively new here as a user rather than as a lurker, but even after trying to read every tutorial on Bayes' Theorem I could get my hands on, I'm still not sure I understand it. So I was hoping that I could explain Bayesianism as I understand it, and some more experienced Bayesians could tell me where I'm going wrong (or whether I'm not going wrong at all and it's a confidence issue rather than an actual knowledge issue). If this doesn't interest you at all, then feel free to tap out now, because here we go!
Abstraction
Bayes' Theorem is an application of probability. Probability is an abstraction based on logic, which is in turn based on possible worlds. By this I mean that they are both maps that refer to multiple territories: whereas a map of Cincinnati (or a "map" of what my brother is like, for instance) describes a single territory, abstractions are good for more than one thing. Trigonometry is a map of not just this triangle here, but of all triangles everywhere, to the extent that they are triangular. Because of this it is useful even for triangular objects one has never encountered before, but it only tells you about them partially (e.g. it won't tell you the lengths of the sides, because that isn't part of the definition of a triangle). Also, it only works at scales at which the object in question approximates a triangle (i.e. the "triangle" map is probably useful at macroscopic scales, but breaks down as you get smaller).
Logic and Possible Worlds
Logic is an attempt to construct a map that covers as much territory as possible, ideally all of it. Thus when people say that logic is true at all times, at all places, and with all things, they aren't really telling you about the territory, they're telling you about the purpose of logic (in the same way that the "triangle" map is ideally useful for triangles at all times, at all places).
One form of logic is propositional logic. In propositional logic, all the possible worlds are imagined as points. Each point is exactly one possible world: a logically possible arrangement that gives a value to every variable in the universe. Ergo no two possible universes are exactly the same (though they will share elements).
These possible universes are then joined together in sets called "propositions". These "sets" are Venn diagrams, or what George Lakoff refers to as "container schemas". Thus, for any given set, every possible universe is either inside or outside of it, with no middle ground (see "questions" below). So if the set I'm referring to is the proposition "Snow is white", that set would include all possible universes in which snow is white. The rules of propositional logic follow from the container schema.
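To make the possible-worlds picture concrete, here is a minimal sketch, assuming a toy universe with only two hypothetical Boolean variables (`snow_is_white` and `grass_is_green`, chosen purely for illustration). Each world is one complete assignment of truth values, a proposition is the set of worlds in which it holds, and the logical connectives become set operations on those containers.

```python
from itertools import product

# Hypothetical toy universe: each variable is either True or False.
VARIABLES = ("snow_is_white", "grass_is_green")

# A possible world is one complete assignment of values to every variable,
# so no two worlds are identical (though they can share individual values).
WORLDS = [dict(zip(VARIABLES, values))
          for values in product([True, False], repeat=len(VARIABLES))]

def proposition(predicate):
    """Return the set of world indices in which `predicate` holds."""
    return frozenset(i for i, w in enumerate(WORLDS) if predicate(w))

ALL = frozenset(range(len(WORLDS)))
snow_white = proposition(lambda w: w["snow_is_white"])
grass_green = proposition(lambda w: w["grass_is_green"])

# Logical connectives become operations on the "containers":
both = snow_white & grass_green      # AND -> intersection
either = snow_white | grass_green    # OR  -> union
not_snow_white = ALL - snow_white    # NOT -> complement

# Every world is either inside a proposition or outside it, never both.
assert (snow_white | not_snow_white) == ALL
assert not (snow_white & not_snow_white)
print(len(WORLDS), len(snow_white), len(both), len(either))
```

With two variables there are four worlds; "snow is white" contains two of them, the conjunction one, and the disjunction three.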
Bayesian Probability
If propositional logic is about what's inside a set or outside of a set, probability is about the size of the sets themselves. Probability is a measurement of how many possible worlds are inside a set, and conditional probability is about the size of the intersections of sets.
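A minimal sketch of "probability as the size of a set", assuming a finite toy universe of three hypothetical Boolean variables in which every world counts equally. That uniform weighting is an assumption made for simplicity; in general each world carries its own prior weight. Conditional probability is then the size of an intersection relative to the size of the conditioning set.

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy universe: three Boolean variables -> 8 equally weighted worlds.
WORLDS = list(product([True, False], repeat=3))  # (a, b, c) tuples

def prob(event):
    """P(event): the fraction of all worlds that lie inside the set `event`."""
    return Fraction(len(event), len(WORLDS))

def cond_prob(event, given):
    """P(event | given): size of the intersection relative to the size of `given`."""
    return Fraction(len(event & given), len(given))

A = {w for w in WORLDS if w[0]}            # worlds where variable a is true
B = {w for w in WORLDS if w[0] or w[1]}    # worlds where a or b is true

print(prob(A), prob(B), cond_prob(A, B))   # 1/2 3/4 2/3
```

Conditioning on B throws away the worlds outside B and measures A against what is left, which is the "size of the intersection" idea in code.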
Take the example of the dragon in your garage. To start with, there either is or isn't a dragon in your garage, and both sets of possible worlds have elements in them. But if we look in your garage and don't see a dragon, that rules out every possibility in which there is a *visible* dragon in your garage, and so removes those possible universes from the 'there is a dragon in your garage' set. In other words, the probability of that proposition goes down. And because not seeing a dragon is exactly what you would expect if there isn't a dragon in your garage, the no-dragon set remains intact. Then if we look at the ratio of the remaining possible worlds, we see that the probability of the no-dragon-in-your-garage set has gone up, not in absolute terms (the set of all possible worlds is what we started with; there aren't any more!) but relative to the alternative hypothesis (in the same way that if the denominator of a fraction goes down, the fraction gets bigger).
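The dragon example can be sketched the same way. The numbers are invented for illustration: we assume, hypothetically, ten equally weighted worlds, four with a visible dragon, one with an invisible dragon, and five with no dragon. Looking and seeing nothing deletes the visible-dragon worlds, and the survivors are renormalized, which is why "no dragon" rises without any worlds being added to it.

```python
from fractions import Fraction

# Hypothetical prior: counts of equally weighted possible worlds.
worlds = {"visible dragon": 4, "invisible dragon": 1, "no dragon": 5}

def probabilities(counts):
    """Normalize world counts into probabilities over the worlds that remain."""
    total = sum(counts.values())
    return {k: Fraction(v, total) for k, v in counts.items()}

before = probabilities(worlds)

# Observation: we look in the garage and see no dragon.
# Every "visible dragon" world is inconsistent with that, so it is eliminated;
# the other worlds predicted this observation and survive untouched.
after_counts = {k: v for k, v in worlds.items() if k != "visible dragon"}
after = probabilities(after_counts)

p_dragon_before = before["visible dragon"] + before["invisible dragon"]
p_dragon_after = after["invisible dragon"]
print(p_dragon_before, p_dragon_after)          # 1/2 1/6
print(before["no dragon"], after["no dragon"])  # 1/2 5/6
```

The count of no-dragon worlds never changes (it stays at 5); only the denominator shrinks from 10 to 6.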
This is what Bayes' Theorem is about: using a process of elimination to rule out *part* of the set of a proposition, thus providing evidence against it without fully refuting it.
Naturally, this all takes place in one's mind: the world doesn't shift around you just because you've encountered new information. Probability is in this way subjective (it has to do with maps, not territories per se), but it's not arbitrary: as long as you accept the possible worlds/logic metaphor, it necessarily follows.
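In standard notation, writing D for "there is a dragon in the garage" and E for "I looked and saw no dragon", Bayes' theorem and the total-probability expansion of its denominator read as follows. Pure elimination is the special case where each world's likelihood is either 0 (inconsistent with E) or 1 (predicted E), and dividing by P(E) is the renormalization over the surviving worlds.

```latex
P(D \mid E) = \frac{P(E \mid D)\,P(D)}{P(E)},
\qquad
P(E) = P(E \mid D)\,P(D) + P(E \mid \neg D)\,P(\neg D).
```

Since every no-dragon world yields E, P(E | ¬D) = 1. If some dragon worlds involve a visible dragon, then P(E | D) < 1, so P(E) > P(E | D) and hence P(D | E) < P(D): the observation counts against D without refuting it, as long as some invisible-dragon worlds remain.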
Questions/trouble points that I'm not sure of:
*I keep seeing probability referred to as an estimation of how certain you are in a belief. And while I guess it could be argued that you should be certain of a belief in proportion to the number of possible worlds left or whatever, that doesn't necessarily follow. Does the above explanation differ from how other people use probability?
*Also, if probability is defined as an arbitrary estimation of how sure you are, why should those estimations follow the laws of probability? I've heard the Dutch book argument, so I get why there might be practical reasons for obeying them, but unless you accept a pragmatist epistemology, that doesn't provide reasons why your beliefs are more likely to be true if you follow them. (I've also heard of Cox's rules, but I haven't been able to find a copy. And if I understand right, they say that Bayes' theorem follows from Boolean logic, which is similar to what I've said above, yes?)
*Another question: above I used propositional logic, which is okay, but it's not exactly the crème de la crème of logics. I understand that fuzzy logics work better for a lot of things, and I'm familiar with predicate logics as well, but I'm not sure how any of them interact with probability or its use, although I know that technically probability doesn't have to be binary (sets just need to be exhaustive and mutually exclusive for the Kolmogorov axioms to work, right?). I don't know, maybe it's just something that I haven't learned yet, but the answer really is out there?
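The Dutch book argument mentioned in the second question can be illustrated with a minimal sketch. The credences are hypothetical: an agent assigns 0.6 to A and also 0.6 to not-A (so they sum to 1.2 instead of 1), and is willing to buy, at a price equal to its credence, a ticket that pays 1 if the proposition is true. Whichever world is actual, it loses money.

```python
# Hypothetical incoherent credences: P(A) + P(not A) = 1.2, not 1.
credence = {"A": 0.6, "not A": 0.6}

def net_payoff(a_is_true):
    """Agent buys a ticket on each proposition at a price equal to its credence;
    each ticket pays 1 if its proposition is true, otherwise 0."""
    cost = credence["A"] + credence["not A"]
    winnings = (1 if a_is_true else 0) + (0 if a_is_true else 1)
    return winnings - cost

# Exactly one of A / not A is true in each world, so winnings are always 1
# while the total cost is 1.2: a guaranteed loss whichever world is actual.
for world in (True, False):
    print(world, round(net_payoff(world), 2))
```

With coherent credences that sum to 1, these particular bets break even in both worlds. As the question notes, this is a pragmatic argument about avoiding sure losses rather than a direct argument about truth.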
Those are the only questions that are coming to mind right now (if I think of any more, I can probably ask them in comments). So anyone? Am I doing something wrong? Or do I feel more confused than I really am?
Unfortunately no, but from your description it seems quite like the theory of the mind of General Semantics.
Not exactly, because in the end symbols are just units of perception, all distinct from one another. But while Lakoff's theory probably aims at psychology, logic is a denotational and computational tool, so it doesn't really matter if they aren't perfect inverses.
Yes. Since a group of maps can be seen as a set of things in itself, it can be treated as a valid territory. In logic there are also map/territory loops, where the formulas themselves become the territory mapped by those same formulas (akin to talking in English about the English language). This trick is used, for example, in Goedel's and Tarski's theorems.
Yes. Basically the Bayesian definition is more inclusive: e.g. there is no definition of the probability of a single coin toss in the frequency interpretation, but there is in the Bayesian one. Also, in the Bayesian take on probability the frequentist definition emerges as a natural by-product. Plus, the Bayesian framework helped disentangle a lot of frequentist statistics and introduced more powerful methods.
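One way to see the "frequency as a by-product" point is a small sketch, assuming a hypothetical coin of unknown bias, a uniform prior over a grid of candidate biases, and made-up data of 700 heads in 1000 tosses. The posterior mean of the bias lands very close to the observed frequency, and under this model that same number is also the probability assigned to heads on the next single toss, which the frequency interpretation has no direct way to express.

```python
import math

def posterior_mean_bias(heads, tosses, grid_size=999):
    """Posterior mean of a coin's bias under a uniform prior on a grid.

    The grid excludes 0 and 1 so every log-likelihood is finite.
    """
    grid = [(i + 1) / (grid_size + 1) for i in range(grid_size)]
    # log P(data | bias) for each candidate bias (the binomial coefficient cancels).
    log_lik = [heads * math.log(p) + (tosses - heads) * math.log(1 - p) for p in grid]
    peak = max(log_lik)
    # Subtract the peak before exponentiating to avoid floating-point underflow.
    weights = [math.exp(ll - peak) for ll in log_lik]
    total = sum(weights)
    return sum(p * w for p, w in zip(grid, weights)) / total

# Hypothetical data: 700 heads in 1000 tosses.
estimate = posterior_mean_bias(700, 1000)
print(round(estimate, 3), 700 / 1000)
```

With very little data the estimate stays pulled toward the prior (one head in one toss gives roughly 2/3, not 1); as tosses accumulate it converges on the observed frequency.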
The first two chapters of Jaynes' book, a pre-print version of which is available online for free, do a great job in explaining and using Cox to derive Bayesian probability. I urge you to read them to fully grasp this point of view.
And easily falsifiable.
Yes, but remember that this measure interpretation of probability requires the set of possible worlds to be measurable, which is a very special condition to impose on a set. It is certainly very intuitive, but technically burdensome. If you plan to work with probability, it's better to start from a cleaner model.
Yes. Fuzzy logic has an infinity of truth values for its propositions, while in PTEL (probability theory as extended logic) every proposition is 'in reality' just true or false; you just don't know which, and so you track your certainty with a real number.
Yes, in PTEL you already have real numbers, so it's not difficult to just say "The tea is 0.7 cold", and provided you have a clean (that is, classical) interpretation for this, the sentence is just true or false. Then you can quantify your uncertainty: "I give 0.2 credence to the belief that the tea is 0.7 cold". More generally, "I give y credence to the belief that the tea is x cold".
What comes out is a probability distribution, that is, the assignment of a probability value to every value of a parameter (in this case, the coldness of the tea). Notice that this would be impossible in the frequentist interpretation.
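A minimal sketch of such a distribution, assuming (hypothetically) that coldness is discretized into a handful of levels between 0 and 1 and that the credences are invented for illustration, with 0.2 on "the tea is 0.7 cold" as in the example above. Each statement "the tea is x cold" is classically true or false; the uncertainty lives in the credence attached to each x, and because exactly one level is the actual one, the credences sum to 1.

```python
# Hypothetical credences over discretized coldness levels of the tea.
credence_by_coldness = {0.0: 0.1, 0.3: 0.2, 0.5: 0.3, 0.7: 0.2, 1.0: 0.2}

# The levels are mutually exclusive and exhaustive, so credences sum to 1.
assert abs(sum(credence_by_coldness.values()) - 1.0) < 1e-9

def prob_at_least(threshold):
    """Credence that the tea is at least `threshold` cold."""
    return sum(p for x, p in credence_by_coldness.items() if x >= threshold)

expected_coldness = sum(x * p for x, p in credence_by_coldness.items())
print(credence_by_coldness[0.7], round(prob_at_least(0.5), 2), round(expected_coldness, 2))
```

Because the distribution is over a parameter rather than a single yes/no proposition, it supports derived questions such as "how likely is the tea at least half cold?" or "what coldness should I expect?".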
I think it's similar, but Lakoff focuses more on how things are abstracted away. For example, because in childhood affection is usually associated with warmth (e.g. through hugs), the different areas of your brain that code for those things become linked ("neurons that fire together, wire together"). This then becomes the basis of a cognitive metaphor, Affection Is Warmth, such that we can also say "She has a warm smile" or "...