This post summarizes the response to the Less Wrong Book Club and Study Group proposal, floats a tentative virtual meetup schedule, and offers some mechanisms for keeping up to date with the group's work. We end with a summary of Chapter 1.
Statistics
The proposal for a LW book club and study group, initially focusing on E.T. Jaynes' Probability Theory: The Logic of Science (a.k.a. PT:TLOS), drew an impressive response, with 57 declarations of intent to participate. (I may have missed some, or counted as participants some who were merely interested. This spreadsheet contains participant data and can be edited by anyone, under revision control; please feel free to add, remove, or change your information.) The group has people from no fewer than 11 countries, in time zones ranging from GMT-7 to GMT+10.
Live discussion schedule and venues
Many participants have expressed an interest in having informal or chatty discussions over a less permanent medium than LW itself, which should probably be reserved for more careful observations. The schedule below is offered as a basis for further negotiation. You can edit the spreadsheet linked above with your preferred times; if a different clustering emerges by the next iteration, I will report on it.
- Tuesdays at UTC 18:00 (that is 1pm Bay Area, 8pm in Europe, etc. - see linked schedule for more)
- Wednesdays at UTC 11:00 (seems preferred by Australian participants)
- Sundays at UTC 18:00 (some have requested a weekend meeting)
The unofficial Less Wrong IRC channel is the preferred venue. An experimental Google Wave has also been started which may be a useful adjunct, in particular as we come to need mathematical notations in our discussions.
I recommend reading the suggested material before attending live discussion sessions.
Objectives, math prerequisites
The intent of the group is to engage in "earnest study of the great literature in our area of interest" (to paraphrase from the Knowledge Hydrant pattern language, a useful resource for study groups).
Earnest study aims at understanding a work deeply. Probably (particularly so in the case of PT:TLOS) the most useful way to do so is sequentially, in the order the author presented their ideas. Therefore, we aim for a pace that allows participants to extract as much insight as possible from each piece of the work, before moving on to the next, which is assumed to build on it.
Exercises are useful stopping-points to check for understanding. When the text contains equations or proofs, reproducing the derivations or checking the calculations can also be a good way to ensure deep understanding.
PT:TLOS is (from personal experience) relatively accessible with rusty high-school math (in particular, it requires little calculus) until at least partway through Chapter 6 (which is where I am at the moment). Just these first few chapters contain many key insights about the Bayesian view of probability and are well worth the effort.
Format
My proposal for the format is as follows. I will post one new top-level post per chapter, so as to give people following through RSS a chance to catch updates. Each chapter, however, may need to be split into more than one chunk to be manageable. I intend to aim for a weekly rhythm: the Monday after the first chunk of a new chapter is posted, I will post the next chunk, and so on. If you're worried about missing an update, check the top-level post for the current chapter each Monday.
Each update will identify the current chunk, and will link to a comment containing one or more "opening questions" to jump-start discussion.
Updates also briefly summarize the previous chunk and highlights of the discussion arising from it. (Participants in the live chat sessions are encouraged to designate one person to summarize the discussion and post the summary as a comment.) By the time a new chapter is to be opened, the previous post will contain a digest form of the group's collective take on the chapter just worked through. The cumulative effect will be a "Less Wrong's notes on PT:TLOS", useful in itself for newcomers.
Chapter 1: Plausible Reasoning
In this chapter Jaynes fleshes out a theme introduced in the preface: "Probability theory as extended logic".
Sections: Deductive and Plausible Reasoning - Analogies with Physical Theories - The Thinking Computer - Introducing the Robot (week of 14/06)
Classical (Aristotelian) logic - modus ponens, modus tollens - allows deduction (teasing apart the concepts of deduction, induction, abduction isn't trivial). But what if we're interested not just in "definitely true or false" but "is this plausible", as we are in the kind of everyday thinking Jaynes provides examples of? Plausible reasoning is a weaker form of inference than deduction, but one Jaynes argues plays an important role even in (say) mathematics.
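One of the weaker forms of inference Jaynes has in mind (if A implies B, and B is found to be true, then A becomes more plausible) can be checked with a few lines of arithmetic. A minimal sketch using Bayes' rule, with entirely made-up numbers:

```python
# Weak syllogism: A implies B, and B turns out to be true.
# Since P(B|A) = 1, Bayes' rule gives P(A|B) = P(A) / P(B) >= P(A):
# learning B can only raise the plausibility of A.

def posterior(p_a, p_b_given_a, p_b_given_not_a):
    """P(A|B) by Bayes' rule."""
    p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)
    return p_b_given_a * p_a / p_b

p_a = 0.3                                  # prior plausibility of A (arbitrary)
p_a_given_b = posterior(p_a, 1.0, 0.5)     # A implies B; B can also happen without A
print(round(p_a_given_b, 3))               # 0.462: A has become more plausible
assert p_a_given_b > p_a
```

The deductive syllogisms appear as limiting cases: if B could not happen without A (the third argument is 0), the posterior is exactly 1.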
Jaynes' aim is to construct a working model of our faculty of "common sense", in the same sense that the Wright brothers formed a working model of the faculty of flight: not by vague resort to analogy, as in the Icarus myth, but by producing a machine embodying a precise understanding. (Jaynes, however, speaks favorably of analogical thinking: "Good mathematicians see analogies between theorems; great mathematicians see analogies between analogies". He acknowledges that this line of argument itself stems from analogy with physics.)
Accordingly, Jaynes frames what is to follow as building an "inference robot". Jaynes notes, "the question of the reasoning process used by actual human brains is charged with emotion and grotesque misunderstandings", and so this frame will be helpful in keeping us focused on useful questions with observable consequences. It is tempting to also read a practical intent - just as robots can carry out specialized mechanical tasks on behalf of humans, so could an inference robot keep track of more details than our unaided common senses - we must however be careful not to project onto Jaynes some conception of a "Bayesian AI".
Sections: Boolean Algebra - Adequate Sets of Operations - The Basic Desiderata - Comments - Common Language vs Formal Logic - Nitpicking (week of 21/06)
Jaynes next introduces the familiar formal notation of Boolean algebra to represent truth-values of propositions, their conjunction and disjunction, and denial. (Equality denotes equality of truth-values, rather than equality of propositions.) Some care is required to distinguish common usage of terms such as "or", "implies", "if", etc. from their denotation in the Boolean algebra of truth-values. From the axioms of idempotence, commutativity, associativity, distributivity and duality, we can build up any number of more sophisticated consequences.
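Since propositions take only two truth-values, identities like these can be verified by brute-force enumeration over all assignments. A small sketch, using Python's bool type to stand in for truth-values:

```python
from itertools import product

# Exhaustively check a few Boolean-algebra identities over all truth assignments,
# with 'and' as conjunction, 'or' as disjunction, and 'not' as denial.
for a, b, c in product([False, True], repeat=3):
    assert (a and a) == a                                # idempotence
    assert (a or b) == (b or a)                          # commutativity
    assert (a and (b or c)) == ((a and b) or (a and c))  # distributivity
    assert (not (a and b)) == ((not a) or (not b))       # duality (De Morgan)
print("all identities hold")
```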
One such consequence, sketched out next, is that any function of n Boolean variables can be expressed as a sum (logical OR) of terms involving only conjunctions (logical AND) of each variable or its negation. Any of the 2^(2^n) possible logic functions of n variables can thus be expressed using these building blocks and only three operations (conjunction, disjunction, negation). In fact an even smaller set of operations is adequate to construct all Boolean functions: it is possible to express all three in terms of the NAND (negation of AND) operation, for instance. (A key argument in Chapter 2 hinges on this reduction of logic functions to an "adequate set".)
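The claim that NAND alone is adequate can be checked directly. A minimal sketch (the function names are mine, not Jaynes' notation):

```python
from itertools import product

# NAND is an adequate set: negation, conjunction and disjunction
# can all be built from it alone.
def nand(a, b):
    return not (a and b)

def not_(a):
    return nand(a, a)                      # NOT x  ==  x NAND x

def and_(a, b):
    return nand(nand(a, b), nand(a, b))    # AND  ==  NOT (x NAND y)

def or_(a, b):
    return nand(nand(a, a), nand(b, b))    # OR   ==  (NOT x) NAND (NOT y)

# Verify against the built-in operations on every truth assignment.
for a, b in product([False, True], repeat=2):
    assert not_(a) == (not a)
    assert and_(a, b) == (a and b)
    assert or_(a, b) == (a or b)
```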
The "inference robot", then, is to reason in terms of degrees of plausibility assigned to propositions: plausibility is a generalization of truth-value. We are generally concerned with "conditional probability"; how plausible something is given what else we know. This is represented in the familiar notation A|B (" the plausibility of A given that B is true", or "A given B"). The robot is assumed to be provided sensible, non-contradictory input.
Jaynes next considers the "basic desiderata" for such an extension. First, degrees of plausibility are to be represented by real numbers. (This is motivated by an appeal to convenience of implementation; the Comments defend this in greater detail, and a more formal justification can be found in the Appendices.) By convention, greater plausibility will be represented by a greater number, and the robot's "sense of direction", that is, the consequences it draws from increases or decreases in the plausibility of the "givens", must conform to common sense. (This will play a key role in Chapter 2.) Finally, the robot is to be consistent and non-ideological: it must always draw the same conclusions from identical premises, it must not arbitrarily ignore information available to it, and it must represent equivalent states of knowledge by equivalent values of plausibility.
(The Comments section is well worth reading, as it introduces the Mind Projection Fallacy which LW readers who have gone through the Sequences should be familiar with.)
Perhaps it might be wiser to use measures (distributions), or measures on spaces of measures, or iterate that construction indefinitely. (The concept of hyperpriors seems to go in this direction, for example.)
Consider the following propositions.
P1: The recently minted U.S. quarter I just vigorously flipped into the air landed heads on the floor.
P2: A ball pulled from an unspecified urn containing an unspecified number of balls is white.
P3(x): The probability of P2 is x.
Part of the problem is the laxness in specifying the language, as I mentioned. For example, if the language we use is rich enough to support self-referring interpretations, then it may not even be possible to coherently assign a truth value (or any probability), or to know whether that is possible.
But even ruling out Goedelian potholes in the landscape and uncountably infinite families of propositions, the contrast between P1 and P2 is problematic. P1 is backed up by a vast trove of background knowledge and evidence, and our confidence in asserting Prob(P1) = 1/2 is very strong. On the other hand, background knowledge and evidence about P2 is virtually nil. It is reasonable as a matter of customary usage to assume the number of balls in the urn is finite, and thus that the probability of P2 is a rational number, but until you start adding in more assumptions and evidence, one's confidence in Prob(P2) < x for any particular real number x seems typically to be very much lower than for P1. Summarizing one's state of knowledge about these two propositions onto the same scale of reals between 0 and 1 seems to ignore an awful lot that we know about the relative state of knowledge vs. ignorance with respect to P1 and P2. An awful lot of knowledge is being jettisoned because it won't fit into this scheme of definite real numbers. To make the claim Prob(P2) = 1/2 (or any other definite real number you want to name) just does not seem like the same kind of thing as the claim Prob(P1) = 1/2. It feels like a category mistake.
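The P1/P2 contrast can be made concrete by modeling the unknown chance itself as a distribution, in the spirit of the hyperprior remark above. A hedged sketch, with entirely arbitrary Beta parameters chosen purely for illustration:

```python
import math

# Treat "the probability of heads/white" itself as uncertain, with a Beta
# distribution over [0, 1]:
#   P1 (well-understood coin): Beta(500, 500) -- mean 0.5, tightly concentrated
#   P2 (unknown urn):          Beta(1, 1)     -- mean 0.5, maximally spread out
def beta_mean_sd(alpha, beta):
    mean = alpha / (alpha + beta)
    var = alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1))
    return mean, math.sqrt(var)

coin_mean, coin_sd = beta_mean_sd(500, 500)
urn_mean, urn_sd = beta_mean_sd(1, 1)
print(coin_mean, round(coin_sd, 3))  # 0.5 0.016
print(urn_mean, round(urn_sd, 3))    # 0.5 0.289
```

Both propositions get the same point estimate of 1/2, yet the second moment captures exactly the difference in states of knowledge that the single real number discards.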
Jaynes addresses this to some degree in Appendix A4, "Comparative Probability". He presents an argument that seems to go like this: it hardly matters what real number we start with for a statement without much background evidence, because the more evidence we accumulate, the more our assignments are coordinated with other statements into a comprehensive picture, and the probabilities eventually converge to their true and correct values. That's a heartening way to look at it, but it also goes to show that many of the assignments of specific real numbers we make, such as for P2 or P3, are largely irrelevancies that are right next door to meaningless. And in the end he reiterates his initial argument that the benefits of having a real number to calculate with are irresistible. This comes at the price of helping ourselves to the illusion of more precision than our state of ignorance entitles us to. This is why the axiom of comparability seems to me an unnatural fit for the way we could or should think about these things.
Very interesting! But I think I have to read up on Appendix A4 to fully appreciate it... I will come back if I change my mind after reading it! :-)
My own current thoughts are like this: I would bet on the ball being white up to some ratio... if my stake was $1 and I could win $100, I would do it, for instance. The probability is simply the border case, where the ratio between losing and winning is such that I might as well bet as not. Betting $50 I would certainly not do. So I would estimate the probability to be somewhere between 1% and 50%... and somewhere there...
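The betting heuristic in the comment above can be made precise with the standard break-even calculation (a minimal sketch; the dollar figures are just the commenter's examples):

```python
# Risking `stake` to win `payout` is worthwhile exactly when
#   p * payout > (1 - p) * stake,
# so the border case (indifference) is at p = stake / (stake + payout).
def implied_probability(stake, payout):
    return stake / (stake + payout)

print(round(implied_probability(1, 100), 4))  # 0.0099: bet $1 to win $100
print(implied_probability(50, 50))            # 0.5: an even-money $50 bet
```

So accepting the $1 bet but refusing the $50 one brackets the bettor's probability between roughly 1% and 50%, which is the range the comment arrives at.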