
Comment author: MrMind 19 July 2017 03:27:09PM *  1 point [-]

As I recall, he ran into some issues with universally quantified statements -- they end up having zero probability in his system.

Cox's probability is essentially probability defined on a Boolean algebra (the Lindenbaum-Tarski algebra of propositional logic).
Kolmogorov's probability is probability defined on a sigma-complete Boolean algebra.
If I can show that quantifiers are related to sigma-completeness (quantifiers are adjunctions in the proper pair of categories, but I've yet to look into that), then I can probably lift the equivalence via the Loomis-Sikorski theorem back to the original algebras, and get exactly when a Cox probability can be safely extended to predicate logic.
That's the dream, anyway.

Comment author: ksvanhorn 20 July 2017 07:29:17PM 0 points [-]

I'd be interested in reading what you come up with once you're ready to share it.

One thing you might consider is whether sigma-completeness is really necessary, or whether a weaker concept will do. One can argue that, from the perspective of constructing a logical system, only computable countable unions are of interest, rather than arbitrary countable unions.

Comment author: MrMind 07 July 2017 03:55:44PM 1 point [-]

I'm working on extending probability to predicate calculus and your work will be very precious, thanks!

Comment author: ksvanhorn 07 July 2017 04:15:42PM 2 points [-]

If you haven't already, I would suggest you read Carnap's book, The Logical Foundations of Probability (there's a PDF of it somewhere online). As I recall, he ran into some issues with universally quantified statements -- they end up having zero probability in his system.

Comment author: ChristianKl 07 July 2017 01:29:40AM *  1 point [-]

I don't think that changes much about the core argument. Chapman wrote in "Probability theory does not extend logic":

Probability theory can be viewed as an extension of propositional calculus. Propositional calculus is described as “a logic,” for historical reasons, but it is not what is usually meant by “logic.”

[...]

Probability theory by itself cannot express relationships among multiple objects, as predicate calculus (i.e. “logic”) can. The two systems are typically combined in scientific practice.

Comment author: ksvanhorn 07 July 2017 04:08:04PM *  5 points [-]
Comment author: CronoDAS 07 July 2017 03:17:39PM *  0 points [-]

Why is #4 above "less than" and not "less than or equal to"?

::thinks a bit::

What this is saying is: if there are logically possible worlds where A is false and B is true, but no logically possible worlds where A is true and B is false, then A is strictly less likely than B -- in other words, every logically possible world must get nonzero probability. This is a pretty strong assumption...

Comment author: ksvanhorn 07 July 2017 03:56:58PM 1 point [-]

Epistemic probabilities / plausibilities are not properties of the external world; they are properties of the information you have available. Recall that the premise X contains all the information you have available to assess plausibilities. If X does not rule out a possible world, what basis do you have for assigning it 0 probability? Put another way, how do you get to 100% confidence that this possible world is in fact impossible, when you have no information to rule it out?

Bayesian probability theory as extended logic -- a new result

9 ksvanhorn 06 July 2017 07:14PM

I have a new paper that strengthens the case for strong Bayesianism, a.k.a. One Magisterium Bayes. The paper is entitled "From propositional logic to plausible reasoning: a uniqueness theorem." (The preceding link will be good for a few weeks, after which only the preprint version will be available for free. I couldn't come up with the $2500 that Elsevier makes you pay to make your paper open-access.)

Some background: E. T. Jaynes took the position that (Bayesian) probability theory is an extension of propositional logic to handle degrees of certainty -- and appealed to Cox's Theorem to argue that probability theory is the only viable such extension, "the unique consistent rules for conducting inference (i.e. plausible reasoning) of any kind." This position is sometimes called strong Bayesianism. In a nutshell, frequentist statistics is fine for reasoning about frequencies of repeated events, but that's a very narrow class of questions; most of the time when researchers appeal to statistics, they want to know what they can conclude with what degree of certainty, and that is an epistemic question for which Bayesian statistics is the right tool, according to Cox's Theorem.

You can find a "guided tour" of Cox's Theorem here (see "Constructing a logic of plausible inference"). Here's a very brief summary. We write A | X for "the reasonable credibility" (plausibility) of proposition A when X is known to be true. Here X represents whatever information we have available. We are not at this point assuming that A | X is any sort of probability. A system of plausible reasoning is a set of rules for evaluating A | X. Cox proposed a handful of intuitively-appealing, qualitative requirements for any system of plausible reasoning, and showed that these requirements imply that any such system is just probability theory in disguise. That is, there necessarily exists an order-preserving isomorphism between plausibilities and probabilities such that A | X, after mapping from plausibilities to probabilities, respects the laws of probability.

Here is one (simplified and not 100% accurate) version of the assumptions required to obtain Cox's result:

 

  1. A | X is a real number.
  2. (A | X) = (B | X) whenever A and B are logically equivalent; furthermore, (A | X) ≤ (B | X) if B is a tautology (an expression that is logically true, such as (a or not a)).
  3. We can obtain (not A | X) from A | X via some non-increasing function S. That is, (not A | X) = S(A | X).
  4. We can obtain (A and B | X) from (B | X) and (A | B and X) via some continuous function F that is strictly increasing in both arguments: (A and B | X) = F((A | B and X), (B | X)).
  5. The set of triples (x,y,z) such that x = A|X, y = (B | A and X), and z = (C | A and B and X) for some proposition A, proposition B, proposition C, and state of information X, is dense. Loosely speaking, this means that if you give me any (x',y',z') in the appropriate range, I can find an (x,y,z) of the above form that is arbitrarily close to (x',y',z').
The "guided tour" mentioned above gives detailed rationales for all of these requirements.

Not everyone agrees that these assumptions are reasonable. My paper proposes an alternative set of assumptions that are intended to be less disputable, as every one of them is simply a requirement that some property already true of propositional logic continue to be true in our extended logic for plausible reasoning. Here are the alternative requirements:
  1. If X and Y are logically equivalent, and A and B are logically equivalent assuming X, then (A | X) = (B | Y).
  2. We may define a new propositional symbol s without affecting the plausibility of any proposition that does not mention that symbol. Specifically, if s is a propositional symbol not appearing in A, X, or E, then (A | X) = (A | (s ↔ E) and X).
  3. Adding irrelevant background information does not alter plausibilities. Specifically, if Y is a satisfiable propositional formula that uses no propositional symbol occurring in A or X, then (A | X) = (A | Y and X).
  4. The implication ordering is preserved: if A → B is a logical consequence of X, but B → A is not, then (A | X) < (B | X); that is, A is strictly less plausible than B, assuming X.
Note that I do not assume that A | X is a real number. Item 4 above assumes only that there is some partial ordering on plausibility values: in some cases we can say that one plausibility is greater than another.

 

I also explicitly take the state of information X to be a propositional formula: all the background knowledge to which we have access is expressed in the form of logical statements. So, for example, if your background information is that you are tossing a six-sided die, you could express this by letting s1 mean "the die comes up 1," s2 mean "the die comes up 2," and so on, and your background information X would be a logical formula stating that exactly one of s1, ..., s6 is true, that is,

(s1 or s2 or s3 or s4 or s5 or s6) and
not (s1 and s2) and not (s1 and s3) and not (s1 and s4) and
not (s1 and s5) and not (s1 and s6) and not (s2 and s3) and
not (s2 and s4) and not (s2 and s5) and not (s2 and s6) and
not (s3 and s4) and not (s3 and s5) and not (s3 and s6) and
not (s4 and s5) and not (s4 and s6) and not (s5 and s6).
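
As an aside (an illustration of mine, not from the paper), this background information is easy to encode mechanically. The sketch below represents X as a Python predicate over truth assignments to s1, ..., s6 and confirms that exactly six of the 64 assignments satisfy it, one per die face.

from itertools import product

FACES = ["s1", "s2", "s3", "s4", "s5", "s6"]

def die_background(row):
    # X: at least one face is up, and no two faces are up together.
    at_least_one = any(row[s] for s in FACES)
    no_two = all(not (row[a] and row[b])
                 for i, a in enumerate(FACES) for b in FACES[i + 1:])
    return at_least_one and no_two

# Of the 2**6 = 64 truth assignments, exactly the six "one face up" rows satisfy X.
rows = [dict(zip(FACES, values)) for values in product([False, True], repeat=6)]
assert sum(die_background(row) for row in rows) == 6

Under the counting rule described next, P(s1 | X) then comes out to 1/6, as one would hope.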

Just like Cox, I then show that there is an order-preserving isomorphism between plausibilities and probabilities that respects the laws of probability.

My result goes further, however, in that it gives actual numeric values for the probabilities. Imagine creating a truth table containing one row for each possible combination of truth values assigned to each atomic proposition appearing in either A or X. Let n be the number of rows in this table for which X evaluates true. Let m be the number of rows in this table for which both A and X evaluate true. If P is the function that maps plausibilities to probabilities, then P(A | X) = m / n.
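
Here is a minimal sketch of that counting rule (my own illustration; the paper works abstractly with propositional formulas, whereas here propositions are represented as Python predicates over truth assignments, and the function name is made up). The final lines reproduce the worked example that follows.

from itertools import product

def plausibility_to_probability(A, X, symbols):
    # P(A | X) by truth-table counting: n rows satisfy X, m of those also satisfy A.
    # A and X are predicates on a dict {symbol: bool}; `symbols` lists the atomic
    # propositional symbols appearing in A or X. Assumes X is satisfiable (n > 0).
    n = m = 0
    for values in product([False, True], repeat=len(symbols)):
        row = dict(zip(symbols, values))
        if X(row):
            n += 1
            if A(row):
                m += 1
    return m / n

# The worked example that follows: P(a | a or b) = 2/3.
print(plausibility_to_probability(
    A=lambda r: r["a"],
    X=lambda r: r["a"] or r["b"],
    symbols=["a", "b"],
))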

For example, suppose that a and b are atomic propositions (not decomposable in terms of more primitive propositions), and suppose that we only know that at least one of them is true; what then is the probability that a is true? Start by enumerating all possible combinations of truth values for a and b:
  1. a false, b false: (a or b) is false, a is false.
  2. a false, b true : (a or b) is true,  a is false.
  3. a true,  b false: (a or b) is true,  a is true.
  4. a true,  b true : (a or b) is true,  a is true.
There are 3 cases (2, 3, and 4) in which (a or b) is true, and in 2 of these cases (3 and 4) a is also true. Therefore,

    P(a | a or b) = 2/3.

This concords with the classical definition of probability, which Laplace expressed as

The probability of an event is the ratio of the number of cases favorable to it, to the number of possible cases, when there is nothing to make us believe that one case should occur rather than any other, so that these cases are, for us, equally possible.

This definition fell out of favor, in part because of its apparent circularity. My result validates the classical definition and sharpens it. We can now say that a “possible case” is simply a truth assignment satisfying the premise X. We can simply drop the problematic phrase “these cases are, for us, equally possible.” The phrase “there is nothing to make us believe that one case should occur rather than any other” means that we possess no additional information that, if added to X, would expand the rows of the truth table for which X evaluates true by differing multiplicities.

For more details, see the paper linked above.
Comment author: cousin_it 03 July 2017 02:29:20PM 0 points [-]

If you separate Bayesian probability from decision theory, then it has no justification except self-consistency, and you can no longer say that all correct reasoning must approximate Bayes (which is the claim under discussion).

Comment author: ksvanhorn 05 July 2017 05:27:11AM 0 points [-]

Sure it does. Haven't you heard of Cox's Theorem? It singles out (Bayesian) probability theory as the uniquely determined extension of propositional logic to handle degrees of certainty. There's also my recent paper, "From Propositional Logic to Plausible Reasoning: A Uniqueness Theorem"

https://authors.elsevier.com/a/1VIqc,KD6ZCKMf

or

https://arxiv.org/abs/1706.05261

Comment author: moridinamael 16 January 2017 04:29:46PM 6 points [-]

Let me attempt to explain it in my own words.

You have a thought, and then you have some kind of emotional reaction to it, and that emotional reaction should be felt in your body. Indeed, it is hard to have an emotion that doesn't have a physical component.

Say you think that you should call your mom, but then you feel a heaviness or a sinking in your gut, or a tightness in your neck or throat or jaw. These physical sensations are one of the main ways your subconscious tries to communicate with you. Let's further say that you don't know why you feel this way, and you can't say why you don't want to call your mom. You just find that you know you should call your mom but some part of you is giving you a really bad feeling about it. If you don't make an effort to untangle this mess, you'll probably just not call your mom, meaning whatever subconscious process originated those bad feelings in the first place will continue sitting under the surface and probably recapitulate the same reaction in similar situations.

If you gingerly try to "fit" the feeling with some words, as Gendlin says, the mind will either give you no feedback or it will give you a "yes, that's right" in the form of a further physical shift. This physical shift can be interpreted as the subconscious module acknowledging that its signal has been heard and ceasing to broadcast it.

I really don't think Gendlin is saying that the origin of your emotions about calling your mom is stored in your muscles. I think he's saying that when you have certain thoughts or parts of yourself that you have squashed out of consciousness with consistent suppression, these parts make themselves known through physical sensations, so it feels like it's in your body. And the best way to figure out what those feelings are is to be very attentive to your body, because that's the channel through which you're able to tentatively communicate with that part of yourself.

OR, it may not be that you did anything to suppress the thoughts, it may just be that the mind is structured in such a way that certain parts of the mind have no vocabulary with which to just inject a simple verbal thought into awareness. There's no reason a priori to assume that all parts of the mind have equal access to the phonological loop.

Maybe Gendlin's stuff is easier to swallow if you happen to already have this view of the conscious mind as the tip of the iceberg, with most of your beliefs and habits and thoughts being dominated by the vast but unreflective subconscious. If you get into meditation in any serious way, you can really consistently see that these unarticulated mental constructs are always lurking there, dominating behavior, pushing and pulling. To me, it's not woo at all, it's very concrete and actionable, but I understand that Gendlin's way of wording things may serve as a barrier to entry.

Comment author: ksvanhorn 09 April 2017 04:54:20PM 0 points [-]

I appreciate your explanation, and it makes sense to me. But I still can't find any hint in Gendlin's writing that he's speaking metaphorically.

Comment author: moridinamael 20 December 2016 03:14:11PM 12 points [-]

A common bucket error for me: Idea X is a potentially very important research idea that is, as far as I know, original to me. It would really suck to discover that this wasn't original to me. Thus, I don't want to find out if this is already in the literature.

This is a change from how I used to think about flinches: I used to be moralistic, and to feel disapproval when I noticed a flinch, and to assume the flinch had no positive purpose. I therefore used to try to just grit my teeth and think about the painful thing, without first "factoring" the "purposes" of the flinch, as I do now.

This is key. Any habit that involves "gritting your teeth" is not durable.

Also, Focusing should easily be part of the LW "required reading".

Comment author: ksvanhorn 16 January 2017 03:43:00PM 0 points [-]

I'm reading Gendlin's book Focusing and struggling with it -- it's hard for me to understand why you and Anna think so highly of this book. It's hard to get past all the mystic woo about knowledge "in the body"; Gendlin seems to think that anything not in the conscious mind is somehow stored/processed out there in the muscles and bones. Even taking that as metaphorical -- which Gendlin clearly does not -- I find his description of the process very unclear.

Comment author: Irgy 30 July 2015 05:02:10AM *  1 point [-]

To my view, the 1/36 is "obviously" the right answer; what's interesting is exactly how it all went wrong in the other case. I'm honestly not all that enlightened by the argument given here or in the links. The important question is, how would I recognise this mistake easily in the future? The best I have for the moment is "don't blindly apply a proportion argument" and "be careful when dealing with infinite scenarios even when they're disguised as otherwise". I think the combination of the two was required here: the proportion argument failed because the maths which normally supports it couldn't be used without at some point colliding with the partly-hidden infinity in the problem setup.

I'd be interested in more development of how this relates to anthropic arguments. It does feel like it highlights some of the weaknesses in anthropic arguments. It seems to strongly undermine the doomsday argument in particular. My take on it is that it highlights the folly of the idea that population is endlessly exponentially growing. At some point that has to stop regardless of whether it has yet already, and as soon as you take that into account I suspect the maths behind the argument collapses.

Edit: Just another thought. I tried harder to understand your argument and I'm not convinced it's enough. Have you heard of ignorance priors? They're the prior you use, in fact the prior you need to use, to represent a state of no knowledge about a measurement other than an invariance property which identifies the type of measurement it is. So an ignorance prior for a position is constant, for a scale it is 1/x, and for a probability it has been at least argued to be 1/(x(1-x)). These all have the property that their integral is infinite, but they work because as soon as you add some knowledge and apply Bayes' rule the result becomes integrable. These are part of the foundations of Bayesian probability theory. So while I agree with the conclusion, I don't think the argument that the prior is unnormalisable is sufficient proof.

Comment author: ksvanhorn 05 August 2015 04:48:02PM *  1 point [-]

Actually, no, improper priors such as you suggest are not part of the foundations of Bayesian probability theory. It's only legitimate to use an improper prior if the result you get is the limit of the results you get from a sequence of progressively more diffuse priors that tend to the improper prior in the limit. The Marginalization Paradox is an example where just plugging in an improper prior without considering the limiting process leads to an apparent contradiction. My analysis (http://ksvanhorn.com/bayes/Papers/mp.pdf) is that the problem there ultimately stems from non-uniform convergence.

I've had some email discussions with Scott Aaronson, and my conclusion is that the Dice Room scenario really isn't an appropriate metaphor for the question of human extinction. There are no anthropic considerations in the Dice Room, and the existence of a larger population from which the kidnap victims are taken introduces complications that have no counterpart when discussing the human extinction scenario.

You could formalize the human extinction scenario with unrealistic parameters for growth and generational risk as follows:

  • Let n be the number of generations for which humanity survives.

  • The population in each generation is 10 times as large as the previous generation.

  • There is a risk 1/36 of extinction in each generation. Hence, P(n = t | n >= t) = 1/36 for every t.

  • You are a randomly chosen individual from the entirety of all humans who will ever exist. Specifically, P(you belong to generation g | n) = 10^g / N, where N is the sum of 10^t for 1 <= t <= n.

Analyzing this problem, I get

P(extinction occurs in generation t | extinction no earlier than generation t) = 1/36

P(extinction occurs in generation t | you are in generation t) = about 9/10

That's a vast difference depending on whether or not we take into account anthropic considerations.
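
For readers who want to check the 9/10 figure, here is a minimal numerical sketch of the model above (my own illustration; the infinite sum over possible last generations is truncated, which is harmless because the terms fall off by roughly a factor of 10 per generation).

from fractions import Fraction

P_DIE = Fraction(1, 36)   # per-generation extinction risk
GROWTH = 10               # each generation is 10 times the previous one

def cumulative_population(k):
    # Total population of generations 1..k, generation g having size GROWTH**g.
    return sum(GROWTH**g for g in range(1, k + 1))

def p_extinction_given_your_generation(t, horizon=60):
    # P(n = t | you are a random human in generation t), by Bayes' rule:
    # the weight of "humanity lasts exactly k generations" is
    # P(n = k) * P(you are in generation t | n = k)
    #   = (35/36)^(k-1) * (1/36) * GROWTH**t / cumulative_population(k), for k >= t.
    # The sum over k is truncated at t + horizon; the tail is negligible.
    def weight(k):
        prior = (1 - P_DIE) ** (k - 1) * P_DIE
        return prior * Fraction(GROWTH**t, cumulative_population(k))
    return weight(t) / sum(weight(k) for k in range(t, t + horizon))

print(float(p_extinction_given_your_generation(5)))   # ~0.90, i.e. about 9/10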

The Dice Room analogy would be if the madman first rolled the dice until he got snake-eyes, then went out and kidnapped a bunch of people, randomly divided them into n batches, each 10 times larger than the previous, and murdered the last batch. This is a different process than what is described in the book, and results in different answers.
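
A rough Monte Carlo sketch of this modified process (again my own illustration, with batch sizes 10, 100, ..., and snake-eyes probability 1/36 per roll) bears that out: a randomly chosen kidnap victim ends up in the murdered batch about 90% of the time, not 1/36 of the time.

import random

random.seed(1)

def simulate_modified_dice_room(trials=200_000):
    # Modified process: roll until snake eyes (probability 1/36 per roll), so the
    # number of batches n is geometric; batch t has 10**t victims; the last batch
    # is murdered. Returns the fraction of trials in which a uniformly chosen
    # kidnap victim lands in the murdered batch.
    murdered = 0
    for _ in range(trials):
        n = 1
        while random.random() >= 1 / 36:
            n += 1
        total = sum(10**t for t in range(1, n + 1))
        victim = random.randrange(total)          # a random kidnapped person
        murdered += victim >= total - 10**n       # in the last (murdered) batch?
    return murdered / trials

print(simulate_modified_dice_room())   # ~0.9, not 1/36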

Comment author: Slider 28 July 2015 10:13:14PM 3 points [-]

The madman only almost always murders. It is possible but vanishingly unlikely that he just never rolls snake eyes (or that the growth outruns the total population, so he can't get a full batch). Option 1 doesn't care whether the doom ultimately happens, while option 2 assumes that the doom will happen.

The proper English version of option two would be "Given that the dice eventually came up snake eyes and that you were kidnapped at some point, what is the probability that they did so while you were kidnapped?". Notice also that this is independent of which dice readings result in doom. That is, if the world is only saved on snake eyes, the chance is still "only" 9/10.

Comment author: ksvanhorn 29 July 2015 06:32:39PM 0 points [-]

Note that

P(you are in batch t | the madman murders batch t & you are kidnapped)

cannot be 9/10 for all t; in fact, this probability must go to 0 in the limit as t -> infinity, regardless of what prior you use.
