An approach to logical counterfactuals inspired by the Demski prior

Scott Garrabrant

This is a probabilistic approach to logical counterfactuals inspired by the Demski prior. Given a sentence $ϕ$ , we wish to ask what other logical sentences are true in the world $C (ϕ)$ where we correctly counterfactually assume $ϕ$ . I spoke about this before here and here, but now I am going to approach the problem slightly differently. Instead of insisting that every other sentence $ψ$ is either true or false in the counterfactual world $C (ϕ)$ , I allow $C (ϕ)$ to assign probabilities to each sentence.

When you try to describe counterfactual worlds as sets of true and false sentences, you necessarily have very sharp boundaries. You will have, for example, counterfactual worlds where $ϕ$ and $ψ$ are true, but $ϕ \land ψ$ is false. Probabilities seem to be more realistic by allowing us to smooth the boundaries. As you look at sentences which are more and more logically distinct from $ϕ$ , you can gradually change probabilities so that they will represent the truth, rather than representing consequences of $ϕ$ .

Let $μ$ be a measure on logical sentences, for example, $μ (ϕ) = 2^{- ℓ (ϕ)}$ , where $ℓ (ϕ)$ is the number of bits necessary to encode $ϕ$ . Let $T_{N}$ be the theory containing all sentences true about the natural numbers. Consider the following procedure which computes $P (ψ | ϕ)$ . This definition is only for $ϕ$ which are consistent by themselves.

Let $T_{0}$ be the theory containing only the sentence $ϕ$ . For $n \geq 0$ , compute $T_{n + 1}$ from $T_{n}$ by sampling a sentence $t_{n}$ according to $μ$ . If both $T_{N} + t_{n}$ and $T_{n} + t_{n}$ are consistent, then $T_{n + 1} = T_{n} + t_{n}$ . Otherwise, $T_{n + 1} = T_{n}$ . Let $T_{\infty}$ be the union of all of the $T_{n}$ . $P (ψ | ϕ)$ is the probability that $ψ$ is a consequence of $T_{\infty}$ .

As it is, this procedure is not approximable, but you can make a similar thing approximable by replacing $T_{N}$ with $P A$ , or a complete theory sampled from your favorite approximable distribution.

Claim: $P (\cdot | ϕ)$ gives a coherent probability assignment which assigns probability 1 to $ϕ$ , and thus represents a probability distribution on complete theories.

Proof: The probability distribution on complete theories is exactly the distribution on $T_{\infty}$ . All we need to show is that $T_{\infty}$ is complete (with probability 1). Take a sentence $ψ$ . Either $ψ$ or $\neg ψ$ is consistent with $T_{N}$ . WLOG assume $ψ$ is consistent with $T_{N}$ . The sentence $ψ$ is eventually sampled at some time $n$ . Either $ψ$ is added to $T_{n}$ , or it is inconsistent with $T_{n}$ . Therefore, either $T_{n}$ proves $\neg ψ$ , or $T_{n + 1}$ proves $ψ$ , so $T_{\infty}$ proves either $ψ$ or $\neg ψ$ . Note that $T_{\infty}$ does not necessarily contain either $ψ$ or $\neg ψ$ as an axiom.

Now I will give the reasons I am considering this proposal. None of the following is stuff I actually know to be true. I think it is plausible that my intuitions about the result of this procedure are very wrong.

It seems that true sentences will generally have high probabilities. Thus, if $ϕ$ and $ψ$ are both complex sentences, and there is a simple proof of $ϕ \to ψ$ , it is likely that many true sentences will be sampled and accepted before $ψ$ is considered. Thus, it seems plausible that sufficiently many simple axioms to complete the simple proof of $ϕ \to ψ$ will be accepted before $ψ$ is considered. If this happens, $ψ$ will automatically be included. It seems then that $ψ$ will have a high probability.

Thus it is plausible that this proposal follows the spirit of the conjecture that simple proofs of $ϕ \to ψ$ correspond to legitimate counterfactuals. Note that this informal argument only goes the one direction. If If there is a simple proof of $ϕ \to ψ$ , it seems likely that $P (ψ | ϕ)$ will be large, but it does not seem likely that $P (ϕ | ψ)$ will be large. This in consistent with my idea of logical counterfactuals.

Proof-length based definitions of counterfactuals usually have the unfortunate property that they are dependent on the formalities of our proof system, and equivalent proof systems can give very different proof lengths. Perhaps this proposal can get many of the nice properties of proof length based systems, while being independent choosing different equivalent ways to carry out proofs.

I am, as usual, a bit confused. If you require a sentence to be consistent with (e.g.) PA before being added to $T_{n + 1}$ , this proposal is unable to assign nonzero probability to the trillionth digit of pi being 2 - and conditional on the trillionth digit of pi counterfactually being 2, it is unable to go on to believe in PA.

It seems like some looser condition for adding to the theory is needed. Not just as a sop to practicality, but to get at some important desiderata of logical counterfactuals.

Here's the picture of logical counterfactuals that I'm currently thinking under:

People have some method for generating mental models of math, and these mental models have independencies that the ground truth mathematics doesn't. E.g., when I imagine the trillionth digit of pi being 2, it doesn't change (in the mental model) whether the Collatz conjecture is true. In fact, for typical scenarios I consider, I can continue to endorse (in my mental model) the typical properties of the real numbers, even when considering a collection of statements, most of which are inconsistent with those properties (like assigning a distribution over some digit of pi).

This apparent independence produces an apparent partial causal graph (within a certain family of mental models), which leads to the use of causal language like "Even if one set the trillionth digit of pi to 2, it would not change the things I'm taking for granted in my mental models, nor would it change the things that would not change in my mental model when I change the setting of this digit of pi."

It seems like some looser condition for adding to the theory is needed. Not just as a sop to practicality, but to get at some important desiderata of logical counterfactuals.

Here's the picture of logical counterfactuals that I'm currently thinking under:

LESSWRONG
LW

LESSWRONG
LW

3

An approach to logical counterfactuals inspired by the Demski prior

3

Ω 2

3

Ω 2

3

Ω 2