I'm probably explaining it poorly in the post. P0 is not just a function of statements in F. P0 is a probability measure on the space of truth assignments, i.e. functions {statements in F} -> {true, false}. This probability measure is defined by making the truth value of each statement an independent random variable with a 50/50 distribution.
PD is produced from P0 by imposing the condition "there is no contradiction of length <= D" on the truth assignment, i.e. we set the probability of all truth assignments that violate the condition to 0 and renormalize the probabilities of all other assignments. In other words P_D(s) = # {D-consistent truth assignments in which s is assigned true} / # {D-consistent truth assignments}.
Technicality: There is an infinite number of statements so there is an infinite number of truth assignments. However there is only a finite number of statements that can figure in contradictions of length <= D. Therefore all the other statements can be ignored (i.e. assumed to have independent probabilities of 1/2 like in P_0). More formally, the sigma-algebra of measurable sets on the space of truth assignments is generated by sets of the form {truth assignment T | T(s) = true} and {truth assignment T | T(s) = false}. The set of D-consistent truth assignments is in this sigma algebra and has positive probability w.r.t. our measure (as long as F is D-consistent) so we can use this set to form a conditional probability measure.
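To make the counting formula concrete, here is a minimal sketch under loud assumptions: the formal system is replaced by a hand-written toy list of "contradictions", each a pair (proof length, set of truth values that cannot hold together). The function name `p_d`, the statements "A" and "B", and the lengths 3 and 7 are all hypothetical, and no real proof search is performed.

```python
from fractions import Fraction
from itertools import product


def p_d(statements, contradictions, depth, target):
    """P_D(target) by brute-force counting over truth assignments.

    `contradictions` is a toy stand-in for proof search: a list of
    (length, pattern) pairs, where `pattern` maps statements to truth
    values that are jointly refuted by a proof of that length.
    """
    # Only contradictions of length <= depth constrain the assignments.
    active = [pattern for length, pattern in contradictions if length <= depth]
    consistent = 0
    target_true = 0
    for values in product([True, False], repeat=len(statements)):
        assignment = dict(zip(statements, values))
        violates = any(
            all(assignment[s] == v for s, v in pattern.items())
            for pattern in active
        )
        if violates:
            continue  # probability set to 0 for this assignment
        consistent += 1
        target_true += assignment[target]
    if consistent == 0:
        raise ValueError("no D-consistent assignment: F is not D-consistent")
    return Fraction(target_true, consistent)


# Toy data (assumed): "A and B" is refutable in 3 steps,
# "neither A nor B" is refutable in 7 steps.
toy = [(3, {"A": True, "B": True}), (7, {"A": False, "B": False})]
for depth in (2, 3, 7):
    print(depth, p_d(["A", "B"], toy, depth, "A"))
```

Only the finitely many statements that appear in some pattern need to be enumerated; every other statement keeps probability 1/2, as in the technicality above.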
It may not be clear what you meant by "length" of contradiction. Is it the number of deductive steps to reach a contradiction, or the total number of symbols in a proof of contradiction?
Consider for instance two sentences X and ~X where X contains a billion symbols ... Is that a contradiction of length 1, or a contradiction of length about 2 billion? I think you mean about 2 billion. In which case, you will always have PD(s) = 0.5 for sentences s of length greater than D. Right?
Followup to: Intelligence Metrics and Decision Theory
Related to: Bridge Collapse: Reductionism as Engineering Problem
A central problem in AGI is giving a formal definition of intelligence. Marcus Hutter has proposed AIXI as a model of a perfectly intelligent agent. Legg and Hutter have defined a quantitative measure of intelligence, applicable to any suitably formalized agent, such that AIXI is the agent with maximal intelligence according to this measure.
Legg-Hutter intelligence suffers from a number of problems I have previously discussed, the most important being:
Logical Uncertainty
The formalism introduced here was originally proposed by Benja.
Fix a formal system F. We want to be able to assign probabilities to statements s in F, taking into account limited computing resources. Fix a natural number D, related to the amount of computing resources, which I call the "depth of analysis".
Define P_0(s) := 1/2 for all s to be our initial prior, i.e. each statement's truth value is decided by a fair coin toss. Now define
P_D(s) := P_0(s | there are no contradictions of length <= D).
Consider X to be a number in [0, 1] given by a definition in F. Then d_k(X) := "The k-th digit of the binary expansion of X is 1" is a statement in F. We define E_D(X) := Σ_k 2^(-k) P_D(d_k(X)).
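As a small worked illustration of this sum, the sketch below takes the probabilities of the first n digits as given and assumes every later digit has probability 1/2, which is what the coin-toss prior assigns to statements that no short contradiction touches. The function name `expected_value` and the parameter `tail_prob` are hypothetical.

```python
from fractions import Fraction


def expected_value(digit_probs, tail_prob=Fraction(1, 2)):
    """Sum of 2^(-k) * P(d_k) for k = 1..n, plus an assumed tail.

    digit_probs[k-1] is the probability that the k-th binary digit is 1.
    All digits beyond n are assumed to have probability `tail_prob`.
    """
    total = Fraction(0)
    for k, p in enumerate(digit_probs, start=1):
        total += Fraction(p) * Fraction(1, 2 ** k)
    # Tail: sum over k > n of 2^(-k) equals 2^(-n).
    n = len(digit_probs)
    total += Fraction(tail_prob) * Fraction(1, 2 ** n)
    return total


# First digit known to be 1, second known to be 0, third undecided.
print(expected_value([1, 0, Fraction(1, 2)]))  # 5/8
# Nothing known at all: the estimate is the midpoint of [0, 1].
print(expected_value([]))  # 1/2
```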
Remarks
PD(s) = 0.
Non-Constructive UDT
Consider A a decision algorithm for optimizing utility U, producing an output ("decision") which is an element of C. Here U is just a constant defined in F. We define the U-value of c in C for A at depth of analysis D to be
V_D(c, A; U) := E_D(U | "A produces c" is true). It is only well defined as long as "A doesn't produce c" cannot be proved at depth of analysis D, i.e. P_D("A produces c") > 0. We define the absolute U-value of c for A to be
V(c, A; U) := E_{D(c, A)}(U | "A produces c" is true) where D(c, A) := max {D | P_D("A produces c") > 0}. Of course D(c, A) can be infinite, in which case E_inf(...) is understood to mean lim_{D -> inf} E_D(...).
For example V(c, A; U) yields the natural values for A an ambient control algorithm applied to e.g. a simple model of Newcomb's problem. To see this note that given A's output the value of U can be determined at low depths of analysis whereas the output of A requires a very high depth of analysis to determine.
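The conditioning on "A produces c" can be sketched with the same kind of brute-force counting. This is only a toy under stated assumptions: proof search is replaced by a hand-written list of (length, pattern) contradictions, U is reduced to its first binary digit "u1", and the names `consistent_assignments`, `conditional_p` and the lengths 2 and 9 are hypothetical. It shows the conditional probability becoming sharp at low depth and undefined once "A produces c" is itself refuted.

```python
from fractions import Fraction
from itertools import product


def consistent_assignments(statements, contradictions, depth):
    """Yield truth assignments that avoid every contradiction of length <= depth."""
    active = [pattern for length, pattern in contradictions if length <= depth]
    for values in product([True, False], repeat=len(statements)):
        a = dict(zip(statements, values))
        if not any(all(a[s] == v for s, v in p.items()) for p in active):
            yield a


def conditional_p(statements, contradictions, depth, target, given):
    """P_D(target | given), or None when P_D(given) = 0."""
    given_count = 0
    both_count = 0
    for a in consistent_assignments(statements, contradictions, depth):
        if a[given]:
            given_count += 1
            both_count += a[target]
    if given_count == 0:
        return None  # the U-value is not well defined at this depth
    return Fraction(both_count, given_count)


# Toy data (assumed): given "A=c", the first digit of U is provably 1
# in 2 steps, while "A=c" itself is only refuted in 9 steps.
stmts = ["A=c", "u1"]
toy = [(2, {"A=c": True, "u1": False}), (9, {"A=c": True})]
for depth in (1, 2, 9):
    print(depth, conditional_p(stmts, toy, depth, "u1", "A=c"))
```

In this toy, the largest depth at which the conditional is defined is 8, which plays the role of D(c, A).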
Naturalized Induction
Our starting point is the "innate model" N: a certain a priori model of the universe including the agent G. This model encodes the universe as a sequence of natural numbers Y = (y_k) which obeys either specific deterministic or non-deterministic dynamics, or at least some constraints on the possible histories. It may or may not include information on the initial conditions. For example, N can describe the universe as a universal Turing machine M (representing G) with special "sensory" registers e. N constrains the dynamics to be compatible with the rules of the Turing machine but leaves the behavior of e unspecified. Alternatively, N can contain, in addition to M, a non-trivial model of the environment. Or N can be a cellular automaton with the agent corresponding to a certain collection of cells.
However, G's confidence in N is limited: otherwise it wouldn't need induction. We cannot start with 0 confidence: it's impossible to program a machine if you don't have even a guess of how it works. Instead we introduce a positive real number t which represents the timescale over which N is expected to hold. We then assign to each hypothesis H about Y (you can think about them as programs which compute y_k given y_j for j < k; more on that later) the weight Q_S(H) := 2^(-L(H)) (1 - e^(-t(H)/t)). Here L(H) is the length of H's encoding in bits and t(H) is the time during which H remains compatible with N. This is defined for N of deterministic / constraint type but can be generalized to stochastic N.
The weights Q_S(H) define a probability measure on the space of hypotheses which induces a probability measure on the space of histories Y. Thus we get an alternative to Solomonoff induction which allows for G to be a mechanistic part of the universe, at the price of introducing N and t.
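To get a feel for how the weight behaves, here is a minimal numeric sketch. The hypothesis names, their lengths and compatibility times are made up for illustration, and normalizing over a finite set is an assumption of this sketch rather than part of the definition.

```python
import math


def weight(length_bits, t_h, t):
    """2^(-L(H)) * (1 - exp(-t(H) / t)) for a single hypothesis."""
    return 2.0 ** (-length_bits) * (1.0 - math.exp(-t_h / t))


def normalize(weights):
    """Rescale a finite dict of weights so that they sum to 1 (assumption)."""
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


t = 10.0  # timescale over which N is expected to hold
# Hypothetical hypotheses: name -> (L(H) in bits, t(H))
hypotheses = {
    "H1": (3, math.inf),  # longer, but compatible with N forever
    "H2": (2, 5.0),       # shorter, departs from N at time 5
    "H3": (1, 0.0),       # shortest, incompatible with N from the start
}
raw = {name: weight(L, t_h, t) for name, (L, t_h) in hypotheses.items()}
probs = normalize(raw)
for name in hypotheses:
    print(name, raw[name], probs[name])
```

A hypothesis that contradicts N immediately gets weight 0 no matter how short it is, while one that agrees with N forever gets the full Solomonoff-style weight 2^(-L(H)).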
Remarks
Intelligence Metric
To assign intelligence to agents we need to add two ingredients:
Instead, we define I(Q_0) := E_{Q_S}(E_max(U(Y(H)) | "Q(Y(H)) = Q_0" is true)). Here the subscript max stands for maximal depth of analysis, as in the construction of absolute UDT value above.
Remarks