Paul Christiano has devised a new fundamental approach to the "Löb Problem" wherein Löb's Theorem seems to pose an obstacle to AIs building successor AIs, or adopting successor versions of their own code, that trust the same amount of mathematics as the original. (I am currently writing up a more thorough description of the question this preliminary technical report is working on answering. For now the main online description is in a quick Summit talk I gave. See also Benja Fallenstein's description of the problem in the course of presenting a different angle of attack. Roughly the problem is that mathematical systems can only prove the soundness of, aka 'trust', weaker mathematical systems. If you try to write out an exact description of how AIs would build their successors or successor versions of their code in the most obvious way, it looks like the mathematical strength of the proof system would tend to be stepped down each time, which is undesirable.)
Paul Christiano's approach is inspired by the idea that whereof one cannot prove or disprove, thereof one must assign probabilities: and that although no mathematical system can contain its own truth predicate, a mathematical system might be able to contain a reflectively consistent probability predicate. In particular, it looks like we can have:
∀a, b: (a < P(φ) < b) ⇒ P(a < P('φ') < b) = 1
∀a, b: P(a ≤ P('φ') ≤ b) > 0 ⇒ a ≤ P(φ) ≤ b
Suppose I present you with the human and probabilistic version of a Gödel sentence, the Whitely sentence "You assign this statement a probability less than 30%." If you disbelieve this statement, it is true. If you believe it, it is false. If you assign 30% probability to it, it is false. If you assign 29% probability to it, it is true.
Paul's approach resolves this problem by restricting your belief about your own probability assignment to within epsilon of 30% for any epsilon. So Paul's approach replies, "Well, I assign almost exactly 30% probability to that statement - maybe a little more, maybe a little less - in fact I think there's about a 30% chance that I'm a tiny bit under 0.3 probability and a 70% chance that I'm a tiny bit over 0.3 probability." A standard fixed-point theorem then implies that a consistent assignment like this should exist. If asked if the probability is over 0.2999 or under 0.30001 you will reply with a definite yes.
We haven't yet worked out a walkthrough showing if/how this solves the Löb obstacle to self-modification, and the probabilistic theory itself is nonconstructive (we've shown that something like this should exist, but not how to compute it). Even so, a possible fundamental triumph over Tarski's theorem on the undefinability of truth and a number of standard Gödelian limitations is important news as math qua math, though work here is still in very preliminary stages. There are even whispers of unrestricted comprehension in a probabilistic version of set theory with ∀φ: ∃S: P(x ∈ S) = P(φ(x)), though this part is not in the preliminary report and is at even earlier stages and could easily not work out at all.
It seems important to remark on how this result was developed: Paul Christiano showed up with the idea (of consistent probabilistic reflection via a fixed-point theorem) to a week-long "math squad" (aka MIRI Workshop) with Marcello Herreshoff, Mihaly Barasz, and myself; then we all spent the next week proving that version after version of Paul's idea couldn't work or wouldn't yield self-modifying AI; until finally, a day after the workshop was supposed to end, it produced something that looked like it might work. If we hadn't been trying to solve this problem (with hope stemming from how it seemed like the sort of thing a reflective rational agent ought to be able to do somehow), this would be just another batch of impossibility results in the math literature. I remark on this because it may help demonstrate that Friendly AI is a productive approach to math qua math, which may aid some mathematician in becoming interested.
I further note that this does not mean the Löbian obstacle is resolved and no further work is required. Before we can conclude that we need a computably specified version of the theory plus a walkthrough for a self-modifying agent using it.
See also the blog post on the MIRI site (and subscribe to MIRI's newsletter here to keep abreast of research updates).
This LW post is the preferred place for feedback on the paper.
EDIT: But see discussion on a Google+ post by John Baez here. Also see here for how to display math LaTeX in comments.
So, first of all, there isn't really a particular formal proof system in the paper, except sort of implicitly; all we get is a function P(.) from L'-sentences to real numbers. However, we can talk about the first-order theory you obtain by taking the base theory T and adding to it the reflection axioms, i.e. the axioms a < P('G') < b for all G such that the function P(.) satisfies a < P(G) < b. [For anyone not paying very close attention, P('.') is a function symbol in the formal language, while P(.) is a meta-language function from the formal language to the real numbers.] The main theorem is that P(.) assigns probability 1 to this theory (and therefore to all consequences of this theory), so it is interesting to ask what it entails.
Your question, if I understand correctly, is whether this theory does in particular entail
[-1 < P('G') < 1] --> [P('-1 < P('G') < 1') = 1]
(note that this is simply a sentence in the language L', not a meta-language statement). There's no obvious reason that the system should be able to prove this -- all we have is T and the individual axioms of the form [a < P('G') < b] --, and indeed your proof shows that (unless there is an inconsistency in the paper) this is not a theorem of the system: if it were, then so would be P('G') = 1, and therefore ¬G, leading to a contradiction since P(.) assigns probability 1 to all theorems of this system.
In short, yes, there are true statements about P(.) such that the corresponding statement about P('.') is false in the formal system, and assigned zero probability by P(.). And yes, I do believe this is worth keeping in mind.
Here is my current preferred way of thinking about this: In non-standard analysis, there is a "standard" operator st(.) which takes a non-standard real number x and returns the standard number st(x) that x is infinitesimally close to (assuming one exists, which is the case if x is not unbounded). This operator can also be applied to functions f; st(f)(x) is the standard number which f(x) is infinitesimally close to. I imagine that there is a collection of "possible" non-standard models, each of which fulfills all the axioms that P(.) assigns probability 1, and that one of these possible non-standard models is the true one describing the real world. Then, P(.) is st(P('.')), where P('.') is the function in the true non-standard model. [ETA: This is always defined because we have -1 < P('G') < 2, so every value of P('.') is infinitesimally close to a standard number.]
How could the real world be described by a non-standard model, you ask? The following is rather handwavy, but I'm imagining the agent as computing an infinite sequence of approximations P_i(.) to the intended distribution, and the true model to be the ultrapower of some standard model of ZFC, with P('.') being interpreted as P_i(.) in the i'th component of the ultrapower. Whether any leverage can be gotten out of making this idea formal, I don't know. (And it's certainly true that to pull off this intuition, I still need to have a free ultrafilter falling out of the sky.)
Ok, for [a < P('G') < b] I see why you'd use a schema (because it's an object level axiom, not a meta-language axiom).
But this still seems possibly problematic. We know that adding axioms like [-1 < P('G') < 1] --> [P('-1 < P('G') < 1') = 1], would break the system. But I don't see a reason to suppose that the other reflection axioms don't break the system. It might or it might not, but I'm not sure; was there a proof along the lines of "this system is inconsistent if and only if the initial system is" or something?