Motivation
Löb's theorem is pretty counterintuitive. Speaking informally about one or more agents engaging in logically valid reasoning, the theorem says something like this:
- "If it's believed that believing a particular prophesy would cause it to self-fulfill, then the prophecy will be believed."
Formally, the theorem just says , where means " is provable". This is most weird if X is a false statement, because it means you can't prove that you can't prove X (Gödel's theorem).
(Also: the box symbol above is not an unrendered symbol; it's supposed to be a box.)
Meta-motivation
I'd like to make Löb's theorem more intuitive for humans, because it shows how agents can sidestep the need to mentally simulate each other in games, by instead using reasoning/arguments that happen to self-fulfill in a good and defensible way. Basically, Löbian reflection helps agents to avoid metacognitive stack overflows when thinking about themselves and each other, and I want more human beings to understand how that can work, and sometimes already does work, in real world agents.
State of the art
The best attempt I've seen to make Löb's theorem more intuitive is Eliezer Yudkowsky's Cartoon Guide to Löb's Theorem, which is still quite confusing. The confusingness comes from thinking about a self-referential statement ( on Wikipedia; S on Arbital) that's used to carry out the proof. The statement basically says "If this statement is provable, then C." Dealing with that sentence is pretty cumbersome, and requires a lot of fiddling around with nested statements and implications.
Doing better
I think we can make a new proof of Löb's theorem that doesn't use that weird self-referential sentence, by instead making the proof of Löb's theorem itself self-referential. Page 15 of the following paper poses an open problem on how to do this, which I think is possible to resolve affirmatively:
Here's a screenshot from it:

If we can make a proof like that work, we could then use the following much shorter and simpler cartoon guide to explain it:

In other words, I want to write a proof of Löb's theorem that is structured much like a Quine. I'd like the details to (eventually) be really crisp and readable, so it can be peer reviewed for correctness, at a level of rigor comparable to this earlier paper on Löb's theorem for proof systems with bounded length:
As a possible start to writing such a proof, I think some of the same machinery (e.g., the diagonal lemma) from Boolos's textbook "The Logic of Provability" can be used to cook up self-referential proofs fitting the cartoon template above... thereby making Löb's theorem less mysterious and more intuitive forever.
Further meta-motivation (added Nov 26)
A key reason I'm interested in having a self-referential self-validating proof, rather than a normal-ish proof about a sentence that is self-referential (like on Wikipedia), is that human documents often refer to themselves, but human sentences rarely refer directly to themselves in isolation. This sentence is an exception, but such utterances are generally rare. So, making Löb more intuitive to humans either means
- making humans more accustomed to thinking about sentences that refer to themselves (so the traditional proof of Löb can be more intuitive), or
- finding a new proof that self-references in a way that's more like the way a human document refers to itself.
The aim of this post is to try for the second approach. Notice how you didn't think the previous sentence was weirdly self-referential? That's because it referred to the whole document that it's part of, rather than just narrowly referring directly to itself. I'm not quite sure why humans are more comfortable with that than with directly-and-narrowly self-referential sentences. Maybe it's because I thing needs to have enough non-self-referential content before humans will care about it enough to tolerate it talking about itself. I'm not sure.
Countering naive impossibility arguments (added Nov 26)
It's in fact possible to use Löb's theorem to construct a self-referential proof of the kind illustrated above. So, naive arguments that "Proofs can't be self-referential like that" are not going to hold up. The hard thing I'm asking for is to make such a proof — of Löb's theorem specifically — without going through the traditional proof of Löb, i.e., without using a narrowly-and-directly-self-referential sentence (like on Wikipedia).
Difficulty assessment
I think getting a Qunean proof to work here is probably a bit easier than the JSL paper linked directly above. Like, I think if I spent 3 months on it (which is how long the JSL paper took), then I could do it. I do think getting it right involves mentally syncing with some fundamental and standard machinery in provability logic, which can take a bit of time if you've never worked in the area before (as I hadn't).
Anyway, if you can figure out how to write this proof, please post your solution or solution attempts here :)
various attempts to distill my objection:
the details of the omitted "zip" operation are going to be no simpler than the standard proof of löb's theorem, and will probably turn out to be just a variation on the standard proof of löb's theorem (unless you can find a way of building self-reference that's shorter than the Y combinator (b/c the standard proof of löb's theorem is already just the Y combinator plus the minimal modifications to interplay with gödel-codes))
even more compressed: the normal proof of löb is contained in the thing labeled "zip". the argument in the OP is operating under the assumption that a proof can make reference to its own gödel-number, and that assumption is essentially just löb's theorem.
a thing that might help is asking what specific proof-steps the step "1. the gödel-code P of this very proof proves C" cashes out into. obviously step 1 can't cash out into verifying (under quotation) every step of the entire proof (by checking for the appropriate prime-factors), because then the proof would require at least one step for every step in the proof plus three extra steps at the end, which would make the proof infinite (and infinite proofs admit contradictions).
so step 1 has to involve some indirection somehow. (cf: we can't get a fixpoint in λ-calculus by
f (f (f ...
, we need some indirection, as occurs in(Y f)
aka(λ s. f (s s)) (λ s. f (s s))
, to make the term finite.)this works fine, as long as you are ok w/ the gödel-code in step 1 being not P, but some different proof of the same conclusion (analogous to the difference between
f (Y f)
andY f
), and so long as you have the analog of the Y combinator (modified to interplay with gödel-codes).but the analog of the Y combinator is precisely löb's theorem :-p
if i understand your OP correctly, we can factor it into three claims:
(1) is löb's theorem.
(2) is an indirect proof of löb's theorem, from the assumption of löb's theorem (in direct analogy with how, once we've found syntax
Y f
for the fixpoint of f, we can wrap it in f to find an alternate syntax, namelyf (Y f))
.(3) is false (i'm fairly confident)--you can get real close, but you can't quite get there (in direct analogy to how you can get λ-term
Y f
that means the same thing asf (Y f)
, but you can't get a λ-termfix f
that is syntactically exactlyf (fix f)
, on pain of infinite syntax).afaict, there's two ways to read this proposal (if we port across the analogy to λ-calculus), according to whether you cash "zip" out to a recursive instance of the whole proposal, vs whether you cash "zip" out to some different proof of löb's theorem:
"i hear that people find the Y combinator confusing. instead of using
(Y f)
to name the fixpoint of f, why not usef (f (f (f (...
ad infinitum?""i hear that people find the Y combinator confusing. instead of using
(Y f)
to name the fixpoint of f, why not usef (Y f)
?"the answer to (1) is that it doesn't make sense unless you allow infinite syntax trees (which aren't permitted in normal math b/c infinite proofs admit contradictions).
the answer to (2) is that, while
(Y f)
may be somewhat tricky to grasp,f (Y f)
is surely no simpler :-p