Based on a potential misreading of this post, I added the following caveat today:
Important Caveat: Arguments in natural language are basically never "theorems". The main reason is that human thinking isn't perfectly rational in virtually any precisely defined sense, so sometimes the hypotheses of an argument can hold while its conclusion remains unconvincing. Thus, the Löbian argument pattern of this post does not constitute a "theorem" about real-world humans: even when the hypotheses of the argument hold, the argument will not always play out like clockwork in the minds of real people. Nonetheless, Löb's-Theorem-like arguments can play out relatively simply in the English language, and this post shows what would look like.
I'm not convinced that "logically derivable" is a reasonable definition of "implicit", and I feel that the proof hinges on this in order to apply standard rules of logic to natural language statements.
And even replacing "implicit" with "logically derivable" might demand that we embed logic in natural language and point to that embedding with the phrase "logically derivable" in order to make the proof go through. Less mathematically/philosophically trained people might understand "logically derivable" to mean something quite different.
Hat tip to Ben Pace for pointing out that invitations are often self-referential, such as when people say "You are hereby invited", because "hereby" means "by this utterance":
https://www.lesswrong.com/posts/rrpnEDpLPxsmmsLzs/open-technical-problem-a-quinean-proof-of-loeb-s-theorem-for?commentId=CFvfaWGzJjnMP8FCa
That comment was like 25% of my inspiration for this post :)
I was confused for a while by trying to understand why invitations that are self-referential. It wasn’t until I read the inspirational post that I realized that you are referring to is the word “hereby.”
I guess I could have used that to be explicit, despite it being implicitly stated.
I'm having difficulties getting my head around the intended properties of the "implicitly" modal.
Related to: Löb's Lemma: an easier approach to Löb's Theorem.
Natural language models are really taking off, and it turns out there's an analogue of Löb's Theorem that occurs entirely in natural language — no math needed. This post will walk you through the details in a simple example: a very implicit party invitation.
Important Caveat: Arguments in natural language are basically never "theorems". The main reason is that human thinking isn't perfectly rational in virtually any precisely defined sense, so sometimes the hypotheses of an argument can hold while its conclusion remains unconvincing. Thus, the Löbian argument pattern of this post does not constitute a "theorem" about real-world humans: even when the hypotheses of the argument hold, the argument will not always play out like clockwork in the minds of real people. Nonetheless, Löb's-Theorem-like arguments can play out relatively simply in the English language, and this post shows what would look like.
Motivation
(Skip this if you just want to see the argument.)
Understanding the structure here may be helpful for anticipating whether Löbian phenomena can, will, or should arise amongst language-based AI systems. For instance, Löb's Theorem has implications for the emergence of cooperation and defection in groups of formally defined agents (LaVictoire et al, 2014; Critch, Dennis, Russell, 2022). The natural language version of Löb could play a similar role amongst agents that use language, which is something I plan to explore in a future post. Aside from being fun, I'm hoping this post will make clear that the phenomenon underlying Löb's Theorem isn't just a feature of formal logic or arithmetic, but of any language that can talk about reasoning and deduction in that language, including English. And as Ben Pace points out here, invitations are often self-referential, such as when people say "You are hereby invited to the party": hereby means "by this utterance" (google search). So invitations a natural place to explore the kind of self-reference happening in Löb's Theorem.
This post isn't really intended as an "explanation" of Löb's Theorem in its classical form, which is about arithmetic. Rather, the arguments here stand entirely on their own, are written in natural language, and are about natural language phenomena. That said, this post could still function as an "explanation" of Löb's Theorem because of the tight analogy with it.
Implicitness
Okay, imagine there's a party, and maybe you're invited to it. Or maybe you're implicitly invited to it. Either way, we'll be talking a bunch about things being implicit, with phrasing like this:
These will all mean "X is implied by things that are known (to you) (via deduction or logical inference)".
Explicit knowledge is also implicit. In this technical sense of the word, "implicit" and "explicit" are not actually mutually exclusive: X trivially implies X, so if you explicitly observed X in the world, then you also know X implicitly. If you find this bothersome or confusing, just grant me this anyway, or skip to "Why I don't treat 'implicit' and inexplicit' as synonyms here" at the end.
Abbreviations. To abbreviate things and to show there's a simple structure at play here, I'll sometimes use the box symbol "□" as shorthand to say things are implicit:
A peculiar invitation
Okay! Let p be the statement "You're invited to the party". You'd love to receive such a straightforward invitation to the party, like some people did, those poo poo heads, but instead the host just sends you the following intriguing message:
Abbreviation: □p→p
Interesting! Normally, being invited to a party and being implicitly invited are not the same thing, but for you in this case, apparently they are. Seeing this, you might feel like the host is hinting around at implicitly inviting you, and maybe you'll start to wonder if you're implicitly invited by virtue of the kind of hinting around that the host is doing with this very message. Well then, you'd be right! Here's how.
For the moment, forget about the host's message, and consider the following sentence, without assuming its truth (or implicitness):
Ψ:
The sentence Ψ has some cool properties that are going to let us write out a clever argument.
(Optional digression: Perhaps the apparent circularity of Ψ reminds you of an instance of Curry's paradox, like "If this sentence is true, then Santa Clause exists". The Santa Clause sentence is quite problematic, because if you try to assign it a truth value, you find that it must be true and that Santa Clause exists. If that's not obvious, or Curry's paradox doesn't bother you, just skip this paragraph and don't worry about it. If you are bothered by it, note that Curry's paradox is basically the content of Tarski's theorem on the undefinability of truth: if you try to let sentences talk about their own truth, you get paradoxes. But truth is not the same same as implicitness. It turns out sentences can talk about their own implicitness without leading to logical paradoxes, if "implicit" means what it means in this post: following by logical implication from things you know. )
The main observation we need to move forward is just this:
Line 0:
Abbreviation: Ψ↔□Ψ→p
Hopefully that's intuitively clear once you stare at it for 60 seconds. Just take the first sentence and unpack the "this", and you'll see it means the same thing as the second sentence. Please do give yourself a whole 60 seconds for this!
Another intuitive property we'll need is the following, which doesn't depend at all on the message from the host:
Line 1:
Abbreviation: □Ψ→□p
Please give yourself at least another 60 seconds to stare at that one. In short, if the indented thing were implicit, then you'd know it was implicit so it'd be implicitly implicit, which would allow you to apply it in an implicit argument to conclude you're implicitly invited. In case that doesn't become intuitively clear, please see "Proving Line 1" at the end see see these steps explicitly :)
Anyway, how does this stuff get you invited to the party? Well, the host did say to you specifically:
Line 2 (host's message):
Abbreviation: □p→p
... which combines with Line 1 to give:
Line 3:
Abbreviation: □Ψ→p
By Line 0, this is the same as:
Line 4:
Abbreviation: Ψ
Now, being the clever person that you are, you realize that since all of the above were implied by things you knew, it's all implicit:
Line 5:
Abbreviation: □Ψ
Finally, from Line 3 and Line 5, it follows that:
Line 6:
Abbreviation: p
Whoah, you're invited! Just because you thought about the weirdly indirect invitation long enough to infer the implication that you were invited.
Now that's my kind of party :)
Proving Line 1
Line 1 doesn't actually require the host's message at all. It follows directly from Line 0, like this:
Suppose:
Abbreviation: Suppose □Ψ
When something's implicit, you can tell that it's implicit by working through the steps of the implication. From this supposition, we're going to conclude □p, and then drop the supposition to conclude □Ψ→□p.
From the supposition, we get:
Line 0.1:
Abbreviation: □□Ψ
Also, by Line 0 and the supposition, we get
Line 0.2:
Abbreviation: □(□Ψ→p)
Now lines 0.1 and 0.2 combine, still under the supposition, to give:
Line 0.3:
Abbreviation: □p
Now, since we supposed □Ψ and deduced □p, we can drop the supposition and conclude an implication:
Line 1:
Abbreviation: □Ψ→□p
Why I don't treat 'implicit' and 'inexplicit' as synonyms here
In this post, I counted explicit knowledge as a special case of implicit knowledge. Here are five reasons for this:
One draft reader complained that most usages of the word "implicit" are for referring to things that are inexplicit, suggesting (implicitly!) that the two words should be treated as synonyms. However, consider that most usages of the word "mammal" are used to refer to non-human animals, yet "mammal" is not a perfect antonym of "human", and in fact includes humans as a specific case. Yes, I will grant that calling an explicit thing "implicit" is usually misleadingly vague, the way it's misleadingly (and perhaps humorously) vague to say "I live with another mammal" when you actually live with another human. But that needn't mean that "mammal" should be an antonym of "human". Similarly, "implicit" needn't be treated as an "antonym" of explicit, especially when we have a perfectly good English word "inexplicit" that already unambiguously means "not explicit".
Finally, to expound more on point 5, "trusting someone implicitly" means that stating the trust explicitly is unnecessary because the trust is implied without doubt (example google results). That usually doesn't rule out the possibility of also stating the trust explicitly. For the sake of making crisp arguments about implicit trust and implicit agreements — which I'll want to do in some subsequent posts — treating implicit and explicit as opposites will be a recipe for very cumbersome language. So for simplicity, it will be better to use "implicit" for "things that follow via implication", and use "inexplicit" if you really want a single-world adjective to unambiguously call something "not explicit".
Recap and Takeaways