Would you like to see a primer on several MIRI research topics (assuming only the background of having taken a course with proofs in math or computer science)? Or are you curious why MIRI does so much with mathematical logic, and why people on Less Wrong keep referring to Löb's Theorem?
If you answered yes to either question, you may be interested in my lecture notes, An Introduction to Löb's Theorem in MIRI Research! These came out of an introductory talk that I gave at a MIRIx workshop.
Since I've got some space here, I'll just copy and paste the table of contents and the introduction section...
Contents
1 Introduction
2 Crash Course in Löb's Theorem
2.1 Gödelian self-reference and quining programs
2.2 Löb's Theorem
3 Direct Uses of Löb's Theorem in MIRI Research
3.1 “The Löbstacle”
3.2 Löbian cooperation
3.3 Spurious counterfactuals
4 Crash Course in Model Theory
4.1 Axioms and theories
4.2 Alternative and nonstandard models
5 Uses of Model Theory in MIRI Research
5.1 Reflection in probabilistic logic
6 Crash Course in Gödel-Löb Modal Logic
6.1 The modal logic of provability
6.2 Fixed points of modal statements
7 Uses of Gödel-Löb Modal Logic in MIRI Research
7.1 Modal Combat in the Prisoner’s Dilemma
7.2 Modal Decision Theory
8 Acknowledgments
1 Introduction
This expository note is devoted to answering the following question: why do many MIRI research papers cite a 1955 theorem of Martin Löb, and indeed, why does MIRI focus so heavily on mathematical logic? The short answer is that this theorem illustrates the basic kind of self-reference involved when an algorithm considers its own output as part of the universe, and it is thus germane to many kinds of research involving self-modifying agents, especially when formal verification is involved or when we want to cleanly prove things in model problems. For a longer answer, well, welcome!
I’ll assume you have some background doing mathematical proofs and writing computer programs, but I won’t assume any background in mathematical logic beyond knowing the usual logical operators, nor that you’ve even heard of Löb’s Theorem before.
To motivate the mathematical sections that follow, let’s consider a toy problem. Say that we’ve designed Deep Thought 1.0, an AI that reasons about its possible actions and only takes actions that it can show to have good consequences on balance. One such action is designing a successor, Deep Thought 2.0, which has improved deductive abilities. But if Deep Thought 1.0 (hereafter called DT1) is to actually build Deep Thought 2.0 (DT2), DT1 must first conclude that building DT2 will have good consequences on balance.
There’s an immediate difficulty—the consequences of building DT2 include the actions that DT2 takes; but since DT2 has increased deductive powers, DT1 can’t actually figure out what actions DT2 is going to take. Naively, it seems as if it should be enough for DT1 to know that DT2 has the same goals as DT1, that DT2’s deductions are reliable, and that DT2 only takes actions that it deduces to have good consequences on balance.
Unfortunately, the straightforward way of setting up such a model fails catastrophically on the innocent-sounding step “DT1 knows that DT2’s deductions are reliable”. If we try and model DT1 and DT2 as proving statements in two formal systems (one stronger than the other), then the only way that DT1 can make such a statement about DT2’s reliability is if DT1 (and thus both) are in fact unreliable! This counterintuitive roadblock is best explained by reference to Löb’s theorem, and so we turn to the background of that theorem.
Good question as well! I should add motivation to the notes on this point.
Basically, we study deterministic mathematical objects within formal systems because these cases are easier for us to analyze and prove theorems about; it's an example of looking first under the streetlight before meandering farther into the darkness. Hopefully this way we'll notice many phenomena that still hold when we add randomness, bounded computations, and other realistic limitations.
And indeed, plenty of the topics in MIRI research do hold up when you make them messier. Many of the topics in formal systems can handle random variables just as easily as deterministic facts. There's a version of Löb's Theorem for bounded-length proof searches, and tracing it through shows that Löbian cooperation and other phenomena behave the same there as in the unbounded case (for sufficiently large choices of the bounds). Using quantum random variables to evaluate logical counterfactuals doesn't solve implementing UDT, it just collapses it into CDT; so we still need to study logical counterfactuals. And so on.
Also, when I say "formal system" you can usually substitute in "Peano Arithmetic" or "Peano Arithmetic plus some extra axioms".
Right! So you're trying to get ahold of the idea of an intelligent computational agent, in clear formalisms, and trying to solve the basic issues that arise there. And often, the issues you discover at the fundamental mathematical level work their way through to the highly applied level.
That makes sense. I feel like this is the most direct answer to the question