An Introduction to Löb's Theorem in MIRI Research

orthonormal

23rd Mar 2015

Would you like to see a primer on several MIRI research topics (assuming only the background of having taken a course with proofs in math or computer science)? Or are you curious why MIRI does so much with mathematical logic, and why people on Less Wrong keep referring to Löb's Theorem?

If you answered yes to either question, you may be interested in my lecture notes, An Introduction to Löb's Theorem in MIRI Research! These came out of an introductory talk that I gave at a MIRIx workshop.

Since I've got some space here, I'll just copy and paste the table of contents and the introduction section...

1 Introduction

2 Crash Course in Löb's Theorem

2.1 Gödelian self-reference and quining programs

2.2 Löb's Theorem

3 Direct Uses of Löb's Theorem in MIRI Research

3.1 “The Löbstacle”

3.2 Löbian cooperation

3.3 Spurious counterfactuals

4 Crash Course in Model Theory

4.1 Axioms and theories

4.2 Alternative and nonstandard models

5 Uses of Model Theory in MIRI Research

5.1 Reflection in probabilistic logic

6 Crash Course in Gödel-Löb Modal Logic

6.1 The modal logic of provability

6.2 Fixed points of modal statements

7 Uses of Gödel-Löb Modal Logic in MIRI Research

7.1 Modal Combat in the Prisoner’s Dilemma

7.2 Modal Decision Theory

8 Acknowledgments

1 Introduction

This expository note is devoted to answering the following question: why do many MIRI research papers cite a 1955 theorem of Martin Löb, and indeed, why does MIRI focus so heavily on mathematical logic? The short answer is that this theorem illustrates the basic kind of self-reference involved when an algorithm considers its own output as part of the universe, and it is thus germane to many kinds of research involving self-modifying agents, especially when formal verification is involved or when we want to cleanly prove things in model problems. For a longer answer, well, welcome!
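The notes pair Gödelian self-reference with quining programs (Section 2.1). The following is a minimal, hypothetical illustration, not code from the notes. It is a classic Python quine: a program whose output is exactly its own source text. This is a computational analogue of the diagonal trick that lets a formal sentence refer to itself. The names `QUINE_SOURCE` and `run_and_capture` are made up for this sketch.

```python
import contextlib
import io

# A two-line Python quine. The template string s holds the program text with a
# %r placeholder where its own repr() goes (and %% for a literal percent sign);
# evaluating "s % s" fills the placeholder with s itself, reproducing the program.
QUINE_SOURCE = "s = 's = %r\\nprint(s %% s)'\nprint(s % s)"


def run_and_capture(source: str) -> str:
    """Execute `source` in a fresh namespace and return everything it printed."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(source, {})
    return buffer.getvalue()


if __name__ == "__main__":
    output = run_and_capture(QUINE_SOURCE)
    print(QUINE_SOURCE)
    # print() appends one trailing newline, so the output is the source plus "\n".
    print("is a quine:", output == QUINE_SOURCE + "\n")
```

The program never reads its own file. It reconstructs its text from a template plus a description of that template, which is the same move a Gödel sentence makes.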

I’ll assume you have some background doing mathematical proofs and writing computer programs, but I won’t assume any background in mathematical logic beyond knowing the usual logical operators, nor that you’ve even heard of Löb’s Theorem before.

To motivate the mathematical sections that follow, let’s consider a toy problem. Say that we’ve designed Deep Thought 1.0, an AI that reasons about its possible actions and only takes actions that it can show to have good consequences on balance. One such action is designing a successor, Deep Thought 2.0, which has improved deductive abilities. But if Deep Thought 1.0 (hereafter called DT1) is to actually build Deep Thought 2.0 (DT2), DT1 must first conclude that building DT2 will have good consequences on balance.

There’s an immediate difficulty: the consequences of building DT2 include the actions that DT2 takes, but since DT2 has increased deductive powers, DT1 can’t actually figure out what actions DT2 is going to take. Naively, it seems as if it should be enough for DT1 to know that DT2 has the same goals as DT1, that DT2’s deductions are reliable, and that DT2 only takes actions that it deduces to have good consequences on balance.

Unfortunately, the straightforward way of setting up such a model fails catastrophically on the innocent-sounding step “DT1 knows that DT2’s deductions are reliable”. If we try to model DT1 and DT2 as proving statements in two formal systems (one stronger than the other), then the only way that DT1 can make such a statement about DT2’s reliability is if DT1 (and thus both) are in fact unreliable! This counterintuitive roadblock is best explained by reference to Löb’s theorem, and so we turn to the background of that theorem.
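For reference, here is the standard statement of Löb's Theorem and a short derivation of the roadblock just described. This sketch rests on assumptions not spelled out in this excerpt:

- T_1 and T_2 stand for the formal systems of DT1 and DT2, and both extend Peano Arithmetic.
- □_T φ means "φ is provable in T".
- T_2 contains all of T_1's axioms, and T_1 can verify this.

```latex
\begin{align*}
&\textbf{Löb's Theorem:} && \text{if } T \vdash \Box_T\varphi \rightarrow \varphi, \text{ then } T \vdash \varphi \\
&\text{(1) DT1 trusts DT2:} && T_1 \vdash \Box_{T_2}\varphi \rightarrow \varphi \quad \text{for every sentence } \varphi \\
&\text{(2) } T_2 \text{ extends } T_1\text{:} && T_1 \vdash \Box_{T_1}\varphi \rightarrow \Box_{T_2}\varphi \\
&\text{(3) chain (2) and (1):} && T_1 \vdash \Box_{T_1}\varphi \rightarrow \varphi \\
&\text{(4) apply Löb to (3):} && T_1 \vdash \varphi \quad \text{for every } \varphi\text{, in particular } T_1 \vdash \bot
\end{align*}
```

So if DT1's system vouches for the soundness of its stronger successor in this blanket way, it proves everything, including a contradiction. That system, and T_2 along with it, is therefore inconsistent.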

(Here's the link to the full notes again.)

27 Comments

V_V

I haven't read the paper yet (thanks for posting it, anyway), so maybe the answer to my question is there, but there is something about MIRI's interest in Löb's theorem that has always bugged me, specifically:

Unfortunately, the straightforward way of setting up such a model fails catastrophically on the innocent-sounding step “DT1 knows that DT2’s deductions are reliable”. If we try and model DT1 and DT2 as proving statements in two formal systems (one stronger than the other), then the only way that DT1 can make such a statement about DT2’s reliability is if DT1 (and thus both) are in fact unreliable! This counterintuitive roadblock is best explained by reference to Löb’s theorem, and so we turn to the background of that theorem.

Sure, DT1 can't prove that DT2's decisions are reliable, and in general it can't even prove that DT1 itself makes reliable decisions, but DT1 may be able to prove "Assuming that DT1's decisions are reliable, DT2's decisions are reliable."
Isn't that enough for all practical purposes?

Notice that this even makes sense in the limit case where DT2 = DT1, which isn't necessarily just a theoretical pathological case but can happen in practice when even a non-self-modifying DT1 ponders "Why should I not kill myself?"

Am I missing something?
Isn't Löb's theorem just essentially a formal way of showing that you can't prove that you are not insane?

orthonormal

Good question! Translating your question to the setting of the logical model, you're suggesting that instead of using provability in Peano Arithmetic as the criterion for justified action, or provability in PA + Con(PA) (which would have the same difficulty), the agent uses provability under the assumption that its current formal system (which includes PA) is consistent.

Unfortunately, this turns out to be an inconsistent formal system!
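One standard way to see this is sketched below. The assumptions are not spelled out in the comment. Write T* for a recursively axiomatized theory that extends PA and includes, as an axiom, the statement that T* itself is consistent. A self-referential theory like this can be constructed with the diagonal lemma. Gödel's second incompleteness theorem then finishes the argument.

```latex
\begin{align*}
&T^* = \mathrm{PA} + \mathrm{Con}(T^*) && \text{(self-referential definition via the diagonal lemma)} \\
&T^* \vdash \mathrm{Con}(T^*) && \text{(it is one of the axioms)} \\
&T^* \text{ consistent} \;\Longrightarrow\; T^* \nvdash \mathrm{Con}(T^*) && \text{(Gödel's second incompleteness theorem)} \\
&\therefore\; T^* \text{ is inconsistent}
\end{align*}
```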

Thus, you definitely do not want an agent that makes decisions on the criterion "if I assume that my own deductions ar...
