V_V comments on An Introduction to Löb's Theorem in MIRI Research - Less Wrong

Post author: orthonormal 23 March 2015 10:22PM


Comment author: V_V 25 March 2015 06:12:38PM * 1 point

I haven't read the paper yet (thanks for posting it, anyway), so maybe the answer to my question is in there, but there is something about MIRI's interest in Löb's theorem that has always bugged me, specifically:

Unfortunately, the straightforward way of setting up such a model fails catastrophically on the innocent-sounding step “DT1 knows that DT2’s deductions are reliable”. If we try and model DT1 and DT2 as proving statements in two formal systems (one stronger than the other), then the only way that DT1 can make such a statement about DT2’s reliability is if DT1 (and thus both) are in fact unreliable! This counterintuitive roadblock is best explained by reference to Löb’s theorem, and so we turn to the background of that theorem.
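For reference, the standard statement of Löb's theorem (not specific to the paper under discussion) says that for any sentence P, if PA proves "if P is provable, then P", then PA already proves P outright:

```latex
% L\"ob's theorem, writing \Box P for "P is provable in PA":
\mathrm{PA} \vdash (\Box P \rightarrow P) \;\Longrightarrow\; \mathrm{PA} \vdash P
```

Taking P to be a falsehood ⊥ recovers Gödel's second incompleteness theorem: if PA proved its own consistency (□⊥ → ⊥), it would prove ⊥.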

Sure, DT1 can't prove that DT2's decisions are reliable (in fact, in general it can't even prove that its own decisions are reliable), but DT1 may be able to prove "Assuming that DT1's decisions are reliable, then DT2's decisions are reliable."
Isn't that enough for all practical purposes?

Notice that this makes sense even in the limit case where DT2 = DT1, which isn't necessarily just a theoretical pathology: it can happen in practice when even a non-self-modifying DT1 ponders "Why should I not kill myself?"

Am I missing something?
Isn't Löb's theorem just essentially a formal way of showing that you can't prove that you are not insane?

Comment author: orthonormal 25 March 2015 06:24:50PM 1 point

Good question! Translating your question to the setting of the logical model, you're suggesting that instead of using provability in Peano Arithmetic as the criterion for justified action, or provability in PA + Con(PA) (which would have the same difficulty), the agent uses provability under the assumption that its current formal system (which includes PA) is consistent.

Unfortunately, this turns out to be an inconsistent formal system!

Thus, you definitely do not want an agent that makes decisions on the criterion "if I assume that my own deductions are reliable, then can I show that this is the best action?", at least not until you've come up with a heuristic version of this that doesn't lead to awful self-fulfilling prophecies.
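The inconsistency orthonormal points to can be sketched as a corollary of Löb's theorem (a standard argument, filled in here for readers following along). Suppose the agent's system T extends PA and proves its own consistency:

```latex
% Suppose T extends PA and proves Con(T), i.e. "provability of \bot implies \bot":
T \vdash \Box_T \bot \rightarrow \bot
% T satisfies the derivability conditions, so L\"ob's theorem applies with P = \bot:
T \vdash \bot
```

So any system strong enough for Löb's theorem that assumes its own consistency (or reliability, which implies consistency) thereby proves a falsehood.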

Comment author: Quill_McGee 25 March 2015 06:31:24PM 2 points

I don't think he was talking about self-PA, but rather about an altered decision criterion: rather than "if I can prove this is good, do it", it is "if I can prove that, if I am consistent, then this is good, do it". I think this doesn't have this particular problem, though it does have others, and it still can't /increase/ in proof strength.

Comment author: V_V 25 March 2015 06:46:46PM * 1 point

I don't think he was talking about self-PA, but rather about an altered decision criterion: rather than "if I can prove this is good, do it", it is "if I can prove that, if I am consistent, then this is good, do it"

Yes.

and it still can't /increase/ in proof strength.

Mmm, I think I see it.
What about "if I can prove that, if a version of me with unbounded computational resources is consistent, then this is good, do it"? (*) It seems to me that this allows an increase in proof strength up to the proof strength of that particular ideal reference agent.

(* There should probably be additional constraints specifying that the current agent, and the successor if present, must provably approximate the unbounded agent in some conservative way.)

Comment author: Quill_McGee 25 March 2015 09:42:00PM 1 point

"if I can prove that if a version of me with unbounded computational resources is consistent then this is good, do it"

In this formalism we generally assume infinite resources anyway. And even if that weren't the case, consistency doesn't depend on resources, only on the axioms and the rules of deduction. So this still doesn't let you increase in proof strength, although again it should help avoid losing it.

Comment author: V_V 25 March 2015 09:49:25PM * 1 point

If we are already assuming infinite resources, then do we really need anything stronger than PA?

And even if that weren't the case, consistency doesn't depend on resources, only on the axioms and the rules of deduction.

A formal system may be inconsistent, and yet a resource-bounded theorem prover working in it might never be able to prove a contradiction within a given resource bound. If you increase the resource bound, contradictions may become provable.
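V_V's point can be illustrated with a toy forward-chaining "prover" (a hypothetical sketch, not any real proof system): the axiom set below is jointly inconsistent, but the contradiction only becomes derivable after enough chaining steps, so a small step budget never finds it.

```python
# Toy illustration of a resource-bounded prover over an inconsistent system.
# Rules map a frozenset of premises to a conclusion; `budget` caps how many
# rule applications (derivation steps) the prover may perform.

def derive(axioms, rules, budget):
    """Forward-chain up to `budget` rule applications; return derived facts."""
    facts = set(axioms)
    steps = 0
    changed = True
    while changed and steps < budget:
        changed = False
        for premises, conclusion in rules:
            if premises <= facts and conclusion not in facts:
                facts.add(conclusion)
                steps += 1
                changed = True
                if steps >= budget:
                    break
    return facts

# Inconsistent system: "a" is an axiom, and chaining a -> b -> c -> d -> not_a
# eventually derives "not_a", contradicting "a". The contradiction sits four
# derivation steps deep.
axioms = {"a"}
rules = [
    (frozenset({"a"}), "b"),
    (frozenset({"b"}), "c"),
    (frozenset({"c"}), "d"),
    (frozenset({"d"}), "not_a"),  # together with "a": explicit contradiction
]

def contradiction_found(facts):
    return "a" in facts and "not_a" in facts

print(contradiction_found(derive(axioms, rules, budget=2)))   # False: bound too small
print(contradiction_found(derive(axioms, rules, budget=10)))  # True: contradiction reached
```

With a budget of 2 the prover halts at {a, b, c} and never sees the inconsistency; raising the budget makes the contradiction provable, exactly as described.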