V_V comments on An Introduction to Löb's Theorem in MIRI Research - Less Wrong
Comments (27)
I haven't read the paper yet (thanks for posting it, anyway), so maybe the answer to my question is there, but there is something about MIRI's interest in Löb's theorem that has always bugged me, specifically:
Sure, DT1 can't prove that DT2 decisions are reliable, and in fact in general it can't even prove that DT1 itself makes reliable decisions, but DT1 may be able to prove "Assuming that DT1 decisions are reliable, then DT2 decisions are reliable".
Isn't that enough for all practical purposes?
Notice that this even makes sense in the limit case where DT2 = DT1, which isn't necessarily just a theoretical pathological case but can happen in practice when even a non-self-modifying DT1 ponders "Why should I not kill myself?"
Am I missing something?
Isn't Löb's theorem just essentially a formal way of showing that you can't prove that you are not insane?
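For reference, a standard statement of Löb's theorem, with $\Box P$ abbreviating "PA proves $P$" (this notation is an assumption here, not something the comment uses):

```latex
\text{If } \mathrm{PA} \vdash \Box P \rightarrow P, \text{ then } \mathrm{PA} \vdash P.
% Special case P = \bot, using that \Box\bot \rightarrow \bot is equivalent to \mathrm{Con}(\mathrm{PA}):
\text{If } \mathrm{PA} \vdash \mathrm{Con}(\mathrm{PA}), \text{ then } \mathrm{PA} \vdash \bot.
```

Taking $P = \bot$ recovers Gödel's second incompleteness theorem, which roughly matches the "can't prove that you are not insane" reading. The general theorem says somewhat more, since it applies to every sentence $P$ and not only to consistency.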
Good question! Translating your question to the setting of the logical model, you're suggesting that instead of using provability in Peano Arithmetic as the criterion for justified action, or provability in PA + Con(PA) (which would have the same difficulty), the agent uses provability under the assumption that its current formal system (which includes PA) is consistent.
Unfortunately, this turns out to be an inconsistent formal system!
Thus, you definitely do not want an agent that makes decisions on the criterion "if I assume that my own deductions are reliable, then can I show that this is the best action?", at least not until you've come up with a heuristic version of this that doesn't lead to awful self-fulfilling prophecies.
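A rough sketch of why such a self-trusting system collapses, writing $T$ for a hypothetical theory that extends PA with an axiom asserting $T$'s own consistency (the name $T$ and this particular formalization are assumptions for illustration):

```latex
\begin{align*}
&T = \mathrm{PA} + \mathrm{Con}(T) && \text{(self-referential definition, via the diagonal lemma)} \\
&T \vdash \mathrm{Con}(T) && \text{(it is an axiom of } T\text{)} \\
&T \text{ consistent} \;\Rightarrow\; T \nvdash \mathrm{Con}(T) && \text{(G\"odel's second incompleteness theorem)} \\
&\therefore\; T \text{ is inconsistent.}
\end{align*}
```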
I don't think he was talking about self-PA, but rather an altered decision criterion: rather than "if I can prove this is good, do it", it is "if I can prove that if I am consistent then this is good, do it". I think this doesn't have this particular problem, though it does have others, and it still can't /increase/ in proof strength.
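One way to read this altered criterion, assuming the agent reasons in PA and writing $\varphi$ for "this action is good" (both assumptions for illustration), is through the deduction theorem:

```latex
\mathrm{PA} \vdash \mathrm{Con}(\mathrm{PA}) \rightarrow \varphi
\quad\Longleftrightarrow\quad
\mathrm{PA} + \mathrm{Con}(\mathrm{PA}) \vdash \varphi
```

On this reading, the criterion amounts to using a fixed, slightly stronger system rather than a self-referential one, which fits the remark that it cannot increase in proof strength.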
Yes.
Mmm, I think I can see it.
What about "if I can prove that if a version of me with unbounded computational resources is consistent then this is good, do it"? (*) It seems to me that this allows an increase in proof strength up to the proof strength of that particular ideal reference agent.
(* There should probably be additional constraints specifying that the current agent, and its successor if present, are provably approximations of the unbounded agent in some conservative way.)
"if I can prove that if a version of me with unbounded computational resources is consistent then this is good, do it"
In this formalism we generally assume infinite resources anyway. And even if that were not the case, whether a system is consistent doesn't depend on resources, only on the axioms and the rules of deduction. So this still doesn't let you increase in proof strength, although again it should help avoid losing it.
If we are already assuming infinite resources, then do we really need anything stronger than PA?
A formal system may be inconsistent, but a resource-bounded theorem prover working on it might never be able to prove any contradiction for a given resource bound. If you increase the resource bound, contradictions may become provable.
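A toy illustration of this point, as a minimal sketch rather than a real theorem prover: the theory below is inconsistent, but the contradiction sits at the end of a chain of implications, so a forward-chaining prover limited to a fixed number of modus ponens steps only finds it once the bound is large enough. The names `chain_theory` and `bounded_finds_contradiction` are hypothetical, invented for this example.

```python
def chain_theory(n):
    """Build an inconsistent toy theory: p0, p0->p1, ..., p(n-1)->pn, and not-pn."""
    facts = {"p0"}                                        # atoms taken as axioms
    rules = [(f"p{i}", f"p{i + 1}") for i in range(n)]    # premise -> conclusion
    negated = {f"p{n}"}                                   # atoms whose negation is an axiom
    return facts, rules, negated


def bounded_finds_contradiction(facts, rules, negated, max_steps):
    """Forward-chain with modus ponens, using at most max_steps inference steps.

    Returns True if some derived atom is also negated (a contradiction),
    and False if the step budget runs out or nothing new can be derived.
    """
    known = set(facts)
    steps = 0
    while True:
        if known & negated:
            return True
        # Pick one rule whose premise is known and whose conclusion is new.
        new = next((c for p, c in rules if p in known and c not in known), None)
        if new is None or steps >= max_steps:
            return False
        known.add(new)
        steps += 1


facts, rules, negated = chain_theory(5)
print(bounded_finds_contradiction(facts, rules, negated, max_steps=4))  # False
print(bounded_finds_contradiction(facts, rules, negated, max_steps=5))  # True
```

The theory is the same in both calls; only the resource bound changes. Inconsistency is a property of the axioms, while whether a contradiction is actually found depends on the prover's budget.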