I haven't read the paper yet (thanks for posting it, anyway), so maybe the answer to my question is in there, but something about MIRI's interest in Löb's theorem has always bugged me. Specifically, the paper says:
Unfortunately, the straightforward way of setting up such a model fails catastrophically on the innocent-sounding step “DT1 knows that DT2’s deductions are reliable”. If we try and model DT1 and DT2 as proving statements in two formal systems (one stronger than the other), then the only way that DT1 can make such a statement about DT2’s reliability is if DT1 (and thus both) are in fact unreliable! This counterintuitive roadblock is best explained by reference to Löb’s theorem, and so we turn to the background of that theorem.
Sure, DT1 can't prove that DT2's decisions are reliable, and in general it can't even prove that its own decisions are reliable, but DT1 may well be able to prove the conditional "assuming that DT1's decisions are reliable, DT2's decisions are reliable".
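To put the suggestion in provability-logic notation (my own formalization, not anything from the paper): even though the system can't prove either reliability claim outright, the relative statement may still be within reach,

\[
\mathrm{DT1} \nvdash \mathrm{Rel}(\mathrm{DT2}), \qquad
\mathrm{DT1} \nvdash \mathrm{Rel}(\mathrm{DT1}), \qquad
\text{but possibly} \quad
\mathrm{DT1} \vdash \mathrm{Rel}(\mathrm{DT1}) \rightarrow \mathrm{Rel}(\mathrm{DT2}),
\]

where \(\mathrm{Rel}(\cdot)\) stands for some formalization of "makes reliable decisions".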
Isn't that enough for all practical purposes?
Notice that this makes sense even in the limit case where DT2 = DT1, which isn't necessarily just a theoretical pathology: it can come up in practice when even a non-self-modifying DT1 ponders "Why should I not kill myself?"
Am I missing something?
Isn't Löb's theorem essentially just a formal way of showing that you can't prove that you are not insane?
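For reference, Löb's theorem says that for any sentence \(P\), if the system proves "provability of \(P\) implies \(P\)", then it proves \(P\) outright:

\[
\text{If } \mathrm{PA} \vdash \Box P \rightarrow P, \text{ then } \mathrm{PA} \vdash P.
\]

Taking \(P = \bot\) (falsehood) recovers Gödel's second incompleteness theorem: if PA proved \(\neg\Box\bot\), i.e. its own consistency, that's exactly \(\mathrm{PA} \vdash \Box\bot \rightarrow \bot\), and Löb would then force \(\mathrm{PA} \vdash \bot\). So a consistent system really can't prove that it's "not insane", in this precise sense.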
Similar to V_V's comment. I'm only a layman, but I don't understand one of the first steps.
The paper appears to jump straight into the logic, justifying itself (pardon the pun) by a statement about the intricacies of formal systems proving things about other formal systems.
Can you explain how and why all AI systems correspond to formal systems, and what precisely you mean by 'formal system'?
(Btw, I have in fact read GEB, but I am still unable to confidently define a formal system.)
Or maybe a better question is: "Why does this need to be possible within formal logic in order to be doable at the level of code/AI? Many general problems are intractable in pure mathematics, yet numerical methods exist to solve particular instantiations of them. Why must this be doable at the level of pure logic?"
Good question as well! I should add motivation to the notes on this point.
Basically, we study deterministic mathematical objects within formal systems because these cases are easier for us to analyze and prove theorems about; it's an example of looking first under the streetlight before meandering farther into the darkness. Hopefully this way we'll notice many phenomena that still hold when we add randomness, bounded computations, and other realistic limitations.
And indeed, plenty of the topics in MIRI research do hold up when you make them messier. Many of the tools of formal systems handle random variables just as easily as deterministic facts. There's a version of Löb's Theorem for bounded-length proof searches, and tracing it through shows that Löbian cooperation and other phenomena behave the same there as in the unbounded case (for sufficiently large choices of the bounds). Using quantum random variables to evaluate logical counterfactuals doesn't solve the problem of implementing UDT; it just collapses UDT into CDT, so we still need to study logical counterfactuals. And so on.
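As a toy illustration of the bounded-cooperation point (my own sketch using mutual simulation with a step budget, not MIRI's actual formalism, which uses bounded proof search):

```python
# Toy sketch of budget-bounded "FairBot"-style cooperation in a one-shot
# Prisoner's Dilemma, via mutual simulation with a step budget.

def fairbot(opponent, budget):
    """Cooperate iff, within `budget` simulation steps, we can verify
    that the opponent cooperates back against us."""
    if budget <= 0:
        # Optimistic base case: this is what lets the mutual regress
        # ground out in cooperation, loosely playing the role that the
        # bounded Löb's theorem plays in the proof-based version.
        return "C"
    return "C" if opponent(fairbot, budget - 1) == "C" else "D"

def cooperatebot(opponent, budget):
    return "C"

def defectbot(opponent, budget):
    return "D"

print(fairbot(fairbot, 10))       # two FairBots cooperate
print(fairbot(defectbot, 10))     # FairBot defects against DefectBot
print(fairbot(cooperatebot, 10))  # and cooperates with CooperateBot
```

Note the design choice: with a pessimistic base case (defect when the budget runs out), two FairBots would spiral into mutual defection; the optimistic base case is what stabilizes cooperation for any positive budget, analogous to how the bounded Löb's theorem makes proof-based cooperation go through for sufficiently large proof-length bounds.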
Also, when I say "formal system" you can usually substitute in "Peano Arithmetic" or "Peano Arithmetic plus some extra axioms".