Lo! A cartoon proof of Löb's Theorem!
Löb's Theorem shows that a mathematical system cannot assert its own soundness without becoming inconsistent. Marcello and I wanted to be able to see the truth of Löb's Theorem at a glance, so we doodled it out in the form of a cartoon. (An inability to trust assertions made by a proof system isomorphic to yourself may be an issue for self-modifying AIs.)
It was while learning mathematical logic that I first learned to rigorously distinguish between X, the truth of X, the quotation of X, a proof of X, and a proof that X's quotation was provable.
The cartoon guide follows as an embedded Scribd document after the jump, or you can download it as a PDF file. Afterward I offer a medium-hard puzzle to test your skill at drawing logical distinctions.
And now for your medium-hard puzzle:
The Deduction Theorem (look it up) states that if assuming a hypothesis H enables us to prove a formula F in classical logic, then (H->F) is a theorem of classical logic.
Let ◻Z stand for the proposition "Z is provable". Löb's Theorem shows that, whenever we have ((◻C)->C), we can prove C.
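For reference, the standard formulation of Löb's Theorem for a theory T (such as Peano Arithmetic) with provability operator ◻ can be written as follows (this is the textbook statement; the cartoon's version matches it up to notation):

```latex
% Löb's Theorem: if T proves that C's provability implies C,
% then T proves C outright.
\[
  T \vdash (\Box C \to C) \quad\Longrightarrow\quad T \vdash C
\]
```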
Applying the Deduction Theorem to Löb's Theorem gives us, for all C:
((◻C)->C)->C
However, those familiar with the logic of material implication will realize that:
(X->Y)->Y
implies
(not X)->Y
Applied to the above, this yields (not ◻C)->C.
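The propositional step above is easy to verify mechanically. Here is a small truth-table check (names like `implies` are just local helpers, not part of any logic library) confirming that ((X->Y)->Y) -> ((not X)->Y) holds under every assignment:

```python
from itertools import product

def implies(a, b):
    # Material implication: a -> b is false only when a is true and b is false.
    return (not a) or b

# Check ((X -> Y) -> Y) -> ((not X) -> Y) for every truth assignment.
tautology = all(
    implies(implies(implies(x, y), y), implies(not x, y))
    for x, y in product([True, False], repeat=2)
)
print(tautology)  # True
```

So the propositional reasoning itself is sound; whatever is wrong with the argument must lie elsewhere.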
That is, all statements which lack proofs are true.
I cannot prove that 2 = 1.
Therefore 2 = 1.
Can you exactly pinpoint the flaw?
I think the missing piece is that it's really hard to formally-specify a scale of physical change.
I think the notion of "minimizing change" is secretly invoking multiple human brain abilities, which I suspect will each turn out to be very difficult to formalize. Given partial knowledge of a current situation S, these are: (1) predicting the future states of the world if we take some hypothetical action, (2) inventing a concrete default / null action appropriate to S, and (3) informally feeling which of two hypothetical worlds is more or less "changed" with respect to the predicted null-action world.
I think (1), (2), and (3) feel so introspectively unobtrusive because we have no introspective access to them; they're cognitive black boxes. We just see that their outputs are nearly always available when we need them, and fail to notice the existence of the black boxes entirely.
You'll also require an additional ability, a stronger form of (3) which I'm not sure even humans implement: (4) given two hypothetical worlds H1 and H2, and the predicted null-action world W0, compute the ratio difference(H1, W0) / difference(H2, W0) without dangerous corner cases.
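To make the shape of ability (4) and its corner cases concrete, here is a toy sketch (every name here is hypothetical; world states are stand-in feature vectors and "difference" is just Euclidean distance, not a claim about the real formalization):

```python
import math

def difference(world, baseline):
    # Toy stand-in for a real "degree of change" measure:
    # Euclidean distance between feature vectors.
    return math.dist(world, baseline)

def change_ratio(h1, h2, w0):
    """Ability (4): difference(H1, W0) / difference(H2, W0).

    Corner case: if H2 is indistinguishable from W0, the ratio is
    undefined -- exactly the kind of dangerous edge the text warns about.
    """
    d2 = difference(h2, w0)
    if d2 == 0.0:
        raise ZeroDivisionError("H2 matches the null-action world; ratio undefined")
    return difference(h1, w0) / d2

w0 = [0.0, 0.0]
print(change_ratio([3.0, 4.0], [1.0, 0.0], w0))  # 5.0
```

Even in this two-dimensional toy, the ratio blows up as H2 approaches W0; in a realistic very-high-dimensional state space, such degenerate regions are far harder to anticipate.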
If you can formally specify (1), (2), and (4), then yes! I then think you can use that to construct a utility function that won't obsess (won't "tile the universe") using the plan you described -- though I recommend investing more effort than my 30-minute musings to prove safety, if you seem poised to actually implement this plan.
Some issues I foresee:
Humans are imperfect at (1) and (2), and the (1)- and (2)-outputs are critical not just to ensuring non-obsession, but also to the overall intelligence quality of the AI. While formalizing the human (1) and (2) algorithms may enable human-level general AI (a big win in its own right), superhuman AI will require non-human formalizations of (1) and (2). Inventing non-human formalizations here feels difficult and risky -- though perhaps unavoidable.
The hypothetical world states in (4) are very-high-dimensional objects, so corner cases in (4) seem non-trivial to rule out. A formalization of the human (3)-implementation might be sufficient for some viable alternative plan, in which case the difficulty of formalizing (3) is bounded above by the difficulty of reverse-engineering the human (3) neurology. By contrast, inventing an inhuman (4) could be much more difficult and risky. This may be weak evidence that plans merely requiring (3) ought to be preferred over plans requiring (4).