HamletHenna comments on AI reflection problem - Less Wrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Comments (26)
Scott Aaronson wrote
Taking this seriously, it seems that if PA is consistent, then PA + ¬Cons(PA) ⊢ ◻¬Cons(PA) is true, i.e. there exists a term, X, encoding a proof of ¬Cons(PA). It also seems, from "you'll never be able to trap them", that we can't construct this proof. Which is to say, within some PA-like systems of interest, there are things (existentially) provable which are not (constructively) provable. So I agree that an AI should be able to accept statements provable in the second system before modifying to use that system. Our difference is between requiring existential proofs (which are not interesting to me, since things can be provable and yet never proved) and constructive proofs (a set whose membership is undecidable, yet which, by showing equivalence between the two systems or between the first and a subsystem of the second, we can accept). Again however, this is not a real limitation, because the constructive proof of a statement lets us infer that the statement is existentially provable. Thus we recover the "sensible" existentially provable statements, and discard the ones which are provable and yet unprovable.
I don't think this is quite what you meant. This means PA + ¬Cons(PA) allows you to prove that you can prove ¬Cons(PA), where the second use of the word 'prove' does not make reference to any specific formal system. The turnstile refers to the specific theory that is being used, here PA + ¬Cons(PA), and the box symbol is used when the meaning is clear from context, but you've used both.
This seems like a description of some constructive logic version of PA. I'm not sure exactly how that would work or if there is a standard constructive axiomatization of the natural numbers, but the set of theorems of constructive logic is a subset of those of classical logic, so this will not be able to prove that some given axiom system is trustable if the original system cannot do so.
You could add constructive theorems if you also add axioms, which I'm not sure whether you're proposing. For example, given a universal Turing machine U and a number x, there is a shortest program p such that U(p) = x and it could be the case, depending on U and x, that PA can prove that such a p exists, but not what it is. Then, by adding axioms that U never halts on certain inputs, it can be possible to deduce the value of p.
I don't see how that could help with the reflection problem though. Do you propose adding soundness of some formal system as an axiom, but preventing Loeb's theorem from applying by switching to constructive logic? I think Loeb's theorem is still valid constructively; it doesn't use negation, but I'm not sure if that is a sufficient criterion to be provable constructively.
It's called Heyting arithmetic, but no, nowhere above do I mean intuitionistic logic when I say constructive. Or I don't think I do. I recognize Löb's theorem and am not trying restrict any system's expressive power to avoid it (i.e. using Dan Willard's self verifying theories).
Maybe I'll try to prove some sufficient conditions for a Gödel machine to self modify in a toy universe given this perspective. That should make it more convincing / concrete.