cousin_it comments on An example of self-fulfilling spurious proofs in UDT - Less Wrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Comments (39)
Excellent! I wonder how far the malicious behavior can be extended.
Here the problem is that A directly uses a valid but malicious inference module Q. If A were built to enumerate proofs by length and act on the first one of the form "A()==a implies U()==u, and A()!=a implies U()<=u", can U be rewritten to guarantee that a particular spurious proof comes up first?
Or if A uses a halting oracle to check through statements of that form, can U be written (incorporating that oracle) so that the halting oracle fails to return for the genuine counterfactual and only returns "True" for a particular spurious one?
These are reasonable questions. I don't know the answers. Would be cool if someone else figured them out ;-)