Eliezer_Yudkowsky comments on Reflection in Probabilistic Logic - Less Wrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Comments (171)
This is all nifty and interesting, as mathematics, but I feel like you are probably barking up the wrong tree when it comes to applying this stuff to AI. I say this for a couple of reasons:
First, ZFC itself is already comically overpowered. Have you read about reverse mathematics? Stephen Simpson edited a good book on the topic. Anyway, my point is that there's a whole spectrum of systems a lot weaker than ZFC that are sufficient for a large fraction of theorems, and probably all the reasoning that you would ever need to do physics or make real word, actionable decisions. The idea that physics could depend on reasoning of a higher consistency strength than ZFC just feels really wrong to me. Like the idea that P could really equal NP. Of course my gut feeling isn't evidence, but I'm interested in the question of why we disagree. Why do you think these considerations are likely to be important?
Second, Isn't the the whole topic of formal reasoning a bike shed? Isn't the real danger that you will formalize the goal function wrong, not that the deductions will be invalid?
The problem is if your mathematical power has to go down each time you create a successor or equivalently self-modify. If PA could prove itself sound that might well be enough for many purposes. The problem is if you need a system that proves another system sound and in this case the system strength has to be stepped down each time. That is the Lob obstacle.
Each time you create a successor or equivalently self-modify, or rather each time that an action that you take has a non-zero chance of modifying yourself? Wouldn't that mean you'd have to check constantly? What if the effect of an action is typically unpredictable at least to some degree?
Also, since any system is implemented using a physical substrate, some change is inevitable (and until the AI has powered up to multiple redundancy level not yet so stochastically unlikely as to be ignorable). What happens if that change affects the "prove the system sound" physical component, is there a provable way to safeguard against that? (Obligatory "who watches the watcher".)
It's a hard problem any way you slice it. You there, go solve it.
So you are assuming that it will be wanting to prove the soundness of any successors? Even though it can't even prove the soundness of itself? But it can believe in it's own soundness in a Bayesian sense without being able to prove it. There is not (as far as I know) any Godelian obstacle to that. I guess that was your point in the first place.
Why would successors use a different system, though? Verifying proofs in formal systems is easy, it's coming up with the proofs that's difficult -- an AI would refine its heuristics in order to figure out proofs more efficiently, but it would not necessarily want to change the system it is checking them against.
It would want to change the system it checked proofs against if it did not trust that system. A naively built system which checked its proofs with PA but did not trust PA probabilistically (ie, held some probability that PA is false: not difficult to construct) would very possibly prefer to reduce its set of axioms to something which PA verifies (something true with 100% probability in its understanding). But (at least if the reduced system is still robinson arithmetic or stronger) the cycle would continue, since the new system is also not self-verifying.
This cycle could (likely) be predicted by the system, however, so it is not obvious what the system would actually choose to do.
If the system did not trust PA, why would it trust a system because PA verifies it? More to the point, why would it trust a self-verifying system, given that past a certain strength, only inconsistent systems are self-verifying?
If the system held some probability that PA was inconsistent, it could evaluate it on the grounds of usefulness, perhaps contrasting it with other systems. It could also try to construct contradictions, increasing its confidence in PA for as long as it doesn't find any. That's what we do, and frankly, I don't see any other way to do it.
Because that's how it works! The system "is" PA, so it will trust (weaker) systems that it (PA) can verify, but it will not trust itself (PA).
It would only trust them if it could verify them.
True.
Not necessarily; this depends on how the system works. In my probabilistic prior, this would work to some degree, but because there exists a nonstandard model in which PA is inconsistent (there are infinite proofs ending in contradictions), there will be a fixed probability of inconsistency which cannot be ruled out by any amount of testing.
That doesn't seem consistent to me. If you do not trust yourself fully, then you should not fully trust anything you demonstrate, and even if you do, there is still no incentive to switch. Suppose that the AI can demonstrate the consistency of system S from PA, and wants to demonstrate proposition A. If AI trusts S as demonstrated by PA, then it should also trust A as demonstrated by PA, so there is no reason to use S to demonstrate A. In other words, it is not consistent for PA to trust S and not A. Not fully, at any rate. So why use S at all?
What may be the case, however, (that might be what you're getting to) is that in demonstrating the consistency of S, PA assigns
P(Consistent(S)) > P(Consistent(PA)). Therefore, what both PA and S can demonstrate would be true with greater probability than what only PA demonstrates. However, this kind of system solves our problem in two ways: first, it means the AI can keep using PA, but can increase its confidence in some statement A by counting the number of systems that prove A. Second, it means that the AI can increase its confidence in PA by arbitrarily building stronger systems and proving the consistency of PA from within these stronger systems. Again, that's similar to what we do.That sounds reasonable to me -- the usefulness of certainty diminishes sharply as it approaches 1 anyway. Your paper sounds interesting, I'll give it a read when I have the time :)
You are smuggling in some assumptions from your experience with human cognition.
If I believe X, but don't believe that the process producing my beliefs is sane, then I will act on X (I believe it, and we haven't yet talked about any bridge between what I believe, and what I believe about my beliefs), but I still won't trust myself in general.
I certainly agree that "if I don't trust myself, I don't trust my output" is an assumption I draw from human cognition, but if I discard that assumption and those that seem in the same reference class it I'm not sure I have much of a referent left for "I trust myself" at all. Is there a simple answer to what it means for me to trust myself, on your account? That is, how could I tell whether or not I did? What would be different if I did or didn't?
E.g., do you think the generalization "things I believe are true" is true? Would you be comfortable acquiring influence that you can use as you see fit, because you trust yourself to do something reasonable?
I have a machine which maximizes expected utility given the judgments output by some box O. If O says "don't trust O!" it won't cause the machine to stop maximizing expected utility according to O's judgments---that's just what it is programmed to do. But it will cause the machine to doubt that it will do a good job, and work to replace itself with an alternative. The details of exactly what happens obviously depend on the formalism, but hopefully the point makes sense.