smoofra comments on Reflection in Probabilistic Logic - Less Wrong

63 Post author: Eliezer_Yudkowsky 24 March 2013 04:37PM

You are viewing a comment permalink. View the original post to see all comments and the full post content.

Comments (171)

You are viewing a single comment's thread.

Comment author: smoofra 23 March 2013 08:41:30PM 4 points [-]

This is all nifty and interesting, as mathematics, but I feel like you are probably barking up the wrong tree when it comes to applying this stuff to AI. I say this for a couple of reasons:

First, ZFC itself is already comically overpowered. Have you read about reverse mathematics? Stephen Simpson edited a good book on the topic. Anyway, my point is that there's a whole spectrum of systems a lot weaker than ZFC that are sufficient for a large fraction of theorems, and probably all the reasoning that you would ever need to do physics or make real word, actionable decisions. The idea that physics could depend on reasoning of a higher consistency strength than ZFC just feels really wrong to me. Like the idea that P could really equal NP. Of course my gut feeling isn't evidence, but I'm interested in the question of why we disagree. Why do you think these considerations are likely to be important?

Second, Isn't the the whole topic of formal reasoning a bike shed? Isn't the real danger that you will formalize the goal function wrong, not that the deductions will be invalid?

Comment author: Nisan 23 March 2013 08:49:17PM 8 points [-]

The result works for theories as simple as Peano arithmetic.

Comment author: Eliezer_Yudkowsky 23 March 2013 11:04:59PM 7 points [-]

The problem is if your mathematical power has to go down each time you create a successor or equivalently self-modify. If PA could prove itself sound that might well be enough for many purposes. The problem is if you need a system that proves another system sound and in this case the system strength has to be stepped down each time. That is the Lob obstacle.

Comment author: Kawoomba 24 March 2013 08:25:57AM *  2 points [-]

Each time you create a successor or equivalently self-modify, or rather each time that an action that you take has a non-zero chance of modifying yourself? Wouldn't that mean you'd have to check constantly? What if the effect of an action is typically unpredictable at least to some degree?

Also, since any system is implemented using a physical substrate, some change is inevitable (and until the AI has powered up to multiple redundancy level not yet so stochastically unlikely as to be ignorable). What happens if that change affects the "prove the system sound" physical component, is there a provable way to safeguard against that? (Obligatory "who watches the watcher".)

It's a hard problem any way you slice it. You there, go solve it.

Comment author: smoofra 24 March 2013 03:23:03AM 2 points [-]

So you are assuming that it will be wanting to prove the soundness of any successors? Even though it can't even prove the soundness of itself? But it can believe in it's own soundness in a Bayesian sense without being able to prove it. There is not (as far as I know) any Godelian obstacle to that. I guess that was your point in the first place.

Comment author: Broolucks 24 March 2013 12:07:51AM *  0 points [-]

Why would successors use a different system, though? Verifying proofs in formal systems is easy, it's coming up with the proofs that's difficult -- an AI would refine its heuristics in order to figure out proofs more efficiently, but it would not necessarily want to change the system it is checking them against.

Comment author: abramdemski 24 March 2013 12:36:48AM 2 points [-]

It would want to change the system it checked proofs against if it did not trust that system. A naively built system which checked its proofs with PA but did not trust PA probabilistically (ie, held some probability that PA is false: not difficult to construct) would very possibly prefer to reduce its set of axioms to something which PA verifies (something true with 100% probability in its understanding). But (at least if the reduced system is still robinson arithmetic or stronger) the cycle would continue, since the new system is also not self-verifying.

This cycle could (likely) be predicted by the system, however, so it is not obvious what the system would actually choose to do.

Comment author: Broolucks 24 March 2013 01:00:25AM 1 point [-]

If the system did not trust PA, why would it trust a system because PA verifies it? More to the point, why would it trust a self-verifying system, given that past a certain strength, only inconsistent systems are self-verifying?

If the system held some probability that PA was inconsistent, it could evaluate it on the grounds of usefulness, perhaps contrasting it with other systems. It could also try to construct contradictions, increasing its confidence in PA for as long as it doesn't find any. That's what we do, and frankly, I don't see any other way to do it.

Comment author: abramdemski 24 March 2013 01:27:06AM 2 points [-]

If the system did not trust PA, why would it trust a system because PA verifies it?

Because that's how it works! The system "is" PA, so it will trust (weaker) systems that it (PA) can verify, but it will not trust itself (PA).

More to the point, why would it trust a self-verifying system, given that past a certain strength, only inconsistent systems are self-verifying?

It would only trust them if it could verify them.

If the system held some probability that PA was inconsistent, it could evaluate it on the grounds of usefulness, perhaps contrasting it with other systems.

True.

It could also try to construct contradictions, increasing its confidence in PA for as long as it doesn't find any.

Not necessarily; this depends on how the system works. In my probabilistic prior, this would work to some degree, but because there exists a nonstandard model in which PA is inconsistent (there are infinite proofs ending in contradictions), there will be a fixed probability of inconsistency which cannot be ruled out by any amount of testing.

Comment author: Broolucks 24 March 2013 02:05:21AM *  0 points [-]

Because that's how it works! The system "is" PA, so it will trust (weaker) systems that it (PA) can verify, but it will not trust itself (PA).

That doesn't seem consistent to me. If you do not trust yourself fully, then you should not fully trust anything you demonstrate, and even if you do, there is still no incentive to switch. Suppose that the AI can demonstrate the consistency of system S from PA, and wants to demonstrate proposition A. If AI trusts S as demonstrated by PA, then it should also trust A as demonstrated by PA, so there is no reason to use S to demonstrate A. In other words, it is not consistent for PA to trust S and not A. Not fully, at any rate. So why use S at all?

What may be the case, however, (that might be what you're getting to) is that in demonstrating the consistency of S, PA assigns P(Consistent(S)) > P(Consistent(PA)). Therefore, what both PA and S can demonstrate would be true with greater probability than what only PA demonstrates. However, this kind of system solves our problem in two ways: first, it means the AI can keep using PA, but can increase its confidence in some statement A by counting the number of systems that prove A. Second, it means that the AI can increase its confidence in PA by arbitrarily building stronger systems and proving the consistency of PA from within these stronger systems. Again, that's similar to what we do.

Not necessarily; this depends on how the system works. In my probabilistic prior, this would work to some degree, but because there exists a nonstandard model in which PA is inconsistent (there are infinite proofs ending in contradictions), there will be a fixed probability of inconsistency which cannot be ruled out by any amount of testing.

That sounds reasonable to me -- the usefulness of certainty diminishes sharply as it approaches 1 anyway. Your paper sounds interesting, I'll give it a read when I have the time :)

Comment author: paulfchristiano 24 March 2013 04:15:23AM 1 point [-]

You are smuggling in some assumptions from your experience with human cognition.

If I believe X, but don't believe that the process producing my beliefs is sane, then I will act on X (I believe it, and we haven't yet talked about any bridge between what I believe, and what I believe about my beliefs), but I still won't trust myself in general.

Comment author: TheOtherDave 24 March 2013 04:39:59AM 1 point [-]

I certainly agree that "if I don't trust myself, I don't trust my output" is an assumption I draw from human cognition, but if I discard that assumption and those that seem in the same reference class it I'm not sure I have much of a referent left for "I trust myself" at all. Is there a simple answer to what it means for me to trust myself, on your account? That is, how could I tell whether or not I did? What would be different if I did or didn't?

Comment author: paulfchristiano 24 March 2013 04:56:42AM 3 points [-]

E.g., do you think the generalization "things I believe are true" is true? Would you be comfortable acquiring influence that you can use as you see fit, because you trust yourself to do something reasonable?

I have a machine which maximizes expected utility given the judgments output by some box O. If O says "don't trust O!" it won't cause the machine to stop maximizing expected utility according to O's judgments---that's just what it is programmed to do. But it will cause the machine to doubt that it will do a good job, and work to replace itself with an alternative. The details of exactly what happens obviously depend on the formalism, but hopefully the point makes sense.

Comment author: paulfchristiano 24 March 2013 04:13:08AM 2 points [-]

As others have said, doesn't much matter whether you use ZFC or any other system.

On the second point: one general danger is that your ability to build systems to do X will outpace your understanding of systems that do X. In this case, you might mess up and not quite get what you want. One way to try and remedy this is to develop a better formal understanding of "systems that do X." AGI seems like an important case of this pattern, because there is some chance of serious negative consequences from failing, from building systems whose behavior you don't quite understand. (I tend to think this probability is smaller than Eliezer, but I think most everyone who has thought about it seriously agrees that this is a possible problem.) At the moment we have no such formal understanding for AGI, so it might be worth thinking about.

Comment author: lukeprog 23 March 2013 09:28:14PM 0 points [-]

Isn't the real danger that you will formalize the goal function wrong, not that the deductions will be invalid?

Both are huge difficulties, but most of the work in FAI is probably in the AI part, not the F part.

Comment author: wedrifid 24 March 2013 12:47:52AM 4 points [-]

Both are huge difficulties, but most of the work in FAI is probably in the AI part, not the F part.

It is worth noting that taking the "F" seriously implies adding a rather significant amounts of work to the "AI part". It requires whole extra orders of formal rigor and all the additional complications of provable goal stability under self improvement (regardless of what those goals happen to be). While this doesn't matter for the purpose of answering smoofra's question it seems to me that it could could be potentially misleading to neglect the difference in workload between creating "F-compatible AI" and "AI" when talking about the workload imposed by 'F'.

Note that I don't think I'm saying something controversial here. I am expecting this to just be wording that I am wary of rather than a fundamentally different understanding. But if I have actually misunderstood the MIRI position on the relative difficulty of Friendliness to arbitrary AI then I would appreciate being corrected. That would be significant new information for me to consider (and also extremely good news!)

The reason this is significant can of course be illustrated by considering the counterfactual world where a convergence thesis holds. Or, more relevantly, considering the possibility of GAI researchers that believe that a convergence thesis holds but somehow manage to be competent researchers anyhow. Their task becomes (crudely speaking) that of creating any AI that can make something smarter than itself. My estimate is that this is an order of magnitude simpler than the task for FAI creators creating an AI, even completely neglecting the work that goes into creating the goal system.

If I find out that I am mistaken about the relative difficulties here then I will get to drastically update my expectation of humanity surviving and in general in the direction of awesome things happening.

Comment author: [deleted] 26 March 2013 01:15:59AM 0 points [-]

I take it that by "convergence thesis", you're referring to the statement that all sufficiently intelligent agents will have approximately the same values.

Comment author: wedrifid 26 March 2013 07:33:26AM 0 points [-]

I take it that by "convergence thesis", you're referring to the statement that all sufficiently intelligent agents will have approximately the same values.

Yes.

Comment author: smoofra 23 March 2013 09:36:02PM 0 points [-]

OK, forget about F for a second. Isn't the huge difficulty finding the right deductions to make, not formalizing them and verifying them?