
JGWeissman comments on The Inefficiency of Theoretical Discovery - Less Wrong Discussion

19 points · Post author: lukeprog 03 November 2013 09:26PM

Comment author: JGWeissman 04 November 2013 06:49:33PM 1 point [-]

The standard argument in favor of this line of research is that Lob's theorem implies that purely logical systems will have difficulty reasoning about their future behavior, and that therefore it is worth asking whether a probabilistic language of thought can overcome this obstacle. But I believe this is due to a failure to think sufficiently concretely about the problem. To give two examples: first, humans seem perfectly capable of making predictions about their future behavior despite this issue, and I have yet to see an argument for why AIs should be different. Secondly, we manage to routinely prove facts about the behavior of programs (this is the field of program analysis) despite the fact that in theory this should be "undecidable".

These examples involve predictions generated by processes which are not purely logical systems, and which we don't understand enough to code into an AI. So it seems like Paul's idea could be progress towards having a process that makes such predictions about itself that we can understand at that level.

Comment author: jsteinhardt 04 November 2013 08:17:57PM *  0 points [-]

Can you clarify in what sense you think a computer program is not a purely logical system? Or am I misunderstanding you?

ETA: In particular, I'm referring to the point where I said:

Secondly, we manage to routinely prove facts about the behavior of programs (this is the field of program analysis) despite the fact that in theory this should be "undecidable".

Comment author: JGWeissman 04 November 2013 09:00:29PM 0 points [-]

I didn't mean that the computer program is not a purely logical system, but that the people proving facts about its behavior aren't purely logical systems.

Comment author: jsteinhardt 04 November 2013 10:38:38PM 0 points [-]

Program analysis consists of writing computer programs that reason about other computer programs. Is the objection that these programs were written by a human? That seems like a strange objection to me if so.

Comment author: JGWeissman 04 November 2013 11:21:20PM 1 point [-]

Ok, then I did not understand exactly what you meant, but I still don't think this is a counterexample to the problem Paul's idea tries to get around.

The problem is that logical systems have trouble reasoning about their own behavior; it is not a claim that no other logical system can reason about them. In particular, we are interested in whether an optimization process can model itself as an optimization process: accurately predicting that its future decisions are likely to achieve outcomes that score well on its optimization criteria, that the score will be better if it has more resources, and that the score will become much worse if its representation of its criteria gets corrupted (all using abstract reasoning, in much less time than fully simulating its future decisions). Can program analysis do that?

Comment author: jsteinhardt 05 November 2013 03:33:40AM *  2 points [-]

ETA: Also, I should note that this is a good question and I'm glad you asked it!

If your question is whether a program analyzer can, when given itself as input, produce sensible results, the answer is yes. Program analyzers are meant to run on arbitrary code, so in particular they can be run on themselves as a special instance. (Actually, nothing particularly special happens in this case as far as I can tell.)

Now, a key point is the formalism we are working in: a program analyzer takes in a program P and some specification S, and checks whether P obeys specification S (for instance, checking that it never accesses memory before it allocates it). Importantly, the analyzer is allowed to do any one of the following three things:

  • report that P satisfies S (with a proof)
  • report that P does not satisfy S (with a counterexample)
  • give up (i.e. report "unsure")

So a key question is how often it gives up! While there are many instances where modern program analysis tools do give up, there are also many where they don't. Furthermore, if you are an AI and you propose a modification to your code, and your program analysis subroutine reports "unsure", you are free (and would be wise) to try a different modification instead. Researchers in the field of program analysis are extremely cognizant of the halting problem (which is closely related to Lob's theorem) and typically deal with it by over-approximating, e.g. by identifying conditions that are sufficient for halting but not necessary. As a result one obtains decidability at the cost of precision (note, though, that the approximation is always sound: if we report that P satisfies S, then P does indeed always satisfy S).
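As a concrete toy illustration of this three-outcome interface, here is a sketch in Python; the property, the program representation, and all names are invented for illustration and are not taken from any real analysis tool:

```python
from enum import Enum

class Result(Enum):
    SATISFIES = 1   # ideally accompanied by a proof
    VIOLATES = 2    # ideally accompanied by a counterexample
    UNSURE = 3      # the analyzer gives up

def check_no_div_by_zero(divisors):
    """Toy 'analyzer' for a straight-line program that divides by each
    value in `divisors`.  Values may be concrete ints or the symbol
    "unknown", representing information lost to over-approximation."""
    for d in divisors:
        if d == "unknown":
            return Result.UNSURE    # give up soundly rather than guess
        if d == 0:
            return Result.VIOLATES  # found a concrete counterexample
    return Result.SATISFIES

print(check_no_div_by_zero([3, 7]))          # Result.SATISFIES
print(check_no_div_by_zero([3, 0]))          # Result.VIOLATES
print(check_no_div_by_zero([3, "unknown"]))  # Result.UNSURE
```

The "unsure" branch is where all the interesting engineering happens: the less information the over-approximation throws away, the less often the analyzer has to give up.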

Comment author: JGWeissman 05 November 2013 04:22:50PM 1 point [-]

This is a good approach for dealing with the halting problem, but I think that Lob's theorem is not so closely related that getting around the halting problem means you get around Lob's theorem.

The theoretical AI that would run into Lob's theorem would need more general proof producing capability than these relatively simple program analyzers.

It seems like these program analyzers are built around the specification S they check for, with the human programmer doing the work of constructing the structure of a proof, which can be filled in to a complete proof by supplying basic facts that the program can check. For example, I have a library that produces .Net assemblies, with byte code that targets a stack based virtual machine, and I want to verify that instructions that read elements off the stack get elements of the type they expect. So I wrote my library to keep track of the types that could be written to the stack at any execution point (represented by a stack of types). It is straightforward to compute the possible stack states after each instruction, given that instruction and the previous possible stack states, and to determine whether the instruction is legal given the previous state (well, branching makes it more complicated, but not too much). So, in this case, I came up with the structure of tracking possible states after each instruction, and then it was straightforward to write my program to fill in that structure; but I did not, and don't know how to, write my program to come up with the proof structure itself. It is therefore easy to be confident that the proof will have nice properties, because the space of possible proofs with this structure is much smaller than the space of all possible proofs.
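The stack-of-types idea can be sketched for a toy stack machine; this is an invented miniature for illustration, not the actual .Net byte code or the commenter's library, and it handles only straight-line code (no branching):

```python
# Toy verifier: track the stack of *types* after each instruction of a
# straight-line stack-machine program, and reject any instruction that
# would pop a value of the wrong type.
def verify(program):
    stack = []  # stack of type names, e.g. "int", "str"
    for op, *args in program:
        if op == "push":            # push a constant; args = (type_name,)
            stack.append(args[0])
        elif op == "add":           # pops two ints, pushes an int
            if len(stack) < 2 or stack[-1] != "int" or stack[-2] != "int":
                return False
            stack.pop(); stack.pop(); stack.append("int")
        elif op == "concat":        # pops two strs, pushes a str
            if len(stack) < 2 or stack[-1] != "str" or stack[-2] != "str":
                return False
            stack.pop(); stack.pop(); stack.append("str")
        else:
            return False            # unknown instruction: refuse to certify
    return True

ok = [("push", "int"), ("push", "int"), ("add",)]
bad = [("push", "int"), ("push", "str"), ("add",)]
print(verify(ok))   # True
print(verify(bad))  # False
```

Note how the proof structure (one abstract stack state per program point) is fixed by the verifier's author; the verifier only fills in that structure, which is what keeps the search space small.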

The theoretical AI that would run into Lob's theorem would be able to come up with proof structures, as an AI that could only use proof structures prepackaged by human programmers would have huge gaps in its capabilities. This may introduce difficulties not seen in simple program analyzers.

In the example I asked about earlier, the optimizer that needs to prove things about its own future decisions runs into an issue that Lob's theorem applies to. In order to prove that its own future decisions are good, it would rely on the fact that its future decisions are based on its own sound proof system, so it needs to assert that its own proof system is sound: that if its proof system says "X", then X. Lob's theorem says that, for any statement X, if a system P (at least as powerful as Peano arithmetic) proves "P proves 'X' implies X", then P proves X. So, if the system asserts its own soundness for every statement, it asserts everything.
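The paraphrase above can be written compactly; this is the standard statement of the theorem, with \Box X abbreviating "P proves X":

```latex
% Lob's theorem, writing \Box X for "P proves X":
\text{If } P \vdash (\Box X \to X) \text{, then } P \vdash X.
% Consequence: if P asserts its own soundness, i.e.
% P \vdash (\Box X \to X) for every sentence X,
% then P \vdash X for every sentence X, including false ones.
```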

So, in summary:

Lob's theorem is a problem for generally powerful formal logic based proof systems that assert their own soundness.

Program analyzers are formal logic based proof systems that do not run into Lob's theorem, because they are not generally powerful, and do not assert their own soundness.

Humans are generally powerful proof generators that can have confidence in their own soundness, but are not purely based on formal logic (we might intuitively dismiss a formal logical proof as "spurious"), and so can get around Lob's theorem; but we don't understand how humans do this well enough to write a program that does it.

Paul's idea of a probabilistic proof system could lead to a generally powerful proof generator that has confidence in its own soundness, that is not based purely on formal logic (so that it gets around Lob's theorem), and that we understand well enough to write a program for.

Comment author: V_V 17 November 2013 07:06:40PM 0 points [-]

It seems like these program analyzers are built around the specification S they check for, with the human programmer doing the work of constructing the structure of a proof, which can be filled in to a complete proof by supplying basic facts that the program can check. For example, I have a library that produces .Net assemblies, with byte code that targets a stack based virtual machine, and I want to verify that instructions that read elements off the stack get elements of the type they expect. So I wrote my library to keep track of the types that could be written to the stack at any execution point (represented by a stack of types). It is straightforward to compute the possible stack states after each instruction, given that instruction and the previous possible stack states, and to determine whether the instruction is legal given the previous state (well, branching makes it more complicated, but not too much). So, in this case, I came up with the structure of tracking possible states after each instruction, and then it was straightforward to write my program to fill in that structure; but I did not, and don't know how to, write my program to come up with the proof structure itself. It is therefore easy to be confident that the proof will have nice properties, because the space of possible proofs with this structure is much smaller than the space of all possible proofs.

I'm not sure I've understood what you have in mind here, but in the general case complete type checking in .NET (that is, proving that an assembly not only is syntactically well-formed but also never throws type-related exceptions at runtime) is undecidable because of Rice's theorem.

In the general case, complete type checking is as difficult as proving arbitrary claims in first-order logic.

Comment author: JGWeissman 17 November 2013 11:28:12PM 0 points [-]

I am not trying to write a classifier that tells you whether or not an arbitrary program throws a type exception. I wrote a verifier that tells you whether or not an arbitrary program can be proven not to throw type exceptions (except possibly at an explicit cast statement, or a throw exception statement) with a particular proof strategy that covers a huge space of useful, nicely structured programs.

See also jsteinhardt's comment I was responding to, which discussed getting around the halting problem by allowing the checker to say "I don't know".

Comment author: V_V 18 November 2013 01:26:04AM *  0 points [-]

I am not trying to write a classifier that tells you whether or not an arbitrary program throws a type exception. I wrote a verifier that tells you whether or not an arbitrary program can be proven not to throw type exceptions (except possibly at an explicit cast statement, or a throw exception statement)

I'm not an expert on .NET, but is there anything that can throw a type exception other than an explicit cast or an explicit throw (or the standard library, I suppose)?

Comment author: jsteinhardt 05 November 2013 05:32:26PM *  0 points [-]

My mathematical logic is a bit rusty, but my impression is that the following are true:

  1. Godel's theorem presents a strictly stronger obstacle than Lob's theorem. A reflectively consistent theory may still be incomplete, but any complete theory is necessarily reflectively consistent.

  2. The undecidability of the halting problem is basically Godel's theorem stated in computational terms. If we could identify a subset L of Turing machines for which the halting problem can be decided, then as long as L was closed under operations such as inserting a (non-self-referential) sub-routine, we would be able to verify any (non-self-referential) property of the program that was also expressible in L. This is a sketch of a claim rather than an actual claim that I've proved, though.

Finally, I think it's worth pointing out an actual example of a program analysis tool since I think they are more powerful than you have in mind. The following slides are a good example of such a tool. At a high level, it gets around the problems you are worried about by constructing an over-approximation of the halting problem that is expressible in propositional logic (and thus decidable, in fact it is even in NP). More generally we can construct a sequence of approximations, each expressible in propositional logic, whose conjunction is no longer an approximation but in fact exactly the original statement.
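I can't reconstruct the tool from the slides here, but the general move of trading an unbounded question for a finite, decidable one can be sketched generically: unroll a transition system for k steps and check the property exhaustively. This is a stand-in for an encoding into propositional logic, and all names are invented:

```python
# Bounded check: unrolling a transition system for k steps turns
# "does the property ever fail?" into a finite (hence decidable)
# question, analogous to encoding the unrolled system in
# propositional logic and handing it to a SAT solver.
def bounded_check(init, step, prop, k, state_space):
    """Return True iff prop holds along every run of length <= k."""
    frontier = {s for s in state_space if init(s)}
    for _ in range(k + 1):
        if any(not prop(s) for s in frontier):
            return False            # counterexample within k steps
        frontier = {t for s in frontier for t in state_space if step(s, t)}
    return True                     # an approximation: says nothing beyond k

# Toy system: a counter mod 8, starting at 0; property: never reach 5.
states = list(range(8))
init = lambda s: s == 0
step = lambda s, t: t == (s + 1) % 8
print(bounded_check(init, step, lambda s: s != 5, 4, states))  # True: 5 is unreachable within 4 steps
print(bounded_check(init, step, lambda s: s != 5, 5, states))  # False: 5 is reached at step 5
```

Chaining such bounded checks for increasing k is what makes the conjunction of approximations converge on the exact original statement.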

Comment author: JGWeissman 06 November 2013 02:50:46AM 0 points [-]

Godel's theorem presents a strictly stronger obstacle than Lob's theorem.

Why do you say that? My understanding is that Godel's theorem says that a (sufficiently powerful) logical system has true statements it can't prove, but these statements are excessively complicated and probably not important. Is there some way you envision an AGI being limited in its capacity to achieve its goals by Godel's theorem, as we envision Lob's theorem blocking an AGI from trusting its future self to make effective decisions? (Besides where the goals are tailored to be blocked by the theorem: "Prove all true statements in my formal system")

Finally, I think it's worth pointing out an actual example of a program analysis tool since I think they are more powerful than you have in mind. The following slides are a good example of such a tool.

As near as I can tell, this is more powerful than other static analysis tools that I have seen (though maybe not; the Google Web Toolkit optimizer is pretty impressive), but it is well within what I expect to be possible, and it doesn't get around Lob's theorem. (I can't be too confident in this assessment of its power, because I don't see a clear claim of what sort of statements it can prove, or an explanation of how it works, except that it seems to detect when the inputs to a statement may have invalid values, and that it uses brute force techniques to analyze functions and then associates a summary of the analysis with each function (constraints on valid inputs and guarantees on outputs?) so that its analysis of call sites can use the summary.) This sort of program analyzer is interesting in its own right, and I would like to see a more complete explanation of it, but I don't think its existence says anything about the problems posed by Lob's theorem.

Comment author: jsteinhardt 06 November 2013 04:24:11AM 0 points [-]

Why do you say that?

Do you agree or disagree that complete implies reflectively consistent? If you agree, then do you agree or disagree that this means avoidance of Godelian obstacles implies avoidance of Lobian obstacles? If you agree with both of those statements, I'm confused as to why:

Godel's theorem presents a strictly stronger obstacle than Lob's theorem.

is a controversial statement.

Comment author: [deleted] 04 November 2013 08:29:28PM -1 points [-]

I'm guessing that he is using the standard AI terminology of a "logical program" as one which reasons by means of logical inference over a knowledge base. This is not how human minds work, nor do many AI researchers view it as a viable path forward.

The types of programs which have seen success in the AI revolution that is currently going on, often are not amenable to program analysis. They very often fall in the "undecidable" category, at least with respect to their anticipated behavior.

Comment author: asr 04 November 2013 10:06:58PM 1 point [-]

The types of programs which have seen success in the AI revolution that is currently going on, often are not amenable to program analysis. They very often fall in the "undecidable" category, at least with respect to their anticipated behavior.

This depends which properties you care about. I suspect there isn't a small model of the AI program that duplicates its output behavior, but you can do a lot with off-the-shelf analysis.

It's relatively easy to prove things like "this program will only ever use the following subset of available system calls," or "this program is well typed", or "the program cannot modify itself", or "can only modify these fixed specific parts of its state."

I can also imagine a useful AI program where you can prove a bound on the run-time, of the form "this program will always terminate in at most X steps", for some actual fixed constant X. (Obviously you cannot do this for programs in general, but if the program only loops over its inputs for a fixed number of iterations or somesuch, you can do it.)

These sorts of properties are far from a general proof "this program is Safe", but they are non-trivial and useful properties to verify.
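The fixed run-time bound mentioned above can be illustrated for a tiny fragment of Python: a checker (invented for illustration, not a sound tool for the full language) that bounds the number of statement executions when every loop is `for ... in range(<constant>)`, and gives up otherwise:

```python
import ast

# Sketch: prove "this program executes at most X statements" for the
# fragment of Python whose only loops are `for _ in range(<constant>)`.
# Ignores loop else-clauses and the cost of called functions.
def loop_bound(stmts):
    """Return an upper bound on statements executed, or None to give up."""
    total = 0
    for s in stmts:
        if isinstance(s, ast.While):
            return None                       # can't bound a while-loop
        if isinstance(s, ast.For):
            it = s.iter
            if not (isinstance(it, ast.Call) and isinstance(it.func, ast.Name)
                    and it.func.id == "range" and len(it.args) == 1
                    and isinstance(it.args[0], ast.Constant)
                    and isinstance(it.args[0].value, int)):
                return None                   # only constant ranges allowed
            inner = loop_bound(s.body)
            if inner is None:
                return None
            total += it.args[0].value * (inner + 1)  # +1 counts the loop header
        else:
            total += 1                        # any other statement: one step
    return total

def bound_of(src):
    return loop_bound(ast.parse(src).body)

print(bound_of("x = 0\nfor i in range(10):\n    for j in range(5):\n        x += i"))  # 111
print(bound_of("while x < 10:\n    x += 1"))  # None
```

Giving up on `while` loops is exactly the over-approximation move discussed earlier in the thread: the checker is sound on what it accepts, at the cost of rejecting many programs that do in fact terminate.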

Comment author: [deleted] 04 November 2013 10:47:58PM 0 points [-]

The FAI-related things you would want to prove are of the form: “when given a world state characterized by X and input percepts characterized by Y, the program always generates outputs characterized by Z.” For many existing and popular AI architectures, we haven't the foggiest idea how to do this. (It's no surprise that Eliezer Yudkowsky favors Pearl's causal probabilistic graph models, where such analysis is at least known to be possible.)

Comment author: jsteinhardt 05 November 2013 03:40:17AM 0 points [-]

To the extent that X, Y, and Z can be written formally within the programming language, program analysis at least in principle is fully capable of proving such a statement. I apologize if this comes off as rude, but your statement that "we haven't the foggiest idea how to do this" is flat-out false. While there are certainly unique challenges to reasoning about the sort of code that gets written in machine learning, it seems to me that the main reason we don't have well-developed analysis tools is that most code doesn't look like machine learning code, and so there has been little pressure to develop such tools.

Comment author: jsteinhardt 04 November 2013 09:05:31PM 0 points [-]

Can you give an example? The main reason they are not amenable to program analysis is because the sorts of guarantees they are supposed to satisfy are probabilistic / statistical in nature, and we don't yet have good techniques for verifying such properties. I am pretty sure that the issue is not undecidability.

I should also clarify that whether or not JGWeissman thought that I meant "logical program" instead of "computer program", my comment was intended to mean "computer program" and the field of program analysis studies "computer programs", not "logical programs".

Comment author: [deleted] 04 November 2013 09:44:58PM 0 points [-]

Ok, I'll throw it back at you: how do you prove the behavior of a deep belief network, or any other type of neural network currently in vogue, short of actually running it? If you do have some way of doing it, can I be coauthor? I didn't mean to imply that it was proved impossible, just that no one has the faintest idea how to do it - and not for lack of trying.

Comment author: jsteinhardt 05 November 2013 03:48:18AM 0 points [-]

What does "prove the behavior" mean? If you mean "prove that it will classify this next image correctly" then of course the answer is no, although that is because it's not even necessarily true. If you mean "prove that the program's behavior always falls within some well-defined regime", then yes we can absolutely prove that sort of statement. Things that are at the boundary of what we can prove (i.e. I don't know of any existing formal tools that do it, but I'm pretty sure we could make one) would be something like "the weights of the network don't change too much if we change the input a small amount" or "the prediction error on the training set will decrease monotonically" or "the generalization error is bounded by <X> assuming that the test data ends up being drawn from the same distribution as the training data".
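The first of those boundary properties can be made concrete: for a ReLU network, the product of the layers' operator norms bounds how much the output can move when the input moves. A minimal pure-Python sketch, using the infinity-norm (max absolute row sum) so no linear-algebra library is needed; the tiny weight matrices are made up:

```python
# Upper-bound the Lipschitz constant of a ReLU network: ReLU is
# 1-Lipschitz, so the product of the layers' infinity-norms bounds
# ||f(x) - f(y)||_inf / ||x - y||_inf.
def mat_inf_norm(W):
    return max(sum(abs(w) for w in row) for row in W)

def lipschitz_bound(layers):
    prod = 1.0
    for W in layers:
        prod *= mat_inf_norm(W)
    return prod

def relu_net(layers, x):
    for W in layers[:-1]:                     # hidden layers: affine + ReLU
        x = [max(sum(w * xi for w, xi in zip(row, x)), 0.0) for row in W]
    return [sum(w * xi for w, xi in zip(row, x)) for row in layers[-1]]

layers = [[[0.5, -1.0], [2.0, 0.25]], [[1.0, -0.5]]]
L = lipschitz_bound(layers)                   # 2.25 * 1.5 = 3.375
x, y = [1.0, -1.0], [1.1, -0.9]               # inputs 0.1 apart in inf-norm
dist = max(abs(a - b) for a, b in zip(relu_net(layers, x), relu_net(layers, y)))
assert dist <= L * 0.1                        # the provable guarantee holds
print(L)
```

This is exactly the flavor of statement a verifier could check mechanically, even though it says nothing about whether any particular input is classified correctly.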

Presumably for friendliness you want something like "P and P' satisfy some contract relative to each other" (such as sufficiently similar behavior) which is the sort of thing we are already relatively good at proving. Here I'm imagining that P is the "old" version of a program and P' is a "modified" version of the program that we are considering using.