What is this, "A Series of Unfortunate Logical Events"? I laughed quite a bit, and enjoyed walking through the issues in self-knowledge that the löbstacle poses.
This is a really good illustration of the 5 and 10 problem. I read the linked description, but didn't fully understand it until you walked through this example in more depth. Thanks!
One possible typo: You used 5a and 5b in most of the article, but in the original list of steps they are called 6 and 7.
This looks full of straightforward errors.
Firstly, the agent's logical deduction system is unsound. It includes something comparable with Peano arithmetic (or else Löb's theorem can't be applied), and then adds a deduction rule "if P can be proved consistent with the system-so-far then assume P is true". But we already know that for any consistent extension T of Peano arithmetic there is at least one proposition G for which both G and ~G are consistent with T. So both of these are deducible using the rule. Now, the agent might not find the contradiction, because...
This system can only find proofs up to a length of a million characters. The requirements of Löb's theorem include a formal system that is an extension of PA and is closed under deduction: that is, derives proofs of unbounded length. So we can't apply Löb's theorem to this system anyway.
Firstly, the agent's logical deduction system is unsound. It includes something comparable with Peano arithmetic (or else Löb's theorem can't be applied), and then adds a deduction rule "if P can be proved consistent with the system-so-far then assume P is true". But we already know that for any consistent extension T of Peano arithmetic there is at least one proposition G for which both G and ~G are consistent with T. So both of these are deducible using the rule. Now, the agent might not find the contradiction, because...
Are you referring to steps 5a/5b of the algorithm, which say to search for a proof and then act based on what is found? I wouldn't exactly characterize this as adding a further deduction rule. But perhaps you are referring to something else?
This system can only find proofs up to a length of a million characters. The requirements of Löb's theorem include a formal system that is an extension of PA and is closed under deduction: that is, derives proofs of unbounded length. So we can't apply Löb's theorem to this system anyway.
It's true this gets a bit more finicky in the resource-bounded case. Critch's paper on resource-bounded Löb goes into this. I haven't looked into this very much.
Yes, I'm referring to 5a/5b in conjunction with self-reflection as a deduction rule, since the agent is described as being able to use these to derive new propositions.
There is also a serious problem with 5a itself: the agent needs to try to prove that some new proposition P is "consistent with its model of the world". That is, for its existing axioms and derivation rules T, prove that T+P does not derive a contradiction.
If T is consistent, then this is impossible by Gödel's second incompleteness theorem. Hence for any P, Step 5a will always exhaust all possible proofs of length up to whatever bound and return without finding such a proof. Otherwise T is inconsistent and it may be able to find such a proof, but it is also obvious that its proofs can't be trusted and not at all surprising that it will take the wrong route.
I think that I should modify 5a from "Search for a proof that this sentence is consistent with your model of the world, up to a maximum proof length of one million characters" to "Search for a proof of this sentence, using your model of the world as a set of starting assumptions". This is indeed a significant change to the algorithm and I thank you for pointing it out. I think that would resolve your second concern, about the problem with 5a itself, yes?
I think it might also resolve your first concern, about the unsoundness of the logical system, because the agent does not have a deduction rule saying that if P can be proved then P is true. Rather, it reasons from the fact that P can be proved, together with the fact that this particular agent chooses its actions based on proof search and will behave in such-and-such a way if it finds a proof of this particular P, to conclude P. This only works for certain particular sentences P, such as the one constructed in this story, since this particular sentence P is one that, if the agent finds a proof of it, will cause it to take actions that lead to P itself being true.
Yes, I think this change is much better. I'm still a bit unclear about how exactly the agent reasons about itself. That doesn't seem to be well-defined.
Is it capable of distinguishing between its models of the world? The reasoning in the article leads me to think not. The wording used is of the form "a proof exists of P" rather than "in hypothetical situation H in which I have world model M, my deductive rules could find a proof of P".
This may lead it to equivocation errors, where the same symbols are used to refer to different things.
I think there's a logical error. You claim to be deducing "IF route FAST is taken THEN I will arrive at 3pm", but what you should actually be deducing is "IF route SLOW is taken THEN (IF route FAST is taken THEN I will arrive at 3pm)". What you end up with is proving that "route SLOW is taken" is logically equivalent to "IF route FAST is taken THEN I will arrive at 3pm AND IF route SLOW is taken THEN I will arrive at 2pm", but neither of them is proved.
As someone who can follow the 5 and 10 problem, but who doesn't really internalize why it matters (no matter how many times I look at similar explainers, I always just come away with "don't do that! recursion without exit conditions is taught as a bad thing to elementary school coders. OF COURSE a thing can't contain a perfect model of itself plus any other stuff.")
I'm sorry to report that this version uses a lot more words than most, but the object values (route selection) are so disconnected from the main point that it doesn't shed any light on the problem.
How oh how did things go so tragically wrong for our autonomous car?
There's a one-line answer: the builders weren't trying to make a car pick a route, they were trying to ... I don't actually know what.
the builders weren't trying to make a car pick a route, they were trying to ... I don't actually know what.
Well there is a perspective from which the algorithm used here can seem very compelling. After all, what could make more sense than considering each possible action, searching for a proof that connects each action to its consequences, and then picking the action with the most desirable consequences? This is what the proof search algorithm does. It is deeply surprising that it fails so completely. I should have made that clearer in the post.
recursion without exit conditions is taught as a bad thing to elementary school coders
The algorithm here is not actually recursive. The proof search is just a straightforward loop through all possible proofs up to some length limit. Checking the validity of a proof means checking whether each step follows deductively from the previous steps. It does not involve any recursion, not even one level of recursion, and certainly not recursion without exit conditions.
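To make that concrete, here is a self-contained toy sketch in Python of a bounded proof search of exactly this shape: a plain loop over candidate proofs with a step-by-step validity check, and no recursion anywhere. The formula representation, the modus-ponens-only rule set, and the bound by number of lines (rather than characters) are my own illustration, not the car's actual deduction system.

```python
# Toy bounded proof search: a plain loop over candidate proofs, no recursion.
# Formulas are atoms (strings) or implications ('->', p, q); the only
# inference rule is modus ponens. This is far simpler than the system in the
# post, but it has the same overall shape.
from itertools import product

def line_is_valid(line, earlier, assumptions):
    """A proof line is valid if it is an assumption or follows by modus
    ponens from earlier lines (some q together with the implication q -> line)."""
    return line in assumptions or any(('->', q, line) in earlier for q in earlier)

def bounded_proof_search(goal, assumptions, candidate_lines, max_steps):
    """Loop through every sequence of candidate lines up to max_steps long,
    returning the first one that is a valid proof ending in `goal`."""
    for n in range(1, max_steps + 1):
        for proof in product(candidate_lines, repeat=n):
            if proof[-1] != goal:
                continue
            if all(line_is_valid(line, proof[:i], assumptions)
                   for i, line in enumerate(proof)):
                return proof          # every step checked out
    return None                       # bound exhausted without finding a proof

assumptions = {'A', ('->', 'A', 'B'), ('->', 'B', 'C')}
candidate_lines = ['A', 'B', 'C', ('->', 'A', 'B'), ('->', 'B', 'C')]
print(bounded_proof_search('C', assumptions, candidate_lines, max_steps=5))
```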
OF COURSE a thing can't contain a perfect model of itself plus any other stuff
In what way is it not possible for a thing to contain a perfect model of itself? A computer can certainly contain a circuit diagram of itself on a file on its hard drive. A program can certainly contain a copy of its own source code inside a string. Humans are capable of comprehending their own anatomy and neurology, apparently without any fixed limit. In what way can a thing not contain a model of itself?
I think your definition of perfect model is a bit off - a circuit diagram of a computer is definitely not a perfect model of the computer! The computer itself has much more state and complexity, such as temperature of the various components, which are relevant to the computer but not the model.
Containing a copy of your source code is a weird definition of a model. All programs contain their source code; does a program that prints its source code have more of a model of itself than other programs, which are just made of their source code? Human brains are not capable of holding a perfect model of a human brain, plus more, because the best-case encoding requires using one atom of the human brain to model an atom in the human brain, leaving you at 100% capacity.
The key word is "perfect" - to fit a model of a thing inside that thing, the model must contain less information than the thing does.
I think your definition of perfect model is a bit off - a circuit diagram of a computer is definitely not a perfect model of the computer! The computer itself has much more state and complexity, such as temperature of the various components, which are relevant to the computer but not the model.
But there is no fundamental law of physics that says that the computer cannot maintain estimates of the temperature of each of its components.
Now it is true that a computer cannot store the exact physical state of every atom that constitutes it, since storing that much information would indeed require the full physical expressivity of every atom that constitutes it, and there would indeed be no room left over to do anything else.
But digital computers are designed precisely so that their behavior can be predicted without needing to track the exact physical state of every atom that constitutes them. Humans are certainly able to reason about computers in quite a bit of detail without tracking the exact physical state of every atom that constitutes them. And there is no reason that a digital computer couldn't store and use a self-model with at least this level of predictive power.
Containing a copy of your source code is a weird definition of a model. All programs contain their source code; does a program that prints its source code have more of a model of itself than other programs, which are just made of their source code?
Well sure, containing a copy of your own source code on its own is not really a "self-model", but it does show that there is nothing fundamentally blocking you from analyzing your own source code, including proving things about your own behavior. There is nothing fundamentally paradoxical or recursive about this. The claim was made that things cannot contain perfect models of themselves, but in fact things can contain models of themselves that are sufficient to reason about their own behavior.
The key word is "perfect" - to fit a model of a thing inside that thing, the model must contain less information than the thing does.
Well yes, but the autonomous car in the parable didn't go wrong by having an insufficiently perfect self-model. It had a self-model quite sufficient to make all the predictions it needed to make. If the self-model had been more detailed, the car still would have gone wrong. Even if it had a truly perfect self-model, even though such a thing is not permitted under our laws of physics, it still would have gone wrong. So the problem isn't about the inability for things to contain models of themselves. It's about how those self-models are used.
considering each possible action, searching for a proof that connects each action to its consequences, and then picking the action with the most desirable consequences?
I'm clearly seriously misunderstanding something about proofs. My understanding of Gödel's theorem, and general complexity theory, leads me to find that mechanism VERY un-compelling. There may be aspects you want to prove (or disprove), but there are going to be important decisions that are unprovable, and many (most, I think) theoretically-provable outcomes will be computationally infeasible. And that's my belief before even considering the counterfactual problem (false implies everything).
I'll need to think more about why this feels like recursion - my intuition is handwavey at this point, related to the agent deciding by evaluating the agent's choice, rather than deciding by removing the agent in each branch.
A computer can certainly contain a circuit diagram of itself on a file on its hard drive.
Sure, but those aren't computations. They're compressed representations, which require a lot more resources to simulate. And if that circuit/source includes the execution of the resulting structures (aka emulation), we're back to recursion, in which each iteration consumes additional resources.
Sure, but those aren't computations. They're compressed representations, which require a lot more resources to simulate. And if that circuit/source includes the execution of the resulting structures (aka emulation), we're back to recursion, in which each iteration consumes additional resources.
Can you give me an example of something that cannot be done by a computer that is attempting to reason about its own circuitry?
Do you mean that since the computer is using some of its processing power to analyze its own circuitry, it cannot therefore dedicate 100% of its processing power to any other process running on the system? Or do you mean that it cannot use all of its processing power to analyze its own circuitry, even if there are no other processes running on the system?
The former I agree with, but that is true of any process whatsoever -- e.g. if the computer is running a game of solitaire then that process will use up some non-zero amount of processing power, which the computer therefore cannot devote to any other process.
The latter I disagree with. Let's say that the computer is running an analysis of its circuitry that can be expressed as a SAT problem (as many real-world hardware verification problems can be). So the computer loads the representation of its circuitry, constructs a SAT problem, then begins searching over the possible variable assignments looking for a solution. Why couldn't the computer dedicate its full capacity at full speed to this search?
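For what it's worth, here is a minimal Python sketch of that kind of exhaustive SAT search, just to show that nothing in it needs to "emulate" anything. The three-clause formula is invented for illustration; real hardware-verification encodings are vastly larger and real solvers are far cleverer.

```python
# Brute-force SAT search: try every assignment of the variables and return
# the first one that satisfies all clauses. A clause is a list of
# (variable, wanted_value) literals; a clause is satisfied if any literal is.
from itertools import product

def brute_force_sat(clauses, variables):
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if all(any(assignment[var] == wanted for var, wanted in clause)
               for clause in clauses):
            return assignment         # satisfying assignment found
    return None                       # formula is unsatisfiable

# (x OR y) AND (NOT x OR y) AND (NOT y OR z) -- an invented toy formula
clauses = [[('x', True),  ('y', True)],
           [('x', False), ('y', True)],
           [('y', False), ('z', True)]]
print(brute_force_sat(clauses, ['x', 'y', 'z']))
```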
For a given set of resources (CPU time or instruction count, reads/writes/total storage, etc.), there are computations that can be done directly which cannot be done in an emulator that takes some of those resources.
There is some amount of underlying useful work being done (calculating the expected value of hypothetical actions) which is feasible to calculate directly, but infeasible to calculate by calculating the calculation.
When the useful work IS the emulation, then of course it's using its full power. But it can't emulate and verify the emulation/verification (without additional resources).
Why are you talking about emulation? There are lots of ways to analyze a circuit diagram other than emulation. The autonomous car in the story does not use emulation.
Why are you talking about emulation?
That's an excellent question - I don't know if the connection between formal proof and emulation/reflection exists anywhere outside of my mind. I believe my arguments hold for the impossibility of proving something without additional resources over just calculating it (possibly using a method that has proofs about its correctness, which happened outside the computation itself).
This was bothering me, but I think I found the logical flaw. Quoting the post:
NOT (route FAST is taken)
And then from this it deduced
IF route FAST is taken THEN I will arrive at 3pm
This, I’m afraid dear reader, is also permitted by the laws of logic.
The statements "not P" and "not P OR Q" might have the same truth value, but they are not logically equivalent. At this point in the proof, saying "THEN I will arrive at 3pm" is arbitrary, and could have been "THEN pigs will fly." I think that's what's known as the principle of explosion, but I could be wrong about that term.
The statements "not P" and "not P OR Q" might have the same truth value, but they are not logically equivalent. At this point in the proof, saying "THEN I will arrive at 3pm" is arbitrary, and could have been "THEN pigs will fly." I think that's what's known as the principle of explosion, but I could be wrong about that term.
OK I agree this is very much the crux of the issue. However:
Second, the way that "IF .. THEN" is defined in propositional or first-order logic seems not to capture quite what we mean by those words in ordinary language. I think this is part of what you are pointing out.
I feel like the confusion between propositional logic and ordinary language is the only reason Lob's theorem is even being discussed in the first place. The car's programmers used IF X THEN Y to represent the statement "If X, then Y happens", which means something quite different. Other than the incidental similarity of these statements in the English language, why is this more relevant than any other programming error?
Ah, so the error is back here:
3. Make a list of all possible logical sentences of the form IF route 1 is taken THEN I will arrive at such-and-such a time AND IF route 2 is taken THEN I will arrive at such-and-such a time AND ...
Because the algorithm was created without including the additional assumption (used later on in the "proof") that if route 1 is taken, then route 2 would NOT be taken (and vice versa). If you include only that additional piece of information, then the statements generated in step 3 are "logically" equivalent to:
"IF route 1 is not taken AND IF route 2 is taken THEN I will arrive at Time" (or "IF route 1 is taken I will arrive at Time AND route 2 is not taken").
And that (again from our route 1 XOR route 2 assumption) is equivalent to a list of:
IF route # is taken, THEN I will arrive at time
for all possible combinations of route and time, with no conjunctions at all.
Financial status: This is independent research, now supported by a grant. I welcome financial support.
Epistemic status: I believe ~85% that the technical argument presented in this piece is correct.
Outline
This is my attempt to explain the basic Löb situation with the 5-and-10 problem.
This post considers an autonomous car choosing between a fast route and a slow route, with the goal of minimizing the time to its destination.
If the autonomous car makes its decisions using a certain seemingly-reasonable algorithm based on proof search, this post shows that it may nonsensically take the slow route.
This is the 5-and-10 problem phrased in terms of autonomous driving.
Parable of the unreliable autonomous car
A long time ago, in a far away place, there was an autonomous car.
This particular autonomous car was programmed to use the agent model in its reasoning. That is, it was programmed to model the world as consisting of an agent and an environment, the agent being itself, and the environment being a network of roads that it would navigate each day. This is how the autonomous car was programmed to model the world.
Beyond using the agent model, this autonomous car had been given, by its designers, an accurate model of its own behavior. It could tell you how its own decision algorithm would behave under any set of circumstances. It could tell you this without being faced with those actual circumstances because it had a model of itself, and it could answer questions based on that model.
One day, a passenger got into the autonomous car and gave it a destination. There were two ways for the autonomous car to get there: a fast route, which would get it there at 1pm, and a slow route, which would get it there at 2pm. The goal of the autonomous car was to get to its destination as quickly as possible. This parable is about the reasoning followed by the car in deciding which of these two routes to take.
Now in this far-away place, autonomous cars were designed by a strange cadre of alien engineers, and thus they were programmed with strange algorithms. This particular autonomous car was programmed to decide among routes using the following procedure:
1. Make a list of the possible routes
2. Make a list of the possible arrival times
3. Make a list of all possible logical sentences of the form
IF route 1 is taken THEN I will arrive at such-and-such a time AND IF route 2 is taken THEN I will arrive at such-and-such a time AND ...
4. Shuffle this list of logical sentences into a random order
5. For each logical sentence in the list from first to last:
5a. Search for a proof of this sentence, using your model of the world as a set of starting assumptions, up to a maximum proof length of one million characters.
5b. If a proof is found then output the route associated with the earliest arrival time
Here are a few extra details about this algorithm:
The list of possible routes in step 1 is returned by a subroutine that uses the car’s inbuilt map of the road network. On this particular day it returned two routes: SLOW and FAST.
The possible arrival times used on this particular day were 1pm, 2pm, and 3pm.
Since there is always a finite number of possible routes and arrival times, there is also always a finite number of logical sentences in step 3.
Step 5a only considers proofs that can be expressed in under one million characters because otherwise this step might run forever.
If a proof is found in step 5a then the loop over sentences terminates at step 5b.
In the first version of this post, step 5a was written as "Search for a proof that this sentence is consistent with your model of the world, up to a maximum proof length of one million characters." Thank you to JBlack for helping to correct this in the comments.
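To fix ideas, here is a rough Python sketch of steps 1 through 5b as described above. The proof search of step 5a is left as a stub (search_for_proof, a name I have made up) that always fails, just so the sketch runs on its own; the whole parable is about what happens inside that step.

```python
# Sketch of the car's decision procedure, steps 1-5b. A sentence of the form
# "IF route r1 is taken THEN I arrive at t1 AND IF route r2 is taken THEN ..."
# is represented as a list of (route, arrival_time) pairs, one per route.
import itertools
import random

def search_for_proof(sentence, world_model, max_chars=1_000_000):
    # Stand-in for step 5a: search for a proof of `sentence` from
    # `world_model`, up to max_chars characters. Always failing here keeps
    # the sketch runnable; it is not the post's actual proof search.
    return None

def choose_route(routes, arrival_times, world_model):
    # Steps 1-3: one candidate sentence per assignment of a time to each route.
    sentences = [list(zip(routes, times))
                 for times in itertools.product(arrival_times, repeat=len(routes))]
    random.shuffle(sentences)                                    # step 4
    for sentence in sentences:                                   # step 5
        if search_for_proof(sentence, world_model) is not None:  # step 5a
            # Step 5b: take the route with the earliest arrival time
            # ('1pm' < '2pm' < '3pm' as strings, so min works here).
            return min(sentence, key=lambda pair: pair[1])[0]
    return None  # no sentence was proved within the bound

print(choose_route(['FAST', 'SLOW'], ['1pm', '2pm', '3pm'], world_model=None))
```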
Now from our perspective here on Earth, we might think that this is a strange algorithm for an autonomous car to use. We might ask why the autonomous car didn’t just consider the routes one by one and compute an arrival time for each one. We might ask what business an autonomous car has conducting a search over logical proofs while a passenger is waiting patiently in the back seat. These are reasonable questions from our perspective here on Earth. But this parable is not about why this autonomous car was programmed in this way, it is about the tragic consequences that followed from it being programmed in this way.
And so the autonomous car began using its procedure to decide which route to take. It got to step 3 and the sentences in its list were the nine sentences of the form above, one for each way of assigning one of the three arrival times to each of the two routes.
Then in step 4 it shuffled this list of sentences, and just by chance the very first sentence that it considered in step 5 was
IF route FAST is taken THEN I will arrive at 3pm AND IF route SLOW is taken THEN I will arrive at 2pm
Now you and I, dear reader, can see that this sentence is false. In fact the fast route would get the autonomous car to its destination at 1pm, not 3pm. But the autonomous car was merely executing its programmed instructions. And at this time its programmed instructions told it, in step 5a, to search for a proof of the sentence, using its model of the world as a set of starting assumptions. And its model of the world was the agent model, and within that model it had an understanding of the road network (the environment), as well as an understanding of its own decision-making procedure (the agent; itself), and so it could ask whether it was in fact the case that
IF route FAST is taken THEN I will arrive at 3pm AND IF route SLOW is taken THEN I will arrive at 2pm
And so in step 5a it searched for a proof of this sentence, using its model of the world as a set of starting assumptions, and here is how that went.
The autonomous car saw that its own decision-making algorithm would consider sentences one by one, searching for a proof of each from its internal model. It saw that the first such sentence to be considered would be
IF route FAST is taken THEN I will arrive at 3pm AND IF route SLOW is taken THEN I will arrive at 2pm
It saw that the route chosen by its decision-making algorithm would be determined by whether or not the algorithm found a proof of this sentence. It did not know whether a proof of this particular sentence existed, so it decided to proceed by using proof-by-implication, which is where you assume some sentence p, then derive some sentence q that follows logically from it, and thereby obtain "IF p THEN q". (This is also a common proof method here on Earth.)
The p that the algorithm decided to assume for proof-by-implication was the following:
A proof exists of the sentence: IF route FAST is taken THEN I will arrive at 3pm AND IF route SLOW is taken THEN I will arrive at 2pm
Now, to understand what happened next, it is important to understand that this is not the same as assuming the sentence
IF route FAST is taken THEN I will arrive at 3pm AND IF route SLOW is taken THEN I will arrive at 2pm
The autonomous car saw that the behavior of its decision algorithm rested not on whether taking the FAST route would in fact get it there by 3pm, nor whether taking the SLOW route would in fact get it there at 2pm, but on whether a proof existed of the sentence in question.
So the autonomous car proceeded in its reasoning, assuming for the sake of proof-by-implication that
a proof exists of the sentence: IF route FAST is taken THEN I will arrive at 3pm AND IF route SLOW is taken THEN I will arrive at 2pm
It saw that, given this assumption, its decision algorithm would, at step 5a, find the proof. It saw that its decision algorithm would, upon finding this proof, stop considering further sentences in the list and take the route associated with the earliest arrival time in the sentence that had been proved. It saw that the route associated with the earliest arrival time in the sentence that had been proved was the SLOW route. It then looked into its model of the road network and saw that, upon taking the SLOW route, it would arrive at its destination at 2pm.
Now you might ask why, if the autonomous car had a sufficiently detailed model of the road network, it didn’t just use that model to compare the two routes at the outset and skip over the search over proofs. Wouldn’t it be simpler just to compare the two routes and pick the one with the shortest arrival time? Well, yes, that would seem simpler to us here on Earth. But that is just not how this autonomous car was programmed.
This was all within the scope of the proof-by-implication, so at this time the autonomous car had in fact proved
IF a proof exists of the sentence (IF route FAST is taken THEN I will arrive at 3pm AND IF route SLOW is taken THEN I will arrive at 2pm) THEN route SLOW is taken AND I will arrive at 2pm
Now you might think, dear reader, that the manner of reasoning used up to this point by this autonomous car was a little strange. I am afraid, dear reader, that at this point its reasoning became stranger still, although never stepping outside the very same rules of logic that we use here on Earth.
Still searching for a proof in step 5a, the autonomous car made the following logical deductions.
First, from
I will arrive at 2pm
it deduced
IF route SLOW is taken THEN I will arrive at 2pm
This is permitted by the laws of logic because whenever one has "q", one can deduce "IF p THEN q", since the "IF...THEN" is material implication and is logically equivalent to "NOT p OR q".
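Both this step and the one that follows rest only on the truth table of material implication, which a few lines of Python can check exhaustively. This is a standard propositional fact, not anything special about the car:

```python
# Exhaustive truth-table check of the two propositional facts used here:
# "IF p THEN q" (material implication) is false only when p is true and q is
# false, so it is equivalent to "NOT p OR q", and therefore it follows from
# q alone, and also from "NOT p" alone.
from itertools import product

def if_then(p, q):
    return not (p and not q)   # false only when p is true and q is false

for p, q in product([False, True], repeat=2):
    assert if_then(p, q) == ((not p) or q)   # IF p THEN q  <=>  NOT p OR q
    if q:
        assert if_then(p, q)                 # from q, deduce IF p THEN q
    if not p:
        assert if_then(p, q)                 # from NOT p, deduce IF p THEN q
print("all four rows of the truth table check out")
```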
Second, from
route SLOW is taken
it deduced
NOT (route FAST is taken)
And then from this it deduced
IF route FAST is taken THEN I will arrive at 3pm
This, I’m afraid dear reader, is also permitted by the laws of logic. Once again, "IF p THEN q" is equivalent to "NOT p OR q", and the autonomous car already derived "NOT (route FAST is taken)", so it was perfectly justified in deriving, within the scope of the proof-by-implication in which it had established that route FAST would not be taken, that
IF route FAST is taken THEN I will arrive at 3pm
And so, as this tragedy nears its final movements, the autonomous car had derived, using proof-by-implication, the following:
IF a proof exists of the sentence (IF route FAST is taken THEN I will arrive at 3pm AND IF route SLOW is taken THEN I will arrive at 2pm) THEN (IF route FAST is taken THEN I will arrive at 3pm AND IF route SLOW is taken THEN I will arrive at 2pm)
And here, dear reader, you will see the shape of the demise of our dear autonomous car already outlined in this tragic tale. For Löb’s theorem, with which I am sure you are well acquainted, tells us that whenever we have a proof system that can prove a sentence of the form
IF a proof exists for p THEN p
then that very same formal system will, unwittingly, and without the slightest possibility of mistake, eventually also prove the sentence
p
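Should you, dear reader, prefer the standard notation of provability logic, in which \Box P abbreviates "a proof exists of P", the theorem is usually written as follows. This is the standard statement, not anything specific to this parable:

```latex
% Löb's theorem, with \Box P read as "a proof exists of P".
% Rule form, as used in this parable:
\text{if } \vdash \Box P \rightarrow P \text{ then } \vdash P
% Internalized form, itself provable within the system:
\vdash \Box(\Box P \rightarrow P) \rightarrow \Box P
```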
Yes, tragic indeed, dear reader, is it not? Shall we continue with this tale, or is its conclusion already clear?
The autonomous car had proved
IF a proof exists of the sentence (IF route FAST is taken THEN I will arrive at 3pm AND IF route SLOW is taken THEN I will arrive at 2pm) THEN (IF route FAST is taken THEN I will arrive at 3pm AND IF route SLOW is taken THEN I will arrive at 2pm)
This sentence being of no further use, the autonomous car discarded it and moved on with its proof search. It is not known for how long the passenger sat in the back seat, or by what strange motions of logic the proof search proceeded from this point. Perhaps many aeons passed, or perhaps it was merely seconds. We do not know. All that is known is that the autonomous car eventually proved
IF route FAST is taken THEN I will arrive at 3pm AND IF route SLOW is taken THEN I will arrive at 2pm
We know that it proved this because Löb’s theorem tells us that it must. We have seen that the logical apparatus of this autonomous car was capable of proving "IF a proof exists for p THEN p", and Löb’s theorem tells us that any logical apparatus capable of proving "IF a proof exists for p THEN p" is also capable of proving "p".
Therefore a proof was found, and the autonomous car proceeded to step 5b, and selected the route with the earliest arrival time, and that route was the SLOW route.
Discussion
How oh how did things go so tragically wrong for our autonomous car?
The first thing we must see, dear reader, is that the strange engineers in this far away place made certain choices in designing this autonomous car. These choices were design choices. They were not necessitated by the laws of logic — nobody would suggest that they were. The laws of logic do not force us to design autonomous cars with sophisticated self-models that use proof search to decide among routes to a destination.
So the most direct conclusion to draw from this sad fable, dear reader, is not to design autonomous cars in this particular way here on Earth, lest we ourselves suffer the fate of the unfortunate passenger from this tale. Making decisions based on proof search might sound like an adventure, dear reader, but as this tale shows, it is not for the faint of heart, nor for the punctual.
There are, of course, any number of alternative design choices one might make for an autonomous car. One might, for example, estimate an arrival time for each route based on distance and traffic conditions, and select the route for which the estimated arrival time is earliest. Why did the alien designers of our cherished autonomous car not give it a decision algorithm such as this?
Well it is not for us to understand the minds of such alien creatures. And yet we might, for the sake of satisfying our own curiosity, speculate, while knowing, of course, that the truth will be far beyond any conjecture that we could possibly invent.
Perhaps it was the intention of these extraordinary designers to incorporate not just knowledge of the environment, but also knowledge of the agent, which in this case is the autonomous car itself, into the estimated arrival time for each route. It is after all the case that the programming of the car affects its arrival time just as much as the environment affects its arrival time. Were the programming of the car such that left turns would be taken very quickly while right turns would be taken very slowly, it would be to the benefit of our esteemed and punctual passenger to pick a route with as few right turns as possible. And although one could hard-code a heuristic that minimizes the number of right turns into the decision algorithm, these alien engineers hardly seem like the type to make such brutish engineering choices. A far more general approach, and perhaps — who can say? — the intention of these remarkable alien engineers would be for the autonomous car to examine its own programming and determine for itself which factors would contribute to the speedy navigation of a route. This is certainly an ambition that we can sympathize with.
We can see, then, that by examining its own programming, our cherished autonomous car "steps out of itself" and looks back at itself as though watching a child from a distance. It is only natural, then, for us, in making our own choices about the design of our own autonomous systems, to similarly "step out" and look at the systems we are building from a certain distance. When we do this, we see that the peculiar design choices made for this peculiar autonomous car are faulty. There is no need to "put ourselves in the shoes" of the autonomous car and wonder why this or that conclusion was reached. We can simply see that the system fails to behave in the manner that we wish it to behave.
And we might speculate that it has something to do with the way the agent model was used.
But that topic, dear reader, will have to wait for another day.