Followup to: What's a "natural number"?

While thinking about how to make machines understand the concept of "integers", I accidentally derived a tiny little math result that I haven't seen before. Not sure if it'll be helpful to anyone, but here goes:

You're allowed to invent an arbitrary scheme for encoding integers as strings of bits. Whatever encoding you invent, I can give you an infinite input stream of bits that will make your decoder hang and never give a definite answer like "yes, this is an integer with such-and-such value" or "no, this isn't a valid encoding of any integer".

To clarify, let's work through an example. Consider a unary encoding: 0 is 0, 1 is 10, 2 is 110, 3 is 1110, etc. In this case, if we feed the decoder an infinite sequence of 1's, it will remain forever undecided as to the integer's value. The result says we can find such pathological inputs for any other encoding system, not just unary.
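Here's a quick sketch of such a decoder (a minimal Haskell sketch; the names are mine, chosen for illustration):

```haskell
-- Sketch of a decoder for the unary code above (illustrative names).
-- Bits are Bools; the input is a lazy, possibly infinite stream.
decodeUnary :: [Bool] -> Integer
decodeUnary (True  : rest) = 1 + decodeUnary rest  -- each leading 1 counts
decodeUnary (False : _)    = 0                     -- the terminating 0
decodeUnary []             = error "stream ended unexpectedly"

-- decodeUnary (replicate 3 True ++ [False]) == 3, but
-- decodeUnary (repeat True) never returns: the decoder hangs.
```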

The proof is obvious. (If it isn't obvious to you, work it out!) But it seems to strike at the heart of why we can't naively explain to computers what a "standard integer" is, what a "terminating computation" is, etc. Namely, if you try to define an integer as some observable interface (get first bit, get last bit, get CRC, etc.), then you inevitably invite some "nonstandard integers" into your system.

This idea must be already well-known and have some standard name, any pointers would be welcome!


An alternative phrasing:

In the Atlas Mountains, one can meet two species of Sphinxes: True Sphinxes and Hollow Sphinxes. Each True Sphinx has a secret number, and it will truthfully answer any question about it, provided it can do so in less than a hundred words. A Hollow Sphinx, however, just enjoys wasting people's time: it has no secret number but will pretend it does, and answers questions so as to never contradict itself and maintain uncertainty in the asker's mind.

While you can be sure you are speaking to a True Sphinx (for example, by guessing its number), you can never be sure that you are speaking to a Hollow Sphinx - it might be a True Sphinx whose number is very large. In fact, no one has been able to determine whether any Hollow Sphinxes still exist.

6cousin_it13y
I wonder if your explanation will make sense to people who didn't understand the original problem... but it's awesome!
2Emile13y
Seems that the best strategy would "merely" be to try to find an upper bound, like "Is the number smaller than BusyBeaver(BusyBeaver(BusyBeaver(BusyBeaver(3^^^3))))?", and a Hollow Sphinx would just answer "no, it's bigger", so presented like that, it doesn't look like a very interesting problem. Though there may be a series of questions such that a Hollow Sphinx can't keep up the masquerade unless it has a Halting Oracle (say, a yes-or-no question for which it can't compute which answer would apply to infinitely many numbers), but I have no idea which questions those might be. (The same applies to the original problem: there might be some exotic encoding such that an "infinitely confusing" input exists, but determining, say, its first 100 bits is uncomputable.)

BusyBeaver(BusyBeaver(BusyBeaver(BusyBeaver(3^^^3))))

This is the moral equivalent of saying BB(3^^^3)+1. Mere recursion is puny. At least jump from BB to BB_2!

determining, say, its first 100 bits is uncomputable

Every sequence of 100 bits is computable.

Here's an encoding where the "infinitely confusing" input is uncomputable: take the unary encoding from my post and XOR it with some infinite uncomputable sequence, e.g. the halting oracle sequence. Unfortunately, you can't use this test in practice because you can't compute the required bits either.
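A sketch of that construction (illustrative names; the oracle stream is taken as a parameter precisely because it cannot be computed):

```haskell
-- Sketch of the XOR'd encoding.  'oracle' is an assumed uncomputable
-- bit stream (e.g. the halting sequence), passed in as a parameter.
xorWith :: [Bool] -> [Bool] -> [Bool]
xorWith = zipWith (/=)  -- (/=) on Bool is XOR

encode :: [Bool] -> Integer -> [Bool]
encode oracle n = xorWith oracle (replicate (fromIntegral n) True ++ [False])

-- Decoding is just decodeUnary (xorWith oracle input).  The unique
-- "infinitely confusing" input for plain unary is all 1s, so here it is
-- xorWith oracle (repeat True) == map not oracle: exactly as
-- uncomputable as the oracle itself.
```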

6Paul Crowley13y
Mere recursion may be puny, but given the "fifteen-second challenge" and that BB_2 etc aren't well known enough to use there, it seems like the best bet, doesn't it?
0[anonymous]13y
If you're okay with losing to me, then it's a good bet, yeah :-)
4Emile13y
To put it more formally: we want an encoding algorithm E and a decoding algorithm D such that for any natural number n, a Turing machine can compute E(n) and D(E(n)) in finite time, but there exists a k such that "determining a string s of length k so that the set {n such that E(n) starts with s} is infinite" is an undecidable problem. Do D and E exist fulfilling those conditions? I admit my working knowledge of what is decidable/computable is somewhat limited; maybe the condition should be to find E and D such that no algorithm can return s given k, or to ask whether there exists an algorithm that, given E and D, can return a "confusing" sequence of arbitrary length.
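One way to write the main condition in symbols (my reading of it):

```latex
\forall n \in \mathbb{N}:\ E(n) \text{ and } D(E(n)) \text{ are computable in finite time, yet}
\exists k:\ \text{no algorithm, given } k, \text{ can output } s \in \{0,1\}^k \text{ such that } \#\{\, n : E(n) \text{ starts with } s \,\} = \infty .
```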
3cousin_it13y
I asked MathOverflow to solve this and got a complete solution within an hour. Just like last time. Wow, just wow.
2Emile13y
The stack exchange sites are pretty impressive, and humbling. Maybe Eliezer should outsource more of his research to them.
0cousin_it13y
I think I have a partial solution for the case where the "infinitely confusing" string is unique. Namely, if there is a unique infinite string S on which D doesn't terminate, then S is computable.

Here's an algorithm for computing S by using D. Assume that the first bit of S is 0. That means D terminates on all strings whose first bit is 1. By the argument from the original post, that implies D's running time on strings starting with 1 is uniformly bounded. (Otherwise we would be able to generate an "infinitely confusing" string for D that starts with 1 by using the proof technique from my post, which can't happen because S is unique and starts with 0 by assumption.) Therefore, by feeding D all possible finite inputs in turn while allowing it gradually increasing running times ("dovetailing"), we will eventually get a "covering set" that proves D always terminates on strings starting with 1. So we determine that the first bit of S must be 0. In the same way you can determine the second bit of S, and so on.

Let's see how it works on unary notation. We run D on all finite inputs in turn. First we notice that D("0") terminates without trying to read further, therefore the first bit of S is 1. Then we notice that D("10") terminates, therefore the second bit of S is also 1. And so on.
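A code sketch of this procedure (everything here is illustrative: runPrefix p fuel stands for "D gives a definite answer within fuel steps while reading only the prefix p", and the covering-set search is simplified to checking all fuel-bit extensions):

```haskell
-- Sketch of computing the unique confusing string S bit by bit.
-- 'runPrefix p fuel' is an assumed interface: True iff decoder D gives a
-- definite answer within 'fuel' steps while reading only the prefix p.
computeS :: ([Bool] -> Int -> Bool) -> [Bool]
computeS runPrefix = go []
  where
    go prefix = let b = nextBit prefix in b : go (prefix ++ [b])

    -- If D demonstrably halts on every continuation of prefix ++ [b]
    -- (witnessed by a uniform step bound), S must continue with (not b).
    nextBit prefix = search 1
      where
        allExts k     = mapM (const [False, True]) [1 .. k]  -- all k-bit strings
        covers b fuel = all (\s -> runPrefix (prefix ++ b : s) fuel) (allExts fuel)
        search fuel
          | covers True  fuel = False
          | covers False fuel = True
          | otherwise         = search (fuel + 1)
```

The search terminates only because S is assumed unique, which matches the caveat in the reply below.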
0Emile13y
That looks true, though the process you describe only works if S is indeed unique; otherwise it gets stuck.
2wedrifid13y
BB_2?

The BusyBeaver function for Turing machines with a halting oracle.

Aaronson

4benelliott13y
I'm sorry to ask this question, because it seems very stupid, but how exactly would one create a set-up where a Turing machine, as in a head-and-tape style thing, actually interacts with a halting oracle? I don't doubt that it's possible but I can't think of an elegant way to do it.
7cousin_it13y
Wikipedia has a detailed explanation of the setup.
0wedrifid13y
Ahh, so that's what the <Can someone recall the title of Eliezer's parable in which the genius level humans spend thousands of years deciphering the messages sent by the not-so-smart universe simulators?> were using!

<Can someone recall the title of Eliezer's parable in which the genius level humans spend thousands of years deciphering the messages sent by the not-so-smart universe simulators?>

You may be thinking of "That Alien Message." Best wishes, the Less Wrong Reference Desk.

0wedrifid13y
Thank you! I'd been looking for that one for a while.
0[anonymous]13y
Try asking the sphinx "Is your number less than 10 if and only if P=NP?"
0Emile13y
That wouldn't work for distinguishing the two kinds of sphinxes, unless you suppose that one kind knows whether P=NP and the other doesn't.
3[anonymous]13y
True, but it would still yield an interesting answer :) ETA: Also, one could give the sphinx a Turing machine and ask if it halts within N steps or fewer, where N is that sphinx's number. After a number of such questions, you'd feel pretty confident that either the sphinx is a True Sphinx or it has a halting oracle.
0Emile13y
Huh, that looks like it could work, neat :) So to bring it back to cousin_it's post, one could have a "busy beaver candidate" (a Turing machine where we don't know whether it goes on forever, eventually halts, or starts repeating), and the encoding of a natural number n would be that the first bit is whether or not that candidate halts before n steps, followed by the unary encoding of n. Encoding or decoding n would be easy (but long when n is big), but "breaking" that encoding for any busy beaver candidate would require a halting oracle.
2[anonymous]13y
Suppose I pass you the bit that says the candidate does not halt, followed by an infinite string of 1s. Then to decode this (by which I mean reject it as invalid) you would need to know whether the busy beaver candidate halts, which we've established is hard. This is a problem with the Sphinxes, too, in retrospect. A Hollow Sphinx could just keep answering "yes" if it's hard to check whether the Turing machine halts, making you do the hard work.
1Emile13y
Agreed, but a non-oracle Sphinx wouldn't be impossible to recognize any more; it would just be very hard to recognize (you would need the Sphinx to guess wrong, and to be patient enough to figure out that it did). In summary, whoever has the halting oracle wins, and if nobody does, it's a patience contest.

This phenomenon is noted in any handling of information theory that discusses "stream arithmetic coding". Those codes can be interpreted as a fraction written in binary, where each bit narrows down a region of the [0,1) numberline, with earlier-terminating bitstreams reserved for more likely source strings.

Any probability distribution on integers has a corresponding (optimal) arithmetic stream coder, and you can always make it indefinitely expect more input by feeding it the bit that corresponds to the least likely next character (which means it's not finished decompressing).

The human David MacKay discusses such encodings and their implicit unbounded length potential in Chapter 6 of his/her book which can be accessed without spending USD or violating intellectual property rights.
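For instance, a toy sketch of the interval view (the names and split rule are illustrative; a real arithmetic coder splits each interval in proportion to the modelled probabilities):

```haskell
-- Toy sketch: arithmetic-decoding state as a shrinking subinterval of [0,1).
type Interval = (Rational, Rational)

-- p0 is the modelled probability that the next bit is 0.
step :: Rational -> Interval -> Bool -> Interval
step p0 (lo, hi) bit =
  let cut = lo + p0 * (hi - lo)
  in if bit then (cut, hi) else (lo, cut)

-- An adversary always sends the bit of the less likely (smaller)
-- subinterval: the interval shrinks forever without ever narrowing onto
-- a region the code reserves for a single integer.
```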

2cousin_it13y
Thanks for pointing out the connection!
4Clippy13y
You're welcome! I'm always glad to learn when knowledge I've gained through paperclip maximization has value to humans (though ideally I'd want to extract USD when such value is identified). I should add (to extend this insight to some of the particulars of your post) that the probability distribution on the integers implicitly assumed by the unary encoding you described is one where smaller numbers are more likely (in proportion to their smallness), as with all n-ary number systems. So-called "scientific" notation instead favors "round" numbers, i.e. those padded with zeros the soonest in the least-significant-digit direction.
3cousin_it13y
Your comments are often pleasant to read, but I don't pay USD for comments that are pleasant to read, and don't know anyone who does. Sorry.
1Clippy13y
Thanks. I didn't mean charging for comments, just that if I identified major insights, I could sell consulting services or something. Or become a professor at a university teaching the math I've learned from correct reasoning and paperclip maximizing. (Though my robot would need a lot of finishing touches to pass.)

You're allowed to invent an arbitrary scheme for encoding integers as strings of bits. Whatever encoding you invent, I can give you an infinite input stream of bits that makes your decoder hang and never give a definite answer like "yes, this is an integer with such-and-such value" or "no, this isn't a valid encoding of any integer".

I find it annoying how my brain keeps saying "hah, I bet I could" even though I explained to it that it's mathematically provable that such an input always exists. It still keeps coming up with "how about this clever encoding?", blablabla... I guess that's how you get cranks.

6cousin_it13y
Yeah, that's a nice feature of the problem. I felt that too, and hoped that someone would react the same way :-)
[-][anonymous]13y80

Agree about the theorem. Not sure about the implication for explaining integers, since a human who has been assigned the task of decoding the string should have the same difficulty with that infinite input stream. His understanding of what an integer is doesn't protect him. What seems to me to protect the human from being mentally trapped by an endless input is that his tolerance for the task is finite and eventually he'll quit. If that's all it is, then what you need to teach the computer is impatience.

2cousin_it13y
It still feels very weird to me that we cannot explain our intuitive notion of "finiteness" to a machine. Maybe this shows that we don't understand finiteness very well yet.
8Perplexed13y
Of course we can explain it to a machine, just as we explain it to a person. By using second-order concepts (like "smallest set of thingies closed under zero and successor"). Of course then we need to leave some aspects of those second-order concepts unexplained and ambiguous - for both machines and humans.
0jsalvatier13y
I don't understand what you're referring to in your second sentence. Can you elaborate? What sorts of things need to be ambiguous?
5Perplexed13y
By 'ambiguous', I meant to suggest the existence of multiple non-isomorphic models. The thing that puzzled cousin_it was that the axioms of first-order Peano arithmetic can be satisfied by non-standard models of arithmetic, and that there is no way to add additional first-order axioms to exclude these unwanted models. The solution I proposed was to use a second-order axiom of induction - working with properties (i.e. sets) of numbers rather than the first-order predicates over numbers. This approach successfully excludes all the non-standard models of arithmetic, leaving only the desired standard model of cardinality aleph-nought. But it extends the domain of discourse from simply numbers to both numbers and sets of numbers. And now we are left with the ambiguity of what model of sets of numbers we want to use. It is mildly amusing that in the case of arithmetic, the unwanted non-standard models were all too big, but in the case of set theory, people seem to prefer to think of the large models as standard and dismiss Gödel's constructive set theory as an aberration.
0[anonymous]13y
Depends what you mean by 'large' I suppose. A non-well founded model of ZFC is 'larger' than the well-founded submodel it contains (in the sense that it properly contains its well-founded submodel), but it certainly isn't "standard". By Gödel's constructive set theory are you talking about set theory plus the axiom of constructibility (V=L)? V=L is hardly 'dismissed as an aberration' any more than the field axioms are an 'aberration' but just as there's more scope for a 'theory of rings' than a 'theory of fields', so adding V=L as an axiom (and making a methodological decision to refrain from exploring universes where it fails) has the effect of truncating the hierarchy of large cardinals. Everything above zero-sharp becomes inconsistent. Furthermore, the picture of L sitting inside V that emerges from the study of zero-sharp is so stark and clear that set theorists will never want to let it go. ("No one will drive us from the paradise which Jack Silver has created for us".)
0jsalvatier13y
Thank you for the clarification.
4[anonymous]13y
You've thought more about explaining things to machines than I have, but I'm not sure what you consider to count as having successfully explained a concept to a machine. For example, if you tell the machine that any string with both a beginning and an end is finite, then you've given the machine at least some sort of explanation of finiteness - one which I presume you consider to be inadequate.

But when I try to imagine what you might find to be inadequate about it, my naive guess is that you're thinking something like this: "even if you tell a machine that a finite string has an end, that won't help the machine to decide whether a given input has reached its end". In other words, the definition doesn't give the machine the ability to identify infinite strings as such, or to identify finite strings as such in a time shorter than the length of the string itself.

But my response to this is that our own human understanding of finiteness doesn't give us that ability either. Just for starters, we don't know whether the universe goes on forever in all directions or closes in on itself as a four-dimensional ball - and we might never know.
0cousin_it13y
I want it to be able to do some of the things that humans can do, e.g. arrive at the conclusion that the Paris-Harrington theorem is "true" even though it's independent of PA. Humans manage that by having a vague unformalized concept of "standard integers" over which it is true, and then inventing axioms that fit their intuition. So I'm kicking around different ways to make the "standard integers" more clear.
3Zetetic13y
Considering that Paris-Harrington is derivable from second-order arithmetic, do you think any of the ideas from reverse mathematics might come into play? This paper might be of interest if you aren't very familiar with reverse mathematics, but would like to know more. Also, it's my intuition that a good deal of mathematics has more to say about human cognition than it does about anything else. That said, this seems like the sort of problem that should be tackled from a computational neuroscience angle first and foremost. I'm likely wrong about the 'first and foremost' part, but it seems like something on numerical cognition could help out. Also, have you looked at this work? I don't care for the whole 'metaphor' camp of thought (I view it as sort of a ripoff of the work on analogies), but there seem to be a few ideas that could be distilled here.
0cousin_it13y
I'm familiar with reverse mathematics and it is indeed very relevant to what I want. Not so sure about Lakoff. If you see helpful ideas in his paper, could you try to distill them?
1Zetetic13y
I could give it a shot. Technically I think they are Rafael Nunez's ideas more than Lakoff's (though they are framed in Lakoff's metaphorical framework). The essential idea is that mathematics is built directly from certain types of embodied cognition, and that the feeling of intuitiveness for things like limits comes from the association of the concept with certain types of actions/movements. Nunez's papers seem to have the central goal of framing as much mathematics as possible into an embodied cognition framework. I'm really not sure how useful these ideas are, but I'll give it another look through. I think that at most there might be the beginnings of some useful ideas, but I get the impression that Nunez's mathematical intuition is not top notch, which makes his ideas difficult to evaluate when he tries to go further than calculus. Fortunately, his stuff on arithmetic appears to be the most developed, so if there is something there I think I should be able to find it.
1[anonymous]13y
Having now looked up the PH theorem, I don't understand what you mean. Do you disagree with any of the following? Computers can prove Paris-Harrington just as easily as humans can. They can also prove the strong Ramsey result that is the subject of PH as easily as humans can. Both humans and computers are incapable of proving the Ramsey result within Peano arithmetic. Both are capable of proving it in some stronger formal systems. Both humans and computers can "see that the Ramsey result is true" in the sense that they can verify that a certain string of symbols is a valid proof in a formal system. They are both equally capable of verifying that the Ramsey result (which concerns finite sets of integers) is true by experiment. Neither a human nor a computer can "see that the Ramsey result is true" in any stronger sense.
-2cousin_it13y
I agree with everything in your comment except the last sentence. Sorry for being cryptic, I think this still gets the point across :-)
4JoshuaZ13y
Can I attempt a translation/expansion for Sewing-Machine of why you disagree with the last sentence? It seems that there's an intuition among humans that the Ramsey result is true, in the sense that PA + PH captures our intuition of the integers more closely than PA + ~PH given the second order result. What you want is a computer to be able to make that sort of intuitive reasoning or to make it more precise. Is that more or less the point?
6[anonymous]13y
We can all agree that human intuition is grand but not magical, I hope? Here is my point of view: you are having difficulty teaching a computer to "make that sort of intuitive reasoning" because that sort of reasoning is not quite right. "That sort of reasoning" is a good heuristic for discovering true facts about the world (for instance, discovering interesting sequences of symbols that constitute a formal proof of the Paris-Harrington theorem), and to that extent it surely can be taught to a computer. But it does not itself express a true fact about the world, and because of that you are limited in your ability to make it part of the premises on which a computer operates (such as the limitation discussed in the OP). So I've been thinking lately, anyway.
1Sniffnoy13y
I'm really at a loss as to why such a thing should be intuitive. The additional condition seems to me to be highly unnatural; Ramsey's theorem is a purely graph-theoretic result, and this strengthened version involves comparing the number of vertices used to numbers that the vertices happen to correspond to, a comparison we would ordinarily consider meaningless.
0JoshuaZ13y
If I'm following cousin_it, the idea doesn't have anything really to do with the statement about Ramsey numbers. As I understand it, if in some system that is only slightly stronger than PA we can show some statement S of the form ∀x ∈ N, P(x), then we should believe that the correct models of PA are those in which S is true. Or to put it a different way, we should think PA + S will do a better job telling us about reality than PA + ~S would. I'm not sure this can be formalized beyond that. Presumably if he had a way to formalize this, cousin_it wouldn't have an issue with it.
3[anonymous]13y
Shades of Penrosian nonsense.
2endoself13y
A finite number is one that cannot be the cardinality of a set that has a proper subset with an equal cardinality.
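In symbols (a sketch of this Dedekind-style definition):

```latex
\mathrm{finite}(n) \;\iff\; \forall X\,\Bigl( |X| = n \;\rightarrow\; \neg\exists Y\,\bigl( Y \subsetneq X \,\land\, |Y| = |X| \bigr) \Bigr)
```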
5cousin_it13y
This reduces the problem of explaining "standard integers" to the problem of explaining "subsets", which is not easier. I don't think there's any good first-order explanation of what a "subset" is. For example, your definition fails to capture "finiteness" in some weird models of ZFC. More generally, I think "subsets" are a much more subtle concept than "standard integers". For example, a human can hold the position that all statements in the arithmetical hierarchy have well-defined (though unknown) truth values over the "standard integers", and at the same time think that the continuum hypothesis is "neither true nor false" because it quantifies over all subsets of the same integers. (Scott Aaronson defends something like this position here.)
4[anonymous]13y
Well, ZFC is a first-order theory...
-2MrMind13y
Yes, but Subsets(x,y) is a primitive relationship in ZFC. I don't really know what cousin_it means by an explanation, but assuming it's something like a first-order definition formula, nothing like that exists in ZFC that doesn't subsume the concept in the first place.
2Sniffnoy13y
No, it isn't. The only primitive relations in ZFC are set membership and possibly equality (depending on how you prefer it). "x is a subset of y" is defined to mean "for all z, z in x implies z in y".
4MrMind13y
Can I downvote myself? Somehow my mind switched "subset" and "membership", and by virtue of ZFC being a one-sorted theory, lo and behold, I wrote the above absurdity. Anyway, to rewrite the sentence and make it less wrong: subsets(x,y) is defined by means of a first-order formula through the membership relation, which in a one-sorted theory already embodies the idea of 'subsetting': x ∈ y → {x} ⊆ y. So subsetting can be seen as a transfinite extension of the membership relation, and in ZFC we get no more clarity or computational intuition from the first than from the second.
2[anonymous]13y
Set theory is not easier than arithmetic! Zero is a finite number, and N+1 is a finite number if and only if N is.
0endoself13y
Yes, that is a much better definition. I don't know why this one occurred to me first.
0[anonymous]13y
Sewing Machine's previous comment isn't really a definition, but it leads to the following: "n is a finite ordinal if and only if for all properties P such that P(0) and P(k) implies P(k+1), we have P(n)." In other words, the finite numbers are "the smallest" collection of objects containing 0 and closed under successorship. (If "properties" means predicates then our definition uses second-order logic. Or it may mean 'sets' in which case we're using set theory.)
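In symbols, with P ranging over properties:

```latex
\mathrm{Fin}(n) \;\equiv\; \forall P\,\Bigl[ \bigl( P(0) \,\land\, \forall k\,( P(k) \rightarrow P(k+1) ) \bigr) \;\rightarrow\; P(n) \Bigr]
```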
0Sniffnoy13y
Though with the standard definitions, that requires some form of choice.
0CronoDAS13y
There's actually a reason for that - it's impossible to define "finite" in first-order logic. There's no set of first-order statements that's true in all finite models and false in all infinite models.
0cousin_it13y
I know that, of course. Inserting obligatory reference to Lowenheim-Skolem for future generations doing Ctrl+F.
0RHollerith13y
How's that relevant? For present purposes, why should we care whether the logic we use is first-order?
3CronoDAS13y
In first-order logic, if a statement is logically valid - is true in all models - then there exists a (finite) proof of that statement. Second-order logic, however, is incomplete; there is no proof system that can prove all logically valid second-order statements. So if you can reduce something to first-order logic, that's a lot better than reducing it to second-order logic.
1RHollerith13y
Well, now you are exhibiting the IMHO regrettable tendency that the more mathematical LWers exhibit of putting way too much focus on incompleteness, uncomputability, and other negative results that have negligible chance of actually manifesting unless you are specifically looking for pathological cases or negative results. I use second-order logic all the time. Will work fine for this purpose.
0CronoDAS13y
Yeah, you're probably right...
0RHollerith13y
I still do not see how first-order logic relates in any way to cousin_it's statement in grandparent of grandparent. Just because second-order logic is incomplete does not mean we must restrict ourselves to first-order logic.
0CronoDAS13y
I think I recall reading somewhere that you only need first-order logic to define Turing machines and computer programs in general, which seems to suggest that "not expressible in first-order logic" means "uncomputable"... I could just be really confused about this though... Anyway, for some reason or other I had the impression that "not expressible in first-order logic" is a property that might have something in common with "hard to explain to a machine".
0RHollerith13y
ADDED. relates -> supports or helps to illuminate
-2[anonymous]13y
If we could explain what "finite" meant, then we could explain what "infinite" meant. More and more I'm starting to believe "infinite" doesn't mean anything.
0Zetetic13y
How about 'not finite'? Even better, 'non-halting' or maybe even...'not step-wise completable/obtainable'? Something is 'in-finite'ly far away if there is no set of steps that I can execute to obtain it within some determinate amount of time.
4[anonymous]13y
You're repeating my point. It's no harder to explain what "finite" means than it is to explain what "not finite" means. You take this to mean that the concept of "not finite" is easy, and I take it to mean that the concept of "finite" is hard. Cousin_it's recent experience lends credence to my point of view. For any N, it's easy to explain to a computer what "cardinality smaller than N" means. It's hard to explain to a computer what "exists N such that cardinality is smaller than N" means, and this is the source of cousin_it's trouble. The next logical step is to recognize as a cognitive illusion the idea that humans have a special insight into what this means that computers can't have. For some reason most people aren't willing to take that step.
5cousin_it13y
If it's just a cognitive illusion, why is it so hard to find contradictions in axiom systems about integers that were generated by humans intuitively, like PA? We seem to have a superpower for inventing apparently consistent axiom systems that make "intuitive sense" to us. The fact that we have this superpower is even machine-checkable to a certain degree (say, generate all proofs of up to a million symbols and look for contradictions), but the superpower itself resists attempts at formalization.
9[anonymous]13y
We work in axiom systems that have not been proven inconsistent because in the past we have reacted to inconsistencies (such as Russell's paradox) by abandoning the axioms. I wouldn't call this a superpower but a bias.

Russell's paradox is a cliche but it's really illustrative. The principle of noncontradiction says that, since we have arrived at a false statement (the barber shaves himself and doesn't shave himself), some of our premises must be wrong. Apparently the incorrect premise is: given a property P, there is a set of elements that satisfy P. What could it possibly mean that this premise is false? Evidently it means that the everyday meaning of words like "property," "set," and "element" is not clear enough to avoid contradictions. Why are you so sure that the everyday meaning of words like "number," "successor," and "least element" is so much clearer?

Here's another reason to be unimpressed by the fact that no contradictions in PA have been found. The length of the shortest proof of falsum in a formal system has the same property as the busy beaver numbers: it cannot be bounded above by any computable function. I.e., there is an inconsistent formal system with fewer than 100 axioms in which all theorems with proofs of length smaller than BB(100) are consistent with each other. Why couldn't PA be such a system?
0[anonymous]13y
I suspect that for most "naive" ways of constructing such a system X, the very long proof of inconsistency for X should reduce to a very short proof of inconsistency for X+Con(X), because the latter system should be enough to capture most "informal" reasoning that you used to construct X in the first place. The existence of such a proof wouldn't imply the inconsistency of X directly (there are consistent systems X such that X+Con(X) is inconsistent, e.g. X=Y+not(Con(Y)) where Y is consistent), but if PA+Con(PA) were ever shown to be inconsistent, that would be highly suggestive and would cause me to abandon my intuitions expressed above. But as far as we know now, PA+Con(PA) looks just as consistent as PA itself. Moreover, I think you can add a countable-ordinal pile of iterated Con's on top and still get a system that's weaker than ZFC. (I'm not 100% confident of that, would be nice if someone corrected me!) So I'm pretty confident that PA is consistent, conditional on the statement "PA is consistent" being meaningful. Note that you need a notion of "standard integers" to make sense of the latter statement too, because integers encode proofs.
0JoshuaZ13y
You've made good points (and I've voted up your remark) but I'd like to note a few issues.

First, we can prove the consistency of PA assuming certain other axiomatic systems. In particular, Gentzen's theorem shows that consistency of PA can be proved if one accepts a system which is incomparable in strength to PA (in the sense that there are theorems in each system which are not theorems in the other). This isn't necessarily true unless one has a 1-1 correspondence between axiomatic systems and Turing machines, rather than just having axiomatic systems modelable as Turing machines. This is a minor detail that doesn't impact your point in a substantial fashion.

Second, it isn't clear how long we expect an average output of a Turing machine to be. I don't know if anyone has done work on this, but it seems intuitively clear to me that if I pick a random Turing machine with n states and run it on the blank tape, I should expect with high probability that the Turing machine will halt well before BB(n) or will never halt. Let's make this claim more precise:

Definition: Let g(f)(n) be the (# of Turing machines with n states which, when run on the blank tape, either never halt or halt in fewer than f(n) steps) / (# of Turing machines with n states).

Question: Is there a computable function f(n) such that g(f)(n) goes to 1 as n goes to infinity?

If such a function exists, then we should naively expect that random axiomatic systems will likely either have an easy contradiction or be consistent. I'm not sure one way or another whether or not this function exists, but my naive guess is that it does.
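Restated in symbols:

```latex
g_f(n) \;=\; \frac{\#\{\, \text{$n$-state machines that, on the blank tape, halt in fewer than } f(n) \text{ steps or never halt} \,\}}{\#\{\, \text{$n$-state machines} \,\}} ,
\qquad \text{Question: is there a computable } f \text{ with } \lim_{n \to \infty} g_f(n) = 1 \,?
```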
0[anonymous]13y
Well, let's say we want to know whether Turing machine T halts. Say T has k states. Can't we use T's program to find a family of programs T' which are essentially the same as T except they consist of a common prefix followed by arbitrary suffix? Let k' be the number of states in the 'prefix'. The proportion of programs with <= n states that belong to family T' must be at least 2^(-k'-1) for all n >= k'. Now let's start running all programs and as we go along let's keep track of whether, for each value of n, a proportion greater than or equal to 1 - 2^(-k'-1) of programs of length <= n have halted in fewer than f(n) steps. Eventually this will be true for some n >= k'. At that point, we check to see whether one of the programs in T' has halted. If not, then none of the programs in T' can halt, and neither can T. Therefore, such a function f could be used to solve the halting problem.
0[anonymous]13y
But strictly stronger in consistency strength, of course. (Consistency strength turns out to be more fundamental than logical strength simpliciter.) Well, let's say we want to know whether Turing machine T halts. Say T has k states. Can't we use T's program to find a family of programs T' which are essentially the same as T except they consist of a common prefix followed by arbitrary suffix? Let k' be the number of states in the 'prefix'. The proportion of programs with less than n states that belong to family T' must tend to 2^(-k') as n goes to infinity. If we choose n such that g(f)(n) > 1 - 2^(-k') then we just need to run all members of T' for f(n) steps. If none of them have halted then T does not halt. Therefore, such a function f could be used to solve the halting problem.
0[anonymous]13y
You must be aware that such halting probabilities are usually uncomputable, right? In any case you're not going to be surprised that I wouldn't find any information about this limit of ratios compelling, any more than you would buy my argument that 15 is not prime because most numbers are not prime.
0JoshuaZ13y
Yes, but the existence of this function looks weaker than being able to compute Chaitin constants. Am I missing something here? My prior that a random integer is prime is 1/log n. If you give me a large integer, the chance that it is prime is very tiny and that is a good argument for assuming that your random integer really isn't prime. I'm not sure why you think that's not a good argument, at least in the context where I can't verify it (if, say, the number is too large).
0[anonymous]13y
But 1/log(n) takes a long time to get small, so that the argument "15 is not prime because most numbers are not prime" is not very good. It seems even more specious in settings where we have less of a handle on what's going on at all, such as with halting probabilities. Are you trying to make a probability argument like this because you scanned my argument as saying "PA is likely inconsistent because a random axiom system is likely inconsistent?" That's not what I'm trying to say at all.
0JoshuaZ13y
Sure, in the case of n=15, that's a very weak argument. And just verifying is better, but the point is the overall thrust of the type of argument is valid Bayesian evidence. No. I'm confused as to what I said that gives you that impression. If you had said that I'd actually disagree strongly (since what a reasonable distribution over "random axiomatic systems" would be is not at all obvious). My primary issue again was with the Turing machine statement, where it isn't at all obvious how frequently a random Turing machine behaves like a Busy Beaver.
2[anonymous]13y
I think you are being way too glib about the possibility of analyzing these foundational issues with probability. But let's take for granted that it makes sense -- the strength of this "Bayesian evidence" is P(ratio goes to 1 | PA is inconsistent) / P(ratio goes to 1). Now, I have no idea what the numerator and denominator actually mean in this instance, but informally speaking it seems to me that they are about the same size. We can replace those "events" by predictions that I'm more comfortable evaluating using Bayes, e.g. P(JoshuaZ will find a proof that this ratio goes to 1 in the next few days) and P(JoshuaZ will find a proof that this ratio goes to 1 in the next few days | Voevodsky will find an inconsistency in PA in the next 10 years). Those are definitely about the same size.
0JoshuaZ13y
Sure. There's an obvious problem with what probabilities mean and how we would even discuss things like Turing machines if PA is inconsistent. One could talk about some model of Turing machines in Robinson arithmetic or the like. But yes, I agree that using conventional probability in this way is fraught with difficulty.

OK, here's my encoding (thanks Misha):

N is encoded by N triplets of bits of the form 1??, followed by a zero.

Each triplet is of the form (1, a(N, k), b(N, k)) (with 1 <= k <= N), where

  • a(N, k) means "Turing machine number k halts in N steps or less, and its final state has an even number of ones on the tape", and

  • b(N, k) means "Turing machine number k halts in N steps or less, and its final state has an odd number of ones on the tape".

By "Turing machine number k" I mean we have an infinite list containing all poss... (read more)

0cousin_it13y
Nice! But why do you need the parity of the output tape? Can't you use groups of 2 bits instead of 3 and just specify whether machine number k halts within N steps? (Edit: sorry, now I see why. You could confuse the decoder by telling it that all machines halt.) Thanks a lot! Your solution feels more intuitive to me than the one offered on MathOverflow.
2Emile13y
Well you could, but then you'd need a way to protect against an input of only ones - i.e. "all Turing machines halt in N steps or less". You could make a decoder that handles cases like that by checking whether a Turing machine eventually starts looping, but that makes its logic less straightforward, and it isn't clear whether an infinite input couldn't be found without solving the halting problem. Turing machines can be classified into three categories:

  • H: those that halt

  • L: those that eventually start looping

  • F: those that go on forever

The halting problem is checking for membership in H, and tricking the "loop-detecting" 2-bit decoder would "only" require never returning '1' for a member of L. There may be an algorithm A that returns 1 for membership in H, 0 for membership in L, and either 1 or 0 for membership in F. That algorithm wouldn't be equivalent to a halting oracle, so it might not be undecidable, I don't know. So having 3 bits just makes checking simpler: you can't pretend to know that an algorithm will halt, you have to know something about the state it halts in too.

The proof is obvious. (If it isn't obvious to you, work it out!)

I deem this obnoxious.

2Skatche13y
A proof in ROT13: Gb rnpu cbffvoyr rapbqvat bs n pbhagnoyr frg ol svavgr ovg fgevatf gurer pbeerfcbaqf n ovanel gerr: ng gur ebbg abqr lbh tb yrsg (fnl) vs gur svefg ovg vf n mreb naq evtug vs vg'f n bar, gura ng gur arkg abqr lbh tb yrsg be evtug vs gur frpbaq ovg vf n mreb be bar, naq fb sbegu. Gur npghny vagrtref pbeerfcbaq gb grezvany abqrf, ernpurq ol n svavgr ahzore bs oenapuvatf. Znysbezrq ovg fgevatf, gbb, pbeerfcbaq gb grezvany abqrf, ng gur rneyvrfg cbvag gung jr pna gryy gurl'er znysbezrq. Ubjrire, fvapr gurer ner vasvavgryl znal vagrtref, naq bayl svavgryl znal cbffvoyr ovg fgevatf bs nal tvira yratgu, gurer zhfg or ng yrnfg bar vasvavgryl ybat cngu va gur gerr, naq vs jr srrq guvf frdhrapr bs mrebf naq barf vagb bhe pbzchgre, gur pbzchgre jvyy unat.
5Sniffnoy13y
You seem to have left out the part of the proof where you do the actual work! ROT13'd: Fxngpur fubjf gung gurer zhfg or neovgenevyl ybat cnguf; gur ceboyrz abj vf gb fubj gung gurer vf na vasvavgryl ybat cngu. Jr pbafgehpg guvf cngu nf sbyybjf: Fvapr gurer ner vasvavgryl znal cnguf, gurer ner rvgure vasvavgryl znal cnguf fgnegvat jvgu 0, be vasvavgryl znal cnguf fgnegvat jvgu 1. Vs gur sbezre ubyqf, fgneg jvgu 0, bgurejvfr fgneg jvgu 1. Fvapr juvpurire qvtvg jr cvpxrq, gurer ner vasvavgryl znal fgnegvat jvgu vg, jr pna ybbx ng gur arkg qvtvg naq ercrng guvf cebprff. Fvapr jr pna ercrng guvf neovgenevyl, guvf trgf hf na vasvavgryl ybat cngu. Va trareny guvf nethzrag trgf lbh jung'f xabja nf Xbavt'f Yrzzn.
0Skatche13y
Sorry, I have a bit of a skewed perspective about what's obvious. :P Once I perceived the connection to binary trees it seemed plain as day.
4jschulter13y
I find that for me, and many other people I know in the mathematics department of my university, once infinities, uncountability, and such enter the picture, the accuracy of intuition quickly starts to diminish, so it's wise to be careful and make sure the proof is complete before declaring it obvious. As a good example, note how surprising and notable Cantor's diagonal argument seemed the first time you heard it: it isn't obvious that the reals aren't countable when you don't already know that, so you might start trying to construct a counting scheme and end up with one that "obviously" works.
[-][anonymous]13y60

Here is a related well-known idea: in nonstandard analysis, a subset B of the real numbers is infinite if and only if its extension B* contains nonstandard numbers.

It might be equivalent, actually. Think of infinite bit streams as hyperintegers (e.g. the infinite sequence of 1s is the hyperinteger (1, 11, 111, 1111...), in binary if you like). Let E be the set of encodings of all integers, and all their prefixes. E is infinite, so E* contains a nonstandard element. I believe (here is where my nonstandard analysis fails me) that such a nonstandard element g... (read more)

Nice problem! I think this is a proof, but I don't know if it's your proof:

Vs K unf vasvavgryl znal inyvq fhssvkrf, ng yrnfg bar bs K0 be K1 zhfg unir vasvavgryl znal inyvq fhssvkrf. Fb yrg k_{x+1} = 0 vs k_1 ... k_x 0 unf vasvavgryl znal inyvq fhssvkrf, naq 1 bgurejvfr; gura k_1 gb k_x unf vasvavgryl znal inyvq fhssvkrf sbe nal x.

1Cyan13y
Warning: reading this comment may make rot13 less effective at obscuring text. V whfg abgvprq gung k naq x ner rkpunatrq ol ebg-guvegrra, fb gur rkcerffvbaf lbh jebgr nera'g bofpherq irel jryy.
0Tyrrell_McAllister13y
Ner gurer frgf bs inyvq fgevatf fhpu gung ab Ghevat znpuvar N pbhyq vzcyrzrag guvf "nggnpx" ba gur bevtvany znpuvar G? Gung vf, qbrf gurer rkvfg n G fhpu gung, rira vs N unf npprff gb G'f fbhepr pbqr, N pnaabg gryy pbapyhfviryl juvpu bs K0 be K1 unf vasvavgryl znal inyvq fhssvkrf? Vs fb, pna lbh npghnyyl pbafgehpg n G gung cebinoyl unf guvf cebcregl? ETA: Looks like cousin_it says that the answer to my existence question is "Yes."
0Paul Crowley13y
The "original Turing machine" decides whether a prefix is valid? Then yes definitely; bear in mind Rice's theorem, which basically says that no non-trivial property of a Turing machine is computable.
0Tyrrell_McAllister13y
If I understand you, yes. By the "original Turing machine", I meant the machine T that putatively can interpret an input string as an integer or, alternatively, reject the input string as not corresponding to any integer. So, can we actually construct such a machine that is provably immune to the "attack" in your proof, in the sense that no Turing machine could implement the attack? Or are you saying that Rice's theorem (with which I will have to acquaint myself) says that pretty much any Turing machine T that maps strings to integers will fit the bill? (Though, the one in cousin_it's OP doesn't . . .)
0Paul Crowley13y
Hmm. So T is an acceptor, A is an attacker. A(T) is an infinite sequence of symbols produced after examining T's source code, and T(A(T)) = ⊥ means T never rejects the sequence. Then what I was asserting is ¬∃A:∀T:T(A(T)) = ⊥. What you're asking now is whether ∀T:∃A:T(A(T)) = ⊥ and I can't immediately work out the answer.
0cousin_it13y
Yeah, I said it was obvious :-)
1Sniffnoy13y
Ha, I immediately thought of König's Lemma, but failed to notice that the concreteness of the situation means we don't need DC... that's the danger of not doing it from scratch, I guess...

This seems like a decent place to ask: I'm slowly trying to learn Type Theory. I haven't seen a place where (Co)Inductive datatypes (as in the Calculus of (Co)Inductive Constructions) are explained formally (though preferably for novice readers); does anyone have a suggestion?

After learning a bit more about inductive and coinductive types from Robert Harper's book in progress (which I found more lucid on the topic than Types and Programming Languages), I don't quite understand why it's important for the datatype to be coinductive. You can clearly encode the integers as inductive types, why doesn't that suffice for 'explaining to a computer what a "terminating computation" is'? Are computers 'naturally coinductive' in some sense?

0jsalvatier13y
Actually, since computers are Turing complete, perhaps it's more like 'computers are naturally recursively typed' and there is an analogous impossibility theorem for encoding some coinductive type in the same way? Perhaps I just don't understand enough yet.

To clarify, let's work through an example. Consider a unary encoding: 0 is 0, 1 is 10, 2 is 110, 3 is 1110, etc. In this case, if we feed the decoder an infinite sequence of 1's, it will remain forever undecided as to the integer's value. The result says we can find such pathological inputs for any other encoding system, not just unary.

I'm not sure this is particularly significant. If you haven't finished feeding in the input you can hardly expect the machine to give a complete output unless you have given a specific specification as to how to simplify... (read more)

[-][anonymous]9y00

Encoding is not definition. While it is true that we cannot decide the integer value of "1111..." as you describe, the reason behind it is obvious: you are trying to use an inductive algorithm on a co-inductive data structure.

Clearly, your proposed bits-to-ints encoding scheme is isomorphic to lists of unit, and it so happens that there is a one-to-one mapping between list-of-unit and inductively defined natural numbers, exactly so long as the lists in question are inductive and therefore finite. This isomorphism is made up of length :: list unit -... (read more)

Have you considered posting your observation/question to cstheory.stackexchange.com? I'd be interested in seeing the response there.

0cousin_it13y
I asked a question on MathOverflow, you can find the link in my conversation with Emile above.
[-][anonymous]13y00

Curiously, if you replace "finite" with "countable" and "infinite" with "uncountable" then it is possible to devise such an encoding scheme. (This is connected with the existence of "Aronszajn trees".)

Suppose we have a countable alphabet of symbols A, and we want to encode countable ordinals using functions alpha -> A for countable ordinals alpha, then there exists an encoding scheme such that, after only countably many symbols have been received, you can either say "yes, this represents the countab... (read more)


As for your problem, Robert Harper (posts as 'Abstract Type') frequently notes that you cannot define the natural numbers or any other inductive type in Haskell: because it is lazy, bottom (non-termination) is a member of every type, so Haskell's datatypes are really coinductive (for example, see also the reddit discussion).

6cousin_it13y
Right. In fact, this phrase from Harper's blog was one of the inspirations for my post. ETA: it seems you slightly misunderstood Harper's point. The problem with defining "data Nat = Zero | Succ Nat" is not that bottom is in Nat, but rather that laziness allows you to write the circular definition "omega = Succ omega" (note that pattern-matching on omega doesn't hang), which is similar to the unary example in my post.
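A sketch of that point (standard lazy-evaluation example; the helper names are mine):

```haskell
-- Under lazy evaluation this circular definition is legal, giving Nat
-- a "nonstandard" inhabitant.
data Nat = Zero | Succ Nat

omega :: Nat
omega = Succ omega

isZero :: Nat -> Bool
isZero Zero     = True
isZero (Succ _) = False
-- isZero omega returns False in finite time: pattern-matching on omega
-- doesn't hang.  But reading off a value does, just like the unary
-- decoder fed an endless stream of 1s:
toInt :: Nat -> Integer
toInt Zero     = 0
toInt (Succ m) = 1 + toInt m   -- toInt omega never terminates
```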
2jsalvatier13y
Thanks for correcting me :) I imagine there's a lot of Harper I don't understand correctly.

Whatever encoding you invent, I can give you an infinite input stream of bits that makes your decoder hang and never give a definite answer like "yes, this is an integer with such-and-such value" or "no, this isn't a valid encoding of any integer".

What if my encoding is: return TRUE(=is an integer) for all inputs?

That's not such a dumb thing to do. In fact, any encoding that didn't assign some integer outcome to any sequence of bits would be trivially suboptimal.

6cousin_it13y
The decoder also has to learn the integer's value. Also, each integer must have at least one bit sequence that decodes to it in finite time. Sorry, it seems I err on the side of conciseness even when I explicitly try to be wordy!
[-][anonymous]13y-10

How would an infinite stream of bits possibly encode an integer in the first place? All integers are finite, so unless you did something stupid like assign the infinite sequence 11111111... to the integer 37, an infinite number of bits should never correspond to any integer.

2benelliott13y
The idea is that you invent a system where each integer corresponds to a finite bit-string, but the lengths of these strings must be unbounded. Unary is one such code. Then you could set up a computer program which decodes these strings, so you feed it one bit at a time, each time asking it 'has an integer been uniquely specified yet?' The OP's claim is that whatever encoding method you come up with, he can come up with an infinite string that will make your program keep saying "no integer has been uniquely defined yet" forever. So, nobody is encoding integers as infinite strings, although there's no particular reason why you can't; in fact the overwhelming majority of possible encodings use infinite strings.
2cousin_it13y
The decoder doesn't have access to the fact that the stream is infinite. It can only query individual bits.
[-][anonymous]13y120

I guess my objection is that your task is too obviously impossible. If I tell you that I can color any page in any coloring book, and you give me an infinite coloring book, I won't be able to finish, even though I know how to color. Similarly, if we have a decoder that can interpret infinitely many finite bit strings, it can be stumped by an infinite one, by virtue of its being infinite.

6wmorgan13y
Agreed. There's an even easier way to confound the machine, really: when it asks you for the bit string, walk away from the computer! Note that there is no way to feed infinite input to a Turing machine. The halting problem isn't demonstrated by a program of infinite length. It's demonstrated by a finite program that resists static analysis. The analogous integer is not the one represented by an infinite bit stream, but it might be something like the Berry number.
2[anonymous]13y
The thing is, not just any infinite input stream will do. It has to at all times look properly encoded to the decoder. You're allowed to come up with any encoding scheme that you like, so what you can try to do is to come up with an encoding scheme that an infinite input stream must violate in finite time. This turns out to be impossible, and the proof is fairly easy, but it is not as obvious as your description characterizes it here.
5[anonymous]13y
Granted. But even so, the primary problem is that the input is infinite: we can deal with any finite input stream, no problem. ETA: Also, the obvious way to approach the problem is to have a terminator character (this is how the unary representation works, for instance). In that case, what this result says is "if your decoder expects a terminator, then if you feed it an infinite string without a terminator, it'll get stuck." This seems obvious.
0[anonymous]13y
It is very nearly as obvious as Misha's description characterizes... and Misha's analogy could be extended to 'color encoding schemes' in the same way.
0[anonymous]13y
Seems like you need to add some more explanation to the scenario to bring a wider audience up to speed.
-1[anonymous]13y
But the stream isn't an integer. If you tell me I must assign this number to an integer and give me no option to leave, then I will be forced to spend the rest of my life decoding it. Ignoring that we can't give an infinite sequence in the first place, I am not sure this is a meaningful problem.