Followup to: What's a "natural number"?
While thinking about how to make machines understand the concept of "integers", I accidentally derived a tiny little math result that I haven't seen before. Not sure if it'll be helpful to anyone, but here goes:
You're allowed to invent an arbitrary scheme for encoding integers as strings of bits. Whatever encoding you invent, I can give you an infinite input stream of bits that will make your decoder hang and never give a definite answer like "yes, this is an integer with such-and-such value" or "no, this isn't a valid encoding of any integer".
To clarify, let's work through an example. Consider an unary encoding: 0 is 0, 1 is 10, 2 is 110, 3 is 1110, etc. In this case, if we feed the decoder an infinite sequence of 1's, it will remain forever undecided as to the integer's value. The result says we can find such pathological inputs for any other encoding system, not just unary.
The proof is obvious. (If it isn't obvious to you, work it out!) But it seems to strike at the heart of the issue why we can't naively explain to computers what a "standard integer" is, what a "terminating computation" is, etc. Namely, if you try to define an integer as some observable interface (get first bit, get last bit, get CRC, etc.), then you inevitably invite some "nonstandard integers" into your system.
This idea must be already well-known and have some standard name, any pointers would be welcome!
OK, here's my encoding (thanks Misha):
N is encoded by N triplets of bits of the form 1??, followed by a zero.
Each triplet is of the form (1, a(N, k), b(N, k)) (with 1 <= k <= N), where
a(N, k) means "Turing machine number k halts in N steps or less, and it's final state has an even number of ones on the tape", and
b(N, k) means "Turing machine number k halts in N steps or less, and it's final state has an odd number of ones on the tape".
By "Turing machine number k" I mean we have an infinite list containing all possible Turing machines, sorted by description length.
So to encode N, the encoder "just" simulates the N first turing machines for N steps, and notes which of them stops.
The decoder reads his input triplet-by-triplet. When reading input k, it simulates the k first Turing machines for k steps, and checks whether the results are consistent with all that it has read in the input so far : if one of the simulated machines halts in a way not announced in the input (or if a triplet is impossible like "111"), it stops reading.
While an "infinitely confusing" input stream does exist, generating it requires solving the halting problem, which is generally considered pretty hard.
Nice! But why do you need the parity of the output tape? Can't you use groups of 2 bits instead of 3 and just specify whether machine number k halts within N steps? (Edit: sorry, now I see why. You could confuse the decoder by telling it that all machines halt.)
Thanks a lot! Your solution feels more intuitive to me than the one offered on MathOverflow.