Followup to: What's a "natural number"?
While thinking about how to make machines understand the concept of "integers", I accidentally derived a tiny little math result that I haven't seen before. Not sure if it'll be helpful to anyone, but here goes:
You're allowed to invent an arbitrary scheme for encoding integers as strings of bits. Whatever encoding you invent, I can give you an infinite input stream of bits that will make your decoder hang and never give a definite answer like "yes, this is an integer with such-and-such value" or "no, this isn't a valid encoding of any integer".
To clarify, let's work through an example. Consider an unary encoding: 0 is 0, 1 is 10, 2 is 110, 3 is 1110, etc. In this case, if we feed the decoder an infinite sequence of 1's, it will remain forever undecided as to the integer's value. The result says we can find such pathological inputs for any other encoding system, not just unary.
The proof is obvious. (If it isn't obvious to you, work it out!) But it seems to strike at the heart of the issue why we can't naively explain to computers what a "standard integer" is, what a "terminating computation" is, etc. Namely, if you try to define an integer as some observable interface (get first bit, get last bit, get CRC, etc.), then you inevitably invite some "nonstandard integers" into your system.
This idea must be already well-known and have some standard name, any pointers would be welcome!
Well you could, but then you'd need a way to protect against an input of only ones - i.e. "all Turing machines halt in N steps or less". You could make a decoder that handles cases like that by checking whether a Turing machine eventually starts looping, but that makes it's logic less straightforward, and it isn't clear whether an infinite input couldn't be found without solving the halting problem.
Turing machines could be classified in three categories:
The halting problem is checking for membership in H, and tricking the "loop-detecting" 2-bit decoder would "only" require never returning '1' for a member of L. There may be an algorithm A that returns 1 for membership in H, 0 for membership of L, and either 1 or 0 for membership of F. That algorithm wouldn't be equivalent to a halting oracle, so it might not be undecidable, I don't know.
So having 3 bits just makes checking simpler, since you can't pretend to know that an algorithm will halt, you have to know something about the state it halts in too.