Patrick comments on The Pascal's Wager Fallacy Fallacy - Less Wrong
You reference a popular idea, something like "The integers are countable, but the real number line is uncountable." I apologize for nitpicking, but I want to argue against philosophers (that's you, Eliezer) blindly repeating this claim, as if it were obvious or uncontroversial.
Yes, it is strictly correct according to current definitions. However, there was a time when people were striving to find the "correct" definition of the real number line. What people ended up with was not the only possibility, and Dedekind cuts (or various other things) are a pretty ugly, arbitrary construction.
The set containing EVERY number that you might, even in principle, name or pick out with a definition is countable (because the set of names, or definitions, is a subset of the set of strings, which is countable).
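The countability claim here can be made concrete: list every finite string over a fixed alphabet in shortlex order, and every possible name lands at some finite position. A minimal sketch (the two-letter toy alphabet and the function names are just for illustration):

```python
from itertools import count, product

ALPHABET = "ab"  # toy alphabet; any finite alphabet works the same way

def names():
    """Yield every finite string in shortlex order: "", "a", "b", "aa", "ab", ..."""
    yield ""
    for length in count(1):
        for chars in product(ALPHABET, repeat=length):
            yield "".join(chars)

def index_of(name):
    """Position of `name` in the enumeration - an injection into the naturals."""
    for i, s in enumerate(names()):
        if s == name:
            return i

# Every name gets a distinct natural number, so any set of names is countable.
print(index_of("ab"))  # prints 4 (after "", "a", "b", "aa")
```

The same enumeration works for the full ASCII alphabet, so any scheme of definitions written out in finitely many characters picks out at most countably many numbers.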
The Löwenheim-Skolem theorem says (loosely interpreted) that even if you CLAIM to be talking about uncountably infinite things, there's a perfectly self-consistent interpretation of your talk that refers only to finite things (e.g. your definitions and proofs themselves).
You don't get magical powers of infinity just from claiming to have them. Standard mathematical talk is REALLY WEIRD from a computer science perspective.
That's not the Löwenheim-Skolem theorem. You've confused finite with countable (i.e., finite or countably infinite). Here's a simple example of a theory that can't be satisfied by a finite model: take a language with a constant 0 and a unary function symbol s, with two axioms. Axiom 1: for all x, s(x) != 0 (nothing maps to 0). Axiom 2: for all x and y, if s(x) = s(y) then x = y (s is injective).
Any model that satisfies this contains an element for 0, so it has at least 1 element. s(0) != 0 by axiom 1, so the model must have at least 2 elements. s(s(0)) != 0 by axiom 1, and s(s(0)) != s(0), since otherwise axiom 2 would give s(0) = 0; so the model has at least 3 elements.
In general, suppose the model has n distinct elements 0, s(0), ..., up to s applied n-1 times to 0. Then s applied n times to 0 must be a new element: it isn't 0 (axiom 1), and if it equaled s applied k times to 0 for some 0 < k < n, then repeatedly cancelling s by axiom 2 would give s applied n-k times to 0 equal to 0, contradicting axiom 1.
So any model that satisfies these axioms must be infinite. (Incidentally, you can get theories that specify the natural numbers with more precision: see http://en.wikipedia.org/wiki/Robinson_Arithmetic).
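The "no finite model" claim can even be checked mechanically for small sizes. A brute-force sketch (assuming the two axioms are: 0 is not in the range of s, and s is injective; the bounds and names are my own choices):

```python
from itertools import product

def satisfies_axioms(n, s):
    """s is a function {0..n-1} -> {0..n-1}, given as a tuple of its values;
    element 0 plays the role of the constant.
    Axiom 1: s never hits 0.  Axiom 2: s is injective."""
    never_zero = all(s[x] != 0 for x in range(n))
    injective = len(set(s)) == n
    return never_zero and injective

# Exhaustively check every s on domains of size 1..5: no finite model exists,
# because an injection from a finite set to itself is onto, so 0 would be hit.
for n in range(1, 6):
    models = [s for s in product(range(n), repeat=n) if satisfies_axioms(n, s)]
    print(n, len(models))  # always 0
```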
Mathematicians routinely use "infinite" to mean "infinite in magnitude". For example, the concept "The natural numbers" is infinite in magnitude, but I have picked it out using only 19 ASCII characters. From a computer science perspective, it is a finite concept - finite in information content, the number of bits necessary to point it out.
Each of the objects in the set of the Peano integers is finite. The set of Peano integers, considered as a whole, is infinite in magnitude, but finite in information content.
Mathematicians' routine speech sometimes sounds as if a generic real number is a small thing, something that you could pick up and move around. In fact, a generic real number (since it's an element of an uncountable set) is infinite in information content - it's huge, and impossible to encounter, much less pick up.
Löwenheim-Skolem allows you to replace proofs that, on a straightforward reading, claim to be manipulating generic elements of uncountable sets (picking up and moving around real numbers, for example) with proofs that claim to be manipulating elements of countable sets - that is, objects that are finite in information content.
In that transformation, you will probably introduce "objects" which are something like "the double-struck N", and those objects certainly still satisfy internal predicates like "InfiniteInMagnitude(the double-struck N)".
However, you're never forced to believe that mathematicians are routinely doing impossible things - you can always take a formalist stance, pointing out that mathematicians are actually manipulating symbols, which are small, finite-in-information-content things.
Löwenheim-Skolem only applies to first-order theories. While there are models of the theory of real closed fields that are countable, referring to those models as "the real numbers" is somewhat misleading, because there isn't only one of them (up to model-theoretic isomorphism).
Also, if you're going to measure information content, you really need to fix a formal language first, or else "the number of bits needed to express X" is ill-defined.
Basically, learn model theory before trying to wield it.
I don't know model theory, but isn't the crucial detail here whether the number of bits needed to express X is finite or infinite? If so, then it seems we can handwave the specific formal language we use to describe X, the same way we can handwave the encoding of Turing machines when talking about Kolmogorov complexity: actually getting a concrete integer K(S) for the Kolmogorov complexity of a string S requires fixing an encoding of Turing machines, but in practice we never care what the number K(S) is.
Let's say I have a specific model of the real numbers in mind, and let's pretend "number of bits needed to describe X" means "log2 the length of the shortest theory that proves the existence of X."
Fix a language L1 whose constants are the rational numbers and which otherwise is the language of linear orders. Then it takes countably infinitely many propositions to prove the existence of any given irrational number (i.e., exists x[1] such that x[1] < u[1], ..., exists y[1] such that y[1] > v[1], ..., x[1] = y[1], ..., x[1] = x[2], ..., where the sequences u[n] and v[n] are strict upper and lower bounding sequences on the real number in question).
Now fix a language L2 whose constants are the real numbers. It now requires one proposition to prove the existence of any given irrational number (i.e., exists x such that x = c).
The difference between this ill-defined measure of information and Kolmogorov complexity is that Turing machines are inherently countable, and the languages and models of model theory need not be.
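The bounding sequences u[n] and v[n] in the L1 example can be produced concretely. A sketch for sqrt(2), generating rational bounds by bisection (the function name and the starting interval are my own choices):

```python
from fractions import Fraction

def bounds_for_sqrt2(n):
    """Return n nested rational intervals (lo, hi) with lo < sqrt(2) < hi,
    one pair per step - matching one proposition per bound in the L1 list."""
    lo, hi = Fraction(1), Fraction(2)
    pairs = []
    for _ in range(n):
        mid = (lo + hi) / 2
        if mid * mid < 2:  # mid*mid == 2 is impossible for a rational mid
            lo = mid
        else:
            hi = mid
        pairs.append((lo, hi))
    return pairs

lo, hi = bounds_for_sqrt2(20)[-1]
print(float(lo), float(hi))  # a rational bracket of width 2**-20 around sqrt(2)
```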
(Disclaimer: paper-machine2011 knew far more about mathematical logic than paper-machine2016 does.)
Whether a theory proves the existence of X may be an undecidable question.
How many bits it takes to describe X is an undecidable question when defined in other ways, too.
The definition "length of the shortest program which minimizes (program length + runtime)" isn't undecidable, although you could argue that that's not what we normally mean by number of bits.
Adding program length and runtime feels to me like a type error.
So, given this, what exactly is your complaint? You started off criticizing Eliezer (and whomever else) for saying "The integers are countable, but the real number line is uncountable" - I suppose on the grounds that everything in the physical universe is countable, or something. (You weren't exactly clear.) But now you point out (correctly) that there is a perfectly good interpretation of this statement which in no way depends on there being an uncountable number of physical things anywhere, or otherwise violates your (not-exactly-well-defined) philosophy. So haven't you just defeated yourself?
I have a knee-jerk response, railing against uncountable sets in general and the real numbers in particular; it's not pretty and I know how to control it better now.
I'm fairly confident that for your purposes you could live with the computable numbers (that is: those numbers whose decimal expansion can be computed by <fix a Turing-equivalent computational foundation here>), and as long as you didn't need anything stronger than integration amenable to quadrature, you'd be just fine.
There are people who take this route, but I can't think of any off the top of my head. Knuth once stated that he'd like to write a calculus book roughly following this path, but, well, he's got other things on his mind.
EDIT: I should point out also that the computable numbers are countable (by the usual Gödel encoding of whatever machine is rattling off the digits for you), and that for all practical intents and purposes they're probably equivalent to whatever calculus-related mischief is in play at the moment.
There are some weirdnesses down that route - for example, it turns out that you can't distinguish zero from nonzero, so the step function is actually uncomputable.
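The zero-testing problem can be sketched directly. Below, a computable real is represented (one standard representation; the names are mine) as a function taking n to a rational within 2**-n of the number. Comparison against zero succeeds once the approximations separate, but for an exact zero it would search forever, so the sketch gives up after a cap:

```python
from fractions import Fraction

def const(q):
    """The computable real with constant approximations."""
    return lambda n: Fraction(q)

def sub(x, y):
    """x - y: ask each argument for one extra bit so the error stays within 2**-n."""
    return lambda n: x(n + 1) - y(n + 1)

def sign_of(x, max_bits=64):
    """Semi-decide the sign of x: returns True (positive), False (negative),
    or None if x is still indistinguishable from 0 after max_bits refinements."""
    for n in range(max_bits):
        a = x(n)
        if a > Fraction(1, 2**n):   # x > a - 2**-n > 0
            return True
        if a < -Fraction(1, 2**n):  # x < a + 2**-n < 0
            return False
    return None  # x may be exactly 0, or merely very small - we can't tell

print(sign_of(sub(const(Fraction(1, 3)), const(Fraction(1, 4)))))  # True
print(sign_of(const(0)))  # None: equality with zero is never confirmed
```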
My contrarian claim is that everyone could live with the nameable numbers - that is, the numbers that can be pointed out using a finite number of books to describe them. People who really strongly care about the uncountability of the reals have a hard time coming up with a concrete example of what they'd miss.
I don't understand. Those also seem to fall prey to
Also,
Lebesgue measure theory, Gal(C/R) = Z/2Z, and some pathological examples in the history of differential geometry without which the current definition of a manifold would have been much more difficult to ascertain.
Off the top of my head. There are certainly other things I would miss.
Those are theories, which are not generally lost if you switch the underlying definitions aptly - and they are sometimes improved (if the new definitions are better, or if the switch demonstrates an abstraction that was not previously known).
People can't pick out specific examples of numbers that are lost by switching to using nameable numbers, they can only gesture at broad classes of numbers, like "0.10147829..., choosing subsequent digits according to no specific rule". If you can describe a specific example (using Lebesgue measure theory if you like), then that description is a name for that number.
I really wish I had the time to explicitly write out the reasons why I believe these examples are compelling reasons to use the usual model of the real numbers. I tried, but I've already spent too long and I doubt they would convince you anyway.
So? Omega could obliterate 99% of the particles in the known universe, and I wouldn't be able to name a particular one. If it turns out in the future that these nameable numbers have nice theoretic properties, sure. The effort to rebuild the usual theory doesn't seem to be worth the benefit of getting rid of uncountability. (Or more precisely, one source of uncountability.)
I think I've spent enough time procrastinating on this topic. I don't see it going anywhere productive.
I'm not sure what a "nameable number" is. Whatever countable naming scheme you invent, I can "name" a number that's outside it by the usual diagonal trick: it differs from your first nameable number in the first digit, and so on. (Note this doesn't require choice, the procedure may be deterministic.) Switching from reals to nameable numbers seems to require adding more complexity than I'm comfortable with. Also, I enjoy having a notion of Martin-Löf random sequences and random reals, which doesn't play nice with nameability.
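The diagonal trick described above is itself a short deterministic program. A sketch (the 5/6 digit rule is one arbitrary deterministic choice):

```python
def diagonal(enumeration):
    """Given enumeration(i, j) = the j-th decimal digit of the i-th listed
    number, return a digit function for a number missing from the list:
    it differs from the i-th listed number at digit i."""
    def digit(i):
        d = enumeration(i, i)
        return 5 if d != 5 else 6  # deterministic: no choice principle needed
    return digit

# Toy "naming scheme": the i-th number repeats the digit i % 10 forever.
listed = lambda i, j: i % 10
anti = diagonal(listed)
print([anti(i) for i in range(12)])  # disagrees with number i at position i
```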
I'm confused; this is true for any real closed field. What are you getting at with this?
A mistake. I was thinking of C as the so-called "generic complex numbers." You're right that if you replace C with the algebraic closure of whatever countable model's been dreamed up, then C = R[i] and that's it.
Admittedly I'm only conjecturing that Gal(C/K) will be different for some K countable, but I think there's good evidence in favor of it. After all, if K is the algebraic closure of Q, then Gal(C/K) is gigantic. It doesn't seem likely that one could "fix" the other "degrees of freedom" with only countably many irrationals.
Of course, whether a number is definable or not depends on the surrounding theory. Stick to first-order theory of the reals and only algebraic numbers will be definable! Definable in ZF? Or what?
EDIT Apr 30: Oops! Obviously definability depends only on the ambient language, not the actual ambient theory... no difference here between ZF and ZFC...
Sorry for what might be a silly question, but what do you mean by "generic real number"? In the sense of "one number picked at random from the set", a "generic natural number" would also be huge and impossible to encounter - almost all natural numbers would need more bits than there are Planck volumes in the universe to represent - and it doesn't seem that you're trying to say that.
If you start selecting things at random, then you need a probability distribution. Many routinely used probability distributions over the natural numbers give you a nontrivial chance of being able to fit the number on your computer.
There are, of course, corresponding probability distributions over the reals (take a probability distribution over the natural numbers and give zero probability to anything else). However, the routinely used probability distributions on the reals give zero probability to the output being a natural number, a rational number, describable with a finite algebraic equation, or in fact, being able to fit the number on your computer.
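The contrast can be made concrete (the particular distributions here are my own illustrative choices): a geometric distribution on the naturals puts positive mass on every individual natural, while any continuous distribution on the reals puts mass exactly 0 on every individual point:

```python
def geometric_pmf(k, p=0.5):
    """P(X = k) for a geometric distribution on {0, 1, 2, ...}: positive for every k."""
    return (1 - p) ** k * p

def continuous_point_mass(x):
    """P(X = x) under any continuous distribution on the reals: the integral
    of a density over a single point is exactly 0, whatever the density is."""
    return 0.0

print(geometric_pmf(2))            # 0.125: real, usable mass on the number 2
print(continuous_point_mass(2.0))  # 0.0: no mass on any individual real
```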
One of the problems with real numbers is that someone trying to do a Bayesian analysis of a sensor that reads 2.000..., or 3.14159..., using one of these real-number distributions as their prior, cannot conclude that the quantity measured probably is 2 or pi, no matter how many digits of precision the sensor goes out to.
I get that the sensor thing was only an example, but still: it doesn't seem like a real objection. I mean, you're not going to have (or need) a sensor with infinitely many decimal places of precision. (Or perhaps I'm not understanding you?)
In terms of “selecting things at random”, for any practical use I can think of you’ll be selecting things like intervals, not real numbers. I don’t quite see how that’s relevant to the formalism you use to reason about how and what you’re calculating.
I think there’s some big understanding gap here. Could you explain (or just point to some standard text), how does one reason about trivial things like areas of circles and volumes of spheres without using reals?
Perhaps you've confused "pi has a decimal expansion that goes on forever without apparent pattern" with "a generic real number has a decimal expansion that goes on forever without pattern"? Pi does have a finite representation, "Pi". We use it all the time, and it's just as precise as "2" or "0.5".
Specifically, you could start with the rationals, and complete it by including all solutions to differential equations. Pi would be included, and many other numbers, but you'd still only have a countable set - because every number would have one or more shortest definitions - finite information content.
If you had a probability distribution over such a set, it would naturally favor hypotheses with short definitions. If it started out including pi as a possibility, and you gathered sufficient sensor data consistent with pi (a finite amount), the probability distribution would give pi as the best hypothesis. This is reasonable behavior. You have to do non-obvious mucking around with your prior to get that sort of reasonableness with standard real-number probabilities.
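Here's a toy version of that update (the hypothesis set, noise model, and sigma are my own assumptions, not anything fixed by the thread): a uniform prior over the three named constants 2, e, and pi, with Gaussian sensor noise. A few finite-precision readings near pi drive the posterior onto the hypothesis "exactly pi":

```python
import math

HYPOTHESES = {"2": 2.0, "e": math.e, "pi": math.pi}  # nameable candidates
PRIOR = {name: 1.0 / len(HYPOTHESES) for name in HYPOTHESES}

def likelihood(reading, value, sigma=0.01):
    """Unnormalized Gaussian likelihood of one sensor reading."""
    return math.exp(-((reading - value) ** 2) / (2 * sigma**2))

def posterior(readings):
    """Bayes update: multiply the prior by the likelihood of each reading."""
    post = dict(PRIOR)
    for r in readings:
        for name, value in HYPOTHESES.items():
            post[name] *= likelihood(r, value)
    total = sum(post.values())
    return {name: p / total for name, p in post.items()}

# Three finite-precision readings consistent with pi pick out "pi" decisively.
print(posterior([3.1416, 3.14159, 3.142]))
```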
As others have pointed out, any specific countable system of numbers (such as the "solutions to differential equations" that I mentioned) is susceptible to diagonalization - but I see no reason to "complete" the process, as if being susceptible to diagonalization were a terrible flaw above all others. All the entities that you're actually manipulating (finite precision data and finite, stateable hypotheses like "exactly 2" and "exactly pi") are finite-information-content, and "completing" the reals against diagonalization makes essentially all the reals infinite-information-content - a cure in my mind far worse than the disease.
(Note: I’m not arguing in this particular post, just asking clarifying questions, as you seem to have the issues much clearer in your mind than I do.)
1) It seems one can start with the naturals, extend them to the integers, then to the rationals, then to whatever set results from including solutions to differential equations (does that have a standard name?). I imagine there are countably infinitely many constructions like that, am I right? They seem to "divide" the numbers "finer" (I'd welcome a hint to a more formal description of this), though they aren't necessarily totally ordered in terms of how "fine" they are, and the limit of this process after an infinity of extensions seems to be the reals. (Am I missing something important until here? In particular, we can reach the reals much faster; is there some important property the countable extensions have in general, other than their result set being countable and their individual structure?)
2) Do you have other objections to real numbers that do not involve probabilities, probability distributions, and similar information theory concepts?
3) I don’t quite grok your π example. It seems to me that a finite amount of sensor data will always only be able to tell you it’s consistent with all values in the interval π±ε; if you’re using a sufficiently “dense” set, even just the rationals, you’ll have an infinity of values in that interval, while using the reals you’ll have an uncountable one. In the countable case you’ll have to have probabilities for the countable infinity of consistent values, which could result in π being the most probable one, and in the uncountable one you’ll need a probability distribution function, which could as well have π as the most probable. (In particular, I can’t see a reason why you couldn’t find a the probability distribution function that has exactly the same value as your probability function when applied to the values in your π-containing countably-infinite set and is “well-behaved” in some sense on the reals between them; but I’m likely to miss something here.)
I sort-of get that picking π in a countable set can be a finite-information operation and an infinite-information one in an uncountable set (though I’m not quite clear if or why that works on sets at least as “finely divided” as the rationals). But that seems to be a trick of picking the right countable set to contain the value you’re looking for:
If you started estimating π (let's say, the ratio of circumference to diameter in a Euclidean universe) with, say, just the rationals, you may or may not get a "most likely" hypothesis, but it wouldn't be π; you'll only estimate that one if you happened to start with a set that contained it. And if you use a set that contains π, there would always be some kind of other number that fits in a "finer"-but-countable set you aren't using that you might need to estimate (assuming there are a lot of such sets, as I speculate in point 1 above).
Of course, using the reals doesn’t save you from that: you still need an infinite amount of information to find an arbitrary real. But by using probability distributions—even if you construct them by picking a probability function on a countable set and then extending it to the reals somehow—it forces you to think about the parts outside that countable set (i.e., other even “finer” countable sets). In a way, this feels like reminding you of things you didn’t think of.
OK, what am I missing?
1) Yes, there are countably many constructions of various kinds of numbers. The construction can presumably be written down, and strings are finite-information-content entities. Yes, they're normally understood to form a set-theoretic lattice - the integers are a subset of the Gaussian integers, and the integers are a subset of the rationals, and both the Gaussian integers and the rationals are subsets of the complex numbers.
However, the reals are not in any well-defined sense "the" limit of that lattice - you could create a contrived argument that they are, but you could also create an argument that the natural limit is something else, either stopping sooner, or continuing further to include infinities and infinitesimals or (salient to mathematicians) the complex plane.
Defenders of the reals as a natural concept will use the phrase "the complete ordered field", but when you examine the definition of "complete" they are referencing, it uses a significant amount of set theory (and an ugly Dedekind cuts construction) to include everything that it wants to include, and exclude many other things that might seem to be included.
2) Yes. I think the reals are a history-laden concept; they were built in fear of set-theoretic and calculus paradoxes, and without awareness of the computational approach - information theory and Gödel's incompleteness. They are useful primarily in the way that C++ is useful to C++ programmers - as a thicket or swamp of complexity that defends the residents from attack. Any mathematician doing useful work in a statistical, calculus-related, or topological field who casually uses the reals will need someone else, a numerical analyst or computer scientist, to laboriously go through their work and take out the reals, replacing them with a computable (and countable) alternative notion - different notions for different results. Often, this effort is neglected, and people use IEEE floats where the mathematician said "real", and get ridiculous results - or worse, dangerously reasonable results.
3) You're right that a finite amount of sensor data will only say it is consistent with this interval. As you point out, if there is an uncountable set of values within that interval, then it's entirely possible for there to be no single value that is a maximum of the probability distribution function. (That's an excellent example of the ridiculous circumlocutions that come from using uncountable sets, when you actually want the system to come up with one or a few best hypotheses, each of which is stateable.)
Pi is always a finite-information entity. Everything nameable is. It doesn't become an infinitely large in information content just because you consider it as an element of the reals.
Yes - if you use a probability distribution over the rationals as your prior, and the actual value is irrational, then you can get bad results. I think this is a serious problem, and we should think hard about what bayesian updating with misspecified models looks like (I know Cosma Shalizi has done some work on this), so that we have some idea what failure looks like. We should also think carefully about what we would consider to be a reasonable hypothesis, one that we might eventually come to rest on.
However, it's a false fork to argue "We wouldn't use the rationals therefore we should use the reals". As I've been trying to say, the reals are a particular, large, complicated, and deeply historical construction, and we should not expect to encounter them "out in the world".
Andrej Bauer has implemented actual real number arithmetic (not IEEE nonsense, or "computable reals", which are interesting, but not reals). Peano integers, in his (OCaml-based) language, RZ, would probably be five or ten lines. (Commutative groups are 13 lines.) In contrast, building from commutative groups up to defining reals as sequences of nested intervals takes five pages; see the appendix: http://math.andrej.com/wp-content/uploads/2007/04/rzreals.pdf
Regarding "reminding you of things you didn't think of", I think Cosma Shalizi and Andrew Gelman have convincingly argued that Bayesian philosophy/methodology is flawed - we don't just pick a prior, collect data, do an update, and believe the results. If we were magical uncomputable beasties (Solomonoff induction), possibly that is what we would do. In the real world, there are other steps, including examining the data, including the residual errors, to see if it suggests hypotheses that weren't included in the original prior. http://www.stat.columbia.edu/~gelman/research/unpublished/philosophy.pdf
Hi John! Thank you very much for taking the time to answer at such length. The links you included were also very interesting, thanks.
I think I got a bit of insight into the original issue (way up in the comments, when I interjected in your chat with Patrick).
With respect to the points closer in this thread, it’s become more like teaching than an actual discussion. I’m much too little educated in the subject, so I could contribute mostly with questions (many inevitably naïve) rather than insights. I’ll stop here then; though I am interested, I’m not interested enough right now to educate myself, so I won’t impose on your time any longer.
(That is, not unless you want to. I can continue if for some reason you’d take pleasure in educating me further.)
Thank you again for sharing your thoughts :-)