It seems to me like this discussion has gotten too far into either/or "winning". The prevalence of encodings in mathematics and logic (e.g., encoding the concept of pairing in set theory by defining (a, b) to be the set {{a}, {a, b}}, the double-negation encoding of classical logic inside intuitionistic logic, et cetera) means that the things we point to as "foundations", such as the ZF axioms for set theory, are not actually foundational in the sense of being necessary and supporting. ZF is one possible system which is sufficiently strong to encode the things that we want to do. There are many other systems which are also sufficiently strong, and getting multiple perspectives on a topic by starting from different alternative "foundations" is probably a good thing. You do not have to choose just one!
Another way to look at it: the things that we call foundations of mathematics are like the Planck length - yes, according to our current best theory, that's the smallest, most basic element. However, that doesn't mean it is stable; rather, it's cutting-edge (admittedly, the Planck length is more at the cutting edge of physics than foundations of mathematics are at the cutting edge of mathematics). The stable things are more like cubits and inches - central, human-scale "theorems" such that, if a new alternative foundation of mathematics contradicted them, we would throw away the foundation rather than the theorem. Some candidate "cubit" or "inch" theorems (really, arguments) might be the infinitude of primes, or the irrationality of the square root of 2.
Friedman and Simpson have a program of "Reverse Mathematics", studying in what way facts that we normally think of as theorems, like the Heine-Borel theorem, pin down things that we normally think of as logic axioms, such as induction schemas. I really think that Simpson's paper "Partial Realizations of Hilbert's Program" is insightful and helpful to this discussion: http://www.math.psu.edu/simpson/papers/hilbert.ps
Lakatos's "Proofs and Refutations" is also helpful here, explaining in what sense a flawed and/or informal argument can still be valuable. Note that Lakatos calls arguments proofs even when they're flawed or informal, implicitly suggesting that everyone calls their arguments proofs until the flaws are pointed out. I understand he retreated from that philosophical position later in life, but the pre-retreat position might still be tenable.
I agree that we should weigh possible foundations against desired results and respect multiple possibilities, as you say. However, we need a formalization of this. It might be that 1st order vs 2nd order is not the important distinction; I would suggest, however, that the puzzle I presented in the post is important. The proof-theoretic difference between 1st and 2nd order might not be a big deal. (A learning system which prefers compact first-order theories can learn the desired many-sorted logic.) The structure of reasonable probabilistic beliefs over these two domains, though, is another matter entirely! (A learner which prefers compact first-order theories cannot mimic the desired behavior which I described.)
You won't automatically get the desired behavior by constructing some sort of intuitive learner based on informal principles. So, we need to discuss formalism.
Abram,
After coming to believe that ZF's smallest infinite set closed under succession describes time, why don't you expect that you might see a Turing machine prove ZF inconsistent? In other words, as you invent stronger mathematical theories, why do you automatically expect there to be fewer and fewer possible times? Why not just generalize PA to describe time and stay there?
If I understand you correctly, then I'm not sure why you are asking the question. Supposing I generalize PA to describe time, I would like to still keep adding logical information to PA based on my observations. For example, I tentatively believe the Goldbach conjecture, and would like my probability to increase as more instances are checked. In fact, I would like my credence to increase as "something like" the fraction of Turing machines that never halt, out of all Turing machines which haven't halted by time X, where X is the largest number I've checked (assuming I'm checking starting from 2 and skipping nothing).
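To make that "something like" slightly more concrete - this is only a rough rendering of the idea, and of course not computable, since the numerator asks about all future behavior:

$$P\big(\forall n.\ \varphi(n)\big) \;\approx\; \frac{\#\{e \le N : M_e \text{ never halts}\}}{\#\{e \le N : M_e \text{ has not halted within } X \text{ steps}\}}$$

where $\varphi$ is the computable property being checked (here, the Goldbach property), $X$ is the largest instance checked so far, and $M_1, \dots, M_N$ is some enumeration of Turing machines. (One would really want a length-weighted measure over all machines rather than a cutoff $N$; the cutoff just keeps the sketch simple.)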
Furthermore, I would like this process to allow me to probabilistically accept a wide range of mathematical foundations, rather than only accepting more statements in the language of PA.
I'm asking about the process that causes other mathematical beliefs to generalize to your beliefs about physical time in such fashion that physical time always seems to have the smallest model allowed by any of your mathematical beliefs. When I learn that the ordinal epsilon-0 corresponds to an ordering on unordered finitely branching trees, I don't conclude that a basket of apples is made out of little tiny unordered trees. What do the physical apples have to do with ordinals, after all?
Why, as you come to believe that Zermelo-Fraenkel set theory has a model, do you come to believe that physical time will never show you a moment when a machine checking for ZF-inconsistency proofs halts? Why shouldn't physical time just be a random model of PA instead, allowing it to have a time where ZF is proven inconsistent? Why do you transfer beliefs from one domain to the other - or what law makes them the same domain?
This isn't meant to be an unanswerable question; I suspect it's answerable. I'm asking whether you have any particular ideas about the mechanics.
Could you please clarify your question here?
Why, as you come to believe that Zermelo-Fraenkel set theory has a model, do you come to believe that physical time will never show you a moment when a machine checking for ZF-inconsistency proofs halts?
I try to interpret it (granted, I interpret it within my own worldview, which is different), and I cannot see the question here.
I am not 100% sure whether even PA has a model, but I find it likely that even ZFC does. But if I say that ZFC has a model, I mean a model in which the parts of formulas are indexed by the natural numbers derived from my notion of successive moments of time.
This isn't meant to be an unanswerable question; I suspect it's answerable. I'm asking whether you have any particular ideas about the mechanics.
Oh, OK. (This was in fact my main confusion about the question.)
One thing is clear to me: when I've made a closed-world assumption about my domain (which is what the least-set description of the natural numbers is), I should somehow set up the probabilities so that a hypothesis which implies the existence of a domain element with a specific property does not get an a priori chunk of probability mass just for being a hypothesis of a specific length. Ideally, we want to set this up so that it follows "something like" the halting distribution which I mentioned in my previous reply (but of course that's not computable).
One idea is that such a hypothesis only gets probability mass from specific domain elements with that property. So, for example, if we want to measure the belief that the Goldbach conjecture is false, the probability mass we assign to that must come from the mass we assign to conjectures like false_at(1000), false_at(1002), et cetera. As I eliminate possibilities like this, the probability for exists x. false_at(x) falls.
This can be formalized by saying that for closed-world domains, as we form a theory, we must have an example before we introduce an existential statement. (I'm assuming that we want to model the probability of a theory as some process which generates theories at random, as in my recent AGI paper.) We might even want to say that we need a specific example before we will introduce any quantifiers. This doesn't strictly rule out any theories, since we can just write down examples that we don't yet know to be false before introducing the quantified statements, but it modifies the probabilities associated with them.
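Here is a minimal sketch of the kind of generating process I mean. All names and numbers are illustrative, and I'm ignoring description-length weighting to keep it short; the point is only that an existential sentence can enter a randomly generated theory only after a specific witness instance has been written down, so the existential inherits its probability mass from its instances.

```python
import random

# Illustrative sketch only (hypothetical sentence names): a random
# theory-generation process in which an existential sentence may be added
# only after a specific witness instance for it is already in the theory.

GROUND_INSTANCES = ["false_at(1000)", "false_at(1002)", "false_at(1004)"]
EXISTENTIALS = {"exists x. false_at(x)": GROUND_INSTANCES}

def generate_theory(max_sentences=5, p_try_existential=0.3):
    """Generate a random theory one sentence at a time.

    An existential is eligible only once one of its witness instances is
    already present; otherwise we fall back to adding a ground instance.
    Existential claims therefore inherit probability mass from specific
    instances instead of getting an a priori chunk just for being short.
    """
    theory = []
    for _ in range(max_sentences):
        if random.random() < p_try_existential:
            eligible = [s for s, witnesses in EXISTENTIALS.items()
                        if s not in theory and any(w in theory for w in witnesses)]
            if eligible:
                theory.append(random.choice(eligible))
                continue
        remaining = [g for g in GROUND_INSTANCES if g not in theory]
        if remaining:
            theory.append(random.choice(remaining))
    return theory

def prior_probability(sentence, trials=100_000):
    """Monte Carlo estimate of the prior that a random theory contains `sentence`."""
    return sum(sentence in generate_theory() for _ in range(trials)) / trials

if __name__ == "__main__":
    print(prior_probability("exists x. false_at(x)"))
```

Conditioning on observations would then amount to throwing out the generated theories that contain refuted instances; as instances like false_at(1000) get eliminated, the surviving mass on exists x. false_at(x) falls, as in my example above.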
However, I'm not really confident in that particular trick (I'm not sure it behaves correctly for nested quantifiers, et cetera). I would be more confident if there were an obvious way to generalize this to work nicely with both open-world and closed-world cases (and mixed cases). But I know that it is not possible to generalize "correctly" to full 2nd-order logic (for any definition of "correctly" that I've thought of so far).
Response to Second-Order Logic: The Controversy. This started out as a comment, but got fun.
Several days later, the two meet again.
"You know, I still feel uneasy about the discussion we had. I didn't walk away with the feeling that you really appreciated the point that any proof system you propose for 2nd-order logic can equally well be interpreted as a proof system for a first-order many-sorted logic. If an alien took a look at a computer program implementing 2nd-order logic, what reason would they have to conclude that the computer program was an attempt to implement true quantification over properties, with everything our mathematicians take that to mean?"
But I told you: I want a system which will be able to consider the hypothesis that time is connected, or perhaps the hypothesis that the universe is finite, and so on.
"But externally, it looks like you just have some particular first-order theory in which that term has a meaning. We can talk about the kind of 'finite' that first-order set theory allows us to define. We could suppose your brain implements some particular theory, which you can't verbally articulate, but which is what you mean when you say such things."
Turing machines either halt or don't halt! Like I said before, I don't want a system that treats "not provable one way or the other" as a physical possibility. I don't even know what that means.
"Of course you would think that. Any system implementing classical logic will think that. First-order set theory doesn't prove it one way or the other for every Turing machine, but it still claims that every Turing machine halts or doesn't halt. It just has incomplete knowledge. Everything you're saying is still consistent with the use of just first-order logic."
But I want to rule out the nonstandard models! I want a system which will not expect, with some probability, to see specific Turing machines halt at a non-standard time.
"We can't distinguish that behavior in the real world. Everything you say to me will sound the same. When you predict your observations for specific amounts of time into the future, you are making predictions about the connected chain of time. If you make a prediction about the entire future, we could say the nonstandard models slip in, but that's exactly like what happens due to the limitations of our knowledge in the 2nd-order case. There is nothing magical about saying that we rule out nonstandard models! We must be saying that in some logic, and the proof system of that logic won't rule out the nonstandard models. We're just ruling out sets of possibilities which contradict our theory."
You're right, of course. Maybe it's a mistake to talk about models in this way. I'd like to be able to state my point as a property of the physical operation of the system, but I don't know how.
"Taking 2nd-order logic just to be a many-sorted first-order logic even addresses your point about the infinite schema in first-order arithmetic. We don't need to come up with some modification of the induction probabilities. We can just prefer succinct theories in this particular many-sorted first-order logic. In fact, we can just prefer short theories in first-order logic, since we can describe the many-sorted situation with predicates in a one-sorted logic."
Is that... true? Now you're saying that even for induction, first-order logic already covers everything that 2nd-order logic can do. What about mathematical conjectures about the natural numbers? Suppose I define a computable property of numbers, and I check that property up to 10,000. This allows us to rule out theories inconsistent with what we've found, which should increase our credence in the theory that the property holds for all numbers. But if we can't *prove* that from our limited axioms, then a theory asserting a counterexample has a finite description and stays consistent with everything we know, even though only a nonstandard model could satisfy it! So an induction system which assigns probabilities based on first-order description length will keep giving that theory a fixed credence, because we never rule it out as we check more cases. But my belief in the universal statement should converge to one as I check more cases. My hypothetical 2nd-order system would do that.
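(Purely for concreteness, here is the kind of computable check being described, using the Goldbach property as a running example; the choice of property is mine, the bound 10,000 is the one from the dialogue:)

```python
def is_prime(n):
    """Trial-division primality test; fine for the small numbers used here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def property_holds(n):
    """A computable property of numbers: every even n >= 4 is a sum of two
    primes (the Goldbach property); vacuously true for other n."""
    if n < 4 or n % 2 != 0:
        return True
    return any(is_prime(p) and is_prime(n - p) for p in range(2, n // 2 + 1))

# Checking the property up to 10,000 rules out every theory that asserts a
# counterexample below that bound, but it says nothing against a theory that
# only asserts a counterexample somewhere beyond what has been checked.
assert all(property_holds(n) for n in range(0, 10_001))
```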
"If you don't know anything about these objects you're observing, do you really want to converge to conventional number theory? Let's go back to your example about time. We don't have any reason to suspect that time is a disconnected object. It's probably connected. But if there were disconnected moments, as in nonstandard models, we would have no way of falsifying that. Observations which you can get in our connected bit of the universe tell you nothing about whether there are disconnected bits. So I don't think you want to converge to zero probability there."
Maybe. But let's stick to my example. If we've already stated that we're dealing with the natural numbers, and we agree that these are the least set containing zero and closed under the successor relationship, shouldn't our belief in universal statements about them approach one as we observe examples?
"Sure, you'd like it to... but is that possible? The 2nd-order logic has limited knowledge, just like the first-order version. There is still a finite theory which describes stuff we would only see in nonstandard cases. The theory is inconsistent according to the supposed semantics, but the actual proof system will never rule it out using the observations it can make. So it sounds like it's not possible to converge to probability 1 for the correct belief, here."
You just admitted that there's a correct belief! But never mind. Anyway, it's perfectly possible. We could insert a hack which says that, in the case we're discussing, we believe the universal statement with probability 1 - 1/(x+2), where x is the smallest natural number we haven't yet checked for the property. If the universal statement gets logically proved or disproved, we discard this probability.
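(Spelling the hack out, a sketch of exactly this special case and nothing more:)

```python
def credence_in_universal(smallest_unchecked, proof_status=None):
    """The hack from the dialogue: believe the universal statement with
    probability 1 - 1/(x + 2), where x is the smallest natural number not
    yet checked for the property.  A proof or disproof overrides this."""
    if proof_status == "proved":
        return 1.0
    if proof_status == "disproved":
        return 0.0
    return 1.0 - 1.0 / (smallest_unchecked + 2)

# Before checking anything: 0.5.  After checking everything below 10,000: ~0.9999.
print(credence_in_universal(0), credence_in_universal(10_000))
```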
"That... works... but... it's not remotely plausible! It obviously doesn't generalize. You can't just say a special case is handled completely differently from everything else. And even for that case, you pulled the function 1/(x+2) from nowhere. It's not a rational expectation."
I know, I know. I don't know how to turn it into a reasonable credence measure for beliefs. I'll tell you one thing: if we assert that the structure we're considering is the least set satisfying some first-order axioms, then we should be wary of new statements which assert the existence of examples with properties we have never seen before. Such examples could be nonstandard, even if their existence is consistent with our proof system. We should assign them diminishing probability as we check more examples and never manage to construct anything which satisfies the strange property.
"I appreciate the effort, but I'm not sure where it's going. We know for sure that it's not possible to make an approximatable probability distribution consistent with the semantics of 2nd-order logic. The Turing Jump operator applied to the halting problem gives us the convergence problem: even with the aid of a halting oracle, we can't decide whether a particular process will converge. Your result shows that we can converge to the right answer for universal generalizations of computable predicates, which is just another way of saying that we can specify a convergent process which eventually gets the right answer for halting problems. However, 2nd-order arithetic can discuss convergence problems just as well as halting problems, and many more iterations of the Jump operator in fact. Any convergent process will just have to be wrong for a whole lot of cases."
Maybe there's a divergent probability distribution that's reasonable in some sense...? No, forget I said anything, that's foolish. You're right, we can't rule out all the nonstandard models this way. There should be some relevant notion of rational belief to apply, though...
"Should there? Rationality isn't exactly floating out there in the Platonic realm for us to reach out and grab it. We make it up as we go along. We're evolved creatures, remember? It's a miracle that we can even discuss the halting problem! There's no reason we should be able to discuss the higher infinities. Maybe we're just fooling ourselves, and the whole problem is meaningless. Maybe we're just a first-order system which has learned to postulate a many-sorted reality, and think it's doing 2nd-order logic, because that just happens to be a convenient way of describing the universe. We think we can meaningfully refer to the least set satisfying some properties, but we can't state what we mean well enough to get at the ramifications of that. We think 'Every Turing machine halts or doesn't!' because we're running on classical logic... but we're fooling ourselves, because all our definitions of Turing machines are fundamentally incomplete, and there isn't actually any unique thing we mean by it. We think there continue to be true answers to problems even as we pile on Turing Jump operators, but really, it's nothing but mathematical sophistry. All of this is consistent with what we know."
But do you believe that?
"Well..."