Response to "Second-Order Logic: The Controversy". This started out as a comment, but got fun.
Several days later, the two meet again.
"You know, I still feel uneasy about the discussion we had. I didn't walk away with the feeling that you really appreciated the point that any proof system you propose for 2nd-order logic can equally well be interpreted as a proof system for a first-order many-sorted logic. If an alien took a look at a computer program implementing 2nd-order logic, what reason would they have to conclude that the computer program was an attempt to implement true quantification over properties, with everything our mathematicians take that to mean?"
But I told you: I want a system which will be able to consider the hypothesis that time is connected, or perhaps the hypothesis that the universe is finite, and so on.
"But externally, it looks like you just have some particular first-order theory in which that term has a meaning. We can talk about the kind of 'finite' that first-order set theory allows us to define. We could suppose your brain implements some particular theory, which you can't verbally articulate, but which is what you mean when you say such things."
Turing machines either halt or don't halt! Like I said before, I don't want a system that treats "not provable one way or the other" as a physical possibility. I don't even know what that means.
"Of course you would think that. Any system implementing classical logic will think that. First-order set theory doesn't prove it one way or the other for every Turing machine, but it still claims that every Turing machine halts or doesn't halt. It just has incomplete knowledge. Everything you're saying is still consistent with the use of just first-order logic."
But I want to rule out the nonstandard models! I want a system which will not expect, with some probability, to see specific Turing machines halt at a non-standard time.
"We can't distinguish that behavior in the real world. Everything you say to me will sound the same. When you predict your observations for specific amounts of time into the future, you are making predictions about the connected chain of time. If you make a prediction about the entire future, we could say the nonstandard models slip in, but that's exactly like what happens due to the limitations of our knowledge in the 2nd-order case. There is nothing magical about saying that we rule out nonstandard models! We must be saying that in some logic, and the proof system of that logic won't rule out the nonstandard models. We're just ruling out sets of possibilities which contradict our theory."
You're right, of course. Maybe it's a mistake to talk about models in this way. I'd like to be able to state my point as a property of the physical operation of the system, but I don't know how.
"Taking 2nd-order logic just to be a many-sorted first-order logic even addresses your point about the infinite schema in first-order arithmetic. We don't need to come up with some modification of the induction probabilities. We can just prefer succinct theories in this particular many-sorted first-order logic. In fact, we can just prefer short theories in first-order logic, since we can describe the many-sorted situation with predicates in a one-sorted logic."
Is that... true? Now you're saying that even for induction, first-order logic already covers everything that 2nd-order logic can do. What about mathematical conjectures about the natural numbers? Suppose I define a computable property of numbers, and I check that property up to 10,000. This allows us to rule out theories inconsistent with what we've found, which should increase our credence in the theory that the property holds for all numbers. But if we can't *prove* that from our limited axioms, then a nonstandard model will have a finite description consistent with what we know! So an induction system which assigns probabilities based on first-order description length will keep giving that nonstandard possibility a fixed credence, because we never rule it out as we check more cases. But belief in the universal statement should converge to one as we check more cases. My hypothetical 2nd-order system should do that.
"If you don't know anything about these objects you're observing, do you really want to converge to conventional number theory? Let's go back to your example about time. We don't have any reason to suspect that time is a disconnected object. It's probably connected. But if there were disconnected moments, as in nonstandard models, we would have no way of falsifying that. Observations which you can get in our connected bit of the universe tell you nothing about whether there are disconnected bits. So I don't think you want to converge to zero probability there."
Maybe. But let's stick to my example. If we've already stated that we're dealing with the natural numbers, and we agree that these are the least set containing zero and closed under the successor function, shouldn't our belief in universal statements about them approach one as we observe examples?
"Sure, you'd like it to... but is that possible? The 2nd-order logic has limited knowledge, just like the first-order version. There is still a finite theory which describes stuff we would only see in nonstandard cases. The theory is inconsistent according to the supposed semantics, but the actual proof system will never rule it out using the observations it can make. So it sounds like it's not possible to converge to probability 1 for the correct belief, here."
You just admitted that there's a correct belief! But never mind. Anyway, it's perfectly possible. We could insert a hack which says that, in the case we're discussing, we believe the universal statement with probability 1 - 1/(x+2), where x is the smallest natural number we haven't yet checked for the property. If the universal statement gets logically proved or disproved, we discard this probability.
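(For concreteness, here is that hack as a minimal code sketch; the function name and interface are mine, nothing from the dialogue.)

```python
def hack_credence(smallest_unchecked: int, proved: bool = False, disproved: bool = False) -> float:
    """The ad-hoc credence rule for a universal statement about the naturals.

    smallest_unchecked is x, the smallest natural number not yet checked for
    the property; the hack is discarded once a proof or disproof exists.
    """
    if proved:
        return 1.0
    if disproved:
        return 0.0
    return 1.0 - 1.0 / (smallest_unchecked + 2)  # 1/2 at x = 0, tending to 1
```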
"That... works... but... it's not remotely plausible! It obviously doesn't generalize. You can't just say a special case is handled completely differently from everything else. And even for that case, you pulled the function 1/(x+2) from nowhere. It's not a rational expectation."
I know, I know. I don't know how to make it into a reasonable credence measure for beliefs. I'll tell you one thing: if we assert that the structure we're considering is the least set satisfying some first-order axioms, then we should be wary of new statements which assert the existence of examples with properties we have never seen before. These examples could be non-standard, even if they are consistent with our proof system. We should assign them diminishing probability as we check more examples and never manage to construct anything which satisfies the strange property.
"I appreciate the effort, but I'm not sure where it's going. We know for sure that it's not possible to make an approximatable probability distribution consistent with the semantics of 2nd-order logic. The Turing Jump operator applied to the halting problem gives us the convergence problem: even with the aid of a halting oracle, we can't decide whether a particular process will converge. Your result shows that we can converge to the right answer for universal generalizations of computable predicates, which is just another way of saying that we can specify a convergent process which eventually gets the right answer for halting problems. However, 2nd-order arithetic can discuss convergence problems just as well as halting problems, and many more iterations of the Jump operator in fact. Any convergent process will just have to be wrong for a whole lot of cases."
Maybe there's a divergent probability distribution that's reasonable in some sense...? No, forget I said anything, that's foolish. You're right, we can't rule out all the nonstandard models this way. There should be some relevant notion of rational belief to apply, though...
"Should there? Rationality isn't exactly floating out there in the Platonic realm for us to reach out and grab it. We make it up as we go along. We're evolved creatures, remember? It's a miracle that we can even discuss the halting problem! There's no reason we should be able to discuss the higher infinities. Maybe we're just fooling ourselves, and the whole problem is meaningless. Maybe we're just a first-order system which has learned to postulate a many-sorted reality, and think it's doing 2nd-order logic, because that just happens to be a convenient way of describing the universe. We think we can meaningfully refer to the least set satisfying some properties, but we can't state what we mean well enough to get at the ramifications of that. We think 'Every Turing machine halts or doesn't!' because we're running on classical logic... but we're fooling ourselves, because all our definitions of Turing machines are fundamentally incomplete, and there isn't actually any unique thing we mean by it. We think there continue to be true answers to problems even as we pile on Turing Jump operators, but really, it's nothing but mathematical sophistry. All of this is consistent with what we know."
But do you believe that?
"Well..."
Oh, OK. (This was in fact my main confusion about the question.)
One thing is clear to me: when I've made a closed-world assumption about my domain (which is what the least-set description of the natural numbers is), I should somehow set up the probabilities so that a hypothesis which asserts the existence of a domain element with a specific property does not get an a priori chunk of probability mass just for being a hypothesis of a specific length. Ideally, we want to set this up so that it follows "something like" the halting distribution which I mentioned in my previous reply (but of course that's not computable).
One idea is that such a hypothesis only gets probability mass from specific domain elements with that property. So, for example, if we want to measure the belief that the Goldbach conjecture is false, the probability mass we assign to that must come from the mass we assign to specific hypotheses like false_at(1000), false_at(1002), et cetera. As I eliminate possibilities like this, the probability for exists x. false_at(x) falls.
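To make the bookkeeping concrete, here is a toy sketch of that idea; the 50/50 prior split, the 2^-(n+1) weighting over candidate counterexamples, and the cutoff are placeholders of mine, not a serious proposal:

```python
def credences(checked_up_to: int, universal_mass: float = 0.5, cutoff: int = 10_000):
    """Toy version of 'the existential only gets mass from specific instances.'

    Each hypothesis false_at(n) gets prior mass proportional to 2^-(n+1), and
    the universal hypothesis gets universal_mass. Checking 0..checked_up_to
    (inclusive) and finding no counterexample eliminates those instance
    hypotheses, and the survivors are renormalized.
    """
    surviving_instance_mass = sum(
        (1 - universal_mass) * 2.0 ** -(n + 1)
        for n in range(checked_up_to + 1, cutoff)
    )
    total = universal_mass + surviving_instance_mass
    return {"forall x. not false_at(x)": universal_mass / total,
            "exists x. false_at(x)": surviving_instance_mass / total}

for k in (0, 10, 1000):
    print(k, credences(k))
```

The point is just the shape of the behavior: as checked cases eliminate specific instances, the mass behind the existential shrinks and the universal statement soaks it up.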
This can be formalized by saying that for closed-world domains, as we form a theory, we must have an example before we introduce an existential statement. (I'm assuming that we want to model the probability of a theory as some process which generates theories at random, as in my recent AGI paper.) We might even want to say that we need a specific example before we will introduce any quantifiers. This doesn't strictly rule out any theories, since we can just write down examples that we don't yet know to be false before introducing the quantified statements, but it modifies the probabilities associated with them.
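As a sketch of the kind of generation process I have in mind (the representation and the particular random choices here are placeholders, not part of the proposal):

```python
import random

def generate_theory(ground_atoms, max_statements=5, rng=random):
    """Toy theory generator for a closed-world domain.

    ground_atoms is a list of (predicate_name, atom_string) pairs, e.g.
    ("P", "P(3)"). The constraint: 'exists x. P(x)' may only be introduced
    once some specific P(c) has already been written into the theory, so
    existentials draw their probability through their instances.
    """
    theory = []
    witnessed = set()  # predicates for which an instance has been asserted
    for _ in range(max_statements):
        if witnessed and rng.random() < 0.3:
            # Allowed: we already have an example on record for this predicate.
            pred = rng.choice(sorted(witnessed))
            theory.append(f"exists x. {pred}(x)")
        else:
            pred, atom = rng.choice(ground_atoms)
            theory.append(atom)
            witnessed.add(pred)
    return theory

print(generate_theory([("P", "P(3)"), ("P", "P(7)"), ("Q", "Q(0)")]))
```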
However, I'm not really confident in that particular trick (I'm not sure it behaves correctly for nested quantifiers, et cetera). I would be more confident if there were an obvious way to generalize this to work nicely with both open-world and closed-world cases (and mixed cases). But I know that it is not possible to generalize "correctly" to full 2nd-order logic (for any definition of "correctly" that I've thought of so far).