I want it to somehow recognize that Goodstein's theorem is likely to be true and ZFC is likely to be consistent -
The first thing to ask is why we humans believe this to be the case. When looked at this way, it seems like a straightforward case of inductive (=Bayesian) reasoning: we expect that if ZFC were inconsistent, we would have found a contradiction by now. Right?
In general, what's mysterious to me here is the nature of intuition and its relationship with mathematical truth.
This is a mystery that I feel can dissolve -- at least, I've done so to my own personal satisfaction. The answer -- my answer -- is that "mathematical truth" is a purely formal notion, with no dependence on intuition. A theorem which has been formally proved in an axiom system can never be said to be "mathematically incorrect". The worst that can be said is that the particular axiom system that contains the theorem is inappropriate for modeling some physical phenomenon of interest.
(Hence, for example, my stance that it is an outright mistake to say that "infinite sets don't exist". What is meant by people who say this -- the only coherent thing than can be meant -- is "no aspect of the physical universe is appropriately modeled by a mathematical axiom system in which infinite sets exist". I believe this is also likely erroneous, but at least it isn't blatantly incoherent.)
Here's a simple testcase: throwing darts at the real line. Would we wish an AI to be swayed by such arguments?
As per the above, it's a mistake to argue about whether Freiling's axiom is "true" in the first place. It's true in ZFC+~CH, and false in ZFC+CH, end of story. It's no different from arguing about whether the sum of the angles in a triangle is "really" 180 degrees: in Euclidean geometry it is, in hyperbolic geometry it isn't. You can ask whether Euclidean geometry is consistent, and you can ask whether Euclidean geometry is an appropriate model of the physical world, but you can't ask whether Euclidean geometry is "true".
Thanks! Your comment prompts me to reformulate my original question this way: given a formal system, how can the AI determine that it talks about "the" natural numbers? For example, we can add to PA some axiom that rules out its standard model, but leaves many nonstandard ones. The simplest example would be to add the inconsistency of PA - the resulting theory will (counterintuitively) be just as consistent as PA, but quite weird. It will have many interesting provable theorems that are nevertheless common-sensically "false", e.g. "...
A big problem with natural numbers is that the axiomatic method breaks on them.
Mystery #1: if we're allowed to talk about sets of natural numbers, sets of these sets, etc., then some natural-sounding statements are neither provable nor disprovable ("independent") from all the "natural" axiomatic systems we've invented yet. For example, the continuum hypothesis can be reformulated as a statement about sets of sets of natural numbers. The root cause is that we can't completely axiomatize which sets of natural numbers exist, because there's too many of them. That's the substantial difference between second-order logic and first-order logic; logicians say that second-order logic is "defined semantically", not by any syntactic procedure of inference.
Mystery #2: if we're allowed to talk about arithmetic and use quantifiers (exists and forall) over numbers, but not over sets of them - in other words, use first-order logic only - then some natural-sounding statements appear to be true, but to prove them, we need to accept as axioms a lot of intuition about concepts other than natural numbers. For example, Goodstein's theorem is a simple arithmetical statement that cannot be proved in Peano arithmetic, but can be proved in "stronger theories". This means the theorem is a consequence of our intuition that some "stronger theory", e.g. ZFC, is consistent - but where did that intuition come from? It doesn't seem to be talking about natural numbers anymore.
Can we teach a computer to think about natural numbers the same way we do, that is, somehow non-axiomatically? Not just treat numbers as opaque "things" that obey the axioms of PA - that would make a lot of true theorems unreachable! This seems to be the simplest AI-hard problem that I've ever seen.