This is a valid point. Sometimes we rely on intuition. So can one reasonably distinguish this case from the case of ZFC or PA? I think the answer is yes.
First, we do have some other (albeit weak) evidence for the consistency of PA and ZFC. In the case of PA we have what looks like a physical model that seems pretty similar. That's only a weak argument because the full induction axiom schema is much stronger than what one can represent in any finite chunk of PA in a reasonable fashion. We have also spent a large amount of time proving theorems in both PA and ZFC, and we haven't seen a contradiction. This comes after a lot of experience with systems like naive set theory, which has given us what seems to be a good idea of how to find contradictions in systems. This is akin to having a functional AGI and seeing what it does in at least one case for a short period of time. Of course, this argument is also weak, since Gödelian issues imply that there should be axiomatic systems that are fairly simple and yet have contradictions that only appear when one looks at chains of inferences that are extremely long compared to the complexity of the systems.
Second, in the case of PA (and to a slightly lesser extent ZFC), different people who have thought about the question have arrived at the same intuition. There are of course a few notable exceptions, like Edward Nelson, but those exceptions are limited, and in many cases, like Nelson's, there seem to be other, extra-mathematical motives for them to reach their conclusions. This is in contrast to the situation in question, where a much smaller number of people have thought about the issues and they haven't reached the same intuition.
A third issue is that we have consistency proofs of PA that use somewhat weak systems. Gentzen's theorem is the prime example. The forms of induction required are extremely weak compared to the full induction schema as long as one is allowed a very tiny bit of ordinal arithmetic. I don't know what the relevant comparison would be in the AGI context, but this seems like a type of evidence we don't have in that context.
I thought Ben Goertzel made an interesting point at the end of his dialog with Luke Muehlhauser, about how the strengths of both sides' arguments do not match up with the strengths of their intuitions:
What do we do about this disagreement and other similar situations, both as bystanders (who may not have strong intuitions of their own) and as participants (who do)?
I guess what bystanders typically do (although not necessarily consciously) is evaluate how reliable each party's intuitions are likely to be, and then use that to form a probabilistic mixture of the two sides' positions. The information that goes into such evaluations could include things like what cognitive processes likely came up with the intuitions, how many people hold each intuition, and how accurate each individual's past intuitions were.
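To make the "probabilistic mixture" idea concrete, here is a minimal sketch in Python. It assumes each side's position can be summarized as a single probability for some claim and that the bystander has already settled on a non-negative reliability weight for each side. The function name `mixture_probability` and all the numbers are hypothetical, chosen only for illustration; a simple weighted average is just one of several ways one might combine the positions.

```python
def mixture_probability(estimates):
    """Blend probabilities using reliability weights.

    `estimates` is a list of (probability, reliability_weight) pairs.
    The weights are the bystander's own judgment of how trustworthy
    each party's intuition is; they do not need to sum to 1.
    """
    total_weight = sum(weight for _, weight in estimates)
    if total_weight <= 0:
        raise ValueError("at least one reliability weight must be positive")
    # Weighted average: each position counts in proportion to its weight.
    return sum(p * weight for p, weight in estimates) / total_weight


# Hypothetical numbers: side A is 90% confident in a claim, side B only 10%,
# and the bystander judges B's intuition to be three times as reliable as A's.
blended = mixture_probability([(0.9, 1.0), (0.1, 3.0)])
print(round(blended, 3))  # prints 0.3
```

With equal weights the result is just the midpoint of the two positions, so all the interesting work is in choosing the weights, which is where evidence about cognitive processes, head counts, and track records would enter.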
If this is the best we can do (at least in some situations), participants could help by providing more information that might be relevant to the reliability evaluations, and bystanders should pay more conscious attention to such information instead of focusing purely on each side's arguments. The participants could also pretend that they are just bystanders, for the purpose of making important decisions, and base their beliefs on "reliability-adjusted" intuitions instead of their raw intuitions.
Questions: Is this a good idea? Any other ideas about what to do when strong intuitions meet weak arguments?
Related Post: Kaj Sotala's Intuitive differences: when to agree to disagree, which is about a similar problem, but mainly from the participant's perspective instead of the bystander's.