Martial arts can be good training for ensuring your personal security, if you assume the worst about your tools and environment. If you expect to find yourself unarmed in a dark alley, or fighting hand to hand in a war, it makes sense. But most people do a lot better at ensuring their personal security by coordinating to live in peaceful societies and neighborhoods; they pay someone else to learn martial arts. Similarly, while "survivalists" plan and train to stay warm, dry, and fed given worst case assumptions about the world around them, most people achieve these goals by participating in a modern economy.
The martial arts metaphor for rationality training seems popular at this website, and most discussions here about how to believe the truth seem to assume an environmental worst case: how to figure out everything for yourself given fixed info and assuming the worst about other folks. In this context, a good rationality test is a publicly-visible personal test, applied to your personal beliefs when you are isolated from others' assistance and info.
I'm much more interested in how we can join together to believe truth, and it actually seems easier to design institutions which achieve this end than to design institutions to test individual isolated general tendencies to discern truth. For example, with subsidized prediction markets, we can each specialize in the topics where we contribute best, relying on market consensus on all other topics. We don't each need to train to identify and fix each possible kind of bias; each bias can instead have specialists who look for where that bias appears and then correct it.
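To make "subsidized prediction market" a bit more concrete, here is a minimal sketch of one possible mechanism: an automated market maker using a logarithmic market scoring rule. The post does not say which mechanism it has in mind, so the choice of rule, the class name `LMSRMarket`, and the liquidity parameter `b` are all assumptions made for illustration. The point it shows is that a sponsor can bound the subsidy in advance, while any trader who thinks the current consensus is wrong on some topic can move the price and pay (or profit) accordingly.

```python
import math


class LMSRMarket:
    """Hypothetical sketch of a subsidized market maker (log scoring rule)."""

    def __init__(self, n_outcomes, b=100.0):
        # b is an assumed liquidity parameter: larger b means prices move
        # less per share traded, and the sponsor risks a larger subsidy.
        self.b = b
        self.q = [0.0] * n_outcomes  # net shares sold for each outcome

    def _cost(self, q):
        # Cost function C(q) = b * log(sum_i exp(q_i / b)),
        # computed with the max subtracted for numerical stability.
        m = max(x / self.b for x in q)
        return self.b * (m + math.log(sum(math.exp(x / self.b - m) for x in q)))

    def prices(self):
        # Current prices, readable as the market's consensus probabilities.
        m = max(x / self.b for x in self.q)
        weights = [math.exp(x / self.b - m) for x in self.q]
        total = sum(weights)
        return [w / total for w in weights]

    def buy(self, outcome, shares):
        # A trader pays C(q_new) - C(q_old) to buy `shares` of `outcome`.
        new_q = list(self.q)
        new_q[outcome] += shares
        cost = self._cost(new_q) - self._cost(self.q)
        self.q = new_q
        return cost

    def max_subsidy(self):
        # Worst-case loss to the sponsor when starting from uniform prices.
        return self.b * math.log(len(self.q))
```

In this sketch, someone who specializes in a topic calls `buy` on the outcome they think is underpriced, and everyone else can simply read `prices()` as the current consensus. The sponsor's worst-case cost is fixed up front by `max_subsidy()`.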
Perhaps martial-art-style rationality makes sense for isolated survivalist Einsteins forced by humanity's vast stunning cluelessness to single-handedly block the coming robot rampage. But for those of us who respect the opinions of enough others to want to work with them to find truth, it makes more sense to design and field institutions which give each person better incentives to update a common consensus.
Internal credibility is of little use when we want to compare the credentials of experts in widely differing fields. But it is useful if we want to know whether someone is trusted in their own field. Now suppose that we have enough information about a field to decide that good work in that field generally deserves some of our trust (even if the field's practices fall short of the ideal). By tracking internal credibility, we have picked out useful sources of information.
Note too that this method could be useful even if we think a field is epistemically rotten. If someone is especially trusted by literary theorists, we might want to downgrade our trust in them, solely on that basis.
So the two inquiries complement each other: We want to be able to grade different institutions and fields on the basis of overall trustworthiness, and then pick out particularly good experts from within those fields we trust in general.
p.s. Peer review and citation counting are probably incestuous, but I don't think the charge makes sense in the expert witness evaluation context.