Martial arts can be good training for ensuring your personal security, if you assume the worst about your tools and environment. If you expect to find yourself unarmed in a dark alley, or fighting hand to hand in a war, it makes sense. But most people do a lot better at ensuring their personal security by coordinating to live in peaceful societies and neighborhoods; they pay someone else to learn martial arts. Similarly, while "survivalists" plan and train to stay warm, dry, and fed given worst-case assumptions about the world around them, most people achieve these goals by participating in a modern economy.
The martial arts metaphor for rationality training seems popular at this website, and most discussions here about how to believe the truth seem to assume an environmental worst case: how to figure out everything for yourself given fixed info and assuming the worst about other folks. In this context, a good rationality test is a publicly-visible personal test, applied to your personal beliefs when you are isolated from others' assistance and info.
I'm much more interested in how we can join together to believe truth, and it actually seems easier to design institutions which achieve this end than to design institutions to test individual isolated general tendencies to discern truth. For example, with subsidized prediction markets, we can each specialize in the topics where we contribute best, relying on market consensus on all other topics. We don't each need to train to identify and fix each possible kind of bias; each bias can instead have specialists who look for where that bias appears and then correct it.
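To make "subsidized" concrete, here is a minimal sketch of one possible mechanism: an automated market maker using a logarithmic market scoring rule (LMSR). The text above does not name a mechanism, so this choice is an assumption, and the function names and the liquidity parameter `b` are illustrative. The sponsor's subsidy is the market maker's worst-case loss, which for LMSR starting from equal prices is `b * ln(n)` for `n` outcomes.

```python
import math

def lmsr_cost(quantities, b):
    """LMSR cost function C(q) = b * ln(sum_i exp(q_i / b)), computed stably."""
    m = max(q / b for q in quantities)
    return b * (m + math.log(sum(math.exp(q / b - m) for q in quantities)))

def lmsr_prices(quantities, b):
    """Current prices, readable as the market's consensus probabilities."""
    m = max(q / b for q in quantities)
    weights = [math.exp(q / b - m) for q in quantities]
    total = sum(weights)
    return [w / total for w in weights]

def trade_cost(quantities, outcome, shares, b):
    """What a trader pays to buy `shares` of `outcome` at the current state."""
    after = list(quantities)
    after[outcome] += shares
    return lmsr_cost(after, b) - lmsr_cost(quantities, b)

# Hypothetical two-outcome market with an assumed liquidity parameter.
b = 100.0
state = [0.0, 0.0]                      # no shares sold yet
start_prices = lmsr_prices(state, b)    # equal prices before any trades
cost = trade_cost(state, 0, 50.0, b)    # a specialist who favors outcome 0 buys in
state[0] += 50.0
new_prices = lmsr_prices(state, b)      # consensus shifts toward outcome 0
max_subsidy = b * math.log(len(state))  # upper bound on the sponsor's loss

print(start_prices, round(cost, 2), new_prices, round(max_subsidy, 2))
```

Anyone who thinks the current price is wrong on a topic they know well can trade to move it and is paid for doing so if they turn out to be right; everyone else can just read the price.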
Perhaps martial-art-style rationality makes sense for isolated survivalist Einsteins forced by humanity's vast stunning cluelessness to single-handedly block the coming robot rampage. But for those of us who respect the opinions of enough others to want to work with them to find truth, it makes more sense to design and field institutions which give each person better incentives to update a common consensus.
Obviously it helps if the experts are required to make predictions that are scoreable. Over time, we could examine the track records of both individual experts and entire disciplines in correctly predicting outcomes. Ideally, we would want to test these predictions against those made by non-experts, to see how much value the expertise is actually adding.
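As a rough illustration of what "scoreable" could mean, the sketch below scores probability forecasts with the Brier score (mean squared error against 0/1 outcomes, lower is better) and compares an expert with a non-expert and with an uninformed 50% baseline. The Brier score is one common choice, not something the comment specifies, and all the forecasts and outcomes here are made-up numbers.

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    if len(forecasts) != len(outcomes) or not forecasts:
        raise ValueError("need equal-length, non-empty sequences")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)

def skill_vs_baseline(score, baseline_score):
    """Fraction of the baseline's error removed (1.0 is perfect, 0.0 adds nothing)."""
    return 1.0 - score / baseline_score

# Hypothetical track records on five resolved yes/no questions.
outcomes  = [1, 0, 1, 1, 0]
expert    = [0.9, 0.2, 0.7, 0.8, 0.1]
layperson = [0.6, 0.5, 0.5, 0.6, 0.4]
baseline  = [0.5] * len(outcomes)   # always answering "50%"

expert_score = brier_score(expert, outcomes)
layperson_score = brier_score(layperson, outcomes)
baseline_score = brier_score(baseline, outcomes)

expert_skill = skill_vs_baseline(expert_score, baseline_score)
layperson_skill = skill_vs_baseline(layperson_score, baseline_score)

print(round(expert_score, 3), round(layperson_score, 3), round(baseline_score, 3))
print(round(expert_skill, 3), round(layperson_skill, 3))
```

The gap between the two skill numbers is a crude measure of how much value the expertise adds on these questions; the same comparison could be run per discipline by pooling its members' forecasts.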
Another proposal, which I raised on a previous comment thread, is to collect third-party credibility assessments in centralized databases. We could collect the rates at which expert witnesses are permitted to testify at trial and the rate at which their conclusions are accepted or rejected by courts, for instance. We could similarly track the frequency with which authors have their articles accepted or rejected by journals engaged in blind peer-review (although if the review is less than truly blind, the data might be a better indication of status than of expertise, to the degree the two are not correlated). Finally, citation counts could serve as a weak proxy for trustworthiness, to the degree the citations are from recognized experts and indicate approval.
The suggestions from the second paragraph all seem rather incestuous. Propagating trust is great but it should flow from a trustworthy fountain. Those designated "experts" need some non-incestuous test as their foundation (a la your first paragraph).