Here's an internal dialogue I just had.
Q: How do we test rationality skills?
A: We haven't come up with a comprehensive test yet.
Q: Maybe we can test some part of rationality?
A: Sure. For example, you could test resistance to akrasia by making two contestants do some simple chores every day. Whoever fails first loses.
Q: That seems like a pointless competition. If I'm feeling competitive, why would I ever skip the chores and lose?
A: Whoa, wait. If competitiveness can cure akrasia, that's pretty cool!
Now we just need to figure out how to make people more competitive in the areas they care about...
Pomodoros are a great metric. Katja Grace makes the case for that here: http://www.overcomingbias.com/2012/08/on-the-goodness-of-beeminder.html (she just calls them blocks of time).
I think the raw number of hours is a fine metric too, though. Discretizing into pomodoros has both advantages and disadvantages.
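To make the trade-off concrete, here is a rough sketch comparing the two metrics on the same day of work. It assumes the conventional 25-minute pomodoro and a made-up list of uninterrupted session lengths; the `summarize` helper and the numbers are purely illustrative, not anything Beeminder itself does.

```python
POMODORO_MINUTES = 25  # assumed: the conventional pomodoro length

def summarize(sessions_minutes):
    """Return (raw_hours, completed_pomodoros) for uninterrupted work sessions."""
    raw_hours = sum(sessions_minutes) / 60
    # Only full, uninterrupted blocks count; leftover minutes are dropped.
    pomodoros = sum(m // POMODORO_MINUTES for m in sessions_minutes)
    return raw_hours, pomodoros

sessions = [50, 20, 10, 30]  # hypothetical day, in minutes
raw_hours, pomodoros = summarize(sessions)
print(f"{raw_hours:.2f} hours, {pomodoros} pomodoros")
```

With these numbers the raw total is 110 minutes, but only 3 pomodoros (75 minutes) count. That's the disadvantage (short scraps of work vanish) and arguably also the advantage (only focused, unbroken blocks get credit).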
If you can quantify actual output, that might be ideal. That's how we track User-Visible Improvements to Beeminder. You might expect that to be too fuzzy a metric, but we found a criterion that's been rock solid for years now: if we're willing to publicly tweet it, then it counts. Pride prevents us from ever getting too weaselly about it.