In a recent post I posed the question: is the common good served by directing research efforts towards theoretical problems which are interesting to researchers?
komponisto defends interesting problems, arguing that researchers' perceptions of interestingness are often better able to predict future usefulness than anyone trying deliberately to determine what will be useful. This is a plausible claim (although I disagree), and I have encountered it a number of times in the last couple of days. This claim was advanced as a defense of the status quo, but if we really believe it then we should certainly try to understand all of its consequences.
When setting out to predict the usefulness of a research program (as I suggest we should), we are not required to do it via deductive arguments which estimate the likelihood of certain applications. We can use all of the data available, including how interesting the problem seems to us, to other researchers, to lay people, and so on. If intelligent observers' notions of interestingness are substantially correlated with future usefulness, potentially in unpredictable ways, then we would be wise to take this information into account. This is precisely what komponisto and others argue, and they conclude that we should support work on the problems an investigator finds most interesting. I claim this is an example of motivated stopping: the argument was thought through just far enough to support changing nothing.
We have access to many, many indicators of interestingness for any candidate research problem. A problem can seem interesting only to a single person who understands the background in great depth; it can seem interesting to a small group of researchers in related fields; it can seem interesting to mathematicians broadly; it can seem interesting to computer scientists, to physicists, to biologists, to engineers, to laypeople. It can seem particularly interesting to professional mathematicians, or to novices with new ideas. It can evoke feelings of immediacy, of needing to know the answer; it can simply be fun to work on. Particular countries, cultures, time periods, or subfields may have objectively better or worse aesthetics.
If our aim is to use interestingness as a predictor of potential usefulness then all of this variability is an asset. We have a historical record to be scoured; patterns to be evaluated. Understanding these patterns is of critical importance to the quality of our predictions and the efficiency of our research institutions. If the historical record is too opaque, we should at least establish a culture of transparency: make records not only of what work is done, but why it is done. Who did it seem interesting to? How did they feel about the research program; why were they really working on it? In the long term, we can hope to discover whose intuitions were valuable and whose were not; we can understand which aesthetics lead to useful work and which do not.
Over time (if not immediately), we can hope to develop a common understanding of the link between interestingness and future usefulness, and develop institutions which exploit this understanding to produce valuable research.
I think you're forgetting the problem of incentives. Whatever standard procedures for evaluating or predicting usefulness you come up with, if they're actually used to allocate resources and status in practice, people will have the incentive to hack them by designing and presenting their own work to come off as better than it really is. And since people who do research are usually very smart, you'll be faced with a host of extremely smart people trying to outsmart and cheat your metrics, and many of them will surely succeed. Goodhart's law, and all that.
This, of course, is not even considering whether the influential people whom you'd have to win over to establish such practices have the incentive to submit their past and present work to such evaluation. Unfortunately, although the problems you point out are very real, there is no straightforward solution for them; almost any attempt at fixing institutions is likely to run into difficult and unpredictable problems with perverse incentives.
This could be alleviated by making the standards sufficiently retrospective, e.g. evaluating the usefulness of current work in 100 years (which would probably make them more effective anyway).
We could also test these predictions on historical data, although it might be slightly trickier.