I'm not sure I'm doing this right, but it might influence my PhD... I have a feeling that somewhere along the way I should factor in the 'new data should make your estimates more confident' thing, but in a Fermi estimate we don't take this into account, right?
A Fermi estimate of how much new data on Ophioglossum vulgatum would bring progress in true population estimates. (The problem: a plant with a branching rhizome might give several stalks a year - or might not give any, if a year is unfavourable - so you have to estimate the number of actual specimens from the number of stalks you see. Exposing and tracing the rhizome is the only way to know for certain, though it's ethically dubious - possibly lethal to the plant, but I doubt it if precautions are taken. I think that if I dig up several patches of rhizomes, we can extrapolate how many plants we have, and so have more accurate population censuses. It seems more useful if it manages to lessen the gap between different researchers counting stalks - another source of uncertainty.)
Let B be the annual benefit: more accurate estimates of population size (estimated number of clones / true number of clones, measured in %). This would be useful for estimating the probability of the population going extinct within 10 years of undisturbed succession (= final benefit).
Let B be about 20%, for a start; let R(0), the current resources per year, be around 6 man-days; and let y/z be around 4, if we take 4 years and count the time already spent on research as an annual increment.
Expected benefit ≈ 0.7 × 20% / (6 man-days × 1.4) ≈ 1.7%/man-day (taking p = 0.7 and ln(4) ≈ 1.4). So if I actually spend about 30 man-days this season on surveying populations, it would give me estimates of population sizes that are about 50% closer to the truth, given the censuses?
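As a sanity check on the arithmetic above, here is a quick sketch in Python. The variable names are mine; p = 0.7 and the input numbers are the assumptions stated above, and 1.4 is just ln(4):

```python
import math

p = 0.7          # assumed probability of success (a usefully better census method)
B = 20.0         # annual benefit: % improvement in accuracy of population estimates
R0 = 6.0         # current resources going to the problem, man-days per year
y_over_z = 4.0   # ratio of total resources at success to resources spent so far

# Expected benefit of one extra man-day: p * B / (R0 * ln(y/z))
per_man_day = p * B / (R0 * math.log(y_over_z))
season_total = 30 * per_man_day  # benefit from 30 man-days of surveying

print(f"{per_man_day:.1f}% per man-day, {season_total:.0f}% over the season")
# → 1.7% per man-day, 50% over the season
```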
For 30 man-days, I need 30 populations to survey, and I only know of around 10. Suppose I want to estimate how the closeness of stalks might reveal the clonal structure below; instead of digging plants up immediately, I count the stalks, see if there are any definable patches, and then try making three guess-models of clonal structure based on the observed patchiness.
How should this influence my Fermi estimate? If the expected utility of such severe actions is too low, I'll just stick to counting stalks and be content with less precise data. Thank you.
There's a continuum between Fermi estimates and more detailed models. At some resolution you'd definitely want to take into account the fact that new data will affect your confidence, but it may not be worth modelling at that resolution unless you think that this is one of the major routes to value.
With the scenario you outline, I think B is under-specified. You just say "more accurate estimates of population size" -- in order to get this model to work you need some way of expressing how big a change in accuracy you're looking at.
I'd also be wary...
In some recent work (particularly this article) I built models for estimating the cost effectiveness of work on problems when we don’t know how hard those problems are. The estimates they produce aren’t perfect, but they can get us started where it’s otherwise hard to make comparisons.
Now I want to know: what can we use this technique on? I have a couple of applications I am working on, but I’m keen to see what estimates other people produce.
There are complicated versions of the model which account for more factors, but we can start with a simple version. This is a tool for initial Fermi calculations: it’s relatively easy to use but should get us around the right order of magnitude. That can be very useful, and we can build more detailed models for the most promising opportunities.
The model is given by:

Expected benefit ≈ p·B / (R(0)·ln(y/z))
This expresses the expected benefit of adding another unit of resources to solving the problem. You can denominate the resources in dollars, researcher-years, or another convenient unit. To use this formula we need to estimate four variables:
R(0) denotes the current resources going towards the problem each year. Whatever units you measure R(0) in, those are the units we’ll get an estimate for the benefit of. So if R(0) is measured in researcher-years, the formula will tell us the expected benefit of adding a researcher year.
You want to count all of the resources going towards the problem. That includes the labour of those who work on it in their spare time, and some weighting for the talent of the people working in the area (if you doubled the budget going to an area, you couldn’t get twice as many people who are just as good; ideally we’d use an elasticity here).
Some resources may be aimed at something other than your problem, but be tangentially useful. We should count some fraction of those, according to how many resources devoted entirely to the problem they seem equivalent to.
B is the annual benefit that we’d get from a solution to the problem. You can measure this in its own units, but whatever you use here will be the units of value that come out in the cost-effectiveness estimate.
p and y/z are parameters that we will estimate together. p is the probability of getting a solution by the time y resources have been dedicated to the problem, if z resources have been dedicated so far. Note that we only need the ratio y/z, so we can estimate this directly.
Although y/z is hard to estimate, we will take a (natural) logarithm of it, so don’t worry too much about making this term precise.
I think it will often be best to use middling values of p, perhaps between 0.2 and 0.8.
And that’s it.
Example: How valuable is extra research into nuclear fusion? Assume:
R(0) = $5 billion (after a quick google turns up $1.5B for current spending, and adjusting upwards to account for non-financial inputs);
B = $1000 billion (guesswork, a bit over 1% of the world economy; a fraction of the current energy sector);
There’s a 50% chance of success (p = 0.5) by the time we’ve spent 100 times as many resources as today (ln(y/z) = ln(100) ≈ 4.6).
Putting these together would give an expected societal benefit of (0.5 × $1000B) / ($5B × 4.6) ≈ $22 for every dollar spent. This is high enough to suggest that we may be significantly under-investing in fusion, and that a more careful calculation (with better-researched numbers!) might be justified.
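The back-of-the-envelope calculation above can be packaged as a small function for reuse on other problems. The function name and variable names are mine; the formula p·B / (R(0)·ln(y/z)) is the one used throughout:

```python
import math

def marginal_benefit(p, B, R0, y_over_z):
    """Expected benefit of one extra unit of resources: p * B / (R0 * ln(y/z)).

    B and R0 must be in consistent units; the result is benefit per unit of R0.
    """
    return p * B / (R0 * math.log(y_over_z))

# Fusion example: p = 0.5, B = $1000B/year, R(0) = $5B/year, y/z = 100
print(marginal_benefit(p=0.5, B=1000e9, R0=5e9, y_over_z=100))
# ≈ 21.7, i.e. roughly $22 of expected benefit per dollar spent
```

Swapping in your own estimates for the four inputs gives a first-pass number for any problem that roughly fits the model's assumptions.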
Caveats
To get the simple formula, the model made a number of assumptions. Since we’re just using it to get rough numbers, it’s okay if we don’t fit these assumptions exactly, but if they’re totally off then the model may be inappropriate. One restriction in particular I’d want to bear in mind:
It should be plausible that we could solve the problem in the next decade or two.
It’s okay if this is unlikely, but I’d want to change the model if I were estimating the value of e.g. trying to colonise the stars.
Request for applications
So -- what would you like to apply this method to? What answers do you get?
To help structure the comment thread, I suggest attempting only one problem in each comment. Include the value of p, and the units of R(0) and units of B that you’d like to use. Then you can give your estimates for R(0), B, and y/z as a comment reply, and so can anyone else who wants to give estimates for the same thing.
I’ve also set up a google spreadsheet where we can enter estimates for the questions people propose. For the time being anyone can edit this.
Have fun!