Implementation of desired things per unit time.
I haven't read the article so I could be full of shit, but essentially:
If you have the list of desired things ready, there should be an ETA on the work time necessary for each desired thing as well as confidence on that estimate. Confidence varies with past data and expected competence, e.g. how easily you believe you can debug the feature if you begin to draft it. Or such. Then you have a set of estimates for each implementable feature.
Then you put in time on that feature over the day tracked by some passive monitoring program like ManicTime or something...
The book opens:
The sciences have many established measurement methods, so Hubbard’s book focuses on the measurement of “business intangibles” that are important for decision-making but tricky to measure: things like management effectiveness, the “flexibility” to create new products, the risk of bankruptcy, and public image.
Basic Ideas
A measurement is an observation that quantitatively reduces uncertainty. Measurements might not yield precise, certain judgments, but they do reduce your uncertainty.
To be measured, the object of measurement must be described clearly, in terms of observables. A good way to clarify a vague object of measurement like “IT security” is to ask “What is IT security, and why do you care?” Such probing can reveal that “IT security” means things like a reduction in unauthorized intrusions and malware attacks, which the IT department cares about because these things result in lost productivity, fraud losses, and legal liabilities.
Uncertainty is the lack of certainty: the true outcome/state/value is not known.
Risk is a state of uncertainty in which some of the possibilities involve a loss.
Much pessimism about measurement comes from a lack of experience making measurements. Hubbard, who is far more experienced with measurement than his readers, says:
Applied Information Economics
Hubbard calls his method “Applied Information Economics” (AIE). It consists of 5 steps:
These steps are elaborated below.
Step 1: Define a decision problem and the relevant variables
Hubbard illustrates this step by telling the story of how he helped the Department of Veterans Affairs (VA) with a measurement problem.
The VA was considering seven proposed IT security projects. They wanted to know “which… of the proposed investments were justified and, after they were implemented, whether improvements in security justified further investment…” Hubbard asked his standard questions: “What do you mean by ‘IT security’? Why does it matter to you? What are you observing when you observe improved IT security?”
It became clear that nobody at the VA had thought about the details of what “IT security” meant to them. But after Hubbard’s probing, it became clear that by “IT security” they meant a reduction in the frequency and severity of some undesirable events: agency-wide virus attacks, unauthorized system access (external or internal), unauthorized physical access, and disasters affecting the IT infrastructure (fire, flood, etc.). And each undesirable event was on the list because of specific costs associated with it: productivity losses from virus attacks, legal liability from unauthorized system access, etc.
Now that the VA knew what they meant by “IT security,” they could measure specific variables, such as the number of virus attacks per year.
Step 2: Determine what you know
Uncertainty and calibration
The next step is to determine your level of uncertainty about the variables you want to measure. To do this, you can express a “confidence interval” (CI). A 90% CI is a range of values that is 90% likely to contain the correct value. For example, the security experts at the VA were 90% confident that each agency-wide virus attack would affect between 25,000 and 65,000 people.
Unfortunately, few people are well-calibrated estimators. For example, in some studies, the true value lay in subjects’ 90% CIs only 50% of the time! These subjects were overconfident. For a well-calibrated estimator, the true value will lie in her 90% CI roughly 90% of the time.
Luckily, “assessing uncertainty is a general skill that can be taught with a measurable improvement.”
Hubbard uses several methods to calibrate each client’s value estimators, for example the security experts at the VA who needed to estimate the frequency of security breaches and their likely costs.
His first technique is the equivalent bet test. Suppose you’re asked to give a 90% CI for the year in which Newton published the universal laws of gravitation, and you can win $1,000 in one of two ways:
If you find yourself preferring option #2, then you must think spinning the dial has a higher chance of winning you $1,000 than option #1. That suggests your stated 90% CI isn’t really your 90% CI. Maybe it’s your 65% CI or your 80% CI instead. By preferring option #2, your brain is trying to tell you that your originally stated 90% CI is overconfident.
If instead you find yourself preferring option #1, then you must think there is more than a 90% chance your stated 90% CI contains the true value. By preferring option #1, your brain is trying to tell you that your original 90% CI is underconfident.
To make a better estimate, adjust your 90% CI until option #1 and option #2 seem equally good to you. Research suggests that even pretending to bet money in this way will improve your calibration.
Hubbard’s second method for improving calibration is simply repetition and feedback. Make lots of estimates and then see how well you did. For this, play CFAR’s Calibration Game.
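The feedback half of this method is easy to automate. Here is a minimal sketch of how you might score yourself, assuming you have logged each 90% CI alongside the answer you later looked up. The records and the hit_rate helper are hypothetical, made up for illustration, not taken from Hubbard's book.

```python
# Hypothetical log of 90% CI estimates: (lower bound, upper bound, true value).
# All numbers are invented for illustration.
estimates = [
    (1600, 1700, 1687),
    (10, 20, 25),
    (100, 300, 250),
    (5, 8, 9),
    (1000, 5000, 4200),
    (0.1, 0.4, 0.35),
    (30, 60, 75),
    (2, 6, 4),
    (500, 900, 450),
    (40, 90, 88),
]


def hit_rate(records):
    """Return the fraction of intervals that contain the true value."""
    hits = sum(1 for low, high, truth in records if low <= truth <= high)
    return hits / len(records)


rate = hit_rate(estimates)
print(f"{rate:.0%} of the stated 90% CIs contained the true value")
```

A hit rate well below 90% over many estimates suggests overconfidence (intervals too narrow); a rate well above 90% suggests underconfidence. Ten estimates is a small sample, so treat the result as a rough signal only.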
Hubbard also asks people to identify reasons why a particular estimate might be right, and why it might be wrong.
He also asks people to look more closely at each bound (upper and lower) on their estimated range. A 90% CI “means there is a 5% chance the true value could be greater than the upper bound, and a 5% chance it could be less than the lower bound. This means the estimators must be 95% sure that the true value is less than the upper bound. If they are not that certain, they should increase the upper bound… A similar test is applied to the lower bound.”
Simulations
Once you determine what you know about the uncertainties involved, how can you use that information to determine what you know about the risks involved? Hubbard summarizes:
The simplest tool for measuring such risks accurately is the Monte Carlo (MC) simulation, which can be run by Excel and many other programs. To illustrate this tool, suppose you are wondering whether to lease a new machine for one step in your manufacturing process.
Your pre-calibrated estimators give their 90% CIs for the following variables:
Thus, your annual savings will equal (MS + LS + RMS) × PL.
When measuring risk, we don’t just want to know the “average” risk or benefit. We want to know the probability of a huge loss, the probability of a small loss, the probability of a huge savings, and so on. That’s what Monte Carlo can tell us.
An MC simulation uses a computer to randomly generate thousands of possible values for each variable, based on the ranges we’ve estimated. The computer then calculates the outcome (in this case, the annual savings) for each generated combination of values, and we’re able to see how often different kinds of outcomes occur.
To run an MC simulation we need not just the 90% CI for each variable but also the shape of each distribution. In many cases, the normal distribution will work just fine, and we’ll use it for all the variables in this simplified illustration. (Hubbard’s book shows you how to work with other distributions.)
To make an MC simulation of a normally distributed variable in Excel, we use this formula:
So the formula for the maintenance savings variable should be:
Suppose you enter this formula in cell A1 in Excel. To generate (say) 10,000 values for the maintenance savings variable, just (1) copy the contents of cell A1, (2) enter “A1:A10000” in the cell range field to select cells A1 through A10000, and (3) paste the formula into all those cells.
Now we can follow this process in other columns for the other variables, including a column for the “total savings” formula. To see how many rows made a total savings of $400,000 or more (break-even), use Excel’s countif function. In this case, you should find that about 14% of the scenarios resulted in a savings of less than $400,000 – a loss.
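For readers who would rather not use a spreadsheet, the same simulation can be sketched in a few lines of Python. Treat this as an illustration under stated assumptions, not Hubbard's own implementation: the four 90% CIs below are values I have assumed for the sketch, and the draw helper is a hypothetical name. The one piece of arithmetic it relies on is that a 90% CI of a normal distribution spans about 3.29 standard deviations (2 × 1.645).

```python
import random

Z90 = 3.29  # a 90% CI of a normal distribution spans ~3.29 standard deviations


def draw(low, high, rng):
    """Sample a normally distributed value whose 90% CI is (low, high)."""
    mean = (low + high) / 2
    sd = (high - low) / Z90
    return rng.gauss(mean, sd)


# ASSUMED 90% CIs, for illustration only.
CI = {
    "MS": (10, 20),          # maintenance savings, $ per unit
    "LS": (-2, 8),           # labor savings, $ per unit
    "RMS": (3, 9),           # raw materials savings, $ per unit
    "PL": (15_000, 35_000),  # production level, units per year
}
BREAK_EVEN = 400_000  # break-even annual savings, from the text
TRIALS = 10_000

rng = random.Random(0)  # fixed seed so the run is repeatable
savings = []
for _ in range(TRIALS):
    ms = draw(*CI["MS"], rng)
    ls = draw(*CI["LS"], rng)
    rms = draw(*CI["RMS"], rng)
    pl = draw(*CI["PL"], rng)
    savings.append((ms + ls + rms) * pl)  # annual savings = (MS + LS + RMS) x PL

# Equivalent of Excel's countif: how many scenarios fell short of break-even?
loss_fraction = sum(s < BREAK_EVEN for s in savings) / TRIALS
print(f"{loss_fraction:.1%} of scenarios fell below break-even")
```

With these assumed inputs the loss fraction should come out broadly similar to the 14% figure above, though the exact value depends on the inputs and the random seed. Like the spreadsheet version, this treats the four variables as independent.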
The simulation concept can (and in high-value cases should) be carried beyond this simple MC simulation. The first step is to learn how to use a greater variety of distributions in MC simulations. The second step is to deal with correlated (rather than independent) variables by generating correlated random numbers or by modeling what the variables have in common.
A more complicated step is to use a Markov simulation, in which the simulated scenario is divided into many time intervals. This is often used to model stock prices, the weather, and complex manufacturing or construction projects. Another more complicated step is to use an agent-based model, in which independently-acting agents are simulated. This method is often used for traffic simulations, in which each vehicle is modeled as an agent.
Step 3: Pick a variable, and compute the value of additional information for that variable
Information can have three kinds of value:
When you’re uncertain about a decision, this means there’s a chance you’ll make a non-optimal choice. The cost of a “wrong” decision is the difference between the wrong choice and the choice you would have made with perfect information. But it’s too costly to acquire perfect information, so instead we’d like to know which decision-relevant variables are the most valuable to measure more precisely, so we can decide which measurements to make.
Here’s a simple example:
The expected opportunity loss (EOL) for a choice is the probability of the choice being “wrong” times the cost of it being wrong. So for example the EOL if the campaign is approved is $5M × 40% = $2M, and the EOL if the campaign is rejected is $40M × 60% = $24M.
The difference between EOL before and after a measurement is called the “expected value of information” (EVI).
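The EOL arithmetic above can be written out as a small sketch. The figures come from the campaign example in the text; the variable names are mine. The last step is my own reading of the definition: if perfect information would leave no chance of a wrong choice, then EOL after the measurement is zero, so the value of perfect information equals the EOL of the choice you would otherwise make.

```python
# Figures from the advertising campaign example in the text.
p_fail = 0.40                   # chance the campaign fails
p_success = 0.60                # chance the campaign succeeds
loss_if_approved = 5_000_000    # cost of approving a campaign that fails
loss_if_rejected = 40_000_000   # cost of rejecting a campaign that would have succeeded

# EOL = probability of being wrong x cost of being wrong
eol_approve = p_fail * loss_if_approved
eol_reject = p_success * loss_if_rejected

# ASSUMPTION: without further information we pick the option with the lower EOL,
# and perfect information would reduce that EOL to zero.
evpi = min(eol_approve, eol_reject)

print(f"EOL if approved: ${eol_approve:,.0f}")
print(f"EOL if rejected: ${eol_reject:,.0f}")
print(f"Value of perfect information: ${evpi:,.0f}")
```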
In most cases, we want to compute the value of information (VoI) for a range of values rather than a binary succeed/fail. So let’s tweak the advertising campaign example and say that a calibrated marketing expert’s 90% CI for sales resulting from the campaign was from 100,000 units to 1 million units. The risk is that we don’t sell enough units from this campaign to break even.
Suppose we profit by $25 per unit sold, so we’d have to sell at least 200,000 units from the campaign to break even (on a $5M campaign). To begin, let’s calculate the expected value of perfect information (EVPI), which will provide an upper bound on how much we should spend to reduce our uncertainty about how many units will be sold as a result of the campaign. Here’s how we compute it:
Of course, we’ll do this with a computer. For the details, see Hubbard’s book and the Value of Information spreadsheet from his website.
In this case, the EVPI turns out to be about $337,000. This means that we shouldn’t spend more than $337,000 to reduce our uncertainty about how many units will be sold as a result of the campaign.
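To give a feel for where a number like this comes from, here is a rough sketch. It is not the spreadsheet from Hubbard's website. I am assuming that units sold are normally distributed with the stated 90% CI, that the default decision is to approve the campaign, and that the opportunity loss in any scenario below break-even is the shortfall in units times the profit per unit. It then sums the probability-weighted loss over thin slices of the distribution.

```python
from statistics import NormalDist

LOW, HIGH = 100_000, 1_000_000   # 90% CI for units sold (from the text)
PROFIT_PER_UNIT = 25             # dollars (from the text)
BREAK_EVEN_UNITS = 200_000       # from the text

# ASSUMPTION: units sold follow a normal distribution matching the 90% CI.
mean = (LOW + HIGH) / 2
sd = (HIGH - LOW) / 3.29
dist = NormalDist(mean, sd)

# Slice the region below break-even into thin segments and add up
# (loss in that segment) x (probability of landing in that segment).
STEPS = 2_000
start = mean - 6 * sd            # far enough into the tail to be negligible
width = (BREAK_EVEN_UNITS - start) / STEPS
evpi = 0.0
for i in range(STEPS):
    units = start + (i + 0.5) * width                    # midpoint of the slice
    loss = (BREAK_EVEN_UNITS - units) * PROFIT_PER_UNIT  # opportunity loss if approved
    evpi += loss * dist.pdf(units) * width

print(f"Approximate EVPI: ${evpi:,.0f}")
```

Under these assumptions the result is of the same order as the $337,000 figure, but it need not match exactly: the distribution shape and the treatment of the lower tail (a normal distribution allows negative sales) are choices I made for the sketch.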
And in fact, we should probably spend much less than $337,000, because no measurement we make will give us perfect information. For more details on how to measure the value of imperfect information, see Hubbard’s book and these three LessWrong posts: (1) VoI: 8 Examples, (2) VoI: Four Examples, and (3) 5-second level case study: VoI.
I do, however, want to quote Hubbard’s comments about the “measurement inversion”:
Hubbard calls this the “Measurement Inversion”:
Here is one example:
Hence the importance of calculating EVI.
Step 4: Apply the relevant measurement instrument(s) to the high-information-value variable
If you followed the first three steps, then you’ve defined a variable you want to measure in terms of the decision it affects and how you observe it, you’ve quantified your uncertainty about it, and you’ve calculated the value of gaining additional information about it. Now it’s time to reduce your uncertainty about the variable – that is, to measure it.
Each scientific discipline has its own specialized measurement methods. Hubbard’s book describes measurement methods that are often useful for reducing our uncertainty about the “softer” topics often encountered by decision-makers in business.
Selecting a measurement method
To figure out which category of measurement methods is appropriate for a particular case, we must ask several questions:
Decomposition
Sometimes you’ll want to start by decomposing an uncertain variable into several parts to identify which observables you can most easily measure. For example, rather than directly estimating the cost of a large construction project, you could break it into parts and estimate the cost of each part of the project.
In Hubbard’s experience, decomposition itself – even without making any new measurements – often reduces one’s uncertainty about the variable of interest.
Secondary research
Don’t reinvent the world. In almost all cases, someone has already invented the measurement tool you need, and you just need to find it. Here are Hubbard’s tips on secondary research:
I’d also recommend my post Scholarship: How to Do It Efficiently.
Observation
If you’re not sure how to measure your target variable’s observables, ask these questions:
Measure just enough
Because initial measurements often tell you quite a lot, and also change the value of continued measurement, Hubbard often aims for spending 10% of the EVPI on a measurement, and sometimes as little as 2% (especially for very large projects).
Consider the error
It’s important to be conscious of some common ways in which measurements can mislead.
Scientists distinguish two types of measurement error: systemic and random. Random errors are random variations from one observation to the next. They can’t be individually predicted, but they fall into patterns that can be accounted for with the laws of probability. Systemic errors, in contrast, are consistent. For example, the sales staff may routinely overestimate the next quarter’s revenue by 50% (on average).
We must also distinguish precision and accuracy. A “precise” measurement tool has low random error. E.g. if a bathroom scale gives the exact same displayed weight every time we set a particular book on it, then the scale has high precision. An “accurate” measurement tool has low systemic error. The bathroom scale, while precise, might be inaccurate if the weight displayed is systemically biased in one direction – say, eight pounds too heavy. A measurement tool can also have low precision but good accuracy, if it gives inconsistent measurements but they average to the true value.
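The distinction is easy to see with numbers. In this small sketch, the readings and the true weight are hypothetical values I made up: scale A is precise but inaccurate, and scale B is accurate but imprecise. The mean error stands in for systemic error, and the standard deviation of the readings stands in for random error.

```python
from statistics import mean, stdev

TRUE_WEIGHT = 150.0  # hypothetical true weight, in pounds

# Hypothetical repeated readings of the same object on two scales.
scale_a = [158.0, 158.1, 157.9, 158.0, 158.0]  # consistent, but about 8 lb too heavy
scale_b = [143.0, 156.5, 151.0, 147.5, 152.0]  # scattered, but centered on the truth


def describe(readings, truth):
    """Return (systemic error, random error) for a set of readings."""
    bias = mean(readings) - truth   # systemic error: consistent offset from the truth
    spread = stdev(readings)        # random error: variation between readings
    return bias, spread


bias_a, spread_a = describe(scale_a, TRUE_WEIGHT)
bias_b, spread_b = describe(scale_b, TRUE_WEIGHT)
print(f"Scale A: bias {bias_a:+.1f} lb, spread {spread_a:.2f} lb")
print(f"Scale B: bias {bias_b:+.1f} lb, spread {spread_b:.2f} lb")
```

Averaging more readings shrinks the effect of scale B's random error, but no amount of averaging removes scale A's bias.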
Random error tends to be easier to handle. Consider this example:
Systemic error is also called a “bias.” Based on his experience, Hubbard suspects the three most important biases to avoid are:
Choose and design the measurement instrument
After following the above steps, Hubbard writes, “the measurement instrument should be almost completely formed in your mind.” But if you still can’t come up with a way to measure the target variable, here are some additional tips:
Sampling reality
In most cases, we’ll estimate the values in a population by measuring the values in a small sample from that population. And for reasons discussed in chapter 7, a very small sample can often offer large reductions in uncertainty.
There are a variety of tools we can use to build our estimates from small samples, and which one we should use often depends on how outliers are distributed in the population. In some cases, outliers are very close to the mean, and thus our estimate of the mean can converge quickly on the true mean as we look at new samples. In other cases, outliers can be several orders of magnitude away from the mean, and our estimate converges very slowly or not at all. Here are some examples:
Below, I survey just a few of the many sampling methods Hubbard covers in his book.
Mathless estimation
When working with a quickly converging phenomenon and a symmetric distribution (uniform, normal, camel-back, or bow-tie) for the population, you can use the t-statistic to develop a 90% CI even when working with very small samples. (See the book for instructions.)
Or, even easier, make use of the Rule of Five: “There is a 93.75% chance that the median of a population is between the smallest and largest values in any random sample of five from that population.”
The Rule of Five has another advantage over the t-statistic: it works for any distribution of values in the population, including ones with slow convergence or no convergence at all! It can do this because it gives us a confidence interval for the median rather than the mean, and it’s the mean that is far more affected by outliers.
Hubbard calls this a “mathless” estimation technique because it doesn’t require us to take square roots or calculate standard deviation or anything like that. Moreover, this mathless technique extends beyond the Rule of Five: If we sample 8 items, there is a 99.2% chance that the median of the population falls within the largest and smallest values. If we take the 2nd largest and smallest values (out of 8 total values), we get something close to a 90% CI for the median. Hubbard generalizes the tool with this handy reference table:
And if the distribution is symmetrical, then the mathless table gives us a 90% CI for the mean as well as for the median.
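The percentages behind the Rule of Five and its extensions can be checked with a short calculation. This is my own sketch of the reasoning, not code from the book. It assumes the samples are random and independent and that the population values are continuous, so each sample has a 50% chance of falling below the median. The median_coverage helper is a hypothetical name.

```python
from math import comb


def median_coverage(n, k=1):
    """Chance that the population median lies between the k-th smallest
    and k-th largest of n random samples (requires 2 * k <= n)."""
    # The interval misses the median only if fewer than k samples fall below it,
    # or fewer than k fall above it. By symmetry the two cases are equally likely.
    one_tail = sum(comb(n, i) for i in range(k)) / 2 ** n
    return 1 - 2 * one_tail


print(f"n=5, smallest to largest:         {median_coverage(5):.2%}")
print(f"n=8, smallest to largest:         {median_coverage(8):.2%}")
print(f"n=8, 2nd smallest to 2nd largest: {median_coverage(8, 2):.2%}")
```

For five samples, the only ways to miss are all five landing above the median or all five landing below it, each with probability 1/32, which gives 1 − 2/32 = 93.75%.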
Catch-recatch
How does a biologist measure the number of fish in a lake? She catches and tags a sample of fish – say, 1000 of them – and then releases them. After the fish have had time to spread amongst the rest of the population, she’ll catch another sample of fish. Suppose she caught 1000 fish again, and 50 of them were tagged. This would mean 5% of the fish were tagged, and thus that there were about 20,000 fish in the entire lake. (See Hubbard’s book for the details on how to calculate the 90% CI.)
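The point estimate in the fish example is a one-line calculation, sketched here. The function name is hypothetical. This simple ratio is commonly known as the Lincoln-Petersen estimator; that label is mine, not one used in the text. The sketch covers only the point estimate, not the 90% CI.

```python
def estimate_population(tagged_first, caught_second, tagged_in_second):
    """Estimate the population size from a tag-and-recapture sample.

    Assumes tagged fish have mixed evenly with the rest of the population, so
    the tagged share of the second catch matches the tagged share of the lake.
    """
    if tagged_in_second == 0:
        raise ValueError("no tagged fish recaptured; the estimate is undefined")
    return tagged_first * caught_second / tagged_in_second


fish = estimate_population(tagged_first=1000, caught_second=1000, tagged_in_second=50)
print(f"Estimated fish in the lake: {fish:,.0f}")
```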
Spot sampling
The fish example was a special case of a common problem: population proportion sampling. Often, we want to know what proportion of a population has a particular trait. How many registered voters in California are Democrats? What percentage of your customers prefer a new product design over the old one?
Hubbard’s book discusses how to solve the general problem, but for now let’s just consider another special case: spot sampling.
In spot sampling, you take random snapshots of things rather than tracking them constantly. What proportion of their work hours do employees spend on Facebook? To answer this, you “randomly sample people through the day to see what they were doing at that moment. If you find that in 12 instances out of 100 random samples” employees were on Facebook, you can guess they spend about 12% of their time on Facebook (the 90% CI is 8% to 18%).
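The text does not say how the 8% to 18% interval was derived. One way to get an interval of this kind, sketched below under my own assumptions, is to start from a uniform prior over the true proportion, which makes the posterior a Beta distribution, and read off its 5th and 95th percentiles by simulation. The standard library has no Beta quantile function, so this sketch draws many samples instead.

```python
import random

HITS, SAMPLES = 12, 100  # 12 of 100 random snapshots showed Facebook (from the text)

rng = random.Random(0)   # fixed seed so the run is repeatable

# ASSUMPTION: uniform prior on the true proportion, so the posterior is
# Beta(hits + 1, misses + 1). This is my choice, not necessarily the book's method.
draws = sorted(
    rng.betavariate(HITS + 1, SAMPLES - HITS + 1) for _ in range(100_000)
)

lower = draws[int(0.05 * len(draws))]  # 5th percentile
upper = draws[int(0.95 * len(draws))]  # 95th percentile
print(f"Approximate 90% CI for the proportion: {lower:.1%} to {upper:.1%}")
```

The result should land in the same general range as the interval quoted above. Small differences are expected, since the prior and the simulation are my assumptions.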
Clustered sampling
Hubbard writes:
Measure to the threshold
For many decisions, one decision is required if a value is above some threshold, and another decision is required if that value is below the threshold. For such decisions, you don’t care as much about a measurement that reduces uncertainty in general as you do about a measurement that tells you which decision to make based on the threshold. Hubbard gives an example:
Hubbard shows how to derive the real chance in his book. The key point is that “the uncertainty about the threshold can fall much faster than the uncertainty about the quantity in general.”
Regression modeling
What if you want to figure out the cause of something that has many possible causes? One method is to perform a controlled experiment, and compare the outcomes of a test group to a control group. Hubbard discusses this in his book (and yes, he’s a Bayesian, and a skeptic of p-value hypothesis testing). For this summary, I’ll instead mention another method for isolating causes: regression modeling. Hubbard explains:
Hubbard’s book explains the basics of linear regressions, and of course gives the caveat that correlation does not imply causation. But, he writes, “you should conclude that one thing causes another only if you have some other good reason besides the correlation itself to suspect a cause-and-effect relationship.”
Bayes
Hubbard’s 10th chapter opens with a tutorial on Bayes’ Theorem. For an online tutorial, see here.
Hubbard then zooms out to a big-picture view of measurement, and recommends the “instinctive Bayesian approach”:
Hubbard says a few things in support of this approach. First, he points to some studies (e.g. El-Gamal & Grether (1995)) showing that people often reason in roughly-Bayesian ways. Next, he says that in his experience, people become better intuitive Bayesians when they (1) are made aware of the base rate fallacy, and when they (2) are better calibrated.
Hubbard says that once these conditions are met,
He also offers a chart showing how a pure Bayesian estimator compares to other estimators:
Also, Bayes’ Theorem allows us to perform a “Bayesian inversion”:
Other methods
Other chapters discuss other measurement methods, for example prediction markets, Rasch models, methods for measuring preferences and happiness, methods for improving the subjective judgments of experts, and many others.
Step 5: Make a decision and act on it
The last step will make more sense if we first “bring the pieces together.” Hubbard now organizes his consulting work with a firm into 3 phases (preceded by a preparatory Phase 0), so let’s review what we’ve learned in the context of these phases.
Phase 0: Project Preparation
Phase 1: Decision Modeling
Phase 2: Optimal measurements
Phase 3: Decision optimization and the final recommendation
Final thoughts
Hubbard’s book includes two case studies in which Hubbard describes how he led two fairly different clients (the EPA and U.S. Marine Corps) through each phase of the AIE process. Then, he closes the book with the following summary: