Value of Information (VoI) is a concept from decision analysis: how much answering a question allows a decision-maker to improve its decision. Like opportunity cost, it's easy to define but often hard to internalize; and so instead of belaboring the definition let's look at some examples.
Gambling with Biased Coins
Normal coins are approximately fair.[1] Suppose you and your friend want to gamble, and fair coins are boring, so he takes out a quarter and some gum and sticks the gum to the face of the quarter near the edge. He then offers to pay you $24 if the coin lands gum down, so long as you pay him $12 to play the game. Should you take that bet?
First, let's assume risk neutrality for the amount of money you're wagering. Your expected profit is 24p - 12 dollars, where p is the probability the coin lands gum down. This is a good deal if p > .5, but a bad deal if p < .5. So... what's p? More importantly, how much should you pay to figure out p?
A Bayesian reasoner looking at this problem first tries to put a prior on p. An easy choice is a uniform distribution between 0 and 1, but there are a lot of reasons to be uncomfortable with that distribution. The gum might make the coin more likely to land gum down- but it might instead make it more likely to land gum up. The gum might not skew the results very much- or it might skew them massively. You could choose a different prior, but you'd have trouble justifying it, because you don't have any solid evidence to update on yet.[2]
With a uniform prior and no additional evidence, the deal as offered is exactly neutral. But before you choose to accept or reject, your friend offers you another deal- he'll flip the coin once and let you see the result before you decide whether to take the $12 deal, but you can't win anything on this first flip. How much should you pay to see one flip?
Start by modeling yourself after you see one flip. It'll either land gum down or gum up, and you'll update and produce a posterior for each case. In the first case, your posterior on p is P(p) = 2p; in the second, P(p) = 2 - 2p. Your expected profit for playing in the first case is $4;[3] your expected profit for playing in the second case is negative $4. You think there's a half chance it'll land gum side down and a half chance it'll land gum side up, and if it lands gum side up you can choose not to play. There's a half chance you get $4 from seeing the flip, and a half chance you get nothing (because you don't play), and so $2 is the VoI of seeing one flip of the biased coin, given your original prior.
Notice that, even though it'd be impossible to figure out the 'true' chance that the coin will land gum down, you can model how much it would be worth to you to figure that out. If I were able to tell you p directly, then you could choose to gamble only when p > .5, and you would earn an average of $3.[4] One coin flip gives you two thirds of the value that perfect information would give you.
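To make the arithmetic concrete, here's a minimal numerical sketch- my own illustration, not part of the original problem- that recovers both figures, the $2 VoI of one flip and the $3 value of perfect information, by integrating over the uniform prior:

```python
import numpy as np

# Grid over p, the probability the coin lands gum down; the prior is uniform.
p = np.linspace(0, 1, 100_001)
dx = p[1] - p[0]

def integrate(values):
    """Riemann-sum approximation of an integral over [0, 1]."""
    return float(np.sum(values) * dx)

def ev_play(density):
    """Expected profit of the $24-for-$12 bet under a density over p."""
    return integrate((24 * p - 12) * density)

# No information: the deal is neutral, so playing and declining are both worth 0.
ev_no_info = max(ev_play(np.ones_like(p)), 0)

# One observed flip: posterior 2p (gum down) or 2 - 2p (gum up), each seen
# with probability 1/2; decline the bet whenever its expected value is negative.
ev_one_flip = 0.5 * max(ev_play(2 * p), 0) + 0.5 * max(ev_play(2 - 2 * p), 0)
print(round(ev_one_flip - ev_no_info, 2))  # 2.0: VoI of seeing one flip

# Perfect information: told p exactly, you play only when 24p - 12 > 0.
ev_perfect = integrate(np.maximum(24 * p - 12, 0))
print(round(ev_perfect - ev_no_info, 2))   # 3.0: value of perfect information
```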
Also notice that you need to be able to change your decision to get any value out of more information. Suppose that, instead of letting you choose whether or not to gamble after seeing a flip, your friend made you decide up front, then flipped the coin twice and settled the bet on the second flip. The coin is flipped the same number of times, but you're worse off, because you had to decide with less information.
It's also worth noting that multimodal distributions- where there are strong clusters rather than smooth landscapes- tend to have higher VoI. If we knew the biased coin either always lands gum down or always lands gum up, and thought each case equally likely, then seeing one flip is worth $6, because it's a half chance of a guaranteed $12.
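The same comparison for the two-point prior, again as an illustrative sketch:

```python
# Two-point prior: the coin always lands gum down (p = 1) or always lands
# gum up (p = 0), each with probability 1/2. One flip reveals which coin it is.
ev_no_info = max(0.5 * 12 + 0.5 * (-12), 0)  # 0: indifferent without information
ev_one_flip = 0.5 * 12 + 0.5 * 0             # bet only after seeing gum down
print(ev_one_flip - ev_no_info)              # 6.0: VoI of seeing one flip
```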
Choosing Where to Invest
Here's an example I came across in my research:
Kleinmuntz and Willis were trying to determine the value of doing detailed anti-terrorism assessments in the state of California for the Department of Homeland Security. There are hundreds of critical infrastructure sites across the state, and it's simply not possible to do a detailed analysis of each site. There are terrorism experts, though, who can quickly provide an estimate of the risk to various sites.
They gave a carefully designed survey to those experts, asking them to rate the relative probability that a site would be attacked (conditioned on an attack occurring) and the probability that an attack would succeed on a scale from 0 to 10, and the scale of fatalities and economic loss on a logarithmic scale from 0 to 7. The experts were comfortable with the survey[5] and able to give meaningful answers.
Now Kleinmuntz and Willis were able to take the elicited vulnerability estimates and come up with an estimated score for each facility. This estimated score gave them a prior over detailed scores for each site- if the experts all agreed that a site was a (0, 1, 2, 3), that still implies a range over actual values: the economic loss resulting from a successful attack (the 3) could be anywhere from $100 million to $1 billion. (Notice that having a panel of experts gave them a natural way to determine the spread of the prior beyond the range inherent in the answers- where the experts agreed, they could clump the probability mass together, with only a little on answers the experts didn't give, and where the experts disagreed they knew to spread the probability out.) They already had, from another source, data on the effectiveness of the risk reductions available at the various sites and the costs of those reductions.
The highest actual consequence elicited was $6 billion, assuming a value of $6 million per life. The highest VoI of getting a detailed site analysis, though, was only $1.1 million. From the definition, this shouldn't be that surprising- VoI is only large when the answer might surprise you and the decision hinges on it. For some sites, it was obvious that DHS should invest in reducing risk; for others, it was obvious that DHS shouldn't. The detailed vulnerability analysis would just tell them what they already knew, and so wouldn't provide any value. Some sites were on the edge- it might be worthwhile to reduce risk, it might not. For those sites, a detailed vulnerability analysis would provide value- but because the site was on the edge, the expected value of learning more was necessarily small![6] Remember, for VoI to be positive you have to change your decision, and if that doesn't happen there's no VoI.
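To see why the edge sites have the highest VoI and yet that VoI stays small, here's a toy model of my own- not Kleinmuntz and Willis's actual analysis: give the net value of investing at a site a normal prior and compute the value of perfect information about it.

```python
import math

def evpi(mu, sigma):
    """Value of perfect information when the net value of investing
    (benefit minus cost) is Normal(mu, sigma^2) and not investing is
    worth 0: EVPI = E[max(v, 0)] - max(E[v], 0)."""
    z = mu / sigma
    phi = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)  # normal pdf at z
    Phi = 0.5 * (1 + math.erf(z / math.sqrt(2)))           # normal cdf at z
    return mu * Phi + sigma * phi - max(mu, 0.0)

sigma = 2.0                      # prior spread, in $millions (invented)
for mu in [-10, -2, 0, 2, 10]:   # prior mean net value, in $millions
    print(f"mu = {mu:4}: EVPI = {evpi(mu, sigma):.3f}")
# Obvious sites (|mu| large) have EVPI near 0; the edge (mu = 0) has the
# most, but only about 0.4 * sigma- the spread of the prior, not the scale
# of the consequences, caps what learning can be worth.
```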
Distressingly, they went on to consider the case where risk reduction could not be performed without a detailed vulnerability analysis. Then, rather than measuring VoI, they were mostly measuring the value of risk reduction- and the maximum value shot up to $840 million. When Bayesian evidence is good enough, requiring legal evidence can be costly.[7]
Medical Testing
About two years ago, I was sitting at my computer and noticed a black dot on my upper arm. I idly scratched it, and then saw its little legs move.
It was a tick engorged on my blood, which I had probably picked up walking through the woods earlier. I removed it, then looked up the proper way to remove it online. (That's the wrong order, by the way: you need the information before you make your decision for it to be of any use. I didn't do it the proper way, and thus increased my risk of disease transmission.)
Some ticks carry Lyme disease, and so I looked into getting tested. I was surprised to learn that if I didn't present any symptoms within 30 days, the recommendation was against testing. After a moment's reflection, this made sense- tests have false positive rates, and without symptoms the prior probability of infection is low enough that even after a positive result, the expected value of skipping treatment could be higher than the expected value of treating. In that case, the VoI of the test would be 0- regardless of its outcome, I would have made the same decision. If I saw symptoms, though, then the test would be worthwhile, as it could distinguish Lyme disease from an unrelated rash, headache, or fever. "Waiting for symptoms to appear" was the test with positive VoI, not getting a blood test right away.
One could argue that the blood test could have "peace of mind" value, but that's distinct from VoI. Even beyond that, it's not clear that you would get positive peace of mind on net. Suppose the test has a 2% false positive rate- what happens when you weight the peace of mind from a true negative by .98 and the cost of dealing with a false positive by .02? The sum could easily be negative.
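A toy version of that calculation, with all the numbers invented for illustration:

```python
p_false_positive = 0.02
relief_true_negative = 1.0   # peace of mind from a clean result
cost_false_positive = 60.0   # worry, retests, unnecessary treatment

net_peace_of_mind = ((1 - p_false_positive) * relief_true_negative
                     - p_false_positive * cost_false_positive)
print(net_peace_of_mind)     # -0.22: negative on net
```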
(I remain symptom-free; either the tick didn't have Lyme disease, didn't transfer it to me, or my immune system managed to destroy it.)
Choosing a Career
Many careers have significant prerequisites: if you want to be a doctor, you're going to have to go to medical school. People often have to choose where to invest their time with limited knowledge- you can't know what the career prospects will be like when you graduate, how much you'll enjoy your chosen field, and so on. Many people just choose based on accumulated experience- lawyers were high-status and rich before, so they suspect becoming a lawyer now is a good idea.[8]
Reducing that uncertainty can help you make a better decision, and VoI helps decide what ways to reduce uncertainty are effective. But this example also helps show the limits of VoI: VoI is best suited to situations where you've done the background research and are now considering further experiments. With the biased coin, we started off with a uniform prior; with the defensive investments, we started off with estimated risks. Do we have a comparable springboard for careers?
If we do, it'll take some building. There are a lot of different value functions we could build- any of them probably ought to include stress, income (both starting and lifetime)[9], risk of unemployment, satisfaction, and status. It's not clear how to elicit weights on those, though. There's research on what makes people in general happy, but you might be uncomfortable just using those weights.[10]
There are also hundreds, if not thousands, of career options available. Prior distributions on income are easy to find, but stress is harder to determine. Unemployment risk is hard to predict over a lifetime, especially as it depends on macroeconomic trends. (The BLS predicts employment numbers out 10 years from data that's a few years old. It seems unlikely that they're set up to see crashes coming, though.)
Satisfaction is probably the easiest place to start: there are lots of career aptitude tests out there that can take self-reported personality factors and turn that into a list of careers you might be well-suited for. Now you have a manageable decision problem- probably somewhere between six and twenty options to research in depth.
What does that look like from a VoI framework? You've done a first screening, which has identified places where more information might alter your decision. If you faint at the sight of blood, it doesn't matter how much surgeons make, and so any time spent looking that up is wasted. If you do a quick scoring of the six value components I listed above (after brainstorming for other things relevant to you), just weighting the options with those quick scores may give you good preliminary results, as in the sketch below. Only once you know which comparisons are relevant- "what tradeoff between status and unemployment risk am I willing to make?"- would you spend a long time nailing down your weights.
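A minimal sketch of that quick-scoring pass, with the careers, scores, and weights all invented for illustration (every component is scored so that higher is better, so "stress" here means low stress):

```python
# Rough 0-10 scores per career on each value component; all numbers invented.
careers = {
    #           stress start$ life$ unemp satisf status
    "surgeon":  (2,    9,     9,    9,    6,     9),
    "actuary":  (7,    7,     7,    8,    5,     5),
    "teacher":  (4,    3,     4,    7,    8,     5),
}
weights = (0.15, 0.10, 0.20, 0.20, 0.25, 0.10)  # quick first-pass weights

def score(attrs):
    return sum(w * a for w, a in zip(weights, attrs))

for name, attrs in sorted(careers.items(), key=lambda kv: -score(kv[1])):
    print(f"{name:8s} {score(attrs):.2f}")
# Large gaps mean more research is unlikely to change the ranking (low VoI);
# near-ties are where nailing down weights and estimates actually pays off.
```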
This is also a decision problem that could take a long, long time. (Even after you've selected a career, the option to switch is always present.) It can be useful to keep upper and lower bounds for your estimates and update those along with your estimates- their current values and their changes with the last few pieces of information you found can give you an idea of how much you can expect to get from more research, and so you can finish researching and make a decision at a carefully chosen time, rather than when you get fatigued.
Conclusion
Let's take another look at the definition: how much answering a question allows a decision-maker to improve its decision.
The "answering" is important because we need to consider all possible answers.11 We're replacing one random variable with two random variables- in the case of the biased coin, it replaced one unknown coin (one flip) with either the lucky coin and the unlucky coin (two flips- one to figure out which coin, one to bet on). When computing VoI, you can't just consider one possible answer, but all possible answers considering their relative likelihood.12
The "improve" is important because VoI isn't about sleeping better at night or covering your ass. If you don't expect to change your decision after receiving this information, or you think that the expected value of the information (the chance you change your decision times the relative value of the decisions) is lower than the cost of the information, just bite the bullet and don't run the test you were considering.
The "decision" is important because this isn't just curiosity. Learning facts is often fun, but for it to fit into VoI some decision has to depend on that fact. When watching televised poker, you know what all the hands are- and while that may alter your enjoyment of the hand, it won't affect how any of the players play. You shouldn't pay much for that information, but the players would pay quite a bit for it.13
1. Persi Diaconis predicts most human coin flips are fair to 2 decimals but not 3, and it's possible through training to bias coins you flip. With a machine, you can be precise enough to get the coin to come up the same way every time.
2. There is one thing that isn't coin-related: your friend is offering you this gamble, and probably has information you don't. That suggests the deal favors him- but suppose that you and your friend just thought this up, and so neither of you has more information than the other.
3. Your profit is 24p - 12; your posterior on p is P(p) = 2p, and so your expected profit is the integral of (24p - 12)·2p = 48p² - 24p from 0 to 1, which is 16 - 12 = 4.
4. Again, your profit is 24p - 12; you have a uniform distribution on what I will tell you about p, but you only care about the section where p > .5, since elsewhere you decline to play. Integrating 24p - 12 from .5 to 1 gives [12p² - 12p] from .5 to 1 = 0 - (-3) = 3.
5. Whenever eliciting information from experts, make sure to repeat back to them what you heard and ensure that they agree with it. You might know decision theory, but the reason you're talking to experts is because they know things you don't. Consistency can take a few iterations, and that's to be expected.
6. A common trope in decision analysis is "if a decision is hard, flip a coin." Most people balk at this because it seems arbitrary (and, more importantly, hard to justify to others)- but if a decision is hard, that typically means both options are roughly equally valuable, and so the loss from the coin flip coming up the wrong value is necessarily small.
7. That said, recommendations for policy-makers are hard to make here. Legal evidence is designed to be hard to game; Bayesian evidence isn't, and so Bayesian evidence is only "good enough" if it's not being gamed. Checking your heuristic (i.e., the experts' estimates) to keep it honest can provide significant value. Performing detailed vulnerability analysis on some (how many?) randomly chosen sites for calibration is often a good choice. Beyond that, I can't do much besides point you to psychology to figure out good ways to diagnose and reduce bias.
8. It doesn't appear that this is the case anymore. The supply of lawyers has dramatically increased, and so wages are declining; as well, law is a pretty soul-crushing field from a stress, work-life balance, and satisfaction perspective. If law looks like the best field for you and you're not in it for the money or status, the advice I hear is to specialize in a niche field that'll put food on the table but stay interesting and tolerably demanding.
9. Both of these capture different information. A job with a high starting salary but no growth prospects might translate into more happiness than a job with a low starting salary but high growth prospects, for example.
10. Most of the happiness/satisfaction literature I've seen has asked people about their attributes and their happiness/satisfaction. That's not a randomized trial, though, and so there could be massive selection effects. If we find that engineers are collectively less happy than waiters, does that mean engineering causes unhappiness, unhappiness causes engineering, that unhappiness and engineering are caused by the same thing, or none of those?
11. Compare this with information theory, where bits are a property of answers, not questions. Here, VoI is a property of questions, not answers.
12. If you already know the cost of the information, then you can stop computing as soon as you find a positive outcome good enough and likely enough that the VoI so far is higher than the cost.
13. In high-stakes poker games, the VoI can get rather high, and the deceit / reading involved is why poker is a more interesting game than, say, the lottery.
Background: lukeprog wrote this post about articles he wouldn't have the time to write, and the first one on the list was something I was confident about, and so I decided to write a post on it. (As a grad student in operations research, practical decision theory is what I spend most of my time thinking about.)
Amusingly enough, I had the most trouble working in his 'classic example.' Decision analysis tends to hinge on Bayesian assumptions often referred to as "small world"- that is, your model is complete and unbiased (if you knew there was a bias in your model, you'd incorporate it into your model and the result would be unbiased!). Choosing a career is more of a search problem, though- specifying what options you have is probably more difficult than picking from them. You can still use the VoI concept- but mostly for deciding when to stop accumulating new information. Before you've done your first research, you can't predict the results of your research very well, and so it's rather hard to put a number on how valuable looking into potential careers is.
There seems to be a lot of interest in abstract decision theory, but is there interest in more practical decision analysis? That's the sort of thing I suspect I could write a useful primer on, whereas I find it hard to care about, say, Sleeping Beauty.
The start of my decision analysis sequence is here.