Learning to get things right first time
These are quick notes on an idea for an indirect strategy to increase the likelihood of society acquiring robustly safe and beneficial AI.
Motivation:

- We can approach most challenges with trial and error, so many of our habits and social structures are set up to encourage this. There are some challenges where we may not get this opportunity, and it could be very helpful to know what methods help you to tackle a complex challenge that you need to get right first time.
- Giving an artificial intelligence good values may be a particularly important challenge, and one where we need to be correct first time. (This is distinct from creating systems that act intelligently at all, which can be done by trial and error.)
- Building stronger societal knowledge about how to approach such problems may make us more robustly prepared for such challenges. Having more programmers in the AI field familiar with the techniques is likely to be particularly important.
Idea: Develop methods for training people to write code without bugs.

- This is trying to teach the skill of getting things right first time.
- Writing or editing code that has to be bug-free without any testing is a fairly easy challenge to set up, and has several of the right kind of properties. There are some parallels between value specification and programming.
- The set-up puts people in scenarios where they only get one chance -- no opportunity to test part or all of the code, just close analysis before submitting.
- I'm interested in personal habits as well as social norms or procedures that help with this.
- Daniel Dewey points to the standards for code on the space shuttle as a good example of achieving highly reliable code edits.
How to implement:

- Ideal: Offer this training to staff at software companies, for profit.
  - Although it's teaching a skill under artificial hardship, it seems plausible that it could teach enough good habits and lines of thinking to noticeably increase productivity, so people would be willing to pay for this.
  - Because such training could create social value in the short run, this might give a good opportunity to launch as a business that is simultaneously doing valuable direct work.
  - Similarly, there might be a market for a consultancy that helped organisations to get general tasks right the first time, if we knew how to teach that skill.
- More funding-intensive, less labour-intensive: run competitions with cash prizes.
  - Try to establish it as something like a competitive sport for teams.
  - Outsource the work of determining good methods to the contestants.
This is all quite preliminary and I'd love to get more thoughts on it. I offer up this idea because I think it would be valuable but isn't my comparative advantage to pursue. If anyone is interested in a project in this direction, I'm very happy to talk about it.
Counterfactual trade
Counterfactual trade is a form of acausal trade between counterfactual agents. Compared to most acausal trade, it is more practical to engage in with limited computational and predictive powers. In Section 1 I'll argue that some human behaviour is at least interpretable as counterfactual trade, and explain how it could give rise to phenomena such as different moral circles. In Section 2 I'll engage in wild speculation about whether you could bootstrap something in the vicinity of moral realism from this.
Epistemic status: these are rough notes on an idea that seems kind of promising but that I haven’t thoroughly explored. I don’t think my comparative advantage is in exploring it further, but I do think some people here may have interesting things to say about it, which is why I’m quickly writing this up. I expect at least part of it has issues, and it may be that it’s handicapped by my lack of deep familiarity with the philosophical literature, but perhaps there’s something useful in here too. The whole thing is predicated on the idea of acausal trade basically working.
0. Set-up
Acausal trade is trade between two agents that are not causally connected. In order for this to work they have to be able to predict the other’s existence and how they might act. This seems really hard in general, which inhibits the amount of this trade that happens.
If we had easier ways to make these predictions we’d expect to see more acausal trade. In fact I think counterfactuals give us such a method.
Suppose agents A and B are in scenario X, and A can see a salient counterfactual scenario Y containing agents A’ and B’ (where A is very similar to A’ and B is very similar to B’). Suppose also that from the perspective of B’ in scenario Y, X is a salient counterfactual scenario. Then A and B’ can engage in acausal trade (so long as A cares about A’ and B’ cares about B). Let’s call such trade counterfactual trade.
Agents might engage in counterfactual trade either because they genuinely care about the agents in the counterfactuals (which seems plausible at least given some beliefs about a large multiverse), or because it's instrumentally useful as a tractable decision rule that approximates what they'd ideally like to do better than similarly tractable alternatives.
1. Observed counterfactual trade
In fact, some moral principles could arise from counterfactual trade. The rule that you should treat others as you would like to be treated is essentially what you’d expect to get by trading with the counterfactual in which your positions are reversed. Note I’m not claiming that this is the reason people have this rule, but that it could be. I don’t know whether the distinction is important.
It could also explain the fact that people have lessening feelings of obligation to people in widening circles around them. The counterfactual in which your position is swapped with that of someone else in your community is more salient than the counterfactual in which your position is swapped with someone from a very different community -- and you expect it to be more salient to their counterpart in the counterfactual, too. This means that you have a higher degree of confidence in the trade occurring properly with people in close counterfactuals, hence more reason to help them for selfish reasons.
Social shifts can change the salience of different counterfactuals and hence change the degree of counterfactual trade we should expect. (There is something like a testable prediction in this direction, of the theory that humans engage in counterfactual trade! But I haven’t worked through the details enough to get to that test.)
2. Towards moral realism?
Now I will get even more speculative. As people engage in more counterfactual trade, their interests align more closely. If we are willing to engage with a very large set of counterfactual people, then our interests could converge to some kind of average of the interests of these people. This could provide a mechanism for convergent morality.
This would bear some similarities to moral contractualism with a veil of ignorance. There seem to be some differences, though. We’d expect to weigh the interests of others only to the extent to which they too engage (or counterfactually engage?) in counterfactual trade.
It also has some similarities to preference utilitarianism, but again with some distinctions: we would care less about satisfying the preferences of agents who cannot or would not engage in such trade (except insofar as our trade partners may care about the preferences of such agents). We would also care more about the preferences of agents who could have more power to affect the world. Note that this sense of “care less” is as-we-act. If we start out for example with a utilitarian position before engaging in counterfactual trade, then although we will end up putting less effort into helping those who will not trade than before, this will be compensated by the fact that our counterfactual trade partners will put more effort into that.
If this works, I'm not sure whether the result is something you'd want to call moral realism or not. It would be a morality that many agents would converge to, but it would be 'real' only in the sense that it was a weighted average of so many agents that individual agents could only shift it infinitesimally.
Neutral hours: a tool for valuing time
Prioritisation is mostly about working out how to trade different resources off against one another. Prioritisation problems come at different scales: for individuals, for companies or organisations, for the world at large. At the Global Priorities Project we’re mostly interested in the large-scale questions. But we sometimes have something to say about smaller scale problems, too.
I’ve just tidied and released old research notes (mostly from 2013) on the personal prioritisation problem of how to value time spent on different activities. This is primarily of use for individuals making decisions about how to spend their time, money, and mental energy.
Abstract: We get lots of opportunities to convert between time and money, and it’s hard to know which ones to take, since they use up other mental resources. I introduce the neutral hour as a tool for thinking about how to make these comparisons. A neutral hour is an hour spent where your mental energy is the same level at the start and the end. I work through some examples of how to use this tool, look at implications for some common scenarios, and explore the theory behind them.
There may be benefits for broader prioritisation questions. Since societies are composed of individuals, it could help to know how to value time savings or costs to individuals when performing cost-benefit analysis on larger projects. And there may be techniques for comparing between different resources that we could usefully apply in wider contexts. However, we think these benefits are secondary. We're releasing this work now to let others take advantage of it: either for personal benefit, or to build on it and release easier-to-use guidance or tools.
You can find the full document here. I'm happy to answer questions and I'd love to know if people have thoughts on this material.
Report -- Allocating risk mitigation across time
I've just released a Future of Humanity Institute technical report, written as part of the Global Priorities Project.
Abstract:
This article is about priority-setting for work aiming to reduce existential risk. Its chief claim is that all else being equal we should prefer work earlier and prefer to work on risks that might come early. This is because we are uncertain about when we will have to face different risks, because we expect diminishing returns of extra work, and because we expect that more people will work on these risks in the future.
I explore this claim both qualitatively and with explicit models. I consider its implications for two questions: first, “When is it best to do different kinds of work?”; second, “Which risks should we focus on?”.
As a major application, I look at the case of risk from artificial intelligence. The best strategies for reducing this risk depend on when the risk is coming. I argue that we may be underinvesting in scenarios where AI comes soon even though these scenarios are relatively unlikely, because we will not have time later to address them.
You can read the full report here: Allocating risk mitigation across time.
Existential Risk and Existential Hope: Definitions
I'm pleased to announce Existential Risk and Existential Hope: Definitions, a short new FHI technical report.
We look at the strengths and weaknesses of two existing definitions of existential risk, and suggest a new definition based on expected value. This leads to a parallel concept: ‘existential hope’, the chance of something extremely good happening.
Factoring cost-effectiveness
Summary: We can split the cost-effectiveness of an intervention into how good the cause is, and how good the intervention is relative to the cause. This perspective could help our efforts in prioritisation by letting us bring appropriate tools to bear on the different parts.
Cost-effectiveness comparisons
When we choose between giving time or money to different interventions, we’re making a comparison. It’s nice to know what these comparisons come down to. There are a lot of sources of evidence, and different ones will be more appropriate in different contexts. For this post I'll assume that we are seeking the most cost-effective interventions.
Say we are comparing between intervention x in cause area X, and intervention y in cause area Y. How they compare depends on things like how well thought-out x and y are, how competent the people and organisations implementing them are, as well as how valuable X is as a whole compared to Y.
These are all important factors in telling us how x and y ultimately compare, but they’re quite different from one another. So it shouldn’t be a surprise if it’s best to use different methods to compare the different factors. I think this is the case.
Consider the equation:
Cost-effectiveness of intervention = (cost-effectiveness of area) * (leverage ratio of intervention)
The left-hand side of this equation expresses how much good is achieved per unit of resources invested in the intervention. For the intervention x we'll denote this G(x). The right-hand side breaks this up as C(X), how much good is achieved per unit of resources invested in X as a whole, and a 'leverage ratio' L(x), which expresses how effective x is relative to X as a whole [1].
Now to compare between x and y we’re interested in the ratio G(x)/G(y). We can use the above equation to expand this:
G(x)/G(y) = (C(X)L(x))/(C(Y)L(y)).
This rearranges to:
G(x)/G(y) = C(X)/C(Y) * L(x)/L(y).
Here we’ve split the comparison into two parts, each of which is comparing like with like. This is a good general strategy: making comparisons between dissimilar things is hard, and our intuitions are sometimes terrible at it, so it’s helpful to break it into more comparable chunks [2].
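As a quick illustration, the decomposition can be sketched in a few lines of code. All the numbers here are invented placeholders, not estimates of any real cause or intervention:

```python
# Sketch of the factoring G(x) = C(X) * L(x), with illustrative numbers only.

def compare_interventions(c_X, l_x, c_Y, l_y):
    """Return the overall ratio G(x)/G(y) and its two like-with-like factors."""
    cause_ratio = c_X / c_Y        # C(X)/C(Y): comparing cause areas
    leverage_ratio = l_x / l_y     # L(x)/L(y): comparing leverage within causes
    return cause_ratio * leverage_ratio, cause_ratio, leverage_ratio

# Suppose cause X is 5x as cost-effective as cause Y, but intervention y has
# twice the leverage of x: x still comes out 2.5x better overall.
overall, cause_part, leverage_part = compare_interventions(10.0, 1.5, 2.0, 3.0)
print(overall)  # 2.5
```

The point of writing it this way is that each factor can be estimated by different methods, and the overall comparison is just their product.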
Comparing cause effectiveness
Comparing the cost-effectiveness of different cause areas is quite far removed from everyday experience, and it doesn’t have a good feedback mechanism as it can be hard to tell how much something helped the world even after the fact. Moreover it’s just the right setting for scope insensitivity to cause problems for intuitive judgements. This means that relative to most areas of experience, we should be particularly cautious about putting too much weight on intuitive judgements. This in turn means that it’s an area where explicit models are particularly valuable.
That doesn’t mean that explicit models always trump intuitive judgements in this domain; in particular, simple models often omit important factors that are incorporated into our intuitions. Nor does it mean that we should put all our trust into a single model. But it does mean that it’s particularly valuable to build, critique, and refine models for the cost-effectiveness of different causes. It also means we should put more weight on the outputs of such models than we do in most domains -- not because the models are more trustworthy, but because the alternatives are worse than usual. This is why I think developing such models is a high value activity, and why I’ve been spending time on it.
Comparing leverage ratios
The leverage ratios are determined largely by things like: whether the intervention is a sensible way of progressing on the cause; the quality of the team involved; and how functional the implementing organisation is. In contrast to the overall effectiveness of a cause, these are much closer to regular experience, so we should be less keen to use explicit models. On the other hand, methods and experience from valuing shares of companies (which have good feedback mechanisms) should be relevant in this context.
There are several reasons why leverage ratios may vary within a cause area. Many of these will be common across cause areas. Because of this, we might expect similar distributions of leverage ratios in different cause areas (but probably some areas have more variance than others, just as some jobs have more variance in the productivity of employees than others). It could be valuable to have an idea of how much leverage ratios do vary in practice. This is an empirical question we might be able to get data for.
Implications for prioritisation work
To choose between interventions, we need to compare cost-effectiveness. I’ve claimed that this is best done by comparing the cost-effectiveness of cause areas, and comparing the leverage ratios of the interventions. If this is right, what’s more valuable to work on evaluating?
Of course they are complementary to each other. The better we are able to identify the best cause areas, the more valuable it is to have good estimates for the leverage ratios of interventions in those areas. And the better we are able to identify interventions with very high leverage ratios, the more valuable it is to be able to say which of those are in the most effective causes. So the answer depends in part on how much work each is already receiving.
It also depends on your beliefs about which component has more variance. If you think that most of the variation in intervention effectiveness comes from leverage ratios, while cost-effectiveness of causes doesn’t vary that much, then it’s more important to evaluate the leverage ratios of interventions. If on the other hand you think more variation comes between causes, then it’s more important to evaluate cause effectiveness. I currently think there is likely to be more variation in cause effectiveness even after you filter to the ones which could plausibly be high value; however I am quite uncertain about this.
There is also an asymmetry which pushes us towards doing more cause assessment first: it’s much easier to cut down the work of evaluating leverage ratios by restricting to a few causes than it is to cut down the work of evaluating cause effectiveness by first identifying opportunities with high leverage ratios. Similarly, if we identify a cause area which is valuable but see no good interventions available to fund, we can advertise this and hopefully create good interventions in the area.
Of course to support giving decisions today we need to compare leverage ratios as well as cause effectiveness. And in some cases studying the interventions may help us to evaluate the cause effectiveness. But I think it will usually be right to investigate leverage ratios only within cause areas that we think have, or might have, high effectiveness, and only after we’ve made an effort to assess that.
Acknowledgements: thanks to Toby Ord and Nick Beckstead for helpful conversations.
Crossposted from the Global Priorities Project.
[1] The leverage ratio is really a function of x together with X. AMF may have one leverage ratio with respect to the area of global health, and another with respect to malaria treatment.
[2] An extra advantage of breaking the comparisons into like-with-like is that it’s easier to track uncertainty so that it doesn’t blow up unnecessarily. I might be very uncertain about how good X is, so I think C(X) lies somewhere in (1, 100). I might also be very uncertain about how good Y is, so that I think C(Y) lies in (1, 100). But it doesn’t follow that C(X)/C(Y) could lie anywhere in (1/100, 100). If my uncertainty about X is related to my uncertainty about Y (say X is reducing carbon emissions and Y is helping communities adapt to climate change), then I might have a better idea of the ratio C(X)/C(Y) than I do about either individually. Of course this just means that my estimates for C(X) and C(Y) are strongly correlated. But I think it’s helpful to have an idea of practical ways to break up the calculation which help to keep the uncertainty under control. For more thoughts on tracking uncertainty through estimates, see here.
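The point in footnote [2] about correlated uncertainties can be illustrated with a quick simulation. All of the distributions below are invented purely for illustration:

```python
import math
import random

random.seed(0)

# Illustrative only: C(X) and C(Y) each span about two orders of magnitude,
# but share most of their uncertainty (a common underlying factor), so the
# ratio C(X)/C(Y) is far better constrained than either estimate alone.
ratios = []
for _ in range(10_000):
    shared = math.exp(random.uniform(0.0, math.log(100.0)))  # common factor, 1x-100x
    c_x = shared * random.uniform(0.9, 1.1)    # small independent wobble
    c_y = shared * random.uniform(0.45, 0.55)  # small independent wobble
    ratios.append(c_x / c_y)

# Each of C(X), C(Y) alone is very uncertain, but the ratio stays near 2,
# roughly between 1.6 and 2.5.
print(min(ratios), max(ratios))
```

This is just the observation that strongly correlated estimates have a tightly constrained ratio, written out as a sanity check one can actually run.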
Make your own cost-effectiveness Fermi estimates for one-off problems
In some recent work (particularly this article) I built models for estimating the cost effectiveness of work on problems when we don’t know how hard those problems are. The estimates they produce aren’t perfect, but they can get us started where it’s otherwise hard to make comparisons.
Now I want to know: what can we use this technique on? I have a couple of applications I am working on, but I’m keen to see what estimates other people produce.
There are complicated versions of the model which account for more factors, but we can start with a simple version. This is a tool for initial Fermi calculations: it’s relatively easy to use but should get us around the right order of magnitude. That can be very useful, and we can build more detailed models for the most promising opportunities.
The model is given by:

Expected benefit of a marginal unit of resources = p × B / (R(0) × log(y/z))
This expresses the expected benefit of adding another unit of resources to solving the problem. You can denominate the resources in dollars, researcher-years, or another convenient unit. To use this formula we need to estimate four variables:
- R(0) denotes the current resources going towards the problem each year. Whatever units you measure R(0) in, those are the units we'll get an estimate for the benefit of. So if R(0) is measured in researcher-years, the formula will tell us the expected benefit of adding a researcher-year.
  - You want to count all of the resources going towards the problem. That includes the labour of those who work on it in their spare time, and some weighting for the talent of the people working in the area (if you doubled the budget going to an area, you couldn't get twice as many people who are just as good; ideally we'd use an elasticity here).
  - Some resources may be aimed at something other than your problem, but be tangentially useful. We should count some fraction of those, according to how many resources devoted entirely to the problem they seem equivalent to.
- B is the annual benefit that we'd get from a solution to the problem. You can measure this in its own units; whatever you use here will be the units of value that come out in the cost-effectiveness estimate.
- p and y/z are parameters that we will estimate together. p is the probability of getting a solution by the time y resources have been dedicated to the problem, given that z resources have been dedicated so far. Note that we only need the ratio y/z, so we can estimate this directly.
  - Although y/z is hard to estimate, we will take a (natural) logarithm of it, so don't worry too much about making this term precise.
  - I think it will often be best to use middling values of p, perhaps between 0.2 and 0.8.

And that's it.
Example: How valuable is extra research into nuclear fusion? Assume:

- R(0) = $5 billion (a quick Google search turns up $1.5B for current spending, adjusted upwards to account for non-financial inputs);
- B = $1000 billion (guesswork: a bit over 1% of the world economy, and a fraction of the current energy sector);
- There's a 50% chance of success (p = 0.5) by the time we've spent 100 times as many resources as today (log(y/z) = log(100) ≈ 4.6).
Putting these together gives an expected societal benefit of (0.5 × $1000B)/($5B × 4.6) ≈ $22 for every dollar spent. This is high enough to suggest that we may be significantly under-investing in fusion, and that a more careful calculation (with better-researched numbers!) might be justified.
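The arithmetic of the example can be wrapped in a few lines of Python. This is a minimal sketch of the simple model (expected benefit = p × B / (R(0) × log(y/z)), with a natural logarithm), using the guessed fusion numbers from above:

```python
import math

def expected_benefit_per_unit(p, B, R0, y_over_z):
    """Expected benefit of one marginal unit of resources:
    p * B / (R(0) * log(y/z)), where log is the natural logarithm."""
    return (p * B) / (R0 * math.log(y_over_z))

# Fusion example: p = 0.5, B = $1000B/year, R(0) = $5B/year, y/z = 100.
print(expected_benefit_per_unit(0.5, 1000e9, 5e9, 100))  # ~21.7 dollars per dollar
```

Denominating R(0) in researcher-years instead of dollars would make the output the expected benefit per marginal researcher-year, as described above.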
Caveats
To get the simple formula, the model made a number of assumptions. Since we’re just using it to get rough numbers, it’s okay if we don’t fit these assumptions exactly, but if they’re totally off then the model may be inappropriate. One restriction in particular I’d want to bear in mind:
- It should be plausible that we could solve the problem in the next decade or two. It's okay if this is unlikely, but I'd want to change the model if I were estimating the value of e.g. trying to colonise the stars.
Request for applications
So -- what would you like to apply this method to? What answers do you get?
To help structure the comment thread, I suggest attempting only one problem in each comment. Include the value of p, and the units of R(0) and units of B that you’d like to use. Then you can give your estimates for R(0), B, and y/z as a comment reply, and so can anyone else who wants to give estimates for the same thing.
I've also set up a Google spreadsheet where we can enter estimates for the questions people propose. For the time being, anyone can edit it.
Have fun!
Estimating the cost-effectiveness of research
At a societal level, how much money should we put into medical research, or into fusion research? For individual donors seeking out the best opportunities, how can we compare the expected cost-effectiveness of research projects with more direct interventions?
Over the past few months I've been researching this area for the Global Priorities Project. We've written a variety of articles which focus on different parts of the question. Estimating the cost-effectiveness of research is the central example here, but a lot of the methodology is also applicable to other one-off projects with unknown difficulty (perhaps including political lobbying). I don't think it's all solved, but I do think we've made substantial progress.
I think people here might be interested, so I wanted to share our work. To help you navigate and find the most appropriate pieces, here I collect them, summarise what's contained in each, and explain how they fit together.
- I gave an overview of my thinking at the Good Done Right conference, held in Oxford in July 2014. The slides and audio of my talk are available; I have developed more sophisticated models for some parts of the area since then.
- How to treat problems of unknown difficulty introduces the problem: we need to make decisions about when to work more on problems such as research into fusion where we don't know how difficult it will be. It builds some models which allow principled reasoning about how we should act. These models are quite crude but easy to work with: they are intended to lower the bar for Fermi estimates and similar, and provide a starting point for building more sophisticated models.
- Estimating cost-effectiveness for problems of unknown difficulty picks up from the models in the above post, and asks what they mean for the expected cost-effectiveness of work on the problems. This involves building a model of the counterfactual impact, as solvable research problems are likely to be solved eventually, so the main effect is to move the solution forwards. This post includes several explicit formulae that you can use to produce estimates; it also explains analogies between the explicit model we derive and the qualitative 'three factor' model that GiveWell and 80,000 Hours have used for cause selection.
- Estimating the cost-effectiveness of research into neglected diseases is an investigation by Max Dalton, which uses the techniques for estimating cost-effectiveness to provide ballpark figures for how valuable we should expect research into vaccines or treatments for neglected diseases to be. The estimates suggest that, if carefully targeted, such research could be more cost-effective than the best direct health interventions currently available for funding.
- The law of logarithmic returns discusses the question of returns to resources into a field rather than on a single question. With some examples, it suggests that as a first approximation it is often reasonable to assume that diminishing marginal returns take a logarithmic form.
- Theory behind logarithmic returns explains how some simple generating mechanisms can produce roughly logarithmic returns. This is a complement to the above article: we think having both empirical and theoretical justification for the rule helps us to have higher confidence in it, and to better understand when it's appropriate to generalise to new contexts. In this piece I also highlight areas for further research on the theoretical side, into when the approximation will break down, and what we might want to use instead in these cases.
- How valuable is medical research? written with Giving What We Can, applies the logarithmic returns model together with counterfactual reasoning to produce an estimate for the cost-effectiveness of medical research as a whole.
Decision theories as heuristics
Main claims:
- A lot of discussion of decision theories is really analysing them as decision-making heuristics for boundedly rational agents.
- Understanding decision-making heuristics is really useful.
- The quality of dialogue would be improved if it were recognised when decision theories are being discussed as heuristics.
Epistemic status: I’ve had a “something smells” reaction to a lot of discussion of decision theory. This is my attempt to crystallise out what I was unhappy with. It seems correct to me at present, but I haven’t spent too much time trying to find problems with it, and it seems quite possible that I’ve missed something important. Also possible is that this just recapitulates material in a post somewhere I’ve not read.
Existing discussion is often about heuristics
Newcomb’s problem traditionally contrasts the decisions made by Causal Decision Theory (CDT) and Evidential Decision Theory (EDT). The story goes that CDT reasons that there is no causal link between a decision made now and the contents of the boxes, and therefore two-boxes. Meanwhile EDT looks at the evidence of past participants and chooses to one-box in order to get a high probability of being rich.
I claim that both of these stories are applications of the rules as simple heuristics to the most salient features of the case. As such they are robust to variation in the fine specification of the case, so we can have a conversation about them. If we want to apply them with more sophistication then the answers do become sensitive to the exact specification of the scenario, and it’s not obvious that either has to give the same answer the simple version produces.
First consider CDT. It has a high belief that there is no causal link between choosing to one- or two- box and Omega’s previous decision. But in practice, how high is this belief? If it doesn’t understand exactly how Omega works, it might reserve some probability to the possibility of a causal link, and this could be enough to tip the decision towards one-boxing.
On the other hand EDT should properly be able to consider many sources of evidence besides the ones about past successes of Omega’s predictions. In particular it could assess all of the evidence that normally leads us to believe that there is no backwards-causation in our universe. According to how strong this evidence is, and how strong the evidence that Omega’s decision really is locked in, it could conceivably two-box.
Note that I’m not asking here for a more careful specification of the set-up. Rather I’m claiming that a more careful specification could matter -- and so to the extent that people are happy to discuss it without providing lots more details they’re discussing the virtues of CDT and EDT as heuristics for decision-making rather than as an ultimate normative matter (even if they’re not thinking of their discussion that way).
Similarly, So8res had a recent post which discussed Newcomblike problems faced by people; these are very clear examples when the decision theories are viewed as heuristics. If you allow the decision-maker to think carefully through all the unconscious signals sent by her decisions, it's less clear that there's anything Newcomblike.
Understanding decision-making heuristics is valuable
In claiming that a lot of the discussion is about heuristics, I’m not making an attack. We are all boundedly rational agents, and this will very likely be true of any artificial intelligence as well. So our decisions must perforce be made by heuristics. While it can be useful to study what an idealised method would look like (in order to work out how to approximate it), it’s certainly useful to study heuristics and determine what their relative strengths and weaknesses are.
In some cases we have good enough understanding of everything in the scenario that our heuristics can essentially reproduce the idealised method. When the scenario contains other agents which are as complicated as ourselves or more so, it seems like this has to fail.
We should acknowledge when we’re talking about heuristics
By separating discussion of the decision-theories-as-heuristics from decision-theories-as-idealised-decision-processes, we should improve the quality of dialogue in both parts. The discussion of the ideal would be less confused by examples of applications of the heuristics. The discussion of the heuristics could become more relevant by allowing people to talk about features which are only relevant for heuristics.
For example, it is relevant if one decision theory tends to need a more detailed description of the scenario to produce good answers. It’s relevant if one is less computationally tractable. And we can start to formulate and discuss hypotheses such as “CDT is the best decision-procedure when the scenario doesn’t involve other agents, or only other agents so simple that we can model them well. Updateless Decision Theory is the best decision-procedure when the scenario involves other agents too complex to model well”.
In addition, I suspect that it would help to reduce disagreements about the subject. Many disagreements in many domains are caused by people talking past each other. Discussion of heuristics without labelling it as such seems like it could generate lots of misunderstandings.
Why we should err in both directions
Crossposted from the Global Priorities Project
This is an introduction to the principle that when we are making decisions under uncertainty, we should choose so that we may err in either direction. We justify the principle, explore the relation with Umeshisms, and look at applications in priority-setting.
Some trade-offs
How much should you spend on your bike lock? A cheaper lock saves you money at the cost of security.
How long should you spend weighing up which charity to donate to before choosing one? Longer means less time for doing other useful things, but you’re more likely to make a good choice.
How early should you aim to arrive at the station for your train? Earlier means less chance of missing it, but more time hanging around at the station.
Should you be willing to undertake risky projects, or stick only to safe ones? The safer your threshold, the more confident you can be that you won’t waste resources, but some of the best opportunities may have a degree of risk, and you might be able to achieve a lot more with a weaker constraint.
The principle
We face trade-offs and make judgements all the time, and inevitably we sometimes make bad calls. In some cases we should have known better; sometimes we are just unlucky. As well as trying to make fewer mistakes, we should try to minimise the damage from the mistakes that we do make.
Here’s a rule which can be useful in helping you do this:
When making decisions that lie along a spectrum, you should choose so that you think you have some chance of being off from the best choice in each direction.
We could call this principle erring in both directions. It might seem counterintuitive -- isn’t it worse to not even know what direction you’re wrong in? -- but it’s based on some fairly straightforward economics. I give a non-technical sketch of a proof at the end, but the essence is: if you’re not going to be perfect, you want to be close to perfect, and this is best achieved by putting your actual choice near the middle of your error bar.
So the principle suggests that you should aim to arrive at the station with a bit of time to spare, but not with so much margin that you would still make the train even if something went wrong.
Refinements
Just saying that you should have some chance of erring in either direction isn’t enough to tell you what you should actually choose. It can be a useful warning sign in the cases where you’re going substantially wrong, though, and as these are the most important cases to fix it has some use in this form.
A more careful analysis would tell you that at the best point on the spectrum, a small change in your decision produces about as much expected benefit as expected cost. In ideal circumstances we can use this to work out exactly where on the spectrum we should be (in some cases more than one point may fit this, so you need to compare them directly). In practice it is often hard to estimate the marginal benefits and costs well enough for this to be a useful approach. So although it is theoretically optimal, you will only sometimes want to try to apply this version.
Say in our train example that you found missing the train as bad as 100 minutes waiting at the station. Then you want to leave time so that an extra minute of safety margin gives you a 1% reduction in the absolute chance of missing the train.
For instance, say your options in the train case look like this:
| Safety margin (min) | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Chance of missing train (%) | 50 | 30 | 15 | 8 | 5 | 3 | 2 | 1.5 | 1.1 | 0.8 | 0.6 | 0.4 | 0.3 | 0.2 | 0.1 |
Then the optimal safety margin to leave is somewhere between 6 and 7 minutes: this is where the marginal minute leads to a 1% reduction in the chance of missing the train.
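We can check this numerically. The sketch below (framing the table as an expected-cost minimisation is my gloss on the text, not something the post spells out) uses the figures above and values a missed train at 100 minutes:

```python
# Expected cost (in equivalent minutes) of each safety margin from the
# table above, valuing a missed train at 100 minutes of waiting.

MISS_COST = 100  # minutes-equivalent badness of missing the train

margins = range(1, 16)
miss_pct = [50, 30, 15, 8, 5, 3, 2, 1.5, 1.1, 0.8, 0.6, 0.4, 0.3, 0.2, 0.1]

def expected_cost(margin, pct):
    """Minutes spent waiting plus probability-weighted cost of a miss."""
    return margin + (pct / 100) * MISS_COST

costs = {m: expected_cost(m, p) for m, p in zip(margins, miss_pct)}
best = min(costs, key=costs.get)
```

Running this, the expected cost bottoms out at a 6-7 minute margin (about 9 minutes in expectation), exactly the region where the marginal minute buys a one-point drop in the chance of missing the train.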
Predictions and track records
So far, we've phrased the idea in terms of the predicted outcomes of actions. Another, better-known, perspective on the idea looks at events that have already happened. For example:
- “If you've never missed a flight, you're spending too much time in airports.”
- “If your code never has bugs, you’re being too careful.”
These formulations, dubbed 'Umeshisms', only work for decisions that you make multiple times, so that you can gather a track record.
An advantage of applying the principle to track records is that it’s more obvious when you’re going wrong. Introspection can be hard.
You can even apply the principle to track records of decisions which don't look like they are choosing from a spectrum. For example it is given as advice in the game of bridge: if you don't sometimes double the stakes on hands which eventually go against you, you're not doubling enough. Although doubling or not is a binary choice, erring in both directions still works because 'how often to double' is a trait that roughly falls on a spectrum.
Failures
There are some circumstances where the principle may not apply.
First, if you think the correct point is at one extreme of the available spectrum. For instance nobody says ‘if you’re not worried about going to jail, you’re not committing enough armed robberies’, because we think the best number of armed robberies to commit is probably zero.
Second, if the available points in the spectrum are discrete and few in number. Take the example of the bike lock. Perhaps there are only three options available: the Cheap-o lock (£5), the Regular lock (£20), and the Super lock (£50). You might reasonably decide on the Regular lock, thinking that maybe the Super lock is better, but that the Cheap-o one certainly isn't. When you buy the Regular lock, you're pretty sure you're not buying a lock that's too tough. But since only two of the locks are good candidates, there is no decision you could make which tries to err in both directions.
Third, in the case of evaluating track records, it may be that your record isn’t long enough to expect to have seen errors in both directions, even if they should both come up eventually. If you haven’t flown that many times, you could well be spending the right amount of time -- or even too little -- in airports, even if you’ve never missed a flight.
Finally, a warning about a case where the principle is not supposed to apply. It shouldn’t be applied directly to try to equalise the probability of being wrong in either direction, without taking any account of magnitude of loss. So for example if someone says you should err on the side of caution by getting an early train to your job interview, it might look as though that were in conflict with the idea of erring in both directions. But normally what’s meant is that you should have a higher probability of failing in one direction (wasting time by taking an earlier train than needed), because the consequences of failing in the other direction (missing the interview) are much higher.
Conclusions and applications to prioritisation
Seeking to err in both directions can provide a useful tool in helping to form better judgements in uncertain situations. Many people may already have internalised key points, but it can be useful to have a label to facilitate discussion. Additionally, having a clear principle can help you to apply it in cases where you might not have noticed it was relevant.
How might this principle apply to priority-setting? It suggests that:
- You should spend enough time and resources on the prioritisation itself that you think some of the time may have been wasted (for example you should spend a while at the end without changing your mind much), but not so much that you are totally confident you have the right answer.
- If you are unsure what discount rate to use, you should choose one so that you think that it could be either too high or too low.
- If you don’t know how strongly to weigh fragile cost-effectiveness estimates against more robust evidence, you should choose a level so that you might be over- or under-weighing them.
- When you are providing a best-guess estimate, you should choose a figure which could plausibly be wrong either way.
And one on track records:
- Suppose you’ve made lots of grants. Then if you’ve never backed a project which has failed, you’re probably too risk-averse in your grantmaking.
Questions for readers
Do you know any other useful applications of this idea? Do you know anywhere where it seems to break? Can anyone work out easier-to-apply versions, and the circumstances in which they are valid?
Appendix: a sketch proof of the principle
Assume the true graph of value (on the vertical axis) against the decision you make (on the horizontal axis, representing the spectrum) is smooth, rising to a single peak and falling away on either side of it.
The highest value is achieved at d, so this is where you’d like to be. But assume you don’t know quite where d is. Say your best guess is that d=g. But you think it’s quite possible that d>g, and quite unlikely that d<g. Should you choose g?
Suppose we compare g to g', which is just a little bit bigger than g. If d>g, then switching from g to g' moves us up the rising slope to the left of the peak, which is an improvement. If d=g then it would be better to stick with g, but the difference is small because the curve is fairly flat at the top. And if g were bigger than d, we would be moving down the slope to the right of the peak, which is worse for g' -- but this scenario was deemed unlikely.
Aggregating the three possibilities, we found that two of them were better for sticking with g, but in one of these (d=g) it didn’t matter very much, and the other (d<g) just wasn’t very likely. In contrast, the third case (d>g) was reasonably likely, and noticeably better for g’ than g. So overall we should prefer g’ to g.
In fact we’d want to continue moving until the marginal upside from going slightly higher was equal to the marginal downside; this would have to involve a non-trivial chance that we are going too high. So our choice should have a chance of failure in either direction. This completes the (sketch) proof.
Note: There was an assumption of smoothness in this argument. I suspect it may be possible to get slightly stronger conclusions or work from slightly weaker assumptions, but I’m not certain what the most general form of this argument is. It is often easier to build a careful argument in specific cases.
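The argument can also be checked numerically under stated assumptions. The sketch below assumes a quadratic value curve v(c) = -(c - d)^2 and invents a right-skewed belief about where the peak d lies (the specific probabilities are illustrative, not from the text):

```python
# Numerical check of the sketch proof, under assumptions: a quadratic value
# curve v(c) = -(c - d)**2 and an invented belief skewed towards d > g.
# Best guess g = 0; d = 2 is thought quite possible, d = -2 unlikely.
scenarios = [(-2.0, 0.1), (0.0, 0.5), (2.0, 0.4)]  # (possible d, probability)

def expected_value(choice):
    """Probability-weighted value of picking `choice` on the spectrum."""
    return sum(p * -(choice - d) ** 2 for d, p in scenarios)

# Search a grid of candidate choices for the highest expected value.
candidates = [i / 100 for i in range(-100, 201)]
best = max(candidates, key=expected_value)
```

For a quadratic curve the optimum lands at the probability-weighted mean of d (here 0.1 * -2 + 0.5 * 0 + 0.4 * 2 = 0.6), strictly above the single best guess g = 0. The chosen point therefore has some chance of being too high and some of being too low, as the proof claims.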
Acknowledgements: thanks to Ryan Carey, Max Dalton, and Toby Ord for useful comments and suggestions.