If you're doing things in a group, instead of alone, useful subsets of this framework could be the standard OPSEC process and controls for classified information. There's some pretty big Chesterton's Fences around them.
The OPSEC process is meant specifically for when you're planning a specific activity: the value to the adversary of information about your plans will diminish rapidly once you conclude that activity, but until then any hint as to your plans might be detrimental. So it's more of a set of guidelines than a specific policy or procedure, and it encourages thinking about how many decibels of probability you're allowing the adversary access to.
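To make the "decibels" framing concrete: here's a minimal sketch, assuming the usual log-odds convention (10·log10 of an odds or likelihood ratio), which isn't spelled out above; the function name and the example numbers are just illustrative.

```python
import math

def decibels_of_evidence(p: float) -> float:
    """Probability -> log-odds in decibels: 10 * log10(p / (1 - p)).
    Under this convention 50% is 0 dB, ~91% is +10 dB, ~9% is -10 dB."""
    return 10 * math.log10(p / (1 - p))

# Hypothetical example: a leak that moves an adversary's credence in your
# plan from 10% to 50% hands them roughly 9.5 dB of evidence.
print(decibels_of_evidence(0.5) - decibels_of_evidence(0.1))  # ~9.5
```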
Controls for classified information are meant for information that will be harmful even after the conclusion of a specific activity. They're the converse of the OPSEC process: a large collection of highly detailed policies and procedures for marking and protecting information. That's certainly a bit heavyweight for independent research groups smaller than the Manhattan Project, but some principles could apply, like a central classification authority to reduce the cognitive load of marking your products, and uniform procedures for handling products with each level of marking.
Here are some relevant thoughts from Andrew Critch on an FLI podcast episode I just heard (though it was released in 2017):
what if what you discover is not a piece of technology, but a piece of prediction, like Anthony said? What if you discover that it seems quite likely, based on the aggregate opinion of a bunch of skilled predictors, that artificial general human intelligence will be possible within 10 years? Well that, yeah, that has some profound implications for the world, for policy, for business, for military. There’s no denying that. I feel sometimes there’s a little bit of an instinct to kind of pretend like no one’s going to notice that AGI is really important. I don’t think that’s the case.
I had friends in the 2010 vicinity, who thought, surely no one in government will recognize the importance of superintelligence in the next decade. I was almost convinced. I had a little more faith than my friends, so I would have won some bets, but I still was surprised to see Barack Obama talking about superintelligence on an interview. I think the first thing is not to underestimate the possibility that, if you’ve made this prediction, maybe somebody else is about to make it, too.
That said, if you’re Metaculus, maybe you just know who’s running prediction markets, who is studying good prediction aggregation systems, and you just know no one’s putting in the effort, and you really might know that you’re the only people on earth who have really made this prediction, or maybe you and only a few other think tanks have managed to actually come up with a good prediction about when superintelligent AI will be produced, and, moreover, that it’s soon. If you discovered that, I would tell you the same thing I would tell anyone who discovers a potentially dangerous idea, which is not to write a blog post about it right away.
I would say, find three close, trusted individuals that you think reason well about human extinction risk, and ask them to think about the consequences and who to tell next. Make sure you’re fair-minded about it. Make sure that you don’t underestimate the intelligence of other people and assume that they’ll never make this prediction, but … [...]
Then do a rollout procedure. In software engineering, you developed a new feature for your software, but it could crash the whole network. It could wreck a bunch of user experiences, so you just give it to a few users and see what they think, and you slowly roll it out. I think a slow rollout procedure is the same thing you should do with any dangerous idea, any potentially dangerous idea. You might not even know the idea is dangerous. You may have developed something that only seems plausibly likely to be a civilizational scale threat, but if you zoom out and look at the world, and you imagine all the humans coming up with ideas that could be civilizational scale threats.
Maybe they’re a piece of technology, maybe they’re dangerous predictions, but no particular prediction or technology is likely to be a threat, so no one in particular decides to be careful with their idea, and whoever actually produces the dangerous idea is no more careful than anyone else, and they release their idea, and it falls into the wrong hands or it gets implemented in a dangerous way by mistake. Maybe someone accidentally builds Skynet. Somebody accidentally releases replicable plans for a cheap nuclear weapon.
If you zoom out, you don’t want everyone to just share everything right away, and you want there to be some threshold of just a little worry that’s just enough to have you ask your friends to think about it first. If you’ve got something that you think is 1% likely to pose an extinction threat, that seems like a small probability, and if you’ve done calibration training, you’ll realize that that’s supposed to feel very unlikely. Nonetheless, if 100 people have a 1% chance of causing human extinction, well someone probably has a good chance of doing it.
If you just think you’ve got a small chance of causing human extinction, go ahead, be a little bit worried. Tell your friends to be a little bit worried with you for like a day or three. Then expand your circle a little bit. See if they can see problems with the idea, see dangers with the idea, and slowly expand, roll out the idea into an expanding circle of responsible people until such time as it becomes clear that the idea is not dangerous, or you manage to figure out in what way it’s dangerous and what to do about it, because it’s quite hard to figure out something as complicated as how to manage a human extinction risk all by yourself or even by a team of three or maybe even ten people. You have to expand your circle of trust, but, at the same time, you can do it methodically like a software rollout, until you come up with a good plan for managing it. As for what the plan will be, I don’t know. That’s why I need you guys to do your slow rollout and figure it out.
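Two bits of the quote are easy to make concrete: the "100 people each with a 1% chance" figure aggregates to 1 - 0.99^100 ≈ 63%, and the slow rollout is essentially a staged, canary-style release. A minimal sketch, where the circle structure and the flags_danger review callback are hypothetical stand-ins rather than anything Critch specifies:

```python
def aggregate_risk(p_each: float, n: int) -> float:
    """Chance that at least one of n independent ideas, each with
    probability p_each of being dangerous, actually is dangerous."""
    return 1 - (1 - p_each) ** n

print(f"{aggregate_risk(0.01, 100):.0%}")  # ~63%

def slow_rollout(idea, circles, flags_danger):
    """Share `idea` with successively larger circles of trusted people,
    pausing as soon as anyone flags a danger. `circles` is a list of lists
    of people; `flags_danger(idea, person)` is a hypothetical review step."""
    for circle in circles:
        if any(flags_danger(idea, person) for person in circle):
            return "pause: figure out the danger and how to manage it"
    return "no danger found: consider wider release"
```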
Decrease the likelihood of others developing and/or sharing the information
Promote ideas that make the information hazard seem ridiculous or uninteresting. An example that may or may not be happening: the US government enabling stories of extraterrestrial origin to hide the possibility that they have unreasonably advanced aerospace technology, by materially encasing it in dumb glowy saucer stuff that doesn't make any sense. (A probably-fictional example is good here, because if someone were smart enough and motivated enough to hide something like this, I probably wouldn't want to tell people about it. If this turns out not to be fictional, US govt, I'm very sorry; we haven't thought enough about this to understand why you'd want to hide it.)
If the information hazard concerned is going to be around for a long time, you might want to consider constructing an ideological structure that systematically hides it: one under which the only people who get anywhere near questioning enough of their assumptions to find the information hazard also tend to be responsible enough to take it, and under which the spread of the information hazard is universally limited. Cease speaking the words that make it articulable. It should be noted that this won't look, from the inside, like a conspiracy. There will not be a single refutation of the idea under this ideology, because no one would think to write one. It will just seem naturally difficult for most people living under it to notice how the idea might ever be important.
Improve the groups that might discover or use the information:
There's also the option of trying to help improve their ability to handle such information:
There are also some valuable actions one can take to build one’s general capacity for avoiding or mitigating information hazards. Learning about information hazards and how to handle them is one such action. This article touches on some other actions for building that general capacity.
This post was written for Convergence Analysis.
Overview
We argue that many people should consider the risk that they could cause harm by developing or sharing (true) information. We think that harm from such information hazards may sometimes be very substantial, and that this applies especially to people who research advanced technologies and/or catastrophic risks, or who often think about such technologies and risks.
However, constantly worrying about information hazards would be paralyzing and unnecessary. We therefore outline a heuristic for quickly identifying whether, in a given situation, it’s worth properly thinking about the hazards some information might pose, and about how to act given those hazards. This heuristic is based on the “potency” and “counterfactual rarity” of the information in question. We next outline, and give examples to illustrate, a range of actions one could take when one has identified that some information could indeed be hazardous.
Why should some people care about information hazards?
An information hazard is “A risk that arises from the dissemination or the potential dissemination of (true) information that may cause harm or enable some agent to cause harm” (Bostrom). There are many types of information hazards. Here are some examples Bostrom gives:
These and other examples highlight just how large the negative impacts of true information can sometimes be - large enough to increase global catastrophic risks, or perhaps even existential risks. It is worth further highlighting two underlying reasons why information could often have such large, negative impacts:
Altogether, we believe that, in some situations (e.g., Bostrom’s examples), developing[1] or sharing information could have an expected negative impact that is larger than the expected positive impact the developer/sharer will achieve through everything else they do in their lives. And we believe that information hazards that are simply very large (rather than so large that they overwhelm the rest of one’s impact) will be much more common.
Thus, at least for some people, it seems like an important way to make their impact more positive would be to reduce the chances that they develop or share hazardous information. This could be just as important as more typical ways to “have a positive impact”.[2]
But it is obviously also true that developing and sharing information is often very beneficial, and that constantly worrying about information hazards would be counterproductive. Thus, in this post we will offer:
Should you care about information hazards?
But who is this relevant to? More specifically, for whom is this topic relevant enough that it’s worth them spending time learning and thinking about it? We think that only a small proportion of people will develop information that poses catastrophic or existential risks. However:
Ultimately, we’d say that, as a rough rule of thumb, this topic is relevant enough that it’s worth spending time learning and thinking about it for people who research advanced technologies and/or catastrophic risks, or who often think about such technologies and risks. We think that such people have non-negligible odds of, at some point, developing or learning of information that poses substantial risks.[3]
When should you think about information hazards?
We’ve argued that it’s worthwhile for people who research advanced technologies and/or catastrophic risks, or who often think about such technologies and risks, to learn and think about the topic of information hazards. But it’d be paralyzing and unnecessary for such people to always worry about information hazards. So when, more specifically, should you take the time to properly think about the hazards some information might pose, and about how to act given these hazards?[4]
We propose a simple heuristic to answer this question: first, ask whether the information is highly potent; if it isn’t, don’t worry about it. If it is, ask whether it’s counterfactually rare; if it isn’t, consider implementation-related responses. If the information is both highly potent and counterfactually rare, it’s worth properly thinking about how hazardous it might be, and considering information-related responses.
We will now explain the terms and ideas in that heuristic in more detail.
The potency of some information depends on factors such as how many people the information may affect, how intensely each affected person would be affected, how long the effects would last, and so on. For example, the information that it’s possible to create a nuclear weapon is much more potent than information about what you had for breakfast. (As such, there’s no need to worry about the risks posed by information about what you had for breakfast - feel free to share that with whoever might care, and then get on with your day.)
Counterfactual rarity essentially refers to the number of people who are likely to have already developed or learned this information (or similar information), or to develop or learn this information soon anyway. This depends on factors such as how much specialised knowledge is required to arrive at the information, how counterintuitive the information is, the incentives for developing and sharing the information, and so on.
Counterfactual rarity is important because, however potent some information is, the impact of you developing or sharing that information will depend on whether the information is already widely known, and whether it’s very likely to be discovered and publicised soon in any case. For example, the information that it’s possible to create a nuclear weapon is now very widely known, so, even though the information is very potent, it’s no longer worth worrying about it as an information hazard.
Instead, in such cases, we should focus on what we could call implementation-related responses. By this, we essentially mean any actions for reducing risks other than what we might call an “information-related response” (discussed below). Examples of implementation-related responses to the risks posed by nuclear weapons include trying to control access to nuclear materials and trying to establish norms against the creation of nuclear weapons. This seems more valuable (nowadays) than the information-related response of trying to hide the fact that nuclear weapons can be created. This is because the key bottlenecks to causing harm using nuclear weapons are now related to implementing the information, rather than to accessing the information in the first place.
Some information will have high potency and high counterfactual rarity. This could include, for example, information about a new way to engineer a virus. These are the cases in which it’s worth thinking about how hazardous the information might be.
Additionally, in such cases, you should probably consider using information-related responses. By this we mean actions intended to prevent the development or (further) spread of potentially hazardous information, or to influence how the information is developed or (further) spread. For example, you might avoid conducting a line of research that could reveal a new way to engineer a virus, or push for research of that type to require review before it's conducted, or discourage discussion of such research outside of academic publications. (Note that our purpose here is to illustrate these ideas, rather than to recommend specific actions in response to specific, actual situations.)[5]
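To make the decision flow concrete, here is a minimal sketch of the heuristic as code. (This is an illustration only; the binary yes/no inputs are a simplification of the judgment calls described above, and the suggested actions echo the examples already given.)

```python
def triage_information(highly_potent: bool, counterfactually_rare: bool) -> str:
    """Rough triage following the potency / counterfactual-rarity heuristic."""
    if not highly_potent:
        return "low stakes: develop/share freely (e.g. what you had for breakfast)"
    if not counterfactually_rare:
        return "consider implementation-related responses (e.g. control nuclear materials)"
    return "think properly about the hazard; consider information-related responses"

# Example: a genuinely new way to engineer a virus.
print(triage_information(highly_potent=True, counterfactually_rare=True))
```

In practice, of course, potency and counterfactual rarity are judgments of degree rather than booleans.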
The following diagram visually represents our heuristic:
Note that both this heuristic and the response options we discuss below are intended for use at an individual level, to reduce information hazards from one’s own actions. We think that many of the same ideas would apply at an organisational level, or if trying to analyse and reduce the information hazards other people’s actions might cause, but that some modifications would have to be made.[6]
For information with good consequences
Although it's not our focus, we should also quickly note that essentially the mirror image of this heuristic process can be followed in relation to information you suspect may have good consequences.
First, consider whether the information is highly potent. If it isn’t (e.g., if the information is about how many marbles there are in a jar), then don’t bother further assessing the potential impacts of this information.
If the information is highly potent, then consider whether it’s counterfactually rare. If it’s not counterfactually rare, consider using implementation-related responses. For example, the information that washing one’s hands is a good idea is highly potent, but is now very widely known. As such, in many communities, trying to spread the information further may have relatively little value. However, helping people use the information (e.g., by providing access to clean water) may have high value.
If the information is highly potent and counterfactually rare, then it’s worth thinking about the benefits that may result from developing and sharing the information (in a mirror image of thinking about the possible harms from hazardous information). Additionally, in such cases, you should probably consider using information-related responses. For example, if the information is a cure for cancer, it may be worth considering developing the information via research, or, if the information has recently been developed, spreading the information to more doctors and patients.
What can you do about information hazards?
Let’s say you’ve used the above heuristic, and determined that the information in question is indeed high in potency and counterfactual rarity. You know this means that you should probably think further about how hazardous this information might be, and consider using “information-related responses”. But what specific responses can you employ?
We will now provide a (non-exhaustive) list of responses that may often be worth considering when dealing with potential information hazards, along with examples to illustrate each response. Note that it would sometimes be possible to combine multiple responses (i.e., they are not mutually exclusive).
These responses are very approximately ordered from the least extreme responses, worth considering when the expected harms from developing or sharing the information are relatively low, to the most extreme responses. (But we should note that, for the most part, this post won’t offer very specific guidelines on which response to take in particular situations, or how to decide that. We hope to explore those questions more in future work.)
Potential responses
Develop and/or share the information: The information is sufficiently potent and counterfactually rare that it was worth thinking about the risks, but you conclude that the risks are low enough (compared to the benefits) that you can go ahead and develop and/or share the information.
Develop the information, but don’t (yet) share it: Similar to the above, except that you only proceed with developing the information, concluding that sharing it is not worthwhile, or that it’d be better to wait until later to decide whether to share it.
Think more about the risks: Consider in more detail how risky the information might be. You might do this while continuing to develop and/or share the information, or before doing so (with the results of your thinking determining whether to do so).
Frame the information to reduce risks: Make a conscious effort to frame or explain the information in a way that reduces its risks, such as by influencing how the information is likely to be used or who it is likely to reach.
Develop and/or share a subset of the information: Instead of developing and/or sharing none of the information, or all of it, work out what part of the information would be net-beneficial to develop and/or share, and develop and/or share that part only. (If doing this, it’s worth thinking carefully about whether the part you’re planning to develop and/or share would be sufficient for others to (re)construct the other parts of the information as well.)
Share the information with a subset of people: Do share the information, but not indiscriminately. Instead, work out which people it’s likely to be net-beneficial to share the information with, and share it with just these people. (One reason to do this would be to get these people’s opinion on how dangerous it would be to further develop and/or share the information. Another would be to allow these people to develop defences against whatever harms the information relates to or could cause.)
Avoid developing and/or sharing the information: If the information poses high enough risks (relative to its potential benefits), it may be best to simply avoid developing and/or sharing it at all.
Monitor whether others may develop and/or share the information: For sufficiently risky information, it may be worth actively investigating whether anyone else is developing or sharing the information, or whether anyone seems likely to do so in future. If you identify any such people, you might then warn them of the potential risks, let the relevant authorities know that these people may develop or share this information, and/or try to make it harder for the information to be acted on if it is developed and shared (i.e., try to use implementation-related responses). You might do all this after having stopped developing and/or sharing the information yourself, or while continuing to do so (probably very cautiously).
Decrease the likelihood of others developing and/or sharing the information: This overlaps with parts of the above response option. But there are also ways of doing this that don’t involve monitoring whether others may develop and/or share the information. For example, you could delete your own research notes, or try to steer people away from books, articles, concepts, etc. that helped lead you to the information.
There are also two potential responses that seem to us like they could arguably be classified either as implementation-related responses or as information-related responses. These two responses are the following:
Develop countermeasures: Develop ways of preventing, mitigating, or fixing the potential harms of the information. You might do all this after having stopped developing and/or sharing the information yourself, or while continuing to do so (perhaps to inform your efforts to develop countermeasures).
Improve the groups that might discover or use the information: Try to improve the values (e.g., level of altruism) and/or capabilities of the people or organisations who might develop, learn about, or implement the information. This would be done to reduce the risks that they will share or use the information in harmful ways. (See also Improving the future by influencing actors' benevolence, intelligence, and power.)
Again, we emphasise that this list is probably not exhaustive, that some of the response options could be used in combination, and that we aren’t here making specific suggestions on when to use which response. Note also that one may often be able to switch from one response option to another later (e.g., from sharing with a subset of people to sharing publicly), especially if one has started with a relatively cautious option.
Conclusion
We believe that many people have a significant chance of at some point being in a position to develop and/or share information that could cause substantial harm. We think that this is especially true for people who research advanced technologies and/or catastrophic risks, or who often think about such technologies and risks. Thus, we think that responding well to potential information hazards can be a very important way for such people to have a more positive impact on the world.
We suggested a heuristic to use when one is facing a potential information hazard: first ask yourself whether the information has high potency, and then whether it has high counterfactual rarity. If the information isn’t highly potent, then you don’t have to worry. If it’s highly potent but not counterfactually rare, you should consider using “implementation-related responses” (e.g., limiting access to uranium). If it’s highly potent and counterfactually rare, then you should think further about whether this information could be hazardous, and consider using “information-related responses”.
We then outlined a set of information-related responses one could consider, which we hope will guide readers in making more informed choices when facing potential information hazards.
There are two key things that this post has not done:
We hope to explore those avenues further in future work.
This post was written for Convergence by MichaelA and Justin Shovelain, based on an earlier post written by Justin and Andrés Gómez Emilsson. We’re also grateful to David Kristoffersson, Aaron Gertler, and Will Bradshaw for helpful comments and edits on earlier drafts, and to Anders Sandberg and Ben Harack for helpful discussions on the general topic.
By “developing”, we mean things like “coming up with” or “independently discovering through research”. Note that, with our usage of the term “develop”, it’s possible to consider the potential dangers of some information that you might develop but haven’t yet developed. For example, before starting research that you expect might reveal a new way to engineer a virus, you could already assess how dangerous the results of that research might be (though this assessment would of course be quite abstract and approximate). ↩︎
We would in fact further argue that there’s no relevant moral difference between “having a positive impact” and “reducing one’s negative impact”, and we hope to write about that point in the future. But that point isn’t necessary for our arguments in this post. ↩︎
We’re also inclined to think that it would be possible to modify the basic ideas in this post so as to make them relevant to various other sets of people, and potentially to all people. But the people we’re most interested in addressing, and for whom we think this post’s version of these ideas is most useful, are indeed people who research advanced technologies and/or catastrophic risks, or who often think about such technologies and risks. ↩︎
An alternative framework to ours for answering a somewhat related question is provided in The Offense-Defense Balance of Scientific Knowledge: Does Publishing AI Research Reduce Misuse? (which we read after writing this post). What that article calls “Counterfactual possession” is somewhat similar to what we call “Counterfactual rarity”. ↩︎
One alternative, or additional, approach for deciding whether to worry about information hazards in a particular situation would be the following: Consider how similar the information under consideration is to various types of information hazards that have been described (such as by Bostrom or by Crawford et al.), and whether this highlights a mechanism through which the information could cause harm. For example, you could recall the concept of an attention hazard, and then think about whether the article you’re considering writing could end up raising some information to the wrong people’s attention in a way that causes harm. ↩︎
There are also some valuable actions one can take to build one’s general capacity for avoiding or mitigating information hazards. Learning about information hazards and how to handle them is one such action. This article touches on some other actions for building that general capacity. ↩︎