This is a linkpost to a piece by Beren on some severe problems that come with the use of the infohazard concept on LW.
The main problem I see with infohazards is that the concept encourages a "Great Man Theory" of scientific progress, which is basically false. This holds despite vast disparities in ability, since it's very rare for a person or small group to single-handedly solve a scientific field or problem. The culture of AI safety already has a bit of a problem with applying the "Great Man Theory" too liberally, especially among those influenced by MIRI.
Infohazards come with other severe problems that cripple the AI safety community, but the encouragement of Great Man Theories of scientific progress is the most noteworthy one to me. That doesn't mean it has the biggest impact on AI safety compared to the other problems.
Part of Beren's post is quoted below:
Infohazards assume an incorrect model of scientific progress
One issue I have with the culture of AI safety and alignment in general is that it often presupposes too much of a “great man” theory of progress – the idea that there will be a single ‘genius’ who solves ‘The Problem’ of alignment and that everything else has a relatively small impact. This is not how scientific fields develop in real life. While there are certainly very large individual differences in performance, and a log-normal distribution of impact, with outliers having vastly more impact than the median, nevertheless in almost all scientific fields progress is highly distributed – single individuals very rarely completely solve entire fields themselves.
Solving alignment seems unlikely to be different a-priori, and appears to require a deep and broad understanding of how deep learning and neural networks function and generalize, as well as significant progress in understanding their internal representations, and learned goals. In addition, there must likely be large code infrastructures built up around monitoring and testing of powerful AI systems and a sensible system of multilateral AI regulation between countries. This is not the kind of thing that can be invented by a lone genius from scratch in a cave. This is a problem that requires a large number of very smart people building on each other’s ideas and outputs over a long period of time, like any normal science or technological endeavor. This is why having widespread adoption of the ideas and problems of alignment, as well as dissemination of technical work is crucial.
This is also why some of the ideas proposed to fix some of the issues caused by infohazard norms fall flat. For instance, to get feedback, it is often proposed to have a group of trusted insiders who have access to all the infohazardous information and can build on it themselves. However, not only is such a group likely to just get overloaded with adjudicating infohazard requests, but we should naturally not expect the vast majority of insights to come from a small recognizable group of people at the beginning of the field. The existing set of ‘trusted alignment people’ is strongly unlikely to generate all, or even a majority, of the insights required to successfully align superhuman AI systems in the real world. Even Einstein – the archetypal lone genius – who was at the time a random patent clerk in Switzerland far from the center of the action – would not have been able to make any discoveries if all theoretical physics research of the time was held to be ‘infohazardous’ and only circulated privately among the physics professors of a few elite universities at the time. Indeed, it is highly unlikely that in such a scenario much theoretical physics would have been done at all.
Similarly, take the case in ML. The vast majority of advancements in current ML come from a widely distributed network of contributors in academia and industry. If knowledge of all advancements was restricted to the set of ML experts in 2012 when AlexNet was published, this would have prevented almost everybody who has since contributed to ML from entering the field and slowed progress down immeasurably. Of course there is naturally a power-law distribution of impact where a few individuals show outlier productivity and impact, however progress in almost all scientific fields is extremely distributed and not confined to a few geniuses which originate the vast majority of the inventions.
Another way to think about this is that the AI capabilities research ‘market’ is currently much more efficient than the AI safety market. There are a lot more capabilities researchers between industry and academia than safety researchers. The AI capabilities researchers have zero problem sharing their work and building off the work of others – ML academia directly incentivises this and, until recently it seems, so did the promotion practices of most industry labs. Capabilities researchers also tend to get significantly stronger empirical feedback loops than a lot of alignment research and, generally, better mentorship and experience in actually conducting science. This naturally leads to much faster capabilities progress than alignment progress. Having strict infohazard norms and locking down knowledge of new advances to tiny groups of people currently at the top of the alignment status hierarchy further weakens the epistemics of the alignment community and significantly increases the barriers to entry – which is exactly the opposite of what we want. We need to be making the alignment research market more efficient, and with fewer barriers to research dissemination and access than capabilities if we want to out-progress them. Strict infohazard norms move things in the wrong direction.
For more on this topic, Beren's post, linked above, is a great reference, and I'd highly recommend it for its discussion of further problems with the infohazard concept.
I’d be very interested if anyone has specific examples of ideas like this they could share (that are by now widely known or obviously not hazardous). I’m sympathetic to the sorts of things the article says, but I don’t actually have any picture of the class of ideas it’s talking about.
I'm not "on the inside", but my understanding is that some people at Conjecture came up with Chain of Thought prompting and decided that it was infohazardous; I gather this was fairly shortly before preprints describing it came out in the open AI literature. That idea does work well, but was of course obvious to any schoolteacher.