I struggle to see what the infohazard policy for AGI safety research writing should be. For example, it has been pointed out multiple times that the GPT-3 paper was an infohazard: it made many people realise that this kind of model was possible, and they went on to build equivalents, making large progress towards AGI overall. On the other hand, imagine we still knew nothing about LLMs at the capability level of GPT-3 and above. So much independent research with these models would not have been done. Safety teams at OpenAI and other leading orgs would do something, but the alignment community complains that the size of these teams is inadequate relative to the size of their capability teams, so should we actually want AGI labs to publish? Or do we think that the number of AGI wannabes outside the leading AGI labs, who want to build something impressive on top of the leading labs' insights and barely care about safety at all, is still many times larger than the number of AGI safety researchers who use GPT-3 and other proto-AGI results in the "differentially right way"?

A similar weighing of harm and benefit should be performed regarding community involvement: it's known that the publication of the GPT-3 paper was a wake-up call for many researchers and, I believe, directly led to the founding of Conjecture. Similarly, the publication of many impressive results earlier this year, primarily by Google and DeepMind, was a wake-up call for many other people concerned about x-risk, including me. But it's harder to estimate the corresponding effect on the excitement of other groups of people, and to estimate the number of people who decided to get involved and push towards AGI as a result of these publications. (I believe John Carmack is an example of such a person.)

What are the best resources on this topic? Why are they not promoted as a policy on LessWrong?

Related to this question is the idea that the conceptual/inferential, social, organisational, and political division between the AGI capabilities R&D community and the alignment community is deeply problematic: what we should have is a single community with the singular goal of developing a transformative AI that won't lead to a catastrophe.

Answer by Gabe M

The Conjecture Internal Infohazard Policy seems like a good start!

As for why such standards aren't well established within the rationalist AI safety community or publicized on LessWrong, I suspect there may be some unfortunate conflict between truth-seeking as a value and censorship in the form of protecting against infohazards.

Thanks, though this is more of a process document than the kind of policy I'm looking for: a policy that helps answer the question "Should I treat this or that idea or other piece of information as infohazardous, and why?". Apart from a five-item list of "presumptuous infohazards", the actual policy, if I understand correctly, is left to the judgement of project leads and the coordinator. Or, perhaps, it exists at Conjecture as a written artifact, but is itself considered private or secret.