A good safety measure for these models might be to train them with false information about building bombs etc. so their answers will be full of hallucinations. There aren't that many areas of dangerous knowledge, so it could probably be done cheaply and without significantly affecting it's general capabilities.
A good safety measure for these models might be to train them with false information about building bombs etc. so their answers will be full of hallucinations. There aren't that many areas of dangerous knowledge, so it could probably be done cheaply and without significantly affecting it's general capabilities.