lewtun

Do you have any plans to release the instructions in RefusalBench? I understand the reasons for not providing many details of your underlying technique, but given the limitations you highlight with AdvBench, wouldn't access to RefusalBench give safety researchers a better benchmark for testing new models?