Nate Thomas

Redwood Research and Constellation

Comments

To anyone reading this who wants to work on or discuss FHI-flavored work: Consider applying to Constellation's programs (the deadline for some of them is today!), which include salaried positions for researchers.

Note that it's unsurprising that a different model categorizes this correctly, because the failure was generated by attacking the particular model we were working with. The relevant question is "given a model, how easy is it to find a failure by attacking that model using our rewriting tools?"