Hey everyone! I work on quantifying and demonstrating AI cybersecurity impacts at Palisade Research with @Jeffrey Ladish.
We have a bunch of exciting work in the pipeline, including:
demos of well-known safety issues like agent jailbreaks or voice cloning
replications of prior work on self-replication and hacking capabilities
modelling of the economic impact of the above capabilities
novel evaluations and tools
Most of my posts here will probably detail technical research or announce new evaluation benchmarks and tools. I also think a lot about responsible release, ...