niki.h

Does robustness improve with scale?

Adversarial vulnerabilities have long been an issue in various ML systems. Large language models (LLMs) are no exception, suffering from issues such as jailbreaks: adversarial prompts that bypass model safeguards. At the same time, scale has led to remarkable advances in the capabilities of LLMs, leading us to ask: to...

Jul 25, 2024

LESSWRONG