Adversarial Training
Posts tagged Adversarial Training (most relevant first)
2
153
Ironing Out the Squiggles
Zack_M_Davis
8mo
36
2
143
Takeaways from our robust injury classifier project [Redwood Research]
Ω
dmz
2y
Ω
12
2
142
High-stakes alignment via adversarial training [Redwood Research report]
Ω
dmz, LawrenceC, Nate Thomas
3y
Ω
29
2
123
Deep Forgetting & Unlearning for Safely-Scoped LLMs
Ω
scasper
1y
Ω
30
2
100
Behavioral red-teaming is unlikely to produce clear, strong evidence that models aren't scheming
Ω
Buck
2mo
Ω
4
2
87
Solving adversarial attacks in computer vision as a baby version of general AI alignment
Ω
Stanislav Fort
4mo
Ω
8
2
38
Adversarial training, importance sampling, and anti-adversarial training for AI whistleblowing
Ω
Buck
3y
Ω
0
2
30
Adversarial Robustness Could Help Prevent Catastrophic Misuse
Ω
aogara
1y
Ω
18
2
25
Can Generalized Adversarial Testing Enable More Rigorous LLM Safety Evals?
Ω
scasper
5mo
Ω
0
2
17
AI Safety 101 - Chapter 5.2 - Unrestricted Adversarial Training
Charbel-Raphaël
1y
0
2
16
AXRP Episode 17 - Training for Very High Reliability with Daniel Ziegler
Ω
DanielFilan
2y
Ω
0
2
9
Some thoughts on why adversarial training might be useful
Ω
Beth Barnes
3y
Ω
6
1
50
Latent Adversarial Training
Ω
Adam Jermyn
2y
Ω
13
1
41
Beyond the Board: Exploring AI Robustness Through Go
Ω
AdamGleave
6mo
Ω
2
1
30
EIS IX: Interpretability and Adversaries
Ω
scasper
2y
Ω
8