Adversarial Examples (AI)
Contributors
2
Ruby
1
Multicore
Posts tagged
Adversarial Examples (AI)
8
677
SolidGoldMagikarp (plus, prompt generation)
Ω
Jessica Rumbelow, mwatkins
2y
Ω
205
Review
3
153
Ironing Out the Squiggles
Zack_M_Davis
8mo
36
3
70
AI Safety in a World of Vulnerable Machine Learning Systems
Ω
AdamGleave, EuanMcLean
2y
Ω
28
2
124
There are (probably) no superhuman Go AIs: strong human players beat the strongest AIs
Taran
2y
34
2
123
Deep Forgetting & Unlearning for Safely-Scoped LLMs
Ω
scasper
1y
Ω
30
2
87
Solving adversarial attacks in computer vision as a baby version of general AI alignment
Ω
Stanislav Fort
4mo
Ω
8
2
58
Human beats SOTA Go AI by learning an adversarial policy
Vanessa Kosoy
2y
32
2
38
What progress have we made on automated auditing?
Q
Ω
LawrenceC
5mo
Q
Ω
1
2
35
If I were a well-intentioned AI... I: Image classifier
Ω
Stuart_Armstrong
5y
Ω
4
2
31
Adversarial Policies Beat Professional-Level Go AIs
sanxiyn
2y
35
2
30
Adversarial Robustness Could Help Prevent Catastrophic Misuse
Ω
aogara
1y
Ω
18
2
13
The Goodhart Game
Ω
John_Maxwell
5y
Ω
5
2
12
AXRP Episode 1 - Adversarial Policies with Adam Gleave
Ω
DanielFilan
4y
Ω
5
2
5
RAIN: Your Language Models Can Align Themselves without Finetuning - Microsoft Research 2023 - Reduces the adversarial prompt attack success rate from 94% to 19%!
Singularian2501
1y
0
1
142
High-stakes alignment via adversarial training [Redwood Research report]
Ω
dmz, LawrenceC, Nate Thomas
3y
Ω
29