LESSWRONG
Deception
• Applied to On Intentionality, or: Towards a More Inclusive Concept of Lying by Cornelius Dybdahl 2mo ago
• Applied to Deep Honesty by David Gross 2mo ago
• Applied to On Targeted Manipulation and Deception when Optimizing LLMs for User Feedback by Marcus Williams 3mo ago
• Applied to Why is o1 so deceptive? by abramdemski 3mo ago
• Applied to Secret Collusion: Will We Know When to Unplug AI? by schroederdewitt 3mo ago
• Applied to Finding Deception in Language Models by Esben Kran 4mo ago
• Applied to Let’s use AI to harden human defenses against AI manipulation by Tom Davidson 5mo ago
• Applied to Ethical Deception: Should AI Ever Lie? by Jason Reid 5mo ago
• Applied to [Paper Blogpost] When Your AIs Deceive You: Challenges with Partial Observability in RLHF by Leon Lang 6mo ago
• Applied to Sparse Features Through Time by Rogan Inglis 6mo ago
• Applied to Inducing Unprompted Misalignment in LLMs by Sam Svenningsen 8mo ago
• Applied to 'Empiricism!' as Anti-Epistemology by Gyrodiot 9mo ago
• Applied to My Clients, The Liars by ymeskhout 10mo ago
• Applied to Agreeing With Stalin in Ways That Exhibit Generally Rationalist Principles by Zack_M_Davis 10mo ago
• Applied to Difficulty classes for alignment properties by Jozdien 10mo ago
• Applied to LLMs can strategically deceive while doing gain-of-function research by Igor Ivanov 1y ago
• Applied to Why do so many think deception in AI is important? by Gunnar_Zarncke 1y ago
• Applied to (Partial) failure in replicating deceptive alignment experiment by claudia.biancotti 1y ago