Sycophancy
Posts tagged Sycophancy
Most Relevant
2 · 161 · Sycophancy to subterfuge: Investigating reward tampering in large language models · Ω · Carson Denison, evhub · 5mo · Ω · 22
2 · 123 · Steering Llama-2 with contrastive activation additions · Ω · Nina Panickssery, Wuschel Schulz, NickGabs, Meg, evhub, TurnTrout · 10mo · Ω · 29
1 · 122 · Reducing sycophancy and improving honesty via activation steering · Ω · Nina Panickssery · 1y · Ω · 17
1 · 26 · SAE features for refusal and sycophancy steering vectors · Ω · neverix, Dmitrii Kharlapenko, Arthur Conmy, Neel Nanda · 1mo · Ω · 4
1 · 8 · Two new datasets for evaluating political sycophancy in LLMs · Ω · alma.liezenga · 2mo · Ω · 0
1 · 2 · Evaluating LLaMA 3 for political sycophancy · alma.liezenga · 2mo · 2
1 · -8 · Antagonistic AI · Xybermancer · 9mo · 1