Using lying to detect human values
by Stuart_Armstrong
15th Mar 2018
AI Alignment Forum
This is a linkpost for https://www.lesserwrong.com/posts/pQz97SLCRMwHs6BzF/using-lying-to-detect-human-values