Alignment Hot Take Advent Calendar
by Charlie Steiner
Take 1: We're not going to reverse-engineer the AI.
Take 2: Building tools to help build FAI is a legitimate strategy, but it's dual-use.
Take 3: No indescribable heavenworlds.
Take 4: One problem with natural abstractions is there's too many of them.
Take 5: Another problem for natural abstractions is laziness.
Take 6: CAIS is actually Orwellian.
Take 7: You should talk about "the human's utility function" less.
Take 8: Queer the inner/outer alignment dichotomy.
Take 9: No, RLHF/IDA/debate doesn't solve outer alignment.
Take 10: Fine-tuning with RLHF is aesthetically unsatisfying.
Take 11: "Aligning language models" should be weirder.
Take 12: RLHF's use is evidence that orgs will jam RL at real-world problems.
Take 13: RLHF bad, conditioning good.
Take 14: Corrigibility isn't that great.