Aligned AI Proposals
Posts tagged Aligned AI Proposals (most relevant first)
Relevance | Karma | Title | Author | Age | Comments
6 | 58 | A "Bitter Lesson" Approach to Aligning AGI and ASI (Ω) | RogerDearnaley | 6mo | 39
5 | 64 | How to Control an LLM's Behavior (why my P(DOOM) went down) (Ω) | RogerDearnaley | 1y | 30
5 | 47 | Goodbye, Shoggoth: The Stage, its Animatronics, & the Puppeteer – a New Metaphor (Ω) | RogerDearnaley | 1y | 8
5 | 38 | Requirements for a Basin of Attraction to Alignment (Ω) | RogerDearnaley | 10mo | 12
5 | 37 | Striking Implications for Learning Theory, Interpretability — and Safety? | RogerDearnaley | 1y | 4
5 | 35 | Motivating Alignment of LLM-Powered Agents: Easy for AGI, Hard for ASI? | RogerDearnaley | 1y | 4
5 | 15 | Alignment has a Basin of Attraction: Beyond the Orthogonality Thesis | RogerDearnaley | 11mo | 15
4 | 30 | Interpreting the Learning of Deceit (Ω) | RogerDearnaley | 1y | 14
2 | 165 | A list of core AI safety problems and how I hope to solve them (Ω) | davidad | 1y | 29
2 | 117 | AI Alignment Metastrategy (Ω) | Vanessa Kosoy | 1y | 13
2 | 54 | How might we solve the alignment problem? (Part 1: Intro, summary, ontology) | Joe Carlsmith | 2mo | 5
2 | 48 | Safety First: safety before full alignment. The deontic sufficiency hypothesis. (Ω) | Chipmonk | 1y | 3
2 | 40 | We have promising alignment plans with low taxes (Ω) | Seth Herd | 1y | 9
2 | 38 | The (partial) fallacy of dumb superintelligence (Ω) | Seth Herd | 1y | 5
2 | 13 | [Linkpost] Building Altruistic and Moral AI Agent with Brain-inspired Affective Empathy Mechanisms | Gunnar_Zarncke | 2mo | 0