LESSWRONG
Corrigibility
• Applied to Extending the Off-Switch Game: Toward a Robust Framework for AI Corrigibility by Raemon 3mo ago
• Applied to A Shutdown Problem Proposal by Mateusz Bagiński 5mo ago
• Applied to Simplifying Corrigibility – Subagent Corrigibility Is Not Anti-Natural by RobertM 5mo ago
• Applied to Towards shutdownable agents via stochastic choice by EJT 6mo ago
• Applied to Corrigibility = Tool-ness? by MondSemmel 6mo ago
• Applied to 4. Existing Writing on Corrigibility by Max Harms 6mo ago
• Applied to 3b. Formal (Faux) Corrigibility by Max Harms 6mo ago
• Applied to 3a. Towards Formal Corrigibility by Max Harms 6mo ago
• Applied to 2. Corrigibility Intuition by Max Harms 6mo ago
• Applied to Corrigibility could make things worse by ThomasCederborg 6mo ago
• Applied to 5. Open Corrigibility Questions by Ruby 6mo ago
• Applied to 0. CAST: Corrigibility as Singular Target by Max Harms 6mo ago
• Applied to 1. The CAST Strategy by Max Harms 6mo ago
• Applied to The Shutdown Problem: Incomplete Preferences as a Solution by EJT 10mo ago
• Applied to Requirements for a Basin of Attraction to Alignment by RogerDearnaley 11mo ago
• Applied to Nash Bargaining between Subagents doesn't solve the Shutdown Problem by A.H. 11mo ago
• Applied to Requirements for a STEM-capable AGI Value Learner (my Case for Less Doom) by RogerDearnaley 1y ago
• Applied to A Pedagogical Guide to Corrigibility by A.H. 1y ago