Inverse Reinforcement Learning
Applied to How to Contribute to Theoretical Reward Learning Research by Joar Skalse, 22d ago
Applied to Other Papers About the Theory of Reward Learning by Joar Skalse, 22d ago
Applied to Defining and Characterising Reward Hacking by Joar Skalse, 22d ago
Applied to Misspecification in Inverse Reinforcement Learning - Part II by Joar Skalse, 22d ago
Applied to Misspecification in Inverse Reinforcement Learning by Joar Skalse, 22d ago
Applied to Partial Identifiability in Reward Learning by Joar Skalse, 22d ago
Applied to The Theoretical Reward Learning Research Agenda: Introduction and Motivation by Joar Skalse, 22d ago
Dakara v1.2.0, Dec 30th 2024 GMT (+1/-132)
Applied to ACI#9: What is Intelligence by Akira Pyinya, 3mo ago
Applied to Humans can be assigned any values whatsoever... by Gunnar_Zarncke, 5mo ago
Applied to Why do we need RLHF? Imitation, Inverse RL, and the role of reward by Ran W, 1y ago
Applied to [Linkpost] Concept Alignment as a Prerequisite for Value Alignment by Bogdan Ionut Cirstea, 1y ago
Applied to Thinking about maximization and corrigibility by James Payor, 2y ago
Phib v1.1.0, Apr 19th 2023 GMT (+1170)
Applied to Data for IRL: What is needed to learn human values? by Jan Wehner, 3y ago
Applied to A Survey of Foundational Methods in Inverse Reinforcement Learning by Raemon, 3y ago
Applied to Is CIRL a promising agenda? by Ruby, 3y ago
Applied to Machines vs Memes Part 3: Imitation and Memes by ceru23, 3y ago