LESSWRONG
Reinforcement Learning
• Applied to Reinforcement Learning: Essential Step Towards AGI or Irrelevant? by Double, 1mo ago
• Applied to The Explore vs. Exploit Dilemma by nathanjzhao, 1mo ago
• Applied to On Targeted Manipulation and Deception when Optimizing LLMs for User Feedback by Marcus Williams, 1mo ago
• Applied to [Paper] Hidden in Plain Text: Emergence and Mitigation of Steganographic Collusion in LLMs by Yohan Mathew, 2mo ago
• Applied to Inference-Only Debate Experiments Using Math Problems by Arjun Panickssery, 4mo ago
• Applied to Pacing Outside the Box: RNNs Learn to Plan in Sokoban by Adrià Garriga-alonso, 4mo ago
• Applied to On predictability, chaos and AIs that don't game our goals by Alejandro Tlaie, 4mo ago
• Applied to Towards shutdownable agents via stochastic choice by EJT, 4mo ago
• Applied to (Appetitive, Consummatory) ≈ (RL, reflex) by Steven Byrnes, 5mo ago
• Applied to Language for Goal Misgeneralization: Some Formalisms from my MSc Thesis by Giulio, 5mo ago
• Applied to The Carnot Engine of Economics by StrivingForLegibility, 6mo ago
• Applied to Finding the estimate of the value of a state in RL agents by Clément Dumas, 7mo ago
• Applied to Speedrun ruiner research idea by lemonhope, 7mo ago
• Applied to Measuring Learned Optimization in Small Transformer Models by J Bostock, 7mo ago
• Applied to [Aspiration-based designs] 2. Formal framework, basic algorithm by Jobst Heitzig, 8mo ago
• Applied to [Aspiration-based designs] 1. Informal introduction by Jobst Heitzig, 8mo ago