Reinforcement Learning

• Applied to Reinforcement Learning: Essential Step Towards AGI or Irrelevant? by Double 1mo ago
• Applied to The Explore vs. Exploit Dilemma by nathanjzhao 1mo ago
• Applied to On Targeted Manipulation and Deception when Optimizing LLMs for User Feedback by Marcus Williams 1mo ago
• Applied to [Paper] Hidden in Plain Text: Emergence and Mitigation of Steganographic Collusion in LLMs by Yohan Mathew 2mo ago
• Applied to Inference-Only Debate Experiments Using Math Problems by Arjun Panickssery 4mo ago
• Applied to Pacing Outside the Box: RNNs Learn to Plan in Sokoban by Adrià Garriga-alonso 4mo ago
• Applied to On predictability, chaos and AIs that don't game our goals by Alejandro Tlaie 4mo ago
• Applied to Towards shutdownable agents via stochastic choice by EJT 4mo ago
• Applied to (Appetitive, Consummatory) ≈ (RL, reflex) by Steven Byrnes 5mo ago
• Applied to Language for Goal Misgeneralization: Some Formalisms from my MSc Thesis by Giulio 5mo ago
• Applied to The Carnot Engine of Economics by StrivingForLegibility 6mo ago
• Applied to Finding the estimate of the value of a state in RL agents by Clément Dumas 7mo ago
• Applied to Speedrun ruiner research idea by lemonhope 7mo ago
• Applied to Measuring Learned Optimization in Small Transformer Models by J Bostock 7mo ago
• Applied to [Aspiration-based designs] 2. Formal framework, basic algorithm by Jobst Heitzig 8mo ago
• Applied to [Aspiration-based designs] 1. Informal introduction by Jobst Heitzig 8mo ago