LESSWRONG
Redwood Research
• Applied to Measuring whether AIs can statelessly strategize to subvert security measures by Alex Mallen 7d ago
• Applied to Benchmarks for Detecting Measurement Tampering [Redwood Research] by Magdalena Wache 1y ago
• Applied to LLMs are (mostly) not helped by filler tokens by Kshitij Sachan 1y ago
• Applied to Critiques of prominent AI safety labs: Redwood Research by Omega. 2y ago
• Applied to [Linkpost] Critiques of Redwood Research by Akash 2y ago
• Applied to Some common confusion about induction heads by Ruby 2y ago
• Applied to Practical Pitfalls of Causal Scrubbing by Jérémy Scheurer 2y ago
• Applied to Causal scrubbing: Appendix by jenny 2y ago
• Applied to Causal scrubbing: results on induction heads by jenny 2y ago
• Applied to Causal scrubbing: results on a paren balance checker by jenny 2y ago
• Applied to Causal Scrubbing: a method for rigorously testing interpretability hypotheses [Redwood Research] by jenny 2y ago
• Applied to Some Lessons Learned from Studying Indirect Object Identification in GPT-2 small by Alexandre Variengien 2y ago
• Applied to Apply to the Redwood Research Mechanistic Interpretability Experiment (REMIX), a research program in Berkeley by Rudi C 2y ago
• Applied to Help out Redwood Research’s interpretability team by finding heuristics implemented by GPT-2 small by Multicore 2y ago
• Applied to Takeaways from our robust injury classifier project [Redwood Research] by Ruby 2y ago
• Applied to AXRP Episode 17 - Training for Very High Reliability with Daniel Ziegler by DanielFilan 2y ago
• Applied to High-stakes alignment via adversarial training [Redwood Research report] by Multicore 3y ago
• Applied to Redwood Research is hiring for several roles (Operations and Technical) by Jessica W 3y ago
• Applied to Redwood's Technique-Focused Epistemic Strategy by Ruby 3y ago
• Applied to Redwood Research is hiring for several roles by Multicore 3y ago