LESSWRONG
Sparse Autoencoders (SAEs)
Applied to Proof-of-Concept Debugger for a Small LLM by StefanHex 21h ago
Applied to Topological Data Analysis and Mechanistic Interpretability by Jakob Hansen 17d ago
Applied to Takeaways From Our Recent Work on SAE Probing by Josh Engels 17d ago
Applied to SAE Training Dataset Influence in Feature Matching and a Hypothesis on Position Features by Seonglae Cho 23d ago
Applied to [PAPER] Jacobian Sparse Autoencoders: Sparsify Computations, Not Just Activations by Lucy Farnik 23d ago
Applied to Deep sparse autoencoders yield interpretable features too by Armaan A. Abraham 26d ago
Applied to Sparse Autoencoder Features for Classifications and Transferability by Shan23Chen 1mo ago
Applied to Comparing the effectiveness of top-down and bottom-up activation steering for bypassing refusal on harmful prompts by RobertM 1mo ago
Applied to Cross-Layer Feature Alignment and Steering in Large Language Model by dlaptev 1mo ago
Applied to SAE regularization produces more interpretable models by Logan Riggs 2mo ago
Applied to Empirical Insights into Feature Geometry in Sparse Autoencoders by Jason Boxi Zhang 2mo ago
Applied to Finding Features Causally Upstream of Refusal by Daniel Lee 2mo ago
Applied to Scaling Sparse Feature Circuit Finding to Gemma 9B by Diego Caples 2mo ago
Applied to Broken Latents: Studying SAEs and Feature Co-occurrence in Toy Models by chanind 3mo ago
Applied to Are Sparse Autoencoders a good idea for AI control? by Gerard Boxo 3mo ago
Applied to Learning Multi-Level Features with Matryoshka SAEs by Bart Bussmann 3mo ago
Applied to Compositionality and Ambiguity: Latent Co-occurrence and Interpretable Subspaces by Matthew A. Clarke 3mo ago
Applied to Matryoshka Sparse Autoencoders by Noa Nabeshima 3mo ago
Applied to Measuring Nonlinear Feature Interactions in Sparse Crosscoders [Project Proposal] by Jason Gross 3mo ago