MATS Program

• Applied to Gradient Routing: Masking Gradients to Localize Computation in Neural Networks by Ryan Kidd 3d ago
• Applied to Intricacies of Feature Geometry in Large Language Models by 7vik 19d ago
• Applied to Debating with More Persuasive LLMs Leads to More Truthful Answers by Ryan Kidd 2mo ago
• Applied to Automating LLM Auditing with Developmental Interpretability by DanielFilan 2mo ago
• Applied to SAE Probing: What is it good for? Absolutely something! by Subhash Kantamneni 2mo ago
• Applied to Bridging the VLM and mech interp communities for multimodal interpretability by Sonia Joseph 2mo ago
• Applied to The slingshot helps with learning by Wilson Wu 2mo ago
• Applied to Improving Model-Written Evals for AI Safety Benchmarking by Sunishchal Dev 2mo ago
• Applied to On Targeted Manipulation and Deception when Optimizing LLMs for User Feedback by Marcus Williams 2mo ago
• Applied to Standard SAEs Might Be Incoherent: A Choosing Problem & A “Concise” Solution by Kola Ayonrinde 2mo ago
• Applied to [Job Ad] MATS is hiring! by Jana 2mo ago
• Applied to MATS AI Safety Strategy Curriculum v2 by Ryan Kidd 2mo ago
• Applied to Domain-specific SAEs by jacob_drori 2mo ago
• Applied to [Interim research report] Evaluating the Goal-Directedness of Language Models by Rauno Arike 3mo ago
• Applied to MATS Alumni Impact Analysis by Ryan Kidd 3mo ago
• Applied to The Geometry of Feelings and Nonsense in Large Language Models by Ryan Kidd 3mo ago
• Applied to Apply to MATS 7.0! by Ryan Kidd 3mo ago
• Applied to Calendar feature geometry in GPT-2 layer 8 residual stream SAEs by Ryan Kidd 3mo ago
• Applied to Showing SAE Latents Are Not Atomic Using Meta-SAEs by Ryan Kidd 3mo ago
• Applied to Experiments with an alternative method to promote sparsity in sparse autoencoders by Ryan Kidd 4mo ago