LESSWRONG
Language Models
• Applied to Exploring the petertodd / Leilan duality in GPT-2 and GPT-J by mwatkins 3d ago
• Applied to What o3 Becomes by 2028 by Vladimir_Nesov 4d ago
• Applied to A short critique of Omohundro's "Basic AI Drives" by Soumyadeep Bose 7d ago
• Applied to Densing Law of LLMs by Bogdan Ionut Cirstea 18d ago
• Applied to Are SAE features from the Base Model still meaningful to LLaVA? by Shan23Chen 21d ago
• Applied to The Polite Coup by Charlie Sanders 22d ago
• Applied to Intricacies of Feature Geometry in Large Language Models by 7vik 23d ago
• Applied to Two interviews with the founder of DeepSeek by Cosmia_Nebula 1mo ago
• Applied to Depression and Creativity by Bill Benzon 1mo ago
• Applied to I, Token by Ivan Vendrov 1mo ago
• Applied to Chat Bankman-Fried: an Exploration of LLM Alignment in Finance by claudia.biancotti 1mo ago
• Applied to Why is Gemini telling the user to die? by Burny 1mo ago
• Applied to Which AI Safety Benchmark Do We Need Most in 2025? by Loïc Cabannes 1mo ago
• Applied to Sparks of Consciousness by Charlie Sanders 1mo ago
• Applied to LLMs Look Increasingly Like General Reasoners by eggsyntax 2mo ago
• Applied to Analyzing how SAE features evolve across a forward pass by bensenberner 2mo ago
• Applied to SAEs are highly dataset dependent: a case study on the refusal direction by Connor Kissane 2mo ago
• Applied to Current safety training techniques do not fully transfer to the agent setting by Simon Lermen 2mo ago
• Applied to GPT-4o Guardrails Gone: Data Poisoning & Jailbreak-Tuning by ChengCheng 2mo ago