I think you get very different answers depending on whether your question is "what is an example of a policy that makes it illegal in the United States to do research with the explicit intent of creating AGI" or whether it is "what is an example of a policy that results in nobody, including intelligence agencies, doing AI research that could lead to AGI, anywhere in the world".

For the former, something like updates to the Export Administration Regulations could make it de facto illegal to develop AI aimed at the international market. Historically, that approach briefly succeeded at making it illegal to intentionally export software which implemented strong encryption. It didn't actually prevent the export, but it did arguably make that export unlawful. I'd recommend reading that article in full, actually, to get an idea of how "what the law says" and "what ends up happening" can diverge.

I think the answer to the question of how well realistic NN-like systems with finite compute approximate the results of hypothetical utility maximizers with infinite compute is "not very well at all".

So the MIRI train of thought, as I understand it, goes something like

  1. You cannot predict the specific moves that a superhuman chess-playing AI will make, but you can predict that the final board state will be one in which the chess-playing AI has won.
  2. The chess AI can do this because it accurately predicts the likely outcomes of its own actions, computes the utility of each possible action, and then effectively does an argmax over them to pick the best one, yielding the best outcome according to its utility function (a minimal sketch of this loop follows the list).
  3. Similarly, you will not be able to predict the specific actions that a "sufficiently powerful" utility maximizer will take, but you can predict that its utility function will be maximized.
  4. For most utility functions about things in the real world, the configuration of matter that maximizes that utility function is not a configuration of matter that supports human life.
  5. Actual future AI systems that will show up in the real world in the next few decades will be "sufficiently powerful" utility maximizers, and so this is a useful and predictive model of what the near future will look like.
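
As a toy illustration of the decision loop in point 2 (the names and the trivial world model here are mine, purely for exposition, not a claim about any real system):

    // Expected-utility maximization in miniature: predict the outcome of
    // each action, score it, and argmax. `predictOutcome` stands in for a
    // world model and `utility` for a utility function; both are
    // hypothetical placeholders.
    const actions = ['e4', 'd4', 'Nf3'];
    const predictOutcome = action => ({ finalBoard: action });
    const utility = outcome => (outcome.finalBoard === 'e4' ? 1 : 0);
    const best = actions.reduce((a, b) =>
        utility(predictOutcome(a)) >= utility(predictOutcome(b)) ? a : b);
    // With infinite compute, every step here could be exact; points 2 and 5
    // assert that real systems approximate this loop well enough to predict.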

I think the last few years in ML have made points 2 and 5 look particularly shaky. For example, the actual architecture of SOTA chess-playing systems doesn't particularly resemble a cheaper version of the optimal-with-infinite-compute approach of "minimax over full tree search"; instead, it seems to be "pile a bunch of situation-specific heuristics on top of each other, then tweak the heuristics based on how well they do in practice".

Which, for me at least, suggests that looking at what the optimal-with-infinite-compute system would do might not be very informative about what the actual systems that show up in the next few decades will do.

Can you give a concrete example of a safety property of the sort that you are envisioning automated testing for? Or am I misunderstanding what you're hoping to see?

For example, a human can, to an extent, inspect what they are going to say before they say or write it. Before saying that Gary Marcus was "inspired by his pet chicken, Henrietta", a human may temporarily store the next words they plan to say elsewhere in the brain and evaluate them.

Transformer-based models also internally represent the tokens they are likely to emit in future steps. This was demonstrated rigorously in Future Lens: Anticipating Subsequent Tokens from a Single Hidden State, though perhaps the simpler demonstration is that LLMs can reliably complete the sentence "Alice likes apples, Bob likes bananas, and Aaron likes apricots, so when I went to the store I bought Alice an apple and I got [Bob/Aaron]" with the appropriate "a/an" token: choosing correctly requires already representing the upcoming noun.

I think the answer pretty much has to be "yes", for the following reasons.

  1. As noted in the above post, weather is chaotic.
  2. Elections are sometimes close. For example, the 2000 presidential election came down to a margin of 537 votes in Florida.
  3. Geographic location correlates reasonably strongly with party preference.
  4. Weather affects specific geographic areas.
  5. Weather influences voter turnout.[1]

During the 2000 election, in Okaloosa County, Florida (at the western tip of the panhandle), 71k of the county's 171k residents voted, with 52,186 votes going to Bush and 16,989 to Gore, for a 42% turnout rate.

On November 7, 2000, there was no significant rainfall in Pensacola (the closest weather station I could find with records going back that far). A storm which dropped 2 inches of rain on the tip of the Florida panhandle that day would have reduced voter turnout by 1.8%,[1] which would have shifted the margin 634 votes toward Gore. That would have tipped Florida, which would in turn have tipped the election.
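
Spelling out the arithmetic behind that 634-vote figure (treating the 1.8% as a proportional reduction in ballots cast, split in the county's observed Bush/Gore proportions):

$$2\text{ in} \times 0.9\%/\text{in} = 1.8\%, \qquad 0.018 \times 71{,}000 \approx 1{,}278 \text{ fewer ballots}$$

$$1{,}278 \times \frac{52{,}186 - 16{,}989}{71{,}000} \approx 634 \text{ fewer votes of margin for Bush}$$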

Now, November is the "dry" season in Florida, so heavy rains like that are not especially common. Still, they happen: on 2015-11-02, for example, 2.34 inches of rain fell.[2] That was only one of the 140 days I looked at that would have flipped the 2000 election, and the 2000 election was, to my knowledge, the closest of the 59 US presidential elections so far. But there are a number of other tracks a storm could have taken that would also have flipped the 2000 election.[3] And in the 1976 election, somewhat worse weather in the Great Lakes region would likely have flipped Ohio and Wisconsin, where Carter beat Ford by narrow margins.[4]

So I think the probability of "weather, on election day specifically, flips the 2028 election in a way that cannot be foreseen now" is already well over 0.1%. And that's before getting into other weather questions like "how many hurricanes hit the Gulf Coast in 2028, and where exactly do they land?".

  1. ^

    Gomez, B. T., Hansford, T. G., & Krause, G. A. (2007). The Republicans should pray for rain: Weather, turnout, and voting in U.S. presidential elections. The Journal of Politics, 69(3), 649-663.

    "The results indicate that if a county experiences an inch of rain more than what is normal for the county for that election date, the percentage of the voting age population that turns out to vote decreases by approximately .9%.".

  2. ^

    I pulled the weather for the week before and after November 7 for each of the past 10 years from the api.weather.com historical observations endpoint, and that was the highest-rainfall date.

    // Fetch Nov 1-14 observations for KPNS (Pensacola) for 2014-2023 from
    // the Weather Company historical-observations endpoint, bucket the
    // reported precipitation by date, and take the wettest day.
    const precipByDate = {};
    for (let y = 2014; y < 2024; y++) {
        const res = await fetch('https://api.weather.com/v1/location/KPNS:9:US/observations/historical.json?apiKey=<redacted>&units=e&startDate=' + y + '1101&endDate=' + y + '1114').then(r => r.json());
        res.observations.forEach(obs => {
            // Group observations by local calendar date.
            const d = new Date(obs.valid_time_gmt * 1000);
            const ds = d.getFullYear() + '-' + (d.getMonth() + 1) + '-' + d.getDate();
            if (!(ds in precipByDate)) { precipByDate[ds] = 0; }
            if (obs.precip_total) { precipByDate[ds] += obs.precip_total; }
        });
    }
    // Highest-precipitation [date, inches] pair across all 140 days.
    Object.entries(precipByDate).sort((a, b) => b[1] - a[1])[0]
  3. ^

    Looking at the 2000 election map in Florida, any good thunderstorm in the panhandle, in the northeast corner of the state, or on the western side of the central-to-southern peninsula would have done the trick.

  4. ^

    https://en.wikipedia.org/wiki/1976_United_States_presidential_election -- Carter won Ohio and Wisconsin by 11k and 35k votes, respectively.

An attorney rather than the police, I think.

Also "provably safe" is a property a system can have relative to a specific threat model. Many vulnerabilities come from the engineer having an incomplete or incorrect threat model, though (most obviously the multitude of types of side-channel attack).

Counterpoint: Sydney (Bing Chat) was wildly unaligned, to the extent that it is even possible for an LLM to be aligned, and people thought it was cute / cool.

The two examples everyone loves to use to demonstrate that massive top-down engineering projects can sometimes be a viable alternative to iterative design (the Manhattan Project and the Apollo Program) were both government-led initiatives, rather than single very smart people working alone in their garages. I think it's reasonable to conclude that governments have considerably more capacity to steer outcomes than individuals, and are the most powerful optimizers that exist at this time.

I think restricting the term "superintelligence" to "only that which can create functional self-replicators with nano-scale components" is misleading. Concretely, that definition of "superintelligence" says that natural selection is superintelligent, while the most capable groups of humans are nowhere close, even with computerized tooling.

Looking at the AlphaZero paper:

Our new method uses a deep neural network fθ with parameters θ. This neural network takes as an input the raw board representation s of the position and its history, and outputs both move probabilities and a value, (p, v) = fθ(s). The vector of move probabilities p represents the probability of selecting each move a (including pass), pa = Pr(a|s). The value v is a scalar evaluation, estimating the probability of the current player winning from position s. This neural network combines the roles of both policy network and value network into a single architecture. The neural network consists of many residual blocks of convolutional layers with batch normalization and rectifier nonlinearities (see Methods).

So if I'm interpreting that correctly, the NN is used both for position evaluation (the value head) and to guide the search itself (the policy head's move probabilities).
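
In code terms, a rough sketch of that interface as I read it (the names here are mine, not the paper's):

    // One forward pass returns both heads: move priors, which bias the
    // tree search toward promising branches, and a scalar value, used to
    // evaluate leaf positions in place of rollouts.
    const legalMoves = position => ['a', 'b', 'c'];  // placeholder move generator
    function evaluate(position) {
        // Stand-in for (p, v) = f_theta(s): uniform priors for illustration.
        const moves = legalMoves(position);
        const priors = new Map(moves.map(m => [m, 1 / moves.length]));
        return { priors, value: 0.0 };  // value ~ estimated P(current player wins)
    }
    // During search, the prior enters move selection via a PUCT-style term,
    // roughly Q(s,a) + c * P(a|s) * sqrt(N(s)) / (1 + N(s,a)).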
