Some background for reasoning about dual-use alignment research
This is pretty basic. But I still made a bunch of mistakes when writing this, so maybe it's worth writing. This is background to a specific case I'll put in the next post.

It's like a tech tree

If we're looking at the big picture, then whether some piece of research is net positive or net negative isn't an inherent property of that research; it depends on how that research is situated in the research ecosystem that will eventually develop superintelligent AI.

[Figure: A tech tree, with progress going left to right. Blue research is academic, green makes you money, red is a bad ending, yellow is a good ending. Stronger connections are more important prerequisites.]

Consider the toy game in the picture. We start at the left and can unlock technologies, with unlocks going faster the stronger our connections to prerequisites (there's a minimal code sketch of these mechanics at the end of this section). The red and yellow technologies in the picture are superintelligent AI - pretend that as soon as one of those technologies is unlocked, the hastiest fraction of AI researchers will immediately start building it. Your goal is for humanity to unlock a yellow technology before a red one.

This game would be trivial if everyone agreed with you. But there are many people doing research, and they have all kinds of motivations - some want as many nodes to be unlocked as possible (pure research - blue), some want to personally unlock a green node (profit - green), some want to unlock the nearest red or yellow node no matter which it is (blind haste - red), and some want the same thing as you (beneficial AI - yellow), but you have a hard time coordinating with them.

In this baseline tech tree game, it's pretty easy to play well. If you're strong, just take the shortest path to a yellow node that doesn't pass too close to any red nodes. If you're weak, identify where the dominant paradigm is likely to end up, and do research that differentially advantages yellow nodes in that future.

The tech tree is wrinkly

But of course there are lots of wrinkles not included in this toy game.
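For concreteness, here's a minimal sketch of the baseline game in Python. Everything below (TechNode, unlock_rate, the progress threshold) is my own illustration, not anything canonical - in particular, the rule that research speed scales with the fraction of prerequisite "support" already unlocked is just one reasonable reading of "unlocks going faster the stronger our connections to prerequisites."

```python
from dataclasses import dataclass, field

@dataclass
class TechNode:
    name: str
    color: str                                   # "blue", "green", "red", or "yellow"
    prereqs: dict = field(default_factory=dict)  # prereq name -> connection strength
    unlocked: bool = False
    progress: float = 0.0                        # accumulated research; unlocks at 1.0

def unlock_rate(node: TechNode, tree: dict) -> float:
    """Research goes faster the stronger the connections to
    already-unlocked prerequisites."""
    if not node.prereqs:
        return 1.0  # leftmost nodes start available
    total = sum(node.prereqs.values())
    met = sum(w for name, w in node.prereqs.items() if tree[name].unlocked)
    return met / total

def step(tree: dict, allocations: dict):
    """One turn: `allocations` maps node name -> research effort spent there.
    Returns "red" or "yellow" if the game ends this turn, else None."""
    for name, effort in allocations.items():
        node = tree[name]
        if node.unlocked:
            continue
        node.progress += effort * unlock_rate(node, tree)
        if node.progress >= 1.0:
            node.unlocked = True
            # As soon as a red or yellow node is unlocked, the hastiest
            # researchers start building it: the game is decided.
            if node.color in ("red", "yellow"):
                return node.color
    return None

# Example: a three-node chain ending in a yellow technology.
tree = {
    "ml": TechNode("ml", "blue"),
    "agents": TechNode("agents", "green", prereqs={"ml": 1.0}),
    "aligned_asi": TechNode("aligned_asi", "yellow",
                            prereqs={"agents": 0.7, "ml": 0.3}),
}
while (outcome := step(tree, {n: 0.5 for n in tree})) is None:
    pass
print(outcome)  # -> "yellow": the only terminal node in this tiny tree
```

In this framing, the different players are just different allocation policies over nodes: blue spreads effort widely, green routes toward green nodes, red pushes toward the nearest terminal node whatever its color, and yellow is solving a shortest-path problem that penalizes passing too close to red nodes.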