Just got around to reading this, and I found it helpful. Thank you.
This was a cool post! I was familiar with f-divergences as a generalization of KL divergence, and of course familiar with maxent methods, but I hadn't seen the two put together before.
Problem: I'm left unsure when or why I would ever want to use this machinery. Sufficient statistics are an intuitive concept, and I can look around at the world and make guesses about where I would want to use sufficient statistics to model things. Independence is also an intuitive concept, and I can look around the world and make guesses about where to use that. Combining those two, I can notice places in the world where (some version of) KPD should bind, and if it doesn't bind then I'm surprised.
But I don't see how to notice places where max-f-divergence distributions should bind to reality. I can see where sufficient statistics should exist in the world, but that's not a sufficient condition for a max-f-divergence distribution; there are (IIUC) lots of other distributions for which sufficient statistics exist. So why focus on the max-f-divergence class specifically? What intuitive property of (some) real-world systems nails down that class of distributions? Maybe some kind of convexity condition on the distribution or on updates or something?
"A practical roadblock is that the above numerical results for inference are terribly slow to compute..."
Not sure exactly what you're doing numerically, but here's how I usually handle vanilla maxent problems (in my usual notation, which is not quite the same as yours; apologies for putting marginal translation work on you). We start with
$$\max_{P[X]} H(X) \quad \text{subject to} \quad E[f_j(X)] = \mu_j \text{ for each } j$$
Transform that to
$$\max_{P[X]} H(X) \quad \text{subject to} \quad E[f_j(X) - \mu_j] = 0 \text{ for each } j$$
This gives a dual problem for the partition function, which I'll write in full:
$$\min_{\lambda} \; \ln \sum_x \exp\Big(\sum_j \lambda_j \, (f_j(x) - \mu_j)\Big)$$
That's an unconstrained optimization problem, it's convex in the happy direction, and standard optimizers (e.g. scipy using jax for derivatives) can usually handle it very quickly.
I would guess that process generalizes just fine to other f-divergences, since it's basically just relying on convex optimization tricks. And that should yield quite fast numerics.
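Concretely, here's a minimal sketch of that dual solve, assuming a small discrete state space; the two moment constraints (targets E[X] = 3 and E[X^2] = 12 on support 0..9) are made-up placeholders, to be swapped for whatever constraints you actually have:

```python
import jax
import jax.numpy as jnp
import numpy as np
from scipy.optimize import minimize

# Hypothetical setup: X ranges over 0..9, with two moment constraints.
xs = jnp.arange(10.0)
f = jnp.stack([xs, xs**2])        # feature functions f_j(x), shape (2, 10)
mu = jnp.array([3.0, 12.0])       # target expectations E[f_j(X)]

def dual(lam):
    # ln sum_x exp( sum_j lam_j * (f_j(x) - mu_j) ) -- convex in lam
    return jax.scipy.special.logsumexp(lam @ (f - mu[:, None]))

dual_grad = jax.grad(dual)
res = minimize(lambda l: float(dual(jnp.asarray(l))),
               x0=np.zeros(2),
               jac=lambda l: np.asarray(dual_grad(jnp.asarray(l))),
               method="BFGS")

# Recover the maxent distribution from the optimal multipliers.
lam = jnp.asarray(res.x)
logits = lam @ f
p = jnp.exp(logits - jax.scipy.special.logsumexp(logits))
print(p)       # the maxent distribution
print(f @ p)   # should reproduce mu at the optimum
```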
A barrier for : suppose X and Y are both bitstrings of length 2k. The first k bits of the two strings are equal (i.e. X[:k] == Y[:k] in python notation); the rest are independent. Otherwise, all bits are maxentropic (i.e. IID 50/50 coinflips).
Then there's an exact (deterministic) natural latent: $\Lambda := X[:k] = Y[:k]$. But I(X; Y), H(X|Y), and H(Y|X) are all much larger than zero; each is k bits.
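A quick brute-force check of those numbers for small k (just my own verification sketch; the construction is exactly the one above):

```python
from itertools import product
from collections import defaultdict
import math

k = 2  # small k so we can enumerate the joint distribution exactly

def H(dist):
    # Shannon entropy in bits of a dict {outcome: probability}.
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

strings = list(product([0, 1], repeat=k))
joint = defaultdict(float)
# X = shared + a, Y = shared + b, with shared, a, b IID uniform k-bit strings.
for shared in strings:
    for a in strings:
        for b in strings:
            joint[(shared + a, shared + b)] += 1 / len(strings) ** 3

px, py = defaultdict(float), defaultdict(float)
for (x, y), p in joint.items():
    px[x] += p
    py[y] += p

Hx, Hy, Hxy = H(px), H(py), H(joint)
print("I(X;Y)  =", Hx + Hy - Hxy)   # = k
print("H(X|Y)  =", Hxy - Hy)        # = k
print("H(Y|X)  =", Hxy - Hx)        # = k
```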
Maybe twice a year I go looking for this comment and can't find it, so I'm copying it into shortform:
Oh, I can just give you a class of nontrivial predictions of expected utility theory. I have not seen any empirical results on whether these actually hold, so consider them advance predictions.
So, a bacterium needs a handful of different metabolic resources - most obviously energy (i.e. ATP), but also amino acids, membrane lipids, etc. And often bacteria can produce some metabolic resources via multiple different paths, including cyclical paths - e.g. it's useful to be able to turn A into B but also B into A, because sometimes the environment will have lots of B and other times it will have lots of A.

Now, there's the obvious prediction that the bacterium won't waste energy turning B into A and then back into B again - i.e. it will suppress one of those two pathways (assuming the cycle is energy-burning), depending on which metabolite is more abundant. Utility generalizes this idea to arbitrarily many reactions and products, and predicts that at any given time we can assign some (non-unique) "values" to each metabolite (including energy carriers), such that any reaction whose reactants have more total "value" than its products is suppressed (or at least not catalyzed; the cell doesn't really have good ways to suppress spontaneous reactions other than putting things in separate compartments).
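To make the prediction concrete, here's a toy version of the test it implies, with entirely made-up stoichiometry: given the set of reactions a cell is observed to actively catalyze, a linear program asks whether any consistent "value" assignment exists.

```python
import numpy as np
from scipy.optimize import linprog

# Columns are metabolites (A, B, ATP); rows are the reactions the cell is
# observed to actively catalyze, as net stoichiometry (products positive,
# reactants negative). "A + ATP -> B" is the row [-1, +1, -1].
active = np.array([
    [-1.0,  1.0, -1.0],   # A + ATP -> B
    [ 2.0, -1.0,  0.0],   # B -> 2A
])

# The prediction: there exist values v (one per metabolite) such that every
# active reaction weakly increases total value, i.e. active @ v >= 0.
# The bound v >= 1 just pins down the (arbitrary) overall scale.
n_metabolites = active.shape[1]
res = linprog(c=np.zeros(n_metabolites),
              A_ub=-active,                 # encodes active @ v >= 0
              b_ub=np.zeros(active.shape[0]),
              bounds=[(1, None)] * n_metabolites,
              method="highs")
print("consistent values exist:", res.success)
print("example values (A, B, ATP):", res.x)
```

An infeasible result would be the interesting case - as discussed below, it would point at the cell doing something useful which the model isn't capturing.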
Of course in practice this will be an approximation, and there may be occasional exceptions where the cell is doing something the model doesn't capture. If we were to do this sort of analysis in a signalling network rather than a metabolic network, for instance, there would likely be many exceptions: cells sometimes burn energy to maintain a concentration at a specific level, or to respond quickly to changes, and this particular model doesn't capture the "value" of information-content in signals; we'd have to extend our value-function in order for the utility framework to capture that. But for metabolic networks, I expect that to mostly not be an issue.
That's really just utility theory; expected utility theory would involve an organism storing some resources over time (like e.g. fat). Then we'd expect to be able to assign "values" such that the relative "value" assigned to stored resources which are not currently used is a weighted sum of the "values" assigned to those resources in different possible future environments (of the sort the organism might find itself in after something like its current environment, in the ancestral world), and the weights in the sums should be consistent. (This is a less-fleshed-out prediction than the other one, but hopefully it's enough of a sketch to give the idea.)
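A short formalization of that prediction, in my own (non-canonical) notation: letting $v_e(r)$ be the "value" assigned to resource $r$ in possible future environment $e$, the claim is that there exist weights $w_e \geq 0$ such that

$$v_{\text{stored}}(r) = \sum_e w_e \, v_e(r) \quad \text{for every stored resource } r.$$

The consistency condition is that the same weights $w_e$ work for every stored resource $r$; that's what makes it a nontrivial prediction rather than a definition.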
Of course, if we understand expected utility theory deeply, then these predictions are quite trivial; they're just saying that organisms won't make Pareto-suboptimal use of their resources! It's one of those predictions where, if it's false, then we've probably discovered something interesting - most likely some place where an organism is spending resources to do something useful which we haven't understood yet. [EDIT-TO-ADD: This is itself intended as a falsifiable prediction - if we go look at an anomaly and don't find any unaccounted-for phenomenon, then that's a very big strike against expected utility theory.] And that's the really cool prediction here: it gives us a tool to uncover unknown-unknowns in our understanding of a cell's behavior.
I think one thing I didn't communicate in the post is that I don't necessarily intend to hypothesize deep nonconsent as a terminal preference. So, for instance,
"women are scared men will get angry if they go from "yes" to "no", in a way they won't if the woman goes from "----" to "no", so women delay being explicit until they have all the information"
sounds to me like one of many possible generators of deep nonconsent preference - i.e. it's directly explaining why women would typically have a deep-in-the-sense-of-appearing-in-lots-of-places preference for nonconsent behavior. It therefore sounds not-at-all at odds with the post, or at least what I had in mind when writing the post.
"Another I played with was e.g. "blame avoidance", i.e. something-like-ladybrain really wants any dating/sex to happen in a way which is "not her fault". That seems to mostly generate the same predictions."
Do you think it has some disadvantage, such that you didn't choose to mention it at all in the OP?
"Blame avoidance" seems like a candidate generator of deep nonconsent preference: if one never consents to anything that's going on, then one is not to blame for any of it (or so goes the story). There are other generators one could imagine as well - e.g. Elizabeth hypothesized elsethread 'women are scared men will get angry if they go from "yes" to "no", in a way they won't if the woman goes from "----" to "no", so women delay being explicit until they have all the information'. That's another hypothesis for what might generate deep nonconsent preference.
I settled on the term "deep nonconsent preference" because that seemed like the most direct description of the behavior-cluster, while assuming the least about what generates that behavior. I did not think (and still don't think) I had enough information to nail down a primary generator of the behavior.
Can you gesture at what kind of data would be helpful to bring in-frame?
So, there's this thing called Solomonoff induction. It works, provably, for anything Turing computable. And human social behavior is definitely Turing computable.
"If a theory claims to compactly generate any significant set of social dynamics, that's evidence against the theory" is an anti-inductive prior. It's like saying that things which have happened less often before are more likely in the future, and therefore the sun will certainly not rise tomorrow.
Look, I don't like dealing with the sort of stuff I called "deep nonconsent" in this post. Sure, I'm quite kinky in bed, but in the rest of the mating process? When someone who's interested won't send any goddamn signals, or sends negative signals while hoping that I pursue, it's just incredibly obnoxious. I strongly prefer to deal with women who actually send signals when interested, or better yet just ask me out. I want to date and fuck women who are, like, "on my team", not trying to make everything pointlessly difficult all the time.
And maybe that will change at some point. It's the sort of thing which sometimes seems less obnoxious as one understands it better. But man, for now, I sure prefer to just avoid women who do that shit.
Like, okay, let's put it this way - if it were to turn out to have been true the entire time, what other generator could produce this evidence, but would also produce evidence incompatible with this model? Or, in what way could "nonconsent" be missing the point about the generator? I'd sure like to see a slightly more ladybrain discussion, if that's available.
I totally agree that there are other possible generators which look very similar to "deep nonconsent preference". Another I played with was e.g. "blame avoidance", i.e. something-like-ladybrain really wants any dating/sex to happen in a way which is "not her fault". That seems to mostly generate the same predictions.
So yeah, I am totally ready to believe there's some other nearby generator, and if you have one which also better explains some additional things, then please state it - I want to know it. I have not found it on my own, and one of the main points of posting this stuff online is that sometimes people come along and tell me what I'm missing. That's what I want. If you have clean examples where the model in the post would produce incorrect interpretations of what's going on, I also want to hear those. What I don't want is people being like "this is problematic and missing important things" without actually saying a single thing that it's wrong about or presenting any alternative model.
Tutoring.
This answer was generated by considering what one can usefully hire other people to do full time, for oneself (or one's family) alone.