John Wentworth explains natural latents – a key mathematical concept in his approach to natural abstraction. Natural latents capture the "shared information" between different parts of a system in a provably optimal way. This post lays out the formal definitions and key theorems.

Jeremy Gillen
This post deserves to be remembered as a LessWrong classic.

  1. It directly tries to solve a difficult and important cluster of problems (whether it succeeds is yet to be seen).
  2. It uses a new diagrammatic method of manipulating sets of independence relations.
  3. It's a technical result! These feel like they're getting rarer on LessWrong and should be encouraged.

There are several problems that are fundamentally about attaching very different world models together and transferring information from one to the other.

  • Ontology identification involves taking a goal defined in an old ontology[1] and accurately translating it into a new ontology.
  • High-level models and low-level models need to interact in a bounded agent. I.e. learning a high-level fact should influence your knowledge about low-level facts and vice versa.
  • Value identification is the problem of translating values from a human to an AI. This is much like ontology identification, with the added difficulty that we don't get as much detailed access or control over the human world model.
  • Interpretability is about finding recognisable concepts and algorithms in trained neural networks.

In general, we can solve these problems using shared variables and shared sub-structures that are present in both models.

  • We can stitch together very different world models along shared variables. E.g. if you have two models of molecular dynamics, one faster and simpler than the other. You want to simulate in the fast one, then switch to the slow one when particular interactions happen. To transfer the state from one to the other you identify variables present in both models (probably atom locations, velocities, some others), then just copy these values to the other model. Under-specified variables must be inferred from priors.
  • If you want to transfer a new concept from WM1 to a less knowledgeable WM2, you can do so by identifying the lower-level concepts that both WMs share, then constructing an "expla
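As a toy illustration of the stitching recipe in the molecular-dynamics bullet above, here is a minimal Python sketch. Everything in it (the `transfer_state` function, the `PRIOR_SAMPLERS` dict, the specific variable names) is a hypothetical example of mine, not anything from the post or comment: shared variables are copied across directly, and variables the source model never represents are sampled from a prior.

```python
# Minimal sketch: stitching two world models along shared variables.
import random

# State of the fast, coarse model: only positions and velocities.
fast_state = {"positions": [(0.0, 0.0, 0.0), (1.2, 0.3, -0.5)],
              "velocities": [(0.1, 0.0, 0.0), (0.0, -0.2, 0.0)]}

# The slow, detailed model also tracks per-atom charges,
# which the fast model never represents.
SLOW_MODEL_VARIABLES = ["positions", "velocities", "charges"]

# Prior samplers for variables the fast model leaves unspecified (made up).
PRIOR_SAMPLERS = {"charges": lambda n: [random.gauss(0.0, 0.1) for _ in range(n)]}

def transfer_state(source_state, target_variables, prior_samplers, n_atoms):
    """Copy shared variables; sample under-specified ones from priors."""
    target_state = {}
    for var in target_variables:
        if var in source_state:
            target_state[var] = source_state[var]             # shared: copy directly
        else:
            target_state[var] = prior_samplers[var](n_atoms)  # missing: infer from prior
    return target_state

slow_state = transfer_state(fast_state, SLOW_MODEL_VARIABLES, PRIOR_SAMPLERS, n_atoms=2)
print(slow_state["charges"])  # filled in from the prior, not copied
```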


Understanding deep learning isn’t a leaderboard sport - handle with care.

Saliency maps, neuron dissection, sparse autoencoders - each surged on hype, then stalled[1] when follow‑up work showed the insight was mostly noise, easily spoofed, or valid only in cherry‑picked settings. That risks being negative progress: we spend cycles debunking ghosts instead of building cumulative understanding.

The root mismatch is methodological. Mainstream ML capabilities research enjoys a scientific luxury almost no other field gets: public, quantitative benchmarks that tie effort to ground truth. ImageNet accuracy, MMLU, SWE‑bench - one number silently kills bad ideas. With that safety net, you can iterate fast on weak statistics and still converge on something useful. Mechanistic interpretability has no scoreboard for “the network’s internals now make sense.” Implicitly inheriting benchmark‑reliant habits from mainstream ML therefore swaps a ruthless filter for a fog of self‑deception.

How easy is it to fool ourselves? Recall the “Could a neuroscientist understand a microprocessor?” study: standard neuroscience toolkits - ablation tests, tuning curves, dimensionality reduction - were applied to a 6502 chip whose ground truth is fully known. The analyses produced plausible‑looking stories that entirely missed how the processor works. Interpretability faces the same trap: shapely clusters or sharp heat‑maps can look profound until a stronger test dissolves them.

What methodological standard should replace the leaderboard? Reasonable researchers will disagree[2]. Borrowing from mature natural sciences like physics or neuroscience seems like a sensible default, but a proper discussion is beyond this note. The narrow claim is simpler:

Because no external benchmark will catch your mistakes, you must design your own guardrails. Methodology for understanding deep learning is an open problem, not a hand‑me‑down from capabilities work.

So, before shipping the next clever probe, pause and ask: Where could I be fooling myself, and what concrete test would reveal it? If you don’t have a clear answer, you may be sprinting without the safety net this methodology assumes - and that’s precisely when caution matters most.

  1. ^

    This is perhaps a bit harsh - I think SAEs for instance still might hold some promise, and neuron-based analysis still has its place, but I think it's fair to say the hype got quite ahead of itself.

  2. ^

    Exactly how much methodological caution is warranted here will obviously be a point of contention. Everyone thinks the people going faster than them are reckless and the people going slower are needlessly worried. My point here is just to think actively about the question - don't just blindly inherit standards from ML.

plex

@Daniel Kokotajlo I think AI 2027 strongly underestimates current research speed-ups from AI. It expects the research speed-up is currently ~1.13x. I expect the true number is more likely around 2x, potentially higher.

Points of evidence:

  1. I've talked to someone at a leading lab who concluded that AI getting good enough to seriously aid research engineering is the obvious interpretation of the transition to a faster doubling time on the METR benchmark. I claim advance prediction credit for new datapoints not returning to 7 months, and instead holding out at 4 months. They also expect more phase transitions to faster doubling times; I agree and stake some epistemic credit on this (unsure when exactly, but >50% on this year moving to a faster exponential).
  2. I've spoken to a skilled researcher originally from physics who claims dramatically higher current research throughput. Often 2x-10x, and many projects that she'd just not take on if she had to do everything manually.
  3. The leader of an 80-person engineering company, which employs the two best devs I've worked with, recently told me that for well-specified tasks the latest models are now better than their top devs. He said engineering is no longer a bottleneck.
  4. Regularly hanging around on channels with devs who comment on the latest models, and getting the vibes of how much it seems to be speeding people up.

If correct, this propagates through the model to much shorter timelines.
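To see how much the doubling time from point 1 matters once it propagates through, here is a toy back-of-the-envelope sketch. This is not AI 2027's actual model; the 1-hour starting horizon and the ~1-month target are made-up round numbers for illustration only.

```python
# Toy back-of-the-envelope: how the METR doubling time changes the
# extrapolated arrival of a "1-month" task horizon.
# Assumptions (illustration only): start at a 1-hour horizon,
# target ~167 work-hours (~1 month), compare 7- vs 4-month doubling.
import math

start_horizon_hours = 1.0
target_horizon_hours = 167.0

for doubling_months in (7, 4):
    doublings_needed = math.log2(target_horizon_hours / start_horizon_hours)
    years = doublings_needed * doubling_months / 12
    print(f"{doubling_months}-month doubling: ~{years:.1f} years to a 1-month horizon")
# 7-month doubling: ~4.3 years; 4-month doubling: ~2.5 years.
```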

Please do an epistemic spot check on these numbers by talking to representative people in ways that would turn up evidence about current speed-ups.[1]

Edit: Eli said he'd be enthusiastic for someone else to get fresh data; I'm going to take a shot at this.

(also @Scott Alexander @Thomas Larsen @elifland @romeo @Jonas V)

  1. ^

    You might so far have mostly been talking to the very best researchers who are probably getting (or at least claiming, obvious reason for tinted glasses here) smaller speed-ups?


Rediscovering some math.

[I actually wrote this in my personal notes years ago. Seemed like a good fit for quick takes.]

I just rediscovered something in math, and the way it came out to me felt really funny.

I was thinking about startup incubators, and thinking about how it can be worth it to make a bet on a company that you think has only a one in ten chance of success, especially if you can incubate, y'know, ten such companies.

And of course, you're not guaranteed success if you incubate ten companies, in the same way that you can flip a coin twice and have it come up tails both times. The expected value is one, but the probability of at least one success is not one.

So what is it? More specifically, if you consider ten such 1-in-10 events, do you think you're more or less likely to have at least one of them succeed? It's not intuitively obvious which way that should go.

Well, if they're independent events, then the probability of all of them failing is 0.9^10, or $(1 - \frac{1}{10})^{10} \approx 0.35$.

And therefore the probability of at least one succeeding is $1 - 0.35 = 0.65$. More likely than not! That's great. But not hugely more likely than not.

(As a side note, how many events do you need before you're more likely than not to have one success? It turns out the answer is 7. At seven 1-in-10 events, the probability that at least one succeeds is 0.52, and at 6 events, it's 0.47.)
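(If you want to check that for yourself, here is a minimal sketch of the same calculation:)

```python
# With n independent 1-in-10 events, P(at least one success) = 1 - 0.9**n.
for n in range(1, 11):
    print(n, round(1 - 0.9 ** n, 2))
# n=6 gives 0.47, n=7 gives 0.52 -- seven events is the break-even point.
```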

So then I thought, it's kind of weird that that's not intuitive. Let's see if I can make it intuitive by stretching the quantities way up and down — that's a strategy that often works. Let's say I have a 1-in-a-million event instead, and I do it a million times. Then what is the probability that I'll have had at least one success? Is it basically 0 or basically 1?

...surprisingly, my intuition still wasn't sure! I would think, it can't be too close to 0, because we've rolled these dice so many times that surely they came up as a success once! But that intuition doesn't work, because we've exactly calibrated the dice so that the number of rolls is the same as the unlikelihood of success. So it feels like the probability also can't be too close to 1.

So then I just actually typed this into a calculator. It's the same equation as before, but with a million instead of ten. I added more and more zeros, and then what I saw was that the number just converges to somewhere in the middle.
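Here is that calculator experiment as a minimal sketch, using nothing but the formula from before with bigger and bigger n:

```python
# P(at least one success) = 1 - (1 - 1/n)**n, for a 1-in-n event tried n times.
for n in (10, 1_000, 1_000_000, 1_000_000_000):
    print(n, 1 - (1 - 1 / n) ** n)
# 0.651..., 0.632..., 0.632..., 0.632... -- it settles in the middle,
# not at 0 and not at 1.
```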

If it was the 1300s then this would have felt like some kind of discovery. But by this point, I had realized what I was doing, and felt pretty silly. Let's drop the "$1 -$", and look at this limit;

$$\lim_{n\to\infty}\left(1-\frac{1}{n}\right)^n$$

If this rings any bells, then it may be because you've seen this limit before;

$$\lim_{n\to\infty}\left(1+\frac{1}{n}\right)^n = e$$

or perhaps as

$$e^x = \lim_{n\to\infty}\left(1+\frac{x}{n}\right)^n$$

The probability I was looking for was $1 - \frac{1}{e}$, or about 0.632.

I think it's really cool that my intuition somehow knew to be confused here! And to me this path of discovery was way more intuitive than just seeing the standard definition, or wondering about functions that are their own derivatives. I also think it's cool that this path made $e$ pop out on its own, since I almost always think of e in the context of an exponential function, rather than as a constant. It also makes me wonder if 1/e is more fundamental than $e$. (Similar to how $\tau$ is more fundamental than $\pi$.)


Has anyone here had therapy to help handle thoughts of AI doom? How did it go? What challenges did you face explaining it or being taken seriously, and what kind of therapy worked, if any? 

I went to a therapist for 2 sessions and received nothing but blank looks when I tried to explain what I was trying to process. I think it was very unfamiliar ground for them and they didn't know what to do with me. I'd like to try again, but if anyone here has guidance on what worked for them, I'd be interested.

I've also started basic meditation, which continues to be a little helpful.


Quick take titles should end in a period.

Quick takes (previously known as short forms) are often viewed via preview on the front page. This preview removes formatting and newlines for space reasons. So, if your title doesn't end in a period, and especially if capitalization doesn't clearly denote a sentence boundary (like in this case, where the first sentence starts with "I"), then it might be confusing.


Popular Comments

I think it's a good direction to move in. But I usually don't think of it as "trying to become a wizard" or some kind of self-improvement. When I do something, it's because I'm interested in the thing. Like making a video game because I had an idea for it, or reading an economics textbook because I was curious about economic questions. The challenge is maintaining a steady flow of such projects; I've found that in "steady state" I do about one per year, which isn't a lot. So maybe ambition would actually help? Idk.
Author here. When constructing this paper, we needed an interpretable metric (time horizon), but this is not very data-efficient. We basically made the fewest tasks we could to get acceptable error bars, because high-quality task creation from scratch is very expensive. (We already spent over $150k baselining tasks, and more on the bounty and baselining infra.) Therefore we should expect that restricting to only 32 of the 170 tasks in the paper makes the error bars much wider; it roughly increases the error bars by a factor of sqrt(170/32) = 2.3.

Now if these tasks were log-uniformly distributed from 1 second to 16 hours, we would still be able to compare these results to the rest of our dataset, but it's worse than that: all fully_private tasks are too difficult for models before GPT-4, so removing SWAA requires restricting the x axis to the 2 years since GPT-4. This is 1/3 of our x axis range, so it makes error bars on the trendline slope 3 times larger. Combined with the previous effect the error bars become 6.9 times larger than the main paper! So the analysis in this post, although it uses higher-quality data, basically throws away all the statistical power of the paper.

I wish I had 170 high-quality private tasks but there was simply not enough time to construct them. Likewise, I wish we had baselines for all tasks, but sadly we will probably move to a different safety case framework before we get them. Though SWAA tasks have their own problems (eg many of them were written by Claude), they were developed in-house and are private, and it's unlikely that GPT-2 and GPT-3 would have a horizon like 10x different on a different benchmark. So for this check we really should not exclude them.

privacy_level       num_tasks
fully_private       32
public_problem      22
easy_to_memorize    18
public_solution     17
semi_private        2

When we include SWAA, the story is not too different from the original paper: doubling roughly 8 months. Note how large the error bars are. With only fully_private tasks and SWAA and RE-Bench, the extrapolated date of 1-month AI looks similar to the main paper, but the error bars are much larger; this is entirely driven by the small number of tasks (green bar). For comparison, the main paper extrapolation.

When I exclude SWAA and RE-Bench too, the script refuses to even run, because sometimes the time horizon slope is negative, but when I suppress those errors and take the 2024-2025 trend we get 80% CI of early 2026 to mid-2047! This is consistent with the main result but pretty uninformative. You can see that with only fully_private tasks (ignore the tasks under 1 minute; these are SWAA), it's hard to even tell whether longer tasks are more difficult in the <1 hour range (tiles plot). As such we should be suspicious of whether the time horizon metric works at all with so few tasks.
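For concreteness, here is the error-bar arithmetic from the comment above as a minimal sketch; the sqrt(170/32) task-count factor and the factor of 3 from the shortened x-axis are taken directly from the comment, and the rest is just multiplication.

```python
# Rough error-bar inflation from restricting the task set, per the comment above.
import math

task_count_factor = math.sqrt(170 / 32)  # fewer tasks -> error bars ~sqrt(170/32) wider
x_axis_factor = 3                        # x-axis cut to ~1/3 -> slope error bars ~3x wider
print(round(task_count_factor, 1))                   # ~2.3
print(round(task_count_factor * x_axis_factor, 1))   # ~6.9x wider than the main paper
```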
I'm curious if your team has any thoughts on my post Some Thoughts on Metaphilosophy, which was in large part inspired by the Debate paper, and also seems relevant to "Good human input" here. Specifically, I'm worried about this kind of system driving the simulated humans out of distribution, either gradually or suddenly, accidentally or intentionally. And distribution shift could cause problems either with the simulation (presumably similar to or based on LLMs instead of low-level neuron-by-neuron simulation), or with the human(s) themselves. In my post, I talked about how philosophy seems to be a general way for humans to handle OOD inputs, but tends to be very slow and may be hard for ML to learn (or needs extra care to implement correctly). I wonder if you agree with this line of thought, or have some other ideas/plans to deal with this problem. Aside from the narrow focus on "good human input" in this particular system, I'm worried about social/technological change being accelerated by AI faster than humans can handle it (due to similar OOD / slowness of philosophy concerns), and wonder if you have any thoughts on this more general issue.

Recent Discussion

I'm not sure what the exact process was, tbh my guess is that they were estimated mostly independently but likely sanity checked with the survey to some extent in mind. It seems like they line up about right, given the 2022 vs. 2023 difference, the intuition regarding underadjusting for labor->progress, and giving weight to our own views as well rather than just the survey, given that we've thought more about this than survey takers (while of course they have the advantage of currently doing frontier AI research).

I'd make less of an adjustment if we ask... (read more)

elifland
Yup, seems good
plex
Okay, switched. I'm curious about why you didn't set the baseline to "no AI help", especially if you expect pre-2024 AI to be mostly useless, as that seems like a cleaner comparison than asking people to remember how good old AIs were?
Cole Wyeth
You seem to be referring to comments from the CEO that more than 25% of code at Google is written by AI (and reviewed by humans). I’m not sure how reliable this number is, and it remains to be seen whether this is sustainable. It also doesn’t seem like a vast productivity boost (though it would be pretty significant, probably more than I expect, so would update me). 

We are having another rationalist Shabbat event at Rainbow Star House this Friday. The plan going forward will be to do one most Fridays. Email or DM me for the address if you haven’t been before.

We have pita, dips, and dessert planned, but are still looking for someone to bring a big pot of food this week-- if you’re able to bring a vegan chili or similar, please let me know.

Doors open at 5:30, ritual and food at 6:30.

What is this event?

At rationalist Shabbat each week, we light candles, sing Landsailor, eat together, and discuss topics of interest and relevance to the rationalist crowd. If you have suggestions for topics, would like to help contribute food, or otherwise assist with organizing, let us know.

This is a kid-friendly event -- we have young kids, so we have space and toys for them to play and hang out while the adults are chatting.
 

Update: this week we'll be singing Ballad of Smallpox Gone in honor of Smallpox Eradication Day yesterday. Also: please do RSVP if you haven't yet.

JohnofCharleston
Brought my car to the office, hopefully there by 6. 
Archimedes
Limiting China's computing power via export controls on hardware like GPUs might be accelerating global progress in AI capabilities. When Chinese labs are compute-starved, their research will differentially focus on efficiency gains compared to counterfactual universes where they are less limited. So far, they've been publishing their research, and their tricks can quickly be incorporated by anyone else. US players can leverage their compute power, focusing on experiments and scaling while effectively delegating research topics that China is motivated to handle. Google and OpenAI benefit far more from DeepSeek than they do from Meta.
ryan_greenblatt
One of the main drivers, perhaps the main driver[1], of algorithmic progress is compute for experiments. It seems unlikely that the effect you note could compensate for the reduced pace of capabilities progress.

  1. ^

    Both labor and compute have been scaled up over the last several years at big AI companies. My understanding is the scaling in compute was more important for algorithmic progress as it is hard to parallelize labor, the marginal employee is somewhat worse, the number of employees has been growing slower than compute, and the returns to compute vs faster serial labor seem similar at current margins. That's not to say employees don't matter, I'd guess Meta is substantially held back by worse employees (and maybe worse management).

Both labor and compute have been scaled up over the last several years at big AI companies. My understanding is the scaling in compute was more important for algorithmic progress

That may be the case, but I suppose that in the last several years, compute has been scaled up more than labor. (Labor cost is entirely recurring, while compute cost is a one-time cost plus a recurring electricity cost, and progress in compute hardware, from smaller integrated circuits, means that compute cost is decreasing over time.) Then obviously that doesn't necessari... (read more)


John: So there’s this thing about interp, where most of it seems to not be handling one of the standard fundamental difficulties of representation, and we want to articulate that in a way which will make sense to interp researchers (as opposed to philosophers). I guess to start… Steve, wanna give a standard canonical example of the misrepresentation problem?

Steve: Ok so I guess the “standard” story as I interpret it goes something like this:

  • In order to respond to dualists who thought the mind was inherently non-physical, materialist philosophers wanted a “naturalistic” account of mental representation - where “mental representation” basically means the same thing as “content” or “semantics”. We tend to use the term intentionality for mental representation. This is a technical term that’s not the same as
...

Counterpoint: https://www.lesswrong.com/s/gEvTvhr8hNRrdHC62

Joseph Bloom
I think this is a valuable read for people who work in interp, but I want to add a few ideas:

  • Distinguishing Misrepresentation from Mismeasurement: Interpretability researchers use techniques that find vectors which we say correspond to the representations of the model, but the methods we use to find those may be imperfect. For example, if your cat SAE feature also lights up on racoons, then maybe this is a true property of the model's cat detector (that it also lights up on racoons) or maybe this is an artefact of the SAE loss function. Maybe the true cat detector doesn't get fooled by racoons, but your SAE latent is biased in some way. See this paper that I supervised for more concrete observations.
  • What are the canonical units? It may be that there is a real sense in which the model has a cat detector, but maybe at the layer at which you tried to detect it, the cat detector is imperfect. If the model doesn't function as if it has an imperfect cat detector, then maybe downstream of the cat-detector is some circuitry for catching/correcting specific errors. This means that the local cat detector you've found, which might have misrepresentation issues, isn't in itself sufficient to argue that the model as a whole has those issues. Selection pressures apply to the network as a whole and not necessarily always to the components. The fact that we see so much modularity is probably not random (John's written about this) but if I'm not mistaken, we don't have strong reasons to believe that the thing that looks like a cat detector must be the model's one true cat detector.

I'd be excited for some empirical work following up on this. One idea might be to train toy models which are incentivised to contain imperfect detectors (e.g. there is a noisy signal but reward is optimised by having a bias toward recall or precision in some of the intermediate inferences). Identifying intermediate representations in such models could be interesting.

It is important to remember that although languages and brains evolved alongside each other, these are separate systems.

The human brain has evolved to be a fast learner of whatever common language it is exposed to. And languages themselves have evolved to be as accessible as possible to new speakers.

One may say that language summarizes the shared experience of its speakers. Let me play with an analogy to arithmetic:

There is an infinite set of numbers that have several kinds of properties (negative or positive, rational or irrational) and several kinds of relations between them (summation, product, etc.). To describe them all, you don't need an infinitely large database. You can describe the entire model with several axioms and theorems. Furthermore, you can build upon it, inventing new concepts...


For months, I had the feeling: something is wrong. Some core part of myself had gone missing.

I had words and ideas cached, which pointed back to the missing part.

There was the story of Benjamin Jesty, a dairy farmer who vaccinated his family against smallpox in 1774 - 20 years before the vaccination technique was popularized, and the same year King Louis XV of France died of the disease.

There was another old post which declared “I don’t care that much about giant yachts. I want a cure for aging. I want weekend trips to the moon. I want flying cars and an indestructible body and tiny genetically-engineered dragons.”.

There was a cached instinct to look at certain kinds of social incentive gradient, toward managing more people or growing an organization or playing...

What's your favorite times you've used CAD/CNC or 3D printing? Or what's your most likely place to make use of it?

amitlevy49
I assume the idea is that bupropion is good at giving you the natural drive to do the kind of projects he describes?
amitlevy49
The framing of science and engineering as isomorphic to wizard power immediately reminds me of the anime Dr. Stone, if you haven't watched it I think you may enjoy it, at least as a piece of media making the same type of point you are making.
Gurkenglas
Yeah, the underling part was a joke :D
This is a linkpost for https://arxiv.org/abs/2505.03989

This post presents a mildly edited form of a new paper by UK AISI's alignment team (the abstract, introduction and related work section are replaced with an executive summary). Read the full paper here.

Executive summary 

AI safety via debate is a promising method for solving part of the alignment problem for ASI (artificial superintelligence). 

TL;DR Debate + exploration guarantees + solution to obfuscated arguments + good human input solves outer alignment. Outer alignment + online training solves inner alignment to a sufficient extent in low-stakes contexts. 

This post sets out: 

  • What debate can be used to achieve.
  • What gaps remain.
  • What research is needed to solve them. 

These gaps form the basis for one of the research agendas of UK AISI’s new alignment team: we aim to dramatically scale up ASI-relevant research on debate. We’ll...

I broadly agree with these concerns. I think we can split it into (1) the general issue of AGI/ASI driving humans out of distribution and (2) the specific issue of how assumptions about human data quality as used in debate will break down. For (2), we'll have a short doc soon (next week or so) which is somewhat related, along the lines of "assume humans are right most of the time on a natural distribution, and search for protocols which report uncertainty if the distribution induced by a debate protocol on some new class of questions is sufficiently differ... (read more)

Marie_DB
I'm definitely also worried about collusion between the debaters to deceive the judge! That's what we try to address with the exploration guarantees in the sketch. The thinking is: If a debater is, say, deliberately not pointing out a flaw in an argument, then there's an alternative strategy that would get the debater higher reward on the episode (i.e. pointing out the flaw). So if we can verify that there wouldn't be significant gains from further exploration (ie trying out more alternative strategies), that's some evidence against this kind of collusion. But of course, we're only gesturing at some potential ways you might get exploration guarantees - we don't know yet if any of them will work. I'm also worried about collusion between the debaters and the judge, and we don't address this much in the sketch, though I think it could in principle be dealt with in the same way. I'm also imagining that the judge model would be much less capable (it only needs to be human-level in a narrower domain), which might mean it's incapable of exploration hacking.
Marie_DB
Interesting post! Could you say more about what you mean by "driving the simulated humans out of distribution"? Is it something like "during deployment, the simulated human judges might be asked to answer questions far outside the training distribution, and so they might fail to accurately simulate humans (or humans might be worse than on the training distribution)"?  The solution in the sketch is to keep the question distribution during deployment similar + doing online training during deployment (the simulated human judges could also be subject to online training). Is there a reason you think that won't work?
Marie_DB
Thanks for the comment, all sensible scepticisms IMO! I agree that the low-stakes context part of the sketch (key claim 4) is the weakest part, and especially we don't emphasise enough the defeater of "the deployment set-up just won't be upheld" (because of persuasion, as you mention, but also e.g. the agent messing with the online learning process in the offline datacentre). We spent less time on it because we ultimately want to (try to) expand to high-stakes contexts, which will look pretty different, so this was more of a stop-gap rough picture so we could focus on getting the rest right. That said, I'm maybe more optimistic than you that there'll be a relevant period where the above issues can be sufficiently dealt with via control and debate is pretty important for catching subtler research sabotage. On debaters converging to honesty rather than subtle manipulation: I'm also pretty unsure if this will work and keen to see how it plays out empirically once we get LLMs that are a bit better at debate. I do think recursive debate makes it more likely that honesty is a winning strategy (relative to human-style debates) because debaters can lose on a single inconsistency or manipulative argumentative strategy, rather than being able to bury it among lots of claims (see also my reply to Charlie Steiner below).

Each brick = one uncertainty

TL;DR

Maybe instead of writing task lists, reframe macro objectives in terms of nested questions until you reach 'root' testable experiments.
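As a toy illustration of the nesting idea, here is a minimal sketch; the `QuestionNode` class and the example questions are my own hypothetical stand-ins, not the Thought-Tree tool itself.

```python
# Toy sketch: a macro objective broken into nested questions,
# bottoming out in "root" questions that are directly testable.
from dataclasses import dataclass, field

@dataclass
class QuestionNode:
    question: str
    testable: bool = False          # True if a concrete experiment can settle it
    children: list["QuestionNode"] = field(default_factory=list)

goal = QuestionNode("Can my method X improve benchmark Y?", children=[
    QuestionNode("Does X run end-to-end on a small subset of Y?", testable=True),
    QuestionNode("Which component of X drives any gains?", children=[
        QuestionNode("Does ablating component A change the score?", testable=True),
        QuestionNode("Does ablating component B change the score?", testable=True),
    ]),
])

def leaves_to_test(node):
    """Walk the tree and collect the testable 'root' questions."""
    if node.testable:
        yield node.question
    for child in node.children:
        yield from leaves_to_test(child)

for q in leaves_to_test(goal):
    print("-", q)
```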

Putting this into practice

I built a tool for myself, ‘Thought-Tree’ here, to try and systematise what I wrote in this post. Maybe it works out for you as well?

Essays I am Thinking About, and that Inspired this Post

Related essays by me

What I am planning to read

...