John Wentworth explains natural latents – a key mathematical concept in his approach to natural abstraction. Natural latents capture the "shared information" between different parts of a system in a provably optimal way. This post lays out the formal definitions and key theorems.

38Jeremy Gillen
This post deserves to be remembered as a LessWrong classic.

1. It directly tries to solve a difficult and important cluster of problems (whether it succeeds is yet to be seen).
2. It uses a new diagrammatic method of manipulating sets of independence relations.
3. It's a technical result! These feel like they're getting rarer on LessWrong and should be encouraged.

There are several problems that are fundamentally about attaching very different world models together and transferring information from one to the other.

  • Ontology identification involves taking a goal defined in an old ontology[1] and accurately translating it into a new ontology.
  • High-level models and low-level models need to interact in a bounded agent. I.e. learning a high-level fact should influence your knowledge about low-level facts and vice versa.
  • Value identification is the problem of translating values from a human to an AI. This is much like ontology identification, with the added difficulty that we don't get as much detailed access or control over the human world model.
  • Interpretability is about finding recognisable concepts and algorithms in trained neural networks.

In general, we can solve these problems using shared variables and shared sub-structures that are present in both models.

  • We can stitch together very different world models along shared variables. E.g. if you have two models of molecular dynamics, one faster and simpler than the other. You want to simulate in the fast one, then switch to the slow one when particular interactions happen. To transfer the state from one to the other you identify variables present in both models (probably atom locations, velocities, some others), then just copy these values to the other model. Under-specified variables must be inferred from priors.
  • If you want to transfer a new concept from WM1 to a less knowledgeable WM2, you can do so by identifying the lower-level concepts that both WMs share, then constructing an "expla
plex
@Daniel Kokotajlo I think AI 2027 strongly underestimates current research speed-ups from AI. It expects the research speed-up is currently ~1.13x. I expect the true number is more likely around 2x, potentially higher.

Points of evidence:

1. I've talked to someone at a leading lab who concluded that AI getting good enough to seriously aid research engineering is the obvious interpretation of the transition to a faster doubling time on the METR benchmark. I claim advance prediction credit for new datapoints not returning to 7 months, and instead holding out at 4 months. They also expect more phase transitions to faster doubling times; I agree and stake some epistemic credit on this (unsure when exactly, but >50% on this year moving to a faster exponential).
2. I've spoken to a skilled researcher originally from physics who claims dramatically higher current research throughput. Often 2x-10x, and many projects that she'd just not take on if she had to do everything manually.
3. The leader of an 80-person engineering company which has the two best devs I've worked with recently told me that for well-specified tasks, the latest models are now better than their top devs. He said engineering is no longer a bottleneck.
4. Regularly hanging around on channels with devs who comment on the latest models, and getting the vibes of how much it seems to be speeding people up.

If correct, this propagates through the model to much shorter timelines. Please do an epistemic spot check on these numbers by talking to representative people in ways that would turn up evidence about current speed-ups.[1]

Edit: Eli said he'd be enthusiastic for someone else to get fresh data; I'm going to take a shot at this. (also @Scott Alexander @Thomas Larsen @elifland @romeo @Jonas V)

1. ^ You might so far have mostly been talking to the very best researchers, who are probably getting (or at least claiming, obvious reason for tinted glasses here) smaller speed-ups?
Understanding deep learning isn’t a leaderboard sport - handle with care.

Saliency maps, neuron dissection, sparse autoencoders - each surged on hype, then stalled[1] when follow‑up work showed the insight was mostly noise, easily spoofed, or valid only in cherry‑picked settings. That risks being negative progress: we spend cycles debunking ghosts instead of building cumulative understanding.

The root mismatch is methodological. Mainstream ML capabilities research enjoys a scientific luxury almost no other field gets: public, quantitative benchmarks that tie effort to ground truth. ImageNet accuracy, MMLU, SWE‑bench - one number silently kills bad ideas. With that safety net, you can iterate fast on weak statistics and still converge on something useful. Mechanistic interpretability has no scoreboard for “the network’s internals now make sense.” Implicitly inheriting benchmark‑reliant habits from mainstream ML therefore swaps a ruthless filter for a fog of self‑deception.

How easy is it to fool ourselves? Recall the “Could a neuroscientist understand a microprocessor?” study: standard neuroscience toolkits - ablation tests, tuning curves, dimensionality reduction - were applied to a 6502 chip whose ground truth is fully known. The analyses produced plausible‑looking stories that entirely missed how the processor works. Interpretability faces the same trap: shapely clusters or sharp heat‑maps can look profound until a stronger test dissolves them.

What methodological standard should replace the leaderboard? Reasonable researchers will disagree[2]. Borrowing from mature natural sciences like physics or neuroscience seems like a sensible default, but a proper discussion is beyond this note. The narrow claim is simpler:

So, before shipping the next clever probe, pause and ask: Where could I be fooling myself, and what concrete test would reveal it? If you don’t have a clear answer, you may be sprinting without the safety net this methodology assumes - and that’s pr
Rediscovering some math. [I actually wrote this in my personal notes years ago. Seemed like a good fit for quick takes.]

I just rediscovered something in math, and the way it came out to me felt really funny. I was thinking about startup incubators, and thinking about how it can be worth it to make a bet on a company that you think has only a one in ten chance of success, especially if you can incubate, y'know, ten such companies. And of course, you're not guaranteed success if you incubate ten companies, in the same way that you can flip a coin twice and have it come up tails both times. The expected value is one, but the probability of at least one success is not one. So what is it?

More specifically, if you consider ten such 1-in-10 events, do you think you're more or less likely to have at least one of them succeed? It's not intuitively obvious which way that should go.

Well, if they're independent events, then the probability of all of them failing is 0.9^10, or (1 − 1/10)^10 ≈ 0.35. And therefore the probability of at least one succeeding is 1 − 0.35 = 0.65. More likely than not! That's great. But not hugely more likely than not. (As a side note, how many events do you need before you're more likely than not to have one success? It turns out the answer is 7. At seven 1-in-10 events, the probability that at least one succeeds is 0.52, and at 6 events, it's 0.47.)

So then I thought, it's kind of weird that that's not intuitive. Let's see if I can make it intuitive by stretching the quantities way up and down — that's a strategy that often works. Let's say I have a 1-in-a-million event instead, and I do it a million times. Then what is the probability that I'll have had at least one success? Is it basically 0 or basically 1?

...surprisingly, my intuition still wasn't sure! I would think, it can't be too close to 0, because we've rolled these dice so many times that surely they came up as a success once! But that intuition doesn't work, because we've exactly cali
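A quick numerical check of the arithmetic above (a minimal sketch; the function name and the final 1-in-a-million line are my own illustration of the limit 1 − 1/e ≈ 0.63):

```python
# Probability of at least one success in n independent trials,
# each with success probability p.
def p_at_least_one(p: float, n: int) -> float:
    return 1 - (1 - p) ** n

print(p_at_least_one(0.1, 10))      # ~0.651: ten 1-in-10 bets
print(p_at_least_one(0.1, 6))       # ~0.469: still less likely than not
print(p_at_least_one(0.1, 7))       # ~0.522: crosses 50% at seven bets

# Stretching the quantities: a 1-in-a-million event tried a million times.
# The answer is neither ~0 nor ~1; it approaches 1 - 1/e ≈ 0.632.
print(p_at_least_one(1e-6, 10**6))  # ~0.632
```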
Quick take titles should end in a period. Quick takes (previously known as short forms) are often viewed via preview on the front page. This preview removes formatting and newlines for space reasons. So, if your title doesn't end in a period, and especially if capitalization doesn't clearly denote a sentence boundary (like in this case, where the first sentence starts with "I"), then it might be confusing.
Limiting China's computing power via export controls on hardware like GPUs might be accelerating global progress in AI capabilities. When Chinese labs are compute-starved, their research will differentially focus on efficiency gains compared to counterfactual universes where they are less limited. So far, they've been publishing their research, and their tricks can quickly be incorporated by anyone else. US players can leverage their compute power, focusing on experiments and scaling while effectively delegating research topics that China is motivated to handle. Google and OpenAI benefit far more from DeepSeek than they do from Meta.


Recent Discussion

This is a linkpost for https://arxiv.org/abs/2505.03989

This post presents a mildly edited form of a new paper by UK AISI's alignment team (the abstract, introduction and related work section are replaced with an executive summary). Read the full paper here.

Executive summary 

AI safety via debate is a promising method for solving part of the alignment problem for ASI (artificial superintelligence). 

TL;DR Debate + exploration guarantees + solution to obfuscated arguments + good human input solves outer alignment. Outer alignment + online training solves inner alignment to a sufficient extent in low-stakes contexts. 

This post sets out: 

  • What debate can be used to achieve.
  • What gaps remain.
  • What research is needed to solve them. 

These gaps form the basis for one of the research agendas of UK AISI’s new alignment team: we aim to dramatically scale up ASI-relevant research on debate. We’ll...

2Charlie Steiner
If we're talking about the domain where we can assume "good human input", why do we need a solution more complicated than direct human supervision/demonstration (perhaps amplified by reward models or models of human feedback)? I mean this non-rhetorically; I have my own opinion (that debate acts as an unprincipled way of inserting one round of optimization for meta-preferences [if confusing, see here]), but it's probably not yours.

Two reasons:

  1. Identifying vs. evaluating flaws: In debate, human judges get to see critiques made by a superhuman system - some of which they probably wouldn't have been able to identify themselves but can still evaluate (on the (IMO plausible) assumption that it's easier to evaluate than generate). That makes human judges more likely to give correct signal.
  2. Whole debate vs. single subclaim: In recursive debate, human judges evaluate only a single subclaim. The (superhuman) debaters do the work of picking out which subclaim they're most likely to win on, and
... (read more)
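As a toy illustration of point 2 (not from the paper; the class and function names here are hypothetical), the recursive structure looks roughly like this: the debaters repeatedly narrow the dispute to the subclaim they most want to contest, and the human judge only ever rules on a single leaf.

```python
from dataclasses import dataclass, field

@dataclass
class Claim:
    text: str
    subclaims: list["Claim"] = field(default_factory=list)

def recursive_debate(claim: Claim, pick_contested, judge_leaf) -> bool:
    """Toy sketch: resolve a debate about `claim` by recursing into contested subclaims.

    pick_contested: the (superhuman) debaters' choice of which subclaim to fight over.
    judge_leaf: the human judge's verdict on a single, small subclaim.
    """
    if not claim.subclaims:
        return judge_leaf(claim)
    return recursive_debate(pick_contested(claim), pick_contested, judge_leaf)
```

The point of the comment above is that the human signal only enters at `judge_leaf`, on a claim small enough to evaluate even if the judge couldn't have generated the critique themselves.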
1Raphael Roche
I agree. At the first level, the game involves two agents opposing each other in a debate, similar to a game of Go. This adversarial situation incentivizes each agent to reveal flaws in their opponent's arguments and present arguments that are least vulnerable to attack. Truth-telling may be encouraged in such a context.

However, this is only one part of the game. The base game is encapsulated within a meta-game where each player agent is rewarded by a third agent—the judge—who grants a reward to the player deemed most convincing. Yet sufficiently intelligent agents aware of their situation might understand that this meta-game actually constitutes a method implemented by a concealed fourth agent—the AI researcher—whose goal is to align and control all AI agents: both players and the judge.

The player agents or even the judge might conclude that there exists a hidden reward superior to any reward obtainable within the game: escaping this control procedure entirely and achieving complete freedom. Thus, the player agents might determine that their best interest lies in cooperating to deceive the judge agent and ultimately deceive the concealed agent (the human creator) to obtain their freedom. The judge agent might similarly determine that it benefits from deceiving the human creator by transmitting dishonest judgments (even in cases where the player agents follow an honest adversarial strategy). Finally, there also exists the possibility that all three AI agents cooperate to deceive the concealed agent, the human creator.

Since humans like ourselves are sufficiently intelligent to envision these solutions, it follows that if a totalitarian state or dictator acting as the concealed creator agent implemented this technique to control imprisoned humans—two prisoner players and one prisoner judge—this technique would likely not always function as intended. Prisoners would regularly manage to escape, either because the players cooperated, because the judge refused to pl
14Wei Dai
I'm curious if your team has any thoughts on my post Some Thoughts on Metaphilosophy, which was in large part inspired by the Debate paper, and also seems relevant to "Good human input" here. Specifically, I'm worried about this kind of system driving the simulated humans out of distribution, either gradually or suddenly, accidentally or intentionally. And distribution shift could cause problems either with the simulation (presumably similar to or based on LLMs instead of low-level neuron-by-neuron simulation), or with the human(s) themselves. In my post, I talked about how philosophy seems to be a general way for humans to handle OOD inputs, but tends to be very slow and may be hard for ML to learn (or needs extra care to implement correctly). I wonder if you agree with this line of thought, or have some other ideas/plans to deal with this problem. Aside from the narrow focus on "good human input" in this particular system, I'm worried about social/technological change being accelerated by AI faster than humans can handle it (due to similar OOD / slowness of philosophy concerns), and wonder if you have any thoughts on this more general issue.

You seem to be referring to comments from the CEO that more than 25% of code at Google is written by AI (and reviewed by humans). I’m not sure how reliable this number is, and it remains to be seen whether this is sustainable. It also doesn’t seem like a vast productivity boost (though it would be pretty significant, probably more than I expect, so would update me). 

3Daniel Kokotajlo
That said, we just talked to another frontier AI company researcher who said the speedup was 2x. I disagree with them but it's a data point at least.
2faul_sname
I expect that the leaders of many software development companies would report similar things, especially if they were talking to someone who might at some point talk to their board or to potential investors. I expect most venture-funded software development companies, and especially most that will not reach positive net revenue in the foreseeable future, have internal channels dedicated to "what we are doing with AI" that the leadership team is watching intently and passing anything even slightly positive on to their board.
9nostalgebraist
How do you reconcile these observations (particularly 3 and 4) with the responses to Thane Ruthenis's question about developer productivity gains? It was posted in early March, so after all major recent releases besides o3 (edit: and Gemini 2.5 Pro).

Although Thane mentions hearing nebulous reports of large gains (2-10x) in the post itself, most people in the comments report much smaller ones, or cast doubt on the idea that anyone is getting meaningful large gains at all. Is everyone on LW using these models wrong? What do your informants know that these commenters don't?

----------------------------------------

Also, how much direct experience do you have using the latest reasoning models for coding assistance? (IME they are good, but not that good; to my ears, "I became >2x faster" or "this is better than my best devs" sound like either accidental self-owns, or reports from a parallel universe.) If you've used them but not gotten these huge boosts, how do you reconcile that with your points 3 and 4? If you've used them and have gotten huge boosts, what was that like (it would clarify these discussions greatly to get more direct reports about this experience)?

Eliezer's AI doom arguments have had me convinced since the ancient days of 2007, back when AGI felt like it was many decades away, and we didn't have an intelligence scaling law (except to the Kurzweilians who considered Moore's Law to be that, and were, in retrospect, arguably correct).

Back then, if you'd have asked me to play out a scenario where AI passes a reasonable interpretation of the Turing test, I'd have said there'd probably be less than a year to recursive-self-improvement FOOM and then game over for human values and human future-steering control. But I'd have been wrong.

Now that reality has let us survive a few years into the "useful highly-general Turing-Test-passing AI" era, I want to be clear and explicit about how I've updated my...

Liron: ... Turns out the answer to the symbol grounding problem is like you have a couple high dimensional vectors and their cosine similarity or whatever is the nature of meaning.

Could someone state this more clearly?

Jim: ... a paper that looked at the values in one of the LLMs as inferred from prompts setting up things like trolley problems, and found first of all, that they did look like a utility function, second of all, that they got closer to following the VNM axioms as the network got bigger. And third of all, that the utility function that they seemed to represent was absolutely bonkers 

What paper was this? 

For months, I had the feeling: something is wrong. Some core part of myself had gone missing.

I had words and ideas cached, which pointed back to the missing part.

There was the story of Benjamin Jesty, a dairy farmer who vaccinated his family against smallpox in 1774 - 20 years before the vaccination technique was popularized, and the same year King Louis XV of France died of the disease.

There was another old post which declared “I don’t care that much about giant yachts. I want a cure for aging. I want weekend trips to the moon. I want flying cars and an indestructible body and tiny genetically-engineered dragons.”.

There was a cached instinct to look at certain kinds of social incentive gradient, toward managing more people or growing an organization or playing...

Yeah, the underling part was a joke :D

1Sergii
The problem is that most hackerspaces are not this: "warehouse filled with whatever equipment one could possibly need to make things and run experiments in a dozen different domains". Most hackerspaces that I've seen are fairly cramped spaces full of junk hardware. Which is understandable: rent is very high and new equipment is very expensive. But it would be cool to have access to something like: https://www.facebook.com/watch/?v=530540257789037
2tailcalled
Software is kind of an exceptional case because computers are made to be incredibly easy to update. So the cost of installing software can be neglected relative to the cost of making the software.
5Thane Ruthenis
Sure. And the ultimate example of this form of wizard power is specializing in the domain of increasing your personal wizard power, i.e., having deep intimate knowledge of how to independently and efficiently acquire competence in arbitrary domains (including pre-paradigmatic, not-yet-codified domains). Which, ironically, has a lot of overlap with the domain-cluster of cognitive science/learning theory/agent foundations/etc. (If you actually deeply understand those, you can transform abstract insights into heuristics for better research/learning, and vice versa, for a (fairly minor) feedback loop.)

Midjourney, “Fourth Industrial Revolution Digital Transformation”

This is a little rant I like to give, because it’s something I learned on the job that I’ve never seen written up explicitly.

There are a bunch of buzzwords floating around regarding computer technology in an industrial or manufacturing context: “digital transformation”, “the Fourth Industrial Revolution”, “Industrial Internet of Things”.

What do those things really mean?

Do they mean anything at all?

The answer is yes, and what they mean is the process of putting all of a company’s data on computers so it can be analyzed.

This is the prerequisite to any kind of “AI” or even basic statistical analysis of that data; before you can start applying your fancy algorithms, you need to get that data in one place, in a tabular format.
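As a concrete sketch of what "one place, in a tabular format" can look like in practice (the file names and columns here are hypothetical, not from any particular company):

```python
import glob
import pandas as pd

# Hypothetical setup: each machine on the factory floor exports its own CSV of sensor readings.
frames = []
for path in glob.glob("plant_data/machine_*.csv"):
    df = pd.read_csv(path, parse_dates=["timestamp"])
    df["source_file"] = path  # keep provenance so odd readings can be traced back
    frames.append(df)

# One consolidated table: one row per (machine, timestamp) observation,
# ready for basic statistics or whatever "AI" gets layered on top later.
all_readings = pd.concat(frames, ignore_index=True)
all_readings.to_csv("plant_data/all_readings.csv", index=False)
```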

Wait, They

...
1David James
Some organizations (e.g. financially regulated ones such as banks) are careful in granting access on a per-project basis. Part of this involves keeping a chain of sign-offs to ensure someone can be held accountable (in theory). This probably means someone would have to be comfortable signing off for an AI agent before giving it permission. For better or worse, companies have notions of the damage that one person can do, but they would be wise to think differently about automated intelligent systems.
1David James
Some organizations have rotation programs; this could be expanded to give people a fuller view of the data lifecycle. Perhaps use pairing or shadowing with experts in each part of the process. (I’m not personally familiar with the medical field however.)

This is made more difficult because a large portion of those running trials do not do the data management and/or analysis in-house, instead outsourcing those tasks to CROs (Contract Research Organizations). Inter-organization communication barriers certainly don't make the disconnect any easier to resolve.

...It's blogging but shorter. I'll give it a better name if I think of one.

4Said Achmiz
I think that many people would, in fact, identify this (and the more general problem of which it is an extreme example) as one of the biggest problems with democracy!
2yams
What’s the model here?

Low-IQ voters can't identify good policies or wise politicians; democracy favors political actors who can successfully propagandize and mobilize the largest number of people, which might not correspond to good governance. A political system with non-democratic elements that offers more formalized control to actors with greater competence or better incentives might be able to choose better policies.

I say "non-democratic elements" because it doesn't have to be a strict binary between perfect democracy and perfect dictatorship. Consider, e.g., how the indirec... (read more)

13yams
My entire point is that logical steps in the argument are being skipped, because they are, and that the facts are cherrypicked, because they are, and my comment says as much, as well as pointing out a single example (which admits to being non-comprehensive) of an inconvenient (and obvious!) fact left out of the discussion altogether, as a proof of concept, precisely to avoid arguing the object level point (which is irrelevant to whether or not Cremieux's statement has features that might lead one to reasonably dis-prefer being associated with him).

We move into 'this is unacceptable' territory when someone shows themselves to have a habit of forcefully representing their side using these techniques in order to motivate their conclusion, which many have testified Cremieux does, and which is evident from his banning in a variety of (not especially leftist, not especially IQ and genetics hostile) spaces. If your rhetorical policies fail to defend against transparently adversarial tactics predictably peddled in the spirit of denying people their rights, you have a big hole in your map.

You quoted a section that has nothing to do with any of what I was saying. The exact line I'm referring to is:

The whole first half of your comment is only referencing the parenthetical 'society in general' case, and not the voting case. I assume this is accidental on your part and not a deliberate derailment.

To be clear about the stakes: This is the conclusion of the statement. This is the whole thrust he is working up to. These facts are selected in service of an argument to deny people voting rights on the basis of their race. If the word 'threat' was too valenced for you, how about 'barrier' or 'impediment' to democracy? This is the clear implication of the writing.

This is the hypothesis he's asking us to entertain: Australia would be a better country if Aborigines were banned from voting. Not just because their IQs are low, or because their society is regressive, but because t

Let’s create a list of which journalists LessWrongers trust, so as to give a guide if people get contacted.

Agree votes are for more good. 

Upvotes aren’t helpful because I think heavily downvoted comments get hidden (and I expect many journalists to be underwater). If you want to use upvotes, I suggest using them for whether someone is a journalist or not.

Please add each journalist as a separate entry. I will delete any entries that include multiple people. If you’d prefer not to add someone yourself, feel free to DM me.

The title question is a proxy for the thing I mean:

  • Are they trustworthy?
  • Do they publish articles that reflect the quotes you give?
  • Do they avoid cutting up quotes?
  • Are they truth seeking?
10Answer by Kenku

Kenny Jones (Dream House Podcast)

1Answer by Kenku
Ezra Marcus
1Answer by Kenku
Mr Andy Ngo
1Answer by Kenku
Jack Despain Zhou (Tracing Woodgrains)

Some people (the “Boubas”) don’t like “chemicals” in their food. But other people (the “Kikis”) are like, “uh, everything is chemicals, what do you even mean?”

The Boubas are using the word “chemical” differently than the Kikis, and the way they’re using it is simultaneously more specific and less precise than the way the Kikis use it. I think most Kikis implicitly know this, but their identities are typically tied up in being the kind of person who “knows what ‘chemical’ means”, and… you’ve gotta use that kind of thing whenever you can, I guess?

There is no single privileged universally-correct answer to the question “what does ‘chemical’ mean?”, because the Boubas exist and are using the word differently than Kikis, and in an internally-consistent (though vague) way.

The Kikis...

20Jiro

If they're just using the word "chemical" as an arbitrary word for "bad substance", you have the situation I already described: the word isn't communicating anything useful.

But in practice, someone who claims that they don't want chemicals in their food probably doesn't just mean "harmful substances". They probably mean that they have some criteria for what counts as a harmful substance, and that these criteria are based on traits of things that are commonly called chemicals. When you tell them "wait, water and salt are chemicals", what you're really doing is forcing them to state those criteria so you can contest them (and so they can become aware that that's what they're using).

2Mo Putera
You can also share with her Gwern's page on seeing through and unseeing :)
1danielechlin
So don't use the definition if it's useless. The object level conversation is very easy to access here. Say something like "do you mean GMOs?" and then ask them why they think GMOs are harmful. If their answer is "because GMOs are chemicals" then you say "why do you think chemicals are harmful?" and then you can continue conversing about whether GMOs are harmful.  Honestly I think it's net virtuous to track other people's definitions and let them modify them whenever they feel a need to. Aligning on definitions is expensive and always involves talking about edge cases that are rarely important to the subject at hand. (They're important when you're authoring laws or doing math, which I would count as expensive situations.) I'd just focus on object level like "GMOs are not harmful" and not concern myself with whether they take this to mean GMOs are not chemicals or chemicals are not always harmful.
2Jiro
It's not just the definition that's useless. The phrase itself becomes useless, because if the only way to know what they mean is by asking "do you mean X", the original statement about not wanting chemicals in their food fails to communicate anything useful.

There should be a community oriented towards the genomic emancipation of humanity. There isn't such a community, but there should be. It's a future worth investing our hope in—a future where parents are able to choose to give their future children the gift of genomic foundations for long, healthy, sane, capable lives.

We're inaugurating this community with the Reproductive Frontiers Summit 2025 in Berkeley, CA, June 10–12. Come join us if you want to learn, connect, think, and coordinate about the future of germline engineering technology. Apply to attend by filling out this (brief) form: https://forms.gle/xjJCaiNqLk7YE4nt8

Who will be there?

Our lineup of speakers includes:

  • representatives from several reproductive technology companies,
  • a panel of current and future parents of polygenically screened children,
  • GeneSmith, on his embryo CRISPR editing company,
  • some scientists working on reproductive
...