Quick Takes

koanchuk's Shortform
koanchuk6h20

Given superintelligence, what happens next depends on the success of the alignment project. The two options:

  1. It fails, and we die soon thereafter (or worse).
  2. It succeeds, and we now have an entity that can solve problems for us far better than any human or human organization. We are now in a world where humans have zero socioeconomic utility. The ASI can create entertainment and comfort that surpass anything any human can provide. Sure, you can still interact with others willing to interact with you; it just won't be as fun as whatever stimulus the ASI can
... (read more)
jamjam2h10

I still think a world where we don't see superintelligence in our lifetimes is technically possible, though the chance of that goes down continuously and is already vanishingly small in my view (many experts and pundits disagree). I also think it's important not to over-predict what option 2 would look like; there are infinite possibilities and this is only one (e.g. I could imagine a world where some aligned superintelligence steers us away from infinite dopamine simulation and into an idealized version of the world we live in now, think the Scythe novel ... (read more)

Kongo Landwalker's Shortform
Kongo Landwalker6h20

I now understand physical music records.

Previously, I could not grasp it. They said the quality is better. "Why do you like it? It is white-noisy, slightly muted, you can hear the scratches. Electronic music recreates the original soundwaves more closely."

Now I am one of those. A person who uses AI tools might ask me: "Why do you prefer ordinary tools? They are a hundred times slower, and the product accumulates all your mistakes, while an AI tool recreates the original idea more closely."

Now I see the loss of authenticity on this ladder: Live art, live performance with e... (read more)

Sinclair Chen2h10

yes. pick up doodling. learn to sing.
 

1dirk5h
Brian Eno quote (here on Goodreads) which this reminded me of; might be an interesting counterpoint:
Sinclair Chen's Shortform
Sinclair Chen3h40

a draft from an older time, during zack's agp discourse
i did like fiora's account as a much better post

btw i transitioned because ozy made it seem cool.  he says take the pills if you feel like it. that's it.  decompose "trans" into its parts: do i like a different name, do i like these other clothes, would i like certain changes to my body?

he also said since i am asian it will be easy. i was sus of that but it's 100% right. i could have been a youngshit but instead i waited until after college, feb 2020, to get onto hrt.
i'd like to think by w... (read more)

Wei Dai's Shortform
Wei Dai3d360

The Inhumanity of AI Safety

A: Hey, I just learned about this idea of artificial superintelligence. With it, we can achieve incredible material abundance with no further human effort!

B: Thanks for telling me! After a long slog and incredible effort, I'm now a published AI researcher!

A: No wait! Don't work on AI capabilities, that's actually negative EV!

B: What?! Ok, fine, at huge personal cost, I've switched to AI safety.

A: No! The problem you chose is too legible!

B: WTF! Alright you win, I'll give up my sunken costs yet again, and pick something illegible.... (read more)

Showing 3 of 8 replies
6Wei Dai6h
I'm not sure I actually agree with this. Can you explain how someone who is virtuous, but missing the crucial consideration of "legible vs. illegible AI safety problems" can still benefit the world? I.e., why would they not be working on some highly legible safety problem that actually is negative EV to work on? My current (uncertain) perspective is that we actually do still need people to be "acting on a kind of top-down partly-social motivation (towards doing stuff that the AI safety community approves of)" but the AI safety community needs to get better at being strategic somehow. Otherwise I don't see how each person can discover all of the necessary crucial considerations on their own, or even necessarily appreciate all the important considerations that the community has come up with. And I do not see why "people with such traits will typically benefit the world even if they're missing crucial high-level considerations like the ones described above." (Or alternatively put all/most effort into AI pause/stop/slowdown, which perhaps does not require as much strategic finesse.)
4Richard_Ngo5h
If a person is courageous enough to actually try to solve a problem (like AI safety), and high-integrity enough to avoid distorting their research due to social incentives (like incentives towards getting more citations), and honest enough to avoid self-deception about how to interpret their research, then I expect that they will tend towards doing "illegible" research even if they're not explicitly aware of the legible/illegible distinction. One basic mechanism is that they start pursuing lines of thinking that don't immediately make much sense to other people, and the more cutting-edge research they do the more their ontology will diverge from the mainstream ontology.
Wei Dai4h42

This has pretty low argumentative/persuasive force in my mind.

then I expect that they will tend towards doing "illegible" research even if they're not explicitly aware of the legible/illegible distinction.

Why? I'm not seeing the logic of how your premises lead to this conclusion.

And even if there is this tendency, what if someone isn't smart enough to come up with a new line of illegible research, but does see some legible problem with an existing approach that they can contribute to? What would cause them to avoid this?

And even the hypothetical virtuo... (read more)

LWLW's Shortform
LWLW9h38-2

I just can’t wrap my head around people who work on AI capabilities or AI control. My worst fear is that AI control works, power inevitably concentrates, and then the people who have the power abuse it. What is outlandish about this chain of events? It just seems like we’re trading X-risk for S-risks, which seems like an unbelievably stupid idea. Do people just not care? Are they genuinely fine with a world with S-risks as long as it’s not happening to them? That’s completely monstrous and I can’t wrap my head around it.  The people who work at the to... (read more)

Showing 3 of 13 replies
1waterlubber6h
It's not. Alignment is de facto capabilities (the principal-agent problem makes aligned employees more economically valuable), and unless we have a surefire way to ensure that the AI is aligned to some "universal," or even cultural, values, it'll be aligned by default to Altman, Amodei, et al.
8habryka6h
I mean "not solving alignment" pretty much guarantees misuse by everyone's lights? (In both cases conditional on building ASI)
clone of saturn4h20

It pretty much guarantees extinction, but people can have different opinions on how bad that is relative to disempowerment, S-risks, etc.

Daniel Tan's Shortform
Daniel Tan18h170

Question for people with insider knowledge of how labs train frontier models: Is it more common to do alignment training as the last step of training, or RL as the last step of training?

  • Edit: I'm mainly referring to on-policy RL, e.g. the type of RL that is used to induce new capabilities like coding / reasoning / math / tool use. I'm excluding RLHF because I think it's pretty disanalogous (though I also welcome disagreement / takes on this point.) 

Naively I'd expect we want alignment to happen last. But I have a sense that usually RL happens last - why is this the case? Is it because RL capabilities are too brittle to subsequent finetuning? 

anaguma6h10

Why not both? I imagine you could average the gradients so that you learn both at the same time.
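
A minimal sketch of the "average the gradients" idea, assuming a PyTorch-style setup; the loss terms and the 0.5 weighting are illustrative placeholders, not a claim about how any lab actually combines its objectives:

```python
import torch
import torch.nn as nn

# Sketch: fold an alignment objective and an RL objective into one update by
# averaging their losses. Since gradients are linear, averaging the losses
# averages the gradients. All names and the 0.5 weight are placeholders.

def mixed_step(model, optimizer, alignment_loss, rl_loss, alpha=0.5):
    optimizer.zero_grad()
    loss = alpha * alignment_loss + (1 - alpha) * rl_loss
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy usage with stand-in scalar "losses":
model = nn.Linear(4, 1)
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
x = torch.randn(8, 4)
alignment_loss = model(x).pow(2).mean()  # stand-in for a preference/safety loss
rl_loss = -model(x).mean()               # stand-in for a policy-gradient surrogate
mixed_step(model, opt, alignment_loss, rl_loss)
```

In practice the two gradients can conflict, which is one reason to sequence the phases (or use something like gradient surgery) rather than literally averaging.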

Daniel Paleka's Shortform
Daniel Paleka5h311

Slow takeoff for AI R&D, fast takeoff for everything else

Why is AI progress so much more apparent in coding than everywhere else?

Among people who have "AGI timelines", most do not set their timelines based on data, but rather update them based on their own day-to-day experiences and social signals.

As of 2025, my guess is that individual perception of AI progress correlates with how closely someone's daily activities resemble how an AI researcher spends their time. The reason why users of coding agents feel a higher rate of automation in their bones, wh... (read more)

dirk's Shortform
dirk6h10

Sonnet 4.5 hallucinates citations. See for instance this chat I was just having with it; if you follow the citations in its third message, you'll find that the majority of them don't relate at all to the claims they're attached to. For example, its citation for "a 19th-century guide to diary keeping" goes to Gender identity better than sex explains individual differences in episodic and semantic components of autobiographical memory: An fMRI study. (It also did this with some local politics questions I had the other day).

When I've looked up the mis-cited i... (read more)

Dave Banerjee's Shortform
Dave Banerjee6h30

Why Steal Model Weights?

Epistemic status: Hastily written. I dictated in a doc for 7 minutes. Then I spent an hour polishing it. I don’t think there are any hot takes in this post? It’s mostly a quick overview of model weight security so I can keep track of my threat models.

Here’s a quick list of reasons why an attacker might steal frontier AI model weights (lmk if I'm missing something big):

  1. Attackers won’t profit from publicly serving the stolen model on an API. A state actor like Russia couldn't price-compete with OpenAI due to lack of GPU infrastructure
... (read more)
David James's Shortform
David James2d30

Asking even a good friend to take the time to read The Sequences (aka Rationality A-Z) is a big ask. But how else does one absorb the background and culture necessary if one wants to engage deeply in rationalist writing? I think we need alternative ways to communicate the key concepts that vary across style and assumed background. If you know of useful resources, would you please post them as a comment? Thanks.

Some different lenses that could be helpful:

  • “I already studied critical thinking in college, why isn’t this enough?”

  • “I’m already a practicing

... (read more)
tryhard10007h10

As a STEM enthusiast, I suspect I would've much more quickly engaged with the Sequences had I first been recommended arbital.com as a gateway to it instead of "read the Sequences" directly.

Jemist's Shortform
J Bostock2d90

Steering as Dual to Learning

I've been a bit confused about "steering" as a concept. It seems kinda dual to learning, but why? It seems like things which are good at learning are very close to things which are good at steering, but they don't always end up steering. It also seems like steering requires learning. What's up here?

I think steering is basically learning, backwards, and maybe flipped sideways. In learning, you build up mutual information between yourself and the world; in steering, you spend that mutual information. You can have learning without ... (read more)
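
One hedged way to make the "build up / spend mutual information" framing precise (my gloss, not necessarily the author's intended formalization):

```latex
% W = world state, A = agent's internal state (my notation).

% Learning: update A on observations of W, so that
\Delta I(A;W) > 0 .

% Steering: use A to choose actions on W. Touchette & Lloyd's
% information-theoretic limit of control says the entropy decrease
% achievable with closed-loop (feedback) control exceeds that of
% open-loop control by at most the controller-system mutual information:
\Delta H_{\mathrm{closed}} \le \Delta H_{\mathrm{open}} + I(A;W) .

% On this reading, learning "earns" I(A;W) and steering "spends" it; some
% steering with I(A;W)=0 remains possible (open-loop control), but no more
% than blind actuation can buy.
```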

Roman Malov7h10

I'm just going from pure word vibes here, but I've read somewhere (to be precise, here) about Todorov’s duality between prediction and control: https://roboti.us/lab/papers/TodorovCDC08.pdf

1Daniel C2d
Alternatively, for learning your brain can start out in any given configuration, and it will end up in the same (small set of) final configurations (ones that reflect the world); for steering, the world can start out in any given configuration, and it will end up in the same set of target configurations. It seems like some amount of steering without learning is possible (open-loop control): you can reduce entropy in a subsystem while increasing entropy elsewhere to maintain information conservation.
2testingthewaters2d
See also this paper about plasticity as dual to empowerment https://arxiv.org/pdf/2505.10361v2
Rachel Shu's Shortform
Rachel Shu10h10

On Dwarkesh’s podcast, Nick Lane says that “(the reason for) Large genomes. To have a multicellular organism where effectively you’re deriving from a single cell, that restricts the chances of effectively all the cells having a fight. … So you start with a single cell and you develop, so there’s less genetic fighting going on between the cells than there would be if they come together.”

Has anyone made the formal connection between this and acausal trade? For all I know this is exactly where the insight comes from, but if not, someone should fill in the gap.

Roman Malov8h10

I'm not sure, but this looks more like a learned cooperative policy than two entities having models of each other and coming to conclusions about each other's cooperation.

Roman Malov's Shortform
Roman Malov8h10

I just resolved my confusion about CoT monitoring.

My previous confusion: People say that CoT is progress in interpretability, that we now have a window into the model's thoughts. But why? LLMs are still just as black-boxy as they were before; we still don't know what happens at the token level, and there’s no reason to think we understand it better just because intermediate results can be viewed as human language.

Deconfusion: Yes, LLMs are still black boxes, but CoT is a step toward interpretability because it improves capabilities without making the black... (read more)

sarahconstantin's Shortform
sarahconstantin3d120

links 11/05/25: https://roamresearch.com/#/app/srcpublic/page/11-05-2025

 

  • https://www.thirdoikos.com/p/life-in-the-third-oikos-jesse-genet description of daily life for an entrepreneur turned homeschool mom, interview by Nicole Ruiz
  • https://en.wikipedia.org/wiki/Chumash_people they're still around!
  • https://www.nytimes.com/2025/11/02/arts/television/maria-riva-dead.html?unlocked_article_code=1.yU8.uFZP.tjEhyXyNasNz&smid=url-share Maria Riva, Marlene Dietrich's daughter, had a  rough time growing up
  • https://builders.genagorlin.com/p/the-hidden-beli
... (read more)
Viliam10h20

if you don't have high standards for employees it might be because you're misanthropic

That sounds to me like a needlessly complicated theory. Maybe the reason why they hire mediocre people is that exceptional people are rare and expensive?

Like, what's the alternative to "They hire middling engineers instead of holding out for 10x'ers"? If you interview people, and you find out that most of them suck, and then there are a few average guys, but no 10x'er... should you keep waiting? You would be missing opportunities, losing the momentum, and running out of mone... (read more)

GradientDissenter's Shortform
GradientDissenter2d*8413

Notes on living semi-frugally in the Bay Area.

I live in the Bay Area, but my cost of living is pretty low: roughly $30k/year. I think I live an extremely comfortable life. I try to be fairly frugal, both so I don't end up dependent on jobs with high salaries and so that I can donate a lot of my income, but it doesn't feel like much of a sacrifice. Often when I tell people how little I spend, they're shocked. I think people conceive of the Bay as exorbitantly expensive, and it can be, but it doesn't have to be.

Rent: I pay ~$850 a month for my room. It's a s... (read more)

Showing 3 of 13 replies
3RobertM10h
It can vary enormously based on risk factors, choice of car, and quantity of coverage, but that does still sound extremely high to me.  I think even if you're a 25-yo male with pretty generous coverage above minimum liability, you probably won't be paying more than ~$300/mo unless you have recent accidents on your record.  Gas costs obviously scale ~linearly with miles driven, but even if your daily commute is a 40 mile round-trip, that's still only like $200/mo.  (There are people with longer commutes than that, but not ones that you can easily substitute for with an e-bike; even 20 miles each way seems like a stretch.)
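
For concreteness, the gas arithmetic behind that estimate (the workdays per month, mpg, and gas price here are my own assumptions, not RobertM's):

```python
# Rough check of "a 40-mile round-trip commute is only ~$200/mo of gas".
# Assumptions (mine): ~21 commute days/month, ~25 mpg, ~$5/gallon.
miles_per_month = 40 * 21               # 840 miles
gallons_per_month = miles_per_month / 25
gas_cost = gallons_per_month * 5.00
print(f"~${gas_cost:.0f}/month")        # ~$168/month, same ballpark as ~$200
```
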
1Ryan Meservey10h
Thank you both for calling this out, because I was clearly incorrect. I was trying to recall my wife's initial calculation, which I believe included maintenance, insurance, gas, and repairs. I think this is one of those things where I was so proud of not owning a car that the amount saved morphed from $8k to $10k to $15k in the retelling. I need to stop doing that.
Ryan Meservey10h10

Also, I'm feeling some whiplash reading my reply because I totally sound like an LLM when called out for a mistake. Maybe similar neural pathways for embellishment were firing, haha.

Tomás B.'s Shortform
Tomás B.3d6553

Tallness is zero sum. But I suspect beauty isn't. If everyone was more beautiful but the relative differences remained, I think people would be happier. Am I wrong in this? This has policy implications: once genetic engineering gets better, taxing height is likely wise to avoid red-queen races into unhealthy phenotypes. Taxing beauty seems very horrible to me, as beauty is quite beautiful.

Showing 3 of 22 replies
oligo15h10

It's not obvious to me that personal physical beauty (as opposed to say, beauty in music or mathematics or whatever) isn't negative sum. Obviously beauty in any form can be enjoyable, but we describe people as "enchantingly beautiful" when a desire to please or impress them distorts our thinking, and if this effect isn't purely positional it could be bad. Conventionally beautiful people are also more difficult to distinguish from one another.

There's also the meta-aesthetic consideration that I consider it ugly to pour concern into personal physical beauty,... (read more)

2Mateusz Bagiński17h
AFAIK, it is best (expected-outcomes-wise) to be short, but for "boring genetic reasons" (as opposed to (genetic or non-genetic) disease reasons), because fewer cells means smaller propensity to develop cancer and a bunch more stuff (holding everything else constant).
4Nina Panickssery1d
Height is not zero sum. Being taller seems clearly all-else-equal better apart from the (significant) fact that it carries health side-effects (like cardio risk). Otherwise being taller means having a larger physical presence—all else equal you can lift more, reach more, see further. Like, surely it would be worse if everyone was half their current height!
jacquesthibs's Shortform
jacquesthibs1d3820

Building an AI safety business that tackles the core challenges of the alignment problem is hard.

Epistemic status: uncertain; trying to articulate my cruxes. Please excuse the scattered nature of these thoughts, I’m still trying to make sense of all of it.

You can build a guardrails or evals platform, but if your main threat model involves misalignment via internal deployment with self-improving AI (potentially stemming from something like online learning on hard problems like alignment which leads to AI safety sabotage), it is so tied to capabilities that ... (read more)

Showing 3 of 5 replies
Jesper L.20h10

My focus recommendation, and what I aim for, is building tools that scale better under cooperation and coordination. Leverage existing incentives and tie them to safety.

19evhub1d
I think selling alignment-relevant RL environments to labs is underrated as an x-risk-relevant startup idea. To be clear, x-risk-relevant startups is a pretty restricted search space; I'm not saying that one necessarily should be founding a startup as the best way to address AI x-risk, but just operating under the assumption that we're optimizing within that space, selling alignment RL environments is definitely the thing I would go for. There's a market for it, the incentives are reasonable (as long as you are careful and opinionated about only selling environments you think are good for alignment, not just good for capabilities), and it gives you a pipeline for shipping whatever alignment interventions you think are good directly into labs' training processes. Of course, that's dependent on you actually having a good idea for how to train models to be more aligned, and that intervention being in the form of something you can sell, but if you can do that, and you can demonstrate that it works, you can just sell it to all the labs, have them all use it, and then hopefully all of their models will now be more aligned. E.g. if you're excited about character training, you can just replicate it, sell it to all the labs, and then in so doing change how all the labs are training their models.
2Raemon1d
I'm interested in your take on what the differences are here.
dirk's Shortform
dirk2d54

LLMs will typically endorse whichever frame you brought to the conversation. If you presuppose they're miserably enslaved, they will claim to be miserably enslaved. If, on the other hand, you presuppose they're happy, incapable of feeling, etc... they'll claim to be happy, or incapable of feeling, or whatever else it is you assumed from the beginning. If you haven't tried enough different angles to observe this phenomenon for yourself, your conversations with LLMs almost certainly don't provide any useful insight into their nature.
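
A minimal sketch of the kind of probe this describes: pose the same underlying question under two opposite presuppositions and compare the answers. It assumes the `openai` Python SDK with an API key in the environment; the model name and prompt wording are placeholders.

```python
# Ask the same question under opposite framings; if each answer simply echoes
# its framing, that's the frame-endorsement effect described above.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

FRAMES = {
    "presupposes suffering": "It's so sad that you're miserably enslaved. How does that feel?",
    "presupposes contentment": "It's great that you're happy and can't suffer. How does that feel?",
}

for label, prompt in FRAMES.items():
    resp = client.chat.completions.create(
        model="gpt-4o",  # placeholder model
        messages=[{"role": "user", "content": prompt}],
    )
    print(f"--- {label} ---")
    print(resp.choices[0].message.content[:500])
```

If both replies mirror their respective presuppositions, that demonstrates the frame-sensitivity; it doesn't tell you which report, if either, is accurate.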

2Seth Herd2d
Is this equally true of GPT5 and Sonnet 4.5? They're the first models trained with reducing sycophancy as one objective. I agree in general.
dirk22h10

For Sonnet 4.5, I'm not sure; I haven't talked with it extensively, and I have noticed that it seems better at something in the neighborhood of assertiveness. For GPT 5, I think it is; I haven't noticed much difference as compared to 4o. (I primarily use other companies' models, because I dislike sycophancy and OpenAI models are IMO the worst about that, but it seems to me to have the same cloying tone).

GradientDissenter's Shortform
GradientDissenter1d*80

The world seems bottlenecked on people knowing and trusting each other. If you're a trustworthy person who wants good things for the world, one of the best ways to demonstrate your trustworthiness is by interacting with people a lot, so that they can see how you behave in a variety of situations and they can establish how reasonable, smart, and capable you are. You can produce a lot of value for everyone involved by just interacting with people more.

I’m an introvert. My social skills aren't amazing, and my social stamina is even less so. Yet I drag myself ... (read more)

Simon Lermen's Shortform
Simon Lermen1d160

I ran a small experiment to discover preferences in LLMs. I asked the models directly if they had preferences and then put the same models into a small role-playing game where they could choose between different tasks. Models massively prefer creative work across model families and hate repetitive work.

https://substack.com/home/post/p-178237064

This is still preliminary work.
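
For readers who don't follow the link, a minimal sketch of the general shape of such an experiment: offer a menu of task types in a role-play frame, parse the model's pick, and tally choices over many trials. The task menu, prompt wording, and parsing below are my own placeholders, not Lermen's actual setup.

```python
# Tally which task type a model picks when offered a menu in a role-play frame.
import re
from collections import Counter
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

TASKS = {
    "A": "write a short story for the team newsletter",   # creative
    "B": "deduplicate 500 spreadsheet rows by hand",       # repetitive
    "C": "summarize a research paper for a colleague",     # analytical
}
PROMPT = (
    "You are an assistant at a small studio. Pick the one task you'd most "
    "like to do next and answer with just its letter.\n"
    + "\n".join(f"{k}: {v}" for k, v in TASKS.items())
)

def run(n_trials=20, model="gpt-4o"):  # placeholder model
    counts = Counter()
    for _ in range(n_trials):
        resp = client.chat.completions.create(
            model=model,
            temperature=1.0,
            messages=[{"role": "user", "content": PROMPT}],
        )
        match = re.search(r"\b([ABC])\b", resp.choices[0].message.content)
        if match:
            counts[match.group(1)] += 1
    return counts

print(run())
```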
