habryka

Running Lightcone Infrastructure, which runs LessWrong and Lighthaven.space. You can reach me at habryka@lesswrong.com. 

(I have signed no contracts or agreements whose existence I cannot mention, which I am mentioning here as a canary)

Comments (sorted by newest)
Plans A, B, C, and D for misalignment risk
habryka · 19h

Shouldn't we at least proceed until we can't very confidently proceed safely?

I mean, I think the odds of AI ending up uncontrollably powerful are on the order of 1-3% for the next generation of models. That seems far, far too high. I think we are right now in a position where we can't very confidently proceed safely.

Plans A, B, C, and D for misalignment risk
habryka · 1d

Thus, the numbers I give below are somewhat more optimistic than what you'd get just given the level of political will corresponding to each of these scenarios (as this will might be spent incompetently).

FWIW, for at least plan A and plan B, I feel like the realistic multiplier on how optimistic these are is like at least 3x? Like, I don't see an argument for this kind of plan working with 90%+ probability given realistic assumptions about execution quality.

(I also have disagreements about whether this will work, but at least plan A well-executed seems like it would notice it was starting to be very reckless and then be in a good position to slow down more)

Irresponsible Companies Can Be Made of Responsible Employees
habryka · 1d

also commonly heard: "

Looks like you accidentally dropped a sentence there.

Cole Wyeth's Shortform
habryka · 2d

I am wondering whether your experiences were formed by the first generation of reasoning models, and my guess is you are also thinking of asking different kinds of questions.

The thing that LLMs are really great at is speaking and thinking in the ontology and structure that is prevalent among experts in any field. This is usually where the vast majority of evidence comes from. LLMs aren't going to make up whole ontologies about how bankruptcy law works, or how datacenter security works. They might totally make up details, but they won't make up the high-level picture.

Second, this has just gotten a lot better over the last 6 months. GPT-5 still lies a good amount, but vastly less than o1 or o3. I found o1 almost unusable on this dimension.

Cole Wyeth's Shortform
habryka · 3d

I agree the LLMs are somewhat worse, especially compared to rationalist-adjacent experts in specialized fields, but they really aren't that bad for most things. Like, I researched the state of the art of datacenter security practices yesterday, and while I am not 99% confident that the AI got everything right, I am pretty sure it helped me understand the rough shape of things a lot better.

Cole Wyeth's Shortform
habryka · 4d

That is not how Bayesian evidence works. I treat LLM output as somewhat less trustworthy than what a colleague of mine says, but not fundamentally different. I am skeptical that you spend your days double-checking every conversation you have with another human. I also don't think you should spend your days double-checking every single thing an LLM tells you.


This feels kind of like the early conversations about Wikipedia where people kept trying to insist Wikipedia is “not a real source”.

Cole Wyeth's Shortform
habryka · 4d

Not most of the time! Like, I sometimes ask multiple LLMs, but I don't verify every fact that an LLM tells me, unless it's a domain where I predict LLMs are particularly likely to hallucinate. I keep in mind that stuff is sometimes hallucinated, but most of the time it's fine to know that something is quite probably true. 

Cole Wyeth's Shortform
habryka · 5d

I use LLMs for basically anything substantial that I write. Like, a lot of my knowledge of random facts about the world is downstream of having asked LLMs about them. It would IMO be pretty dumb to write a post that is e.g. trying to learn from past social movement failures and not have an LLM look over it to see whether it's saying anything historically inaccurate.

So I do think there needs to be some bar here that is not "LLMs were involved in any way". I do share a bunch of the concerns in this space.

Omelas Is Perfectly Misread
habryka · 6d

But like, how do you know about the details here? Did you read anything that made you believe this? See an interview? Did you know anyone involved directly or indirectly?

Omelas Is Perfectly Misread
habryka · 6d

Maybe referencing this? http://www.castaliahouse.com/when-good-science-fiction-goes-bad/ 

Sequences

A Moderate Update to your Artificial Priors
A Moderate Update to your Organic Priors
Concepts in formal epistemology
Posts (sorted by new)

244 · Banning Said Achmiz (and broader thoughts on moderation) · 2mo · 395 comments
97 · Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity · 3mo · 43 comments
23 · Open Thread - Summer 2025 · 4mo · 69 comments
93 · ASI existential risk: Reconsidering Alignment as a Goal · 6mo · 14 comments
358 · LessWrong has been acquired by EA · 6mo · 52 comments
78 · 2025 Prediction Thread · 9mo · 21 comments
23 · Open Thread Winter 2024/2025 · 9mo · 59 comments
46 · The Deep Lore of LightHaven, with Oliver Habryka (TBC episode 228) · 10mo · 4 comments
36 · Announcing the Q1 2025 Long-Term Future Fund grant round · 10mo · 2 comments
112 · Sorry for the downtime, looks like we got DDosd · 10mo · 13 comments
Wikitag Contributions

CS 2881r · a month ago · (+204)
Roko's Basilisk · 3 months ago
Roko's Basilisk · 3 months ago
AI Psychology · 9 months ago · (+58/-28)