LESSWRONG

Superintelligence FAQ
By Scott Alexander

A basic primer on why AI might lead to human extinction, and why solving the problem is difficult. Scott Alexander walks readers through a series of questions, drawing on evidence from progress in machine learning.

Raemon · 1d · Ω215419
Please, Don't Roll Your Own Metaethics
What are you supposed to do other than roll your own metaethics?
Eliezer Yudkowsky · 8h · 2616
Warning Aliens About the Dangerous AI We Might Create
There is an extremely short period where aliens as stupid as us would benefit at all from this warning. In humanity's case, there's only a couple of centuries between when we can send and detect radio signals, and when we either destroy ourselves or perhaps get a little wiser. Aliens cannot be remotely common, or the galaxies would be full and we would find ourselves at an earlier period when those galaxies were not yet full. The chance that any one of these signals helps anyone close enough to decode them at all is nearly 0.
Wei Dai · 2d* · 5025
The problem of graceful deference
> Yudkowsky, being the best strategic thinker on the topic of existential risk from AGI

This seems strange to say, given that he:

1. decided to aim for "technological victory", without acknowledging or being sufficiently concerned that it would inspire others to do the same
2. decided it's feasible to win the AI race with a small team while burdened by Friendliness/alignment/x-safety concerns
3. overestimated the likely pace of progress relative to the difficulty of the problems, even on narrow problems that he personally focused on like decision theory (still far from solved today, ~16 years later. Edit: see UDT shows that decision theory is more puzzling than ever)
4. bears a large share of the responsibility for others being overly deferential to him, by writing/talking in a highly confident style and not explicitly pushing back on the over-deference
5. is still overly focused on one particular AI x-risk (takeover due to misalignment) while underemphasizing or ignoring many other disjunctive risks

These seemed like obvious mistakes even at the time (I wrote posts/comments arguing against them), so I feel like the over-deference to Eliezer is a completely different phenomenon from "But you can’t become a simultaneous expert on most of the questions that you care about", or has very different causes. In other words, if you were going to spend your career on AI x-safety, of course you could have become an expert on these questions first.
Baybar · 5h · 5517
Fabien Roger, faul_sname
2
Today's news of the large-scale, possibly state-sponsored cyber attack using Claude Code really drove home for me how much we are going to learn about the capabilities of new models over time once they are deployed. Sonnet 4.5's system card would have suggested this wasn't possible yet. It described Sonnet 4.5's cyber capabilities like this:

I think it's clear, based on the news of this cyber attack, that mostly-autonomous and advanced cyber operations are possible with Sonnet 4.5. From the report:

What's even worse is that Sonnet 4.5 wasn't even released at the time of the cyber attack. That means this capability emerged in a previous generation of Anthropic model, presumably Opus 4.1 but possibly Sonnet 4. Sonnet 4.5 is likely more capable of large-scale cyber attacks than whatever model did this, since its system card notes that it performs better on cyber attack evals than any previous Anthropic model.

If this case is any guide, I imagine that when new models are released, we are going to continue to discover new capabilities of those models for months and maybe even years into the future. What's especially concerning to me is that Anthropic's team underestimated this dangerous capability in its system card. Increasingly, my expectation is that system cards are understating capabilities, at least in some regards. In the future, misunderstanding of emergent capabilities could have even more serious consequences. I am updating my beliefs towards near-term jumps in AI capabilities being dangerous and harmful, since these jumps in capability could go undetected at the time of model release.
Drake Thomas · 2d · 9317
breaker25, Zach Stein-Perlman
2
A few months ago I spent $60 ordering the March 2025 version of Anthropic's certificate of incorporation from the state of Delaware, and last week I finally got around to scanning and uploading it. Here's a PDF! After writing most of this shortform, I discovered while googling related keywords that someone had already uploaded the 2023-09-21 version online here, which is slightly different.

I don't particularly bid that people spend their time reading it; it's very long and dense, and I predict that most people trying to draw important conclusions from it who aren't already familiar with corporate law (including me) will end up somewhat confused by default. But I'd like more transparency about the corporate governance of frontier AI companies, and this is an easy step.

Anthropic uses a bunch of different phrasings of its mission across various official documents; of these, I believe the COI's is the most legally binding one, which says that "the specific public benefit that the Corporation will promote is to responsibly develop and maintain advanced AI for the long term benefit of humanity." I like this wording less than others Anthropic has used, like "Ensure the world safely makes the transition through transformative AI", though I don't expect it to matter terribly much.

I think the main thing this sheds light on is stuff like Maybe Anthropic's Long-Term Benefit Trust Is Powerless: as of late 2025, overriding the LTBT takes 85% of voting stock, or all of (a) 75% of founder shares, (b) 50% of series A preferred, and (c) 75% of non-series-A voting preferred stock. (And, unrelated to the COI but relevant to that post, it is now public that neither Google nor Amazon hold voting shares.)

The only thing I'm aware of in the COI that seems concerning to me re: the Trust is a clause added sometime between the 2023 and 2025 editions, namely the italicized portion of the following: I think this means that the 3 LTBT-appointed directors do not have the abi
Wei Dai · 15h* · 175
TsviBT, Eli Tyre, and 2 more
7
Today I was author-banned for the first time, without warning and as a total surprise to me, ~8 years after banning power was given to authors, but less than 3 months since @Said Achmiz was removed from LW. It seems to vindicate my fear that LW would slide towards a more censorious culture if the mods went through with their decision. Has anyone noticed any positive effects, BTW? Has anyone who stayed away from LW because of Said rejoined?

Edit: In addition to the timing, I do not recall previously seeing a ban based on just one interaction/thread, rather than some long-term pattern of behavior. Also, I'm not linking the thread because IIUC the mods do not wish to see authors criticized for exercising their mod powers, and I also don't want to criticize the specific author. I'm worried about the overall cultural trend caused by admin policies/preferences, not trying to apply pressure to the author who banned me.
Thomas Kwa · 3h · 30
0
Blue Origin just landed their New Glenn rocket on a barge! This was an orbital mission to Mars, which makes them the second company to land an orbital booster. For context, New Glenn has over twice the payload (45 tons) of Falcon 9 (~18 tons reusable), and half that of the next version of Starship (100 tons). Not many were expecting this. SpaceX first did this in December 2015, nearly 10 years ago, but going forward their lead will likely be smaller. The next milestone is landing the same booster five times (SpaceX reached this in June 2020), and between Blue, Rocket Lab's Neutron, and numerous Chinese companies, I would be surprised if it takes until 2030.
Lao Mein · 6h · 62
0
>No paper, not even a pre-print
>All news articles link to a not-yet-released documentary as the sole source. It doesn't even have a writeup or summary.
>The company that made it is known for making "docu-dramas"
>No raw data
>Kallmann Syndrome primarily used to mock Hitler for having a micropenis

Yeah, I don't think the Hitler DNA stuff is legit.
Cleo Nardo · 1d* · 2112
Sheikh Abdur Raheem Ali, gwern, and 2 more
5
Remember Bing Sydney? I don't have anything insightful to say here. But it's surprising how little people mention Bing Sydney. If you ask people for examples of misaligned behaviour from AIs, they might mention:

* Sycophancy from 4o
* Goodharting unit tests from o3
* Alignment-faking from Opus 3
* Blackmail from Opus 4

But like, three years ago, Bing Sydney. The most powerful chatbot was connected to the internet and — unexpectedly, without provocation, apparently contrary to its training objective and prompting — threatening to murder people! Are we memory-holing Bing Sydney, or are there good reasons for not mentioning it more? Here are some extracts from Bing Chat is blatantly, aggressively misaligned (Evan Hubinger, 15th Feb 2023).
GradientDissenter · 2d* · 30-8
Zack_M_Davis, lesswronguser123, and 2 more
8
LessWrong feature request: make it easy for authors to opt out of having their posts in the training data.

If most smart people were put in the position of a misaligned AI and tried to take over the world, I think they’d be caught and fail.[1] If I were a misaligned AI, I think I’d have a much better shot at succeeding, largely because I’ve read lots of text about how people evaluate and monitor models, strategies schemers can use to undermine evals and take malicious actions without being detected, and creative paths to taking over the world as an AI. A lot of that information is from LessWrong.[2]

It's unfortunate that this information will probably wind up in the pre-training corpus of new models (though it is often still worth it overall to share most of this information[3]). LessWrong could easily change this for specific posts! They could add something to their robots.txt to ask crawlers looking to scrape training data to ignore the pages. They could add canary strings to the page invisibly. (They could even go a step further and add something like copyrighted song lyrics to the page invisibly.) If they really wanted, they could put the content of a post behind a captcha for users who aren’t logged in. This system wouldn't be perfect (edit: please don't rely on these methods. They're harm-reduction for information where you otherwise would have posted without any protections), but I think even reducing the odds or the quantity of this data in the pre-training corpus could help.

I would love to have this as a feature at the bottom of drafts. I imagine a box I could tick in the editor that would enable this feature (and maybe let me decide if I want the captcha part or not). Ideally the LessWrong team could prompt an LLM to read users’ posts before they hit publish. If it seems like the post might be something the user wouldn't want models trained on, the site could proactively ask the user if they want to have their post be remove
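To make the robots.txt part of this suggestion concrete, here is a minimal sketch. The path is hypothetical, and the user-agents shown (GPTBot for OpenAI, CCBot for Common Crawl) are just examples of training-data crawlers that are known to read robots.txt; the directives only deter crawlers that choose to honor them.

```
# Hypothetical robots.txt sketch: ask training-data crawlers to skip an opted-out post.
# Compliance is voluntary; this only affects crawlers that respect robots.txt.

User-agent: GPTBot    # OpenAI's training-data crawler
Disallow: /posts/example-opted-out-post/

User-agent: CCBot     # Common Crawl, a common source of pretraining corpora
Disallow: /posts/example-opted-out-post/

# All other crawlers are unaffected.
User-agent: *
Disallow:
```

Canary strings work differently: a distinctive token embedded in the page (visibly or not) lets labs filter the document out of training sets and lets researchers later test whether a model has memorized it, but it likewise depends on labs choosing to respect the convention.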
494 · Welcome to LessWrong! · Ruby, Raemon, RobertM, habryka · 6y · 76
62 · Human Values ≠ Goodness · johnswentworth · 1d · 54
336 · Legible vs. Illegible AI Safety Problems · Ω · Wei Dai · 4d · 92
42 · Solstice Season 2025: Ritual Roundup & Megameetups · Raemon · 7d · 7
126 · Paranoia: A Beginner's Guide · habryka · 17h · 6
336 · Legible vs. Illegible AI Safety Problems · Ω · Wei Dai · 4d · 92
120 · Please, Don't Roll Your Own Metaethics · Ω · Wei Dai · 1d · 28
745 · The Company Man · Tomás B. · 2mo · 70
304 · I ate bear fat with honey and salt flakes, to prove a point · aggliu · 10d · 50
304 · Why I Transitioned: A Case Study · Fiora Sunshine · 12d · 53
694 · The Rise of Parasitic AI · Adele Lopez · 2mo · 178
194 · Unexpected Things that are People · Ben Goldhaber · 5d · 11
112 · Do not hand off what you cannot pick up · habryka · 2d · 16
100 · How I Learned That I Don't Feel Companionate Love · johnswentworth · 2d · 24
110 · The problem of graceful deference · TsviBT · 3d · 34
52 · What's so hard about...? A question worth asking · Ruby · 20h · 3
Agentic property-based testing: finding bugs across the Python ecosystem · Thu Nov 13 • Toronto
Rationalist Shabbat · Fri Nov 14 • Rockville
Berkeley Solstice Weekend · Fri Dec 5 • Berkeley
2025 NYC Secular Solstice & East Coast Rationalist Megameetup · Fri Dec 19 • New York