
LESSWRONG


Harry Potter and the Methods of Rationality

What if Harry was a scientist? What would you do if the universe had magic in it? A story that conveys many rationality concepts, helping to make them more visceral, intuitive, and emotionally compelling. First post: Chapter 1: A Day of Very Low Probability.

Popular Comments

Anon User · 13h
Do not hand off what you cannot pick up
I do not see how this has any chance at scaling. Who sits at the root of the delegation tree? The CEO? And they are spending all their time doing things they do not know how to do (as your rule does not allow them to delegate those tasks, and presumably there are enough of them to take up all their time)? That does not sound to me like what competent delegation should look like. And being able to do X vs. being able to evaluate someone else doing X are of course related, but still quite different skills.
Wei Dai · 13h
The problem of graceful deference
> Yudkowsky, being the best strategic thinker on the topic of existential risk from AGI

This seems strange to say, given that he:

1. decided to aim for "technological victory", without acknowledging or being sufficiently concerned that it would inspire others to do the same
2. decided it's feasible to win the AI race with a small team and while burdened by Friendliness/alignment/x-safety concerns
3. overestimated the likely pace of progress relative to the difficulty of the problems, even on narrow problems that he personally focused on like decision theory (still far from solved today, ~16 years later; edit: see UDT shows that decision theory is more puzzling than ever)
4. had large responsibility for others being overly deferential to him, by writing/talking in a highly confident style and not explicitly pushing back on the over-deference
5. is still overly focused on one particular AI x-risk (takeover due to misalignment) while underemphasizing or ignoring many other disjunctive risks

These seemed like obvious mistakes even at the time (I wrote posts/comments arguing against them), so I feel like the over-deference to Eliezer is a completely different phenomenon from "But you can't become a simultaneous expert on most of the questions that you care about.", or has very different causes. In other words, if you were going to spend your career on AI x-safety, of course you could have become an expert on these questions first.
cousin_it · 1d
Universal Basic Income in an AGI Future
Thank you for writing this! I think a lot of people miss this point, and keep talking about UBI in the AI future without being clear which power bloc will ensure UBI continues existing, and why.

However, I'd like to make a big correction to this. Your point exactly matches my thinking until a few months ago. Then I realized something that changes it a lot, and is also, I think, crucial to understand. Namely, elites have always needed the labor of the masses. The labor of serfs was needed, the labor of slaves was needed. That circumstance kept serfs and slaves alive, but not in an especially good position. The masses were exploited by elites throughout most of history. And it doesn't depend on economic productivity either: a slave in a diamond mine can have very high productivity by the numbers, but still be enslaved.

The circumstance that changed things, and made the masses in Western countries enjoy (temporarily) a better position than serfs in the past, was the military relevance of the masses. It started with the invention of firearms. A peasant with a gun can be taught to shoot a knight dead, and knights correctly saw even at the time that this would erode their position. I'm not talking about rebellion here (rebellions by the masses against the elites have always been very hard), but rather about whether the masses are needed militarily for large-scale conflicts.

And given military relevance, economic productivity isn't actually that important. It's possible to have a leisure class that doesn't do much work except for being militarily relevant; knights are a good example. It's actually pretty hard to find historical examples of classes that were militarily relevant but treated badly. Even warhorses were treated much better than peasant horses. Being useful keeps you alive, but exploited; being dangerous is what keeps you alive and treated well.

If we by some miracle end up with a world where the masses of people remain militarily relevant, but not needed for productive work, then I can imagine the entire masses becoming such a leisure class. That'd be a nice future if we could get it. However, as you point out, the future will have not just AI labor, but AI armies as well. Ensuring the military relevance of the masses seems just as difficult as ensuring their economic relevance. So my comment, unfortunately, isn't replacing the problem with an easier one; just with a different one.
[Today] AGI Forum @ Purdue University
[Tomorrow] Agentic property-based testing: finding bugs across the Python ecosystem
494 · Welcome to LessWrong! · Ruby, Raemon, RobertM, habryka · 6y · 76 comments
Quick Takes

Drake Thomas · 22h
A few months ago I spent $60 ordering the March 2025 version of Anthropic's certificate of incorporation from the state of Delaware, and last week I finally got around to scanning and uploading it. Here's a PDF! After writing most of this shortform, I discovered while googling related keywords that someone had already uploaded the 2023-09-21 version online here, which is slightly different.

I don't particularly bid that people spend their time reading it; it's very long and dense, and I predict that most people trying to draw important conclusions from it who aren't already familiar with corporate law (including me) will end up somewhat confused by default. But I'd like more transparency about the corporate governance of frontier AI companies, and this is an easy step.

Anthropic uses a bunch of different phrasings of its mission across various official documents; of these, I believe the COI's is the most legally binding one, which says that "the specific public benefit that the Corporation will promote is to responsibly develop and maintain advanced AI for the long term benefit of humanity." I like this wording less than others that Anthropic has used, like "Ensure the world safely makes the transition through transformative AI", though I don't expect it to matter terribly much.

I think the main thing this sheds light on is stuff like Maybe Anthropic's Long-Term Benefit Trust Is Powerless: as of late 2025, overriding the LTBT takes 85% of voting stock, or all of (a) 75% of founder shares, (b) 50% of series A preferred, and (c) 75% of non-series-A voting preferred stock. (And, unrelated to the COI but relevant to that post, it is now public that neither Google nor Amazon holds voting shares.)

The only thing I'm aware of in the COI that seems concerning to me re: the Trust is a clause added sometime between the 2023 and 2025 editions, namely the italicized portion of the following:

I think this means that the 3 LTBT-appointed directors do not have the abi
GradientDissenter · 13h
LessWrong feature request: make it easy for authors to opt out of having their posts in the training data.

If most smart people were put in the position of a misaligned AI and tried to take over the world, I think they'd be caught and fail.[1] If I were a misaligned AI, I think I'd have a much better shot at succeeding, largely because I've read lots of text about how people evaluate and monitor models, strategies schemers can use to undermine evals and take malicious actions without being detected, and creative paths to taking over the world as an AI. A lot of that information is from LessWrong.[2] It's unfortunate that this information will probably wind up in the pre-training corpus of new models (though it is often still worth it overall to share most of this information[3]).

LessWrong could easily change this for specific posts! They could add something to their robots.txt to ask crawlers looking to scrape training data to ignore those pages. They could add canary strings to the page invisibly. (They could even go a step further and add something like copyrighted song lyrics to the page invisibly.) If they really wanted, they could put the content of a post behind a captcha for users who aren't logged in. This system wouldn't be perfect (edit: please don't rely on these methods; they're harm reduction for information you would otherwise have posted without any protections), but I think even reducing the odds or the quantity of this data in the pre-training corpus could help.

I would love to have this as a feature at the bottom of drafts. I imagine a box I could tick in the editor that would enable this feature (and maybe let me decide if I want the captcha part or not). Ideally the LessWrong team could prompt an LLM to read users' posts before they hit publish. If it seems like the post might be something the user wouldn't want models trained on, the site could proactively ask the user if they want to have their post be remove
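For concreteness, here is a minimal sketch of what the robots.txt part of this proposal could look like, assuming a hypothetical per-post opt-out path; the user-agent tokens shown (GPTBot, ClaudeBot, CCBot, Google-Extended) are ones crawl operators have documented for AI/training-data collection, and the exact list, as well as whether any crawler honors it, is an assumption rather than something LessWrong has implemented:

```
# Sketch of per-post opt-out directives (hypothetical post slug).
# A group may list several user-agents followed by shared rules.
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: CCBot
User-agent: Google-Extended
Disallow: /posts/EXAMPLE-OPTED-OUT-POST-SLUG
```

Since this only deters crawlers that choose to respect robots.txt, it would complement rather than replace the canary-string and captcha ideas above, consistent with the "harm reduction" framing in the edit.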
Simon Lermen · 18h
The Term Recursive Self-Improvement Is Often Used Incorrectly

Also on my substack. The term Recursive Self-Improvement (RSI) now often gets used for any case in which AI automates AI R&D. I believe this is importantly different from its original meaning, and the difference changes some of the key consequences. OpenAI has stated that their goal is recursive self-improvement, with projections of hundreds of thousands of automated AI R&D researchers by next year and full AI researchers by 2028. This appears to be AI-automated AI research rather than RSI in the narrow sense.

When Eliezer Yudkowsky discussed RSI in 2008, he was referring specifically to an AI instance improving itself by rewriting the cognitive algorithm it is running on—what he described as "rewriting your own source code in RAM." According to the LessWrong wiki, RSI refers to "making improvements on one's own ability of making self-improvements." However, current AI systems have no special insight into their own opaque functioning. Automated R&D might mostly consist of curating data, tuning parameters, and improving RL environments to try to hill-climb evaluations, much like human researchers do. Eliezer concluded that RSI (in the narrow sense) would almost certainly lead to fast takeoff. The situation is more complex for AI-automated R&D, where the AI does not understand what it is doing. I still expect AI-automated R&D to substantially speed up AI development.

Why This Distinction Matters

Eliezer described the critical transition as when "the AI's metacognitive level has now collapsed to identity with the AI's object level." I believe he was basically imagining something like the human mind and evolution merging toward the same goal—the process that designs the cognitive algorithm and the cognitive algorithm itself merging. As an example, imagine the model realizes that its working memory is too small to be very effective at R&D and directly edits its working memory. This appears less likely if the AI r
Mo Putera · 14h
Interesting anecdotes from an ex-SpaceX engineer who started out thinking "Elon's algorithm" was obviously correct and gradually grew cynical as SpaceX scaled: This makes me wonder if SpaceX could actually be substantially faster if it took systems engineering as seriously as the author hoped (like say the Apollo program did), overwhelmingly dominant as they currently are in terms of mass launch fraction etc. To quote the author:
Mo Putera · 15h
The ever-colorful Peter Watts on how science works because of, not despite, scientists being asses: (This might be biased by the fields Watts is familiar with and by his own tendency to seek fights, though; cf. Scott's different worlds. I don't get the sense that this is universal or all that effectiveness-improving at finding out the truth of the matter.)
dynomight · 4d
Just had this totally non-dystopian conversation:
"...So for other users, I spent a few hours helping [LLM] understand why it was wrong about tariffs."
"Noooo! That does not work."
"Relax, it thanked me and stated it was changing its answer."
"It's lying!"
"No, it just confirmed that it's not lying."
Daniel Paleka · 5d
Slow takeoff for AI R&D, fast takeoff for everything else

Why is AI progress so much more apparent in coding than everywhere else? Among people who have "AGI timelines", most do not set their timelines based on data, but rather update them based on their own day-to-day experiences and social signals. As of 2025, my guess is that individual perception of AI progress correlates with how closely someone's daily activities resemble how an AI researcher spends their time.

The reason users of coding agents feel a higher rate of automation in their bones, whereas people in most other occupations don't, is that automating engineering has been the focus of the industry for a while now. Despite the expectations for 2025 to be the year of the AI agent, it turns out the industry is small and cannot have too many priorities, hence basically the only competent agents we got in 2025 so far are coding agents. Everyone serious about winning the AI race is trying to automate one job: AI R&D. To a first approximation, there is no point yet in automating anything else, except to raise capital (human or investment) or to earn money. Until you are hitting diminishing returns on your rate of acceleration, unrelated capabilities are not a priority.

This means that a lot of pressure is being applied to AI research tasks at all times, and that all delays in automation of AI R&D are, in a sense, real in a way that's not necessarily the case for tasks unrelated to AI R&D. It would be odd if there were easy gains to be made in accelerating the work of AI researchers on frontier models in addition to what is already being done across the industry.

I don't know whether automating AI research is going to be smooth all the way there or not; my understanding is that slow vs fast takeoff hinges significantly on how bottlenecked we become by non-R&D factors over time. Nonetheless, the above suggests a baseline expectation: AI research automation will advance more steadily compared to auto
41 · Solstice Season 2025: Ritual Roundup & Megameetups · Raemon · 6d · 6 comments
315 · Legible vs. Illegible AI Safety Problems [Ω] · Wei Dai · 3d · 92 comments
302 · I ate bear fat with honey and salt flakes, to prove a point · aggliu · 9d · 49 comments
745 · The Company Man · Tomás B. · 2mo · 70 comments
187 · Unexpected Things that are People · Ben Goldhaber · 4d · 10 comments
302 · Why I Transitioned: A Case Study · Fiora Sunshine · 11d · 53 comments
693 · The Rise of Parasitic AI · Adele Lopez · 2mo · 178 comments
64 · How I Learned That I Don't Feel Companionate Love · johnswentworth · 16h · 7 comments
129 · Condensation [Ω] · abramdemski · 3d · 13 comments
69 · Do not hand off what you cannot pick up · habryka · 14h · 10 comments
90 · The problem of graceful deference · TsviBT · 2d · 20 comments
157 · Mourning a life without AI · Nikola Jurkovic · 5d · 59 comments
100 · From Vitalik: Galaxy brain resistance · Gabriel Alfour · 2d · 1 comment
44 · Human Values ≠ Goodness · johnswentworth · 23m · 32 comments
Berkeley Solstice Weekend
2025 NYC Secular Solstice & East Coast Rationalist Megameetup