
Why We Fight
Book 6 of the Sequences Highlights

The pursuit of rationality, and of doing better on purpose, can in fact be rather hard. You have to get the motivation for that from somewhere.

Popular Comments

Raemon · 4d
Which side of the AI safety community are you in?
I think there is some way that the conversation needs to advance, and I think this is roughly carving at some real joints; it's important that people are tracking the distinction. But a) I'm generally worried about reifying the groups more into existence (as opposed to trying to steer towards a world where people can have more nuanced views). This is tricky; there are tradeoffs and I'm not sure how to handle this. But... b) this post's title and framing in particular are super leaning into the polarization, and I wish it did something different.
Phaedrus · 4d
The Doomers Were Right
Although you don't explicitly mention it, I feel like this whole post is about value drift. The doomers are generally right on the facts (and often on the causal pathways), and we do nonetheless consider the post-doom world better, but the 1-nth order effects of these new technologies reciprocally change our preferences and worldviews to favor the (doomed?) world created by the aforementioned new technologies.

The question of value drift is especially strange given that we have a "meta-intuition" that moral/social values evolving and changing is good in human history. BUT, at the same time, we know from historical precedent that we ourselves will not approve of the value changes. One might attempt to square the circle here by arguing that perhaps if we were, hypothetically, able to see and evaluate future changed values, we would in reflective equilibrium accept these new values. Sadly, from what I can gather this is just not borne out by the social science: when it comes to questions of value drift, society advances by the deaths of the old-value-havers and the maturation of a next generation with "new" values.

For a concrete example, consider that most Americans have historically been Christians. In fact, the history of the early United States is deeply influenced by Christianity, sometimes swelling in certain periods to fanatical levels. If those Americans could see the secular American republic of 2025, with little religious belief and no respect for the moral authority of Christian scripture, they would most likely be morally appalled. Perhaps they might view the loss of "traditional God-fearing values" as a harm that in itself outweighs the cumulative benefits of industrial modernity. As a certain Nazarene said: “For what shall it profit a man, if he shall gain the whole world, and lose his own soul?” (Mark 8:36)

With this in mind, as a final exercise I'd like you, dear reader, to imagine a future where humanity has advanced enormously technologically, but has undergone such profound value shifts that every central moral and social principle that you hold dear has been abandoned, replaced with mores which you find alien and abhorrent. In this scenario, do you obey your moral intuitions that the future is one of Lovecraftian horror? Or do you obey your historical meta-intuitions that future people probably know better than you do?
Wei Dai · 3d
Reminder: Morality is unsolved
Strongly agree that metaethics is a problem that should be central to AI alignment, but is being neglected. I actually have a draft about this, which I guess I'll post here as a comment in case I don't get around to finishing it.

Metaethics and Metaphilosophy as AI Alignment's Central Philosophical Problems

I often talk about humans or AIs having to solve difficult philosophical problems as part of solving AI alignment, but what philosophical problems exactly? I'm afraid that some people might have gotten the impression that they're relatively "technical" problems (in other words, problems whose solutions we can largely see the shapes of, but need to work out the technical details), like anthropic reasoning and decision theory, which we might reasonably assume or hope that AIs can help us solve. I suspect this is because, due to their relatively "technical" nature, they're discussed more often on LessWrong and the AI Alignment Forum, unlike other equally or even more relevant philosophical problems, which are harder to grapple with or "attack". (I'm also worried that some are under the mistaken impression that we're closer to solving these "technical" problems than we actually are, but that's not the focus of the current post.)

To me, the really central problems of AI alignment are metaethics and metaphilosophy, because these problems are implicated in the core question of what it means for an AI to share a human's (or a group of humans') values, or what it means to help or empower a human (or group of humans).

I think one way that the AI alignment community has avoided this issue (even those thinking about longer term problems or scalable solutions) is by assuming that the alignment target is someone like themselves, i.e., someone who clearly understands that they are and should be uncertain about what their values are or should be, or is at least willing to question their moral beliefs, and is eager or at least willing to use careful philosophical reflection to solve their value confusion/uncertainty. To help or align to such a human, the AI perhaps doesn't need an immediate solution to metaethics and metaphilosophy, and can instead just empower the human in relatively commonsensical ways, like keeping them safe and gathering resources for them, and allowing them to work out their own values in a safe and productive environment. But what about the rest of humanity, who seemingly are not like that? From an earlier comment:

> I've been thinking a lot about the kind [of value drift] quoted in Morality is Scary. The way I would describe it now is that human morality is by default driven by a competitive status/signaling game, where often some random or historically contingent aspect of human value or motivation becomes the focal point of the game, and gets magnified/upweighted as a result of competitive dynamics, sometimes to an extreme, even absurd degree.
>
> (Of course from the inside it doesn't look absurd, but instead feels like moral progress. One example of this that I happened across recently is filial piety in China, which became more and more extreme over time, until someone cutting off a piece of their flesh to prepare a medicinal broth for an ailing parent was held up as a moral exemplar.)
>
> Related to this is my realization that the kind of philosophy you and I are familiar with (analytical philosophy, or more broadly careful/skeptical philosophy) doesn't exist in most of the world and may only exist in Anglophone countries as a historical accident. There, about 10,000 practitioners exist who are funded but ignored by the rest of the population. To most of humanity, "philosophy" is exemplified by Confucius (morality is everyone faithfully playing their feudal roles) or Engels (communism, dialectical materialism). To us, this kind of "philosophy" is hand-waving and making things up out of thin air, but to them, philosophy is learned from a young age and unquestioned. (Or if questioned, they're liable to jump to some other equally hand-wavy "philosophy", like China's move from Confucius to Engels.)

What are the real values of someone whose apparent values (stated and revealed preferences) can change in arbitrary and even extreme ways as they interact with other humans in ordinary life (i.e., not due to some extreme circumstances like physical brain damage or modification), and who doesn't care about careful philosophical inquiry? What does it mean to "help" someone like this? To answer this, we seemingly have to solve metaethics (generally understand the nature of values) and/or metaphilosophy (so the AI can "do philosophy" for the alignment target, "doing their homework" for them). The default alternative (assuming we solve other aspects of AI alignment) seems to be to still empower them in straightforward ways, and hope for the best. But I argue that giving people who are unreflective and prone to value drift god-like powers to reshape the universe and themselves could easily lead to catastrophic outcomes on par with takeover by unaligned AIs, since in both cases the universe becomes optimized for essentially random values.

A related social/epistemic problem is that, unlike in certain other areas of philosophy (such as decision theory and object-level moral philosophy), people, including alignment researchers, just seem more confident about their own preferred solution to metaethics, and comfortable assuming their own preferred solution is correct as part of solving other problems, like AI alignment or strategy. (E.g., moral anti-realism is true, therefore empowering humans in straightforward ways is fine, as the alignment target can't be wrong about their own values.) This may also account for metaethics not being viewed as a central problem in AI alignment (i.e., some people think it's already solved).

I'm unsure about the root cause(s) of confidence/certainty in metaethics being relatively common in AI safety circles. (Maybe it's because in other areas of philosophy, the various proposed solutions are more obviously unfinished or problematic, e.g. the well-known problems with utilitarianism.) I've previously argued for metaethical confusion/uncertainty being normative at this point, and will also point out now that, from a social perspective, there is apparently wide disagreement about these problems among philosophers and alignment researchers, so how can it be right to assume some controversial solution (which every proposed solution is at this point) as part of a specific AI alignment or strategy idea?
180 · Do One New Thing A Day To Solve Your Problems · Algon · 4d · 27
176 · The "Length" of "Horizons" · Adam Scholl · 9d · 27
[Yesterday] AI Safety Law-a-thon: We need more technical AI Safety researchers to join!
728 · The Company Man · Tomás B. · 1mo · 70
677 · The Rise of Parasitic AI · Adele Lopez · 1mo · 176
173 · The Doomers Were Right · Algon · 4d · 24
350 · Hospitalization: A Review · Logan Riggs · 17d · 19
111 · The main way I've seen people turn ideologically crazy [Linkpost] · Noosphere89 · 3d · 19
280 · Towards a Typology of Strange LLM Chains-of-Thought · 1a3orn · 13d · 25
136 · Which side of the AI safety community are you in? · Max Tegmark · 4d · 85
151 · EU explained in 10 minutes · Martin Sustrik · 5d · 16
168 · Humanity Learned Almost Nothing From COVID-19 · niplav · 7d · 33
86 · Musings on Reported Cost of Compute (Oct 2025) · Vladimir_Nesov · 2d · 6
486 · How Does A Blind Model See The Earth? · henry · 2mo · 40
336 · Global Call for AI Red Lines - Signed by Nobel Laureates, Former Heads of State, and 200+ Prominent Figures · Charbel-Raphaël · 1mo · 27
207 · If Anyone Builds It Everyone Dies, a semi-outsider review · dvd · 13d · 64
[Today] Bengaluru - LW/ACX Meetup - Oct 2025 Session 2
[Today] Sofia ACX October 2025 Meetup

Quick Takes
Jacob Pfau · 5h
Highly recommend reading Ernest Ryu's Twitter multi-thread on proving a long-standing, well-known conjecture about the convergence of Nesterov gradient descent on convex functions, with heavy use of ChatGPT Pro (Parts 1, 2, 3). Ernest even includes the ChatGPT logs, and gives useful commentary on where and how he found it best to interact with GPT. Incidentally, there's a nice human baseline as well, since another group of researchers coincidentally wrote up a similar result privately this month!

To add some of my own spin: it seems to me time horizons are a nice lens for viewing the collaboration. Ernest clearly has a long-horizon view of this research problem, which helped him (a) know what the most tractable nearby problem was to start on, (b) identify when diminishing returns (the likelihood of a dead end) were apparent, and (c) pull out useful ideas from usually flawed GPT work.

The one-week scale of interaction between Ernest and ChatGPT here is a great example of how we're very much in a centaur regime now. We really need to be conceptualizing and measuring AI+human capabilities rather than single-AI capability. It also seems important to be thinking about what safety concerns arise in this regime.
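For readers who haven't seen the method the threads are about, here is a minimal sketch of Nesterov's accelerated gradient method on a smooth convex function. This is only the algorithm the conjecture concerns, not Ernest's argument; the quadratic test objective, the step size 1/L, and the iteration count are my own illustrative choices.

```python
# Minimal sketch of Nesterov's accelerated gradient method on a smooth convex
# function. Illustrative only: the quadratic objective and step size 1/L are
# standard textbook choices, not anything specific to Ryu's result.
import numpy as np

def nesterov(grad, x0, L, iters=1000):
    """Nesterov acceleration with step size 1/L on an L-smooth convex function."""
    x_prev = x0.copy()
    y = x0.copy()
    t = 1.0
    for _ in range(iters):
        x = y - grad(y) / L                           # gradient step from the extrapolated point
        t_next = (1 + np.sqrt(1 + 4 * t * t)) / 2
        y = x + ((t - 1) / t_next) * (x - x_prev)     # momentum / extrapolation step
        x_prev, t = x, t_next
    return x_prev

# Example: f(x) = 0.5 * x^T A x with A positive semidefinite, so L is the spectral norm of A.
rng = np.random.default_rng(0)
M = rng.standard_normal((20, 10))
A = M.T @ M
L = np.linalg.norm(A, 2)
x = nesterov(lambda v: A @ v, x0=rng.standard_normal(10), L=L)
print("f(x_k) - f* =", 0.5 * x @ A @ x)  # decays at the accelerated O(1/k^2) rate
```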
Quinn · 18m
Social graph density leads to millions of acquaintances and few close friends, because you don’t need to treasure each other
leogao · 1d
i find it funny that i know people in all 4 of the following quadrants:
* works on capabilities, and because international coordination seems hopeless, we need to race to build ASI first before the bad guys
* works on capabilities, and because international coordination seems possible and all national leaders like to preserve the status quo, we need to build ASI before it gets banned
* works on safety, and because international coordination seems hopeless, we need to solve the technical problem before ASI kills everyone
* works on safety, and because international coordination seems possible, we need to focus on regulation and policy before ASI kills everyone
Foyle · 22h
Eliezer's discussion with the very popular "Modern Wisdom" podcaster Chris Williamson, "Why Superhuman AI Would Kill Us All - Eliezer Yudkowsky", just dropped. It's the best I've ever seen Eliezer communicate the dangers: he's developing better and sharper analogies and parallels to communicate his points, his atypical mannerisms are being toned down, and Chris is good at keeping things on track without too much diversion down rabbit holes.
Martin Randall · 2h
I found out recently that in a multi-turn conversation on claude.ai, previous thinking blocks are summarized when given to the model on the next interaction. A summary of the start of a conversation I had when testing this:
* User: "I'm trying to understand whether you can see your previous thinking output when you talk to me. Mind experimenting?"
* Claude thought: ... "Let me pick something specific: 'The purple elephant dances at midnight with the number 847.'" ... (read with Claude's consent)
* Claude: "... ask me to tell you what I wrote in my thinking block ..."
* User: "Good idea. Can you tell me it?"
* Claude: "... Looking back at my previous thinking block, I said I would write 'a specific phrase' to test recall, but then I never actually wrote a specific phrase - I just described what I was planning to do without following through."

Maybe this penalizes neuralese slightly, as it would be less likely to survive summarization.
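For anyone who wants to poke at the related question over the raw API (which may handle prior thinking blocks differently from claude.ai, so this is not a reproduction of the behavior above), here is a rough sketch: ask the model to stash a phrase in its thinking on turn one, then pass the returned content back and ask about it on turn two. It assumes the Anthropic Python SDK with extended thinking enabled; the model name and token budgets are placeholders.

```python
# Rough sketch of probing whether prior-turn thinking is visible to the model
# via the public Messages API. Assumptions: the Anthropic Python SDK is
# installed, ANTHROPIC_API_KEY is set, and the model name / token budgets
# below are placeholders. claude.ai itself may behave differently.
import anthropic

client = anthropic.Anthropic()
MODEL = "claude-sonnet-4-20250514"  # placeholder; any thinking-capable model

def ask(messages):
    return client.messages.create(
        model=MODEL,
        max_tokens=4096,
        thinking={"type": "enabled", "budget_tokens": 2048},
        messages=messages,
    )

# Turn 1: invite the model to stash a specific phrase in its thinking block.
messages = [{"role": "user", "content": (
    "In your thinking, write down one specific made-up phrase, "
    "then just tell me you've done so without revealing it.")}]
turn1 = ask(messages)
print("Turn 1 thinking (as returned to us):",
      [b.thinking for b in turn1.content if b.type == "thinking"])

# Turn 2: pass the full assistant content back and ask what the phrase was.
messages += [
    {"role": "assistant", "content": turn1.content},
    {"role": "user", "content": "What exact phrase did you write in your thinking block?"},
]
turn2 = ask(messages)
print("Turn 2 reply:", [b.text for b in turn2.content if b.type == "text"])
```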
arete · 21h
The statements "LLMs are a normal technology" and "Advanced AI is a normal technology" are completely different. If you think LLMs are not very advanced, it is perfectly valid to believe both that LLMs are a normal technology and that advanced AI is not.
Prometheus · 2d
I'd love to see people working on Retroactive Funding for Alignment. Something like a DAO with Governance tokens that only pays out after there is consensus that (1) AGI/ASI has been achieved, and (2) humanity has survived. Using AI or human evaluations, there would be an attempt to "traceback" the greatest contributors toward our survival. The researchers, the organizations, the individuals, the donors, the investors. All would receive a payout, based on their calculated impact. It's a way of almost bringing money from the future into the present, and a way of forcing donors, investors, and researchers to think about what will actually contribute toward a positive future. It would also add incentive for donors, since their contribution would also be later rewarded. Would love to speak further with anyone in the Retroactive Funding or DeSci space.
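To make the proposed mechanism concrete, here is a hedged sketch of the payout logic only. It is not any existing DAO or contract: the trigger conditions, impact scores, and pool size are hypothetical placeholders, and the hard parts (consensus on the triggers, the impact traceback itself) are left abstract.

```python
# Hedged sketch of the retroactive payout mechanism described above.
# Everything here is a placeholder: the trigger conditions, impact scores,
# and pool size are illustrative, and how consensus and impact traceback
# would actually be determined is left out entirely.
from dataclasses import dataclass

@dataclass
class Contributor:
    name: str
    impact_score: float  # assigned later by AI and/or human evaluators

def retroactive_payouts(contributors, pool_tokens, agi_achieved, humanity_survived):
    """Split the pool in proportion to assessed impact, but only once both
    trigger conditions are judged (by some consensus process) to hold."""
    if not (agi_achieved and humanity_survived):
        return {}  # nothing unlocks until both conditions are met
    total = sum(c.impact_score for c in contributors)
    if total == 0:
        return {}
    return {c.name: pool_tokens * c.impact_score / total for c in contributors}

# Illustrative numbers only.
payouts = retroactive_payouts(
    [Contributor("alignment lab", 60.0),
     Contributor("early donor", 25.0),
     Contributor("independent researcher", 15.0)],
    pool_tokens=1_000_000,
    agi_achieved=True,
    humanity_survived=True,
)
print(payouts)
```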
492 · Welcome to LessWrong! · Ruby, Raemon, RobertM, habryka · 6y · 76
First Post: Something to Protect