This post is a not-so-secret analogy for the AI alignment problem. Via a fictional dialogue, Eliezer explores and counters common objections to the Rocket Alignment Problem as approached by the Mathematics of Intentional Rocketry Institute.

MIRI researchers will tell you they're worried that "right now, nobody can tell you how to point your rocket’s nose such that it goes to the moon, nor indeed any prespecified celestial destination."

I think that people who work on AI alignment (including me) have generally not put enough thought into the question of whether a world where we build an aligned AI is better by their values than a world where we build an unaligned AI. I'd be interested in hearing people's answers to this question. Or, if you want more specific questions:

* By your values, do you think a misaligned AI creates a world that "rounds to zero", or still has substantial positive value?
* A common story for why aligned AI goes well goes something like: "If we (i.e. humanity) align AI, we can and will use it to figure out what we should use it for, and then we will use it in that way." To what extent is aligned AI going well contingent on something like this happening, and how likely do you think it is to happen? Why?
* To what extent is your belief that aligned AI would go well contingent on some sort of assumption like: my idealized values are the same as the idealized values of the people or coalition who will control the aligned AI?
* Do you care about AI welfare? Does your answer depend on whether the AI is aligned? If we built an aligned AI, how likely is it that we will create a world that treats AI welfare as an important consideration? What if we build a misaligned AI?
* Do you think that, to a first approximation, most of the possible value of the future happens in worlds that are optimized for something that resembles your current or idealized values? How bad is it to mostly sacrifice each of these? (What if the future world's values are similar to yours, but it is only kinda effectual at pursuing them? What if the world is optimized for something that's only slightly correlated with your values?) How likely are these various options under an aligned AI future vs. an unaligned AI future?
Elizabeth
Check my math: how does Enovid compare to humming? Nitric oxide is an antimicrobial and immune booster. Normal nasal nitric oxide is 0.14ppm for women and 0.18ppm for men (sinus levels are 100x higher). journals.sagepub.com/doi/pdf/10.117…

Enovid is a nasal spray that produces NO. I had the damnedest time quantifying Enovid, but this trial registration says 0.11ppm NO/hour. They deliver every 8h and I think that dose is amortized, so the true dose is 0.88. But maybe it's more complicated. I've got an email out to the PI but am not hopeful about a response. clinicaltrials.gov/study/NCT05109…

So Enovid increases nasal NO levels somewhere between 75% and 600% compared to baseline, which is not shabby. Except humming increases nasal NO levels by 1500-2000%. atsjournals.org/doi/pdf/10.116… Enovid stings and humming doesn't, so it seems like Enovid should have the larger dose. But the spray doesn't contain NO itself, only compounds that react to form NO. Maybe that's where the sting comes from? Cystic fibrosis and burn patients are sometimes given stratospheric levels of NO for hours or days; if the burn from Enovid came from the NO itself, those patients would be in agony.

I'm not finding any data on humming and respiratory infections. Google Scholar gives me information on CF and COPD, and @Elicit brought me a bunch of studies about honey. Better keywords got Google Scholar to bring me a bunch of descriptions of yogic breathing with no empirical backing. There are some very circumstantial studies on illness in mouth breathers vs. nasal breathers, but that design has too many confounders for me to take seriously.

Where I'm most likely wrong:
* I misinterpreted the dosage in the RCT.
* The dosage in the RCT is lower than in Enovid.
* Enovid's dose per spray is 0.5ml, so pretty close to the new study. But it recommends two sprays per nostril, so the real dose is 2x that. Which is still not quite as powerful as a single hum.
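For anyone who wants to check the arithmetic, here's a minimal sketch of the comparison above (Python, numbers copied from this note; treating the trial's 0.11ppm/hour figure as either a single dose or an 8-hour amortized dose are both my assumptions, per the uncertainty described above):

```python
# Back-of-the-envelope check of the Enovid vs. baseline NO comparison above.
# All inputs are the numbers quoted in the text; nothing here is a pharmacology claim.
baseline_women, baseline_men = 0.14, 0.18   # ppm nasal NO at baseline
trial_rate = 0.11                           # ppm NO per hour, from the trial registration
hours_per_dose = 8                          # dosing interval

low_dose = trial_rate                       # if 0.11 is the whole dose
high_dose = trial_rate * hours_per_dose     # if the rate is amortized over 8h -> 0.88

for label, dose in [("low reading", low_dose), ("high reading", high_dose)]:
    print(f"{label}: +{dose / baseline_men:.0%} (men) to +{dose / baseline_women:.0%} (women) over baseline")
# -> low reading: roughly +61% to +79%; high reading: roughly +489% to +629%.
# Humming, per the cited paper, raises nasal NO by ~1500-2000%.
```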
A tension that keeps recurring when I think about philosophy is between the "view from nowhere" and the "view from somewhere", i.e. a third-person versus first-person perspective—especially when thinking about anthropics.

One version of the view from nowhere says that there's some "objective" way of assigning measure to universes (or people within those universes, or person-moments). You should expect to end up in different possible situations in proportion to how much measure your instances in those situations have. For example, UDASSA ascribes measure based on the simplicity of the computation that outputs your experience.

One version of the view from somewhere says that the way you assign measure across different instances should depend on your values. You should act as if you expect to end up in different possible future situations in proportion to how much power to implement your values the instances in each of those situations have. I'll call this the ADT approach, because that seems like the core insight of Anthropic Decision Theory. Wei Dai also discusses it here.

In some sense each of these views makes a prediction. UDASSA predicts that we live in a universe with laws of physics that are very simple to specify (even if they're computationally expensive to run), which seems to be true. Meanwhile the ADT approach "predicts" that we find ourselves at an unusually pivotal point in history, which also seems true.

Intuitively I want to say "yeah, but if I keep predicting that I will end up in more and more pivotal places, eventually that will be falsified". But... on a personal level, this hasn't actually been falsified yet. And more generally, acting on those predictions can still be positive in expectation even if they almost surely end up being falsified. It's a St Petersburg paradox, basically.

Very speculatively, then, maybe a way to reconcile the view from somewhere and the view from nowhere is via something like geometric rationality, which avoids St Petersburg paradoxes. And more generally, it feels like there's some kind of multi-agent perspective which says I shouldn't model all these copies of myself as acting in unison, but rather as optimizing for some compromise between all their different goals (which can differ even if they're identical, because of indexicality). No strong conclusions here but I want to keep playing around with some of these ideas (which were inspired by a call with @zhukeepa).

This was all kinda rambly but I think I can summarize it as "Isn't it weird that ADT tells us that we should act as if we'll end up in unusually important places, and also we do seem to be in an incredibly unusually important place in the universe? I don't have a story for why these things are related but it does seem like a suspicious coincidence."
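A minimal sketch of the St Petersburg point (my own toy example, using the standard double-or-nothing formulation of the gamble): the arithmetic expectation grows without bound as you include more branches, while a geometric (log-space) valuation of the same gamble converges, which is roughly the property that lets geometric rationality sidestep the paradox.

```python
import math

# St Petersburg-style gamble: with probability 2**-n you win 2**n (n = 1, 2, ...).
N = 60  # number of branches to include

arithmetic = sum(2**-n * 2**n for n in range(1, N + 1))                     # equals N, diverges as N grows
geometric = math.exp(sum(2**-n * math.log(2**n) for n in range(1, N + 1)))  # converges to 4

print(f"arithmetic value of first {N} branches: {arithmetic:.0f}")
print(f"geometric value of the same gamble:     {geometric:.2f}")
```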
Neil
Poetry and practicality

I was staring up at the moon a few days ago and thought about how deeply I loved my family, and wished to one day start my own (I'm just over 18 now). It was a nice moment.

Then, I whipped out my laptop and felt constrained to get back to work; i.e. read papers for my AI governance course, write up LW posts, and trade emails with EA France. (These I believe to be my best shots at increasing everyone's odds of survival.)

It felt almost like sacrilege to wrench myself away from the moon and my wonder. Like I was ruining a moment of poetry and stillwatered peace by slamming against reality and its mundane things again.

But... the reason I wrenched myself away is directly downstream from the spirit that animated me in the first place. Whether I feel the poetry now that I felt then is irrelevant: it's still there, and its value and truth persist. Pulling away from the moon was evidence I cared about my musings enough to act on them. The poetic is not a separate magisterium from the practical; rather, the practical is a particular facet of the poetic. Feeling "something to protect" in my bones naturally extends to acting it out.

In other words, poetry doesn't just stop. Feel no guilt in pulling away. Because you're not.
There was this voice inside my head that told me that since I have Something to protect, relaxing is never OK beyond the strict minimum, the goal is paramount, and I should just work as hard as I can all the time. This led to me breaking down and being incapable of working on my AI governance job for a week, as I had just piled up too much stress. And then I decided to follow what motivated me in the moment, instead of coercing myself into working on what I thought was most important, and lo and behold! My total output increased, while my time spent working decreased.

I'm so angry and sad at the inadequacy of my role models, cultural norms, rationality advice, and my model of the good EA who does not burn out, which still led me to smash into the wall despite their best intentions. I became so estranged from my own body and perceptions, ignoring my core motivations, that it felt harder and harder to work. I dug myself such a deep hole. I'm terrified at the prospect of having to rebuild my motivation myself again.

Popular Comments

Recent Discussion

Warning: This post might be depressing to read for everyone except trans women. Gender identity and suicide are discussed. This is all highly speculative. I know near-zero about biology, chemistry, or physiology. I do not recommend anyone take hormones to try to increase their intelligence; mood & identity are more important.

Why are trans women so intellectually successful? They seem to be overrepresented 5-100x in eg cybersecurity twitter, mathy AI alignment, non-scam crypto twitter, math PhD programs, etc.

To explain this, let's first ask: Why aren't males way smarter than females on average? Males have ~13% higher cortical neuron density and 11% heavier brains (implying more area?). One might expect males to have mean IQ far above females then, but instead the means and medians are similar:

[Two figures omitted: left and right panels.]

My theory...

Why is nobody in San Francisco pretty? Hormones make you pretty but dumb (pretty faces don't usually pay rent in SF). Why is nobody in Los Angeles smart? Hormones make you pretty but dumb. (Sincere apologies to all residents of SF & LA.)

Some other possibilities:

  • Pretty people self-select towards interests and occupations that reward beauty. If you're pretty, you're more likely to be popular in high school, which interferes with the dedication necessary to become a great programmer.

  • A big reason people are prettier in LA is they put significant

... (read more)
AprilSR
All the smart trans girls I know were also smart prior to HRT.
Sabiola
LOL! I don't think women's clothing is less itchy (my husband's isn't any itchier than mine), but even if it were, that advantage would be totally negated by most women having to wear a bra.
kromem
Your hypothesis is ignoring environmental factors. I'd recommend reading over the following paper: https://journals.sagepub.com/doi/10.1177/2332858416673617

A few highlights:

In a lot of ways the way you are looking at the topic perpetuates a rather unhealthy assumption of underlying biological differences in competency that avoids consideration of contributing environmental privileges and harms. You can't just hand wave aside the inherent privilege of presenting male during early childhood education in evaluating later STEM performance.

Rather than seeing the performance gap of trans women over women presenting that way from birth as a result of a hormonal advantage, it may be that what you are actually ending up measuring is the performance gap resulting from the disadvantage placed upon women due to early education experiences being treated differently from the many trans women who had been presenting as boys during those grades.

i.e. Perhaps all women could have been doing quite a lot better in STEM fields if the world treated them the way it treated boys during kindergarten through the early grades, and what we need socially isn't hormone prescriptions but serious adjustments to presumptions around gender and biologically driven competencies.

Note: It seems like great essays should go here and be fed through the standard LessWrong algorithm. There is possibly a copyright issue here, but we aren't making any money off it either. What follows is a full copy of "This is Water" by David Foster Wallace, his 2005 commencement speech to the graduating class at Kenyon College.

Greetings parents and congratulations to Kenyon’s graduating class of 2005. There are these two young fish swimming along and they happen to meet an older fish swimming the other way, who nods at them and says “Morning, boys. How’s the water?” And the two young fish swim on for a bit, and then eventually one of them looks over at the other and goes “What the hell is water?”

This is...

Nathan Young
Thanks. And I appreciate that LessWrong is a space where mods feel empowered to do this, since it's the right call.
cousin_it
There's an amazing HN comment that I mention every time someone links to this essay. It says: don't do what the essay says, you'll make yourself depressed. Instead do something a bit different, and maybe even opposite.

Let's say for example you feel annoyed by the fat checkout lady. DFW advises you to step over your annoyance, imagine the checkout lady is caring for her sick husband, and so on. But that kind of approach to your own feelings will hurt you in the long run, and maybe even seriously hurt you. Instead, the right thing is to simply feel annoyed at the checkout lady. Let the feeling come and be heard. After it's heard, it'll be gone by itself soon enough.

Here's the whole comment, to save people the click:

And another comment from a different person making the same point:

Wow, well I'm glad I posted it here.

yanni
If you're into podcasts, the Very Bad Wizards guys did an ep on this essay, which I enjoyed: https://verybadwizards.com/episode/episode-227-a-terrible-master-david-foster-wallaces-this-is-water

The history of science has tons of examples of the same thing being discovered multiple times independently; wikipedia has a whole list of examples here. If your goal in studying the history of science is to extract the predictable/overdetermined component of humanity's trajectory, then it makes sense to focus on such examples.

But if your goal is to achieve high counterfactual impact in your own research, then you should probably draw inspiration from the opposite: "singular" discoveries, i.e. discoveries which nobody else was anywhere close to figuring out. After all, if someone else would have figured it out shortly after anyways, then the discovery probably wasn't very counterfactually impactful.

Alas, nobody seems to have made a list of highly counterfactual scientific discoveries, to complement wikipedia's list of multiple discoveries.

To...

dr_s

Maybe it's the other way around, and it's the Chinese elite who was unusually and stubbornly conservative on this, trusting the wisdom of their ancestors over foreign devilry (would be a pretty Confucian thing to do). The Greeks realised the Earth was round from things like seeing sails appear over the horizon. Any sailing peoples thinking about this would have noticed sooner or later.

Kind of a long shot, but did Polynesian people have ideas on this, for example?

dr_s
Democritus also has a decent claim to that for being the first to imagine atoms and materialism altogether.
tailcalled
I would say "the thing that contains the inheritance particles" rather than "the inheritance particle". "Particulate inheritance" is a technical term within genetics and it refers to how children don't end up precisely with the mean of their parents' traits (blending inheritance), but rather with some noise around that mean, which particulate inheritance asserts is due to the genetic influence being separated into discrete particles with the children receiving random subsets of their parent's genes. The significance of this is that under blending inheritance, the genetic variation between organisms within a species would be averaged away in a small number of generations, which would make evolution by natural selection ~impossible (as natural selection doesn't work without genetic variation).
tailcalled
Isn't singular learning theory basically just another way of talking about the breadth of optima?

TL;DR

Tacit knowledge is extremely valuable. Unfortunately, developing tacit knowledge is usually bottlenecked by apprentice-master relationships. Tacit Knowledge Videos could widen this bottleneck. This post is a Schelling point for aggregating these videos—aiming to be The Best Textbooks on Every Subject for Tacit Knowledge Videos. Scroll down to the list if that's what you're here for. Post videos that highlight tacit knowledge in the comments and I’ll add them to the post. Experts in the videos include Stephen Wolfram, Holden Karnofsky, Andy Matuschak, Jonathan Blow, Tyler Cowen, George Hotz, and others. 

What are Tacit Knowledge Videos?

Samo Burja claims YouTube has opened the gates for a revolution in tacit knowledge transfer. Burja defines tacit knowledge as follows:

Tacit knowledge is knowledge that can’t properly be transmitted via verbal or written instruction, like the ability to create

...
quailia

Networking and relationship building, both professional and personal; I'm sure there are overlaps. And echoing another request: Sales.

Amadeus Pagel
You've already mentioned cooking as an example, and this is definitely something I'd like to improve in. I looked up:

How to crack eggs:

How to clip nails: https://www.tiktok.com/@jonijawne/video/7212337177772952838?q=cut%20nails&t=1713988543560

How to improve posture:

About a year ago I decided to try using one of those apps where you tie your goals to some kind of financial penalty. The specific one I tried is Forfeit, which I liked the look of because it’s relatively simple, you set single tasks which you have to verify you have completed with a photo.

I’m generally pretty sceptical of productivity systems, tools for thought, mindset shifts, life hacks and so on. But this one I have found to be really shockingly effective, it has been about the biggest positive change to my life that I can remember. I feel like the category of things which benefit from careful planning and execution over time has completely opened up to me, whereas previously things like this would be largely down to the...

CronoDAS
I have a bad history of not being responsive to the threat of punishment. When I have an aversive task, and the consequences for not doing that task suddenly get much worse, I start acting like the punishment is inevitable and am even less likely to actually do the task. In other words, I fail the "gun to the head test" quite dramatically.

Guy with a gun: I'm going to shoot you if you haven't changed the sheets on your bed by tomorrow.

Me: AAH I'M GOING TO DIE IT'S NO GOOD I MIGHT AS WELL SPEND THE DAY LYING IN BED PLAYING VIDEO GAMES BECAUSE I'M GOING TO GET SHOT TOMORROW SOMEONE CALL THE FUNERAL HOME AND MAKE PLANS TELL MY FAMILY I LOVE THEM

Guy with a gun: You know, you could always just... change the sheets?

Me: THE THOUGHT HAS OCCURRED TO ME BUT I'M TOO UPSET RIGHT NOW ABOUT THE FACT THAT I'M GOING TO DIE TOMORROW BECAUSE THE SHEETS WEREN'T CHANGED TO ACTUALLY GO AND CHANGE THEM

Also, I have a bad history with this kind of thing in general: one thing that always bothered me in school and college was that the only motivation I really had for doing my work was to avoid bad consequences. I was so sick of spending my life making myself miserable in order to avoid things that ought to be even worse. I also have a hard time being motivated by money: bad consequences for having insufficient money have the problem I've already described, and, well, video games are cheap. (In case you're wondering, no, I don't work, and my parents still support me financially.)

So when I think of commitment apps, I tend to react to them as entirely downside: I don't expect my behavior to change very much, and I do expect to predictably lose money. :(
dreeves

I think this is a persuasive case that commitment devices aren't good for you. I'm very interested in how common this is, and in whether there's a way to reframe commitment devices that avoids this psychological reaction to them. One idea is to focus on incentive alignment that avoids the far end of the spectrum. With Beeminder in particular, you could set a low pledge cap and then focus on the positive reinforcement of keeping your graph pretty by keeping the datapoints on the right side of the red line.

Elizabeth
I curated this post because:
1. This is a rare productivity-system post that made me consider actually implementing it. Right now I can't because my energy levels are too variable, but if that weren't true I would definitely be trying it.
2. Lots of details, on lots of levels. Things like "I fail 5% of the time" and then translating that to "therefore I price things such that if I could pay 5% of the failure fee to just have it done, I would do so."
3. Practical advice like "yes, verification sometimes takes a stupid amount of time; the habit is nonetheless worth it" or "arrange things to verify the day after".
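As a toy illustration of the pricing heuristic in point 2 (the numbers here are invented, not from the post):

```python
# Toy illustration of the "5% of the failure fee" pricing heuristic; figures are made up.
failure_rate = 0.05   # you miss ~5% of commitments
fee = 40.0            # forfeit attached to a task, in dollars

expected_cost = failure_rate * fee  # what the commitment costs you on average
print(f"A ${fee:.0f} forfeit costs about ${expected_cost:.2f} per task in expectation,")
print("so only set it that high if you'd happily pay that much to just have the task done.")
```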
Josh Mitchell
Thanks, Elizabeth! Really has helped us out :)

Before we get started, this is your quarterly reminder that I have no medical credentials and my highest academic credential is a BA in a different part of biology (with a double major in computer science). In a world with a functional medical system no one would listen to me. 

Tl;dr povidone iodine probably reduces viral load when used in the mouth or nose, with corresponding decreases in symptoms and infectivity. The effect size could be as high as 90% for prophylactic use (and as low as 0% when used in late illness), but is probably much smaller. There is a long tail of side-effects. No study I read reported side effects at clinically significant levels, but I don’t think they looked hard enough. There are other gargle...

Another possible risk: Accidentally swallowing the iodine. This happened to me. I was using a squeezable nasal irrigation device, I squirted some of the mixture into my mouth, and it went right down my throat. I called Poison Control, followed their instructions (IIRC they told me to consume a lot of starchy food, I think maybe I took some activated charcoal too), and ended up being fine.

Sergii
What about estimating LLM capabilities from the length of a sequence of numbers that it can reverse? I used prompts like:

"please reverse 4 5 8 1 1 8 1 4 4 9 3 9 3 3 3 5 5 2 7 8"
"please reverse 1 9 4 8 6 1 3 2 2 5"
etc.

Some results:
- Llama2 starts making mistakes after 5 numbers
- Llama3 can do 10, but fails at 20
- GPT-4 can do 20 but fails at 40

The follow-up questions are:
- what should be the name of this metric?
- are the other top-scoring models like Claude similar? (I don't have access)
- any bets on how many numbers GPT-5 will be able to reverse?
- how many numbers should AGI be able to reverse? ASI? Can this be a Turing test of sorts?
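If someone wants to run this more systematically, here is a rough sketch of how the probe could be automated. `ask_model` is a placeholder for whatever chat-API wrapper you use, and the doubling search plus the "every trial must pass" criterion are my own arbitrary choices rather than anything from the comment above.

```python
import random

def reverse_span(ask_model, max_len=64, trials_per_len=5):
    """Return the longest digit-sequence length the model reverses reliably.

    ask_model(prompt) -> reply text is a placeholder for your own API wrapper.
    """
    length, best = 2, 0
    while length <= max_len:
        passed = 0
        for _ in range(trials_per_len):
            digits = [str(random.randint(0, 9)) for _ in range(length)]
            prompt = "please reverse " + " ".join(digits)
            expected = " ".join(reversed(digits))
            if expected in ask_model(prompt):  # lenient check: reversed sequence appears anywhere in the reply
                passed += 1
        if passed < trials_per_len:            # first length with any failure ends the search
            break
        best = length
        length *= 2                            # coarse doubling: 2, 4, 8, 16, ...
    return best
```

A binary search between the last passing and the first failing length would give a tighter estimate than the doubling alone.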
p.b.

In psychometrics this is called "backward digit span".

It seems to me worth trying to slow down AI development to steer successfully around the shoals of extinction and out to utopia.

But I was thinking lately: even if I didn’t think there was any chance of extinction risk, it might still be worth prioritizing a lot of care over moving at maximal speed. Because there are many different possible AI futures, and I think there’s a good chance that the initial direction affects the long-term path, and different long-term paths go to different places. The systems we build now will shape the next systems, and so forth. If the first human-level-ish AI is brain emulations, I expect a quite different sequence of events than if it is GPT-ish.

People genuinely pushing for AI speed over care (rather than just feeling impotent) apparently think there is negligible risk of bad outcomes, but they are also effectively asking to take the first future to which there is a path. Yet possible futures are a large space, and arguably we are in a rare plateau where we could climb very different hills, and get to much better futures.
