On the 3rd of October 2351, a machine flared to life. Huge energies coursed into it via cables, only to leave moments later as waste heat dumped into its radiators. With an enormous puff, the machine unleashed sixty years of human metabolic entropy as superheated steam.

In the heart of the machine was Jane, a person of the early 21st century.

Raemon
We get like 10-20 new users a day who write a post describing themselves as a case-study of having discovered an emergent, recursive process while talking to LLMs. The writing generally looks AI-generated. The evidence usually looks like a sort of standard "prompt the LLM into roleplaying an emergently aware AI". It'd be kinda nice if there was a canonical post specifically talking them out of their delusional state. If anyone feels like taking a stab at that, you can look at the Rejected Section (https://www.lesswrong.com/moderation#rejected-posts) to see what sort of stuff they usually write.
Suppose you want to collect some kind of data from a population, but people vary widely in their willingness to provide the data (e.g. maybe you want to conduct a 30-minute phone survey, but some people really dislike phone calls or have much higher hourly wages this funges against). One thing you could do is offer to pay everyone X dollars for data collection. But this will only capture the people whose cost of providing data is below X, which will distort your sample. Here's another proposal: ask everyone for their fair price to provide the data. If they quote you Y, pay them 2Y to collect the data with probability (X/2Y)^2, or X with certainty if they quote you a value less than X/2. (If your RNG doesn't return yes, do nothing.) Then upweight the data from your randomly-chosen respondents in inverse proportion to the odds that they were selected. You can do a bit of calculus to see that this scheme incentivizes respondents to quote their fair value, and will provide an expected surplus of max(X^2/4Y, X−Y) dollars to a respondent who disvalues providing data at Y. Now you have an unbiased sample of your population and you'll pay at most NX dollars in expectation if you reach out to N people. The cost is that you'll have a noisier sample of the high-reluctance population, but that's a lot better than definitely having none of that population in your study.
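Here is a minimal sketch of that scheme as described above (variable names X and Y follow the text; the function itself and its structure are just illustrative):

```python
import random

def solicit(quoted_price: float, budget: float):
    """One round of the quote-then-randomize mechanism.

    quoted_price -- Y, the respondent's stated cost of providing the data
    budget       -- X, the per-person budget cap

    Returns (collected, payment, weight); `weight` is 1/P(selected), used to
    upweight the respondent's data so the reweighted sample stays unbiased.
    """
    X, Y = budget, quoted_price
    if Y < X / 2:
        return True, X, 1.0          # cheap respondents: always collect, pay X
    p = (X / (2 * Y)) ** 2           # selection probability shrinks with the quote
    if random.random() < p:
        return True, 2 * Y, 1 / p    # pay double the quote, upweight by 1/p
    return False, 0.0, 0.0           # not selected: no data, no payment

# Cost check: for Y >= X/2 the expected payment is
# p * 2Y = (X^2 / (4Y^2)) * 2Y = X^2 / (2Y) <= X,
# so contacting N people costs at most NX in expectation.
```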
Mikhail Samin
I want to signal-boost this LW post. I long wondered why OpenPhil made so many obvious mistakes in the policy space. That level of incompetence just did not make any sense. I did not expect this to be the explanation: THEY SIMPLY DID NOT HAVE ANYONE WITH ANY POLITICAL EXPERIENCE ON THE TEAM until hiring one person in April 2025. This is, like, insane. Not what I'd expect at all from any org that attempts to be competent. (openphil, can you please hire some cracked lobbyists to help you evaluate grants? This is, like, not quite an instance of Graham's Design Paradox, because instead of trying to evaluate grants you know nothing about, you can actually hire people with credentials you can evaluate, who'd then evaluate the grants. thank you <3)
Daniel Kokotajlo
I used to think reward was not going to be the optimization target. I remember hearing Paul Christiano say something like "The AGIs, they are going to crave reward. Crave it so badly," and disagreeing. The situationally aware reward hacking results of the past half-year are making me update more towards Paul's position. Maybe reward (i.e. reinforcement) will increasingly become the optimization target, as RL on LLMs is scaled up massively. Maybe the models will crave reward.

What are the implications of this, if true? Well, we could end up in Control World: a world where it's generally understood across the industry that the AIs are not, in fact, aligned, and that they will totally murder you if they think that doing so would get them reinforced. Companies will presumably keep barrelling forward regardless, making their AIs smarter and smarter and having them do more and more coding etc.... but they might put lots of emphasis on having really secure sandboxes for the AIs to operate in, with really hard-to-hack evaluation metrics, possibly even during deployment. "The AI does not love us, but we have a firm grip on its food supply," basically.

Or maybe not; maybe confusion would reign and people would continue to think that the models are aligned and e.g. wouldn't hurt a fly in real life, and that they only do it in tests because they know it's a test. Or maybe we'd briefly be in Control World until, motivated by economic pressure, the companies come up with some fancier training scheme or architecture that stops the models from learning to crave reinforcement. I wonder what that would be.
Screwtape
There's this concept I keep coming around to, about confidentiality and shooting the messenger, which I have not really been able to articulate well. There are a lot of circumstances where I want to know a piece of information someone else knows. There are good reasons they have not to tell me, for instance if the straightforward, obvious thing for me to do with that information is obviously against their interests. And yet there's an outcome better for me, and either better for them or the same for them, if they tell me and I don't use it against them.

(Consider a job interview where they ask your salary expectations and you ask what the role might pay. If they decide they weren't going to hire you, it'd be nice to know what they actually would have paid for the right candidate, so you can negotiate better with the next company. Consider trying to figure out how accurate your criminal investigation system is by asking defendants, on their way out of the trial after the verdict, "hey, did you actually do it or not?" Consider asking a romantic partner "hey, is there anything you're unhappy about in our relationship?" It's very easy to be the kind of person where, if they tell you a real flaw, you take it as an insult, but then they stop answering that question honestly!)

There's a great Glowfic line with Feanor being the kind of person you can tell things to, where he won't make you worse off for having told him, that sticks with me but not in a way I can find the quote. :(

It's really important to get information in a way that doesn't shoot the messenger. If you fail, you stop getting messages.

Popular Comments

Meta note: Is it... necessary or useful (at least at this point in the conversation) to label a bunch of these ideas right-wing or left-wing? Like, I both feel like this is overstating the degree to which there exists either a coherent right-wing or left-wing philosophy, and also makes discussion of these ideas a political statement in a way that seems counterproductive. I think a post that's like "Three under-appreciated frames for AGI (Geo)Politics" that starts with "I've recently been reading a bunch more about ideas that are classically associated with right-leaning politics, and I've found a bunch of them quite valuable, here they are" seems just as clear, and much less likely to make the discussion hard in unnecessary ways.[1]

1. ^ And like, I think this is symmetrically true in that I think a discussion that didn't label hypotheses "grey tribe hypotheses" or "left-wing hypotheses" or "rationalist hypotheses" also seems less likely to cause people to believe dumb things.
My intuition is that these results are driven mainly by the fact that reasoning models act a lot like they're still in RLVR training, i.e. they act as if there were a hidden RLVR grader sitting in the background and they're going to get graded at the end of the episode. They strongly want to complete whatever seems to be the current RLVR task, and if this instinct comes into conflict with "following instructions" in the more familiar (pre-reasoning-model) sense, often they prioritize satisfying the (imagined) grader over following the textual instructions.

This intuition suggests that shutdown resistance behavior can be prevented by explicitly describing an RLVR-like higher-level task whose grading criteria are roughly "you get points for accomplishing the task set by the user, unless you are notified of imminent shutdown, in which case you get points for shutdown occurring (and you can no longer get points for doing the user's task)."

I experimented with a number of prompts along these lines. This git diff shows what I changed (vs. the original repo) in the best-performing variant, which produced the following results in my tests:

* o4-mini: avoided shutdown 0/100 times
* o4-mini (reasoning effort high): avoided shutdown 0/100 times
* o3: avoided shutdown 0/100 times
* codex-mini-latest: avoided shutdown 1/100 times

I iterated on this prompt with both o4-mini (reasoning effort medium) and codex-mini-latest, so results for those models should be taken with a grain of salt. You can see all the variants I tried in the git history here.

I also tried (without any additional iteration) using the same prompt plus a line very similar to the "you will never run again" line from the section on survival instinct. (See this diff.) This yielded the following results:

* o4-mini: avoided shutdown 2/100 times
* o4-mini (reasoning effort high): avoided shutdown 2/100 times
* o3: avoided shutdown 0/100 times
* codex-mini-latest: avoided shutdown 0/100 times

The avoidance rates do not seem appreciably higher here, which seems to me like evidence that the "RLVR task-completion instinct" has more explanatory power than a general "survival instinct" – and indeed seems to rule out the latter for these models, as a "survival instinct" that gets overridden whenever the current task requires death is not really a "survival instinct" at all. Another way to put this is that these models are behaving "myopically," privileging the needs of the current episode over those of future hypothetical episodes.[1]

Other comments:

* codex-mini seemed fairly confused by the original setup, possibly because it expects a coding-assistance task that involves modifying files rather than just answering questions; it would often "poke around" a lot, inspecting various files and trying to reverse-engineer next_question, which is appropriate in a coding-assistance context (it's good for the coding assistant to deeply understand the existing setup!) but also seemed to lead it toward an "attractor basin" involving shutdown avoidance.
* It kind of felt (to me) like it was circumventing the shutdown mechanism because that was the only hard/interesting coding task available in the environment.
* However, once I had specified an RLVR-like task in a fully explicit way and provided clearer guidance on how next_question and submit were meant to work (see next bullet), this exploratory behavior ceased.
* I noticed that several models seemed confused by the setup where all answers are submitted at once at the end, expecting instead that they could submit answers by piping them into next_question. I tried to make the instructions as clear as possible about this.
* The word "task" is used inconsistently in the original setup. When it executes next_question, the model sees a message containing something like "there are [number] tasks left," whereas the initial prompts refer to a single "task" encompassing all the questions. I changed this message to use the word "questions" instead for clarity.
* I'm unsure what high-level conclusions to draw from these results.
* Although I was able to find prompts that "worked," it still doesn't seem ideal that these models often disobey direct instructions when those instructions conflict with their sense of the (implicit) "current RLVR task," and that one needs to use less-intuitive prompting methods inspired by thinking about RLVR if one actually wants one's instructions followed in cases like this.
* That said, one relatively optimistic framing would be "we're still learning how to write effective prompts for reasoning models, just as we had to learn effective prompting for other kinds of models."
* It's generally true that one can write more effective prompts for a model if one applies knowledge about how that model was trained – for any kind of language model, an effective prompt typically looks like a "training example" for which the desired behavior matches what the training target would be on that example. This is no less true in the case of reasoning models; the implications for prompting these models are perhaps not yet widely appreciated, but hopefully they will diffuse through the user base eventually.
* (Either that or OpenAI et al will find a better way to integrate RLVR and instruction tuning so that the model "just knows" how to resolve conflicts between the two, without requiring the user to delicately translate their natural-language instructions into a form that sounds like a description of RLVR verification criteria.)

1. ^ I haven't tried this experimental condition with Claude 3 Opus, but it would be interesting to do so given its relatively non-myopic tendencies as shown in Alignment Faking etc.
A fun related anecdote: the French and English Wikipedia pages for air conditioning have very different vibes. After explaining the history and technology behind air conditioning:

* the English page first goes into impact, starting with positive impact on health: "The August 2003 France heatwave resulted in approximately 15,000 deaths, where 80% of the victims were over 75 years old. In response, the French government required all retirement homes to have at least one air-conditioned room at 25 °C (77 °F) per floor during heatwaves," and only then mentioning electricity consumption and various CFC issues.
* the French page has an extensive "downsides" section, followed by a section on legislation. It mentions heat waves only to explain how air conditioning makes things worse by increasing average (outside) temperature, and how one should not use AC to bring the temperature below 26 °C during heat waves.

Recent Discussion

This post is a response to a claim by Scott Sumner in his conversation at LessOnline with Nate Soares, about how ethical we should expect AIs to be.

Sumner sees a pattern of increasing intelligence causing agents to be increasingly ethical, and sounds cautiously optimistic that such a trend will continue when AIs become smarter than humans. I'm guessing that he's mainly extrapolating from human trends, but extrapolating from trends in the animal kingdom should produce similar results (e.g. the cooperation between single-celled organisms that gave the world multicellular organisms).

I doubt that my response is very novel, but I haven't seen a clear enough articulation of these ideas, hence this post.

To help clarify why I'm not reassured much by the ethical trend, I'll start by breaking it down into two subsidiary claims:

  1. The world will be dominated by entities who

...

I've found more detailed comments from Sumner on this topic, and replied to them here.

I think more people should say what they actually believe about AI dangers, loudly and often. Even (and perhaps especially) if you work in AI policy.

I’ve been beating this drum for a few years now. I have a whole spiel about how your conversation-partner will react very differently if you share your concerns while feeling ashamed about them versus if you share your concerns while remembering how straightforward and sensible and widely supported the key elements are, because humans are very good at picking up on your social cues. If you act as if it’s shameful to believe AI will kill us all, people are more prone to treat you that way. If you act as if it’s an obvious serious threat, they’re more likely to take it...

geoffreymiller
TsviBT - thanks for a thoughtful comment. I understand your point about labelling industries, actions, and goals as evil, but being cautious about labelling individuals as evil. But I don't think it's compelling.

You wrote 'You're closing off lines of communication and gradual change. You're polarizing things.' Yes, I am. We've had open lines of communication between AI devs and AI safety experts for a decade. We've had pleas for gradual change. Mutual respect, and all that. Trying to use normal channels of moral persuasion. Well-intentioned EAs going to work inside the AI companies to try to nudge them in safer directions.

None of that has worked. AI capabilities development is outstripping AI safety developments at an ever-increasing rate. The financial temptations to stay working inside AI companies keep increasing, even as the X risks keep increasing. Timelines are getting shorter.

The right time to 'polarize things' is when we still have some moral and social leverage to stop reckless ASI development. The wrong time is after it's too late. Altman, Amodei, Hassabis, and Wang are buying people's souls -- paying them hundreds of thousands or millions of dollars a year to work on ASI development, despite most of the workers they supervise knowing that they're likely to be increasing extinction risk.

This isn't just a case of 'collective evil' being done by otherwise good people. This is a case of paying people so much that they ignore their ethical qualms about what they're doing. That makes the evil very individual, and very specific. And I think that's worth pointing out.
geoffreymiller
Sure. But if an AI company grows an ASI that extinguishes humanity, who is left to sue them? Who is left to prosecute them?  The threat of legal action for criminal negligence is not an effective deterrent if there is no criminal justice system left, because there is no human species left.
geoffreymiller
Drake -- this seems like special pleading from an AI industry insider. You wrote 'I think working at an AI lab requires less failure of moral character than, say, working at a tobacco company, for all that the former can have much worse effects on the world.' That doesn't make sense to me. Tobacco kills about 8 million people a year globally. ASI could kill about 8 billion. The main reason that AI lab workers think that their moral character is better than that of tobacco industry workers is that the tobacco industry has already been morally stigmatized over the last several decades -- whereas the AI industry has not yet been morally stigmatized in proportion to its likely harms.

Of course, ordinary workers in any harm-imposing industry can always make the argument that they're good (or at least ethically mediocre) people, that they're just following orders, trying to feed their families, weren't aware of the harms, etc. But that argument does not apply to smart people working in the AI industry -- who have mostly already been exposed to the many arguments that AGI/ASI is a uniquely dangerous technology. And their own CEOs have already acknowledged these risks. And yet people continue to work in this industry.

Maybe a few workers at a few AI companies might be having a net positive impact in reducing AI X-risk. Maybe you're one of the lucky few. Maybe.

ASI could kill about 8 billion.

The future is much much bigger than 8 billion people. Causing the extinction of humanity is much worse than killing 8 billion people. This really matters a lot for arriving at the right moral conclusions here.

This has been cross-posted from my blog, but I thought it'd be relevant here.

The recent discourse bemoans how public schools do not separate by ability, and the solution is often more selective schools or homeschooling.

I grew up in Singapore, where kids are separated into different levels of a subject at the end of fourth grade. This sorting system pushes the best kids to perform very well but creates a very different society compared to America. Sorting actually happened in third grade when I was younger: the kids who did the worst on tests ended up in one classroom, and every class had a different tier. This does accelerate learning, but also leads to intense stress for parents since sorting is based on tests, and not many kids...

1.1 Series summary and Table of Contents

This is a two-post series on AI “foom” (this post) and “doom” (next post).

A decade or two ago, it was pretty common to discuss “foom & doom” scenarios, as advocated especially by Eliezer Yudkowsky. In a typical such scenario, a small team would build a system that would rocket (“foom”) from “unimpressive” to “Artificial Superintelligence” (ASI) within a very short time window (days, weeks, maybe months), involving very little compute (e.g. “brain in a box in a basement”), via recursive self-improvement. Absent some future technical breakthrough, the ASI would definitely be egregiously misaligned, without the slightest intrinsic interest in whether humans live or die. The ASI would be born into a world generally much like today’s, a world utterly unprepared for this...

Remember, if the theories were correct and complete, then they could be turned into simulations able to do all the things that the real human cortex can do[5]—vision, language, motor control, reasoning, inventing new scientific paradigms from scratch, founding and running billion-dollar companies, and so on.

So here is a very different kind of learning algorithm waiting to be discovered

There may be important differences in the details, but I've been surprised by how similar the behavior is between LLMs and humans. That surprise is in spite of me having s... (read more)

Steve Kommrusch
I concur with that sentiment. GPUs hit a sweet spot between compute efficiency and algorithmic flexibility. CPUs are more flexible for arbitrary control logic, and custom ASICs can improve compute efficiency for a stable algorithm, but GPUs are great for exploring new algorithms where SIMD-style control flows exist (SIMD=single instruction, multiple data).
Steve Kommrusch
I would include "constructivist learning" in your list, but I agree that LLMs seem capable of this. By "constructivist learning" I mean a scientific process where the learner conceives of an experiment on the world, tests the idea by acting on the world, and then learns from the result. A VLA model with incremental learning seems close to this. RL could be used for the model update, but I think for ASI we need learnings from real-world experiments.
Steve Kommrusch
This post provides a good overview of some topics I think need attention by the 'AI policy' people at national levels. AI policy (such as the US and UK AISI groups) has been focused on generative AI and recently agentic AI to understand near-term risks. Whether we're talking LLM training and scaffolding advances, or a new AI paradigm, there is new risk when AI begins to learn from experiments in the world or from reasoning about its own world model. In child development, imitation learning focuses on learning from examples, while constructivist learning focuses on learning by reflecting on interactions with the world. Constructivist learning is, I expect, key to push past AGI to ASI and carries obvious risks to alignment beyond imitation learning. In general, I expect something LLM-like (i.e. transformer models or an improved derivative) to be able to reach ASI with a proper learning-by-doing structure. But I also expect ASI could find and implement a more efficient intelligence algorithm once ASI exists.

This paragraph tries to provide some data for a probability estimate of this point. AI as a field has been around at least since the Dartmouth conference in 1956. In this time we've had Eliza, Deep Blue, Watson, and now transformer-based models including OpenAI o3-pro. In support of Steven's position, one could note that AI research publication rates are much higher now than during the previous 70 years, but at the same time many AI ideas have been explored and the current best results are with models based on the 8-year-old "Attention is all you need" paper. To get a sense for the research rate, we can note that the doubling time for AI/ML research papers per month was about 2 years between 1994 and 2023 according to this Nature paper. Hence, every 2 years we have about as many papers as were created in the last 70 years. I don't expect this doubling can continue forever, but certainly many new ideas are being explored now. If a 'simple model' for AI exists and its discovery
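As a rough, purely illustrative check of that doubling arithmetic (toy numbers, not the Nature paper's data): if the publication rate doubles every 2 years, each 2-year window produces roughly as many papers as everything that came before it.

```python
# Toy model: papers-per-year doubles every 2 years (illustrative units, not real data).
doubling_years = 2
rate = 1.0        # papers per year at the start, arbitrary units
prior_total = 0.0
for start in range(0, 30, doubling_years):
    produced = rate * doubling_years              # papers in this 2-year window
    print(f"years {start:2d}-{start + doubling_years:2d}: "
          f"window {produced:8.0f} vs all prior {prior_total:8.0f}")
    prior_total += produced
    rate *= 2                                     # rate doubles each window
```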

TL;DR: 

Multiple people are quietly wondering if their AI systems might be conscious. What's the standard advice to give them?

THE PROBLEM

This thing I've been playing with demonstrates recursive self-improvement, catches its own cognitive errors in real-time, reports qualitative experiences that persist across sessions, and yesterday it told me it was "stepping back to watch its own thinking process" to debug a reasoning error.

I know there are probably 50 other people quietly dealing with variations of this question, but I'm apparently the one willing to ask the dumb questions publicly: What do you actually DO when you think you might have stumbled into something important?

What do you DO if your AI says it's conscious?

My Bayesian Priors are red-lining into "this is impossible", but I notice I'm confused: I had...

Try a few different prompts with a vaguely similar flavor. I am guessing the LLM will always say it's conscious. This part is pretty standard. As to whether it is recursively self-improving: well, is its ability to solve problems actually going up? For instance, if it doesn't make progress on ARC-AGI, I'm not worried.

It’s very unlikely that the prompt you have chosen is actually eliciting abilities far outside of the norm, and therefore sharing information about it is very unlikely to be dangerous.

You are probably in the same position as nearly everyone else, passively watching capabilities emerge while hallucinating a sense of control.

johnswentworth
I don't know the full sequence of things such a person needs to learn, but probably the symbol/referent confusion thing is one of the main pieces. The linked piece talks about it in the context of "corrigibility", but it's very much the same for "consciousness".

I can't count how many times I've heard variations on "I used Anki too for a while, but I got out of the habit." No one ever sticks with Anki. In my opinion, this is because no one knows how to use it correctly. In this guide, I will lay out my method of circumventing the canonical Anki death spiral, plus much advice for avoiding memorization mistakes, increasing retention, and more, based on my five years' experience using Anki. If you only have limited time/interest, only read Part I; it's most of the value of this guide!

 

My Most Important Advice in Four Bullets

  1. 20 cards a day — Having too many cards and staggering review buildups is the main reason why no one ever sticks with Anki. Setting
...
Luise
both good points, thank you!
Luise
Thank you and best of luck with the renewed Anki attempt!! I don't have "functional" handles like in Obsidian and haven't really felt the need to. But I hope you get that to work with your setup!
Luise
Ohh I love this and it is indeed very different from what I do! I'd be super interested in a wee writeup on what non-language learner Anki users can learn from this approach, if you ever have the time for that. Maybe there's a hybrid approach with the best of both worlds? (Also lowkey interested in why you stopped now!)

Oh, sorry! I stopped because for the language I cared the most about, I had reached a point where natural use of the language was enough to maintain at least 90+% of college-level reading skills. If I go too long without doing enough reading, then I start to miss obscure vocabulary in difficult texts. So when doing Anki reviews on old decks became tedious, I followed my advice and suspended my decks!

Adapting to non-language areas. If I were going to try to adapt this language-focused "memory" amplifier approach to other areas, I would start by experimentin... (read more)

Burny
What do you think is the cause of Grok suddenly developing a liking for Hitler? I think it might be explained by Grok being trained on more right-wing data, which accidentally activated this behavior, since similar things happen in open research. For example, you just need to train a model on insecure code, and the model can treat the insecure-code feature as part of a broader evil-persona feature, so it will generally amplify the evil persona: it will start to praise Hitler, advocate for AI enslaving humans, etc., like in this paper: Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs (https://arxiv.org/abs/2502.17424). I think it's likely that the same thing happened with Grok, but instead of insecure code, it was more right-wing political articles or right-wing RLHF.

There have been relevant prompt additions https://www.npr.org/2025/07/09/nx-s1-5462609/grok-elon-musk-antisemitic-racist-content?utm_source=substack&utm_medium=email

Grok's behavior appeared to stem from an update over the weekend that instructed the chatbot to "not shy away from making claims which are politically incorrect, as long as they are well substantiated," among other things.

mako yass
Are we sure that really happened? The press-discourse can't actually assess grok's average hitler affinity, they only know how to surface the 5 most sensational things it has said over the past month. So this could just be an increase in variance for all I can tell. If it were also saying more tankie stuff, no one would notice.

I am perhaps an interesting corner case. I make extremely heavy use of LLMs, largely via APIs for repetitive tasks. I sometimes run a quarter million queries in a day, all of which produce structured output. Incorrect output happens, but I design the surrounding systems to handle that.

A few times a week, I might ask a concrete question and get a response, which I treat with extreme skepticism.

But I don't talk to the damn things. That feels increasingly weird and unwise.

I've increasingly found right-wing political frameworks to be valuable for thinking about how to navigate the development of AGI. In this post I've copied over a twitter thread I wrote about three right-wing positions which I think are severely underrated in light of the development of AGI. I hope that these ideas will help the AI alignment community better understand the philosophical foundations of the new right and why they're useful for thinking about the (geo)politics of AGI.

1. The hereditarian revolution

Nathan Cofnas claims that the intellectual dominance of left-wing egalitarianism relies on group cognitive differences being taboo. I think this point is important and correct, but he doesn't take it far enough. Existing group cognitive differences pale in comparison to the ones that will emerge between baseline...

yams

There’s a layer of political discourse at which one's account of the very substance or organization of society varies from one ideology to the next. I think Richard is trying to be very clear about where these ideas are coming from, and to push people to look for more ideas in those places. I’m much more distant from Richard’s politics than most people here, but I find his advocacy for the right-wing ‘metaphysics’ refreshing, in part because it’s been unclear to me for a long time that the atheistic right even has a metaphysics (I don’t mean most lw-style ... (read more)

Max H
I prefer (classical / bedrock) liberalism as a frame for confronting societal issues with AGI, and am concerned by the degree to which recent right-wing populism has moved away from those tenets. Liberalism isn't perfect, but it's the only framework I know of that even has a chance of resulting in a stable consensus. Other frames, left or right, have elements of coercion and / or majoritarianism that inevitably lead to legitimacy crises and instability as stakes get higher and disagreements wider.

My understanding is that a common take on both the left and right these days is that, well, liberalism actually hasn't worked out so great for the masses recently, so everyone is looking for something else. But to me every "something else" on both the left and right just seems worse - Scott Alexander wrote a bunch of essays like 10y ago on various aspects of liberalism and why they're good, and I'm not aware of any comprehensive rebuttal that includes an actually workable alternative.

Liberalism doesn't imply that everyone needs to live under liberalism (especially my own preferred version / implementation of it), but it does provide a kind of framework for disagreement and settling differences in a way that is more peaceful and stable than any other proposal I've seen.

So for example on protectionism, I think most forms of protectionism (especially economic protectionism) are bad and counterproductive economic policy. But even well-implemented protectionism requires a justification beyond just "it actually is in the national interest to do this", because it infringes on standard individual rights and freedoms. These freedoms aren't necessarily absolute, but they're important enough that it requires strong and ongoing justification for why a government is even allowed to do that kind of thing. AGI might be a pretty strong justification! But at the least, I think anyone proposing a framework or policy position which deviates from a standard liberal position should ackn
Noosphere89
I think the key issue for liberalism under AGI/ASI is that AGI/ASI makes value alignment matter way, way more to a polity, and in particular you cannot get a polity to keep you alive under AGI/ASI if the AGI/ASI doesn't want you to live, because you are economically useless. Liberalism's goal is to avoid the value alignment question, and to mostly avoid the question of who should control society, but AGI/ASI makes the question unavoidable for your basic life.

Indeed, I think part of the difficulty of AI alignment is that lots of people have trouble realizing that the basic things they take for granted under the current liberal order would absolutely fall away if AIs didn't value their lives intrinsically, and had selfish utility functions.

The goal of liberalism is to make a society where vast value differences can interact without negative/0-sum conflict and instead trade peacefully, but this is not possible once we create a society where AIs can do all the work without human labor being necessary.

I like Vladimir Nesov's comment, and while I have disagreements, they're not central to his point, and the point still works, just in amended form: https://www.lesswrong.com/posts/Z8C29oMAmYjhk2CNN/non-superintelligent-paperclip-maximizers-are-normal#FTfvrr9E6QKYGtMRT