Edit: Changed the title.
Or, why I no longer agree with the standard LW position on AI.
This is a somewhat unusual post compared to what LW usually publishes on AI.
A lot of this depends on a few posts that changed my worldview on AI risk, linked below:
The deceptive alignment skepticism sequence, especially the 2nd post in the sequence:
Evidence of the natural abstractions hypothesis in action:
https://www.lesswrong.com/posts/BdfQMrtuL8wNfpfnF/natural-categories-update
https://www.lesswrong.com/posts/obht9QqMDMNLwhPQS/asot-natural-abstractions-and-alphazero#comments
Summary: The big update I made was that deceptive alignment is far more unlikely than I thought. Given that deceptive alignment was a big part of my model of how AI risk would happen (about 30-60% of my probability mass was on that failure mode), this takes a big enough bite out of the probability of extinction to make increasing AI capabilities positive in expected value. Combine this with the evidence that at least some form of the natural abstractions hypothesis is being borne out empirically, and I now think the probability of AI risk has steeply declined to only 0.1-10%, and that remaining probability mass is plausibly reducible to ridiculously low numbers by going to the stars and speeding up technological progress.
In other words, I now assign a significant probability, on the order of 50-70%, to alignment being solved by default.
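To make the arithmetic of that update concrete (the starting figure below is a purely illustrative assumption, not a number from this post): if deceptive alignment carried roughly the midpoint of that range, about 45% of the probability mass on extinction, then dropping that failure mode alone nearly halves the estimate. Starting from an assumed prior of 20%,

$$p(\text{doom}) \approx 20\% \times (1 - 0.45) = 11\%,$$

with the natural abstractions evidence pushing the remainder down further toward the 0.1-10% range.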
EDIT: While I explained in response to Shiminux why I increased my confidence in alignment by default, I now believe I was overconfident in the precise probabilities for alignment by default.
What implications does this have, if this rosy picture is correct?
The biggest implication is that technological progress looks far more positive than most LWers and the general public think.
This also implies a purpose shift for LessWrong. For arguably 20 years, the site has focused on AI risk, though interest exploded once LLMs and real AI capabilities were released.
What it shifts to is important, but assuming this rosy model of alignment is correct, I'd argue a significant part of the field of AI alignment can and should change its purpose to something else.
As for LessWrong, I'd say we should probably focus more on progress studies in the vein of Jason Crawford's work, and on inadequate equilibria and how to change them.
I welcome criticism and discussion of this post, due to its huge implications for LW.
Let's suppose that you are entirely right about deceptive alignment being unlikely. (So we'll set aside things like "what specific arguments caused you to update?" and tricky questions about modest epistemology/outside views).
I don't see how "alignment is solved by default with 50-70% probability" justifies claims like "capabilities progress is net positive" or "AI alignment should change purpose to something else."
If a doctor told me I had a disease with a 50-70% chance of resolving on its own, and that it would otherwise kill me, I wouldn't go "oh okay, I should stop trying to fight the disease."
The stakes are also not symmetrical. Getting (aligned) AGI 1 year sooner is great, but it only leads to one extra year of flourishing. Getting unaligned AGI leads to a significant loss over the entire far future.
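To make that asymmetry concrete, here is a toy expected-value sketch (the numbers are illustrative assumptions): say one extra year of flourishing is worth 1 unit, the entire far future is worth $N \gg 1$ units, and alignment works out by default with probability $p$. Then pushing capabilities forward has expected value

$$\mathbb{E}[\text{value}] \approx p \cdot 1 - (1 - p) \cdot N,$$

which is negative whenever $1 - p > 1/N$; with $p = 0.7$ and any astronomically large $N$, the downside dominates.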
So even if we have a 50-70% chance of alignment by default, I don't see how your central conclusions follow.
I'll make another version of the thought experiment: suppose you can get a genetic upgrade that gives you +1000 utils with 70% probability, or -1000 utils with 30% probability.
Should you take it?
The answer is yes: in expectation, it gives you +400 utils.
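Spelled out, the calculation is

$$\mathbb{E}[U] = 0.7 \times (+1000) + 0.3 \times (-1000) = 700 - 300 = +400.$$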
This is related to a general principle: as long as the probability of the positive outcome is over 50% and the magnitudes of the costs and benefits are symmetrical, the activity is positive in expectation.
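In symbols (with $B$ the symmetric payoff magnitude and $p$ the probability of the good outcome):

$$\mathbb{E}[U] = p \cdot B - (1 - p) \cdot B = (2p - 1)\,B,$$

which is positive exactly when $p > 0.5$.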
And my contention is that AGI/ASI is just a larger version of the thought experiment above: AGI/ASI is a symmetric technology with respect to good and bad outcomes, which is why it's okay to increase capabilities.