Edit: Changed the title.
Or, why I no longer agree with the standard LW position on AI.
This is an unusual post compared to what LW usually publishes on AI.
Much of this depends on a few posts that changed my worldview on AI risk; they are linked below:
The deceptive alignment skepticism sequence, especially the second post in the sequence:
Evidence of the natural abstractions hypothesis in action:
https://www.lesswrong.com/posts/BdfQMrtuL8wNfpfnF/natural-categories-update
https://www.lesswrong.com/posts/obht9QqMDMNLwhPQS/asot-natural-abstractions-and-alphazero#comments
Summary: The big update I made is that deceptive alignment is far more unlikely than I thought. Given that deceptive alignment was a big part of my model of how AI risk would happen (about 30-60% of my probability mass was on that failure mode), removing it takes enough of a bite out of the probability of extinction to make increasing AI capabilities positive in expected value. Combine this with the evidence that at least some form of the natural abstractions hypothesis is being borne out empirically, and I now think the probability of AI risk has steeply declined to only 0.1-10%, and that remaining probability mass is plausibly reducible to ridiculously low numbers by going to the stars and speeding up technological progress.
In other words, I now believe there is a significant probability, on the order of 50-70%, that alignment is solved by default.
EDIT: While I explained in response to Shiminux why I increased my confidence in alignment by default, I now believe I was overconfident about the precise probabilities for alignment by default.
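To make the arithmetic behind that update concrete, here is a minimal sketch in Python. All of the specific numbers (the 0.30 prior, the 0.45 share, the 0.35 discount) are hypothetical placeholders chosen only so the output lands inside the ranges quoted above; the structure of the calculation is the point, not the values.

```python
# A purely illustrative sketch of the two-step update described above.
# Every number is a hypothetical chosen to land inside the ranges quoted in
# this post; none of them is an estimate I am defending.

p_doom_prior = 0.30      # hypothetical prior probability of AI-caused extinction
share_deceptive = 0.45   # midpoint of the 30-60% of risk I placed on deceptive alignment
nah_discount = 0.35      # hypothetical further discount from natural-abstractions evidence

# Step 1: remove the deceptive-alignment failure mode from the risk estimate.
p_after_deceptive = p_doom_prior * (1 - share_deceptive)

# Step 2: discount what remains, given evidence for natural abstractions.
p_doom_posterior = p_after_deceptive * nah_discount

print(f"prior p(doom):            {p_doom_prior:.2f}")
print(f"after dropping deception: {p_after_deceptive:.2f}")
print(f"posterior p(doom):        {p_doom_posterior:.2f}")  # ~0.06, inside the 0.1-10% range
```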
What implications does this have, if this rosy picture is correct?
The biggest implication is that technological progress looks far more positive than most LWers and the general public believe.
This also implies a purpose shift for LessWrong. For arguably 20 years, the site has focused on AI risk, though interest exploded once LLMs and real AI capabilities were released.
What it shifts to matters, but if this rosy model of alignment is correct, I'd argue a significant part of the field of AI alignment can and should change its purpose to something else.
As for LessWrong, I'd say we should probably focus more on progress studies, in the vein of Jason Crawford's work, and on inadequate equilibria and how to change them.
I welcome criticism and discussion of this post, due to its huge implications for LW.
You're choosing certain death for 32% of currently living humans. Or at least, for the humans still alive [some medical research interval] after the AGI delay decision is made.
The [medical research interval] is the time required, with massively parallel research, for a network of AGI systems to learn which medical interventions will prevent most forms of human death, from injury and aging. The economic motivation for a company to research this is obvious.
Delaying AGI means choosing to shift the time until [humans start living out their MTBF given perfect bodies and only accidents and murder, which is thousands of years] 20 years into the future.
Note also that cryonics could be made to work, with clear and convincing evidence including revival of lab mammals, probably within a short time. That [research interval until working cryo] might be months.
Personally, as a member of that subgroup, I'd need the reduction in the odds of misaligned AI over that 20-year period to be greater than 32%, or it isn't worth it. Essentially, you'd have to show p(doom) really was almost 1.0 to justify such a long delay.
Basically you would have to build AGIs and show they all inherently collaborate with each other to kill us by default. Too few people are convinced by EY, even if he is correct.
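A rough expected-value sketch of that threshold, in Python: the 32% mortality figure and the 20-year delay are taken from the comment above, while the population and the assumed risk reduction are placeholder values, and only currently living humans are counted.

```python
# Rough sketch of the tradeoff described above. The 32% mortality figure is
# taken as given from this comment; the population and the assumed reduction
# in extinction risk are placeholder assumptions. Only currently living humans
# are counted, as in the comment.

population = 8e9              # approximate number of currently living humans
mortality_if_delayed = 0.32   # fraction expected to die during a 20-year AGI delay
delta_p_doom = 0.10           # assumed reduction in extinction risk bought by the delay

# Expected deaths attributable to delaying (people who, under the comment's
# optimistic medical assumptions, AGI-era medicine would otherwise have saved).
expected_cost = mortality_if_delayed * population

# Expected deaths averted by the delay (reduction in extinction risk times everyone).
expected_benefit = delta_p_doom * population

print(f"expected cost of delay:    {expected_cost:.2e} lives")
print(f"expected benefit of delay: {expected_benefit:.2e} lives")
print("delay worth it?", expected_benefit > expected_cost)
# Since both sides scale with the same population, the delay only pays off
# when delta_p_doom exceeds 0.32 -- the threshold stated above.
```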