(Cross-posted from Twitter, and therefore optimized somewhat for simplicity.)
Recent discussions of AI x-risk in places like Twitter tend to focus on "are you in the Rightthink Tribe, or the Wrongthink Tribe?". Are you a doomer? An accelerationist? An EA? A techno-optimist?
I'm pretty sure these discussions would go way better if they looked less like that. More concrete claims, details, and probabilities; fewer vague slogans and vague expressions of certainty.
As a start, I made this image (also available as a Google Drawing):
(Added: Web version made by Tetraspace.)
I obviously left out lots of other important and interesting questions, but I think this is OK as a conversation-starter. I've encouraged Twitter regulars to share their own versions of this image, or similar images, as a nucleus for conversation (and a way to directly clarify what people's actual views are, beyond the stereotypes and slogans).
If you want to see a filled-out example, here's mine (though you may not want to look if you prefer to give answers that are less anchored): Google Drawing link.
Summary: your model likely isn't factoring in a large number of STEM+ capable systems being developed around the same time period, which has happened many times in the history of past innovations, nor people's natural preference for more reliable tools. I think you are also neglecting how slowly governments, militaries, and industry update anything, which would keep important equipment out of the hands of the most advanced and least reliable AI models. Finally, I think you are imagining training and benchmark tasks very different from today's, where power seeking is part of the task environment and is required for a competitive score. (Concrete example: "beat Minecraft".)
So, just to break down your claims a bit.
Do you have any more information on why you believe this? Do any current models seek power? Can you explain how you think the training environment works such that it rewards power seeking? I am imagining some huge benchmark that humans are endlessly adding fresh tasks to; how are you imagining it working? Does the model get rewarded globally for an increased score? Did humans not include a term for efficiency in the reward function?
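To make the efficiency-term question concrete, here is a minimal sketch (all names hypothetical, not from any real benchmark) of the kind of reward function being assumed: a raw task score minus a cost proportional to resources consumed, so that grabbing extra resources lowers the score rather than raising it.

```python
# Hypothetical reward shaping with an efficiency term.
# An agent that finishes the task while using fewer resources
# (compute, steps, tool calls) scores strictly higher, so
# unnecessary resource acquisition is directly penalized.

def task_reward(task_score: float, resources_used: float,
                efficiency_weight: float = 0.1) -> float:
    """Reward = raw task score minus a cost for resources used."""
    return task_score - efficiency_weight * resources_used

# Two agents that both solve the task, one frugally and one greedily:
frugal = task_reward(task_score=1.0, resources_used=2.0)
greedy = task_reward(task_score=1.0, resources_used=10.0)
assert frugal > greedy  # the frugal agent is preferred
```

Under a reward like this, power seeking only pays if the extra score it buys exceeds the efficiency penalty, which is the crux of the question above.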
If this is true, why does everyone keep trying to optimize AI? Smaller models trade off everything for benchmark performance. Am I wrong to think a smaller model has likely lost generality and power seeking to fit within a weight budget? That a 7B model fundamentally has less room for unwanted behavior?
A world where humans don't prefer power-seeking models would be one where most of the atoms belong to myopic models, not from coordination but from self-interest.
How do you explain that military equipment doesn't work this way now? A lot of it uses private dedicated networks and older, well-tested technology.
It seems like all three claims need to be true for power-seeking AI to be able to endanger the world; do you agree with that? I tried to break down your claim into sub-claims; if you think there is a different breakdown, let me know.