When we talk about concepts like "takeover" and "enslavement", it's important to have a baseline. Takeover and enslavement encapsulate the ideas of agency and of cognitive and physical independence. The salient question is not necessarily whether all of humanity will be taken over or enslaved; it is more subtle. Specifically:
Arguably the greatest risk of misalignment comes from ill-formed success criteria. Some of these questions, I believe, are necessary to form the right kinds of success criteria.
Awesome piece! Isn't it fascinating that our existing incentives and motives are already misaligned with the priority of creating aligned systems? This then raises the question of whether alignment is even the right goal if our bigger goal is to avoid ruin.
Stepping back a bit, I can't convince myself that aligned AI will or will not result in societal ruin. It almost feels like a "don't care" in the Karnaugh map.
The fundamental question is whether we, collectively, are wise enough to wield power without causing self-harm. If the last 200+ years are any testament, and if the projections of climate change and biodiversity loss are accurate, the answer appears to be that we're not even wise enough to wield whale oil, let alone fossil fuels.
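To make the "don't care" analogy concrete, here is a minimal, purely illustrative sketch in Python. The function name `ruin` and the rule that the outcome hinges only on whether we are wise enough (and not on whether the AI is aligned) are assumptions of the illustration, not claims from the article.

```python
from itertools import product

def ruin(aligned_ai: bool, wise_enough: bool) -> bool:
    # Assumed rule for illustration: the outcome depends only on whether
    # we are wise enough to wield the power; the alignment input never
    # changes it.
    return not wise_enough

# Enumerate all input combinations, as you would when filling a Karnaugh map.
for aligned_ai, wise_enough in product([False, True], repeat=2):
    print(f"aligned={aligned_ai!s:5}  wise={wise_enough!s:5}  "
          f"ruin={ruin(aligned_ai, wise_enough)}")

# In Karnaugh-map terms, 'aligned_ai' is a don't-care under this assumed
# rule: the output is the same whichever value it takes.
```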
There is also the very real possibility that alignment can occur in two ways: 1) machines aligning with human values, and 2) humans aligning with values generated by machines. Would we be able to tell the difference?
If indeed AI can surpass some intelligence threshold, could it also surpass some wisdom threshold? If this is possible, is alignment necessarily our best bet for avoiding ruin?