The arguments by Bostrom, Yudkowsky and others can be summarised as follows:
1. Superintelligence is possible
2. We don't know how to align a superintelligence
3. An unaligned superintelligence could be catastrophically dangerous
I'm not sure whether premise 1 is falsifiable, but it is provable. If someone develops an AI with greater intelligence than a human, discovers an alien with the same, or proves through information theory or other scientific knowledge that greater-than-human intelligence is possible, then premise 1 is proven. (Someone more qualified than me: has this already been proven?)
Premise 2 is falsifiable: if you can prove that some method will safely align a superintelligence, then you have disproved the claim. To date, no one understands intelligence well enough to produce such a proof, despite a lot of effort by people like Yudkowsky, but the claim is falsifiable in principle.
Admittedly premise 3 is less falsifiable, because it's a claim about risk (an unaligned superintelligence could be catastrophically dangerous, not definitely will be). But to reject premise 3 you have to believe that an unaligned superintelligence is definitely safe: either you claim that no superintelligence, whatever its alignment, will ever be dangerous, or you claim that humanity will always be able to restrain a rogue superintelligence. Neither of those is the sort of claim you could reasonably hold with 100% certainty.
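One rough way to make that structure explicit (my own sketch, not a formulation from Bostrom or Yudkowsky):

```latex
% Premise 3 is a risk claim, not a certainty claim:
\Pr(\text{catastrophe} \mid \text{unaligned superintelligence}) > 0
% Rejecting it outright therefore commits you to a certainty-of-safety claim:
\Pr(\text{catastrophe} \mid \text{unaligned superintelligence}) = 0
```

Anything short of asserting that second line isn't really a rejection of premise 3, just a disagreement about how big the probability is.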
At this point, we're down to debates about how large the risk is, and IMO that explains why Yudkowsky and Bostrom give lots of different scenarios, as a counter-argument to people who want to assume that only certain narrow paths lead to catastrophe.
Thank you for those examples. I think this shows that the way I used a utility function without placing it in a 'real' situation (i.e. one that isn't locked off, where there are viable alternative actions that each carry some utility) is a fallacy.
I suppose, then, that I conflated the “What can I know?” with the “What must I do?”; separating a belief from its associated action (I think) resolves most of the conflicts that I saw.
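To make the point about alternative actions concrete, here's a minimal sketch (toy action names and numbers of my own, not taken from the discussion above): a utility function only guides a choice when it is compared across all the actions actually on the table, including doing nothing.

```python
def expected_utility(outcomes):
    """Expected utility of an action given (probability, utility) pairs."""
    return sum(p * u for p, u in outcomes)

# Hypothetical actions, each with a distribution over outcome utilities.
actions = {
    "risky_bet":   [(0.1, 100.0), (0.9, -10.0)],  # small chance of a big win
    "safe_option": [(1.0, 5.0)],                   # modest but certain payoff
    "do_nothing":  [(1.0, 0.0)],                   # always an alternative
}

for name, outcomes in actions.items():
    print(f"{name}: EU = {expected_utility(outcomes):.1f}")

best = max(actions, key=lambda a: expected_utility(actions[a]))
print("chosen action:", best)

# In a "locked-off" situation the dictionary holds a single action, so the
# argmax is trivially that action whatever its utility -- which is exactly
# the trap of evaluating a utility function without real alternatives.
```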