Hi folks,
My supervisor and I co-authored a philosophy paper on the argument that AI represents an existential risk. That paper has just been published in Ratio. We figured LessWrong readers would be able to catch things in it that we might have missed and, either way, we hope it might provoke a conversation.
We reconstructed what we take to be the argument for how AI becomes an xrisk as follows:
- The "Singularity" Claim: Artificial Superintelligence (ASI) is possible and would be out of human control.
- The Orthogonality Thesis: More or less any level of intelligence is compatible with more or less any final goal (as per Bostrom's 2014 definition).
From the conjunction of these two premises, we can conclude that ASI is possible, that it might have a goal, instrumental or final, which is at odds with human existence, and, given that the ASI would be out of our control, that the ASI is an xrisk.
We then suggested that each premise seems to assume a different interpretation of "intelligence", namely:
- The "Singularity" claim assumes general intelligence
- The Orthogonality Thesis assumes instrumental intelligence
If this is the case, then the premises cannot be joined together as the original argument requires, since it equivocates on "intelligence"; that is, the argument is invalid.
We note that this does not mean that AI or ASI is not an xrisk, only that the current argument to that end, as we have reconstructed it, is invalid.
Eagerly, earnestly, and gratefully looking forward to any responses.
I think we're in a sort of weird part of concept-space where we're thinking both about absolutes ("all X are Y", disproved by exhibiting an X that is not Y) and distributions ("the connection between goals and intelligence is normally accidental instead of necessary"). I think this counterexample targets a part of the paper that's trying to make a distributional claim rather than an absolute one.
Roughly, their argument as I understand it is:
I think I differ on the 3rd point a little (as discussed in more depth here), but roughly agree that the situation we're in probably isn't as bad as the "AIXI-tl with a random utility function implemented on a hypercomputer" world, for structural reasons that make this an unconvincing counterexample.
Like, in my view, much of the work of "why be worried about the transition instead of blasé?" is done by stuff like Value is Fragile, which isn't really part of the standard argument as they're describing it here.