Hi folks,
My supervisor and I co-authored a philosophy paper on the argument that AI represents an existential risk. That paper has just been published in Ratio. We figured LessWrong would be able to catch things in it which we might have missed and, either way, hope it might provoke a conversation.
We reconstructed what we take to be the argument for how AI becomes an xrisk as follows:
- The "Singularity" Claim: Artificial Superintelligence is possible and would be out of human control.
- The Orthogonality Thesis: More or less any level of intelligence is compatible with more or less any final goal (as per Bostrom's 2014 definition).
From the conjunction of these two premises, we can conclude that ASI is possible, that it might have a goal, instrumental or final, which is at odds with human existence, and, given that the ASI would be out of our control, that the ASI is an xrisk.
We then suggested that each premise seems to assume a different interpretation of "intelligence", namely:
- The "Singularity" claim assumes general intelligence
- The Orthogonality Thesis assumes instrumental intelligence
If this is the case, then the premises cannot be joined together in the original argument, i.e. the argument is invalid.
We note that this does not mean that AI or ASI is not an xrisk, only that the current argument to that end, as we have reconstructed it, is invalid.
Eagerly, earnestly, and gratefully looking forward to any responses.
I don't think this is true, and have two different main lines of argument / intuition pumps. I'll save the other for a later section where it fits better.
Are there several different reflectively stable moral equilibria, or only one? For example, it might be possible to have a consistent philosophically stable egoistic worldview, and also possible to have a consistent philosophically stable altruistic worldview. In this lens, the orthogonality thesis is the claim that there are at least two such stable equilibria and which equilibrium you end up in isn't related to intelligence. [Some people might be egoists because they don't realize that other people have inner lives, and increased intelligence unlocks their latent altruism, but some people might just not care about other people in a way that makes them egoists, and making them 'smarter' doesn't have to touch that.]
For example, you might imagine an American nationalist and a Chinese nationalist, both remaining nationalistic as they become more intelligent, and never switching which nation they like more, because that choice was for historical reasons instead of logical ones. If you imagine that, no, at some intelligence threshold they have to discard their nationalism, then you need to make that case in opposition to the orthogonality thesis.
For some goals, I do think it's the case that at some intelligence threshold you have to discard them, hence the 'more or less', and I think many more 'goals' are unstable, where the more you think about them, the more they dissolve and are replaced by one of the stable attractors. For example, you might imagine it's the case that you can have reflectively stable nationalists who eat meat and universalists who are vegan, but any universalists who eat meat are not reflectively stable, where either they realize their arguments for eating meat imply nationalism or their arguments against nationalism imply not eating meat. [Or maybe the middle position is reflectively stable, idk.]
In this view, the existential risk argument is less "humans will be killed by robots and that's sad" and more "our choice of superintelligence to build will decide what color the lightcone explosion is and some of those possibilities are as bad as or worse than all humans dying, and differences between colors might be colossally important." [For example, some philosophers today think that uploading human brains to silicon substrates will murder them / eliminate their moral value; it seems important for the system colonizing the galaxies to get that right! Some philosophers think that factory farming is immensely bad, and getting questions like that right before you hit copy-paste billions of times seems important.]