Yep. Now keep in mind that the CIA, or whatever other agency you care to bring to bear, is staffed with humans — fallible humans, the same sorts of agents who can be brought in remarkable numbers to defend a religion. The same sorts of agents who have at least once¹, and possibly twice², come within a single human decision of destroying the world for reasons that were later better classified as mistakes, or narrowly-averted disasters.
Given the fact that an agency full of humans is convinced that a given bunch of AGI-tators are within epsilon of dooming the world, what is the chance that they are right? And what is the chance that they have misconceived the situation such that by pulling the trigger, they will create an even worse situation?
My point isn't some sort of hippy-dippy pacifism. My point is: Humans — all of us; you, me, the CIA — are running on corrupted hardware. At some point when we make a severe decision, one that goes against some well-learned rules such as not-killing, we have to take into account that almost everyone who's ever been in that situation has been making a bad decision.
¹ Stanislav Petrov; 26 September 1983
² Jack Kennedy; Cuban Missile Crisis, October 1962
Given the fact that an agency full of humans is convinced that a given bunch of AGI-tators are within epsilon of dooming the world, what is the chance that they are right?
Fairly high. This is a far simpler situation than dealing with foreign powers. Raiding the research centre to investigate is a straightforward task. While they are in no place to evaluate friendliness themselves they are certainly capable of working out whether there is AI code that is about to be run - either by looking around or interrogating. Bear in mind that if it comes down to &q...
It's probably easier to build an uncaring AI than a friendly one. So, if we assume that someone, somewhere is trying to build an AI without solving friendliness, that person will probably finish before someone who's trying to build a friendly AI.
[redacted]
[redacted]
further edit:
Wow, this is getting a rather stronger reaction than I'd anticipated. Clarification: I'm not suggesting practical measures that should be implemented. Jeez. I'm deep in an armchair, thinking about a problem that (for the moment) looks very hypothetical.
For future reference, how should I have gone about asking this question without seeming like I want to mobilize the Turing Police?