So I submit the only useful questions we can ask are not about AGI, "goals", and other such anthropomorphic, infeasible, irrelevant, and/or hopelessly vague ideas. We can only usefully ask computer security questions. For example some researchers I know believe we can achieve virus-safe computing. If we can achieve security against malware as strong as we can achieve for symmetric key cryptography, then it doesn't matter how smart the software is or what goals it has: if one-way functions exist no computational entity, classical or quantum, can crack symmetric key crypto based on said functions. And if NP-hard public key crypto exists, similarly for public key crypto. These and other security issues, and in particular the security of property rights, are the only real issues here and the rest is BS.
-- Nick Szabo
Nick Szabo and I have very similar backgrounds and interests. We both majored in computer science at the University of Washington. We're both very interested in economics and security. We came up with similar ideas about digital money. So why don't I advocate working on security problems while ignoring AGI, goals and Friendliness?
In fact, I once did think that working on security was the best way to push the future towards a positive Singularity and away from a negative one. I started working on my Crypto++ Library shortly after reading Vernor Vinge's A Fire Upon the Deep. I believe it was the first general purpose open source cryptography library, and it's still one of the most popular. (Studying cryptography led me to become involved in the Cypherpunks community with its emphasis on privacy and freedom from government intrusion, but a major reason for me to become interested in cryptography in the first place was a desire to help increase security against future entities similar to the Blight described in Vinge's novel.)
I've since changed my mind, for two reasons.
1. The economics of security seems very unfavorable to the defense, in every field except cryptography.
Studying cryptography gave me hope that improving security could make a difference. But in every other security field, both physical and virtual, little progress is apparent, certainly not enough that humans might hope to defend their property rights against smarter intelligences. Achieving "security against malware as strong as we can achieve for symmetric key cryptography" seems quite hopeless in particular. Nick links above to a 2004 technical report titled "Polaris: Virus Safe Computing for Windows XP", which is strange considering that it's now 2012 and malware has little trouble with the latest operating systems and their defenses. Also striking to me has been the fact that even dedicated security software like OpenSSH and OpenSSL has had design and coding flaws that introduced security holes to the systems that run them.
One way to think about Friendly AI is that it's an offensive approach to the problem of security (i.e., take over the world), instead of a defensive one.
2. Solving the problem of security at a sufficient level of generality requires understanding goals, and is essentially equivalent to solving Friendliness.
What does it mean to have "secure property rights", anyway? If I build an impregnable fortress around me, but an Unfriendly AI causes me to give up my goals in favor of its own by crafting a philosophical argument that is extremely convincing to me but wrong (or more generally, subverts my motivational system in some way), have I retained my "property rights"? What if it does the same to one of my robot servants, so that it subtly starts serving the UFAI's interests while thinking it's still serving mine? How does one define whether a human or an AI has been "subverted" or is "secure", without reference to its "goals"? It became apparent to me that fully solving security is not very different from solving Friendliness.
I would be very interested to know what Nick (and others taking a similar position) thinks after reading the above, or if they've already had similar thoughts but still came to their current conclusions.
A sufficient cause for Nick to claim this would be that he believed that no human-conceivable AI design would be able to incorporate by any means, including by reasoning from first principles or even by reference, anything functionally equivalent to the results of all the various dynamics of updating that have (for instance) made present legal systems as (relatively) robust (against currently engineerable methods of exploitation) as they are.
This seems somewhat strange to you, because you believe humans can conceive of AI designs that could reason some things from first principles (given observations of the world that the reasoning needed to be relevant to, plus reasonably anticipatable advantages of computing power over single humans) or incorporate results by reference.
One possible reason he might believe this would be that he believed that, whenever a human reasons about history or evolved institutions, there are something like two distinct levels of a computational complexity hierarchy at work, and that the powers of the greater level (history and the evolution of institutions) are completely inaccessible to the powers of the lesser level (the human). (The machines representing the two levels in this case might be "the mental states accessible to a single armchair philosophy community", or, alternatively, "fledgling AI which, per a priori economic intuition, has no advantage over a few philosophers", versus "the physical states accessible in human history".)
This belief of his might be charged with a sort of independent half-intuitive aversion to making the sorts of (frequently catastrophic) mistakes that are routinely made by people who think they can metaphorically breach this complexity barrier. One effect of such an aversion would be that he would intuitively anticipate that he would always be, at least in expected value, wrong to agree with such people, no matter what arguments they could turn out to have. That is, it wouldn't increase his expected rightness to check to see if they were right about some proposed procedure to get around the complexity barrier, because, intuitively, the prior probability that they were wrong, the conditional probability that they would still be wrong despite being persuasive by any conventional threshold, and the size of the cost that had empirically been inflicted on the world by mistakes of that sort, would all be so high. (I took his reference to Hayek's Fatal Conceit, and the general indirect and implicitly argued emotional dynamic of this interaction, to be confirmation of this intuitive aversion.) By describing this effect explicitly, I don't mean to completely psychologize here, or make a status move by objectification. Intuitions like the one I'm attributing can (and very much should!), of course, be raised to the level of verbally presented propositions, and argued for explicitly.
(For what it's worth, the most direct counter to the complexity argument expressed this way is: "with enough effort it is almost certainly possible, even from this side of the barrier, to formalize how to set into motion entities that would be on the other side of the barrier". To cover the pragmatics of the argument, one would also need to add: "and agreeing that this amount of effort is possible can even be safe, so long as everyone who heard of your agreement was sufficiently strongly motivated not to attempt shortcuts".)
Another, possibly overlapping reason would have to do with the meta level that people around here normally imagine approaching AI safety problems from -- that being, "don't even bother trying to invent all the required philosophy yourself; instead do your best to try to formalize how to mechanically refer to the process that generated, and could continue to generate, something equivalent to the necessary philosophy, so as to make that process happen better or at least to maximally stay out of its way" ("even if this formalization turns out to be very hard to do, as the alternatives are even worse"). That meta level might be one that he doesn't really think of as even being possible. One possible reason for this would be that he wasn't aware that anyone actually ever meant to refer to a meta level that high, so that he never developed a separate concept for it. Perhaps when he first encountered e.g. Eliezer's account of the AI safety philosophy/engineering problem, the concept he came away with was based on a filled-in assumption about the default mistake that Eliezer must have made and the consequent meta level at which Eliezer meant to propose that the problem should be attacked, and that meta level was far too low for success to be conceivable, and he didn't afterwards ever spontaneously find any reason to suppose you or Eliezer might not have made that mistake. Another possible reason would be that he disbelieved, on the above-mentioned a priori grounds, that the proposed meta level was possible at all. (Or, at least, that it could ever be safe to believe that it was possible, given the horrors perpetrated and threatened by other people who were comparably confident in their reasons for believing similar things.)