Work on Security Instead of Friendliness?

Wei Dai

So I submit the only useful questions we can ask are not about AGI, "goals", and other such anthropomorphic, infeasible, irrelevant, and/or hopelessly vague ideas. We can only usefully ask computer security questions. For example some researchers I know believe we can achieve virus-safe computing. If we can achieve security against malware as strong as we can achieve for symmetric key cryptography, then it doesn't matter how smart the software is or what goals it has: if one-way functions exist no computational entity, classical or quantum, can crack symmetric key crypto based on said functions. And if NP-hard public key crypto exists, similarly for public key crypto. These and other security issues, and in particular the security of property rights, are the only real issues here and the rest is BS.

-- Nick Szabo

Nick Szabo and I have very similar backrounds and interests. We both majored in computer science at the University of Washington. We're both very interested in economics and security. We came up with similar ideas about digital money. So why don't I advocate working on security problems while ignoring AGI, goals and Friendliness?

In fact, I once did think that working on security was the best way to push the future towards a positive Singularity and away from a negative one. I started working on my Crypto++ Library shortly after reading Vernor Vinge's A Fire Upon the Deep. I believe it was the first general purpose open source cryptography library, and it's still one of the most popular. (Studying cryptography led me to become involved in the Cypherpunks community with its emphasis on privacy and freedom from government intrusion, but a major reason for me to become interested in cryptography in the first place was a desire to help increase security against future entities similar to the Blight described in Vinge's novel.)

I've since changed my mind, for two reasons.

1. The economics of security seems very unfavorable to the defense, in every field except cryptography.

Studying cryptography gave me hope that improving security could make a difference. But in every other security field, both physical and virtual, little progress is apparent, certainly not enough that humans might hope to defend their property rights against smarter intelligences. Achieving "security against malware as strong as we can achieve for symmetric key cryptography" seems quite hopeless in particular. Nick links above to a 2004 technical report titled "Polaris: Virus Safe Computing for Windows XP", which is strange considering that it's now 2012 and malware have little trouble with the latest operating systems and their defenses. Also striking to me has been the fact that even dedicated security software like OpenSSH and OpenSSL have had design and coding flaws that introduced security holes to the systems that run them.

One way to think about Friendly AI is that it's an offensive approach to the problem of security (i.e., take over the world), instead of a defensive one.

2. Solving the problem of security at a sufficient level of generality requires understanding goals, and is essentially equivalent to solving Friendliness.

What does it mean to have "secure property rights", anyway? If I build an impregnable fortress around me, but an Unfriendly AI causes me to give up my goals in favor of its own by crafting a philosophical argument that is extremely convincing to me but wrong (or more generally, subverts my motivational system in some way), have I retained my "property rights"? What if it does the same to one of my robot servants, so that it subtly starts serving the UFAI's interests while thinking it's still serving mine? How does one define whether a human or an AI has been "subverted" or is "secure", without reference to its "goals"? It became apparent to me that fully solving security is not very different from solving Friendliness.

I would be very interested to know what Nick (and others taking a similar position) thinks after reading the above, or if they've already had similar thoughts but still came to their current conclusions.

So I submit the only useful questions we can ask are not about AGI, "goals", and other such anthropomorphic, infeasible, irrelevant, and/or hopelessly vague ideas. We can only usefully ask computer security questions. For example some researchers I know believe we can achieve virus-safe computing. If we can achieve security against malware as strong as we can achieve for symmetric key cryptography, then it doesn't matter how smart the software is or what goals it has: if one-way functions exist no computational entity, classical or quantum, can crack symmetric key crypto based on said functions. And if NP-hard public key crypto exists, similarly for public key crypto. These and other security issues, and in particular the security of property rights, are the only real issues here and the rest is BS.

-- Nick Szabo

I've since changed my mind, for two reasons.

1. The economics of security seems very unfavorable to the defense, in every field except cryptography.

One way to think about Friendly AI is that it's an offensive approach to the problem of security (i.e., take over the world), instead of a defensive one.

2. Solving the problem of security at a sufficient level of generality requires understanding goals, and is essentially equivalent to solving Friendliness.

The reason UDT is called "updateless" is that it doesn't eliminate or change weight of any of the possible worlds. You might want to re-read the UDT post to better understand it.

A particular instance of UDT running particular execution history got to condition on this execution history; you can say that you call conditioning what I call updates; in practice you will want not to run the computations irrelevant to the particular machine, and you will have strictly less computing power in the machine than in the universe it inhabits including the machine itself. It would be good if you could provide example of experimentations it might perform, somewhat formally derived. It feels to me that while it is valuable that you formalized some of the notions you largely have shifted/renamed all the actual problems.

E.g. it is problematic to specify utility function on reality, its incoherent. In your case the utility function is specified on all mathematically representable theories, which may well not allow to actually value a paperclip. Plus the number of potential paperclips within a theory would grow larger than any computable function of size of the theory, and the actions may well be dominated by relatively small, but absolutely enormous, differences between huge theories. Can you make actual example of some utility function? It doesn't have to correspond to paperclips - anything so that UDT with this plugged in would actually do something to our reality rather than the imaginary BusyBeaver(100) beings with imaginary dustspecks in their eyes which might be running a boxed sim of our world.

With regards to Ben Goertzel, where does his AGI include anything like this not so vague utility function of yours? The marketing spiel in question is, indeed, that Ben Goertzel's AI (or someone else's) would maximize an utility function and kill everyone or something, which leads me to assume that they are not talking of your utility function.

With regards to neuromorphic AGIs, I think there's far too much science fiction and far too little understanding of neurology in the rationalization of 'why am I getting paid'. While I do not doubt that brain does implement some sort of 'master trick' in, perhaps, every cortical column, there is an elaborate system for motivating this whole, and that system quite thoroughly fails to care about the real world state, in deed. And once again, why do you think neuromorphic AGIs would have the sort of values of real world as per UDT?

edit: furthermore it seems fairly preposterous to assume high probability that your utility function will actually be implemented in a working manner - say, paperclip maximizing manner - by people who really need SI to tell them to beware of creating skynet. SI is the typical 'high level idea guys' with a belief that the tech guys much smarter than them in fact are specialized in lowly stuff and need the high level idea guys to provide philosophical guidance or else we all die. Incredibly common sight in startups that should never have started up (and fail invariably).

With regards to Ben Goertzel, where does his AGI include anything like this not so vague utility function of yours?

You seem to think that I'm claiming that UDT's notion of utility function is the only way real-world goals might be implemented in an AGI. I'm instead suggesting that it is one way to do so. It currently seems to be the most promising approach for FAI, but I certainly wouldn't say that only AIs using UDT can be said to have real-world goals.

At this point I'm wondering if Nick's complaint of vagueness was about this more general usage of &... (read more)

71

Work on Security Instead of Friendliness?

71

71

71

Work on Security Instead of Friendliness?

71

71