Gunnar_Zarncke comments on Debunking Fallacies in the Theory of AI Motivation - Less Wrong
With all of the above in mind, a quick survey of some of the things that you just said, with my explanation for why each one would not (or probably would not) be as much of an issue as you think:
For a massive-weak-constraint system, psychological manipulation would be automatically understood to be in the forceful category, because the concept of "psychological manipulation" is defined by a cluster of features that involve intentional deception, and since the "friendliness" concept would ALSO involve a cluster of weak constraints, it would include the extended idea of intentional deception. It would have to, because intentional deception is connected to doing harm, which is connected to unfriendliness, and so on.
Conclusion: that is not really an "edge" case in the sense that someone has to explicitly remember to deal with it.
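To make the "cluster of weak constraints" idea concrete, here is a toy sketch of my own; the feature names and weights are invented purely for illustration, not taken from any actual system:

```python
# Toy sketch only: concepts as clusters of weighted features; all names
# and weights here are invented for illustration.

CONCEPTS = {
    "manipulation": {"intentional_deception": 0.9, "hidden_agenda": 0.7,
                     "exploits_trust": 0.8},
    "doing_harm":   {"intentional_deception": 0.6, "causes_suffering": 0.9,
                     "exploits_trust": 0.5},
    "unfriendly":   {"doing_harm": 0.9, "manipulation": 0.8,
                     "coercion": 0.7},
}

def score(features, cluster):
    """Graded membership: fraction of the cluster's total weight that the
    active features account for. No single feature is decisive."""
    active = sum(w for f, w in cluster.items() if features.get(f, 0.0) > 0.5)
    return active / sum(cluster.values())

plan = {"intentional_deception": 1.0, "exploits_trust": 1.0}
m = score(plan, CONCEPTS["manipulation"])   # ~0.71: strongly manipulative
h = score(plan, CONCEPTS["doing_harm"])     # ~0.55: shared features carry over
# Because "unfriendly" itself lists manipulation and harm among its
# features, the plan lights up "unfriendly" without anyone having had
# to enumerate "manipulation" as an explicit edge case:
u = score({"manipulation": m, "doing_harm": h}, CONCEPTS["unfriendly"])
print(round(m, 2), round(h, 2), round(u, 2))   # 0.71 0.55 0.71
```

The point of the cartoon is that membership propagates through the overlapping clusters on its own; no one has to write a special-case rule for manipulation.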
We will not need to 'understand' the AGI's concept space too much, if both we and the AGI are using massive weak constraints with convergent semantics. I have already addressed this point in more detail.
What you are talking about here is the idea of simulating a human to predict their response. Now, humans already do this in a massive way, and they do not do it by making gigantic simulations, but just by doing simple modeling. And, crucially, they rely on the massive-weak-constraints-with-convergent-semantics relationship (you can see now why I need to coin the concise term "Swarm Relaxation") between the self and other minds to keep the problem manageable.
That particular idea - of predicting human response - was not critical to the argument that followed, however.
No, we would not have to solve a FAI-complete problem to do it. We will be developing the AGI from a baby state up to adulthood, keeping its motivation system in sync all the way up and looking for deviations. So, in other words, we would not need to FIRST build the AGI (with potentially dangerous alien semantics), THEN do a translation between the two semantic systems, THEN go back and use the translation to reconstruct the motivation system of the AGI to make sure it is safe.
Much more could be said about the process of "growing" and "monitoring" the AGI during the development period, but suffice it to say that this process is extremely different if you have a Swarm Relaxation system vs. a logical system of the sort your words imply.
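As a purely schematic cartoon of that growth-and-monitoring loop (every quantity below is invented; this sketches the shape of the process, not a design):

```python
import random

# Cartoon of the grow-and-monitor loop (all quantities invented):
# capability grows in small increments while a fixed battery of probe
# scenarios checks that the motivation responses stay in sync throughout.

random.seed(0)

PROBES = [[1, 0], [0, 1], [1, 1], [0.5, 1], [1, 0.5]]

def probe_responses(motivation):
    """Responses to fixed probe scenarios, as simple dot products."""
    return [sum(m * p for m, p in zip(motivation, probe)) for probe in PROBES]

motivation = [1.0, 1.0]                    # baby-state motivation parameters
baseline = probe_responses(motivation)

for stage in range(10):                    # ten growth increments
    # Each increment nudges motivation slightly (training side effects).
    candidate = [m + random.gauss(0, 0.02) for m in motivation]
    drift = max(abs(c - b)
                for c, b in zip(probe_responses(candidate), baseline))
    if drift > 0.05:
        print(f"stage {stage}: drift {drift:.3f} too large; roll back, diagnose")
        continue                           # keep the previous motivation
    motivation = candidate                 # accept the step
    baseline = probe_responses(motivation) # re-sync the reference
```

Any deviation is caught during growth, while the system is still weak, which is exactly why no separate post-hoc translation step is needed.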
This hits the nail on the head. It comes under the heading of a strong constraint, or a point-source failure mode. The motivation system of a Swarm Relaxation system would not contain "decision rules" of that sort, precisely because they could have large, divergent effects on the behavior. If motivation is instead governed by large numbers of weak constraints, your decision rule would be seen as a type of deliberate deception, or manipulation, of the humans. And that contradicts a vast array of constraints that are consistent with friendliness.
Same as previous: with a design that does not use decision rules that are prone to point-source failure modes, the issue evaporates.
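A toy contrast makes the point (the numbers are invented): corrupt one element in each kind of system and compare the effect:

```python
# Invented-numbers contrast: one corrupted element in a single hard rule
# versus one corrupted element among a thousand weak constraints.

def hard_rule(ok):
    """Point-source: the entire decision hinges on one rule."""
    return 1.0 if ok else 0.0

def weak_constraints(votes):
    """Many weak constraints; the decision is their average."""
    return sum(votes) / len(votes)

n = 1000
print(hard_rule(True), "->", hard_rule(False))   # 1.0 -> 0.0: total reversal
votes = [1.0] * n
votes[0] = 0.0                                   # corrupt a single constraint
print(weak_constraints([1.0] * n), "->", weak_constraints(votes))
# 1.0 -> 0.999: one bad constraint shifts the outcome by only 1/n
```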
To summarize: much depends on an understanding of the concept of a weak constraint system. There are no really good readings I can send you (I know I should write one), but you can take a look at the introductory chapter of McClelland and Rumelhart that I gave in the references to the paper.
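For a flavor of what such a system actually does, here is a minimal relaxation sketch in the spirit of those chapters; the network and its weights are invented for illustration:

```python
import random

# Minimal relaxation sketch in the spirit of the McClelland & Rumelhart
# constraint-satisfaction networks (the weights below are invented):
# units repeatedly flip toward whatever state best satisfies the weak
# constraints (weighted links) from their neighbors, and the network
# settles into a globally consistent interpretation.

random.seed(1)

# Symmetric links: positive = "these units support each other",
# negative = "these units conflict". No single link is decisive.
W = {(0, 1): 1.0, (0, 2): 1.0, (1, 2): 1.0,
     (0, 3): -1.0, (1, 3): -1.0, (2, 3): -1.0}

def weight(i, j):
    return W.get((i, j), W.get((j, i), 0.0))

state = [random.choice([-1, 1]) for _ in range(4)]

for _ in range(5):                           # a few relaxation sweeps
    for i in range(4):
        net = sum(weight(i, j) * state[j] for j in range(4) if j != i)
        state[i] = 1 if net >= 0 else -1     # move toward more satisfaction

print(state)   # settles at [1, 1, 1, -1] or its mirror [-1, -1, -1, 1]
```

No single link dictates the outcome; the settled state is whatever satisfies the most constraints at once, which is the behavior I mean by "weak constraint system".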
Also, there is a more recent reference to this concept, from an unexpected source. Yann LeCun has been giving some lectures on Deep Learning in which he coined a phrase that could have been used two decades ago to describe exactly the sort of behavior to be expected from Swarm Relaxation systems. He titles his lecture "The Unreasonable Effectiveness of Deep Learning". That is a wonderful way to express it: swarm relaxation systems do not have to work (there really is no math that can tell you that they should be as good as they are), but they do. They are "unreasonably effective".
There is a very deep truth buried in that phrase, and a lot of what I have to say about Swarm Relaxation is encapsulated in it.
That concept spaces can be matched without gotchas is reassuring, and may point in a direction in which AGI can be made friendly. If the concepts are suitably matched in your proposed checking modules. If. And if no other errors are made.