jessicat comments on Debunking Fallacies in the Theory of AI Motivation - LessWrong
With all of the above in mind, a quick survey of some of the things that you just said, with my explanation for why each one would not (or probably would not) be as much of an issue as you think:
For a massive-weak-constraint system, psychological manipulation would be automatically understood to be in the forceful category, because the concept of "psychological manipulation" is defined by a cluster of features that involve intentional deception, and since the "friendliness" concept would ALSO involve a cluster of weak constraints, it would include the extended idea of intentional deception. It would have to, because intentional deception is connected to doing harm, which is connected with unfriendly, etc.
Conclusion: that is not really an "edge" case in the sense that someone has to explicitly remember to deal with it.
We will not need to 'understand' the AGI's concept space too much, if we are both using massive weak constraints, with convergent semantics. This point I addressed in more detail already.
What you are talking about here is the idea of simulating a human to predict their response. Now, humans already do this in a massive way, and they do not do it by making gigantic simulations, but just by doing simple modeling. And, crucially, they rely on the massive-weak-constraints-with-convergent-semantics (you can see now why I need to coin the concise term "Swarm Relaxation") between the self and other minds to keep the problem manageable.
That particular idea - of predicting human response - was not critical to the argument that followed, however.
No, we would not have to solve a FAI-complete problem to do it. We will be developing the AGI from a baby state up to adulthood, keeping its motivation system in sync all the way up, and looking for deviations. So, in other words, we would not need to FIRST build the AGI (with potentially dangerous alien semantics), THEN do a translation between the two semantic systems, THEN go back and use the translation to reconstruct the motivation system of the AGI to make sure it is safe.
Much more could be said about the process of "growing" and "monitoring" the AGI during the development period, but suffice it to say that this process is extremely different if you have a Swarm Relaxation system vs. a logical system of the sort your words imply.
This hits the nail on the head. This comes under the heading of a strong constraint, or a point-source failure mode. The motivation system of a Swarm Relaxation system would not contain "decision rules" of that sort, precisely because they could have large, divergent effects on the behavior. If motivation is, instead, governed by large numbers of weak constraints, then your decision rule would be seen as a type of deliberate deception, or manipulation, of the humans. And that contradicts a vast array of constraints that are consistent with friendliness.
Same as previous: with a design that does not use decision rules that are prone to point-source failure modes, the issue evaporates.
To summarize: much depends on an understanding of the concept of a weak constraint system. There are no really good readings I can send you (I know I should write one), but you can take a look at the introductory chapter of McClelland and Rumelhart that I gave in the references to the paper.
Also, there is a more recent reference to this concept, from an unexpected source. Yann LeCun has been giving some lectures on Deep Learning in which he came up with a phrase that could have been used two decades ago to describe exactly the sort of behavior to be expected from SA systems. He titles his lecture "The Unreasonable Effectiveness of Deep Learning". That is a wonderful way to express it: swarm relaxation systems do not have to work (there really is no math that can tell you that they should be as good as they are), but they do. They are "unreasonably effective".
There is a very deep truth buried in that phrase, and a lot of what I have to say about SA is encapsulated in it.
Okay, thanks a lot for the detailed response. I'll explain a bit about where I'm coming from with understanding the concept learning problem:
I do think that figuring out if we can get more optimistic (but still justified) assumptions is good. You mention empirical experience with swarm relaxation as a possible way of gaining confidence that it is learning concepts correctly. Now that I think about it, bad handling of novel edge cases might be a form of "meta-overfitting", and perhaps we can gain confidence in a system's ability to deal with context shifts by having it go through a series of context shifts well without overfitting. This is the sort of thing that might work, and more research into whether it does is valuable, but it still seems worth preparing for the case where it doesn't.
Anyway, thanks for giving me some good things to think about. I think I see how a lot of our disagreements mostly come down to how much convergence we expect from different concept learning systems. For example, if "psychological manipulation" is in some sense a natural category, then of course it can be added as a weak (or even strong) constraint on the system.
I'll probably think about this a lot more and eventually write up something explaining reasons why we might or might not expect to get convergent concepts from different systems, and the degree to which this changes based on how value-laden a concept is.
I didn't really understand a lot of what you said here. My current model is something like "if a concept is defined by lots of weak constraints, then lots of these constraints have to go wrong at once for the concept to go wrong, and we think this is unlikely due to induction and some kind of independence/uncorrelatedness assumption"; is this correct? If this is the right understanding, I think I have low confidence that errors in each weak constraint are in fact not strongly correlated with each other.
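The independence question can be made concrete with a toy calculation: if a concept rests on n weak constraints that each fail independently with probability p, the chance that a majority fail at once vanishes rapidly, but a shared failure cause changes the picture entirely. A minimal simulation sketch (the failure probabilities and the correlation model here are illustrative assumptions, not anything claimed in the thread):

```python
import random

def concept_fails(n_constraints, p_fail, correlation, trials=20000):
    """Estimate how often a majority of weak constraints fail at once.

    correlation=0.0: each constraint fails independently with prob p_fail.
    correlation near 1.0: most constraints copy a single common failure cause.
    """
    bad = 0
    for _ in range(trials):
        common = random.random() < p_fail  # shared failure cause
        failures = 0
        for _ in range(n_constraints):
            if random.random() < correlation:
                failures += common                     # copy the common cause
            else:
                failures += random.random() < p_fail   # independent failure
        if failures > n_constraints / 2:
            bad += 1
    return bad / trials

# With 50 constraints, each failing 10% of the time:
independent = concept_fails(50, 0.10, correlation=0.0)
correlated = concept_fails(50, 0.10, correlation=0.9)
print(independent)  # majority failure is essentially impossible
print(correlated)   # the concept fails whenever the common cause does
```

Under independence the concept is enormously robust; under strong correlation its reliability collapses back to roughly the reliability of the single shared cause, which is exactly the worry about correlated errors.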
I think you have homed in exactly on the place where the disagreement is located. I am glad we got here so quickly (it usually takes a very long time, when it happens at all).
Yes, it is the fact that "weak constraint" systems have (supposedly) the property that they are making the greatest possible attempt to find a state of mutual consistency among the concepts, that leads to the very different conclusions that I come to, versus the conclusions that seem to inhere in logical approaches to AGI. There really is no overstating the drastic difference between these two perspectives: this is not just a matter of two possible mechanisms, it is much more like a clash of paradigms (if you'll forgive a cliche that I know some people absolutely abhor).
One way to summarize the difference is by imagining a sequence of AI designs, with progressive increases in sophistication. At the beginning, the representation of concepts is simple, the truth values are just T and F, and the rules for generating new theorems from the axioms are simple and rigid.
As the designs get better various new features are introduced ... but one way to look at the progression of features is that constraints between elements of the system get more widespread, and more subtle in nature, as the types of AI become better and better.
An almost trivial example of what I mean: when someone builds a real-time reasoning engine in which there has to be a strict curtailment of the time spent doing certain types of searches in the knowledge base, a wise AI programmer will insert some sanity checks that kick in after the search has to be curtailed. The sanity checks are a kind of linkage from the inference being examined, to the rest of the knowledge that the system has, to see if the truncated reasoning left the system in a state where it concluded something that is patently stupid. These sanity checks are almost always extramural to the logical process -- for which read: they are glorified kludges -- but in a real world system they are absolutely vital. Now, from my point of view what these sanity checks do is act as weak constraints on one little episode in the behavior of the system.
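The sanity-check pattern just described can be sketched in a few lines of code. This is a hypothetical illustration (the search, deadline, and check functions are invented for the example, not drawn from any particular system):

```python
import time

def truncated_search(candidates, score, deadline_s, sanity_ok):
    """Search under a time budget, then sanity-check the result.

    If the search is cut off early, the best answer found so far is
    accepted only if it is consistent with the rest of the system's
    knowledge: sanity_ok acts as a weak constraint on this one episode.
    """
    start = time.monotonic()
    best, best_score = None, float("-inf")
    for c in candidates:
        if time.monotonic() - start > deadline_s:
            break  # real-time curtailment of the search
        s = score(c)
        if s > best_score:
            best, best_score = c, s
    # The "glorified kludge": reject conclusions that are patently stupid.
    if best is not None and not sanity_ok(best):
        return None  # fall back rather than act on a bad truncated result
    return best

# Toy usage: prefer large even numbers, but reject anything negative.
result = truncated_search(
    candidates=[3, -100, 8, 12, -50],
    score=lambda x: x if x % 2 == 0 else float("-inf"),
    deadline_s=0.1,
    sanity_ok=lambda x: x >= 0,
)
print(result)  # 12
```

The point of the sketch is that the sanity check sits outside the scoring logic entirely, linking the episode's conclusion back to other knowledge the system has.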
Okay, so if you buy my suggestion that in practice AI systems become better, the more that they allow the little reasoning episodes to be connected to the rest of system by weak constraints, then I would like to go one step further and propose the following:
1) As a matter of fact, you can build AI systems (or, parts of AI systems) that take the whole "let's connect everything up with weak constraints" idea to an extreme, throwing away almost everything else (all the logic!) and keeping only the huge population of constraints, and something amazing happens: the system works better that way. (An old classic example, but one which still has lessons to teach, is the very crude Interactive Activation model of word recognition. Seen in its historical context it was a bombshell, because it dumped all the procedural programming that people had thought was necessary to do word recognition from features, and replaced it with nothing-but-weak-constraints .... and it worked better than any procedural program was able to do.)
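The flavor of the Interactive Activation idea can be conveyed with a heavily simplified sketch: a few word units compete, each supported by weak excitatory constraints from observed letters and weakly inhibited by rival words, and the network simply relaxes to the most mutually consistent state. (The lexicon, weights, and update rule below are toy assumptions, not the McClelland and Rumelhart parameters.)

```python
WORDS = ["cat", "car", "can", "bat"]

def recognize(observed, steps=50, excite=0.2, inhibit=0.1):
    """Relax a tiny interactive-activation-style network.

    observed: partial letter evidence, e.g. {0: "c", 2: "t"} means
    "first letter looks like c, third looks like t". Each matching
    letter is a weak excitatory constraint on a word unit; competing
    words weakly inhibit each other. There are no procedural rules:
    the answer is just the most consistent state after settling.
    """
    act = {w: 0.0 for w in WORDS}
    for _ in range(steps):
        total = sum(act.values())
        new = {}
        for w in WORDS:
            support = sum(excite for pos, letter in observed.items()
                          if w[pos] == letter)
            rivalry = inhibit * (total - act[w])
            new[w] = min(1.0, max(0.0, act[w] + support - rivalry))
        act = new
    return max(act, key=act.get)

print(recognize({0: "c", 2: "t"}))  # "cat": the only word both constraints support
print(recognize({0: "b"}))          # "bat"
```

Nothing in the code "decides" which word wins; the answer falls out of letting the weak constraints push against each other until the system settles.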
2) This extreme attitude to the power of weak constraints comes with a price: you CANNOT have mathematical assurances or guarantees of correct behavior. Your new weak-constraint system might actually be infinitely more reliable and stable than any of the systems you could build, where there is a possibility to get some kind of mathematical guarantees of correctness or convergence, but you might never be able to prove that fact (except with some general talk about the properties of ensembles).
All of that is what is buried in the phrase I stole from Yann LeCun: the "unreasonable effectiveness" idea. These systems are unreasonably good at doing what they do. They shouldn't be so good. But they are.
As you can imagine, this is such a huge departure from the traditional way of thinking in AI, that many people find it completely alien. Believe it or not, I know people who seem willing to go to any lengths to destroy the credibility of someone who suggests the idea that mathematical rigor might be a bad thing in AI, or that there are ways of doing AI that are better than the status quo, but which involve downgrading the role of mathematics to just technical-support level, rather than primacy.
--
On your last question, I should say that I was only referring to the fact that in systems of weak constraints, there is extreme independence between the constraints, and they are all relatively small, so it is hard for an extremely inconsistent 'belief' or 'fact' to survive without being corrected. This is all about the idea of "single point of failure" and its antithesis.
I think it would not go amiss to read Vikash Mansinghka's PhD thesis and the open-world generation paper to see a helpful probabilistic programming approach to these issues. In summary: we can use probabilistic programming to learn the models we need, use conditioning/query to condition the models on the constraints we intend to enforce, and then sample the resulting distributions to generate "actions" which are very likely to be "good enough" and very unlikely to be "bad". We sample instead of inferring the maximum-a-posteriori action or expected action precisely because, as part of the Bayesian modelling process, we assume that the peak of our probability density does not necessarily correspond to an in-the-world optimum.

I agree that choosing an action randomly (with higher probability for good actions) is a good way to create a fuzzy satisficer. Do you have any insights into how to:
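The sample-rather-than-maximize idea can be sketched with plain rejection sampling, without committing to any particular probabilistic programming language. The prior over actions, the noisy utility model, and the "good enough" threshold below are all illustrative assumptions:

```python
import random

def satisficing_sample(actions, utility, threshold, max_tries=10000):
    """Sample an action conditioned on being 'good enough'.

    Instead of returning the argmax-utility (MAP) action, draw actions
    from a prior and keep the first one whose modelled utility clears
    the threshold: a fuzzy satisficer rather than an optimizer.
    """
    for _ in range(max_tries):
        a = random.choice(actions)   # prior over actions
        u = utility(a)               # the model itself may be noisy
        if u >= threshold:           # conditioning: "good enough"
            return a
    raise RuntimeError("no acceptable action found")

# Toy model: noisy utilities; we want any action scoring above 0.7.
base = {"wait": 0.2, "help": 0.9, "ask": 0.8}
noisy_utility = lambda a: base[a] + random.gauss(0.0, 0.05)
choice = satisficing_sample(list(base), noisy_utility, threshold=0.7)
print(choice)  # "help" or "ask", but essentially never "wait"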
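The sample-rather-than-maximize idea can be sketched with plain rejection sampling, without committing to any particular probabilistic programming language. The prior over actions, the noisy utility model, and the "good enough" threshold below are all illustrative assumptions:

```python
import random

def satisficing_sample(actions, utility, threshold, max_tries=10000):
    """Sample an action conditioned on being 'good enough'.

    Instead of returning the argmax-utility (MAP) action, draw actions
    from a prior and keep the first one whose modelled utility clears
    the threshold: a fuzzy satisficer rather than an optimizer.
    """
    for _ in range(max_tries):
        a = random.choice(actions)   # prior over actions
        u = utility(a)               # the model itself may be noisy
        if u >= threshold:           # conditioning: "good enough"
            return a
    raise RuntimeError("no acceptable action found")

# Toy model: noisy utilities; we want any action scoring above 0.7.
base = {"wait": 0.2, "help": 0.9, "ask": 0.8}
noisy_utility = lambda a: base[a] + random.gauss(0.0, 0.05)
choice = satisficing_sample(list(base), noisy_utility, threshold=0.7)
print(choice)  # "help" or "ask", but essentially never "wait"
```

Because the condition is a threshold rather than a maximum, several distinct "good enough" actions retain probability mass, which is the satisficing behavior described above.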
1) create queries for planning that don't suffer from "wishful thinking", with or without nested queries. Basically the problem is that if I want an action conditioned on receiving a high utility (e.g. we have a factor on the expected utility node U equal to e^(alpha * U)), then we are likely to choose high-variance actions while inferring that the rest of the model works out such that these actions return high utilities
2) extend this to sequential planning without nested nested nested nested nested nested queries
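The "wishful thinking" failure can be demonstrated numerically: conditioning with an e^(alpha * U) factor weights each action in proportion to its E[e^(alpha * U)], which rewards variance, so a risky action with a lower mean utility can dominate a safe one. A toy numerical check (the two action distributions and the value of alpha are invented for illustration):

```python
import math

# Two hypothetical actions: "safe" has higher mean utility,
# "risky" has lower mean but much higher variance.
safe_outcomes = [(1.0, 1.0)]                  # (probability, utility)
risky_outcomes = [(0.9, -1.0), (0.1, 10.0)]  # mean utility = 0.1

def mean_utility(outcomes):
    return sum(p * u for p, u in outcomes)

def tilted_weight(outcomes, alpha):
    """Weight of an action under conditioning with a factor e^(alpha * U):
    proportional to E[e^(alpha * U)] under that action's outcome model."""
    return sum(p * math.exp(alpha * u) for p, u in outcomes)

alpha = 1.0
w_safe = tilted_weight(safe_outcomes, alpha)
w_risky = tilted_weight(risky_outcomes, alpha)

# Expected utility prefers the safe action...
print(mean_utility(safe_outcomes) > mean_utility(risky_outcomes))  # True
# ...but the e^(alpha*U) query overwhelmingly prefers the risky one,
# effectively "inferring" that the rare good outcome is what happened.
print(w_risky > w_safe)  # True
```

The rare utility-10 outcome dominates E[e^(alpha * U)] even at probability 0.1, which is exactly the high-variance preference described in point 1.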