All of misterbailey's Comments + Replies

My bizarre question was just an illustrative example. It seems neither you nor I believe that would be an adequate criterion (though perhaps for different reasons).

If I may translate what you're saying into my own terms, you're saying that for a problem like "shoot first or ask first?" the criteria (i.e., constraints) would be highly complex and highly contextual. Ok. I'll grant that's a defensible design choice.

Earlier in the thread you said

the AI is supposed to take an action in spite of the fact that it is getting "massive feedback" f

... (read more)
1[anonymous]
I am with you on your rejection of 1 and 2, if only because they are both framed as absolutes which ignore context. And, yes, I do favor 3. However, you insert some extra wording that I don't necessarily buy.... You see, hidden in these words seems to be an understanding of how the AI is working, that might lead you to see a huge problem, and me to see something very different. I don't know if this is really what you are thinking, but bear with me while I run with this for a moment.

Trying to formulate criteria for something, in an objective, 'codified' way, can sometimes be incredibly hard even when most people would say they have internal 'judgement' that allowed them to make a ruling very easily: the standard saw being "I cannot define what 'pornography' is, but I know it when I see it." And (stepping quickly away from that example because I don't want to get into that quagmire) there is a much more concrete example in the old interactive activation model of word recognition, which is a simple constraint system. In IAC, word recognition is remarkably robust in the face of noise, whereas attempts to write symbolic programs to deal with all the different kinds of noisy corruption of the image turn out to be horribly complex and faulty. As you can see, I am once again pointing to the fact that Swarm Relaxation systems (understood in the very broad sense that allows all varieties of neural net to be included) can make criterial decisions seem easy, where explicit codification of the decision is a nightmare.

So, where does that lead to? Well, you go on to say: The key phrase here is "Screw up once, and...". In a constraint system it is impossible for one screw-up (one faulty constraint) to unbalance the whole system. That is the whole raison d'être of constraint systems.

Also, you say that the problem of making good choices might be FAI-complete. Now, I have some substantial quibbles with that whole "FAI-complete" idea, but in this case I will just ask a question ...
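The robustness claim about a single faulty constraint can be made concrete with a toy sketch (an illustration constructed here, not the IAC model and not anything from the paper): if a decision is the aggregate of many weak constraints of comparable weight, then corrupting any single one of them cannot flip the outcome.

```python
# Minimal sketch (not the IAC model itself): many weak constraints vote on a
# decision, and corrupting any single constraint cannot flip the outcome.
# All names and numbers here are illustrative assumptions, not from the paper.

import random

def decide(constraint_scores, weights):
    """Weighted sum of constraint scores; a positive total means 'accept'."""
    total = sum(w * s for w, s in zip(weights, constraint_scores))
    return "accept" if total > 0 else "reject"

random.seed(0)
n = 200                                   # many weak constraints
weights = [1.0] * n                       # no single constraint dominates
scores = [random.uniform(0.2, 1.0) for _ in range(n)]   # broad agreement: accept

print(decide(scores, weights))            # accept

# Corrupt one constraint as badly as possible (a single "screw-up"):
faulty = scores.copy()
faulty[0] = -1.0
print(decide(faulty, weights))            # still accept: one faulty constraint
                                          # cannot unbalance the whole system
```

The design choice doing the work in this caricature is simply that no constraint carries enough weight to override the consensus of the others, which is the graceful-degradation property being claimed for constraint systems in general.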

I understand your desire to stick to an exegesis of your own essay, but part of a critical examination of your essay is seeing whether or not it is on point, so these sorts of questions really are "about" your essay.

Regarding your preliminary answer: by "correct" I assume you mean "correctly reflecting the desires of the human supervisors"? (In which case, this discussion feeds into our other thread.)

3[anonymous]
With the best will in the world, I have to focus on one topic at a time: I do not have the bandwidth to wander across the whole of this enormous landscape. As for your question: I was using "correct" as a verb, and the meaning was "self-correct" in the sense of bringing back to the previously specified course. In this case this would be about the AI perceiving some aspects of its design that it noticed might cause it to depart from what its goal was nominally supposed to be. In that case it would suggest modifications to correct the problem.

With respect, your first point doesn't answer my question. My question was, what criteria would cause the AI to submit a given proposed action or plan for human approval? You might say that the AI submits every proposed atomic action for approval (in this case, the criterion is the trivial one, "always submit proposal"), but this seems unlikely. Regardless, it doesn't make sense to say the humans have already heard of the plan about which the AI is just now deciding whether to tell them.

In your second point you seem to be suggesting an answer ... (read more)

3[anonymous]
As you wrote, the second point filled in the missing part from the first: it uses its background contextual knowledge. You say you are unsure what this means. That leaves me a little baffled, but here goes anyway.

Suppose I asked a person, today, to write a book for me on the subject of "What counts as an action that is significant enough that, if you did that action in a way that it would affect people, it would rise above some level of 'nontrivialness' and you should consult them first? Include in your answer a long discussion of the kind of thought processes you went through to come up with your answers." I know many articulate people who could, if they had the time, write a massive book on that subject. Now, that book would contain a huge number of constraints (little factoids about the situation) about "significant actions", and the SOURCE of that long list of constraints would be ... the background knowledge of the person who wrote the book. They would call upon a massive body of knowledge about many aspects of life, to organize their thoughts and come up with the book. If we could look into the head of the person who wrote the book we would find that background knowledge. It would be similar in size to the number of constraints mentioned in the book, or it would be larger. That background knowledge -- both its content AND its structure -- is what I refer to when I talk about the AI using contextual information or background knowledge to assess the degree of significance of an action.

You go on to ask a bizarre question: This would be an example of an intelligent system sitting there with that massive array of contextual/background knowledge that could be deployed ... but instead of using that knowledge to make a preliminary assessment of whether "shooting first" would be a good idea, it ignores ALL OF IT and substitutes one single constraint taken from its knowledge base or its goal system: It would entirely defeat the object of using large numbers ...
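One possible way to read this answer concretely (purely illustrative; the constraint list, scores, and threshold below are invented, not taken from the paper): the "ask first" criterion is not a single codified rule but an aggregate over many small pieces of background knowledge, with consultation triggered when the aggregate significance crosses a threshold.

```python
# Toy sketch of "use background knowledge to decide whether to ask first".
# The constraints, scores, and threshold are invented for illustration;
# nothing here is taken from the paper.

def significance(action_features, constraints):
    """Aggregate many small 'factoids' into a significance score in [0, 1]."""
    return sum(c(action_features) for c in constraints) / len(constraints)

# Each constraint is one little piece of contextual knowledge.
constraints = [
    lambda a: 1.0 if a["irreversible"] else 0.0,
    lambda a: 1.0 if a["affects_people"] > 100 else a["affects_people"] / 100,
    lambda a: 1.0 if a["humans_object"] else 0.0,
    lambda a: 0.8 if a["novel_situation"] else 0.1,
    # ... in the scenario under discussion there would be a massive number of these
]

ASK_FIRST_THRESHOLD = 0.3   # illustrative value only

plan = {"irreversible": True, "affects_people": 7_000_000_000,
        "humans_object": True, "novel_situation": True}

if significance(plan, constraints) > ASK_FIRST_THRESHOLD:
    print("submit plan for human approval before acting")
else:
    print("proceed without consultation")
```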

There are methods other than "sitting around thinking of as many exotic disaster scenarios as possible" by which one could seek to make AI friendly. Thus, believing that "sitting around [...]" will not be sufficient does not imply that we should halt AI research.

0TheAncientGeek
So where are the multiple solutions to the multiple failure modes?

My question was about what criteria would cause the AI to make a proposal to the human supervisors before executing its plan. In this case, I don’t think the criteria can be that humans are objecting, since they haven’t heard its plan yet.

(Regarding the point that you're only addressing the scenarios proposed by Yudkowsky et al, see my remark here.)

3[anonymous]
That is easy:

* Why would the humans have "not heard the plan yet"? It is a no-brainer part of this AI's design that part of the motivation engine (the goals) will be a goal that says "Check with the humans first." The premise in the paper is that we are discussing an AI that was designed as best we could, BUT it then went maverick anyway: it makes no sense for us to switch, now, to talk about an AI that was actually built without that most elementary of safety precautions!

* Quite independently, the AI can use its contextual understanding of the situation. Any intelligent system with such a poor understanding of the context and implications of its plans that it just goes ahead with the first plan off the stack, without thinking about implications, is an intelligent system that will walk out in front of a bus just because it wants to get to the other side of the road. In the case in question you are imagining an AI that would be capable of executing a plan to put all humans into bottles, without thinking for one moment to mention to anybody that it was considering this plan? That makes no sense in any version of the real world. Such an AI is an implausible hypothetical.

The problem with you objecting to the particular scenarios Yudkowsky et al propose is that the scenarios are merely illustrative. Of course, you can probably guard against any specific failure mode. The claim is that there will be a lot of failure modes, and we can’t expect to guard against all of them by just sitting around thinking of as many exotic disaster scenarios as possible.

Mind you, I know your argument is more than just “I can see why these particular disasters could be avoided”. You’re claiming that certain features of AI will in general tend to make it careful and benevolent. Still, I don’t think it’s valid for you to complain about bait-and-switch, since that’s precisely the problem.

1[anonymous]
I have explicitly addressed this point on many occasions. My paper had nothing in it that was specific to any failure mode. The suggestion is that the entire class of failure modes suggested by Yudkowsky et al. has a common feature: they all rely on the AI being incapable of using a massive array of contextual constraints when evaluating plans. By simply proposing an AI in which such massive constraint deployment is the norm, the ball is now in the other court: it is up to Yudkowsky et al. to come up with ANY kind of failure mode that could get through.

The scenarios I attacked in the paper have the common feature that they have been predicated on such a simplistic type of AI that they were bound to fail. They had failure built into them. As soon as everyone moves on from those "dumb" superintelligences and starts to discuss the possible failure modes that could occur in a superintelligence that makes maximum use of constraints, we can start to talk about possible AI dangers. I'm ready to do that. Just waiting for it to happen, is all.
0TheAncientGeek
I doubt that, since, coupled with claims of existential risk, the logical conclusion would be to halt AI research, but MIRI isn't saying that.

Yudkowsky et al don't argue that the problem is unsolvable, only that it is hard. In particular, Yudkowsky fears it may be harder than creating AI in the first place, which would mean that in the natural evolution of things, UFAI appears before FAI. However, I needn't factor what I'm saying through the views of Yudkowsky. For an even more modest claim, we don't have to believe that FAI is hard in hindsight in order to claim that AI will be unfriendly unless certain failure modes are guarded against. On this view of the FAI project, a large part of the effort is just noticing the possible failure modes that were only obvious in hindsight, and convincing people that the problem is important and won't solve itself.

0TheAncientGeek
If no one is building AIs with utility functions, then the one kind of failure MIRI is talking about has solved itself.

Thanks for replying. Yes it does help. My apologies. I think I misunderstood your argument initially. I confess I still don't see how it works though.

You criticize the doctrine of logical infallibility, claiming that a truly intelligent AI would not believe such a thing. Maybe so. I'll set the question aside for now. My concern is that I don't think this doctrine is an essential part of the arguments or scenarios that Yudkowsky et al present.

An intelligent AI might come to a conclusion about what it ought to do, and then recognize "yes, I might... (read more)

0[anonymous]
Let me first address the way you phrased it before you gave me the two options. After saying [...] you add: [...]

The answer to this is that in all the scenarios I address in the paper - the scenarios invented by Yudkowsky and the rest - the AI is supposed to take an action in spite of the fact that it is getting "massive feedback" from all the humans on the planet, that they do not want this action to be executed. That is an important point: nobody is suggesting that these are really subtle fringe cases where the AI thinks that it might be wrong, but it is not sure -- rather, the AI is supposed to go ahead and be unable to stop itself from carrying out the action in spite of clear protests from the humans. That is the meaning of "wrong" here.

And it is really easy to produce a good definition of "something going wrong" with the AI's action plans, in cases like these: if there is an enormous inconsistency between descriptions of a world filled with happy humans (and here we can weigh into the scale a thousand books describing happiness in all its forms) and the fact that virtually every human on the planet reacts to the postulated situation by screaming his/her protests, then a million red flags should go up. I think that when posed in this way, the question answers itself, no?

In other words, option 2 is close enough to what I meant, except that it is not exactly as a result of its fallibility that it hesitates (knowledge of fallibility is there as a background all the time), but rather due to the immediate fact that its proposed plan causes concern to people.
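The "red flag" test described here can be caricatured in a few lines (a sketch only; the approval scores and tolerance are invented assumptions, not anything from the paper): compare the degree of human approval the plan's own justification predicts with the approval actually being observed, and halt when the mismatch is enormous.

```python
# Toy sketch of the "million red flags" check described above: if the plan is
# justified by a model in which humans end up happy, but the observed feedback
# is overwhelmingly negative, treat that inconsistency as a signal to stop.
# The numbers and names are illustrative assumptions, not from the paper.

def red_flag(predicted_approval, observed_approval, tolerance=0.5):
    """Flag an enormous mismatch between prediction and massive feedback."""
    return abs(predicted_approval - observed_approval) > tolerance

predicted = 0.95   # the plan's own model: "a world filled with happy humans"
observed = 0.01    # virtually every human on the planet is protesting

if red_flag(predicted, observed):
    print("halt the plan and defer to the human supervisors")
```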

I see a fair amount of back-and-forth where someone says "What about this?" and you say "I addressed that in several places; clearly you didn't read it." Unfortunately, while you may think you have addressed the various issues, I don't think you did (and presumably your interlocutors don't). Perhaps you will humor me in responding to my comment. Let me try and make the issue as sharp as possible by pointing out what I think is an out-and-out mistake made by you. In the section you call the heart of your argument, you say:

If the AI

... (read more)
3[anonymous]
I'll walk you through it. I did not claim (as you imply) that the fact of there being a programming error was what implied that there is "an inconsistency in its reasoning." In the two paragraphs immediately before the one you quote (and, indeed, in that whole section), I explain that the system KNOWS that it is following these two imperatives:

1) Conclusions produced by my reasoning engine are always correct. [This is the Doctrine of Logical Infallibility]

2) I know that AGI reasoning engines in general, and mine in particular, sometimes come to incorrect conclusions that are the result of a failure in their design.

Or, paraphrasing this in the simplest possible way:

1) My reasoning engine is infallible.

2) My reasoning engine is fallible.

That, right there, is a flat-out contradiction between two of its core "beliefs". It is not, as you state, that the existence of a programming error is evidence of inconsistency; it is the above pair of beliefs (engendered by the programming error) that constitute the inconsistency.

Does that help?
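For what it is worth, the paraphrased pair of beliefs is a contradiction in the strict logical sense; a minimal Lean formalization (the notation is chosen here for illustration, not taken from the paper) makes the point in one line.

```lean
-- Minimal formalization of the paraphrased pair of beliefs (notation mine):
-- holding both "my reasoning engine is infallible" and "my reasoning engine
-- is fallible" entails a contradiction.
example (Infallible : Prop) (h1 : Infallible) (h2 : ¬ Infallible) : False :=
  h2 h1
```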

Hi. I'm a long time lurker (a few years now), and I finally joined so that I could participate in the community and the discussions. This was borne partly out of a sense that I'm at a place in my life where I could really benefit from this community (and it could benefit from me), and partly out of a specific interest in some of the things that have been posted recently: the MIRI technical research agenda.

In particular, once I've had more time to digest it, I want to post comments and questions about Reasoning Under Logical Uncertainty.

More about me: I'm... (read more)