Comment author: Richard_Loosemore 19 May 2015 02:16:56PM 2 points [-]

As you wrote, the second point fills in the part missing from the first: the AI uses its background contextual knowledge.

You say you are unsure what this means. That leaves me a little baffled, but here goes anyway. Suppose I asked a person, today, to write a book for me on the subject: "What counts as an action significant enough that, if you did it in a way that would affect people, it would rise above some level of 'nontrivialness' and you should consult them first? Include in your answer a long discussion of the kind of thought processes you went through to come up with your answers." I know many articulate people who could, if they had the time, write a massive book on that subject.

Now, that book would contain a huge number of constraints (little factoids about the situation) about "significant actions", and the SOURCE of that long list of constraints would be the background knowledge of the person who wrote the book. They would call upon a massive body of knowledge about many aspects of life to organize their thoughts and come up with the book.

If we could look into the head of the person who wrote the book, we would find that background knowledge. It would be similar in size to the set of constraints mentioned in the book, or larger.

That background knowledge -- both its content AND its structure -- is what I refer to when I talk about the AI using contextual information or background knowledge to assess the degree of significance of an action.

You go on to ask a bizarre question:

Is it reasonable to suppose that the AI would make the decision about whether to "shoot first" or "ask first" based on things like, eg., the lower end of its 99% confidence interval for how satisfied its supervisors will be?

This would be an example of an intelligent system sitting there with that massive array of contextual/background knowledge that could be deployed ... but instead of using that knowledge to make a preliminary assessment of whether "shooting first" would be a good idea, it ignores ALL OF IT and substitutes one single constraint taken from its knowledge base or its goal system:

"Does this satisfy my criteria for how satisfied my supervisors will be?"

Using only one constraint would entirely defeat the object of having large numbers of constraints in the system. The system design is (assumed to be) such that this is impossible. That is the whole point of the Swarm Relaxation design that I talked about.
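To make the contrast concrete, here is a toy sketch of the difference between deciding on a single criterion and letting a large collection of weak constraints settle into a verdict. To be clear: this is only an illustration, not the Swarm Relaxation architecture itself, and the constraint names, weights and thresholds are invented purely for the sake of the example.

```python
# Toy illustration only: deciding on one criterion versus aggregating many
# weak constraints. Constraint names, weights, and thresholds are invented.

def single_criterion_decision(predicted_supervisor_satisfaction, threshold=0.99):
    """Decide using one constraint only (the policy criticized above)."""
    return "act" if predicted_supervisor_satisfaction >= threshold else "ask"

def many_constraint_decision(constraint_scores, weights, threshold=0.0):
    """Let a large body of weak constraints settle into a net verdict.

    constraint_scores maps a constraint name to a score in [-1, 1], where
    negative values count against acting without consulting anyone first.
    """
    net = sum(weights[name] * score for name, score in constraint_scores.items())
    return "act" if net > threshold else "ask"

# Hypothetical example: a handful of the thousands of constraints the system
# would really bring to bear.
scores = {
    "affects_many_people": -0.9,
    "hard_to_reverse": -0.6,
    "humans_objected_to_similar_plans": -0.8,
    "routine_action_seen_many_times": 0.2,
}
weights = {name: 1.0 for name in scores}
print(many_constraint_decision(scores, weights))  # -> "ask"
```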

Comment author: misterbailey 20 May 2015 03:58:21PM *  1 point [-]

My bizarre question was just an illustrative example. It seems neither you nor I believe that would be an adequate criterion (though perhaps for different reasons).

If I may translate what you're saying into my own terms, you're saying that for a problem like "shoot first or ask first?" the criteria (i.e., constraints) would be highly complex and highly contextual. Ok. I'll grant that's a defensible design choice.

Earlier in the thread you said

the AI is supposed to take an action in spite of the fact that it is getting **massive feedback** from all the humans on the planet that they do not want this action to be executed.

This is why I have homed in on scenarios where the AI has not yet received feedback on its plan. In these scenarios, the AI presumably must decide (even if the decision is only implicit) whether to consult humans about its plan first, or to go ahead with its plan first (and halt or change course in response to human feedback). To lay my cards on the table, I want to consider three possible policies the AI could have regarding this choice.

  1. Always (or usually) consult first. We can rule this out as impractical, if the AI is making a large number of atomic actions.
  2. Always (or usually) shoot first, and see what the response is. Unless the AI only makes friendly plans, I think this policy is catastrophic, since I believe there are many scenarios where an AI could initiate a plan and before we know what hit us we're in an unrecoverably bad situation. Therefore, implementing this policy in a non-catastrophic way is FAI-complete.
  3. Have some good criteria for picking between "shoot first" and "ask first" on any given chunk of planning. This is what you seem to be favoring in your answer above. (Correct me if I'm wrong.) These criteria will tend to be complex, and not necessarily formulated internally in an axiomatic way. Regardless, I fear making good choices between "shoot first" and "ask first" is hard, even FAI-complete. Screw up once, and you are in a catastrophe like in case 2. (A toy sketch of what such a check might look like follows this list.)
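To be fully explicit about what I mean by policy 3, here is a minimal sketch of what such a check might look like. Everything in it is hypothetical: the features, the way they are combined, and the threshold are all invented for illustration, and my worry is precisely that one misjudged estimate in a function like this is enough to land us in the case-2 catastrophe.

```python
# Hypothetical sketch of policy 3. The features, their combination, and the
# threshold are invented; one bad estimate here is all it takes to end up in
# the case-2 catastrophe.

from dataclasses import dataclass

@dataclass
class Plan:
    description: str
    estimated_impact: float   # rough scale of how much / how many people it affects
    reversibility: float      # 0 = irreversible, 1 = trivially reversible
    novelty: float            # 0 = routine, 1 = unlike anything previously approved

def should_ask_first(plan: Plan, ask_threshold: float = 0.3) -> bool:
    """Return True if the plan should be submitted for human approval first."""
    # A crude "significance" score built from a few contextual features.
    significance = plan.estimated_impact * (1.0 - plan.reversibility) + plan.novelty
    return significance > ask_threshold

# Hypothetical usage
routine = Plan("move articulator arm 0.5 degrees", 0.01, reversibility=0.99, novelty=0.0)
drastic = Plan("re-route the regional power grid", 0.9, reversibility=0.2, novelty=0.8)
print(should_ask_first(routine))  # False -> "shoot first"
print(should_ask_first(drastic))  # True  -> "ask first"
```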

Can you let me know: have I understood you correctly? More importantly, do you agree with my framing of the dilemma for the AI? Do you agree with my assessment of the pitfalls of each of the 3 policies?

Comment author: Richard_Loosemore 05 May 2015 02:42:15PM *  7 points [-]

As to not understanding the argument - that's understandable, because this is a long and dense paper.

If you are trying to summarize the whole paper when you say "if we succeed to make the Friendly AI perfectly on the first attempt, then we do not have to worry about what could go wrong, because the perfect Friendly AI would not do anything stupid", then that would not be right. The argument includes a statement that resembles that, but only as an aside.

As to your question about what happens next, or what happens if we only get the "Friendly" part 90% correct ... well, you are dragging me off into new territory, because that was not really within the scope of the paper. Don't get me wrong: I like being dragged off into that territory! But there just isn't time to write down and argue the whole domain of AI friendliness all in one sitting.

The preliminary answer to that question is that everything depends on the details of the motivation system design and my feeling (as a designer of AGI motivation systems) is that beyond a certain point the system is self-stabilizing. That is, it will understand its own limitations and try to correct them.

But that last statement tends to get (some other) people inflamed, because they do not realize that it comes within the "swarm relaxation" context, and they misunderstand the manner in which a system would self correct. Although I said a few things about swarm relaxation in the paper, I did not give enough detail to be able to address this whole topic here.

Comment author: misterbailey 19 May 2015 09:18:18AM *  3 points [-]

I understand your desire to stick to an exegesis of your own essay, but part of a critical examination of your essay is seeing whether or not it is on point, so these sorts of questions really are "about" your essay.

Regarding your preliminary answer: by "correct" I assume you mean "correctly reflecting the desires of the human supervisors"? (In which case, this discussion feeds into our other thread.)

Comment author: Richard_Loosemore 18 May 2015 05:04:52PM 2 points [-]

That is easy:

  • Why would the humans have "not heard the plan yet"? It is a no-brainer part of this AI's design that the motivation engine (the goals) will include a goal that says "Check with the humans first." The premise in the paper is that we are discussing an AI that was designed as best we could, BUT it then went maverick anyway: it makes no sense for us to switch, now, to talking about an AI that was actually built without that most elementary of safety precautions!

  • Quite independently, the AI can use its contextual understanding of the situation. Any intelligent system with such a poor understanding of the context and implications of its plans that it just goes ahead with the first plan off the stack, without thinking about implications, is an intelligent system that will walk out in front of a bus just because it wants to get to the other side of the road. In the case in question, you are imagining an AI that would be capable of executing a plan to put all humans into bottles without thinking for one moment to mention to anybody that it was considering this plan? That makes no sense in any version of the real world. Such an AI is an implausible hypothetical.

Comment author: misterbailey 19 May 2015 09:08:45AM 1 point [-]

With respect, your first point doesn't answer my question. My question was, what criteria would cause the AI to submit a given proposed action or plan for human approval? You might say that the AI submits every proposed atomic action for approval (in this case, the criterion is the trivial one, "always submit proposal"), but this seems unlikely. Regardless, it doesn't make sense to say the humans have already heard of the plan about which the AI is just now deciding whether to tell them.

In your second point you seem to be suggesting an answer to my question. (Correct me if I'm wrong.) You seem to be suggesting "context." I'm not sure what is meant by this. Is it reasonable to suppose that the AI would make the decision about whether to "shoot first" or "ask first" based on things like, e.g., the lower end of its 99% confidence interval for how satisfied its supervisors will be?

Comment author: TheAncientGeek 18 May 2015 10:58:11AM *  0 points [-]

The claim is that there will be a lot of failure modes, and we can’t expect to guard against all of them by just sitting around thinking of as many exotic disaster scenarios as possible.

I doubt that, since, coupled with claims of existential risk, the logical conclusion would be to halt AI research, but MIRI isn't saying that.

Comment author: misterbailey 18 May 2015 02:12:13PM 1 point [-]

There are other methods than "sitting around thinking of as many exotic disaster scenarios as possible" by which one could seek to make AI friendly. Thus, believing that "sitting around [...]" will not be sufficient does not imply that we should halt AI research.

Comment author: Richard_Loosemore 11 May 2015 09:56:01PM *  1 point [-]

Let me first address the way you phrased it before you gave me the two options.

After saying

My concern is that I don't think this doctrine [of Logical Infallibility] is an essential part of the arguments or scenarios that Yudkowsky et al present.

you add:

An intelligent AI might come to a conclusion about what it ought to do, and then recognize "yes, I might be wrong about this" (whatever is meant by "wrong"---this is not at all clear).

The answer to this is that in all the scenarios I address in the paper - the scenarios invented by Yudkowsky and the rest - the AI is supposed to take an action in spite of the fact that it is getting **massive feedback** from all the humans on the planet that they do not want this action to be executed. That is an important point: nobody is suggesting that these are really subtle fringe cases where the AI thinks that it might be wrong but is not sure -- rather, the AI is supposed to go ahead and be unable to stop itself from carrying out the action in spite of clear protests from the humans.

That is the meaning of "wrong" here. And it is really easy to produce a good definition of "something going wrong" with the AI's action plans, in cases like these: if there is an enormous inconsistency between descriptions of a world filled with happy humans (and here we can weigh into the scale a thousand books describing happiness in all its forms) and the fact that virtually every human on the planet reacts to the postulated situation by screaming his/her protests, then a million red flags should go up.
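A minimal sketch of the kind of red-flag check I have in mind (purely illustrative; the approval scores and the discrepancy threshold are invented for this example):

```python
# Purely illustrative: the "million red flags" check described above.
# The approval scores and the discrepancy threshold are invented.

def red_flag(predicted_approval: float, observed_approval: float,
             max_discrepancy: float = 0.5) -> bool:
    """Flag a plan when the AI's background model of what makes humans happy
    disagrees wildly with how humans are actually reacting to that plan.

    predicted_approval: what the AI's contextual knowledge says people should
        think of the outcome (0 = horror, 1 = delight).
    observed_approval: the reaction actually coming back from the humans.
    """
    return abs(predicted_approval - observed_approval) > max_discrepancy

# Hypothetical example: the AI predicts a world of happy humans, while
# virtually everyone is screaming protests.
print(red_flag(predicted_approval=0.95, observed_approval=0.02))  # True
```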

I think that when posed in this way, the question answers itself, no?

In other words, option 2 is close enough to what I meant, except that it is not exactly as a result of its fallibility that it hesitates (knowledge of fallibility is there as a background all the time), but rather due to the immediate fact that its proposed plan causes concern to people.

Comment author: misterbailey 18 May 2015 09:23:24AM 1 point [-]

My question was about what criteria would cause the AI to make a proposal to the human supervisors before executing its plan. In this case, I don’t think the criteria can be that humans are objecting, since they haven’t heard its plan yet.

(Regarding the point that you're only addressing the scenarios proposed by Yudkowsky et al, see my remark here.)

Comment author: Richard_Loosemore 18 May 2015 02:16:18AM 0 points [-]

Maybe I could try to reduce possible confusion here. The paper was written to address a category of "AI Risk" scenarios in which we are told:

"Even if the AI is programmed with goals that are ostensibly favorable to humankind, it could execute those goals in such a way that would lead to disaster".

Given that premise, it would be a bait-and-switch if I proposed a fix for this problem, and someone objected with "But you cannot ASSUME that the programmers would implement that fix!"

The whole point of the problem under consideration is that even if the engineers tried, they could not get the AI to stay true.

Comment author: misterbailey 18 May 2015 09:19:47AM *  1 point [-]

The problem with you objecting to the particular scenarios Yudkowsky et al propose is that the scenarios are merely illustrative. Of course, you can probably guard against any specific failure mode. The claim is that there will be a lot of failure modes, and we can’t expect to guard against all of them by just sitting around thinking of as many exotic disaster scenarios as possible.

Mind you, I know your argument is more than just “I can see why these particular disasters could be avoided”. You’re claiming that certain features of AI will in general tend to make it careful and benevolent. Still, I don’t think it’s valid for you to complain about bait-and-switch, since that’s precisely the problem.

Comment author: misterbailey 18 May 2015 09:16:17AM 1 point [-]

Yudkowsky et al don't argue that the problem is unsolvable, only that it is hard. In particular, Yudkowsky fears it may be harder than creating AI in the first place, which would mean that in the natural evolution of things, UFAI appears before FAI. However, I needn't factor what I'm saying through the views of Yudkowsky. For an even more modest claim, we don't have to believe that FAI is hard in hindsight in order to claim that AI will be unfriendly unless certain failure modes are guarded against. On this view of the FAI project, a large part of the effort is just noticing the possible failure modes that were only obvious in hindsight, and convincing people that the problem is important and won't solve itself.

Comment author: Richard_Loosemore 07 May 2015 02:00:46PM 2 points [-]

I'll walk you through it.

I did not claim (as you imply) that the fact of there being a programming error was what implied that there is "an inconsistency in its reasoning." In the two paragraphs immediately before the one you quote (and, indeed, in that whole section), I explain that the system KNOWS that it is following these two imperatives:

1) Conclusions produced by my reasoning engine are always correct. [This is the Doctrine of Logical Infallibility]

2) I know that AGI reasoning engines in general, and mine in particular, sometimes come to incorrect conclusions that are the result of a failure in their design.

Or, paraphrasing this in the simplest possible way:

1) My reasoning engine is infallible.

2) My reasoning engine is fallible.

That, right there, is a flat-out contradiction between two of its core "beliefs". It is not, as you state, that the existence of a programming error is evidence of inconsistency; it is the above pair of beliefs (engendered by the programming error) that constitutes the inconsistency.

Does that help?

Comment author: misterbailey 11 May 2015 09:47:42AM 1 point [-]

Thanks for replying. Yes it does help. My apologies. I think I misunderstood your argument initially. I confess I still don't see how it works though.

You criticize the doctrine of logical infallibility, claiming that a truly intelligent AI would not believe such a thing. Maybe so. I'll set the question aside for now. My concern is that I don't think this doctrine is an essential part of the arguments or scenarios that Yudkowsky et al present.

An intelligent AI might come to a conclusion about what it ought to do, and then recognize "yes, I might be wrong about this" (whatever is meant by "wrong"---this is not at all clear). The AI might always recognize this possibility about every one of its conclusions. Still, so what? Does this mean it won't act?

Can you tell me how you feel about the following two options? Or, if you prefer a third option, could you explain it? You could

1) explicitly program the AI to ask the programmers about every single one of its atomic actions before executing them. I think this is unrealistic. ("Should I move this articulator arm .5 degrees clockwise?")

2) or, expect the AI to conclude, through its own intelligence, that the programmers would want it to check in about some particular plan, P, before executing it. Presumably, the reason the AI would have for this checking-in is that it sees that, as a result of its fallibility, there is a high chance that this course of action, P, might actually be unsatisfying to the programmers. But the point is that this checking-in is triggered by a specific concern the AI has about the risk to programmer satisfaction. It would not be triggered by a plan Q about which the AI had no reasonable concern of a risk to programmer satisfaction.

Do you agree with either of these options? Can you suggest alternatives?

Comment author: misterbailey 07 May 2015 12:46:34PM 7 points [-]

I see a fair amount of back-and-forth where someone says "What about this?" and you say "I addressed that in several places; clearly you didn't read it." Unfortunately, while you may think you have addressed the various issues, I don't think you did (and presumably your interlocutors don't either). Perhaps you will humor me in responding to my comment. Let me try to make the issue as sharp as possible by pointing out what I think is an out-and-out mistake made by you. In the section you call the heart of your argument, you say:

If the AI is superintelligent (and therefore unstoppable), it will be smart enough to know all about its own limitations when it comes to the business of reasoning about the world and making plans of action. But if it is also programmed to utterly ignore that fallibility—for example, when it follows its compulsion to put everyone on a dopamine drip, even though this plan is clearly a result of a programming error—then we must ask the question: how can the machine be both superintelligent and able to ignore a gigantic inconsistency in its reasoning?

Yes, the outcome is clearly the result of a "programming error" (in some sense). However, you then ask how a superintelligent machine could ignore such an "inconsistency in its reasoning." But a programming error is not the same thing as an inconsistency in reasoning.

Note: I want to test your argument (at least at first), so I would rather not get a response from you claiming I've failed to take into account other arguments or other evidence, therefore my objection is invalid. Let me propose that you either 1) dispute that this was, in fact, a mistake, 2) explain how I have misunderstood, 3) grant that it was a mistake, and reformulate the claim here, or 4) state that this claim is not necessary for your argument.

If you can help me understand this point, I would be happy to continue to engage.

Comment author: misterbailey 12 January 2015 02:15:57PM 5 points [-]

Hi. I'm a long time lurker (a few years now), and I finally joined so that I could participate in the community and the discussions. This was borne partly out of a sense that I'm at a place in my life where I could really benefit from this community (and it could benefit from me), and partly out of a specific interest in some of the things that have been posted recently: the MIRI technical research agenda.

In particular, once I've had more time to digest it, I want to post comments and questions about Reasoning Under Logical Uncertainty.

More about me: I'm currently working as a postdoctoral fellow in mathematics. My professional work is in physics-y differential geometry, so only connected to the LW material indirectly via things like quantum mechanics. I practice Buddhist meditation, without definitively endorsing any of the doctrines. I'm surprised meditation hasn't gotten more airtime in the rationalist community.

My IRL exposure to the LWverse is limited (hi Critch!), but I gather there's a meetup group in Utrecht, where I'm living now.

Anyway, I look forward to good discussions. Hello everyone!
