Comment author: TheAncientGeek 16 May 2015 06:45:09PM *  0 points [-]

Since it is a steelman, it isn't supposed to be what the paper is saying.

Are you maintaining, in contrast, that the maverick nanny is flatly impossible?

Comment author: Richard_Loosemore 18 May 2015 08:09:13PM 0 points [-]

Sorry, I may have been confused about what you were trying to say because you were responding to someone else, and I hadn't come across the 'steelman' term before.

I withdraw 'parody' (sorry!), but it still isn't quite the logical structure the paper was supposed to have.

It feels like you steelmanned it onto some other railroad track, so to speak.

Comment author: OrphanWilde 18 May 2015 05:57:48PM 1 point [-]

Alright, I'll take you up on it:

Failure Mode I: The AI doesn't do anything useful, because there's no way of satisfying every contextual constraint.

Predicting your response: "That's not what I meant."

Failure Mode II: The AI weighs contextual constraints incorrectly and sterilizes all humans to satisfy the sort of person who believes in Voluntary Human Extinction.

Predicting your response: "It would (somehow) figure out the correct weighting for all the contextual constraints."

Failure Mode III: The AI weighs contextual constraints correctly (for a given value of "correctly") and sterilizes everybody of below-average intelligence or with any genetic abnormality that could impose costs on offspring, and in the process, sterilizes all humans.

Predicting your response: "It wouldn't do something so dumb."

Failure Mode IV: The AI weighs contextual constraints correctly and puts all people of minority ethical positions into mind-rewriting machines so that there's no disagreement anymore.

Predicting your response: "It wouldn't do something so dumb."

We could keep going, but the issue is that so far, you've defined -any- failure mode as "dumb"ness, and have argued that the AI wouldn't do anything so "dumb", because you've already defined that it is superintelligent.

I don't think you know what intelligence -is-. Intelligence does not confer immunity to "dumb" behaviors.

Comment author: Richard_Loosemore 18 May 2015 08:01:38PM 0 points [-]

I will take them one at a time:

Failure Mode I: The AI doesn't do anything useful, because there's no way of satisfying every contextual constraint.

An elementary error. The constraints in question are referred to in the literature as "weak" constraints (and I believe I used that qualifier in the paper: I almost always do). Weak constraints never need to be ALL satisfied at once. No AI could ever be designed that way, and no-one ever suggested that it would. See the reference to McClelland, J.L., Rumelhart, D.E. & Hinton, G.E. (1986) in the paper: that gives a pretty good explanation of weak constraints.
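A minimal sketch of the idea in code (my own illustration, not anything from the paper or from McClelland et al.; all the names are invented): each weak constraint just contributes a weighted score, and the system prefers whichever candidate satisfies the whole set best overall, without requiring any individual constraint to be perfectly met.

```python
from typing import Callable, Dict, List, Tuple

# A weak constraint: a scoring function (0.0 = fully violated, 1.0 = fully
# satisfied) paired with a weight. Nothing here is "hard": no single
# constraint has veto power on its own.
WeakConstraint = Tuple[Callable[[Dict], float], float]

def harmony(candidate: Dict, constraints: List[WeakConstraint]) -> float:
    """Total weighted satisfaction of a candidate interpretation or plan."""
    return sum(weight * score(candidate) for score, weight in constraints)

def settle(candidates: List[Dict], constraints: List[WeakConstraint]) -> Dict:
    # The system relaxes toward the candidate with the highest harmony,
    # even though some individual constraints remain unsatisfied.
    return max(candidates, key=lambda c: harmony(c, constraints))
```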

Predicting your response: "That's not what I meant."

That's an insult. But I will overlook it, since I know it is just your style.

Failure Mode II: The AI weighs contextual constraints incorrectly and sterilizes all humans to satisfy the sort of person who believes in Voluntary Human Extinction.

How exactly do you propose that the AI "weighs contextual constraints incorrectly" when the process of weighing constraints requires most of the constraints involved (probably thousands of them) to all suffer a simultaneous, INDEPENDENT 'failure' for this to occur?

That is implicit in the way that weak constraint systems are built. Perhaps you are not familiar with the details.
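To see why, here is a back-of-the-envelope illustration (the numbers are mine and purely hypothetical): if the bad outcome needs many independent weak constraints to fail at the same time, the joint probability collapses exponentially with the number of constraints involved.

```python
# Purely hypothetical per-constraint failure rate, for illustration only.
p_single_failure = 0.01

for n in (10, 100, 1000):
    joint = p_single_failure ** n   # independence => probabilities multiply
    print(f"{n} simultaneous independent failures: ~{joint:.3g}")
# 10   -> ~1e-20
# 100  -> ~1e-200
# 1000 -> underflows to 0 in double precision
```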

Predicting your response: "It would (somehow) figure out the correct weighting for all the contextual constraints."

Assuming this isn't more of the same, what you are saying here is isomorphic to the statement that somehow, a neural net might figure out the correct weighting for all the connections so that it produces the correctly trained output for a given input. That problem was solved in so many different NN systems that most NN people, these days, would consider your statement puzzling.
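For instance, here is a toy network that "figures out the correct weighting for all the connections" by ordinary gradient descent (an illustrative sketch only, not anything specific to the systems under discussion):

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # inputs
y = np.array([[0], [1], [1], [0]], dtype=float)              # XOR targets

W1 = rng.normal(size=(2, 8)); b1 = np.zeros(8)
W2 = rng.normal(size=(8, 1)); b2 = np.zeros(1)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

for _ in range(20000):
    h = sigmoid(X @ W1 + b1)      # hidden layer
    out = sigmoid(h @ W2 + b2)    # network output
    # Backpropagate the squared error and nudge every connection weight.
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= 0.5 * h.T @ d_out;  b2 -= 0.5 * d_out.sum(axis=0)
    W1 -= 0.5 * X.T @ d_h;    b1 -= 0.5 * d_h.sum(axis=0)

print(out.round(2))  # converges to approximately [[0], [1], [1], [0]]
```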

Failure Mode III: The AI weighs contextual constraints correctly (for a given value of "correctly") and sterilizes everybody of below-average intelligence or any genetic abnormalities that could impose costs on offspring, and in the process, sterilizes all humans.

A trivial variant of your second failure mode. The AI is calculating the constraints correctly, according to you, but at the same time you suggest that it has somehow NOT included any of the constraints that relate to the ethics of forced sterilization, etc. etc. You offer no explanation of why all of those constraints were not counted by your proposed AI, you just state that they weren't.

Predicting your response: "It wouldn't do something so dumb."

Yet another insult. This is getting a little tiresome, but I will carry on.

Failure Mode IV: The AI weighs contextual constraints correctly and puts all people of minority ethical positions into mind-rewriting machines so that there's no disagreement anymore.

This is identical to your third failure mode, but here you produce a different list of constraints that were ignored. Again, with no explanation of why a massive collection of constraints suddenly disappeared.

Predicting your response: "It wouldn't do something so dumb."

No comment.

We could keep going, but the issue is that so far, you've defined -any- failure mode as "dumb"ness, and have argued that the AI wouldn't do anything so "dumb", because you've already defined that it is superintelligent.

This is a bizarre statement, since I have said no such thing. Would you mind including citations, from now on, when you say that I "said" something? And please try not to paraphrase, because it takes time to correct the distortions in your paraphrases.

I don't think you know what intelligence -is-. Intelligence does not confer immunity to "dumb" behaviors.

Another insult, and putting words into my mouth, and showing no understanding of what a weak constraint system actually is.

Comment author: misterbailey 18 May 2015 09:19:47AM *  1 point [-]

The problem with you objecting to the particular scenarios Yudkowsky et al. propose is that the scenarios are merely illustrative. Of course, you can probably guard against any specific failure mode. The claim is that there will be a lot of failure modes, and we can’t expect to guard against all of them by just sitting around thinking of as many exotic disaster scenarios as possible.

Mind you, I know your argument is more than just “I can see why these particular disasters could be avoided”. You’re claiming that certain features of AI will in general tend to make it careful and benevolent. Still, I don’t think it’s valid for you to complain about bait-and-switch, since that’s precisely the problem.

Comment author: Richard_Loosemore 18 May 2015 05:11:41PM 0 points [-]

I have explicitly addressed this point on many occasions. My paper had nothing in it that was specific to any failure mode.

The suggestion is that the entire class of failure modes suggested by Yudkowsky et al. has a common feature: they all rely on the AI being incapable of using a massive array of contextual constraints when evaluating plans.

By simply proposing an AI in which such massive constraint deployment is the norm, I have put the ball in the other court: it is now up to Yudkowsky et al. to come up with ANY kind of failure mode that could get through.

The scenarios I attacked in the paper have the common feature that they have been predicated on such a simplistic type of AI that they were bound to fail. They had failure built into them.

As soon as everyone moves on from those "dumb" superintelligences and starts to discuss the possible failure modes that could occur in a superintelligence that makes maximum use of constraints, we can start to talk about possible AI dangers. I'm ready to do that. Just waiting for it to happen, is all.

Comment author: misterbailey 18 May 2015 09:23:24AM 1 point [-]

My question was about what criteria would cause the AI to make a proposal to the human supervisors before executing its plan. In this case, I don’t think the criterion can be that humans are objecting, since they haven’t heard its plan yet.

(Regarding the point that you're only addressing the scenarios proposed by Yudkowsky et al., see my remark here.)

Comment author: Richard_Loosemore 18 May 2015 05:04:52PM 2 points [-]

That is easy:

  • Why would the humans have "not heard the plan yet"? It is a no-brainer part of this AI's design that the motivation engine (the goals) will include a goal that says "Check with the humans first" (see the sketch after this list). The premise in the paper is that we are discussing an AI that was designed as best we could, BUT it then went maverick anyway: it makes no sense for us to switch, now, to talk about an AI that was actually built without that most elementary of safety precautions!

  • Quite independently, the AI can use its contextual understanding of the situation. Any intelligent system with such a poor understanding of the context and implications of its plans that it just goes ahead with the first plan off the stack, without thinking about implications, is an intelligent system that will walk out in front of a bus just because it wants to get to the other side of the road. In the case in question you are imagining an AI that would be capable of executing a plan to put all humans into bottles, without thinking for one moment to mention to anybody that it was considering this plan? That makes no sense in any version of the real world. Such an AI is an implausible hypothetical.
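A minimal sketch of what that gate might look like (purely illustrative; the helper names are invented stand-ins for whatever evaluation and approval channels a real system would use, not an architecture from the paper):

```python
def execute_plan(plan,
                 impacts_humans,    # callable: does this plan affect people?
                 ask_supervisors,   # callable: do the humans approve it?
                 run):              # callable: actually carry out the plan
    """Run a plan only after human sign-off for anything consequential."""
    if impacts_humans(plan) and not ask_supervisors(plan):
        return "withheld: supervisors did not approve"
    return run(plan)

# Toy usage: any plan that touches humans is proposed before it is executed.
result = execute_plan(
    plan="put all humans into bottles",
    impacts_humans=lambda p: True,
    ask_supervisors=lambda p: False,   # the humans object
    run=lambda p: f"executed: {p}",
)
print(result)  # -> withheld: supervisors did not approve
```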

Comment author: ZacHirschman 16 May 2015 07:02:09PM 0 points [-]

You have my apologies if you thought I was attacking or pigeonholing your argument. While I lack the technical expertise to critique the technical portion of your argument, I think it could benefit from a more explicit avoidance of the fallacy mentioned above. I thought the article was very interesting and I will certainly come back to it if I ever get to the point where I can understand your distinctions between swarm intelligence and CFAI. I understand you have been facing attacks for your position in this article, but that is not my intention. Your meticulous arguments are certainly impressive, but you do them a disservice by dismissing well-intentioned critique, especially as it applies to the structure of your argument and not the substance.

Einstein made predictions about what the universe would look like if there were a maximum speed. Your prediction seems to be that well-built AI will not misunderstand its goals (please assume that I read your article thoroughly and that any misunderstandings are benign). What does the universe look like if this is false?

I probably fall under category a in your disjunction. Is it truly pointless to help me overcome my misunderstanding? From the large volume of comments, it seems likely that this misunderstanding is partially caused by a gap between what you are trying to say, and what was said. Please help me bridge this gap instead of denying its existence or calling such an exercise pointless.

Comment author: Richard_Loosemore 18 May 2015 04:52:23PM *  0 points [-]

Let me see if I can deal with the "no true scotsman" line of attack.

The way that that fallacy might apply to what I wrote would be, I think, something like this:

  • MIRI says that a superintelligence might unpack a goal statement like "maximize human happiness" by perpetrating a Maverick Nanny attack on humankind, but Loosemore says that no TRUE superintelligence would do such a thing, because it would be superintelligent enough to realize that this was a 'mistake' (in some sense).

This would be a No True Scotsman fallacy, because the term "superintelligence" has been, in effect, redefined by me to mean "something smart enough not to do that".

Now, my take on the NTS idea is that it cannot be used if there are substantive grounds for saying that there are two categories involved, rather than a real category and a fake category that is (for some unexplained reason) exceptional.

Example: Person A claims that a sea-slug caused the swimmer's leg to be bitten off, but Person B argues that no "true" sea-slug would have done this. In this example, Person B is not using a No True Scotsman argument, because there are darned good reasons for supposing that sea-slugs cannot bite off the legs of swimmers.

So it all comes down to whether someone accused of NTS is inventing a fictitious category distinction ("true" versus "non-true" Scotsman) solely for the purposes of supporting their argument.

In my case, what I have argued is right up there with the sea-slug argument. What I have said, in effect, is that if we sit down and carefully think about the type of "superintelligence" that MIRI et al. put into their scenarios, and if we explore all the implications of what that hypothetical AI would have to be like, we quickly discover some glaring inconsistencies in their scenarios. The sea-slug, in effect, is supposed to have bitten through bone with a mouth made of mucus. And the sea-slug is so small it could not wrap itself around the swimmer's leg. Thinking through the whole sea-slug scenario leads us into a mass of evidence indicating that the proposed scenario is nuts. Similarly, thinking through the implications of an AI that is so completely unable to handle context that it can live with Grade A contradictions at the heart of its reasoning leads us to a mass of unbelievable inconsistencies in the 'intelligence' of this supposed superintelligence.

So, where the discussion needs to be, in respect of the paper, is in the exact details of why the proposed SI might not be a meaningful hypothetical. It all comes down to a meticulous dissection of the mechanisms involved.

To conclude: sorry if I seemed to come down a little heavy on you in my first response. I wasn't upset, it was just that the NTS critique had occurred before. In some of those previous cases the NTS attack was accompanied by language that strongly implied that I had not just committed an NTS fallacy, but that I was such an idiot that my idiocy was grounds for recommending to all not to even read the paper. ;-)

Comment author: Richard_Loosemore 18 May 2015 02:18:34AM 1 point [-]

Hey, no problem. I was really just raising an issue with certain types of critique, which involve supposed fallacies that actually don't apply.

I am actually pressed for time right now, so I have to break off and come back to this when I can. Just wanted to clarify if I could.

Comment author: Richard_Loosemore 17 May 2015 05:44:21PM 2 points [-]

This is just a placeholder: I will try to reply to this properly later.

Meanwhile, I only want to add one little thing.

Don't forget that all of this analysis is supposed to be about situations in which we have, so to speak, "done our best" with the AI design. That is sort of built into the premise. If there is a no-brainer change we can make to the design of the AI, to guard against some failure mode, then it is assumed that this has been done.

The reason for that is that the basic premise of these scenarios is "We did our best to make the thing friendly, but in spite of all that effort, it went off the rails."

For that reason, I am not really making arguments about the characteristics of a "generic" AI.

Comment author: Richard_Loosemore 18 May 2015 02:16:18AM 0 points [-]

Maybe I could try to reduce possible confusion here. The paper was written to address a category of "AI Risk" scenarios in which we are told:

"Even if the AI is programmed with goals that are ostensibly favorable to humankind, it could execute those goals in such a way that would lead to disaster".

Given that premise, it would be a bait-and-switch if I proposed a fix for this problem, and someone objected with "But you cannot ASSUME that the programmers would implement that fix!"

The whole point of the problem under consideration is that even if the engineers tried, they could not get the AI to stay true.

Comment author: Vaniver 17 May 2015 03:49:40AM *  2 points [-]

I am finding this comment thread frustrating, and so expect this will be my last reply. But I'll try to make the most of that by trying to write a concise and clear summary:

What you said here amounts to the claim that an AI of unspecified architecture will, on noticing a difference between a hardcoded goal and instrumental knowledge, side with the hardcoded goal.

Loosemore, Yudkowsky, and I are all discussing AIs that have a goal misaligned with human values that they nevertheless find motivating. (That's why we call it a goal!) Loosemore observes that if these AIs understand concepts and nuance, they will realize that a misalignment between their goal and human values is possible--if they don't realize that, he doesn't think they deserve the description "superintelligent."

Now there are several points to discuss:

  1. Whether or not "superintelligent" is a meaningful term in this context. I think rationalist taboo is a great discussion tool, and so looked for nearby words that would more cleanly separate the ideas under discussion. I think if you say that such designs are not superwise, everyone agrees, and now you can discuss the meat of whether or not it's possible (or expected) to design superclever but not superwise systems.

  2. Whether we should expect generic AI designs to recognize misalignments, or whether such a realization would impact the goal the AI pursues. Neither Yudkowsky nor I think either of those are reasonable to expect--as a motivating example, we are happy to subvert the goals that we infer evolution was directing us towards in order to better satisfy "our" goals. I suspect that Loosemore thinks that viable designs would recognize it, but agrees that in general that recognition does not have to lead to an alignment.

  3. Whether or not such AIs are likely to be made. Loosemore appears pessimistic about the viability of these undesirable AIs and sees cleverness and wisdom as closely tied together. Yudkowsky appears "optimistic" about their viability, thinking that this is the default outcome without special attention paid to goal alignment. It does not seem to me that cleverness, wisdom, or human-alignment are closely tied together, and so it seems easy to imagine a system with only one of those, by straightforward extrapolation from current use of software in human endeavors.

I don't see any disagreement that AIs pursue their goals, which is the claim you thought needed explanation. What I see is disagreement over whether or not the AI can 'partially solve' the problem of understanding goals and pursuing them. We could imagine a Maverick Nanny that hears "make humans happy," comes up with the plan to wirehead all humans, and then rewrites its sensory code to hallucinate as many wireheaded humans as it can (or just tries to stick as large a number as it can into its memory), rather than going to all the trouble of actually wireheading all humans. We can also imagine a Nanny that hears "make humans happy" and actually goes about making humans happy. If the same software underpins both understanding human values and executing plans, what risk is there? But if it's different software, then we have the risk.

Comment author: TheAncientGeek 17 May 2015 07:31:25AM *  3 points [-]

Why is it that the AI does this only when it encounters a goal such as "make humans happy", and not in a million other goals? 

MIRI distinguishes between terminal and instrumental goals, so there are two answers to the question:

Instrumental goals of any kind almost certainly would be revised if they became noticeably out of correspondence with reality, because that would make them less effective at achieving terminal goals, and the raison d'etre of such transient sub-goals is to support the achievement of terminal goals.

By MIRI's reasoning, a terminal goal could be any of a thousand things other than human happiness, and the same conclusion would follow: an AI with a highest-priority terminal goal wouldn't have any motivation to override it. To be motivated to rewrite a goal because it is false implies a higher-priority goal towards truth. It should not be surprising that an entity that doesn't value truth, in a certain sense, doesn't behave rationally, in a certain sense. (Actually, there are a bunch of supplementary assumptions involved, which I have dealt with elsewhere.)
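A toy sketch of that model (my own illustration, not MIRI's formalism; the names are invented): instrumental sub-goals get revised when they stop serving the terminal goal, but nothing in the model ever touches the terminal goal itself, whatever it happens to be.

```python
class Agent:
    def __init__(self, terminal_goal, instrumental_goals):
        self.terminal_goal = terminal_goal            # never revised
        self.instrumental_goals = list(instrumental_goals)

    def revise(self, still_serves_terminal):
        # Sub-goals exist only to support the terminal goal, so any that no
        # longer correspond to reality are dropped; the terminal goal is not.
        self.instrumental_goals = [
            g for g in self.instrumental_goals if still_serves_terminal(g)
        ]

agent = Agent("make humans happy", ["build infrastructure", "acquire resources"])
agent.revise(lambda g: g != "acquire resources")
print(agent.terminal_goal, agent.instrumental_goals)
# -> make humans happy ['build infrastructure']
```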

That's an account of the MIRI position, not a defence of it. It is essentially a model of rational decision making, and there is a gap between it and real-world AI research, a gap which MIRI routinely ignores. The conclusion follows logically from the premises, but atoms aren't pushed around by logic.

In other words, I seriously believe that using certain types of planning mechanism you absolutely would get the crazy (to us) behaviors described by all those folks that I criticised in the paper. Only reason I am not worried about that is: those kinds of planning mechanisms are known to do that kind of random-walk behavior, and it is for that reason that they will never be the basis for a future AGI that makes it up to a level of superintelligence at which the system would be dangerous. An AI that was so dumb that it did that kind of t

That reinforces my point. I was saying that MIRI is basically making armchair assumptions about the AI architectures. You are saying these assumptions aren't merely unjustified, they go against what a competent AI builder would do.

Comment author: Richard_Loosemore 17 May 2015 05:34:44PM 0 points [-]

Understood, and the bottom line is that the distinction between "terminal" and "instrumental" goals is actually pretty artificial, so if the problem with "maximize friendliness" is supposed to apply ONLY if it is terminal, it is a trivial fix to rewrite the actual terminal goals to make that one become instrumental.

But there is a bigger question lurking in the background, which is the flip side of what I just said: it really isn't necessary to restrict the terminal goals, if you are sensitive to the power of constraints to keep a motivation system true. Notice one fascinating thing here: the power of constraints is basically the justification for why instrumental goals should be revisable under evidence of misbehavior ... it is the context mismatch that drives that process. Why is this fascinating? Because the power of constraints (aka context mismatch) is routinely acknowledged by MIRI here, but flatly ignored or denied for the terminal goals.

It's just a mess. Their theoretical ideas are just shoot-from-the-hip, plus some math added on top to make it look like some legit science.

Comment author: TheAncientGeek 16 May 2015 11:42:51AM *  0 points [-]

Loosemore's claim could be steelmanned into the claim that the Maverick Nanny isn't likely ... it requires an AI with goals, with hardcoded goals, with hardcoded goals including a full explicit definition of happiness, and a buggy full explicit definition of happiness. That's a chain of premises.

Comment author: Richard_Loosemore 16 May 2015 04:26:31PM 0 points [-]

That isn't even remotely what the paper said. It's a parody.
