CCC comments on Debunking Fallacies in the Theory of AI Motivation - LessWrong

Post author: Richard_Loosemore 05 May 2015 02:46AM


Comment author: CCC 07 May 2015 09:04:33AM

The consequences of a misunderstanding are not a function of the size of the misunderstanding. Rather, they are a function of the ability that the acting agent (in this case, the AI) has to influence the world. A superintelligent AI has an unprecedentedly huge ability to influence the world; therefore, in the worst case, the potential consequences of a misunderstanding are unprecedentedly huge.

The nature of the misunderstanding - whether small or large - makes little difference here. And the nature of the problem - communicating with a non-human and thus (presumably) alien artificial intelligence - is rife with the potential for misunderstanding.

Suppose that after Kennedy gave his Go To The Moon speech, NASA worked on the project for several years, and then finally delivered a small family car, sized for three astronauts, which was capable of driving from the launch complex in Florida all the way up country to the little township of Moon, PA.

Considering that an artificial intelligence - at least at first - might well have immense computational ability and massive intelligence but little to no actual experience in understanding what people mean as opposed to what they say, this is precisely the sort of misunderstanding that is possible if the only mission objective given to the system is something along the lines of "get three astronauts (human) to location designated (Moon)". (Presumably, instead of waiting several years, it would take a few minutes to order a rental car - assuming it knew about rental cars, or thought to look for them.)

Now, if the AI is capable of solving the difficult problem of separating out what people mean from what they say - which is a problem that human-level intelligences still have immense trouble with at times - and the AI is compassionate enough towards humanity to actually strongly value human happiness (as opposed to assigning approximately as much value to us as we assign to ants), then yes, you've done it, you've got a perfectly wonderful Friendly AI.

The problem is, getting those two things right is not simple. I don't think your proposed structure guarantees either of those.

The rest of what you say contains several implicit questions and I cannot address all of them at this time, but I will say that the last paragraph does get my suggestion very, very wrong. It is about as far from what I tried to say in the paper as it is possible to get.

I am not surprised. I am very familiar with the effect - often, what one person means when they write something is not what another person sees when they read it.

That paragraph is what I saw when I read your paper. I think it likely that our implicit assumptions about the structure of such an AI differ drastically - I suspect that you are not mentioning (or are only briefly hinting at) the parts you consider obvious, while I, not considering those parts obvious, do not see them present at all.

The AI has a massive number of constraints on its behavior, but they are all built in at the beginning,

This is incredibly important, and something that I did not see in your proposal. What is the nature of these constraints?

and in effect that cannot be changed because the change requires that the pre-state sanction the design of the post-state, and since the pre-state is safe at iteration n = 1, I can invoke the law of induction to conclude that it stays safe for all n > 1.

A properly Friendly AI will remain Friendly even as it improves its own intelligence. Agreed.
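Made explicit, the induction being invoked in the quoted passage runs roughly as follows (here Safe is a stand-in predicate for whatever safety property the built-in constraints are meant to guarantee):

```latex
\begin{align*}
&\text{Safe}(1)
  && \text{(base case: the initial design is safe)}\\
&\forall n \ge 1:\ \text{Safe}(n) \Rightarrow \text{Safe}(n+1)
  && \text{(inductive step: a safe pre-state sanctions only safe post-states)}\\
&\therefore\ \forall n \ge 1:\ \text{Safe}(n)
  && \text{(by induction over self-modification steps)}
\end{align*}
```

Granting both premises, the conclusion follows; the entire burden thus rests on the inductive step - showing that a safe pre-state's sanctioning process can never approve a constraint-violating post-state.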

...I'm just not seeing how your proposed design brings us any closer to Friendliness.