TheAncientGeek comments on Dreams of Friendliness - Less Wrong

Post author: Eliezer_Yudkowsky 31 August 2008 01:20AM


Comment author: TheAncientGeek 17 September 2015 11:54:34AM *  -1 points

To which the reply is that the AI needs goals in order to decide how to think: that is, the AI has to act as a powerful optimization process in order to plan its acquisition of knowledge, effectively distill sensory information, pluck "answers" to particular questions out of the space of all possible responses, and of course, to improve its own source code up to the level where the AI is a powerful intelligence. All these events are "improbable" relative to random organizations of the AI's RAM, so the AI has to hit a narrow target in the space of possibilities to make superintelligent answers come out.

Now, why might one think that an Oracle didn't need goals?

One might think that because constraining a general-purpose system isn't the only way to build something that does a specific thing. To take a slightly silly example, toasters toast because they can't do anything except toast, and kettles boil water because they can't do anything but boil water. Not all special-purpose systems are so trivial, though. The point is that you can't tell from outside a black box whether something does specific things, fulfils an apparent purpose, because it can only do that thing, or because it is a general-purpose problem solver which has been constrained by a goal system or utility function. One could conceivably have a situation where two functionally identical black boxes are implemented in each of the two ways. The complexity of what a system does isn't much of a clue: Google's search engine is a complex special-purpose system, not a general-purpose problem solver that has been constrained to do searches.
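To make the black-box point concrete, here is a toy sketch (the function names and the uppercasing task are my own illustrative choices, not anything from the discussion): two boxes with identical observable behaviour, one hard-wired to its task, the other a general search over behaviours constrained by a goal test.

```python
# Two "black boxes" with identical observable behaviour: both return the
# uppercase form of a string. From outside, nothing distinguishes them.

def special_purpose_box(text: str) -> str:
    # Hard-wired: this box can do nothing except uppercase.
    return text.upper()

def constrained_general_box(text: str) -> str:
    # A (toy) general search over candidate transformations, constrained
    # by a goal test so that only one behaviour is ever selected.
    candidates = [str.lower, str.upper, str.title]
    goal_test = lambda f: f("ab") == "AB"
    chosen = next(f for f in candidates if goal_test(f))
    return chosen(text)

assert special_purpose_box("hello") == constrained_general_box("hello") == "HELLO"
```

The point survives the toy scale: no number of input/output observations tells you which implementation is inside the box.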

Actually having a goal is a question of implementation, of what is going on inside the black box. Systems that don't have goals, except in a metaphorical sense, won't generate subgoals, and therefore won't generate dangerous subgoals. Moreover, oracle systems are also non-agentive, and non-agentive systems won't act agentively on their subgoals.

By agency I mean a tendency to default to doing something, and by non-agency I mean a tendency to default to doing nothing. Much of the non-AI software we deal with is non-agentive. Word processors and spreadsheets just sit there if you don't input anything into them. Web servers and databases likewise idle if they have no requests to respond to. Another class of software performs rigidly defined tasks, such as backups, at rigidly defined intervals: cronjobs, and so on. These are not full agents either.

According to Wikipedia: "In computer science, a software agent is a computer program that acts for a user or other program in a relationship of agency, which derives from the Latin agere (to do): an agreement to act on one's behalf. Such "action on behalf of" implies the authority to decide which, if any, action is appropriate.[1][2]

Related and derived concepts include intelligent agents (in particular exhibiting some aspect of artificial intelligence, such as learning and reasoning), autonomous agents (capable of modifying the way in which they achieve their objectives), distributed agents (being executed on physically distinct computers), multi-agent systems (distributed agents that do not have the capabilities to achieve an objective alone and thus must communicate), and mobile agents (agents that can relocate their execution onto different processors)."

So a fully fledged agent will do things without being specifically requested to, and will do particular things at particular times which are not trivially predictable.

There is nothing difficult about implementing nonagency: it is a matter of not implementing agency.
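A minimal sketch of what that looks like in code (illustrative only; an in-memory queue stands in for real requests, and the names are mine): the non-agentive service blocks and does nothing until asked, whereas an agentive system would have a loop that defaults to choosing and performing actions on its own initiative.

```python
import queue

def non_agentive_service(requests: queue.Queue, handle) -> None:
    """Default to doing nothing: act only when a request arrives."""
    while True:
        req = requests.get()   # blocks; the system idles with no input
        if req is None:        # sentinel value: shut down
            return
        handle(req)

# An agentive system, by contrast, would loop on its own initiative,
# roughly:  while True: act(choose_action())  - doing something by default.
```

Nothing in the non-agentive version needs to be suppressed or boxed in; the autonomous loop simply was never written.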

ETA

Anyway: the AI needs a goal of answering questions, and that has to give rise to subgoals of choosing efficient problem-solving strategies, improving its code, and acquiring necessary information. You can quibble about terminology, but the optimization pressure has to be there, and it has to be very powerful, measured in terms of how small a target it can hit within a large design space.

I certainly can quibble about the terminology: it's not true that a powerful system necessarily has a goal at all, so it's not true that it necessarily has subgoals. Rather, it has subtasks, and that's a terminological difference that makes a difference: it indicates a different part of the territory.

ETA2

Is the Oracle AI thinking about the consequences of answering the questions you give it? Does the Oracle AI care about those consequences the same way you do, applying all the same values, to warn you if anything of value is lost?

No and no. But that doesn't make an oracle dangerous in the way that MIRI's standard superintelligent AI is.

Consider two rooms. Room A contains an Oracle AI, which can answer scientific problems, protein folding and so on, fed to it on slips of paper. Room B contains a team of scientists who can also answer the questions, using lab techniques instead of computation. Would you say room A is dangerous, even though it is giving the same answers as room B, and even though room B is effectively what we already have with science? The point is not that there are never any problems arising from a scientific discovery; clearly there can be. The point is where the responsibility lies.

If people misapply a scientific discovery, or fail to see its implications, then the responsibility and agency are theirs; it does not lie with bunsen burners and test tubes. Science is not a moral agent. (Moral agency can be taken reductively to mean where the problem is, where leverage is best applied to get desirable outcomes.) Using a powerful Oracle AI doesn't change anything qualitatively in that regard; it is not game-changing: society still has the responsibility to weigh its answers and decide how to make use of them. An Oracle AI could give you the plans for a superweapon if you ask it, but it takes a human to build it and use it.

Comment author: ike 18 September 2015 02:12:23AM *  0 points

One Room contains an Oracle AI, which can answer scientific problems, protein folding and so on, fed to it on slips of paper.

And if it produces a "protein" that technically answers our request, but has a nasty side effect of destroying the world? We don't consider scientists dangerous because we think they don't want to destroy the world.

Or are you claiming that we'd be able to recognize when a plan proposed by the Oracle AI (and if you're asking questions about protein folding, you're asking for a plan) is dangerous?

Comment author: TheAncientGeek 18 September 2015 09:25:07AM *  0 points

And if it produces a "protein" that technically answers our request, but has a nasty side effect of destroying the world?

It produces a dangerous protein inadvertently, in the way that science might...or it has a higher-than-science probability of producing a dangerous protein, due to some unfriendly intent?

We don't consider scientists dangerous because we think they don't want to destroy the world.

Was there a negative missing in that?

Or are you claiming that we'd be able to recognize when a plan proposed by the Oracle AI (and if you're asking questions about protein folding, you're asking for a plan) is dangerous?

I am not saying we necessarily would. I am saying that recognising the hidden dangers in the output from the Oracle room is fundamentally different from recognising the hidden dangers in the output from the science room, which we are doing already. It's not some new level of risk.

Comment author: ike 18 September 2015 02:20:29PM 0 points

The statement should be read:

"We don't consider scientists dangerous" because "we think they don't want to destroy the world".

Since we think scientists are friendly, we trust them more than we should trust an Oracle AI. There's also the fact that an unfriendly AI presumably can fool us better than a scientist can.

It produces a dangerous protein inadvertently, in the way that science might...or it has a higher-than-science probability of producing a dangerous protein, due to some unfriendly intent?

Mostly the latter. However, even the former can be worse than science now, in that "don't destroy the world" is not an implicit goal. So a scientist noticing that something is dangerous might not develop it, while an AI might not have such restrictions.

I am saying that recognising the hidden dangers in the output from the Oracle room is fundamentally different from recognising the hidden dangers in the output from the science room, which we are doing already.

Are you missing a negative now?

Comment author: TheAncientGeek 18 September 2015 03:00:45PM 0 points

Since we think scientists are friendly, we trust them more than we should trust an Oracle AI.

I don't see how you can assert that without knowing anything about the type of Oracle AI.

There's also the fact that an unfriendly AI presumably can fool us better than a scientist can.

Ditto.

Why would a non-agentive, non-goal-driven AI want to fool us? Where would it get the motivation from?

How could an AI with no knowledge of psychology fool us? Where would it get the knowledge from?

So a scientist noticing that something is dangerous might not develop it, while an AI might not have such restrictions.

But then people would know that the AI's output hasn't been filtered by a human's common sense.

Are you missing a negative now?

Yes. Irony strikes again.

Comment author: ike 18 September 2015 03:32:57PM 0 points

I don't see how you can assert without knowing anything about the type of Oracle AI.

We can presume that a scientist wants to still exist, and hence doesn't want to destroy the world. This seems much stronger than a presumption that an Oracle AI will be safe. Of course, an AI might be safe, and a scientist might be out to get us; but the balance of probability says otherwise.

I'm not asserting that every AI is dangerous and every scientist is safe.

Ditto.

An AI can fool us better simply because it's smarter (by assumption).

Why would a non-agentive, non-goal-driven AI want to fool us? Where would it get the motivation from?

I still think you're using "non-agent" as magical thinking.

Here we're talking in context of what you said above:

Is the Oracle AI thinking about the consequences of answering the questions you give it? Does the Oracle AI care about those consequences the same way you do, applying all the same values, to warn you if anything of value is lost?

No and no. But that doesn't make an oracle dangerous in the way that MIRI's standard superintelligent AI is.

So let's say the Oracle AI decides that X best answers our question. But if it tells us X, we won't accept it. If the Oracle cares that we adopt X, it might answer Y, which does the same as X but looks more appealing.

Or more subtly, if the AI comes up with Y, it might not tell us that it causes X, because it doesn't care that X doesn't fulfil our values, whereas a scientist would note all the implications.

But then people would know that the AI's output hasn't been filtered by a human's common sense.

If humans are incapable of recognizing whether the plan is dangerous or not, it doesn't matter how much scrutiny they put it through, they won't be able to discern the danger.

Comment author: TheAncientGeek 18 September 2015 06:13:22PM *  -1 points

We can presume that a scientist wants to still exist, and hence doesn't want to destroy the world. This seems much stronger than a presumption that an Oracle AI will be safe. Of course, an AI might be safe, and a scientist might be out to get us; but the balance of probability says otherwise.

You don't have any evidence that AIs are generally dangerous (since we have AIs and the empirical evidence is that they are not), and you don't have a basis for theorising that Oracles are dangerous, because there are a number of different kinds of oracle.

An AI can fool us better simply because it's smarter (by assumption).

So are our current AIs fooling us? We build them because they are better than us at specific things, but that doesn't give them the motivation or the ability to fool us. Smartness isn't a single one-size-fits-all thing, and AIs aren't uniform in their abilities and properties. Once you shed those two illusions, you can see much easier methods of AI safety than those put forward by MIRI.

I still think you're using "non-agent" as magical thinking.

I still think that if you can build it, it isn't magic.

But if it tells us X, we won't accept it. If the Oracle cares that we adopt X, it might answer Y, which does the same as X but looks more appealing.

A narrowly defined AI won't "care" about anything except answering questions, so it won't try to second guess us.

Or more subtly, if the AI comes up with Y, it might not tell us that it causes X, because it doesn't care that X doesn't fulfil our values, whereas a scientist would note all the implications.

I have dealt with that objection several times. People know that databases and search engines don't fully contextualise things, and that the user of the information therefore has to exercise caution.

If humans are incapable of recognizing whether the plan is dangerous or not, it doesn't matter how much scrutiny they put it through, they won't be able to discern the danger.

That's an only-perfection-will-do objection. Of course humans can't perfectly scrutinise scientific discoveries either, so that changes nothing.