Alexei comments on Should I believe what the SIAI claims? - Less Wrong

Post author: XiXiDu | 12 August 2010 02:33PM


Comment author: Alexei 02 September 2010 12:57:35AM 0 points

is not to try and make the agent do something it doesn't want to - but rather to make it want to do it in the first place.

I don't think there is much difference between the two. Either way you are modifying the agent's behavior. If it doesn't want it, it won't have it.

The problem with off switches is that 1) they are not guaranteed to work (the AI could change its own code or prevent anyone from accessing/using the off switch), and 2) they are not guaranteed to work the way you want them to. Unless you have formally proven that the AI, and all the possible modifications it can make to itself, are safe, you can't know for sure.

Comment author: timtyler 02 September 2010 07:12:36AM 0 points

is not to try and make the agent do something it doesn't want to - but rather to make it want to do it in the first place.

I don't think there is much difference between the two. Either way you are modifying the agent's behavior. If it doesn't want it, it won't have it.

It is not a modification if you make it that way "in the first place", as specified - and "If it doesn't want it, it won't have it" seems contrary to the quoted bit, where you "make it want to do it in the first place".

The idea of off switches is not that they are guaranteed to work, but that they are a safety feature. If you can make a machine do anything you want at all, you can probably make it turn itself off. You can build it so the machine doesn't wish to stay turned on - but goes willingly into the night.

We will never "know for sure" that a machine intelligence is safe. This is the real world, not math land. We may be able to prove some things about it - for example, that its initial state is not vulnerable to input-stream buffer-overflow attacks - but we won't be able to prove that the machine will only do what we want it to do, for some value of "we".

At the moment, the self-improving systems we see are complex man-machine symbioses - companies and governments. You can't prove math theorems about such entities - they are just too messy. Machine intelligence seems likely to be like that for quite a while - functionally embedded in a human matrix. The question of "what would the machine do if no one could interfere with its code" is one for relatively late on - machines will already be very smart by then - smarter than most human computer programmers, anyway.

Comment author: LucasSloan 02 September 2010 07:17:18AM 2 points

If you can make a machine do anything you want at all, you can probably make it turn itself off.

The hardest part of Friendly AI is figuring out how to reliably instill any goal system.

Comment author: timtyler 02 September 2010 07:29:19AM -1 points

If you can't get it to do what you want at all, the machine is useless, and there would be no point in constructing it. In practice, we know we can get machines to do what we want to some extent - we have lots of examples of that. So, the idea is to make the machine not mind being turned off. Don't make it an open-ended maximiser - make it maximise only until time t - or until its stop button is pressed - whichever comes sooner.
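The "bounded maximiser" design described above - optimise only until time t or until the stop button is pressed, whichever comes sooner - can be sketched as a toy agent loop. This is a hypothetical illustration of the idea, not anything from the thread; the class and parameter names are invented for the example.

```python
import time

class BoundedMaximiser:
    """Toy agent that optimises only until a deadline or a stop signal,
    whichever comes first - rather than maximising open-endedly."""

    def __init__(self, deadline, utility_fn):
        self.deadline = deadline      # time t after which the agent halts
        self.utility_fn = utility_fn  # what the agent tries to maximise
        self.stop_pressed = False     # the off switch
        self.best = None

    def press_stop(self):
        self.stop_pressed = True

    def run(self, candidate_actions):
        # Halting is part of the agent's goal specification, not external
        # interference - so the agent does not "mind" being turned off.
        for action in candidate_actions:
            if self.stop_pressed or time.time() >= self.deadline:
                break
            if self.best is None or self.utility_fn(action) > self.utility_fn(self.best):
                self.best = action
        return self.best

agent = BoundedMaximiser(deadline=time.time() + 1.0,
                         utility_fn=lambda x: -abs(x - 7))
print(agent.run(range(100)))  # picks the action closest to 7
```

The point of the design is that the stop condition sits inside the agent's own decision loop, so pressing the button is consistent with, rather than opposed to, what the agent is built to do.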

Comment author: Alexei 02 September 2010 06:45:58PM 1 point

I don't think we really have a disagreement here. If you are building a normal program to do whatever, then by all means, do your best and try to implement safety features. Any failure would most likely be local.

However! If we are talking about building AI, which will go through many iterations, will modify its own code, and will become super-intelligent, then for all our sakes I hope you will have mathematically proven that the AI is Friendly. Otherwise you are betting the fate of this world on a hunch. If you don't agree with this point, I invite you to read Eliezer's paper on AI risks.

Comment author: timtyler 02 September 2010 07:59:39PM 0 points

"The AI is Friendly" seems to be a vague and poorly-defined concept - and even if you could pin it down, what makes you think it is something that could be proved in the first place?

Ethical agents should probably not hold off on creating machine intelligence for too long while chasing imagined rainbows - since intelligence could prevent the carnage on the roads, fix many diseases, and generally help humanity - and because delaying gives less ethically-conscious agents an opportunity to get there first, which could be bad.

See my "The risks of caution", or Max's critique of the precautionary principle, for more on that.

Comment author: Alexei 03 September 2010 08:53:09PM 0 points

In fact, there is nothing vague about the definition of "friendly". Eliezer has written a lot on that topic and I invite you to look at his writing, e.g. the link I gave you earlier.

I agree that if someone is going to launch a self-improving AI, then we will need to preempt them with our own AI if our AI has a greater probability of being friendly. It all comes down to the expected value of our choices.

Comment author: timtyler 03 September 2010 09:16:16PM 0 points

In fact, there is nothing vague about the definition of "friendly".

You really believe that?!? You have a pointer to some canonical definition?

Comment author: Alexei 04 September 2010 12:57:05AM 1 point

Ok, I might have been a bit overenthusiastic about how simple the "friendly" aspect is, but here is a good attempt at describing what we want.

Comment author: ata 04 September 2010 01:52:38AM 4 points

I'm sure Tim Tyler is familiar with CEV; I presume his objection is that CEV is not sufficiently clear or rigorous. Indeed, CEV is only semitechnical; I think the FAI research done by Eliezer and Marcello since CEV's publication has included work on formalizing it mathematically, but that's not available to the public.

Note also that defining the thing-we-want-an-AI-to-do is only half of the problem of Friendliness; the other half is solving the problems in decision theory that will allow us to prove that an AI's goal system and decision algorithms will cause it not to change its goal system. If we build an AGI that implements the foundation of CEV but fails to quine itself, then during recursive self-improvement its values may be lost before it stabilizes its goal system, and it will all be for naught.

Comment author: timtyler 04 September 2010 07:32:36AM 0 points

Probably the closest thing I have seen to a definition of "friendly" from E.Y. is:

"The term "Friendly AI" refers to the production of human-benefiting, non-human-harming actions in Artificial Intelligence systems that have advanced to the point of making real-world plans in pursuit of goals."

That appears to make Deep Blue "friendly". It hasn't harmed too many people so far - though maybe Kasparov's ego got a little bruised.

Another rather different attempt:

"I use the term "Friendly AI" to refer to this whole challenge. Creating a mind that doesn't kill people but does cure cancer ...which is a rather limited way of putting it. More generally, the problem of pulling a mind out of mind design space, such that afterwards that you are glad you did it."

  • here, 29 minutes in

...that one has some pretty obvious problems, as I describe here.

These are not operational definitions. For example, both rely on some unspecified definition of what a "person" is. That may be obvious today - but human nature will probably be putty in the hands of an intelligent machine - and it may well start wondering about the best way to gently transform a person into a non-person.