wedrifid comments on Should I believe what the SIAI claims? - Less Wrong

23 Post author: XiXiDu 12 August 2010 02:33PM


Comment author: wedrifid 14 August 2010 09:39:39AM *  5 points [-]

The idea is to prevent a "runaway" disaster.

My observation is that small ambitions can become 'runaway disasters' unless a lot of the problems of FAI are solved.

Relatively standard and conventional engineering safety methodologies would be used for other kinds of problems.

That sounds as 'safe' as giving Harry Potter rules to follow.

I understand that this is an area in which we fundamentally disagree. I have previously disagreed about the wisdom of using human legal systems to control AI behaviour and I assume that our disagreement will be similar on this subject.

Comment author: timtyler 14 August 2010 09:50:17AM *  1 point [-]

"Small ambitions" are a proposed solution. Get the machine to want something - and then stop when it's desires are satisfied - or at a specified date, whichever comes first.

The solution has some complications - but it does look as though it is a pretty obvious safety measure - one that suitably paranoid individuals are likely to have near the top of their lists.

It doesn't make a runaway disaster impossible. The agent could still set up minions, "forget" to switch them off - and then they run amok. The point is to make a runaway disaster much less likely. The safety level is pretty configurable - if the machine's desires are sufficiently constrained. I went into a lot of these issues on:

http://alife.co.uk/essays/stopping_superintelligence/

See also the previous discussion of the issue on this site.

Shane Legg has also gone into methods of restraining a machine "from within" - so to speak. Logically, you could limit space, time or material resources in this way - if you have control over an agent's utility function.
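A minimal sketch of that kind of "limits from within" utility function (hypothetical names, illustrative only - not taken from Legg's work or the essay above): utility is capped once the goal is satisfied and drops to zero after a deadline, so further action past time t buys the agent nothing.

```python
def bounded_utility(goal_progress: float, t: float, deadline: float) -> float:
    """Utility for a 'small ambitions' agent: capped at 1.0 once the goal
    is met, and zero after the deadline, so the agent gains nothing by
    overshooting its goal or by acting past time t."""
    if t >= deadline:
        return 0.0  # past the deadline, nothing further is wanted
    return min(goal_progress, 1.0)  # satisfied desires cap out
```

Under this (toy) utility, an agent that has fully achieved its goal, or whose deadline has passed, has no incentive to keep acquiring resources.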

Comment author: Alexei 01 September 2010 04:24:21PM 1 point [-]

This is very dangerous thinking. There are many potential holes not covered in your essay. The problem with all these holes is that even the smallest one can potentially lead to the end of the universe. As Eliezer often mentions: the AI has to be mathematically rigorously proven to be friendly; there can't be any room for guessing or hoping.

As an example, consider that to the AI, moving to a quiescent state will be akin to dying. (Consider somebody wanting to make you not want anything, or forcing you to want something that you normally don't.) I hope you won't reply with a "but we can do X", because that would be just another patch, and patches are exactly what we want to avoid. There is no getting around creating a solid, proven mathematical definition of Friendly.

Comment author: timtyler 01 September 2010 06:59:32PM *  1 point [-]

The end of the universe - OMG!

It seems reasonable to expect that agents will welcome their end if their time has come.

The idea, as usual, is not to try and make the agent do something it doesn't want to - but rather to make it want to do it in the first place.

I expect off switches - and the like - will be among the safety techniques employed. Provable correctness might be among them as well - but judging by the history of such techniques it seems rather optimistic to expect very much from them.

Comment author: Perplexed 01 September 2010 07:36:07PM 2 points [-]

I am fairly confident that we can tweak any correct program into a form which allows a mathematical proof that the program behavior meets some formal specification of "Friendly".

I am less confident that we will be able to convince ourselves that the formal specification of "Friendly" that we employ is really something that we want.

We can prove there are no bugs in the program, but we can't prove there are no bugs in the program specification: a "proof" of the specification would require that all of the stakeholders actually look at that specification of "Friendly", think about it, and then bet their lives on the assertion that this is indeed what they want.

What is a "stakeholder", you ask? Well, what I really mean is pitchfork-holder. Stakes are from a different movie.

Comment author: Alexei 02 September 2010 12:57:35AM 0 points [-]

is not to try and make the agent do something it doesn't want to - but rather to make it want to do it in the first place.

I don't think there is much difference between the two. Either way you are modifying the agent's behavior. If it doesn't want it, it won't have it.

The problem with off switches is that 1) they might not be guaranteed to work (the AI could change its own code or prevent anyone from accessing the off switch), and 2) they might not be guaranteed to work the way you want them to. Unless you have formally proven that the AI and all the possible modifications it can make to itself are safe, you can't know for sure.

Comment author: timtyler 02 September 2010 07:12:36AM *  0 points [-]

is not to try and make the agent do something it doesn't want to - but rather to make it want to do it in the first place.

I don't think there is much difference between the two. Either way you are modifying the agent's behavior. If it doesn't want it, it won't have it.

It is not a modification if you make it that way "in the first place", as specified - and "if it doesn't want it, it won't have it" seems contrary to the bit where you "make it want to do it in the first place".

The idea of off switches is not that they are guaranteed to work, but that they are a safety feature. If you can make a machine do anything you want at all, you can probably make it turn itself off. You can build it so the machine doesn't wish to stay turned on - but goes willingly into the night.

We will never "know for sure" that a machine intelligence is safe. This is the real world, not math land. We may be able to prove some things about it - such that its initial state is not vulnerable to input stream buffer-overflow attacks - but we won't be able to prove something like that the machine will only do what we want it to do, for some value of "we".

At the moment, the self-improving systems we see are complex man-machine symbioses - companies and governments. You can't prove math theorems about such entities - they are just too messy. Machine intelligence seems likely to be like that for quite a while - functionally embedded in a human matrix. The question of "what would the machine do if no one could interfere with its code" is one for relatively late on - machines will already be very smart by then - smarter than most human computer programmers, anyway.

Comment author: LucasSloan 02 September 2010 07:17:18AM 2 points [-]

If you can make a machine do anything you want at all, you can probably make it turn itself off.

The hardest part of Friendly AI is figuring out how to reliably instill any goal system.

Comment author: timtyler 02 September 2010 07:29:19AM *  -1 points [-]

If you can't get it to do what you want at all, the machine is useless, and there would be no point in constructing it. In practice, we know we can get machines to do what we want to some extent - we have lots of examples of that. So, the idea is to make the machine not mind being turned off. Don't make it an open-ended maximiser - make it maximise only until time t - or until its stop button is pressed - whichever comes sooner.
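The "until time t or until its stop button is pressed - whichever comes sooner" condition can be sketched as a simple control loop (hypothetical names, purely illustrative): the stopping rule is just a conjunction in the loop guard.

```python
def run_until(step, stop_pressed, max_steps):
    """Run the agent's step function only until max_steps have elapsed or
    the stop button is pressed - whichever comes sooner. Returns the
    number of steps actually taken."""
    steps = 0
    while steps < max_steps and not stop_pressed():
        step()
        steps += 1
    return steps
```

The safety argument in the comment above is that the bound lives in what the agent is built to want, not in an external restraint - in this sketch, the agent's own top-level loop terminates, rather than being killed from outside.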

Comment author: Alexei 02 September 2010 06:45:58PM 1 point [-]

I don't think we really have a disagreement here. If you are building a normal program to do whatever, then by all means, do your best and try to implement safety features. Any failure would most likely be local.

However! If we are talking about building AI, which will go through many iterations, will modify its own code, and will become super-intelligent, then for all our sakes I hope you will have mathematically proven that the AI is Friendly. Otherwise you are betting the fate of this world on a hunch. If you don't agree with this point, I invite you to read Eliezer's paper on AI risks.

Comment author: timtyler 02 September 2010 07:59:39PM *  0 points [-]

"The AI is Friendly" seems to be a vague and poorly-defined concept - and even if you could pin it down, what makes you think it is something that could be proved in the first place?

Ethical agents should probably not hold off creating machine intelligence while chasing imagined rainbows for too long - since intelligence could prevent the carnage on the roads, fix many diseases, and generally help humanity - and also because delaying gives less ethically-conscious agents an opportunity to get there first - which could be bad.

See my "The risks of caution" - or Max's critique of the precautionary principle - for more on that.