timtyler comments on Should I believe what the SIAI claims? - Less Wrong
It is not a modification if you make it that way "in the first place", as specified - and "If it doesn't want it, it won't have it" seems contrary to the specified bit where you "make it want to do it in the first place".
The idea of off switches is not that they are guaranteed to work, but that they are a safety feature. If you can make a machine do anything you want at all, you can probably make it turn itself off. You can build it so the machine doesn't wish to stay turned on - but goes willingly into the night.
We will never "know for sure" that a machine intelligence is safe. This is the real world, not math land. We may be able to prove some things about it - such as that its initial state is not vulnerable to input-stream buffer-overflow attacks - but we won't be able to prove something like "the machine will only do what we want it to do", for some value of "we".
At the moment, the self-improving systems we see are complex man-machine symbioses - companies and governments. You can't prove math theorems about such entities - they are just too messy. Machine intelligence seems likely to be like that for quite a while - functionally embedded in a human matrix. The question of "what would the machine do if no one could interfere with its code" is one for relatively late in the game - machines will already be very smart by then - smarter than most human computer programmers, anyway.
The hardest part of Friendly AI is figuring out how to reliably instill any goal system.
If you can't get it to do what you want at all, the machine is useless, and there would be no point in constructing it. In practice, we know we can get machines to do what we want to some extent - we have lots of examples of that. So, the idea is to make the machine not mind being turned off. Don't make it an open-ended maximiser - make it maximise only until time t - or until its stop button is pressed - whichever comes sooner.
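A minimal sketch of that shape, in Python (the names `utility`, `candidate_actions`, and `deadline_t` are hypothetical stand-ins for illustration, not anyone's actual design):

```python
import time

class BoundedMaximiser:
    """Toy agent that optimises only until a deadline or a stop signal.

    Purely illustrative: 'utility' and 'candidate_actions' stand in for
    whatever the real system would actually be optimising over.
    """

    def __init__(self, utility, candidate_actions, deadline_t):
        self.utility = utility
        self.candidate_actions = candidate_actions
        self.deadline_t = deadline_t      # the time t in the comment above
        self.stop_pressed = False         # the stop button

    def press_stop(self):
        self.stop_pressed = True

    def run(self):
        # Maximise only until time t, or until the stop button is
        # pressed - whichever comes sooner.
        while time.time() < self.deadline_t and not self.stop_pressed:
            yield max(self.candidate_actions, key=self.utility)
        # Past the horizon, staying switched on gains the agent nothing,
        # so shutting down costs it no expected utility.
```

The point of the bounded horizon is that the agent never has an open-ended incentive to resist being turned off - of course, whether such incentives really stay absent under self-modification is exactly what is in dispute here.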
I don't think we really have a disagreement here. If you are building a normal program to do whatever, then by all means, do your best and try to implement safety features. Any failure would most likely be local.
However! If we are talking about building AI, which will go through many iterations, will modify its own code, and will become super-intelligent, then for all our sakes I hope you will have mathematically proven that the AI is Friendly. Otherwise you are betting the fate of this world on a hunch. If you don't agree with this point, I invite you to read Eliezer's paper on AI risks.
"The AI is Friendly" seems to be a vague and poorly-defined concept - and even if you could pin it down, what makes you think it is something that could be proved in the first place?
Ethical agents should probably not hold off creating machine intelligence while chasing imagined rainbows for too long - since intelligence could prevent the carnage on the roads, fix many diseases, and generally help humanity - and also because delaying gives less ethically-conscious agents an opportunity to get there first - which could be bad.
See my The risks of caution - or Max's critique of the precautionary principle for more on that.
In fact, there is nothing vague about the definition of "friendly". Eliezer wrote a lot on that topic and I invite you to look at his writing, e.g. the link I gave you earlier.
I agree that if someone is going to launch a self-improving AI, then we will need to preempt them with our own AI if our AI has a greater probability of being friendly. It all comes down to the expected value of our choices.
You really believe that?!? You have a pointer to some canonical definition?
Ok, I might have been a bit overenthusiastic about how simple the "friendly" aspect is, but here is a good attempt at describing what we want.
I'm sure Tim Tyler is familiar with CEV; I presume his objection is that CEV is not sufficiently clear or rigorous. Indeed, CEV is only semitechnical; I think the FAI research done by Eliezer and Marcello since CEV's publication has included work on formalizing it mathematically, but that's not available to the public.
Note also that defining the thing-we-want-an-AI-to-do is only half of the problem of Friendliness; the other half is solving the problems in decision theory that will allow us to prove that an AI's goal system and decision algorithms will cause it not to change its goal system. If we build an AGI that implements the foundation of CEV but fails to quine itself, then during recursive self-improvement its values may be lost before it stabilizes its goal system, and it will all be for naught.
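To make that second half concrete, here is a toy sketch (all names hypothetical; a hash comparison stands in for the actual proof obligation, which is the hard open problem) of the invariant a self-rewriting agent would have to preserve:

```python
import hashlib

def goal_fingerprint(goal_spec: bytes) -> str:
    """Hash of the goal system's representation - a crude stand-in for
    a real proof that the goals are unchanged."""
    return hashlib.sha256(goal_spec).hexdigest()

def self_improve(current_code: bytes, goal_spec: bytes, propose_successor):
    """Accept a rewrite only if it keeps the same goals.

    Here 'keeps the same goals' is just a fingerprint comparison; the
    hard part alluded to in the comment above is making this an actual
    proof that survives the successor's own future rewrites.
    """
    candidate_code, candidate_goals = propose_successor(current_code, goal_spec)
    if goal_fingerprint(candidate_goals) != goal_fingerprint(goal_spec):
        return current_code, goal_spec   # reject: values would drift
    return candidate_code, candidate_goals
```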
Why exactly do we want "recursive self-improvement" anyways? Why not build into the architecture the impossibility of rewriting its own code, prove the "friendliness" of the software that we put there, and then push the ON button without qualms. And then, when we feel like it, we can ask our AI to design a more powerful successor to itself.
Then, we repeat the task of checking the security of the architecture and proving the friendliness of the software before we build and turn on the new AI.
There is no reason we have to have a "hard takeoff" if we don't want one. What am I missing here?
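As a hedged sketch of the loop being proposed (the callables are placeholders for the actual engineering and verification work - each review step is the hard, unsolved part, not a library call):

```python
def staged_development(design, build_and_run, human_review):
    """Human-gated alternative to a hard takeoff: no generation rewrites
    its own code; we check each successor before switching it on.

    'build_and_run' and 'human_review' are hypothetical stand-ins for
    the real (very hard) construction and verification steps.
    """
    while True:
        # Repeat the architecture and friendliness checks for every
        # generation before it ever runs.
        if not human_review(design):
            raise RuntimeError("friendliness/architecture review failed")
        ai = build_and_run(design)        # push the ON button without qualms
        design = ai.propose_successor()   # ask it to design its successor
        # The successor never runs until it passes review, so each step
        # of the takeoff is gated by humans rather than by the AI.
```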
I'm curious - where did you hear this, if it's not available to the public? And why isn't it available to the public? And who's Marcello? There seems to be virtually no information in public circulation about what's actually going on as far as progress towards implementing CEV/FAI.... is current progress being kept secret, or am I just not in the loop? And how does one go about getting in the loop?
My understanding is that Eliezer considers this second part to be a substantially easier problem.
Probably the closest thing I have seen to a definition of "friendly" from E.Y. is:
"The term "Friendly AI" refers to the production of human-benefiting, non-human-harming actions in Artificial Intelligence systems that have advanced to the point of making real-world plans in pursuit of goals."
That appears to make Deep Blue "friendly". It hasn't harmed too many people so far - though maybe Kasparov's ego got a little bruised.
Another rather different attempt:
"I use the term "Friendly AI" to refer to this whole challenge. Creating a mind that doesn't kill people but does cure cancer ...which is a rather limited way of putting it. More generally, the problem of pulling a mind out of mind design space, such that afterwards that you are glad you did it."
...that one has some pretty obvious problems, as I describe here.
These are not operational definitions. For example, both rely on some kind of unspecified definition of what a "person" is. That may be obvious today - but human nature will probably be putty in the hands of an intelligent machine - and it may well start wondering about the best way to gently transform a person into a non-person.