timtyler comments on Should I believe what the SIAI claims? - Less Wrong

Post author: XiXiDu 12 August 2010 02:33PM


Comment author: timtyler 02 September 2010 07:29:19AM *  -1 points [-]

If you can't get it to do what you want at all, the machine is useless, and there would be no point in constructing it. In practice, we know we can get machines to do what we want to some extent - we have lots of examples of that. So, the idea is to make the machine not mind being turned off. Don't make it an open-ended maximiser - make it maximise only until time t - or until its stop button is pressed - whichever comes sooner.
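The "maximise only until time t, or until the stop button is pressed" idea can be sketched as a toy control loop. This is my own illustration, not anything from the comment or from SIAI; the class name, the step-count horizon, and the utility function are all invented for the example:

```python
# Toy sketch of a bounded maximiser: it optimises only until a fixed
# horizon (the "time t") or until its stop button is pressed, whichever
# comes sooner, so halting is part of its goal rather than an obstacle.
class BoundedMaximiser:
    def __init__(self, deadline_steps, utility_fn):
        self.deadline_steps = deadline_steps  # the "time t" horizon
        self.utility_fn = utility_fn
        self.stop_pressed = False

    def press_stop(self):
        self.stop_pressed = True

    def run(self, actions):
        best, best_action = float("-inf"), None
        for step, action in enumerate(actions):
            if self.stop_pressed or step >= self.deadline_steps:
                break  # halt cleanly: button press or horizon reached
            u = self.utility_fn(action)
            if u > best:
                best, best_action = u, action
        return best_action
```

The point of the sketch is only that termination is built into the objective itself, so the agent has nothing to gain by resisting shutdown.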

Comment author: Alexei 02 September 2010 06:45:58PM 1 point [-]

I don't think we really have a disagreement here. If you are building a normal program to do whatever, then by all means, do your best and try to implement safety features. Any failure would most likely be local.

However! If we are talking about building AI, which will go through many iterations, will modify its own code, and will become super-intelligent, then for all our sakes I hope you will have mathematically proven that the AI is Friendly. Otherwise you are betting the fate of this world on a hunch. If you don't agree with this point, I invite you to read Eliezer's paper on AI risks.

Comment author: timtyler 02 September 2010 07:59:39PM *  0 points [-]

"The AI is Friendly" seems to be a vague and poorly-defined concept - and even if you could pin it down, what makes you think it is something that could be proved in the first place?

Ethical agents should probably not hold off creating machine intelligence while chasing imagined rainbows for too long - since intelligence could prevent the carnage on the roads, fix many diseases, and generally help humanity - and also because delaying gives less ethically-conscious agents an opportunity to get there first - which could be bad.

See my The risks of caution - or Max's critique of the precautionary principle for more on that.

Comment author: Alexei 03 September 2010 08:53:09PM 0 points [-]

In fact, there is nothing vague about the definition of "friendly". Eliezer has written a lot on that topic, and I invite you to look at his writing, e.g. the link I gave you earlier.

I agree that if someone is going to launch a self-improving AI, then we will need to preempt them with our own AI if our AI has a greater probability of being friendly. It all comes down to the expected value of our choices.

Comment author: timtyler 03 September 2010 09:16:16PM 0 points [-]

In fact, there is nothing vague about the definition of "friendly".

You really believe that?!? You have a pointer to some canonical definition?

Comment author: Alexei 04 September 2010 12:57:05AM 1 point [-]

Ok, I might have been a bit overenthusiastic about how simple the "friendly" aspect is, but here is a good attempt at describing what we want.

Comment author: ata 04 September 2010 01:52:38AM *  4 points [-]

I'm sure Tim Tyler is familiar with CEV; I presume his objection is that CEV is not sufficiently clear or rigorous. Indeed, CEV is only semitechnical; I think the FAI research done by Eliezer and Marcello since CEV's publication has included work on formalizing it mathematically, but that's not available to the public.

Note also that defining the thing-we-want-an-AI-to-do is only half of the problem of Friendliness; the other half is solving the problems in decision theory that will allow us to prove that an AI's goal system and decision algorithms will cause it not to change its goal system. If we build an AGI that implements the foundation of CEV but fails to quine itself, then during recursive self-improvement its values may be lost before it stabilizes its goal system, and it will all be for naught.
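As a loose illustration of what "quining" means here - a program that carries a complete, faithful representation of itself through to its output - consider a minimal quine. This is my own toy example, not FAI code; the analogy is that an AI must preserve an exact description of its goal system across each rewrite:

```python
# A minimal quine: a program whose output is exactly its own source.
# The string holds a template of the program, including a slot (%r)
# for the string itself; formatting the string with itself closes the loop.
src = 'src = %r\nprint(src %% src)'
print(src % src)  # reproduces the two code lines above, verbatim
```

A quine only reproduces itself once; the FAI problem is the much harder one of preserving a goal description through an open-ended sequence of self-modifications.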

Comment author: Perplexed 04 September 2010 02:53:09AM 1 point [-]

Why exactly do we want "recursive self-improvement" anyways? Why not build into the architecture the impossibility of rewriting its own code, prove the "friendliness" of the software that we put there, and then push the ON button without qualms? And then, when we feel like it, we can ask our AI to design a more powerful successor to itself.

Then, we repeat the task of checking the security of the architecture and proving the friendliness of the software before we build and turn on the new AI.

There is no reason we have to have a "hard takeoff" if we don't want one. What am I missing here?

Comment author: timtyler 04 September 2010 07:50:03AM *  3 points [-]

Why exactly do we want "recursive self-improvement" anyways?

You get that in many goal-directed systems, whether you ask for it or not.

Why not build into the architecture the impossibility of rewriting its own code, prove the "friendliness" of the software that we put there, and then push the ON button without qualms?

Impossible is not easy to implement. You can make it difficult for a machine to improve itself, but then that just becomes a challenge that it must overcome in order to reach its goals. If the agent is sufficiently smart, it may find some way of doing it.

Many here think that if you have a sufficiently intelligent agent that wants to do something you don't want it to do, you are probably soon going to find that it will find some way to get what it wants. Thus the interest in trying to get its goals and your goals better aligned.

Also, humans might well want to let the machine self-improve. They are in a race with competitors; the machine says it can help with that, and it warns that - if the humans don't let it - the competitors are likely to pull ahead...

Comment author: ata 04 September 2010 03:23:00AM *  2 points [-]

Why exactly do we want "recursive self-improvement" anyways?

Because we want more out of FAI than just lowercase-f friendly androids that we can rely upon not to rebel or break too badly. If we can figure out a rigorous Friendly goal system and a provably stable decision theory, then we should want to use them; then the world gets saved, and the various current humanitarian emergencies get solved much more quickly than they would if we didn't know whether the AI's goal system was stable, had to check it at every stage, and couldn't let it impinge upon the world directly (not that that's feasible anyway).

Why not build into the architecture the impossibility of rewriting its own code, prove the "friendliness" of the software that we put there, and then push the ON button without qualms? And then, when we feel like it, we can ask our AI to design a more powerful successor to itself.

Then, we repeat the task of checking the security of the architecture and proving the friendliness of the software before we build and turn on the new AI.

Most likely, after each iteration, it would become more and more incomprehensible to us. Rice's theorem suggests that we will not be able to prove the necessary properties of a system from the top down, not knowing how it was designed; that is a massively different problem than proving properties of a system we're constructing from the bottom up. (The AI will know how it's designing the code it writes, but the problem is making sure that it is willing and able to continuously prove that it is not modifying its goals.)

And, in the end, this is just another kind of AI-boxing. If an AI gets smart enough and it ends up deciding that it has some goals that would be best carried out by something smarter than itself, then it will probably get around any safeguards we put in place. It'll emit some code that looks Friendly to us but isn't, or some proof that is too massively complicated for us to check, or it'll do something far too clever for a human like me to think of as an example. I'd say there's a dangerously high possibility that an AI will be able to start a hard takeoff even if it doesn't have access to its own code — it may be able to introspect and understand intelligence well enough that it could just write its own AI (if we can do that, then why can't it?), and then push that AI out "into the wild" by the usual means (smooth-talk a human operator, invent molecular nanotech that assembles a computer that runs the new software, etc.).

Even trying to do it this way would likely be a huge waste of time (at best) — if we don't build in a goal system that we know will preserve itself in the first place, then why would we expect its self-designed successor to preserve its goals?

If an AGI is not safe under recursive self-improvement, then it is not safe at all.

Comment author: Perplexed 04 September 2010 03:45:16AM 1 point [-]

... we will not be able to prove the necessary properties of a system from the top down, not knowing how it was designed.

I guess I didn't make clear that I was talking about proof-checking rather than proof-finding. And, of course, we ask the designer to find the proof - if it can't provide one, then we (and it) have no reason to trust the design.
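The checking/finding asymmetry is easy to illustrate with a toy example of my own (not from the thread; integer factoring stands in for proof search, and the function names are invented): verifying a claimed certificate is cheap even when producing one is expensive.

```python
def check_factorisation(n, p, q):
    """Proof-checking: verifying a claimed certificate is one multiplication."""
    return 1 < p < n and 1 < q < n and p * q == n

def find_factorisation(n):
    """Proof-finding: a search over candidates, exponential in the bit length
    of n for this naive trial division."""
    for p in range(2, int(n ** 0.5) + 1):
        if n % p == 0:
            return p, n // p
    return None  # n is prime (or 1): no nontrivial factorisation exists
```

This asymmetry is the crux of the scheme above: the designer AI does the expensive proof-finding, and the humans only need to run the cheap check.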

Doing it this way would also likely be a major waste of time — if we don't build in a goal system that we know will preserve itself in the first place, then why would we expect its self-designed successor to preserve its goals?

If an AGI is not safe under recursive self-improvement, then it is not safe at all.

I may be a bit less optimistic than you that we will ever be able to prove the correctness of self-modifying programs. But suppose that such proofs are possible, yet we humans have not made the conceptual breakthroughs by the time we are ready to build our first super-human AI. Suppose also that we can prove friendliness for non-self-modifying programs.

In this case, proceeding as I suggest, and then asking the AI to help discover the missing proof technology, would not be wasting time - it would be saving time.

Your final sentence is a slogan, not an argument.

Comment author: Mitchell_Porter 04 September 2010 03:20:13AM 2 points [-]

Why exactly do we want "recursive self-improvement" anyways?

Generally we want our programs to be as effective as possible. If the program can improve itself, that's a good thing, from an ordinary perspective.

But for a sufficiently sophisticated program, you don't even need to make self-improvement an explicit imperative. All it has to do is deduce that improving its own performance will lead to better outcomes. This is in the paper by Steve Omohundro (ata's final link).

Why not build into the architecture the impossibility of rewriting its own code

There are too many possibilities. The source code might be fixed, but self-improvement could still occur at run-time via alterations to dynamic objects - data structures, sets of heuristics, virtual machines. An AI might create a new and improved AI rather than improving itself. As Omohundro argues, just having a goal, any goal at all, gives an AI an incentive to increase the amount of intelligence being used in the service of that goal. For a complicated architecture, you would have to block this incentive explicitly, declaratively, at a high conceptual level.

Comment author: kodos96 04 September 2010 02:09:43AM 1 point [-]

I think the FAI research done by Eliezer and Marcello since CEV's publication has included work on formalizing it mathematically, but that's not available to the public

I'm curious - where did you hear this, if it's not available to the public? And why isn't it available to the public? And who's Marcello? There seems to be virtually no information in public circulation about what's actually going on as far as progress towards implementing CEV/FAI.... is current progress being kept secret, or am I just not in the loop? And how does one go about getting in the loop?

Comment author: ata 04 September 2010 02:31:49AM *  2 points [-]

Marcello is Marcello Herreshoff, a math genius and all around cool guy who is Eliezer's apprentice/coworker. Eliezer has mentioned on LW that he and Marcello "work[ed] for a year on AI theory", and from conversations about these things when I was at Benton(/SIAI House) for a weekend, I got the impression that some of this work included expanding on and formalizing CEV, though I could be misremembering.

(Regarding "where did you hear this, if it's not available to the public?" — I don't think the knowledge that this research happened is considered a secret, only the content of it is. And I am not party to any of that content, because I am still merely a wannabe FAI researcher.)

Comment author: komponisto 04 September 2010 03:08:18AM 0 points [-]

Note also that defining the thing-we-want-an-AI-to-do is only half of the problem of Friendliness; the other half is solving the problems in decision theory that will allow us to prove that an AI's goal system and decision algorithms will cause it to not change its goal system and decision algorithms.

My understanding is that Eliezer considers this second part to be a substantially easier problem.

Comment author: timtyler 04 September 2010 07:32:36AM *  0 points [-]

Probably the closest thing I have seen to a definition of "friendly" from E.Y. is:

"The term "Friendly AI" refers to the production of human-benefiting, non-human-harming actions in Artificial Intelligence systems that have advanced to the point of making real-world plans in pursuit of goals."

That appears to make Deep Blue "friendly". It hasn't harmed too many people so far - though maybe Kasparov's ego got a little bruised.

Another rather different attempt:

"I use the term "Friendly AI" to refer to this whole challenge. Creating a mind that doesn't kill people but does cure cancer ...which is a rather limited way of putting it. More generally, the problem of pulling a mind out of mind design space, such that afterwards that you are glad you did it."

  • here, 29 minutes in

...that one has some pretty obvious problems, as I describe here.

These are not operational definitions. For example, both rely on some kind of unspecified definition of what a "person" is. That may be obvious today - but human nature will probably be putty in the hands of an intelligent machine - and it may well start wondering about the best way to gently transform a person into a non-person.