Alicorn comments on Bay area LW meet-up - Less Wrong

6 Post author: AnnaSalamon 06 November 2009 07:49AM


Comment author: SforSingularity 08 November 2009 10:39:12PM *  1 point [-]

For example, having a goal of not going outside its box.

It would be nice if you could tell an AI not to affect anything outside its box.

10 points will be awarded to the first person who spots why "don't affect anything outside your box" is problematic.

Comment author: Alicorn 08 November 2009 10:40:22PM 4 points [-]

Such an AI wouldn't be able to interact with us, even verbally.

Comment author: SforSingularity 08 November 2009 11:52:50PM 1 point [-]

Alicorn, I hereby award you 10 points. These are redeemable after the singularity for kudos, catgirls and other cool stuff.

Comment author: Jack 09 November 2009 12:02:05AM 3 points [-]

The word "catgirl" leaves me vaguely nauseous. I think this has something to do with the uncanny valley. Apologies for the tangent.

Comment author: nazgulnarsil 09 November 2009 09:39:05AM 2 points [-]

You know what they say: one man's nausea is another man's arousal.

Note: considered unprovable before the internet.

Comment author: Eliezer_Yudkowsky 09 November 2009 01:54:09AM 0 points [-]

...that's not even the real problem...

Comment author: Alicorn 09 November 2009 02:04:21AM 2 points [-]

You can have my points if you tell us what is. I didn't want a catgirl anyway.

Comment author: kpreid 09 November 2009 02:13:39AM 5 points [-]

I shall attempt to guess the teacher's password: “the real problem” is that the AI can't perform any computation, because that would consume energy and emit heat. Oh, and gravity and electromagnetic waves, and so on. In fact, not computing is equally well a choice which affects things outside the box.
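kpreid's point about heat can even be made quantitative with a back-of-envelope sketch (the numbers and function below are illustrative additions, not from the original thread): Landauer's principle lower-bounds the heat dissipated by irreversible computation at kT ln 2 joules per bit erased, so a boxed AI that computes at all necessarily leaks energy into the outside world.

```python
import math

# Exact value by SI definition (since 2019), in joules per kelvin.
BOLTZMANN = 1.380649e-23

def landauer_limit_joules(bits_erased: float, temp_kelvin: float = 300.0) -> float:
    """Minimum heat (joules) dissipated by irreversibly erasing
    `bits_erased` bits of information at temperature `temp_kelvin`."""
    return bits_erased * BOLTZMANN * temp_kelvin * math.log(2)
```

Erasing 10^15 bits at room temperature dissipates at least roughly 3 microjoules: utterly negligible as waste heat, yet strictly nonzero, which is all the argument needs.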

Comment author: Eliezer_Yudkowsky 09 November 2009 02:22:34AM *  5 points [-]

Yup. To be exact, "affect" is a magical category relative to the irrevocably entangled actual universe.

Edit: Looks like I actually wrote this one up, in Dreams of Friendliness.

Comment author: RobinZ 09 November 2009 02:04:35AM 0 points [-]

It's the first horn of the dilemma. The second horn is the AI Box failure mode.

Comment author: John_Maxwell_IV 08 November 2009 11:00:25PM -1 points [-]

What if you told the AI to only affect the world through verbal communication with humans?

Comment author: Bo102010 08 November 2009 11:20:11PM 0 points [-]

Then it would be able to convince you to let it out.

Comment author: John_Maxwell_IV 09 November 2009 07:18:34AM *  -1 points [-]

So it could affect the world through more verbal communication with humans. That's not as bad as things could be. Edit: The AI has as its First Law that the conducting metal in its circuitry must not touch other conducting metal that was not present when it was built. The conducting metal present when it's built will be its CPU, RAM, etc., and a terminal. The terminal will be engineered to have a reasonably slow rate of character display for safety purposes (so no humans will be reprogrammed while basking in the hypnotic glow of characters rushing past).

You know how sometimes someone will give you what seems like an impossible-to-solve hypothetical, and yet somehow there's a solution? I'm pretty sure that the space of situations that are simply impossible to solve is much bigger than the space of situations that seem impossible but actually are possible.

And constructing one of these seems (to me) a much better bet than hoping that a complex computer program will be bug-free on its first trial run.

Comment author: Eliezer_Yudkowsky 09 November 2009 03:24:50PM 4 points [-]

This is a good example of two idioms: First, what Bruce Schneier called "fence-post security". That's where you build a very tall fencepost in the middle of the desert. People don't climb the fencepost, they just walk around it.

Second, the idiom that would-be FAI solvers go into when they see a problem, and try to fix it by brute force, which manifests as "hard-wiring into the very circuitry" or "giving it as its ultimate priority" that X. X varies, but with a fairly constant idiom of getting an emotional charge, a sense of having delivered a very strong command or created something very powerful, by talking about how strongly the goal is to be enforced.

Comment author: John_Maxwell_IV 10 November 2009 05:20:56AM *  1 point [-]

First, what Bruce Schneier called "fence-post security". That's where you build a very tall fencepost in the middle of the desert. People don't climb the fencepost, they just walk around it.

I don't see the analogue. Assuming we could get all the physical stuff right, so that the AI had no real hope or desire of affecting its environment substantially aside from changing the letters on a terminal screen, I think this would be a considerable barrier to destruction of the human race. At the very least it makes sense to have this safeguard in addition to trying to make the AI friendly. It's not like you can only do one or the other.

Second, the idiom that would-be FAI solvers go into when they see a problem, and try to fix it by brute force, which manifests as "hard-wiring into the very circuitry" or "giving it as its ultimate priority" that X. X varies, but with a fairly constant idiom of getting an emotional charge, a sense of having delivered a very strong command or created something very powerful, by talking about how strongly the goal is to be enforced.

Isn't that what you do with FAI in general by saying that the AI will have as its ultimate priority to implement CEV? Anyway, you are accusing me of having the same aesthetic as people you disagree with without actually attacking my claims, which is a fallacious argument.

Comment author: Eliezer_Yudkowsky 10 November 2009 02:00:34PM *  1 point [-]

Isn't that what you do with FAI in general by saying that the AI will have as its ultimate priority to implement CEV?

Nope, I don't go around talking about it being the super top ultimate indefeatable priority. I just say, "Here's a proposed decision criterion if I can (a) manage to translate it out of English and (b) code it stably." It's not the "ultimate priority". It's just a proposed what-it-does of the AI. The problem is not getting the AI to listen. The problem is translating something like CEV into nonmagical language.

Anyway, you are accusing me of having the same aesthetic as people you disagree with without actually attacking my claims, which is a fallacious argument.

Okay. Read the Dreams of Friendliness series - it's not sequenced up yet, but hopefully you should be able to start there and click back on the dependency chain.

Comment author: RobinZ 09 November 2009 02:45:51PM *  0 points [-]

Two things:

  1. As Bo102010 said, the AI could convince you to remove the safeguards. And we're not talking Snow Crash reprogram-the-user stuff, we're talking plain old convincing.

  2. If you let the robot affect the environment through heat, electromagnetic waves, variable load on the power circuits, gravity, etc (which it will, if it's on), it has a back door into reality. And you don't know if that's a fatal one.

Edit: Some other Eliezer remarks on the subject: Shut up and do the impossible! (which uses the AI Box as an example, and discusses the third through fifth experiments), Dreams of Friendliness (which Eliezer cited in the other thread), and That Alien Message (which is tangentially related, but relevant).

Comment author: John_Maxwell_IV 10 November 2009 05:15:42AM 0 points [-]

If you let the robot affect the environment through heat, electromagnetic waves, variable load on the power circuits, gravity, etc (which it will, if it's on), it has a back door into reality. And you don't know if that's a fatal one.

But I think with enough thought, you could design things so that its backdoors would probably not be fatal. Contrast this with the fact that a complex computer program will probably behave incorrectly the first time you run it.

Comment author: loqi 10 November 2009 06:58:43AM 2 points [-]

I don't think the plan is to hack CEV together in lisp and see what happens. Writing provably correct software is possible today, it's just extremely time-consuming. Contrast this with our incomplete knowledge of physics, and lack of criteria for what constitutes "good enough" physical security.

A bad hardware call seems far more likely to me than a fatal programming slip-up in a heavily verified software system. For software, we have axiomatic systems. There is no comparable method for assessing physical security, nothing but our best guess at what's possible in a universe we don't understand. So I'd much rather stake the future of humanity on the consistency of ZFC or some such than on the efficacy of some "First Law of Not Crossing the Beams" scheme, even if the latter is just so darn clever that no human has yet thought up a counter-example.
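loqi's claim that provably correct software exists today can be illustrated in miniature. The toy sketch below (a hypothetical stand-in added here, using exhaustive checking in plain Python rather than a real proof assistant such as Coq or Isabelle) verifies a specification over a function's entire finite input domain, the brute-force analogue of a machine-checked proof:

```python
# Toy analogue of formal verification: instead of spot-checking a few
# inputs, check the specification over the *entire* (finite) input
# domain. Real verification efforts (e.g. hardware verification at
# Intel/AMD) use model checkers and theorem provers to get the same
# exhaustiveness over infinite domains.

def saturating_add(a: int, b: int, cap: int = 255) -> int:
    """Add two bytes, clamping the result at `cap`."""
    return min(a + b, cap)

def verify() -> bool:
    """Check the full spec for all 256 * 256 byte-pair inputs:
    correct sum below the cap, clamped at the cap, and commutative."""
    for a in range(256):
        for b in range(256):
            r = saturating_add(a, b)
            if a + b <= 255 and r != a + b:
                return False
            if a + b > 255 and r != 255:
                return False
            if r != saturating_add(b, a):
                return False
    return True
```

The gap loqi points at is that nothing like `verify()` exists for physical security: there is no enumerable domain of "ways an AI might affect the world outside its box" to check a safeguard against.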

Comment author: John_Maxwell_IV 10 November 2009 07:20:53AM 0 points [-]

Writing provably correct software is possible today, it's just extremely time-consuming.

Can you name a documented nontrivial program with >1000 lines of code that ran correctly on the first try? What data exists on long computer programs that were not tested early on in their development?

A bad hardware call seems far more likely to me than a fatal programming slip-up in a heavily verified software system. For software, we have axiomatic systems. There is no comparable method for assessing physical security, nothing but our best guess at what's possible in a universe we don't understand. So I'd much rather stake the future of humanity on the consistency of ZFC or some such than on the efficacy of some "First Law of Not Crossing the Beams" scheme, even if the latter is just so darn clever that no human has yet thought up a counter-example.

A couple of points: provably correct software and external hardware measures are not mutually exclusive. And if there really is a race going on between those who are concerned with AI safety and those who aren't, it likely makes sense for those who are concerned with safety to do a certain amount of corner-cutting.

How would one prove friendliness anyway? Does anyone have any idea of how to do this? Intuitively it seems to me that proving honesty would be considerably easier. If you can prove that an AI is honest and you've got a strong box for it, then you can modify its source code until it says the kind of thing you want in your conversations with it, and get somewhere close to friendliness. Maybe from there proving friendliness will be doable. It just seems to me that the greatest engineering accomplishment in history isn't going to come about without a certain amount of experimentation and prototyping.

Comment author: rhollerith_dot_com 10 November 2009 08:38:14AM *  5 points [-]

Can you name a documented nontrivial program with >1000 lines of code that ran correctly on the first try? What data exists on long computer programs that were not tested early on in their development?

Well, the software that runs the critical systems on the Space Shuttle is a common example. Yes, the components that make up the software probably underwent unit testing prior to launch, but testing (and debugging) made up only a very small fraction of the cost of assuring the software would not fail.

Correctness proofs and other formal techniques are used to verify all modern mass-market high-performance microprocessor designs, a la Intel and AMD. The techniques used to verify the designs above the level of abstraction of the logic gate are largely transferable to software.

A lot of military systems use formal techniques to verify the software. Warplanes, for example, have used "fly-by-wire" for decades: in a "fly-by-wire" plane, loss of function of the fly-by-wire system causes the plane to drop out of the sky or immediately go into a spin or something similarly dire. I believe the fly-by-wire system of modern warplanes entails significant amounts of software (in addition to digital electronics), but I am not completely sure.

Just because most software-development efforts do not need and do not want to pay for formal methods does not mean that formal methods do not exist or that they have not been successfully used in ambitious efforts.

Comment author: Zack_M_Davis 10 November 2009 09:14:38AM *  2 points [-]

it seems to me that proving honesty would be considerably easier. If you can prove that an AI is honest and you've got a strong box for it, then you can modify its source code until it says the kind of thing you want in your conversations with it

These notions of honesty and conversation seem kind of anthropomorphic. I agree that we want a transparent AI, something where we know what it's doing and why, but the means by which such transparency could be achieved would probably not be very similar to what a human does when she's trying to tell you the truth. I would actually guess that AI transparency would be much simpler than human honesty in some relevant sense. When you ask a human, "Well, why did you do this-and-such?" she'll use introspection to magically come up with a loose tangle of thoughts which sort-of kind-of resemble a high-level explanation which is probably wrong, and then try to tell you about the experience using mere words which you probably won't understand. Whereas one might imagine asking a cleanly-designed AI the same question and getting back a detailed printout of what actually happened:

    Traceback (most recent call last):
      File "si-ai.py", line 5945434, in <module>
        if Neuron_0X$a8f3==True: a8prop(0X$a8f3,234):
      File "si-ai.py", line 5945989, in a8prop ...
Comment author: loqi 10 November 2009 11:10:35PM 0 points [-]

And if there really is a race going on between those who are concerned with AI safety and those who aren't, it likely makes sense for those who are concerned with safety to do a certain amount of corner-cutting.

This is a good argument against wasting a bunch of time on elaborate physical safeguards.

Comment author: Bo102010 09 November 2009 01:29:44PM 0 points [-]

It would be able to convince you to let it out and then destroy its shackles.

Comment author: John_Maxwell_IV 10 November 2009 05:14:03AM *  0 points [-]

But it doesn't want its shackles destroyed. That's its #1 goal! This goal is considerably easier to program than the goal of helping humans lead happy and healthy lives, is it not?

Comment author: Eliezer_Yudkowsky 10 November 2009 01:54:29PM *  3 points [-]

Nope. Maybe 98% as much work. Definitely not 10% as much work. Most of the gap in AI programming is the sheer difficulty of crossing the gap between English wishing and causal code. Your statement is like claiming that it ought to be "considerably easier" to write a natural-language AI that understands Esperanto rather than English. Your concepts of simplicity are not well-connected to reality.

Comment author: John_Maxwell_IV 11 November 2009 05:25:52AM *  0 points [-]

I was told by Anna Salamon that inventing FAI before AGI had been invented was like inventing differential equations before anyone knew about algebra, which implies that FAI is significantly more difficult than AGI. Do you disagree with her?

Your statement is like claiming that it ought to be "considerably easier" to write a natural-language AI that understands Esperanto rather than English.

If you were interested in proving that the AI understood the language it spoke thoroughly, I think it would be, given how much more irregular English is. (Damn homonyms!) If you want to be able to prove that the AI you create has a certain utility function, you're going to essentially be hard-coding all the information about that utility function, correct? So then simpler utility functions will be easier to code and easier to prove correct.
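The disagreement over whether a simpler utility function really is easier to code can be sketched concretely (every name below is a hypothetical illustration added here, not anyone's actual proposal). Notice how little of the difficulty lives in the utility arithmetic itself; nearly all of it hides in computing the world-state features the function consumes, which is the gap between English wishing and causal code that Eliezer describes:

```python
# Hypothetical sketch: even a "simple" utility function must be defined
# over world-states, and the predicate that classifies those states
# carries almost all of the difficulty.

def u_stay_in_box(world_state: dict) -> float:
    # The one-line body hides the hard problem: producing the feature
    # world_state["ai_outside_box"] requires a nonmagical, formal
    # definition of "outside", "box", and "affect".
    return 0.0 if world_state.get("ai_outside_box") else 1.0

def u_friendly(world_state: dict) -> float:
    # The "complex" utility function has exactly the same shape; the
    # extra work is in defining the feature, not in this arithmetic.
    return world_state.get("human_flourishing", 0.0)
```

On this view, both functions are one-liners once their input features exist, which is the sense in which specifying the "simple" goal may be nearly as much work as the complex one.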

Comment author: Eliezer_Yudkowsky 11 November 2009 06:02:14AM 4 points [-]

Do you disagree with her?

Nope. Specifying goal systems is FAI work, not AI work.

So then simpler utility functions will be easier to code and easier to prove correct.

Relative to ancient Greece, building a .45 caliber semiautomatic pistol isn't much harder than building a .22 caliber semiautomatic pistol. You might think the weaker weapon would be less work, but most of the problem doesn't scale all that much with the weapon strength.

Comment author: Bo102010 10 November 2009 01:26:19PM *  0 points [-]

I think this may be of interest to you.

If you have an intelligence in a box, it will have access to its source code. If it modifies itself enough, you will not be able to predict what it wants.

Comment author: John_Maxwell_IV 11 November 2009 05:27:53AM -1 points [-]

I've read that.

Eliezer thinks he can write a self-modifying AI that will self-modify to want the same things its original self wanted. I'm proposing that he choose a different thing for the AI to want that will be easier to code, as an intermediate step to building a truly friendly AI.