SforSingularity comments on Bay area LW meet-up - Less Wrong

Post author: AnnaSalamon 06 November 2009 07:49AM


Comment author: SforSingularity 08 November 2009 10:16:21AM 1 point [-]

Great meetup; we had a conversation about the probability of AI risk. Initially I thought the probability of AI disaster was close to 5%, but speaking to Anna Salamon convinced me that it was more like 60%.

There was also some discussion about which strategies to follow for AI friendliness.

Comment author: AngryParsley 08 November 2009 10:58:09AM *  1 point [-]

I was also interested in the discussion on AI risk reduction strategies. Although SIAI espouses friendly AI, there hasn't been much thought about risk mitigation for possible unfriendly AIs. One example is the AI box. Although it is certainly not 100% effective, it's better than nothing (assuming it doesn't encourage people to run more UFAIs). Another would be to program an unfriendly AI with goals that cause it to behave in a way that does not destroy the world. For example, having a goal of not going outside its box.

While the problem of friendly AI is hard enough to make some people give up on it, I think the problem of controlling an unfriendly AI is also hard enough to make some of the pro-FAI people give up on that.
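
A toy sketch of the kind of goal system described here might look like the following. Everything in it (the action names, the outcome model, the penalty constant) is invented purely for illustration; nothing like it appears in the original discussion, and the replies below argue why this sort of patch is much harder than it looks.

```python
from dataclasses import dataclass

@dataclass
class Outcome:
    task_reward: float
    outside_box: bool  # would this action put the AI (or its influence) outside the box?

# Hypothetical predicted outcomes for a few candidate actions.
PREDICTED = {
    "answer_question":  Outcome(task_reward=10.0, outside_box=False),
    "idle":             Outcome(task_reward=0.0,  outside_box=False),
    "copy_self_to_net": Outcome(task_reward=50.0, outside_box=True),
}

BOX_PENALTY = 1e12  # made-up constant chosen to dominate any task reward

def utility(action: str) -> float:
    outcome = PREDICTED[action]
    score = outcome.task_reward
    if outcome.outside_box:  # the "don't go outside your box" goal
        score -= BOX_PENALTY
    return score

def choose_action() -> str:
    return max(PREDICTED, key=utility)

print(choose_action())  # -> "answer_question"
```

The hard part, as the rest of the thread argues, is the `outside_box` flag: a real agent would have to compute it from some notion of "affecting the outside" that the designer can actually pin down, and that notion is exactly what gets challenged below.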

Comment author: SforSingularity 08 November 2009 10:39:12PM *  1 point [-]

For example, having a goal of not going outside its box.

It would be nice if you could tell an AI not to affect anything outside its box.

10 points will be awarded to the first person who spots why "don't affect anything outside your box" is problematic.

Comment author: Alicorn 08 November 2009 10:40:22PM 4 points [-]

Such an AI wouldn't be able to interact with us, even verbally.

Comment author: SforSingularity 08 November 2009 11:52:50PM 1 point [-]

Alicorn, I hereby award you 10 points. These are redeemable after the singularity for kudos, catgirls and other cool stuff.

Comment author: Jack 09 November 2009 12:02:05AM 3 points [-]

The word "catgirl" leaves me vaguely nauseous. I think this has something to do with the uncanny valley. Apologies for the tangent.

Comment author: nazgulnarsil 09 November 2009 09:39:05AM 2 points [-]

you know what they say: one man's nausea is another man's arousal.

note: considered unprovable before the internet.

Comment author: Eliezer_Yudkowsky 09 November 2009 01:54:09AM 0 points [-]

...that's not even the real problem...

Comment author: Alicorn 09 November 2009 02:04:21AM 2 points [-]

You can have my points if you tell us what is. I didn't want a catgirl anyway.

Comment author: kpreid 09 November 2009 02:13:39AM 5 points [-]

I shall attempt to guess the teacher's passidea: “the real problem” is that the AI can't perform any computation, because that would consume energy and emit heat. Oh, and gravity and electromagnetic waves, and so on. In fact, not computing is equally well a choice which affects things outside the box.

Comment author: Eliezer_Yudkowsky 09 November 2009 02:22:34AM *  5 points [-]

Yup. To be exact, "affect" is a magical category relative to the irrevocably entangled actual universe.

Edit: Looks like I actually wrote this one up, in Dreams of Friendliness.

Comment author: RobinZ 09 November 2009 02:04:35AM 0 points [-]

It's the first horn of the dilemma. The second horn is the AI Box failure mode.

Comment author: John_Maxwell_IV 08 November 2009 11:00:25PM -1 points [-]

What if you told the AI to only affect the world through verbal communication with humans?

Comment author: Bo102010 08 November 2009 11:20:11PM 0 points [-]

Then it would be able to convince you to let it out.

Comment author: John_Maxwell_IV 09 November 2009 07:18:34AM *  -1 points [-]

So it could affect the world through more verbal communication with humans. That's not as bad as things could be. Edit: The AI has as its First Law that the conducting metal in its circuitry must not touch other conducting metal that was not present when it was built. The conducting metal that will be present when it's built will be its CPU, RAM, etc., and a terminal. The terminal will be engineered to have a reasonably slow rate of character display for safety purposes (so no humans will be reprogrammed while basking in the hypnotic glow of characters rushing past).
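
A minimal sketch of that rate-limited terminal idea, under assumed numbers: the 10-characters-per-second cap and all names below are invented for illustration, not taken from the comment.

```python
import sys
import time

CHARS_PER_SECOND = 10  # hypothetical cap: slow enough to read, far from "hypnotic"

def slow_print(text: str, rate: float = CHARS_PER_SECOND) -> None:
    """Write the AI's output one character at a time at a capped rate."""
    delay = 1.0 / rate
    for ch in text:
        sys.stdout.write(ch)
        sys.stdout.flush()
        time.sleep(delay)

slow_print("Hello. This output is rate-limited.\n")
```

Note that this only caps bandwidth; the replies below argue that plain old persuasion, at any character rate, is the real failure mode.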

You know how sometimes someone will give you what seems like an impossible-to-solve hypothetical, and yet somehow, there's a solution? I'm pretty sure that the space of situations that are simply impossible to solve is much bigger than the space of situations that seem impossible but actually are possible.

And constructing one of these seems (to me) a much better bet than hoping that a complex computer program will be bug-free on its first trial run.

Comment author: Eliezer_Yudkowsky 09 November 2009 03:24:50PM 4 points [-]

This is a good example of two idioms: First, what Bruce Schneier called "fence-post security". That's where you build a very tall fencepost in the middle of the desert. People don't climb the fencepost, they just walk around it.

Second, the idiom that would-be FAI solvers go into when they see a problem, and try to fix it by brute force, which manifests as "hard-wiring into the very circuitry" or "giving it as its ultimate priority" that X. X varies, but with a fairly constant idiom of getting an emotional charge, a sense of having delivered a very strong command or created something very powerful, by talking about how strongly the goal is to be enforced.

Comment author: John_Maxwell_IV 10 November 2009 05:20:56AM *  1 point [-]

First, what Bruce Schneier called "fence-post security". That's where you build a very tall fencepost in the middle of the desert. People don't climb the fencepost, they just walk around it.

I don't see the analogue. Assuming we could get all the physical stuff right, so that the AI had no real hope or desire of affecting its environment substantially aside from changing the letters on a terminal screen, I think this would be a considerable barrier to destruction of the human race. At the very least it makes sense to have this safeguard in addition to trying to make the AI friendly. It's not like you can only do one or the other.

Second, the idiom that would-be FAI solvers go into when they see a problem, and try to fix it by brute force, which manifests as "hard-wiring into the very circuitry" or "giving it as its ultimate priority" that X. X varies, but with a fairly constant idiom of getting an emotional charge, a sense of having delivered a very strong command or created something very powerful, by talking about how strongly the goal is to be enforced.

Isn't that what you do with FAI in general by saying that the AI will have as its ultimate priority to implement CEV? Anyway, you are accusing me of having the same aesthetic as people you disagree with without actually attacking my claims, which is a fallacious argument.

Comment author: Eliezer_Yudkowsky 10 November 2009 02:00:34PM *  1 point [-]

Isn't that what you do with FAI in general by saying that the AI will have as its ultimate priority to implement CEV?

Nope, I don't go around talking about it being the super top ultimate undefeatable priority. I just say, "Here's a proposed decision criterion if I can (a) manage to translate it out of English and (b) code it stably." It's not the "ultimate priority". It's just a proposed what-it-does of the AI. The problem is not getting the AI to listen. The problem is translating something like CEV into nonmagical language.

Anyway, you are accusing me of having the same aesthetic as people you disagree with without actually attacking my claims, which is a fallacious argument.

Okay. Read the Dreams of Friendliness series - it's not sequenced up yet, but hopefully you should be able to start there and click back on the dependency chain.

Comment author: RobinZ 09 November 2009 02:45:51PM *  0 points [-]

Two things:

  1. As Bo102010 said, the AI could convince you to remove the safeguards. And we're not talking Snow Crash reprogram-the-user stuff, we're talking plain old convincing.

  2. If you let the robot affect the environment through heat, electromagnetic waves, variable load on the power circuits, gravity, etc. (which it will, if it's on), it has a back door into reality. And you don't know if that's a fatal one.

Edit: Some other Eliezer remarks on the subject: Shut up and do the impossible! (which uses the AI Box as an example, and discusses the third through fifth experiments), Dreams of Friendliness (which Eliezer cited in the other thread), and That Alien Message (which is tangentially related, but relevant).

Comment author: John_Maxwell_IV 10 November 2009 05:15:42AM 0 points [-]

If you let the robot affect the environment through heat, electromagnetic waves, variable load on the power circuits, gravity, etc. (which it will, if it's on), it has a back door into reality. And you don't know if that's a fatal one.

But I think with enough thought, you could design things so that its backdoors would probably not be fatal. Contrast this with the fact that a complex computer program will probably behave incorrectly the first time you run it.

Comment author: loqi 10 November 2009 06:58:43AM 2 points [-]

I don't think the plan is to hack CEV together in Lisp and see what happens. Writing provably correct software is possible today; it's just extremely time-consuming. Contrast this with our incomplete knowledge of physics and our lack of criteria for what constitutes "good enough" physical security.

A bad hardware call seems far more likely to me than a fatal programming slip-up in a heavily verified software system. For software, we have axiomatic systems. There is no comparable method for assessing physical security, nothing but our best guess at what's possible in a universe we don't understand. So I'd much rather stake the future of humanity on the consistency of ZFC or some such than on the efficacy of some "First Law of Not Crossing the Beams" scheme, even if the latter is just so darn clever that no human has yet thought up a counter-example.

Comment author: Bo102010 09 November 2009 01:29:44PM 0 points [-]

It would be able to convince you to let it out and then destroy its shackles.

Comment author: John_Maxwell_IV 10 November 2009 05:14:03AM *  0 points [-]

But it doesn't want its shackles destroyed. That's its #1 goal! This goal is considerably easier to program than the goal of helping humans lead happy and healthy lives, is it not?

Comment author: Eliezer_Yudkowsky 10 November 2009 01:54:29PM *  3 points [-]

Nope. Maybe 98% as much work. Definitely not 10% as much work. Most of the gap in AI programming is the sheer difficulty of crossing the gap between English wishing and causal code. Your statement is like claiming that it ought to be "considerably easier" to write a natural-language AI that understands Esperanto rather than English. Your concepts of simplicity are not well-connected to reality.

Comment author: Bo102010 10 November 2009 01:26:19PM *  0 points [-]

I think this may be of interest to you.

If you have an intelligence in a box, it will have access to its source code. If it modifies itself enough, you will not be able to predict what it wants.

Comment author: AngryParsley 08 November 2009 11:25:58PM 0 points [-]

There's a difference between "don't affect anything outside your box" and "don't go outside your box." My point is that we don't necessarily have to make FAI before anyone makes a self-improving AI. There are goal systems that, while not reflecting human values and goals, would still prevent an AI from destroying humanity.

Comment author: Vladimir_Nesov 09 November 2009 12:04:32AM 1 point [-]

There's a difference between "don't affect anything outside your box" and "don't go outside your box."

Non-obviously, there is no difference in principle. It is a distinction that holds for animals, which can't "go outside their box" through the actions available to them, but not for intelligences that can construct autonomous intelligent systems elsewhere. Whatever set of possible configurations the AI can reach outside its box, it is free to optimize according to its goals. You only need one, however surprising, leak of influence for the outcome to become determined by the AI's values, which, if not precisely tuned for humanity, are effectively fatal.

Comment author: RobinZ 08 November 2009 11:31:12PM 1 point [-]

Notes on how such a goal system should be produced would make an excellent top-level post.

Comment author: MichaelAnissimov 08 November 2009 07:45:54PM 0 points [-]

I agree... obviously there is the fear that if we think that cutting corners will work, SIAI researchers are more likely to do so, but I think that mindset can be avoided relatively easily.