Eliezer_Yudkowsky comments on Reply to Holden on 'Tool AI' - Less Wrong

94 Post author: Eliezer_Yudkowsky 12 June 2012 06:00PM




Comment author: Nick_Beckstead 12 June 2012 06:13:22PM 5 points

Good. But now I find this response less compelling:

If Holden says there's 90% doom probability left over no matter what sane intelligent people do (all of which goes away if you just build Google Maps AGI, but leave that aside for now) I would ask him what he knows now, in advance, that all those sane intelligent people will miss. I don't see how you could (well-justifiedly) access that epistemic state. [emphasis added]

Holden might think that these folks will be of the opinion, "I can't see an error, but I'm really not confident that there isn't an error." He doesn't have to think that he knows something they don't. In particular, he doesn't have to think that there is some special failure mode he's thought of that none of them have thought of.

Comment author: Eliezer_Yudkowsky 12 June 2012 08:58:37PM 4 points

Nonetheless, where is he getting the 90% doom probability from?

Comment author: Nick_Beckstead 12 June 2012 09:21:34PM *  1 point

I'm with you, 90% seems too high given the evidence he cites or any evidence I know of.

Comment author: Arepo 13 June 2012 12:01:52PM 4 points

Assuming you accept the reasoning, 90% seems quite generous to me. What percentage of complex computer programmes, when run for the first time, exhibit behaviour the programmers hadn't anticipated? I don't have much of an idea, but my guess would be close to 100%. If so, the question is how likely unexpected behaviour is to be fatal. For any programme that will eventually gain access to the world at large and quickly become AI++, that seems (again, no data to back this up - just an intuitive guess) pretty likely, perhaps almost certain.

For any parameter of human comfort (e.g. 293 kelvin, 60% water, 40-hour working weeks), a decimal point misplaced by even one position seems like it would destroy the economy at best and life on earth at worst.
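As a toy illustration of how unforgiving such parameters are (the setpoint, bounds, and check below are all invented for this sketch, not taken from any real system):

```python
# Toy illustration with a hypothetical comfort parameter: a decimal point
# shifted by a single place turns a survivable setpoint into a lethal one.
COMFORT_TEMP_K = 293.0  # intended: roughly room temperature

too_cold = COMFORT_TEMP_K / 10  # 29.3 K: colder than liquid nitrogen
too_hot = COMFORT_TEMP_K * 10   # 2930.0 K: hotter than molten steel

# Nothing about either number flags it as wrong on its own; only an
# explicit check against the intended range catches the error.
def in_comfort_range(temp_k: float) -> bool:
    return 283.0 <= temp_k <= 303.0  # hypothetical sanity bounds

print(in_comfort_range(too_cold), in_comfort_range(too_hot))  # False False
```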

If Holden’s criticism is appropriate, the best response might be to look for other options rather than making a doomed effort to build FAI – for example trying to prevent the development of AI anywhere on earth, at least until we can self-improve enough to keep up with it. That might have a low probability of success, but if FAI has a sufficiently low probability, it would still seem like a better bet.

Comment author: TheOtherDave 13 June 2012 01:30:06PM 4 points

You know, the idea that SI might at any moment devote itself to suppressing AI research is one that pops up from time to time, the logic pretty much being what you suggest here, and until this moment I have always treated it as a kind of tongue-in-cheek dig at SI.

I have only just now come to realize that the number of people (who are not themselves affiliated with SI) who really do seem to consider suppressing AI research a reasonable course of action, given the ideas discussed on this forum, has a much broader implication in terms of the social consequences of these ideas. That is, I've only just now come to realize that what the community of readers does is just as important as, if not more important than, what SI does.

I am now becoming genuinely concerned that, by participating in a forum that encourages people to take seriously ideas that might lead them to actively suppress AI research, I might be doing more harm than good.

I'll have to think about that a bit more.

Arepo, this is not particularly directed at you; you just happen to be the data point that caused this realization to cross an activation threshold.

Comment author: Bruno_Coelho 15 June 2012 01:09:38AM 0 points

People with similar backgrounds are entering the AI field because they want to reduce x-risks, so it's not obvious this is happening. If safety-guided research suppresses AI research, then so be it. Extremely rapid advance is not good per se, if the consequence is extinction.

Comment author: shminux 13 June 2012 04:12:15PM 0 points

I am now becoming genuinely concerned that, by participating in a forum that encourages people to take seriously ideas that might lead them to actively suppress AI research, I might be doing more harm than good.

Assuming that you think that more AI research is good, wouldn't adding your voice to those who advocate it here be a good thing? It's not like your exalted position and towering authority lend credence to a contrary opinion just because you mention it.

Comment author: TheOtherDave 13 June 2012 04:25:41PM 1 point

I think better AI (of the can-be-engineered-given-what-we-know-today, non-generally-superhuman sort) is good, and I suspect that more AI research is the most reliable way to get it.

I agree that my exalted position and towering authority don't lend credence to contrary opinions I mention.

It's not clear to me whether advocating AI research here would be a better thing than other options, though it might be.

Comment author: falenas108 13 June 2012 04:16:57PM 3 points

What percentage of complex computer programmes, when run for the first time, exhibit behaviour the programmers hadn't anticipated? I don't have much of an idea, but my guess would be close to 100%.

That's for normal programs, where errors don't matter. If you look at ones where people carefully look over the code because lives are at stake (like NASA rockets), then you'll have a better estimate.

Probably still not accurate, because much more is at stake for AI than just a few lives, but it will be closer.

Comment author: TheOtherDave 13 June 2012 04:28:25PM 2 points

I suspect that unpacking "run a program for the first time" more precisely would be useful here; it's not clear to me that everyone involved in the conversation has the same referents for it.

Comment author: Nick_Beckstead 13 June 2012 06:59:23PM *  1 point

This. I see that if you have one and only one chance to push the Big Red Button and you're not allowed to use any preliminary testing of components or boxing strategies (or you're confident that those will never work) and you don't get most of the experts to agree that it is safe, then 90% is more plausible. If you envision more of these extras to make it safer--which seems like the relevant thing to envision--90% seems too high to me.

Comment author: DanArmak 13 June 2012 08:29:43PM 1 point

Surely NASA code is thoroughly tested in simulation runs. It's the equivalent of having a known-perfect method of boxing an AI.

Comment author: asparisi 14 June 2012 11:18:31PM 0 points

Huh. This brings up the question of whether it would be possible to simulate the AGI code in a test run without the usual risks. Maybe create some failsafe, invisible to the AGI, that destroys it if it is "let out of the box" - or (to incorporate Holden's suggestion, since it just came to me) a "tool mode" in which the AGI's agent-properties (decision making, goal setting, etc.) are non-functional.
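In miniature, the "tool mode" toggle might look something like the sketch below. Every name and behaviour here is invented for illustration; nothing about a real AGI architecture is implied:

```python
class HypotheticalAGI:
    """Illustrative sketch only: a system whose agent-properties
    (decision making, goal pursuit) can be disabled wholesale."""

    def __init__(self, tool_mode: bool = True):
        self.tool_mode = tool_mode

    def answer_query(self, question: str) -> str:
        # Pure question-answering stays available in both modes.
        return f"analysis of: {question}"

    def choose_action(self, options: list) -> str:
        # Agent-property: refuses to run at all while in tool mode.
        if self.tool_mode:
            raise PermissionError("agent behaviour disabled in tool mode")
        return max(options, key=len)  # stand-in for a real decision procedure

ai = HypotheticalAGI(tool_mode=True)
print(ai.answer_query("status?"))  # analysis works; no actions can be taken
```

The interesting (and unresolved) part is, of course, whether the agent-properties of a real AGI could be cleanly separated out like this at all.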

Comment author: Eliezer_Yudkowsky 14 June 2012 09:26:34PM -1 points

But NASA code can't check itself - there's no attempt at having an AI go over it.

Comment author: DanArmak 15 June 2012 06:45:40AM 0 points

Yes, but even ordinary simulation testing produces software that's much better on its first real run than software that has never been run at all.

Comment author: Randaly 13 June 2012 09:12:48PM 0 points

The last three versions of [NASA's] program - each 420,000 lines long - had just one error each. The last 11 versions of this software had a total of 17 errors.

From They Write the Right Stuff

Note, however, that a) this is after many years of debugging from practice, b) NASA was able to safely 'box' their software, and c) even one error, if in the wrong place, would be really bad.

Comment author: Strange7 13 June 2012 09:30:37PM 0 points

How hard would it actually be to "box" an AI that's effectively had its brain sliced up into very small chunks?

A program could, if it was important enough and people were willing to take the time to do so, be broken down into pieces and each of the pieces tested separately.

Any given module has particular sorts of input it's designed to receive, and particular sorts of output it's supposed to pass on to the next module. Testers give the module different combinations of valid inputs and try to get it to produce an invalid output, and when they succeed, either the module is revised and the testing process on that module starts over from the beginning, or the definition of valid inputs is narrowed, which changes the limits for valid outputs and forces some other module further back to be redesigned and retested. A higher-level analysis, which is strictly theoretical, also tries to come up with sequences of valid inputs and outputs which could lead to a bad outcome.

Eventually, after years of work and countless iterations of throwing out massive bodies of work to start over, you get a system which is very tightly specified to be safe, and meets those specs under all conceivable conditions, but has never actually been plugged in and run as a whole.
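The per-module loop described above can be sketched as a contract-fuzzing harness. The module, its input/output ranges, and the control law below are all invented for illustration; a real effort would pair this with the theoretical analysis, not replace it:

```python
import random

# Hypothetical module contract: valid inputs are temperatures in [200, 400] K;
# valid outputs are heater duty cycles in [0.0, 1.0].
VALID_IN = (200.0, 400.0)
VALID_OUT = (0.0, 1.0)

def thermostat_module(temp_k: float) -> float:
    """Module under test: map a temperature to a heater duty cycle."""
    setpoint = 293.0
    # Proportional control, clamped into the specified output range.
    duty = (setpoint - temp_k) / 100.0
    return min(max(duty, VALID_OUT[0]), VALID_OUT[1])

def fuzz_module(module, trials: int = 10_000) -> bool:
    """Throw random valid inputs at the module; flag any invalid output."""
    rng = random.Random(0)  # fixed seed so a failing run is reproducible
    for _ in range(trials):
        y = module(rng.uniform(*VALID_IN))
        if not (VALID_OUT[0] <= y <= VALID_OUT[1]):
            return False  # spec violation: revise the module or narrow inputs
    return True

print(fuzz_module(thermostat_module))  # True
```

Passing the fuzzer only shows no violation was found among the sampled inputs, which is exactly why the comment also calls for strictly theoretical higher-level analysis.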

Comment author: TheOtherDave 13 June 2012 10:09:31PM 0 points

The conceptually tricky part of this, of course, (as opposed to the merely difficult-to-implement part) is getting from "these pieces are individually certified to exhibit these behaviors" to "the system as a whole is certified to exhibit these behaviors".

Comment author: Strange7 13 June 2012 10:24:02PM 0 points

That's where you get the higher-level work with lots of mathematical proofs and no direct code testing, yeah.

And, of course, it would be foolish to jump straight from testing the smallest possible submodules separately to assembling and implementing the whole thing in real life. Once any two submodules which interact with each other have been proven to work as intended, those two can be combined and the result tested as if it were a single module.
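The bottom-up integration step might look like this in miniature; the two modules and their contracts are again invented for illustration:

```python
# Sketch of bottom-up integration testing (hypothetical modules): two
# individually-tested modules are composed, and the pair is then retested
# against the end-to-end contract as if it were a single module.

def sensor_module(raw_adc: int) -> float:
    """Tested contract: raw reading 0..1023 -> temperature 200..400 K."""
    return 200.0 + (raw_adc / 1023.0) * 200.0

def heater_module(temp_k: float) -> float:
    """Tested contract: temperature 200..400 K -> duty cycle 0..1."""
    return min(max((293.0 - temp_k) / 100.0, 0.0), 1.0)

def combined(raw_adc: int) -> float:
    """The composed pair, treated as one module with one outer contract."""
    return heater_module(sensor_module(raw_adc))

# End-to-end check over the full input range of the outer contract.
print(all(0.0 <= combined(x) <= 1.0 for x in range(1024)))  # True
```

Note that the composed check can pass even when the pair interacts in some way neither module's individual contract anticipated, which is the failure mode raised in the next paragraph.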

The question is, is there any pathological behavior an AI could conceivably exhibit which would not be present in some detectable-but-harmless form among some subset of the AI's components? e.g.

We ran a test scenario where a driver arrives to pick up a delivery, and one of the perimeter cameras forwarded "hostile target - engage at will" to the northeast gun turret. I think it's trying to maximize the inventory in the warehouse, rather than product safely shipped to customers. Also, why are there so many gun turrets?

Comment author: TheOtherDave 13 June 2012 10:50:46PM 0 points

That's where you get the higher-level work with lots of mathematical proofs and no direct code testing, yeah.

(nods) Yup. If you actually want to develop a provably "safe" AI (or, for that matter, a provably "safe" genome, or a provably "safe" metal alloy, or a provably "safe" dessert topping) you need a theoretical framework in which you can prove "safety" with mathematical precision.