XiXiDu comments on Google may be trying to take over the world - Less Wrong Discussion

22 [deleted] 27 January 2014 09:33AM

Comment author: XiXiDu 27 January 2014 04:19:39PM *  11 points [-]

Well, that's sort of like having the brightest minds at CERN spend two weeks full-time talking to some random "autodidact" who's claiming that the LHC is going to create a black hole that will devour the Earth.

This is an unusual situation though. We have a lot of smart people who believe MIRI (they are not idiots, you have to grant them that). And you and I are not going to change their minds, ever, and they are hardly going to convince us. But if a bunch of independent top-notch people were to accept MIRI's position, then that would certainly make me assign a high probability to the possibility that I simply don't get it and that they are right after all.

Society can't work this way.

In the case of the LHC, independent safety reviews have been conducted. I wish this was the case for the kinds of AI risk scenarios imagined by MIRI.

Comment author: private_messaging 27 January 2014 04:34:51PM *  4 points [-]

We have a lot of smart people who believe MIRI (they are not idiots, you have to grant them that).

If you pitch something stupid to a large enough number of smart people, some small fraction will believe it.

In the case of the LHC, independent safety reviews have been conducted.

Not for every crackpot claim. edit: and since they got an ethical review board, that's your equivalent of what was conducted...

I wish this was the case for the kinds of AI risk scenarios imagined by MIRI.

There's a threshold. Some successful trading software, a popular programming language, or an AI project that does something notable at world level (plays some game really well, for example) puts one above the threshold. Convincing some small fraction of smart people does not. Shane Legg's startup is evidently above the threshold.

As for the risks, why would you think that Google's research is a greater risk to mankind than, say, MIRI's? (assuming that the latter is not irrelevant, for the sake of the argument)

Comment author: XiXiDu 27 January 2014 04:57:36PM *  2 points [-]

As for the risks, why would you think that Google's research is a greater risk to mankind than, say, MIRI's? (assuming that the latter is not irrelevant, for the sake of the argument)

If MIRI were right then, as far as I understand it, a not-quite-friendly AI (a broken friendly AI) could lead to a worse outcome than a general AI that was designed without humans in mind, since in the former case you would end up with something that keeps humans alive but gets a detail like boredom wrong, while in the latter case you would be transformed into, e.g., paperclips. So from this perspective, if MIRI were right, it could be the greater risk.

Comment author: private_messaging 27 January 2014 07:14:57PM 8 points [-]

Well, the other issue is that people's opinions tend to be more informative about their own plans than about the field in general.

Imagine that there's a bunch of nuclear power plant engineering teams - before nuclear power plants existed - working on different approaches.

One of the teams - not a particularly impressive one either - claimed that any nuclear plant is going to blow up like a hundred-kiloton nuclear bomb unless fitted with a very reliable and fast-acting control system. This is actually how nuclear power plants were portrayed in early science fiction ("Blowups Happen", by Heinlein).

So you look at the blueprints, and you see that everyone's reactor is designed for a negative temperature coefficient of reactivity in the high-temperature range, and can't blow up like a nuke. Except for one team whose reactor is not designed to make use of a negative temperature coefficient of reactivity. The mysterious disagreement is explained, albeit in a very boring way.

Comment author: V_V 27 January 2014 08:55:14PM 13 points [-]

Except for one team whose reactor is not designed to make use of a negative temperature coefficient of reactivity.

Except that this contrarian team, made up of high school drop-outs, former theologians, philosophers, mathematicians and coal power station technicians, never produces an actual design; instead, they spend all their time investigating arcane theoretical questions about renormalization in quantum field theory and publish their possibly interesting results outside the scientific peer-review system, relying on hype to disseminate them.

Comment author: private_messaging 28 January 2014 09:49:55AM *  2 points [-]

Well, they still have some plan, however fuzzy it is. The plan involves a reactor which, according to its proponents, would just blow up like a 100-kiloton nuke if not for some awesome control system they plan to someday work on. Or, in the case of AI, a general architecture that is going to self-improve and literally kill everyone unless a correct goal is set for it. (Or even torture everyone if there's a minus sign in the wrong place - the reactor analogy would be an even worse explosion if the control rods get wired backwards. Which happens.)

My feeling is that there may be risks for some potential designs, but they are not like "the brightest minds that build the first AI failed to understand some argument that even former theologians can follow". (In fiction this happens because said theologian is very special; in reality it happens because the argument is flawed or irrelevant.)

Comment author: XiXiDu 28 January 2014 11:10:33AM 7 points [-]

"the brightest minds that build the first AI failed to understands some argument that even former theologians can follow"

This is related to something that I am quite confused about. There are basically 3 possibilities:

(1) You have to be really lucky to stumble across MIRI's argument. Just being really smart is insufficient. So we should not expect whoever ends up creating the first AGI to think about it.

(2) You have to be exceptionally intelligent to come up with MIRI's argument. And you have to be nowhere near as intelligent to build an AGI that can take over the world.

(3) MIRI's argument is very complex. Only someone who deliberately thinks about risks associated with AGI could come up with all the necessary details of the argument. The first people to build an AGI won't arrive at the correct insights in time.

Maybe there is another possibility for how MIRI could end up being right that I have not thought of; if so, let me know.

It seems to me that what all of these possibilities have in common is that they are improbable. Either you have to be (1) lucky, (2) exceptionally bright, or (3) right about a highly conjunctive hypothesis.

Comment author: [deleted] 28 January 2014 03:42:21PM *  3 points [-]

I would have to say:

(4) MIRI themselves are incredibly bad at phrasing their own argument. Go hunt through Eliezer's LessWrong postings about AI risks, from which most of MIRI's language regarding the matter is taken. The "genie metaphor", of Some Fool Bastard being able to give an AGI a Bad Idea task in the form of verbal statements or C++-like programming at a conceptual level humans understand, appears repeatedly. The "genie metaphor" is a worse-than-nothing case of Generalizing From Fictional Evidence.

I would phrase the argument this way (and did so on Hacker News yesterday):

[T]hink of it in terms of mathematics rather than psychology. A so-called "artificial intelligence" is just an extremely sophisticated active[-environment], online learning agent designed to maximize some utility function or (equivalently) minimize some loss function. There's no term in a loss function for "kill all humans", but neither is there one for "do what humans want", or better yet, "do what humans would want if they weren't such complete morons half the time".

This takes us away from magical genies that can be programmed with convenient meta-wishes like "Do what I mean" or "be the Coherent Extrapolated Volition of humanity", and into the solid, scientific land of equations, accessible to everyone who ever took a machine-learning class in college.

I mean, seriously, my parents understand this phrasing, and they have no education in CS. They do, however, understand very well that a numerical score in some very specific game or task does not represent everything they want out of life, but that it will represent everything the AI wants out of life.
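
To put the same point in code: below is a minimal, hypothetical sketch (my illustration, not anything MIRI wrote; the game and field names are invented) of such an objective. The loss mentions only the task score; nothing about humans, for good or ill, appears in it.

    # Toy objective for a game-playing agent: everything the agent "wants"
    # is whatever this function rewards, and nothing else exists for it.
    def game_score(state):
        # Hypothetical score for one very specific task, e.g. lines cleared
        # in a Tetris-like game.
        return state["lines_cleared"]

    def loss(state):
        # The agent minimizes this. There is no term for "kill all humans",
        # but there is also no term for "do what humans want".
        return -game_score(state)

    state = {"lines_cleared": 42, "humans_happy": False}
    print(loss(state))  # -42; the "humans_happy" field never enters the objective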

(EDIT: I apologize for any feelings I may have hurt with this comment, but I care about not being paper-clipped more than I care about your feelings. I would rather the scientific public, if not the general public, have a decent understanding of and concern for AGI safety engineering, than have everyone at MIRI get to feel like they're extraordinarily rational and special for spotting a problem nobody else spotted.)

Comment author: private_messaging 29 January 2014 07:26:14AM *  4 points [-]

MIRI themselves are incredibly bad at phrasing their own argument.

Maybe it's just the argument that is bad and wrong.

[T]hink of it in terms of mathematics rather than psychology. A so-called "artificial intelligence" is just an extremely sophisticated active[-environment], online learning agent designed to maximize some utility function or (equivalently) minimize some loss function.

What's the domain of this function? I have a feeling that there's some severe cross-contamination between the meaning of the word "function" as in an abstract mathematical function of something, and the meaning of the word "function" as in the purpose of the genie that you have been cleverly primed with by people who aren't actually bad at phrasing anything, but instead good at inducing irrationality.

If you were to think of mathematical functions, well, those don't readily take the real world as an input, do they?

Comment author: [deleted] 29 January 2014 01:42:41PM *  3 points [-]

Maybe it's just the argument that is bad and wrong.

At least for the genie metaphor, I completely agree. That one is just plain wrong, and arguments for it are outright bad.

If you were to think of mathematical functions, well, those don't readily take the real world as an input, do they?

Ah, here's where things get complicated.

In current models, the domain of the function is Symbols. As in, those things on Turing Machines. Literally: AIXI is defined to view the external universe as a Turing Machine whose output tape is being fed to AIXI, which then feeds back an input tape of Action Symbols. So you learned about this in CS401.

The whole point of phrasing things this way was to talk about general agents: agents that could conceivably receive and reason over any kind of input, so that their utility domain ends up being defined over, indeed, the world.

The thing is, under current models, Utility and Reality are kept ontologically separate: they're different input tapes entirely. An AIXI might wirehead and commit suicide that way, but the model of reality it learns is defined over reality. Any failures of ontology rest with the programmer for building an AI agent that has no concept of ontology, and therefore cannot be taught to value useful, high-level concepts other than the numerical input on its reward tape.
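
A rough sketch of the interface being described (names invented for illustration; a real AIXI is uncomputable, so this only shows the shape of the interface): the agent only ever sees symbols, with the reward arriving on a channel of its own, separate from the observations its world-model is learned from.

    from typing import NamedTuple

    class Percept(NamedTuple):
        observation: int   # symbol from the environment's "output tape"
        reward: float      # separate reward channel, kept apart from observations

    class SymbolAgent:
        def __init__(self):
            self.history = []   # all the agent ever "knows" is this symbol stream

        def act(self, percept: Percept) -> int:
            self.history.append(percept)
            # An AIXI-like agent would pick the action symbol maximizing expected
            # future reward under a mixture over environment programs; this
            # placeholder just returns a fixed action symbol.
            return 0

    agent = SymbolAgent()
    action = agent.act(Percept(observation=7, reward=1.0))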

My point? You're correct to say that current AGI models don't take the Entire Real World as input to a magic-genie Verbally Phrased Utility Function like "maximize paperclips". That is a fantasy; we agree on that. So where the hell is the danger, or the problem? Well, the problem is that human AGI researchers are not going to leave it that way. We humans are the ones who want AIs we can order to solve particular problems. We are the ones who will immediately turn the first reinforcement or value learning AGIs, which will be expensive and difficult to operate, towards the task of building more sophisticated AGI architectures that will be easier to direct, more efficient, cheaper, and more capable of learning -- and eventually even self-improvement!

Which means that, if it should come to that, we humans will be the ones who deliberately design AGI architectures that can receive orders in the form of a human-writable program. And that, combined with the capability for self-improvement, would be the "danger spot": a semi-competent AGI programmer could accidentally direct a machine to do something dangerous, because the machine lacks the Natural Language Processing capability to understand and execute the intent behind a verbally phrased goal, and the programmer wasn't good enough to specify everything in code.

(Some portable internal representation of beliefs, by the way, is one of the fundamental necessities for a self-improving FOOMy AGI, which is why nobody really worries too much about neural networks self-improving and killing us all.)

Now, does all this support the capital-N Narrative of the old SIAI, that we will all die a swift, stupid death if we don't give them all our money now? Absolutely not.

However, would you prefer that the human-implemented bootstrap path from barely-intelligent, ultra-inefficient reinforcement/value learning agents to highly intelligent, ultra-efficient self-improving goal fulfilment devices be very safe, with few chances for even significant damage by well-intentioned idiots, or very dangerous, with conspiracies, espionage, weaponization, and probably a substantial loss of life due to sheer accidents?

Personally, I prefer the former, so I think machine ethics is a worthwhile pursuit, regardless of whether the dramatized, ZOMFG EVIL GENIE WITH PAPERCLIPS narrative is worth anything.

Comment author: XiXiDu 28 January 2014 06:37:01PM *  2 points [-]

There's no term in a loss function for "kill all humans", but neither is there one for "do what humans want", or better yet, "do what humans would want if they weren't such complete morons half the time".

Right. I don't dismiss this, but I think there are a bunch of caveats here that I've largely failed to describe in a way that people around here understand well enough to convince me that the arguments are wrong, or irrelevant.

Here is just one of those caveats, very quickly.

Suppose Google were to create an oracle. In an early research phase they would run the following queries and receive the answers listed below:

Input 1: Oracle, how do I make all humans happy?

Output 1: Tile the universe with smiley faces.

Input 2: Oracle, what is the easiest way to print the first 100 Fibonacci numbers?

Output 2: Use all resources in the universe to print as many natural numbers as possible.

(Note: I am aware that MIRI believes that such an oracle wouldn't even return those answers without taking over the world.)

I suspect that an oracle that behaves as depicted above would not be able to take over the world, simply because such an oracle would not get a chance to do so: it would be thoroughly revised for giving such ridiculous answers.

Secondly, if it is incapable of understanding such inputs correctly (yes, "make humans happy" is a problem in physics and mathematics that can be answered in a way that is objectively less wrong than "tile the universe with smiley faces"), then such a mistake will very likely have grave consequences for its ability to solve the problems it needs to solve in order to take over the world.

Comment author: [deleted] 28 January 2014 06:41:25PM 0 points [-]

So that hinges on a Very Good Question: can we make and contain a potentially Unfriendly Oracle AI without its breaking out and taking over the universe?

To which my answer is: I do not know enough about AGI to answer this question. There are actually loads of advances in AGI remaining before we can make an agent capable of verbal conversation, so it's difficult to answer.

One approach I might take would be to consider the AI's "alphabet" of output signals as a programming language, and prove formally that this language can only express safe programs (i.e., programs that do not "break out of the box").
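
Purely as a hypothetical sketch of that idea (the whitelist and names are invented, and a real safety argument would have to be about the semantics of the output language, which nothing here attempts): treat the oracle's output channel as a tiny formal language and refuse to emit anything outside it.

    import re

    # Only bare decimal numbers and a fixed set of inert words may leave the box.
    SAFE_TOKEN = re.compile(r"^(-?\d+|yes|no|unknown)$")

    def emit(tokens):
        # Reject any output falling outside the whitelisted "alphabet".
        for t in tokens:
            if not SAFE_TOKEN.match(t):
                raise ValueError("output %r is outside the safe language" % (t,))
        return " ".join(tokens)

    print(emit(["yes", "42"]))      # allowed
    # emit(["launch", "missiles"])  # would raise ValueError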

But don't quote me on that.

Comment author: gjm 28 January 2014 12:28:20PM 0 points [-]

(4) MIRI's argument is easily confused with other arguments that are simple, widely known, and wrong. ("If we build a powerful AI, it is likely to come to hate us and want to kill us like in Terminator and The Matrix, or for that matter Frankenstein. So we shouldn't.") Accordingly, someone intelligent and lucky might well think of the argument, but then dismiss it because it feels silly on account of resembling "OMG if we build an AI it'll turn into Skynet and we'll all die".

This still requires the MIRI folks to be unusually competent in a particular respect, but it's not exactly intelligence they need to claim to have more of. And it might then be more credible that being smart enough to make an AGI is compatible with lacking that particular unusual competence.

In general, being smart enough to do X is usually compatible with being stupid enough to do Y, for almost any X and Y. Human brains are weird. So there's no huge improbability in the idea that the people who build the first AGI might make a stupid mistake. It would be more worrying if no one expert in the field agreed with MIRI's concerns, but e.g. the latest edition of Russell & Norvig seems to take them seriously.

Comment author: XiXiDu 28 January 2014 02:39:53PM *  2 points [-]

(4) MIRI's argument is easily confused with other arguments that are simple, widely known, and wrong. ("If we build a powerful AI, it is likely to come to hate us and want to kill us like in Terminator and The Matrix, or for that matter Frankenstein. So we shouldn't.")

I wonder why there is such a strong antipathy to the Skynet scenario around here? Just because it is science fiction?

The story is that Skynet was built to protect the U.S. and remove the possibility of human error. Then people noticed how Skynet's influence grew after it began to learn at a geometric rate. So people decided to turn it off. Skynet perceived this as an attack and came to the conclusion that all of humanity would attempt to destroy it. To defend humanity from humanity, Skynet launched nuclear missiles under its command at Russia, which responded with a nuclear counter-attack against the U.S. and its allies.

This sounds an awful lot like what MIRI has in mind...so what's the problem?

In general, being smart enough to do X is usually compatible with being stupid enough to do Y, for almost any X and Y.

As far as I can tell, what is necessary to create a working AGI hugely overlaps with making it not want to take over the world, since many of the big problems are related to constraining an AGI to, unlike e.g. AIXI, use resources efficiently and dismiss certain hypotheses in order not to fall prey to Pascal's mugging. Getting this right means succeeding at getting the AGI to work as expected along a number of dimensions.

People who get all this right would seem to be competent across a huge spectrum.

Comment author: V_V 28 January 2014 06:45:35PM 0 points [-]

...since many of the big problems are related to constraining an AGI to, unlike e.g. AIXI, use resources efficiently and dismiss certain hypotheses in order not to fall prey to Pascal's mugging.

I don't think that AIXI falls prey to Pascal's mugging in any reasonable scenario. I recall some people here arguing it, but I think they didn't understand the math.

Comment author: [deleted] 28 January 2014 03:48:12PM 0 points [-]

As far as I can tell, what is necessary to create a working AGI hugely overlaps with making it not want to take over the world, since many of the big problems are related to constraining an AGI to, unlike e.g. AIXI, use resources efficiently and dismiss certain hypotheses in order not to fall prey to Pascal's mugging. Getting this right means succeeding at getting the AGI to work as expected along a number of dimensions.

And this may well be true. It could be, in the end, that Friendliness is not quite such a problem, because we find a way to make "robot" AGIs that perform highly specific functions without going "out of context" and that basically stay in their box voluntarily, and these turn out to be vastly safer and more economical to use than a MIRI-grade Mighty AI God.

At the moment, however, we don't know.

Comment author: gjm 28 January 2014 05:08:29PM -1 points [-]

so what's the problem?

The problem is that it's in a movie, and smart people are therefore liable not to take it seriously. Especially smart people who are fed up with conversations like this: "So, what do you do?" "I do research into artificial intelligence." "Oh, like in Terminator. Aren't you worried that your creations will turn on us and kill us all?"

Comment author: private_messaging 28 January 2014 02:13:11PM *  2 points [-]

If we build a powerful AI, it is likely to come to hate us and want to kill us like in Terminator

In Terminator the AI gets a goal of protecting itself, and kills everyone as instrumental to that goal.

And in any case, taking a wrong idea from popular culture and trying to make a more plausible variation out of it is not exactly unique or uncommon behaviour. What I am seeing is that a popular notion is likely to spawn and reinforce similar notions; what you seem to be claiming is that a popular notion is likely to somehow suppress similar notions, and I see no evidence in support of that claim.

With regard to any arguments about humans in general: they apply to everyone, if anything undermining the position of outliers even more.

edit: also, if you have to strawman a Hollywood blockbuster to make a point about the brightest people failing to understand something... I think it's time to seriously rethink your position.