
Comment author: JoshuaZ 17 May 2012 09:49:51PM 0 points

Do you agree that humans would likely prefer to have AIs that have a theory of mind? I don't know how our theory of mind works (although it is certainly an area of active research with a number of interesting hypotheses), but presumably, once we have a better understanding of it, AI researchers would try to apply those lessons to give their AIs such a capability. This seems to address many of your concerns.

Comment author: kalla724 17 May 2012 09:51:42PM *  1 point

Yes. If we have an AGI, and someone sets out to teach it how to lie, I will get worried.

I am not worried about an AGI developing such an ability spontaneously.

Comment author: JoshuaZ 17 May 2012 09:12:07PM 0 points

The problem isn't whether they fall out automatically so much as whether, given enough intelligence and resources, it seems at all plausible that such capabilities could arise. Any given path here is a single problem. If you have ten different paths, each of which is not very likely, and another few paths that humans didn't even think of, that starts adding up.
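
A rough sketch of the "adding up" arithmetic, assuming the paths are roughly independent; the 2% per-path figures below are invented purely for illustration and are not taken from the thread:

```python
# Probability that at least one of several low-probability, roughly
# independent paths works out. The per-path numbers are illustrative
# assumptions, not estimates from the discussion.

def p_at_least_one(path_probs):
    p_none = 1.0
    for p in path_probs:
        p_none *= (1.0 - p)
    return 1.0 - p_none

known_paths = [0.02] * 10      # ten paths, each judged ~2% likely
overlooked_paths = [0.02] * 3  # a few paths humans didn't think of

print(p_at_least_one(known_paths))                     # ~0.18
print(p_at_least_one(known_paths + overlooked_paths))  # ~0.23
```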

Comment author: kalla724 17 May 2012 09:50:01PM 3 points

Out of the infinite number of possible paths, the percentage of paths we are adding up here is still very close to zero.

Perhaps I can attempt another rephrasing of the problem: what is the mechanism that would make an AI automatically seek these paths out, or make them any more likely than the infinite number of other paths?

I.e., if we develop an AI which is not specifically designed for the purpose of destroying life on Earth, how would that AI arrive at a desire to destroy life on Earth, and by what mechanism would it gain the ability to accomplish that goal?

This entire problem seems to assume that an AI will want to "get free" or that its primary mission will somehow inevitably lead to a desire to get rid of us (as opposed to a desire to, say, send a signal consisting of 0101101 repeated an infinite number of times in the direction of Zeta Draconis, or any other possible random desire). And that this AI will be able to acquire the abilities and tools required to execute such a desire. Every time I look at such scenarios, there are abilities that are just assumed to exist or appear on their own (such as a theory of mind), which to the best of my understanding are not necessary or even likely products of computation.

In the final rephrasing of the problem: if we can make an AGI, we can probably design an AGI for the purpose of developing an AGI that has a theory of mind. This AGI would then be capable of deducing things like deception or the need for deception. But the point is - unless we intentionally do this, it isn't going to happen. Self-optimizing intelligence doesn't self-optimize in the direction of having a theory of mind, understanding deception, or anything similar. It could, randomly, but it could also do any other random thing from the infinite set of possible random things.

Comment author: dlthomas 17 May 2012 09:28:53PM *  0 points

> I am not the one who is making positive claims here.

You did in the original post I responded to.

> All I'm saying is that what has happened before is likely to happen again.

Strictly speaking, that is a positive claim. It is not one I disagree with, given a proper translation of "likely" into a probability, but it is also not what you said.

"It can't deduce how to create nanorobots" is a concrete, specific, positive claim about the (in)abilities of an AI. Don't misinterpret this as me expecting certainty - of course certainty doesn't exist, and doubly so for this kind of thing. What I am saying, though, is that a qualified sentence such as "X will likely happen" asserts a much weaker belief than an unqualified sentence like "X will happen." "It likely can't deduce how to create nanorobots" is a statement I think I agree with, although one must be careful not use it as if it were stronger than it is.

> A positive claim is that an AI will have a magical-like power to somehow avoid this.

That is not a claim I made. "X will happen" implies a high confidence - saying this when you think it is, say, 55% likely seems strange. Saying this when you expect it to be something less than 10% likely (as I do in this case) seems outright wrong. I still buckle my seatbelt, though, even though I get in a wreck well less than 10% of the time.

This is not to say I made no claims. The claim I made, implicitly, was that you made a statement about the (in)capabilities of an AI that seemed overconfident and which lacked justification. You have given some justification since (and I've adjusted my estimate down, although I still don't discount it entirely), in amongst your argument with straw-dlthomas.

Comment author: kalla724 17 May 2012 09:42:21PM 1 point

You are correct. I did not phrase my original posts carefully.

I hope that my further comments have made my position more clear?

Comment author: JoshuaZ 17 May 2012 08:35:42PM 3 points

There are lots of tipoffs to what is fictional and what is real. It might notice, for example, that the Wikipedia article on fiction describes exactly what fiction is, and then note that Wikipedia describes the One Ring as fiction and early warning systems as real. I'm not claiming that it will necessarily have an easy time with this. But the point is that there are not that many steps here, and no single step by itself looks extremely unlikely once one has a smart entity (which, frankly, to my mind is the main issue here - I consider recursive self-improvement to be unlikely).

Comment author: kalla724 17 May 2012 09:40:19PM 1 point

We are trapped in an endless chain here. The computer would still somehow have to deduce that the Wikipedia entry that describes the One Ring is real, while the One Ring itself is not.

Comment author: Eliezer_Yudkowsky 17 May 2012 08:35:04PM 13 points

Comment author: kalla724 17 May 2012 09:38:27PM 1 point

My apologies, but this is something completely different.

The scenario takes human beings - who have a desire to escape the box, and who possess a theory of mind that allows them to conceive of notions such as "what are aliens thinking" or "deception", etc. - and puts them in the role of the AI.

What I'm looking for is a plausible mechanism by which an AI might spontaneously develop such abilities. How (and why) would an AI develop a desire to escape from the box? How (and why) would an AI develop a theory of mind? Absent a theory of mind, how would it ever be able to manipulate humans?

Comment author: JoshuaZ 17 May 2012 08:15:47PM 1 point

Well, not necessarily, but an entity that is much smarter than an autistic kid might notice that, especially if it has access to world history (or, heck, the many conversations on the internet about the horrible things that AIs do in fiction alone). It doesn't require much understanding of human history to realize that problems with early warning systems have almost started wars in the past.

Comment author: kalla724 17 May 2012 08:20:46PM 3 points

Yet again: ability to discern which parts of fiction accurately reflect human psychology.

An AI searches the internet. It finds a fictional account about early warning systems causing nuclear war. It finds discussions about this topic. It finds a fictional account about Frodo taking the Ring to Mount Doom. It finds discussions about this topic. Why does this AI dedicate its next 10^15 cycles to determining how to mess with the early warning systems, and not to determining how to create the One Ring to Rule Them All?

(Plus other problems mentioned in the other comments.)

Comment author: Mass_Driver 17 May 2012 07:55:51PM 1 point

One interesting wrinkle is that with enough bandwidth and processing power, you could attempt to manipulate thousands of people simultaneously before those people have any meaningful chance to discuss your 'conspiracy' with each other. In other words, suppose you discover a manipulation strategy that quickly succeeds 5% of the time. All you have to do is simultaneously contact, say, 400 people, and at least one of them will fall for it. There are a wide variety of valuable/dangerous resources that at least 400 people have access to. Repeat with hundreds of different groups of several hundred people, and an AI could equip itself with fearsome advantages in the minutes it would take for humanity to detect an emerging threat.
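
A quick sanity check of the arithmetic above, as a minimal sketch assuming each contact is an independent attempt (the 5% success rate and 400 targets are the hypothetical figures from the comment):

```python
# If a manipulation strategy succeeds 5% of the time and 400 people are
# contacted independently, how likely is at least one success?

p_success = 0.05   # per-person success rate (hypothetical figure from the comment)
n_targets = 400    # number of people contacted simultaneously

p_at_least_one = 1 - (1 - p_success) ** n_targets
expected_successes = p_success * n_targets

print(f"P(at least one success) = {p_at_least_one:.10f}")     # ~0.9999999988
print(f"Expected successes      = {expected_successes:.1f}")  # 20.0
```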

Note that the AI could also run experiments to determine which kinds of manipulations had a high success rate by attempting to deceive targets over unimportant / low-salience issues. If you discovered, e.g., that you had been tricked into donating $10 to a random mayoral campaign, you probably wouldn't call the SIAI to suggest a red alert.

Comment author: kalla724 17 May 2012 08:17:05PM 2 points

Doesn't work.

This requires the AI to already have the ability to comprehend what manipulation is, to develop a manipulation strategy of any kind (even one that will succeed 0.01% of the time), the ability to hide its true intent, the ability to understand that not hiding its true intent would be bad, and the ability to discern which issues are low-salience and which are high-salience for humans, all from the get-go. And many other things, actually, but this is already quite a list.

None of these abilities automatically "fall out" from an intelligent system either.

Comment author: JoshuaZ 17 May 2012 07:40:36PM *  3 points

> Especially given that human power games are often irrational.

So? As long as they follow minimally predictable patterns it should be ok.

> The U.S. has many more and smarter people than the Taliban. The bottom line is that the U.S. devotes a lot more output per man-hour to defeat a completely inferior enemy. Yet they are losing.

Bad analogy. In this case the Taliban has a large set of natural advantages, and the US has strong moral constraints and goal constraints (simply carpet-bombing the entire country isn't an option, for example).

> You are also not going to improve a conversation in your favor by improving each sentence for thousands of years. You will shortly hit diminishing returns. Especially since you lack the data to predict human opponents accurately.

This seems like an accurate and a highly relevant point. Searching a solution space faster doesn't mean one can find a better solution if it isn't there.

Comment author: kalla724 17 May 2012 08:14:39PM 3 points

> This seems like an accurate and a highly relevant point. Searching a solution space faster doesn't mean one can find a better solution if it isn't there.

Or if your search algorithm never accesses the relevant search space. A quantitative advantage in one system does not translate into a quantitative advantage in a qualitatively different system.

Comment author: JoshuaZ 17 May 2012 07:04:17PM 0 points

Let's do the most extreme case: the AI's controllers give it general internet access to do helpful research. So it gets to find out about general human behavior and what sort of deceptions have worked in the past. Many computer systems that shouldn't be online are online (this is true for the US and a few other governments). Some form of hacking of relevant early warning systems would then seem to be the most obvious line of attack. Historically, computer glitches have pushed us very close to nuclear war on multiple occasions.

Comment author: kalla724 17 May 2012 08:12:45PM 3 points

That is my point: it doesn't get to find out about general human behavior, not even from the Internet. It lacks the systems needed to contextualize human interactions - systems which have nothing to do with general intelligence.

Take a hugely mathematically capable autistic kid. Give him access to the internet. Will he develop the ability to recognize human interactions, understand human priorities, etc., to a sufficient degree that he recognizes that hacking an early warning system is the way to go?

Comment author: [deleted] 17 May 2012 07:40:58PM 3 points

> For an AI to develop these skills, it would somehow have to have access to information on how to communicate with humans; it would have to develop the concept of deception; a theory of mind; and establish methods of communication that would allow it to trick people into launching nukes. Furthermore, it would have to do all of this without trial communications and experimentation which would give away its goal.

I suspect the Internet contains more than enough info for a superhuman AI to develop a working knowledge of human psychology.

Comment author: kalla724 17 May 2012 08:09:30PM 2 points

Only if it has the skills required to analyze and contextualize human interactions. Otherwise, the Internet is a whole lot of gibberish.

Again, these skills do not automatically fall out of any intelligent system.
