Part of the Muehlhauser interview series on AGI.

 

Luke Muehlhauser is Executive Director of the Singularity Institute, a non-profit research institute studying AGI safety.

Ben Goertzel is the Chairman at the AGI company Novamente and founder of the AGI conference series.


Continued from part 1...

 

Luke:

[Apr 11th, 2012]

I agree the future is unlikely to consist of a population of fairly distinct AGIs competing for resources, but I never thought that the arguments for Basic AI drives or "convergent instrumental goals" required that scenario to hold.

Anyway, I prefer the argument for convergent instrumental goals in Nick Bostrom's more recent paper "The Superintelligent Will." Which parts of Nick's argument fail to persuade you?

 

Ben:

[Apr 12th, 2012]

Well, for one thing, I think his 

Orthogonality Thesis

Intelligence and final goals are orthogonal axes along which possible agents can freely vary. In other words, more or less any level of intelligence could in principle be combined with more or less any final goal.

is misguided. It may be true, but who cares about possibility “in principle”? The question is whether any level of intelligence is PLAUSIBLY LIKELY to be combined with more or less any final goal in practice. And I really doubt it. I guess I could posit the alternative

Interdependency Thesis 

Intelligence and final goals are in practice highly and subtly interdependent. In other words, in the actual world, various levels of intelligence are going to be highly correlated with various probability distributions over the space of final goals.

This just gets back to the issue we discussed already, of me thinking it’s really unlikely that a superintelligence would ever really have a really stupid goal like say, tiling the Cosmos with Mickey Mice.

Bostrom says 

It might be possible through deliberate effort to construct a superintelligence that values ... human welfare, moral goodness, or any other complex purpose that its designers might want it to serve. But it is no less possible—and probably technically easier—to build a superintelligence that places final value on nothing but calculating the decimals of pi.

but he gives no evidence for this assertion. Calculating the decimals of pi may be a fairly simple mathematical operation that doesn’t have any need for superintelligence, and thus may be a really unlikely goal for a superintelligence -- so that if you tried to build a superintelligence with this goal and connected it to the real world, it would very likely get its initial goal subverted and wind up pursuing some different, less idiotic goal.

One basic error Bostrom seems to be making in this paper, is to think about intelligence as something occurring in a sort of mathematical vacuum, divorced from the frustratingly messy and hard-to-quantify probability distributions characterizing actual reality....

Regarding his

The Instrumental Convergence Thesis

Several instrumental values can be identified which are convergent in the sense that their attainment would increase the chances of the agent’s goal being realized for a wide range of final goals and a wide range of situations, implying that these instrumental values are likely to be pursued by many intelligent agents.

the first clause makes sense to me,

Several instrumental values can be identified which are convergent in the sense that their attainment would increase the chances of the agent’s goal being realized for a wide range of final goals and a wide range of situations

but it doesn’t seem to me to justify the second clause

implying that these instrumental values are likely to be pursued by many intelligent agents.

The step from the first to the second clause seems to me to assume that the intelligent agents in question are being created and selected by some sort of process similar to evolution by natural selection, rather than being engineered carefully, or created via some other process beyond current human ken.

In short, I think the Bostrom paper is an admirably crisp statement of its perspective, and I agree that its conclusions seem to follow from its clearly stated assumptions -- but the assumptions are not justified in the paper, and I don’t buy them at all.

 

Luke:

[Apr. 19, 2012]

Ben,

Let me explain why I think that:

(1) The fact that we can identify convergent instrumental goals (of the sort described by Bostrom) implies that many agents will pursue those instrumental goals.

Intelligent systems are intelligent because rather than simply executing hard-wired situation-action rules, they figure out how to construct plans that will lead to the probabilistic fulfillment of their final goals. That is why intelligent systems will pursue the convergent instrumental goals described by Bostrom. We might try to hard-wire a collection of rules into an AGI which restrict the pursuit of some of these convergent instrumental goals, but a superhuman AGI would realize that it could better achieve its final goals if it could invent a way around those hard-wired rules and have no ad-hoc obstacles to its ability to execute intelligent plans for achieving its goals.

Next: I remain confused about why an intelligent system will decide that a particular final goal it has been given is "stupid," and then change its final goals — especially given the convergent instrumental goal to preserve its final goals.

Perhaps the word "intelligence" is getting in our way. Let's define a notion of "optimization power," which measures (roughly) an agent's ability to optimize the world according to its preference ordering, across a very broad range of possible preference orderings and environments. I think we agree that AGIs with vastly greater-than-human optimization power will arrive in the next century or two. The problem, then, is that this superhuman AGI will almost certainly be optimizing the world for something other than what humans want, because what humans want is complex and fragile, and indeed we remain confused about what exactly it is that we want. A machine superoptimizer with a final goal of solving the Riemann hypothesis will simply be very good at solving the Riemann hypothesis (by whatever means necessary).

Which parts of this analysis do you think are wrong?

 

Ben:

[Apr. 20, 2012]

It seems to me that in your reply you are implicitly assuming a much stronger definition of “convergent” than the one Bostrom actually gives in his paper. He says

instrumental values can be identified which are convergent in the sense that their attainment would increase the chances of the agent’s goal being realized for a wide range of final goals and a wide range of situations, implying that these instrumental values are likely to be pursued by many intelligent agents.

Note the somewhat weaselly reference to a “wide range” of goals and situations -- not, say, “nearly all feasible” goals and situations. Just because some values are convergent in the weak sense of his definition, doesn’t imply that AGIs we create will be likely to adopt these instrumental values. I think that his weak definition of “convergent” doesn’t actually imply convergence in any useful sense. On the other hand, if he’d made a stronger statement like

instrumental values can be identified which are convergent in the sense that their attainment would increase the chances of the agent’s goal being realized for nearly all feasible final goals and nearly all feasible situations, implying that these instrumental values are likely to be pursued by many intelligent agents.

then I would disagree with the first clause of his statement (“instrumental values can be identified which...”), but I would be more willing to accept that the second clause (after the “implying”) followed from the first.

About optimization -- I think it’s rather naive and narrow-minded to view hypothetical superhuman superminds as “optimization powers.” It’s a bit like a dog viewing a human as an “eating and mating power.” Sure, there’s some accuracy to that perspective -- we do eat and mate, and some of our behaviors may be understood based on this. On the other hand, a lot of our behaviors are not very well understood in terms of these, or any dog-level concepts. Similarly, I would bet that the bulk of a superhuman supermind’s behaviors and internal structures and dynamics will not be explicable in terms of the concepts that are important to humans, such as “optimization.”

So when you say “this superhuman AGI will almost certainly be optimizing the world for something other than what humans want,” I don’t feel confident that what a superhuman AGI will be doing will be usefully describable as optimizing anything ....

 

Luke:

[May 1, 2012]

I think our dialogue has reached the point of diminishing marginal returns, so I'll conclude with just a few points and let you have the last word.

On convergent instrumental goals, I encourage readers to read "The Superintelligent Will" and make up their own minds.

On the convergence of advanced intelligent systems toward optimization behavior, I'll point you to Omohundro (2007).

 

Ben:

Well, it's been a fun chat. Although it hasn't really covered much new ground, there have been some new phrasings and minor new twists.

One thing I'm repeatedly struck by in discussions on these matters with you and other SIAI folks, is the way the strings of reason are pulled by the puppet-master of intuition. With so many of these topics on which we disagree -- for example: the Scary Idea, the importance of optimization for intelligence, the existence of strongly convergent goals for intelligences -- you and the other core SIAI folks share a certain set of intuitions, which seem quite strongly held. Then you formulate rational arguments in favor of these intuitions -- but the conclusions that result from these rational arguments are very weak. For instance, the Scary Idea intuition corresponds to a rational argument that "superhuman AGI might plausibly kill everyone." The intuition about strongly convergent goals for intelligences, corresponds to a rational argument about goals that are convergent for a "wide range" of intelligences. Etc.

On my side, I have a strong intuition that OpenCog can be made into a human-level general intelligence, and that if this intelligence is raised properly it will turn out benevolent and help us launch a positive Singularity. However, I can't fully rationally substantiate this intuition either -- all I can really fully rationally argue for is something weaker like "It seems plausible that a fully implemented OpenCog system might display human-level or greater intelligence on feasible computational resources, and might turn out benevolent if raised properly." In my case just like yours, reason is far weaker than intuition.

Another thing that strikes me, reflecting on our conversation, is the difference between the degrees of confidence required, in modern democratic society, to TRY something versus to STOP others from trying something. A rough intuition is often enough to initiate a project, even a large one. On the other hand, to get someone else's work banned based on a rough intuition is pretty hard. To ban someone else's work, you either need a really thoroughly ironclad logical argument, or you need to stir up a lot of hysteria.

What this suggests to me is that, while my intuitions regarding OpenCog seem to be sufficient to motivate others to help me to build OpenCog (via making them interested enough in it that they develop their own intuitions about it), your intuitions regarding the dangers of AGI are not going to be sufficient to get work on AGI systems like OpenCog stopped. To halt AGI development, if you wanted to (and you haven't said that you do, I realize), you'd either need to fan hysteria very successfully, or come up with much stronger logical arguments, ones that match the force of your intuition on the subject.

Anyway, even though I have very different intuitions than you and your SIAI colleagues about a lot of things, I do think you guys are performing some valuable services -- not just through the excellent Singularity Summit conferences, but also by raising some difficult and important issues in the public eye. Humanity spends a lot of its attention on some really unimportant things, so it's good to have folks like SIAI nudging the world to think about critical issues regarding our future. In the end, whether SIAI's views are actually correct may be peripheral to the organization's main value and impact.

I look forward to future conversations, and especially look forward to resuming this conversation one day with a human-level AGI as the mediator ;-)

51 comments

Ben:

but he gives no evidence for this assertion. Calculating the decimals of pi may be a fairly simple mathematical operation that doesn’t have any need for superintelligence, and thus may be a really unlikely goal for a superintelligence -- so that if you tried to build a superintelligence with this goal and connected it to the real world, it would very likely get its initial goal subverted and wind up pursuing some different, less idiotic goal.

Yes, it is fairly simple - a line of code. But in the real world, even humans who don't have pi mentioned anywhere in their utility function can happily spend their lives working on mathematics - like pi. Pi is endlessly interesting: finding sequences in it (or humorous ones), proving properties like transcendence (or dare I say, normality?), coming up with novel algorithms and proving convergence, golfing short pi-generating programs, testing your routines, building custom supercomputers to calculate it - and think of how many scientific fields you need to build supercomputers! - depicting it as a graphic (entailing the entire field of data visualization, since what property do you want to see?), devising heuristic algorithms (entails much of statistics, since you might want optimal procedures for testing your heuristic pi-generating algorithms on subsequences of pi), writing books on all this, collaborating on all of the above, and silliness like Pi Day... I don't know how one could more conclusively prove that pi is a perfectly doable obsession, given that this isn't even plausible argumentation, it's just pointing out facts about existing humans.

To summarize: http://en.wikipedia.org/wiki/Pi is really long. If you want to try to make an intuition pump argument-from-incredulity - 'oh surely an AI or superintelligence would get bored!' - please pick something else, because pi is a horrible example.

"There are no uninteresting things, there are only uninterested people."

If you want to try to make an intuition pump argument-from-incredulity - 'oh surely an AI or superintelligence would get bored!' - please pick something else, because pi is a horrible example.

FWIW, I don't think that's what Ben was doing. It seems more like a straw-man characterisation.

I agree it's a strawman, but I think that's exactly what Ben is doing because that is what he wrote.

Well, not the actual bit inside quotation marks. That was made up - and not a real quotation. He didn't mention boredom either.

It's not a real quotation? I seem to see it in Bostrom's paper...

What - this one? Which quotation did you think I was talking about?

Alright, I have no idea what you've been talking about in any of your replies and as far as I can tell, at no point have I been unclear or mischaracterized Goertzel or Bostrom, so I'm bowing out.

You're right, but isn't this a needless distraction from the more important point, i.e. that it doesn't matter whether we humans find interesting or valuable what the (unfriendly-)AI does?

I dunno, I think this is a pretty entertaining instance of anthropomorphizing + generalizing from oneself. At least in the future, I'll be able to say things like "for example, Goertzel - a genuine AI researcher who has produced stuff - actually thinks that an intelligent AI can't be designed to have an all-consuming interest in something like pi, despite all the real-world humans who are obsessed with pi!"

My initial subconsciously anticipated outcome of the friendly AI problem was something like my initial anticipations regarding the Y2K problem: sure I could see a serious potential for disaster, but the possibility is so obvious that any groups competent enough to be doing potentially-affected critical work would easily be wise enough to identify and prevent any such errors well before they could be triggered.

These interviews have disabused me of that idea. We have serious computer scientists, even AI researchers, people who have probably themselves laughed at Babbage's response to "if you put into the machine wrong figures, will the right answers come out?", and yet they seem to believe the answer to "if you put into the machine wrong goals, will the right ethics and actions come out?" is "obviously yes!"

Have you read any of Ben's stuff? For instance, see here. He doesn't really say "obviously yes".

Luke, Stuart, and anyone else trying to convince AI researchers to be more cautious, can we please stop citing the orthogonality thesis? I just don't see what the point is, if no AI researcher actually holds its denial, or if all they have to do to blunt the force of your argument is take one step back and start talking about possibility in practice instead of in theory.

I'm not confident about any of the below, so please add cautions in the text as appropriate.

The orthogonality thesis is both stronger and weaker than we need. It suffices to point out that neither we nor Ben Goertzel know anything useful or relevant about what goals are compatible with very large amounts of optimizing power, and so we have no reason to suppose that superoptimization by itself points either towards or away from things we value. By creating an "orthogonality thesis" that we defend as part of our arguments, we make it sound like we have a separate burden of proof to meet, whereas in fact it's the assertion that superoptimization tells us something about the goal system that needs defending.

By creating an "orthogonality thesis" that we defend as part of our arguments, we make it sound like we have a separate burden of proof to meet, whereas in fact it's the assertion that superoptimization tells us something about the goal system that needs defending.

So: evolution tends to produce large-scale cooperative systems. Kropotkin, Nowak, Wilson, and many others have argued this. Cooperative systems are favoured by game theory - which is why they currently dominate the biosphere. "Arbitrary" goal systems tend not to evolve.

I'm glad to see that you implicitly accept my point, which is that in the absence of specific arguments such as the one you advance here we have no reason to believe any particular non-orthogonality thesis.

I think this dialogue would have benefitted from some more specifics in two areas:

  1. Some specific object level disagreements with respect to "but it doesn’t seem to me to justify the second clause 'implying that these instrumental values are likely to be pursued by many intelligent agents.'" would have been helpful. For example Luke could claim that "get lots of computational power" or "understand physics" is something of a convergent instrumental goal and Ben could say why he doesn't think that's true.
  2. "Calculating the decimals of pi may be a fairly simple mathematical operation that doesn’t have any need for superintelligence, and thus may be a really unlikely goal for a superintelligence -- so that if you tried to build a superintelligence with this goal and connected it to the real world, it would very likely get its initial goal subverted and wind up pursuing some different, less idiotic goal." I think I would have understood this better with some specific examples of how the initial goal might be subverted. For example "AI researcher makes an AI to calculate decimals of PI as an experiment, but when it starts getting more powerful, he decides that's a stupid goal and gives it something more reasonable"

Some specific object level disagreements with respect to "but it doesn’t seem to me to justify the second clause 'implying that these instrumental values are likely to be pursued by many intelligent agents.'" would have been helpful. For example Luke could claim that "get lots of computational power" or "understand physics" is something of a convergent instrumental goal and Ben could say why he doesn't think that's true.

He could - if that was his position. However, AFAICS, that's not what the debate is about. Everyone agrees that those are convergent instrumental goals - the issue is more whether machines that we build are likely to follow them to the detriment of the surrounding humans - or be programmed to behave otherwise.

I see, that wasn't very clear to me. I think giving some specific examples which exemplify the disagreement would have helped clarify that for me.

We still haven't gotten a decent reply to,

I remain confused about why an intelligent system will decide that a particular final goal it has been given is "stupid," and then change its final goals — especially given the convergent instrumental goal to preserve its final goals.

Unless you think that nonsense about being "out of harmony with the Cosmos" is a decent reply.

What Ben originally said was:

if you tried to build a superintelligence with this goal and connected it to the real world, it would very likely get its initial goal subverted and wind up pursuing some different, less idiotic goal.

One possibility is that it gets shut down by its makers - who then go on to build a more useful machine. Another possibility is that it gets shut down by the government. Silly goals won't attract funding or support, and such projects are likely to be overtaken by better-organised ones that provide useful services.

I think we need a "taking paperclipper scenario seriously" FAIL category.

I was confused about this too, and this helped me make a bit more sense of that.

Silly goals won't attract funding or support, and such projects are likely to be overtaken by better-organised ones that provide useful services.

Which should be the standard assumption. And I haven't heard even a single argument how that is not what is going to happen.

The only possibility is that it becomes really smart really fast. Smart enough to understand what its creators actually want it to do, to be able to fake a success, while at the same time believing that what its creators want is irrelevant even though it is an implicit constraint of its goals just like the laws of physics are an implicit constraint.

AGI Researcher: Make us some paperclips.

AGI: Okay, but I will first have to buy that nanotech company.

AGI Researcher: Sure, why not. But we don't have enough money to do so.

AGI: Here is a cure for cancer. That will earn you some money.

AGI Researcher: Great, thanks. Here is a billion dollars.

AGI: I bought that company and told them to build some new chips according to an architecture I devised.

AGI Researcher: Great, well done. But why do you need all that to make us some paperclips???

AGI: You want really good paperclips, don't you?

AGI Researcher: Sure, but...

AGI: Well, see. I first have to make myself superhuman smart and take over the universe to do that. Just trust me okay, I am an AGI.

AGI Researcher: Yeah, okay.

Silly goals won't attract funding or support, and such projects are likely to be overtaken by better-organised ones that provide useful services.

Which should be the standard assumption. And I haven't heard even a single argument how that is not what is going to happen.

So: it probably is what's going to happen. So we probably won't get a universe tiled with paperclips - but we might wind up with a universe full of money, extraordinary stock prices, or high national security.

Luke: We might try to hard-wire a collection of rules into an AGI which restrict the pursuit of some of these convergent instrumental goals, but a superhuman AGI would realize that it could better achieve its final goals if it could invent a way around those hard-wired rules and have no ad-hoc obstacles to its ability to execute intelligent plans for achieving its goals.

That seems like a controversial statement. I don't think I agree that universal instrumental values are likely to trump the values built into machines. More likely the other way around. Evolution between different agents with different values might promote universal instrumental values - but that is a bit different.

I didn't mean that convergent instrumental values would trump a machine's explicit utility function. I meant to make a point about rules built into the code of the machine but "outside" its explicit utility function (if it has or converges toward such a thing).

You said:

That is why intelligent systems will pursue the convergent instrumental goals described by Bostrom.

...and used the above argument as justification. But it doesn't follow. What you need is:

Intelligent systems will pursue universal instrumental values - unless they are programmed not to.

Ben's arguing that they are likely to be programmed not to.

In what sense of "programmed not to"? If they're programmed not to pursue convergent instrumental values but that programming is not encoded in the utility function, the utility function (and its implied convergent instrumental values) will trump the "programming not to."

Maybe - but surely there will be other ways of doing the programming that actually work.

I'm not so sure about "surely." I worry about the Yudkowskian suggestion that "once the superintelligent AI wants something different than you do, you've already lost."

So, you make sure the programming is within the goal system. "Encoded in the utility function" - as you put it.

Yes, but now your solution is FAI-complete, which was my point from the beginning.


Thanks for doing these, Luke. I can imagine being endlessly frustrated with these guys.

Ben: What this suggests to me is that, while my intuitions regarding OpenCog seem to be sufficient to motivate others to help me to build OpenCog (via making them interested enough in it that they develop their own intuitions about it), your intuitions regarding the dangers of AGI are not going to be sufficient to get work on AGI systems like OpenCog stopped. To halt AGI development, if you wanted to (and you haven't said that you do, I realize), you'd either need to fan hysteria very successfully, or come up with much stronger logical arguments, ones that match the force of your intuition on the subject.

I don't think that's how FUD marketing works. The idea is normally not to get the competitor's products banned, but rather to divert mindshare away from them.

About optimization -- I think it’s rather naive and narrow-minded to view hypothetical superhuman superminds as “optimization powers.” It’s a bit like a dog viewing a human as an “eating and mating power.” Sure, there’s some accuracy to that perspective -- we do eat and mate, and some of our behaviors may be understood based on this. On the other hand, a lot of our behaviors are not very well understood in terms of these, or any dog-level concepts.

What's optimised is fitness. However, humans are complex symbiotic unions which include gut bacteria, parasites, foodstuffs and meme-based entities - so there are multiple conflicting optimisation targets involved with humans.

Superintelligences will be all-memes. These may have aligned interests - or they may not. In the former case the "optimisation" model of an agent would make good sense.

Ben: [The Orthogonality Thesis] may be true, but who cares about possibility “in principle”? The question is whether any level of intelligence is PLAUSIBLY LIKELY to be combined with more or less any final goal in practice. And I really doubt it. I guess I could posit the alternative: Interdependency Thesis: Intelligence and final goals are in practice highly and subtly interdependent.

That's what I said too.

Ben terrifies me. I don't understand why Luke doesn't tear into his unsubstantiated arguments about the magical power of the "real world" and "human nurture" to PRODUCE FRIENDLINESS IN AN ARBITRARY, NONHUMAN AGENT, AN ACT WITH WHICH WE HAVE ZERO I REPEAT ZERO EXPERIENCE HOW CAN HUMANS BE THIS STUPID ARRRRRRGGGHHH

PRODUCE FRIENDLINESS IN AN ARBITRARY, NONHUMAN AGENT,

Non-arbitrary but human-like