Comment author: Stuart_Armstrong 10 December 2012 05:27:03PM 10 points

Thanks for your answer, Ben!

First of all, all of these methods involve integrating the AGI into human society. So the AGI is forming its values, at least in part, through doing something (possibly talking) and getting a response from some human. That human will be interpreting the AGI's answers, and selecting the right response, using their own theory of the AGI's mind - nearly certainly an anthropomorphisation! Even if that human develops experience dealing with the AGI, their understanding will be limited (just as our understanding of other humans is limited, except worse than that).

So the AGI programmer is taking a problem that they can't solve through direct coding, and putting the AGI through interactions so that it will acquire the values that the programmer can't specify directly, in settings where the other interactors will be prone to anthropomorphisation.

i.e.: "I can't solve this problem formally, but I do understand its structure well enough to be reasonably sure that anthropomorphic interactions will solve it".

If that's the claim, I would expect the programmer to be well schooled in the properties and perils of anthropomorphisation, and to cast their arguments, as much as possible, in formal logic or code form. For instance, if we want the AGI to "love" us: what kind of behaviour would we expect this to entail, and why would this code acquire that behaviour from these interactions? If you couldn't use the word love, or any close synonyms, could you still describe the process and show that it will perform well? If you can't describe love without saying "love", then you are counting on a shared non-formalised human understanding of what love is, and hoping that the AGI will stumble upon the same understanding - you don't know the contours of the definition, and the potential pitfalls, but you're counting on the AGI to avoid them.

Take those four types of behaviour that I mentioned there, and that we need to separate - don't just decry the use of anthropomorphisation in the description, but say which parts of the OpenCog system will be used to distinguish between them, and to select the friendly behaviour rather than the others. You know how your system works - reassure me! :-)

Comment author: Bgoertzel 10 December 2012 06:19:25PM *  2 points

Stuart -- Yeah, the line of theoretical research you suggest is worthwhile....

However, it's worth noting that I and the other OpenCog team members are pressed for time, and have a lot of concrete OpenCog work to do. It would seem none of us really feels like taking a lot of time, at this stage, to carefully formalize arguments about what the system is likely to do in various situations once it's finished. We're too consumed with trying to finish the system, which is a long and difficult task in itself...

I will try to find some time in the near term to sketch a couple example arguments of the type you request... but it won't be today...

As a very rough indication for the moment... note that OpenCog has explicit Goal Node objects in its AtomSpace knowledge store, and then one can look at the explicit probabilistic ImplicationLinks pointing to these GoalNodes from various combinations of contexts and actions. So one can actually look, in principle, at the probabilistic relations between (context, action) pairs and goals that OpenCog is using to choose actions.

Now, for a quite complex OpenCog system, it may be hard to understand what all these probabilistic relations mean. But for a young OpenCog doing simple things, it will be easier. So one would want to validate, for a young OpenCog doing simple things, that the information in the system's AtomSpace is compatible with behaviour 1 rather than behaviours 2-4.... One would then want to validate that, as the system gets more mature and does more complex things, there is not a trend toward more of 2-4 and less of 1 ....
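To make the structure Ben describes concrete, here is a minimal Python sketch - hypothetical names and structure only, not the actual OpenCog API - of an action chooser that consults probabilistic implication links from (context, action) pairs to goal nodes:

```python
# Hypothetical sketch, NOT the real OpenCog API: a toy "AtomSpace"
# holding probabilistic implications from (context, action) pairs
# to goal nodes, plus an action chooser that inspects them.

# Each entry models an ImplicationLink: (context, action) -> goal,
# with a probability that the action achieves the goal in that context.
implications = {
    ("owner_present", "greet"):     {"goal_please_owner": 0.8},
    ("owner_present", "ignore"):    {"goal_please_owner": 0.1},
    ("owner_absent",  "tidy_room"): {"goal_please_owner": 0.6},
}

def choose_action(context, goal, implications):
    """Pick the action whose implication link gives `goal` the
    highest probability in `context` (None if no link applies)."""
    candidates = {
        action: goals[goal]
        for (ctx, action), goals in implications.items()
        if ctx == context and goal in goals
    }
    if not candidates:
        return None
    return max(candidates, key=candidates.get)

print(choose_action("owner_present", "goal_please_owner", implications))
# -> greet
```

Because these (context, action) -> goal relations are explicit data rather than opaque weights, one could in principle audit them, as Ben suggests, to check which of Stuart's four behaviour types the young system actually encodes.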

Interesting line of thinking indeed! ...

Comment author: nigerweiss 10 December 2012 01:38:29PM 3 points

I would only trust this strategy with hyper-neuromorphic artificial intelligence. And that's unlikely to FOOM uncontrollably anyway. In general, the applicability of such a strategy depends on the structure of the AI, but the line at which it might be applicable is a tiny hyperbubble in mind space centered around humans. Anything more alien than that, and it's a profoundly naive idea.

Comment author: Bgoertzel 10 December 2012 04:03:48PM 1 point

Thanks for sharing your personal feeling on this matter. However, I'd be more interested if you had some sort of rational argument in favor of your position!

The key issue is the tininess of the hyperbubble you describe, right? Do you have some sort of argument regarding some specific estimate of the measure of this hyperbubble? (And do you have some specific measure on mindspace in mind?)

To put it differently: What are the properties you think a mind needs to have, in order for the "raise a nice baby AGI" approach to have a reasonable chance of effectiveness? Which are the properties of the human mind that you think are necessary for this to be the case?

Comment author: Bgoertzel 10 December 2012 03:59:16PM 10 points

Stuart: The majority of people proposing the "bringing up baby AGI" approach to encouraging AGI ethics are NOT making the kind of naive cognitive error you describe here. This approach to AGI ethics is not founded on naive anthropomorphism. Rather, it is based on having a mix of intuitive and rigorous understanding of the AGI architectures in question - the ones that will be taught ethics.

For instance, my intuition is that if we taught an OpenCog system to be loving and ethical, then it would very likely be so, according to broad human standards. This intuition is NOT based on naively anthropomorphizing OpenCog systems, but rather based on my understanding of the actual OpenCog architecture (which has many significant differences from the human cognitive architecture).

No one, so far as I know, claims to have an airtight PROOF that this kind of approach to AGI ethics will work. However, the intuition that it will work is based largely on understanding of the specifics of the AGI architectures in question, not just on anthropomorphism.

If you want to counter-argue against this approach, you should argue about it in the context of the specific AGI architectures in question. Or else you should present some kind of principled counter-argument. Just claiming "anthropomorphism" isn't very convincing.

Comment author: pjeby 01 November 2010 06:14:25PM *  30 points

No. It's really complex, and nobody in-the-know had time to really spell it out like that.

Actually, you can spell out the argument very briefly. Most people, however, will immediately reject one or more of the premises due to cognitive biases that are hard to overcome.

A brief summary:

  • Any AI that's at least as smart as a human and is capable of self-improving, will improve itself if that will help its goals

  • The preceding statement applies recursively: the newly-improved AI, if it can improve itself and expects that such improvement will help its goals, will continue to do so.

  • At minimum, this means any AI as smart as a human, can be expected to become MUCH smarter than human beings -- probably smarter than all of the smartest minds the entire human race has ever produced, combined, without even breaking a sweat.
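The recursion in these bullets compounds. As a deliberately crude toy model - the numbers are made up, and real self-improvement need not be a fixed multiplier - even a modest gain per cycle stacks up quickly:

```python
# Toy model (made-up numbers, not a forecast): if each
# self-improvement cycle multiplies capability by a modest
# constant factor, capability compounds rapidly.
capability = 1.0          # human-level baseline (arbitrary units)
improvement_factor = 1.1  # assumed 10% gain per cycle
for cycle in range(50):
    capability *= improvement_factor
print(round(capability, 1))  # -> 117.4, i.e. ~100x baseline
```

The point is not the specific curve, just that "improves itself, then improves itself again" is multiplicative, not additive.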

INTERLUDE: This point, by the way, is where people's intuition usually begins rebelling, either due to our brains' excessive confidence in themselves, or because we've seen too many stories in which some indefinable "human" characteristic is still somehow superior to the cold, unfeeling, uncreative Machine... i.e., we don't understand just how our intuition and creativity are actually cheap hacks to work around our relatively low processing power -- dumb brute force is already "smarter" than human beings in any narrow domain (see Deep Blue, evolutionary algorithms for antenna design, Emily Howell, etc.), and a human-level AGI can reasonably be assumed capable of programming up narrow-domain brute forcers for any given narrow domain.

And it doesn't even have to be that narrow or brute: it could build specialized Eurisko-like solvers, and manage them at least as intelligently as Lenat did to win the Traveller tournaments.

In short, human beings have a vastly inflated opinion of themselves, relative to AI. An AI only has to be as smart as a good human programmer (while running at a higher clock speed than a human) and have access to lots of raw computing resources, in order to be capable of out-thinking the best human beings.

And that's only one possible way to get to ridiculously superhuman intelligence levels... and it doesn't require superhuman insights for an AI to achieve, just human-level intelligence and lots of processing power.

The people who reject the FAI argument are the people who, for whatever reason, can't get themselves to believe that a machine can go from being as smart as a human, to massively smarter in a short amount of time, or who can't accept the logical consequences of combining that idea with a few additional premises, like:

  • It's hard to predict the behavior of something smarter than you

  • Actually, it's hard to predict the behavior of something different than you: human beings do very badly at guessing what other people are thinking, intending, or are capable of doing, despite the fact that we're incredibly similar to each other.

  • AIs, however, will be much smarter than humans, and therefore very "different", even if they are otherwise exact replicas of humans (e.g. "ems").

  • Greater intelligence can be translated into greater power to manipulate the physical world, through a variety of possible means. Manipulating humans to do your bidding, coming up with new technologies, or just being more efficient at resource exploitation... or something we haven't thought of. (Note that pointing out weaknesses in individual pathways here doesn't kill the argument: there is more than one pathway, so you'd need a general reason why more intelligence doesn't ever equal more power. Humans seem like a counterexample to any such general reason, though.)

  • You can't control what you can't predict, and what you can't control is potentially dangerous. If there's something you can't control, and it's vastly more powerful than you, you'd better make sure it gives a damn about you. Ants get stepped on, because most of us don't care very much about ants.

Note, by the way, that this means that indifference alone is deadly. An AI doesn't have to want to kill us, it just has to be too busy thinking about something else to notice when it tramples us underfoot.

This is another inferential step that is dreadfully counterintuitive: it seems to our brains that of course an AI would notice, of course it would care... what's more important than human beings, after all?

But that happens only because our brains are projecting themselves onto the AI -- seeing the AI thought process as though it were a human. Yet, the AI only cares about what it's programmed to care about, explicitly or implicitly. Humans, OTOH, care about a ton of individual different things (the LW "a thousand shards of desire" concept), which we like to think can be summarized in a few grand principles.

But being able to summarize the principles is not the same thing as making the individual cares ("shards") be derivable from the general principle. That would be like saying that you could take Aristotle's list of what great drama should be, and then throw it into a computer and have the computer write a bunch of plays that people would like!

To put it another way, the sort of principles we like to use to summarize our thousand shards are just placeholders and organizers for our mental categories -- they are not the actual things we care about... and unless we put those actual things in to an AI, we will end up with an alien superbeing that may inadvertently wipe out things we care about, while it's busy trying to do whatever else we told it to do... as indifferently as we step on bugs when we're busy with something more important to us.

So, to summarize: the arguments are not that complex. What's complex is getting people past the part where their intuition reflexively rejects both the premises and the conclusions, and tells their logical brains to make up reasons to justify the rejection, post hoc, or to look for details to poke holes in, so that they can avoid looking at the overall thrust of the argument.

While my summation here of the anti-Foom position is somewhat unkindly phrased, I have to assume that it is the truth, because none of the anti-Foomers ever seem to actually address any of the pro-Foomer arguments or premises. AFAICT (and I am not associated with SIAI in any way, btw, I just wandered in here off the internet, and was around for the earliest Foom debates on OvercomingBias.com), the anti-Foom arguments always seem to consist of finding ways to never really look too closely at the pro-Foom arguments at all, and instead making up alternative arguments that can be dismissed or made fun of, or arguing that things shouldn't be that way, and therefore the premises should be changed.

That was a pretty big convincer for me that the pro-Foom argument was worth looking more into, as the anti-Foom arguments seem to generally boil down to "la la la I can't hear you".

Comment author: Bgoertzel 02 November 2010 01:45:37AM 18 points

So, are you suggesting that Robin Hanson (who is on record as not buying the Scary Idea) -- the current owner of the Overcoming Bias blog, and Eli's former collaborator on that blog -- fails to buy the Scary Idea "due to cognitive biases that are hard to overcome." I find that a bit ironic.

Like Robin and Eli and perhaps yourself, I've read the heuristics and biases literature also. I'm not so naive as to make judgments about huge issues, that I think about for years of my life, based strongly on well-known cognitive biases.

It seems more plausible to me to assert that many folks who believe the Scary Idea are having their judgment warped by plain old EMOTIONAL bias -- i.e. stuff like "fear of the unknown", and "the satisfying feeling of being part of a self-congratulatory in-crowd that thinks it understands the world better than everyone else", and the well known "addictive chemical high of righteous indignation", etc.

Regarding your final paragraph: Is your take on the debate between Robin and Eli about "Foom" that all Robin was saying boils down to "la la la I can't hear you" ? If so I would suggest that maybe YOU are the one with the (metaphorical) hearing problem ;p ....

I think there's a strong argument that the truth value of "Once an AGI is at the level of a smart human computer scientist, hard takeoff is likely" is significantly above zero. No assertion stronger than that seems to me to be convincingly supported by any of the arguments made on Less Wrong or Overcoming Bias or any of Eli's prior writings.

Personally, I actually do strongly suspect that once an AGI reaches that level, a hard takeoff is extremely likely unless the AGI has been specifically inculcated with goal content working against this. But I don't claim to have a really compelling argument for this. I think we need a way better theory of AGI before we can frame such arguments compellingly. And I think that theory is going to emerge after we've experimented with some AGI systems that are fairly advanced, yet well below the "smart computer scientist" level.

Comment author: Bgoertzel 02 November 2010 01:30:38AM 12 points

I agree that a write-up of SIAI's argument for the Scary Idea, in the manner you describe, would be quite interesting to see.

However, I strongly suspect that when the argument is laid out formally, what we'll find is that

-- given our current knowledge about the pdfs of the premises in the argument, the pdf on the conclusion is verrrrrrry broad, i.e. we can hardly conclude anything with much of any confidence ...

So, I think that the formalization will lead to the conclusion that

-- "we can NOT confidently say, now, that: Building advanced AGI without a provably Friendly design will almost certainly lead to bad consequences for humanity"

-- "we can also NOT confidently say, now, that: Building advanced AGI without a provably Friendly design will almost certainly NOT lead to bad consequences for humanity"

I.e., I strongly suspect the formalization

-- will NOT support the Scary Idea

-- will also not support complacency about AGI safety and AGI existential risk

I think the conclusion of the formalization exercise, if it's conducted, will basically be to reaffirm common sense, rather than to bolster extreme views like the Scary Idea....

-- Ben Goertzel

Comment author: Wei_Dai 13 August 2009 08:11:33AM 1 point

Why do you insist on making life harder on yourself?

I thought it might be interesting to sketch the outline of a possible solution to the level 4 multiverse decision problem, so people can get a sense of how much work is left to be done (i.e., a lot). This is also a subject that I've been interested in for a long time, so I couldn't resist bringing it up.

Anyway, I gave 2 other examples with simple world models. Can you suggest more simple models that I should test this theory with?

Comment author: Bgoertzel 02 May 2010 06:11:58PM 2 points

I have thought a bit about these decision theory issues lately and my ideas seem somewhat similar to yours though not identical; see

http://goertzel.org/CounterfactualReprogrammingDecisionTheory.pdf

if you're curious...

-- Ben Goertzel