My previous article on this topic went down like a server running on PHP (quite deservedly, I might add). You can all rest assured that I won't be attempting any clickbait titles again for the foreseeable future. I also believe that the H+ article is written in a very poor and aggressive manner, but some of the arguments it raises cannot be ignored.


On my original article, many people raised this post by Eliezer Yudkowsky as a counterargument to the idea that an FAI could have goals contrary to what we programmed. In summary, he argues that a program doesn't necessarily do what the programmer wishes, but rather what they have programmed. In this sense, there is no ghost in the machine that interprets your commands and acts accordingly; it can act only as you have designed it to. From this he argues that an FAI can only act as we have programmed it.


I personally think this argument completely ignores what has made AI research so successful in recent years: machine learning. We are no longer designing an AI from scratch and then implementing it; we are creating a seed program which learns from its situation and alters its own code with no human intervention, i.e. the machines are starting to write themselves, e.g. with Google's DeepMind. They are effectively evolving, and we are starting to find ourselves in the rather concerning position of not fully understanding our own creations.


You could simply say, as someone did in the comments of my previous post, that if X represents the goal of having a positive effect on humanity, then the FAI should be programmed directly to have X as its primary directive. My answer to that is that the most promising developments have been through imitating the human brain, and we have no reason to believe that the human brain (or any other brain for that matter) can be guaranteed to have a primary directive. One could argue that evolution has given us our prime directives: to ensure our own continued existence, to reproduce and to cooperate with each other; but there are many people who are suicidal, who have no interest in reproducing and who violently rebel against society (psychopaths, for example). We are instructed by society and our programming to desire X, but far too many of us desire, say, Y instead for this to be considered a reliable way of achieving X.


Evolution’s direction has not ensured that we do “what we are supposed to do”, and we could well face similar disobedience from our own creation. Seeing as the most effective way we have found of developing AI is to create it in our image, then just as there are ghosts in us, there could well be ghosts in the machine.


I think this is at bottom a restatement of "determining the right goals with sufficient rigor to program them into an AI is hard; ensuring that these goals are stable under recursive self-modification is also hard." If I'm right, then don't worry; we already know it's hard. Worry, if you like, about how to do it anyway.

In a bit more detail:

the most promising developments have been through imitating the human brain, and we have no reason to believe that the human brain (or any other brain for that matter) can be guaranteed to have a primary directive. One could argue that evolution has given us our prime directives: to ensure our own continued existence, to reproduce and to cooperate with each other; but there are many people who are suicidal, who have no interest in reproducing and who violently rebel against society (psychopaths, for example).

Evolution did a bad job. Humans were never given a single primary drive; we have many. If our desires were simple, AI would be easier, but they are not. So evolution isn't a good example here. Also, I'm not sure of your assertion that the best advances in AI so far came from mimicking the brain. The brain can tell us useful stuff as an example of various kinds of program (belief-former, decision-maker, etc.), but I don't think we've been mimicking it directly. As for machine learning, yes, there are pitfalls in using that to come up with the goal function, at least if you can't look over the resulting goal function before you make it the goal of an optimizer. And making a potential superintelligence with a goal of finding [the thing you want to use as a goal function] might not be a good idea either.

I never claimed that evolution did a good job, but I would argue that it gave us a primary directive: to further the human species. All of our desires are part of our programming; they should perfectly align with desires which would optimize the primary goal, but they don't. Simply put, mistakes were made. As the most effective way of developing optimizing programs we have seen is through machine learning, which is very similar to evolution, we should be very careful of the desires of any singleton created by this method.

I'm not sure of your assertion that the best advances in AI so far came from mimicking the brain.

Mimicking the human brain is fundamental to most AI research; on DeepMind's website they say that they employ computational neuroscientists, and companies such as IBM are very interested in whole brain emulation.

I never claimed that evolution did a good job, but I would argue that it gave us a primary directive: to further the human species.

No, it didn't. That's why I linked "Adaptation Executers, not Fitness Maximizers". Evolution didn't even "try to" give us a primary directive; it just increased the frequency of anything that worked on the margin. But I agree that we shouldn't rely on machine learning to find the right utility function.

Only a pantheist would claim that evolution is a personal being, and so it can't "try to" do anything. It is, however, a directed process, serving to favor individuals that can better further the species.

But I agree that we shouldn't rely on machine learning to find the right utility function.

How would you suggest we find the right utility function without using machine learning?

[anonymous] · 9y

How would you suggest we find the right utility function without using machine learning?

How would you find the right utility function using machine learning? With machine learning you have to have some way of classifying examples as good vs bad. That classifier itself is equivalent to the FAI problem.
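To make that concrete, here is a toy sketch (hypothetical names, no particular library) of where the values actually end up in a supervised-learning setup. However clever the learning algorithm, it can only reproduce whatever judgements the labelling function encodes, and writing that function is the original problem:

```python
# Toy illustration only (hypothetical names, no real ML library):
# the learner can be arbitrarily powerful, but it merely reproduces
# the judgements encoded by the labelling function below.

def label_outcome(outcome):
    """Return 1 if this outcome is 'good for humanity', else 0.

    Writing this function correctly *is* the value-specification
    problem; the rest of the pipeline is ordinary machine learning.
    """
    raise NotImplementedError("nobody knows how to write this part")

def learn_utility_model(outcomes, fit):
    """Standard supervised learning over (example, label) pairs."""
    labels = [label_outcome(o) for o in outcomes]
    return fit(outcomes, labels)  # any off-the-shelf fitting procedure
```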

How would you suggest we find the right utility function without using machine learning?

If I find out, you'll be one of the first to know.

The point I am making is that machine learning, though not provably safe, is the most effective way we can imagine of making the utility function. It's very likely that many AIs are going to be created by this method, and if the failure rate is anywhere near as high as that for humans, this could be very serious indeed. Some misguided person may attempt to create an FAI using machine learning, and then we may have the situation in the H+ article.

[anonymous] · 9y

Congratulations! You've figured out that UFAI is a threat!

That wasn't what I claimed. I proposed that the current, most promising methods of producing an FAI are far too likely to produce a UFAI to be considered safe.

[anonymous] · 9y

Why do you think the whole website is obsessed with provably-friendly AI? The whole point of MIRI is that pretty much every superintelligence that is anything other than provably safe is going to be unfriendly! This site is littered with examples of how terribly almost-friendly AI would go wrong! We don't consider current methods "too likely" to produce a UFAI; we think they're almost certainly going to produce UFAI! (Conditional on creating a superintelligence at all, of course.)

So as much as I hate asking this question because it's alienating, have you read the sequences?

[anonymous] · 9y

Mimicking the human brain is fundamental to most AI research; on DeepMind's website they say that they employ computational neuroscientists, and companies such as IBM are very interested in whole brain emulation.

Mimicking the human brain is an obscure branch of AI. Most AI projects, and certainly the successful ones you've heard about, are at best inspired by stripped-down models of specific, isolated aspects of human thought, if they take any inspiration from the human brain at all.

DeepMind, for example, is reinforcement learning on top of modern machine learning. Machine learning may make use of neural networks, but beware of the name: neural networks only casually resemble the biological structure from which they take their name. DeepMind doesn't work anything like the human brain, and neither do Watson, Deep Blue, or self-driving cars.
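For what it's worth, roughly everything an artificial "neuron" does fits in a few lines. A minimal sketch (plain Python, not any particular framework) of why the resemblance to biology is mostly in the name:

```python
import math

# An artificial "neuron" as used in machine learning: a weighted sum
# of its inputs followed by a squashing function. No spikes, no
# neurotransmitters, no biological dynamics, just arithmetic.

def neuron(inputs, weights, bias):
    activation = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-activation))  # logistic activation

# A "layer" is many such sums in parallel, a "network" is layers
# composed, and training just adjusts the numbers in `weights`.
print(neuron([0.5, -1.0, 2.0], [0.1, 0.4, -0.3], bias=0.2))
```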

Learn a bit about practical AI and neuroscience and you'd be surprised how little they have in common.

Evolution did a bad job. Humans were never given a single primary drive; we have many.

Plus a sense of boredom. Those may add up to a good thing, since humans are unlikely to paperclip, i.e. focus on one thing obsessively.

It can easily be argued that evolution did a good job, not a bad job, by not giving us a "primary directive." The reason AI is dangerous is precisely because it might have such a directive; being an "optimizer" is exactly why one fears that AI might destroy the world. So if anything, kingmaker is correct to think that since human beings are like this, it is at least theoretically possible that AIs will be like this, and that they will not destroy the world for similar reasons.

If we had a simple primary directive, we would be fully satisfied by having a machine accomplish it for us, and it would be much easier to get a machine that would do it.

[anonymous] · 9y

I wish that I knew the implications of all that I do. I do not. My abilities are limited, my knowledge imperfect, and my control of external forces weak. Therefore the results of my best efforts will include unintended consequences. I do not consider myself significantly special in these regards. No matter the intentions of a musician, a chef or a computer programmer, how their creation will play out in the world cannot be known, only hoped for. Thus a means for objective evaluation and damage control is helpful to build into any system. The greater the potential for harm, the more attentiveness to objective evaluation and damage control may be warranted. This mode of thought has been called 'conservative' in the past, but the word has been spread thin.

No matter the programming of an AI, what an AI does with itself and how third parties influence it may cause unintended consequences. This is a refutation of EY's claim.

As a local atheist once said to me, 'a mystery is not a miracle.' Not having perfect knowledge of myself (mystery) does not mean that there is a ghost in me (miracle).

You probably already agreed with "Ghosts in the Machine" before reading it, since obviously a program executes exactly its code, even in the context of AI. Also obviously, the program can still appear not to do what it's supposed to if "supposed" is taken to mean the programmer's intent.

These statements don't ignore machine learning; they imply that we should not try to build an FAI using current machine learning techniques. You're right that we understand (program + parameters learned from dataset) even less than (program). So while the outside view might say: "current machine learning techniques are very powerful, so they are likely to be used for FAI," that piece of inside view says: "actually, they aren't. Or at least they shouldn't be." ("Learn" has a precise operational meaning here, so this is unrelated to whether an FAI should "learn" in some other sense of the word.)
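To illustrate that distinction with made-up numbers rather than any real system: in the second case the behaviour lives in opaque numbers produced by a fitting procedure, so inspecting the source tells you much less.

```python
# Made-up illustration of "(program)" vs
# "(program + parameters learned from dataset)".

# (program): the behaviour is fully visible in the source.
def score_explicit(x):
    return 2.0 * x + 1.0

# (program + learned parameters): the code is just as short, but the
# behaviour now lives in opaque numbers produced by some training run;
# auditing it means auditing the dataset and the fitting procedure too.
LEARNED_WEIGHTS = [0.031, -2.7, 0.0004, 1.9]  # output of an optimizer

def score_learned(features):
    return sum(w * x for w, x in zip(LEARNED_WEIGHTS, features))
```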

Again, the fact that a development has been successful or promising in some field doesn't mean it will be as successful for FAI, so imitation of the human brain isn't necessarily good here. Reasoning by analogy and thinking about evolution is also unlikely to help; nature may have given us "goals", but they are not goals in the same sense as: "The goal of this function is to add 2 to its input," or "The goal of this program is to play chess well," or "The goal of this FAI is to maximize human utility."

They imply that we should not try to build an FAI using current machine learning techniques

But people are using ML techniques. Should MIRI be campaigning to get this research stopped?

I think practitioners of ML should be more wary of their tools. I'm not saying ML is a fast track to strong AI, just that we don't know if it is. Several ML people voiced reassurances recently, but I would have expected them to do that even if it was possible to detect danger at this point. So I think someone should find a way to make the field more careful.

I don't think that someone should be MIRI though; status differences are too high, they are not insiders, etc. My best bet would be a prominent ML researcher starting to speak up and giving detailed, plausible hypotheticals in public (I mean near-future hypotheticals where some error creates a lot of trouble for everyone).

We are no longer designing an AI from scratch and then implementing it; we are creating a seed program which learns from its situation and alters its own code with no human intervention, i.e. the machines are starting to write themselves, e.g. with Google's DeepMind.

Arguably, not knowing in detail how your creation works is a detriment, not a boon. This point has been raised multiple times, most recently by Bostrom in Superintelligence, I believe. Consider reading it.

I never said not understanding our creations is good; I only said AI research was successful. I have not read Superintelligence, but I appreciate just how dangerous AI could be.