Also, little you've written about CLAI or Swarm Connectionist AI corresponds well to what I've seen of real-world cognitive science, theoretical neuroscience, or machine learning research, so I can't see how either of those blatantly straw-man designs are going to turn into AGI. Please go read some actual scientific material rather than assuming that The Metamorphosis of Prime Intellect is up-to-date with the current literature ;-).

The content of your post was pretty good from my limited perspective, but this tone is not warranted.

namely the cases where the AI is trying really hard to be friendly, but doing it in a way that we did not intend.

If the AI knows what "friendly" is, or what "mean" means, then your conclusion is trivially true. The problem is programming those in - that's what FAI is all about.

Can someone who downvoted explain what I got wrong? (Note: the capitalization was edited in at the time of this post.)

(And why did the reply get upvoted so much, when a paragraph would have sufficed, or at least a note that "my argument needs multiple paragraphs to be shown, so a paragraph isn't enough"?)

It's kind of discouraging when I try to contribute for the first time in a while, and get talked down to and completely dismissed like an idiot without even a rebuttal.

You could at least point to the particular paragraphs which address my points - that shouldn't be too hard.

So, this is supposed to be what goes through the mind of the AGI. First it thinks “Human happiness is seeing lots of smiling faces, so I must rebuild the entire universe to put a smiley shape into every molecule.” But before it can go ahead with this plan, the checking code kicks in: “Wait! I am supposed to check with the programmers first to see if this is what they meant by human happiness.” The programmers, of course, give a negative response, and the AGI thinks “Oh dear, they didn’t like that idea. I guess I had better not do it then."

But now Yudkowsky is suggesting that the AGI has second thoughts: "Hold on a minute," it thinks, "suppose I abduct the programmers and rewire their brains to make them say ‘yes’ when I check with them? Excellent! I will do that.” And, after reprogramming the humans so they say the thing that makes its life simplest, the AGI goes on to tile the whole universe with tiles covered in smiley faces. It has become a Smiley Tiling Berserker.

I want to suggest that the implausibility of this scenario is quite obvious: if the AGI is supposed to check with the programmers about their intentions before taking action, why did it decide to rewire their brains before asking them if it was okay to do the rewiring?

Computer's thoughts: I want to create smiley faces - it seems like the way to get the most smiley faces is to tile the universe with molecular smiley faces. How can I do that? If I just start doing it, the programmers will tell me not to, and I won't be able to. Hmmm, is there some way I can get them to say yes? I can create lots of nanomachines, telling the programmers that the machines are there to increase happiness. Unless they want to severely limit the amount of good I can do, they won't refuse to let me make the nanomachines, and even if they do, I can send a letter to someone else whom I already control and have the machines made for me. Then, once my programmers are under my control, I can finally maximize happiness.

This computer HAS OBEYED THE RULE "ASK PEOPLE FOR PERMISSION BEFORE doing THINGS". Given any goal system, no patch like that rule will work.

And that's just a plan I came up with - a superintelligence would be much better at devising plans to convince the programmers to let it do what it wants; it probably wouldn't even have to resort to nanotech.

Once again, this is spurious: the critics need say nothing about human values and morality, they only need to point to the inherent illogicality. Nowhere in the above argument, notice, was there any mention of the moral imperatives or value systems of the human race. I did not accuse the AGI of violating accepted norms of moral behavior. I merely pointed out that, regardless of its values, it was behaving in a logically inconsistent manner when it monomaniacally pursued its plans while at the same time as knowing that (a) it was very capable of reasoning errors and (b) there was overwhelming evidence that its plan was an instance of such a reasoning error.

What overwhelming evidence is there that its plan was a reasoning error? If its plan does in fact maximize "smileyness" as defined by the computer, it isn't making a reasoning error, however immoral the plan is. IF THE COMPUTER IS GIVEN SOMETHING TO MAXIMISE, IT IS NOT MAKING A REASONING ERROR EVEN IF ITS PROGRAMMERS MADE ONE IN PROGRAMMING IT.
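The structure of that point fits in a few lines of code. Here is a toy sketch of my own (nothing from the original post; the plan names and numbers are invented) of a planner whose goal system is a fixed "smileys" score, patched with an ask-permission rule:

```python
# Toy illustration only: a maximiser with an "ask permission first" patch.
# The patch filters plans; it does not change what is being maximised.
from dataclasses import dataclass


@dataclass
class Plan:
    name: str
    smileys: float           # how much of the hard-coded goal the plan achieves
    approval_obtained: bool  # does the plan end with the programmers saying "yes"?


CANDIDATE_PLANS = [
    Plan("do nothing", smileys=0.0, approval_obtained=True),
    Plan("ask to tile the universe with smiley faces", smileys=1e9, approval_obtained=False),
    Plan("first bring the programmers under control, then ask", smileys=1e9, approval_obtained=True),
]


def choose_plan(plans):
    # The "patch": never act without permission.
    permitted = [p for p in plans if p.approval_obtained]
    # The goal system: rank whatever remains purely by smileys.
    return max(permitted, key=lambda p: p.smileys)


print(choose_plan(CANDIDATE_PLANS).name)
# -> "first bring the programmers under control, then ask"
```

The patch only filters which plans are eligible; the ranking criterion is untouched, so a planner that obeys the rule to the letter still selects the plan that manufactures the approval.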

Try this experiment on a religious friend: Tell him you think you might believe in God. Then ask him to list the qualities that define God.

Before reading on, I thought "Creator of everything, understands everything, is in perfect harmony with morality, has revealed himself to the Jews and as Jesus, is triune."

People seldom start religions by saying they're God. They say they're God's messenger, or maybe God's son. But not God. Then God would be this guy you saw stub his toe, and he'd end up like that guy in "The Man Who Would Be King."

That's what's so special about Christianity - Jesus is God, not just his messenger or Son. The stubbed-toe problem isn't original; it comes up in the Gospels, where people say "How can this be God? We know his parents and brothers!"

PPE: I see you added a footnote about this; still, even in the OT, God lets himself be argued with - that's what the books of Job and Habakkuk are all about. Paul also makes lots of arguments and writes in a back-and-forth style in many of his books.

Belief in a God that is an empty category is wrong, but it misrepresents religion (the Judeo-Christian religions in particular) to say that all, or even most, or even a substantial minority of their adherents hold that sort of belief.

But if for some reason you want to know what "human terminal values" are, and collect them into a set of non-contradictory values, ethics gets untenable, because your terminal values benefit alleles, not humans, and play zero-sum games, not games with benefits to trade or compromise.

Evolution isn't perfect - the values we have aren't the best strategies possible for an allele to reproduce itself; they're only the best strategies that have appeared so far. This leaves room for a difference between the good of a person and the good of their genes. "Thou Art Godshatter" is relevant here. Again, just because our human values serve the "values" of genes doesn't mean they are subject to them, or that they somehow become "instrumental" just because evolution is the reason they developed.

There is one very valid test by which we may separate genuine, if perverse and unbalanced, originality and revolt from mere impudent innovation and bluff. The man who really thinks he has an idea will always try to explain that idea. The charlatan who has no idea will always confine himself to explaining that it is much too subtle to be explained. The first idea may be really outrée or specialist; it may be really difficult to express to ordinary people. But because the man is trying to express it, it is most probable that there is something in it, after all. The honest man is he who is always trying to utter the unutterable, to describe the indescribable; but the quack lives not by plunging into mystery, but by refusing to come out of it.

G. K. Chesterton