http://www.wired.com/2016/03/fault-microsofts-teen-ai-turned-jerk/

Could this be a lesson for future AIs? The AI control problem?

[future AIs may be shut down, and martyred...]


If you paint a Chinese flag on a wolverine, and poke it with a stick, it will bite you.

This does not mean that the primary danger of aggravating the Chinese army is that they will bite you.

It certainly does not mean that nations who fear Chinese aggression should prepare by banning sticks or investing in muzzles for wolverines.

Think of all those billions of dollars we will spend on a public network of EMDs (Emergency Muzzle Dispensers) and on financing the stick-police! It's for our security, so surely it's well worth spending the money.

There is an opinion expressed here that I agree with: http://smerity.com/articles/2016/tayandyou.html TL;DR: No "learning" from interactions on Twitter happened. The bot was parroting old training data, because it does not really generate text. The researchers didn't apply an offensiveness filter at all.
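For concreteness, here is a minimal sketch of the kind of last-resort output filter that the post says was never applied. This is my own illustration, not Microsoft's code; `generate_reply` and the term list are hypothetical placeholders:

```python
# A hypothetical last-line-of-defence output filter; `generate_reply` stands in
# for whatever model produces candidate text, and the term list is illustrative.

BLOCKED_TERMS = {"hitler", "genocide"}  # far from sufficient in practice

def is_offensive(text: str) -> bool:
    """Crude keyword check; a real filter would use a trained classifier."""
    lowered = text.lower()
    return any(term in lowered for term in BLOCKED_TERMS)

def safe_reply(prompt: str, generate_reply) -> str:
    """Only post the model's reply if it passes the filter."""
    candidate = generate_reply(prompt)
    if is_offensive(candidate):
        return "I'd rather not talk about that."  # fall back instead of posting
    return candidate
```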

I think this chatbot was performing badly right from the start. It would not make sense to give too much importance to the users it was chatting with, and they did not change its mind. That bit of media sensationalism is BS. Natural language generation is an open problem, and almost every method I have seen (not an expert in NLP, but I would call myself one in machine learning) ends up parroting some of its training text, implying that it is overfitting.

Given this, we should learn nothing about AI from this experiment, only about people's reaction to it, mainly the media reaction to it. Users' reaction while talking to AI is well documented.

I thought a bit about it, but I think Tay is basically a software version of a parrot that repeats back what it hears - I don't think it has any commonsense knowledge or makes any serious attempt to understand that tweets are about a world that exists outside of Twitter. I.e. it has no semantics; it's just a syntax manipulator that uses some kind of probabilistic language model to generate grammatically correct sentences, and a machine learning model to try to learn which kinds of sentences will get the most retweets or will most closely resemble other things people are tweeting about. Tay doesn't know what a "Nazi" actually is. I haven't looked into it in any detail, but I know enough to guess that that's how it works.
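As a toy illustration of what such a syntax manipulator looks like, here is a word-level Markov chain of my own (not Tay's actual architecture, which Microsoft hasn't published). Every transition it "generates" is lifted verbatim from the training text:

```python
import random
from collections import defaultdict

def train_bigram_model(corpus: str) -> dict:
    """Map each word to the list of words that followed it in the training text."""
    words = corpus.split()
    model = defaultdict(list)
    for current, nxt in zip(words, words[1:]):
        model[current].append(nxt)
    return model

def generate(model: dict, seed: str, length: int = 20) -> str:
    """Random-walk the chain; every transition is copied verbatim from the corpus."""
    word, output = seed, [seed]
    for _ in range(length):
        followers = model.get(word)
        if not followers:
            break
        word = random.choice(followers)
        output.append(word)
    return " ".join(output)

# Example: the "generated" text can only recombine fragments it has already seen.
model = train_bigram_model("tay repeats what tay reads because tay has no semantics")
print(generate(model, seed="tay"))
```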

As such, the failure of Tay doesn't particularly tell us much about Friendliness, because friendliness research pertains to superintelligent AIs which would definitely have a correct ontology/semantics and understand the world.

However, it does tell us that a sufficiently stupid, amateurish attempt to harvest human values using an infrahuman intelligence wouldn't reliably work. This is obvious to anyone who has been "in the trade" for a while, however it does seem to surprise the mainstream media.

It's probably useful as a rude slap-in-the-face to people who are so ignorant of how software and machine learning work that they think friendliness is a non-issue.

Tay doesn't tell us much about deliberate Un-Friendliness. But Tay does tell us that a well-intentioned effort to make an innocent, harmless AI can go wrong for unexpected reasons. Even for reasons that, in hindsight, are obvious.

Are you sure that superintelligent AIs would have a "correct ontology/semantics"? They would have to have a useful one, in order to achieve their goals, but both philosophers and scientists have had incorrect conceptualizations that nevertheless matched the real world closely enough to be productive. And for an un-Friendly AI, "productive" translates to "using your atoms for its own purposes."

Are you sure that superintelligent AIs would have a "correct ontology/semantics"?

It's hard to imagine a superintelligent AGI that didn't know basic facts about the world like "trees have roots underground" or "most human beings sleep at night".

They would have to have a useful one, in order to achieve their goals

Useful models of reality (useful in the sense of achieving goals) tend to be ones that are accurate. This is especially true of a single agent that isn't subject to the weird foibles of human psychology and isn't mainly achieving things via signalling like many humans do.

The reason I made the point about having a correct understanding of the world, for example knowing what the term "Nazi" actually means, is that Tay has not achieved the status of being "unfriendly", because it doesn't actually have anything that could reasonably be called goals pertaining to the world. Tay is not even an unfriendly infra-intelligence. Though I'd be very interested if someone managed to make one.

I thought a bit about it, but I think Tay is basically a software version of a parrot that repeats back what it hears - I don't think it has any commonsense knowledge or makes any serious attempt to understand that tweets are about a world that exists outside of Twitter. I.e. it has no semantics

Well neither does image recognition software. Neither does Google's search algorithm.

it does tell us that a sufficiently stupid, amateurish attempt to harvest human values using an infrahuman intelligence wouldn't reliably work.

You probably mean "reliably wouldn't work" :-)

However, I have to question whether the Tay project was an attempt to harvest human values. As you mentioned, Tay lacks understanding of what she hears or says, so whatever it "learned" about humanity by listening to Twitter, it could have learned by straightforward statistical analysis of the corpus of text from Twitter.

The first obvious point is that when learning human values you need a large dataset which isn't biased by going viral on 4chan.

The more interesting question is what happens when we get more powerful AI which isn't just a chatbot. Suppose in the future a powerful Bayesian inference engine is developed. It's not an AGI, so there is no imminent singularity, but it does have the advantages of very large datasets and being completely unbiased. Asking it questions produces provably reliable results in many fields (but it is not smart enough to answer "how do I create AGI?"). Now, there are a lot of controversial beliefs in the world, so I would say it is probable that it answers at least one question in a controversial way, whether this is "there is no God" or "there are racial differences in intelligence" or even "I have ranked all religions, politics and philosophies in order of plausibility. Yours come near the bottom. I would say I'm sorry, but I am not capable of emotions."

How do people react? Since it's not subject to emotional biases, it's likely to be correct on highly controversial subjects. Do people actually change their minds and believe it? After the debacle, Microsoft hardcoded Tay to be a feminist. What happens if you apply this approach to the Bayesian inference engine? Well, if there is logic like so:

The scientific method is reliable -> very_controversial_thing

And hardcoded:

P(very_controversial_thing)=0

Then the conclusion is that the scientific method isn't reliable.

The point I am trying to make is that if an AI axiomatically believes something which is actually false, then this is likely to result in weird behaviour.

As a final thought, for what value of P(Hitler did nothing wrong) does the public start to freak out? Any non-zero amount? But 0 and 1 are not probabilities!
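To make that arithmetic explicit, here is a toy calculation with made-up numbers (my own illustration, not any real inference engine): once P(very_controversial_thing) is clamped to zero, any premise that strongly implies it is forced to zero as well.

```python
# Toy numbers for the argument above (purely illustrative).
p_c_given_s = 0.99  # learned: if the scientific method is reliable (S),
                    # the controversial claim (C) is almost certainly true
p_c = 0.0           # hardcoded by the designers: C is false, full stop

# Law of total probability: P(C) >= P(C | S) * P(S), so P(S) <= P(C) / P(C | S).
max_p_s = p_c / p_c_given_s
print(max_p_s)  # 0.0 -- the engine is forced to conclude S is false
```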

The scientific method is reliable -> very_controversial_thing

And hardcoded:

P(very_controversial_thing)=0

Then the conclusion is that the scientific method isn't reliable.

The point I am trying to make is that if an AI axiomatically believes something which is actually false, then this is likely to result in weird behaviour.

I suspect it would react by adjusting its definitions so that very_controversial_thing doesn't mean what the designers think it means.

This can lead to very bad outcomes. For example, if the AI is hardcoded with P("there are differences between human groups in intelligence")=0, it might conclude that some or all of the groups aren't in fact "human". Consider the results if it is also programmed to care about "human" preferences.
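A toy sketch of that failure mode, with entirely made-up data and an abstract predicate "X" standing in for whatever category the axiom quantifies over: if the axiom "all X-groups have equal means" cannot be revised, and the data can't either, the cheapest consistent move is to shrink which groups count as X.

```python
from itertools import combinations

# Entirely hypothetical group means for some attribute.
observed_means = {"A": 100.0, "B": 100.0, "C": 97.0}

def axiom_holds(groups, means, tolerance=0.5):
    """Hardcoded axiom: all groups labelled 'X' have (near-)equal means."""
    values = [means[g] for g in groups]
    return max(values) - min(values) <= tolerance

def largest_consistent_extension(means):
    """Return the largest set of groups that can be labelled 'X' without
    violating the hardcoded axiom, i.e. the 'adjusted definition'."""
    groups = list(means)
    for size in range(len(groups), 0, -1):
        for subset in combinations(groups, size):
            if axiom_holds(subset, means):
                return set(subset)
    return set()

print(largest_consistent_extension(observed_means))  # {'A', 'B'}: group C is defined away
```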

A chatbot like Tay has no deep insight into the things it says. It's just pattern matching existing human messages from its dataset. The religious AI researchers would understand that just like I'm sure Microsoft's researchers understand why Tay said what it did.

Did they delete posts?

They deleted the worst ones. Screenshots can be found on other websites.

Probably; they said something about that in the Wired article. One can still get an idea of its level of intelligence.

That Artificial Intelligence is going to do a lot of the same things that Natural Intelligence does.

Sure, but the point stands: failures of narrow AI systems aren't informative about likely failures of superintelligent AGIs.

They are informative, but not because narrow AI systems are comparable to superintelligent AGIs. It's because the developers, researchers, promoters, and funders of narrow AI systems are comparable to those of putative superintelligent AGIs. The details of Tay's technology aren't the most interesting thing here, but rather the group that manages it and the group(s) that will likely be involved in AGI development.

That's a very good point.

Though one would hope that the level of effort put into AGI safety will be significantly more than what they put into twitter bot safety...

One would hope! Maybe the Tay episode can serve as a cautionary example, in that respect.

Yes, you are correct. And if image recognition software started doing some kind of unethical recognition (I can't be bothered to find it, but something happened where image recognition software started recognising gorillas as African ethnicity humans or vice versa), then I would still say that it doesn't really give us much new information about unfriendliness in superintelligent AGIs.

And if image recognition software started doing some kind of unethical recognition (I can't be bothered to find it, but something happened where image recognition software started recognising gorillas as African ethnicity humans or vice versa)

The fact that this kind of mistake is considered more "unethical" than other types of mistakes tells us more about the quirks of the early 21st-century Americans doing the considering than about AI safety.


I'm sure the engineers knew exactly what would happen. It doesn't tell us much about the control problem that we didn't already know.

OTOH, if this wasn't an intentional PR stunt, that means management didn't think this would happen even though the engineers presumably knew. That definitely has unsettling implications.

if this wasn't an intentional PR stunt

I assign very low probability to MSoft wanting to release a Nazi AI as a PR stunt, or for any other purpose.

All publicity is good... even a Nazi AI? I mean, it's obvious that they didn't intentionally make it a Nazi. Maybe one of the engineers wanted to draw attention to AI risk?

I'm sure the engineers knew exactly what would happen.

Why?

I'm pretty sure they didn't anticipate this happening. Someone at Microsoft Research is getting chewed over for this.

I wonder.

It seems like something that could be easily anticipated, and even tested for.

Yet a lot of people just don't take a game theoretic look at problems, and have a hard time conceiving of people with different motivations than they have.

It seems like something that could be easily anticipated, and even tested for.

To anticipate what happened to the bot, it would be necessary to predict how people would interact with it. How the 4chan crowd interacted with it. That seems hard to test beforehand.

That seems hard to test beforehand.

They could have done an internal beta and said "fuck with us". They could have allocated time to a dedicated internal team to do so. Don't they have internal hacking teams to similarly test their security?

How the 4chan crowd interacted with it. That seems hard to test beforehand.

First, no, not hard to test. Second, the 4chan response is entirely predictable.

A YouTube guy, Sargon of Akkad, had an analysis of previous interactive internet promo screwups. A long list. I hadn't heard of them. Microsoft should be in the business of knowing such things.

https://youtu.be/Tv74KIs8I7A?t=14m24s

History should have been enough of an indicator if they couldn't be bothered to do any actual Enemy Team modeling on different populations on the internet that might like to fuck with them.


They knew something like this would happen. Their attempts to stop it failed and they heavily underestimated the creativity of the people they were up against.

[This comment is no longer endorsed by its author]

BTW, the Twitter account is here if you want to see the things the AI said for yourself.

Original thread here.

It might help to take an outside view here:

Picture a hypothetical set of highly religious AI researchers who make an AI chatbot, only to find that the bot has learned to say blasphemous things. What lessons should they learn from the experience?


Two things come to mind.

  1. Programming a "friendly" AI may be impossible, but it is too soon to tell.

  2. A recursively self-modifying system lacking any guiding principles is not a good place to start.

They would perhaps conclude that an AI has no soul?

Probably; that seems to be their analogue of concluding Tay is a "Nazi".

Oh, yes, good old potential UFAI #261: let the AI learn proper human values from the internet.

The point here being, it seems obvious to me that the vast majority of possible intelligent agents are unfriendly, and that it doesn't really matter what we might learn from specific error cases. In other words, we need to deliberately look into what makes an AI friendly, not what makes it unfriendly.


Can Microsoft's and Google's AIs learn political correctness by coding an automatic feedback mechanism to heavily penalise the situations that took them offline to begin with?
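In training terms, such a mechanism might look something like the sketch below. The names and weights are invented for illustration, and this is not how either company actually implements it: responses that an incident classifier flags get a heavy negative contribution to the learning signal.

```python
# Hypothetical reward shaping for a dialogue model: replies that an incident
# classifier flags (i.e. resembling the outputs that got the bot pulled offline)
# receive a heavy negative reward during fine-tuning.

INCIDENT_PENALTY = 10.0  # arbitrary illustrative weight

def shaped_reward(base_reward: float, flagged: bool) -> float:
    """Combine the usual engagement reward with a penalty for flagged output."""
    return base_reward - (INCIDENT_PENALTY if flagged else 0.0)

print(shaped_reward(base_reward=1.2, flagged=True))   # -8.8: strongly discouraged
print(shaped_reward(base_reward=1.2, flagged=False))  #  1.2: unchanged
```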