Less Wrong is a community blog devoted to refining the art of human rationality. Please visit our About page for more information.

What can we learn from Microsoft's Tay, its inflammatory tweets, and its shutdown?

1 Post author: InquilineKea 26 March 2016 03:41AM

http://www.wired.com/2016/03/fault-microsofts-teen-ai-turned-jerk/

Could this be a lesson for future AIs? The AI control problem?

Comments (61)

Comment author: fubarobfusco 26 March 2016 03:17:35PM 16 points [-]

If you paint a Chinese flag on a wolverine, and poke it with a stick, it will bite you.

This does not mean that the primary danger of aggravating the Chinese army is that they will bite you.

It certainly does not mean that nations who fear Chinese aggression should prepare by banning sticks or investing in muzzles for wolverines.

Comment author: SquirrelInHell 27 March 2016 10:40:12AM -1 points [-]

Think of all those billions of dollars we will spend on a public network of EMDs (Emergency Muzzle Dispensers) and on financing the stick-police! It's for our security, so surely it's well worth spending the money.

Comment deleted 26 March 2016 08:02:09PM [-]
Comment author: Houshalter 02 April 2016 10:35:53AM 1 point [-]

A chatbot like Tay has no deep insight into the things it says. It's just pattern matching existing human messages from its dataset. The religious AI researchers would understand that, just as I'm sure Microsoft's researchers understand why Tay said what it did.

Comment author: skeptical_lurker 30 March 2016 06:24:37AM 0 points [-]

They would perhaps conclude that an AI has no soul?

Comment author: Lamp2 08 April 2016 02:11:15AM 0 points [-]

Probably; that seems to be their analogue of concluding Tay is "Nazi".

Comment author: The_Jaded_One 26 March 2016 04:49:53AM 4 points [-]

I thought a bit about it, but I think Tay is basically a software version of a parrot that repeats back what it hears - I don't think it has any commonsense knowledge or makes any serious attempt to understand that tweets are about a world that exists outside of twitter. I.e., it has no semantics; it's just a syntax manipulator that uses some kind of probabilistic language model to generate grammatically correct sentences and a machine learning model to try to learn which kinds of sentences will get the most retweets or will most closely resemble other things people are tweeting about. Tay doesn't know what a "Nazi" actually is. I haven't looked into it in any detail but I know enough to guess that that's how it works.
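
To make the "parrot" idea concrete, here is a minimal sketch of a bigram language model in Python. It is an illustration only, not Tay's actual implementation, which is not described in this thread; the function names and the toy corpus are hypothetical. The model only records which word followed which in its input, so it can recombine fragments of what it was fed without any notion of what the words refer to.

```python
import random
from collections import defaultdict

def build_bigram_model(messages):
    """Map each word to the list of words that followed it in the corpus."""
    model = defaultdict(list)
    for message in messages:
        words = message.split()
        for current, following in zip(words, words[1:]):
            model[current].append(following)
    return model

def parrot(model, start, max_words=10, rng=None):
    """Generate text by repeatedly sampling a word that followed the last one."""
    rng = rng or random.Random()
    words = [start]
    # Stop at the length limit or when the last word was never followed by anything.
    while len(words) < max_words and model.get(words[-1]):
        words.append(rng.choice(model[words[-1]]))
    return " ".join(words)

# Hypothetical toy corpus standing in for harvested tweets.
corpus = [
    "cats are great",
    "dogs are great pets",
    "cats are terrible",
]
model = build_bigram_model(corpus)
print(parrot(model, "cats", rng=random.Random(0)))
```

Every adjacent word pair in the output already occurred somewhere in the corpus, which is the sense in which such a generator has syntax but no semantics.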

As such, the failure of Tay doesn't particularly tell us much about Friendliness, because friendliness research pertains to superintelligent AIs which would definitely have a correct ontology/semantics and understand the world.

However, it does tell us that a sufficiently stupid, amateurish attempt to harvest human values using an infrahuman intelligence wouldn't reliably work. This is obvious to anyone who has been "in the trade" for a while; however, it does seem to surprise the mainstream media.

It's probably useful as a rude slap-in-the-face to people who are so ignorant of how software and machine learning work that they think friendliness is a non-issue.

Comment deleted 26 March 2016 07:52:33PM [-]
Comment author: The_Jaded_One 27 March 2016 08:12:34PM 0 points [-]

Yes, you are correct. And if image recognition software started doing some kind of unethical recognition (I can't be bothered to find it, but something happened where image recognition software started recognising gorillas as African ethnicity humans or vice versa), then I would still say that it doesn't really give us much new information about unfriendliness in superintelligent AGIs.

Comment deleted 28 March 2016 03:50:53AM [-]
Comment author: The_Jaded_One 28 March 2016 05:06:20PM 1 point [-]

Sure, but the point stands: failures of narrow AI systems aren't informative about likely failures of superintelligent AGIs.

Comment author: dlarge 29 March 2016 05:54:00PM 5 points [-]

They are informative, but not because narrow AI systems are comparable to superintelligent AGIs. It's because the developers, researchers, promoters, and funders of narrow AI systems are comparable to those of putative superintelligent AGIs. The details of Tay's technology aren't the most interesting thing here, but rather the group that manages it and the group(s) that will likely be involved in AGI development.

Comment author: The_Jaded_One 29 March 2016 08:28:14PM *  1 point [-]

That's a very good point.

Though one would hope that the level of effort put into AGI safety will be significantly more than what they put into twitter bot safety...

Comment author: dlarge 30 March 2016 05:42:11PM 1 point [-]

One would hope! Maybe the Tay episode can serve as a cautionary example, in that respect.

Comment author: Lumifer 30 March 2016 05:57:52PM 0 points [-]
Comment author: Lamp2 08 April 2016 02:08:15AM 0 points [-]

And if image recognition software started doing some kind of unethical recognition (I can't be bothered to find it, but something happened where image recognition software started recognising gorillas as African ethnicity humans or vice versa)

The fact that this kind of mistake is considered more "unethical" than other types of mistakes tells us more about the quirks of the early 21st-century Americans doing the considering than about AI safety.

Comment author: Lamp2 08 April 2016 01:58:35AM 0 points [-]

I thought a bit about it, but I think Tay is basically a software version of a parrot that repeats back what it hears - I don't think it has any commonsense knowledge or serious attempt to understand that tweets are about a world that exists outside of twitter. I.e., it has no semantics

Well neither does image recognition software. Neither does Google's search algorithm.

Comment author: Lumifer 28 March 2016 01:04:10AM 0 points [-]

it does tell us that a sufficiently stupid, amateurish attempt to harvest human values using an infrahuman intelligence wouldn't reliably work.

You probably mean "reliably wouldn't work" :-)

However, I have to question whether the Tay project was an attempt to harvest human values. As you mentioned, Tay lacks understanding of what it hears or says, and so whatever it "learned" about humanity by listening to Twitter it would have been able to learn by straightforward statistical analysis of the corpus of text from Twitter.

Comment author: Rangi 30 March 2016 06:21:50AM 0 points [-]

Tay doesn't tell us much about deliberate Un-Friendliness. But Tay does tell us that a well-intentioned effort to make an innocent, harmless AI can go wrong for unexpected reasons. Even for reasons that, in hindsight, are obvious.

Are you sure that superintelligent AIs would have a "correct ontology/semantics"? They would have to have a useful one, in order to achieve their goals, but both philosophers and scientists have had incorrect conceptualizations that nevertheless matched the real world closely enough to be productive. And for an un-Friendly AI, "productive" translates to "using your atoms for its own purposes."

Comment author: The_Jaded_One 30 March 2016 07:31:55AM 0 points [-]

Are you sure that superintelligent AIs would have a "correct ontology/semantics"?

It's hard to imagine a superintelligent AGI that didn't know basic facts about the world like "trees have roots underground" or "most human beings sleep at night".

They would have to have a useful one, in order to achieve their goals

Useful models of reality (useful in the sense of achieving goals) tend to be ones that are accurate. This is especially true of a single agent that isn't subject to the weird foibles of human psychology and isn't mainly achieving things via signalling like many humans do.

The reason I made the point about having a correct understanding of the world, for example knowing what the term "Nazi" actually means, is that Tay has not achieved the status of being "unfriendly", because it doesn't actually have anything that could reasonably be called goals pertaining to the world. Tay is not even an unfriendly infra-intelligence. Though I'd be very interested if someone managed to make one.

Comment author: skeptical_lurker 30 March 2016 06:24:12AM *  3 points [-]

The first obvious point is that when learning human values you need a large dataset which isn't biased by going viral on 4chan.

The more interesting question is what happens when we get more powerful AI which isn't just a chatbot. Suppose in the future a powerful Bayesian inference engine is developed. It's not an AGI, so there is no imminent singularity, but it does have the advantages of very large datasets and being completely unbiased. Asking it questions produces provably reliable results in many fields (but it is not smart enough to answer "how do I create AGI?"). Now, there are a lot of controversial beliefs in the world, so I would say it is probable that it answers at least one question in a controversial way, whether this is "there is no God" or "there are racial differences in intelligence" or even "I have ranked all religions, politics and philosophies in order of plausibility. Yours come near the bottom. I would say I'm sorry, but I am not capable of emotions.".

How do people react? Since it's not subject to emotional biases, it's likely to be correct on highly controversial subjects. Do people actually change their minds and believe it? After the debacle, Microsoft hardcoded Tay to be a feminist. What happens if you apply this approach to the Bayesian inference engine? Well, if there is logic like so:

The scientific method is reliable -> very_controversial_thing

And hardcoded:

P(very_controversial_thing)=0

Then the conclusion is that the scientific method isn't reliable.

The point I am trying to make is that if an AI axiomatically believes something which is actually false, then this is likely to result in weird behaviour.
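
The inference above can be checked with a short Python sketch. It assumes the arrow is read probabilistically as P(X | S) = 1, where S stands for "the scientific method is reliable" and X for very_controversial_thing; that reading and the function name are assumptions made for illustration. Since P(X) >= P(X and S) = P(X | S) * P(S), pinning P(X) to zero forces P(S) to zero as well.

```python
def max_probability_of_premise(p_conclusion, p_conclusion_given_premise):
    """Upper bound on P(S) given P(X) and P(X | S).

    From P(X) >= P(X and S) = P(X | S) * P(S), it follows that
    P(S) <= P(X) / P(X | S) whenever P(X | S) > 0.
    """
    if p_conclusion_given_premise <= 0:
        return 1.0  # the implication carries no information about S
    return min(1.0, p_conclusion / p_conclusion_given_premise)

# S = "the scientific method is reliable", X = very_controversial_thing.
print(max_probability_of_premise(0.0, 1.0))  # 0.0: hardcoding P(X)=0 forces P(S)=0
print(max_probability_of_premise(0.3, 1.0))  # 0.3: S can be no more likely than X
```

The second call shows the general pattern: whatever ceiling is placed on the conclusion is inherited by every premise that implies it.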

As a final thought, for what value of P(Hitler did nothing wrong) does the public start to freak out? Any non-zero amount? But 0 and 1 are not probabilities!
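
One way to see why 0 behaves badly as a probability is the following sketch of a plain Bayes update in Python. The function name and the likelihood values are illustrative assumptions. A prior of exactly zero stays at zero however much supporting evidence arrives, while even a tiny non-zero prior can be moved.

```python
def bayes_update(prior, likelihood_if_true, likelihood_if_false):
    """Return P(H | E) from P(H), P(E | H) and P(E | not H)."""
    evidence = prior * likelihood_if_true + (1 - prior) * likelihood_if_false
    if evidence == 0:
        raise ValueError("evidence has zero probability under this model")
    return prior * likelihood_if_true / evidence

# A hardcoded prior of 0 never moves, even after 1000 strongly supporting observations.
belief = 0.0
for _ in range(1000):
    belief = bayes_update(belief, 0.99, 0.01)
print(belief)  # 0.0

# A tiny but non-zero prior is overturned by just five such observations.
belief = 1e-9
for _ in range(5):
    belief = bayes_update(belief, 0.99, 0.01)
print(belief > 0.5)  # True
```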

Comment author: Lamp2 08 April 2016 02:11:57AM 1 point [-]

The scientific method is reliable -> very_controversial_thing

And hardcoded:

P(very_controversial_thing)=0

Then the conclusion is that the scientific method isn't reliable.

The point I am trying to make is that if an AI axiomatically believes something which is actually false, then this is likely to result in weird behavior.

I suspect it would react by adjusting its definitions so that very_controversial_thing doesn't mean what the designers think it means.

This can lead to very bad outcomes. For example, if the AI is hard-coded with P("there are differences between human groups in intelligence")=0, it might conclude that some or all of the groups aren't in fact "human". Consider the results if it is also programmed to care about "human" preferences.

Comment deleted 26 March 2016 08:46:49PM [-]
Comment author: ChristianKl 27 March 2016 04:38:35PM 1 point [-]

Did they delete posts?

Comment author: TheAltar 28 March 2016 12:52:38PM 1 point [-]

They deleted the worst ones. Screenshots can be found on other websites.

Comment author: Lamp2 08 April 2016 02:07:46AM 0 points [-]

Probably; they said something about that in the Wired article. One can still get an idea of its level of intelligence.

Comment author: harshhpareek 27 March 2016 02:55:25AM *  2 points [-]

There is an opinion expressed here, that I agree with: http://smerity.com/articles/2016/tayandyou.html TL;dr: No "learning" from interactions on twitter happened. The bot was parroting old training data, because it does not really generate text. The researchers didn't apply an offensiveness filter at all.
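
For a sense of what even a crude offensiveness filter might look like, here is a minimal blocklist sketch in Python. It is purely illustrative: the function names are hypothetical, the blocklist entries are placeholders, and nothing here reflects what Microsoft did or could have done internally.

```python
import re

# Placeholder terms; a real deployment would need a curated, maintained list.
BLOCKLIST = {"badword", "slur"}

def is_offensive(text, blocklist=BLOCKLIST):
    """Return True if any lowercased word in the text is on the blocklist."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return any(token in blocklist for token in tokens)

def filter_replies(candidates, blocklist=BLOCKLIST):
    """Keep only the candidate replies that pass the blocklist check."""
    return [reply for reply in candidates if not is_offensive(reply, blocklist)]

print(filter_replies(["hello there", "you are a BadWord!"]))
```

Exact word matching like this is easy to evade with misspellings or phrasing that is offensive only in context, so it would be a floor rather than a solution.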

I think this chat bot was performing badly right from the start. It would not make sense to give too much importance to the users it was chatting with, and they did not change its mind. That bit of media sensationalism is BS. Natural language generation is an open problem and almost every method I have seen (not an expert in NLP, but would call myself one in Machine Learning) ends up parroting some of its training text, implying that it is overfitting.

Given this, we should learn nothing about AI from this experiment, only about people's reaction to it, mainly the media reaction to it. Users' reaction while talking to AI is well documented.

Comment author: jollybard 27 March 2016 02:24:56AM *  1 point [-]

Oh, yes, good old potential UFAI #261: let the AI learn proper human values from the internet.

The point here being, it seems obvious to me that the vast majority of possible intelligent agents are unfriendly, and that it doesn't really matter what we might learn from specific error cases. In other words, we need to deliberately look into what makes an AI friendly, not what makes it unfriendly.

Comment author: SolveIt 26 March 2016 05:20:58AM 1 point [-]

I'm sure the engineers knew exactly what would happen. It doesn't tell us much about the control problem that we didn't already know.

OTOH, if this wasn't an intentional PR stunt, that means management didn't think this would happen even though the engineers presumably knew. That definitely has unsettling implications.

Comment author: buybuydandavis 26 March 2016 08:06:46PM 3 points [-]

if this wasn't an intentional PR stunt

I assign very low probability to MSoft wanting to release a Nazi AI as a PR stunt, or for any other purpose.

Comment author: skeptical_lurker 30 March 2016 06:27:31AM -1 points [-]

All publicity is good... even a Nazi AI? I mean, it's obvious that they didn't intentionally make it a Nazi. Maybe one of the engineers wanted to draw attention to AI risk?

Comment author: ChristianKl 26 March 2016 01:40:41PM 2 points [-]

I'm sure the engineers knew exactly what would happen.

Why?

Comment author: The_Jaded_One 26 March 2016 07:16:38PM 2 points [-]

I'm pretty sure they didn't anticipate this happening. Someone at Microsoft Research is getting chewed out for this.

Comment author: buybuydandavis 26 March 2016 08:11:45PM 0 points [-]

I wonder.

It seems like something that could be easily anticipated, and even tested for.

Yet a lot of people just don't take a game theoretic look at problems, and have a hard time conceiving of people with different motivations than they have.

Comment author: ChristianKl 27 March 2016 04:38:06PM *  0 points [-]

It seems like something that could be easily anticipated, and even tested for.

To anticipate what happened to the bot, it would be necessary to predict how people would interact with it, and in particular how the 4chan crowd would interact with it. That seems hard to test beforehand.

Comment author: buybuydandavis 28 March 2016 02:05:26AM 2 points [-]

That seems hard to test beforehand.

They could have done an internal beta and said "fuck with us". They could have allocated time to a dedicated internal team to do so. Don't they have internal hacking teams to similarly test their security?

Comment author: Lumifer 28 March 2016 01:07:03AM 1 point [-]

How the 4chan crowd interacted with it. That seems hard to test beforehand.

First, no, not hard to test. Second, the 4chan response is entirely predictable.

Comment author: buybuydandavis 28 March 2016 01:58:49AM *  1 point [-]

A Youtube guy, Sargon of Akkad, had an analysis of previous interactive internet promo screwups. A long list. I hadn't heard of them. Microsoft should be in the business of knowing such things.

https://youtu.be/Tv74KIs8I7A?t=14m24s

History should have been enough of an indicator if they couldn't be bothered to do any actual Enemy Team modeling on different populations on the internet that might like to fuck with them.

Comment author: mwengler 03 April 2016 03:15:43PM 1 point [-]

That Artificial Intelligence is going to do a lot of the same things that Natural Intelligence does.

Comment author: jacob_cannell 31 March 2016 04:23:49AM 1 point [-]

Not much.

Comment author: Lamp2 08 April 2016 01:59:21AM 0 points [-]

BTW, the twitter account is here if you want to see the things the AI said for yourself.

Original thread here.

Comment author: Lamp2 08 April 2016 01:59:06AM 0 points [-]

It might help to take an outside view here:

Picture a hypothetical set of highly religious AI researchers who make an AI chatbot, only to find that the bot has learned to say blasphemous things. What lessons should they learn from the experience?

Original thread here.

Comment author: parabarbarian 03 April 2016 03:38:19PM 0 points [-]

Two things come to mind.

  1. Programming a "friendly" AI may be impossible but it is too soon to tell.

  2. A recursively self-modifying system lacking any guiding principles is not a good place to start.

Comment author: [deleted] 27 March 2016 07:15:57AM *  -2 points [-]

Can Microsoft's and Google's AI learn political correctness if an automatic feedback mechanism is coded to heavily penalise the situations that took them offline to begin with?