drnickbone comments on The flawed Turing test: language, understanding, and partial p-zombies - Less Wrong

Post author: Stuart_Armstrong 17 May 2013 02:02PM


Comment author: drnickbone 17 May 2013 05:46:08PM 10 points [-]

The best way to avoid this is to create more varied analogues of the Turing test - and to keep them secret. Just as you keep the training set and the test set distinct in machine learning, you want to confront the putative AIs with quasi-Turing tests that their designers will not have encountered or planned for.
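The train/test discipline alluded to above can be sketched in a few lines. This is an illustrative toy example, not anything from the discussion: the point is simply that a model is tuned on one portion of the data and evaluated only on examples it has never seen, so it cannot pass by memorising the answers - just as a secret quasi-Turing test denies the AI's designers the chance to prepare.

```python
import random

def train_test_split(data, test_fraction=0.25, seed=0):
    """Shuffle and partition `data` into disjoint train and test sets."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

# A hypothetical dataset of 100 distinct examples.
examples = list(range(100))
train, test = train_test_split(examples)

# The two sets never overlap: the evaluation questions are "secret"
# with respect to whatever the model was fitted on.
assert not set(train) & set(test)
```

A model whose score on `test` matches its score on `train` has generalised; a model that only scores well on `train` has merely memorised - the same distinction the comment wants a Turing test to draw.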

But aren't these just instances of the Turing test? As the judge, you're allowed to ask any questions you like to try to distinguish the AI program from the human contestant, including novel and unexpected questions that the entrants have not had any chance to prepare for. At some point you will completely flummox the AI, but you will also flummox the human too. The interesting question then is whether you can tell the difference, i.e. whether the AI will behave in a near-human way when trying to cope with a baffling and completely unexpected problem. If it does, that is a real sign of intelligence, is it not?

Comment author: Stuart_Armstrong 17 May 2013 05:54:07PM 1 point [-]

I mean move out of whatever format was agreed on. Move away from text-based systems (a truly smart AI could download voice software if it had to - make sure it has time to do so). Unilaterally extend the deadline. Offer side deals or bets with real money (which a smart AI could acquire or pretend to have). Insist the subject create videos on specific themes.

Do stuff you're not supposed/expected to do.

Comment author: Viliam_Bur 17 May 2013 07:50:55PM 4 points [-]

There is a risk of asking more from the AI than a human could deliver.

Imagine a mute human, randomly taken from the street, who has to download voice software and use it to communicate with the judge, without making the judge suspect that they are the AI. How much chance of success would they have? Similarly, how many people would lose bets? Etc.

On the other hand, if we prefer to err on the side of underestimating the AI, but want to avoid overestimating it, then the more difficult the task, the better, even if some humans couldn't solve it. But then... why not simply give the AI the task of convincing humans that it is intelligent, without any further rules?

Comment author: Stuart_Armstrong 17 May 2013 08:14:33PM 0 points [-]

Let's contrast two situations:

1) We build a whole brain emulation, uploading from a particular brain. We subject that WBE to a Turing test, which it passes. Is it conscious? I'd argue yes: even without a definition of consciousness, we must still grant it to the WBE if we grant it to humans.

2) Same thing, but instead of the WBE, we have a de novo computer system designed specifically to pass the Turing test via mass data crunching. I'd say we now need more proof that this system is conscious.

Why the difference? In the first case the WBE was optimised for being an accurate representation of a brain. So if it passes the Turing test, then it probably is an accurate representation, as it is hard to conceive of a very flawed brain representation that also passes that test.

In the second case, the system was optimised for passing the test only. So it is very possible to conceive it passing the test, but not having the other attributes of consciousness or intelligence. So our tests have to be more rigorous in the second case.

Not that I've got particularly good ideas how to do this! I just note that it needs to be done. Maybe "long" Turing tests (6 months or more) would be enough. Or maybe we'll need to disconnect the AI from the internet (perhaps give it a small video feed of some popular TV shows - but only give it info at human bandwidth), wait for human society to evolve a bit, and test the AI on concepts that weren't available when it was disconnected.

The form of the AI is also relevant - if it's optimised for something else, then passing the Turing test is a much stronger indication.

Comment author: Bugmaster 17 May 2013 08:59:47PM 1 point [-]

So it is very possible to conceive it passing the test, but not having the other attributes of consciousness or intelligence.

What are these other attributes, as distinct from the attributes it would need to pass the Turing Test ?

Sure, you could ask it to make videos of itself skating or whatever, but a WBE wouldn't be able to do that, either (seeing as it doesn't have a body to skate with). Does it mean they both fail ?

Comment author: bogdanb 17 May 2013 11:03:17PM *  1 point [-]

you could ask it to make videos of itself skating or whatever

I don't think he meant it that way. I read it as "make a video montage of a meme" or the like. The point being that such a task exercises more elements of "human intelligence" than just chatting: lexical and visual metaphors, perception of vision and movement, (at least a little bit of) imagination, planning and execution of a technical task, (presumably) using other software purposefully, etc. It is much harder to plan for and "fake" (whatever that means) all of that than to "fake" a text-only test with chat-bot techniques.

Of course, a real person who is blind, for instance, might not be able to do that particular task, but he would be able to justify that by being convincingly blind in the rest of the interaction, and would be able to perform something analogous in other domains. (Music, or even reciting something emphatically, or perhaps some tactile task that someone familiar with being blind might imagine.) The point, I think, is not to tie the test to a particular sense or the body, but just to get a higher-bandwidth channel for testing, one that would be so hard to fake in close to real time that you'd pretty much have to be smarter to do it.

Testing for consciousness seems to be so hard that text chat is not enough (or at least we're close to being better at faking it than testing for it), so I guess Stuart suggests we take advantage of the "in-built optimizations" that let us do stuff like fake and detect accents, or infer distances from differences in apparent height (but in some contexts, status or other things). Things that we don't yet fake well, and even when we do, it's hard to mix and integrate them all.

Comment author: Bugmaster 18 May 2013 12:23:30AM 0 points [-]

I read it as "make a video montage of a meme" or the like.

If you told me personally to do that, I may not pass the test, either. And I personally know several humans who cannot, f.ex., "use other software purposefully". I think these kinds of challenges are a form of scope creep. We are not trying to test whether the AI is a polymath, just whether it's human or not.

It is much harder to plan for and "fake" (whatever that means) all of that than to "fake" a text-only test with chat-bot techniques.

I disagree; that is, while I agree that participating in many types of interactions is more difficult than participating in a single type of interaction, I disagree that this degree of difficulty is important.

As I said before, in order to hold an engaging conversation with a human through "fakery", the AI would have to "fake" human-level intelligence. Sure, it could try to steer the conversation toward its own area of expertise -- but firstly, this is what real humans do as well, and secondly, it would still have to do so convincingly, knowing full well that its interlocutor may refuse to be steered. I simply don't know of a way to perfectly "fake" this level of intelligence without actually being intelligent.

You speak of "higher bandwidth channels for testing", but consider the fact that there are several humans in existence today, at this very moment, whose interaction with you consists entirely of text. Do you accept that they are, in fact, human ? If so, then what's the difference between them and (hypothetical) Turing-grade AIs ?

Comment author: Jiro 18 May 2013 03:14:32PM *  2 points [-]

If you told me personally to do that, I may not pass the test, either. And I personally know several humans who cannot, f.ex., "use other software purposefully". I think these kinds of challenges are a form of scope creep. We are not trying to test whether the AI is a polymath, just whether it's human or not.

I don't believe it's scope creep at all. The requirement isn't really "make a video". The requirement is "be able to do some of the things in the category 'human activities that are hard to automate'". Making a video is a specific item in the category, and the test is not to see that someone can do any specific item in the category, just that they can do some of them. If the human questioner gets told "I don't know how to make a video", he's not going to say "okay, you're a computer", he's going to ask "okay, then how about you do this instead?", picking another item from the category.

(Note that the human is able to ask the subject to do another item from the category without the human questioner being able to list all the items in the category in advance.)

Comment author: Bugmaster 18 May 2013 09:02:40PM 1 point [-]

The requirement is "be able to do some of the things in the category 'human activities that are hard to automate'"

That is starting to sound like a "Turing Test of the gaps".

"Chatting online is really hard to automate, let's test for that.
Ok, we've automated chatting, let's test for musical composition, instead.
Ok, looks like there are AIs that can do that. Let's test it for calculus..."

Comment author: Stuart_Armstrong 20 May 2013 09:30:11AM 0 points [-]

My tests would be: have a chatterbot do calculus. Have a musical bot chat. Have a calculus bot do music.

To test for general intelligence, you can't test on the specific skill the bot's trained in.

Comment author: Jiro 19 May 2013 02:59:42AM 0 points [-]

I'm not sure what that criticism is trying to say.

Assuming it's an analogy with the god of the gaps, you might be saying that if the computer can pass the test the questioner can always pick a new requirement that he knows the computer can't pass.

If this is what you are saying, then it's wrong because of the flip side of the previous argument: just like the test doesn't check to see if the subject succeeds in any specific item, it also doesn't check to see if the subject fails in any specific item. In order for a computer to fail the test because of inability to do something, it has to show a pattern of inability that is different from the pattern of inability that a human would show. The questioner can't just say "well, computers aren't omniscient, so I know there's something the computer will fail at", pick that, and automatically fail the computer--you don't fail because you failed one item.