Viliam_Bur comments on The flawed Turing test: language, understanding, and partial p-zombies - Less Wrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
I mean move out of whatever format was agreed on. Move away from text-based systems (a truly smart AI could download voice software if it had to - make sure it has time to do so). Unilaterally extend the deadline. Offer side deals or bets with real money (which a smart AI could acquire or pretend to have). Insist the subject create videos on specific themes.
Do stuff you're not supposed/expected to do.
There is a risk of asking more from the AI than a human could deliver.
Imagine a mute human, randomly taken from the street, who has to download voice software and use it to communicate with the judge, without making the judge suspect that they are the AI. What chance of success would they have? Similarly, how many people would lose bets? Etc.
On the other hand, if we prefer to err on the side of underestimating the AI rather than overestimating it, then the more difficult the task, the better, even if some humans couldn't solve it. But then... why not simply give the AI the task of convincing humans that it is intelligent, without any further rules?
Let's contrast two situations:
1) We build a whole brain emulation, uploaded from a particular brain. We subject that WBE to a Turing test, which it passes. Is it conscious? I'd argue yes: even without a definition of consciousness, we must still grant it to the WBE if we grant it to humans.
2) Same thing, but instead of the WBE, we have a de novo computer system designed specifically to pass the Turing test via mass data crunching. I'd say we now need more proof that this system is conscious.
Why the difference? In the first case the WBE was optimised for being an accurate representation of a brain. So if it passes the Turing test, then it probably is an accurate representation, as it is hard to conceive of a very flawed brain representation that also passes that test.
In the second case, the system was optimised for passing the test only. So it is very possible to conceive it passing the test, but not having the other attributes of consciousness or intelligence. So our tests have to be more rigorous in the second case.
Not that I've got particularly good ideas how to do this! I just note that it needs to be done. Maybe "long" Turing tests (6 months or more) might be enough. Or maybe we'll need to disconnect the AI from the internet (maybe give it a small video feed of some popular TV shows - but only give it info at human-bandwidth), wait for human society to evolve a bit, and test the AI on concepts that weren't available when it was disconnected.
The form of the AI is also relevant - if it's optimised for something else, then passing the Turing test is a much stronger indication.
What are these other attributes, as distinct from the attributes it would need to pass the Turing Test?
Sure, you could ask it to make videos of itself skating or whatever, but a WBE wouldn't be able to do that, either (seeing as it doesn't have a body to skate with). Does it mean they both fail?
I don't think he meant it that way. I read it as "make a video montage of a meme" or the like. The point being that such a task exercises more elements of "human intelligence" than just chatting, like lexical and visual metaphors, perception of vision and movement, (at least a little bit of) imagination, planning and execution of a technical task, (presumably) using other software purposefully, etc. It is much harder to plan for and "fake" (whatever that means) all of that than to "fake" a text-only test with chat-bot techniques.
Of course, a blind (for instance) real man might not be able to do that particular task, but he will be able to justify that by being convincingly blind in the rest, and would be able to perform something analogous in other domains. (Music or even reciting something emphatically, or perhaps some tactile task that someone familiar with being blind might imagine.) The point I think is not to tie it to a particular sense or the body, but just to get a higher bandwidth channel for testing, one that would be so hard to fake in close to real time that you'd pretty much have to be smarter to do it.
Testing for consciousness seems to be so hard that text chat is not enough (or at least we're close to being better at faking it than at testing for it), so I guess Stuart suggests we take advantage of the "in-built optimizations" that let us do stuff like fake and detect accents, or infer distances from differences in apparent height (or, in some contexts, status or other things). Things that we don't yet fake well, and even when we do, it's hard to mix and integrate them all.
If you told me personally to do that, I may not pass the test, either. And I personally know several humans who cannot, f.ex., "use other software purposefully". I think these kinds of challenges are a form of scope creep. We are not trying to test whether the AI is a polymath, just whether it's human or not.
I disagree; that is, while I agree that participating in many types of interactions is more difficult than participating in a single type of interaction, I disagree that this degree of difficulty is important.
As I said before, in order to hold an engaging conversation with a human through "fakery", the AI would have to "fake" human-level intelligence. Sure, it could try to steer the conversation toward its own area of expertise -- but firstly, this is what real humans do as well, and secondly, it would still have to do so convincingly, knowing full well that its interlocutor may refuse to be steered. I simply don't know of a way to perfectly "fake" this level of intelligence without actually being intelligent.
You speak of "higher bandwidth channels for testing", but consider the fact that there are several humans in existence today, at this very moment, whose interaction with you consists entirely of text. Do you accept that they are, in fact, human? If so, then what's the difference between them and (hypothetical) Turing-grade AIs?
I don't believe it's scope creep at all. The requirement isn't really "make a video". The requirement is "be able to do some of the things in the category 'human activities that are hard to automate'". Making a video is a specific item in the category, and the test is not to see that someone can do any specific item in the category, just that they can do some of them. If the human questioner gets told "I don't know how to make a video", he's not going to say "okay, you're a computer", he's going to ask "okay, then how about you do this instead?", picking another item from the category.
(Note that the human is able to ask the subject to do another item from the category without the human questioner being able to list all the items in the category in advance.)
That is starting to sound like a "Turing Test of the gaps".
"Chatting online is really hard to automate, let's test for that.
Ok, we've automated chatting, let's test for musical composition, instead.
Ok, looks like there are AIs that can do that. Let's test it for calculus..."
My tests would be: have a chatterbot do calculus. Have a musical bot chat. Have a calculus bot do music.
To test for general intelligence, you can't test on the specific skill the bot's trained in.
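The proposal above amounts to a hold-out protocol: score each bot only on domains other than the one it was trained for. A minimal sketch of that idea (all function names, tasks, and bots here are illustrative, not from the thread):

```python
# Hypothetical sketch of the cross-domain test proposed above:
# a bot is never scored on its own specialty, only on the
# other domains. All names and toy tasks are illustrative.

def cross_domain_scores(bots, tasks):
    """bots: {name: (trained_domain, solve_fn)};
    tasks: {domain: list of (prompt, check_fn) pairs}."""
    results = {}
    for name, (trained_domain, solve) in bots.items():
        passed = total = 0
        for domain, items in tasks.items():
            if domain == trained_domain:
                continue  # skip the skill the bot was trained on
            for prompt, check in items:
                total += 1
                passed += bool(check(solve(domain, prompt)))
        results[name] = passed / total if total else 0.0
    return results

# Toy usage: a "chatterbot" that merely echoes its prompt is
# scored only on the calculus task, which it fails.
tasks = {
    "chat": [("say hello", lambda r: "hello" in r.lower())],
    "calculus": [("d/dx x^2 at x=3", lambda r: r.strip() == "6")],
}
bots = {"chatterbot": ("chat", lambda dom, p: p)}  # echoes the prompt
print(cross_domain_scores(bots, tasks))  # → {'chatterbot': 0.0}
```

The design choice matches the comment's point: a specialist can look impressive inside its training domain, so only out-of-domain performance is evidence of general intelligence.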
Try to teach the competitor to do some things that make sense to humans and some things that do not make sense to humans, from wildly different fields. If the competitor seems to be confused by things which are confusing to people and learns things which are not confusing, it is more likely to be thinking instead of parroting.
For example, you could explain why no consistent logical system can trust itself, and then ask the competitor if they think their way of thinking is consistent; if they think it isn't, ask them if they think that they could prove literally anything using their way of thinking. If they think it is, ask them if they would believe everything that they can prove to be true.
Thinking entities will tend to believe that they can't prove things which are false, and thus that everything that they can prove is true. Calculating entities run into trouble with those concepts.
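For reference, the "no consistent system can trust itself" step is Gödel's second incompleteness theorem, and the follow-up question ("would you believe everything you can prove?") runs into Löb's theorem. In standard notation, for a sufficiently strong consistent theory $T$ with provability predicate $\Box_T$:

```latex
% Gödel's second incompleteness theorem: a consistent T
% cannot prove its own consistency.
T \nvdash \mathrm{Con}(T), \qquad \mathrm{Con}(T) \equiv \neg \Box_T \bot

% Löb's theorem: T proves "if P is provable then P" only
% in the cases where T already proves P itself.
T \vdash (\Box_T P \rightarrow P) \;\Longrightarrow\; T \vdash P
```

So a reasoner that claims both consistency and "I believe everything I can prove" is asserting something it cannot coherently establish from the inside, which is what the questioning above probes for.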
Less meta, one could explain the magical thinking expressed in The Secret and ask why some people believe it and others don't, along with asking why the competitor does or doesn't.
I think we might have different definitions of what "general intelligence" is. I thought it meant something like, "being able to solve novel problems in some domain"; in this case, our domain is "human conversation". I may be willing to extend the definition to say, "...and also possessing the capacity to learn how to solve problems in some number of other domains".
Your definition, though, seems to involve solving problems in any domain. I think this definition is too broad. No human is capable of doing everything; and most humans are only good at a small number of things. An average mathematician can't compose music. An average musician can't do calculus. Some musicians can learn calculus (given enough time and motivation), but others cannot. Some mathematicians can learn to paint; others cannot.
Perhaps you mean to say that humans are not generally intelligent, and neither are AIs who pass the Turing Test? In this case, I might agree with you.
I'm not sure what that criticism is trying to say.
Assuming it's an analogy with the god of the gaps, you might be saying that if the computer can pass the test the questioner can always pick a new requirement that he knows the computer can't pass.
If this is what you are saying, then it's wrong because of the flip side of the previous argument: just like the test doesn't check to see if the subject succeeds in any specific item, it also doesn't check to see if the subject fails in any specific item. In order for a computer to fail the test because of inability to do something, it has to show a pattern of inability that is different from the pattern of inability that a human would show. The questioner can't just say "well, computers aren't omniscient, so I know there's something the computer will fail at", pick that, and automatically fail the computer--you don't fail because you failed one item.
Yep, this is it. I see no reason why we should hold computers to a higher standard than we do our fellow humans.
I'm not sure what kind of a "pattern of inability" an average human would show; I'm not even convinced that such a pattern exists in a non-trivial sense (f.ex., the average human surely cannot fly by will alone, but it would be silly to test for that).
All the tests that were proposed so far, such as "create a video" or "compose a piece of music" target a relatively small subset of humans who are capable of such tasks. Thus, we would in fact expect the average human to say, "sorry, I have no ear for music" or something of the sort -- which is also exactly what an AI would say (unless it was actually capable of the task, of course). Many humans would attempt the task but fail; the AI could do that, too (by design or by accident). So, the new tests don't really tell you much.