So the Turing test has been "passed", and the general consensus is that this was achieved in a very unimpressive way - the 13-year-old Ukrainian persona was a cheat, the judges were incompetent, and so on. These criticisms are all true, though the result did satisfy Turing's original criteria - and there are far more people willing to be dismissive of those criteria in retrospect than there were in advance. It happened about 14 years later than Turing had anticipated, which makes it quite a good prediction for 1950 (in my personal view, Turing made two compensating mistakes: the "average interrogator" was a much lower bar than he thought, but progress in the field was much slower than he expected).
But anyway, the main goal now, as suggested by Toby Ord and others, is to design a better Turing test: one that gives AI designers something to aim at and that would be a meaningful test of ability. The aim is to ensure that if a program passes these new tests, we won't be dismissive of how it was achieved.
Here are a few suggestions I've heard about or thought about recently; can people suggest more and better ideas?
- Use proper control groups. 30% of judges thinking that a program is human is meaningless unless the judges also compare it with actual humans. Pair up a human subject with a program, and make the judge's task to establish which of the two subjects is the human (a sketch of how such paired trials might be scored follows this list).
- Toss out the persona tricks - no 13-year-olds, nobody with poor English skills. It was informative about human psychology that these tricks work, but we shouldn't allow them in future. All human subjects will have adequate English and typing skills.
- On that subject, make sure the judges and subjects are properly motivated (financial rewards, prizes, prestige...) to detect humans or to appear human. We should also brief judges that the usual conversational approach - establishing which kind of human they are dealing with - is not useful for determining whether they are dealing with a human at all.
- Use only elite judges. For instance, if Scott Aaronson can't figure it out, the program must have some competence.
- Make a collection of generally applicable approaches available to the judges, such as Winograd schemas (e.g. "The trophy doesn't fit in the brown suitcase because it's too big - what is too big?", where resolving the pronoun requires commonsense knowledge rather than pattern matching), while emphasising that they will have to come up with their own exact sentences, since anything already online could have been used to optimise the program.
- My favourite approach is to test the program on a task it was not optimised for. A cheap and easy way of doing that would be to test it on novel ASCII art.
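To make the control-group idea in the first bullet concrete, here is a minimal sketch of how paired trials might be scored, assuming one judge verdict per (human, program) pair. The trial counts and function names are illustrative assumptions, not data from any real contest; the point is just that "30% of judges fooled" only means something relative to how often judges misidentify the paired human, i.e. relative to chance.

```python
# Minimal sketch of scoring paired Turing-test trials against a control group.
# Each trial pairs one human subject with one program; the judge must say which
# of the two is the human. Under the null hypothesis that the program is
# indistinguishable from a human, judges guess correctly ~50% of the time.

from math import comb

def binomial_p_value(correct: int, trials: int, chance: float = 0.5) -> float:
    """One-sided probability of at least `correct` successes under pure guessing."""
    return sum(
        comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
        for k in range(correct, trials + 1)
    )

def program_passes(correct: int, trials: int, alpha: float = 0.05) -> bool:
    """The program 'passes' the paired test if judges cannot reliably pick out
    the human, i.e. their accuracy is not significantly above chance."""
    return binomial_p_value(correct, trials) >= alpha

if __name__ == "__main__":
    # Hypothetical outcome: 30 paired trials, judges identified the human 21 times.
    trials, correct = 30, 21
    print(f"one-sided p-value: {binomial_p_value(correct, trials):.3f}")
    print("program passes paired test:", program_passes(correct, trials))
```

Scoring against the paired human rather than an absolute "30% fooled" threshold is the whole point of the control group.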
My current method would be the lazy one of simply typing this, then waiting, arms folded:
"If you want to prove you're human, simply do nothing for 4 minutes, then re-type this sentence I've just written here, skipping one word out of 2".
I think we'll see (arguably have already seen) AI changing the world before we see a general AI passing the Turing test. But I don't think that makes the Turing test useless, or a red herring.
Narrow AI is plenty powerful. It drives cars, flies military drones, runs short-term trading systems, and plays chess, and it does (or will shortly do) all of these better than the best humans in their domains. Right now that hasn't dramatically changed the world, but I don't think it's too much of a stretch to imagine a world transformed by narrow AI applications.
But there are still things the Turing test or a successor would be useful for. For one thing, as AI techniques advance, I expect the line between narrow and general AI to blur. I can't rule out purpose-built AGI arriving before that blurring becomes significant, but assuming it doesn't make the question moot, the Turing test serves as a pretty good marker of generality: if your trading system (which scrapes Reuters for headlines and does some sophisticated NLP and concept-mapping work you're pretty proud of) starts asking you hilariously bizarre questions about business ethics, you're probably well on your way to dealing with something that can no longer be described as narrow AI. If it starts asking you good questions about business ethics... well, you're probably very lucky.
Less significantly from an AGI perspective, but still interestingly, there are a bunch of semi-narrow AI applications that focus tightly on interaction with humans. Siri, Google Now, and Cortana are probably the most salient examples right now, along with all those godawful customer-service phone systems; we could also imagine things like automated hotel concierges or caretakers for the elderly. The Turing test is an excellent benchmark for their performance; I no longer think we can take a pass as evidence of strong general intelligence, but humanlike responses are so useful in these roles that it's still a good thing to shoot for. A successor test in this role would give us a less gameable objective.
That argues that any sufficiently general system could pass the Turing test. But maybe it's really impossible to pass the test without investing a lot of 'narrow' resources in that specific goal. Even if an AGI could self-modify to pass for human, it would not bother unless that were an instrumental goal (e.g. to trick humans), at which point it's probably too late for you from an FAI viewpoint.
We should be able to recognize a powerful, smart, general intelligence without requiring that it be good at pretending to be human.