The Turing test serves as a pretty good marker of generalizability.
That argues that any sufficiently general system could pass the Turing test. But maybe it's really impossible to pass the test without investing a lot of 'narrow' resources in that specific goal. Even if an AGI could self-modify to pass as human, it would not bother unless that were an instrumental goal (i.e., to trick humans), at which point it's probably too late for you from an FAI viewpoint.
We should be able to recognize a powerful, smart, general intelligence without requiring that it be good at pretending to be a completely different kind of powerful, smart, general intelligence, one that has a lot of social quirks and cues.
The Turing test is an excellent benchmark for their performance; I no longer think we can take a pass as evidence of strong general intelligence, but humanlike responses are so useful in these roles that I still think it's a good thing to shoot for.
Again, I don't think the Turing test is necessary in this example. Siri can fulfill every objective of its designers without being able to trick humans who really want to know if it's an AI or not. A robotic hotel concierge wants to make guests comfortable and serve their needs; there is no reason that should involve tricking them.
So the Turing test has been "passed", and the general consensus is that this was achieved in a very unimpressive way: the 13-year-old Ukrainian persona was a cheat, the judges were incompetent, etc. These are all true, though the program did meet Turing's original criteria, and far more people are willing to dismiss those criteria in retrospect than were in advance. It happened about 14 years later than Turing had anticipated, which makes it quite a good prediction for 1950. (In my personal view, Turing made two mistakes that cancelled each other out: the "average interrogator" was a much lower bar than he thought, but progress on the subject was much slower than he thought.)
But anyway, the main goal now, as suggested by Toby Ord and others, is to design a better Turing test: one that gives AI designers something to aim at and that would be a meaningful test of abilities. The aim is to ensure that if a program passes these new tests, we won't be dismissive of how it was achieved.
Here are a few suggestions I've heard about or thought about recently; can people suggest more and better ideas?
My current method would be the lazy one of simply typing this, then waiting, arms folded:
"If you want to prove you're human, simply do nothing for 4 minutes, then re-type this sentence I've just written here, skipping one word out of 2".
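To make the grading of that prompt concrete, here is a rough sketch of how a judge might check a reply. Everything in it is an assumption on my part: the function names are hypothetical, the 4 minutes become a fixed 240-second threshold, words are split on whitespace (so punctuation stays attached to its word), and since "skipping one word out of 2" could mean dropping either the odd or the even words, the checker accepts both readings.

```python
def expected_reply(sentence: str, keep_first: bool = True) -> str:
    """Return the sentence with one word out of every two dropped.

    Assumption: keep_first=True keeps words 1, 3, 5, ...;
    keep_first=False keeps words 2, 4, 6, ... (the other reading).
    Words are split on whitespace, so punctuation stays attached.
    """
    words = sentence.split()
    start = 0 if keep_first else 1
    return " ".join(words[start::2])


def passes_lazy_test(sentence: str, reply: str, seconds_waited: float,
                     min_wait: float = 240.0) -> bool:
    """Hypothetical checker: the reply must arrive after at least
    min_wait seconds (4 minutes by default) and match either reading
    of the 'skip one word out of 2' instruction."""
    if seconds_waited < min_wait:
        return False
    # Collapse runs of whitespace so stray spaces don't cause a failure.
    normalized = " ".join(reply.split())
    return normalized in (expected_reply(sentence, True),
                          expected_reply(sentence, False))


if __name__ == "__main__":
    prompt = ("If you want to prove you're human, simply do nothing for "
              "4 minutes, then re-type this sentence I've just written "
              "here, skipping one word out of 2")
    answer = expected_reply(prompt)
    print(answer)
    print(passes_lazy_test(prompt, answer, seconds_waited=250))
```

The point of the sketch is that the check itself is trivial to automate; what makes the prompt interesting is that a chatbot tuned to keep the conversation flowing is likely to reply immediately or to comment on the instruction rather than follow it.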