It sounds like you're pointing out that people often overestimate the difficulty of passing the Turing Test. Is that what you mean to say?
Yes. I think the Turing Test is useful, but there are too many quite distinct tests all going by the name "Turing Test", and the details matter. Using college students as volunteers will lead to markedly different results than using a human drawn i.i.d. at random from anywhere on Earth.
As is so often the case, many disagreements I've seen boil down to (usually unrecognized) definitional squabbles. Without clarification, the statement "A Turing Test is a reasonable test for intelligence" just isn't well defined enough. Which Turing Test? Reasonable in terms of optimality, in terms of feasibility, or in what way? Intelligence in some LW "optimizing power above a certain threshold" sense (if so, what threshold?), or some other notion?
You thankfully narrowed it down to the specific version of the Turing Test I mentioned, but in truth I don't have only one concept of intelligence I find useful, in the sense that I can see various concepts of intelligence being useful in different contexts. I pay no special homage to "intelligence1" over "intelligence2". Concerning this discussion:
I think that human-level intelligence - and the Turing Test is invariably centered on humans as the benchmark - shouldn't be defined by educated gifted people, but by an average. An "average human" Turing Test being passed is surely interesting, not least from a historical perspective. However, it's not clear whether such an algorithm would be powerful enough to foom, or to do many theoretically interesting tasks. Many less privileged humans can't do that many interesting tasks better than machines, apart from recognizing tanks and cats in pictures.
So should we instead focus on a Turing Test at the level of fooling the best researchers into believing the AGI is a fellow AI researcher? Maybe, although if we had a "winner", we'd probably know just by looking out the window, before we even set up the test (or we'd know by the AI looking in ...).
All considered, I'd focus on a Turing Test which can fool average humans in the civilized world, which seems to be the lowest Turing Test level at which such a chatbot would have a transformative influence on social human interactions.
Thank you, that was a fantastic answer to my questions (and more)!
It's been a productive conversation on my post criticising the Turing test. I claimed that I wouldn't take the Turing test as definitive evidence of general intelligence if the agent was specifically optimised on the test. I was challenged as to whether I had a different definition of thinking than "able to pass the Turing test". As a consequence of that exchange, I think I do.
Truly general intelligence is impossible, because of various "no free lunch" theorems, which demonstrate that no algorithm can perform well in every environment (intuitively, this makes sense: a smarter being could always design an environment that specifically penalises a particular algorithm). Nevertheless, we have the intuitive definition of a general intelligence as one that performs well in most (or almost all) environments.
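The adversarial intuition in the parenthesis can be made concrete with a toy sketch (my own illustration, not the formal no-free-lunch theorems): for any deterministic next-bit predictor, an environment can be constructed on which it is wrong at every single step.

```python
# Toy illustration: given any deterministic predictor of the next bit,
# an adversarial environment can always output the opposite bit.

def adversarial_sequence(predictor, length):
    """Build a bit sequence that falsifies `predictor` at every step."""
    history = []
    for _ in range(length):
        guess = predictor(history)
        history.append(1 - guess)  # the environment picks the other bit
    return history

# Two plausible-looking predictors, each defeated by its own nemesis:
def majority_vote(history):
    return 1 if sum(history) * 2 > len(history) else 0

def repeat_last(history):
    return history[-1] if history else 0

for pred in (majority_vote, repeat_last):
    seq = adversarial_sequence(pred, 20)
    # Replay the predictions against the adversarial sequence:
    correct = sum(pred(seq[:i]) == seq[i] for i in range(len(seq)))
    print(pred.__name__, "correct predictions:", correct)  # always 0
```

Any fixed algorithm, however clever, scores zero against the environment built to penalise it; that is the sense in which no algorithm performs well everywhere.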
I'd like to reverse that definition, and define a general intelligence as one that doesn't perform stupidly in a novel environment. A small change of emphasis, but it gets to the heart of what the Turing test is meant to do, and why I questioned it. The idea of the Turing test is to catch the (putative) AGI performing stupidly. Since we can't test the AGI on every environment, the idea is to have the Turing test be as general as possible in potential. If you give me the questions in advance, I can certainly craft an algorithm that aces that test; similarly, you can construct an AGI that would ace any given Turing test. But since the space of reasonable conversations is combinatorially huge, and since the judge could potentially pick any element from within that, the AGI could not just have a narrow list of responses: it would have to be genuinely generally intelligent, so that it would not end up being stupid on the particular conversation it was in.
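The "combinatorially huge" claim is easy to check with a back-of-envelope count (the numbers below are my own, purely illustrative, and deliberately conservative):

```python
# Rough upper-bound sketch of the number of distinct conversations a
# judge could steer, under deliberately tiny assumed parameters.

vocabulary = 1_000       # far smaller than a real speaker's vocabulary
words_per_reply = 10     # short utterances only
replies_per_judge = 20   # a brief conversation

# Count of distinct word sequences per judge utterance:
sequences_per_reply = vocabulary ** words_per_reply        # 10^30
# Count of distinct conversations over the whole session:
conversations = sequences_per_reply ** replies_per_judge   # 10^600

print(f"~10^{len(str(conversations)) - 1} possible conversations")
```

Most of those sequences are not grammatical English, but even if only one in a trillion trillion were, the space would remain astronomically larger than any precomputed list of responses, which is the point of the argument above.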
That's the theory, anyway. But maybe the space of conversations isn't as vast as all that, especially if the AGI has some simple classification algorithms. Maybe the data on the internet today, combined with some reasonably cunning algorithms, can carry a conversation as well as a human. After all, we are generating examples of conversations by the millions every hour of every day.
Which is why I emphasised testing from outside the domain of competence of the AGI. You need to introduce it to a novel environment, and give it the possibility of being stupid. If the space of human conversations isn't large enough, you need to move to the much larger space of real-world problem solving - and pick something from it. It doesn't matter what it is, simply that you have the potential of picking anything. Hence only a general intelligence could be confident, in advance, of coping with it. That's why I emphasised not saying what your test was going to be, and changing the rules or even outright cheating: the fewer restrictions you place on the potential test, the more informative the actual test is.
A related question, of course, is whether humans are generally intelligent. Well, humans are stupid in a lot of domains. Human groups augmented by data and computing technology, and given enough time, are much more generally intelligent than individual humans. So general intelligence is a matter of degree, not a binary classification (though it might be nearly binary for some AGI designs). Thus whether you call humans generally intelligent is a matter of taste and emphasis.