I don't think this captures the fundamental nature of intelligence, and I think others are right to throw an error at the word "stupidly."
Suppose there is some cognitive faculty, which we'll call adaptability, which agents have. When presented with a novel environment (i.e. sense data, set of possible actions, and consequences of those actions if taken), more adaptable agents will more rapidly choose actions with positive consequences.
Suppose there is some other cognitive faculty, which we'll call knowledge, which agents also have. This is a characterization of the breadth of environments to which they have adapted, and how well they have adapted to them.
Designing an agent with specific knowledge requires adaptability on the part of the designer; designing an agent with high adaptability requires adaptability on the part of the agent. Your general criticism seems to be "an agent can become knowledgeable about human conversations carried out over text channels with little adaptability of its own, and thus that is not a good test of adaptability."
I would agree: a GLUT written in stone, which is not adaptable at all, could still contain all the knowledge necessary to pass the Turing test. An adaptable algorithm could pass the Turing test, but only after consuming a sample set containing thousands of conversations and millions of words and then participating in those conversations itself. After all, that's how I learned to speak English.
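To make the distinction concrete, here is a minimal Python sketch, not a model of any real system. Everything in it is a hypothetical illustration: an "environment" is reduced to a mapping from observations to the one rewarded action, `GlutAgent` is a fixed lookup table, and `AdaptiveAgent` is a naive trial-and-error learner chosen only because it is short.

```python
from collections import Counter, defaultdict


class GlutAgent:
    """Fixed lookup table: all knowledge, no adaptability (hypothetical)."""

    def __init__(self, table):
        self.table = dict(table)

    def act(self, observation):
        # Returns None for any observation the table does not cover.
        return self.table.get(observation)

    def learn(self, observation, action, reward):
        pass  # written in stone: feedback changes nothing


class AdaptiveAgent:
    """Naive trial-and-error learner: no initial knowledge (hypothetical)."""

    def __init__(self, actions):
        self.actions = list(actions)
        self.totals = defaultdict(Counter)  # observation -> action -> summed reward

    def act(self, observation):
        scores = self.totals[observation]
        # Pick the action with the best record so far; ties go to the earliest action.
        return max(self.actions, key=lambda a: scores[a])

    def learn(self, observation, action, reward):
        self.totals[observation][action] += reward


def run(agent, environment, rounds):
    """Total reward: +1 for the rewarded action, -1 for anything else."""
    total = 0
    for _ in range(rounds):
        for observation, correct in environment.items():
            action = agent.act(observation)
            reward = 1 if action == correct else -1
            agent.learn(observation, action, reward)
            total += reward
    return total


familiar = {"hello": "greet", "bye": "wave"}
novel = {"ping": "pong", "knock": "answer"}
actions = ["greet", "wave", "pong", "answer"]
```

In this toy setup `run(GlutAgent(familiar), familiar, 10)` scores perfectly, while the same table run on `novel` fails every step and never improves. `AdaptiveAgent(actions)` starts out failing on `novel` but recovers after a few rounds of feedback, which is the sense of "adaptability" used above.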
Perhaps there is an optimal learner that we can compare agents against. But communication has finite information transfer, and the bandwidth varies significantly; the quality of the instruction (or the match between the instruction and the learner) should be part of the test. Even exploration is an environment where knowledge can help, especially if the exploration is in a field linked to reality. (Indeed, it's not clear that humans are adaptable to anything, and so the binary "adaptable or not?" makes as much sense as "intelligent or not?".)
These two faculties suggest different thresholds for AI: an AI can eat the jobs of knowledge workers once it has their knowledge, and an AGI can eat the job of creating knowledge workers once it has adaptability.
(Here I used two clusters of cognitive faculties, but I think the DIKW pyramid is also relevant.)
It's been a productive conversation on my post criticising the Turing test. I claimed that I wouldn't take the Turing test as definitive evidence of general intelligence if the agent was specifically optimised on the test. I was challenged as to whether I had a different definition of thinking than "able to pass the Turing test". As a consequence of that exchange, I think I do.
Truly general intelligence is impossible because of various "no free lunch" theorems, which demonstrate that no algorithm can perform well in every environment (intuitively, this makes sense: a smarter being could always design an environment that specifically penalises a particular algorithm). Nevertheless, we have the intuitive definition of a general intelligence as one that performs well in most (or almost all) environments.
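The parenthetical intuition can be sketched in a few lines of Python. This is only an illustration of the "design an environment against a particular algorithm" idea, not a proof or statement of any no free lunch theorem. It assumes a deliberately tiny setting: the "algorithm" is a deterministic function predicting the next bit from the history so far, and the names `adversarial_sequence`, `repeat_last`, and `flip_last` are made up for the example.

```python
def adversarial_sequence(predictor, length):
    """Build a bit sequence on which a deterministic predictor is always wrong."""
    history = []
    for _ in range(length):
        guess = predictor(tuple(history))
        history.append(1 - guess)  # the environment emits the opposite bit
    return history


def accuracy(predictor, sequence):
    """Fraction of bits the predictor gets right, given only the preceding bits."""
    hits = sum(
        predictor(tuple(sequence[:i])) == bit for i, bit in enumerate(sequence)
    )
    return hits / len(sequence)


def repeat_last(history):
    """Predict that the next bit equals the previous one (0 at the start)."""
    return history[-1] if history else 0


def flip_last(history):
    """Predict that the next bit differs from the previous one (1 at the start)."""
    return 1 - history[-1] if history else 1


sequence = adversarial_sequence(repeat_last, 8)
print(sequence)                         # [1, 0, 1, 0, 1, 0, 1, 0]
print(accuracy(repeat_last, sequence))  # 0.0
print(accuracy(flip_last, sequence))    # 1.0
```

The environment built against `repeat_last` is trivial for `flip_last`, and the same construction can be turned against `flip_last` in turn. Any fixed deterministic predictor has some environment of this kind, which is why the weaker standard of "most (or almost all) environments" is the one worth discussing.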
I'd like to reverse that definition, and define a general intelligence as one that doesn't perform stupidly in a novel environment. A small change of emphasis, but it gets to the heart of what the Turing test is meant to do, and why I questioned it. The idea of the Turing test is to catch the (putative) AGI performing stupidly. Since we can't test the AGI on every environment, the idea is to have the Turing test be as general as possible in potential. If you give me the questions in advance, I can certainly craft an algorithm that aces that test; similarly, you can construct an AGI that would ace any given Turing test. But since the space of reasonable conversations is combinatorially huge, and since the judge could potentially pick any element from within that, the AGI could not just have a narrow list of responses: it would have to be genuinely generally intelligent, so that it would not end up being stupid on the particular conversation it was in.
That's the theory, anyway. But maybe the space of conversations isn't as vast as all that, especially if the AGI has some simple classification algorithms. Maybe the data on the internet today, combined with some reasonably cunning algorithms, can carry a conversation as well as a human. After all, we are generating examples of conversations by the millions every hour of every day.
Which is why I emphasised testing from outside the domain of competence of the AGI. You need to introduce it to a novel environment, and give it the possibility of being stupid. If the space of human conversations isn't large enough, you need to move to the much larger space of real-world problem solving - and pick something from it. It doesn't matter what it is, simply that you have the potential of picking anything. Hence only a general intelligence could be confident, in advance, of coping with it. That's why I emphasised not saying what your test was going to be, and changing the rules or outright cheating: the fewer restrictions you allow on the potential test, the more informative the actual test is.
A related question, of course, is whether humans are generally intelligent. Well, humans are stupid in a lot of domains. Human groups augmented by data and computing technology, and given enough time, are much more generally intelligent than individual humans. So general intelligence is a matter of degree, not a binary classification (though it might be nearly binary for some AGI designs). Thus whether you call humans generally intelligent is a matter of taste and emphasis.