Sure. "If it's smart, it won't make simple mistakes." But I'm also interested in the question of whether, given the first few in this sequence of approximate agents, one could do a good job at predicting the next one.
It seems like you could - like there is a simple rule governing these systems ("check whether there's a human in the greenhouse") that might involve difficult interaction with the world in practice but is much more straightforward when considered from the omniscient third-person view of imagination. And given that this rule is (arguendo) simple within a fairly natural (though not by any means unique) model of the world, and that it helps predict the sequence, one might be able to guess that this rule was likely just from looking at the sequence of systems.
(This also relies on the distinction between just trying to find likely or good-enough answers, and the AI doing search to find weird corner cases. The inferred next step in the sequence might be expected to give similar likely answers, with no similar guarantee for corner-case answers.)
I had a post on empirically bridging syntax and semantics. It used the example of temperature, building on McCarthy and Searle's dispute about the beliefs of thermostats.
But temperature wasn't an ideal illustration of my points, as human temperature sensitivity is not very fine-grained, so I'm presenting a better example here: detecting an intruder.
Internal and external variables
The external variable is a boolean x which corresponds to whether there is any human in a certain initially empty greenhouse.
There are five different "agents", each with an internal variable Xi:

- Xa: a door alarm, triggered when the greenhouse door is opened.
- Xc: a heat-sensitive camera, triggered by what it reads as human body heat inside the greenhouse.
- Xg: a human guard keeping watch on the greenhouse.
- Xh: a resourceful human, who keeps watch and also takes extra precautions.
- Xr: a superintelligent robot.
All the Xi correlate well with x in a lot of circumstances. If a passerby or a naive burglar gets into the greenhouse, they will trigger the door alarm and the heat-sensitive camera, while the guard, the resourceful human, and the robot will all see the intruder.
It is, however, pretty easy to fool the door alarm: simply go through a window. Conversely, someone could open the door without entering (or the wind or an earthquake could do so), causing the alarm to trigger with no-one in the greenhouse. So Xa and x are correlated in a relatively narrow set of environments Ea. And if we consider instead the variable y "the electric circuit that goes through the door is unbroken", then it's clear that Xa and y are much better correlated than Xa and x; if there's a semantic meaning to Xa, then it's far closer to y than it is to x.
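To make that concrete, here is a minimal Python sketch of the door-alarm case (the Scenario class, its field names, and the three scenarios are illustrative assumptions of mine, not part of the setup above): Xa is a deterministic function of y in every scenario, but it matches x only in the first one.

```python
# A minimal sketch of the door-alarm example; all names and scenarios here are
# illustrative assumptions, not part of the setup above.
from dataclasses import dataclass

@dataclass
class Scenario:
    name: str
    human_in_greenhouse: bool    # the external variable x
    door_circuit_unbroken: bool  # the variable y: the door circuit is intact

def door_alarm(s: Scenario) -> bool:
    """Xa: the alarm triggers exactly when the door circuit is broken."""
    return not s.door_circuit_unbroken

scenarios = [
    Scenario("naive burglar walks in through the door", True, False),
    Scenario("burglar climbs in through a window", True, True),
    Scenario("wind blows the door open, nobody enters", False, False),
]

for s in scenarios:
    xa, x, y = door_alarm(s), s.human_in_greenhouse, s.door_circuit_unbroken
    print(f"{s.name}: Xa tracks x: {xa == x}; Xa tracks (not y): {xa == (not y)}")
# Output: Xa tracks (not y) in every scenario, but tracks x only in the first.
```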
The heat-camera can also be fooled: simply spray the lens with some infrared-opaque paint, then enter at your leisure. For the converse, maybe an entering bear could trigger the alarm. Still, it seems clear that Xc is correlated with x in a much wider set of environments, Ec.
The human guard is hard to fool in either direction. We humans are very good at figuring out when other humans are around, so, assuming the guard is moderately attentive, tricking the guard in either direction requires a lot of work - though it is probably easier to trigger a false positive (the guard mistakenly thinks that there's a person in the greenhouse) than a false negative (the guard doesn't notice someone actually in the greenhouse). Intelligent adversaries could still confuse or overwhelm the guard. Even so, the set of environments Eg where Xg is correlated with x is much larger.
The resourceful human is even harder to fool, because they have all the advantages of the guard, plus any extra precautions they may have taken (such as adding alarms, cameras, crowds of onlookers, etc...). So Eh is larger still.
Finally, bringing in a superintelligence really extends the accuracy of Xr, even against intelligent adversaries, so Er is again much larger than any of the previous sets of environments.
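To see the widening environment sets all at once, here is a toy Python sketch. The scenario list, the prior weights, and the choice of which agents each scenario fools are all assumptions invented for illustration; only the qualitative ordering of the results is meant to matter.

```python
# Toy numbers for the widening-environments picture. The scenarios, weights,
# and "who gets fooled" assignments are invented for illustration only.

# (description, prior weight, agents whose internal variable gets x wrong here)
SCENARIOS = [
    ("naive burglar walks in through the door",                  0.40, set()),
    ("burglar climbs in through a window",                       0.15, {"Xa"}),
    ("wind blows the door open, nobody enters",                  0.15, {"Xa"}),
    ("camera lens sprayed, then entry through the window",       0.10, {"Xa", "Xc"}),
    ("a bear wanders in through the open door",                  0.05, {"Xa", "Xc"}),
    ("intruder disguised as a table walks in through the door",  0.06, {"Xg"}),
    ("careful human-level con: distraction plus hidden entry",   0.05, {"Xa", "Xc", "Xg"}),
    ("plan optimised against the resourceful human",             0.03, {"Xa", "Xc", "Xg", "Xh"}),
    ("contrived No-Free-Lunch environment",                      0.01, {"Xr"}),
]

AGENTS = ["Xa", "Xc", "Xg", "Xh", "Xr"]  # door alarm, camera, guard, resourceful human, robot

for agent in AGENTS:
    agreement = sum(w for _, w, fooled in SCENARIOS if agent not in fooled)
    print(f"{agent} agrees with x on a weighted fraction {agreement:.2f} of scenarios")
# With these made-up numbers: Xa 0.47, Xc 0.77, Xg 0.86, Xh 0.97, Xr 0.99;
# each agent tracks x over a larger weighted set of scenarios than the one before.
```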
Not strict inclusion, not perfect correlation
The agents above form a hierarchy: each one has a much larger set Ei of environments where Xi is correlated with x than any of the agents before it.
But none of the inclusions is strict: the larger sets do not simply contain the smaller ones. If someone sprays the heat-sensitive camera but then walks in through the door, the door alarm will detect the intrusion even as the camera misses it. If someone disguises themselves as a table, they might be able to fool the guard but be caught by the camera. The resourceful human has their own personality, so there might be some manipulation that works on them but would fall flat if tried on the guard.
And finally, even a superintelligence is computable, so the No Free Lunch theorems imply that there are some (stupidly complicated) environments in which Xa, Xc, Xg, and Xh are all equal to x, but Xr is not.
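Writing the environment sets as above, the examples from the last two paragraphs witness roughly the following non-inclusions (my formalisation, offered as a sketch rather than anything from the original argument):

```latex
% Witnesses, in order: sprayed camera plus entry through the door; the table
% disguise; manipulation tailored to the resourceful human; and the
% No-Free-Lunch environment.
E_a \not\subseteq E_c, \qquad
E_c \not\subseteq E_g, \qquad
E_g \not\subseteq E_h, \qquad
E_a \cap E_c \cap E_g \cap E_h \not\subseteq E_r .
```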
Since no computable agent can have a perfect correlation with the variable in question, there is a sense in which no symbol can be perfectly grounded (this gets even more obvious when you start slicing into the definition, and start wondering about the meanings of "human" and "a certain greenhouse" in x).
But, despite the lack of perfect inclusion and perfect correlation, there is a strong sense in which the later agents are better correlated than the earlier ones. Assume that we have a sensible computer language in which to pick a complexity prior, and update on the world being roughly as we believe it to be. Then I'd be willing to wager that the posterior probabilities of the environments in which there are correlations will be ordered: