realise the difference between measuring something and the thing measured
What does this cash out to, concretely, in terms of a system's behavior? If I were to put a system in front of you that does "realize the difference between measuring something and the thing measured", what would that system's behavior look like? And once you've answered that, can you describe what mechanic in the system's design would lead to that (aspect of its) behavior?
It seems pretty undeniable to me from these examples that GPT-3 can reason to an extent.
However, it can't seem to do so consistently.
Maybe this is analogous to people with mental or neurological conditions who have periods of clarity and periods of confusion?
If we can find a way to isolate the pattern of activity in GPT-3 that relates to reasoning, we might be able to enforce that state permanently?
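To make that idea slightly more concrete, here is a minimal sketch of what "isolating a pattern of activity" could look like in practice: fitting a linear probe on the hidden activations of an open model (GPT-2 here, since GPT-3's weights aren't public). The toy prompts, the labels, and the premise that a single "reasoning" direction even exists are assumptions for illustration, not a claim about how GPT-3 actually works.

```python
# Hedged sketch: probe hidden states of an open model for a "reasoning-like"
# direction. Prompts, labels, and the existence of such a direction are
# assumptions for illustration only.
import torch
from transformers import GPT2Model, GPT2Tokenizer
from sklearn.linear_model import LogisticRegression

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2")
model.eval()

def pooled_hidden(prompt: str) -> torch.Tensor:
    """Mean-pooled final-layer activations for one prompt."""
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs)
    return out.last_hidden_state.mean(dim=1).squeeze(0)

# Toy labelled data: prompts whose continuations a human judged "reasoned" (1)
# vs. "confused" (0). Real labels would need a proper annotation effort.
prompts = [
    "If all cats are animals and Tom is a cat, then Tom is an animal.",
    "Purple because Tuesday fish the the seven.",
]
labels = [1, 0]

X = torch.stack([pooled_hidden(p) for p in prompts]).numpy()
probe = LogisticRegression().fit(X, labels)  # linear probe for the pattern
```

If a probe like this did find a reliable direction, one could then experiment with nudging activations along it at inference time, which is roughly what "enforcing that state" would mean.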
Perhaps "agency" is a better term here? In the strict sense of an agent acting in an environment?
And yeah, it seems we have shifted focus away from that.
Fortunately, thanks to our natural play instincts, we have a wonderful collection of ready-made training environments: I think the field needs a new challenge of an agent playing video games, receiving instructions about what to do only through natural language.
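As a rough illustration of that challenge, here is a minimal sketch of the interaction loop, assuming the Gymnasium API and an Atari-style game (requires `pip install "gymnasium[atari]"`); the environment choice, the instruction text, and the `LanguageConditionedPolicy` class are placeholders rather than an existing benchmark.

```python
# Hedged sketch of a language-instructed game agent: the agent sees only raw
# game observations plus a natural-language instruction, no hand-coded reward
# shaping. The policy here is a stub that samples random actions.
import gymnasium as gym

class LanguageConditionedPolicy:
    """Placeholder policy conditioned on a natural-language instruction."""
    def __init__(self, instruction: str, action_space):
        self.instruction = instruction
        self.action_space = action_space

    def act(self, observation):
        # A real agent would ground the instruction in the observation;
        # here we just sample randomly to keep the sketch runnable.
        return self.action_space.sample()

env = gym.make("ALE/MontezumaRevenge-v5")  # any video-game environment would do
policy = LanguageConditionedPolicy(
    "Climb down the ladder and pick up the key.", env.action_space
)

obs, info = env.reset()
done = False
while not done:
    action = policy.act(obs)
    obs, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated
env.close()
```

The interesting part of the challenge is everything the stub skips: mapping the instruction onto pixels and actions, which is exactly the kind of grounded agency being discussed above.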