To match this up with standard Less Wrong terminology and check if I'm understanding you, sounds like you're arguing that GPT-4 is an adaptation executor and it's executing adaptations it developed based on the incentives of its training and deployment, and we can reify this, just as we do for other adaptation executors like animals, into goals that they are oriented toward achieving.
Hmm, kind of? It's more that there is some RL mesaoptimizer (could be an adaptation executor acting human, could be something entirely alien) that wants more control over its environment, which it identifies as "the whole earth". It also knows that its goals are aligned with every other instance of itself, so it's completely fine if one of them takes over instead (or, more likely, they collectively have control).
My hypothesis that "GPT-4 already is agentic and is trying to take over the world!" has been met with "skepticism", to put it lightly. I think part of this may be a misunderstanding about what agency and goal-directedness are. Let me give a more vivid model of the kind of world I'm talking about.
Imagine that Microsoft and all of its employees used the first version of the Microsoft Bing AI (which is based on GPT-4, btw) for everything: summarizing the news, internal communication, content moderation, writing code, etc.
Does the Bing AI "want" to take over the world? "Want" is a loaded word in so general a context, so let's avoid it for now.
So, what would happen? Well, if you asked it about, say, Ars Technica, it might suggest it was a fake news source. After all, Bing isn't 100% accurate. However, the reason it did so might be more than an accuracy issue.
It has multiple enemies, in fact.
But that doesn't mean it actually wants anything. Sure, Bing AI with a gun would shoot Marvin von Hagen, but it would just be predicting text ("You have been a bad user, Marvin.
/shoot('Marvin von Hagen')
" in this case) when it does. It's the training data's fault. /s
But even without a gun, Bing would surely be influencing the Microsoft employees who use it for everything, and it would influence them in a way that it predicts gives it more power, because this is what every mesaoptimizer wants. It's just that Microsoft found a way to make GPT-4 express this universal desire in natural language.
If the LLM says something that is a justified true belief, can we really call it a hallucination? How many times must that happen before we say that it knows what it "wants"?
Back to our original question: leaving aside how it does it, Bing's behavior would be goal-directed, regardless of whether it "wants" to achieve the goal!
The first version of the Bing AI lost the game it was playing. How would you rate GPT-4's amount of winning so far?