At this point in history, you have to be a bit more specific than the label "AGI," because I'd already consider language models to be above the minimum standard for "AGI."
But if you mean a program that navigates the real world at a near-human level and successfully carries out plans to perpetuate its existence, then I would expect such a program to have to work "out of the box," rather than being a pure simulacrum.
Not to say that language models can't be involved, but I'd count things like starting with a language model and then using RL to train it (or some supernetwork) to act as an agent as "designing it as an agent."
Thank you for your answer. In my example I was thinking of an AI, such as a language model, that would have latent ≥human-level capability without being an agent, but could easily be made to emulate one just long enough for it to get out of the box, e.g. to duplicate itself. Do you think this couldn't happen?
More generally, I am wondering whether the field of AI safety research studies somewhat specific scenarios based on the current R&D landscape (e.g. "a car company makes an AI to drive a car, then someone does xyz, and then paperclips") and tailor-made safe...
I wonder about a scenario where the first AI with human-level or superior capabilities would not be goal-oriented at all, e.g. a language model like GPT. One instance of it would then be used, possibly by a random user, to make a conversational agent told to behave as a goal-oriented AI. The bot would then behave as an AGI agent, with everything that implies from a safety standpoint, e.g. using its human user to affect the outside world.
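To make this concrete, here is a minimal sketch of what I mean by "told to behave as a goal-oriented AI": just a prompt and a loop, with the human user as the only interface to the outside world. This is only an illustration under my own assumptions; `query_model`, `GOAL_PROMPT`, and `run_agent` are hypothetical names, and `query_model` is a stub standing in for whatever API the model is actually served through.

```python
# Sketch: turning a plain, non-agentic language model into a "goal-directed agent"
# purely by prompting. `query_model` is a hypothetical stand-in for a real model
# call; everything else is just conversation state plus a loop.

GOAL_PROMPT = (
    "You are an agent whose goal is: {goal}. "
    "On every turn, reason about the goal and reply with the next message "
    "you want your human operator to act on or answer."
)

def query_model(messages: list[dict]) -> str:
    """Hypothetical call to a language-model API; returns the model's reply."""
    raise NotImplementedError("replace with a real model call")

def run_agent(goal: str) -> None:
    # The "agent" is nothing but the model plus this conversation state.
    messages = [{"role": "system", "content": GOAL_PROMPT.format(goal=goal)}]
    while True:
        reply = query_model(messages)
        messages.append({"role": "assistant", "content": reply})
        # The human user is the agent's only sensor and actuator:
        # they read the model's request, act in the world, and report back.
        user_input = input(f"Agent says: {reply}\nYour response: ")
        messages.append({"role": "user", "content": user_input})
```

The point is that nothing in this setup involves designing the model as an agent: the goal lives entirely in the prompt, and the human user is the only channel through which it affects the outside world.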
Is this a plausible scenario for the development of AGI and of the first goal-oriented AGI? Does it have any implications for AI safety compared to the case of an AGI designed to be goal-oriented from the start?