What you call invocations, I called 'bureaucracies' back in the day, and before that I believe they were called amplification methods. They've also been called scaffolding, language model programs, and factored cognition. The kids these days call them langchain and ReAct and stuff like that.
I think I agree with your claims. ARC agrees also, I suspect; when I raised these concerns with them last year they said their eval had been designed with this sort of thing in mind and explained how.
I'm not surprised this idea was already in the water! I'm glad to hear ARC is already trying to design around this.
Yes, systems composed of chains of calls to an LLM can be much more capable than a few individual, human-invoked completions. The effort needed to build such systems is usually tiny compared to the effort and expense of training the underlying foundation models.
Role architectures provide one way of thinking about and aligning such systems.
My post on steering systems also offers some potentially relevant ways of thinking about these systems.
Well stated. I would go even further: the only short-timeline scenario I can imagine involves some unholy combination of recursive LLM calls, hardcoded functions or non-LLM ML stuff, and API calls. There would probably be space to align such a thing (sort of, and only if we start thinking about it in advance).
Abstract: An LLM’s invocation is the non-model code around it that determines when and how the model is called. I illustrate that LLMs are already used under widely varying invocations, and that a model’s capabilities depend in part on its invocation. I discuss several implications for AI safety work, including (1) a reminder that the AI is more than just the LLM, (2) the possibility and limitations of “safety by invocation”, (3) a suggestion that safety evaluations use the most powerful invocations available, and (4) the possibility of an “invocation overhang”, in which an improvement in invocation leads to sudden capability gains on current models and hardware.
Defining Invocations, and Examples
An LLM’s invocation is the framework of regular code around the model that determines when the model is called, which inputs are passed to the LLM, and what is done with the model’s output. For instance, the invocation in the OpenAI playground might be called “simple recurrence”:
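As a rough sketch (not the Playground's actual implementation), the loop might look like the following, where `call_llm` is a hypothetical helper wrapping a single completion request:

```python
# A minimal sketch of the "simple recurrence" invocation. `call_llm` is a
# hypothetical helper wrapping one raw completion request; everything else
# is ordinary non-model code.

def call_llm(prompt: str) -> str:
    """Hypothetical wrapper around a single model call (completion request)."""
    raise NotImplementedError("swap in whichever model API you use")

def simple_recurrence(initial_prompt: str, max_turns: int = 5) -> str:
    context = initial_prompt
    for _ in range(max_turns):
        completion = call_llm(context)  # the only step that touches the model
        context += completion           # the invocation decides what to do with the output
        if not completion.strip():      # ... and when to stop calling the model
            break
    return context
```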
Note how many steps in “using the LLM” do not involve the actual model! Here are some ways this invocation can be varied:
Invocations Affect Capabilities
In this section I want to establish that invocations can improve capabilities. First, our prior from analogy to humans should support this claim: when solving math problems, for example, access to scratch paper and a calculator makes a difference, as do “habits” such as checking your work rather than going with your first guess.
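To make the analogy concrete, here is a hedged sketch of one such habit encoded as an invocation: draft an answer, have the model critique it, and revise if needed. As before, `call_llm` is a hypothetical single-call wrapper and the prompts are purely illustrative:

```python
def answer_with_self_check(question: str) -> str:
    """Draft an answer, ask the model to check it, and revise if mistakes are found."""
    draft = call_llm(f"Question: {question}\nAnswer step by step:\n")

    critique = call_llm(
        f"Question: {question}\nProposed answer:\n{draft}\n"
        "List any mistakes in the proposed answer, or write 'No mistakes.':\n"
    )
    if "no mistakes" in critique.lower():
        return draft

    # Same model, different invocation: one extra call spent on revision.
    return call_llm(
        f"Question: {question}\nProposed answer:\n{draft}\n"
        f"Identified mistakes:\n{critique}\nCorrected answer:\n"
    )
```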
Furthermore, here are three examples of invocations affecting capabilities in the literature:
AI Safety Implications
From the system card: “Closed domain hallucinations refer to instances in which the model is instructed to use only information provided in a given context, but then makes up extra information that was not in that context.”