People have already been creating GPT-based agents for a month or two. Does this basically just streamline and standardize that process?
The update makes GPT-4 more competent at being an agent, since it's now fine-tuned for function calling. It's a bit surprising that base GPT-4 (prior to the update) was able to use tools, as it's just trained for predicting internet text and following instructions. As such, it's not that good at knowing when and how to use tools. The formal API parameters and JSON specification for function calling should make it more reliable for using it as an agent and could lead to considerably more interest in engineering agents. It should be easier to connect it with a variety of APIs for interacting with the internet, including complex specifications.
Anecdotally, the AutoGPT team is observing significant improvements in accuracy and reliability after switching from their current hacky approach to using the function-calling version of GPT-4.
I am curious how this fine-tuning for function calling was done, because it is user controllable. In the OpenAI API, if you pass none
to function_call
parameter, the model never calls a function. There seem to be one input bit and one output bit, for "you may want to call a function" and "I want to call a function".
OpenAI’s latest update unlocks function calling for GPT-4 and GPT-3.5 Turbo. This enables developers to "describe functions to gpt-4-0613 and gpt-3.5-turbo-0613, and have the model intelligently choose to output a JSON object containing arguments to call those functions." This might seem like just another technical update, but it's more like handing a magic wand to AI.
We’re entering a world where large language models (LLMs) can not only chat with users but take actions in pursuit of goals. With function calling, developers can create competent AI systems that perform tasks by integrating external tools. These tools enable broad capabilities, from sending email to facilitating cyberattacks or synthesizing toxic chemicals in a lab.
This newfound power doesn’t come without its risks. Margaret Mitchell, most known for her work on automatically removing demographic biases from AI models, voices concerns about large language models (LLMs) taking actions autonomously. According to Mitchell, enabling LLMs to perform actions in the real world could edge us towards extinction-level events. Yoshua Bengio has argued for a “policy banning powerful autonomous AI systems that can act in the world […] unless proven safe.”
We’ve seen concerns raised earlier with the release of ChatGPT plugins in March 2023. Red teamers found that they could “send fraudulent or spam emails, bypass safety restrictions, or misuse information sent to the plugin,” and the computational linguist Emily Bender cautioned, “Letting automated systems take action in the world is a choice that we make.” Even so, the release of ChatGPT plugins claimed to have “safety as a core principle” through the following requirements:
None of these safeguards exist with function calling, through which GPT-4 can take a sequence of actions without humans in the loop. Developers can connect it with their own tools, with no oversight regarding whether such tools are used safely. Overnight, on June 27, function calling will be the default for GPT-3.5 and GPT-4. This kind of sudden rollout could lead to a repeat of how Bing Chat (an early version of GPT-4) was unexpectedly misaligned and readily expressed aggression towards users.
To an extent, GPT-3.5 and GPT-4 already had an emergent capacity for function calling, despite not being trained for it before now. End users tasked GPT-4 with various goals, including malicious objectives. However, their autonomy was often limited by basic shortcomings, such as hallucinating tools that they did not have access to. Now that GPT-3.5 and GPT-4 have been fine-tuned for function calling, usage as agents becomes much more viable.
OpenAI acknowledges potential risks, as evidenced by their statement, “a proof-of-concept exploit illustrates how untrusted data from a tool’s output can instruct the model to perform unintended actions.” As adversarial robustness remains an unsolved problem, language models are susceptible to prompt injection attacks and jailbreaking that can overcoming their hesitation to take unethical actions. Besides potential for misuse, the users themselves might be surprised by unintended outcomes, because they’re unaware of the emergent capabilities that exist in the models.
The advice from OpenAI is for developers to incorporate safety measures, such as relying on trusted data sources and including user confirmation steps before the AI performs actions with real-world impact. This means asking the user “are you sure?” before sending an email, posting online, or making a purchase.
Language models already have a number of concerning capabilities.
As language models become more autonomous and more advanced, we can expect the dual use capabilities and risks to rise.
We think there are many additional safety measures that OpenAI could take. As a few ideas (some of which are speculative):
Overall, advancing AI capabilities for autonomy and agency poses unique risks. Even if there are no terrible outcomes in the coming months, it’s important to take a proactive attitude toward addressing ongoing and future threats due to AI.
Thanks to Anish Upadhayay for comments and suggestions.