It may be better to ask "Is a utility function a useful abstraction to describe how X makes decisions?" (Does it allow you to compress your description of X's decisions?) Recall that utility functions are just a representation derived from preferences that are structured in a particular way. But not all ways of deciding on a preferred outcome are structured in that way[1], and not all decision algorithms work by preferring outcomes, so thinking in terms of utility functions is not always helpful.
[1] See for example:
Aumann, R. J. (1962). Utility theory without the completeness axiom. Econometrica: Journal of the Econometric Society, 445-462.
Bewley, T. F. (2002). Knightian decision theory. Part I. Decisions in Economics and Finance, 25(2), 79-110.
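To make "a representation derived from preferences that are structured in a particular way" concrete, here is a minimal sketch for a finite set of outcomes. The function name `utility_from_preferences` and the fruit examples are hypothetical, chosen only for illustration. It builds a utility function when the preference relation is complete and transitive, and gives up otherwise (for instance, when completeness fails, as in the Aumann and Bewley settings).

```python
from itertools import product


def utility_from_preferences(outcomes, weakly_prefers):
    """Return {outcome: utility} if the relation is complete and transitive, else None.

    weakly_prefers(a, b) should be True when a is at least as good as b.
    """
    # Completeness: every pair (including a with itself) must be comparable.
    for a, b in product(outcomes, repeat=2):
        if not (weakly_prefers(a, b) or weakly_prefers(b, a)):
            return None
    # Transitivity: a >= b and b >= c must imply a >= c.
    for a, b, c in product(outcomes, repeat=3):
        if weakly_prefers(a, b) and weakly_prefers(b, c) and not weakly_prefers(a, c):
            return None
    # On a finite set, counting how many outcomes each one weakly beats
    # yields a utility that represents a complete, transitive relation.
    return {a: sum(weakly_prefers(a, b) for b in outcomes) for a in outcomes}


fruits = ["apple", "banana", "cherry"]

# A complete ranking: representable by a utility function.
rank = {"apple": 2, "banana": 1, "cherry": 0}
complete = utility_from_preferences(fruits, lambda a, b: rank[a] >= rank[b])

# An incomplete relation: cherry is not comparable to the other two.
comparable = {("apple", "banana")} | {(f, f) for f in fruits}
incomplete = utility_from_preferences(fruits, lambda a, b: (a, b) in comparable)
```

In the first case the sketch returns a numeric score per outcome; in the second it returns `None`, because no single real-valued function can encode "neither is preferred and they are not tied either".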
Even if it's a useful abstraction, it's only an abstraction. You can't make an AI safe by changing its UF unless its UF is a distinct component at the engineering level, not just an abstraction.
I found janus's post Simulators to address this question very well. Much of AGI discussion revolves around agentic AIs (see the section Agentic GPT for discussion of this), but this does not model large language models very well. janus suggests that one should instead think of LLMs such as GPT-3 as "simulators". Simulators are not very agentic themselves or well described as having a utility function, though they may create simulacra that are agentic (e.g. GPT-3 writes a story where the main character is agentic).
A relevant passage from Simulators:
...We can specify some types of outer objectives using a ground truth distribution that we cannot with a utility function. As in the case of GPT, there is no difficulty in incentivizing a model to predict actions that are corrigible, incoherent, stochastic, irrational, or otherwise anti-natural to expected utility maximization. All you need is evidence of a distribution exhibiting these properties.
For instance, during GPT’s training, sometimes predicting the next token coincides with predicting agentic behavior, but:
- The acti
I think that the significant distinction is whether an AI system has a utility function that it is attempting to optimize at test time. An LLM does have a utility function, in that there is an objective function written in its training code that it uses to calculate gradients and update its parameters during training. However, once it is deployed, its parameters are frozen and its score on this objective function can no longer impact its behavior. In that sense, I don't think that it makes sense to think of an LLM as "trying to" optimize this objective after deployment. However, this answer could change in response to changes in model training strategy, which is why this distinction is significant.
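The train-time versus test-time distinction can be illustrated with a toy model. This is a sketch under strong simplifying assumptions: the "model" is just a vector of logits over a three-token vocabulary, and all names (`VOCAB`, `train_step`, `predict`) are hypothetical. The point it shows is that the objective function appears only in the training path; the prediction path reads the frozen parameters and never evaluates the loss.

```python
import math

VOCAB = ["a", "b", "c"]  # hypothetical toy vocabulary


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def loss(logits, target_index):
    """Cross-entropy objective: only ever called from training code."""
    return -math.log(softmax(logits)[target_index])


def train_step(logits, target_index, lr=0.5):
    """One gradient step; d(loss)/d(logit_i) = prob_i - one_hot_i."""
    probs = softmax(logits)
    return [
        x - lr * (p - (1.0 if i == target_index else 0.0))
        for i, (x, p) in enumerate(zip(logits, probs))
    ]


def predict(frozen_logits):
    """Inference: uses the parameters only; the objective is never consulted."""
    probs = softmax(frozen_logits)
    return VOCAB[max(range(len(VOCAB)), key=probs.__getitem__)]


initial = [0.0, 0.0, 0.0]
logits = initial
for _ in range(20):
    logits = train_step(logits, target_index=1)  # training data always says "b"

frozen = tuple(logits)  # deployment: parameters no longer change
prediction = predict(frozen)
```

After deployment, nothing in `predict` can react to how well the model scores on `loss`, which is the sense in which the objective "can no longer impact its behavior".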
Yes, it wants to find the best next token, where 'best' is 'the most likely'.
That's a utility function. Its utility function is a line of code necessary for training; otherwise nothing would happen when you tried to train it.
A utility function is the assessment by which you decide how much an action would further your goals. If you can do that, highly accurately or not, you have a utility function.
If you had no utility function, you might decide you like NYC more than Kansas, and Kansas more than Nigeria, but you prefer Nigeria to NYC. So you get on a plane and fly in circles, hopping on planes every time you get to your destination forever.
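The flying-in-circles scenario corresponds to a cycle in strict preferences. As a rough sketch (the function name and the dictionary encoding are assumptions made for this example), a depth-first search can detect such a cycle. Each key maps to the set of options it is strictly preferred over.

```python
def find_preference_cycle(strictly_prefers):
    """Return a list like [x, y, ..., x] where each item is preferred to the next,
    or None if the strict preferences contain no cycle."""
    visiting, done = [], set()

    def visit(option):
        if option in visiting:
            # We came back to an option on the current path: that is a cycle.
            return visiting[visiting.index(option):] + [option]
        if option in done:
            return None
        visiting.append(option)
        for worse in strictly_prefers.get(option, ()):
            cycle = visit(worse)
            if cycle:
                return cycle
        visiting.pop()
        done.add(option)
        return None

    for option in strictly_prefers:
        cycle = visit(option)
        if cycle:
            return cycle
    return None


# NYC over Kansas, Kansas over Nigeria, but Nigeria over NYC.
cyclic = {"NYC": {"Kansas"}, "Kansas": {"Nigeria"}, "Nigeria": {"NYC"}}
# A consistent ranking for comparison.
consistent = {"NYC": {"Kansas", "Nigeria"}, "Kansas": {"Nigeria"}, "Nigeria": set()}

cycle = find_preference_cycle(cyclic)
no_cycle = find_preference_cycle(consistent)
```

When a cycle exists, no assignment of numbers to destinations can respect every preference, so no utility function represents them. When there is no cycle, a ranking that respects the strict preferences can be built.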
Humans definitely have a utility function. We just don't know what ranks very highly on our utility function. We mostly agree on the low ranking stuff. A utility function is the process by which you rate potential futures that you might be able to bring about and decide you prefer some futures more than others.
With a utility function plus your (limited) predictive ability you rate potential futures as being better, worse, or equal to each other, and act accordingly.
There's a lot of discussion and research into AI alignment, almost always about variants of how to define/create a utility function (or meta-function, if it changes over time) that is actually aligned with ... something. That something is at least humanity's survival, but often something like flourishing or another semi-abstract goal. Oops, that's not my question for today.
My question for today is whether utility functions are actually part of the solution at all. Humans don't have them, and the most interesting spurs toward AI don't have them. Maybe anything complicated enough to be called AGI doesn't have one (or at least doesn't have a simple, concrete, consistent one).