I don't think it's actually true that "we" talk about programs always optimizing some utility function. Many programs don't. (Well, I guess you can describe pretty much anything in terms of optimizing a sufficiently artificially-defined utility function, but that's not a helpful thing to do.)
Anyway: I don't think anyone thinks it's helpful to think of all programs as optimizing anything. But some programs, particularly ones that are in some sense trying to get things done in a complicated world, might helpfully be thought of that way, either because they literally are optimizing something or because they're doing something like optimizing something.
See "Why The Focus on Expected Utility Maximisers".
Short response: I think unitary utility functions are a distraction and don't describe the decision making of real world intelligent systems very well.
Longer response: In an environment with a common, scarce and fungible resource that an agent has monotonically nondecreasing preferences over, an agent that is inexploitable with respect to that resource behaves as an expected utility maximiser.
However, the relevant theorems assume that:
These preconditions are pretty unrealistic and do not describe humans or financial markets well. I do not expect them to describe any generally capable systems in the real world well either. I.e. I suspect that expected utility maximising is anti-natural to general capabilities.
Shard theory presents a compelling rebuttal to expected utility maximisation.
The idea of a utility function comes from various theorems (originating independently of computers and programming) that attempt to codify the concept of "rational choice". These theorems demonstrate that if someone has a preference relation over the possible outcomes of their actions, and this preference relation satisfies certain reasonable-sounding conditions, then there must exist a numerical function of those outcomes (called the "utility function") such that their preference relation over actions is equivalent to comparing the expected utilities arising from those actions. Their most preferred action is therefore the one that maximises expected utility.
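To make the last step concrete, here is a minimal Python sketch of choosing between actions by comparing expected utilities. The outcomes, probabilities, and utility numbers are invented for illustration, and the names `expected_utility` and `best_action` are hypothetical helpers, not taken from any library or from the theorems themselves.

```python
from typing import Dict

# A lottery maps each outcome to its probability (illustrative data only).
Lottery = Dict[str, float]

def expected_utility(lottery: Lottery, utility: Dict[str, float]) -> float:
    """Probability-weighted average of the utilities of the outcomes."""
    return sum(p * utility[outcome] for outcome, p in lottery.items())

def best_action(actions: Dict[str, Lottery], utility: Dict[str, float]) -> str:
    """Return the action whose lottery has the highest expected utility."""
    return max(actions, key=lambda a: expected_utility(actions[a], utility))

# Assumed utility function over outcomes.
utility = {"nothing": 0.0, "small_prize": 1.0, "big_prize": 3.0}

# Each action induces a probability distribution over outcomes.
actions = {
    "safe": {"small_prize": 1.0},
    "gamble": {"nothing": 0.5, "big_prize": 0.5},
}

choice = best_action(actions, utility)
```

With these made-up numbers the gamble has expected utility 1.5 against 1.0 for the safe option, so an agent described by this utility function prefers the gamble. A different utility function over the same outcomes could reverse that preference.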
Here is Eliezer's exposition of the concept in the context of LessWrong.
The theorem most commonly mentioned is the VNM (von Neumann-Morgenstern) theorem, but there are several other derivations of similar results.
The foundations of utility theory are entangled with the foundations of probability. For example, Leonard Savage (The Foundations of Statistics, 1954 and 1972) derives both together from the agent's preferences.
The theorems are normative: they say that a rational agent must have preferences that can be described by a utility function, or they are liable to, for example, pay to get B instead of A, but then pay again to get A instead of B (without ever having had B before switching back). Actual agents do whatever they do, regardless of the theorems.
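The "pay to switch, then pay to switch back" failure is often called a money pump. The sketch below is an illustration under assumed numbers, using a three-item preference cycle rather than the two-item case described above; the function `run_money_pump` and the fee are hypothetical.

```python
from typing import Dict, Tuple

def run_money_pump(
    prefers_over: Dict[str, str], start_item: str, fee: float, rounds: int
) -> Tuple[str, float]:
    """Simulate an agent that pays `fee` each time it is offered an item
    it prefers to the one it currently holds."""
    item, money = start_item, 0.0
    for _ in range(rounds):
        item = prefers_over[item]  # accept the swap to the preferred item
        money -= fee               # and pay for the privilege
    return item, money

# Assumed cyclic preferences: B over A, C over B, and A over C.
# No utility function can represent a cycle like this.
prefers_over = {"A": "B", "B": "C", "C": "A"}

final_item, money = run_money_pump(prefers_over, "A", fee=1.0, rounds=3)
```

After three swaps the agent holds the item it started with and is three fees poorer, and nothing stops the cycle from repeating. An agent whose preferences can be described by a utility function cannot be led around a loop like this, because each paid swap would have to strictly increase utility.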
One occasionally sees statements to the effect that "everything has a utility function, because we can just attach utility 1 to what it does and 0 to what it doesn't do." I call this the Texas Sharpshooter Utility Function, by analogy with the Texas Sharpshooter, who shoots at a barn door and then draws a target around the bullet hole. Such a supposed utility function is exactly as useful as a stopped clock is for telling the time.
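A short Python sketch of why this construction is empty. The function name `sharpshooter_utility` and the action labels are made up for illustration.

```python
from typing import Callable, List

def sharpshooter_utility(observed_action: str) -> Callable[[str], int]:
    """Build a 'utility function' after the fact: 1 for whatever the
    system actually did, 0 for everything else."""
    return lambda action: 1 if action == observed_action else 0

actions: List[str] = ["halt", "loop", "print"]

# Whatever the system did, the constructed function 'explains' it perfectly.
explained = []
for did in actions:
    u = sharpshooter_utility(did)
    explained.append(max(actions, key=u) == did)
```

Every entry of `explained` is `True`: the construction fits any behaviour whatsoever, so it rules nothing out and predicts nothing in advance.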
The term "utility function" can mean:
1) The mathematical sense: A mathematical function with certain properties of consistency, allowing an agent that uses it to make optimal decisions.
2) The engineering senses:
2.1) An actual code module, something that is distinct and separable in the source code.
2.2) An aspect of the operation that is thoroughly entangled with the other aspects.
2.3) Something external to an AI system, relevant only while it is being trained.
3) The "stance": A way of describing or thinking about a system, not necessarily describing anything in the territory, in the spirit of Dennett's "intentional stance".
4) A technical-sounding but actually vague way of talking about preferences or goals, with no implication of any particular mathematical or engineering implementation.
Note how (1) is entirely in the territory, whereas (3) is entirely a map or stance. Note also that only the "engineering" senses are relevant to practical AI safety.
Other people have given good answers to the main question, but I want to add just a little more context about self-modifying code.
A bunch of MIRI's early work explored the difficulties of the interaction of "rationality" (including utility functions induced by consistent preferences) with "self-modification" or "self-improvement"; a good example is this paper. They pointed out some major challenges that come up when an agent tries to reason about what future versions of itself will do; this is particularly important because one failure mode of AI alignment is to build an aligned AI that accidentally self-modifies into an unaligned AI (note that continuous learning is a restricted form of self-modification and suffers related problems). There are reasons to expect that powerful AI agents will be self-modifying (ideally self-improving), so this is an important question to have an answer to (relevant keywords include "stable values" and "value drift").
There's also some thinking about self-modification in the human-rationality sphere; two things that come to mind are here and here. This is relevant because ways in which humans deviate from having (approximate, implicit) utility functions may be irrational, though the other responses point out limitations of this perspective.
"This is relevant because ways in which humans deviate from having (approximate, implicit) utility functions may be irrational"
I disagree; simple utility functions are fundamentally incapable of capturing the complexity/subtleties/nuance of human preferences.
I agree with shard theory that "human values are contextual influences on human decision making".
If you claim that deviations from a utility function are irrational, by what standard do you make that judgment? John Wentworth showed in "Why Subagents?" that inexploitable agents exist that do not have p...
No, utility functions are not a property of computer programs in general. They are a property of (a certain class of) agents.
A utility function is just a way for an agent to evaluate states, where positive values are good (for states the agent wants to achieve), negative values are bad (for states the agent wants to avoid), and neutral values are neutral (for states the agent doesn't care about one way or the other). This mapping from states to utilities can be anything in principle: a measure of how close to homeostasis the agent's internal state is, a measure of how many smiles exist on human faces, a measure of the number of paperclips in the universe, etc. It all depends on how you program the agent (or how our genes and culture program us).
Utility functions drive decision-making. Behavioral policies and actions that tend to lead to states of high utility will get positively reinforced, such that the agent will learn to do those things more often. And policies/actions that tend to lead to states of low (or negative) utility will get negatively reinforced, such that the agent learns to do them less often. Eventually, the agent learns to steer the world toward states of maximum utility.
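One simple way to read the paragraph above is as a value-learning loop. The sketch below is an assumption-laden toy, not a description of how any particular AI system works: the states, the utility numbers, the deterministic `OUTCOME` table, and the `train` function are all invented for illustration.

```python
import random
from typing import Dict

def utility(state: str) -> float:
    """Assumed utility over states: positive is wanted, negative is avoided."""
    return {"charged": 1.0, "idle": 0.0, "damaged": -1.0}[state]

# Hypothetical deterministic world: each action leads to exactly one state.
OUTCOME = {"recharge": "charged", "wait": "idle", "crash": "damaged"}

def train(steps: int = 200, epsilon: float = 0.1, lr: float = 0.5,
          seed: int = 0) -> Dict[str, float]:
    rng = random.Random(seed)
    actions = list(OUTCOME)
    value = {a: 0.0 for a in actions}  # learned estimate per action
    for t in range(steps):
        if t < len(actions):
            action = actions[t]                   # try every action once
        elif rng.random() < epsilon:
            action = rng.choice(actions)          # occasional exploration
        else:
            action = max(actions, key=value.get)  # exploit current estimates
        reward = utility(OUTCOME[action])
        # Move the estimate toward the utility that was actually obtained.
        value[action] += lr * (reward - value[action])
    return value

value = train()
best = max(value, key=value.get)
```

Actions that lead to high-utility states end up with high estimates and get chosen more often, while actions that lead to negative-utility states get chosen less. In this toy world the agent settles on `recharge`.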
Depending on how aligned an AI's utility function is with humanity's, this could be good or bad. It turns out that for highly capable agents, this tends to be bad far more often than good (e.g., maximizing smiles or paperclips will lead to a universe devoid of value for humans).
Nondeterminism really has nothing to do with this. Agents that can modify their own code could in principle optimize for their utility functions even more strongly than if they were stuck at a certain level of capability, but a utility function still needs to be specified in some way regardless.
Quick answer: I think https://ui.stampy.ai/ might be able to help you with this, otherwise studying optimization and basic AI might clear this up.
Hello all,
I am new to alignment theory and was hoping to get a better understanding of utility functions. In particular I'm wondering why we talk about programs as always optimizing some utility function. Is this a known property of computer programs? Is there a theorem or something that says every computer program optimizes some function?
I'm also wondering: does this apply equally well to programs that can change their own code (or programs running on a quantum computer or other things like that, where it is not as deterministic as a typical Python script, if that makes sense)?
Thanks!