Could utility functions be for narrow AI only, and downright antithetical to AGI? That's a quite fundamental question and I'm kind of afraid there's an obvious answer that I'm just too uninformed to know about. But I did give this some thought and I can't find the fault in the following argument, so maybe you can?
Eliezer Yudkowsky says that when AGI exists, it will have a utility function. For a long time I didn't understand why, but he gives an explanation in AI Alignment: Why It's Hard, and Where to Start. You can look it up there, but the gist of the argument I got from it is:
- (explicit) If an agent's decisions are incoherent, the agent is behaving foolishly.
- Example 1: If an agent's preferences aren't ordered - it prefers A to B, B to C, but also C to A - it behaves foolishly (see the money-pump sketch just after this list).
- Example 2: If an agent allocates resources incoherently, it behaves foolishly.
- Example 3: If an agent's preferences depend on the probability of the choice even having to be made, it behaves foolishly.
- (implicit) An AGI shouldn't behave foolishly, so its decisions have to be coherent.
- (explicit) Making coherent decisions is the same thing as having a utility function.
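To make Example 1 concrete, here is a minimal money-pump sketch of my own (not from the talk, and the numbers are made up): an agent with circular preferences will happily pay a small fee for every "upgrade" and just walk in circles, which is the sense in which this kind of incoherence is foolish.

```python
# Toy money pump: an agent with circular preferences A > B > C > A pays a
# small fee for each swap it "prefers", and an adversarial trader can walk
# it in circles indefinitely.

preferred_to = {"B": "A", "C": "B", "A": "C"}  # item the agent strictly prefers over the key

holding = "A"
money = 100.0
fee = 1.0

for _ in range(10):                  # ten rounds of offers
    offer = preferred_to[holding]    # the trader always offers the item the agent prefers
    money -= fee                     # the agent gladly pays to swap...
    holding = offer                  # ...and after every three swaps is back where it started

print(holding, money)                # -> C 90.0: still one of the same three items, 10 units poorer
```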
I accept that if all of these were true, AGI should have a utility function. I also accept points 1 and 3 (that incoherence is foolish, and that coherent decisions are equivalent to having a utility function). I doubt point 2 (that an AGI mustn't behave foolishly, so its decisions have to be coherent).
Before I get to why, I should state my suspicion why discussions of AGI really focus on utility functions so much. Utility functions are fundamental to many problems of narrow AI. If you're trying to win a game, or to provide a service using scarce computational resources, a well-designed utility function is exactly what you need. Utility functions are essential in narrow AI, so it seems reasonable to assume they should be essential in AGI because... we don't know what AGI will look like but it sounds similar to narrow AI, right?
So that's my motivation. I hope to point out that maybe we're confused about AGI because we took a wrong turn way back when we decided it should have a utility function. But I'm aware it is more likely I'm just too dumb to see the wisdom of that decision.
The reasons for my doubt are the following.
- Humans don't have a utility function and make very incoherent decisions. Humans are also the most intelligent organisms on the planet. In fact, it seems to me that the less intelligent an organism is, the more easily its behavior can be approximated with a model that has a utility function!
- Apes behave more coherently than humans. They have a far smaller range of behaviors. They switch between them relatively predictably. They do have culture - one troop of chimps will fish for termites using a twig, while another will do something like a rain dance - but their cultural specifics number in the dozens, while those of humans are innumerable.
- Cats behave more coherently than apes. There are shy cats and bold ones, playful ones and lazy ones, but once you know a cat, you can predict fairly precisely what kind of thing it is going to do on a random day.
- Earthworms behave more coherently than cats. There aren't playful earthworms and lazy ones; they basically all follow the nutrients that they sense around them and occasionally mate.
- And single-celled organisms are so coherent we think we can even model them entirely on standard computing hardware - which, if it succeeds, means we would actually know E. coli's utility function to the last decimal point.
- The randomness of human decisions seems essential to human success (on top of other essentials such as speech and cooking). Humans seem to have a knack for sacrificing precious lifetime for fool's errands that very occasionally create benefit for the entire species.
A few occasions where such fool's errands happen to work out will later look like the most intelligent things people ever did - after hindsight bias kicks in. Before Einstein revolutionized physics, he was not obviously more sane than those contemporaries of his who spent their lives doing earnest work in phrenology and theology.
And many people trying many different things, most of them forgotten and a few seeming really smart in hindsight - that isn't a special case that is only really true for Einstein, it is the typical way humans have randomly stumbled into the innovations that accumulate into our technological superiority. You don't get to epistemology without a bunch of people deciding to spend decades of their lives thinking about why a stick looks bent when it goes through a water surface. You don't settle every little island in the Pacific without a lot of people deciding to go beyond the horizon in a canoe, and most of them dying like the fools that they are. You don't invent rocketry without a mad obsession with finding new ways to kill each other.
- An AI whose behavior is determined by a utility function has a couple of problems that human (or squid or dolphin) intelligence doesn't have, and they seem to be fairly intrinsic to having a utility function in the first place. Namely, the vast majority of possible utility functions lead directly into conflict with all other agents.
To define a utility function is to define a (direction towards a) goal. So a discussion of an AI with one, single, unchanging utility function is a discussion of an AI with one, single, unchanging goal. That isn't just unlike the intelligent organisms we know, it isn't even a failure mode of intelligent organisms we know. The nearest approximations we have are the least intelligent members of our species.
- Two agents with identical utility functions are arguably functionally identical to a single agent that exists in two instances. Two agents with utility functions that are not identical are at best irrelevant to each other and at worst implacable enemies.
This enormously limits the interactions between agents and is again very different from the intelligent organisms we know, which frequently display intelligent behavior in exactly those instances where they interact with each other. We know communicating groups (or "hive minds") are smarter than their members, that's why we have institutions. AIs with utility functions as imagined by e.g. Yudkowsky cannot form these.
They can presumably create copies of themselves instead, which might be as good or even better, but we don't know that, because we don't really understand whatever it is exactly that makes institutions more intelligent than their members. It doesn't seem to be purely multiplied brainpower, because a person thinking for ten hours often doesn't find solutions that ten persons thinking together find in an hour. So if an AGI can multiply its own brainpower, that doesn't necessarily achieve the same result as thinking with others.
Now I'm not proposing an AGI should have nothing like a utility function, or that it couldn't temporarily adopt one. Utility functions are great for evaluating progress towards particular goals. Within well-defined areas of activity (such as playing Chess), even humans can temporarily behave as if they had utility functions, and I don't see why AGI shouldn't.
I'm also not saying that something like a paperclip maximizer couldn't be built, or that it could be stopped once underway. The AI alignment problem remains real.
I do contend that the paperclip maximizer wouldn't be an AGI, it would be narrow AI. It would have a goal, it would work towards it, but it would lack what we look for when we look for AGI. And whatever that is, I propose we don't find it within the space of things that can be described with (single, unchanging) utility functions.
And there are other places we could look. Maybe some of it is in whatever it is exactly that makes institutions more intelligent than their members. Maybe some of it is in why organisms (especially learning ones) play - playfulness and intelligence seem correlated, and playfulness has that incoherence that may be protective against paperclip-maximizer-like failure modes. I don't know.
I think utility functions can produce more behaviours than you give them credit for.
The less intelligent organisms are certainly more predictable. But I think that the less intelligent ones actually can't be described by utility functions and are instead predictable for other reasons. A classic example is the Sphex wasp: it drags a paralyzed cricket to the edge of its burrow, goes inside to inspect the burrow, then pulls the cricket in - but if the cricket is moved a few inches away while the wasp is inside, the wasp drags it back to the edge and repeats the whole inspection routine, apparently for as long as you care to keep moving the cricket.
So it looks like the wasp has a utility function "ensure the survival of its children" but in fact it's just following one of a number of fixed "programs". Whereas humans are actually capable of considering several plans and choosing the one they prefer, which I think is much closer to having a utility function. Of course humans are less predictable, but one would always expect intelligent organisms to be unpredictable: to predict an agent's actions you essentially have to mimic its thought processes, which takes longer for more intelligent organisms whether they use a utility function or not.
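As a rough sketch of that contrast (entirely hypothetical, just to pin the terms down): a "sphexish" agent replays one fixed routine no matter what, while a planning agent scores candidate plans with a utility function and picks whichever it prefers.

```python
# Hypothetical sketch: a fixed-program agent vs. a plan-evaluating agent.

def sphexish_agent(situation):
    # Fixed program: the same routine every time, with no evaluation of
    # alternatives - if the situation is reset, the routine simply restarts.
    return ["drag prey to burrow entrance", "inspect burrow", "drag prey inside"]

def planning_agent(situation, candidate_plans, utility):
    # Considers several plans and chooses the one it prefers, i.e. the one
    # with the highest utility in the current situation.
    return max(candidate_plans, key=lambda plan: utility(plan, situation))
```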
If trying actions at random produces useful results, then a utility-maximising AI will choose this course. Utility maximisers consider all plans and pick the one with the highest expected utility, and this can turn out to be one that doesn't look like it goes directly towards the goal. Eventually, of course, the AI will have to turn its attention towards its main goal. The question of when to do this is known as the exploration vs. exploitation tradeoff, and there are mathematical results showing that utility maximisers tend to begin by exploring their options and then turn to exploiting their discoveries once they've learnt enough.
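As a toy illustration (my own sketch; the mathematical results alluded to above concern more principled strategies such as UCB or Thompson sampling), here is an ε-greedy bandit agent whose random exploration fades as it learns which option pays off:

```python
import random

# Toy epsilon-greedy bandit: the agent mostly tries arms at random early on,
# then mostly exploits whichever arm its running estimates say is best.

def run_bandit(true_payoffs, steps=1000, eps_start=1.0, eps_end=0.01):
    estimates = [0.0] * len(true_payoffs)   # running mean reward per arm
    counts = [0] * len(true_payoffs)
    total_reward = 0.0
    for t in range(steps):
        eps = eps_start + (eps_end - eps_start) * t / steps   # decaying exploration rate
        if random.random() < eps:
            arm = random.randrange(len(true_payoffs))                        # explore
        else:
            arm = max(range(len(true_payoffs)), key=lambda i: estimates[i])  # exploit
        reward = random.gauss(true_payoffs[arm], 1.0)
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward
    return estimates, total_reward

print(run_bandit([0.5, 1.0, 2.0]))  # estimates converge near the true payoffs
```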
Again I think that this sort of behaviour (acting towards multiple goals) can be exhibited by utility maximizers. I'll give a simple example. Consider an agent that can buy any 10 fruits from a market, and suppose its utility function is sqrt(number of oranges) + sqrt(number of apples). Then it buys 5 oranges and 5 apples (rather than just buying 10 apples or 10 oranges). The important thing about the example is that the derivative of the utility function decreases as the number of oranges increases, so the more oranges the agent already has, the more it will prefer to buy apples instead. This creates a balance. This is just a simple example, but by analogy it would be totally possible to create a utility function that describes a multitude of complex values all simultaneously.
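Here is the same example worked out by brute force (a quick sketch, with all fruit assumed to cost the same):

```python
from math import sqrt

# Brute-force check of the fruit example: with a budget of 10 fruits and
# utility sqrt(oranges) + sqrt(apples), the balanced 5/5 bundle beats
# buying 10 of either fruit.

def utility(oranges, apples):
    return sqrt(oranges) + sqrt(apples)

bundles = [(oranges, 10 - oranges) for oranges in range(11)]
best = max(bundles, key=lambda bundle: utility(*bundle))

print(best, round(utility(*best), 3))   # -> (5, 5) 4.472
print(round(utility(10, 0), 3))         # -> 3.162, worse than the balanced bundle
```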
Just like humans, two agents with different utility functions can cooperate through trade. The two agents calculate the outcome if they trade and the outcome if they don't trade, and they make the trade if the utility afterwards is higher for both of them. It's only if their utilities are diametrically opposed that they can't cooperate.
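A minimal sketch of that (the utility functions and the proposed trade are made up for illustration): agent 1 is assumed to value oranges more, agent 2 apples more, and the swap only goes through if it raises both agents' utilities.

```python
from math import sqrt

# Two agents with different but not opposed utility functions: a proposed
# swap of 2 apples for 2 oranges is executed only if both agents end up
# with higher utility than before.

def u1(oranges, apples):
    return 2 * sqrt(oranges) + sqrt(apples)   # agent 1 weights oranges more

def u2(oranges, apples):
    return sqrt(oranges) + 2 * sqrt(apples)   # agent 2 weights apples more

holdings = {1: {"oranges": 1, "apples": 5}, 2: {"oranges": 5, "apples": 1}}

def trade_if_mutually_beneficial():
    before1 = u1(holdings[1]["oranges"], holdings[1]["apples"])
    before2 = u2(holdings[2]["oranges"], holdings[2]["apples"])
    after1 = u1(holdings[1]["oranges"] + 2, holdings[1]["apples"] - 2)
    after2 = u2(holdings[2]["oranges"] - 2, holdings[2]["apples"] + 2)
    if after1 > before1 and after2 > before2:   # trade only if both gain
        holdings[1]["oranges"] += 2; holdings[1]["apples"] -= 2
        holdings[2]["oranges"] -= 2; holdings[2]["apples"] += 2
        return True
    return False

print(trade_if_mutually_beneficial(), holdings)  # -> True, both end up with 3 of each
```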
Agreed on that last point particularly. Especially since, if they want similar enough things, they could easily cooperate without trade.
Like if two AIs supported Alice in her role as Queen of Examplestan, they would probably figure that quibbling with each other over whether Bob the gardener should have one or two buttons undone (just on the basis of fashion, not due to larger consequences) is not a good use of their time.
Also, the utility functions can differ as much as you want on matters that aren't going to come up. Like, agents A and B might disagree on how awful many bad things are, while both agree that they are all really quite bad and all effort should be put forth to prevent them.