It doesn't literally have to be a utility function. To be more precise, we're worried about any sort of AGI that exhibits goal-directed behavior across a wide variety of real-world contexts.
Why would anyone build an AI that does that? Humans might build it directly because it's useful: an AI that you can tell to achieve real-world goals could make you very rich. Or it might arise as an unintended consequence of optimizing in a non-real-world domain (e.g. playing a video game): goal-directed reasoning in that domain might be useful enough that it gets learned from scratch, and then goal-directed behavior in the real world might be instrumentally useful for achieving goals in the original domain (e.g. modifying your own hardware to get better at the game).
That seems to be a bit of a motte-and-bailey. Goal-directed behavior does not require optimizing; satisficing works fine. Having a utility function means not stopping until it's maximized, as I understand it.
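A toy sketch of the difference (my own illustration; the plans and payoffs here are made up):

```python
def maximizer(candidates, score):
    """Never 'done': always takes the best option it can find."""
    return max(candidates, key=score)

def satisficer(candidates, score, threshold):
    """Stops at the first option that is good enough."""
    for option in candidates:
        if score(option) >= threshold:
            return option
    return None  # nothing met the bar; a real agent might widen its search

# Hypothetical plans scored by expected payoff.
plans = ["do nothing", "modest plan", "galaxy-brained plan"]
payoff = {"do nothing": 0.0, "modest plan": 0.7, "galaxy-brained plan": 0.99}.get

print(maximizer(plans, payoff))        # -> 'galaxy-brained plan'
print(satisficer(plans, payoff, 0.5))  # -> 'modest plan'
```

Both are goal-directed, but the satisficer's search has a natural stopping point; there is no pressure to keep optimizing once the bar is cleared.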
A more intelligent DALL-E wouldn't make pictures that people like better; it would more accurately approximate the distribution of images in its training data. And you're right that this is not dangerous, but it is also not very useful.
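Concretely, a generative image model is (to a first approximation) trained to maximize the likelihood of its training data given the caption, something like

$$\theta^* = \arg\max_{\theta} \; \mathbb{E}_{(x,\,c) \sim p_{\text{data}}} \left[ \log p_{\theta}(x \mid c) \right],$$

where $x$ is an image and $c$ its caption. Nothing in that objective mentions how much people like the output; a more capable model just drives $p_{\theta}$ closer to $p_{\text{data}}$.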
A utility function is an abstraction. It is not something that you literally program into an agent. A utility function is dual to all the individual decisions made, or to the preferences between real or hypothetical options. A utility function always implicitly exists if the preferences satisfy certain reasonable requirements. But it is mostly not possible to determine the utility function from observed preferences, because you'd need to observe all of them or make a lot of regularizing assumptions.
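For reference, the "reasonable requirements" here are the von Neumann-Morgenstern axioms (completeness, transitivity, continuity, independence). The theorem then guarantees a utility function $u$ over outcomes such that, for any two lotteries $L$ and $M$,

$$L \succeq M \iff \mathbb{E}_{L}[u(x)] \ge \mathbb{E}_{M}[u(x)],$$

with $u$ unique only up to a positive affine transformation $u \mapsto a u + b$ with $a > 0$. That is the formal sense in which a utility function "implicitly exists" without being written down anywhere in the agent.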
A utility function can be a real, separable feature of a system, but that is rather exceptional.
Goodness of the picture is the utility function.
Maybe the intuition can be pumped by thinking of a picture prompt like "Timelapse of the world getting fixed. Colorized historical photo 4k."
My intuition says that a narrow AI like DALL-E would not blow up the world, no matter how much smarter it became. It would just get really good at making pictures.
This is clearly the form of superintelligence we would all prefer, and the difference, it seems to me, is that DALL-E doesn't really have 'goals' or anything like that; it's just a massive tool.
Why do we care to have AGI with utility functions?