Fake Fake Utility Functions

Eliezer Yudkowsky

Followup to: Most of my posts over the last month...

Every now and then, you run across someone who has discovered the One Great Moral Principle, of which all other values are a mere derivative consequence.

I run across more of these people than you do. Only in my case, it's people who know the amazingly simple utility function that is all you need to program into an artificial superintelligence and then everything will turn out fine...

It's incredible how one little issue can require so much prerequisite material. My original schedule called for "Fake Utility Functions" to follow "Fake Justification" on Oct 31.

Talk about your planning fallacy. I've been planning to post on this topic in "just a few days" for the past month. A fun little demonstration of underestimated inferential distances.

You see, before I wrote this post, it occurred to me that if I wanted to properly explain the problem of fake utility functions, it would be helpful to illustrate a mistake about what a simple optimization criterion implied. The strongest real-world example I knew was the Tragedy of Group Selectionism. At first I thought I'd mention it in passing, within "Fake Utility Functions", but I decided the Tragedy of Group Selectionism was a long enough story that it needed its own blog post...

So I started to write "The Tragedy of Group Selectionism". A few hours later, I noticed that I hadn't said anything about group selectionism yet. I'd been too busy introducing basic evolutionary concepts. Select all the introductory stuff, cut, Compose New Post, paste, title... "An Alien God". Then keep writing until the "Alien God" post gets too long, and start taking separate subjects out into their own posts: "The Wonder of Evolution", "Evolutions Are Stupid", and at this point it became clear that, since I was planning to say a few words on evolution anyway, that was the time. Besides, a basic familiarity with evolution would help to shake people loose of their human assumptions when it came to visualizing nonhuman optimization processes.

So, finally I posted "The Tragedy of Group Selectionism". Now I was ready to write "Fake Utility Functions", right? The post that was supposed to come immediately afterward? So I thought, but each time I tried to write the post, I ended up recursing on a prerequisite post instead. Such as "Fake Selfishness", "Fake Morality", and "Fake Optimization Criteria".

When I got to "Fake Optimization Criteria", I really thought I could do "Fake Utility Functions" the next day. But then it occurred to me that I'd never explained why a simple utility function wouldn't be enough. We are a thousand shards of desire, as I said in "Thou Art Godshatter". Only that first required discussing "Evolutionary Psychology", which required explaining that human minds are "Adaptation-Executers, not Fitness-Maximizers", plus the difference between "Protein Reinforcement and DNA Consequentialism".

Furthermore, I'd never really explained the difference between "Terminal Values and Instrumental Values", without which I could hardly talk about utility functions.

Surely now I was ready? Yet I thought about conversations I'd had over the years, and how people seem to think a simple instruction like "Get my mother out of that burning building!" contains all the motivations that shape a human plan to rescue her, so I thought that first I'd do "The Hidden Complexity of Wishes". But, really, the hidden complexity of planning, and all the special cases needed to patch the genie's wish, was part of the general problem of recording outputs without absorbing the process that generates the outputs - as I explained in "Artificial Addition" and "Truly Part Of You". You don't want to keep the local goal description and discard the nonlocal utility function: "Leaky Generalizations" and "Lost Purposes".

Plus it occurred to me that evolution itself made an interesting genie, so before all that, came "Conjuring An Evolution To Serve You".

One kind of lost purpose is artificial pleasure, and "happiness" is one of the Fake Utility Functions I run into more often: "Not for the Sake of Happiness (Alone)". Similarly, it was worth taking the time to establish that fitness is not always your friend ("Evolving to Extinction") and that not everything in the universe is subject to significant selection pressures ("No Evolutions for Corporations or Nanodevices"), to avoid the Fake Utility Function of "genetic fitness".

Right after "Lost Purposes" seemed like a good time to point out the deep link between keeping track of your original goal and keeping track of your original question: "Purpose and Pragmatism".

Into the home stretch! No, wait, this would be a good time to discuss "Affective Death Spirals", since that's one of the main things that goes wrong when someone discovers The One True Valuable Thingy - they keep finding nicer and nicer things to say about it. Well, you can't discuss affective death spirals unless you first discuss "The Affect Heuristic", but I'd been meaning to do that for a while anyway. "Evaluability" illustrates the affect heuristic and leads to an important point about "Unbounded Scales and Futurism". The second key to affective death spirals is "The Halo Effect", which we can see illustrated in "Superhero Bias" and "Mere Messiahs". Then it's on to affective death spirals and how to "Resist the Happy Death Spiral" and "Uncritical Supercriticality".

A bonus irony is that "Fake Utility Functions" isn't a grand climax. It's just one of many Less Wrong posts relevant to my AI work, with plenty more scheduled. This particular post turned out to require just a little more prerequisite material which - I thought on each occasion - I would have to write anyway, sooner or later.

And that's why blogging is difficult, and why it is necessary, at least for me. I would have been doomed, yea, utterly doomed, if I'd tried to write all this as one publication rather than as a series of blog posts. One month is nothing for this much material.

But now, it's done! Now, after only slightly more than an extra month of prerequisite material, I can do the blog post originally scheduled for November 1st!

Except...

Now that I think about it...

This post is pretty long already, right?

So I'll do the real "Fake Utility Functions" tomorrow.

[-]saifedean18y30

Brilliant! I like how this post is an annotated bibliography to many of your previous posts. This is really handy for this website. The one gripe I have with this website is how inconvenient it is to read old posts. It would be great if one could display articles by an author in one place.

Alternatively, it would be a great idea if you wrote something like this for ALL your previous posts, and for all future posts once a month or so.

[-]Matt_Huebert18y10

This post made me think that I'm reading your writings in a very inefficient manner if I do not follow the order they're written in, yet I cannot find a simple way to do so (without opening a few dozen tabs in Firefox and gritting my teeth through the inevitable subsequent memory leak!). Am I missing something?

[-]Kenny13y20

All articles (posts)

[-]Andy18y10

I've often wondered why you didn't make this material into a book or something, because you clearly could have, given your expansive knowledge. The fact that you are doing it in this mostly non-profit format impresses me, although it is no surprise given the honesty and knowledge in your writing, and your stated desire to "save the world."

Anyway, thanks.

[-]Hannu18y00

"This post made me think that I'm reading your writings in a very inefficient manner if I do not follow the order they're written in, yet I cannot find a simple way to do so"

Well, one simple way to do it is to subscribe to the RSS feed and then, in the reader, search for "Eliezer Yudkowsky", for example, and there you go. Every post in OB by Eliezer in the right order.

[-]botogol218y60

hmm... well, it was an interesting month but I'm not sure I didn't enjoy this blog more when it was a blog just about, well, overcoming bias. Eliezer, perhaps you should have your own blog...

BTW I saw you on the TV the other night, talking about the singularity.

It's always odd, isn't it, seeing a person for the first time whom you have previously known only in writing. As always, you didn't look/sound at all like I expected.

Which led me to an interesting line of thought: why does reading someone's writing produce any expectation at all of what the person looks/sounds like? It's a bias. Is there a halo effect: we expect intelligent writers to look tall and beautiful? Or is there an Einstein-bias: we expect intelligent writers to be wild-looking with messy hair, talking quickly?

Anyway, there you were on the screen. Looking different. "Quick!", I yelled to my wife, who was in the kitchen, "quick, come here! It's Eliezer Yudkowsky!"

hmmm...perhaps I need to get out more.
