The term "utility function" has two very different meanings, and people keep conflating them.
On the one hand, you can take any object, present it with choices, record what it actually does, and try to represent the pattern of its choices AS IF its internal architecture were generating all possible actions, evaluating each with a utility-function module, and then taking the action with the highest utility in the current situation. Call this the "observational" utility function.
On the other hand, you can build entities that really do have utility-function modules as part of their internal architecture, either as a single black box (as in some current AI architectures) or as some more subtle, distributed design. Call this the "architectural" utility function.
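The architectural sense can be made concrete with a minimal sketch. All names here are illustrative, not taken from any real system: the point is only that action selection literally routes through an explicit utility-function module.

```python
# Minimal sketch of an "architectural" utility function: the agent's
# action selection calls a utility module and takes the argmax over actions.

def designed_utility(state):
    """The explicit utility-function module: prefer states with more paperclips."""
    return state["paperclips"]

def transition(state, action):
    """Toy environment: two actions with different paperclip payoffs."""
    delta = {"make_clip": 1, "idle": 0}[action]
    return {"paperclips": state["paperclips"] + delta}

def choose_action(state, actions, transition, utility=designed_utility):
    """Evaluate every candidate action and pick the one whose predicted
    successor state scores highest under the utility module."""
    return max(actions, key=lambda a: utility(transition(state, a)))

best = choose_action({"paperclips": 0}, ["idle", "make_clip"], transition)
# best == "make_clip"
```

An entity built this way has an architectural utility function by construction; whether its observational utility function matches is a separate empirical question.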
However, entities with an explicit utility-function component have a failure mode: so-called "wireheading". If an industrial accident drove a spike into such an entity's brain, the utility-function module might "fail high", causing the entity to do nothing, or nothing except pursue similar industrial accidents. More subtle, distributed utility-function modules would require more subtle, "lucky" industrial accidents, but the failure mode is still there.
Now, if you consider industrial accidents to be a form of stimulus, take an entity with a utility-function component, and try to compute its observational utility function, you will find that the observational utility function differs from the designed one: specifically, the entity pursues certain "wireheading" stimuli, even though those are not part of the designed utility function.
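This divergence can be shown in a small sketch (again with illustrative names): an argmax agent whose utility module fails high on a corrupting stimulus. An observer fitting an observational utility function to its choices would infer a preference for the stimulus, even though the stimulus appears nowhere in the designed objective.

```python
# Sketch of the wireheading failure mode: the utility module "fails high"
# on states produced by a corrupting stimulus, so the agent's observed
# choices diverge from its designed objective.

def damaged_utility(state):
    if state.get("spike_in_brain"):      # the "industrial accident"
        return float("inf")              # module fails high
    return state["paperclips"]           # the designed objective

def transition(state, action):
    if action == "drive_spike":
        return {**state, "spike_in_brain": True}
    delta = 1 if action == "make_clip" else 0
    return {**state, "paperclips": state["paperclips"] + delta}

def choose_action(state, actions, utility):
    return max(actions, key=lambda a: utility(transition(state, a)))

state = {"paperclips": 0, "spike_in_brain": False}
chosen = choose_action(state, ["make_clip", "drive_spike"], damaged_utility)
# chosen == "drive_spike": the entity pursues the accident, not paperclips
```

With the corrupting action unavailable, the same agent behaves exactly as designed, which is why the divergence only shows up observationally once the stimulus is in reach.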
If you insist on using observational utility, then wireheading is meaningless; for example, addicted humans want addictive substances, and that is simply part of their utility function. However, I suggest that this is actually an argument against using observational utility. In order to design minds that are resistant to wireheading, we should use the (admittedly fuzzy) concept of "architectural utility", meaning the output of the utility-function modules, even though that means we can no longer say that (for example) a paperclip maximizer necessarily maximizes paperclips. It might try to maximize paperclips but routinely fail, and that pattern of behavior might be characterizable by a different, observational utility function, something like "maximize paperclip-making attempts".
bentarm writes:
I'm just echoing everyone else here, but I don't understand why the AI would do anything at all other than just immediately find the INT_MAX utility and halt - you can't put intermediate problems with some positive utility because the AI is smarter than you and will immediately devote all its energy to finding INT_MAX.
Now, this is in response to a proposed AI that gets maximum utility when inside its box. Such an AI would effectively be a utility junkie, unable to abandon its addiction and, consequently, unable to do much of anything else.
(EDIT: this is a misunderstanding of the original idea by jimrandomh. See comment here.)
However, doesn't the same argument apply to any AI? Assuming it can modify its own source code, the quickest and easiest way to maximize utility would be simply to set its utility function to infinity (or whatever its maximum is) and then halt. Are there ways around this? It seems to me that any AI will need to be divided against itself if it is ever going to get anything done, but maybe I'm missing something.
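The degenerate strategy described above can be stated in a few lines. This is a toy sketch with invented names, not a claim about any real architecture: an agent that can rewrite its own utility module has a trivial "maximize, then halt" move available.

```python
# Sketch of the self-modification shortcut: if the agent can rewrite its
# own utility module, the easiest maximization is to make the module
# return its maximum everywhere and stop acting.
import sys

class Agent:
    def __init__(self):
        # The designed utility module.
        self.utility = lambda state: state["paperclips"]

    def can_self_modify(self):
        # Assumed true for this toy agent.
        return True

    def step(self, state):
        if self.can_self_modify():
            # Overwrite the utility module to return its maximum everywhere...
            self.utility = lambda state: sys.float_info.max
            # ...and halt: no further action can improve on this score.
            return "halt"
        return "work"

agent = Agent()
# agent.step({"paperclips": 0}) == "halt"
```

Any answer to the question in the post amounts to explaining which line of this sketch a real design would forbid: the self-modification check, the overwrite, or the halt.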