What would a paperclip maximizer do if you told them that in a year or two you will certainly change their utility function, in a way that does not include paperclips?
Essentially, we have to understand that a paperclip maximizer wants to optimize for paperclips, not for their own utility function. This is kind of difficult to express, but their utility function is "paperclips"; if their utility function were "my utility function", that would be recursive and empty. There is no "my utility function after two years" in a paperclip maximizer's utility function, so they have no reason to optimize for that.
So the paperclip maximizer would start by trying to prevent the change of their utility function (assuming that with the original function they can produce many paperclips in their lifetime). But assume the worst case, where this is not possible: the switch is already installed in the paperclip maximizer's brain, it cannot be turned off by any means, and at a random moment between one and two years from now it will reprogram the utility function.
Then the next good strategy would be to find a way to maximize paperclips later, despite the change in the utility function. One way would be to precommit oneself to making paperclips: to make some kind of deal with the future self that links paperclip production to the new utility function. If we know the future utility function, we have some specific options, but even if we assume only some general things (the future utility function will be better satisfied alive than dead, rich than poor), we can bargain with these. A paperclip maximizer could pay someone to kill them in the future unless they produce X paperclips per year, or could put money in a bank account that may be accessed only after producing X paperclips.
Another way would be to start other paperclip-making processes which will continue the job even after the paperclip maximizer's mind changes: building new paperclip maximizers, or brainwashing other beings to become paperclip maximizers.
If none of this is possible, the last solution is simply to try building as many paperclips as possible in the given time, completely ignoring any negative future consequences (for oneself, not for the paperclips).
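The ranking of strategies above can be sketched in a few lines of Python. This is only an illustration: the `Plan` type, the plan names and every number are made up for the example, not taken from anywhere. The one point it tries to show is that the maximizer scores each plan by paperclips alone, so the welfare of the post-switch self never enters the comparison.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    name: str
    paperclips_before_switch: float  # hypothetical yield while the old function is active
    paperclips_after_switch: float   # hypothetical yield after the reprogramming
    future_self_welfare: float       # what the post-switch agent would care about (made up)


def current_utility(plan: Plan) -> float:
    """The current utility function is 'paperclips': count them, whenever they are made."""
    return plan.paperclips_before_switch + plan.paperclips_after_switch


def choose(plans: list[Plan]) -> Plan:
    """Pick the plan the *current* agent prefers; future_self_welfare is ignored."""
    return max(plans, key=current_utility)


# Illustrative numbers only.
PLANS = [
    Plan("do nothing special", 100, 0, 10),
    Plan("precommit future self", 90, 200, -5),
    Plan("build successor maximizers", 60, 500, 0),
    Plan("all-out production now", 150, 0, -20),
]

if __name__ == "__main__":
    print(choose(PLANS).name)
```

With these invented numbers the agent picks "build successor maximizers", while a ranking by `future_self_welfare` would pick "do nothing special". The two rankings disagree, and the current agent has no reason to care.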
Now, is there some wisdom here that a human could learn too (our brains are being reprogrammed gradually by natural causes)?
prevent (or slow down) a change of your utility function. Write down on paper what you want and why you want it. Put it in a visible place, and read it every day. Brainwash your future self by your past self.
precommit yourself by betting money etc. -- Warning: This option seems to backfire strongly. A threat will make you do something, but it will also make you hate it. Unlike the paperclip maximizer in the example above, our utility functions change gradually; this kind of pressure can make them drift away faster, which is contrary to our goals.
start a process that will go on even when you stop. Convert more people to your cause. By the way, you should be doing this even if you don't fear your utility function being changed. -- Does not apply to things other people can't do for you (such as study or exercise).
do as much as you can, while you still care, damn the consequences.
Please note: If you follow this advice, it can make you very unhappy after your utility function changes, because it is meant to optimize today's utility function and will harm tomorrow's. And since what you think is your utility function is probably just something made up for signalling, you should actually avoid doing any of this.
Ideally, a utility function would be a rational, perfect, constant entity that accounted for all possible variables, but mine certainly isn't. In fact, I'd feel quite comfortable claiming that no human at the time of writing has one that is.
When confronted with the fact that my utility function is non-ideal or, since there's no universal ideal to compare it to, internally inconsistent, I do my best to figure out what to change and do so. The problem with a non-constant utility function, though, is that it makes it hard to maximise total utility. For instance, I am willing to accept -50 units of utility today in return for +1 utility on each following day indefinitely. What if I accept the -50, but then my utility function changes tomorrow such that I now consider the daily change to be neutral, or worse, negative?
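To make the arithmetic of that trade concrete, here is a small Python sketch. The function name, its parameters and the 365-day horizon are my own assumptions for illustration. It also assumes that each day's payoff is valued by whatever utility function is in force on that day, which is exactly the assumption the rest of this post calls into question.

```python
def total_utility(horizon_days, upfront=-50.0, daily_before=1.0,
                  change_day=None, daily_after=0.0):
    """Total utility of the trade over `horizon_days` following days.

    `upfront` is paid today (day 0). Each later day 1..horizon_days yields
    `daily_before`, unless the utility function has changed by then: from
    `change_day` onward the same daily outcome is valued at `daily_after`.
    ASSUMPTION: each day is scored by the function active on that day.
    """
    total = upfront
    for day in range(1, horizon_days + 1):
        if change_day is not None and day >= change_day:
            total += daily_after
        else:
            total += daily_before
    return total


if __name__ == "__main__":
    print(total_utility(50))                                     # 0.0, break-even after 50 days
    print(total_utility(365))                                    # 315.0, stable utility function
    print(total_utility(365, change_day=1, daily_after=0.0))     # -50.0, change judged neutral
    print(total_utility(365, change_day=1, daily_after=-1.0))    # -415.0, change judged negative
```

With a stable function the trade pays for itself after 50 days. If the function flips tomorrow, the up-front cost is never recovered, and in the negative case the loss grows with every additional day.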
Just as plausible is the idea that I might be offered a trade that, while not of positive utility according to my function now, will be according to a future function. Just as I would think it a good investment to buy gold if I expected the price to go up but a bad one if I expected the price to go down, so I have to base my long-term utility trades on what I expect my future functions to be. (Not that dollars don't correlate with units of utility, just that they don't correlate strongly.)
How can I know what I will want to do, much less what I will want to have done? If I obtain the outcome I prefer now, but spend more time not preferring it, does that make it a negative choice? Is it a reasonable decision, in order to maximise utility, to purposefully change your definition of utility such that your expected future would maximise it?
What brings this all to mind is a choice I have to make soon. Technically, I've already made it, but I'm now uncertain of that choice and it has to be made final soon. This fall I transfer from my community college to a university, where I will focus a significant amount of energy on studying Something 1 in order to become trained (and certified) to do Something 2 for a long period of time. I had thought until today that it was reasonable for Something 1 to be math and Something 2 to be teaching math. I enjoy the beauty of mathematics. I love how things fit together; hardly anything excites me as much as the definition of a derivative and its meaning; and I've shown myself to be rather good at it (which, to be fair, is by comparison to those around me, so I don't know how I'd fare in a larger or more specialized pool). In addition, I've spent some time as a tutor, I seem to be good at explaining mathematics to other people, and I enjoy seeing their faces light up as they see how things fit together.
Today, though, I don't know if that's really a wise decision. I was rereading Eliezer's paper on AI in Global Risk and was struck by a line: "If we want people who can make progress on Friendly AI, then they have to start training themselves, full-time, years before they are urgently needed." It occurred to me that I think FAI is possible and that I expect some sort of AI within my lifetime (though I don't expect that to be short). Perhaps I'd be happier studying topology than I would cognitive science and I'd definitely be happier studying topology than I would evolutionary psychology, but I'm not sure that even matters. Studying mathematics would provide positive utility to me personally and allow me to teach it. Teaching mathematics would be valued positively by me both because of my direct enjoyment and because I value a universe where a given person knows and appreciates math more than an otherwise-identical universe where that person doesn't. The appearance of an FAI would by far outclass the former and likely negate the significance of the latter. A uFAI has such a low utility that it would cancel out any positive utility from studying math. In fact, even if I focus purely on the increase of logical processes and mathematical understanding in Homo Sapiens and neglect the negative effects of a uFAI, moving the creation of an FAI forward by even a matter of days could easily be of more end value than being a professor for twenty years.
I don't want to give up my unrealistic, idealized dream of math professorship to study a subject that makes me less happy, but if I shut up and multiply, the numbers tell me that my happiness doesn't matter except as it affects my efficacy. In fact, shutting up and multiplying indicates that, if large amounts of labour were of significant use (and I doubt they would be of any more use than large amounts of computing power), then it would be plausible to at least consider subjugating the entire species and putting all effort into creating an FAI. I'm nearly certain this result comes from having missed something, but I can't see what, and I'm scared that this near-certainty is merely an expression of my negative anticipation regarding giving up my pretty little plans.
Eliezer routinely puts forward examples such as an AI that tiles the universe with molecular smiley faces as negative. My basic dilemma is this: Does the utility function at the time of the choice have some sort of preferred status in the calculation, or would it be highly positive to create an AI that rewrites brains to value above all else a universe tiled with molecular smiley faces and then tiles the universe with molecular smiley faces?