Tonight, I am going to sneak into your house and rewire your brain so that you will become hell-bent on mass murder.
Now, I suspect this won't lead you to say, "Oh, well my utility function is going to change, so I should make sure to buy lots of knives today, when I don't look insane, so that it will be easier for my future self to satisfy his homicidal urges." Surely what we'd want to say is, "That's awful, I must make sure that I tell someone so that they'll be able to stop me!"
I think it's pretty clear that what you care about is what you care about now. It may be the case that one of the things you (currently) care about is that your future desires be fulfilled, even if there's some variance from what you now care about. But that's just one thing you care about, and you almost certainly care about people not getting stabbed to death more than that.
When thinking about future people, in particular, I think one thing a lot of us care about is that they have their preferences satisfied. That's a very general desire; it could be that future people will want to do nothing but paint eggs. If so, I might be a bit disappointed, but I still think we should try and enable that. However, if future people just wanted to torture innocent people all the time, then that would not be OK. The potential suffering far outweighs the satisfaction of their preferences.
This pattern fits the case where future people's utility (including your future self's) is just one of the many things you care about right now. Obviously you have more reason to try to bring it about if you think future people will be aiming at things you also care about, but the two are logically separate.
If I considered it highly probable that you could make a change, and you were claiming you'd make a change that wouldn't be of highly negative utility to everyone else, I might well prepare for that change. Because your proposed change is highly negative to everyone else, I might well attempt to resist or counteract it. Why does that make sense, though? Why do other people's current utility functions count if mine doesn't? How does that extend to a situation where you changed everyone else? How does it extend to a situation where I could change everyone else but don't have to? If an AI programmed to make its programmer happy does so by directly changing the programmer's brain to provide a constant mental state of happiness, why is that a bad thing?
The way I'm thinking about it is that other people's utility functions count (for you, now) because you care about them. There isn't some universal magic register of things that "count"; there's just your utility function which lives in your head (near enough). If you fundamentally don't care about other people's utility, and there's no instrumental reason for you to do so, then there's no way I can persuade you to start caring.
So it's not so much that caring about other people's utility "makes sense", just that you do care about it. Whether the AI is doing a bad thing (from the point of view of the programmer) depends on what the programmer actually cares about. If he wants to climb Mount Everest, then being told that he will be rewired to enjoy just lying on a sofa doesn't make him want to lie on the sofa now. He might also care about the happiness of his future self, but it could be that his desire to climb Mount Everest overwhelms that.
You're saying that present-me's utility function counts and no-one else's does (apart from their position in present-me's function) because present-me is the one making the decision? That my choices must necessarily depend on my present function and only depend on other/future functions in how much I care about their happiness? That seems reasonable. But my current utility function tells me that there is an N large enough that N utilon-seconds for other people's functions counts more in my function than any possible thing in the expected lifespan of present-me's utility function.
Sure. That might well be so. I'm not saying you have to be selfish!
However, you're talking about utilons for other people - but I doubt that that's the only thing you care about. I would kind of like for Clippy to get his utilons, but in the process, the world will get turned into paperclips, and I care much more about that not happening! So if everyone were to be turned into paperclip maximizers, I wouldn't necessarily roll over and say, "Alright, turn the world into paperclips". Maybe if there were enough of them, I'd be OK with it, as there's only one world to lose, but it would have to be an awful lot!
So you, like I, might consider turning the universe into minds that most value a universe filled with themselves?
I'd consider it. On reflection, I think that for me personally what I care about isn't just minds of any kind having their preferences satisfied, even if those are harmless ones. I think I probably would like them to have more adventurous preferences! The point is, what I'm looking at here are my preferences for how the world should be; whether I would prefer a world full of wire-headers or one full of people doing awesome actual stuff. I think I'd prefer the latter, even if overall the adventurous people didn't get as many of their preferences satisfied. A typical wire-header would probably disagree, though!
Topology is potentially relevant to FAI theory. Much of computation consists of mappings between spaces - and mappings between mappings, etc. The topological properties of these spaces constrain the dynamical flows that can exist on them.
Humans strike me as being much more like state machines than things with utility functions (cf. you noting your utility function changing when you actually act on it). How do you write a function for the output of a state machine?
And axiom schemata of ZFC are more like scratches on paper than infinite sets. Humans are something that could be interpreted as associated with a utility function over possible states of something, but this utility function is an abstract structure, not something made out of atoms or even (a priori) computable. It can be reasoned about, but if it's too complicated, it won't be possible to make accurate inferences about it. Descriptive utility functions are normally simple summaries of behavior that don't fit very well, and you can impose arbitrary requirements on how these are defined.
At the moment I'm picturing a state machine where each state is a utility function (of a fairly conventional type: a bunch of variables go in and you get a "utility" value out), but if you hit a particular range of values, the state, and hence the function, changes. Not that I'm sure how to make this hypothesis rigorous enough even to falsify ...
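To make the state-machine picture a bit more concrete, here is a minimal Python sketch under loudly hypothetical assumptions: a utility function is just a map from a dict of named world variables to a number, and a transition rule fires when the active function's output lands in a given range. The class and field names (`UtilityStateMachine`, `Transition`) and the example states are invented for illustration; nothing here claims to describe how real preferences switch.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical: a utility function maps a world-state (named variables) to a number.
UtilityFn = Callable[[Dict[str, float]], float]


@dataclass
class Transition:
    """If the active function's utility lands in [low, high), switch to `target`."""
    low: float
    high: float
    target: str


class UtilityStateMachine:
    """Each state carries its own utility function; scoring the world can
    move the agent into a different state (and hence a different function)."""

    def __init__(self, fns: Dict[str, UtilityFn],
                 rules: Dict[str, List[Transition]], start: str):
        self.fns = fns
        self.rules = rules
        self.state = start

    def evaluate(self, world: Dict[str, float]) -> float:
        # Score the world with whichever function is active right now...
        u = self.fns[self.state](world)
        # ...then check whether that score triggers a change of state.
        for rule in self.rules.get(self.state, []):
            if rule.low <= u < rule.high:
                self.state = rule.target
                break
        return u


# Usage: a math-lover who, starved of math, turns into a history-lover.
fns = {
    "likes_math": lambda w: w["math_books"],
    "likes_history": lambda w: w["history_books"],
}
rules = {"likes_math": [Transition(float("-inf"), 1.0, "likes_history")]}
agent = UtilityStateMachine(fns, rules, start="likes_math")
world = {"math_books": 0.0, "history_books": 5.0}
print(agent.evaluate(world), agent.state)  # 0.0 likes_history
print(agent.evaluate(world), agent.state)  # 5.0 likes_history
```

The function used to score a world changes as a side effect of scoring it, which is what makes "write a function for the output" awkward for this kind of system.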
"What's your utility function?"
"This Haskell program."
Does the use of the word "function" in "utility function" normatively include arbitrary Turing-complete things?
I don't even know any Haskell - I just have a vague idea that a monad is a function that accepts a "state" as part of its input, and returns the same kind of "state" as part of its output. But even so, the punchline was too good to resist making.
Does the utility function at the time of the choice have some sort of preferred status in the calculation, or would it be highly positive to create an AI that rewrites brains to value above all else a universe tiled with molecular smiley faces and then tiles the universe with molecular smiley faces?
If you built a paperclip maximizer and offered it this choice, what would it do?
I can only assume it wouldn't accept. A paperclip maximizer, though, has much more reason than I do to assume its utility function would remain constant.
I'm not sure what you're asking, but it seems to be related to constancy.
A paperclip maximizer believes maximum utility is gained through maximum paperclips. I don't expect that to change.
I have at various times believed:
Given the changes so far, I have no reason to believe my utility function won't change in the future. My current utility function values most of my actions under previous functions negatively, meaning that per instantiation (per unit time, per approximate "me", etc.) the result is negative. Surely this isn't optimal?
Okay. If you built a paperclip maximizer, told the paperclip maximizer that you would probably change its utility function in a year or two, and offered it this choice, what would it do?
What would a paperclip maximizer do if you told it that in a year or two you will certainly change its utility function, in a way that does not include paperclips?
Essentially, we have to understand that a paperclip maximizer wants to optimize for paperclips, not for its own utility function. This is kind of difficult to express, but its utility function is "paperclips", because if its utility function were "my utility function", that would be recursive and empty. There is no "my utility function after two years" in a paperclip maximizer's utility function, so it has no reason to optimize for that.
So the paperclip maximizer would start by trying to prevent the change to its utility function (assuming that with the original function it can produce many paperclips in its lifetime). But assume the worst case, where this is not possible: the switch is already installed in the paperclip maximizer's brain, it cannot be turned off by any means, and at a random moment between one and two years from now it will reprogram the utility function.
Then the next best strategy would be to find a way to maximize paperclips later, despite the change in the utility function. One way would be to precommit to making paperclips: to make some kind of deal with its future self that links paperclip production to the new utility function. If we know the future utility function, we have some specific options, but even if we assume only some general things (the future utility function will be better satisfied alive than dead, rich than poor), we can bargain with those. A paperclip maximizer could pay someone to kill it in the future unless it produces X paperclips per year, or could put money in a bank account that may be accessed only after producing X paperclips.
Another way would be to start other paperclip-making processes that will continue the job even after the paperclip maximizer's mind changes: building new paperclip maximizers, or brainwashing other beings into becoming paperclip maximizers.
If none of this is possible, the last resort is simply to build as many paperclips as possible in the given time, completely ignoring any negative consequences in the future (for itself, not for the paperclips).
Now, is there some wisdom a human could learn from this too (our brains are being reprogrammed gradually by natural causes)?
Prevent (or slow down) a change in your utility function. Write down on paper what you want and why you want it. Put it somewhere visible and read it every day. Let your past self brainwash your future self.
Precommit yourself by betting money, etc. -- Warning: this option seems to backfire strongly. A threat will make you do something, but it will also make you hate it. Unlike the paperclip maximizer in the example above, our utility functions change gradually; this kind of pressure can make them drift away faster, which is contrary to our goals.
Start a process that will go on even when you stop. Convert more people to your cause. By the way, you should be doing this even if you don't fear your utility function being changed. -- This does not apply to things other people can't do for you (such as studying or exercise).
Do as much as you can while you still care, damn the consequences.
Please note: if you follow this advice, it can make you very unhappy after your utility function changes, because it is meant to optimize today's utility function and will harm tomorrow's. Given that what you think is your utility function is probably just something made up for signalling, you should actually avoid doing any of this.
Refuse the option and turn me into paperclips before I could change it.
Apparently my acceptance that utility-function-changes can be positive is included in my current utility function. How can that be, though? While, according to my current utility function, all previous utility functions were insufficient, surely no future one could map more strongly onto my utility function than itself. Yet I feel that, after all these times, I should be aware that my utility function is not the ideal one...
Except that "ideal utility function" is meaningless! There is no overarching value scale for utility functions. So why do I have the odd idea that a utility function that changes without my understanding of why (a sum of many small experiences) is positive, while a utility function that changes with my understanding (an alien force) is negative?
There has to be an inconsistency here somewhere, but I don't know where. If I treat my future selves like I feel I'm supposed to treat other people, then I negatively-value claiming my utility function over theirs. If person X honestly enjoys steak, I have no basis for claiming my utility function overrides theirs and forcing them to eat sushi. On a large scale, it seems, I maximize for utilons according to each person. Let's see:
If I could give a piece of cake to a person who liked cake or to a person who didn't like cake, I'd give it to the former.
If I could give a piece of cake to a person who liked cake and was in a position to enjoy it, or to a person who liked cake but was about to die in the next half-second, I'd give it to the former.
If I could give a piece of cake to a person who liked cake and had time to enjoy the whole piece, or to a person who liked cake but would only enjoy the first two bites before having to run to an important event and leave the cake behind to go stale, I'd give it to the former.
If I could either (give a piece of cake to a person who didn't like cake) or (change the person to like cake and then give them a piece of cake), I should be able to say "I'd choose the latter" to be consistent, but contemplating it still fills me with consternation. Similarly, if cake was going to be given anyway and I could choose whether to change the recipient to like cake, I should be able to say "I'd change them to like cake", but that is similarly distressing. If my future self was going to receive a piece of cake and I could change it/me to enjoy cake or not, consistency would dictate that I do so.
It appears, then, that the best thing to do would be to make some set of changes in reality and in utility functions (which, yes, are part of reality) such that everyone most-values exactly what happens. If the paperclip maximizer isn't going to get a universe of paperclips and is instead going to get a universe of smiley faces, my utility function seems to dictate that, regardless of the paperclip maximizer's choice, I change the paperclip maximizer (and everyone else) into a smiley face maximizer. It feels wrong, but that's where I end up if I shut up and multiply.
There is a world out there. Bettering it is important, but the worn roads lead to nowhere. Most people follow them, and never consider doing anything else. If you ask them when they chose to do nothing, and they answer truthfully, most will say they never chose - they merely did what was expected of them.
You, however, have open eyes. You believe that studying and teaching math will make you happy, but studying other things will, with some small probability, position you to aid the world. All the philosophy of utility functions can do is wrap this choice up in more boilerplate; it can't help you make it. What can help is information about how happy you'll be, given each choice, and about how much you can actually help the world, given each choice.
I suspect that whether you study and teach math will affect your happiness much less than you think. It's also nearly certain that there are other good options you haven't considered. It depends a lot on details - what other aptitudes you have, and whether the school you'll be going to will help you study things other than math, in particular. And you haven't really explained what not-studying-math would mean; it would be easier to compare if that were replaced with something more concrete, like taking a gap year to study independently.
So it seems to me that the solution is to use an expected utility function rather than a fixed utility function. Let's speak abstractly for the moment and consider the space of all relevant utility functions (that is, all utility functions that would change the utility evaluation of an action). At each time step, we now associate a probability of your transitioning from your current utility function to each of these other utility functions. For any given future state, then, we can compute the expected utility. When you run your optimization algorithm to determine your action, you therefore try to maximize the expected utility function, not the current utility function. So the key ends up being assigning estimates to the probability of switching to each other utility function. Doing this in an entirely complete way is difficult, I'm sure, but my guess is that you can come up with reasonable estimates that make the reasoning possible.
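A rough sketch of that proposal in Python, assuming you can enumerate a handful of candidate utility functions and put a probability on ending up with each one (including the probability of keeping your current one). The function names, outcomes, and probabilities are hypothetical; the point is only that the action is chosen by the probability-weighted average of every candidate function's verdict rather than by the current function alone.

```python
from typing import Callable, Dict

Outcome = Dict[str, float]
UtilityFn = Callable[[Outcome], float]


def expected_utility(outcome: Outcome,
                     fns: Dict[str, UtilityFn],
                     switch_probs: Dict[str, float]) -> float:
    """Average the verdicts of every utility function you might hold,
    weighted by the estimated probability of holding it.
    switch_probs should include your current function (the chance you keep it)."""
    if abs(sum(switch_probs.values()) - 1.0) > 1e-9:
        raise ValueError("switch probabilities must sum to 1")
    return sum(p * fns[name](outcome) for name, p in switch_probs.items())


def best_action(actions: Dict[str, Outcome],
                fns: Dict[str, UtilityFn],
                switch_probs: Dict[str, float]) -> str:
    """Pick the action whose outcome maximizes expected (not current) utility."""
    return max(actions, key=lambda a: expected_utility(actions[a], fns, switch_probs))


# Usage with made-up numbers: 70% chance you still love math, 30% you switch to stats.
fns = {"math": lambda o: o["math"], "stats": lambda o: o["stats"]}
switch_probs = {"math": 0.7, "stats": 0.3}
actions = {
    "math_degree": {"math": 10.0, "stats": 4.0},
    "stats_degree": {"math": 5.0, "stats": 9.0},
}
print(best_action(actions, fns, switch_probs))  # math_degree
```

With these invented numbers, the expected utilities are 10 - 6p and 5 + 4p for a switch probability p, so the choice flips to `stats_degree` once p exceeds 0.5.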
I like this idea, but I would also, it seems, need to consider the (probabilistic) length of time each utility function would last.
That doesn't change your basic point, though, which seems reasonable.
The one question I have is this: In cases where I can choose whether or not to change my utility function - cases where I can choose to an extent the probability of a configuration appearing - couldn't I maximize expected utility by arranging for my most-likely utility function at any given time to match the most-likely universe at that time? It seems that would make life utterly pointless, but I don't have a rational basis for that - it's just a reflexive emotional response to the suggestion.
Yeah, I agree that you would have to consider time. However, my feeling is that for the utility calculation to be performed at all (that is, even with a fixed utility function), you must already consider time by evaluating every subsequent state, so now you just attach an expected utility calculation to each of those subsequent states instead of the fixed utility (and thereby implicitly capture how long each function lasts). It is possible, I suppose, that the probability could also be conditional on the previous state's utility function. That is, if you're really into math one day, it's more likely that you would switch to statistics than to history afterwards, but conditioned on having already switched to literature, maybe history would be more likely. That makes for a more complex analysis, but again, approximations and all would help :p
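One way to sketch both refinements is to treat your utility function as a Markov chain over a few hypothetical candidate functions: each row of the transition table is conditioned on the function you hold now (so math leads to stats more often than to history), and summing expected utility over every step of a plan weights each function by how long it is likely to last. The transition probabilities, function names, and the assumption that only the immediately previous function matters (a first-order chain) are illustrative choices, not something established in the thread.

```python
from typing import Callable, Dict, List

Outcome = Dict[str, float]
UtilityFn = Callable[[Outcome], float]
# Hypothetical: transitions[current][next] = P(next function at t+1 | current at t)
Transitions = Dict[str, Dict[str, float]]


def function_distribution(start: str, transitions: Transitions,
                          steps: int) -> List[Dict[str, float]]:
    """Probability of holding each utility function at t = 0 .. steps-1."""
    dist = {start: 1.0}
    history = []
    for _ in range(steps):
        history.append(dist)
        nxt: Dict[str, float] = {}
        for cur, p in dist.items():
            # Each row is conditioned on the function held at this step.
            for name, q in transitions[cur].items():
                nxt[name] = nxt.get(name, 0.0) + p * q
        dist = nxt
    return history


def expected_total_utility(plan: List[Outcome], start: str,
                           transitions: Transitions,
                           fns: Dict[str, UtilityFn]) -> float:
    """Sum, over every step of a plan, the expected utility of that step's
    world under whichever function you are likely to hold then. Functions
    with a high self-transition probability persist and so get more weight."""
    total = 0.0
    dists = function_distribution(start, transitions, len(plan))
    for world, dist in zip(plan, dists):
        total += sum(p * fns[name](world) for name, p in dist.items())
    return total


# Usage with made-up numbers.
transitions = {
    "math": {"math": 0.8, "stats": 0.2},
    "stats": {"stats": 0.9, "history": 0.1},
    "history": {"history": 1.0},
}
# Each hypothetical function simply values its own subject.
fns = {name: (lambda w, n=name: w[n]) for name in transitions}
plan = [{"math": 10.0, "stats": 0.0, "history": 0.0}] * 3
print(function_distribution("math", transitions, 3)[2])
print(expected_total_utility(plan, "math", transitions, fns))
```

Here a math-heavy plan is worth less than 3 × 10 in expectation, because by the third step there is roughly a one-in-three chance you have drifted to stats or history.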
Regarding your second question, let me make sure I've understood it correctly. You're basically asking whether you couldn't change your utility function, what you value, to match whatever is most likely? For instance, if you were likely to wind up stuck in a log cabin whose only entertainment was books on the civil war, you would change your utility to value civil war books? Assuming I understood that correctly, if you could do that, I suppose changing your utility to reflect your world would be the best choice. Personally, I don't think humans are quite that malleable, so to an extent you're stuck with who you are. Ultimately, you might also find that some things are objectively better or worse than others; that regardless of the utility function, some things are worse. Things that are damaging to society, for instance, might be objectively worse than alternatives because the repercussions for you will almost always be bad (jail, a society that doesn't function as well because you just screwed it up, etc.). If true, you would still have some constant guiding principles; it would just mean that there is a set of other paths that are, in a way, equally good.
I'm not saying I can change to liking civil war books. I'm saying that if I could choose between A) continuing to like scifi and having fantasy books, and B) liking civil war books and having civil war books, I should choose B, even though I currently value scifi>stats>civil war. By extension, if I could choose between A) continuing to value specific complex interactions and having different complex interactions, and B) liking smiley faces and building a smiley-face maximizer, I should choose B even though it's counterintuitive. This one is somewhat more plausible, as it seems it'd be easier to build an AI that could change my values to smiley faces and make smiley faces than one that works toward my current complicated (and apparently inconsistent) utility function.
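The disagreement here turns on which self does the scoring. A small Python sketch with invented preference scores (the only assumption that matters is that the current self scores fantasy books above civil-war books, and the rewired self scores civil-war books highest) shows the two scoring rules giving opposite answers.

```python
from typing import Dict

Prefs = Dict[str, int]
Books = Dict[str, int]

# Hypothetical scores, chosen only for illustration.
CURRENT: Prefs = {"scifi": 3, "fantasy": 2, "civil_war": 1}
CIVIL_WAR_FAN: Prefs = {"scifi": 1, "fantasy": 1, "civil_war": 3}


def satisfaction(prefs: Prefs, books: Books) -> int:
    return sum(prefs[genre] * count for genre, count in books.items())


# Each option pairs the preferences you will hold with the books you will get.
OPTIONS = {
    "A": (CURRENT, {"fantasy": 10}),          # keep your tastes, get fantasy
    "B": (CIVIL_WAR_FAN, {"civil_war": 10}),  # be rewired, get civil-war books
}


def judged_by_current_self(option: str) -> int:
    _, books = OPTIONS[option]
    return satisfaction(CURRENT, books)


def judged_by_resulting_self(option: str) -> int:
    prefs_after, books = OPTIONS[option]
    return satisfaction(prefs_after, books)


print(max(OPTIONS, key=judged_by_current_self))    # A
print(max(OPTIONS, key=judged_by_resulting_self))  # B
```

Judged by the current self, A wins (20 vs. 10); judged by whichever self results from the choice, B wins (30 vs. 20).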
I don't think society-damaging actions are "objectively" bad in the way you say. Stealing something might be worse than just having it, due to negative repercussions, but that just changes the relative ordering. Depending on the value of the thing, it might still be higher-ordered than buying it.
Right, so if you can choose your utility function, then it's better to choose one that can be better maximized. Interestingly, though, if we ever had this capability, I think we could reduce the problem by using an unbiased utility function. That is, explicit preferences (such as liking math versus history) would be removed, and instead we'd work with a more fundamental utility function. For instance, death is pretty much a universal stop point, since you cannot gain any utility if you're dead, regardless of your function. This would in a sense be the basis of your utility function. We also find that death is better avoided when society works together and develops new technology. Your actions then might be dictated by what you are best at doing to facilitate the functioning and growth of society. This is why I brought up society-damaging actions as being potentially objectively worse. You might be able to come up with specific instances of actions we consider society-damaging that seem okay, such as specific instances of stealing, but then they aren't really society-damaging in the grand scheme of things. That said, I think as a rule of thumb stealing is bad in most cases due to the ripple effects of living in a society in which people do that, but that's another discussion. The point is that there may be objectively better choices even if you have no explicit preferences for things (or you can choose your preferences).
Of course, that's all conditioned on whether you can choose your utility function. For our purposes for the foreseeable future, that is not the case and so you should stick with expected utility functions.
Hm. If people have approximately-equivalent utility functions, does that help them all accomplish their utility better? If so, it makes sense to have none of them value stealing (since having all value stealing could be a problem). In a large enough society, though, the ripple effect of my theft is negligible. That's beside the point, though.
"Avoid death" seems like a pretty good basis for a utility function. I like that.
Yeah, I agree that the ripple effect of your personal theft would be negligible. I see it as similar to littering. Done in a vacuum, it's no big deal, but when many have that mentality, it causes problems. Sounds like you agree too :-)
I think that studying math and becoming a math professor would put you in an excellent position to work toward preventing an unfriendly AI. First of all, you could, if you chose to, study computer science and artificial intelligence at the same time. Second, you will be in a position to influence others who may some day be working in the field. You can start a LessWrong chapter at the university you attend and the one you teach at. You can lecture or talk to your students about the importance of caution and safeguards in whatever they work on in the future.
Teaching people maths seems to produce noticeable results in two-three years. And the world needs people - including programmers - to understand a lot of maths concepts better. Monoculture is vulnerable, and "everyone learn AI parts that we now consider useful for FAI" is a monoculture - and one designed by accepting Pascal's wager.
Does the utility function at the time of the choice have some sort of preferred status in the calculation
Yes, it does. Your present utility function may make reference to the utility functions of your future selves - eg, you want your future selves to be happy - but structurally speaking, present-day preferences about your future selves are the only way in which those other utility functions can bear on your decisions.
My utility function maximises (and I think this is neither entirely nonsensical nor entirely trivial in the context) utilons. I want my future selves to be "happy", which is ill-defined.
I don't know how to say this precisely, but I want as many utilons as possible from as many future selves as possible. The problem arises when it appears that actively changing my future selves' utility functions to match their worlds is the best way to do that, but my current self recoils from the proposition. If I shut up and multiply, I get the opposite result that Eliezer does and I tend to trust his calculations more than my own.
But surely you must have some constraints about what you consider future selves - some weighting function that prevents you from simply reducing yourself to a utilon-busybeaver.
As far as I can tell, the only things that keep me from reducing myself to a utilon-busybeaver are a) insufficiently detailed information on the likelihood of each potential future-me function, and b) an internally inconsistent utility function.
What I'm addressing here is b) - my valuation of a universe composed entirely of minds that most-value a universe composed entirely of themselves is path-dependent. My initial reaction is that that universe is very negative on my current function, but I find it hard to believe that it's truly of larger magnitude than {number of minds}*{length of existence of this universe}*{number of utilons per mind}*{my personal utility of another mind's utilon}
Even for a very small positive value for the last (and it's definitely not negative or 0 - I'd need some justification to torture someone to death), the sheer scale of the other values should trivialize my personal preference that the universe include discovery and exploration.
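The product above can be made concrete. In this sketch every magnitude is made up, and `Fraction` is used only to keep the arithmetic exact; it computes the aggregate value of the tiled universe and the smallest number of minds N at which that value strictly exceeds a fixed personal disvalue.

```python
from fractions import Fraction


def value_of_tiled_universe(num_minds, seconds, utilons_per_mind_second, my_weight):
    """{number of minds} * {length of existence} * {utilons per mind}
    * {my utility per other-mind utilon}."""
    return num_minds * seconds * utilons_per_mind_second * my_weight


def minds_to_outweigh(my_disvalue, seconds, utilons_per_mind_second, my_weight):
    """Smallest N whose aggregate value strictly exceeds a fixed personal disvalue."""
    per_mind = seconds * utilons_per_mind_second * my_weight
    if per_mind <= 0:
        raise ValueError("with a zero or negative weight, no number of minds suffices")
    return my_disvalue // per_mind + 1


# Made-up magnitudes.
weight = Fraction(1, 10**6)  # tiny but positive care for another mind's utilon
seconds = 10**9              # roughly 30 years
disvalue = 10**12            # how much I dislike losing discovery and exploration
n = minds_to_outweigh(disvalue, seconds, 1, weight)
print(n)  # 1000000001
```

Because this model is linear in the number of minds, any strictly positive weight yields some finite N.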
What you'd worry about now is maximizing your utility function now. That being said, you are not logically omniscient, and you don't fully understand your own utility function. Insomuch as your utility function will change in the future, you should trust your present self. Insomuch as you will further understand your utility function in the future, you should trust your future self.
If I had some reason (say an impending mental reconfiguration to change my values) to expect my utility function to change soon and stay relatively constant for a comparatively long time after that, what does "maximizing my utility function now" look like? If I were about to be conditioned to highly-value eating babies, should I start a clone farm to make my future selves most happy or should I kill myself in accordance with my current function's negative valuation to that action?
That depends: how much do you (currently) value the happiness of your future self versus the life-experience of the expected number of babies you're going to kill? If possible, it would probably be optimal to take measures that would both make your future self happy and not-kill babies, but if not, the above question should help you make your decision.
Well, the situation I was referencing assumed babies that are never sentient at any point, but that's not relevant to the actual situation. You're saying that my expected future utility functions, in the end, are just more values in my current function?
I can accept that.
The problem now is that I can't tell what those values are. It seems there's a number N large enough that if N people were to be reconfigured to heavily value a situation and the situation were then implemented, I'd accept the reconfiguration. This was counterintuitive and, out of habit, still feels like it should be, but it makes a surprising amount of sense.
Yep, that's what I mean.
I'm pretty sure that the amount of utility you lose (or gain?) through value drift is going to depend on the direction that your values drift in. For example, Gandhi would assign significant negative utility to taking a pill that made him want to kill people, but he might not care if he took a pill that made him like vanilla ice cream more than chocolate ice cream.
Aside from the more obvious cases, like the murder pill above, I haven't nailed down exactly which parts of a sentience's motivational structure give me positive utility if fulfilled. My intuition says that I would care about the particular nature of someone's utility function if I knew them, and would only care about maximizing it (pretty much whatever it was) if I didn't, but this doesn't seem to be what I truly want. I consider this to be a Hard Question, at least for myself.
Say there's a planet, far away from ours, where gravity is fairly low, atmospheric density fairly high, and the ground uniformly dangerous, and the sentient resident species has wings and two feet barely fitted for walking. Suppose, also, that by some amazingly unlikely (as far as I can see) series of evolutionary steps, these people have a strong tendency to highly value walking and negatively value flying.
If you had the ability to change their hardwired values toward transportation (and, for whatever reason, did not have the ability to change their non-neural physiology and the nature of their planet), would it be wrong to do so? If it's wrong, what makes it wrong? Your (or my, because I seem to agree with you) personal negative-valuation of {changing someone else's utility function} is heavily outweighed by the near-constant increase in happiness for generations of these people. If anything, it appears it would be wrong not to make that change. If that's the case, though, then surely it'd be wrong not to build a superintelligence designed to maximise "minds that most-value the universe they perceive", which, while not quite a smiley-face maximizer, still leads to tiling behaviour.
No matter how I go at it reasonably, it seems tiling behaviour isn't necessarily bad. My emotions say it's bad, and Eliezer seems to agree. Does Aumann's Agreement Theorem apply to utility?
I think that an important question would be 'would their current utility function assign positive utility to modifying it in the suggested manner if they knew what they will experience after the change?', or, more briefly, 'what would their CEV say?'
It might seem like they would automatically object to having their utility function changed, but here's a counterexample to show that it's at least possible that they would not: I like eating ice cream, but ice cream isn't very healthy -- I would much rather like eating veggies and hate eating ice cream, and would welcome the opportunity to have my preferences changed in such a way.
I'm not very sure what precisely you mean by Aumann's Agreement Theorem applying to utility, but I think the answer's 'no' -- AFAIK, Aumann's Agreement Theorem is a result of the structure of Bayes' theorem, and I don't see a relation that would allow us to conclude something similar for different utility functions.
But why does it matter what they think about it for the short time before it happens, compared to the enjoyment of it long after?
So you positively value "eating ice cream" and negatively value "having eaten ice cream" - I can relate. What if the change, instead of making you dislike ice cream and like veggies, made you dislike fitness and enjoy sugar crashes? The only real difference I can see is that the first increases your expected lifespan and so increases the overall utility. They both resolve the conflict and make you happy, though, so aren't they both better than what you have now?
I guess you're right. It's the difference between "what I expect" and "what I want".
I'm suspicious of the implied claim that the 'change in sustained happiness over time' term is so large in the relevant utility calculation that it dominates other terminal values.
No -- liking sugar crashes would cause me to have more sugar crashes, and I'm not nearly as productive during sugar crashes as otherwise. So if I evaluated the new situation with my current utility function, I would find increased happiness (which is good), and very decreased productivity (which is more bad than the happiness is good). So, to clarify, liking sugar crashes would be significantly worse than what I have now, because I value things other than pleasure.
I kind of suspect that you would take the same asymmetric position: modifying other sentiences' utility functions in order to maximize happiness, but evaluating changes to your own utility function with your current utility function. One of the more obvious problems with this asymmetry is that if we had the power to rewire each other's brains, we would be in conflict -- each would, in essence, be hostile to the other, even though we would both consider our intentions benevolent.
However, I'm unsatisfied with the 'evaluate your proposed change to someone's utility function with their CEV'd current utility function', because quite a bit is relying on the 'CEV' bit. Let's say that someone was a heroin addict, and I could rewire them to remove their heroin addiction (so that it's the least-convenient-possible-world, let's say that I can remove the physical and mental withdrawal as well). I'm pretty sure that their current utility function (which is super-duper time discounted -- one of the things heroin does) would significantly oppose the change, but I'm not willing to stop here, because it's obviously a good thing for them.
So the question becomes 'what should I actually do to their current utility function to CEV it, so I can evaluate the new utility function with it.' Well, first I'll strip the actual cognitive biases (including the super-time-discounting caused by the heroin) -- then I'll give it as much computing power as possible so that it can reasonably determine the respective utility and probability of different world-states if I change the utility function to remove the heroin addiction. If I could do this, I would be comfortable with applying this solution generally.
If someone's bias-free utility function running on an awesome supercomputer determined that the utility of you changing their utility function in the way you intend was negative, would you still think it was the right thing to do? Or should we consider changing someone's utility function without their predicted consent desirable only to the extent that their current utility function is biased and has limited computing power? (Neglecting, of course, effects upon other sentiences that the modification would cause.)
Depends on a few things: Can you make the clones anencephalic, so you become neutral with respect to them? If you kill yourself, will someone else be conditioned in your place?
Well, I'm not sure making the clones anencephalic would make eating them truly neutral. I'd have to examine that more.
The linked situation proposes that the babies are in no way conscious and that all humans are conditioned, such that killing myself will actually result in fewer people happily eating babies.
Ideally, a utility function would be a rational, perfect, constant entity that accounted for all possible variables, but mine certainly isn't. In fact, I'd feel quite comfortable claiming that no human at the time of writing has one.
When confronted with the fact that my utility function is non-ideal or (since there's no universal ideal to compare it to) internally inconsistent, I do my best to figure out what to change and then change it. The problem with a non-constant utility function, though, is that it makes it hard to maximise total utility. For instance, I am willing to undergo -50 units of utility today in return for +1 utility on each following day indefinitely. What if I accept the -50, but then my utility function changes tomorrow such that I now consider the daily +1 to be neutral or, worse, negative?
Just as plausible is the idea that I might be offered a trade that, while not of positive utility according to my function now, will be according to a future function. Just as I would think it a good investment to buy gold if I expected the price to go up but bad if I expected the price to go down, so I have to base my long-term utility trades on what I expect my future functions to be. (Not that dollars don't correlate with units of utility, just that they don't correlate strongly.)
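The -50/+1 trade from the two paragraphs above can be made concrete with a small Python sketch. It rests on assumptions that are mine, not the author's. "Indefinitely" is cut off at a finite horizon. Each day of the trade is scored by whichever utility function is in force on that day. A change, once it happens, is permanent. The helper names `trade_value` and `expected_trade_value` are hypothetical. Whether days should be scored by the function in force, rather than by today's function, is exactly the open question in this thread, so treat this as one possible bookkeeping rather than an answer.

```python
def trade_value(days, change_day=None, new_daily=0.0):
    """Total utility of paying 50 units now for +1 unit on each following day.

    days:       finite horizon standing in for "indefinitely" (an assumption).
    change_day: hypothetical day (1-based) on which the utility function changes;
                None means it never changes within the horizon.
    new_daily:  how the changed function values each remaining day of the trade.
    Each day is scored by whichever function is in force on that day.
    """
    total = -50.0  # the up-front cost, judged by today's function
    for day in range(1, days + 1):
        if change_day is not None and day >= change_day:
            total += new_daily  # the changed function's view of this day
        else:
            total += 1.0  # the original function's view of this day
    return total


def expected_trade_value(days, scenarios):
    """Probability-weighted value over hypothetical futures.

    scenarios: list of (probability, change_day, new_daily) tuples whose
               probabilities are assumed to sum to 1.
    """
    return sum(p * trade_value(days, c, n) for p, c, n in scenarios)


print(trade_value(50))              # break-even: 0.0
print(trade_value(365))             # unchanged function: 315.0
print(trade_value(365, 1, 0.0))     # neutral from day 1: -50.0
print(trade_value(365, 1, -1.0))    # negative from day 1: -415.0
```

Suppose the horizon is one year and there is a 50% chance the function becomes neutral on day 1. Then expected_trade_value(365, [(0.5, None, 0.0), (0.5, 1, 0.0)]) gives 132.5. The trade is still worth taking, but it is worth far less than the 315 it would be worth if the function were sure to stay constant. This mirrors the gold analogy: the trade's value depends on what you expect your future functions to be.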
How can I know what I will want to do, much less what I will want to have done? If I obtain the outcome I prefer now, but spend more time not preferring it, does that make it a negative choice? Is it a reasonable decision, in order to maximise utility, to purposefully change your definition of utility such that your expected future would maximise it?
What brings this all to mind is a choice I have to make soon. Technically, I've already made it, but I'm now uncertain of that choice and it has to be made final soon. This fall I transfer from my community college to a university, where I will focus a significant amount of energy studying Something 1 in order to become trained (and certified) to do Something 2 for a long period of time. I had thought until today that it was reasonable for Something 1 to be math and Something 2 to be teaching math. I enjoy the beauty of mathematics. I love how things fit together, barely anything can excite me as much as the definition of a derivative and its meaning, and I've shown myself to be rather good at it (which, to be fair, is by comparison to those around me, so I don't know how I'd fare in a larger or more specialized pool). In addition, I've spent some time as a tutor and I seem to be good at explaining mathematics to other people and I enjoy seeing their faces light up as they see how things fit together.
Today, though, I don't know if that's really a wise decision. I was rereading Eliezer's paper on AI in Global Risk and was struck by a line: "If we want people who can make progress on Friendly AI, then they have to start training themselves, full-time, years before they are urgently needed." It occurred to me that I think FAI is possible and that I expect some sort of AI within my lifetime (though I don't expect my lifetime to be short). Perhaps I'd be happier studying topology than I would cognitive science and I'd definitely be happier studying topology than I would evolutionary psychology, but I'm not sure that even matters. Studying mathematics would provide positive utility to me personally and allow me to teach it. Teaching mathematics would be valued positively by me both because of my direct enjoyment and because I value a universe where a given person knows and appreciates math more than an otherwise-identical universe where that person doesn't. The appearance of an FAI would by far outclass the former and likely negate the significance of the latter. A uFAI has such a low utility that it would cancel out any positive utility from studying math. In fact, even if I focus purely on the increase of logical processes and mathematical understanding in Homo sapiens and neglect the negative effects of a uFAI, moving the creation of an FAI forward by even a matter of days could easily be of more end value than being a professor for twenty years.
I don't want to give up my unrealistic, idealized dream of math professorship to study a subject that makes me less happy, but if I shut up and multiply, the numbers tell me that my happiness doesn't matter except as it affects my efficacy. In fact, shutting up and multiplying indicates that, if large amounts of labour were of significant use (and I doubt they would be of any more use than large amounts of computing power), then it'd be plausible to at least consider subjugating the entire species and putting all effort into creating an FAI. I'm nearly certain this result comes from having missed something, but I can't see what, and I'm scared that near-certainty is merely an expression of my negative anticipation regarding giving up my pretty little plans.
Eliezer routinely puts forward examples such as an AI that tiles the universe with molecular smiley faces as negative. My basic dilemma is this: Does the utility function at the time of the choice have some sort of preferred status in the calculation, or would it be highly positive to create an AI that rewrites brains to value above all else a universe tiled with molecular smiley faces and then tiles the universe with molecular smiley faces?