Comment author: Dorikka 11 May 2012 03:52:23PM 0 points

I'm suspicious of the implied claim that the 'change in sustained happiness over time' term is so large in the relevant utility calculation that it dominates other terminal values.

No -- liking sugar crashes would cause me to have more sugar crashes, and I'm not nearly as productive during sugar crashes as otherwise. So if I evaluated the new situation with my current utility function, I would find increased happiness (which is good), and very decreased productivity (which is more bad than the happiness is good). So, to clarify, liking sugar crashes would be significantly worse than what I have now, because I value other things than pleasure.

I kinda suspect that you would have the same position -- modifying other sentiences' utility functions in order to maximize happiness, but evaluating changes to your own utility function with your current utility function. One of the more obvious problems with this asymmetry is that if we had the power to rewire each other's brain, we would be in conflict -- each would, in essence, be hostile to the other, even though we would consider our intentions benevolent.

However, I'm unsatisfied with the 'evaluate your proposed change to someone's utility function with their CEV'd current utility function', because quite a bit is relying on the 'CEV' bit. Let's say that someone was a heroin addict, and I could rewire them to remove their heroin addiction (so that it's the least-convenient-possible-world, let's say that I can remove the physical and mental withdrawal as well). I'm pretty sure that their current utility function (which is super-duper time discounted -- one of the things heroin does) would significantly oppose the change, but I'm not willing to stop here, because it's obviously a good thing for them.

So the question becomes 'what should I actually do to their current utility function to CEV it, so I can evaluate the new utility function with it.' Well, first I'll strip the actual cognitive biases (including the super-time-discounting caused by the heroin) -- then I'll give it as much computing power as possible so that it can reasonably determine the respective utility and probability of different world-states if I change the utility function to remove the heroin addiction. If I could do this, I would be comfortable with applying this solution generally.
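As a toy sketch of that two-step procedure -- with extreme time discounting standing in for the heroin bias, and stripping it standing in for de-biasing. Every number and function below is invented for illustration; nothing here is a real method for extrapolating anyone's values:

```python
# Toy model of 'strip the bias, then re-evaluate'. The discount rate, reward
# streams, and the idea that de-biasing is just un-discounting are all
# hypothetical simplifications.

def raw_utility(rewards, discount_per_step=0.3):
    """The addict's current function: a heavily time-discounted reward sum."""
    return sum(r * discount_per_step ** t for t, r in enumerate(rewards))

def extrapolated_utility(rewards):
    """Step 1: strip the bias (here, the extreme discounting). Step 2, extra
    computing power, is trivial in a toy this small."""
    return raw_utility(rewards, discount_per_step=1.0)

# Futures as per-timestep reward streams: keeping the addiction gives a big
# hit now and a long decline; removing it costs something now and pays off
# for a long time afterwards.
keep_addiction   = [10] + [-2] * 20
remove_addiction = [-5] + [3] * 20

# The raw, super-time-discounted function opposes the change...
print(raw_utility(keep_addiction) > raw_utility(remove_addiction))  # True
# ...while the de-biased function favors it.
print(extrapolated_utility(remove_addiction)
      > extrapolated_utility(keep_addiction))                       # True
```

The open question in the comment is exactly the part this toy dodges: deciding what counts as a 'bias' to strip, as opposed to a genuine value.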

If someone's bias-free utility function running on an awesome supercomputer determined that the utility of you changing their utility function in the way you intend was negative, would you still think it was the right thing to do? Or should we consider changing someone's utility function without their predicted consent only desirable to the extent that their current utility function is biased and has limited computing power? (Neglecting, of course, effects upon other sentiences that the modification would cause.)

Comment author: momothefiddler 12 May 2012 06:38:26AM 0 points

I can't figure out an answer to any of those questions without having a way to decide which utility function is better. This seems to be a problem, because I don't see how it's even possible.

Comment author: Dorikka 11 May 2012 03:03:53AM 0 points

I think that an important question would be 'would their current utility function assign positive utility to modifying it in the suggested manner if they knew what they would experience after the change?', or, more briefly, 'what would their CEV say?'

It might seem like they would automatically object to having their utility function changed, but here's a counterexample to show that it's at least possible that they would not: I like eating ice cream, but ice cream isn't very healthy -- I would much rather like eating veggies and hate eating ice cream, and would welcome the opportunity to have my preferences changed in such a way.

I'm not very sure what precisely you mean by Aumann's Agreement Theorem applying to utility, but I think the answer's 'no' -- AFAIK, Aumann's Agreement Theorem is a result of the structure of Bayes' Theorem, and I don't see a relation which would allow us to conclude something similar for different utility functions.

Comment author: momothefiddler 11 May 2012 02:09:41PM 0 points

But why does it matter what they think about it for the short time before it happens, compared to the enjoyment of it long after?

So you positively value "eating ice cream" and negatively value "having eaten ice cream" - I can relate. What if the change, instead of making you dislike ice cream and like veggies, made you dislike fitness and enjoy sugar crashes? The only real difference I can see is that the first increases your expected lifespan and so increases the overall utility. They both resolve the conflict and make you happy, though, so aren't they both better than what you have now?

I guess you're right. It's the difference between "what I expect" and "what I want".

Comment author: FeepingCreature 08 May 2012 12:31:33PM 0 points

But surely you must have some constraints about what you consider future selves - some weighting function that prevents you from simply reducing yourself to a utilon-busybeaver.

Comment author: momothefiddler 08 May 2012 03:37:07PM 0 points

As far as I can tell, the only things that keep me from reducing myself to a utilon-busybeaver are a) insufficiently detailed information on the likelihoods of each potential future-me function, and b) an internally inconsistent utility function.

What I'm addressing here is b) - my valuation of a universe composed entirely of minds that most-value a universe composed entirely of themselves is path-dependent. My initial reaction is that that universe is very negative on my current function, but I find it hard to believe that it's truly of larger magnitude than {number of minds}*{length of existence of this universe}*{number of utilons per mind}*{my personal utility of another mind's utilon}

Even for a very small positive value for the last (and it's definitely not negative or 0 - I'd need some justification to torture someone to death), the sheer scale of the other values should trivialize my personal preference that the universe include discovery and exploration.
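The scale argument can be made concrete with a back-of-the-envelope sketch; every figure below is invented purely to show the orders of magnitude involved, not an estimate of anything:

```python
# All magnitudes hypothetical -- the point is only that the four-way product
# dwarfs any fixed personal-preference term.
num_minds            = 1e10   # minds tiling the universe
lifetime_seconds     = 1e17   # remaining existence of this universe
utilons_per_mind_sec = 1.0    # each mind near-maximally satisfied
my_value_per_utilon  = 1e-12  # tiny but positive weight on another mind's utilon

other_minds_term = (num_minds * lifetime_seconds
                    * utilons_per_mind_sec * my_value_per_utilon)  # ~1e15

my_preference_term = 1e6  # my liking for a universe of discovery and exploration

print(other_minds_term > my_preference_term)  # True
```

Even shrinking `my_value_per_utilon` by several more orders of magnitude leaves the product ahead, which is the force of the argument.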

Comment author: Alerus 07 May 2012 08:47:58PM 0 points

Right, so if you can choose your utility function, then it's better to choose one that can be better maximized. Interestingly though, if we ever had this capability, I think we could just reduce the problem by using an unbiased utility function. That is, explicit preferences (such as liking math versus history) would be removed and instead we'd work with a more fundamental utility function. For instance, death is pretty much a universal stop point since you cannot gain any utility if you're dead, regardless of your function. This would be in a sense the basis of your utility function. We also find that death is better avoided when society works together and develops new technology. Your actions then might be dictated by what you are best at doing to facilitate the functioning and growth of society. This is why I brought up society-damaging actions as being potentially objectively worse. You might be able to come up with specific instances of actions that we associate as society-damaging that seem okay, such as specific instances of stealing, but then they aren't really society-damaging in the grand scheme of things. That said, I think as a rule of thumb stealing is bad in most cases due to the ripple effects of living in a society in which people do that, but that's another discussion. The point is there may be objectively better choices even if you have no explicit preferences for things (or you can choose your preferences).

Of course, that's all conditioned on whether you can choose your utility function. For our purposes for the foreseeable future, that is not the case and so you should stick with expected utility functions.

Comment author: momothefiddler 08 May 2012 03:11:57AM 0 points

Hm. If people have approximately-equivalent utility functions, does that help them all satisfy their utility functions better? If so, it makes sense to have none of them value stealing (since having them all value stealing could be a problem). In a large enough society, though, the ripple effect of my theft is negligible. That's beside the point, though.

"Avoid death" seems like a pretty good basis for a utility function. I like that.

Comment author: bryjnar 07 May 2012 09:10:26PM 3 points

I'd consider it. On reflection, I think that for me personally what I care about isn't just minds of any kind having their preferences satisfied, even if those are harmless ones. I think I probably would like them to have more adventurous preferences! The point is, what I'm looking at here are my preferences for how the world should be; whether I would prefer a world full of wire-headers or one full of people doing awesome actual stuff. I think I'd prefer the latter, even if overall the adventurous people didn't get as many of their preferences satisfied. A typical wire-header would probably disagree, though!

Comment author: momothefiddler 08 May 2012 03:08:53AM 0 points

Fair.

Comment author: bryjnar 07 May 2012 02:41:12PM 0 points

Sure. That might well be so. I'm not saying you have to be selfish!

However, you're talking about utilons for other people - but I doubt that that's the only thing you care about. I would kind of like for Clippy to get his utilons, but in the process, the world will get turned into paperclips, and I care much more about that not happening! So if everyone were to be turned into paperclip maximizers, I wouldn't necessarily roll over and say, "Alright, turn the world into paperclips". Maybe if there were enough of them, I'd be OK with it, as there's only one world to lose, but it would have to be an awful lot!

Comment author: momothefiddler 07 May 2012 08:07:46PM 0 points

So you, like I, might consider turning the universe into minds that most value a universe filled with themselves?

Comment author: Alerus 07 May 2012 01:23:08PM 0 points

Yeah I agree that you would have to consider time. However, my feeling is that for the utility calculation to be performed at all (that is, even in the context of a fixed utility), you must also consider time through the state of being in all subsequent states, so now you just add an expected utility calculation to each of those subsequent states (and therefore implicitly capture the length of time it lasts) instead of the fixed utility. It is possible, I suppose, that the probability could be conditional on the previous state's utility function too. That is, if you're really into math one day it's more likely that you could switch to statistics rather than history following that, but if you have it conditioned on having already switched to literature, maybe history would be more likely then. That makes for a more complex analysis, but again, approximations and all would help :p
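That calculation can be sketched as a small Monte-Carlo estimate, with the drift probabilities conditioned on the current utility function as described above. The subjects, utilities, and probabilities below are all invented for illustration:

```python
import random

# P(next utility function | current one): e.g. from math you're likelier to
# drift to stats than to history. All numbers invented.
drift = {
    "math":    {"math": 0.70, "stats": 0.25, "history": 0.05},
    "stats":   {"math": 0.20, "stats": 0.60, "history": 0.20},
    "history": {"math": 0.05, "stats": 0.15, "history": 0.80},
}

# Utility each function assigns to the (fixed, for simplicity) world-state.
utility = {"math": 5.0, "stats": 4.0, "history": 2.0}

def expected_utility(start, horizon, trials=10_000, seed=0):
    """Monte-Carlo estimate of total utility over `horizon` steps, letting
    the utility function itself drift via the conditional table above."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        fn = start
        for _ in range(horizon):
            total += utility[fn]
            fns, ps = zip(*drift[fn].items())
            fn = rng.choices(fns, weights=ps)[0]
    return total / trials

print(expected_utility("math", horizon=10))  # ~40 with these made-up numbers
```

Conditioning `drift` on the current function is what captures the point about having already drifted changing where you're likely to drift next.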

Regarding your second question, let me make sure I've understood it correctly. You're basically saying couldn't you change the utility function, what you value, on the whims of what is most possible? For instance, if you were likely to wind up stuck in a log cabin that for entertainment only had books on the civil war, that you change your utility to valuing civil war books? Assuming I understood that correctly, if you could do that, I suppose changing your utility to reflect your world would be the best choice. Personally, I don't think humans are quite that malleable and so you're to an extent kind of stuck with who you are. Ultimately, you might also find that some things are objectively better or worse than others; that regardless of the utility function some things are worse. Things that are damaging to society, for instance, might be objectively worse than alternatives because the consequent repercussions for you will almost always be bad (jail, a society that doesn't function as well because you just screwed it up, etc.). If true, you still would have some constant guiding principles, it would just mean that there are a set of other paths that are in a way equally good.

Comment author: momothefiddler 07 May 2012 08:05:35PM 0 points

I'm not saying I can change to liking civil war books. I'm saying if I could choose between A) continuing to like scifi and having fantasy books, or B) liking civil war books and having civil war books, I should choose B, even though I currently value scifi>stats>civil war. By extension, if I could choose A) continuing to value specific complex interactions and having different complex interactions, or B) liking smiley faces and building a smiley-face maximizer I should choose B even though it's counterintuitive. This one is somewhat more plausible, as it seems it'd be easier to build an AI that could change my values to smiley faces and make smiley faces than it would be to build one that works toward my current complicated (and apparently inconsistent) utility function.

I don't think society-damaging actions are "objectively" bad in the way you say. Stealing something might be worse than just having it, due to negative repercussions, but that just changes the relative ordering. Depending on the value of the thing, it might still be higher-ordered than buying it.

Comment author: bryjnar 07 May 2012 11:21:43AM 1 point

The way I'm thinking about it is that other people's utility functions count (for you, now) because you care about them. There isn't some universal magic register of things that "count"; there's just your utility function which lives in your head (near enough). If you fundamentally don't care about other people's utility, and there's no instrumental reason for you to do so, then there's no way I can persuade you to start caring.

So it's not so much that caring about other people's utility "makes sense", just that you do care about it. Whether the AI is doing a bad thing (from the point of view of the programmer) depends on what the programmer actually cares about. If he wants to climb Mount Everest, then being told that he will be rewired to enjoy just lying on a sofa doesn't lead to him doing so. He might also care about the happiness of his future self, but it could be that his desire to climb Mount Everest overwhelms that.

Comment author: momothefiddler 07 May 2012 11:31:42AM 0 points

You're saying that present-me's utility function counts and no-one else's does (apart from their position in present-me's function) because present-me is the one making the decision? That my choices must necessarily depend on my present function and only depend on other/future functions in how much I care about their happiness? That seems reasonable. But my current utility function tells me that there is an N large enough that N utilon-seconds for other peoples' functions counts more in my function than any possible thing in the expected lifespan of present-me's utility function.

Comment author: Dorikka 07 May 2012 02:09:00AM 0 points

Yep, that's what I mean.

I'm pretty sure that the amount of utility you lose (or gain?) through value drift is going to depend on the direction that your values drift in. For example, Gandhi would assign significant negative utility to taking a pill that made him want to kill people, but he might not care if he took a pill that made him like vanilla ice cream more than chocolate ice cream.

Aside from the more obvious cases, like the murder pill above, I haven't nailed down exactly which parts of a sentience's motivational structure give me positive utility if fulfilled. My intuition says that I would care about the particular nature of someone's utility function if I knew them, and would only care about maximizing it (pretty much whatever it was) if I didn't, but this doesn't seem to be what I truly want. I consider this to be a Hard Question, at least for myself.

Comment author: momothefiddler 07 May 2012 11:26:39AM 0 points

Say there's a planet, far away from ours, where gravity is fairly low, atmospheric density fairly high, and the ground uniformly dangerous, and the sentient resident species has wings and two feet barely fitted for walking. Suppose, also, that by some amazingly unlikely (as far as I can see) series of evolutionary steps, these people have a strong tendency to highly value walking and negatively value flying.

If you had the ability to change their hardwired values toward transportation (and, for whatever reason, did not have the ability to change their non-neural physiology and the nature of their planet), would it be wrong to do so? If it's wrong, what makes it wrong? Your (or my, because I seem to agree with you) personal negative-valuation of {changing someone else's utility function} is heavily outweighed by the near-constant increase in happiness for generations of these people. If anything, it appears it would be wrong not to make that change. If that's the case, though, then surely it'd be wrong not to build a superintelligence designed to maximise "minds that most-value the universe they perceive", which, while not quite a smiley-face maximizer, still leads to tiling behaviour.

No matter how I go at it reasonably, it seems tiling behaviour isn't necessarily bad. My emotions say it's bad, and Eliezer seems to agree. Does Aumann's Agreement Theorem apply to utility?

Comment author: bryjnar 07 May 2012 12:14:09AM 2 points

Tonight, I am going to sneak into your house and rewire your brain so that you will become hell-bent on mass murder.

Now, I suspect this won't lead you to say, "Oh, well my utility function is going to change, so I should make sure to buy lots of knives today, when I don't look insane, so that it will be easier for my future self to satisfy his homicidal urges." Surely what we'd want to say is, "That's awful, I must make sure that I tell someone so that they'll be able to stop me!"

I think it's pretty clear that what you care about is what you care about now. It may be the case that one of the things you (currently) care about is that your future desires be fulfilled, even if there's some variance from what you now care about. But that's just one thing you care about, and you almost certainly care about people not getting stabbed to death more than that.

When thinking about future people, in particular, I think one thing a lot of us care about is that they have their preferences satisfied. That's a very general desire; it could be that future people will want to do nothing but paint eggs. If so, I might be a bit disappointed, but I still think we should try and enable that. However, if future people just wanted to torture innocent people all the time, then that would not be OK. The potential suffering far outweighs the satisfaction of their preferences.

This sort of pattern just fits the case where future people's utility (including that of your future self) is just one among others of the things that you care about right now. Obviously you have more reason to try and bring it about if you think that future people will be aiming at things that you also care about, but they're logically separate things.

Comment author: momothefiddler 07 May 2012 12:54:34AM 0 points

If I considered it high-probability that you could make a change and you were claiming you'd make a change that wouldn't be of highly negative utility to everyone else, I might well prepare for that change. Because your proposed change is highly negative to everyone else, I might well attempt to resist or counteract that change. Why does that make sense, though? Why do other peoples' current utility functions count if mine don't? How does that extend to a situation where you changed everyone else? How does it extend to a situation where I could change everyone else but I don't have to? If an AI programmed to make its programmer happy does so by directly changing the programmer's brain to provide a constant mental state of happiness, why is that a bad thing?
