Model Uncertainty, Pascalian Reasoning and Utilitarianism — LessWrong

by multifoliaterose

14th Jun 2011

JenniferRM (15y)

The more I think about it, the more I'm tempted to just bite the bullet and accept that my "empirically observed utility function" (to the degree that such a thing even makes sense) may be bounded, finite, with a lot of its variation spent measuring relatively local things like the prosaic well being of myself and my loved ones, so that there just isn't much left over to cover anyone outside my monkey sphere except via a generic virtue-ethical term for "being a good citizen n'stuff".

A first-order approximation might be modeled mathematically by taking all the various utilities having to do with "weird infinite utilities", normalizing all those scenarios by "my ability to affect those outcomes" (so my intrinsic concern for things decreases when I "give up" on affecting them... which seems broken but also sorta seems like how things might actually work), and then running what's left through a sigmoid function so that their impact on my happiness and behavior is finite and marginal... claiming maybe 1% of my consciously strategic planning time and resource expenditures under normal circumstances.

Under this model, the real meat of my utility func...
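A minimal sketch of the first-order approximation described above, under stated assumptions: "normalizing by my ability to affect those outcomes" is read as multiplying by a number between 0 and 1, `math.tanh` stands in for the unspecified sigmoid, and the hypothetical `cap` of 0.01 echoes the rough 1% share mentioned. The function name and numbers are illustrative, not anything the comment specified.

```python
import math

def bounded_concern(raw_utility: float, ability_to_affect: float, cap: float = 1.0) -> float:
    """Hypothetical model: scale a (possibly infinite) utility by one's ability
    to affect the outcome, then squash the result into the range [-cap, cap]."""
    if ability_to_affect == 0.0:
        # No ability to affect the outcome means no concern (also avoids inf * 0 = nan).
        return 0.0
    scaled = raw_utility * ability_to_affect
    # tanh is one common sigmoid: it maps any real number, and +/-inf, into [-1, 1].
    return cap * math.tanh(scaled)

# A "weird infinite utility" one can barely affect still contributes at most `cap`.
print(bounded_concern(math.inf, ability_to_affect=1e-9, cap=0.01))  # 0.01
# Ordinary local concerns stay in the sigmoid's roughly linear region.
print(bounded_concern(0.5, ability_to_affect=0.8))
```

Note that concern here depends directly on ability to affect the outcome, which is the feature the comment itself flags as seeming "broken".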

CarlShulman (15y)

I largely buy the framework of this comment, as I've said elsewhere. It does still leave the question of how to go about "being a good citizen n'stuff" with the limited portion of your efforts you want to invest in doing so. Most of multifoliaterose's questions could be reframed in those terms.

multifoliaterose (15y)

Thanks for your thoughtful comment. I agree that it's unclear that it makes sense to talk about humans having utility functions; my use of the term was more a manner of speaking than anything else. It sounds like you're going with something like Counterargument #5, with something like the Dunbar number determining the point at which your concern for others caps off, augmented by some desire to "be a good citizen n'stuff". Something similar may be true of me, but I'm not sure. I know that I derive a lot of satisfaction from feeling like I'm making the world a better place and am uncomfortable with the idea that I don't care about people I don't know (in light of my abstract belief in the space and time independence of moral value); but maybe the intensity of the relevant feelings is sufficiently diminished when the magnitude of uncertainty becomes huge that other interests predominate. I feel like if I could prove that course X maximizes expected utility then my interest in pursuing course X would increase dramatically (independently of how small the probabilities are and of the possibility of doing more harm than good), but that having a distinct sense that I'll probably change my mind about whether pursuing course X was a good idea significantly decreases my interest in pursuing it. I'm finding it difficult to determine whether this reflects my "utility function" or whether there's a logical argument coming from utilitarianism against pursuing courses that one will probably regret (e.g. probable burnout and disillusionment repelling potentially utilitarian bystanders). Great Adam Smith quotation; I've seen it before, but it's good to have a reference.

CarlShulman (15y)

Obligatory OB link: Bostrom and Ord's parliamentary model for normative uncertainty/mixed motivations.

timtyler (15y)

They do have them - in this sense:

moridinamael (15y)

I think the use of both DALYs and dollars in the main article is worth talking about, in the context of some of the things you have mentioned. Being a stupid human, I find that it is generally useful for me to express utility to myself in dollars, because I possess a pragmatic faculty for thinking about dollars. I might not bend over to pick up one dollar. I might spend a couple of hours working for $100. There isn't much difference between one billion and two billion dollars, from my current perspective. When you ask me how many dollars I would spend to avert the deaths of a million people, the answer can't be any larger than the number of dollars I actually have. If you ask me how many dollars I would spend to avoid the suffering associated with a root canal, it could be some noticeable percentage of my net worth. When we start talking about decisions where thousands of DALYs hang in the balance, my monkey brain has no intuitive sense of the scope of this, and no pragmatic way of engaging with it. I don't have the resources or power to purchase even one DALY-equivalent under my own valuation! If the net utility of the universe is actually being largely controlled by infinitesimal probabilities of enormous utilities, then my sense of scale for both risk and value is irrelevant. It hardly matters how many utilons I attribute to a million starving people when I have only so much time and so much money. I don't know what, if anything, to conclude from this, except to say that it makes me feel unsuited to reasoning about anything outside the narrow human scope of likelihoods and outcomes.

Will_Newsome (15y)

ETA: This is a meta comment about some aspects of some comments on this post and what I perceive to be problems with the sort of communication/thinking that leads to the continued existence of those aspects. This comment is not meant to be taken as a critique of the original post.

ETA2: This comment lacks enough concreteness to act as a serious consideration in favor of one policy over another. Please disregard it as a suggestion for how LW should normatively respond to something. Instead one might consider if one might personally benefit from enacting a policy I might be suggesting, on an individual basis.

Why are people on Less Wrong still talking about 'their' 'values' using deviations from a model that assumes they have a 'utility function'? It's not enough to explicitly believe and disclaim that this is obviously an incorrect model, at some point you have to actually stop using the model and adopt something else. People are godshatter, they are incoherent, they are inconsistent, they are an abstraction, they are confused about morality, their revealed preferences aren't their preferences, their revealed preferences aren't even their revealed preferences, their verbally express...

Wei Dai (15y)

Don't you think people need to go through an "ah ha, there is such a thing as rationality, and it involves Bayesian updating and expected utility maximization" phase before moving on to "whoops, actually we don't really know what rationality is and humans don't seem to have utility functions"? I don't see how you can get people to stop talking about human utility functions unless you close LW off from newcomers.

XiXiDu (15y)

I was pretty happy before LW, until I learnt about utility maximization. It tells me that I ought to do what I don't want to do on any other than some highly abstract intellectual level. I don't even get the smallest bit of satisfaction out of it, just depression. Saving galactic civilizations from superhuman monsters burning the cosmic commons, walking into death camps so as to reduce the likelihood of being blackmailed, discounting people by the length of their address in the multiverse... taking all that seriously and keeping one's sanity, that's difficult for some people. What LW means by 'rationality' is to win in a hard-to-grasp sense that is often completely detached from the happiness and desires of the individual.

CarlShulman (15y)

It tells me that I ought to do what I don't want to do on any other than some highly abstract intellectual level. I don't even get the smallest bit of satisfaction out of it, just depression.

If this is really having that effect on you, why not just focus on things other than abstract large-scale ethical dilemmas, e.g. education, career, relationships? Progress on those fronts is likely to make you happier, and if you want to come back to mind-bending ethical conundrums you'll then be able to do so in a more productive and pleasant way. Trying to do something you're depressed and conflicted about is likely to be ineffective or backfire.

Benquo (15y)

Yeah, I have found that when my mind breaks, I have to relax while it heals before I can engage it in the same sort of vigorous exercise again. It's important to remember that that's what is going on. When you become overloaded and concentrate on other things, you are not neglecting your duty. Your mind needs time to heal and become stronger by processing the new information you've given it.

Vladimir_Nesov (15y)

Not necessarily, sometimes people are doing exactly that, depending on what you mean by "overloaded".

Benquo (15y)

Hmm... I think I've slipped into "defending a thesis" mode here. The truth is that the comment you replied to was much too broad, and incorrect as stated, as you correctly pointed out. Thanks for catching my error!

Benquo (15y)

You are right, it depends on the specifics. And if you focus on other things with no plan to ever return to the topic that troubled you, that's different. But if you've learned things that make demands on your mind beyond what it can meet, then failing to do what is in fact impossible for you is not negligence.

multifoliaterose (15y)

Gosh, returning to jsteinhardt's comment, everything should add up to normality. If you feel that you're being led by abstract reasoning in directions that consistently feel wrong, then there's probably something wrong with the reasoning. My own interest in existential risk reduction is that when I experience a sublime moment I want people to be able to have more of them for a long time. If there were nothing but a counterintuitive abstract argument, I would think about other things.

XiXiDu (15y)

Yup, my confidence in the reasoning here on LW and my own ability to judge it is very low. The main reason for this is described in your post above, taken to its logical extreme you end up doing seemingly crazy stuff like trying to stop people from creating baby universes rather than solving friendly AI. I don't know how to deal with this. Where do I draw the line? What are the upper and lower bounds? Are risks from AI above or below the line of uncertainty that I better ignore, given my own uncertainty and the uncertainty in the meta-level reasoning involved? I am too uneducated and probably not smart enough to figure this out, yet I face the problems that people who are much more educated and intelligent than me devised.

jsteinhardt (15y)

If a line of reasoning is leading you to do something crazy, then that line of reasoning is probably incorrect. I think that is where you should draw the line. If the reasoning is actually correct, then by learning more your intuitions will automatically fall in line with the reasoning and it will not seem crazy anymore. In this case, I think your intuition correctly diagnoses the conclusion as crazy. Whether you are well-educated or not, the fact that you can tell the difference speaks well of you, although I think you are causing yourself way too much anxiety by worrying about whether you should accept the conclusion after all. Like I said, by learning more you will decrease the inferential distance you will have to traverse in such arguments, and better deduce whether they are valid. That being said, I still reject these sorts of existential risk arguments based mostly on intuition, plus I am unwilling to do things with high probabilities of failure, no matter how good the situation would be in the event of success. ETA: To clarify, I think existential risk reduction is a worthwhile goal, but I am uncomfortable with arguments advocating specific ways to reduce risk that rely on very abstract or low-probability scenarios.

Will_Newsome (15y)

There are many arguments in this thread that this extreme isn't even correct given the questionable premises, have you read them? Regardless, though, it really is important to be psychologically realistic, even if you feel you "should" be out there debating with AI researchers or something. Leading a psychologically healthy life makes it a lot less likely you'll have completely burnt yourself out 10 years down the line when things might be more important, and it also sends a good signal to other people that you can work towards bettering the world without being some seemingly religiously devout super nerd. One XiXiDu is good, two XiXiDus is a lot better, especially if they can cooperate, and especially if those two XiXiDus can convince more XiXiDus to be a little more reflective and a little less wasteful. Even if the singularity stuff ends up being total bullshit or if something with more "should"-ness shows up, folk like you can always pivot and make the world a better place using some other strategy. That's the benefit of keeping a healthy mind.

multifoliaterose (15y)

[Edit] I share your discomfort, but this is more a matter of the uncertainty intrinsic to the world that we live in than a matter of education/intelligence. At some point a leap of faith is required.

timtyler (15y)

That's not utility maximisation, that's utilitarianism. A separate idea, though confusingly named. IMHO, utilitarianism is a major screw-up for a human being. It is an unnatural philosophy which lacks family values and seems to be used mostly by human beings for purposes of signalling and manipulation.

Will_Newsome (15y)

Two things seem off. The first is that expected utility maximization isn't the same thing as utilitarianism. Utility maximization can be done even if your utility function doesn't care at all about utilitarian arguments, or is unimpressed by arguments in favor of scope sensitivity. But even after making that substitution, why do you think Less Wrong advocates utilitarianism? Many prominent posters have spoken out against it both for technical reasons and ethical ones. And arguments for EU maximization, no matter how convincing they are, aren't at all related to arguments for utilitarianism. I understand what you're getting at---Less Wrong as a whole seems to think there might be vitally important things going on in the background and you'd be silly to not think about them---but no one here is going to nod their head disapprovingly or shove math in your face if you say "I'm not comfortable acting from a state of such uncertainty". And I link to this article again and again these days, but it's really worth reading: http://lesswrong.com/lw/uv/ends_dont_justify_means_among_humans/ . This doesn't apply so much to epistemic arguments about whether risks are high or low, but it applies oh-so-much to courses of action that stem from those epistemic arguments.

XiXiDu (15y)

The problem is that if I adopt unbounded utility maximization, then I perceive it to converge with utilitarianism. Even completely selfish values seem to converge with utilitarian motives. Not only does every human, however selfish, care about other humans, but they are also instrumental to their own terminal values. Solving friendly AI means to survive. As long as you don't expect to be able to overpower all other agents, by creating your own FOOMing AI, the best move is to play the altruism card and argue in favor of making an AI friendly_human. Another important aspect is that it might be rational to treat copies of you, or agents with similar utility-functions (or ultimate preferences), as yourself (or at least assign non-negligible weight to them). One argument in favor of this is that the goals of rational agents with the same preferences will ultimately converge and are therefore instrumental in realizing what you want. But even if you care little about anything but near-term goals revealed to you by naive introspection, taking into account infinite (or nearly infinite, e.g. 3^^^^3) scenarios can easily outweigh those goals. All in all, if you adopt unbounded utility maximization and you are not completely alien, you might very well end up pursuing utilitarian motives. A real-world example is my vegetarianism. I assign some weight to sub-human suffering, enough to outweigh the joy of eating meat. Yet I am willing to consume medical comforts that are a result of animal experimentation. I would also eat meat if I would otherwise die. Yet, if the suffering were big enough, I would die even for sub-human beings, e.g. 3^^^^3 pigs being eaten. As a result, if I take into account infinite scenarios, my terminal values converge with those of someone who subscribes to utilitarianism. The problem, my problem, is that if all beings thought like this and sacrificed their own lives, no being would end up maximizing utility. This is contradictory. One might argue t...
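The "3^^^^3" above is Knuth's up-arrow notation, with each ^ standing for one arrow. A short recursive definition makes the growth concrete; only tiny arguments can actually be evaluated, and 3^^^^3 itself (`up_arrow(3, 4, 3)` below) is hopelessly beyond any computer.

```python
def up_arrow(a: int, n: int, b: int) -> int:
    """Knuth's up-arrow: a followed by n arrows, then b (n >= 1, b >= 0)."""
    if n == 1:
        return a ** b  # one arrow is ordinary exponentiation
    if b == 0:
        return 1       # base case for n >= 2
    # a (n arrows) b = a (n-1 arrows) (a (n arrows) (b - 1))
    return up_arrow(a, n - 1, up_arrow(a, n, b - 1))

print(up_arrow(3, 1, 3))  # 3^3 = 27
print(up_arrow(3, 2, 3))  # 3^^3 = 3^(3^3) = 7625597484987
# 3^^^3 is already a tower of 3s that is 7625597484987 levels tall; don't run it.
```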

Wei Dai (15y)

How about, for example, assigning .5 probability to a bounded utility function (U1), and .5 probability to an unbounded (or practically unbounded) utility function (U2)? You might object that taking the average of U1 and U2 still gives an unbounded utility function, but I think the right way to handle this kind of value uncertainty is by using a method like the one proposed by Bostrom and Ord, in which case you ought to end up spending roughly half of your time/resources on what U1 says you should do, and half on what U2 says you should do.
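A crude numerical contrast, not Bostrom and Ord's actual parliamentary model (which involves bargaining among delegates): with hypothetical payoffs, simply averaging a bounded U1 and a practically unbounded U2 hands every resource to U2's long shot, whereas allocating resources in proportion to credence gives each function half. The helper names and numbers are assumptions for illustration.

```python
from typing import Dict

def averaged_choice(credences: Dict[str, float],
                    payoffs: Dict[str, Dict[str, float]]) -> str:
    """Pick the single option maximizing the credence-weighted average utility."""
    options = next(iter(payoffs.values())).keys()
    def score(option: str) -> float:
        return sum(credences[u] * payoffs[u][option] for u in credences)
    return max(options, key=score)

def proportional_split(credences: Dict[str, float], budget: float) -> Dict[str, float]:
    """Give each utility function a share of resources equal to its credence."""
    return {u: c * budget for u, c in credences.items()}

credences = {"U1_bounded": 0.5, "U2_unbounded": 0.5}
# Hypothetical payoffs per unit of resources; U2 values the long shot astronomically.
payoffs = {
    "U1_bounded":   {"local_good": 10.0, "long_shot": 0.0},
    "U2_unbounded": {"local_good": 10.0, "long_shot": 1e30},
}
print(averaged_choice(credences, payoffs))   # long_shot
print(proportional_split(credences, 100.0))  # {'U1_bounded': 50.0, 'U2_unbounded': 50.0}
```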

steven0461 (15y)

I haven't studied all the discussions on the parliamentary model, but I'm finding it hard to understand what the implications are, and hard to judge how close to right it is. Maybe it would be enlightening if some of you who do understand the model took a shot at answering (or roughly approximating the answers to) some practice problems? I'm sure some of these are underspecified and anyone who wants to answer them should feel free to fill in details. Also, if it matters, feel free to answer as if I asked about mixed motivations rather than moral uncertainty:

I assign 50% probability to egoism and 50% to utilitarianism, and am going along splitting my resources about evenly between those two. Suddenly and completely unexpectedly, Omega shows up and cuts down my ability to affect my own happiness by a factor of one hundred trillion. Do I keep going along splitting my resources about evenly between egoism and utilitarianism?
I'm a Benthamite utilitarian but uncertain about the relative values of pleasure (measured in hedons, with a hedon calibrated as e.g. me eating a bowl of ice cream) and pain (measured in dolors, with a dolor calibrated as e.g. me slapping myself in the face). My...

Perplexed (15y)

Why spend only half on U1? Spend (1 - epsilon). And write a lottery ticket giving the U2-oriented decision maker the power with probability epsilon. Since epsilon × infinity = infinity, you still get infinite expected utility (according to U2). And you also get pretty close to the max possible according to U1. Infinity has uses even beyond allocating hotel rooms. (HT to A. Hajek) Of course, Hajek's reasoning also makes it difficult to locate exactly what it is that U2 "says you should do".
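A minimal numerical sketch of the lottery, assuming IEEE floating point's `inf` as a stand-in for infinite utility and hypothetical payoffs: U1 is normalized so its own plan scores 1 and U2's plan scores 0, while U2 values its own plan infinitely and U1's plan at 0. The `lottery_expectations` helper is illustrative, not from the thread.

```python
import math
from typing import Tuple

def lottery_expectations(epsilon: float,
                         u1_best: float, u1_of_u2_plan: float,
                         u2_best: float, u2_of_u1_plan: float) -> Tuple[float, float]:
    """Expected utility under U1 and under U2 when U1's plan is followed with
    probability 1 - epsilon and U2's plan wins the lottery with probability epsilon."""
    exp_u1 = (1 - epsilon) * u1_best + epsilon * u1_of_u2_plan
    exp_u2 = (1 - epsilon) * u2_of_u1_plan + epsilon * u2_best
    return exp_u1, exp_u2

exp_u1, exp_u2 = lottery_expectations(1e-9, u1_best=1.0, u1_of_u2_plan=0.0,
                                      u2_best=math.inf, u2_of_u1_plan=0.0)
print(exp_u2)  # inf: epsilon * inf is still inf
print(exp_u1)  # just under 1.0, i.e. nearly U1's maximum
```

Any positive `epsilon` leaves `exp_u2` at `inf`; that insensitivity is exactly what makes it hard to say what U2 "says you should do".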

Will_Sawin (15y)

In general, it should be impossible to allocate 0 to U2 in this sense. What's the probability that an angel comes down and magically forces you to do the U2 decision? Around epsilon, I'd say. U2 then becomes totally meaningless, and we are back with a bounded utility function.

cousin_it (15y)

That can't be right. What if U1 says you ought to buy an Xbox, then U2 says you ought to throw it away? Looks like a waste of resources. To avoid such wastes, your behavior must be Bayesian-rational. That means it must be governed by a utility function U3. What U3 is defined by the parliamentary model? You say it's not averaging, but it has to be some function defined in terms of U1 and U2. We've discussed a similar problem proposed by Stuart on the mailing list and I believe I gave a good argument (on Jan 21, 2011) that U3 must be some linear combination of U1 and U2 if you want to have nice things like Pareto-optimality. All bargaining should be collapsed into the initial moment, and output the coefficients of the linear combination which never change from that point on.
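A toy check of the Pareto point, with hypothetical payoffs for the Xbox example (this is not a reconstruction of the mailing-list argument, which isn't reproduced here): buying the Xbox and then throwing it away is worse for both U1 and U2 than never buying it, and maximizing any strictly positive linear combination U3 = w1 * U1 + w2 * U2 never selects such a dominated plan.

```python
from typing import Dict, Set, Tuple

Outcome = Tuple[float, float]  # (value under U1, value under U2)

def dominates(b: Outcome, a: Outcome) -> bool:
    """True if b is at least as good as a under both utilities and strictly better under one."""
    return b[0] >= a[0] and b[1] >= a[1] and b != a

def pareto_optimal(options: Dict[str, Outcome]) -> Set[str]:
    return {name for name, o in options.items()
            if not any(dominates(other, o) for other in options.values())}

def best_linear(options: Dict[str, Outcome], w1: float, w2: float) -> str:
    """Option maximizing U3 = w1 * U1 + w2 * U2, assuming w1, w2 > 0."""
    return max(options, key=lambda name: w1 * options[name][0] + w2 * options[name][1])

# Hypothetical payoffs: U1 likes owning an Xbox, U2 dislikes it, both dislike wasted money.
options = {
    "keep_money":       (0.0, 0.0),
    "keep_xbox":        (5.0, -3.0),
    "buy_then_discard": (-1.0, -1.0),  # what alternating between U1 and U2 produces
}
print(pareto_optimal(options))         # {'keep_money', 'keep_xbox'} (set order may vary)
print(best_linear(options, 0.5, 0.5))  # keep_xbox
print(best_linear(options, 0.1, 0.9))  # keep_money
```

Different weights pick different Pareto-optimal plans, which is why the bargaining question reduces to choosing the coefficients.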

Wei Dai (15y)

Right, clearly what I said can't be true for arbitrary U1 and U2, since there are obvious counterexamples. And I think you're right that theoretically, bargaining just determines the coefficients of the linear combination of the two utility functions. But it seems hard to apply that theory in practice, whereas if U1 and U2 are largely independent and sublinear in resources, splitting resources between them equally (perhaps with some additional Pareto improvements to take care of any noticeable waste from pursuing two completely separate plans) seems like a fair solution that can be applied in practice. (ETA side question: does your argument still work absent logical omniscience, for example if one learns additional logical facts after the initial bargaining? It seems like one might not necessarily want to stick with the original coefficients if they were negotiated based on an incomplete understanding of what outcomes are feasible, for example.)

Will_Sawin (15y)

My thoughts:
1. You do always get a linear combination.
2. I can't tell what that combination is, which is odd. The non-smoothness is problematic. You run right up against the constraints - I don't remember how to deal with this. Can you?
3. If you have N units of resources which can be devoted to either task A or task B, the ratio of resources used will be the ratio of votes.
4. I think it depends on what kind of contract you sign. So if I sign a contract that says "we decide according to this utility function" you get something different than a contract that says "We vote yes in these circumstances and no in those circumstances". The second contract, you can renegotiate, and that can change the utility function.
ETA:
1. In the case where utility is linear in the set of decisions that go to each side, for any Pareto-optimal allocation that both parties prefer to the starting (random) allocation, you can construct a set of prices that is consistent with that allocation. So you're reduced to bargaining, which I guess means Nash arbitration.

cousin_it (15y)

I don't know how to make decisions under logical uncertainty in general. But in our example I suppose you could try to phrase your uncertainty about logical facts you might learn in the future in Bayesian terms, and then factor it into the initial calculation.

timtyler (15y)

These are surely really, really different things. Utilitarianism says to count people more-or-less equally. However, the sort of utility maximization that actually goes on in people's heads typically results in people valuing their own existence vastly above that of everyone else. That is because they were built that way by evolution - which naturally favours egoism. So, their utility function says: "Me, me, me! I, me, mine!" This is not remotely like utilitarianism - which explains why utilitarians have such a hard time acting on their beliefs - they are wired up by nature to do something totally different. Also, you probably should not say "instrumental to their own terminal values". "Instrumental" in this context usually refers to "instrumental values". Using it to mean something else is likely to mangle the reader's mind.

Will_Newsome (15y)

So, I think about things like infinite ethics all the time, and it doesn't seem to disturb me to the extent it does you. You might say, "My brain is set up such that I automatically feel a lot of tension/drama when I feel like I might be ignoring incredibly morally important things." But it is unclear that this need be the case. I can't imagine that the resulting strain is useful in the long run. Have you tried jumping up a meta-level, tried to understand and resolve whatever's causing the strain? I try to think of it as moving in harmony with the Dao.

timtyler (15y)

He is not alone. Consider this, for instance: Utilitarianism is like a plague around here. Perhaps it is down to the founder effect.

MixedNuts (15y)

Vladimir_Nesov (15y)

I don't feel there is a need for that. You just present these things as tools, not fundamental ideas, also discussing why they are not fundamental and why figuring out fundamental ideas is important. The relevant lesson is along the lines of Fake Utility Functions (the post has "utility function" in it, but it doesn't seem to need to), applied more broadly to epistemology.

Wei Dai (15y)

Thinking of Bayesianism as fundamental is what made some people (e.g., at least Eliezer and me) think that fundamental ideas exist and are important. (Does that mean we ought to rethink whether fundamental ideas exist and are important?) From Eliezer's My Bayesian Enlightenment: (Besides, even if your suggestion is feasible, somebody would have to rewrite a great deal of Eliezer's material to not present Bayesianism as fundamental.)

Vladimir_Nesov (15y)

The ideas of Bayesian credence levels and maximum entropy priors are important epistemic tools that in particular allow you to understand that those kludgy AI tools won't get you what you want. (It doesn't matter for the normative judgment, but I guess that's why you wrote this in parentheses.) I don't think Eliezer misused the idea in the sequences, as Bayesian way of thinking is a very important tool that must be mastered to understand many important arguments. And I guess at this point we are arguing about the sense of "fundamental".

Will_Newsome (15y)

Agreed, but what I'm mostly griping about is when people who know that utility functions are a really inaccurate model still go ahead and use it, even if prefaced by some number of standard caveats. "Goal system", for example, conveys a similar abstract idea without all of the questionable and misleading technical baggage (let alone associations with "utilitarianism"), and is more amenable to case-specific caveats. I don't think we should downvote people for talking about utility functions, especially if they're newcomers, but there's a point at which we have to adopt generally higher standards for which concepts we give low K complexity in our language. I have a vested interest in this. All of the most interesting meta-ethics and related decision theory I've seen thus far has come from people associated with SingInst or Less Wrong. If we are to continue to be a gathering place for that kind of mind we can't let our standards degenerate, and ideally we should be aiming for improvement. From far away it would be way easy to dismiss Less Wrong as full of naive nerds completely ignorant of both philosophy and psychology. From up close it would be easy to dismiss Less Wrong as overly confident in a suspiciously homogeneous set of philosophically questionable meta-ethical beliefs, e.g. some form of utilitarianism. The effects of such appearances are hard to calculate and I think larger than most might intuit. (The extent to which well-meaning folk of an ideology very influenced by Kurzweil have poisoned the well for epistemic-hygienic or technical discussion of technological singularity scenarios, for instance, seems both very large and very saddening.)

Wei Dai (15y)

What is giving this appearance? We have plenty of vocal commenters who are against utilitarianism, top-level posts pointing out problems in utilitarianism, and very few people actually defending utilitarianism. I really don't get it. (BTW, utilitarianism is usually considered normative ethics, not metaethics.) Also, utility function != utilitarianism. The fact that some people get confused about this is not a particularly good (additional) reason to stop talking about utility functions.

Will_Newsome (15y)

Here is someone just in this thread who apparently confuses EU-maxing with utilitarianism and apparently thinks that Less Wrong generally advocates utilitarianism. I'll ask XiXiDu what gave him these impressions, that might tell us something.

Will_Newsome (15y)

ETA: The following comment is outdated. I had a gchat conversation with Wei Dai in which he kindly pointed out some ways in which my intended message could easily and justifiably have been interpreted as a much stronger claim. I'll add a note to my top level comment warning about this. I never proposed that people stop talking about utility functions, and twice now I've described the phenomenon that I'm actually complaining about. Are you trying to address some deeper point you think is implicit in my argument, are you predicting how other people will interpret my argument and arguing against that interpreted version, or what? I may be wrong, but I think it is vitally important for epistemic hygiene that we at least listen to and ideally respond to what others are actually saying. You're an excellent thinker and seemingly less prone to social biases than most so I am confused by your responses. Am I being dense somehow? (ETA: The following hypothesis is obviously absurd. Blame it on rationalization. It's very rare I get to catch myself so explicitly in the act! w00t!) Anyway, the people I have in mind don't get confused about the difference between reasoning about/with utility functions and being utilitarian, they just take the former as strong evidence of the latter. This doesn't happen when "utility function" is used technically or in a sand-boxed way, only when it is used in the specific way that I was objecting to. Notice how I said we should be careful about which concepts we use, not which words.

Will_Newsome (15y)

I don't really get it either. It seems that standard Less Wrong moral philosophy can be seen at some level of abstraction as a divergence from utilitarianism, e.g. because of apparently widespread consequentialism and focus on decision theory. But yeah, you'd think the many disavowals of utilitarianism would have done more to dispel the notion. Does your impression agree with mine though that it seems that many people think Less Wrong is largely utilitarian? I desperately want a word that covers the space I want to cover that doesn't pattern-match to an incorrect/fuzzy thing. (E.g. I think it is important to remember that one's standard moral beliefs can have an interesting implicit structure at the ethical/metaethical levels, vice versa, et cetera.) Sometimes I use "shouldness" or "morality" but those are either misleading or awkward depending on context. Are there obvious alternatives I'm missing? I used "moral philosophy" above but I'm pretty sure that's also straight-up incorrect. Epistemology of morality is clunky and probably means something else.

timtyler (15y)

Why would you want to stop people talking about human utility functions?!? People should not build economic models of humans? How are such things supposedly misleading? You are concerned people will drag in too much from Von Neumann and Morgenstern? What gives? By contrast, the idea that humans don't have utility functions seems to be mysterian nonsense. What sense can be made out of that idea?

steven0461 (15y)

As I see it, humans have revealed behavioral tendencies and reflected preferences. I share your reservations about "revealed preferences", which if they differ from both would have to mean something in between. Maybe revealed preferences would be what's left after reflection to fix means-ends mistakes but not other reflection, if that makes sense. But when is that concept useful? If you're going to reflect on means-ends, why not reflect all the way? Also note that the preferences someone reveals through programming them into a transhuman AI may be vastly different from the preferences someone reveals through other sorts of behavior. My impression is that many people who talk about "revealed preferences" probably wouldn't count the former as authentic revealed preferences, so they're privileging behavior that isn't too verbally mediated, or something. I wonder if this attributing revealed preference to a person rather than a person-situation pair should set off fundamental attribution error alarms. If we have nothing to go by except behavior, it seems like it's underdetermined whether we should say it's preferences or beliefs (aliefs) or akrasia that's being revealed, given that these factors determine behavior jointly and that we're defining them by their effects. With reflected preferences it seems like you can at least ask the person which one of these factors they identify as having caused their behavior.
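A toy illustration of the underdetermination point, with hypothetical agents and numbers: an observer who sees only which bets are accepted cannot tell an agent who values a prize highly but doubts it will be delivered from one who values it less but is certain of it, because only the product of belief and value enters the decision. This sketch covers beliefs versus preferences only, not akrasia.

```python
from dataclasses import dataclass

@dataclass
class Agent:
    belief: float  # subjective probability that the prize is delivered
    value: float   # how much the agent values the prize

    def accepts(self, cost: float) -> bool:
        """Pay `cost` for the bet iff its expected value exceeds the cost."""
        return self.belief * self.value > cost

a = Agent(belief=0.5, value=10.0)  # values the prize highly, doubts it will be delivered
b = Agent(belief=1.0, value=5.0)   # values it less, is certain it will be delivered

costs = [c / 4 for c in range(0, 41)]  # 0.0, 0.25, ..., 10.0
# Identical observed behavior at every price, despite different beliefs and preferences.
print(all(a.accepts(c) == b.accepts(c) for c in costs))  # True
```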

-1Will_Newsome15y

Good plausible hypothesis to cache for future priming, but I'm not sure I fully understand it: More specifically, what process are you envisioning here (or think others might be envisioning)?

5TimFreeman15y

We might make something someday that isn't godshatter, and we need to practice. I agree that reforming humans to be rational is hopeless, but it is nevertheless useful to imagine how a rational being would deal with things.

0jsteinhardt15y

But VNM utility is just one particularly unintuitive property of rational agents. (For instance, I would never ever use a utility function to represent the values of an AGI.) Surely we can talk about rational agents in other ways that are not so confusing? Also, I don't think VNM utility takes into account things like bounded computational resources, although I could be wrong. Either way, just because something is mathematically proven to exist doesn't mean that we should have to use it.

0TimFreeman15y

Who is sure? If you're saying that, I hope you are. What do you propose? I don't think anybody advocated what you're arguing against there. The nearest thing I'm willing to argue for is that one of the following possibilities holds:

* We use something that has been mathematically proven to exist, now.
* We might be speaking nonsense, depending on whether the concepts we're using can be mathematically proven to make sense in the future.

-1timtyler15y

Since even irrational agents can be modelled using a utility function, no "reforming" is needed.

2jsteinhardt15y

How can they be modeled with a utility function?

2timtyler15y

As explained here:

1jsteinhardt15y

Thanks for the reference. It seems though that the reward function might be extremely complicated in general (in fact I suspect that this paper can be used to show that the reward function can be potentially uncomputable).

0timtyler15y

The whole universe may well be computable, according to the Church–Turing–Deutsch principle. If it isn't, the above analysis may not apply.

0TimFreeman15y

I agree with jsteinhardt, thanks for the reference. I agree that the reward functions will vary in complexity. If you do the usual thing in Solomonoff induction, where the plausibility of a reward function decreases exponentially with its size, then so far as I can tell you can infer reward functions from behavior, if you can infer behavior. We need to infer a utility function for somebody if we're going to help them get what they want, since a utility function is the only reasonable description I know of what an agent wants.
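Here is a minimal sketch of the kind of inference described above, in a toy setting. There are a few hypothetical candidate reward functions. Each is summarized by a made-up description length in bits and by the action it would lead to in each situation. Following the Solomonoff-style prior mentioned in the comment, each candidate gets prior weight 2^-length. Candidates that contradict observed behavior are dropped, and the rest are renormalized. The candidate names, lengths, and situations are invented for illustration.

```python
from fractions import Fraction

# Hypothetical candidates: name -> (description length in bits, policy).
# A policy maps each situation to the action that reward function recommends.
CANDIDATES = {
    "likes_tea": (10, {"morning": "tea", "evening": "tea"}),
    "likes_coffee": (10, {"morning": "coffee", "evening": "coffee"}),
    "coffee_then_tea": (14, {"morning": "coffee", "evening": "tea"}),
}


def posterior(observations):
    """Posterior over candidates given observed (situation, action) pairs.

    Prior weight is 2**-length; any candidate whose policy disagrees with an
    observation gets weight zero. Assumes at least one candidate survives.
    """
    weights = {}
    for name, (length, policy) in CANDIDATES.items():
        consistent = all(policy[situation] == action
                         for situation, action in observations)
        weights[name] = Fraction(1, 2 ** length) if consistent else Fraction(0)
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


# One observation leaves two candidates; the shorter one dominates.
print(posterior([("morning", "coffee")]))
# A second observation rules out the simpler candidate.
print(posterior([("morning", "coffee"), ("evening", "tea")]))
```

After one morning coffee, the 10-bit candidate gets 16/17 of the weight and the 14-bit one gets 1/17. Observing evening tea as well leaves only the longer candidate. A candidate that simply memorized every observed action would fit any data, but it would pay for that in description length.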

2Perplexed15y

It was my impression that it was LW orthodoxy that at "reflective equilibrium", the values and preferences of rational humans can be represented by a utility function. That is: ... if we or our AI surrogate ever reach that point, then humans have a utility function that captures what we want morally and hedonistically. Or so I understand it. Yes, our current god-shatter-derived inconsistent values can not be described by a utility function, even as an abstraction. But it seems to me that most of the time what we are actually talking about is what our values ought to be rather than what they are. So, I don't think that a utility function is a ridiculous abstraction - particularly for folk who strive to be rational.

0timtyler15y

Actually, yes they can. Any computable agent's values can be represented by a utility function. That's one of the good things about modelling using utility functions - they can represent any agent. For details, see here:

1timtyler15y

Nope. Humans do have utility functions - in this sense: Any computable agent has a utility function. That's the beauty of using a general theory.

[-]Richard_Kennaway15y120

Nope. Humans do have utility functions - in this sense:

A trivial sense, that merely labels what an agent does with 1 and what it doesn't with 0: the Texas Sharpshooter Utility Function. A "utility function" that can only be calculated -- even by the agent itself -- in hindsight is not a utility function. The agent is not using it to make choices and no observer can use it to make predictions about the agent.

Curiously, in what appears to be a more recent version of the paper, the TSUF is not included.

0timtyler15y

Er, the idea is that you can make a utility-maximising model of the agent - using the specified utility function - that does the same things the agent does if you put it in the same environment. Can people please stop dissing the concept of a human utility function. Correcting these people is getting tedious - and I don't want to be boring.

2Richard_Kennaway15y

Doesn't work. The Texas Sharpshooter utility function described by Dewey cannot be used to make a utility-maximising model of the agent, except by putting a copy of the actual agent into the box, seeing what it does, declaring that to have utility 1, and doing it. The step of declaring it to have utility 1 plays no role in deciding the actions. It is a uselessly spinning cog doing no more work than a suggestive name on a Lisp symbol. I was thinking a similar thought about you. You're the only person here that I've seen taking these trivial utility functions seriously.

0timtyler15y

The idea here is that - if the agent is computable - then it can be simulated by any other computable system. So, if the map between its inputs and state, and its motor output is computable then we can make another computable system which produces the same map - since all universal computing systems can simulate each other by virtue of being Turing complete (and systems made of e.g. partial recursive functions can simulate each other too - if they are given enough memory to do so). I mentioned computability at the top, by saying: "any computable agent has a utility function". As far as anyone can tell, the whole universe is computable.

0Richard_Kennaway15y

I don't see how this bears on the possibility of modelling every agent by a utility-maximising agent. Dewey's construction doesn't work. Its simulation of an agent by a utility-maximising agent just uses the agent to simulate itself and attaches the label "utility=1" to its actions.

0timtyler15y

Dewey says pretty plainly: "any agents can be written in O-maximizer form". O-maximisers are just plain old utility maximisers. Dewey rechristens them "Observation-Utility Maximizers" in his reworked paper. He makes an O-maximiser from an agent, A. Once you have the corresponding O-maximiser, the agent A could be discarded.

0Richard_Kennaway15y

I know that he says that. I am saying, I thought pretty plainly, that I disagree with him. He only does that in the earlier paper. His construction is as I described it: define O as doing whatever A does and label the result with utility 1. A is a part of O and cannot be discarded. He even calls this construction trivial himself, but underrates its triviality.

1timtyler15y

I don't really understand which problem you are raising. If the O eventually contains a simulated copy of A - so what? O is still a utility-maximiser that behaves the same way that A does if placed in the same environment. The idea of a utility maximiser as used here is that it assigns utilities to all its possible actions and then chooses the action with the highest utility. O does that - so it qualifies as a utility-maximiser.

0Richard_Kennaway15y

O doesn't assign utilities to its actions and then choose the best. It chooses its action (by simulating A), labels it with utility 1, and chooses to perform the action it just chose. The last two steps are irrelevant.
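To make the disputed construction concrete, here is a minimal sketch of the trivial wrapper as described in this exchange. It is not Dewey's formalism. The wrapper O is a "utility maximizer" whose utility function assigns 1 to whatever an arbitrary agent A would do and 0 to everything else. It reproduces A's behavior exactly, but only because it calls A; the argmax adds nothing. The agent `quirky` and the action set are hypothetical. The sketch assumes A always returns one of the listed actions.

```python
from typing import Callable, Sequence, Tuple

History = Tuple[int, ...]
Agent = Callable[[History], str]


def make_o_maximizer(agent: Agent, actions: Sequence[str]) -> Agent:
    """Wrap an arbitrary agent in a utility-maximizing shell."""

    def utility(history: History, action: str) -> int:
        # "Texas Sharpshooter" utility: consult the agent, then label
        # its chosen action 1 and every other action 0.
        return 1 if action == agent(history) else 0

    def o_agent(history: History) -> str:
        # Formally an argmax over actions, but the winner was already
        # fixed by the call to `agent` inside `utility`.
        return max(actions, key=lambda a: utility(history, a))

    return o_agent


def quirky(history: History) -> str:
    """A hypothetical agent with no obvious goal."""
    return "left" if len(history) % 3 == 0 else "right"


o = make_o_maximizer(quirky, ["left", "right", "stay"])
print([o(tuple(range(n))) for n in range(6)])
```

The code shows only that behavioral equivalence is cheap to obtain. It does not settle the point in dispute: whether O counts as "assigning utilities and choosing the best" or whether the labelling step is a spinning cog.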

1timtyler15y

"Irrelevant"? If it didn't perform those steps, it wouldn't be a utility maximiser, and then the proof that you can build a utility maximiser which behaves like any computable agent wouldn't go through. Those steps are an important part of the reason for exhibiting this construction in the first place.

-1Will_Newsome15y

I think that everyone understands the point you're trying to make---you can usefully model people as having a utility function in a wide variety of cases---but very often people use such models unskillfully, and it causes people like me to facepalm. If you want to model a lot of humans, for instance, it's simple and decently accurate to model them as having utility functions. Economics, say. And if you have something like AIXI, or as Dawkins might argue a gene, then a utility function isn't even a model, it's right there in front of you. I hypothesize that the real trouble starts when a person confuses the two; he sees or imagines a Far model of humans with utility functions, zooms in on an individual human or zooms in on himself, and thinks he can see the real utility function sitting right there in front of him, like he could with AIXI. Yeah, he knows in the abstract that he doesn't have direct access to it, but it feels Near. This can lead to a lot of confusion, and it leads people like me to think folk shouldn't talk about a person's "utility function" except in cases where it obviously applies. Even where you can say "Person A has a utility function that assigns 4 utility to getting cheesecake and 2 utility to getting paperclips", why not say "Agent A"? But that's not what I facepalm at. I only facepalm when people say they got their "utility function" from natural selection (i.e. ignoring memes), or say they wish they could modify their utility function, et cetera. In many cases it works as an abstraction, but if you're not at all thinking about EU, why not talk directly about your preferences/values? It's simpler and less misleading.

0timtyler15y

This seems like a bit of a different issue - and one that I am not so interested in. A couple of comments about your examples, though: For someone like me it is pretty accurate to say that I got my utility function from natural selection acting on DNA genes. Memes influence me, but I try not to let them influence my goals. I regard them as symbiotes: mutualists and pathogens. In principle they could do deals with me that might make me change my goals - but currently I have a powerful bargaining position, their bargaining position is typically weak - and so I just get my way. They don't get to affect my goals. Those that try get rejected by my memetic immune system. I do not want to become the victim of a memetic hijacking. As for the implied idea that natural selection does not apply to memes, I'll try to bite my tongue there. That seems closely equivalent to me. The cases where people talk about utility functions are mostly those where you want to compare with machines, or conjure up the idea of an expected utility maximiser for some reason. Sometimes even having "utility" in the context is enough for the conversation to wander on to utility functions. My counsel would be something like: "Don't like it? Get used to it!" There is not, in fact, anything wrong with it.

-2Will_Newsome15y

That totally wasn't what I meant to imply. I am definitely a universal Darwinist. (You can view pretty much any optimization process as "evolution", though, so in some cases it's questionably useful. Bayesian updating is just like population genetics. But with memes it's obviously a good description.) Yes, but I think you're rather unusual in this regard; most people aren't so wary of memes. Might I ask why you prefer genes to memes? This seems odd to me. Largely because humans evolved for memes and with memes. Archetypes, for example. But also because the better memes seem to have done a lot of good in the world. (My genetically evolved cognitive algorithms---that is, the algorithms in my brain that I think aren't the result of culture, but instead are universal machinery---stare in appreciation at the beauty of cathedrals, and are grateful that economies make my life easier.)

0timtyler15y

That's why I tried to bite my tongue - but it was difficult to completely let it go by... Well, I love memes, but DNA-genes built 99% of my ancestors unassisted, and are mostly responsible for building me. They apparently equipped me with a memetic immune system, for weeding out undesirable memes, to allow me to defend myself in those cases where there is a conflict of interests. Why should I side with the memes? They aren't even related to me. The best of them are beneficial human symbionts - rather like lettuces and strawberries. I care for them some - but don't exactly embrace their optimisation targets as my own. I don't dispute memes have done a lot of good things in the world. So has Mother Teresa - but that doesn't mean I have to adopt her goals as my own either.

0XiXiDu15y

I know what I want based on naive introspection. If you want to have preferences other than those based on naive introspection, then one of your preferences, based on naive introspection, is not to have preferences that are based on naive introspection. I am not sure how you think you could ever get around intuition, can you please elaborate?

-1Will_Newsome15y

Naive introspection is an epistemic process; it's one kind of algorithm you can run to figure out aspects of the world, in this case your mind. Because it's an epistemic process we know that there are many, many ways it can be suboptimal. (Cognitive biases come to mind, of course; Robin Hanson writes a lot about how naive introspection and actual reasons are very divergent. But sheer boundedness is also a consideration; we're just not very good Bayesians.) Thus, when you say "one of your preferences, based on naive introspection, is not to have preferences that are based on naive introspection," I think: If my values are what I think they are, I desire to believe that my values are what I think they are; If my values aren't what I think they are, I desire to believe that my values aren't what I think they are; Let me not become attached to values that may not be.

0jsteinhardt15y

Agree completely. (Even though I am guilty of using the word myself below.) But most of this post seems to be based on linearity of preference, which imho can usually only be justified by muddling around with utilities. So maybe that is the place to start? EDIT: To clarify, I mean that maybe the reason to reject Person 1's argument is because it implicitly appeals to notions of utility when claiming you should maximize expected DALYs.

0multifoliaterose15y

I agree with most of what you say here; is your comment referring to my post and if so which part?

-1Will_Newsome15y

Not referring to your post, no, just some aspects of some of the comments on it and the memetic ecology that enables those aspects. I'll add a meta tag to my comment to make this clearer.

-1wedrifid15y

Because rational agents care about whatever the hell they want to care about. I, personally, choose to care about my abstract 'utility function' with the clear implication that said utility function is something that must be messily constructed from godshatter preferences. And that's ok because it is what I want to want. No. It is a useful abstraction. Not using utility function measures does not appear to improve abstract decision making processes. I'm going to stick with it.

-3Will_Newsome15y

Eliezer's original quote was better. Wasn't it about superintelligences? Anyway you are not a superintelligence or a rational agent and therefore have not yet earned the right to want to want whatever you think you want to want. Then again I don't have the right to deny rights so whatever.

-2wedrifid15y

I wasn't quoting Eliezer, I made (and stand by) a plain English claim. It does happen to be similar in form to a recent instance of Eliezer summarily rejecting PhilGoetz's declaration that rationalists don't care about the future. That quote from Eliezer was about "expected-utility-maximising agents", which would make the quote rather inappropriate in the context. I will actually strengthen my declaration to: Because agents can care about whatever the hell they want to care about. (This too should be uncontroversial.) An agent does not determine its preferences by mere vocalisation, and nor does its belief about its preferences intrinsically make them so. Nevertheless I do care about my utility function (with the vaguely specified caveats). If you could suggest a formalization sufficiently useful for decision making that I could care about it even more than my utility function, then I would do so. But you cannot. No, you don't. The only way you could apply limits on what I want is via physically altering my molecular makeup. As well as being rather difficult for you to do on any significant scale, I could credibly claim that the new physical configuration you constructed from my atoms is other than 'me'. You can't get much more of a fundamental destruction of identity than by changing what an agent wants. I don't object to you declaring that you don't have or don't want to have a utility function. That's your problem, not mine. But I will certainly object to any interventions made that deny that others may have them.

[-]CarlShulman15y70

and that's a far better investment than any other philanthropic effort that you know of, so you should fund course of action X even if you think that model A is probably wrong.

This stands out as problematic, since there's no plausible consequentialist argument for this from a steel-manned Person 1. Person 1 is both arguing for the total dominance of total utilitarian considerations in Person 2's decision-making, and separately presenting a bogus argument about what total utilitarianism would recommend. Jennifer's comment addresses the first prong, while...

5multifoliaterose15y

I agree with most of what you say here. Maybe it satisfactorily answers the questions raised in my post; I'll spend some time brooding over this. Here it would be good to compile a list; I myself am very much at a loss as to what the available options are.

1CarlShulman15y

I have such lists, but by the logic of your post it sounds like you should gather them yourself so you worry less about selection bias.

0David Althaus14y

I would love to study these lists! Would you mind sending me them? ( My email: myusername@gmx.de )

[-]jsteinhardt15y60

I think there are two things going on here:

more importantly, your utility probably doesn't scale linearly with DALYs, if for no other reason than that you don't care very much about things that happen at very low probabilities
less importantly, abstract arguments are much less likely to be correct than they seem at face value. Likelihood of correctness decreases exponentially in both argument length and amount of abstraction, and it is hard for us to appreciate that intuitively.
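Here is a toy model of the claim that correctness decays exponentially with argument length. It assumes an argument is a chain of independent steps that must all hold, and that each step is correct with the same probability. The chance the whole argument is correct is then the product of the step probabilities, which falls exponentially with length. The 0.9 figure is an arbitrary illustration, and the model says nothing about the "amount of abstraction" factor.

```python
def conjunctive_confidence(step_reliability: float, n_steps: int) -> float:
    """Probability that a chain of n_steps independent steps all hold."""
    return step_reliability ** n_steps


# Even quite reliable steps compound quickly.
for n in (1, 5, 10, 20):
    print(n, round(conjunctive_confidence(0.9, n), 3))
```

With 90% reliability per step, a 20-step argument is correct only about 12% of the time (0.9**20 is roughly 0.1216).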

2multifoliaterose15y

Thanks. My life satisfaction certainly does not scale linearly with DALYs (e.g. averting the destruction of 1000 DALYs does not make me ten times as happy as averting the destruction of 100 DALYs), but it does seem to be very much influenced by whether I have a sense that I'm "doing the right thing" (whatever that means). But maybe you mean utility in some other sense than life satisfaction. If I had the choice of pushing one of 10 buttons, each of which had a different distribution of probabilities attached to magnitudes of impact, I think I would push the aggregate utility maximizing one regardless of how small the probabilities were. Would this run against my values? Maybe, I'm not sure. I agree; I've been trying to formulate this intuition in quasi-rigorous terms and have not yet succeeded in doing so.

2jsteinhardt15y

Well I am talking about the utility defined in the VNM utility theorem, which I assumed is what the term was generally taken to mean on LW, but perhaps I am mistaken. If you mean something else by utility, then I'm unsure why you would "push the aggregate utility maximizing one" as that choice seems a bit arbitrary to me to be a hard and fast rule (except for VNM utility, since VNM utility is by definition the thing whose expected value you maximize). Would you care to share your intuitions as to why you would push the utility maximizing button, and what you mean by utility in this case (a partial definition / example is fine if you don't have a precise definition).

0XiXiDu15y

Does that apply to AI going FOOM?

0jsteinhardt15y

To me the claim that human-level AI -> superhuman AI in at most a matter of years seems quite likely. It might not happen, but I think the arguments about FOOMing are pretty straightforward, even if not airtight. The specific timeline depends on where on the scale of Moore's law we are (so if I thought that AI was a large source of existential risk, then I would be trying to develop AGI as quickly as possible, so that the first AGI was slow enough to stop if something bad happened; i.e. waiting longer -> computers are faster -> FOOM happens on a shorter timescale). The argument I am far more skeptical of is about the likelihood of a UFAI happening without any warning. While I place some non-negligible probability on UFAI occurring, it seems like right now we know so little about AI that it is hard to judge whether an AI would actually have a significant danger of being unfriendly. By the time we are in any position to build an AGI, it should be much more obvious whether that is a problem or not.

-1Will_Newsome15y

Might you clarify your question? Depending on what you meant, this might not be relevant, but. Many arguments about AGI and FOOM are antipredictions. "Argument length" as jsteinhardt used it assumes that the argument is a conjunctive one. If an argument is disjunctive then its length implies an increased likelihood of correctness. Eliezer's "Hard Takeoff" article on OB was pretty long, but the words were used to make an antiprediction.
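Here is a minimal sketch of the conjunctive/disjunctive distinction. It assumes the conclusion follows if any one of several independent routes holds, each with the same probability. Under that assumption, adding routes raises the probability that the conclusion holds. By contrast, a conjunctive chain's probability is the product of its steps and falls as it grows. The 0.3 figure is arbitrary. The objection that the same claim can often be rephrased either way is not modeled.

```python
def disjunctive_confidence(route_reliability: float, n_routes: int) -> float:
    """Probability that at least one of n_routes independent routes holds."""
    return 1 - (1 - route_reliability) ** n_routes


# Individually weak routes add up when any one of them suffices.
for n in (1, 2, 5, 10):
    print(n, round(disjunctive_confidence(0.3, n), 3))
```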

0XiXiDu15y

It is not clear to me that there are well-defined boundaries between what you call a conjunctive and a disjunctive argument. I am also not sure how two opposing predictions are not both antipredictions. I see that some predictions are more disjunctive than others, i.e. just some of their premises need to be true. But most of the time this seems to be a result of vagueness. It doesn't necessarily speak in favor of a prediction if it is strongly disjunctive. If you were going to pin it down, it would turn out to be conjunctive, requiring all its details to be true. All predictions are conjunctive: If you predict that Mary is going to buy one of a thousand products in the supermarket, 1.) if she is hungry 2.) if she is thirsty 3.) if she needs a new coffee machine, then you are seemingly making a disjunctive prediction. But someone else might be less vague and make a conjunctive antiprediction: Mary is not going to buy one of a thousand products in the supermarket, because 1.) she needs money 2.) she has to have some needs 3.) the supermarket has to be open. Sure, if the latter prediction was made first then the former would become the antiprediction, which happens to be disjunctive. But being disjunctive does not speak in favor of a prediction in and of itself. All predictions are antipredictions: Now you might argue that the first prediction could not be an antiprediction, as it does predict something to happen. But opposing predictions are always predicting the negation of each other. If you predict that Mary is going shopping, then you predict that she is not not going shopping.

0Normal_Anomaly15y

I'd reverse the importance of those two considerations. Even though my utility doesn't scale linearly with DALYs, I wish it did.

0jsteinhardt15y

Why do you wish it did?

1Normal_Anomaly15y

My actual utility, I think, does scale with DALYs, but my hedons don't. I'd like my hedons to match my utilons so that I can maximize both at the same time (I prefer by definition to maximize utilons if I have to pick, but this requires willpower).

0jsteinhardt15y

Er I understand that utility != pleasure, but again, why does your utility scale linearly with DALYs? It seems like the sentiments you've expressed so far imply that your (ideal) utility function should not favor your own DALYs over someone else's DALYs, but I don't see why that implies a linear overall scaling of utility with DALYs.

0Normal_Anomaly15y

If I think all DALYs are equally valuable, I should value twice as many twice as much. That's why I'd prefer it to be linear.

2jsteinhardt15y

If by value you mean "place utility on" then that doesn't follow. As I said, utility has to do (among many other things) with risk aversion. You could be willing to pay twice as many dollars for twice as many DALYs and yet not place twice as much utility on twice as many DALYs. Assuming that 1 DALY = 1 utilon, then the utility of x DALYs is by definition 1/p, where p is the probability at which you would pay exactly 1 DALY to get x DALYs with probability p. Again, having all DALYs be equally valuable doesn't mean that your utility function scales linearly with DALYs, you could have a utility function that is say sqrt(# DALYs) and this would still value all DALYs equally. Although also see Will_Newsome's comments elsewhere about why talking about things in terms of utility is probably not the best idea anyways. If by utility you meant something other than VNM utility, then I apologize for the confusion (although as I pointed out elsewhere, I would then take objection to claims that you should maximize its expected value).
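The calibration described above can be sketched as follows. The sketch assumes the trade is framed as keeping 1 DALY for sure versus giving it up for a p chance at x DALYs (nothing otherwise). Utility is normalized so that u(0) = 0 and u(1) = 1. Indifference means u(1) = p * u(x), so p = u(1) / u(x), and the implied utility of x DALYs is 1/p. Both functions below value every extra DALY positively, but only the linear one gives 1/p = x.

```python
import math


def indifference_probability(u, x: float) -> float:
    """Win probability at which trading 1 sure DALY for a chance at
    x DALYs is exactly as good as declining, given utility u with u(0) == 0."""
    return u(1) / u(x)


def linear(dalys: float) -> float:
    return dalys


def square_root(dalys: float) -> float:
    return math.sqrt(dalys)


for x in (4, 100):
    p_lin = indifference_probability(linear, x)
    p_sqrt = indifference_probability(square_root, x)
    # 1/p recovers u(x): x for the linear agent, sqrt(x) for the other.
    print(x, p_lin, 1 / p_lin, p_sqrt, 1 / p_sqrt)
```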

0Normal_Anomaly15y

I'm afraid my past few comments have been confused. I don't know as much about my utility function as I wish I did. I think I am allowed to assign positive utility to a change in my utility function, and if so then I want my utility function to be linear in DALYs. It probably is not so already.

0jsteinhardt15y

I think we may be talking past each other (or else I'm confused). My question for you is whether you would (or wish you would) sacrifice 1 DALY in order to have a 1 in 10^50 chance of creating 1+10^50 DALYs. And if so, then why? (If my questions are becoming tedious then feel free to ignore them.)
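The bet posed above can be checked directly under one hypothetical framing: hold 1 DALY for sure, or give it up for a 10^-50 chance at 1 + 10^50 DALYs, with u(0) = 0. A linear-utility agent takes the bet by a margin of exactly 10^-50 expected DALYs. That margin is too small for floating point, so exact rational arithmetic is used for the linear case. A square-root utility agent declines by an enormous margin.

```python
import math
from fractions import Fraction

N = 10 ** 50
p = Fraction(1, N)   # chance the gamble pays off
payoff = 1 + N       # DALYs if it does (nothing otherwise)
stake = 1            # DALY held for sure if the bet is declined

# Linear utility: expected DALYs, computed exactly.
linear_eu = p * payoff                    # equals 1 + 10**-50
takes_bet_linear = linear_eu > stake

# Square-root utility: a float is fine here, since the gap is huge.
sqrt_eu = float(p) * math.sqrt(payoff)   # roughly 1e-25
takes_bet_sqrt = sqrt_eu > math.sqrt(stake)

print(takes_bet_linear, takes_bet_sqrt)   # True False
```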

0Normal_Anomaly15y

I don't trust questions involving numbers that large and/or probabilities that small, but I think so, yes.

0jsteinhardt15y

Probably good not to trust such numbers =). But can you share any reasoning or intuition for why the answer is yes?

[-]Wei Dai15y50

If one accepts any kind of multiverse theory, even just Level I, then an infinite number of sentient organisms already exist, and it seems that we cannot care about each individual equally without running into serious problems. I previously suggested that we discount each individual using something like the length of its address in the multiverse.
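Here is a minimal sketch of why address-length discounting keeps an infinite population's total weight finite. The comment does not fix an encoding, so this assumes a hypothetical one. Individual n gets the Elias gamma code of n as its "address", which is prefix-free, and weight 2^-len(address). By the Kraft inequality, weights from any prefix-free code sum to at most 1, no matter how many individuals there are.

```python
def elias_gamma(n: int) -> str:
    """Prefix-free binary address for individual n >= 1:
    (bit length - 1) zeros, then n written in binary."""
    if n < 1:
        raise ValueError("addresses start at 1")
    binary = bin(n)[2:]
    return "0" * (len(binary) - 1) + binary


def total_weight(count: int) -> float:
    """Sum of 2**-len(address) over individuals 1..count."""
    return sum(2.0 ** -len(elias_gamma(n)) for n in range(1, count + 1))


for count in (1, 3, 7, 1000, 100_000):
    print(count, total_weight(count))
```

The partial sums approach 1 (0.5, 0.75, 0.875, ...) but never exceed it. Each individual counts for something, while the infinite total stays bounded.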

7timtyler15y

Perhaps a good moment to point out that egoists don't have to bother with such bizarre weirdness.

3Wei Dai15y

And a nihilist doesn't have to bother with anything...

-3timtyler15y

I would continue - except that I don't think utilitarians need to bother with such bizarre weirdness either. Instrumental discounting is automatic, and neatly takes care of distant agents.

0Will_Sawin15y

Provably so? If not, there almost certainly exist failure modes.

-1timtyler15y

That is not a very useful argument style. I can't prove that conservation of energy works throughout the universe - but should not leap from there to "there almost certainly exist failure modes".

1Will_Sawin15y

Conservation of energy in large systems can be proved reductively, from the properties of the subsystems. Similarly, most true facts about decision problems can be proved from a model of what kind of structures can be decision problems. It then becomes an empirical question whether other kinds of substructures or decision problems exist. EDIT: Suppose you get in a conversation with a Cunning Philosopher. He comes up with a clever philosophical example designed to expose a flaw in your theory. You point out that the example doesn't work, there is some problem in it. He comes up with another example, dealing with that problem. You point out that ....... Why should you expect this process to terminate with him running out of ideas? Now suppose you get in a conversation with the Cunning Perpetual Motion Machine Crank. He comes up with a clever machine designed to violate conservation of energy. You know, because of a proof, that he must be calculating as though one of the parts doesn't work the way physics said it does. You only need to find this part. There is no way for him to win - except by empirically proving one of the assumptions in the proof invalid.

3XiXiDu15y

Good thing we are not discounting individuals by the length of the inferential distance between them and Average Joe.

0Will_Sawin15y

Do we have any numbers on how many people on LW agree with you, or which people?

9steven046115y

I'm fuzzy about the whole thing, but a feature that I think I like about the proposal is that it gives you a nicely-behaved way to deal with the problem of how to value lives lived in extremely complex interpretations of rocks. And if someone lives so far away in space or time that just to locate him requires as much information as it would to specify his whole mind starting from a rock, it's not obvious to me that he exists in a sense in which the rock-mind does not.

7Vladimir_Nesov15y

I don't think there's anything wrong with valuing people who live in contrived interpretations of rocks, you just can't interact with them, and whatever it is you observe is usually more of a collection of snapshots than a relevant narrative. Also, destroying the rock only destroys part of your contrived device for observing facts about those people, unless you value the rock itself.

0Will_Newsome15y

It's good to see that panpsychism is finally getting the attention it rightfully deserves!

2Will_Sawin15y

"far away" from what? If you use your current location as a reference point, then the theory becomes non-updateless and incoherent and falls apart. You don't "get" any starting point when you try to locate someone.

1steven046115y

I think the universe implicitly defines a reference point in the physics. By way of illustration, I think Tegmark sometimes talks about an inflation scenario where an actually infinite space is the same as a finite bubble that expands from a definite point, but with different coordinates that mix up space and time; and in that case I think that definite point would be algorithmically privileged. But I'm even fuzzier on all this than before.

2Will_Newsome15y

I think the focus on a physical reference point here seems misguided. Perhaps more conceptually well-founded would be something like a search for a logical reference point, using your existence in some form at some level of abstraction and your reasoning about that logical reference point both as research of and as evidence about attractors in agentspace, via typical acausal means. Vladimir Nesov's decision theory mailing list comments on the role of observational uncertainty in ambient-like decision theories seem relevant. Not to imply he wouldn't think what I'm saying here is complete nonsense.

-1Will_Newsome15y

In one of my imaginable-ideal-barely-possible worlds, Eliezer's current choice of "thing to point your seed AI at and say 'that's where you'll find morality content'" was tentatively determined to be what it currently nominally is (instead of tempting alternatives like "the thing that makes you think that your proposed initial dynamic is the best one" or "the thing that causes you to care about doing things like perfecting things like the choice of initial dynamic" or something) after he did a year straight of meditation on something like the lines of reasoning I suggest above, except honed to something like perfection-given-boundedness (e.g. something like the best you could reasonably expect to get at poker given that most of your energy has to be put into retaining your top 5 FIDE chess rating while writing a bestselling popular science book).

1Will_Sawin15y

I think it depends on the physics. Some have privileged points, some don't.

2steven046115y

But surely given any scheme to assign addresses in an infinite universe, for every L there's a finite bubble of the universe outside of which all addresses are at least L in length?

2Will_Sawin15y

If a universe is tiled with a repeating pattern then you can assign addresses to parts of the pattern, each an infinite number of points. I don't know how this applies to other universes.

0Normal_Anomaly15y

If hypothetically our universe had a privileged point, what would you do if you discovered you were much farther away from it than average?

-4Will_Newsome15y

Naively, you wouldn't use some physical location, but instead logical descriptions in the space of algorithms given axioms you predict others will predict are Schelling points (using your own (your past) architecture/reasoning as evidence of course).

2Will_Sawin15y

Naively, this is a question of ethics and not game theory, so I don't see why Schelling points should enter into it.

-1Will_Newsome15y

I thought "Schelling point" was used by the decision theory workshop folk; I may be wrong. Anyway, decision theory shares many aspects with cooperative game theory, as pointed out by Wei Dai long ago, and many questions of ethics must be determined/resolved/explored by such (acausal) cooperation/control.

4Vladimir_Nesov15y

Relevance? (That people in group Y use a word doesn't obviously clarify why you used it.)

-1Will_Newsome15y

I mistakenly thought that Will Sawin was in said group and was thus expressing confusion that he wasn't already familiar with its broader not-quite-game-theoretic usage, or at least what I perceived to be a broader usage. Our interaction is a lot more easily interpreted in that light.

0Vladimir_Nesov15y

(I didn't understand what you meant either when I wrote that comment, now I see the intuition, but not a more technical referent.)

-1Will_Newsome15y

And if you meant that you don't see a more technical referent for my use of Schelling point then there almost certainly isn't one, and thus it could be claimed that I was sneaking in technical connotations with my naive intuitions. Honestly I thought I was referring to a standard term or at least concept, though.

0Vladimir_Nesov15y

The term is standard, it was unclear how it applies, the intuition I referred to is about how it applies.

0Will_Sawin15y

Can you explain that intuition to me or point me to a place where it is explained or something? Or, alternately, tell me that the intuition is not important?

4Vladimir_Nesov15y

Two agents in a PD can find a reason to cooperate in proving (deciding) that their decision algorithms are equivalent to some third algorithm that is the same for both agents (in which case they can see that their decision is the same, and so (C,C) is better than (D,D)). This common algorithm could be seen as a kind of focal point that both agents want to arrive at.
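A minimal sketch of this idea, assuming standard illustrative PD payoffs (3/0/5/1, not taken from the comment) and using Python object identity as a crude stand-in for proving that both agents run the same algorithm. The names `focal_agent`, `defect_bot` and `play` are hypothetical. This is the simplest "program equilibrium" construction (sometimes called a CliqueBot), not a proof-based agent.

```python
from typing import Callable, Dict, Tuple

# Illustrative Prisoner's Dilemma payoffs: (row player's payoff, column player's payoff).
PAYOFFS: Dict[Tuple[str, str], Tuple[int, int]] = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}

# An agent is a decision algorithm that gets to inspect its opponent's algorithm.
Agent = Callable[..., str]


def focal_agent(opponent: Agent) -> str:
    """Cooperate iff the opponent runs this very algorithm (the shared focal point).

    Object identity is a crude stand-in for proving algorithmic equivalence.
    """
    return "C" if opponent is focal_agent else "D"


def defect_bot(opponent: Agent) -> str:
    """Always defect, regardless of the opponent."""
    return "D"


def play(a: Agent, b: Agent) -> Tuple[int, int]:
    """Each agent sees the other's algorithm; then both moves are scored."""
    return PAYOFFS[(a(b), b(a))]


print(play(focal_agent, focal_agent))  # (3, 3): both recognise the shared algorithm
print(play(focal_agent, defect_bot))   # (1, 1): no shared algorithm, so mutual defection
```

Replacing the identity check with a search for a proof of equivalence would let syntactically different but provably equivalent agents cooperate as well, which is the harder case the comment points at.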

-2Will_Newsome15y

I don't think it matters much, but the specific agents I had in mind were perhaps two subagents/subalgorithms (contingent instantiations? non-Platonic instantiations?) both "derived" (logically/acausally) from some class of variably probable unknown-to-them but less-contingent creator agents/algorithms (and the subagents have a decision theory that 'cares' about creator/creation symmetry or summat, e.g., causally speaking, there should be no arbitrary discontinuous decision policy timestamping). There may be multiple possible focal points and it may be tricky to correctly treat the logical uncertainty. All of that to imply that the focus shouldn't be determining some focal point for the universe, if that means anything, but focal points in algorithmspace, which is probably way more important.

0Will_Sawin15y

Ah, I see.

2Vladimir_Nesov15y

(I, on the other hand, don't.)

0Will_Newsome14y

You've talked about similar things yourself in the context of game semantics / abstract interpretation / time-symmetric perceptions/actions. I'd be interested in Skype convo-ing with you now that I have an iPhone and thus a microphone. I'm very interested in what you're working on, especially given recent events. Your emphasis on semantics has always struck me as well-founded. I have done a fair amount of speculation about how an AI (a Goedel machine, say) crossing the 'self-understanding'/'self-improving'/Turing-universal/general-intelligence/semantic boundary would transition from syntactic symbol manipulator to semantic goal optimizer and what that would imply about how it would interpret the 'actual' semantics of the Lisp tokens that the humans would identify as its 'utility function'. If you don't think about that much then I'd like to convince you that you should, considering that it is on the verge of technicality and also potentially very important for Shulman-esque singularity game theory.

0Will_Sawin15y

The idea is that having exactly the same or similar algorithms as other agents is enormously good, due to a proliferation of true PDs, and that therefore even non-game-theoretic parts of algorithms should be designed, whenever possible, to mimic other agents. However, applying this argument to utility functions seems a bit over-the-top. Considering that whether or not something is a PD depends on your utility function, altering the utility function to win at PDs should be counter-productive. If that makes sense, we need better decision theories.
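The point that "whether or not something is a PD depends on your utility function" can be checked directly. A sketch with hypothetical material payoffs: the same outcome table is a PD (T > R > P > S) for an agent that values only its own payoff, but not for a hypothetical agent that values the sum of both payoffs. The helper name is illustrative.

```python
from typing import Dict, Tuple

Outcome = Tuple[str, str]  # (my move, their move)


def is_prisoners_dilemma(u: Dict[Outcome, float]) -> bool:
    """True iff my utilities over outcomes have the PD ordering T > R > P > S."""
    temptation = u[("D", "C")]
    reward = u[("C", "C")]
    punishment = u[("D", "D")]
    sucker = u[("C", "D")]
    return temptation > reward > punishment > sucker


# Hypothetical material payoffs (mine, theirs) for each outcome.
material = {("C", "C"): (3, 3), ("C", "D"): (0, 5), ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

# A selfish agent's utility is its own material payoff.
selfish = {o: mine for o, (mine, theirs) in material.items()}
# A hypothetical agent whose utility is the sum of both payoffs.
altruist = {o: mine + theirs for o, (mine, theirs) in material.items()}

print(is_prisoners_dilemma(selfish))   # True
print(is_prisoners_dilemma(altruist))  # False: mutual cooperation now tops the ordering
```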

-1Will_Newsome15y

The intuition that "Schelling points" are an at all reasonable or non-bastardized way of thinking about this, or the intuition behind the "this" I just mentioned? If the latter, I did preface it with "naively", and I fully disclaim that I do not have a grasp of the technical aspects, just aesthetics which are hard to justify or falsify, and the only information I pass on that might be of practical utility to folk like you or Sawin will be ideas haphazardly stolen from others and subsequently half-garbled. If you weren't looking closely, you wouldn't see anything, and you have little reason to look at all. Unfortunately there is no way for me to disclaim that generally.

2Will_Sawin15y

link? explanation? something of that nature?

-1Will_Newsome15y

EDIT: Private message sent instead of comment reply.

-3Will_Newsome15y

I intuit that the difference between logical and observational uncertainty could be relevant in non-obvious ways. Anyway, this sort of thinking seems obviously correct, but I fear the comparison may mislead some, considering that inferring the numbers and preferences of minds in causally disconnected parts of the multiverse through sheer logical reasoning is probably way way way easier than interpreting the 'strength'/'existence' and preferences of minds in rocks, at least as I consider it. (I worded that so poorly that it's incoherent as explicitly stated but I think the message is intact.)

Replies to questions:

1. Yes.
2. Yes.
3. The problem arises only if one assumes that "a model not obviously wrong" shouldn't have a probability below some threshold, which is independent of the model. Hence to reconcile the things one should drop this assumption. Alternatively, one may question the "not obviously wrong" part.

Remarks:

Existence of a threshold p0 of minimal probability of any statement being true is clearly inconsistent, since for any value of p0 there are more than 1/p0 incompatible statements. Therefore some qualifier as ...
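A quick check of the counting argument with exact fractions; the floor p0 = 10^-5 (echoing the post) and the helper name are hypothetical. Probabilities of mutually exclusive statements sum to at most 1, so any n > 1/p0 of them cannot each receive at least p0.

```python
import math
from fractions import Fraction


def exclusive_statements_needed(p0: Fraction) -> int:
    """Smallest integer n with n > 1/p0."""
    return math.floor(1 / p0) + 1


p0 = Fraction(1, 10**5)  # hypothetical probability floor
n = exclusive_statements_needed(p0)  # e.g. "the winning lottery ticket is number k", k = 1..n
total = n * p0  # lower bound on P(at least one is true) if each gets at least p0
print(n, total, total > 1)  # 100001 100001/100000 True
```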

[-]Jonathan_Graehl15y40

Disability-adjusted life year.

The easiest answer is that nobody is anything even remotely approaching a serious utilitarian. Try writing down your utility function in even some very limited domain, and you'll see that for yourself.

Utilitarianism is a mathematical model that has very convenient mathematical properties, and has enough numbers to tweak available that you can use it to analyze some very simple situations (see the entire discipline of economics). It breaks very quickly when you push it a little.

And seriously, the exercise of writing down a point system of what is worth how many utility points to you is really eye-opening; I wrote a post on Less Wrong about it ages ago if you're interested.

[-]Will_Newsome15y30

Here's a link to Dawrst's main page. I find this article on vegetarianism to be particularly interesting (though perhaps in a different way than Dawrst intended), and it's perhaps one of the few 'traditional' utilitarian arguments that has contributed to my changing how I think about day-to-day decisions. I haven't re-evaluated that article since I read it 6 months ago, though.

[-]The Dao of Bayes15y20

Surely you'd assign at least a 10^-5 chance that it's on the mark? More confidence than this would seem to indicate overconfidence bias, after all, plenty of smart people believe in model A and it can't be that likely that they're all wrong.

It seems that if you accept this, you really ought to go accept Pascal's Wager as well, since a lot of smart people believe in God.

It seems like an extraordinary leap to accept that the original numbers are within 5 orders of magnitude, unless you've actually been presented with strong evidence. Humans naturally suc...

[-][anonymous]15y20

Here is an example of Counterargument #3.

[-]jsteinhardt15y10

Upon further thought, the real reason that I reject Person 1's argument is because everything should add up to normality, whereas Person 1's conclusion is ridiculous at face value, and not in a "that seems like a paradox" way, more of a "who is this lunatic talking to me" way.

[-]Johnicholas15y10

As I understand it, the scenario is that you're hearing a complicated argument, and you don't fully grok or internalize it. As advised by "Making Your Explicit Reasoning Trustworthy", you have decided not to believe it fully.

The problem comes in the second argument - should you take the advice of the person (or meme) that you at least somewhat mistrust in "correcting" for your mistrust? As you point out, if the person (or the meme) is self-serving, then the original proposal and the correction procedure will fit together neatly to cause...

0Marius15y

The person need not even be self-serving. All people respond to incentives, and since publishing popular results is rewarding (in fame; often financially as well) the creators of novel arguments will become more likely to believe those arguments.

[-]Johnicholas15y10

The link to Anna's post in the footnotes is broken. Should be here.

0multifoliaterose15y

Thanks, fixed.

[-]Will_Newsome15y-10

Is the suggestion that one's utilitarian efforts should be primarily focused on the possibility of lab universes an example of "explicit reasoning gone nuts?"

I think so, for side reasons I go into in another comment reply: basically, in a situation with a ton of uncertainty and some evidence for the existence of a class of currently unknown but potentially extremely important things, one should "go meta" and put effort/resources into finding out how to track down such things, reason about such things, and reason about the known u...

[-]Will_Newsome15y-10

One may not share Dawrst's intuition that pain would outweigh happiness in such universes, but regardless, the hypothetical of lab universes raises the possibility that all of the philanthropy that one engages in with a view toward utility maximizing should focus on creating or preventing the creation of infinitely many lab universes (according to whether one views the expected value of such a universe as positive or negative).

I haven't even finished reading this post yet, but it's worth making explicit (because of the obvious conne...

-1Will_Newsome15y

Research into bootstrapping current research to ideal research, research into cognitive comparative advantage, research into how to convince people to research such things or support the research of such things, research into what to do given that practically no one can research any of these things and even if they could no one would pay them to...


Related to: Confidence levels inside and outside an argument, Making your explicit reasoning trustworthy

A mode of reasoning that sometimes comes up in discussion of existential risk is the following.

Person 1: According to model A (e.g. some Fermi calculation with probabilities coming from certain reference classes), pursuing course of action X will reduce existential risk by 10^-5; existential risk has an opportunity cost of 10^25 DALYs (*); therefore model A says the expected value of pursuing course of action X is 10^20 DALYs. Since course of action X requires 10^9 dollars, the number of DALYs saved per dollar invested in course of action X is 10^11. Hence course of action X is 10^10 times as cost-effective as the most cost-effective health interventions in the developing world.
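Person 1's Fermi chain can be reproduced with exact arithmetic (the variable names are illustrative). The last step makes explicit one quantity the argument leaves implicit: the stated 10^10 ratio holds only if the benchmark health interventions save about 10 DALYs per dollar.

```python
from fractions import Fraction

# Inputs as stated by Person 1 (model A).
risk_reduction = Fraction(1, 10**5)  # reduction in existential risk from pursuing X
dalys_at_stake = 10**25              # opportunity cost of existential catastrophe, in DALYs
cost_dollars = 10**9                 # cost of course of action X
stated_ratio = 10**10                # "10^10 times as cost-effective"

expected_dalys = risk_reduction * dalys_at_stake     # 10^20 DALYs
dalys_per_dollar = expected_dalys / cost_dollars     # 10^11 DALYs per dollar
implied_benchmark = dalys_per_dollar / stated_ratio  # DALYs per dollar of the benchmark, as implied

print(expected_dalys, dalys_per_dollar, implied_benchmark)
```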

Person 2: I reject model A; I think that appropriate probabilities involved in the Fermi calculation may be much smaller than model A claims; I think that model A fails to incorporate many relevant hypotheticals which would drag the probability down still further.

Person 1: Sure, it may be that model A is totally wrong, but there's nothing obviously very wrong with it. Surely you'd assign at least a 10^-5 chance that it's on the mark? More confidence than this would seem to indicate overconfidence bias; after all, plenty of smart people believe in model A and it can't be that likely that they're all wrong. So unless you think that the side-effects of pursuing course of action X are systematically negative, even your own implicit model gives a figure of at least 10^6 DALYs saved per dollar, and that's a far better investment than any other philanthropic effort that you know of, so you should fund course of action X even if you think that model A is probably wrong.
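A sketch of the credence-weighted figure, under the simplifying assumption (Person 1's caveat about side-effects) that if model A is wrong, X does roughly nothing. `credence_weighted` and `break_even_downside` are hypothetical helpers; the break-even value shows how negative the "model A is wrong" branch would have to be to cancel the argument.

```python
from fractions import Fraction


def credence_weighted(value_if_true: Fraction, credence: Fraction,
                      value_if_false: Fraction = Fraction(0)) -> Fraction:
    """Expected value under a two-way split: model A right (with `credence`) or wrong."""
    return credence * value_if_true + (1 - credence) * value_if_false


def break_even_downside(value_if_true: Fraction, credence: Fraction) -> Fraction:
    """Value of the 'model A wrong' branch at which the expectation is exactly zero."""
    return -credence * value_if_true / (1 - credence)


model_a_dalys_per_dollar = Fraction(10**11)
credence = Fraction(1, 10**5)

# Assumption: if model A is wrong, X's side-effects net out to zero.
print(credence_weighted(model_a_dalys_per_dollar, credence))  # 1000000

# Roughly -10^6 DALYs per dollar of systematic harm would be needed to cancel it.
print(float(break_even_downside(model_a_dalys_per_dollar, credence)))
```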

(*) As Jonathan Graehl mentions, DALY stands for Disability-adjusted life year.

I feel very uncomfortable with the sort of argument that Person 1 advances above. My best attempt at a summary of where my discomfort comes from is that it seems one could make the same sort of argument for a whole host of courses of action, many of which would be at odds with one another.

I have difficulty pinning down where my discomfort comes from in more detail. There may be underlying game-theoretic considerations; there may be underlying considerations based on the anthropic principle; it could be that the probability that one ascribes to model A being correct should be much lower than 10^-5 on account of humans' poor ability to construct accurate models, and that I shouldn't take such models too seriously when some people subscribe to them; it could be that I'm irrationally influenced by social pressures against accepting unusual arguments that most people wouldn't feel comfortable accepting; it could be that in such extreme situations I value certainty over utility maximization; or it could be some combination of all of these. I'm not sure how to disentangle the relevant issues in my mind.

One case study that I think may be useful to consider alongside the above is the following. In "Creating Infinite Suffering: Lab Universes", Alan Dawrst says:

Abstract. I think there's a small but non-negligible probability that humans or their descendants will create infinitely many new universes in a laboratory. Under weak assumptions, this would entail the creation of infinitely many sentient organisms. Many of those organisms would be small and short-lived, and their lives in the wild would often involve far more pain than happiness. Given the seriousness of suffering, I conclude that creating infinitely many universes would be infinitely bad.

One may not share Dawrst's intuition that pain would outweigh happiness in such universes, but regardless, the hypothetical of lab universes raises the possibility that all of the philanthropy that one engages in with a view toward utility maximizing should focus on creating or preventing the creation of infinitely many lab universes (according to whether one views the expected value of such a universe as positive or negative). This example is in the spirit of Pascal's wager, but I prefer it because the premises are less metaphysically dubious.
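Floating-point arithmetic gives a compact illustration of why infinite payoffs swamp everything else. This uses Python's IEEE 754 `math.inf` as a stand-in for an infinite utility, a modelling shortcut rather than a theory of infinite ethics. It shows both halves of the worry: any nonzero probability of an infinite outcome yields an infinite expectation, and uncertainty about the sign (positive or negative) can leave the expectation undefined rather than zero.

```python
import math

# Any nonzero probability times an infinite payoff is infinite...
for p in (1e-3, 1e-30, 1e-300):
    print(p, p * math.inf)  # inf in every case

# ...and if both infinitely good and infinitely bad outcomes are possible,
# the expectation is undefined (IEEE 754 gives NaN), not zero.
mixed = 0.5 * math.inf + 0.5 * -math.inf
print(mixed, math.isnan(mixed))  # nan True
```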

One can argue that if one is willing to accept the argument given by Person 1 above, one should be willing to accept the argument that one should devote all of one's resources to studying and working toward or against lab universes.

Here, various attempted counterarguments seem uncompelling:

Counterargument #1: The issue here is with the infinite; we should ignore infinite ethics on the grounds that they're beyond the range of human comprehension and focus on finite ethics.

Response: The issue here doesn't seem to be with infinities: one can replace "infinitely many lab universes" with "3^^^3 lab universes" (or any sufficiently large finite number) and be faced with essentially the same conundrum.
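For readers unfamiliar with the notation, 3^^^3 is Knuth's up-arrow notation (3↑↑↑3). A small recursive sketch (the function name is ours) computes the tiny cases and shows how quickly the hierarchy outruns anything computable, which is why swapping infinity for such a finite number leaves the conundrum intact.

```python
import math


def up_arrow(a: int, n: int, b: int) -> int:
    """Knuth's up-arrow a ↑^n b (n arrows); only feasible for tiny inputs."""
    if n == 1:
        return a ** b
    if b == 0:
        return 1
    return up_arrow(a, n - 1, up_arrow(a, n, b - 1))


tower_3 = up_arrow(3, 2, 3)  # 3^^3 = 3^(3^3) = 7625597484987
print(tower_3)

# 3^^4 = 3^(3^^3) is already far too large to compute; estimate its digit count instead.
digits_3_up_up_4 = math.floor(tower_3 * math.log10(3)) + 1
print(f"3^^4 has about {digits_3_up_up_4:.3e} digits")

# 3^^^3 = 3^^(3^^3) is a power tower of 7625597484987 threes, vastly larger again;
# multiplying it by a probability such as 10^-100 barely dents it.
```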

Counterargument #2: The hypothetical upside of a lab universe perfectly cancels out the hypothetical downside of such a universe, so we can treat lab universes as having expected value zero.

Response: If this is true, it's certainly not obviously true; there are physical constraints on the sorts of lab universes that could arise, and it's probably not the case that for every universe there's an equal and opposite universe. Moreover, we do have means of investigating the expected utility of a lab universe. We have our own universe as a model: we can contemplate whether it has aggregate positive or negative utility, and refine this understanding by researching fundamental physics, hypothesizing about how initial conditions and physical laws vary among lab universes, and attempting to extrapolate what the utility or disutility of an average such universe would be.
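Counterargument #2 amounts to assuming that the distribution of lab-universe welfare is symmetric about zero. A toy comparison, with made-up outcomes that are purely illustrative and not estimates, shows how any asymmetry (for instance, physical constraints favoring some kinds of universes) moves the expectation away from zero.

```python
from fractions import Fraction
from typing import List, Tuple


def expected_utility(outcomes: List[Tuple[int, Fraction]]) -> Fraction:
    """Sum of utility * probability over (utility, probability) pairs."""
    if sum(p for _, p in outcomes) != 1:
        raise ValueError("probabilities must sum to 1")
    return sum((u * p for u, p in outcomes), Fraction(0))


# Hypothetical distributions over lab-universe welfare.
symmetric = [(-10, Fraction(1, 2)), (10, Fraction(1, 2))]
skewed = [(-10, Fraction(1, 2)), (4, Fraction(1, 4)), (10, Fraction(1, 4))]

print(expected_utility(symmetric))  # 0
print(expected_utility(skewed))     # -3/2
```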

Counterargument #3: Even if one's focus should be on lab universes, such a focus reduces to a focus on creating a Friendly AI, since such an entity would be much better than us at reasoning about whether or not lab universes are a good thing and how to go about affecting their creation.

Response: Here too, if this is true, it's not obvious. Even if one succeeds in creating an AGI that's sympathetic to human values, such an AGI may not subscribe to utilitarianism; after all, many humans aren't utilitarians, and it's not clear that this is because their volitions have not been coherently extrapolated. Maybe some humans have volitions that coherently extrapolate to being heavily utilitarian whereas others don't. If one is in the former category, one may do better to focus on lab universes than on FAI (for example, if one believes that lab universes would have average negative utility, one might work to increase existential risk so as to avert the possibility that a nonutilitarian FAI creates infinitely many universes in a lab because some people find it cool).

Counterargument #4: The universes so created would be parallel universes and parallel copies of a given organism should be considered equivalent to a single such organism, thus their total utility is finite and the expected utility of creating a lab universe is smaller than the expected utility in our own universe.

Response: Regardless of whether one considers parallel copies of a given organism equivalent to a single organism, there's some nonzero chance that the universes created would diverge in a huge number of ways; this could make the expected value of the creation of universes arbitrarily large, depending on how the probability that one assigns to the creation of n essentially distinct universes varies with n (this is partially an empirical/mathematical question; I'm not claiming that the answer goes one way or the other).
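Whether that expectation stays bounded depends on how fast the probability falls off with n. A sketch assuming, purely for illustration, a power-law prior P(n) ∝ n^-k truncated at n_max (the prior and the function name are hypothetical): with k = 2 the expected number of distinct universes keeps growing, roughly logarithmically, as n_max rises, because it diverges in the limit; with k = 3 it converges to about 1.37.

```python
def expected_distinct_universes(tail_exponent: float, n_max: int) -> float:
    """E[n] when P(n distinct universes) is proportional to n**-tail_exponent on 1..n_max."""
    weights = [n ** -tail_exponent for n in range(1, n_max + 1)]
    normaliser = sum(weights)  # normalise over the truncated range
    return sum(n * w for n, w in enumerate(weights, start=1)) / normaliser


for exponent in (2.0, 3.0):
    estimates = [expected_distinct_universes(exponent, n_max) for n_max in (10**3, 10**5)]
    print(exponent, [round(e, 3) for e in estimates])
```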

Counterargument #5: The statement "creating infinitely many universes would be infinitely bad" is misleading; as humans we experience diminishing marginal utility with respect to helping n sentient beings as n grows. This is not exclusively due to scope insensitivity; rather, the concavity of the function at least partially reflects terminal values.

Response: Even if one decides that this is true, one still faces the question of how quickly the diminishing marginal utility sets in, and any choice here seems somewhat arbitrary, so this line of reasoning seems unsatisfactory. Depending on the choice one makes, one may reject Person 1's argument on the grounds that after a certain point one just doesn't care very much about helping additional people.
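The arbitrariness of the saturation point is easy to see numerically. A sketch assuming a hypothetical exponential-saturation utility U(n) = 1 - exp(-n/s), one of many possible concave bounded forms and not one the post proposes, with made-up options: whether a 10^-5 chance of helping 10^25 beings beats certainly helping 10^4 depends entirely on the scale s.

```python
import math


def bounded_utility(n: float, scale: float, cap: float = 1.0) -> float:
    """Concave, bounded utility of helping n beings: cap * (1 - exp(-n / scale))."""
    # -expm1(-x) computes 1 - exp(-x) accurately even when x is tiny.
    return cap * -math.expm1(-n / scale)


def compare(scale: float) -> str:
    # Hypothetical long shot: a 10^-5 chance of helping 10^25 beings.
    long_shot = 1e-5 * bounded_utility(1e25, scale)
    # Hypothetical sure thing: certainly helping 10^4 beings.
    sure_thing = bounded_utility(1e4, scale)
    return "long shot" if long_shot > sure_thing else "sure thing"


for scale in (1e30, 1e6):
    print(f"saturation scale {scale:.0e}: prefer the {compare(scale)}")
```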

I'll end with a couple of questions for Less Wrong:

1. Is the suggestion that one's utilitarian efforts should be primarily focused on the possibility of lab universes an example of "explicit reasoning gone nuts?" (cf. Anna's post Making your explicit reasoning trustworthy).

2. If so, is the argument advanced by Person 1 above also an example of "explicit reasoning gone nuts?" If the two cases are different then why?

3. If one rejects one or both of the argument by Person 1 and the argument that utilitarian efforts should be focused around lab universes, how does one reconcile this with the idea that one should assign some probability to the notion that one's model is wrong (or that somebody else's model is right)?