# Tendencies in reflective equilibrium

27 20 July 2011 10:38AM

Consider a case, not too different from what has been shown to happen in reality, where we ask Bob what sounds like a fair punishment for a homeless man who steals \$1,000, and he answers ten years. Suppose we wait until Bob has forgotten that we ever asked the first question, and then ask him what sounds like a fair punishment for a hedge fund manager who steals \$1,000,000, and he says five years. Maybe we even wait until he forgets the whole affair, and then ask him the same questions again with the same answers, confirming that these are stable preferences.

If we now confront Bob with both numbers together, informing him that he supported a ten-year sentence for stealing \$1,000 and a five-year sentence for stealing \$1,000,000, a couple of things might happen. He could say "Yeah, I genuinely believe poor people deserve greater penalties than rich people." But more likely he says "Oh, I guess I was prejudiced." Then if we ask him the same questions again, he comes up with two numbers that follow the expected mathematical relationship and punish the greater theft with more jail time.

Bob isn't working off of some predefined algorithm for determining punishment, like "jail time = (10 * amount stolen)/net worth". I don't know if anyone knows exactly what Bob is doing, but at a stab, he's seeing how many unpleasant feelings get generated by imagining the crime, then proposing a jail sentence that activates about an equal amount of unpleasant feelings. If the thought of a homeless man makes images of crime more readily available and so increases the unpleasant feelings, things won't go well for the homeless man. If you're really hungry, that probably won't help either.
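For contrast, here is a minimal Python sketch of what a predefined algorithm of that kind would look like if anyone actually ran one. Only the formula comes from the text above; the function name and the net-worth figures are made up for illustration.

```python
def jail_time_years(amount_stolen: float, net_worth: float) -> float:
    """Hypothetical rule from the text: jail time = (10 * amount stolen) / net worth."""
    if net_worth <= 0:
        raise ValueError("the rule is undefined for a non-positive net worth")
    return 10 * amount_stolen / net_worth


# Assumed net worths, chosen only for illustration.
homeless_sentence = jail_time_years(amount_stolen=1_000, net_worth=500)
manager_sentence = jail_time_years(amount_stolen=1_000_000, net_worth=20_000_000)
print(homeless_sentence, manager_sentence)
```

Whatever one thinks of the rule itself, a function like this returns the same answer no matter when or in what order the questions are asked, which is exactly what Bob's gut reaction fails to do.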

Just as nothing automatically synchronizes the intention to study a foreign language with the behavior of studying it, nothing automatically synchronizes thoughts about punishing the theft of \$1,000 with thoughts about punishing the theft of \$1,000,000.

Of course, there is something that non-automatically does it. After all, in order to elicit this strange behavior from Bob, we had to wait until he forgot about the first answer. Otherwise, he would have noticed and quickly adjusted his answers to make sense.

We probably could represent Bob's tendencies as an equation and call it a preference. Maybe it would be a long equation with terms for net worth of criminal, amount stolen, how much food Bob's eaten in the past six hours, and whether his local sports team won the pennant recently, with appropriate coefficients and powers for each. But if Bob saw this equation, he certainly wouldn't endorse it. He'd probably be horrified. It's also unstable: if given a choice, he would undergo brain surgery to remove this equation, thus preventing it from being satisfied. This is why I am reluctant to call any such formalization of his tendencies a "preference".

Instead of saying that Bob has one preference determining his jail time assignments, it would be better to model him as having several tendencies - a tendency to give a certain answer in the \$1,000 case, a tendency to give a different answer in the \$1,000,000 case, and several tendencies towards things like consistency, fairness, compassion, et cetera.

People strongly consciously endorse these latter tendencies, probably because they're socially useful [1]. If the Chief of Police says "I know I just put this guy in jail for theft, but I'm going to let this other thief off because he's my friend, and I don't really value consistency that much," then they're not going to stay Chief of Police for very long.

Bayesians and rationalists, in particular, make a big deal out of consistency. One common parable on the importance of consistency is the Dutch Book - a way to get free money from anyone behaving inconsistently. Suppose you have a weighted coin which can land on either heads or tails. There are several good reasons why I should not assign a probability of 66% to heads and 66% to tails, but one of the clearest is this: you can make me a bet that I will give you \$2 if it lands on tails and you give me \$1 if it lands on heads, and then a second bet where I give you \$2 if it lands on heads and you give me \$1 if it lands on tails. Whichever way the coin lands, I owe you \$2 and you owe me \$1 - you have gained a free dollar. So consistency is good if you don't want to be handing dollars out to random people...
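The arithmetic of the two bets can be checked with a short Python sketch. The function name is illustrative only, and payoffs are counted from the point of view of the person who assigned 66% to both heads and tails.

```python
def my_net_payoff(outcome: str) -> int:
    """Net dollars I receive once both bets are settled, given the coin's outcome."""
    # Bet 1: I pay $2 on tails and receive $1 on heads.
    bet1 = 1 if outcome == "heads" else -2
    # Bet 2: I pay $2 on heads and receive $1 on tails.
    bet2 = 1 if outcome == "tails" else -2
    return bet1 + bet2


payoffs = {outcome: my_net_payoff(outcome) for outcome in ("heads", "tails")}
print(payoffs)
```

Either way the inconsistent bettor ends up one dollar down and the person offering the bets one dollar up. Each bet looks close to break-even in isolation at a stated probability of 66% (0.66 × 1 − 0.34 × 2 = −0.02), which is why someone holding both probabilities might accept both.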

...except that the Dutch book itself assumes consistency. If I believe that there is a 66% chance of it landing on heads, but refuse to take a bet at 2:1 odds - or even at 1.5:1 odds even though I should think it's easy money! - then I can't be Dutch booked. I am literally too stupid to be tricked effectively. You would think this wouldn't happen too often, since people would need to construct an accurate mental model to know when they should refuse such a bet, and such an accurate model would tell them they should revise their probabilities - but time after time people have demonstrated the ability to do exactly that.

I have not yet accepted that consistency is always the best course in every situation. For example, in Pascal's Mugging, a random person threatens to take away a zillion units of utility if you don't pay them \$5. The probability they can make good on their threat is minuscule, but by multiplying out by the size of the threat, it still ought to motivate you to give the money. Some belief has to give - the belief that multiplication works, the belief that I shouldn't pay the money, or the belief that I should be consistent all the time - and right now, consistency seems like the weakest link in the chain.
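The multiplication at issue can be made concrete with a toy Python sketch. Every number below is an assumption chosen only to show how a tiny probability gets swamped by a huge enough threat; nothing here says what the real probability or the real size of "a zillion" should be.

```python
from fractions import Fraction


def expected_loss_if_refuse(p_threat_real: Fraction, threatened_disutility: int) -> Fraction:
    """Expected utility lost by refusing to pay, under naive expected-value reasoning."""
    return p_threat_real * threatened_disutility


COST_OF_PAYING = 5            # assumption: $5 is worth 5 units of utility
p = Fraction(1, 10**20)       # assumed probability that the threat is real
threat = 10**30               # assumed stand-in for "a zillion" units of utility

loss = expected_loss_if_refuse(p, threat)
print(loss, loss > COST_OF_PAYING)
```

With these made-up figures the expected loss from refusing dwarfs the five units it costs to pay, and the mugger can always name a bigger number. That is the pressure the paragraph above describes.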

The best we can do is seek reflective equilibrium among our tendencies. If you endorse the belief that rich people should not get lighter sentences than poor people more strongly than you endorse the tendency to give the homeless man ten years in jail and the fund manager five, then you can edit the latter tendency and come up with a "fair" sentence. This is Eliezer's defense of reason and philosophy, a powerful justification for morality (see part one here) and it's probably the best we can do in justifying our motivations as well.

Any tendency that has reached reflective equilibrium in your current state is about as close to a preference as you're going to get. It still won't automatically motivate you, of course. But you can motivate yourself toward it obliquely, and come up with the course of action that you most thoroughly endorse.

FOOTNOTES:

1: A tendency toward consistency can cause trouble if someone gains advantage from both of two mutually inconsistent ideas. Trivers' hypothesis predicts that people will consciously deny the inconsistency so they can continue holding both ideas, yet still remain consistent and so socially acceptable. Rationalists are so annoying because we go around telling people they can't do that.

Comment author: 20 July 2011 03:37:10PM *  13 points [-]

I don't know if anyone knows exactly what Bob is doing, but at a stab, he's seeing how many unpleasant feelings get generated by imagining the crime, then proposing a jail sentence that activates about an equal amount of unpleasant feelings. If the thought of a homeless man makes images of crime more readily available and so increases the unpleasant feelings, things won't go well for the homeless man.

To defend poor Bob for a moment, it's worth noting that we don't respond to numbers well in a vacuum. A theft involving a hedge fund manager invokes a frame in which a million dollars isn't that much. A theft involving a homeless person invokes a frame in which a thousand dollars is a lot. I suspect that this magnitude distortion explains more of Bob's behavior than general negative affect towards homeless people.

ETA: Both to mitigate that annoying LW effect where the top-voted comment on an excellent article is always a correction or quibble, and just because it's plain true, I should add that I'm thoroughly enjoying this sequence, and that my rate-of-checking-LW has risen sharply over the last couple weeks since I've been looking for another installment.

Comment author: 20 July 2011 08:26:16PM 1 point [-]

One of the ways to be more correct is to use frameworks of reasoning rather than your intuition. When you see a question like: "What sounds like a fair punishment for a homeless man who steals \$1,000?", you should quickly create a framework for answering questions like that. Yvain's example for that kind of framework is "jail time = (10 * amount stolen)/net worth". This significantly helps anyone be more consistent.

Comment author: 20 July 2011 08:49:37PM 2 points [-]

If you want to not just be consistant, but consistantly reflect your preferences (or reflective equilibrium of tendencies), you should validate your framework against a wide range of hypotheticals in the domain before actually using it in the specific case that prompted you to create it.

(Or try to meet the higher criteria of consistancy, not just that your judgments on a sequence of situations are consistant with each other, but that they are also consistant with the judgements made by a copy of you who sees the situations in a different order.)

Comment author: 21 July 2011 10:59:27AM *  1 point [-]

It's "consistent".

Comment author: 21 July 2011 05:24:03PM 13 points [-]

At least I spelled it the same way every time ;)

Comment author: 20 July 2011 10:09:24PM 1 point [-]

Absolutely. You start with a framework of reasoning and you make it less wrong. :)

Comment author: 20 July 2011 05:12:49PM 8 points [-]

My tendency is to assume that the homeless man would steal the \$1000 via violent means, whereas the hedge fund manager would steal the \$1 million using nonviolent deception. In addition to a belief that violent crime is actually worse, there is also the bias that it is easier to visualize. A homeless man stealing \$1000 looks like a man pointing a gun at a cashier. A hedge fund manager stealing \$1 million looks like a guy at a computer with a spreadsheet open.

Of course, I work at a hedge fund manager right now, so I have additional biases.

Comment author: 22 July 2011 04:03:56AM 17 points [-]

Fun fact: A fellow Rationalist and I were doing Rejection Therapy. My friend chose to do Pascal's Mugging (the positive version - if you give me \$5 now, a package of \$50000 will appear at your doorstep tomorrow morning).

The subject came extremely close to actually giving him the \$5, even though the subject only had five dollars and needed it to get home. (My friend added that a cab would arrive in five minutes if he waited at a particular intersection, and would take him home for free.) He only stopped when I burst out laughing. (It took maybe a 5-10 minute conversation to build up to that point.)

We talked to him about it afterwards to ask about his motivations. He said the logic made sense to him and my friend did a good job of maintaining the persona.

Comment author: 22 July 2011 06:16:23AM *  5 points [-]

I hope your friend never planned to actually accept the \$5.

Comment author: 22 July 2011 02:14:56PM 3 points [-]

I'm not sure. It wasn't an event he had planned for.

Comment author: 10 August 2011 05:16:20AM 2 points [-]

I wouldn't have. The negative feelings from accepting the \$5 would greatly outweigh the monetary value, even though I knew I almost certainly would never see the subject again...

...and would have been wrong; I ran into him last week.

Comment author: 27 July 2011 02:40:54AM 2 points [-]

Best rejection therapy ever.

Comment author: 21 July 2011 04:46:08AM *  4 points [-]

There are several good reasons why I should not assign a probability of 66% to heads and 66% to tails, but one of the clearest is this: you can make me a bet that I will give you \$2 if it lands on tails and you give me \$1 if it lands on heads, and then a second bet where I give you \$2 if it lands on heads and you give me \$1 if it lands on tails.

Got it.

Whichever way the coin lands, I owe you \$1 and you owe me \$2 - I have gained a free dollar.

Huh? You swapped "you" for "I" here (compared to above).

Comment author: 20 July 2011 05:03:03PM *  10 points [-]

I have not yet accepted that consistency is always the best course in every situation. For example, in Pascal's Mugging, a random person threatens to take away a zillion units of utility if you don't pay them \$5. The probability they can make good on their threat is minuscule, but by multiplying out by the size of the threat, it still ought to motivate you to give the money. Some belief has to give - the belief that multiplication works, the belief that I shouldn't pay the money, or the belief that I should be consistent all the time - and right now, consistency seems like the weakest link in the chain.

No, no, no!

There are an infinite number of possible Pascal's muggings, but people only look at them one at a time. Why don't you keep the \$5 in case you need it for the next Pascal's mugger who offers you 2^zillion units of utility? That is a much better bet if you only look at those two possible muggings.

The real problem is that utility functions, as we calculate them now, do not converge. This is a reason to be confused, not a reason to bite such ridiculous bullets.
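One way to read the non-convergence claim is as a toy Python sketch. It assumes a made-up sequence of muggers in which each successive mugger is half as probable as the last but offers four times the utility. Neither the prior nor the payoffs come from any actual decision theory; they are chosen only to show a sum that never settles.

```python
from fractions import Fraction


def partial_expected_utility(num_muggers: int) -> Fraction:
    """Sum of probability * utility over the first `num_muggers` hypothetical muggers."""
    total = Fraction(0)
    for n in range(1, num_muggers + 1):
        probability = Fraction(1, 2**n)  # assumed prior: halves with each mugger
        utility = 4**n                   # assumed offer: quadruples with each mugger
        total += probability * utility   # each term works out to 2**n
    return total


partial_sums = [partial_expected_utility(k) for k in (1, 5, 10, 20)]
print(partial_sums)
```

Under these assumptions the partial sums keep growing as more muggers are considered, so there is no finite expected utility to compare the \$5 against.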

Comment author: 20 July 2011 05:57:18PM *  5 points [-]

There are an infinite number of possible Pascal's muggings, but people only look at them one at a time. Why don't you keep the \$5 in case you need it for the next Pascal's mugger who offers you 2^zillion units of utility?

(As you acknowledge, but with more emphasis) this is an excuse, not a real reason. You do not really care about having money primarily so that you can be prepared for the next Pascal's mugger. (Completing the pattern associated with Pascal's Wager does not fit here.)

Comment author: 20 July 2011 06:32:04PM 2 points [-]

For me, this is an acknowledgement of confusion, not an excuse. I think that finding a decision theory that can make sense of this is extremely important and I try to act accordingly.

Comment author: 20 July 2011 06:53:20PM 1 point [-]

For me, this is an acknowledgement of confusion, not an excuse. I think that finding a decision theory that can make sense of this is extremely important and I try to act accordingly.

I would call the other half of what you had to say the confusing part (I liked the linked paper, by the way). The 'but you need to save it for other possible muggings' part would be straightforward game theory if the confusing part didn't happen before we even got to 'which mugger do we pay?' considerations.

Comment author: 20 July 2011 07:21:52PM *  3 points [-]

I agree; that was just an intuition pump to demonstrate the absurdity of only considering one mugger.

EDIT: I think of this intuition pump as very persuasive because it is part of how I came to this conclusion in the first place.

Comment author: 20 July 2011 11:32:33AM 10 points [-]

I have not yet accepted that consistency is always the best course in every situation. For example, in Pascal's Mugging, a random person threatens to take away a zillion units of utility if you don't pay them \$5. The probability they can make good on their threat is minuscule, but by multiplying out by the size of the threat, it still ought to motivate you to give the money. Some belief has to give - the belief that multiplication works, the belief that I shouldn't pay the money, or the belief that I should be consistent all the time - and right now, consistency seems like the weakest link in the chain.

Not upvoted, for this paragraph. You can't become right by removing beliefs at random until the remaining belief pool is consistent, but if you're right then you must be consistent.

Why does some belief have to give, if you reject consistency? If you're going to be inconsistent, why not inconsistently be consistent as well?

Also, you are attempting to be humorous by including beliefs like "multiplication works", but not beliefs like "at the 3^^^3rd murder, I'm still horrified" or "Solomonoff induction works", right?

We are but humble bounded rationalists, who have to use heuristic soup, so we might have to be inconsistent at times. But to say that even after careful recomputation on perfectly formalized toy problems, we don't have to be consistent? Oh, come on!

Comment author: 20 July 2011 01:00:52PM *  3 points [-]

Agreed.

Here's an idea that just occurred to me: you could replace Solomonoff induction with a more arbitrary prior (interpreted as "degree of caring" like Wei Dai suggests) and hand-tune your degree of caring for huge/unfair universes so Pascal's mugging stops working. Informally, you could value your money more in universes that don't contain omnipotent muggers. This approach still feels unsatisfactory, but I don't remember it being suggested before...

Comment author: 21 July 2011 01:14:44AM 1 point [-]

Is this different from jimrandomh's proposal to penalize the prior probability of events of utility of large magnitude, or komponisto's proposal to penalize the utility?

Comment author: 20 July 2011 12:50:53PM *  2 points [-]

For weakest implicit belief, I think I would have nominated "That I have the slightest idea how to properly calculate the probability of the mugger following through on his/her threat".

Also, Torture vs. Specks seems like another instance where many of us are willing to sacrifice apparent consistency. Most coherent formulations of utilitarianism must choose torture, yet many utilitarians are hesitant to do so.

In both cases, it seems like what we're doing isn't abandoning consistency, but admitting to the possibility that our consistent formula (e.g. naive utilitarianism) isn't necessarily the optimal / subjectively best / most reflectively equilibrial one. We therefore may choose to abandon it in favor of the intuitive answer (don't pay the mugger, choose specks, etc), not because we choose to be inconsistent, but because we predict the existence of a Better But Still Consistent Formula not yet known to us.

Of course, as Yvain notes, we can take pretty much any set of arbitrary preferences and create a "consistent" formula by adding enough terms to the equation. The difference is that the Better But Unknown formula above is both consistent and something we'd be in reflective equilibrium about.

Comment author: 20 July 2011 07:14:49PM 1 point [-]

By "Dust vs. Specks" you surely mean "torture vs. dust specks", and with "Specks", you want to say "torture", don't you?

Comment author: 20 July 2011 07:27:29PM *  0 points [-]

Fixed, thanks. But no, I meant specks. It seems like utilitarianism (as opposed to just typical intuitive morality) commands you to inflict Torture. You only want to choose specks because your brain doesn't multiply properly, etc.

Of course, not everyone agrees that Utilitarianism picks Torture, but the argument for Torture is certainly a utilitarian one. So in this case picking Specks anyway seems like a case of overriding (at least naive versions of) utilitarianism.

Comment author: 21 July 2011 09:32:19AM *  0 points [-]

Wait...

Most coherent formulations of utilitarianism must choose specks, yet many utilitarians are hesitant to do so.

Are you sure that should be specks? If so, I am confused.

Comment author: 21 July 2011 11:45:57AM 1 point [-]

Wow. Sorry, you're obviously right. Brain totally misfired on me I guess.

Comment author: 24 July 2011 01:04:43PM 3 points [-]

I'm still confused about what point Yvain might be making by substituting "tendency" for "intuition" in this formulation of reflective equilibrium. I can think of two possibilities, but neither of them seems like something he might endorse.

1. When we reflect on what we really want, we should take into consideration not just our intuitions, but our behavioral tendencies. (But Yvain previously wrote "NO NEGOTIATION WITH UNCONSCIOUS".)
2. After we've reached reflective equilibrium, our behavioral tendencies can be said to be our preferences. (But suppose I have a tendency to reflexively slap myself whenever I see the color red. After reflecting on it, I decided this is not something I should do, but I have no power to actually edit the reflex away. Based on what Yvain wrote in The Blue-Minimizing Robot, I don't think he would say that I actually do prefer to slap myself (or have myself slapped) whenever the color red is in my field of vision.)

The only other explanation I have is that Yvain has gotten used to writing "tendency" for his other recent posts, and kept using it even when it no longer makes sense to. Does anyone else have other ideas?

Comment author: 01 August 2011 06:28:04PM 0 points [-]

I didn't have any particular interesting agenda for that word choice. If I had to justify it, I would say that to me "intuition" implies a belief (for example, I have an intuition that people who steal more money ought to be punished) and "tendency" implies an action (for example, when asked how much to punish a thief, I might respond "five years"). I am trying to carefully avoid language that implies the existence of beliefs, not because I have strong opinions on the matter but because I'm unsure.

Comment author: 20 July 2011 09:19:21PM 3 points [-]

Reflective equilibrium is usually described in terms of "considered judgments" or "intuitions". (Your own FAQ uses "intuitions".) Do we gain any new insights (or other benefits) from thinking about reflective equilibrium in terms of "tendencies" instead?

Comment author: 20 July 2011 03:29:56PM 2 points [-]

by multiplying out by the size of the threat, it still ought to motivate you to give the money. Some belief has to give - the belief that multiplication works, the belief that I shouldn't pay the money, or the belief that I should be consistent all the time - and right now, consistency seems like the weakest link in the chain.

What gives is the belief that by multiplying out by the size of the threat, it still ought to motivate me to give the money. Multiplication works, I shouldn't pay the money, and I should be consistent.

Comment author: 20 July 2011 05:49:35PM 2 points [-]

I think this is probably the sanest answer that doesn't throw out consistency, but there are still some distinctly weird things about it. To motivate you not to give up money, a threat to inflict \$RIDICULOUSNUMBER units of disutility has to be proportionately incredible -- but there's no particular reason to think that disutility is even roughly linear in 1/credibility, and a number of reasons not to.

Straight multiplication also suggests that for any fixed ridiculous threat there's always some amount of money that a rational agent will be willing to pay to ward it off, but I think I'd be more comfortable biting that bullet.

Comment author: 20 July 2011 07:31:52PM 5 points [-]

To motivate you not to give up money, a threat to inflict \$RIDICULOUSNUMBER units of disutility has to be proportionately incredible...there's no particular reason to think that disutility is even roughly linear in 1/credibility

It's not at all obvious that someone threatening to inflict disutility if I don't comply with certain demands would treat me worse if I don't comply with the demands than if I do.

One can't simply say "It is rational to one-box on Newcomb's problem", because one might live in a universe in which an entity, say Sampi (if not Omega itself), executes one-boxers painfully and rewards two-boxers.

The possibility that someone will inflict \$RIDICULOUSNUMBER units of disutility on me is as latent in the question "give me money or I will inflict \$RIDICULOUSNUMBER units of disutility on you" as it is in the question "paper or plastic", and not because it's possible bag choice will have a significant impact on my life. If I can't distinguish the credibility of the threat (that the speaker can and will act as they say) from zero, then I can't distinguish it from the opposite outcome, that they will act contrary to what they say, as I can't distinguish the possibility of the opposite outcome from zero either.

On a personal note, the night before last, I had a wild dream (no laws of physics were violated, not so much laws of congress) that ended similarly to how the movie "The Game" starring Michael Douglas ended. Well, it actually ended with me waking up - which is even more to the point. Things oughtn't be simply accepted at face value.

Today I misread the following no fewer than two times, I think three times though I can't swear to that:

Marc Hauser, the primatologist psychologist at Harvard who recently was accused of mistreating evidence and graduate students, has resigned. I am in two minds about this. His work, although I am unconvinced by some of it, was very important, and he was good at communicating to the lay reader (including philosophers). I met him and was impressed by his demeanour and generosity. On the other hand, if he did deliberately misinterpret his data, that is an offence. Whether it is a hanging offence is moot.

I read the first sentence as: "Marc Hauser, the primatologist psychologist at Harvard who recently accused me of mistreating evidence and graduate students, has resigned." That made less and less sense as the post went on, so I took it from the top several times until I finally caught my error.

It's far more likely that I am misunderstanding someone threatening \$RIDICULOUSNUMBER than that they can carry out their threat, and also more likely that I'll misspeak and say "yes" when I mean no, and say "no" when I mean yes, or mistakenly hand over a one-dollar bill, etc. than that they can and will carry it out. Simultaneously, I'm not paralyzed by the fear of offending people despite King Incognito being not just a possibility but a trope. I think anyone who jumps on the "it's a compelling argument" horn of the \$RIDICULOUSNUMBER argument has to give an account of how they disagree with anyone, ever, given the possible ramifications if the other person is Agent Smith, Harun al-Rashid, Peter the Great, Yoda, etc. I could easily mistake a threat of unimaginable torture for everyday speech or mild disagreement.

The obvious answer is that agreeing has indistinguishable ramifications (many dislike the teacher's pet, which is another trope in itself)...in which case I would like to know why that same reasoning isn't applicable when a random person actually claims such power. It is no more likely that someone claiming such power has it than that someone not claiming such power has it, likewise for its use.

If you disagree, upvote this or I'll give you \$RIDICULOUSNUMBER units of disutility! I'm kidding, of course. (Or would no number of disclaimers be sufficient? Shall you believe that having expressed this claim, it is more likely than not I abide by it, and that saying I kid was a half-plausible way to deny making threats? Or shall you believe that having disclaimed it, I would be displeased by acts in accordance with it? Both are plausible for humans, but further muddling things, if I have such power, how would my intentions likely differ from a normal human's?)

Comment author: 24 July 2011 02:45:41PM 0 points [-]

It's not at all obvious that someone threatening to inflict disutility if I don't comply with certain demands would treat me worse if I don't comply with the demands than if I do.

In support of this point, I'd like to point out that the ridiculous powers required to inflict \$RIDICULOUSNUMBER sanctions are so far removed from our experience, that we have no idea how such an agent could be expected to act. It could do the opposite of what it claims (perhaps it hates cowards) as easily as fulfill its threats, given that we know nothing of its motives.

Comment author: 20 July 2011 10:02:01PM 3 points [-]

Valuing consistency is silly. If someone suggests putting one thief in jail and letting another go free, you won't object because it's inconsistent. You'll either object because you don't think thieves should go to jail, or because you don't think they should go free. Inconsistency just makes it easier to give a reason why it's wrong. You don't need to know whether or not a given person thinks thieves should be jailed to convince them that that isn't the best thing to do.

If you don't accept Pascal's mugging, you have to have some reason for it. The same goes if you do accept. Consistency probably isn't the reason either way. Why is it you disagree with Pascal's mugging?

If there is a reason, it should apply to everything else. Not out of consistency. Just out of the fact that it's there. If it "doesn't apply" to something, that just means that the reason isn't very applicable there, and if you thought it was, you were oversimplifying the reason.

Comment author: 20 July 2011 02:44:16PM *  3 points [-]

Any tendency that has reached reflective equilibrium in your current state is about as close to a preference as you're going to get.

But if you know your destination, you're already there. In principle, there is no need to wait for a tendency to manifest, or even to require that the conditions making the tendency manifest ever hold, if you know the way it'd go (not that you should just step back and watch). There are also one-off decisions that require knowing what to do this one time, where the intuition about reflective equilibrium applies less, and it's harder to express the correctness criteria in terms of "tendency", maybe only as a tendency to use certain considerations in making decisions.

Thus, knowledge of preference both makes sense and might be required in the absence of a currently stable tendency, even if we start with a tendency-based idea of preference.

Comment author: 20 July 2011 08:07:24PM *  1 point [-]

maybe only as a tendency to use certain considerations in making decisions.

Qualitatively speaking it might be worth making this distinction, but algorithmically speaking---from a superintelligence's perspective, say, or a decision theory researcher's---I can't see any good reasons why there would be discrete changes or fundamental conceptual differences between the levels of abstraction.

This lack of rigid partitions might also be desirable e.g. if 95% of your decision algorithm suddenly gets erased and you want to infer as much as possible from the remaining 5%; not only the lost "utility function" terms but also the meta-level implicit patterns as well as the highest level implicit decision theoretic policy, ideally using each of those as information to reconstruct the others even in the event of their complete annihilation. (You'd have to do this anyway if all you had left was some fragment of a time-stamped UDT policy lookup table (branch table?).)

ETA: To motivate even thinking about the problem of corrupted hardware a little more, imagine that an agent is running XDT and is trying to make sense of humans' (or humanity's, humanity's ancestors, God's, consciousnesses-trapped-in-rocks's)... we'll call them 'decision policy-ish-like thingies', but the "creator"-bound XDT agent only has partial information about any of many of its potential creators for any of many plausible reasons.

Also there is the more philosophical motivation of re-thinking the 'reasoning' that was done by the environment/universe in the process of creating our values---genetic/memetic evolution, atmospheric accidents, falling into "wrong"-in-hindsight attractors generally, and basically all causal chains or logical (teleological) properties of the universe that "resulted" at least partially in humans having their "current" values. Thinking things through from first principles, taking the idea of avoiding lost purposes to its logical conclusion (or non-conclusion)---not just searching for causal validity and not accepting even as an "initial dynamic" whatever point estimate we happen to have sitting around in the patterns of humanity around the arbitrary year 2011. This is the difference between CEV and CFAI, though if either were done in the spirit of their philosophy they might both have enough meta-power to converge to "morality" if such an attractor exists. Vladimir Nesov would perhaps call this yet another kind of values deathism?

ETA2: Or perhaps a better way to characterize what I see as the philosophical difference between CEV and CFAI is this. CEV starts with and focuses on human minds and their local properties at whatever moment the FAI button gets pressed, because, well, you know what you value; [rhetorically:] what else is there? (Though compare/contrast "so you know what you believe; what else is there?".) The earlier perspective of CFAI on the other hand focuses on human meta-moral intuitions and their potential for invalidity, with humanity-right-now being the result of a suboptimal both-explanatory-and-normative updating process which "should" be immediately reflected upon and validated or "improved"---if it weren't for that whole 'first valid cause' problem... (Mmmmmmeta. Meta fight!) It is unclear to what extent these differences are matters of medium, emphasis, style, a change of memetic propagation strategy, or an important philosophical shift in Eliezer's thinking about FAI---perhaps as a result of his potentially-somewhat-misguided-according-to-me Optimization Enlightenment (an immediate result or cause of his Bayesian Enlightenment if I'm correctly filling in the gaps).

Comment author: 20 July 2011 07:42:23PM *  0 points [-]
Comment author: 22 July 2011 02:29:10AM 2 points [-]

I don't think that Pascal's Mugging puts pressure on Bayesianism, I think it puts pressure on Solomonoff-type priors - Robin's anthropic answer is the one I currently find most appealing. The Lifespan Dilemma puts a lot more pressure on EU, in my book.

Comment author: 25 July 2011 08:33:51PM *  3 points [-]

Robin's anthropic answer is the one I currently find most appealing.

But it doesn't seem to address the case where the mugger threatens to torture 3^^^^3 kittens...

Comment author: 20 July 2011 01:14:26PM 1 point [-]

For example, in Pascal's Mugging, a random person threatens to take away a zillion units of utility if you don't pay them \$5. The probability they can make good on their threat is minuscule, but by multiplying out by the size of the threat, it still ought to motivate you to give the money.

Why? Hasn't this been gone over before? Tiny number * big number = not determined by the words "tiny" and "big".

Comment author: 20 July 2011 08:31:06PM *  -1 points [-]

This has been gone over before and the result is that you should give the \$5, because 3^^^3 is just a ridiculously huge number.

EDIT: What I meant to say was completely opposite to what I said. You should not give the \$5, even though 3^^^3 is just a ridiculously huge number.

Comment author: 20 July 2011 10:56:04PM *  3 points [-]

It has been gone over before and the result is that one shouldn't give money to such a mugger.

See this comment and following discussion. First, you probably have a bounded utility function (to the extent you have a utility function at all), so it's impossible for the mugger to actually offer that much utility (there is a tiny probability of being able to deliver many happy life-years or the like, but that's different from the utility of your personal utility function). Second, if we increase the bound of the utility function, for any particular bound (no matter how high) there will be alternatives more likely to deliver vast utility than giving in to the mugger (conditional on vast utility being attainable, it is very unlikely the mugger's obviously bogus offer is a good use of funds).
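The bounded-utility argument above can be made concrete with a rough numeric sketch. Every number here (the bound, both probabilities, the stand-in for the mugger's claim) and the helper `expected_gain` are illustrative assumptions, not figures from this thread; the only point is that once utility is capped, inflating the claimed payoff no longer changes the comparison.

```python
from fractions import Fraction


def expected_gain(p_success: Fraction, payoff: Fraction, bound: Fraction) -> Fraction:
    """Expected utility of an option whose payoff is capped at the utility bound."""
    return p_success * min(payoff, bound)


# Hypothetical numbers, chosen only for illustration.
BOUND = Fraction(10**9)               # assumed upper bound of the utility function
CLAIMED = Fraction(10**100)           # stand-in for the claim; 3^^^3 is far too large to write out
P_MUGGER = Fraction(1, 10**20)        # assumed probability the mugger can deliver
P_ALTERNATIVE = Fraction(1, 10**15)   # assumed probability another use of the $5 pays off hugely

mugger = expected_gain(P_MUGGER, CLAIMED, BOUND)
alternative = expected_gain(P_ALTERNATIVE, CLAIMED, BOUND)

# Making the threat bigger does nothing once the bound is hit.
bigger_threat = expected_gain(P_MUGGER, CLAIMED * 10**50, BOUND)

print(mugger, alternative, bigger_threat == mugger)
```

Under these assumed numbers the alternative wins purely because it is the more probable route to the capped payoff, which is the shape of the second argument in the comment.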

Comment author: 21 July 2011 06:11:10PM 0 points [-]

Hah! Oh the tragedy of a simple typo. I meant to type 'not give', and 'even though'. Wow, I hate when I accidentally say the opposite of what I wanted.

Comment author: 20 July 2011 09:07:53PM *  2 points [-]

I think it's more accurate to say that it's a ridiculously round number. That is, it's both huge and simple. If someone tried to mug you with a random number between 3^^^3 and 3^^^^3, you wouldn't take it, since that number is as complex, and therefore unlikely, as it is big.

Edit: I changed my mind on this. The unlikeliness would come from him stating the number. Once he does that, the number is now very simple. Namely: it's the number he just stated.

That said, the paradox from expected utility not converging is just due to the round ones.

Comment author: 20 July 2011 10:08:11PM 0 points [-]

I don't think it really matters at that point. I would not treat the situation differently if the mugger said "3^^^3" or if he explicitly stated some number "34084549...843".

Comment author: 20 July 2011 10:14:17PM 3 points [-]

"34084549...843"

I don't think you are appreciating the complexity penalty of the (presumably not very compressible) data hidden behind that ellipsis, if the number is meant to be on the order of 3^^^3.

Comment author: 21 July 2011 06:09:39PM -1 points [-]

Well, see, I would disagree with your presumption. The data might look random to you, but I could just point out that all the digits are actually taken from pi, starting with the 3^^3rd digit. That simplifies the complexity tremendously. Or I could say I got those digits randomly. That again simplifies the complexity, because generating that number was simple.

Comment author: 21 July 2011 07:33:46PM 2 points [-]

I would disagree with your presumption.

If my presumption that the digits are not very compressible is wrong, then you have not really responded to Daniel's point about the ridiculous roundness of the number (where roundness is one way a number can be compressible).

Or I could say I got those digits randomly. That again simplifies the complexity, because generating that number was simple.

No. Getting "random" digits is not simple, or even an available action, for a deterministic generator. Saying to get "random" data can feel simple because you are just pointing at some source of data that you are ignorant about, but really, you have to account for the complexity of that source of data.
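A small experiment can make the "round versus random" distinction more tangible. This is only a sketch under loud assumptions: zlib's compressed size is a crude, computable stand-in for description length (Kolmogorov complexity itself is uncomputable), the numbers are 2,000 digits long rather than anywhere near 3^^^3, and `compressed_size` is a helper invented here.

```python
import secrets
import zlib


def compressed_size(text: str) -> int:
    """Bytes zlib needs at maximum compression; a rough upper bound on description length."""
    return len(zlib.compress(text.encode("ascii"), 9))


N_DIGITS = 2000

# A "round" number: a one followed by zeros, i.e. 10**(N_DIGITS - 1).
round_digits = "1" + "0" * (N_DIGITS - 1)

# Digits drawn from the operating system's entropy source, not from a short program.
random_digits = "".join(str(secrets.randbelow(10)) for _ in range(N_DIGITS))

round_size = compressed_size(round_digits)
random_size = compressed_size(random_digits)

print(round_size, random_size)
```

The round number shrinks to a handful of bytes, while the random digits stay close to their information content. Saying "I got them randomly" is short, but the digits themselves still have to come from somewhere and be paid for.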

Comment author: 20 July 2011 11:05:04PM 5 points [-]

I don't think it really matters at that point. I would not treat the situation differently if the mugger said "3^^^3" or if he explicitly stated some number "34084549...843".

I would pay \$5 to not have to listen to the mugger explicitly state a number that long.

Comment author: 20 July 2011 11:34:43PM 1 point [-]

I once offered a similar deal to a tuba player on a subway platform.

Comment author: 21 July 2011 08:35:24AM 1 point [-]

The result of certain decision theories is to say that. That is a problem with those theories. That it is a problem, is the gist of the posting you cited.

Comment author: 20 July 2011 10:41:10PM 0 points [-]

I wonder how many \$5 transfers I would get if I actually tried this sort of mugging on LessWrong. Physical proximity isn't required, after all. Would you, or anyone, actually Paypal me \$5 if I made the zillion-units threat? Or is this a case of intellectual acceptance, emotional reluctance?

Comment author: 22 July 2011 03:39:08AM 1 point [-]

Please don't do this. We don't want to drive people who take their beliefs seriously away from the site.

Comment author: 22 July 2011 06:38:33AM 4 points [-]

Please don't do this. We don't want to drive people who take their beliefs seriously away from the site.

I don't mind if he does. It will encourage those who have silly beliefs to think them through a bit more clearly.

Mind you I would advise against making the threat. Because the rational response to threats is not necessarily compliance.

Comment author: 23 July 2011 01:31:56AM *  1 point [-]

You think that would drive people away from the site? You and wedrifid seem to take this astonishingly seriously. I thought it was clear I was merely musing about whether or not people really accepted Alexei's conclusion. The mugging's already been done by someone else, apparently, in any case; and I wasn't saying I'd do it, only that I wondered if people (Alexei) really believed paying up was the most rational response. See his edit, which makes it a moot point.

Comment author: 23 July 2011 07:07:31AM 3 points [-]

You and wedrifid seem to take this astonishingly seriously.

I think your 'seriousness' evaluator is somewhat broken. It is generally frowned upon to make this kind of thing explicit, even when hypothetical, but try to imagine the kind of actions I would take if I thought you actually represented a zillion-unit threat. Hint: they do not include blog comments.

Comment author: 24 July 2011 04:21:50AM *  0 points [-]

I don't understand the first sentence. Are you saying you were just being facetious in your advice?

Unless you were, I think my seriousness evaluation is just fine. Your responses may not be serious compared to, say, an actual zillion-unit threat-response, but I am surprised that you'd bring up the possibility of the latter at all. I understood your advisory quite well; what was somewhat astonishing was that you apparently felt someone on LessWrong might take a fanciful version of a fanciful thought experiment seriously enough to engage in a "non-blog-comment rational response"! (!)

It was also surprising to see that endoself felt people would be bothered enough by a Pascal's Mugging to leave the site. These on top of the fact that I had no intention of actually posting a mugging, and meant my post to be a mere musing ("I wonder if ... Would anyone") on intellectual vs actual acceptance.

It is I who do not take this seriously enough, it appears! Though I myself have no intent to actually attempt said mugging, as stated before, I will point out that according to CarlShulman, user TimFreeman has already done so, and no shitstorm ensued... AFAIK.

Comment author: 20 July 2011 10:57:28PM 1 point [-]

None, the last time this was done by TimFreeman. And there's no plausible set of assumptions under which paying the \$5 is better than alternative uses of the money. See the comment linked to above.

Comment author: 21 July 2011 08:53:24PM *  0 points [-]

Ah, someone beat me to it, I see. Not a single transfer, eh?

I agree re: the advisability of paying up; Alexei's comment led me to believe he thought the opposite, but I see from his edit he agrees too.

Comment author: 21 July 2011 12:53:54AM *  0 points [-]

Could you explain your position a bit more?

ETA: Ah, RichardKennaway expressed his position on Pascal's Mugging here.

Comment author: 20 July 2011 07:19:37PM *  1 point [-]

I don't know if anyone knows exactly what Bob is doing, but at a stab, he's seeing how many unpleasant feelings get generated by imagining the crime, then proposing a jail sentence that activates about an equal amount of unpleasant feelings.

See the outrage heuristic, Kahneman & Frederick (2002) (pdf).

Comment author: 20 July 2011 01:50:54PM 1 point [-]

...except that the Dutch book itself assumes consistency. If I believe that there is a 66% chance of it landing on heads, but refuse to take a bet at 2:1 odds - or even at 1.5:1 odds even though I should think it's easy money! - then I can't be Dutch booked. I am literally too stupid to be tricked effectively. You would think this wouldn't happen too often, since people would need to construct an accurate mental model to know when they should refuse such a bet, and such an accurate model would tell them they should revise their probabilities - but time after time people have demonstrated the ability to do exactly that.

What? This paragraph just seems broken. If they refuse to take a bet at 1.5:1 odds, they either have an injunction against gambling or don't actually believe there's a 66% chance, which is the entire point of the belief in belief article.

Overall, I find this article pretty weak. Is the point that reflective equilibrium is what seeking consistency looks like in humans? Then why try and knock down consistency, since if you don't seek consistency you have no reason to seek reflective equilibrium?

Comment author: 21 July 2011 11:22:13AM *  1 point [-]

What? This paragraph just seems broken. If they refuse to take a bet at 1.5:1 odds, they either have an injunction against gambling or don't actually believe there's a 66% chance, which is the entire point of the belief in belief article.

People may simultaneously have contradicting beliefs. The mind is not one unified entity: one part of it can believe in X, while another believes in not-X. Refusing the bet may simply mean that the part of them which is in control of behavior at that particular moment doesn't believe there's a 66% chance. It doesn't mean that some other part of them might not genuinely believe there's a 66% chance, and that part may be in control in other situations.

Comment author: 20 July 2011 03:08:42PM 1 point [-]

They have an injunction against explicit gambling, which is not a bad idea when you're inconsistent. And their injunction isn't always explicit, either.

Comment author: 20 July 2011 06:30:36PM 2 points [-]

I wonder if that's true. Next time I'm in that situation, I'm going to offer up a few dollars on a bet that is genuinely and obviously good for the other person just to see if they're smart enough to take it.

Comment author: 20 October 2011 06:08:40PM 1 point [-]

I finally had this come up naturally. I offered a coin flip, my \$15 against his \$10. He declined. Then I offered two coinflips - if both were heads, I got his \$10, otherwise he got my \$15. He declined. He has an explicit injunction against gambling. In the second case, he said "Well that I would accept" and I asked "Okay, do you accept it? Because I'm offering" and he said "...no, I'd feel bad if I took your \$15, or if I lost my \$10". (paraphrased)
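For reference, the arithmetic behind the two offers, seen from the side of the person being offered the bet. This sketch assumes a fair coin and utility linear in dollars; `expected_value` is a helper named here for illustration.

```python
from fractions import Fraction


def expected_value(p_win: Fraction, win_amount: int, loss_amount: int) -> Fraction:
    """Expected dollar gain of a bet that wins win_amount with probability p_win."""
    return p_win * win_amount - (1 - p_win) * loss_amount


HALF = Fraction(1, 2)

# Offer 1: one fair flip; he wins $15 or loses $10.
single_flip = expected_value(HALF, 15, 10)

# Offer 2: he loses $10 only if both flips are heads (probability 1/4), else wins $15.
p_win_two_flips = 1 - HALF * HALF
two_flips = expected_value(p_win_two_flips, 15, 10)

print(float(single_flip), float(two_flips))  # 2.5 8.75
```

Both offers have positive expected value for him under these assumptions, and the second is the more favorable of the two.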

Comment author: 20 October 2011 07:38:48PM 1 point [-]

Thanks for reminding me!

I found a person who claimed an injunction against it as well, but I decided to put it off to see if I could get him to make the bet when he wasn't self-primed with his injunction against gambling.

He said "I don't like gambling", and then claimed nonlinear utility at bets risking \$10, but he accepted \$1.50 risking \$1.

I won :)

Comment author: 20 July 2011 06:45:57PM 0 points [-]

Good point, I should downvote my post for claiming too much generality. I'll do the same and report. :D

Comment author: 20 July 2011 02:07:43PM 1 point [-]

They don't have a single consistent belief it's 0.66. From each decision they make you can infer a belief, but you'll soon notice it's not consistent, though it may be stable within some conditions. Maybe they always act like it's 0.66 when it's the first bet, but like it's 0.33 when they're offered a second one.
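One way to picture "inferring a belief from each decision" is to turn every accepted or declined bet into a bound on the probability, then check whether the bounds can all hold at once. The sketch below assumes a risk-neutral bettor, and the two example decisions are hypothetical ones chosen to mirror the 0.66-then-0.33 pattern described above; the function names are made up for this illustration.

```python
from fractions import Fraction


def break_even(stake: int, payout: int) -> Fraction:
    """Probability of winning at which risking `stake` to win `payout` has zero expected value."""
    return Fraction(stake, stake + payout)


def implied_interval(decisions):
    """Intersect the probability bounds implied by each (stake, payout, accepted) decision."""
    lower, upper = Fraction(0), Fraction(1)
    for stake, payout, accepted in decisions:
        p = break_even(stake, payout)
        if accepted:
            lower = max(lower, p)   # accepting implies belief >= break-even
        else:
            upper = min(upper, p)   # declining implies belief <= break-even
    return lower, upper


# Hypothetical behaviour: accepts risking $2 to win $1 on heads, later declines risking $1 to win $2.
decisions = [(2, 1, True), (1, 2, False)]
lower, upper = implied_interval(decisions)
consistent = lower <= upper

print(lower, upper, consistent)  # 2/3 1/3 False
```

An empty interval (lower bound above upper bound) means no single probability explains both choices, even though each choice taken alone looks like a stable belief.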