
Counterfactual Mugging Alternative

-1 wafflepudding 06 June 2016 06:53AM

Edit as of June 13th, 2016: I no longer believe this to be easier to understand than traditional CM, but stand by the rest of it. Minor aesthetic edits made.

First post on the LW discussion board. Not sure if something like this has already been written, need your feedback to let me know if I’m doing something wrong or breaking useful conventions.

An alternative to the counterfactual mugging, since people often require it explained a few times before they understand it -- this one I think will be faster for most to comprehend because it arose organically, not seeming specifically contrived to create a dilemma between decision theories:

Pretend you live in a world where time travel exists and Time can create realities with acausal loops, with ordinary linear chronology, or with some other structure, so long as there is no paradox -- only self-consistent timelines can be generated.

In your timeline, there are prophets. A prophet (known to you to be honest and truly prophetic) tells you that you will commit an act which seems horrendously imprudent or problematic. It is an act whose effect will be on the scale of losing $10,000; an act you never would have taken ordinarily. But fight the prophecy all you want, it is self-fulfilling and you definitely live in a timeline where the act gets committed. However, if it weren't for the prophecy being immutably correct, you could have spent $100 and, even having heard the prophecy (even having believed it to be immutable), the probability of your taking that action would be reduced by, say, 50%. So fighting the prophecy by spending $100 would mean that there were 50% fewer self-consistent (possible) worlds where you lost the $10,000, because it's just much less likely that you end up taking that action if you fight it rather than succumbing to it.

You may feel that there would be no reason to spend $100 averting a decision that you know you're going to make, and see no reason to care about counterfactual worlds where you don't lose the $10,000. But the fact of the matter is that if you could have precommitted to fighting the choice, you would have: across the worlds where that prophecy could have been presented to you, precommitting decreases the average disutility by ($10,000)(0.5 probability) - $100 = $4,900. Not following a precommitment that you would have made to prevent the exact situation you're now in, merely because you wouldn't have followed the precommitment, seems an obvious failure mode; UDT successfully does the calculation shown above and tells you to fight the prophecy. The simple fact that should tell causal decision theorists that converting to UDT is the causally optimal decision is that updateless decision theorists actually do better on average than CDT proponents.
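Here is a minimal sketch of that precommitment calculation, using the post's illustrative numbers ($10,000 loss, $100 fighting cost, 50% reduction); the variable names are mine:

```python
# Hypothetical numbers from the post: fighting the prophecy costs $100 and
# halves the probability (over self-consistent timelines) of a $10,000 loss.
LOSS = 10_000
FIGHT_COST = 100
P_LOSS_IF_PASSIVE = 1.0   # you succumb: the act definitely happens
P_LOSS_IF_FIGHTING = 0.5  # fighting halves the chance of a losing timeline

def expected_loss(fight: bool) -> float:
    """Expected dollars lost, averaged over the self-consistent timelines."""
    if fight:
        return FIGHT_COST + P_LOSS_IF_FIGHTING * LOSS
    return P_LOSS_IF_PASSIVE * LOSS

print(expected_loss(fight=False))                  # 10000.0
print(expected_loss(fight=True))                   # 5100.0
print(expected_loss(False) - expected_loss(True))  # 4900.0, the $4,900 above
```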

 

(You may assume also that your timeline is the only timeline that exists, so as not to further complicate the problem by your degree of empathy with your selves from other existing timelines.)

Counterfactual self-defense

0 MrMind 23 November 2012 10:15AM

Let's imagine the following dialogues between Omega and an agent implementing TDT. The usual standard assumptions about Omega apply: the agent knows Omega is real, trustworthy and reliable, Omega knows that the agent knows that, the agent knows that Omega knows that the agent knows, etc. (that is, Omega's trustworthiness is common knowledge, à la Aumann).

Dialogue 1.

Omega: "Would you accept a bet where I pay you 1000$ if a fair coin flip comes out tail and you pay me 100$ if it comes out head?"
TDT: "Sure I would."
Omega: "I flipped the coin. It came out head."
TDT: "Doh! Here's your 100$."

I hope there's no controversy here.

Dialogue 2.

Omega: "I flipped a fair coin and it came out head."
TDT: "Yes...?"
Omega: "Would you accept a bet where I pay you 1000$ if the coin flip came out tail and you pay me 100$ if it came out head?"
TDT: "No way!"

I also hope no controversy arises here: if the agent answered yes, there would be no reason for it not to accept all kinds of losing bets conditioned on information it already knows.

The two bets are identical, but the information is presented in a different order: in the second dialogue, the agent has time to update its knowledge about the world, and it should not accept bets that it already knows are losing.

But then...

Dialogue 3.

Omega: "I flipped a coin and it came out head. I offer you a bet where I pay you 1000$ if the coin flip comes out tail, but only if you agree to pay me 100$ if the coin flip comes out head."
TDT: "...?"

In the original counterfactual mugging discussion, the answer of the TDT-implementing agent apparently should have been yes, but I'm not entirely clear on what the difference is between the second and the third case.

Thinking about it, it seems that the case is muddled because the outcome and the bet are presented at the same time. On one hand, it appears correct to think that an agent should act exactly as it would have if it had precommitted; on the other hand, an agent should not ignore any information it is presented with (that's a basic requirement of treating probability as extended logic).

So here's a principle I would like to call 'counterfactual self-defense': whenever information and bets are presented to the agent at the same time, it always first conditions its priors on the information and only then examines whatever bets have been offered. This should prevent Omega from offering counterfactual losing bets, but not counterfactual winning ones.
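Here is a minimal toy sketch of the principle, using the 1000$/100$ stakes from the dialogues; the formalization is my own, not anything from the original counterfactual mugging discussion:

```python
# Toy illustration of "counterfactual self-defense": update on the revealed
# information first, then compute the bet's expected value.
def bet_value(p_tail: float, win: float = 1000, lose: float = 100) -> float:
    """Expected value of the bet given the agent's current P(tail)."""
    return p_tail * win - (1 - p_tail) * lose

# Dialogue 1: the coin has not been flipped yet, so P(tail) = 0.5.
print(bet_value(p_tail=0.5))   # 450.0 -> accept

# Dialogues 2 and 3: Omega has already said the coin came out heads.
# Conditioning first drives P(tail) to ~0, so the bet is a sure loss.
print(bet_value(p_tail=0.0))   # -100.0 -> refuse
```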

Would this principle make an agent win more?

Thoughts on a possible solution to Pascal's Mugging

2 Dolores1984 01 August 2012 12:32PM

For those who aren't familiar, Pascal's Mugging is a simple thought experiment that seems to demonstrate an intuitive flaw in naive expected utility maximization.  In the classic version, someone walks up to you on the street, and says, 'Hi, I'm an entity outside your current model of the universe with essentially unlimited capabilities.  If you don't give me five dollars, I'm going to use my powers to create 3^^^^3 people, and then torture them to death.'  (For those not familiar with Knuth up-arrow notation, see here).  The idea being that however small your probability is that the person is telling the truth, they can simply state a number that's grossly larger -  and when you shut up and multiply, expected utility calculations say you should give them the five dollars, along with pretty much anything else they ask for.  
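As a rough illustration of that "shut up and multiply" step (3^^^^3 is far too large to represent directly, so the sketch below substitutes 10^100, and the one-in-10^20 credence is an arbitrary stand-in of mine):

```python
# Naive expected-utility comparison for the mugging, with 10**100 standing in
# for 3^^^^3 (which no computer can actually represent).
p_mugger_truthful = 1e-20          # absurdly generous skepticism
people_tortured = 10**100          # stand-in for 3^^^^3
cost_of_paying = 5                 # dollars, treated as utils for simplicity

eu_refuse = -p_mugger_truthful * people_tortured   # expected torture disutility
eu_pay = -cost_of_paying

print(eu_refuse < eu_pay)  # True: the naive calculation says hand over the $5
```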

Intuitively, this is nonsense.  However, an AI under construction doesn't have a piece of code that lights up when exposed to nonsense.  Not unless we program one in.  And formalizing why, exactly, we shouldn't listen to the mugger is not as trivial as it sounds.  The actual underlying problem has to do with how we handle arbitrarily small probabilities.  There are a number of variations you could construct on the original problem that present the same paradoxical results.  There are also a number of simple hacks you could undertake that produce the correct results in this particular case, but these are worrying (not to mention unsatisfying) for a number of reasons.

So, with the background out of the way, let's move on to a potential approach to solving the problem which occurred to me about fifteen minutes ago while I was lying in bed with a bad case of insomnia at about five in the morning.  If it winds up being incoherent, I blame sleep deprivation.  If not, I take full credit.   

 

Let's take a look at a new thought experiment.  Let's say someone comes up to you and tells you that they have magic powers, and will make a magic pony fall out of the sky.  Let's say that, through some bizarrely specific priors, you decide that the probability that they're telling the truth (and, therefore, the probability that a magic pony is about to fall from the sky) is exactly 1/2^100.  That's all well and good.

Now, let's say that later that day, someone comes up to you, and hands you a fair quarter and says that if you flip it one hundred times, the probability that you'll get a straight run of heads is 1/2^100.  You agree with them, chat about math for a bit, and then leave with their quarter.  

I propose that the probability value in the second case, while superficially identical to the probability value in the first case, represents a fundamentally different kind of claim about reality than the first case.  In the first case, you believe, overwhelmingly, that a magic pony will not fall from the sky.  You believe, overwhelmingly, that the probability (in underlying reality, divorced from the map and its limitations) is zero.  It is only grudgingly that you inch even a tiny morsel of probability into the other hypothesis (that the universe is structured in such a way as to make the probability non-zero).  

In the second case, you also believe, overwhelmingly, that you will not see the event in question (a run of heads).  However, you don't believe that the probability is zero.  You believe it's 1/2^100.  You believe that, through only the lawful operation of the universe that actually exists, you could be surprised, even if it's not likely.  You believe that if you ran the experiment in question enough times, you would probably, eventually, see a run of one hundred heads.  This is not true for the first case.  No matter how many times somebody pulls the pony trick, a rational agent is never going to get their hopes up.      

 

I would like, at this point, to talk about the notion of metaconfidence. When we talk to the crazy pony man, and to the woman with the coin, what we leave with are two identical numerical probabilities. However, those numbers do not represent the sum total of the information at our disposal. In the two cases, we have differing levels of confidence in our levels of confidence. And, furthermore, this difference has actual ramifications for what a rational agent should expect to observe. In other words, even from a very conservative perspective, metaconfidence intervals pay rent. By treating the two probabilities as identical, we are needlessly throwing away information. I'm honestly not sure if this topic has been discussed before; I am not up to date on the literature on the subject. If it has already been thoroughly discussed, I apologize for the waste of time.

Disclaimer aside, I'd like to propose that we push this a step further, and say that metaconfidence should play a role in how we calculate expected utility.  If we have a very small probability of a large payoff (positive or negative), we should behave differently when metaconfidence is high than when it is low.          

From a very superficial analysis, lying in bed, metaconfidence appears to be directional. A low metaconfidence, in the case of the pony claim, should not increase the probability that the probability of a pony dropping out of the sky is HIGHER than our initial estimate. It works the other way as well: if we have a very high degree of confidence in some event (the sun rising tomorrow), and we get some very suspect evidence to the contrary (an ancient civilization predicting the end of the world tonight), and we update our probability downward slightly, our low metaconfidence should not make us believe that the sun is less likely to rise tomorrow than we thought. Low metaconfidence should move our effective probability estimate against the direction of the evidence we have low confidence in: the pony is less likely, and the sunrise is more likely, than a naive probability estimate would suggest.

So, a claim like the pony claim (or Pascal's mugging), where we have a very low estimated probability and a very low metaconfidence, should be treated as dramatically less likely to actually happen, in the real world, than a case in which we have a low estimated probability but a very high confidence in that probability. See the pony versus the coins. Rationally, we can only mathematically justify so low a confidence in the crazy pony man's claims. However, in the territory, you can add enough coins that the two probabilities are mathematically equal, and you are still more likely to get a run of heads than you are to have a pony magically drop out of the sky. I am proposing metaconfidence weighting as a way to get around this issue, and to allow our map to more accurately reflect the underlying territory. It's not perfect, since metaconfidence is still, ultimately, calculated from our map of the territory, but it seems to me, based on my extremely brief analysis, that it is at least an improvement on the current model.
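One possible toy formalization of this directional weighting (my own sketch of the proposal, not something worked out in the post): shrink any low-metaconfidence update back toward the prior that held before the suspect evidence arrived.

```python
def effective_probability(prior: float, naive_estimate: float,
                          metaconfidence: float) -> float:
    """Shrink a low-confidence update back toward the prior.

    metaconfidence = 1.0 means we fully trust the evidence behind the
    estimate; metaconfidence = 0.0 means the update is discarded entirely.
    """
    return prior + metaconfidence * (naive_estimate - prior)

# Coin case: the 2**-100 figure comes from well-understood physics,
# so metaconfidence is essentially 1 and the estimate is untouched.
print(effective_probability(prior=2**-100, naive_estimate=2**-100,
                            metaconfidence=1.0))

# Pony case: the same numeric estimate, but driven almost entirely by a
# wild claim, so the effective probability collapses toward the prior of ~0.
print(effective_probability(prior=0.0, naive_estimate=2**-100,
                            metaconfidence=0.01))

# Sunrise case: suspect doomsday evidence nudged our estimate down from ~1;
# low metaconfidence pulls it back up toward the prior.
print(effective_probability(prior=0.999999, naive_estimate=0.99,
                            metaconfidence=0.05))
```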

Essentially, this idea is based on the understanding that the numbers we generate and call probability do not, in fact, correspond to the actual rules of the territory. They are approximations, they are perturbed by observation, and our finite data set limits the resolution of the probability intervals we can draw. This causes systematic distortions at the extreme ends of the probability spectrum, and especially at the small end, where the scale of the distortion rises dramatically as a function of the actual probability. I believe that the apparently absurd behavior demonstrated by an expected-utility agent exposed to Pascal's mugging is a result of these distortions. I am proposing we attempt to compensate by filling in the missing information at the extreme ends of the bell curve with data from our model about our sources of evidence, and about the underlying nature of the territory. In other words, this is simply a way to use our available evidence more efficiently, and I suspect that, in practice, it eliminates many of the Pascal's-mugging-style problems we currently encounter.

I apologize for not having worked the math out completely.  I would like to reiterate that it is six thirty in the morning, and I've only been thinking about the subject for about a hundred minutes.  That said, I'm not likely to get any sleep either way, so I thought I'd jot the idea down and see what you folks thought.  Having outside eyes is very helpful, when you've just had a Brilliant New Idea.  

Dealing with the horrible strategy

3 Manfred 11 July 2011 05:16AM

So occasionally this idea comes up that unethical AIs could have leverage over people who think too deeply about the subject - like a balrog ambushing the dwarves and forcing them to build an unfriendly AI.  In this post I attempt to show that this leverage does not work.

Basically, it's a similar problem to Pascal's mugging, except the mugger doesn't even have to exist. All unfriendly AIs that run on something like updateless/timeless/dispositional decision theory - that is, choosing between winning strategies, not individual steps - might attempt to mug current humans by promising immense future rewards or penalties. The idea is that if you know a possible AI will do horrible things to friendly AI supporters if it's built, you're less likely to support friendly AI, thus making doing horrible things to you a winning (and horrible) strategy. The flip side of the coin would be to do really nice things to people who supported your creation, thus increasing the chances of your creation - but if people are smart enough to predict that an AI will do this, they would be building it right anyway, so the benefit to the AI from this strategy is probably negligible.

So let's say the chance of someone else building an AI that has the right decision theory and executes the horrible strategy is about one in 10^15. Let's suppose that the AI can cause you 3^^^3 units of negative utility, somehow. Well then you should regard supporting any AI that probably won't have the right decision theory to make the horrible promise as costing essentially 3^^^3 utils - a factor of 10^-15 is a drop in the Atlantic! So don't do it! Stop sending money to SIAI immediately! If your name is Eliezer Yudkowsky, it is too late for you already and you should take up knitting!
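For concreteness, here is the naive version of that calculation, with 10^50 standing in for 3^^^3 (both numbers are placeholders, not anything an actual AI would compute):

```python
# Naive reading of the horrible strategy: even a one-in-10**15 chance of a
# 3^^^3-sized penalty (here a 10**50 stand-in) swamps everything else.
p_horrible_ai = 1e-15
penalty = 10**50            # placeholder for 3^^^3 units of negative utility

expected_cost_of_support = p_horrible_ai * penalty
print(expected_cost_of_support)   # ~1e35: the "drop in the Atlantic" factor
                                  # barely dents the absurd total
```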

Okay, so how do we deal with this possibility?  Why haven't I already closed this browser window?  It turns out that the horrible strategy may not be so great for future AIs after all - we have 4 defenses.

  1. Refusal to be blackmailed is a defense against blackmail.  Mostly this is human evolved awesomeness.  Our brains instinctively do the sorts of things that make the horrible strategy less good.  We don't like blackmail, and knowing that we don't like blackmail, future AIs are less likely to think blackmailing us is a cost-effective strategy.  Go evolution!  You can consciously reinforce this too - knowably being immune to blackmail is a winning strategy, and you should choose winning strategies.
  2. If 3^^^3 is such a big amount of utility, why can't a friendly AI just give us that, totally cancelling the enemy term in the calculation?  Sure, it's harder to imagine pleasure than pain - and imagining wireheading and then going "I wouldn't want that" is a failure to imagine 3^^^3 utility correctly - but hey, I can't imagine 3^^^3 of anything anyhow, so why let our imaginations stop us?  (Well, maybe because it's impossible; see (3).)
  3. Standard Pascal's mugging defenses - the likelihood of bounded utility functions being the biggest.  The worst an AI can do to you doesn't really seem about 3^^^3 times worse than turning the universe, your body included, into a grid of molecular happy faces.  We don't have a little counter in our heads that can be decremented endlessly.  When utilities are bounded, past a certain point you can no longer overcome improbability by planning worse dooms - the more probable friendly AI will likely win out in a calculation of this sort, even ignoring (1).  (A sketch of this appears after the list.)
  4. The 4th defense is the scary defense.  If all else can be predicted to fail for some unknown set of reasons, a "friendly" AI running the same decision theory might do horrible things to people who are influenced by the horrible strategy.  Probably unnecessary anyhow.
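Here is the sketch promised in (3), with entirely made-up magnitudes: once utility is bounded, promising ever-worse dooms can no longer compensate for ever-smaller probabilities.

```python
# Toy version of the bounded-utility defense (3). All magnitudes are made up.
UTILITY_FLOOR = -1e12   # the worst outcome we can actually register

def bounded(u: float) -> float:
    """Clamp raw (dis)utility to the bounded scale."""
    return max(u, UTILITY_FLOOR)

p_horrible_ai = 1e-15
threatened_doom = -10**50          # stand-in for "3^^^3 units of negative utility"
p_friendly_ai = 1e-3               # more probable by assumption
friendly_payoff = 1e9              # a merely very good outcome

eu_threat = p_horrible_ai * bounded(threatened_doom)   # -0.001 after clamping
eu_friendly = p_friendly_ai * friendly_payoff          # +1e6

print(eu_threat)                     # tiny once the doom is clamped
print(eu_friendly > abs(eu_threat))  # True: the friendly calculation wins out
```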

I think the outlook is quite positive, really.  These factors mean that when it comes to what actions you take, the horrible strategy should not matter.  Lastly, to any unfriendly AIs reading this: do your worst!  It's probably much less than 3^^^3 utils!

 

-

 

Edited to make my conclusion more obvious.

Pascal's Mugging - Penalizing the prior probability?

8 XiXiDu 17 May 2011 02:44PM

Eliezer Yudkowsky wrote that Robin Hanson solved the Pascal's mugging thought experiment:

Robin Hanson has suggested penalizing the prior probability of hypotheses which argue that we are in a surprisingly unique position to affect large numbers of other people who cannot symmetrically affect us. Since only one in 3^^^^3 people can be in a unique position to ordain the existence of at least 3^^^^3 other people who are not symmetrically in such a situation themselves, the prior probability would be penalized by a factor on the same order as the utility.

I don't quite get it, is there a post that discusses this solution in more detail?

To be more specific: suppose a stranger approached me offering a deal, saying, "I am the creator of the Matrix. If you fall on your knees, praise me and kiss my feet, I'll use my magic powers from outside the Matrix to run a Turing machine that simulates 3^^^^3 copies of you having their coherent extrapolated volition satisfied maximally for 3^^^^3 years." Why exactly would I penalize this offer by the number of copies being offered to be simulated? I thought the whole point was that the utility of having 3^^^^3 copies of myself experiencing maximal happiness does outweigh the low probability of it actually happening and the disutility of doing what the stranger asks for?
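For what it's worth, here is how I read the quoted suggestion as a toy calculation (the numbers are mine, and whether this reading is right is exactly what I'm asking): the prior is divided by something on the order of the number of people you would uniquely affect, so it shrinks at roughly the same rate the utility grows.

```python
# Toy reading of the Hanson-style penalty: the prior on "I am in a unique
# position to affect N people" is itself penalized by ~1/N, so the expected
# utility of the mugger's offer stops growing with N.
def expected_utility(n_copies: float, base_prior: float = 1e-10) -> float:
    penalized_prior = base_prior / n_copies   # only ~1/N agents can be so placed
    return penalized_prior * n_copies         # utility scales with N

for n in (10**6, 10**20, 10**100):            # stand-ins for 3^^^^3
    print(expected_utility(n))                # stays ~1e-10 regardless of N
```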

I would love to see this problem being discussed again and read about the current state of knowledge.

I am especially interested in the following questions:

  • Is the Pascal's mugging thought experiment a "reduction to the absurd" of Bayes' Theorem in combination with the expected utility formula and Solomonoff induction? [1]
  • Could the "mugger" be our own imagination? [2]
  • At what point does an expected utility calculation come to resemble a Pascal's mugging scenario, such that it should consequently be ignored? [3]

[1] If you calculate the expected utility of various outcomes, you imagine impossible alternative actions. The alternatives are impossible because you already precommitted to choosing the outcome with the largest expected utility. Problems: 1.) You swap your complex values for a certain terminal goal with the highest expected utility; indeed your instrumental and terminal goals converge to become the expected utility formula. 2.) Your decision-making is eventually dominated by extremely small probabilities of obtaining vast utility.

[2] Insignificant inferences might exhibit hyperbolic growth in utility: 1.) There is no minimum amount of empirical evidence necessary to extrapolate the expected utility of an outcome. 2.) The extrapolation of counterfactual alternatives is unbounded, logical implications can reach out indefinitely without ever requiring new empirical evidence.

[3] Extrapolations work, and often they are the best we can do. But since there are problems like Pascal's Mugging, which we perceive to be undesirable and which lead to an infinite hunt for ever larger expected utility, I think it is reasonable to ask for some upper and lower bounds regarding the use and scope of certain heuristics. We agree that we are not going to stop pursuing whatever terminal goal we have chosen just because someone promises us even more utility if we do what that agent wants. We might also agree that we are not going to stop loving our girlfriend just because there are many people who do not approve of our relationship and who together would experience more happiness if we divorced than the combined happiness of us and our girlfriend being married. So we have already informally established some upper and lower bounds. But when do we start to take our heuristics seriously and do whatever they prove to be the optimal decision?

Pascal's Gift

7 Bongo 25 December 2010 07:42PM

 If Omega offered to give you 2^n utils with probability 1/n, what n would you choose?

This problem was invented by Armok from #lesswrong. Discuss.
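A quick sketch of why the question bites (my framing, not part of the original problem statement): the naive expected utility 2^n / n grows without bound while the probability of receiving anything shrinks toward zero.

```python
# Expected utils of Omega's offer as a function of n: (2**n) / n.
for n in (1, 2, 10, 50, 100):
    p_win = 1 / n
    expected = (2**n) * p_win
    print(f"n={n:>3}  P(win)={p_win:.3f}  E[utils]={expected:.3e}")
# The expectation explodes while the chance of getting anything at all
# shrinks, so a naive expected-utility maximizer picks n as large as possible.
```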