I have not seen any place to discuss Eliezer Yudkowsky's new paper, titled Timeless Decision Theory, so I decided to create a discussion post. (Have I missed an already existing post or discussion?)
Should some person invent a time machine, this section of my essay will need to be revised.
should read
Should some person invent a time machine, this section of my essay wioll haven be revised.
Have I missed an already existing post or discussion?
A prior discussion of the TDT paper occurred in this open thread at the end of Sep 2010.
Is this different from the one that was posted before? (Anyone have the open thread link for when that was discussed?) I have this one as being two pages longer, which is accounted for by the new one giving a full reference list.
Anyway, I read about 90% of the one posted last ~October and found it very interesting, especially the comparison between the cognitive blind spots of a causal vs. alphabetical decision theorist.
Not much to add, but I'd be glad to discuss issues others have brought up.
I finally started reading this paper.
I don't like chapters 8 and 9, in which Eliezer seems to be writing as though he thinks that the reader is an idiot who needs to be begged to give a new idea a chance. As far as I can tell, decision theory is supposed to be math, and mathematicians are supposed to be more agreeable to that sort of thing - there's nothing wrong with defining a "new" system and exploring its consequences, even if that system seems "absurd". (For example, paraconsistent logics.)
Is this a newer version of this paper anywhere? Is it officially still a work in progress? The SIAI blog post announcing the paper said that it had been "completed", but it ends rather abruptly, a few citations are missing, and there's a placeholder for a figure on page 69.
(I just redownloaded it from the URL they currently give, and while I got a copy that apparently fixed a few typos, there were no substantive changes.)
Edit: Oh, and the abstract says that it'll show that TDT formalizes superrationality and can sometimes get C/C on one-shot Prisoner's Dilemmas, but it never gets to that. So yeah, I'll assume it's a work in progress. Should be more clearly identified as such, though.
I tend to write stuff that gets voted down to oblivion. If you are voting this down, could the first few of you comment why? I'd appreciate learning how I don't fit in in case this is another of those posts that doesn't. Thanks in advance.
I extend Martin Gardner's idea. From a causal perspective, it would make no difference if BOTH boxes were transparent. At 7 AM the alien puts down two transparent boxes in front of you, one with $1000 in it and one with either $1,000,000 or $0 in it. You can look in and see which he chose for you.
You have personally witnessed at least a thousand trials, and have heard from thousands of other people that you can question, many of whom you know well, that have each witnessed at least 1000 trials. What you have seen, and heard these others have seen, is that the alien has NEVER been wrong: EVERY time he put $1,000,000 in box B, the human took only box B, and EVERY time he put nothing in box B, the human took box A. Further, you have seen at least 20 cases where the Alien put $1,000,000 in box B and $1,000 in box A, and even though the boxes were fixed and transparent, you have seen the human choose only box B in all those cases. Further, you have been told by the other 1000 observers, some of whom you know quite well, and all of whom you have been able to question as you wish, that each of them has witnessed at least 20 cases where both boxes had money in them and the human chose only box B with the $1,000,000.
You are sitting there with two boxes in front of you and you can see $1,000,000 in box B and $1,000 in box A. What do you do?
"Rationalists always win." Clearly an important part of this problem is BELIEF in the existence of, even the possibility of, an intelligence which can have such predictive powers. Faced with two boxes with money in them, you have the opportunity to stick a fork in this theory once and for all, or you can join the religion of the Omnisicient Alien and pick just the $1,000,000.
Amazingly, even if there is no $1,000,000 in front of you, only $1,000, you have the same choice. As stated, the Alien has correctly predicted every time in the past that when the human chose box A, box B was always empty. For a mere $1,000 you can provide the incredibly valuable information that Aliens are not what they seem to be. You can choose only the empty box B and help free humanity free of the superstitious belief in Omnisicient Aliens.
Can it be rational to pick only box B? In this case I maintain, you are not a rationalist, but a convert to the religion of the omniscient Alien. A rationalist can see with no doubt that he would be $1000 richer taking both boxes, AND he could put a hole in this deification process of the alien.
In my opiinion, this version of the problem exposes the likelihood that there is trickery going on. For 100s or 1000s of shows in a row, Siegfried and Roy made a tiger appear from thin air in a thin and isolated cage suspended in empty space above an empty stage. What are the chances that this was really a case of teleportation or spontaneous generation, and not just a trick where a regular tiger was moved some physically understandable way? And yet if you were priveleged to see this trick performed 100 times, most of you would be amazed every time.
What are the chances that there is not some trickery with this alien? Perhaps this alien created AIs which would choose as he knew they would, clothed them in human flesh, and pre-planted them over decades before coming to earth to create the appearance of being able to predict what humans would do. Is this explanation of what we are seeing really LESS likely than that an alien really has this kind of predictive and computational ability? Even humans with a few $1000 can create illusions, I give you the close-up room at the Magic Castle in L.A. as a place where you can go to see the laws of physics violated with as little as $10 investment in props. Is it more likely that an alien could set up an elaborate "sting" on humanity over a hundred years, or that all humans really are predictable down to how they pick a random number, flip a coin, choose someone from the phone book to flip a coin, or read the least significant digit on a voltmeter attachd to a battery they picked at random from an object in their house?
Occams razor says the Alien is tricking you. Everything else is a believe in magic, a return to religion with God as Omniscient Aliens.
Actually, if a real-world analog to Newcomb's Problem ever came up in my real life, there's a not-insignificant chance that I would turn down the $1000 in the transparent box as well and just walk away -- that is, that I would zero-box -- under the general principle that if I don't trust the motives of the person setting up the game, I do better not to take any of the choices they are encouraging me to take, no matter how obvious the choices may seem. Maybe I've wandered into the next Batman movie and the box is poisoned or something.
Of course, if you insist on rejecting the setup to Newcomb's Problem rather than cooperating with it, you'll never get to see whether there's anything valuable being set up.
So, am I the only one perplexed by why people care about Newcomb's Problem? Like most paradoxes, the confusion is entirely due to posing the problem in a confusing way; clean things up, and then it becomes obvious. But it strikes me as difficult to get an explanation that's less than a page long out there and taken seriously.
In addition to what the others have said, the class of "Newcomblike problems" does map to real-world scenarios. I do agree that insufficient effort has been spent describing such situations though, which is why I'm compiling examples for a possible article. Here's a peek at what I have so far:
The decision of whether to shoplift is a real-life Newcomb's problem. It is easy to get away with, and your decision does not cause (in the technical sense) the opportunity to exist (the "box" to be "filled"). However, merchants only locate stores ("fill the box") where they predict people won't (in sufficient numbers) take this opportunity, and their accuracy is high enough for retailing to stay profitable in the aggregate.
Evolution and its "genetic" decision theories: You could just be selfish and not spend resources spreading your genes (thus like stiffing the rescuer in Parfit's Hitchhiker); however, you would not be in the position to make such a choice unless you were already selected for your propensity not to make such a choice. (My article on the matter.)
Hazing, akrasia, and abuse cycles (where being abused motivates one to abuse others) are real-life examples of Counterfactual Mugging, since your decision within a "losing branch" has implications for (symmetric) versions of yourself in other branches.
Expensive punishment. Should you punish a criminal when the cost to do so exceeds the value of all future crimes they could ever commit? If you don't, you save on the costs of administering the punishment, but if criminals expect that this is sufficient reason for you not to punish, they have no reason not to commit the crimes. The situation is parallel to that of whether you should pay ransoms or other extortioners. (This has a non-obvious connection to Newcomb's problem that may require explanation -- but I elaborate in the link.)
I think that Newcomb's Problem is a terrible central example to work off of, though. Most of those look like they can be instrumentalized by reputation far better, and then everyone gets the right answers.
I don't think that works. Purely causal calculations of the costs/benefits, even accounting for reputation, can't explain the winning answers in any of those cases except maybe expensive punishment. And even then, you can just use the harder version of the problem I gave in the linked discussion: what if, even accounting for future impact on the criminal and others who are deterred, the punishment still has a net cost?
Could you give me an idea of what you mean by e.g. a causal account of why:
It's true that people usually give the winning answers to these problems (compared to what is possible), and without using TDT/UDT / Drescher's decision theory. But that doesn't answer the problem of finding a rigorous grounding for why they should do so.
Could you give me an idea of what you mean by e.g. a causal account of why:
People that don't shoplift lose more identity by shoplifting than they gain in stolen product.
That means I disagree with the claim that each person is made strictly better off by their local decision to shoplift. If they were actually made better off, they would shoplift. Actions reveal preferences.
This is the issue I got into in the Parfitian filter article I wrote. (And later in some exchanges with Perplexed.)
Basically, the problem with your second paragraph is that actions do not uniquely determine preferences. (See in particular the a/b theory comparisons in the article.) There are an infinite number of preference sets -- not to mention preference/belief sets -- that can explain any given action. So, you have to use a few more constraints to explain behavior.
That, in turn, leads you to the question of whether an agent is pursuing a terminal value, or an instrumental value in the belief that it will satisfy a terminal value. And that's also what makes it hard to say in what sense a shoplifter makes himself better off -- does he satisfy a terminal value? Believe he's satisfying an instrumental value? Correctly or incorrectly?
However, I don't know of a concise way to point to the (purported) benefits of shoplifting.
So we're left with a number of hypotheses: it could be that people overestimate the risks of shoplifting. Or that they never consider it. Or that they have a more complex way of evaluating the benefits (which your "identity loss" approach is a good, insightful example of).
So, there's more to it than a simple action -> preference mapping, but likewise, there's more to the decision theory than these "local monetary gains". Regardless, we have a case where people are doing the equivalent of repeatedly playing Newcomb's problem and one-boxing "even though the box is already filled or not", and it would be interesting to look at the mechanisms at play in such a real-life situation.
Basically, the problem with your second paragraph is that actions do not uniquely determine preferences.
Why is this a problem? [edit] To be clearer, I get why actions do not uniquely determine preferences, but I don't yet get why I should care.
Sorry for the thread necromancy, but this has an easy answer: read the rest of my comment, after the part you quoted.
It's true that people usually give the winning answers to these problems ...
I think that the source of the communication problem is that some people use the word "winning" for a result that other people would not characterize as a "win".
It is also confusing that you ask for a causal account of something and then shift to a paragraph talking about normative theories and "winning". I suppose it is possible to give an evolutionary account which incorporates both the normative and the causal - is this what you are asking for? Are you asking for an argument that not shoplifting is an ESS? I don't think that one is possible in a society that eschews "punishment".
Lots of separate issues getting tangled up here. Let me try to clarify what I mean:
1) I meant a win in the sense that, in the aggregate, people's shoplifting decisions lead them to have opportunities that they would not have if they calculated the optimality of shoplifting as causal decision theorists. There is certainly a (corresponding, dual) sense in which people don't win -- specifically, the case where their recognition of certain rights is so lacking that they don't actually ever get the opportunity to shoplift -- or even buy -- certain goods in the first place. These are the stores and business models that don't exist in the first place and leave us in a Pareto-inferior position. (IP recognition, I'm looking in your general direction here.)
2) When I asked for a causal account above, what I meant was, "How do you explain, assuming everyone uses CDT, why most people don't shoplift, given the constraints I listed?" That is, what CDT-type reasoning tells you not to shoplift when it's trivial to get away with it?
3) I claim that it is possible -- in fact, necessary -- to give an evolutionary account of why people don't act purely as causal decision theorists (and it's not particularly important what you call the non-causal motivations behind their decisions), since people demonstrably differ from CDT. (My Parfitian filter article was an attempt, citing Drescher, to account for these non-causal, "moral" components of human reasoning through natural selection.)
4) However, I don't think the issue of ESSes and the issue of shoplifting are necessarily connected in the sense that you have to explain the (absence of) the latter as the former. However, I believe the opportunity to shoplift is a real-world example of Newcomb's problem, in which people (do the analogue of) one-box, even though it's certainly not because of TDT-type reasoning. This raises the question of why people use a decision theory that gives the same results as TDT would on a "contrived" problem.
How do you explain, assuming everyone uses CDT, why most people don't shoplift, given the constraints I listed?
But that is an absurd request for explanation, because you are demanding that two false statements be accepted as hypotheses:
- That shoplifting is risk-free.
- That everyone adheres to a particular normative decision theory.
As to the definition of "winning", I sense that there is still a failure to communicate here. Are you talking about winning individuals or winning societies? As I see it, given your unrealistic hypotheses, the winning strategy is to shoplift, but convince other people that they win by not shoplifting. The losing strategy seems to be the one you advocate - which is apparently to refrain from shoplifting, while encouraging others to shoplift by denying the efficacy of punishment.
But that is an absurd request for explanation, because you are demanding that two false statements be accepted as hypotheses:
- That shoplifting is risk-free.
- That everyone adheres to a particular normative decision theory.
No, I'm showing that they can't both be true. (Btw, what does "normative" add to your meaning here?) (1) is false, but easily close enough to truth for our purposes.
As I see it, given your unrealistic hypotheses, the winning strategy is to shoplift, but convince other people that they win by not shoplifting.
Hence the parallel to Newcomb's problem, where the "winning" strategy is to two-box, but convince Omega you'll one-box, and hence the tension between whether the "individual" or "society" perspective is correct here.
If you would deem it optimal to shoplift, worse stores are available in the first place, just as if you would deem it optimal to two-box, emptier boxes are available in the first place.
Something motivates people not to shoplift given present conditions, which is isomorphic to one-boxing "given" that Omega has left. So, I claim, it's a real life case of people consistently one-boxing. A world (or at least, community) in which people deem it more optimal to shoplift has different (and worse) opportunities than one in which people do not. Their decisions "in the moment" are not unrelated to what kind of community they are in the first place.
(1) is false, but easily close enough to truth for our purposes.
I suspect that this claim is at the heart of the dispute. I think that it is far from close enough to the truth.
The reason people don't shoplift is that they fear the consequences. There is no mystery to be explained. Except perhaps why people are sufficiently motivated by fear of being temporarily physically constrained by a store-owner or security guard and publicly shamed (typical punishment for a first offense).
Btw, what does "normative" add to your meaning here?
It serves to emphasize the type-error that I see in your request. You seem to be criticizing one normative theory (CDT) while promoting another (UDT/TDT). But you are doing so by asking whether the normative theory is satisfactory when used as a descriptive theory. And you are asking that it function descriptively in a fictitious universe in which shoplifters are rarely caught and mildly punished.
I agreed there is no need to invoke TDT/UDT to explain lack of shoplifting.
In addition to what Perplexed said, it seems to me that people tend to care more about their reputation compared to what is evolutionarily adaptive today, probably because in our EEA, you couldn't move to a new city and start over (or if you could move to another tribe, it was only at an extremely high cost), nor did you interact mostly with strangers. That would explain why people are sometimes deterred or motivated by reputation/shame when it doesn't seem to make sense to be, without having to invoke TDT/UDT.
["Shoplifting is risk-free"] is false, but easily close enough to truth for our purposes.
I don't think so. Shoplifting is more and less risky in different places and situations. I bet that the amount of shoplifting is monotone decreasing in the amount of risk, even when that amount is relatively small. If that's true, then "why don't people shoplift more" doesn't require an explanation beyond "because they don't want to take the risk." Do you disagree?
Yes, I disagree. Keep in mind, there is a very wide variety of protection a store can have for its goods. That depends on the value of the goods, but also on the "kind of person" that exists in the area, and the latter factor is crucial to understanding the dynamic I'm trying to highlight.
For the same goods, a store will have more security measures in areas where the "kind of people" (decision theory types) tend to steal more. But the population is never uniform. So, although the security measures account for some percentage of prevented shoplifting (and thus can be explained purely through causal consequences to shoplifters), there remains the group that differs from the typical person in the area. This group must stay sufficiently small for the store to stay profitable.
Therefore, the store is relying on a certain fraction of the population refraining from shoplifting even when they could get away with it.
But even if shoplifting is really kept low because of (mistaken) beliefs about its difficulty, that still doesn't eliminate the newcomblike aspect. You still have to account for why this epistemic error happens in just the right way so as to increase total utility. And the explanation for that looks similar to the evolution case I discussed at the beginning of this subthread, but with memes replacing genes: basically, regions with "better norms" or "more systematic overestimation of shoplifting's difficulty" will tend to flourish and outcompete those that don't. Economic competition, then, acts as a sort of "Parfitian filter" in the same sense that evolution does.
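To make that selection dynamic concrete, here is a toy sketch (every number and threshold below is invented purely for illustration, not drawn from any data): merchants open stores only where predicted shrinkage stays under their margin, so regions whose populations refrain from "free" shoplifting end up supporting more retail than regions that don't.

```python
# Toy model of the "Parfitian filter" via economic competition: a merchant opens
# a store only if the expected loss to shoplifting stays below the profit margin.
# All numbers here are made up for illustration.

STORE_MARGIN = 0.05        # profit as a fraction of goods handled
THEFT_FRACTION = 0.15      # fraction of goods an opportunistic shopper walks off with

def stores_supported(opportunist_rate, candidate_sites=100):
    expected_shrinkage = opportunist_rate * THEFT_FRACTION
    return candidate_sites if expected_shrinkage < STORE_MARGIN else 0

for region, rate in [("mostly 'one-boxers'", 0.02),
                     ("mixed", 0.40),
                     ("mostly 'two-boxers'", 0.90)]:
    print(f"{region}: {stores_supported(rate)} stores supported")
# Regions full of would-be shoplifters end up with no stores at all -- the
# Pareto-inferior outcome -- even though each individual theft "wins" locally.
```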
Newcomb's puzzle is an idealization of real-life puzzles like Parfit's Hitchhiker. The linked paper discusses this in more detail.
It's not interesting as a brain teaser but as a test case for decision theories. Newcomb's is especially interesting because Newcomb's-winning agents have the potential to reach Pareto efficient outcomes without needing precommitments or other outside help.
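As a toy illustration of that last point (my own construction, not anything from the linked paper; passing the function object stands in for "reading the other agent's source code"): agents that condition on each other's decision procedure reach the Pareto-efficient C/C outcome in a one-shot Prisoner's Dilemma with no precommitment, while unconditional defectors are stuck at D/D.

```python
# One-shot Prisoner's Dilemma where each agent is shown the other's decision
# procedure before choosing. Payoffs are (row player, column player).
PAYOFFS = {("C", "C"): (2, 2), ("C", "D"): (0, 3),
           ("D", "C"): (3, 0), ("D", "D"): (1, 1)}

def cdt_agent(opponent):
    return "D"  # defection dominates whatever the opponent does, so always defect

def mirror_agent(opponent):
    # Cooperate exactly when the opponent runs this very procedure, so that
    # my cooperation and the opponent's stand or fall together.
    return "C" if opponent is mirror_agent else "D"

def play(row, col):
    return PAYOFFS[(row(col), col(row))]

print(play(cdt_agent, cdt_agent))        # (1, 1)
print(play(mirror_agent, mirror_agent))  # (2, 2): Pareto-efficient, no precommitment
print(play(mirror_agent, cdt_agent))     # (1, 1): the conditional cooperator isn't exploited
```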
I think the confusion comes from the difference between the importance of winning conflicts and the importance of making the correct decision. Many people who think about this problem go "ah, doing X is the obvious solution." When asked to be more formal they come up with decision theories. Other people then explore those theories and find their flaws. Newcomb's problem is important because it led (maybe not directly, but I think it contributed) to the schism into evidential decision theory and causal decision theory. Both have different approaches to solving problems.
Newcomb's problem is important because it led (maybe not directly, but I think it contributed) to the schism into evidential decision theory and causal decision theory.
As far as I can tell, that's because the causal decision theorists are crippled by using magicless thinking in a magical problem. The only outcome is "huh, people who use all the information provided by a problem do better than people who ignore some of the information!" As schisms go, that seems pretty tame.
The issue is expressing formally the algorithm which uses all the information to get the right answer in Newcomb's.
That does make it clearer why I'm a 0-boxer and uninterested in it, and suggests I should refrain from approaching it on a level as intense as Eliezer's paper until I am interested in formality, since a correct one-page explanation is unlikely to be formal and the reason the problem is interesting lies in its formality.
Reading this (well, I'm posting at page 32) made clear the "problem" with Newcomb's problem with transparent boxes (where you can see what's in the boxes before you choose): it's not a decision-determined problem. This is because you can have an algorithm that depends on some property of the Predictor - the "tit for tat" algorithm is the most obvious.
By "tit for tat" I am referring to the notable strategy in the iterated prisoner's dilemma. Agents using this strategy will keep cooperating as long as the other person cooperates, but if the other person defects then they will defect too. It's an excellent strategy by many measures, beating out more complicated strategies, and we probably have something like it built into our heads.
By analogy, a "tit for tat" strategy in Newcomb's problem with transparent boxes would be to one-box if the Predictor "cooperates," and two-box if the Predictor "defects."
But what does the Predictor see when it looks into the future of an agent with this strategy? Either way it chooses, it will have chosen correctly, so the Predictor needs some other, non-decision-determined criterion to decide.
Alternately you could think of it as making the decision-type of the agent undefined (at the time the Predictor is filling the boxes), thus making it impossible for the problem to have any well-defined decision-determined statement.
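A small sketch of the point, under my own toy formalization (an agent is a function from "do I see $1M in box B?" to a choice, and the Predictor looks for a filling its prediction would ratify): unconditional one-boxers and two-boxers each pin down exactly one consistent filling, but the "tit for tat" agent leaves both fillings self-fulfilling, so the decision-type alone no longer determines the outcome.

```python
# Transparent-boxes Newcomb: the agent sees whether box B is filled before choosing.
# The Predictor wants to fill the box iff the agent will then take only box B.

def one_boxer(sees_million):   return "one-box"
def two_boxer(sees_million):   return "two-box"
def tit_for_tat(sees_million): return "one-box" if sees_million else "two-box"

def consistent_fillings(agent):
    """Fillings of box B under which the Predictor's prediction comes out correct."""
    return [fill for fill in (True, False)
            if fill == (agent(fill) == "one-box")]

for agent in (one_boxer, two_boxer, tit_for_tat):
    print(agent.__name__, consistent_fillings(agent))
# one_boxer   [True]         -> determined by decision-type
# two_boxer   [False]        -> determined by decision-type
# tit_for_tat [True, False]  -> both fillings are self-fulfilling; the Predictor
#                               needs some extra, non-decision criterion to pick one
```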
Just to clarify, I think your analysis here doesn't apply to the transparent-boxes version that I presented in Good and Real. There, the predictor's task is not necessarily to predict what the agent does for real, but rather to predict what the agent would do in the event that the agent sees $1M in the box. (That is, the predictor simulates what--according to physics--the agent's configuration would do, if presented with the $1M environment; or equivalently, what the agent's 'source code' returns if called with the $1M argument.)
If the agent would one-box if $1M is in the box, but the predictor leaves the box empty, then the predictor has not predicted correctly, even if the agent (correctly) two-boxes upon seeing the empty box.
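A quick sketch of how that rule plays out (my own toy rendering of the rule as stated above, not code from Good and Real): because the Predictor only consults what the agent would do upon seeing the $1M, every agent, including the "tit for tat" one, gets a unique, well-defined filling and payoff.

```python
# Drescher's transparent-boxes rule: fill box B iff the agent's procedure, run on
# the sees-$1M input, returns one-box. What the agent does upon seeing an empty
# box is irrelevant to the prediction.

def one_boxer(sees_million):   return "one-box"
def two_boxer(sees_million):   return "two-box"
def tit_for_tat(sees_million): return "one-box" if sees_million else "two-box"

def box_b_filled(agent):
    return agent(True) == "one-box"

def payoff(agent):
    filled = box_b_filled(agent)
    choice = agent(filled)                    # what the agent actually does
    box_b = 1_000_000 if filled else 0
    return box_b if choice == "one-box" else box_b + 1_000

for agent in (one_boxer, two_boxer, tit_for_tat):
    print(agent.__name__, "filled:", box_b_filled(agent), "payoff:", payoff(agent))
# one_boxer   filled: True   payoff: 1000000
# two_boxer   filled: False  payoff: 1000
# tit_for_tat filled: True   payoff: 1000000  -> uniquely decision-determined again
```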
Interesting. This would seem to return it to the class of decision-determined problems, and for an illuminating reason - the algorithm is only run with one set of information - just like how in Newcomb's problem the algorithm has only one set of information no matter the contents of the boxes.
This way of thinking makes Vladimir's position more intuitive. To put words in his mouth, instead of being not decision determined, the "unfixed" version is merely two-decision determined, and then left undefined for half the bloody problem.
This demonstrates that an agent can't know its own decision. In this case, the predictor can't know its own prediction, and so can't know the agent's action, if that action allows the prediction to be inferred. (And this limitation can't be fought with computational power, so Omega is just as susceptible.) For predictors, it's enough to have a fixpoint, to pick any self-fulfilling prediction. But if the environment is playing diagonal, as you describe, then the predictor can't make a correct prediction.
This is not about a failure of the environment to be decision-determined; the environment you describe simply has the predictor lose for every decision.
(If you consider the question in enough detail, the distinction between the decision-determined problems and other kinds of problems doesn't make sense, apart from highlighting that decision can be important apart from action-instance in the environment or other concepts, that all these are different concepts and decision makes sense abstractly, on its own.)
If the Predictor breaks sometimes, in a way dependent on the algorithm used, not on the decision made, then that's not decision-determined. That's decision-determined-unless-you-play-tit-for-tat, which doesn't count at all.
I think the fact that it's not decision-determined is fairly important, because that means it's not necessarily a Newcomblike problem. Haven't finished the manuscript yet, so I don't know all the implications of that, but I have my suspicions.
If the Predictor breaks sometimes, in a way dependent on the algorithm used, not on the decision made, then that's not decision-determined.
Yes, that's the mantra. But how do you unpack "dependent" and "breaks"? Dependent with respect to what alternatives (and how to think of those alternatives)? More importantly, how can you decide that something dependent on one thing doesn't depend on some other thing (while some uncertainty remains)?
As far as I can tell, all this dependence business has to be about resolution of logical uncertainty. You work with concepts, say A and B, that define the subject matter without giving you full understanding of their meaning, of the implications of the definitions. A depends on B when assuming an additional fact about B allows you to infer something about A. By controlling B, you control A, and similarly if you find a C that controls B, you can control A through controlling C. All throughout, nothing is actually changed; the concepts are fixed.
If you know that A depends on B, and there's also some C, then unless assuming full knowledge of B allows you to obtain full knowledge of A, you won't be able to conclude that A is truly independent of C (screened off by B). That you are merely unable to see how knowing C could allow learning more about A doesn't prohibit the possibility of figuring out a way later, and that would mean that C controls A after all.
So we can talk about action-determined outcomes and decision-determined outcomes, where the concepts of an action, or of a decision, are in known dependence with an outcome. But arguing that the outcome doesn't depend on some given other concept is much more difficult, and close to impossible if you are dealing with sufficiently complicated uncertainty.
Decision-determined was used in the manuscript to mean completely determined (up to a probability distribution) by "decision-type," and ditto action-determined was used to mean completely determined up to a probability distribution by actions in a causal way. So it's simple to show that something isn't decision-determined, in the sense used; you only need one exception, one case where it depends on the algorithm and not just the decision.
In my example the predictor wins for every situation, but basically yeah. You're right that it could still be decision determined if we're okay with having it break in some cases.
I'm not okay with having it break in some cases, though; the real world doesn't return "undefined" very often. It's possible to "save" it as non-pathological though not decision-determined, which can then be applied to the real world.
Any optimality of basing your decision on the predictor's algorithm does not keep it from being a decision-determined problem.
If you do tit-for-tat, the predictor predicts that you will base your decision on what you see in the transparent box and therefore never fills that box. Your (counterfactual) decision (process) still determines the outcome.
I am still reading the paper, but I have a question:
Conversely in the CGTA variant of Solomon's Problem, a causal decision agent, knowing in advance that he would have to choose between chewing gum and avoiding gum, has no reason to precommit himself to avoiding gum. (pg. 9)
Why not? Based on earlier numbers (pg. 5), chewing gum will give you strictly better results. The paper even mentions that:
This table shows that whether you have the gene CGTA or not, your chance of dying of a throat abscess goes down if you chew gum.
What am I missing?
No reason to precommit to AVOID gum.
Gum is beneficial, so you don't want to precommit against it. Nonetheless, it is evidence of a bad thing.
I don't follow. Chewing gum is strictly better, so I'll precommit to it. Precommitting to picking only box B is better than precommitting to picking box A and B, so if I had to precommit I would choose to do so for box B.
Nonetheless, it is evidence of a bad thing.
That has been debunked.
I guess I am just following the parallels between the two problems.
Yes, you might want to precommit to it. But you don't want to precommit against it, which is Eliezer's point. In the parallel example (Newcomb's box), you do want to precommit against the thing which seems to strictly dominate, and the difference between the two cases is the justification for treating time-invariance as important.
Ok, hah, I don't think we disagree on anything here. I think I made a mistake in reading "has no reason to precommit himself to avoiding gum" as "has no reason to precommit himself [to anything]". My bad. Thanks for helping out!
That would be quite important! =)
Does he need to precommit to chew gum? I haven't read the doc. in months, but I don't recall there being any danger of temporal inconsistency in that case.
No he doesn't. Eliezer compares this version of Solomon's problem to the Newcomb's problem, where precommitment actually makes a difference.
This misses the point of Newcomb's problem entirely. The stuff about boxes and Omega is just an intuition pump; Newcomb's problem itself is more properly written as a computer program, which contains none of that other stuff. It is common to complain that no real-world scenario will ever correspond to that program, but that is true only in the same sense that the world can never contain the frictionless pulleys, perfect vacuums and rigid objects that come up in physics problems. It's not that complications like friction and the possibility of being deceived about the rules don't matter, but rather that you have to solve the simplified problem first before you add those complications back in. In decision theory, "Omega" is short for "without any complications not explicitly mentioned in the problem statement", so if you start adding in possibilities like illusionists then it isn't Newcomb's problem anymore.
My intuition has been pumped hard by this problem. My intuition is that it violates what we know about physics to be able to predict what each of 6 billion human beings will do confronted with the two boxes after one hour's time elapsed.
The particular physics I think is violated is quantum mechanical uncertainty. What we believe we know from quantum mechanical uncertainty is that there are a myriad of microscopic processes whose outcome in our world cannot be predicted. We encase this result from quantum mechanics in at least two possible interpretations, labeled Copenhagen and Many Worlds. But both of these interpretations have in common that for a myriad of common events starting at t1, there are multiple mutually exclusive outcomes possible at time t2 > t1 that are, as far as either the Copenhagen or MWI interpretation allows, intrinsically unpredictable at time t1. That is, at least two possible universes at time t2 are completely consistent with the single example universe at time t1: one in which one of these quantum events has turned out one way, and one in which it has turned out another way.
So now the question comes: does this have ANYTHING to do with Newcomb's problem? And it is trivial to make sure it does. During the hour I have between when the alien sets the boxes in front of me and when I must choose, I acquire a geiger counter, and I open the stopwatch application on my iPhone. I tune the geiger counter using lead foil and possibly some medical isotopes so that it is triggering on average about once every 60 seconds. I start the stopwatch, wait until it has run at least 15 seconds, and then stop it the next time I hear a click from the geiger counter. I look at the least significant digit on the stopwatch, which is tenths-of-a-second on my iPhone. If that number is even, I will pick two boxes; if that number is odd, I will pick just box B.
As far as we know from Schrödinger's cat gedankedonks, the exact time of emission of radioactive decay particles is quantumly "random." In Copenhagen, the collapse is at a random time; in many worlds, there is a different version of the universe for each possible decay time. Either way, for the Alien to have filled that box correctly he must either 1) be able to predict the outcome of quantum phenomena in a way that our physics currently believes is impossible, or 2) have flipped a coin and gotten lucky.
Now, with thousands of humans chosen to play this game, what are the chances that I am the only one chosen who includes a quantum coin toss in his choosing mechanism? Either the chances are low, in which case chances of the alien pulling off this scam are falling as 1/2^N where N is the number of quantum coin tosses among his choosers, OR the Alien is cheating.
The Alien's form of cheating might be one of many things. Perhaps he can correctly predict what SOME humans will do, and he only offers the game to those humans, in which case he will not have offered the game to me or any humans of my ilk.
My intuition has been pumped. I have been shown a gedanken problem which I think has some components equivalent to "assume a circle with four corners," or "assume 2+2=5" or some other counterfactual that is just so counter to the factuals in OUR world that pointing out this counterfactuality is the resolution to the paradox.
The thing that rules out God as a good hypothesis is not his name; it is his properties. Perhaps the limited Omniscience of being able to predict reliably what any human will do in an hour when confronted with Newcomb's boxes is god-like enough to be tossed out with God from the list of good hypotheticals. It looks that way to me.
If I am right, we don't need to develop a decision theory that lets a Friendly AI self-modify to pick one box and still call the whole endeavour rational.
If you allow randomization, you have an underspecified problem again. But you can fix it easily enough by saying that Omega fills the box with the same probability that you one-box.
Here's a variant that may help your intuition. Suppose that rather than let you pick directly, Omega asks you to write a computer program that implements whatever strategy you would have used, and that program chooses one or two boxes. In that case, the prediction would be trivial, and you would certainly want to provide a program that one-boxed.
Now suppose that instead of writing a computer program, you are one. Because you've been uploaded, say. In that case, you would want to be a program that one-boxes.
The thing is, due to the physics underlying your brain, you are a computer program. A very complicated, randomized computer program which can't always be predicted by any means other than simulating it and can't necessarily be simulated without using resources that aren't available in the universe. But that's Omega's problem. Yours is just choosing a number of boxes.
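To spell out the probability-matching rule numerically (treating the fill and your realized choice as independent draws is my assumption about how to read it): if you one-box with probability p and Omega fills box B with that same probability p, the expected haul works out to 1,000,000*p + 1,000*(1 - p), which is strictly increasing in p, so even an agent allowed to randomize does best by one-boxing outright.

```python
# Expected winnings when Omega fills box B with the same probability p that you
# one-box, with the fill and your realized choice modeled as independent draws.

def expected_value(p):
    ev_one_box = p * 1_000_000            # $1M only if the box happens to be filled
    ev_two_box = p * 1_000_000 + 1_000    # both boxes: B's contents plus the $1,000
    return p * ev_one_box + (1 - p) * ev_two_box   # = 1_000_000*p + 1_000*(1 - p)

for p in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"one-box with p = {p:.2f}: EV = {expected_value(p):>12,.0f}")
# EV rises monotonically from $1,000 at p = 0 to $1,000,000 at p = 1, so every
# randomizing strategy is dominated by deterministic one-boxing.
```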
The original specification of Newcomb's problem had the alien empty box B if he predicted I would use a random number generator. I'm not sure why Eliezer removed that restriction, but he did, and that is a big part of what I am writing about.
If you already believe that a PHYSICAL random number generator can be built based on quantum processes, and that such a generator can be interfaced with a computer and therefore called by, controlled by, and read by a computer, then you don't need to bother with the details in the next paragraph. The purpose of the next paragraph is to outline the design of such a quantum random number generator.
Get a beta radiation detector with computer interface. Computer must have appropriate two way interface and appropriate library to control and read the radiation detector. Computer must be set up with radiation detector and a beta radiation source (commercially available.)
The first part of the computer program runs and reads out the average rate at which beta particles are being detected. The beta source is moved far away from the detector, and it is verified that the detector detects at less than once per 10 seconds, on average. The source is moved slowly towards the detector until the average detection rate is once per 2 seconds or higher. The source can be moved under computer control to make this all a pre-specified program. I would test this program before hardcoding numbers like 10 s and 2 s and the distance ranges the sample was moved; the point would be to get something where the pulses are slow compared to the computer's time resolution, but fast compared to any "background" detection rate from this detector.
Now my program freezes the source in place and runs a 20 s counter. When the 20 s counter is up, the program records the time of the very next beta particle it sees to whatever resolution the computer offers, but at least 1 ms resolution. The computer looks at the tenths-of-a-second digit in a decimal representation of the time, using any onboard clock you care about. Perhaps it is the time since the computer program was turned on, in order to make it specifiably simple. If that tenths-of-a-second digit is even, the computer chooses two boxes. If that tenths-of-a-second digit is odd, the computer chooses only box B.
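For concreteness, here is a minimal sketch of that chooser. The detector interface below (wait_for_next_beta_event) is a hypothetical placeholder for whatever driver the real hardware would expose; only the 20-second wait and the parity-of-tenths rule are taken from the description above.

```python
import time

def wait_for_next_beta_event():
    """Hypothetical placeholder: block until the detector driver reports the next
    beta particle, returning its timestamp in seconds (at least 1 ms resolution)."""
    raise NotImplementedError("wire this up to the real detector interface")

def choose_boxes():
    time.sleep(20)                      # the 20 s counter described above
    t = wait_for_next_beta_event()      # quantum-random arrival time
    tenths_digit = int(t * 10) % 10     # tenths-of-a-second digit of the timestamp
    return "both boxes" if tenths_digit % 2 == 0 else "box B only"
```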
I believe that for this Alien to predict "my" choice (my computer program's choice), it must be able to predict details of the beta decay of my beta-emitting sample. Beta decay is a fairly simple atomic decay process which is well characterized by relatively simple quantum mechanics, but which has, as best physicists know, an unpredictable actual time at which each decay will occur.
Now, I don't know why Eliezer eliminated the "Alien empties the box if you choose randomly" clause, but my point here is that I can, with asymptotically certain probability, break the "winning" streak of the alien at predicting what humans will do, to the extent that I am able to get other humans to employ this technique. Either that, or 1) QM as we know it is wrong, or 2) the Alien is cheating, i.e., not doing what EY says he is doing.
Assuming EY got rid of "you lose if you go random" from the Alien's response for a reason, I think he is doing the equivalent of assuming pi = 22/7 exactly or that a square has only 3 sides, or SOME such thing where we are no longer in our universe when considering the problem.
That EY might be coming up with a decision theory that applies only to universes other than our own is not what I think he intends.
Seconding jimrandomh: you seem to be talking about issues that don't matter to decision theory very much. Let me reframe.
My own interest in the topic was sparked by Eliezer's remark about "AIs that know each other's source code". As far as I understand, his interest in decision theory isn't purely academic, it's supposed to be applied to building an AI. So the simplest possible approach is to try "solving decision theory" for deterministic programs that are dropped into various weird setups. It's not even necessary to explicitly disallow randomization: the predictor can give you a pony if it can prove you cooperate, and no pony otherwise. This way it's in your interest in some situations to be provably cooperative.
Now, if you're an AI that can modify your own source code, you will self-modify to become "provably cooperative" in precisely those situations where the payoff structure makes it beneficial. (And correspondingly "credibly threatening" in those situations that call for credible threats, I guess.) Classifying such situations, and mechanical ways of reasoning about them, is the whole point of our decision theory studies. Of course no one can prohibit you from randomizing in adversarial situations, e.g. if you assign a higher utility to proving Omega wrong than to getting a pony.
I definitely appreciate your and jimrandomh's comments. I am rereading Eliezer's paper again in light of these comments and clearly getting more on the "decision theory" page as I go.
Provably cooperative seems problematic, but maybe not. As a concept it is certainly useful. But is there any way to PROVE that the AI is actually running the code she shows you? I suspect probably not.
Also, where I was coming from with my comments may be a misunderstanding of what Eliezer was doing with Newcomb, but it may not. At least in other posts, if not in this paper, he has said "rational means winning" and that a self-modifying AI would modify itself to be provably precommitted to box B in Newcomb's problem. The way I think about it, there are two problems, one of which Eliezer touches on and one which he doesn't.
First, the one he touches on: if the Alien is simply rewarding people for being irrational, then it's not clear we want an AI to self-modify to win Newcomb's problem. Clearly, for an all-powerful alien who threatens humanity's existence if it doesn't worship him, maybe we do want an AI to abandon its rationality for that, but I'm not sure, and what you have here is "assuming God comes along and tells us all to toe the line or go to hell, what does Decision theory tell us to do?" Well, the main issue there might be being actually sure that it is God that has come along and not just the man-behind-the-curtain, i.e. a trickster who has your dopey AI thinking it is god and abandoning its rationality, i.e. being hijacked by trickery.
The second issue is: there must be some very high level of reliability required when you are contemplating action predicated on very unlikely hypotheses. If our friendly self-modifying AI sees 1000 instances of an Alien providing Newcomb's boxes (and 1000 is the number in Eliezer's paper), I don't want it concluding 1000 = certainty, because it doesn't. Especially in a complex world where even finite humans using last century's technologies can trick the crap out of other humans. If a self-modifying friendly AI sees something come along which appears to violate physics in order to provide a seemingly causal paradox which is laden with the emotion of a million dollars or a cure for your daughter's cancer, then the last thing I want that AI to do is to modify itself BEFORE it properly estimates the probability that the Alien is actually no smarter than Siegfried and Roy.
It's not conceivable to me that resistance to getting tricked, and properly understanding the influence of evidence, especially when that evidence may be provided by an Alien even smarter and with more resources than Siegfried and Roy, is NOT part of decision theory. Maybe it is not the part Eliezer wants to discuss here.
In any case, I am rereading Eliezer's paper and will know more about Decision theory before my next comment. Thank you for your comments in that regard, I am finding I flow through Eliezer's paper more fluidly now after reading those comments.
is there any way to PROVE that the AI is actually running the code she shows you?
Nope; certainty is impossible to come by in worlds that contain a sufficiently powerful deceiver. That said, compiling the code she shows you on a different machine and having her shut herself down would be relatively compelling evidence in similar cases that don't posit an arbitrarily powerful deceiver.
If both boxes are transparent, then the problem is underspecified for agents whose action depends on what they see unless you add a rule to cover them. That doesn't mean that the parts of the problem which you have specified (namely, what happens to unconditional one-boxers and unconditional two-boxers) are invalid, just that you missed a case.
I think inherent in the problem is the condition that you fully understand what is going on and you know you aren't part of some weird trick.
It's not realistic, but being realistic isn't the point of the problem.