Vaniver comments on Newcomblike problems are the norm - Less Wrong
Comments (108)
I agree that an intelligent agent who deals with other intelligent agents should think in a way that makes reasoning about 'dispositions' and 'reputations' easy, because it's going to be doing it a lot.
But it's unclear to me that this requires a change to decision theory, instead of just a sophisticated model of what the agent's environment looks like that's tuned to thinking about dispositions and reputations. I think that an agent that realizes that the game keeps going on, and that its actions result in both immediate rewards and delayed shifts to its environment (which impact future rewards), will behave like you describe in section 4: it will use concepts like "fairness" and "honor" with an implied numerical value attached, because those are its estimates of how taking an anti-social action now will hurt it in the future, which it balances against the present gain to decide whether or not to take the action. And a CDT agent, with the right world-model, seems to me like it will do fine (which is my opinion about the proper Newcomb's problem, as well as Newcomb-like problems). I agree with eli_sennesh in this comment thread that getting the precise value of reputational effects requires actual prescience, but it seems to me that we can estimate it well enough to get along (though it seems possible that much of our 'estimation' is biological tuning rather than stored in memory).
I don't think I buy this interpretation. It seems to me that the CDT agent with a broader scope than 'the immediate future' thinks it's better to not break the precommitment than to break it (in situations where the math works out that way), because of the counterfactual effects that breaking a precommitment will have on the future. You become what you do!
CDT + Precommitments is not pure CDT -- I agree that CDT over time (with the ability to make and keep precommitments) does pretty well, and this is part of what I mean when I talk about how an agent using pure CDT to make every decision would self-modify to stop doing that (e.g., to implement precommitments, which is trivially easy when you can modify your own source code).
Consider the arguments of CDT agents as they twobox, when they claim that they would have liked to precommit but they missed their opportunity -- we can do better by deciding to act as we would have precommitted to act, but this entails using a different decision theory. You can minimize the number of missed opportunities by allowing CDT many opportunities to precommit, but that doesn't change the fact that CDT can't retrocommit.
If you look at the decision-making procedure of something which started out using CDT after it self-modifies a few times, the decision procedure probably won't look like CDT, even though it was implemented by CDT making "precommitments".
And while CDT mostly does well when the games are repeated, there are flaws that CDT won't be able to self-correct (roughly corresponding to CDT's inability to make retrocommitments); these will be the subject of future posts.
By CDT I mean calculating utilities using:

$$U(A) = \sum_j P(O_j \mid do(A))\, D(O_j)$$
Most arguments that I see for the deficiency of CDT rest on additional assumptions that are not required by CDT. I don't see how we need to modify that equation to take into account precommitments, rather than modifying D(O_j).
For example, this requires the additional assumption that the future cannot cause the past. In the presence of a supernatural Omega, that assumption is violated.
Outside of supernatural opportunities, it's not obvious to me that this is a bug. I'll wait for you to make the future arguments at length, unless you want to give a brief version.
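As a minimal sketch, the causal expected-utility rule discussed here (a sum over outcomes O_j of P(O_j|do(A)) times D(O_j)) can be written out directly. The example assumes the standard Newcomb payoffs ($1,000,000 in the opaque box if one-boxing was predicted, $1,000 in the transparent box) and a causal model in which the prediction is already fixed, so P(box full | do(A)) does not depend on A. The names `cdt_utility`, `newcomb_outcomes`, `money` and `p_full` are illustrative, not from the thread.

```python
def cdt_utility(action, outcome_probs, desirability):
    """Sum over outcomes O_j of P(O_j | do(action)) * D(O_j)."""
    return sum(p * desirability(o) for o, p in outcome_probs(action).items())


def newcomb_outcomes(action, p_full=0.5):
    # Causal-model assumption: the prediction was made earlier, so intervening
    # on the action leaves P(opaque box is full) unchanged.
    return {(action, True): p_full, (action, False): 1 - p_full}


def money(outcome):
    # D(O_j): dollars received in this outcome.
    action, full = outcome
    opaque = 1_000_000 if full else 0
    return opaque + (1_000 if action == "two-box" else 0)


utilities = {a: cdt_utility(a, newcomb_outcomes, money) for a in ("one-box", "two-box")}
best = max(utilities, key=utilities.get)
```

Whatever value `p_full` takes, two-boxing comes out $1,000 ahead under this model, so the disagreement is about how P(O_j|do(A)) and D(O_j) are built rather than about the sum itself.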
Right, you can modify the function that evaluates outcomes to change the payoffs (e.g. by making exploitation in the PD have a lower payoff than mutual cooperation, because it "sullies your honor" or whatever) and then CDT will perform correctly. But this is trivially true: I can of course cause that equation to give me the "right" answer by modifying D(O_j) to assign 1 to the "right" outcome and 0 to all other outcomes. The question is how you go about modifying D to identify the "right" answer.
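Modifying D in the way described can be sketched with a one-shot Prisoner's Dilemma. The payoffs (T=5, R=3, P=1, S=0) and the `honor_cost` subtracted when the agent exploits a cooperator are illustrative assumptions; the agent's belief about the opponent's move is held fixed, as a CDT agent's would be, and only D changes.

```python
# Row player's payoffs, keyed by (my_move, their_move). Values are illustrative.
PAYOFFS = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}


def desirability(my_move, their_move, honor_cost=0.0):
    """D(O_j): the raw payoff, minus an 'honor' penalty for exploiting a cooperator."""
    value = PAYOFFS[(my_move, their_move)]
    if my_move == "D" and their_move == "C":
        value -= honor_cost
    return value


def best_move(p_they_cooperate, honor_cost=0.0):
    # The belief about the opponent is fixed: my move does not change it.
    def eu(move):
        return (p_they_cooperate * desirability(move, "C", honor_cost)
                + (1 - p_they_cooperate) * desirability(move, "D", honor_cost))
    return max(("C", "D"), key=eu)
```

With a 90% belief that the opponent cooperates, the unmodified D recommends defecting, while `honor_cost=3` flips the answer to cooperating. The catch is visible here: `honor_cost` is a free parameter, and choosing it is exactly the question of how to modify D.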
I agree that in sufficiently repetitive environments CDT readily modifies the D function to alter the apparent payoffs in PD-like problems (via "precommitments"), but this is still an unsatisfactory hack.
First of all, the construction of the graph is part of the decision procedure. Sure, in certain situations CDT can fix its flaws by hiding extra logic inside D. However, I'd like to know what that logic is actually doing so that I can put it in the original decision procedure directly.
Secondly, CDT can't (or, rather, wouldn't) fix all of its flaws by modifying D -- it has some blind spots, which I'll go into later.
(I don't understand where your objection is here. What do you mean by 'supernatural'? Do you think you should always twobox in a Newcomb's problem where Omega is played by Paul Ekman, a good but imperfect predictor?)
You find yourself in a PD against a perfect copy of yourself. At the end of the game, I will remove the money your clone wins, destroy all records of what you did, re-merge you with your clone, erase both our memories of the process, and let you keep the money that you won (you will think it is just a gift to recompense you for sleeping in my lab for a few hours). You had not previously considered this situation possible, and had made no precommitments about what to do in such a scenario. What do you think you should do?
Also, what do you think the right move is on the true PD?
Given that you're going to erase my memory of this conversation and burn a lot of other records afterward, it's entirely possible that you're lying about whether it's me or the other me whose payout 'actually counts.' Makes no difference to you either way, right? We all look the same, and telling us different stories about the upcoming game would break the assumption of symmetry. Effectively, I'm playing a game of PD followed by a special step in which you flip a fair coin and, on heads, swap my reward with that of the other player.
So, I'd optimize for the combined reward to both myself and my clone, which is to say, for the usual PD payoff matrix, cooperate. If the reward for defecting when the other player cooperates is going to be worth drastically more to my postgame gestalt, to the point that I'd accept a 25% or less chance of that payout in trade for virtual certainty of the payout for mutual cooperation, I would instead behave randomly.
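The coin-flip framing can be checked numerically. With a fair swap, each player's expected reward is the average of the two players' rewards, so both clones maximize the same quantity. This sketch assumes both clones defect independently with the same probability `p` (for instance, using independent random bits) and uses the conventional T, R, P, S labels for the PD payoffs; the specific numbers in the usage are illustrative.

```python
def expected_reward(p, T, R, P, S):
    """Expected reward when both clones defect independently with probability p
    and a fair coin decides whether the two payouts are swapped."""
    q = 1 - p  # probability of cooperating
    # With the swap, each outcome is worth the average of the two payouts to me.
    return q * q * R + 2 * p * q * (T + S) / 2 + p * p * P


def best_defect_probability(T, R, P, S, steps=100):
    # Coarse grid search over mixed strategies p in [0, 1].
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda p: expected_reward(p, T, R, P, S))
```

For T=5, R=3, P=1, S=0 the best choice is p = 0 (always cooperate); raising T to 100 moves the optimum to roughly p = 0.49. Expanding the expression shows pure cooperation is optimal exactly when T + S ≤ 2R (given P < R), which matches the "behave randomly" answer once the temptation payoff breaks that bound.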
Saying "I wouldn't trust someone like that to tell the truth about whose payout counts" is fighting the hypothetical.
I don't think you need to assume the other party is a clone; you just need to assume that both you and the other party are perfect reasoners.
That they either must both hear the same story or else break the assumption of symmetry is an important objection to the hypothetical. Either choice breaks the problem statement as presented.
Thank you! If I was the other clone and heard that I was about to play a game of PD which would have no consequences for anyone except the other player, who was also me, that would distort my incentives.
It's established in the problem statement that the experimenter is going to destroy or falsify all records of what transpired during the game, including the fact that a game even took place, presumably to rule out cooperation motivated by reputational effects. If you want a perfectly honest and trustworthy experimenter, establish that axiomatically, or at least don't establish anything that directly contradicts it.
Assuming that the other party is a clone with identical starting mind-state makes it a much more tractable problem. I don't have much idea how perfect reasoners behave; I've never met one.
I agree with this. It seems to me that answers about how to modify D are basically questions about how to model the future; you need to price the dishonor in defecting, which seems to me to require at least an implicit model of how valuable honor will be over the course of the future. By 'honor,' I just mean a computational convenience that abstracts away a feature of the uncertain future, not a terminal value. (Humans might have this built in as a terminal value, but that seems to be because it was cheaper for evolution to do so than the alternative.)
I don't think I agree with the claim that this is an unsatisfactory hack. To switch from decision-making to computer vision as the example, I hear your position as saying that neural nets are unsatisfactory for solving computer vision, so we need to develop an extension, and my position as saying that neural nets are the right approach, but we need very wide nets with very many layers. A criticism of my position could be "but of course with enough nodes you can model an arbitrary function, and so you can solve computer vision like you could solve any problem," but I would put forward the defense that complicated problems require complicated solutions; it seems more likely to me that massive databases of experience will solve the problem than improved algorithmic sophistication.
In the natural universe, it looks to me like opportunities that promise retrocausation turn out to be scams, and this is certain enough to be called a fundamental property. In hypothetical universes, this doesn't have to be the case, but it's not clear to me how much effort we should spend on optimizing hypothetical universes. In either case, it seems to me this is something that the physics module (i.e. what gives you P(O_j|do(A))) should compute, and only baked into the decision theory by the rules about what sort of causal graphs you think are likely.
Given that professional ethicists are neither nicer nor more dependable than similar people of their background, I'll jump on the signalling grenade to point out that any public discussion of these sorts of questions is poisoned by signalling. If I expected that publicly declaring my willingness to one-box would increase the chance that I'm approached by Newcomb-like deals, then obviously I would declare my willingness to one-box. As it turns out, I'm trustworthy and dependable in real life, because of both a genetic predisposition towards pro-social behavior (including valuing things occurring after my death) and a reflective endorsement of the myriad benefits of behaving in that way.
I decided a long time ago to cooperate with myself as a general principle, and I think that was more a recognition of my underlying personality than it was a conscious change.
If the copy is perfect, it seems unreasonable to me not to draw a causal arrow between my action and my copy's action, as I cannot justify the assumption that my action will be independent of my perfect copy's action. If I estimate that the influence is sufficiently high, then it seems that (3,3) is a better option than (0,0). I'm moderately confident that a hypothetical me who knew about causal models but hadn't thought about identity or intertemporal cooperation would use the same line of reasoning to cooperate.
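One way to make "the influence is sufficiently high" concrete is to let q be the probability that the copy's move matches mine. Mutual cooperation (3) and mutual defection (0) follow the (3,3) and (0,0) outcomes above; the temptation and sucker payoffs (5 and -2) are hypothetical values chosen only for illustration.

```python
# Payoffs to me: R and P follow (3,3) and (0,0); T and S are hypothetical.
R, P, T, S = 3.0, 0.0, 5.0, -2.0


def eu_cooperate(q):
    # With probability q the copy mirrors me (cooperates); otherwise it defects.
    return q * R + (1 - q) * S


def eu_defect(q):
    # With probability q the copy mirrors me (defects); otherwise it cooperates.
    return q * P + (1 - q) * T


def threshold():
    """Value of q at which cooperating and defecting are equally good."""
    return (T - S) / ((R - S) + (T - P))
```

With these numbers the break-even point is q = 0.7: a perfect copy (q = 1) makes cooperation worth 3 against 0 for defection, while a weakly correlated opponent makes defection better.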
The problem is the do(A) part: the do(.) function ignores logical acausal connections between nodes. That was the theme of this post.

I agree! If the copy is perfect, there is a connection. However, the connection is not a causal one.
Obviously you want to take the action that maximizes your expected utility, according to probability-weighted outcomes. The question is how you check the outcome that would happen if you took a given action.
Causal counterfactual reasoning prescribes evaluating counterfactuals by intervening on the graph using the do(.) function. This (roughly) involves identifying your action node A, ignoring the causal ancestors, overwriting the node with the function const a (where a is the action under consideration), and seeing what happens. This usually works fine, but there are some cases where it fails to correctly compute the outcomes (namely, where others are reasoning about the contents of A, and their internal representations of A were not affected by your do(A=a)).

This is not fundamentally a problem of retrocausality; it's fundamentally a problem of not knowing how to construct good counterfactuals. What does it mean to consider that a deterministic algorithm returns something that it doesn't return? do(.) says that it means "imagine you were not you, but were instead const a, while other people continue reasoning as if you were you". It would actually be really surprising if this worked out in situations where others have internal representations of the contents of A (which do(A=.) stomps all over).

You answered that you intuitively feel like you should draw an arrow between you and your clone in the above thought experiment. I agree! But constructing a graph like this (where things that are computed via the same process must have the same output) is actually not something that CDT does. This problem in particular was the motivation behind TDT (which uses a different function besides do(.) to construct counterfactuals that preserve the fact that identical computations will have identical outputs). It sounds like we probably have similar intuitions about decision theory, but perhaps different ideas about what the do(.) function is capable of?

I still think this should be solved by the physics module.
For example, consider two cases. In case A, Ekman reads everything you've ever written on decision theory before September 26th, 2014, and then fills the boxes as if he were Omega, and then you choose whether to one-box or two-box. Ekman's a good psychologist, but his model of your mind is translucent to you at best: you think it's more likely than not that he'll guess correctly what you'll pick, but know that it's just mediated by what you've written, which you can't change.
In case B, Ekman watches your face as you choose whether to press the one-box button or the two-box button without being able to see the buttons (or your finger), and then predicts your choice. Again, his model of your mind is translucent at best to you; probably he'll guess correctly, but you don't know what specifically he's basing his decision off of (and suppose that even if you did, you know that you don't have sufficient control over your features to prevent information from leaking).
It seems to me that the two cases deserve different responses: in case A, you don't think your current thoughts will impact Ekman's move, but in case B, you do. In a normal token trade, you don't think your current thoughts will impact your partner's move, but in a mirror token trade, you do. Those differences in belief are because of actual changes in the perceived causal features of the situation, which seems sensible to me.
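The difference between the two Ekman cases can be put in numbers. In case A the prediction is fixed by past writing, so it does not depend on the current choice; in case B the prediction tracks the choice with probability `accuracy`. The sketch assumes the standard $1,000,000 / $1,000 Newcomb payoffs and, for case A, a hypothetical probability `p_full` that Ekman filled the box.

```python
BIG, SMALL = 1_000_000, 1_000


def case_a(choice, p_full):
    # Prediction made from past writing: P(full) is the same for both choices.
    return p_full * BIG + (SMALL if choice == "two-box" else 0)


def case_b(choice, accuracy):
    # Prediction tracks the choice itself, correct with probability `accuracy`.
    if choice == "one-box":
        return accuracy * BIG
    return (1 - accuracy) * BIG + SMALL


def best(evaluate, param):
    return max(("one-box", "two-box"), key=lambda c: evaluate(c, param))
```

In case A two-boxing wins by $1,000 for every value of `p_full`; in case B one-boxing wins as soon as `accuracy` exceeds (BIG + SMALL) / (2 * BIG), i.e. 0.5005, so "more likely than not" is already enough.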
That is, I think this is a failure of the process you're using to build causal maps, not the way you're navigating those causal maps once they're built. I keep coming back to the criterion "does a missing arrow imply independence?" because that's the primary criterion for building useful causal maps, and if you have 'logical nodes' like "the decision made by an agent with a template X" then it doesn't make sense to have a copy of that logical node elsewhere that's allowed to have a distinct value.
That is, I agree that this question is important:
But my answer to it is "don't try to intervene at a node unless your causal model was built under the assumption you could intervene at that node." The mirror token trade causal map you used in this post works if you intervene at 'template,' but I argue it doesn't work if you intervene at 'give?' unless there's an arrow that points from 'give?' to 'their decision.'
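The distinction between intervening at 'template' and at 'give?' can be sketched with a tiny structural model in which one template node feeds both agents' decisions. The payoffs (receiving a token worth 2, giving one costing 1) and the names `policy`, `intervene_on_give` and `intervene_on_template` are hypothetical, not the post's actual numbers.

```python
# Hypothetical token-trade payoffs: receiving a token is worth 2, giving one costs 1.
VALUE_RECEIVED, COST_GIVEN = 2, 1


def policy(template):
    # Both agents run the same template, so it alone fixes each 'give?' output.
    return template == "giver"


def utility(my_give, their_give):
    return (VALUE_RECEIVED if their_give else 0) - (COST_GIVEN if my_give else 0)


def intervene_on_give(actual_template, my_give):
    # do(give? = my_give): overwrite my action node only; their decision is
    # still computed from the unchanged template node.
    their_give = policy(actual_template)
    return utility(my_give, their_give)


def intervene_on_template(template):
    # do(template = t): both 'give?' nodes downstream of the template change together.
    return utility(policy(template), policy(template))
```

Intervening at 'give?' always favors keeping the token, because nothing links the overwritten node to the other agent's decision; intervening at 'template' favors giving.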
I think I see the do(.) operator as less capable than you do; in cases where the physicality of our computation matters, we need to have arrows pointing out of the node where we intervene that we don't need when we can ignore the impacts of having to physically perform computations in reality. Furthermore, it seems to me that when we're at the level where how we physically process possibilities matters, 'decision theory' may not be a useful concept anymore.
Cool, it sounds like we mostly agree. For instance, I agree that once you set up the graph correctly, you can intervene do(.)-style and get the Right Answer. The general thrust of these posts is that "setting up the graph correctly" involves drawing in lines / representing world-structure that is generally considered (by many) to be "non-causal".

Figuring out what graph to draw is indeed the hard part of the problem -- my point is merely that "graphs that represent the causal structure of the universe and only the causal structure of the universe" are not the right sort of graphs to draw, in the same way that a propensity theory of probability that only allows information to propagate causally is not a good way to reason about probabilities.
Figuring out what sort of graphs we do want to intervene on requires stepping beyond a purely causal decision theory.
Yeah, the existence of classification into 'future' and 'past' and 'future' not causing 'past', and what is exactly 'future', those are - ideally - a matter of the model of physics employed. Currently known physics already doesn't quite work like this - it's not just the future that can't cause the present, but anything outside the past lightcone.
All those decision theory discussions leave me with a strong impression that 'decision theory' is something which is applied almost solely to folk physics. As an example of a formalized decision-making process, we have AIXI, which doesn't really do what philosophers say either CDT or EDT does.
Actually, I think AIXI is basically CDT-like, and I suspect that it would two-box on Newcomb's problem.
At a highly abstract level, the main difference between AIXI and a CDT agent is that AIXI has a generalized way of modeling physics (but it has a built-in assumption of forward causality), whereas the CDT agent needs you to tell it what the physics is in order to make a decision.
The optimality of the AIXI algorithm is predicated on viewing itself as a "black box" as far as its interactions with the environment are concerned, which is more or less what the CDT agent does when it makes a decision.
AIXI is a machine learning (hyper-)algorithm, hence we can't expect it to perform better than a random coin toss on a one-shot problem.
If you repeatedly pose Newcomb's problem to an AIXI agent, it will quickly learn to one-box.
Trivially, AIXI doesn't model the problem's acausal structure in any way. For AIXI, this is just a matter of setting a bit and getting a reward, and AIXI will easily figure out that setting its decision bit to "one-box" yields a higher expected reward than setting it to "two-box".
In fact, you don't even need an AIXI agent to do that: any reinforcement learning toy agent will be able to do that.
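A minimal epsilon-greedy bandit illustrates the point. It assumes a repeated game in which the predictor is perfect, so box B is filled exactly when the agent one-boxes that round; the agent never represents the predictor at all and only tracks average reward per action. Names and parameters are illustrative.

```python
import random

BIG, SMALL = 1_000_000, 1_000


def newcomb_round(action):
    # Assumed perfect predictor: box B is full exactly when the agent one-boxes.
    box_b = BIG if action == "one-box" else 0
    return box_b + (SMALL if action == "two-box" else 0)


def train(rounds=1000, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    actions = ["one-box", "two-box"]
    totals = {a: float(newcomb_round(a)) for a in actions}  # try each action once
    counts = {a: 1 for a in actions}
    for _ in range(rounds):
        if rng.random() < epsilon:
            action = rng.choice(actions)  # explore
        else:
            action = max(actions, key=lambda a: totals[a] / counts[a])  # exploit
        totals[action] += newcomb_round(action)
        counts[action] += 1
    estimates = {a: totals[a] / counts[a] for a in actions}
    return estimates, counts
```

After training, the agent's estimated value for one-boxing is far higher than for two-boxing, and it chooses one-boxing on almost every greedy step.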
The problem you're discussing is not Newcomb's problem; it's a different problem that you've decided to apply the same name to.
It is a crucial part of the setup of Newcomb's problem that the agent is presented with significant evidence about the nature of the problem. This applies to AIXI as well; at the beginning of the problem AIXI needs to be presented with observations that give it very strong evidence about Omega and about the nature of the problem setup. From Wikipedia:
"By the time the game begins, and the player is called upon to choose which boxes to take, the prediction has already been made, and the contents of box B have already been determined. That is, box B contains either $0 or $1,000,000 before the game begins, and once the game begins even the Predictor is powerless to change the contents of the boxes. Before the game begins, the player is aware of all the rules of the game, including the two possible contents of box B, the fact that its contents are based on the Predictor's prediction, and knowledge of the Predictor's infallibility. The only information withheld from the player is what prediction the Predictor made, and thus what the contents of box B are."
It seems totally unreasonable to withhold information from AIXI that would be given to any other agent facing the Newcomb's problem scenario.
That would require the AIXI agent to have been pretrained to understand English (or some language as expressive as English) and have some experience at solving problems given a verbal explanation of the rules.
In this scenario, the AIXI internal program ensemble concentrates its probability mass on programs which associate each pair of one English specification and one action to a predicted reward. Given the English specification, AIXI computes the expected reward for each action and outputs the action that maximizes the expected reward.
Note that in principle this can implement any computable decision theory. Which one it would choose depends on the agent's history and the intrinsic bias of its UTM.
It can be CDT, EDT, UDT, or, more likely, some approximation of them that worked well for the agent so far.
I don't think someone posing Newcomb's problem would be particularly interested in excuses like "but what if the agent only speaks French!?" Obviously as part of the setup of Newcomb's problem AIXI has to be provided with an epistemic background that is comparable to that of its intended target audience. This means it doesn't just have to be familiar with English, it has to be familiar with the real world, because Newcomb's problem takes place in the context of the real world (or something very much like it).
I think you're confusing two different scenarios:
- Someone training an AIXI agent to output problem solutions given problem specifications as inputs.
- Someone actually physically putting an AIXI agent into the scenario stipulated by Newcomb's problem.
The second one is Newcomb's problem; the first is the "what is the optimal strategy for Newcomb's problem?" problem.
It's the second one I'm arguing about in this thread, and it's the second one that people have in mind when they bring up Newcomb's problem.
Then AIXI ensemble will be dominated by programs which associate "real world" percepts and actions to predicted rewards.
The point is that there is no way, short of actually running the (physically impossible) experiment, that we can tell whether the behavior of this AIXI agent will be consistent with CDT, EDT, or something else entirely.
Would it be a valid instructional technique to give someone (particularly someone congenitally incapable of learning any other way) the opportunity to try out a few iterations of the 'game' Omega is offering, with clearly denominated but strategically worthless play money in place of the actual rewards?
The main issue with that is that Newcomb's problem is predicated on the assumption that you prefer getting a million dollars to getting a thousand dollars. For the play money iterations, that assumption would not hold.
The second issue with iterating Newcomb's more generally is that it gives the agent an opportunity to precommit to one-boxing. The problem is more interesting and more difficult if you face it without having had that opportunity.
Why not? People can get pretty competitive even when there's nothing really at stake, and current-iteration play money is a proxy for future-iteration real money.
I'm not sure it really makes an assumption of causality, let alone a forward one (apart from the most rudimentary notion that actions determine future input). Facing an environment with two manipulators seemingly controlled by it, it won't have a hang-up over assuming that it equally controls both. Indeed, it has no reason to privilege one. Facing an environment with particular patterns under its control, it will assume it controls instances of said pattern. It doesn't view itself as anything at all. It has inputs and outputs, and it builds a model of what's in between from experience; if there are two identical instances of it, it learns a weird model.
Edit: as for what it would do in Newcomb's, it'll one-box some and two-box some, and learn to one-box. Or at least, the variation that values information will.
First of all, for any decision problem it's an implicit assumption that you are given sufficient information to have a very high degree of certainty about the circumstances of the problem. If presented with the appropriate evidence, AIXI should be convinced of this. Indeed, given its nature as an "optimal sequence-predictor", it should take far less evidence to convince AIXI than it would take to convince a human. You are correct that if it was presented Newcomb's problem repeatedly then in the long run it should eventually try one-boxing, but if it's highly convinced it could take a very long time before it's worth it for AIXI to try it.
Now, as for an assumption of causality, the model that AIXI has of the agent/environment interaction is based on an assumption that both of them are chronological Turing machines---see the description here. I'm reasonably sure this constitutes an assumption of forward causality.
Similarly, what AIXI would do in Newcomb's problem depends very specifically on its notion of what exactly it can control. Just as a CDT agent does, AIXI should understand that whether or not the opaque box contains a million dollars is already predetermined; in fact, given that AIXI is a universal sequence predictor it should be relatively trivial for it to work out whether the box is empty or full. Given that, AIXI should calculate that it is optimal for it to two-box, so it will two-box and get $1000. For AIXI, Newcomb's problem should essentially boil down to Agent Simulates Predictor.
Ultimately, the AIXI agent makes the same mistake that CDT makes - it fails to understand that its actions are ultimately controlled not by the agent itself, but by the output of the abstract AIXI equation, which is a mathematical construct that is accessible not just to AIXI, but the rest of the world as well. The design of the AIXI algorithm is inherently flawed because it fails to recognize this; ultimately this is the exact same error that CDT makes.
Granted, this doesn't answer the interesting question of "what does AIXI do if it predicts Newcomb's problem in advance?", because before Omega's prediction AIXI has an opportunity to causally affect that prediction.
What it doesn't do is assume that there must be a physical sequence of dominoes falling on each other, from one singular instance of it to the effect.
Not at all. It can't self-predict. We assume that the predictor actually runs the AIXI equation.
Ultimately, it doesn't know what's in the boxes, and it doesn't assume that what's in the boxes is already well defined (there are certainly codes where it is not), and it can learn that it controls the contents of the box in precisely the same manner as it has to learn that it controls its own robot arm or whatever it is that it controls. Ultimately it can do exactly the same output->predictor->box contents as it does for output->motor controller->robot arm. Indeed, if you don't let it observe 'its own' robot arm, and only let it observe the box, that's what it controls. It has no more understanding that this box labelled 'AIXI' is the output of what it controls than it has about the predictor's output.
It utterly lacks this primate confusion over something 'else' being the predictor. The predictor is representable in only one way, and that's an extra counterfactual insertion of actions into the model.
You need to notice and justify changing the subject.
If I was to follow your line of reasoning, then CDT also one-boxes on Newcomb's problem, because CDT can also just believe that its action causes the prediction. That goes against the whole point of the Newcomb setup - the idea is that the agent is given sufficient evidence to conclude, with a high degree of confidence, that the contents of the boxes are already determined before it chooses whether to one-box or two-box.
AIXI doesn't assume that the causality is made up of a "physical sequence of dominoes falling", but that doesn't really matter. We've stated as part of the problem setup that Newcomb's problem does, in fact, work that way, and a setup where Omega changes the contents of the boxes in advance, rather than doing it after the fact via some kind of magic, is obviously far simpler, and hence far more probable given a Solomonoff prior.
As for the predictor, it doesn't need to run the full AIXI equation in order to make a good prediction. It just needs to conclude that due to the evidence AIXI will assign high probability to the obviously simpler, non-magical explanation, and hence AIXI will conclude that the contents of the box are predetermined, and hence AIXI will two-box.
There is no need for Omega to actually compute the (uncomputable) AIXI equation. It could simply take the simple chain of reasoning that I've outlined above. Moreover, it would be trivially easy for AIXI to follow Omega's chain of reasoning, and hence predict (correctly) that the box is, in fact, empty, and walk away with only $1000.
Again, folk physics. You make your action available to your world model at time t, where t is when you take that action. You propagate the difference your action makes (to avoid re-evaluating everything). So you need back-in-time magic.
Let's look at the equation here: http://www.hutter1.net/ai/uaibook.htm . You have a world model that starts at some arbitrary point well in the past (e.g. the Big Bang), which proceeds from that past into the present, and which takes the list of past actions and the current potential action as an input. The action is available to the world model from its very beginning. When evaluating the potential action 'take 1 box', the model has money in the first box; when evaluating the potential action 'take 2 boxes', the model doesn't have money in the first box, and it doesn't do any fancy reasoning about the relation between those models and how they can and can't differ. It just doesn't perform the time-saving optimization of 'let the first box's content be x; if I take 2 boxes, I get x+1000 > x'.
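The contrast drawn here can be sketched in a few lines. The first evaluator reruns a toy world model with the candidate action as an input from the start, so the predictor inside the model sees that action; the second fixes the box contents x first and compares x + 1000 with x. The toy world model and the assumption that it contains an accurate predictor are illustrative simplifications, not Hutter's actual formalism.

```python
BIG, SMALL = 1_000_000, 1_000


def world_model(action):
    # The candidate action is an input to the model from the very beginning,
    # so the (assumed accurate) predictor inside it sees the same action.
    predicted = action
    box_b = BIG if predicted == "one-box" else 0
    return box_b + (SMALL if action == "two-box" else 0)


def full_replay_choice():
    # Evaluate each action by running the whole model with that action.
    return max(("one-box", "two-box"), key=world_model)


def dominance_choice(x):
    # The shortcut: hold the box contents x fixed, then compare x + 1000 with x.
    return "two-box" if x + SMALL > x else "one-box"
```

The full replay picks one-boxing, while the dominance shortcut picks two-boxing for any fixed x, which is the optimization the comment says this kind of world model never performs.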
Why would they do that?
CDT two-boxes because CDT simply fails to understand that the content of the box is influenced by its decision. It deliberately uses an incorrect epistemic model.
So when the agent two-boxes and obtains a reward different from what it had predicted, it will simply think it has been lied to, or, if it is one hundred percent certain that the model was correct, it will experience a logical contradiction, halt and catch fire.
I think with sufficiently sophisticated models essentially all of the decision theories should collapse to recommending the correct answer. But our models are often not sufficiently sophisticated (and if our environment includes agents of comparable or greater complexity it may be that they can't be). Having models (+ decision theories) which are usable by boundedly rational agents and tend to give good outcomes is very valuable.
To my mind this post has presented a good case that Newcomblike scenarios present CDT with issues as a practical decision-making heuristic.
Agreed, in that I've made the argument that EDT (which operates on joint probability distributions) can emulate CDT (which operates on causal graphs) by adopting a particular network structure that (at additional cost) recreates the math of causal graphs. I see the EDT vs. CDT question as basically asking "does it make more sense to use joint probability distributions or causal models?" and the answer is "causal models are a more powerful language that is more closely tuned to the problem of making decisions, so use those."
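The EDT/CDT difference can be shown on one small model. This sketch enumerates a joint distribution over a hypothetical disposition node, the action, and box B, then computes E[U | A = a] by conditioning (EDT-style) and E[U | do(A = a)] by cutting the disposition-to-action link (CDT-style). The 50% prior and 99% action fidelity are illustrative assumptions.

```python
from itertools import product

BIG, SMALL = 1_000_000, 1_000
PRIOR_ONE = 0.5   # hypothetical P(disposition = one-boxer)
FIDELITY = 0.99   # hypothetical P(action matches disposition)


def utility(action, full):
    return (BIG if full else 0) + (SMALL if action == "two-box" else 0)


def joint():
    """Yield (disposition, action, box_full, probability)."""
    for disp, act in product(("one-box", "two-box"), repeat=2):
        p_disp = PRIOR_ONE if disp == "one-box" else 1 - PRIOR_ONE
        p_act = FIDELITY if act == disp else 1 - FIDELITY
        # Box B is filled based on the disposition, not on the action itself.
        yield disp, act, disp == "one-box", p_disp * p_act


def edt_value(action):
    # Condition on the action using the joint distribution.
    rows = [(full, p) for _, act, full, p in joint() if act == action]
    total = sum(p for _, p in rows)
    return sum(p * utility(action, full) for full, p in rows) / total


def cdt_value(action):
    # do(A = action): the disposition keeps its prior; box B depends only on it.
    return PRIOR_ONE * utility(action, True) + (1 - PRIOR_ONE) * utility(action, False)
```

Both values come from the same model; the only difference is whether the disposition is updated on the action (EDT favors one-boxing, about $990,000 against $11,000) or held at its prior (CDT favors two-boxing by $1,000).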
Now, perhaps there's a way of representing the environment that's better at encoding the decision-relevant information than causal graphs, and that using this superior structure requires upgrading to 'next decision theory' instead of painfully encoding that information into causal graphs. I'm fully aware of the possibility that I'm the hapless Blub programmer here, saying "but why would you ever need to do y?", and if so I'd like to be convinced that y is actually useful.
But a part of convincing me of that, I think, is showing that the environment-belief structure used by whatever 'next decision theory' we're considering is a more powerful language than causal models, and I think the traditional decision theory comparison approach of putting forward a situation and asking how reasoners using various theories would handle it is not particularly convincing at doing that.