Newcomblike problems occur whenever knowledge about what decision you will make leaks into the environment. The knowledge doesn't have to be 100% accurate, it just has to be correlated with your eventual actual action.
This is far too general. The way in which information is leaking into the environment is what separates Newcomb's problem from the smoking lesion problem. For your argument to work you need to argue that whatever signals are being picked up on would change if the subject changed their disposition, not merely that these signals are correlated with the disposition.
Yay!
I realize that "yay!" Isn't really much of a comment, but I was waiting for this and now it's here. The poster has made the world a happier place.
Thanks, this was one of the more insightful things I remember reading about decision theory.
You've argued that many human situations are somewhat Newcomblike. Do we have a decision theory which deals cleanly with this continuum? (where the continuum is expressed for instance via the degree of correlation between your action and the other player's action)
Fantastic post, I think this is right on the money.
Many more Newcomblike scenarios simply don't feel like decision problems: people present ideas to us in specific ways (depending upon their model of how we make choices) and most of us don't fret about how others would have presented us with different opportunities if we had acted in different ways.
I think this is a big deal. Part of the problem is that the decision point (if there was anything so firm) is often quite temporally distant from the point at which the payoff happens. The time when you ...
Yes, thank you for writing this- I've been meaning to write something like it for a while and now I don't need to! I initially brushed Newcomb's Paradox off as an edge case and it took me much longer than I would have liked to realize how universal it was. A discussion of this type should be included with every introduction to the problem to prevent people from treating it as just some pointless philosophical thought experiment.
More substantively, can we express mathematically how the correlation between leaked signal and final choice affects the degree of suboptimality in final payouts?
Naively, in the actual Newcomb's problem, if Omega is only correct 1/999,000+epsilon percent of the time, then CDT seems to do about as well as whatever theory solves this problem. Is there a known general case for this reasoning?
Has anyone written at length about the evolution of cooperation in humans in this kind of Newcomblike context? I know there's been oceans of ink spent from IPD perspectives, but what about from the acausal angle?
Thanks for doing this series!
I thought you were going to say that humans play Newcomb-like games with themselves, where a "disordered soul" doesn't bargain with itself properly. :)
I think this article has some truth in it, but that it also overstates its case. It seems that only in certain cases will your demeanour at the time you are read correlate with your final decision. Like, let's suppose you arrive in town for an imperfect Parfit's Hitchhiker and someone gives you an argument for not paying that hadn't occurred to you before. Then it seems like you should be able to defect on the basis of this argument, without affecting the facial reading at the time. Of course, it isn't quite this simple...
The palm example is a bit confusing. Palms don't really tell the future: There is no direct causal link (unless someone listens to a palm-reader!). It would be much better if you gave a different example of confusing causality and correlation where there really was correlation.
It reminds me of Eliezer's example of the machine learning system that seemed to be finding camouflaged tanks, but in fact was confused by the sunniness of the different sets of photos: As far as I can tell that never happened.
Then there is the decision theory example of Solomon wan...
I'd suggest not using palm reading as an example, since palm lines really do not affect the future (unless you believe they do).
Likewise, the example of a machine learning system misanalyzing sunlight patterns as tanks is apparently fake.
The Solomon and Bathsheba problem looks more like a discussion of an Oedipal complex at first. What a mess!
Eliezer had the good sense to transform the Smoking Lesion into a Chewing Gum Lesion, since smoking does in fact cause cancer. But chewing gum doesn't cause lesions. Alex Altair's example of toxoplasmosis was at least plau...
I agree that an intelligent agent who deals with other intelligent agents should think in a way that makes reasoning about 'dispositions' and 'reputations' easy, because it's going to be doing it a lot.
But it's unclear to me that this requires a change to decision theory, instead of just a sophisticated model of what the agent's environment looks like that's tuned to thinking about dispositions and reputations. I think that an agent that realizes that the game keeps going on, and that its actions result in both immediate rewards and delayed shifts to...
I don't understand this yet, which isn't too surprising since I haven't read the background posts yet. However, all the "roughly speaking" summaries of the more exact stuff are enough to show me that this article is talking about something I'm curious about, so I'll be reading in more detail later probably.
Great lecture and article. This cleared up a lot of things for me. One thing I don't understand. You describe how an adversary can "go back in time" by simulating an earlier stage of an agent which started as CDT and self-modified to an improved decision theory, and so force the agent not to self-modify in that way.
You said that if the CDT agent would modify to be unblackmailable, the adversary could simulate an earlier version of that agent (the CDT version) and force it not to modify to be unblackmailable.
This reminds me of another case: As has ...
As a person who did not study decision theories specifically, I desperately need more and clearer examples of BetterDT agents predictably outperforming CDT agents.
This is crossposted from my blog. In this post, I discuss how Newcomblike situations are common among humans in the real world. The intended audience of my blog is wider than the readership of LW, so the tone might seem a bit off. Nevertheless, the points made here are likely new to many.
1
Last time we looked at Newcomblike problems, which cause trouble for Causal Decision Theory (CDT), the standard decision theory used in economics, statistics, narrow AI, and many other academic fields.
These Newcomblike problems may seem like strange edge case scenarios. In the Token Trade, a deterministic agent faces a perfect copy of themself, guaranteed to take the same action as they do. In Newcomb's original problem there is a perfect predictor Ω which knows exactly what the agent will do.
Both of these examples involve some form of "mind-reading" and assume that the agent can be perfectly copied or perfectly predicted. In a chaotic universe, these scenarios may seem unrealistic and even downright crazy. What does it matter that CDT fails when there are perfect mind-readers? There aren't perfect mind-readers. Why do we care?
The reason that we care is this: Newcomblike problems are the norm. Most problems that humans face in real life are "Newcomblike".
These problems aren't limited to the domain of perfect mind-readers; rather, problems with perfect mind-readers are the domain where these problems are easiest to see. However, they arise naturally whenever an agent is in a situation where others have knowledge about its decision process via some mechanism that is not under its direct control.
2
Consider a CDT agent in a mirror token trade.
It knows that it and the opponent are generated from the same template, but it also knows that the opponent is causally distinct from it by the time it makes its choice. So it argues: "My choice can't causally affect my opponent's choice. No matter what they do, I'm better off keeping my token." It keeps its token, its opponent (reasoning identically) keeps theirs, and both miss out on the higher payoff of a mutual trade.
It has failed, here, to notice that it can't choose separately from "agents spawned from my template" because it is spawned from its template. (That's not to say that it doesn't get to choose what to do. Rather, it has to be able to reason about the fact that whatever it chooses, its opponent will choose the same.)
The reasoning flaw here is an inability to reason as if past information has given others veridical knowledge about what the agent will choose. This failure is particularly vivid in the mirror token trade, where the opponent is guaranteed to do exactly the same thing as the agent. However, the failure occurs even if the veridical knowledge is partial or imperfect.
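To see the trap concretely in the perfect-copy case, here is a minimal sketch in Python; the dollar values are illustrative assumptions, not numbers from the earlier post.

```python
# A minimal sketch of the mirror token trade (the dollar values are
# illustrative assumptions): a kept token is worth $1 to its holder,
# a given token is worth $2 to the receiver.

def payoff(my_action, their_action):
    """My payoff, where each action is either 'keep' or 'give'."""
    mine = 1 if my_action == "keep" else 0      # value of my own token if I keep it
    gift = 2 if their_action == "give" else 0   # value of their token if they give it to me
    return mine + gift

# CDT-style dominance reasoning: hold the opponent's action fixed and compare
# my two options. Keeping is strictly better against either fixed action.
for theirs in ("give", "keep"):
    assert payoff("keep", theirs) > payoff("give", theirs)

# But in a *mirror* trade the opponent's action is not an independent variable:
# whatever policy I settle on, my copy settles on too, so only the diagonal
# outcomes are reachable.
print("both keep:", payoff("keep", "keep"))  # $1 each -- what the dominance reasoner gets
print("both give:", payoff("give", "give"))  # $2 each -- what a template-aware reasoner gets
```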
3
Humans trade partial, veridical, uncontrollable information about their decision procedures all the time.
Humans automatically form first impressions of other humans almost instantaneously upon meeting them (sometimes before the person speaks, and possibly just from still images).
We read each other's microexpressions, which are generally uncontrollable sources of information about our emotions.
As humans, we have an impressive array of social machinery available to us that gives us gut-level, subconscious impressions of how trustworthy other people are.
Many social situations follow this pattern, and this pattern is a Newcomblike one.
All these tools can be fooled, of course. First impressions are often wrong. Con-men often seem trustworthy, and honest shy people can seem unworthy of trust. However, all of this social data is at least correlated with the truth, and that's all we need to give CDT trouble. Remember, CDT assumes that all nodes which are causally disconnected from it are logically disconnected from it: but if someone else gained information that correlates with how you actually are going to act in the future, then your interactions with them may be Newcomblike.
In fact, humans have a natural tendency to avoid "non-Newcomblike" scenarios. Human social structures use complex reputation systems. Humans seldom make big choices among themselves (who to hire, whether to become roommates, whether to make a business deal) before "getting to know each other". We automatically build complex social models detailing how we think our friends, family, and co-workers make decisions.
When I worked at Google, I'd occasionally need to convince half a dozen team leads to sign off on a given project. In order to do this, I'd meet with each of them in person and pitch the project slightly differently, according to my model of what parts of the project most appealed to them. I was basing my actions off of how I expected them to make decisions: I was putting them in Newcomblike scenarios.
We constantly leak information about how we make decisions, and others constantly use this information. Human decision situations are Newcomblike by default! It's the non-Newcomblike problems that are simplifications and edge cases.
Newcomblike problems occur whenever knowledge about what decision you will make leaks into the environment. The knowledge doesn't have to be 100% accurate, it just has to be correlated with your eventual actual action (in such a way that if you were going to take a different action, then you would have leaked different information). When this information is available, and others use it to make their decisions, others put you into a Newcomblike scenario.
Information about what we're going to do is frequently leaking into the environment, via unconscious signaling and uncontrolled facial expressions or even just by habit — anyone following a simple routine is likely to act predictably.
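As a toy illustration of how partial leakage is enough, here is a sketch in which a counterpart extends trust only when a noisy read of your policy says you will cooperate. The setup and payoffs are assumptions for illustration, not anything from this post.

```python
# A toy model of partially leaked decision-procedure information (payoffs and
# setup are assumed for illustration). A counterpart reads a noisy signal of
# your policy and extends trust only if the signal says "cooperate". If
# trusted, cooperating pays you 10 and defecting pays you 15; if not trusted,
# you get 0 either way.

def expected_payoff(policy, signal_accuracy):
    """Expected payoff of committing to `policy`, given that the counterpart's
    read of your policy is correct with probability `signal_accuracy`."""
    p_read_as_cooperator = (
        signal_accuracy if policy == "cooperate" else 1 - signal_accuracy
    )
    payoff_if_trusted = 10 if policy == "cooperate" else 15
    return p_read_as_cooperator * payoff_if_trusted  # the untrusted branch pays 0

for q in (0.5, 0.6, 0.7, 0.9):
    print(q, expected_payoff("cooperate", q), expected_payoff("defect", q))
# At q = 0.5 the leak carries no information and the defector does better
# (7.5 vs 5.0). Once the signal accuracy rises above 0.6, the cooperative
# disposition earns more -- no perfect mind-reader required.
```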
4
Most real decisions that humans face are Newcomblike whenever other humans are involved. People are automatically reading unconscious or unintentional signals and using these to build models of how you make choices, and they're using those models to make their choices. These are precisely the sorts of scenarios that CDT cannot represent.
Of course, that's not to say that humans fail drastically on these problems. We don't: we repeatedly do well in these scenarios.
Some real-life Newcomblike scenarios simply aren't games where CDT has trouble: there are many situations where others in the environment have knowledge about how you make decisions and are using that knowledge, but in a way that does not affect your payoffs enough to matter.
Many more Newcomblike scenarios simply don't feel like decision problems: people present ideas to us in specific ways (depending upon their model of how we make choices) and most of us don't fret about how others would have presented us with different opportunities if we had acted in different ways.
And in Newcomblike scenarios that do feel like decision problems, humans use a wide array of other tools in order to succeed.
Roughly speaking, CDT fails when it gets stuck in the trap of "no matter what I signaled I should do [something mean]", which results in CDT sending off a "mean" signal and missing opportunities for higher payoffs. By contrast, humans tend to avoid this trap via other means: we place value on things like "niceness" for reputational reasons, we have intrinsic senses of "honor" and "fairness" which alter the payoffs of the game, and so on.
This machinery was not necessarily "designed" for Newcomblike situations. Reputation systems and senses of honor are commonly attributed to humans facing repeated scenarios (thanks to living in small tribes) in the ancestral environment, and it's possible to argue that CDT handles repeated Newcomblike situations well enough. (I disagree somewhat, but this is an argument for another day.)
Nevertheless, the machinery that allows us to handle repeated Newcomblike problems often seems to work in one-shot Newcomblike problems. Regardless of where the machinery came from, it still allows us to succeed in Newcomblike scenarios that we face in day-to-day life.
The fact that humans easily succeed, often via tools developed for repeated situations, doesn't change the fact that many of our day-to-day interactions have Newcomblike characteristics. Whenever an agent leaks information about their decision procedure on a communication channel that they do not control (facial microexpressions, posture, cadence of voice, etc.), that agent is inviting others to put them in Newcomblike settings.
5
Most of the time, humans are pretty good at handling naturally arising Newcomblike problems. Sometimes, though, the fact that you're in a Newcomblike scenario does matter.
The games of Poker and Diplomacy are both centered around people controlling information channels that humans can't normally control. These games give particularly crisp examples of humans wrestling with situations where the environment contains leaked information about their decision-making procedure.
These are only games, yes, but I'm sure that any highly ranked Poker player will tell you that the lessons of Poker extend far beyond the game board. Similarly, I expect that highly ranked Diplomacy players will tell you that Diplomacy teaches you many lessons about how people broadcast the decisions that they're going to make, and that these lessons are invaluable in everyday life.
I am not a professional negotiator, but I further imagine that top-tier negotiators expend significant effort exploring how their mindsets are tied to their unconscious signals.
On a more personal scale, some very simple scenarios (like whether you can get let into a farmhouse on a rainy night after your car breaks down) are somewhat "Newcomblike".
I know at least two people who are unreliable and untrustworthy, and who blame the fact that they can't hold down jobs (and that nobody cuts them any slack) on bad luck rather than on their own demeanors. Both consistently believe that they are taking the best available action whenever they act unreliable and untrustworthy. Both brush off the idea of "becoming a sucker". Neither of them is capable of acting unreliable while signaling reliability. Both of them would benefit from actually becoming trustworthy.
Now, of course, people can't suddenly "become reliable", and akrasia is a formidable enemy to people stuck in these negative feedback loops. But nevertheless, you can see how this problem has a hint of Newcomblikeness to it.
In fact, recommendations of this form — "You can't signal trustworthiness unless you're trustworthy" — are common. As an extremely simple example, let's consider a shy candidate going into a job interview. The candidate's demeanor (confident or shy) will determine the interviewer's predisposition towards or against the candidate. During the interview, the candidate may act either bold or timid. Then the interviewer decides whether or not to hire the candidate. If the candidate is confident, then they will get the job (worth $100,000) regardless of whether they are bold or timid. If they are shy and timid, then they will not get the job ($0). If, however, they are shy and bold, then they will get laughed at, which is worth -$10. Finally, though, a person who knows they are going to be timid will have a shy demeanor, whereas a person who knows they are going to be bold will have a confident demeanor.
It may seem at first glance that it is better to be timid than to be bold, because timidness only affects the outcome if the interviewer is predisposed against the candidate, in which case it is better to be timid (and avoid being laughed at). However, if the candidate knows that they will reason like this (in the interview) then they will be shy before the interview, which will predispose the interviewer against them. By contrast, if the candidate precommits to being bold (in this simple setting) then they will get the job.
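Here is the same game written out as a small sketch, using the payoffs given above (the framing in terms of policies settled in advance is just an illustration).

```python
# The interview game, with the payoffs given above: a confident demeanor gets
# the job ($100,000) regardless; a shy, timid candidate gets nothing ($0); a
# shy, bold candidate gets laughed at (-$10). Demeanor follows the planned action.

def demeanor(planned_action):
    """A candidate who knows they will be bold shows up confident; timid -> shy."""
    return "confident" if planned_action == "bold" else "shy"

def payoff(candidate_demeanor, action):
    if candidate_demeanor == "confident":
        return 100_000                       # hired whether bold or timid
    return -10 if action == "bold" else 0    # shy: bold is laughed at, timid gets nothing

# Evaluating whole policies, where the action is settled in advance and the
# demeanor follows from it:
for plan in ("bold", "timid"):
    print(plan, payoff(demeanor(plan), plan))   # bold -> 100000, timid -> 0

# CDT-style reasoning *inside* the interview treats the demeanor as already fixed:
for fixed in ("confident", "shy"):
    print(fixed, "bold:", payoff(fixed, "bold"), "timid:", payoff(fixed, "timid"))
# For either fixed demeanor, timid is never worse (100000 vs 100000; 0 vs -10),
# so the causal reasoner plans to be timid -- which means walking in shy and
# landing on the $0 outcome.
```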
Someone reasoning using CDT might reason as follows when they're in the interview: "My demeanor is already fixed, and so is the interviewer's predisposition. If they're predisposed towards me, I get the job whether I'm bold or timid. If they're predisposed against me, I don't get the job either way, and being bold just gets me laughed at. So I might as well be timid."
To people who reason like this, we suggest avoiding causal reasoning during the interview.
And, in fact, there are truckloads of self-help books dishing out similar advice. You can't reliably signal trustworthiness without actually being trustworthy. You can't reliably be charismatic without actually caring about people. You can't easily signal confidence without becoming confident. Someone who cannot represent these arguments may find that many of the benefits of trustworthiness, charisma, and confidence are unavailable to them.
Compare the advice above to our analysis of CDT in the mirror token trade, where we say "You can't keep your token while the opponent gives theirs away". CDT, which can't represent this argument, finds that the high payoff is unavailable to it. The analogy is exact: CDT fails to represent precisely this sort of reasoning, and yet this sort of reasoning is common and useful among humans.
6
That's not to say that CDT can't address these problems. A CDT agent that knows it's going to face the above interview would precommit to being bold — but this would involve using something besides causal counterfactual reasoning during the actual interview. And, in fact, this is precisely one of the arguments that I'm going to make in future posts: a sufficiently intelligent artificial system using CDT to reason about its choices would self-modify to stop using CDT to reason about its choices.
We've been talking about Newcomblike problems in a very human-centric setting for this post. Next post, we'll dive into the arguments about why an artificial agent (that doesn't share our vast suite of social signaling tools, and which lacks our shared humanity) may also expect to face Newcomblike problems and would therefore self-modify to stop using CDT.
This will lead us to more interesting questions, such as "what would it use?" (spoiler: we don't quite know yet) and "would it self-modify to fix all of CDT's flaws?" (spoiler: no).