So you don't predict anything, just put nothing in the first box, and advertise this fact clearly enough for the agent making the choice.
A version of the problem in which Omega is predictable is hardly the same thing as a version of the problem in which the first box is always empty. Other algorithms get the million dollars; it's just that AIXI does not. Moreover, AIXI is not being punished simply for being AIXI; AIXI not getting the million dollars is a direct consequence of the output of the AIXI algorithm.
Newcomb's original problem did not include the clause 'by the way, there's nothing in the first box'. You're adding that clause by making additional assertions regarding what AIXI knows about "Omega".
Of course it didn't include that clause; it would be a rather stupid problem if it did include that clause. On the other hand, what is in the statement of Newcomb's problem is "By the time the game begins, and the player is called upon to choose which boxes to take, the prediction has already been made, and the contents of box B have already been determined." Moreover, it is quite clearly stated that the agent playing the game is made fully aware of this fact.
If we stipulate, for the sake of argument, that AIXI cannot work out the contents of the opaque box, AIXI still fails and two-boxes. By the problem statement AIXI should already be convinced that the contents of the boxes are predetermined. Consequently, the vast majority of weight in AIXI's distribution over world models should be held by models in which AIXI's subsequent action has no effect on the contents of the box, and so AIXI will rather straightforwardly calculate two-boxing to have higher utility. Moreover, it's easy for Omega to deduce this, and so the first box will be empty, and so AIXI gets $1000.
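A minimal sketch of the comparison described above, assuming the usual Newcomb amounts ($1,000 in the transparent box, $1,000,000 in the opaque one). The function names are hypothetical, and this is not a model of AIXI itself, only of the "contents are already fixed" expected-utility calculation.

```python
SMALL = 1_000      # transparent box, always present
BIG = 1_000_000    # opaque box, filled only if one-boxing was predicted

def fixed_contents_eu(action, p_full):
    """Expected dollars if the opaque box is full with probability p_full,
    where p_full is assumed NOT to depend on the action taken."""
    opaque = p_full * BIG
    return opaque + (SMALL if action == "two-box" else 0)

def realized_payoff(action, predicted):
    """Dollars actually received, given what the predictor predicted."""
    opaque = BIG if predicted == "one-box" else 0
    return opaque + (SMALL if action == "two-box" else 0)

# Under the fixed-contents belief, two-boxing is ahead by SMALL for every p_full.
for p in (0.0, 0.5, 1.0):
    gap = fixed_contents_eu("two-box", p) - fixed_contents_eu("one-box", p)
    print(p, gap)
```

Whatever weight is put on the box being full, the gap in favor of two-boxing is the same $1,000. A predictor that can follow this argument predicts two-boxing, so `realized_payoff("two-box", "two-box")` is what the agent walks away with.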
Setting the stipulation aside, I still think it should be pretty easy for AIXI to deduce that the box is empty. Given Omega's astounding predictive success it is far more likely that Omega has a non-trivial capacity for intelligent reasoning and uses this reasoning capacity with a goal of making accurate predictions. As such, I would be surprised if an Omega-level predictor was not able to come across the simple argument I gave above. Of course, as I said above, it doesn't really matter if AIXI can't deduce the contents of the box; AIXI two-boxes and loses either way.
There's a truly crazy amount of misunderstandings with regards to what Solomonoff Induction can learn about the world, on LW.
Let's say you run AIXI, letting it oversee some gigabytes of webcam data, at your location. You think AIXI can match the exact location of raindrops on your roof, hours in advance? You think AIXI is going to know all about me - the DNA I have, how may I construct a predictor, etc?
No, I don't think that.
AIXI not getting the million dollars is a direct consequence of the output of the AIXI algorithm.
Really? I thought your predictor didn't evaluate the algorithm, so how is that a 'direct consequence'?
By the problem statement AIXI should already be convinced that the contents of the boxes are predetermined.
Yeah, and in the Turing machine provided with the tape where the action is "choose 1 box" (the tape is provided at the very beginning), the content of the box is predetermined to have 1 million, while in the entirely different Turing machi...
This is crossposted from my blog. In this post, I discuss how Newcomblike situations are common among humans in the real world. The intended audience of my blog is wider than the readerbase of LW, so the tone might seem a bit off. Nevertheless, the points made here are likely new to many.
1
Last time we looked at Newcomblike problems, which cause trouble for Causal Decision Theory (CDT), the standard decision theory used in economics, statistics, narrow AI, and many other academic fields.
These Newcomblike problems may seem like strange edge case scenarios. In the Token Trade, a deterministic agent faces a perfect copy of themself, guaranteed to take the same action as they do. In Newcomb's original problem there is a perfect predictor Ω which knows exactly what the agent will do.
Both of these examples involve some form of "mind-reading" and assume that the agent can be perfectly copied or perfectly predicted. In a chaotic universe, these scenarios may seem unrealistic and even downright crazy. What does it matter that CDT fails when there are perfect mind-readers? There aren't perfect mind-readers. Why do we care?
The reason that we care is this: Newcomblike problems are the norm. Most problems that humans face in real life are "Newcomblike".
These problems aren't limited to the domain of perfect mind-readers; rather, problems with perfect mind-readers are the domain where these problems are easiest to see. However, they arise naturally whenever an agent is in a situation where others have knowledge about its decision process via some mechanism that is not under its direct control.
2
Consider a CDT agent in a mirror token trade.
It knows that it and the opponent are generated from the same template, but it also knows that the opponent is causally distinct from it by the time it makes its choice. So it argues
It has failed, here, to notice that it can't choose separately from "agents spawned from my template" because it is spawned from its template. (That's not to say that it doesn't get to choose what to do. Rather, it has to be able to reason about the fact that whatever it chooses, so will its opponent choose.)
The reasoning flaw here is an inability to reason as if past information has given others veridical knowledge about what the agent will choose. This failure is particularly vivid in the mirror token trade, where the opponent is guaranteed to do exactly the same thing as the agent. However, the failure occurs even if the veridical knowledge is partial or imperfect.
3
Humans trade partial, veridical, uncontrollable information about their decision procedures all the time.
Humans automatically make first impressions of other humans at first sight, almost instantaneously (sometimes before the person speaks, and possibly just from still images).
We read each other's microexpressions, which are generally uncontrollable sources of information about our emotions.
As humans, we have an impressive array of social machinery available to us that gives us gut-level, subconscious impressions of how trustworthy other people are.
Many social situations follow this pattern, and this pattern is a Newcomblike one.
All these tools can be fooled, of course. First impressions are often wrong. Con-men often seem trustworthy, and honest shy people can seem unworthy of trust. However, all of this social data is at least correlated with the truth, and that's all we need to give CDT trouble. Remember, CDT assumes that all nodes which are causally disconnected from it are logically disconnected from it: but if someone else gained information that correlates with how you actually are going to act in the future, then your interactions with them may be Newcomblike.
In fact, humans have a natural tendency to avoid "non-Newcomblike" scenarios. Human social structures use complex reputation systems. Humans seldom make big choices among themselves (who to hire, whether to become roommates, whether to make a business deal) before "getting to know each other". We automatically build complex social models detailing how we think our friends, family, and co-workers make decisions.
When I worked at Google, I'd occasionally need to convince half a dozen team leads to sign off on a given project. In order to do this, I'd meet with each of them in person and pitch the project slightly differently, according to my model of what parts of the project most appealed to them. I was basing my actions off of how I expected them to make decisions: I was putting them in Newcomblike scenarios.
We constantly leak information about how we make decisions, and others constantly use this information. Human decision situations are Newcomblike by default! It's the non-Newcomblike problems that are simplifications and edge cases.
Newcomblike problems occur whenever knowledge about what decision you will make leaks into the environment. The knowledge doesn't have to be 100% accurate; it just has to be correlated with your eventual actual action (in such a way that if you were going to take a different action, then you would have leaked different information). When this information is available, and others use it to make their decisions, others put you into a Newcomblike scenario.
Information about what we're going to do is frequently leaking into the environment, via unconscious signaling and uncontrolled facial expressions or even just by habit — anyone following a simple routine is likely to act predictably.
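To make "doesn't have to be 100% accurate" concrete, here is a rough sketch using the box amounts from Newcomb's problem ($1,000 and $1,000,000). It assumes a hypothetical predictor whose guess matches your actual action with probability `acc`, whichever action that is; the function names are illustrative. The calculation conditions on the action, which is exactly the step CDT does not take.

```python
SMALL = 1_000      # always-available payoff
BIG = 1_000_000    # payoff granted only when one-boxing was predicted

def one_box_eu(acc):
    """Expected dollars for one-boxing: BIG arrives when the prediction was right."""
    return acc * BIG

def two_box_eu(acc):
    """Expected dollars for two-boxing: BIG arrives only when the prediction was wrong."""
    return (1 - acc) * BIG + SMALL

def break_even_accuracy():
    """Accuracy at which the two options are worth the same.
    Solves acc * BIG == (1 - acc) * BIG + SMALL for acc."""
    return (BIG + SMALL) / (2 * BIG)

for acc in (0.5, 0.6, 0.9):
    print(acc, one_box_eu(acc), two_box_eu(acc))
```

With these amounts, the leaked information only needs to be slightly better than a coin flip before acting as though the prediction tracks your choice comes out ahead.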
4
Most real decisions that humans face are Newcomblike whenever other humans are involved. People are automatically reading unconscious or unintentional signals and using these to build models of how you make choices, and they're using those models to make their choices. These are precisely the sorts of scenarios that CDT cannot represent.
Of course, that's not to say that humans fail drastically on these problems. We don't: we repeatedly do well in these scenarios.
Some real life Newcomblike scenarios simply don't represent games where CDT has trouble: there are many situations where others in the environment have knowledge about how you make decisions and are using that knowledge, but in a way that does not affect your payoffs enough to matter.
Many more Newcomblike scenarios simply don't feel like decision problems: people present ideas to us in specific ways (depending upon their model of how we make choices) and most of us don't fret about how others would have presented us with different opportunities if we had acted in different ways.
And in Newcomblike scenarios that do feel like decision problems, humans use a wide array of other tools in order to succeed.
Roughly speaking, CDT fails when it gets stuck in the trap of "no matter what I signaled I should do [something mean]", which results in CDT sending off a "mean" signal and missing opportunities for higher payoffs. By contrast, humans tend to avoid this trap via other means: we place value on things like "niceness" for reputational reasons, we have intrinsic senses of "honor" and "fairness" which alter the payoffs of the game, and so on.
This machinery was not necessarily "designed" for Newcomblike situations. Reputation systems and senses of honor are commonly attributed to humans facing repeated scenarios (thanks to living in small tribes) in the ancestral environment, and it's possible to argue that CDT handles repeated Newcomblike situations well enough. (I disagree somewhat, but this is an argument for another day.)
Nevertheless, the machinery that allows us to handle repeated Newcomblike problems often seems to work in one-shot Newcomblike problems. Regardless of where the machinery came from, it still allows us to succeed in Newcomblike scenarios that we face in day-to-day life.
The fact that humans easily succeed, often via tools developed for repeated situations, doesn't change the fact that many of our day-to-day interactions have Newcomblike characteristics. Whenever an agent leaks information about their decision procedure on a communication channel that they do not control (facial microexpressions, posture, cadence of voice, etc.) that person is inviting others to put them in Newcomblike settings.
5
Most of the time, humans are pretty good at handling naturally arising Newcomblike problems. Sometimes, though, the fact that you're in a Newcomblike scenario does matter.
The games of Poker and Diplomacy are both centered around people controlling information channels that humans can't normally control. These games give particularly crisp examples of humans wrestling with situations where the environment contains leaked information about their decision-making procedure.
These are only games, yes, but I'm sure that any highly ranked Poker player will tell you that the lessons of Poker extend far beyond the game board. Similarly, I expect that highly ranked Diplomacy players will tell you that Diplomacy teaches you many lessons about how people broadcast the decisions that they're going to make, and that these lessons are invaluable in everyday life.
I am not a professional negotiator, but I further imagine that top-tier negotiators expend significant effort exploring how their mindsets are tied to their unconscious signals.
On a more personal scale, some very simple scenarios (like whether you can get let into a farmhouse on a rainy night after your car breaks down) are somewhat "Newcomblike".
I know at least two people who are unreliable and untrustworthy, and who blame the fact that they can't hold down jobs (and that nobody cuts them any slack) on bad luck rather than on their own demeanors. Both consistently believe that they are taking the best available action whenever they act unreliable and untrustworthy. Both brush off the idea of "becoming a sucker". Neither of them is capable of acting unreliable while signaling reliability. Both of them would benefit from actually becoming trustworthy.
Now, of course, people can't suddenly "become reliable", and akrasia is a formidable enemy to people stuck in these negative feedback loops. But nevertheless, you can see how this problem has a hint of Newcomblikeness to it.
In fact, recommendations of this form ("You can't signal trustworthiness unless you're trustworthy") are common. As an extremely simple example, let's consider a shy candidate going in to a job interview. The candidate's demeanor (confident or shy) will determine the interviewer's predisposition towards or against the candidate. During the interview, the candidate may act either bold or timid. Then the interviewer decides whether or not to hire the candidate.
If the candidate is confident, then they will get the job (worth $100,000) regardless of whether they are bold or timid. If they are shy and timid, then they will not get the job ($0). If, however, they are shy and bold, then they will get laughed at, which is worth -$10. Finally, though, a person who knows they are going to be timid will have a shy demeanor, whereas a person who knows they are going to be bold will have a confident demeanor.
It may seem at first glance that it is better to be timid than to be bold, because timidness only affects the outcome if the interviewer is predisposed against the candidate, in which case it is better to be timid (and avoid being laughed at). However, if the candidate knows that they will reason like this (in the interview) then they will be shy before the interview, which will predispose the interviewer against them. By contrast, if the candidate precommits to being bold (in this simple setting) then they will get the job.
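The interview game is small enough to write out in full. The sketch below uses only the payoffs and the demeanor rule stated above; the names (`PAYOFF`, `weakly_dominates`, `policy_payoff`) are hypothetical, chosen for illustration.

```python
# Payoffs from the example: (demeanor, action) -> dollars.
PAYOFF = {
    ("confident", "bold"): 100_000,
    ("confident", "timid"): 100_000,
    ("shy", "bold"): -10,
    ("shy", "timid"): 0,
}
DEMEANORS = ("confident", "shy")

# The demeanor shown before the interview gives away the action taken in it.
DEMEANOR_OF = {"bold": "confident", "timid": "shy"}

def weakly_dominates(a, b):
    """True if action a is never worse than b with the demeanor held fixed,
    and strictly better for at least one demeanor."""
    diffs = [PAYOFF[(d, a)] - PAYOFF[(d, b)] for d in DEMEANORS]
    return all(x >= 0 for x in diffs) and any(x > 0 for x in diffs)

def policy_payoff(action):
    """Payoff when the demeanor is the one that this action leaks."""
    return PAYOFF[(DEMEANOR_OF[action], action)]

print(weakly_dominates("timid", "bold"))  # the "first glance" argument
print(policy_payoff("timid"), policy_payoff("bold"))
```

Holding the demeanor fixed, timid is never worse than bold, which is the first-glance argument. Once the demeanor is tied to the action, the candidate who will be timid ends up in the shy row and gets $0, while the candidate who will be bold ends up in the confident row and gets the job.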
Someone reasoning using CDT might reason as follows when they're in the interview:
To people who reason like this, we suggest avoiding causal reasoning during the interview.
And, in fact, there are truckloads of self-help books dishing out similar advice. You can't reliably signal trustworthiness without actually being trustworthy. You can't reliably be charismatic without actually caring about people. You can't easily signal confidence without becoming confident. Someone who cannot represent these arguments may find that many of the benefits of trustworthiness, charisma, and confidence are unavailable to them.
Compare the advice above to our analysis of CDT in the mirror token trade, where we say "You can't keep your token while the opponent gives theirs away". CDT, which can't represent this argument, finds that the high payoff is unavailable to it. The analogy is exact: CDT fails to represent precisely this sort of reasoning, and yet this sort of reasoning is common and useful among humans.
6
That's not to say that CDT can't address these problems. A CDT agent that knows it's going to face the above interview would precommit to being bold — but this would involve using something besides causal counterfactual reasoning during the actual interview. And, in fact, this is precisely one of the arguments that I'm going to make in future posts: a sufficiently intelligent artificial system using CDT to reason about its choices would self-modify to stop using CDT to reason about its choices.
We've been talking about Newcomblike problems in a very human-centric setting for this post. Next post, we'll dive into the arguments about why an artificial agent (that doesn't share our vast suite of social signaling tools, and which lacks our shared humanity) may also expect to face Newcomblike problems and would therefore self-modify to stop using CDT.
This will lead us to more interesting questions, such as "what would it use?" (spoiler: we don't quite know yet) and "would it self-modify to fix all of CDT's flaws?" (spoiler: no).