A solvable Newcomb-like problem - part 2 of 3
This is the second part of a three post sequence on a problem that is similar to Newcomb's problem but is posed in terms of probabilities and limited knowledge.
Part 1 - stating the problem
Part 2 - some mathematics
Part 3 - towards a solution
In game theory, a payoff matrix is a way of presenting the results of two players simultaneously picking options.
For example, in the Prisoner's Dilemma, Player A gets to choose between option A1 (Cooperate) and option A2 (Defect) while, at the same time, Player B gets to choose between option B1 (Cooperate) and option B2 (Defect). Since years spent in prison are a negative outcome, we'll write them as negative numbers:

So, if you look at the bottom right-hand corner, at the intersection of Player A defecting (A2) and Player B defecting (B2), we see that both players end up spending 4 years in prison. Whereas, looking at the bottom left, we see that if A defects and B cooperates, then Player A ends up spending 0 years in prison and Player B ends up spending 5 years in prison.
Another familiar example we can present in this form is the game Rock-Paper-Scissors.
We could write it as a zero sum game, with a win being worth 1, a tie being worth 0 and a loss being worth -1:
But it doesn't change the mathematics if we give both players 2 points each round just for playing, so that a win becomes worth 3 points, a tie becomes worth 2 points and a loss becomes worth 1 point. (Think of it as two players in a game show being rewarded by the host, rather than the players making a direct bet with each other.)

If you are Player A, and you are playing against a Player B who always chooses option B1 (Rock), then your strategy is clear. You choose option A2 (Paper) each time. Over 10 rounds, you'd expect to end up with $30 compared to B's $10.
Let's imagine a slightly more sophisticated Player B, who always picks Rock in the first round, and then for all other rounds picks whatever would beat Player A's choice in the previous round. This strategy would do well against someone who always picked the same option each round, but it is deterministic and, if we guess it correctly in advance, we can design a strategy that beats it every time (in this case, picking Paper-Rock-Scissors and then repeating from Paper). In fact, whatever strategy B comes up with, if that strategy is deterministic and we guess it in advance, we end up with $30 and B ends up with $10.
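As a sanity check, here is a minimal simulation (assuming the win = 3, tie = 2, lose = 1 reward scheme from above) of the Paper-Rock-Scissors cycle playing against a Player B who opens with Rock and thereafter plays whatever would beat A's previous move:

```python
BEATS = {"Rock": "Paper", "Paper": "Scissors", "Scissors": "Rock"}  # value beats key

def payoff(a, b):
    """Return (A's points, B's points) under the win=3, tie=2, lose=1 scheme."""
    if a == b:
        return 2, 2
    return (3, 1) if BEATS[b] == a else (1, 3)

cycle = ["Paper", "Rock", "Scissors"]
a_total = b_total = 0
a_prev = None
for round_number in range(10):
    a = cycle[round_number % 3]
    b = "Rock" if a_prev is None else BEATS[a_prev]  # B beats A's last move
    pa, pb = payoff(a, b)
    a_total += pa
    b_total += pb
    a_prev = a

print(a_total, b_total)  # 30 10: A wins all ten rounds
```

The cycle stays exactly one step ahead of B's backward-looking rule, so A collects the 3-point win every round.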
What if B has a deterministic strategy that B picked in advance and doesn't change, but we don't know at the start of the first round what it is? In theory B might have picked any of the 3-to-the-power-of-10 deterministic strategies that are distinguishable from each other over a 10 round duel but, in practice, humans tend to favour some strategies over others. So, if you know humans and the game of Rock-Paper-Scissors better than Player B does, you have a better than even chance of guessing his pattern and coming out ahead in the later rounds of the duel.
But there's a danger to that. What if you have overestimated your comparative knowledge, and Player B uses your overconfidence to lure you into thinking you've cracked B's pattern? Really B is laying a trap, encouraging you to become more predictable so that B can then work out which moves will trump yours. This works better in a game like poker, where the stakes are not the same each round, but it is still possible in Rock-Paper-Scissors, and you can imagine variants of the game where the host varies the payoff matrix, increasing the lose-tie-win rewards from 1,2,3 in the first round, to 2,4,6 in the second round, 3,6,9 in the third round, and so on.
This is why the safest strategy is not to have a deterministic strategy at all but, instead, to use a source of random bits to pick, each round, option 1 with a probability of 33%, option 2 with a probability of 33% and option 3 with a probability of 33% (modulo rounding). You might not get to take advantage of any predictability that becomes apparent in your opponent's strategy, but neither can you be fooled into becoming predictable yourself.
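For concreteness, a minimal sketch of such a randomised strategy in Python (the equal weights here are for the symmetric game; any weighting can be substituted):

```python
import random

OPTIONS = ["Rock", "Paper", "Scissors"]

def random_move(weights=(1, 1, 1)):
    """Pick a move using the given (unnormalised) probability weights."""
    return random.choices(OPTIONS, weights=weights, k=1)[0]

move = random_move()  # uniform: each option chosen with probability 1/3
```

Because each round's draw is independent of everything that happened before, no amount of observation by the opponent reveals anything about your next move.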
On a side note, this still applies even when there is only one round, because unaided humans are not as good at coming up with random bits as they think they are. Someone who has observed many first-time players will notice that, more often than not, a first-time player chooses Rock as their 'random' first move, rather than Paper or Scissors. If such a person were confident that they were playing a first-time player, they might therefore pick Paper as their own first move more frequently than not. Things soon get very Sicilian (in the sense of the duel between Westley and Vizzini in the film The Princess Bride) after that, because a yet more sophisticated player who guessed their opponent would try this could then pick Scissors. And so on ad infinitum, with ever more implausible levels of discernment being required to react on the next level up.
We can imagine a tournament set up between 100 players taken randomly from the expertise distribution of game players, each player submitting a Python program that always plays the same first move and, for each of the remaining 9 rounds, produces a move determined solely by the moves so far in that duel. The tournament organiser would then run every player's program once against the programs of each of the other 99 players, so on average each player would collect 99 x 10 x 2 = $1,980.
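A toy version of such a tournament can be sketched directly. The three entrants below are hypothetical stand-ins for submitted programs, but each follows the stated rules: a fixed first move, then a move determined solely by the moves so far in the duel:

```python
import itertools

BEATS = {"Rock": "Paper", "Paper": "Scissors", "Scissors": "Rock"}  # value beats key

def payoff(a, b):
    """Points for the pair of moves (a, b) under the win=3, tie=2, lose=1 scheme."""
    if a == b:
        return 2, 2
    return (3, 1) if BEATS[b] == a else (1, 3)

# Hypothetical entrants: each maps (own moves so far, opponent's moves so far) -> move
def always_rock(mine, theirs):
    return "Rock"

def beat_their_last(mine, theirs):
    return "Rock" if not theirs else BEATS[theirs[-1]]

def cycle_paper_rock_scissors(mine, theirs):
    return ["Paper", "Rock", "Scissors"][len(mine) % 3]

def duel(p1, p2, rounds=10):
    h1, h2, s1, s2 = [], [], 0, 0
    for _ in range(rounds):
        m1, m2 = p1(h1, h2), p2(h2, h1)
        a, b = payoff(m1, m2)
        s1, s2 = s1 + a, s2 + b
        h1.append(m1)
        h2.append(m2)
    return s1, s2

# Round-robin: every program plays every other program once
players = [always_rock, beat_their_last, cycle_paper_rock_scissors]
totals = {p.__name__: 0 for p in players}
for p1, p2 in itertools.combinations(players, 2):
    s1, s2 = duel(p1, p2)
    totals[p1.__name__] += s1
    totals[p2.__name__] += s2
print(totals)
```

Running this, the fixed Paper-Rock-Scissors cycle wins every round of its duel against the backward-looking strategy, illustrating the earlier point that any deterministic strategy, guessed in advance, can be beaten outright.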
We could make things more complex by allowing the programs to use, as an input, how much money their opponent has won so far during the tournament; or iterate over running the tournament several times, to give each player an 'expertise' rating which the program in the following tournament could then use. We could allow the tournament host to subtract from each player a sum of money depending upon the size of program that player submitted (and how much memory or cpu it used). We could give each player a limited ration of random bits, so when facing a player with a higher expertise rating they might splurge and make their move on all 10 rounds completely random, and when facing a player with a lower expertise they might conserve their supply by trying to 'out think' them.
There are various directions we could take this, but the one I want to look at here is what happens when you make the payoff matrix asymmetric. What happens if you make the game unfair, so that not only does one player have more at stake than the other, but the options are not evenly balanced either? For example:

You still have the circular Rock-Paper-Scissors dynamic where:
If B chose B3, then A wants most to have chosen A1
If A chose A1, then B wants most to have chosen B2
If B chose B2, then A wants most to have chosen A3
If A chose A3, then B wants most to have chosen B1
If B chose B1, then A wants most to have chosen A2
If A chose A2, then B wants most to have chosen B3
so every option wins against at least one other option, and loses against at least one other option. However, Player B is clearly now in a better position, because B wins ties, and B's wins (a 9, an 8 and a 7) tend to be larger than A's wins (a 9, a 6 and a 6).
What should Player A do? Is the optimal safe strategy still to pick each option with an equal weighting?
Well, it turns out the answer is: no, an equal weighting isn't the optimal response. Neither is just picking the same 'best' option each time. Instead, what you do is pick your 'best' option a bit more frequently than an equal weighting would suggest, but not so much that the opponent can steal away that gain by reliably choosing the specific option that trumps yours. Rather than duplicate material already well presented on the web, I will point you at two lecture courses on game theory that explain how to calculate the exact probability to assign to each option:
- "ECON 159: Game Theory", from Open Yale Courses
- "Game Theory 101: The Complete Series", by William Spaniel
You do this by using the indifference theorem to arrive at a set of linear equations, which you can then solve to arrive at a mixed equilibrium where neither player can increase their expected utility by altering the probability weightings they assign to their options.
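The asymmetric matrix itself was shown above as an image, so as a stand-in here is a sketch of turning the indifference conditions into a linear system, using the symmetric win = 3, tie = 2, lose = 1 Rock-Paper-Scissors payoffs from earlier (for which the answer is known to be the uniform weighting). Player A's weights p must make every column of B's payoff matrix yield the same expected value, and must sum to 1:

```python
import numpy as np

# Player B's payoffs from the 2-points-per-round version of the game:
# win = 3, tie = 2, lose = 1.  Rows: A plays Rock/Paper/Scissors;
# columns: B plays Rock/Paper/Scissors.
U_B = np.array([
    [2, 3, 1],   # A plays Rock
    [1, 2, 3],   # A plays Paper
    [3, 1, 2],   # A plays Scissors
], dtype=float)

def equalising_strategy(U):
    """Probability vector p over the row player's options that makes the
    column player indifferent between all columns (assumes a square U)."""
    n = U.shape[1]
    # Indifference: p . (col_j - col_0) = 0 for j = 1..n-1, plus sum(p) = 1
    rows = [U[:, j] - U[:, 0] for j in range(1, n)]
    rows.append(np.ones(U.shape[0]))
    A = np.vstack(rows)
    b = np.zeros(n)
    b[-1] = 1.0
    return np.linalg.solve(A, b)

p = equalising_strategy(U_B)
print(p)  # symmetric game, so the equalising mix is uniform: [1/3 1/3 1/3]
```

Substituting an asymmetric payoff matrix into `U_B` gives the unequal weightings described above; the same equations also need checking for a valid solution (all probabilities non-negative), which the symmetric case satisfies trivially.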
- Example of calculating the general case for a 3x3 payoff matrix
- More complex example, drawn from poker
- Summary
The TL;DR points to take away:
If you are competing in what is effectively a simultaneous option choice game, against a being who you suspect has expertise at the game equal to or higher than your own, you can nullify their advantage by picking a strategy that, each round, chooses randomly (using a weighting) between the available options.
Depending upon the details of the payoff matrix, there may be one option that it makes sense for you to pick most of the time but, unless that option is strictly better than all your other choices no matter what option your opponent picks, there is still utility to gain from occasionally picking the other options in order to keep your opponent on their toes.
Back to Part 1 - stating the problem
This is Part 2 - some mathematics
Next to Part 3 - towards a solution
A solvable Newcomb-like problem - part 1 of 3
This is the first part of a three post sequence on a problem that is similar to Newcomb's problem but is posed in terms of probabilities and limited knowledge.
Part 1 - stating the problem
Part 2 - some mathematics
Part 3 - towards a solution
Omega is an AI, living in a society of AIs, who wishes to enhance his reputation in that society for successfully predicting human actions. Given some exchange rate between money and reputation, you could think of that as a bet between him and another AI; let's call it Alpha. And since there is also a human involved, for the sake of clarity, to avoid using "you" all the time, I'm going to sometimes refer to the human using the name "Fred".
Omega tells Fred:
I'd like you to pick between two options, and I'm going to try to predict which option you're going to pick.
Option "one box" is to open only box A, and take any money inside it
Option "two box" is to open both box A and box B, and take any money inside them
but, before you pick your option, declare it, and open the box or boxes, there are three things you need to know.
Firstly, you need to know the terms of my bet with Alpha.
If Fred picks option "one box" then:
If box A contains $1,000,000 and box B contains $1,000 then Alpha pays Omega $1,000,000,000
If box A contains $0 and box B contains $1,000 then Omega pays Alpha $10,000,000,000
If anything else, then both Alpha and Omega pay Fred $1,000,000,000,000
If Fred picks option "two box" then:
If box A contains $1,000,000 and box B contains $1,000 then Omega pays Alpha $10,000,000,000
If box A contains $0 and box B contains $1,000 then Alpha pays Omega $1,000,000,000
If anything else, then both Alpha and Omega pay Fred $1,000,000,000,000
Secondly, you should know that I've already placed all the money in the boxes that I'm going to, and I can't change the contents of the boxes between now and when you do the opening, because Alpha is monitoring everything. I've already made my prediction, using a model I've constructed of your likely reactions based upon your past actions.
You can use any method you like to choose between the two options, short of contacting another AI, but be warned: if my model predicted that you'll use a method which introduces too large a random element (such as tossing a coin) then, while I may lose my bet with Alpha, I'll certainly have made sure you won't win the $1,000,000. Similarly, if my model predicted that you'd make an outside bet with another human (let's call him George) to alter the value to you of winning $1,001,000 from me, I'd have also taken that into account. (I say "human", by the way, because my bet with Alpha is about my ability to predict humans, so if you contact another AI, such as trying to lay a side bet with Alpha to skim some of his winnings, that invalidates not only my game with you, but also my bet with Alpha, and there are no winnings to skim.)
And, third and finally, you need to know my track record in previous similar situations.
I've played this game 3,924 times over the past 100 years (i.e. since the game started), with humans picked at random from the full variety of the population. The outcomes were:
3000 times players picked option "one box" and walked away with $1,000,000
900 times players picked option "two box" and walked away with $1,000
24 times players flipped a coin or were otherwise too random. Of those players:
12 players picked option "one box" and walked away with $0
12 players picked option "two box" and walked away with $1,000
Never has anyone ever ended up walking away with $1,001,000 by picking option "two box".
Omega stops talking. You are standing in a room containing two boxes, labelled "A" and "B", which are both currently closed. Everything Omega said matches what you expected him to say, as the conditions of the game are always the same and are well known - you've talked with other human players (who confirmed it is legit) and listened to their advice. You've not contacted any AIs, though you have read the published statement from Alpha that also confirms the terms of the bet and details of the monitoring. You've not made any bets with other humans, even though your dad did offer to bet you a bottle of whiskey that you'd be one of them too smart alecky fools who walked away with only $1,000. You responded by pre-committing to keep any winnings you make between you and your banker, and to never let him know.
The only relevant physical object you've brought along is a radioactive-decay-based random number generator, whose result Omega would have been unable to predict in advance, just in case you decide to use it as a factor in your choice. It isn't just a coin, giving only a 50% chance of "one box" and a 50% chance of "two box": you can set arbitrary odds (tell it to generate a random integer between 0 and any positive integer you give it, up to 10 to the power of 100). Omega said in his spiel the phrase "too large a random element" but didn't specify where that boundary was.
What do you do? Or, given that such a situation doesn't exist yet, and we're talking about a Fred in a possible future, what advice would you give to Fred on how to choose, were he to ever end up in such a situation?
Pick "one box"? Pick "two box"? Or pick randomly between those two choices and, if so, at what odds?
And why?
Part 1 - stating the problem
next Part 2 - some mathematics
Part 3 - towards a solution
How minimal is our intelligence?
Gwern suggested that, if it were possible for civilization to have developed when our species had a lower IQ, then we'd still be dealing with the same problems, but we'd have a lower IQ with which to tackle them. Or, to put it another way, it is unsurprising that living in a civilization has posed problems that our species finds difficult to tackle, because if we were capable of solving such problems easily, we'd probably also have been capable of developing civilization earlier than we did.
How true is that?
In this post I plan to look in detail at the origins of civilization with an eye to considering how much the timing of it did depend directly upon the IQ of our species, rather than upon other factors.
Although we don't have precise IQ test numbers for our immediate ancestral species, the fossil record is good enough to give us a clear idea of how brain size has changed over time:

and we do have archaeological evidence of approximately when various technologies (such as pictograms, or using fire to cook meat) became common.
Conformity
A rather good 10 minute YouTube video presenting the results of several papers relevant to how conformity affects our thinking:
http://www.youtube.com/watch?v=TrNIuFrso8I
The papers mentioned are:
Sherif, M. (1935). A study of some social factors in perception. Archives of Psychology, 27(187), pp.17-22.
Asch, S.E. (1951). Effects of group pressure upon the modification and distortion of judgment. In H. Guetzkow (ed.), Groups, leadership and men. Pittsburgh, PA: Carnegie Press.
Asch, S.E. (1955). Opinions and social pressure. Scientific American, 193(5), pp.31-35.
Berns, G.S., Chappelow, J., Zink, C.F., Pagnoni, G., Martin-Skurski, M.E., & Richards, J. (2005). Neurobiological correlates of social conformity and independence during mental rotation. Biological Psychiatry, 58(3), pp.245-253.
Weaver, K., Garcia, S.M., Schwarz, N., & Miller, D.T. (2007). Inferring the popularity of an opinion from its familiarity: A repetitive voice can sound like a chorus. Journal of Personality and Social Psychology, 92(5), pp.821-833.
What techniques do other posters, here on LessWrong, use to monitor and counter these effects in their lives?
The video also lists some of the advantages to a society of having a certain amount of this effect in place. Does anyone here conform too little?
Meetup : Cambridge UK Weekly Meeting
Discussion article for the meetup : Cambridge UK Weekly Meeting
See: http://wiki.lesswrong.com/wiki/Less_Wrong_meetup_groups#Cambridge.2C_UK
(It is at 11 am, local time.)
Discussion article for the meetup : Cambridge UK Weekly Meeting
[Book Review] "The Signal and the Noise: Why So Many Predictions Fail—But Some Don’t.", by Nate Silver
Here's a link to a review, by The Economist, of a book about prediction, some of the common ways in which people make mistakes and some of the methods by which they could improve:
Looking ahead : How to look ahead—and get it right
One paragraph from that review:
A guiding light for Mr Silver is Thomas Bayes, an 18th-century English churchman and pioneer of probability theory. Uncertainty and subjectivity are inevitable, says Mr Silver. People should not get hung up on this, and instead think about the future the way gamblers do: “as speckles of probability”. In one surprising chapter, poker, a game from which Mr Silver once earned a living, emerges as a powerful teacher of the virtues of humility and patience.
How to deal with someone in a LessWrong meeting being creepy
One of the lessons highlighted in the thread "Less Wrong NYC: Case Study of a Successful Rationalist Chapter" is Gender ratio matters.
There have recently been a number of articles addressing one social skills issue that might be affecting this, written from the perspective of a geeky/science-fiction community with similar attributes to LessWrong. I want to link to these not just so the people potentially causing problems get to read them, but also so everyone else knows the resource is there and has a name for the problem, which may facilitate wider discussion and make it easier for others to know when to point those who would benefit towards these resources.
However, before I do, in the light of RedRobot's comment in the "Of Gender and Rationality" thread, I'd like to echo a sentiment from one of the articles: people exhibiting this behaviour may be of any gender and may victimise people of any gender. And so, while it may be correlated with a particular gender, it is the behaviour that should be focused upon, and turning this thread into bashing of one gender (or defensiveness against perceived bashing) would be unhelpful.
Ok, disclaimers out of the way, here are the links:
- An Incomplete Guide to Not Creeping
- Don’t Be A Creeper
- How to not be creepy
- My friend group has a case of the Creepy Dude. How do we clear that up?
- The C-Word
Some of those raise deeper issues about rape culture and audience as enabler, but the TLDR summary is:
- Creepy behaviour is behaviour that tends to make others feel unsafe or uncomfortable.
- If a significant fraction of a group find your behaviour creepy, the responsibility to change the behaviour is yours.
- There are specific objective behaviours listed in the articles (for example, to do with touching, sexual jokes and following people) that even someone 'bad' at social skills can learn to avoid doing.
- If someone is informed that their behaviour is creeping people out, and yet they don't take steps to avoid doing these behaviours, that is a serious problem for the group as a whole, and it needs to be treated seriously and be seen to be treated seriously, especially by the 'audience' who are not being victimised directly.
EDITED TO ADD:
Despite the way some of the links are framed as being addressed to creepers, this post is aimed at least as much at the community as a whole. It is intended to trigger a discussion on how the community should best go about handling such a problem once identified, with the TLDR being "a set of restraints to place on someone who is burning the commons", rather than a complete description that guarantees that anyone who doesn't meet it isn't creepy. (Thank you to jsteinhardt for clearly verbalising the misinterpretation - for discussion see his reply to this post.)
Meetup : Punt Trip
Discussion article for the meetup : Punt Trip
Come join the Cambridge LessWrong group for a free punt trip up the beautiful river Cam. A Smörgåsbord of discussion, fun, food and learning to punt.
Meet at the Great Gate of Trinity College, CB2 1TQ, at 11:50 for a 12 noon departure. For those coming in by car, the nearest multi-storey car park to Trinity is on Park Street, CB5 8AS.
There's no need to book, but it will help us judge numbers if you send an email to cambridgelesswrong@googlegroups.com if you think there's more than a 25% chance that you'll be coming.
Discussion article for the meetup : Punt Trip
Global Workspace Theory
Much research has been done on visual perception. Humans have the illusion that they are directly aware of everything in their 'field of view', but it turns out that they actually navigate not through reality, but through a model of reality that their brain stitches together, mainly from the bits the eye is directly looking at as it darts about, with the rest supplied by interpolation based upon expectations.
For more info, read The Illusion of Continuity: Active Perception and the Classical Editing System, by Berliner and Cohen.
Global Workspace Theory is the idea that our awareness of our own thought process works the same way. We have the illusion of an unbroken stream of consciousness, but what we're actually referencing is a model of what the brain thinks it has been consciously thinking about, that is stitched together from brief fragments, the way a spotlight in a theatre might move about shining on different parts of a stage, revealing actors making speeches and interacting with each other. Even when the spotlight moves on, the actors, stage hands and directors remain and keep working. When the spotlight returns, to catch a later part of the drama in that area, we interpolate what the actors would have been doing while we were paying attention elsewhere.
Meetup Formats
[ This post is to do with Cambridge_UK meetups, and is probably of no interest to others. ]
At the meetup on 29th April, 2012, it was suggested that members of the meetup post a variety of meetup formats, so we can try them out systematically, and compare format to outcome. Because outcome depends upon not only format, but also the specific participants and the topic under discussion, it may take using a format more than once (or possibly having it used by a different group) to get a reasonable idea of how reliably a format contributes towards producing positive or negative outcomes.
I volunteered to kick the thread off with a couple of suggestions. I hope people will add other proposals in the comments. Deciding which formats to try when, and matching them to suitable topics, is probably best left to the cambridge mailing list.
Numbers seem to vary between 4 and 12 participants, but feel free to propose formats that don't handle that entire range. If too few people turn up, we can postpone a format to another time (or take it as a vote upon the popularity of the format, if the format for that meeting was announced in advance).
The timeslot is 11am to 12:30, but people should also feel free to propose formats for shorter periods of time.
Proposed Format : small group discussions
Required equipment : A4 paper, pens, countdown timer
Required time : 90 minutes
11:00 start a 10 minutes countdown timer
People arrive, chat, say what is on their mind and, most importantly, write down on paper (one topic per sheet) things they would be interested in discussing. Once the timer goes off, topic sheets may not be altered or created.
11:10 timer goes off
Spread the sheets around the room with a proposer by each sheet (any sheets for which no proposer volunteers get discarded at this stage). Read each topic out aloud, with no discussion/clarification/objections, clockwise from the position of the timer.
Set a one minute countdown. People stand in a queue behind the proposer of the topic they most want to discuss. If a topic gets 5 or more people, the front 5 take a table and start talking. Set the timer for one minute again, and re-form behind the remaining available topics. On any turn where no topic reaches 5 people, junk the smallest topic that has fewer than 3 people. Repeat until the room is divided into groups of 3-5 people discussing different topics. This process should require no discussion, nor any talking beyond asking for reminders of the wording of a topic. When a table sits to talk, they are free to interpret the written topic how they like, or even wander completely off it.
11:55 timer goes off to remind people to take a 5 minute pause to put down feedback
On the back of the A4 topic discussion sheet (or on a pre-printed sheet, if anyone is that organised), have a column for the categories:
- I learned something I think will improve my own ability to think rationally
- I think the discussion came up with something that could usefully be posted to LessWrong
- There is an action I commit to taking
And have a row for each participant to put a tick or cross in each column.
12:00 discussion groups either continue, or break up, move about, etc. - unstructured time.
12:20 timer goes off for the last time
People join back into a single discussion. One person from each initial group gives a 1 minute summary saying what the topic was and if the participants generally felt it helped rationality either personally and/or generally. (The aim behind this is that other meeting groups, or even later meetings of the same group with different participants, may want to copy topics that worked well.) Circulate a commit sheet with columns "WHO", "WHEN" and "WHAT", so people can list actions they plan to take (and when they will take them by).
12:30 meeting ends - adjourn for breakfast
Proposed Format : skill focus
Agree online, at least 2 weeks in advance, a particular skill to focus upon, that helps achieve rational thinking (perhaps linked to a specific cognitive bias), and agree a volunteer who will kick-start the meeting.
The volunteer picks 10 minutes worth of material (eg a sequence entry) for everyone to have read, and the week before circulates it to the cambridge mailing list and prints out copies to take to the previous meeting for those who don't read the list, so everyone will know the format and what they're getting into if they turn up.
11:00 people arrive, read the material if they have not already done so, and the agenda for the meeting.
11:08 everyone participating goes upstairs to the separate space, anyone who doesn't want to (or arrives late) stays downstairs.
11:10 starter activity, a game or quiz of some sort (eg 3 rounds of prisoner's dilemma, or one of the economic gambling probability decision things - whatever fits the theme and gets people moving and participating).
The rest of the time as specified by the volunteer for this particular skill focus, but probably including a general section (understanding the problem), a training section (practicing the skill), a 'share our real life experiences of this with each other' section, and a 'now apply the skill to my own life and commit to a plan' section.
Proposed Format : competitive planning or estimating
Planning version:
Go around the circle numbering off 1, 2, 1, 2, etc. to form two random groups.
Each group has 10 minutes to discuss how best to split 60 minutes up in order to come up with the best plan in that time.
Each group then spends 60 minutes planning something. The something might be "a summer punt trip", it might be "a freshers fair stall", it might be "a spreadsheet that people can use to make a rational calculation of whether it is worth their while spending time reading LessWrong." - the thing is decided in advance, and the same thing is planned by both teams.
In the final 10 minutes, everyone joins back together again, spends a few minutes presenting their designs, then discussing how well they actually spent the 60 minutes, and how they'd split it differently if they were doing the same thing again.
Estimate version: (equipment needed - pack of cards, or similar)
Split into two groups, as above, plus one person who will set three challenges. The challenge setter goes off to look up some facts (eg the number of tons of wheat grown by China in 2010), while each team spends 10 minutes discussing how they will make an estimate.
Cards are then dealt out - the person with the queen of hearts is the defector who is secretly working for the opposing team.
Each group then gets 20 minutes to make the best estimate they can PLUS a sum of 'money' they are wagering on being closer to the true answer than the other team. The team must wager a total of 100 'money' spread over the three challenges. The estimate and the bet are written down and revealed simultaneously.
After the end of the three challenges, reveal the two defectors, who move over to stand with their true group, then calculate which team won the most with their wagers.