Cambridge LW: Post-mortem
A brief post-mortem of this week's LW meet-up.
Topic Selection
This week we had comparatively few people, so what worked this time may not generalise. At the beginning of the meet-up Douglas suggested we use one of his meet-up formats. Of the three, skill focus was not usable without preparation, and we voted for small discussions over competitive estimating. After a second vote, we used Douglas' suggested method to decide a topic, rather than our method from last week.
This involved each of us writing down a suggested topic, revealing them all, and then, after a fixed timer expired, standing behind our preferred one with no further discussion. Using this method, we eventually agreed unanimously on first organising a LessWrong punt trip, about which an e-mail should be sent to Cambridge, East Anglia and London LessWrongians, followed by a discussion of rational office politics, specifically whether and how it is worthwhile to attempt to achieve goals through exerting influence on large organisations.
Pre-commitments
Once again all pre-commitments were kept, apparently with considerable difficulty on Adam's part. More pre-commitments were made this week, which will once again be posted here to slightly increase the social cost of breaking them.
* Ramana: Send out punting e-mail
* Ben: Post write-up of meeting and continue keeping diary
* Adam: Full note production on courses.
New prisoner's dilemma and chicken tournament
EDIT: As I received fewer than 8 entries for both tournaments, I will not be running them. My apologies to those who did enter.
After the success of prase's Iterated Prisoner's Dilemma tournament, a number of people expressed interest in some variations on the initial tournament, including one with a reputation system from one match to the next. I've been trying to teach myself to program over the past few months, so I thought I'd have a go at implementing it.
As well as the vanilla p.d. competition, I thought I'd run a second tournament based on the game of chicken, which is similar to p.d. but reverses the last two pay-offs, making it better (at least in the short term) to co-operate with a defector. This obviously creates a greater potential for bully strategies (although you can also bully in p.d.), so I thought it might have some interesting consequences with a reputation system.
Exact rules are as follows:
- Anybody who wants may enter up to one strategy in each competition (you needn't enter the same strategy in both, and you may enter just one contest if you prefer). Strategies should be entered by private message to me, and should have names. I will accept duplicates.
- Strategies should be described in English; I may need to ask for clarification. A strategy should be able to tell me whether your program co-operates or defects in any situation, and should be computable.
- There will be two tournaments, Prisoner's Dilemma and Chicken. Each tournament will divide up into matches, and each match into rounds.
- In a round, each strategy chooses whether to co-operate or defect. The pay-offs for Prisoner's Dilemma are the same as in the previous tournament: 4 each if both co-operate, 1 each if both defect; otherwise 7 for the defector and 0 for the co-operator. In Chicken the mutual-defection pay-off is replaced with a 7-point penalty; the other pay-offs are unchanged.
- In order to make its decision, each strategy is allowed to know the entire history of this match, as well as any prior matches which included either it or its opponent. It does not have access to any other matches, to its opponent's identity or to its opponent's source code.
- Each match has its length determined by a geometric distribution, with each round having a 1% chance of being the last. At the end of the tournament, scores will be adjusted to make up for some programs playing more rounds than others (specifically, your score will be multiplied by the average number of rounds played and divided by the number of rounds you played).
- The tournament will be done as a Round Robin, with matches performed serially so there will usually be previous results to judge by. If I have time I will also try an evolutionary variation.
- In addition to the strategies submitted by players, I will add an automatic co-operator, an automatic defector and a random strategy (with equal chance of co-operating or defecting on any round).
- Names of authors will be revealed by default, if you don't want yours revealed then ask.
- I have created a poll in the comments about where people want the results posted.
Deadline for entry is 21st September. If I get fewer than 8 entries I won't run it; if I get more than 30 I'll run it with the first 30, so don't delay.
EDIT: To clarify for any who were confused, each strategy has access to all of its own matches and all of its opponents matches, not just matches between itself and its opponent.
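The scoring mechanics described above can be sketched roughly as follows. The payoff tables are taken directly from the rules; the helper names and structure are my own illustration, not the actual tournament code:

```python
import random

# Payoff tables from the rules: entries are (my score, opponent's score)
# for each pair of moves, with 'C' = co-operate and 'D' = defect.
PD_PAYOFFS = {
    ('C', 'C'): (4, 4),
    ('C', 'D'): (0, 7),
    ('D', 'C'): (7, 0),
    ('D', 'D'): (1, 1),
}

# Chicken replaces the mutual-defection pay-off with a 7-point penalty.
CHICKEN_PAYOFFS = dict(PD_PAYOFFS)
CHICKEN_PAYOFFS[('D', 'D')] = (-7, -7)

def match_length(p_end=0.01, rng=random):
    """Draw a match length: each round has a 1% chance of being the last,
    giving geometrically distributed lengths with mean 100 rounds."""
    length = 1
    while rng.random() >= p_end:
        length += 1
    return length

def adjust_score(raw_score, rounds_played, average_rounds):
    """End-of-tournament correction: multiply by the average number of
    rounds played and divide by the number of rounds you played."""
    return raw_score * average_rounds / rounds_played
```

For example, a strategy that scored 100 points over 50 rounds, in a tournament averaging 100 rounds per player, would have its score adjusted to 200.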
Rationality Quotes: April 2011
You all know the rules:
- Please post all quotes separately, so that they can be voted up/down separately. (If they are strongly related, reply to your own comments. If strongly ordered, then go ahead and post them together.)
- Do not quote yourself.
- Do not quote comments/posts on LW/OB.
- No more than 5 quotes per person per monthly thread, please.
I want to learn programming
I would like to learn programming but haven't been able to get started. Advice is appreciated, both high-level ("you should try learning language X") and low-level ("you can find a program that will run language X here"); the latter has been a particular problem for me, as I don't really know how this sort of thing works.
I am currently studying maths and physics, and I have a particular talent for the former, so I would probably do well with a language that plays to that strength. My only actual experience with programming was when my father let me play around with Jython a bit when I was about 13; I had some fun calculating prime numbers and approximating pi but never got any further.
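For readers in a similar position, the prime-calculating exercise mentioned above looks something like this in Python (this is my illustration of the sort of beginner program meant, not code from the original Jython session):

```python
def is_prime(n):
    """Check primality by trial division -- slow, but simple enough
    for a first program."""
    if n < 2:
        return False
    for d in range(2, int(n ** 0.5) + 1):
        if n % d == 0:
            return False
    return True

# The primes below 30:
print([n for n in range(2, 30) if is_prime(n)])
# [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```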
Thanks in advance for all suggestions.
Freaky unfairness
In Freaky Fairness the author discusses multi-player game theory problems where all players have access to each other's source code. He gives an algorithm called Freaky Fairness, and claims that 'all players run Freaky Fairness' is both a Strong Nash Equilibrium and a Pareto Optimum. Unfortunately, as far as I can tell, this claim is false.
For reference, the algorithm of Freaky Fairness works as follows:
- Calculate the security values in mixed strategies for all subsets of players.
- Divide all other players into two groups: those whose source code is an exact copy of Freaky Fairness (friends), and everyone else (enemies).
- If there are no enemies: build a Shapley value from the computed security values of coalitions; play my part in the outcome that yields the highest total sum in the game; give up some of the result to others so that the resulting allocation agrees with the Shapley value.
- If there are enemies: play my part in the outcome that brings the total payoff of the coalition of all enemies down to their security value.
Now consider the following simple game for three players:
- Each player chooses one player to nominate.
- They can nominate themselves but they don't have to.
- If a player gets nominated by at least two players (possibly including themselves) then they win the game and receive $300.
- If every player gets nominated exactly once then nobody gets anything.
Conventional competitive game theory notes that each player has a dominant strategy of nominating themselves, and so nobody gets anything. Since the game potentially offers a $300 prize, there is room for improvement here. Let's see how Freaky Fairness does:
- The empty coalition and all coalitions with only one player have a security value of 0.
- All coalitions with at least two players have a security value of $300 (this is independent of whether you use the alpha method, the beta method or mixed strategies to calculate security values).
- The Shapley values work out at $100 to each player (this is trivial by symmetry but can be worked out the long way if you feel like it).
- If all players run Freaky Fairness then they will choose one of their number, nominate them, and that player will then share out the money evenly.
If any one player deviates then the other two will split the money between themselves to ensure that the deviator gets nothing; so far, so good. Unfortunately, if two players deviate, there is nothing the remaining Freaky Fairness player can do to stop them each walking away with $150, which is more than they get by sticking to Freaky Fairness.
Why does the proof given in the original article fail here? The proof assumes that the Shapley method gives every coalition at least their security value, which unfortunately it doesn't manage in this case. This is not a problem that can be fixed by a better method of sharing, as it should be fairly obvious that there is no way to share a quantity of money between three people such that any two of them get all of it.
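The "no such sharing exists" claim can even be checked by brute force. A quick sketch (in whole dollars, with names of my own choosing) confirming that no split of $300 among three players gives every two-player coalition its $300 security value:

```python
def core_split_exists(total=300):
    """Search for shares (a, b, c) summing to `total` such that every
    pair of players receives at least `total` between them."""
    for a in range(total + 1):
        for b in range(total + 1 - a):
            c = total - a - b
            if a + b >= total and b + c >= total and a + c >= total:
                return True
    return False

print(core_split_exists())  # False: the core of this game is empty
```

The reason is just arithmetic: if every pair's shares sum to at least $300, then summing over the three pairs gives 2(a + b + c) >= $900, i.e. a total of at least $450, but only $300 is available.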
Freaky Fairness is still a very impressive algorithm. If we define a class of 'nice games' as games where there exists a way of sharing out the winnings such that every coalition receives at least its security value, then Freaky Fairness works on all these games so long as Shapley does (where Shapley fails you can easily modify FF to find a real solution). We now just need to consider the class of 'nasty games', such as the one above. (For those more familiar with co-operative game theory, nice games are those which have a non-empty core, nasty games are those which do not).
Note: I was quite surprised that none of the smart people in the comments section of the original article noticed this, which leads me to think it's quite possibly wrong. Any criticism is welcome.
The Fallacy of Dressing Like a Winner
Imagine you are a sprinter, and your one goal in life is to win the 100m sprint at the Olympics. Naturally, you watch the 100m sprint winners of the past in the hope that you can learn something from them, and it doesn't take you long to spot a pattern.
Every one of them can be seen wearing a gold medal around their neck. Not only is there a strong correlation; you also examine the rules of the Olympics and find that 100% of winners must wear a gold medal at some point. There is no way that someone could win and never wear a gold medal. So you go out and buy a gold medal from a shop, put it around your neck, and sit back, satisfied.
For another example, imagine that you are now in charge of running a large oil rig. Unfortunately, some of the drilling equipment is old and rusty, and every few hours a siren goes off alerting the divers that they need to go down again and repair the damage. This is clearly not an acceptable state of affairs, so you start looking for solutions.
You think back to a few months ago, before things got this bad, and you remember how the siren barely ever went off at all. In fact, from your knowledge of how the equipment works, you know that when there were no problems, the siren couldn't go off. Clearly the solution to the problem is to unplug the siren.
(I would like to apologise in advance for my total ignorance of how oil rigs actually work; I just wanted an analogy.)
Both these stories demonstrate a mistake which I call 'Dressing Like a Winner' (DLAW).