Prisoner's Dilemma Tournament Results

101 prase 06 September 2011 12:46AM

About two weeks ago I announced an open competition for LessWrong readers inspired by Robert Axelrod's famous tournaments. The competitors had to submit a strategy which would play an iterated prisoner's dilemma of fixed length: first in the round-robin tournament, where each strategy plays a hundred-turn match against each of its competitors exactly once, and second in the evolutionary tournament, where the strategies are randomly paired against each other and their gains are translated into the number of their copies present in the next generation; the strategy with the highest number of copies after generation 100 wins. More details about the rules were described in the announcement. This post summarises the results.


Prisoner's Dilemma as a Game Theory Laboratory

17 prase 25 August 2011 02:30PM

Last year Yvain organised a Diplomacy game between LessWrong users to test how well we perform in practical applications of game theory. At least two games were played, but as far as I know no analysis was made afterwards. One reason is probably that a few games involving complex interactions between players constitute at most anecdotal evidence for whatever hypothesis one may want to test. The second is the lack of comparison to outside players. Although the games were fun, their value as a game theory experiment remains rather low. Could we test our game-theoretic skills in a statistically more significant way?

Only recently I learned about Robert Axelrod's experiment, in which he ran a competition between different strategies playing an iterated prisoner's dilemma, and got the idea to replicate it. I have already run a similar experiment with five contestants (all of them my friends), and a second run, with at least nine strategies in the pool, is now being prepared. I am interested in a third run, this time with strategies nominated by LessWrongers. The contestants of the second run, which has identical rules, are readers of my blog, and probably none of them is familiar with specific LW ideas. Therefore, they would serve as a fairly good control group to test LW's applied rationality skills (or a subset thereof). After matching the strategies in both groups separately, I plan to put all of them together and see who wins.

So, if you want to participate in this contest, feel free to send me your strategy. The rules are as follows.

  1. By a strategy I mean a program sent by a contestant or coded according to his/her instructions. The strategies compete in iterated prisoner's dilemmas. I will call a single iteration a turn. In each turn each strategy has to choose between cooperating and defecting. The payoffs are:
    • if both cooperate, 4 points for each
    • if both defect, 1 point for each
    • else 7 points for the defector and 0 points for its cooperating opponent
  2. By a match I mean a series of 100 iterations between the same opponents.
  3. There will be two different competitions, the round-robin tournament and the evolutionary tournament. Two separate final standings will be made, one for each tournament. Any received strategy has to participate in both.
    • In the round-robin tournament strategies will play one match against each other (not against a copy of themselves). The winner will be the strategy which acquires the highest total number of points. The number of matches won or lost is disregarded.
    • The evolutionary tournament will simulate the evolution of a population of strategies. In the beginning, an equal number of copies of each strategy is present in the pool. Then the strategies are paired randomly (now a strategy may be paired against a copy of itself) and each pair plays one match. In the next generation's pool, the strategies will be represented in numbers proportional to the total number of points won by all their copies in the present generation. The total population will be maintained at a constant level (probably 2,000 strategies, up to rounding errors). Strategies may go extinct. The strategy with the highest number of copies after the 100th generation will be considered the winner.
  4. A strategy has access to the results of all previous turns in its current match and to the number of the current turn. It can decide randomly (a pseudo-random generator will be used). It does not have access to the results of other matches (including its own previous matches), the number of the current generation, population sizes, the number of points won by any strategy in any phase of the tournament (except the number of points already won in the current match, which can be calculated from the previous turns' results), or its opponent's identity (in particular, it doesn't know whether it plays against a copy of itself). The strategies obviously can't read their opponent's source code.
  5. Each person can send only one strategy. Be honest.
  6. The strategy can be described in any comprehensible language. If I have trouble understanding it (which will probably happen if the strategy is described in Lisp or Basque, but can happen even if it is written in English), I will ask.
  7. You should send your strategies by private message, not in comments to this post. Your opponents shouldn't know what you have prepared.
  8. The strategy needn't be original, but if I get two identical strategies, I will treat them as one.
  9. A Fully Random strategy is automatically included in the tournament. Each turn it defects or cooperates at random, with a 50% chance of each.
  10. Names of the authors of the strategies will be published by default. If you wish your name excluded, specify it in your message, and your strategy will compete anonymously.
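For concreteness, the scoring of a single match under rules 1 and 2 can be sketched as follows. This is only an illustration: the strategy interface, the function names, and the example strategies are my own assumptions, not part of the official implementation.

```python
import random

TURNS = 100

def payoffs(a, b):
    """Return (points for a, points for b); 'C' = cooperate, 'D' = defect."""
    if a == 'C' and b == 'C':
        return 4, 4
    if a == 'D' and b == 'D':
        return 1, 1
    return (7, 0) if a == 'D' else (0, 7)

def play_match(strategy1, strategy2):
    """One 100-turn match. Per rule 4, a strategy sees only this
    match's history and the current turn number."""
    history1, history2 = [], []   # each strategy's view of past turns
    score1 = score2 = 0
    for turn in range(TURNS):
        move1 = strategy1(history1, turn)
        move2 = strategy2(history2, turn)
        p1, p2 = payoffs(move1, move2)
        score1 += p1
        score2 += p2
        history1.append((move1, move2))   # (own move, opponent's move)
        history2.append((move2, move1))
    return score1, score2

# Example strategies (my own, for illustration):
def always_defect(history, turn):
    return 'D'

def tit_for_tat(history, turn):
    return 'C' if not history else history[-1][1]

def fully_random(history, turn):
    return random.choice(['C', 'D'])
```

For instance, `always_defect` against `tit_for_tat` scores 106 to 99 (one exploitation turn followed by 99 turns of mutual defection), while two cooperators score 400 each.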

The simulation will probably not be run before at least eight strategies are collected, and not before the beginning of September. [Edit: The competition is now closed; no new strategies are accepted at this moment. 21 different strategies were accepted and their implementations are now being tested. Results will probably be posted on Sunday 4th September.]
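The reproduction step of the evolutionary tournament (rule 3) can be sketched roughly like this. The rounding method is my own guess; the actual implementation may distribute rounding errors differently.

```python
def next_generation(counts, scores, population=2000):
    """counts: {name: copies in the current generation},
    scores: {name: total points won by all copies this generation}.
    Each strategy's share of the next pool is proportional to its
    total score. Rounding is done by truncation here, so the new
    population may fall slightly short of the target (the rules
    allow for rounding errors), and strategies may go extinct."""
    total = sum(scores.values())
    if total == 0:
        return dict(counts)  # degenerate case: nobody scored
    new_counts = {}
    for name in counts:
        copies = int(population * scores[name] / total)  # truncate
        if copies > 0:
            new_counts[name] = copies                    # extinct if 0
    return new_counts
```

For example, if strategy A's copies collectively won 300 points and B's won 100, A gets three quarters of the next pool regardless of the current head-count; a strategy whose share truncates to zero disappears from the pool.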

[Edit: Found inconsistency in using words round and turn to denote the same thing. Now turn is used everywhere.]

Karma threshold for meetup organisers?

19 prase 23 June 2011 10:37AM

A spambot has apparently attacked the meetup section. Since the problems with bots in the discussion section disappeared after a minimal karma requirement was introduced, I suspect that a similar limit was omitted for meetup organising. This should be easy to fix.

Meanings of Mathematical Truths

9 prase 05 June 2011 10:59PM

Related Sequence posts: Math is Subjunctively Objective, How to Convince Me That 2+2=3

Discussions about whether mathematical theorems can possibly be disproved by observations have been relatively frequent on LessWrong. The latest instance, which motivated me to write this post, was the discussion here. In fact, these discussions are closely related to philosophical disputes about contingent and necessary truths. Many standard philosophical disputes are considered solved on LessWrong, and there is usually some reference post which dissolves the issue. I don't know of any post that conclusively summarises this problem, although it doesn't seem particularly controversial.

To be concrete, let's take a single statement of arithmetic, say "5324 + 2326 = 7650". Most people will gladly agree that the statement is true. No matter how proficient the debaters are in intellectual sophistry (with the possible exception of postmodern philosophers), they will not dispute that 5324 + 2326 indeed equals 7650. But the agreement is lost when it comes to the meaning of this simple sentence. What does it refer to? Is it a necessary truth, or can it be empirically tested?

One opinion states that statements about addition refer to counting apples, pigs, megabytes of data or whatever stuff one needs to count. Sets of these objects, either material or abstract, are the referents of those statements. If I take two groups of apples which happen to contain 5324 and 2326 items and put them together, the compound group will consist of 7650 apples. It is not self-evident (as any illiterate tribesman from New Guinea would confirm) that the result will be 7650 and not, for example, 7494. Therefore each such putting together of apples counts as a test of the respective proposition of arithmetic. Of course, the number of apples in the compound group must be determined by a method different from counting the two subgroups separately and then adding the numbers on paper - that would defeat the idea of testing. Also, some people may say that "5324 + 2326" is a definition of "7650"; not that we regularly encounter such opinions about numbers of this magnitude, but I have heard countless times that "1 + 1" is a definition of "2". So we have to be careful to create a dictionary which translates "1" as "S0", "2" as "SS0", and "7650" as something which I am not going to write here, and to describe the counting of apples as adding one S for each apple in the bag. After that we may go through the ordeal of converting our bag of apples to a horribly long string of S's and finally look up the result in the dictionary - just to see whether the corresponding translation really reads "7650". If done in this torturously lengthy way, I assume there would be at least a tiny amount of pleasant surprise if we really found "7650" and not "157", and not only because of the possibility of an "experimental error". So it seems to be a legitimate empirical test.
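The dictionary-and-successor procedure described above can be made mechanical. Here is a toy sketch (the function names and the unary representation are my own choices, made only to illustrate the idea):

```python
def to_successor(n):
    """Translate a numeral into successor notation: 2 -> 'SS0'."""
    return 'S' * n + '0'

def from_successor(s):
    """Look the string up in the 'dictionary': 'SS0' -> 2."""
    return s.count('S')

def combine_bags(bag_a, bag_b):
    """'Physical' addition: prepend one S for each apple in either bag,
    without ever adding the two numerals on paper."""
    result = '0'
    for _apple in bag_a + bag_b:
        result = 'S' + result
    return result

# The "test": combine two bags of apples, then consult the dictionary.
bag_a = ['apple'] * 5324
bag_b = ['apple'] * 2326
print(from_successor(combine_bags(bag_a, bag_b)))  # prints 7650
```

The point is that `combine_bags` never performs arithmetic on the numerals; the number 7650 appears only at the final dictionary lookup, which is exactly where the "surprise" could in principle occur.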

Holders of the contrary opinion would certainly not deny that such tests can be arranged (they may dispute the "surprise", though). But their argument is: Even if we conducted the described experiment with apples, and repeatedly found that the result was 157 instead of 7650 (and suppose that possible errors in counting or in translation from "5324" to "SSS...S0" were ruled out), that would have no bearing on the truth value of "5324 + 2326 = 7650" as a statement of arithmetic. It is imaginable that the physical addition of apples followed some different rules, such that putting 5324 objects together with 2326 objects always yielded a set of 157 objects - but that doesn't mean that "5324 + 2326 = 157". In such a hypothetical world there would be an isomorphism which maps that string to a true statement about apples; nevertheless, there is no way to make it a statement of arithmetic. It would be better to invent a new symbol for the abstract operation which emulates physical addition in that hypothetical world, or, even better, a whole set of new symbols for all the digits, to avoid confusion. One would rather say "%#@$ ¯ @#@^ = !%&" instead of "5324 + 2326 = 157". The former is a statement of a certain formal system X which models the modified apple addition, and as such it is true, while the latter is false. No matter what apples do in that world, within arithmetic we can still formally prove that 5324 + 2326 is 7650, and nothing else. Even if the inhabitants of our strange hypothetical world called their formal counting system "arithmetic" instead of "X" and the existence of real arithmetic had never occurred to them - even such a fact could not change the universal truth that 5324 + 2326 = 7650.

Although there is hardly any disagreement about anticipated experiences, the debates on this question seldom appear conclusive. The apparent disagreement is almost certainly caused by different interpretations of something. It is perhaps not much different from the iconic sound-definition dispute or the disputes about morality. In contrast to the sound case, where the disagreement is about what the single word "sound" refers to, here the source of misunderstanding is more difficult to locate. At first sight it may appear that the meaning of "arithmetic" is disputed. More probably, however, it is the phrase "5324 + 2326 = 7650", along with all other statements of arithmetic, which is interpreted in several distinct ways. Let's be more specific about what the proposition can mean:

  1. If I take 5324 objects and add another 2326 objects, I get 7650 objects. It holds for a broad range of object types and all reasonable senses of "adding together", therefore it is sensible to express the fact as a general abstract relation between numbers. By the way, we can create a formal system which allows us to deduce similar true propositions, and we call it "arithmetic".
  2. The string "5324 + 2326 = 7650" is a theorem in a formal system given by the following axioms and rules: (Here should stand the axioms and rules.) By the way, we call the system "arithmetic", and it happens to be a good model of counting objects.

(There might be a third interpretation, along the lines of the second one but with less apparent arbitrariness of arithmetic: "any intelligence necessarily includes a representation of a formal system isomorphic to arithmetic, independently of the properties of the external world". I didn't include it in the list, because it is either a very narrow constraint on the definition of intelligence or almost certainly false.)

Because arithmetic actually works (as far as we know) as a model of counting, the two interpretations are equally good and for all practical purposes indistinguishable. It is no surprise that our intuitions can't reliably distinguish between practically equivalent interpretations. Rather, two intuitions come into conflict. The first tells us that arithmetic isn't arbitrary at all, and thus the second interpretation must be false. The second intuition is based on the self-consistency of mathematics: mathematics has its own ways of deciding between truth and falsity, and those ways never defer to the external world; therefore the first interpretation must be false. But once the meaning is spelled out in sufficient detail, the apparent conflict should disappear.

On Debates with Trolls

22 prase 12 April 2011 08:46AM

One of the best achievements of the LessWrong community is our high standard of discussion. More than anywhere else, people here actively try to interpret others charitably, argue to the point, avoid provocative or rude language, apologise for inadvertent offenses while not being overly prone to take offense themselves, and avoid their own biases and fallacies instead of merely seeking them in others; most importantly, they try to find the truth instead of winning the argument. Maybe the greatest attribute of this approach is its infectiousness - I have observed several newcomers change their discussion habits for the better within a few weeks. However, not everybody is susceptible to the LW standards, and our attitude produces somewhat bizarre results when confronted with genuine trolls.

Recent posts about epistemology1 have all generated a large number of replies; in fact, the discussions were among the largest in the last few months. People commented there (yes, I too am guilty) even after it was clear that the author of the posts doesn't actually react to our arguments. After he was rude and admitted to being rude on purpose. After committing several fallacies, after generating an unreasonable amount of text of mediocre to low quality, after saying that he is neither trying to convince anyone, nor willing to learn anything, nor aiming for agreement. In short, perhaps all the symptoms of trolling were present, and still people were repeatedly and patiently explaining what's wrong with the author's position. This reaction is, I must admit, sort of amazing - but on the other hand, it is hard to deny that the whole discussion was detrimental to the quality of LW content and was mostly a waste of time.

So, here is the question: why didn't we apply the don't-feed-the-troll meme, as would probably happen much sooner on most forums? I have several hypotheses.

1. We are unable to recognise trolls for lack of training. The first hypothesis is quite improbable, given that the troll in question was downvoted to oblivion2, but still possible. There are not many trolls on LW, and perhaps it is difficult to believe that someone is actively seeking that sort of confrontation. I have never understood the psychology of trolls - I instinctively avoid combative arguments and find it hard to imagine why somebody would intentionally try to create one. Perhaps a manifestation of the typical mind fallacy combines with compartmentalisation here: although we consciously know that there are trolls out there (this is hard to ignore), when meeting one our instinct tells us that the person cannot be so different from us.

2. We are unwilling to deal with trolls. The second theory is that although we know that a person isn't sincere, we cherish our standards of discussion so strongly that we still try to respond kindly and maintain a civil debate, or at least one side of one. If that is the case, it is not automatically a bad policy. Our rationality is limited and we always operate under the threat of self-serving biases. A quasi-deontological rule of kindness in debates, even if it is overkill, may be useful in the same way the presumption of innocence is useful in justice.

3. Sunk costs. Once the debate has started, our initial investments feel binding. It is unsettling to quit an argument, admitting that it was completely useless and that we have lost an hour of our life for nothing. The sunk cost fallacy is well known and widespread; there is no reason to expect we are immune.

4. Best rebuttal contest. An interesting fact is that not only was the number of replies fairly large, but a lot of the replies were also strongly upvoted. This leads me to suspect that those replies weren't in fact aimed at the opponent in the discussion, but rather intended to impress fellow LessWrongers. Once the motivation is not "I want to convince my interlocutor" but rather "I can craft an extraordinarily elegant counter-argument which hasn't appeared yet", the attitude of the opponent doesn't matter. The debate becomes an exercise in arguing - a potentially useful practice, maybe, but one with many associated dangers.

5. Trollish arguments are fun. I include this possibility mainly for completeness, since I don't much believe that a significant number of LW users enjoy pointless arguments. But still, there is something fascinating about fallacious arguments. They are frustrating to follow, for sure, especially for a rationalist, but I cannot entirely dismiss the appeal of seeing biases and fallacies in real life, as opposed to merely reading about them in a Kahneman and Tversky paper.

Whichever of the above hypotheses is correct, or even if none of them is, I don't doubt that on reflection most of us would prefer to have fewer irrational discussions. The karma system works, but slowly, and cannot prevent trollish discussions from gaining momentum if people continue their present voting patterns. One of the problems lies in upvoting the rebuttals, which gives people additional motivation to participate. There seem to be two main voting strategies: "I want to see more/less of this" and "this deserves more/less karma than it presently has". The first seems marginally better for dealing with trolls, but both should work better when applied in context. Even a brilliant reply should not be upvoted when placed in an irrational debate: first, it is mostly a waste of resources, and moreover, we certainly want to see fewer irrational debates. I don't endorse downvoting good replies, if only because the troll could interpret it as support for his cause. But leaving them at zero seems to be the correct policy.

 


1 I am not going to link to them because I don't want to generate more traffic there; one of those posts already appears in fourth place when you Google lesswrong epistemology. Nor will I write down the precise topic or the name of the author explicitly, which I hope decreases the probability of his appearing here.

2 In fact, the downvoting, even if massive, came relatively late, the person in question still being able to post on the main site after several days.

 

An Anchoring Experiment: Results

18 prase 03 April 2011 10:48PM

This post summarises the results of the experiment which tested how anchoring works on the LW audience. Here is the original post, which describes the experiment in more detail. The experiment was supposed to decide between two models of how anchoring may work. The first hypothesis is that the subject always starts from the anchor and moves in the direction of his/her unbiased estimate, but doesn't go far enough. The alternative hypothesis is that anchoring shifts the centre of the subject's probability distribution towards the anchor, and the whole distribution moves along with it.

To illustrate the difference, consider the first experimental question, which was about the population of the Central African Republic. The correct value (i.e. the estimate for 2009 listed on Wikipedia) is 4,422,000. The anchor which I offered here was 20 million. Now, if the first hypothesis is true, the people who, in their unbiased state of mind, would guess less than 20 million would slide down starting from the 20 million value and stop prematurely; their guesses would be attracted towards the anchor, but not across it. The distribution of the biased guesses would be narrower and overall closer to 20 million than the unbiased distribution, but the probability of answering any number lower than 20 million would not be changed by anchoring. On the other hand, if the second hypothesis holds, the biased group should guess more than 20 million more often than the control group.
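The qualitative difference between the two hypotheses can be shown with a toy simulation. The distribution and all the parameters below are my own assumptions, chosen only to make the contrast visible, not estimates of how real subjects behave:

```python
import random

ANCHOR = 20_000_000

def unbiased_guess():
    # Toy model: log-normal guesses centred around ~5 million.
    return random.lognormvariate(15.4, 1.0)

def hypothesis1_guess():
    """Start at the anchor, move toward the unbiased estimate, but
    stop prematurely. A guess never crosses to the other side of the
    anchor, so P(guess > anchor) is unchanged."""
    g = unbiased_guess()
    return ANCHOR + 0.6 * (g - ANCHOR)  # stops 60% of the way

def hypothesis2_guess():
    """The whole distribution moves toward the anchor: every guess is
    shifted up by a fixed amount, so some guesses cross the anchor
    and P(guess > anchor) increases."""
    return unbiased_guess() + 5_000_000

def fraction_above(guesser, n=100_000):
    return sum(guesser() > ANCHOR for _ in range(n)) / n
```

Under the first hypothesis `fraction_above` stays (statistically) equal to the unbiased fraction; under the second it is strictly larger, which is exactly the difference the greater/lower counts below are meant to detect.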

The actual results are such:

Group I (biased; 36 answers collected)

  • more than 20 million: 15 (41.7%)
  • less than 20 million: 21 (58.3%)

Group II (control; 16 answers collected)

  • more than 20 million: 3 (18.7%)
  • less than 20 million: 13 (81.3%)
  • 20 million: 1 (6.3%)

The second question asked for the altitude of the highest point in Sweden (2140 m / 6903 ft). The anchor was 3500 m or 11500 ft (there is about a 5 m / 18 ft difference between these values, but I wanted both the metric and the imperial anchor to be round numbers). Here are the results:

Group I (biased; 24 answers collected)

  • more than 3500 m: 9 (37.5%)
  • less than 3500 m: 15 (62.5%)

Group II (control; 30 answers collected)

  • more than 3500 m: 5 (16.7%)
  • less than 3500 m: 25 (83.3%)

The results seem to favour the second hypothesis.
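The counts for the first question can be checked for significance with a one-sided Fisher's exact test. This is my own addition, not part of the original analysis; I exclude the one control answer of exactly 20 million, so the table is 15/21 (biased) versus 3/13 (control).

```python
from math import comb

def fisher_one_sided(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]:
    the probability, under the null hypothesis of no group difference,
    of seeing a or more 'above the anchor' answers in the first row."""
    row1, row2 = a + b, c + d
    col1 = a + c                 # total 'above' answers
    n = row1 + row2
    denom = comb(n, col1)
    p = 0.0
    for k in range(a, min(row1, col1) + 1):
        p += comb(row1, k) * comb(row2, col1 - k) / denom
    return p

# Question 1: biased group 15 above / 21 below the anchor,
# control group 3 above / 13 below.
p = fisher_one_sided(15, 21, 3, 13)
```

With samples this small the test is only suggestive, but it quantifies how surprising the observed split would be if anchoring had no effect on the probability of crossing the anchor.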

Some more remarks: the participants were expected to switch groups between the two parts, and the numbers should reflect that; however, six and eight answers are missing from the group II summaries. A few people (about 16% and 33%, actually) thus refused to guess a concrete number although they voted in the "greater/lower than the anchor" questions. This may skew the results, although I don't see in which direction.

There were a few weird answers, too. The altitude of Sweden's highest summit was reported to be both 100 m and 5000 km. Those can be simply interpreted as statistical deviations from common sense* (or typos); however, I started to doubt whether all participants were serious. (Which leads to a moral: if you intend to post a survey and be certain about its accuracy, don't do it on the 1st of April.)

Finally, I would like to thank all the commenters who pointed out several technical problems with the test (such as the answers appearing in the "recent comments" bar).

*) The 100 m guess may even be reasonable: the summit of Yding Skovhøj, the highest point of extraordinarily flat Denmark, lies only 175 metres above sea level.

An Anchoring Experiment

12 prase 01 April 2011 02:19PM

The experiment is closed, for the results look here.

In a recent discussion I expressed the opinion that anchoring may, for some quantitative questions, cause the answer to lie further away from the correct value than the anchor itself. For concreteness, let's suppose that the correct value of a quantity Q is x, and the subject is asked whether Q is greater or lower than y, where y > x. My hypothesis is that the anchor moves the subject's probability distribution up as a whole, including the part which was already lying above y. Therefore the subjects will answer the question "Is Q > y?" positively more often than their guesses would exceed y if they were simply asked to estimate the value of Q with no anchor given. One commenter apparently disagreed. I thought it might be interesting to resolve the disagreement experimentally. (More generally, I would like to see how well the LW audience fights the standard biases, and if this experiment turns out successful - meaning the number of respondents is greater than, say, five - I will think about posting more of this kind.)

How to participate:

The experiment has two parts.

First, toss a coin to decide whether you belong to the biased group I or the control group II for the first question. If you belong to group I, look at a comment linked below, which will ask you a question of the form "is Q greater or lower than y", where y is either significantly lower or significantly greater than the correct value of Q. The comment has the form of a typical LW poll. If you belong to group II, look at a different linked comment which asks "what is the value of Q", and then give your estimate in a subcomment there.

The second part is completely analogous to the first, only with a different question. If you participated in the first part in group I, take part in group II for the second part, and vice versa. Try to eliminate irrelevant biases: switch on the anti-kibitzer before looking at the group I questions to avoid being influenced by the votes of others. Don't read the subcomments of the group II questions before writing down your own estimate.

The hypothesis is that the percentage of group I respondents answering incorrectly will be greater than the percentage of group II respondents whose estimates fall on the incorrect side of the anchor.

First part: Question for the group I. Question for the group II.

Second part: Question for the group I. Question for the group II.