This, and your much clearer second test, are useful, but only insofar as every combination is equally likely to be chosen, and, as some have found out, they clearly won't be. This would be more useful if you tested only the combinations that seem best [e.g. blue/blue, blue/green, green/green] and dropped the ones that no one who can run even some of the math would play [e.g. red/any]. Could you try that and see whether it changes any of the results drastically?
Agreed, re: the limitations of my method. As you suggested, I ran another pass using only the top 7 candidates (wins >= 19 in my previous comment). Here are the results:
3: blue/red
5: blue/green
7: blue/blue
7: green/green
7: green/red
9: green/blue
11: green/yellow
Choosing the top 10 (wins >= 17 from before):
7: blue/red
7: red/green
9: green/green
9: green/red
11: blue/blue
11: blue/green
11: blue/yellow
11: green/blue
11: yellow/yellow
13: green/yellow
Yellow/yellow pops up as a surprise member of the 5-way tie for second place. The green swo...
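For anyone who wants to repeat this kind of re-run, here is a minimal Python sketch of a round-robin that is first played over the full pool and then replayed among only the builds that cleared a win threshold. This is not the code behind the numbers above: the function names, the four-build pool, and the `TOY_BEATS` matchup table are all made up for illustration, the real matchups depend on the stats in the image, and the scoring here (one point per win) may differ from the scoring used above.

```python
from itertools import permutations

# Hypothetical matchup table: (winner, loser) pairs for four made-up builds.
# "D" stands in for a weak build that nobody would actually play.
TOY_BEATS = {("A", "B"), ("A", "D"), ("B", "C"), ("B", "D"), ("C", "A")}

def beats(a, b):
    """Return True if build a beats build b in the toy table."""
    return (a, b) in TOY_BEATS

def round_robin_wins(candidates, beats):
    """Count, for each candidate, how many of the other candidates it beats."""
    wins = {c: 0 for c in candidates}
    for a, b in permutations(candidates, 2):
        if beats(a, b):
            wins[a] += 1
    return wins

def rerun_with_top(candidates, beats, min_wins):
    """Keep builds with at least min_wins in the full pool, then replay
    the round-robin among the survivors only."""
    first_pass = round_robin_wins(candidates, beats)
    survivors = [c for c in candidates if first_pass[c] >= min_wins]
    return round_robin_wins(survivors, beats)

full = round_robin_wins(list("ABCD"), beats)
rerun = rerun_with_top(list("ABCD"), beats, min_wins=1)
print(full)   # {'A': 2, 'B': 2, 'C': 1, 'D': 0}
print(rerun)  # {'A': 1, 'B': 1, 'C': 1}
```

In the toy data, A and B lead the full pool only because they both beat the weak build D; once D is dropped, all three survivors are tied. That is the effect the restricted re-run is meant to expose.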
Note: this image does not belong to me; I found it on 4chan. It presents an interesting exercise, though, so I'm posting it here for the enjoyment of the Less Wrong community.
For the sake of this thought experiment, assume that all characters have the same amount of HP, which is sufficiently large that random effects can be treated as being equal to their expected values. There are no NPC monsters, critical hits, or other mechanics; gameplay consists of two PCs getting into a duel, and fighting until one or the other loses. The winner is fully healed afterwards.
Which sword and armor combination do you choose, and why?
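The expected-value assumption above can be sketched in a few lines of Python. With equal HP and no randomness left, a duel reduces to comparing how much damage each side is expected to deal per round. The parameters `hit_chance` and `damage` are placeholders, not the stats from the image, and the sketch assumes both fighters attack once per round; whatever the image's actual mechanics are would have to be folded into the per-round figure.

```python
def expected_damage_per_round(hit_chance, damage):
    """Expected damage of one attack: probability of hitting times damage dealt.
    Both inputs are hypothetical stand-ins for the image's real stats."""
    return hit_chance * damage

def duel_winner(dpr_a_on_b, dpr_b_on_a):
    """With equal HP and damage treated as its expected value, the fighter
    with the higher expected damage per round drops the other first."""
    if dpr_a_on_b > dpr_b_on_a:
        return "A"
    if dpr_b_on_a > dpr_a_on_b:
        return "B"
    return "tie"

# Made-up example: A hits often for little, B hits rarely for a lot.
a_on_b = expected_damage_per_round(hit_chance=0.8, damage=10)
b_on_a = expected_damage_per_round(hit_chance=0.5, damage=18)
print(duel_winner(a_on_b, b_on_a))  # B
```

Because HP is the same for everyone, its actual value never enters the comparison; it only needs to be large enough for the averages to hold.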