Deleted earlier comment due to a bug in the code.
Here's the result of a naive brute-force program that assumes a uniform distribution of opponents (i.e. every combo is equally likely), sorted by number of wins in ascending order:
185: red/blue
269: red/red
397: yellow/blue
407: yellow/red
438: red/yellow
464: red/green
471: yellow/green
483: yellow/yellow
512: blue/yellow
528: green/green
539: green/red
561: green/blue
567: green/yellow
578: blue/red
635: blue/green
646: blue/blue
The program is here: http://pastie.org/1217024 (pipe through sort -n)
It performs 30 iterations of all 16 vs 16 matchups. Note that the player who attacks first has an advantage, so running all 16 vs 16 ordered matchups balances that out (every combo is player 1 as often as it is player 2).
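For readers who can't reach the pastie link, here is a rough sketch of the tournament loop described above. It is not the original program. The weapon and armor stats live in the image and aren't reproduced in this thread, so the duel itself is left as a function you plug in. The names round_robin, make_placeholder_duel and format_results are made up for illustration, and the placeholder duel's 55% first-strike edge is an invented number, not something taken from the game.

```python
import itertools
import random
from collections import Counter

COLORS = ["red", "yellow", "green", "blue"]
# Labels follow the "color/color" form of the result list above; the code
# treats them as opaque names and assumes nothing about their stats.
COMBOS = [f"{a}/{b}" for a, b in itertools.product(COLORS, COLORS)]


def round_robin(duel, combos=COMBOS, iterations=30):
    """Play every ordered pairing (player 1, player 2) `iterations` times.

    `duel(first, second)` must return the winning label. Because both
    (x, y) and (y, x) are played, each combo attacks first exactly as
    often as it attacks second.
    """
    wins = Counter({combo: 0 for combo in combos})
    for _ in range(iterations):
        for first, second in itertools.product(combos, repeat=2):
            wins[duel(first, second)] += 1
    return wins


def make_placeholder_duel(seed=0, first_strike_edge=0.55):
    """Stand-in duel: ignores the gear entirely (hypothetical, NOT the game rules)."""
    rng = random.Random(seed)

    def duel(first, second):
        return first if rng.random() < first_strike_edge else second

    return duel


def format_results(wins):
    """Return 'wins: combo' lines in ascending order, like `sort -n`."""
    return [f"{n}: {combo}" for combo, n in sorted(wins.items(), key=lambda kv: kv[1])]
```

Swap make_placeholder_duel() for a function that actually simulates a fight with the stats from the image and format_results(round_robin(your_duel)) gives a list in the same shape as the one above. With 16 combos and 30 iterations that is 16 x 16 x 30 = 7680 duels, which matches the total of the win counts listed.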
I signed up today to comment in this thread, so don't mock me too heavily. :)
Edit: Bumped iterations to 30 and hit points to 80,000 to try to smooth out randomness in the results.
This, and your much clearer second test, are useful, but only insofar as every combination is equally likely to be chosen. As some have found out, they clearly won't be. It would be more useful to test only the combinations that seem best [e.g. blue/blue, blue/green, green/green] and drop the ones that no one who can run even some of the math would play [e.g. red/any]. Could you try that and see whether it changes any of the results drastically?
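To make the point about non-uniform opponents concrete, here is a minimal sketch of re-scoring combos against a weighted field. The function name expected_win_rate and the three combos A, B, C with their pairwise win probabilities are invented toy values for illustration only; they are not results from the game or from the program above.

```python
def expected_win_rate(win_prob, field):
    """Average win probability of each combo against a weighted opponent field.

    win_prob[(a, b)] is the probability that a beats b.
    field maps each opponent combo to its relative popularity (any positive weights).
    """
    total = sum(field.values())
    combos = {a for a, _ in win_prob}
    return {
        a: sum(weight * win_prob[(a, b)] for b, weight in field.items()) / total
        for a in combos
    }


# Toy numbers, purely hypothetical: A crushes B but loses to C.
toy = {
    ("A", "A"): 0.5, ("A", "B"): 0.9, ("A", "C"): 0.2,
    ("B", "A"): 0.1, ("B", "B"): 0.5, ("B", "C"): 0.6,
    ("C", "A"): 0.8, ("C", "B"): 0.4, ("C", "C"): 0.5,
}

uniform = expected_win_rate(toy, {"A": 1, "B": 1, "C": 1})
without_b = expected_win_rate(toy, {"A": 1, "C": 1})  # nobody plays B
```

In this toy field, A looks respectable while B is around to be beaten, and drops sharply once B is removed from the pool of opponents. That is the kind of shift a restricted test would reveal, if it exists in the real numbers.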
Note: this image does not belong to me; I found it on 4chan. It presents an interesting exercise, though, so I'm posting it here for the enjoyment of the Less Wrong community.
For the sake of this thought experiment, assume that all characters have the same amount of HP, which is sufficiently large that random effects can be treated as being equal to their expected values. There are no NPC monsters, critical hits, or other mechanics; gameplay consists of two PCs getting into a duel, and fighting until one or the other loses. The winner is fully healed afterwards.
Which sword and armor combination do you choose, and why?