I've seen various contenders for the title of simplest abstract game that's interesting enough that a professional community could reasonably play it full time. While Go probably has the best ratio of interest to complexity, Checkers and Dots and Boxes might be simpler while remaining sufficiently interesting. [1] But is Checkers actually simpler than Go? If so, how much? How would we decide this?
Initially you might approach this by writing out rules. There's an elegant set for Go and I wrote some for Checkers, but English is a very flexible language. Perhaps my rules are underspecified? Perhaps they're overly verbose? It's hard to say.
A more objective test is to write a computer program that implements the rules. It needs to determine whether moves are valid, and identify a winner. The shorter the computer program, the simpler the rules of the game. This only gives you an upper bound on the complexity, because someone could come along and write a shorter one, but in general we expect that shorter programs imply shorter possible programs.
To investigate this, I wrote ones for each of the three games. I wrote them quickly, and they're kind of terse, but they represent the rules as efficiently as I could figure out. The one for Go is based off Tromp's definition of the rules while the other two implement the rules as they are in my head. This probably gives an advantage to Go because those rules had a lot of care go into them, but I'm not sure how much of one.
The programs as written have some excess information, such as comments, vaguely friendly error messages, whitespace, and meaningful variable names. I took a jscompiler-like pass over them to remove as much of this as possible, and making them nearly unreadable in the process. Then I ran them through a lossless compressor, gzip, and computed their sizes:
- Checkers: 648 bytes
- Dots and Boxes: 505 bytes
- Go: 596 bytes
(The programs are on github. If you have suggestions for simplifying them further, send me a pull request.)
[1] Go is the most interesting of the three, and has stood up to centuries of analysis and play, but Dots and Boxes is surprisingly complex (pdf) and there used to be professional Checkers players. (I'm having a remarkably hard time determining if there are still Checkers professionals.)
I also posted this on my blog.
I interpret Daniel_Burfoot's idea as: "import java.util.*" makes subsequent mentions of List longer, since there are more symbols in scope that it has to be distinguished from.
But I don't think that idea actually works. You can decompose the probability of a conjunction into a product of conditional probabilities, and you get the same number regardless of the order of said decomposition. Whatever probability (and corresponding total compressed size) you assign to a certain sequence of imports and symbols, you could just as well record the symbols first and then the imports. In which case by the time you get to the imports you already know that the program didn't invoke any utils other than List and Map, so even being able to represent the distinction between the two forms of import is counterproductive for compression.
The intuition of "there are more symbols in scope that it has to be distinguished from" goes wrong because it fails to account for updating a probability distribution over what symbol comes next based on what symbols you've seen. Such an update can include knowledge of which symbols come in the same header, if that's correlated with which symbols are likely to appear in the same program.
You're totally right about the fact that the compressor could change the order of the raw text in the encoding format, and could infer the import list from the main code body, and encode class name references based on the number of previous occurrences of the class name in the code body. It's not clear to me a priori that this will actually give better compression in practice, but it's possible.
But even if that's true, the main point still holds: limiting the number of external classes you use allows the compressor to same bits.