What you say about random number generators is true, but it can't be the explanation, because there were nowhere near enough samples to get a run of 1674 heads in a row. I ran 5 tests of 1 million sequences of coin flips each, so the longest run of heads we'd expect is about 21.
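Assuming the 1674 comes from the harmonic series (the expected value of this series is the harmonic series' sum), it is the smallest n for which 1 + 1/2 + ... + 1/n exceeds 8. That is the number of terms, and so the run length, needed before the expected average could reach 8. A quick Python check, using a hypothetical helper name:

```python
def first_n_exceeding(target: float) -> int:
    """Smallest n with 1 + 1/2 + ... + 1/n > target (hypothetical helper)."""
    total = 0.0
    n = 0
    while total <= target:
        n += 1
        total += 1.0 / n
    return n


print(first_n_exceeding(8))  # 1674
```

The harmonic sums grow like ln(n), so each additional unit of average costs roughly e (about 2.7) times as many terms.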
The sum would be near 8 even if the numbers were truly random; I chose that particular series to make that happen. The sum of the entire series diverges, but it diverges slowly enough that even an occasional run of 24 heads in a row doesn't change the output a great deal, as this table shows. The columns are the number of heads, 1/p (where p is the probability of getting that many heads in a row), and the resulting value of the series for that trial.
heads  1/p         sum
1      2           2
2      4           4
3      8           6.66666666666667
4      16          10.6666666666667
5      32          17.0666666666667
6      64          27.7333333333333
7      128         46.0190476190476
8      256         78.0190476190476
9      512         134.907936507937
10     1024        237.307936507937
11     2048        423.489754689755
12     4096        764.823088023088
13     8192        1394.97693417693
14     16384       2565.26264846265
15     32768       4749.79598179598
16     65536       8845.79598179598
17     131072      16555.9136288548
18     262144      31119.4691844104
19     524288      58713.5744475683
20     1048576     111142.374447568
21     2097152     211006.755399949
22     4194304     401656.937218131
23     8388608     766379.024174653
24     16777216    1465429.69084132
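The third column is the partial sum of 2^k / k for k = 1..n. Dividing it by the 1 million trials per run gives how far a single trial of that length moves one run's average. A short Python sketch that regenerates the table, with illustrative names:

```python
TRIALS_PER_RUN = 1_000_000  # trials per run, as stated in the comment


def table_rows(max_heads: int) -> list[tuple[int, int, float]]:
    """Return (n, 2**n, sum of 2**k / k for k = 1..n) for n = 1..max_heads."""
    rows = []
    total = 0.0
    for n in range(1, max_heads + 1):
        total += 2**n / n
        rows.append((n, 2**n, total))
    return rows


rows = table_rows(24)
for n, inv_p, total in rows:
    # total / TRIALS_PER_RUN = shift in one run's average caused by a single such trial
    print(f"{n:2d} {inv_p:9d} {total:18.6f} {total / TRIALS_PER_RUN:9.4f}")
```

The last row shows that a single 24-head trial adds roughly 1.47 to an average taken over 1 million trials.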
Sorry, the post originally didn't say there were 1 million trials per run. I had that in the code, but lost it while reformatting the code.
You'd need many more than 5 million trials to see much different numbers. Here are all the runs, out of a series of 240 runs of 1 million trials each, where the average sum was greater than 9 (run number: average sum):
7: 12.3408045473115
19: 13.8475564640298
21: 17.1112080535515
22: 27.9980309134702
23: 11.2690660132211
31: 9.17316020399938
70: 9.71114553027328
74: 13.8303629749467
97: 9.9859069856619
115: 12.6414850499443
127: 11.0081608201826
138: 9.95738337454533
181: 12.7233012408077
230: 10.0657909667346
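Simulating 240 million trials one coin flip at a time is slow in an interpreted language, so here is a hypothetical Python sketch that swaps in a different but distributionally equivalent technique. Instead of following each trial, it tracks how many of a run's trials are still going after each term. Each surviving trial continues with probability 1/2, so the survivor count is thinned by a Binomial(count, 1/2) draw, obtained here by counting the 1-bits among `count` random bits. It assumes the first term is always counted, which is one reading of "a 50% chance of stopping after each term"; the seed is arbitrary, and it will not reproduce the exact numbers above.

```python
import random


def binomial_half(n: int, rng: random.Random) -> int:
    """Draw from Binomial(n, 1/2) as the number of 1-bits among n fair random bits."""
    return bin(rng.getrandbits(n)).count("1") if n > 0 else 0


def run_average(trials: int, rng: random.Random) -> float:
    """Average of sum(2**k / k) over `trials` trials, each stopping w.p. 1/2 after every term."""
    alive = trials  # assumed convention: every trial includes term k = 1
    k = 1
    total = 0.0
    while alive > 0:
        total += alive * 2**k / k  # each surviving trial adds 2**k / k
        alive = binomial_half(alive, rng)  # each survivor continues with probability 1/2
        k += 1
    return total / trials


def high_runs(runs: int, trials: int, threshold: float, seed: int = 1):
    """Return (run number, average) for every run whose average exceeds `threshold`."""
    rng = random.Random(seed)
    results = []
    for run in range(1, runs + 1):
        avg = run_average(trials, rng)
        if avg > threshold:
            results.append((run, avg))
    return results


if __name__ == "__main__":
    for run, avg in high_runs(runs=240, trials=1_000_000, threshold=9.0):
        print(f"{run}: {avg}")
```

Because every trial contributes at least the first term (2), each run's average is at least 2; the interesting variation comes entirely from the rare long runs.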
Yes... at one million trials per run, you wouldn't expect a run of much more than 20 heads in any case. By my quick calculation, that should result in an average around 3.64, with perhaps some variability due to a low-probability long string of, say, 30 heads turning up.
Yet you got an average around 8. This suggests that long chains of heads may be turning up slightly more often than random chance would predict, i.e., that your RNG may be slightly biased toward long sequences.
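A figure of about 3.64 is what you get by cutting the harmonic series off at 21 terms. That is one plausible reading of this quick calculation, though it is an assumption, since the calculation itself isn't shown:

```python
def harmonic(n: int) -> float:
    """Partial sum 1 + 1/2 + ... + 1/n."""
    return sum(1.0 / k for k in range(1, n + 1))


for n in (20, 21, 22):
    print(n, round(harmonic(n), 4))  # 3.5977, 3.6454, 3.6908
```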
(Mathematicians may find this post painfully obvious.)
I read an interesting puzzle on Stephen Landsburg's blog that generated a lot of disagreement. Stephen offered to bet anyone $15,000 that the average results of a computer simulation, run 1 million times, would be close to his solution's prediction of the expected value.
Landsburg's solution is in fact correct. But the problem involves a probabilistic infinite series, a kind often used on Less Wrong in scenarios where one is offered some utility every time a coin comes up heads, but loses everything if it ever comes up tails. Landsburg didn't justify the claim that a simulation could indicate the true expected outcome of this particular problem. Can we find similar-looking problems for which simulations give the wrong answer? Yes.
Here's Perl code to estimate by simulation the expected value of the series of terms 2^k / k from k = 1 to infinity, with a 50% chance of stopping after each term.
(If anyone knows how to enter a code block on this site, let me know. I used the "pre" tag, but the site stripped out my spaces anyway.)
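The Perl listing itself didn't survive. As a stand-in, here is a minimal Python sketch of the same kind of simulation. It is not the author's code, and it assumes each trial always counts the k = 1 term and then has a 50% chance of stopping after each term. The function names are illustrative.

```python
import random


def one_trial(rng: random.Random) -> float:
    """Sum 2**k / k for k = 1, 2, ..., stopping with probability 1/2 after each term."""
    total = 0.0
    k = 1
    while True:
        total += 2**k / k
        if rng.random() < 0.5:  # 50% chance of stopping after this term
            return total
        k += 1


def average_sum(trials: int, rng: random.Random) -> float:
    """Mean of `trials` independent trials."""
    return sum(one_trial(rng) for _ in range(trials)) / trials
```

Calling `average_sum(1_000_000, random.Random(seed))` five times mirrors the setup of the runs below. The numbers below come from the original Perl program, so this sketch will give different values.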
Running it 5 times, we get the answers
ave sum=7.6035709716983
ave sum=8.47543819631431
ave sum=7.2618950097739
ave sum=8.26159741956747
ave sum=7.75774577340324
So the expected value is somewhere around 8?
No; the expected value is given by the sum of the harmonic series, which diverges, so it's infinite. Later terms in the series are exponentially larger, but exponentially less likely to appear.
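The step from the series to the harmonic series is linearity of expectation (valid here because every term is nonnegative). The expected sum is the sum, over terms, of each term's value times the probability that a trial includes it. With the table's convention, where term k requires k heads and so has probability 2^{-k}:

```latex
\mathbb{E}[S] = \sum_{k=1}^{\infty} \Pr(\text{term } k \text{ is included}) \cdot \frac{2^k}{k}
             = \sum_{k=1}^{\infty} 2^{-k} \cdot \frac{2^k}{k}
             = \sum_{k=1}^{\infty} \frac{1}{k} = \infty .
```

If instead the first term is always counted, the probability is 2^{-(k-1)} and the result is 2 times the harmonic series, which diverges just the same.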
Some of you are saying, "Of course the expected value of a divergent series can't be computed by simulation! Give me back my minute!" But many things we might simulate with computers, like the weather, the economy, or existential risk, are full of power-law distributions that might not have a convergent expected value. People have observed before that this can cause problems for simulations (see The Black Swan). What I find interesting is that the output of the program above doesn't look as if anything inside it diverges. It looks almost normal. So you could run your simulation many times and believe that you had a grip on its expected outcome, yet be completely mistaken.
In real-life simulations (that sounds wrong, doesn't it?), there's often some system property that drifts slowly, and some critical value of that system property above which some distribution within the simulation diverges. Moving above that critical value doesn't suddenly change the output of the simulation in a way that gives an obvious warning. But the expected value of keeping that property below that critical value in the real-life system being simulated can be very high (or even infinite), with very little cost.
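The post's own series gives a toy version of this. Suppose, hypothetically, that the probability of continuing after each term is a slowly drifting parameter p rather than exactly 1/2, with the first term always counted. Then term k is included with probability p^(k-1). For p < 1/2 the expected sum has the closed form -ln(1 - 2p) / p (from sum over k of x^k / k = -ln(1 - x), with x = 2p). This value grows without bound as p approaches the critical value 1/2 and is infinite for p >= 1/2. The sketch below prints the exact values alongside simulated averages so you can compare the two sides of the threshold. The function names, seed, and trial count are illustrative.

```python
import math
import random


def expected_sum(p: float) -> float:
    """Exact E[sum of 2**k / k] when term k is included with probability p**(k-1).

    Finite only for p < 1/2; uses sum_{k>=1} x**k / k = -ln(1 - x) with x = 2p.
    """
    if p >= 0.5:
        return math.inf
    return -math.log(1 - 2 * p) / p


def simulated_average(p: float, trials: int, rng: random.Random) -> float:
    """Monte Carlo average of the same sum, continuing with probability p after each term."""
    grand_total = 0.0
    for _ in range(trials):
        k = 1
        while True:
            grand_total += 2**k / k
            if rng.random() >= p:  # stop with probability 1 - p
                break
            k += 1
    return grand_total / trials


if __name__ == "__main__":
    rng = random.Random(59)
    for p in (0.40, 0.45, 0.49, 0.499, 0.50, 0.51):
        sim = simulated_average(p, 100_000, rng)
        print(f"p={p}: exact={expected_sum(p):.3f}  simulated={sim:.3f}")
```

For comparison, at p = 0.49 the exact expected value is about 7.98. That is finite and close to the averages reported above for the divergent p = 1/2 case.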
Is there a way to look at a simulation's outputs, and guess whether a particular property is near some such critical threshold? Better yet, is there a way to guess whether there exists some property in the system nearing some such threshold, even if you don't know what it is?
The October 19, 2012 issue of Science contains an article on just that question: "Anticipating critical transitions", Marten Scheffer et al., p. 344. It reviews 28 papers on systems and simulations, and lists about a dozen mathematical approaches used to estimate nearness to a critical point. These include:
So if you're modeling global warming, running your simulation a dozen times and averaging the results may be misleading. [1] Global temperature has sudden [2], dramatic transitions, and an exceptionally large and sudden one (15C in one million years) neatly spans the Earth's greatest extinction event so far, at the Permian-Triassic boundary [3]. It's more important to figure out what the critical parameter is and where its critical point lies than to try to estimate how many years it will be before Manhattan is underwater. The "expected rise in water level per year" may not be easily answerable by simulation [4].
And if you're thinking about betting Stephen Landsburg $15,000 on the outcome of a simulation, make sure his series converges first. [5]
[1] Not that I'm particularly worried about global warming.
[2] Geologically sudden.
[3] Sun et al., "Lethally hot temperatures during the early Triassic greenhouse", Science 338 (Oct. 19, 2012), p. 366; see p. 368. Having just pointed out that an increase of .000015C/yr counts as a "sudden" global warming event, I feel obligated to also point out that the current increase is about .02C/yr.
[4] It will be answerable by simulation, since the rise in water level can't be infinite. But you may need a lot more simulations than you think.
[5] Better yet, don't bet against Stephen Landsburg.