PhilGoetz comments on Anticipating critical transitions - Less Wrong

17 Post author: PhilGoetz 09 June 2013 04:28PM

Comments (52)

Comment author: PhilGoetz 10 June 2013 07:36:08PM *  3 points

What you say about random number generators is true, but it can't be the explanation, because there were nowhere near enough samples to get a run of 1674 heads in a row. I ran 5 tests with 1 million sequences of coin flips, so the longest run of heads we expect is 21 heads.

The sum would be near 8 even if the numbers were random. I chose that particular series to make that happen. The sum of the entire series diverges, but it diverges slowly enough that sometimes getting 24 heads in a row doesn't change the output a great deal, as you can see from this table showing the number of heads, 1/p(getting that many heads in a row), and the resulting contribution to the sum (the running total of 2^k/k up to that many heads).

heads 1/p sum
1 2 2
2 4 4
3 8 6.66666666666667
4 16 10.6666666666667
5 32 17.0666666666667
6 64 27.7333333333333
7 128 46.0190476190476
8 256 78.0190476190476
9 512 134.907936507937
10 1024 237.307936507937
11 2048 423.489754689755
12 4096 764.823088023088
13 8192 1394.97693417693
14 16384 2565.26264846265
15 32768 4749.79598179598
16 65536 8845.79598179598
17 131072 16555.9136288548
18 262144 31119.4691844104
19 524288 58713.5744475683
20 1048576 111142.374447568
21 2097152 211006.755399949
22 4194304 401656.937218131
23 8388608 766379.024174653
24 16777216 1465429.69084132
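The third column can be reproduced as a running total of 2^k/k. The following is a minimal sketch of that arithmetic, not Phil's original code (which is not shown in this thread); the function name `partial_sums` is made up for illustration.

```python
def partial_sums(max_heads):
    """Return rows (heads, 1/p, running total of 2**k / k) for k = 1..max_heads."""
    rows = []
    total = 0.0
    for k in range(1, max_heads + 1):
        total += 2**k / k          # the term added for the k-th head
        rows.append((k, 2**k, total))
    return rows


for heads, inv_p, total in partial_sums(24):
    print(heads, inv_p, total)
```

Dividing an entry in the last column by the number of trials in a run gives the amount a single sequence of that length adds to that run's average.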

Sorry, the post originally didn't say there were 1 million trials per run. I had that in the code, but lost it while trying to reformat the code.

You'd have to take more than 5 million trials to see much different numbers. Here are all the cases, in a series of 240 runs of 1 million trials each, where the average sum was greater than 9 (run number: average sum):

7: 12.3408045473115
19: 13.8475564640298
21: 17.1112080535515
22: 27.9980309134702
23: 11.2690660132211
31: 9.17316020399938
70: 9.71114553027328
74: 13.8303629749467
97: 9.9859069856619
115: 12.6414850499443
127: 11.0081608201826
138: 9.95738337454533
181: 12.7233012408077
230: 10.0657909667346

Comment author: CCC 11 June 2013 09:44:49AM 1 point

Yes... at one million trials per run, you wouldn't expect much more than 20 flips in a run in any case. By my quick calculation, that should result in an average around 3.64 - with perhaps some variability due to a low-probability long string of, say, 30 heads turning up.

Yet you got an average around 8. This suggests that a long chain of heads may be turning up slightly more often than random chance would suggest; that your RNG may be slightly biased towards long sequences.

Comment author: benelliott 13 June 2013 11:20:04AM 0 points

So, I wrote a program similar to Phil's and got similar averages. Here's a sample of 5 taken while writing this comment:

8.2 6.9 7.7 8.0 7.1

These look pretty similar to the numbers he's getting. Like Phil, I also get occasional results that deviate far from the mean, much more than you'd expect to happen with an approximately normally distributed variable.

I also wrote a program to test your hypothesis about the sequences being too long, running the same number of trials and seeing what the longest string of heads is. The results are:

19 22 18 25 23

Do these seem abnormal enough to explain the deviation, or is there a problem with your calculations?
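Neither program appears in this thread, so the following is only an assumed reconstruction of the experiment being discussed. It assumes each trial keeps flipping until the first tails, that a trial of k terms has probability 2^-k (matching the 1/p column in Phil's table), and that a trial contributes the running total of 2^j/j for j = 1..k. The names `run_length`, `trial_sum`, and `simulate` are hypothetical.

```python
import random


def run_length(rng):
    """Length of one trial: assumed P(length = k) = 2**-k."""
    k = 1
    while rng.random() < 0.5:   # keep going with probability 1/2
        k += 1
    return k


def trial_sum(k):
    """Assumed per-trial value: 2/1 + 4/2 + ... + 2**k / k."""
    return sum(2**j / j for j in range(1, k + 1))


def simulate(n_trials, seed=None):
    """Return (average trial sum, longest trial) over n_trials trials."""
    rng = random.Random(seed)
    lengths = [run_length(rng) for _ in range(n_trials)]
    average = sum(trial_sum(k) for k in lengths) / n_trials
    return average, max(lengths)


if __name__ == "__main__":
    # A smaller run than the 1 million trials used in the thread, to keep it quick.
    print(simulate(100_000))
```

Because the average is dominated by rare long trials, repeated calls with different seeds can be expected to scatter widely, which is the behaviour the comments above are trying to explain.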

Comment author: CCC 13 June 2013 07:06:59PM *  1 point

It's not the highest that matters; it's the distribution within that range.

There was also a problem with my calculations, incidentally: a factor-of-two error, which is enough to explain most of the discrepancy. What I did was add up the harmonic series up to around 24 (1+1/2+1/3+...+1/24), then double the last term (1+1/2+1/3+...+1/23 + 2/24). However, the code as given starts out with a 2, and then doubles the numerator with each added term; the calculation I should have used is (2+2/2+2/3+2/4+...+2/23+4/24). That leads to me expecting a value just a little over 7, which is pretty close.
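The two formulas can be compared directly. This is a sketch of the arithmetic as written in the comment, with the cutoff left as a parameter since "around 24" is approximate; the function names are made up for illustration. With a cutoff of 20, the uncorrected formula comes out at roughly 3.65, close to the 3.64 quoted earlier, so that may be the cutoff actually used.

```python
def harmonic(n):
    """1 + 1/2 + ... + 1/n."""
    return sum(1.0 / j for j in range(1, n + 1))


def estimate(cutoff, corrected=True):
    """Expected average sum, truncating trials at `cutoff` terms.

    Uncorrected: 1 + 1/2 + ... + 1/(cutoff-1) + 2/cutoff
    Corrected:   2 + 2/2 + ... + 2/(cutoff-1) + 4/cutoff
    """
    base = harmonic(cutoff - 1) + 2.0 / cutoff
    return 2.0 * base if corrected else base


for cutoff in (20, 24):
    print(cutoff, estimate(cutoff, corrected=False), estimate(cutoff))
```

The corrected value is exactly twice the uncorrected one, and it grows only logarithmically with the cutoff, which is why the choice between 20 and 24 matters so little.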

...

I also ran a similar program. I copied and pasted Phil's, then modified it slightly. My results were:

1 500523
2 250055
3 124852
4 62067
5 31209
6 15482
7 7802
8 4011
9 1978
10 1006
11 527
12 235
13 109
14 68
15 41
16 19
17 10
18 5
21 1

...where the left-hand column is the number of terms in a given sequence, and the right-hand column is the number of times that number of terms came up. Thus, there were 500523 runs of one term each, an insignificant distance from the expected number (500000). Most of the counts were very close to their expected values. Interestingly, every length from 14 terms upwards for which there were any runs came in above the expected count, often by a significant margin. The most significant is the single 21-term run: I expected to see 0.476 of those and saw 1, slightly over twice the expectation. At 15 terms, I expected to see 30.517 runs and saw 41. At 17 terms, I expected to see 7.629 on average and saw 10.
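The expected counts quoted here follow from assuming that a run of k terms has probability 2^-k, so that 1 million trials should produce about 1000000 / 2^k runs of that length. A small sketch of that comparison, using a few of the observed counts from the table above (the helper name `expected_runs` is hypothetical):

```python
N_TRIALS = 1_000_000

# Observed counts copied from the table above (a subset, for brevity).
observed = {1: 500523, 13: 109, 14: 68, 15: 41, 16: 19, 17: 10, 18: 5, 21: 1}


def expected_runs(k, n_trials=N_TRIALS):
    """Expected number of k-term runs, assuming P(k terms) = 2**-k."""
    return n_trials * 2.0 ** -k


for k, seen in sorted(observed.items()):
    exp = expected_runs(k)
    print(f"{k:2d} terms: expected {exp:11.3f}, observed {seen:6d}, ratio {seen / exp:.2f}")
```

With expected counts this small, ratios well away from 1 are not surprising on their own, so the comparison is suggestive rather than a formal test.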

My final average sum is 7.25959229425851; a little higher than expected, but, now that I've corrected the factor-of-two error in my original calculation, not unexpectedly far off.
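As a cross-check, the average can be recomputed from the run-length counts listed above, assuming a run of k terms contributes the running total 2/1 + 4/2 + ... + 2^k/k, as in Phil's table. This is a sketch under that assumption rather than the program actually used; it comes out at about 7.26, in line with the figure reported.

```python
# Run-length counts copied from the table above: {terms: number of runs}.
counts = {
    1: 500523, 2: 250055, 3: 124852, 4: 62067, 5: 31209, 6: 15482,
    7: 7802, 8: 4011, 9: 1978, 10: 1006, 11: 527, 12: 235, 13: 109,
    14: 68, 15: 41, 16: 19, 17: 10, 18: 5, 21: 1,
}


def trial_sum(k):
    """Assumed value of a k-term run: sum of 2**j / j for j = 1..k."""
    return sum(2**j / j for j in range(1, k + 1))


n_trials = sum(counts.values())
average = sum(c * trial_sum(k) for k, c in counts.items()) / n_trials
print(n_trials, average)
```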

So most of the deviation is due to an error in my calculation. The rest is due to the fact that a 21-term or longer run turning up - which can easily happen - will probably pull the average sum up by 1 or more all by itself; it's easier for the average sum to be increased than decreased.