Top original authors by number of quotes. (Note that authors and mentions are not disambiguated.)
Top original authors by karma collected.
Top quote contributors by total karma score collected:
Top quote contributors by statistical significance level:
Top quote contributors by karma score collected in 2012:
Thank you for the data and collections of quotes! However, the link to the source code is pointing to the directory where you have the html files for the 2012 and 2009-2012 "Best of" collections, not to any .zip or .gz of the source code itself, and it seems to be pulling up a default page with the unstyled HTML version of the 2012 collection.
I removed the broken index.html, sorry. Now you can see the whole (messy) directory. The README is actually a list of commands with some comments; the source code consists of parse.py and convolution.py.
I tried some stuff in R. While it looks exponential, none of the code or fitting functions gave good results on the highest-karma quotes - I guess because all the other thousand quotes look so linear. Of course, I could have just messed up in any of the following:
Open http://people.mokk.bme.hu/~daniel/rationality_quotes_2012/rq.html in Firefox; C-a; then:
$ xclip -o | grep Permalink | grep points | cut -f 1 -d' ' | tr '\n' ','
$ R
R> karma <- sort(c(105,73,66,64,63,62,60,60,58,58,57,57,57,57,57,56,56,55,55,54,53,51,50,50,49,49,
48,48,48,47,47,46,46,45,45,44,44,44,43,43,43,43,43,43,43,43,43,42,42,41,41,41,
41,41,40,40,40,40,39,39,38,38,38,38,38,38,38,38,37,37,37,37,37,37,37,37,36,36,
36,36,36,36,36,35,35,35,35,35,34,34,34,34,34,34,34,34,34,34,34,34,34,34,33,33,
33,33,33,33,33,32,32,32,32,32,32,32,32,32,32,32,32,32,31,31,31,31,31,31,31,31,
31,31,31,31,31,30,30,30,30,30,30,30,30,30,30,30,29,29,29,29,29,29,29,29,29,29,
29,29,29,29,29,28,28,28,28,28,28,28,28,28,28,28,28,28,28,28,28,28,28,28,27,27,
27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,26,26,26,26,26,26,
26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,25,25,25,25,
25,25,25,25,25,25,25,25,25,25,25,25,25,25,24,24,24,24,24,24,24,24,24,24,24,24,
24,24,24,24,24,24,24,24,24,24,24,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,
23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,22,22,22,22,22,
22,22,22,22,22,22,22,22,22,22,22,22,22,22,22,22,22,22,22,22,22,22,22,22,22,22,
22,22,22,21,21,21,21,21,21,21,21,21,21,21,21,21,21,21,21,21,21,21,21,21,21,21,
21,21,21,21,21,21,21,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,
20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,19,19,19,19,19,19,19,
19,19,19,19,19,19,19,19,19,19,19,19,19,19,19,19,19,19,19,19,19,19,19,19,19,19,
19,19,19,19,19,19,19,19,19,19,19,19,18,18,18,18,18,18,18,18,18,18,18,18,18,18,
18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,
18,18,18,18,18,18,18,18,18,18,18,18,18,18,18,17,17,17,17,17,17,17,17,17,17,17,
17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,
17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,
17,17,17,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,
16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,
16,16,16,16,16,16,16,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,
15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,
15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,14,14,14,14,14,
14,14,14,14,14,14,14,14,14,14,14,14,14,14,14,14,14,14,14,14,14,14,14,14,14,14,
14,14,14,14,14,14,14,14,14,14,14,14,14,14,14,14,14,14,14,14,14,14,14,14,14,14,
14,14,14,14,14,14,14,14,14,14,14,14,14,13,13,13,13,13,13,13,13,13,13,13,13,13,
13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,
13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,
13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,12,12,12,12,12,12,
12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,
12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,
12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,
12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,
11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,
11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,
11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,
11,11,11,11,11,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,
10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,
10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,
10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10))
R> summary(karma)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   10.0    12.0    17.0    19.5    23.0   105.0
R> n <- seq(length(karma))
R> temp <- data.frame(y = karma, x = n)
# first try, fitting a nonlinear model
R> plot(temp$x, temp$y)
R> mod <- nls(y ~ exp(a + b * x), data = temp, start = list(a = 0, b = 0))
R> lines(temp$x, predict(mod, list(x = temp$x))); mod
Nonlinear regression model
model: y ~ exp(a + b * x)
data: temp
     a      b
1.9094 0.0016
residual sum-of-squares: 17684
Number of iterations to convergence: 9
Achieved convergence tolerance: 8.9e-06
# second try, fitting a quadratic
R> lm(temp$y ~ temp$x + I(temp$x^2))
Call:
lm(formula = temp$y ~ temp$x + I(temp$x^2))
Coefficients:
(Intercept)       temp$x  I(temp$x^2)
   1.33e+01    -1.91e-02     3.96e-05
# third try, log transform
R> exp(fitted(lm(log(temp$y) ~ temp$x)))
....
1121 1122 1123 1124 1125 1126 1127 1128 1129 1130 1131 1132 1133 1134
35.080 35.123 35.167 35.211 35.255 35.299 35.343 35.387 35.431 35.475 35.520 35.564 35.608 35.653
1135 1136 1137 1138 1139 1140
35.697 35.742 35.786 35.831 35.876 35.920
# fourth and final try, log variation
R> cc <- coef(lm(log(temp$y) ~ temp$x)); cc
(Intercept)      temp$x
   2.160310    0.001246
R> with(temp, fitted(nls(y ~ exp(a + b*x), start = list(a = cc[1], b = cc[2]))))
...
[1106] 39.594 39.657 39.721 39.784 39.848 39.912 39.976 40.040 40.104 40.168 40.232 40.297 40.361
[1119] 40.426 40.491 40.556 40.620 40.686 40.751 40.816 40.881 40.947 41.012 41.078 41.144 41.210
[1132] 41.276 41.342 41.408 41.474 41.541 41.607 41.674 41.740 41.807
attr(,"label")
[1] "Fitted values"
The extra data doesn't seem to make much difference:
R> karma <- read.table("http://people.mokk.bme.hu/~daniel/rationality_quotes_2012/scores")
R> karma <- sort(karma$V2)
R> summary(karma)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   -8.0     4.0     8.0    10.7    15.0   105.0
...
Nonlinear regression model
model: y ~ exp(a + b * x)
data: temp
       a        b
-0.01088  0.00134
residual sum-of-squares: 22772
Number of iterations to convergence: 7
Achieved convergence tolerance: 3.59e-06
It is roughly exponential in the range between 3 and 60 karma.
Eyeballing it, looks like the previous fit crosses around 40.
R> karma <- karma[karma<40]
...
Nonlinear regression model
model: y ~ exp(a + b * x)
data: temp
       a        b
-0.01088  0.00134
residual sum-of-squares: 22772
Number of iterations to convergence: 7
Achieved convergence tolerance: 3.59e-06
The fit looks much better:
I am afraid I don't understand your methodology. What is a rank versus value function supposed to look like for an exponentially distributed sample?
When I stated that the middle is roughly exponential, this was the graph that I was looking at:
d <- density(karma)
plot(log(d$y) ~ d$x)
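To make the difference between the two views concrete, here is a minimal sketch on simulated data. The rate of 0.1, the sample size, the seed, and the 5 to 30 fitting window are arbitrary assumptions, not values taken from the karma scores. For an exponential sample, the log-density is a straight line in the value with slope equal to minus the rate, whereas the sorted values plotted against rank follow the quantile function -log(1 - p)/rate, which is not an exponential function of the rank.

```r
# Simulated stand-in for the karma scores; rate, n and seed are arbitrary assumptions.
set.seed(1)
rate <- 0.1
n <- 5000
x <- sort(rexp(n, rate = rate))

# View 1: sorted value versus rank. The expected curve is the quantile function.
p <- seq_len(n) / (n + 1)
expected <- qexp(p, rate = rate)   # equals -log(1 - p) / rate
plot(seq_len(n), x, pch = ".", xlab = "rank", ylab = "value")
lines(seq_len(n), expected, col = "red")

# View 2: log-density versus value. For an exponential this is a straight line
# with slope -rate, so a linear fit over the bulk of the data recovers the rate.
d <- density(x, from = 0)
keep <- d$x > 5 & d$x < 30 & d$y > 0   # stay away from the boundary and the sparse tail
slope <- coef(lm(log(d$y[keep]) ~ d$x[keep]))[2]
plot(d$x, log(d$y), type = "l", xlab = "value", ylab = "log density")
```

On this simulated sample, `slope` should come out close to `-rate`, while the rank plot bends upward only near the highest ranks.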
I don't do this for a living, so I am not sure at all, but if I really really had to make this formal, I would probably use maximum likelihood to fit an exponential distribution on the relevant interval, and then Kolmogorov-Smirnov. It's what shminux said, except there is probably no closed formula because the cutoffs complicate the thing. And at least one of the cutoffs is really necessary, because below 3 it is obviously not exponential.
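A rough sketch of that procedure, on simulated data rather than the actual scores. The true rate of 0.08, the seed, and the helper names `loglik` and `ptruncexp` are assumptions made for illustration; the cutoffs 3 and 60 are borrowed from the range mentioned earlier. Two caveats apply: the Kolmogorov-Smirnov p-value is optimistic when the rate is estimated from the same data, and real karma scores are integers with many ties, which the test does not handle well.

```r
# Simulated stand-in data; the true rate and seed are arbitrary assumptions.
set.seed(42)
lo <- 3
hi <- 60
raw <- rexp(20000, rate = 0.08)
x <- raw[raw >= lo & raw <= hi]      # keep only the interval of interest

# Log-likelihood of an exponential truncated to [lo, hi].
loglik <- function(rate, x, lo, hi) {
  length(x) * (log(rate) - log1p(-exp(-rate * (hi - lo)))) - rate * sum(x - lo)
}

# CDF of the same truncated exponential, in the form ks.test() accepts.
ptruncexp <- function(q, rate, lo, hi) {
  p <- (1 - exp(-rate * (q - lo))) / (1 - exp(-rate * (hi - lo)))
  pmin(pmax(p, 0), 1)
}

# With both cutoffs there is no closed form for the maximum,
# so search numerically over the rate.
fit <- optimize(loglik, interval = c(1e-6, 5), x = x, lo = lo, hi = hi,
                maximum = TRUE)
rate_hat <- fit$maximum

# Compare the truncated sample against the fitted truncated CDF.
ks <- ks.test(x, ptruncexp, rate = rate_hat, lo = lo, hi = hi)
ks
```

To try this on the real scores, `x` would be replaced by the karma values inside the chosen cutoffs, with the caveats above in mind.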
I finished creating the 2012 edition of the Best of Rationality Quotes collection. (Here is last year's.)
Best of Rationality Quotes 2012 (500kB page, 434 quotes)
and Best of Rationality Quotes 2009-2012 (1200kB page, 1140 quotes)
The page was built by a short script (source code here) from all the LW Rationality Quotes threads so far. (We have had such a thread each month since April 2009.) The script collects all comments with a karma score of 10 or more, and sorts them by score. Replies are not collected, only top-level comments.
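The selection rule can be illustrated with a small sketch. This is not the actual script (which, per the comments, consists of parse.py and convolution.py); the data frame, its column names, and its values are invented for illustration, and the descending sort order is an assumption.

```r
# Hypothetical toy comments; column names and values are invented for illustration.
comments <- data.frame(
  id        = c("c1", "c2", "c3", "c4", "c5"),
  parent_id = c(NA, NA, "c1", NA, "c2"),   # NA marks a top-level comment
  score     = c(12, 7, 40, 25, 15),
  stringsAsFactors = FALSE
)

# Keep top-level comments scoring at least 10, then sort by score, highest first.
top_level <- comments[is.na(comments$parent_id) & comments$score >= 10, ]
best <- top_level[order(-top_level$score), ]
best$id
```

Here the reply `c3` is dropped despite its high score, and the top-level comment `c2` is dropped for scoring below 10.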
As is now usual, I provide various statistics and top-lists based on the data. (Source code for these is also at the above link, see the README.) I added these as comments to the post: