
Douglas_Knight comments on Vegetarianism Ideological Turing Test Results - Less Wrong Discussion

21 Post author: Raelifin 14 October 2015 12:34AM


Comments (26)


Comment author: gjm 14 October 2015 12:56:57PM * 6 points

every single judge thought themselves decently able to discern genuine writing from fakery. The numbers suggest that every single judge was wrong.

I think the first of these claims is a little too pessimistic, and the second may be too.

Here are some comments made by one of the judges (full disclosure: it was me) at the time. "I found these very difficult [...] I had much the same problem [sc. that pretty much every entry felt >50% credible]. [...] almost all my estimates were 40%-60% [...] I fear that this one [...] is just too difficult." I'm pretty sure (though of course memory is deceptive) that I would not have said that I thought myself "decently able to discern genuine writing from fakery". ("Almost all" was too strong, though, if I've correctly guessed which row in the table is mine. Four of my estimates were 70%. One was 99% but that's OK because that was my own entry, which I recognized. The others were all 40-60%. Incidentally, I got two of my four 70% guesses right and two wrong, and four of my eight 40%/60% guesses right and four wrong.)

On the second, I remark that judge 14 (full disclosure: this was definitely not me) scored better than +450 and got only two of the 13 entries wrong. The probability of any given judge getting 11/13 or better by chance is about 1%. [EDITED to add: As Douglas_Knight points out, it would be better to say 10/12 because judge 14 guessed 50% for one entry.] In a sample of 53 people you'll get someone doing this well just by chance roughly half the time. But wait, the two wrong ones were both 60/40 judgements, and judge 14 had a bunch of 70s and 80s and one 90 as well, all of them correct. With judge 14's probability assignments and random actual results, simulation (I'm too lazy to do it analytically) says that as good a logarithmic score happens only about 0.3% of the time. To figure out exactly what that says about the overall results we'd need some kind of probabilistic model for how people assign their probabilities, and I'm way too lazy for that, but my feeling is that judge 14's results are good enough to suggest genuinely better-than-chance performance.
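For anyone who wants to try the second calculation themselves, here is a rough sketch of one way to do it. It is not the simulation described above: with only 13 entries you can enumerate all 2^13 right/wrong patterns exactly instead of sampling. Two things in it are assumptions. The score is a generic logarithmic one (sum of log2(2p)), since the contest's exact formula isn't given in this thread; any other log base or added constant gives the same tail probability. And the `confidences` list is a made-up row that merely matches the description (two wrong 60s, one 50, some 70s and 80s, one 90), not judge 14's actual estimates, so it should not be expected to reproduce the 0.3% figure.

```python
import itertools
import math

def log_score(confidences, correct):
    """Sum of log2(2p), where p is the probability the judge gave to what actually happened."""
    return sum(math.log2(2 * (c if ok else 1 - c))
               for c, ok in zip(confidences, correct))

def tail_probability(confidences, observed_correct):
    """Fraction of equally likely right/wrong patterns that score at least as well as the observed one."""
    observed = log_score(confidences, observed_correct)
    patterns = itertools.product([True, False], repeat=len(confidences))
    # Small tolerance so that patterns tying the observed score are counted despite float rounding.
    hits = sum(1 for p in patterns if log_score(confidences, p) >= observed - 1e-9)
    return hits / 2 ** len(confidences)

# HYPOTHETICAL row, for illustration only: NOT judge 14's real probability assignments.
confidences = [0.6, 0.6, 0.5, 0.7, 0.7, 0.7, 0.7, 0.8, 0.8, 0.8, 0.8, 0.8, 0.9]
# The two 60% calls were wrong; everything else counts as right (the 50% entry scores 0 either way).
observed_correct = [False, False] + [True] * 11

if __name__ == "__main__":
    print(tail_probability(confidences, observed_correct))
```

Swapping in the real row from the results table (and the real scoring rule, if it differs) is all that's needed to check the actual number.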

If anyone wants to own up to being judge 14, I'd be extremely interested to hear what they have to say about their mental processes while judging.

Comment author: Douglas_Knight 14 October 2015 05:14:47PM 1 point

The judge in row 14 did not get 11/13, but 10/12, having punted on #8 by assigning 50%. This affects at least your first calculation.
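To see how much the 11/13 versus 10/12 distinction matters, here is a minimal sketch of the binomial tail calculation. It assumes each guess is an independent coin flip (50% chance of being right) and that the 53 judges are independent of one another; the function names are just mine.

```python
from math import comb

def tail_at_least(k, n, p=0.5):
    """Probability of getting at least k of n entries right when each is right with probability p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def chance_any(p_single, judges):
    """Probability that at least one of `judges` independent judges does that well."""
    return 1 - (1 - p_single) ** judges

if __name__ == "__main__":
    for k, n in [(11, 13), (10, 12)]:
        p_single = tail_at_least(k, n)
        print(f"{k}/{n}: single judge {p_single:.4f}, "
              f"at least one of 53 judges {chance_any(p_single, 53):.3f}")
```

Dropping the punted entry roughly doubles the per-judge chance, so a 10/12 somewhere among 53 judges is noticeably less surprising than an 11/13.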

Comment author: gjm 14 October 2015 10:10:13PM 0 points

Good catch. But it's the second calculation that I find more interesting.

Comment author: Tem42 16 October 2015 01:23:13AM 0 points

There is also a fair chance that that judge recognized at least one of their own entries... 9/11?