I'm Screwtape, also known as Skyler. I'm an aspiring rationalist originally introduced to the community through HPMoR, and I stayed around because the writers here kept improving how I thought. I'm fond of the Rationality As A Martial Art metaphor, new mental tools to make my life better, and meeting people who are strange in ways I find familiar and comfortable. If you're ever in the Boston area, feel free to say hi.
Starting early in 2023, I'm the ACX Meetups Czar. You might also know me from the New York City Rationalist Megameetup, editing the Animorphs: The Reckoning podfic, or being that guy at meetups with a bright bandanna who gets really excited when people bring up indie tabletop roleplaying games.
I recognize that last description might fit more than one person.
but predicted that it was instead about sensitivity to subtle changes in the wording of questions.
If I try this again next year I'm inclined to keep the wording the same instead of trying to be subtle.
Regarding the Dutch book numbers: it seems like, for each of the individual-question presentations of that data, you removed the outliers. When performing the Dutch book calculations, however, it seems like you kept the outliers in.
Yep. Well, in the individual reports I reported the version with the outliers, and then sometimes did another pass without outliers. I kept all of the entries that answered all the questions for the Dutch book calculations, even if they were outliers. I think this is the correct move: if someone's valuations are wild outliers from everyone else's but in a way that multiplies out and gets them back to a 1:1 ratio, then being an outlier isn't a problem.
(Imagine someone who values a laptop at one million bikes, a bike at one car, and a car at one millionth of a laptop. They're almost certainly a wild outlier, and I'm confused as heck, but they are consistent in their values!)
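To make the "multiplies out to 1:1" check concrete, here is a minimal sketch. It is an illustration, not the actual survey analysis code: the function name `cycle_product` and the second, inconsistent example are made up for this comment.

```python
import math

def cycle_product(rates):
    """Multiply exchange rates around a closed loop of items.

    A product of 1 means the valuations are mutually consistent,
    so nobody can profit by trading around the loop.
    """
    return math.prod(rates)

# The example above: laptop -> bikes, bike -> cars, car -> laptops.
outlier_rates = [1_000_000, 1, 1 / 1_000_000]
outlier_product = cycle_product(outlier_rates)
# isclose() absorbs floating-point rounding in 1 / 1_000_000.
outlier_is_consistent = math.isclose(outlier_product, 1.0)

# A hypothetical respondent with tamer numbers that do not close the loop.
leaky_rates = [2, 3, 1]
leaky_product = cycle_product(leaky_rates)
leaky_is_consistent = math.isclose(leaky_product, 1.0)

print(outlier_is_consistent, leaky_is_consistent)
```

The wild outlier passes and the tame-looking respondent fails, which is why filtering on outlier status first would throw away the wrong people.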
Hrm. I guess what would be helpful here would be a sense of the range; the average Brier scores floated around .20 to .23, and I don't have a sense of whether that's a tight clustering with a bit of noise or a meaningful difference. To use running a mile as a comparison, differences of seconds mostly aren't important (except at high levels) but differences of minutes are, right?
If Other is larger than I expect, I think of that as a reason to try and figure out what the parts of Other are. Amusingly enough for the question, I'm optimistic about solving this by letting people do more free response and having an LLM sift through the responses.
Thank you! I felt quite clever setting it up.
Yeah, I should probably add a bit at the start or end of that section that everything in it is potentially selection effect. I don't know how to look at the thing I'm curious about without that.
Thinking out loud: If you get a random selection of people from the Pushup Club and count how many pushups they can do, then do the same for general population, the difference could be selection effect. People who like doing pushups are more likely to go to pushup club in the first place, and more likely to stick with it. But I can't realistically pay a bunch of Mechanical Turkers to hang out on LessWrong for six years and watch what happens. Presumably there's some approach actual scientists have here, but I don't know what it is. Suggestions welcome.
In the meantime I'm going to add a bit towards the start of the section warning of potential selection effects.
No, I think that's correct.
There are 107 people who answered above 200, 21 who answered exactly 200, and 113 people who answered below 200. The second quartile (aka the median) is 200. But nobody guessed a negative number, so the people who guessed low aren't pulling the mean down that much. Meanwhile 33 people guessed 1000 or higher, and they can yank the mean a lot without doing that much to the median. If you're asking people to generate numbers, you tend to get whole number quartiles because nobody guesses there's 100.5 stations.
Imagine the set [1,1,1,2,2,2,2,100,100]. The average is ~23.444, but the median is 2.
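If you want to check that toy set yourself, a quick sketch with Python's standard `statistics` module does it. The variable names are just for illustration, and this is the toy set, not the survey data.

```python
import statistics

# The toy set: mostly small guesses plus two very large ones.
guesses = [1, 1, 1, 2, 2, 2, 2, 100, 100]

mean_guess = statistics.mean(guesses)      # pulled upward by the two 100s
median_guess = statistics.median(guesses)  # the middle (5th of 9) sorted value

print(round(mean_guess, 3), median_guess)
```

The two 100s account for 200 of the 211 total, so they dominate the mean while leaving the middle value untouched.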
Or have I misunderstood the thing that you think needs to be corrected?
Wouldn't that get rid of all of the table of contents?
Ideally I'd have a hierarchy of headings. I think what's happening is it picks up some (but not all) lines that are entirely bold, and treats those as a sort of Heading 4.
Future Survey Discussion thread
A Screwtape Point (and upvotes) to whoever can tell me how to fix the table of contents.
No, I think I'm actually just wrong here and River is correct. I don't know how I wound up with the clockwise rule in my head but I just checked the new driver's pamphlet and it's first to the intersection. Updated.