So you decided to ignore this comment.
I'm glad in retrospect that I didn't fill out your survey, precisely because I expected this to happen. We can chalk up another unethically run study done on LWers.
Also:
One text response included identifying information, which I removed in the public version of the data. If you participated and there is any information you provided that you would like removed from the public version, PLEASE tell me as soon as possible and I will remove it.
Once something is published on the Internet, it practically cannot be retracted. The spreadsheet already exists in the caches of multiple browsers. This is not the proper way to publish study data.
I don't want to marginalise your concerns, but can you give an example of how exactly the data included in this survey could (with non-negligible probability) be abused?
For instance, forcibly "outing" an atheist in an unfriendly community.
Perhaps more importantly: if people do not have the presumption of anonymity when they fill out a survey, then survey results in general become more unreliable. There's been plenty of work done on how self-reports change when the people filling them out know they'll be viewed by people in their out-group and/or in-group.
If your concern is that there is too little data in the survey to identify individual members, I would direct you to gwern's essay on The Tragedy of Light.
Completely off the topic of the thread, but I take issue with an essay which claims that Death Note is essentially a thought experiment for "given the perfect murder weapon, how can you screw up anyway?" L caught Light, not because he was making appropriate Bayesian updates on the status of the murderer, but because he was being handed information by authorial fiat. In multiple steps he narrows the hypothesis space far more than the observations available to him permit.
When your hypothesis space is broad enough to include the possibility of supernatural, action-at-a-distance, fate-manipulating murder weapons, it's broad enough to include possible culprits which aren't even human beings. L was jumping the gun even by concluding that he had a pool of 7 billion to narrow down from.
L is justified in assuming humans by the decision-theoretic consequences: he likely can do nothing against supernatural entities (and IIRC, even in the extremely difficult scenario of killing a shinigami, that doesn't stop the killings with Death Notes), so proceeding on the assumption that it is a human is better than not proceeding.
Besides that, I don't think L is 'hax'. (Near and Mikami, on the other hand, are a major example of authorial fiat and the part of Death Note I hate the most.)
There are alternatives to a human killer which would provide some opportunity to make headway, which do not have priors that are obviously lower than a human with a supernatural weapon, such as extraterrestrials or some sort of supernatural creature which is humanly beatable.
The first point where I got really pissed off, though, was when L jumps all the way to "the killer must know the victim's real name" based on the murder of Lind L. Tailor. Lind L. Tailor was a convicted criminal, and L wasn't, and killing criminals was already Kira's suspected modus operandi. It was not just possible, but probable, that Kira wouldn't react to a non-criminal's threat to apprehend him (it could have been against protocol, against his/her/its moral code, unnecessary because Kira is completely unassailable, rejected as unnecessary because Kira is confident enough to think he/she/it is unassailable, etc.), even if doing so was entirely within Kira's abilities. And even if we take for granted that Kira would want to kill L, and assume that Kira has magic action-at-a-distance murder powers, but not magic action-at-a-distance information gathering powers, then whether the victim's name is known or not is just one variable that's flipped between L and Lind L. Tailor. L could just as well have been impervious because he eats too many sweets.
If your concern is that there is too little data in the survey to identify individual members,
Yes, I suspect there is too little data in the survey to identify individual members with enough certainty and reasonable effort. I can't contrive a concrete scenario where an atheist would be identified this way.
I would direct you to gwern's essay on The Tragedy of Light.
Can you please provide some context? I have no idea what the essay is about.
I can't imagine doing better than gwern's preface.
I'm sorry, I didn't see that comment until now.
I did think about this, and decided that there was no real expectation of privacy in the survey. I'm only some random person on the Internet, after all; why would you give any information to me if you didn't want other random people on the Internet to know it? I thought the text data might be valuable to those who wanted to look at the survey results, and none of the information seemed too embarrassing or identifying, except for the bit I removed.
That said, you have a point; it may have been better to be conservative and not post the text data at all.
Perhaps it would be helpful if a few people who did take the survey weighed in on how they feel about it.
I am not worried about my own data, because it's been out there on the Intertubes since before your survey, because I published it. That said, the choice whether to release personally identifying information should be mine, and mine alone, not yours. I am somewhat unpleasantly surprised that you chose to publish people's personal data without their explicit consent.
I'm only some random person on the Internet, after all; why would you give any information to me if you didn't want other random people on the Internet to know it?
Because, to a very, very rough first approximation, the risk of someone doing something nasty with your personal data increases proportionally to the number of people that you give access to it.
Comparing the US (n=56) with the rest of the world (n=38), pretty much the only difference that I could find in the data was on the questions about religion while growing up, where the Americans had more issues.
One question was statistically significant on its own: Americans were less likely to agree that "My parents were in favor of atheistic or skeptic views" (p < .01). There was also a trend (p < .20) for Americans to have more issues on the related questions "My parents did not care what my religious views were" (disagree), "I felt pressure to conform to the religious views of my parents while growing up", and "I felt religiously excluded by my community while growing up".
On current connectedness questions, the only question with any trend (p < .20) was "I would expect most people in my local community to judge me silently if I express controversial views", with Americans tending to expect slightly more silent judging (p=.10). More on this question below.
I tried breaking down "the rest of the world" (n=38) into smaller groups, like continental Europe (n=18) vs. the English-speaking world (n=19), but couldn't find any meaningful differences with the small sample size.
Within the US, I broke things down into those who currently live in California (n=13), other blue states (n=21: Connecticut, DC, Illinois, Massachusetts, New Jersey, New York, Oregon, Pennsylvania, Rhode Island, Washington, Wisconsin), and red states (n=23: Alaska, Arizona, Colorado, Florida, Indiana, Iowa, Kentucky, Montana, Nebraska, North Carolina, Ohio, Tennessee, Texas, Utah, unspecified Midwest). Many of these were judgment calls and I tended to put those in the "red" category. There were 4 questions with statistically significant differences (interestingly, on 2 of them the other blue states grouped with California and on 2 they looked more like the red states).
Other questions were in the direction that you'd expect (e.g. "I would expect most people in my local community to carry on a conversation with me if I express controversial views" was lowest in red states and "I regularly have enjoyable conversations with others in my local community" was highest in California) but not close to statistical significance (p > .30). I'd guess that there's a good chance that those associations would be there with a larger sample size.
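For anyone who wants to rerun this kind of two-group comparison on the public data, here is a minimal sketch of one way to do it. The thread doesn't say which test produced the p-values above, so the permutation test below is a swapped-in illustration, not the method actually used. The function name `permutation_test` and the two rating lists are made up for the example and are not the survey's responses.

```python
import random


def permutation_test(group_a, group_b, n_resamples=10_000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Hypothetical helper for illustration; not the analysis used in the thread.
    """
    rng = random.Random(seed)  # fixed seed so reruns give the same p-value
    mean = lambda xs: sum(xs) / len(xs)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_resamples):
        # Shuffle the group labels by shuffling the pooled ratings.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        # Small tolerance guards against floating-point ties.
        if diff >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate from ever being exactly zero.
    return (hits + 1) / (n_resamples + 1)


# Made-up 1-5 agreement ratings, NOT taken from the survey.
us = [2, 1, 3, 2, 2, 4, 1, 3, 2, 2]
rest = [4, 3, 5, 4, 2, 4, 3, 5]
p = permutation_test(us, rest)
print(f"p = {p:.3f}")
```

A permutation test makes few distributional assumptions, which suits ordinal agreement ratings and small groups. With this many questions being compared, a few will cross any given threshold by chance, so individual p-values are better read as rough pointers than as confirmed effects.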
I think more differences might have shown up if the survey had collected more independent variables, by asking more questions about the features of one's local community (besides country and state/province). For example: is your local community a college/university community, are you in a big metropolitan area, and how left/right is your community politically?
"What would you say is the predominant religion where you live?" was on the survey, but most people answered with some variation on Christianity (with some including additional details that are hard to code); asking about the degree of religiosity (e.g., on a 4-point scale from "very religious" to "not at all religious") would've made it easier to use this question to analyze the data.
To the person who predicted an 80-90% significant difference between different parts of California: I predict with at least 90% confidence that there will be no significant difference, because of the wide spread of locations and smallish sample size of this survey.
As far as I can tell, the survey didn't even ask this question. 13 respondents identified as living in California, but they didn't give any information on which part of California they live in.
The person who made this prediction identified as living in "NorCal" in the "further comments" section. Maybe they meant something more like "I predict that such a difference could be found with an ideal study."
The results for these have been stable for a while now; I'm posting them a bit late. 95 people took the survey after I modified it to add two questions. For the public version, I removed the pre-change data (10 data points).
One text response included identifying information, which I removed in the public version of the data. If you participated and there is any information you provided that you would like removed from the public version, PLEASE tell me as soon as possible and I will remove it.
P.S. To the person who predicted an 80-90% significant difference between different parts of California: I predict with at least 90% confidence that there will be no significant difference, because of the wide spread of locations and smallish sample size of this survey.
(The original post about the survey.)
EDIT: After some comments that it was unethical for me to post the data (in particular the text), I removed public access from the link provided earlier. Given my precommitment to post the data, I assumed it was clear enough to respondents that it would be public. I'm not convinced that this has hurt anyone, but given that others seem to disagree, it seemed prudent to remove it. Please feel free to continue this discussion; I'm interested in your thoughts.