Comment author: gwern 08 April 2013 04:54:01PM *  7 points

"Another thing I have accused you of, in my head, is a failure to appropriately apply a multiple test correction when doing some data exploration for trends in the less wrong survey."

It's true I didn't do any multiple correction for the 2012 survey, but I think you're simply not understanding the point of multiple correction.

First, 'Data exploration' is precisely when you don't want to do multiple correction, because when data exploration is being done properly, it's being done as exploration, to guide future work, to discern what signals may be there for followup. But multiple correction controls the false positive rate at the expense of then producing tons of false negatives; this is not a trade-off we want to make in exploration. If you look at the comments, dozens of different scenarios and ideas are being looked at, and so we know in advance that any multiple correction is going to trash pretty much every single result, and so we won't wind up with any interesting hypotheses at all! Predictably defeating the entire purpose of looking. Why would you do this wittingly? It's one thing to explore data and find no interesting relationships at all (shit happens), but it's another thing entirely to set up procedures which nearly guarantee that you'll ignore any relationships you do find. And which multiple correction, anyway? I didn't come up with a list of hypotheses and then methodically go through them, I tested things as people suggested them or I thought of them; should I have done a single multiple correction of them all yesterday? (But what if I think of a new hypothesis tomorrow...?)
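To make the trade-off concrete, here is a minimal sketch of a Bonferroni correction applied to 40 exploratory tests at alpha = 0.05. The p-values are made up for illustration (they are not from the survey), and the function name is hypothetical.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Return one boolean per test: True if it survives Bonferroni correction."""
    m = len(p_values)
    threshold = alpha / m  # each test must clear alpha divided by the number of tests
    return [p <= threshold for p in p_values]


# Hypothetical exploration: 3 nominally significant results among 40 tests.
p_values = [0.004, 0.01, 0.03] + [0.2 + 0.02 * i for i in range(37)]

uncorrected = [p <= 0.05 for p in p_values]
corrected = bonferroni_reject(p_values)

n_uncorrected = sum(uncorrected)  # candidate signals kept for follow-up
n_corrected = sum(corrected)      # survivors of the per-test threshold 0.05 / 40
```

With these made-up numbers the uncorrected screen keeps three leads for follow-up, while the corrected threshold of 0.00125 keeps none of them.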

Second, thresholds for alpha and beta are supposed to be set by decision-theoretic considerations of cost-benefit. A false positive in medicine can be very expensive in lives and money, and hence any exploratory attitude, or undeclared data mining/dredging, is a serious issue (and one I fully agree with Ioannidis on). In those scenarios, we certainly do want to reduce the false positives even if we're forced to increase the false negatives. But this is just an online survey. It's done for personal interest, kicks, and maybe a bit of planning or coordination by LWers. It's also a little useful for rebutting outside stereotypes about intellectual monoculture or homogeneity. In this context, a false positive is not a big deal, and no worse than a false negative. (In fact, rather than sacrifice a disproportionate amount of beta in order to decrease alpha more, we might want to actually increase our alpha!)
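A rough sketch of that cost-benefit reasoning, under assumptions that are mine and not from any actual analysis: a one-sided z-test with known unit variance, a small standardized effect of 0.2, n = 50, a 50% prior that an effect is real, and a coarse grid of candidate alphas. All names and numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def beta_for_alpha(alpha, effect_size, n):
    """False negative rate of a one-sided z-test with known unit variance."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha)
    return STD_NORMAL.cdf(z_crit - effect_size * sqrt(n))


def expected_cost(alpha, cost_fp, cost_fn, prior_real=0.5, effect_size=0.2, n=50):
    """Average cost per test: false positives when null, false negatives when real."""
    beta = beta_for_alpha(alpha, effect_size, n)
    return (1 - prior_real) * cost_fp * alpha + prior_real * cost_fn * beta


def best_alpha(cost_fp, cost_fn, grid=(0.01, 0.05, 0.10, 0.20, 0.25, 0.30)):
    """Pick the alpha on the grid with the lowest expected cost."""
    return min(grid, key=lambda a: expected_cost(a, cost_fp, cost_fn))


casual_survey = best_alpha(cost_fp=1, cost_fn=1)   # errors equally cheap
expensive_fp = best_alpha(cost_fp=20, cost_fn=1)   # false positives 20x worse
```

Under these assumptions, equal error costs favor an alpha well above 0.05, while making false positives 20 times as costly pushes the preferred alpha below 0.05.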

This cost-benefit is a major reason why if you look through my own statistical analyses and experiments, I tend to only do multiple correction in cases where I've pre-specified my metrics (self-experiments are not data exploration!) and where a false positive is expensive (literally, in the case of supplements, since they cost a non-trivial amount of $ over a lifetime). So in my Zeo experiments, you will see me use multiple correction for melatonin, standing, & 2 Vitamin D experiments (and also in a recent non-public self-experiment); but you won't see any multiple correction in my exploratory weather analysis.

"What I would recommend you do for data exploration is decide ahead of time if you have some particularly interesting hypothesis or not. If not and you're just going to check lots of stuff, then commit to that and the appropriate multiple test correction at the end."

See above on why this is pointless and inappropriate.

"That level of correction then also saves your 'noticing' something interesting and checking it specifically being circular (because you were already checking 'everything' and correcting appropriately)."

If you were doing it at the end, then this sort of 'double-testing' would be a concern as it might lead your "actual" number of tests to differ from your "corrected against" number of tests. But it's not circular, because you're not doing multiple correction. The positives you get after running a bunch of tests will not have a very high level of confidence, but that's why you then take them as your new fixed set of specific hypotheses to run against the next dataset and, if the results are important, then perhaps do multiple correction.

So for example, if I cared that much about the LW survey results from the data exploration, what I should ideally do is collect the n positive results I care about, announce in advance the exact analysis I plan to do with the 2013 dataset, and decide in advance whether and what kind of multiple correction I want to do. The 2012 results using 2012 data suggest n hypotheses, and I would then actually test them with the 2013 data. (As it happens, I don't care enough, so I haven't.)
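The two-stage procedure can be sketched as follows. This is only an illustration: the hypothesis names and p-values are invented, and Holm's step-down method stands in for "whatever multiple correction was decided in advance".

```python
def holm_reject(p_values, alpha=0.05):
    """Holm step-down correction; returns a dict of name -> rejected?"""
    ordered = sorted(p_values.items(), key=lambda item: item[1])
    m = len(ordered)
    rejected = {name: False for name in p_values}
    for rank, (name, p) in enumerate(ordered):
        if p > alpha / (m - rank):
            break  # once one test fails, all larger p-values fail too
        rejected[name] = True
    return rejected


# Step 1: exploration on the first dataset, uncorrected, with a loose screen.
exploratory = {"h1": 0.003, "h2": 0.04, "h3": 0.30, "h4": 0.02, "h5": 0.65}
candidates = [name for name, p in exploratory.items() if p <= 0.05]

# Step 2: pre-announced confirmation on a new dataset, candidates only.
confirmation = {"h1": 0.001, "h2": 0.20, "h4": 0.02}
confirmed = holm_reject({name: confirmation[name] for name in candidates})
```

The correction is applied only over the n hypotheses carried into the second dataset, so the number of tests corrected for matches the number actually run.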

Comment author: tenlier 08 April 2013 07:36:00PM -3 points

Gwern, I should be able to say that I appreciate the time you took to respond (which is snarky enough), but I am not able to do so. You can't trust that your response to me is inappropriate and I can't find any reason to invest myself in proving your response is inappropriate. I'll agree my comment to you was somewhat inappropriate and while turnabout is fair play (and first provocation warrants an added response), it is not helpful here (whether deliberate or not). Separate from that, I disagree with you (your response is, historically, how people have managed to be wrong a lot). I'll retire once more.

I believe it was suggested to me when I first asked the potential value of this place that they could help me with my math.

Comment author: gwern 08 April 2013 03:59:56PM *  6 points

"It is the single solitary thing any person who knows any stats at all knows."

Many people with statistics degrees or statisticians or statistics professors make the p-value fallacy; so perhaps your standards are too high if LWers merely being as good as statistics professors comes as a disappointment to you.

"I seem to recall the same error made by Gwern (and pointed out)."

I've pointed out the mis-interpretation of p-values many times (most recently, by Yvain), and wrote a post with the commonness of the misinterpretation as a major point (http://lesswrong.com/lw/g13/against_nhst/), so I would be a little surprised if I have made that error.

Comment author: tenlier 08 April 2013 04:14:10PM *  1 point

Sorry, Gwern, I may be slandering you, but I thought I noticed it long before that (I've been reading, despite my silence). Another thing I have accused you of, in my head, is a failure to appropriately apply a multiple test correction when doing some data exploration for trends in the less wrong survey. Again, I may have you misidentified. Such behavior is striking, if true, since it seems to me one of the most basic complaints Less Wrong has about science (somewhat incorrectly).

Edited: Gwern is right (on my misremembering). Either I was skimming and didn't notice Gwern was quoting or I just mixed corrector with corrected. Sorry about that. In possible recompense: What I would recommend you do for data exploration is decide ahead of time if you have some particularly interesting hypothesis or not. If not and you're just going to check lots of stuff, then commit to that and the appropriate multiple test correction at the end. That level of correction then also saves your 'noticing' something interesting and checking it specifically being circular (because you were already checking 'everything' and correcting appropriately).

Comment author: VincentYu 07 April 2013 05:09:24PM *  13 points

I don't think you responded to my criticisms and I have nothing further to add. However, there are a few critical mistakes in what you have added that you need to correct:

Now pay attention; this is the part everyone gets wrong, including most of the commenters below.

The methodology used in this study, and in most studies, is as follows:

  • Divide subjects into a test group and a control group.

No, Mattes and Gittelman ran an order-randomized crossover study. In crossover studies, subjects serve as their own controls and they are not partitioned into test and control groups.
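As a rough illustration of what "subjects serve as their own controls" means for the analysis, here is a paired t statistic computed on within-subject differences. The scores are invented, and this is a generic sketch, not the analysis Mattes and Gittelman actually ran.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(on_treatment, on_placebo):
    """t statistic for within-subject differences (df = n - 1)."""
    if len(on_treatment) != len(on_placebo):
        raise ValueError("each subject needs one score per condition")
    # Each subject contributes one difference, so between-subject
    # variation cancels out of the comparison.
    diffs = [t - p for t, p in zip(on_treatment, on_placebo)]
    standard_error = stdev(diffs) / sqrt(len(diffs))
    return mean(diffs) / standard_error


# Hypothetical scores for 5 subjects, each measured under both conditions.
dye = [20, 18, 23, 17, 21]
placebo = [18, 17, 20, 17, 19]
t_stat = paired_t_statistic(dye, placebo)
```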

If you don't understand why that is so, read the articles about the t-test and the F-test. The tests compute a difference in magnitude of response such that, 95% of the time, if the measured effect difference is that large, the null hypothesis (that the responses of all subjects in both groups were drawn from the same distribution) is false.

No, the correct form is:

  • The tests compute a difference in magnitude of response such that if the null hypothesis is true, then 95% of the time the measured effect is not that large.

The form you quoted is a deadly undergraduate mistake.
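The correct form can be checked with a small simulation. This sketch assumes a two-sided z-test with known unit variance, 20 subjects per group, and both groups drawn from the same distribution (so the null hypothesis is true by construction); the setup and names are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist


def false_positive_rate(n_sims=4000, n_per_group=20, alpha=0.05, seed=0):
    """Fraction of null-true experiments whose difference exceeds the critical value."""
    rng = random.Random(seed)
    # Two-sided critical difference in means for known unit variance.
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    critical_diff = z_crit * sqrt(2 / n_per_group)
    exceed = 0
    for _ in range(n_sims):
        group_a = [rng.gauss(0, 1) for _ in range(n_per_group)]
        group_b = [rng.gauss(0, 1) for _ in range(n_per_group)]
        diff = sum(group_a) / n_per_group - sum(group_b) / n_per_group
        if abs(diff) > critical_diff:
            exceed += 1
    return exceed / n_sims


rate = false_positive_rate()  # should land near alpha
```

The simulation fixes the null as true and asks how often the difference is large. It says nothing about how often the null is false given a large difference, which also depends on the prior.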

ADDED: People are making comments proving they don't understand how the F-test works. This is how it works: You are testing the hypothesis that two groups respond differently to food dye.

Suppose you measured the number of times a kid shouted or jumped, and you found that kids fed food dye shouted or jumped an average of 20 times per hour, and kids not fed food dye shouted or jumped an average of 17 times per hour. When you run your F-test, you compute that, assuming all kids respond to food dye the same way, you need a difference of 4 to conclude that the two distributions (test and control) are different.

If the food dye kids had shouted/jumped 21 times per hour, the study would conclude that food dye causes hyperactivity. Because they shouted/jumped only 20 times per hour, it failed to prove that food dye causes hyperactivity. That failure to prove is then taken as having proved that food dye does not cause hyperactivity, even though the evidence indicated that food dye causes hyperactivity.

This is wrong. There are reasonable prior distributions for which the observation of a small positive sample difference is evidence for a non-positive population difference. For example, this happens when the prior distribution for the population difference can be roughly factored into a null hypothesis and an alternative hypothesis that predicts a very large positive difference.

In particular, contrary to your claim, the small increase of 3 can be evidence that food dye does not cause hyperactivity if the prior distribution can be factored into a null hypothesis and an alternative hypothesis that predicts a positive response much greater than 3. This is analogous to one of Mattes and Gittelman's central claims (they claim to have studied children for which the alternative hypothesis predicted a very large response).
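A minimal numerical sketch of this point, with a point null (no effect) against a single point alternative. The standard error of 2 is an assumption, chosen to be roughly consistent with the quoted threshold of 4; the alternative effect sizes of 15 and 3 and the 50% prior are hypothetical.

```python
from statistics import NormalDist


def posterior_null(observed_diff, std_error, alt_effect, prior_null=0.5):
    """Posterior probability of 'no effect' against a single point alternative."""
    like_null = NormalDist(0, std_error).pdf(observed_diff)
    like_alt = NormalDist(alt_effect, std_error).pdf(observed_diff)
    numerator = prior_null * like_null
    return numerator / (numerator + (1 - prior_null) * like_alt)


# Observed difference of 3 (20 vs 17), with an assumed standard error of 2.
large_alt = posterior_null(observed_diff=3, std_error=2, alt_effect=15)
small_alt = posterior_null(observed_diff=3, std_error=2, alt_effect=3)
```

When the alternative predicts a very large response, an observed difference of 3 is far more probable under the null, so the posterior for "no effect" rises above the prior. When the alternative predicts an effect of about 3, the same observation lowers it.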

Comment author: tenlier 08 April 2013 03:56:57PM 3 points

This is going to be yet another horrible post. I just go meta and personal. Sorry.

I don't understand how this thread (and a few others like it) on stats can happen; in particular, your second point (re: the basic mistake). It is the single solitary thing any person who knows any stats at all knows. Am I wrong? Maybe 'knows' meaning 'understands'. I seem to recall the same error made by Gwern (and pointed out). I mean the system works in the sense that these comments get upvoted, but it is like. . . people having strong technical opinions with very high confidence about Shakespeare without being able to write out a sentence. It is not inconceivable the opinions are good (stroke, language, etc), but it says something very odd about the community that it happens regularly and is not extremely noticed. My impression is that Less Wrong is insane on statistics, particularly, and some areas of physics (and social aspects of science and philosophy).

I didn't read the original post, paper, or anything other than some comment by Goetz which seemed to show he didn't know what a p-value was and had a gigantic mouth. It's possible I've missed something basic. Normally, before concluding a madness in the world, I'd be careful. For me to be right here means madness is very very likely (e.g., if I correctly guess it's -70 outside without checking any data, I know something unusual about where I live).

Comment author: Viliam_Bur 25 June 2012 09:44:08AM -1 points

"Name a post in the sequences I should read that I will find instructive. Is it really so difficult a request that you require a good model of me to answer? Just assume an undergrad's knowledge in every field."

I don't have "an undergrad's knowledge in every field", so how could I know which parts of the Sequences are outside that range?

I do suspect (but this may be just my ignorance speaking) that there is most to gain in philosophy, specifically philosophy of science. I never studied philosophy seriously, but my model (prejudice) is that it is a huge confused field with a lot of history, nonsense, and mysterious answers. As in: You learn who was Plato, when and where he lived, and what he said... but the question "is that really true?" is kind of forbidden. You don't learn true or false, you just memorize and classify the ideas. This guy said this, the other guy said that; both of them are famous philosophers, both of them deserve respect; end of story. The only critical thing you can say about a philosopher X is that a philosopher Y disagreed with him, and because Y lived later and read X's books, this criticism is also famous and deserves respect. But if later a philosopher Z says "meh" and follows the teachings of X anyway, well, that deserves respect too. So at the end we have many contradictory answers, all of them worthy of respect, but you can't use any of them to build a better mousetrap. (But you could get a PhD in mousetrap philosophy by explaining how the idea of the mousetrap relates to the idea of the mouse, and why catching or not catching a mouse is just a language game, and that according to a different culture the so-called mouse is an ancient spirit.) -- All this is useless for science. And even if the useful parts are there somewhere, it is worth pointing at them and saying "this part is right, the other parts are wrong".

If you want a specific example, for me it would be e.g. "How to Convince Me That 2 + 2 = 3". As far as I know, this is not a part of undergrad math.

Comment author: tenlier 25 June 2012 05:33:13PM -4 points

I'll backtrack from "last post" for 6 months to last conversation for 6 months. Viliam, you're a reasonably upvoted dude. You seem pretty normal for these parts. Exactly how annoyed do I get to be that your response to me is dumb? Isn't the commitment to some aspects of rationality exemplified by my complete inability to restrain my annoyance with your being an idiot of some value? Yes, yes, I could be better still.

Again, I think your response is very typically LessWrongian: Wordy, vacant, stupid, irrational, over-confident with weaseliness to pretend otherwise, etc, etc. Do I get downvoted for telling you you're being an idiot in thinking you need an undergrad's knowledge in every field in order to know whether any part of the Sequences is outside that range? I didn't ask for an exhaustive list; I asked for one post exceeding an undergrad's knowledge in that field. Do I need to explain that in more detail? Do I lose points for being annoyed you took the time to write all those words and not a second to think about them? Do I get to be insulted that your model of me is basically retarded, from my point of view? Maybe that's all you folks are capable of. Fine. What a shocking coincidence that your example comes from an area you never studied seriously, when I basically asked for just a single example of the opposite. The post you link to is fine but totally and completely uninformative to me.

Listen, of course you can defend your stupidity if you assume I'm a moron. You can say, well, if I don't know what they study in philosophy, I can't say blah isn't covered. Can we not have that idiotic conversation? Can we just acknowledge that if you have a good physics knowledge in some area, you know when the conversation is on physics in that area and when it is exceeded without knowing all other fields? Do I have to be as wordy as you? I clearly am being so; I didn't even bother reading the middle of your post. Just stupidity.

That's not very important, and certainly one of dozens of things wrong. However, it's something you'll see.

So, I pointed out an error you made. LessWrongians like when people point out errors they make. The only reason I pointed it out is that I was annoyed. Ideally you'd find some reason other than annoyance for me to talk to you (repeating the request: A post in the sequences that is informative). You can also conclude it is not worth talking to me when that is all that motivates me. Maybe you think you can modify my behavior, but not without a carrot. Perhaps upvotes and downvotes are supposed to serve in that way, but they don't for me.

Comment author: Viliam_Bur 24 June 2012 06:53:49PM 9 points

"One of my remaining interests in this place is discovering why I find you all pretty unlikeable. [...] I'd say my reaction to this place is roughly what I feel about anti-science types. Of course, a lot of the dialogue here is superficially anti-science, but I don't think that's what's setting me off. I think I really feel like this place is not just superficially anti-science."

I think you are looking at the wrong places here. I mean -- you are calling people idiots, you are breaking community norms by creating multiple accounts -- this is not related to "anti-science" or anything like that. There are people here who disagree about artificial intelligence, many worlds, existential risks, reductionism, etc., but they usually don't behave like that.

"mainly concerned with superficialities (e.g., you will have an unduly strong reaction to my using the word 'moronic')."

For many people here, being polite and friendly is not a "superficiality". So perhaps this is what irritates you. Maybe you'd prefer a place where people discuss scientific ideas by calling each other idiots. Starting your own blog could also be a solution.

Comment author: tenlier 24 June 2012 09:10:45PM -4 points

I'm not surprised at being downvoted, and I don't mean that in the usual defensive way (i.e., "I have such a good model of you it predicts your behaviour and negative reaction is stupid; I'm superior, yadda yada").

My behavior is worthy of being downvoted and some degree of annoyance with me is perfectly reasonable, appropriate, and likeable. Trying to extrapolate my annoyance from my behavior is misleading since I am not responding to what is irritating me and my (hidden specific) annoyance manifests as general irritability. I would say an appropriate criticism of me (which I have attempted to highlight when being critical) is the degree to which I am a collectivist in the way I think about LessWrong.

Let us try one more time, and I have asked questions like this before. Let's make this my last post for a while (6 months); that way if I return to make some sort of status ploy from this consideration, the promise to have left for that time will diminish the value of the ploy. So, let us play a game where you answer this request as if it were true rather than a bid for status. Name a post in the sequences I should read that I will find instructive. Is it really so difficult a request that you require a good model of me to answer? Just assume an undergrad's knowledge in every field. Is there any physics that surpasses what a good (but not exceptional) undergrad knows? Or biology, computer science, philosophy, etc? I confess to finding the boxing experiment mildly interesting. In return, I will do something on my own time that would be useful to the Singularity Institute if they did it. If it works, I'll return, tell someone, and insult you all with greater justice.

Comment author: tenlier 24 June 2012 02:53:52PM -4 points

There's a number of comments on this post where people wrongly think they know why someone is in disagreement with them:

http://lesswrong.com/r/discussion/lw/d9b/link_rsa_animate_extremely_entertaining/6wbn http://lesswrong.com/r/discussion/lw/d9b/link_rsa_animate_extremely_entertaining/6wbc http://lesswrong.com/r/discussion/lw/d9b/link_rsa_animate_extremely_entertaining/6w97

Arguably others. The other material is either empty (minus humor) or simply correction of these sorts of trivial errors. I think this is very common on LessWrong.

One of my remaining interests in this place is discovering why I find you all pretty unlikeable. This is a change in my viewpoint since I started actually becoming familiar with the joint and is also pretty surprising since I overlap philosophically in ways that usually make me fond of people. I'd say my reaction to this place is roughly what I feel about anti-science types. Of course, a lot of the dialogue here is superficially anti-science, but I don't think that's what's setting me off. I think I really feel like this place is not just superficially anti-science. Something like your ideas about testing hypotheses and modifying beliefs are fine, but your hypotheses trend moronic (circling back to the opening point). Also, mainly concerned with superficialities (e.g., you will have an unduly strong reaction to my using the word "moronic"). Anyway, just some impressions. I think I'll test something (not in a Gwernian way).

Comment author: Wei_Dai 17 June 2012 10:47:48AM 13 points

But I do think any SIAI-sponsored work should be (at least cross-posted to) an SIAI-related site/blog, such as this one.

I wish I knew why a number of LWers maintain their own blogs and don't at least cross post to LW. Do they not want more people to read and discuss their posts? Are they afraid their posts will be voted down? Are they trying to signal something, and if so what?

In response to comment by Wei_Dai on SIAI May report
Comment author: tenlier 17 June 2012 11:14:16PM -10 points

Upvoted to +6 currently. Funny. I guess your answers are what humility and false humility map to if you're an idiot of the type LessWrongians appear to be. That is, when people are being humble (in not posting) you could say they're just afraid of not signalling high standards if they implied more people wanted to read, etc (shmushing your explanations into one). On one's own blog, onus is on the reader who knowingly visited. Wouldn't apply to true idiot LessWrongians; i.e., I expect if you did it, it would be about something simple like maintaining control.

Comment author: JenniferRM 15 June 2012 11:49:38PM *  -1 points

That's a surprising conclusion to me which I hadn't seen before, but also doesn't seem too hard to come up with, so I'm curious where I've gone off the rails. This argument has a very Will_Newsomey flavor to it to me.

Perhaps it is not wise to speculate out loud in this area until you've worked through three rounds of "ok, so what are the implications of that idea" and decided that it would help people to hear about the conclusions you've developed three steps back. You can frequently find interesting things when you wander around, but there are certain neighborhoods you should not explore with children along for the ride until you've been there before and made sure it's reasonably safe.

Perhaps you could send a PM to Will?

Comment author: tenlier 17 June 2012 04:25:12PM 2 points

Not just going meta for the sake of it: I assert you have not sufficiently thought through the implications of promoting that sort of non-openness publicly on the board. Perhaps you could PM jsalvatier.

I'm lying, of course. But interesting to register points of strongest divergence between LW and conventional morality (JenniferRM's post, I mean; jsalvatier's is fine and interesting).

Comment author: VincentYu 14 June 2012 02:12:57PM *  8 points

Establish a scholarship to collect information on young talent

Related: Reaching young math/compsci talent

Create a merit scholarship for the type of young talent that SI wants to attract – this can reveal valuable information about this group of people, and can potentially be used as a targeted publicity tool if handled well.

Information that could be collected from applications

  • Basic personal details (age, location, contact methods, etc.)
  • Education (past and future)
  • Academic interests
  • Career goals
  • Awards and competition results
  • Third-party reviews (i.e., letters of recommendation)
  • Basic personality assessment (see previous LW discussion on correlates with Big Five personality traits: [1], [2], [3])
  • Ideas about and attitudes toward x-risks/FAI/SI/FHI (these could be responses to prompts – as a bonus, applicants are introduced to the content in the prompts)
  • ... Pretty much anything else (personal anecdote: I've revealed things about myself in college and scholarship applications that I have never expressed to anyone else)

Uses of this information

  • Check whether SI is effectively reaching the right people with its current plans.
  • The compiled list of young talent could be directly used to advertise things like SPARC to the right people.
  • General survey tool.

Potential problems and difficulties

  • Its use as an information gathering tool could be seen negatively.
  • Legal issues?
  • Publicity. The scholarship has to be made known to the relevant people, and this has to be done in such a way that SI is seen as a reputable institute. However, a scholarship does open up new avenues for publicity.
  • Cost and manpower.

Is anyone else doing this?

As with many ideas, we ought to be cautious if we see no one else doing something similar. Indeed, I cannot think of any high school scholarship that is used primarily to collect information for the sponsoring organization (is this really the case?). However, there is good reason for this – no one else is interested in reaching the same group of high school students. SI is the only organization I know of who wants to reach high school students for their research group.

FHI had a competition that could be an attempt to collect information, but I'm not sure.

High school scholarships

It would be wise to consult current high school scholarships, and AoPS has a good list.

Comment author: tenlier 16 June 2012 03:15:16PM -1 points

"Indeed, I cannot think of any high school scholarship that is used primarily to collect information for the sponsoring organization (is this really the case?). However, there is good reason for this – no one else is interested in reaching the same group of high school students. SI is the only organization I know of who wants to reach high school students for their research group."

I find this place persistently surprising, which is nice. Try to imagine what you would think if a religious organization did this and how you would feel. It's alright to hold a scholarship to encourage kids to be interested in a topic; not so to garner information for your own purposes, unless that is incredibly clear upfront. Very Gwernian.

Comment author: tenlier 16 June 2012 03:09:38PM -5 points

It's interesting this post is being upvoted. It reads like jabber to me. I have little idea what it is trying to argue. Stuff like:

"It's likely that when the protein complex undergoes autophosphorylation, other changes occur in the cell as well. If this led to changes in the cell's epigenome, which is very common, and the structure of the epigenome is retained by the cryopreservation, then the cell's epigenome could allow reverse inference of the state of its ion channels. "

Is either meaningless or flawed, probably both. The whole post reminds me of the idea on LessWrong that one might as well just assume Omega will reconstruct you based on trace evidence in the physical world.
