summerstay comments on Against NHST - Less Wrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Comments (62)
Perhaps you would suggest showing the histograms of completion times on each site, along with the 95% confidence error bars?
Presumably not actually 95%, but, as gwern said, a threshold based on the cost of false positives.
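To make "a threshold based on the cost of false positives" concrete, here is a rough sketch of one way it might be done: pick the significance threshold that minimizes expected loss for a one-sided z-test. Everything in it is an assumption for illustration (the function names, the prior probability of the null, the standardized effect size, the sample size and both costs); none of it comes from the original post.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def expected_loss(alpha, effect_size, n_per_group, prior_null,
                  cost_false_positive, cost_false_negative):
    """Expected loss of a one-sided two-sample z-test run at threshold alpha.

    All inputs are hypothetical: effect_size is a standardized difference
    (Cohen's d) assumed to hold if the alternative is true.
    """
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha)
    # Power of the test under the assumed effect size.
    power = 1 - _STD_NORMAL.cdf(z_crit - effect_size * (n_per_group / 2) ** 0.5)
    beta = 1 - power  # false-negative rate
    return (prior_null * alpha * cost_false_positive
            + (1 - prior_null) * beta * cost_false_negative)


def best_alpha(candidates, **scenario):
    """Return the candidate threshold with the lowest expected loss."""
    return min(candidates, key=lambda a: expected_loss(a, **scenario))


CANDIDATES = [0.001, 0.005, 0.01, 0.05, 0.10, 0.20]

# Hypothetical high-stakes site: a false positive costs 100x a false negative.
strict = best_alpha(CANDIDATES, effect_size=0.2, n_per_group=200,
                    prior_null=0.5, cost_false_positive=100,
                    cost_false_negative=1)

# Hypothetical personal site: both kinds of error cost about the same.
lenient = best_alpha(CANDIDATES, effect_size=0.2, n_per_group=200,
                     prior_null=0.5, cost_false_positive=1,
                     cost_false_negative=1)
```

Under these made-up numbers the high-stakes scenario picks a much stricter threshold than the personal-site scenario, which is the qualitative point; the specific values depend entirely on the assumed costs and prior.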
Yes, in this case you could keep using p-values (if you really wanted to...), but with the threshold set by reference to the value of, say, each customer. (This is what I meant by setting the threshold with respect to decision theory.) If the result is going to be used on a site making millions of dollars*, 0.01 may be too loose a threshold, but if he's just messing with his personal site to help readers, a threshold like 0.10 may be perfectly acceptable.
* If the results were that important, I think there'd be better approaches than a one-off A/B test. Adaptive multi-armed bandit algorithms sound really cool from what I've read of them.
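For anyone curious what an adaptive multi-armed bandit looks like in practice, below is a minimal sketch of Thompson sampling with Beta priors over two page variants. It is one common bandit strategy, not necessarily the one the comment has in mind, and the function name, the conversion rates and the number of rounds are all hypothetical.

```python
import random


def thompson_sampling(true_rates, n_rounds, seed=0):
    """Simulate Beta-Bernoulli Thompson sampling.

    true_rates are hypothetical conversion rates, unknown to the algorithm
    and used only to simulate visitors. Returns how often each arm was shown.
    """
    rng = random.Random(seed)
    n_arms = len(true_rates)
    successes = [0] * n_arms
    failures = [0] * n_arms

    for _ in range(n_rounds):
        # Draw a plausible rate for each arm from its Beta(1 + s, 1 + f) posterior.
        samples = [rng.betavariate(successes[i] + 1, failures[i] + 1)
                   for i in range(n_arms)]
        # Show the arm whose sampled rate is highest.
        arm = max(range(n_arms), key=samples.__getitem__)
        # Simulate whether this visitor converts.
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1

    return [s + f for s, f in zip(successes, failures)]


# Hypothetical variants A and B with 10% and 50% conversion rates.
pulls = thompson_sampling([0.1, 0.5], n_rounds=2000)
```

Unlike a one-off A/B test with a fixed split, the allocation shifts toward the better-looking variant as evidence accumulates, so fewer visitors are sent to the worse page.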
I'd suggest more of a scattergram than a histogram; superimposing 95% CIs would then cover the exploratory data/visualization & confidence intervals. Combine that with an effect size and one has made a good start.
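As a rough illustration of the "effect size plus confidence interval" part, here is a small sketch that computes Cohen's d and a 95% interval for the difference in mean completion time. The function names and the data are invented, and the interval uses a large-sample normal approximation rather than a t distribution, so treat it as a starting point only.

```python
from statistics import NormalDist, mean, stdev


def cohens_d(a, b):
    """Standardized mean difference (a minus b) using the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / pooled_var ** 0.5


def mean_diff_ci(a, b, level=0.95):
    """Normal-approximation confidence interval for mean(a) - mean(b)."""
    diff = mean(a) - mean(b)
    # Standard error of the difference, allowing unequal variances.
    se = (stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b)) ** 0.5
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return diff - z * se, diff + z * se


# Made-up completion times in seconds for two site designs.
site_a = [41.2, 38.5, 45.0, 39.9, 43.1, 40.4, 44.7, 37.8]
site_b = [36.0, 39.2, 34.8, 37.5, 35.9, 38.1, 33.7, 36.6]

d = cohens_d(site_a, site_b)
low, high = mean_diff_ci(site_a, site_b)
```

Reporting `d` together with `(low, high)` says how big the difference is and how precisely it was estimated, which a bare p-value does not.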