I'm going to take the role of the "undergrad" here and try to interpret this in the following way:
Given a hypothesis that is true -- but not yet known to be true -- it is far more likely that one comes by a "statistically significant" result indicating it is wrong than by a result indicating that an alternative hypothesis is significantly more likely.
In simpler words: it is far easier to "prove" a true hypothesis wrong by accident than it is to "prove" by accident that an alternative hypothesis is superior (a better estimator of reality).
Would you consider this interpretation accurate?
I feel like this paragraph might be necessary for someone who hasn't read the Bayes' rule intro, but on the other hand it is a bit off-topic in this context and quite distracting, as it raises questions which are not part of this "discussion" -- mainly, questions about how to approach "one-off" events.
Say, what if I can't quantify the outcome of my decision as nicely as in the case of a bet? What if I need to decide whether or not to send Miss Scarlet to prison based on these likelihood probabilities?
This "argument" by the "scientist" doesn't, IMO, represent how a true experimentalist would approach the issue; they would not necessarily be so opposed to trying new ways of improving their methods, as long as it is done step by step rather than by replacing the entire system overnight (just as the "bayesian" explains in the next paragraph).
This is also a bit of a side track, as it opens up the topic of how much more "experience" computer scientists have, given the simpler and much more reproducible systems they deal with -- especially in the modern commercial world -- in contrast with the natural sciences (I'm a programmer myself, so I'm a bit biased here).
Not quite. The private state of mind of the researcher changes nothing. The only issue is which question is asked.
In this case, the two questions are (a) what is the probability of such an event occurring when a fair coin is tossed 6 times, and (b) what is the probability of such an event occurring when a fair coin is tossed until it lands tails?
The meaning of the questions does not change, nor do the answers to them. It is only a matter of which question is being asked -- which is obviously important when conducting a study, but is not so counter-intuitive (and much less confusing) when presented in such a way (IMO).
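The two questions really do give different answers for the same data. A minimal sketch (the observed sequence HHHHHT is my assumption for illustration, not stated in the discussion):

```python
from fractions import Fraction
from itertools import product

# Suppose the observed data e is HHHHHT (hypothetical example).

# Question (a): a fair coin is tossed exactly 6 times.  The p-value sums
# the probability of every sequence at least as extreme as e, i.e. every
# sequence showing 5 or more of either face.
p_a = Fraction(sum(1 for seq in product("HT", repeat=6)
                   if max(seq.count("H"), seq.count("T")) >= 5),
               2 ** 6)

# Question (b): a fair coin is tossed until it lands tails, so outcomes
# look like H^k T.  "As or more extreme" than HHHHHT means k >= 5 heads
# first, and the sum over k >= 5 of (1/2)^(k+1) telescopes to (1/2)^5.
p_b = Fraction(1, 2 ** 5)

print(p_a, p_b)  # 7/32 under question (a), 1/32 under question (b)
```

Same data, two different p-values -- which is exactly why the question being asked matters for the study design, even though neither question's answer has changed.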
Just to be sure: does this mean that the claim "we have observed 20-to-1 evidence that the coin is 55% biased" is made only 1.4% of the time?
If so, it seems like a lot...
"We only ran the 2012 US Presidential Election one time, but that doesn't mean that on November 7th you should've refused a $10 bet that paid out $1000 if Obama won."
First of all, as a non-American I do not know what is special about November 7th. Second, I think that some people may not even know that Obama actually won.
"That's because we're considering results like HHHTHH or TTTTTT to be equally or more extreme, and there are 14 total possible cases like that."
This is not evident to me, because I am not familiar with statistics. I think there is a need to provide the calculation: there are two sequences of the form TTTTTT or HHHHHH, and 6 × 2 = 12 of the form HTHHHH or HTTTTT, for 14 in total.
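The count of 14 can be checked by brute force. A minimal sketch, assuming "equally or more extreme" means at least 5 of one face in 6 tosses:

```python
from itertools import product

# Enumerate all 2^6 = 64 sequences of six tosses and keep those showing
# at least 5 heads or at least 5 tails.
extreme = [seq for seq in product("HT", repeat=6)
           if seq.count("H") >= 5 or seq.count("T") >= 5]

all_one_face = [s for s in extreme if len(set(s)) == 1]    # HHHHHH, TTTTTT
five_of_a_face = [s for s in extreme if len(set(s)) == 2]  # e.g. HTHHHH

print(len(all_one_face), len(five_of_a_face), len(extreme))  # 2 12 14
```

This matches the breakdown above: 2 all-one-face sequences plus 6 positions for the odd toss times 2 faces gives 12, for 14 total.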
Have I gone mad, or do you mean "L(H|e) is simply the probability of H given that the actual data e occurred"?
If those are the only two options, then you've gone mad :-)
L(H|e) is defined to be P(e|H) (which, yes, was a confusing and bad plan).
Reporting "the probability of H given the actual data e" would not work, because that requires mixing a subjective prior into the objective likelihoods. That is, everyone can agree "this sequence of coin tosses supports 'biased 55% towards heads' over 'fair' by a factor of 20 to 1", but we may still disagree about the probability that the coin is biased given the evidence. (For example, you may have started out thinking it was 100 : 1 likely to be fair, while I started out thinking it was 20 : 1 likely to be biased. Now our posteriors are very different.)
The reason humanity currently uses p-values instead of Bayesian statistics is that scientists don't want to bring subjective probabilities into the mix; the idea is that we can solve that problem by reporting P(e | H) instead of P(H | e). The objective measure P(e | H) is written L(H | e).
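The objective/subjective split can be shown in a few lines. A minimal sketch computing L(H | e) = P(e | H) for two candidate biases (the example sequence, the 0.55 bias, and the prior odds are my choices for illustration, not taken from the discussion above):

```python
def likelihood(bias, sequence):
    """P(sequence | coin lands heads with probability `bias`)."""
    p = 1.0
    for toss in sequence:
        p *= bias if toss == "H" else 1 - bias
    return p

e = "HHHHHT"  # hypothetical observed data

# L(H | e) is objective: everyone computes the same numbers, and so is
# the likelihood ratio between the two hypotheses.
l_fair = likelihood(0.5, e)
l_biased = likelihood(0.55, e)
ratio = l_biased / l_fair
print(ratio)

# Posterior odds, by contrast, depend on subjective prior odds
# (biased : fair), so two people agreeing on `ratio` can still end up
# with very different posteriors.
posterior_you = (1 / 100) * ratio  # you started at 100 : 1 for fair
posterior_me = 20 * ratio          # I started at 20 : 1 for biased
```

The likelihood ratio here is (0.55/0.5)^5 × (0.45/0.5) = 1.1^5 × 0.9 ≈ 1.45 -- a number both parties agree on regardless of their priors.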
Do the different biases of the coin correspond to different effect sizes? (E.g., a large effect corresponds to H0.8, a medium one to H0.6, and a small one to H0.55.)
Yes.