Comment author: [deleted] 15 September 2015 05:09:22AM 2 points [-]

Hmmm, yes, I suppose I was making the same mistake they were... I thought that confidence intervals were what credible intervals actually are.

In response to comment by [deleted] on Open thread, Sep. 14 - Sep. 20, 2015
Comment author: VincentYu 15 September 2015 05:31:37AM *  6 points [-]

I see. Looking into this, it seems that the (mis)use of the phrase "confidence interval" to mean "credible interval" is endemic on LW. A Google search for "confidence interval" on LW yields more than 200 results, of which many, perhaps most, should say "credible interval" instead. The corresponding search for "credible interval" yields fewer than 20 results.

Comment author: [deleted] 15 September 2015 01:32:15AM *  0 points [-]

The Fallacy of Placing Confidence in Confidence Intervals

I just read through this, and it sounds like they're trying to squish a frequentist interpretation on a Bayesian tool. They keep saying how the confidence intervals don't correspond with reality, but confidence intervals are supposed to be measuring degrees of belief. Am I missing something here?

In response to comment by [deleted] on Open thread, Sep. 14 - Sep. 20, 2015
Comment author: VincentYu 15 September 2015 05:06:49AM 5 points [-]

I briefly skimmed the paper and don't see how you are getting this impression. Confidence intervals are—if we force the dichotomy—considered a frequentist rather than Bayesian tool. They point out that others are trying to squish a Bayesian interpretation on a frequentist tool by treating confidence intervals as though they are credible intervals, and they state this quite explicitly (p.17–18, emphasis mine):

Finally, we believe that in science, the meaning of our inferences are important. Bayesian credible intervals support an interpretation of probability in terms of plausibility, thanks to the explicit use of a prior. Confidence intervals, on the other hand, are based on a philosophy that does not allow inferences about plausibility, and does not utilize prior information. Using confidence intervals as if they were credible intervals is an attempt to smuggle Bayesian meaning into frequentist statistics, without proper consideration of a prior. As they say, there is no such thing as a free lunch; one must choose. We suspect that researchers, given the choice, would rather specify priors and get the benefits that come from Bayesian theory. We should not pretend, however, that the choice need not be made. Confidence interval theory and Bayesian theory are not interchangeable, and should not be treated as so.
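To make the frequentist reading concrete, here is a quick simulation sketch (my own illustration, assuming a normal model with known variance): a 95% confidence procedure covers the true mean in roughly 95% of repeated samples. That 95% is a property of the procedure across repetitions, not a plausibility statement about any single computed interval.

```python
import random
import statistics

# Long-run coverage of a 95% z-interval for a normal mean with known sigma.
random.seed(0)
mu, sigma, n = 10.0, 2.0, 25
trials = 2000
covered = 0
for _ in range(trials):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    m = statistics.mean(sample)
    half = 1.96 * sigma / n**0.5
    if m - half <= mu <= m + half:
        covered += 1

print(covered / trials)  # close to 0.95
```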

Comment author: Morendil 20 August 2015 10:14:16PM 1 point [-]

Software Engineering, A Historical Perspective J. Marciniak DOI 10.1002/0471028959.sof321

Comment author: VincentYu 22 August 2015 05:26:10AM 3 points [-]

Here. Sorry about the horrible format; I didn't see a better way to download the content or print the page. In addition, I couldn't access the figures.

Comment author: gwern 23 July 2015 06:22:01PM *  1 point [-]

Probability and Statistics for Business Decisions, Robert Schlaifer 1959. Surprisingly expensive used, and unfortunately for such a foundational text in Bayesian decision theory, doesn't seem to be available online. If you can't get a digital copy, does anyone know of a good service or group which would produce a high-quality digital copy given a print edition?

Comment author: VincentYu 24 July 2015 10:37:13AM *  5 points [-]

Page-by-page .djvu scans are available here (found via this search; edit: it seems to appear sporadically in the search results). Full sequence of download links is <http://202.116.13.3/ebook%5C24/24000522/ptiff/00000{001..744}.djvu>
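For anyone who wants all 744 pages at once, bash brace expansion turns that URL pattern into the full list (a sketch; the host may of course no longer serve these files):

```shell
# Build the list of all 744 page URLs with bash brace expansion.
urls=(http://202.116.13.3/ebook%5C24/24000522/ptiff/00000{001..744}.djvu)
echo "${#urls[@]}"   # 744

# Fetch them all (uncomment to actually download):
# for u in "${urls[@]}"; do curl -sO "$u"; done
```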


I wrote the following just before finding the scan of the book. I'll post it anyway.

I've used 1DollarScan for about 50 books, including math/stat textbooks, and the quality is consistently good (unless you need accurate color reproduction) even with the cheapest option (i.e., $1 per 100 pages), but you'll need to do your own post-processing to:

  • Lossily compress further and binarize B/W text; expect about 400 KB/page from 1DollarScan.
  • Perform OCR; 1DollarScan's OCR option is expensive and performs okay at best.
  • Straighten pages; scanned pages are often slightly rotated off the vertical.
  • Add metadata (e.g., page numbering, section bookmarks).

I use Adobe Acrobat with ABBYY FineReader for these. FineReader's OCR is more accurate than Acrobat's, but Acrobat performs okay by itself. Acrobat's trial can be indefinitely reactivated every month in a Windows VM by reverting to a pre-activation snapshot, whereas FineReader has to be bought or torrented, as its trial is overly restrictive. I don't know of any good options on Linux.

BTW, there's a used copy on Half.com for $39. Not sure if you saw that.

Comment author: RolfAndreassen 02 July 2015 09:22:34PM 2 points [-]

You take the probability of A not happening and multiply by the probability of B not happening. That gives you P(not A and not B). Then subtract that from 1. The probability of at least one of two events happening is just one minus the probability of neither happening.

In your example of 23% and 48%, the probability of getting at least one is

1 - (1-0.23)*(1-0.48) = 0.60.

Comment author: VincentYu 03 July 2015 01:20:06AM 3 points [-]

You take the probability of A not happening and multiply by the probability of B not happening. That gives you P(not A and not B).

Only if A and B are independent.
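Both forms can be written out explicitly (my own sketch; the independence-only shortcut versus the general inclusion-exclusion formula, which needs the joint probability):

```python
# Valid only when A and B are independent:
# P(A or B) = 1 - P(not A) * P(not B)
def p_at_least_one_independent(p_a, p_b):
    return 1 - (1 - p_a) * (1 - p_b)

# General inclusion-exclusion, no independence assumed:
# P(A or B) = P(A) + P(B) - P(A and B)
def p_at_least_one(p_a, p_b, p_a_and_b):
    return p_a + p_b - p_a_and_b

print(round(p_at_least_one_independent(0.23, 0.48), 4))  # 0.5996
```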

Comment author: Gram_Stone 12 March 2015 04:55:29AM 0 points [-]

Is the term 'expected value' interchangeable with the term 'expected utility?'

Comment author: VincentYu 10 June 2015 01:50:22AM 1 point [-]

No. "Expected value" refers to the expectation of a variable under a probability distribution, whereas "expected utility" refers specifically to the expectation of a utility function under a probability distribution. That is, expected utility is a specific instantiation of an expected value; expected value is more general than expected utility and can refer to things other than utility.

The importance of this distinction often arises when considering the utility of large sums of money: a person may well decline a deal or gamble with positive expected value (of money) because the expected utility can be negative (for example, see the St. Petersburg paradox).
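A toy illustration of the gap (my own sketch, assuming a logarithmic utility function, which is one common but by no means mandatory choice): in a truncated St. Petersburg gamble, the expected monetary value grows without bound as the truncation lengthens, while the expected log-utility stays bounded.

```python
import math

# Truncated St. Petersburg gamble: a fair coin pays 2^k dollars if the
# first head appears on flip k, for k up to n_flips.
def expected_value(n_flips):
    # Each term is 0.5^k * 2^k = 1, so this grows linearly in n_flips.
    return sum(0.5**k * 2**k for k in range(1, n_flips + 1))

def expected_log_utility(n_flips):
    # Expected log-utility converges (to 2*ln 2, about 1.386).
    return sum(0.5**k * math.log(2**k) for k in range(1, n_flips + 1))

print(expected_value(30))        # 30.0
print(round(expected_log_utility(30), 3))  # 1.386
```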

Comment author: emr 06 March 2015 03:11:21PM *  1 point [-]

Yes! I think this is it. The wikipedia article links to these ray diagrams, which I found helpful (particularly the fourth picture).

I suspected it had to do with an overlap in the penumbra, or the "fuzzy edges", of the shadow, but I kept getting confused because the observation isn't what you would expect, if you think of the penumbra as two separate pictures that you're simply "adding together" as they overlap.

Comment author: VincentYu 10 June 2015 01:35:55AM 1 point [-]

See also this highly-upvoted question on the Physics Stack Exchange, which deals with your question.

Comment author: RichardKennaway 05 June 2015 10:48:34PM *  6 points [-]

I have my genome data from both 23andMe and BGI. I am wondering what to make of it. BGI reports about thirty times as many SNPs as 23andMe. 23andMe: 598897, BGI: 19695817.

Of these, 475801 are reported by both. I looked to see how well they agree with each other, and summarised the results as a count, for each occurring pair of results, of how often that pair occurred. In descending numerical order, and classifying them by type of match or mismatch, this is what I get. (No individual SNPs are identified here.)

87565 CC CC
86952 GG GG
75289 TT TT
75087 AA AA
31069 CT CT
30817 AG GA
27542 CT TC
27484 AG AG
6818 AC CA
6767 GT GT
6373 AC AC
6297 GT TG
270 CG GC
251 CG CG
146 AT TA
138 AT AT
420 C C
402 G G
336 A A
291 T T
582 CT --
576 AG --
426 CC --
399 GG --
348 -- CC
340 -- GG
330 TT --
316 AA --
270 -- AA
240 -- TT
139 GT --
136 AC --
123 -- GA
121 -- CT
113 -- TG
110 -- TC
104 -- GT
101 -- CA
93 -- AC
86 -- AG
26 -- --
5 -- AT
4 CG --
4 -- GC
3 -- TA
2 AT --
2 -- CG
14 C --
13 T --
9 G --
8 -- C
7 A --
5 -- G
2 -- T
51 CC CT
33 AG AA
32 AG GG
31 GG GA
31 CT TT
30 CT CC
25 TT TC
23 AA AG
18 GG AG
15 CC TC
15 AA GA
11 TT TG
11 CC CA
9 TT GT
9 TT CT
9 AC AA
7 CC AC
7 AC CC
6 GT TT
6 GT GG
6 GG GT
6 AA AC
5 TT CC
4 GG AA
4 CC CG
4 AA CA
3 CG CC
3 CC TT
3 AT TT
2 TT TA
1 TT GA
1 GG TG
1 GG GC
1 GG CG
1 GG CC
1 CG GG
1 CC GC
1 CC AA
1 AT AA
1 AA GG
1 G A

The first five lines make sense: the two analyses agree for a large proportion of the SNPs. The sixth shows 23andMe reading AG when BGI reads GA 30817 times. It looks like 23andMe are reporting unequal pairs in alphabetical order, while BGI are reporting them in random order. Taking these as matches, the great majority of SNPs reported by both are reported identically.
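Treating those unequal pairs as matches amounts to comparing the calls as unordered sets of alleles, which can be sketched as:

```python
# A call like "AG" from 23andMe and "GA" from BGI name the same unordered
# pair of alleles, so compare the sorted alleles rather than the strings.
def same_genotype(call_a, call_b):
    return sorted(call_a) == sorted(call_b)

print(same_genotype("AG", "GA"))  # True
print(same_genotype("CT", "TC"))  # True
print(same_genotype("TT", "GA"))  # False
```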

Then there are a few thousand SNPs that one or other analysis (in 26 cases, both) list in their output but don't report anything for. What causes this?

Finally, there are a few hundred that the two analyses just give different results for. For most of these, one reports homozygosity for an allele present in the other, but in a few cases the reports are completely different, e.g. one occurrence of TT/GA.

Is this amount of mismatch typical for such analyses?

Comment author: VincentYu 06 June 2015 06:33:35AM *  2 points [-]

Interesting. Thanks for posting this!

I received exactly the same number of SNPs from BGI, so it looks like our data were processed under the same pipeline. I've found three people who have publicly posted their BGI data: two at the Personal Genome Project (hu2FEC01 and hu41F03B, each with 5,095,048 SNPs), and one on a personal website (with 18,217,058 SNPs).

Then there are a few thousand SNPs that one or other analysis (in 26 cases, both) list in their output but don't report anything for. What causes this?

The double dashes are no calls. 23andMe reports on a set list of SNPs, and instead of omitting an SNP when they can't confidently determine the genotype, they indicate this with a double dash.

Is this amount of mismatch typical for such analyses?

This seems normal considering the error rates from 23andMe that others have been reporting (example). I don't know about BGI's error rates.

I think it might be possible to accurately guess the actual genotypes for some of the mismatches by imputing the genotypes with something like Impute2 (for each mismatched SNP, leave it out and impute it using the nearby SNPs). This will take many hours of work, though, and you might as well phase and impute across the whole genome if you have the time, interest, and processing power to do so (I've been meaning to try this out to learn more about how these things work).

Comment author: gwern 14 May 2015 09:35:45PM *  1 point [-]

My currently unfilled requests on /r/scholar:

https://www.reddit.com/r/Scholar/comments/29hi38/request_2_dissertations_on_online_learning/ :

  1. Santo, S.A.: "Virtual learning, personality, and learning styles". Dissertation Abstracts International Section A, Humanities & Social Sciences, 62, pp. 137 (2001) (slides: http://sloanconsortium.org/conference/proceedings/1999/pdf/99_santo.pdf )
  2. Zobdeh-Asadi, S.: "Differences in personality factors and learners' preference for traditional versus online education". Dissertation Abstract International Section A: Humanities & Social Sciences, 65(2-A), pp. 436 (2004)

https://www.reddit.com/r/Scholar/comments/2xlrv5/article_modafinil_the_unique_properties_of_a_new/ :

https://www.reddit.com/r/Scholar/comments/2xpgig/article_is_lithium_a_neuroprotective_agent/ :

https://www.reddit.com/r/Scholar/comments/32z239/can_transcranial_direct_current_stimulation/ :

https://www.reddit.com/r/Scholar/comments/34nlq5/studying_with_music_is_the_irrelevant_speech/

  • Book chapter: Kantner, J. (2009). "Studying with music: Is the irrelevant speech effect relevant?". In M. R. Kelley (Ed.), Applied memory (pp. 19-40). Hauppauge, NY US: Nova Science Publishers.

https://www.reddit.com/r/Scholar/comments/34nsug/article_the_effect_of_music_as_a_distraction_on/ :

https://www.reddit.com/r/Scholar/comments/352qyo/article_gwas_and_metaanalysis_in_aginglongevity/ :

Comment author: VincentYu 06 June 2015 05:13:20AM 1 point [-]

ILL couldn't get Schretlen et al. Can try again once the paper is included in the print journal, but I'd recommend just asking the authors for a copy.

Comment author: VincentYu 23 May 2015 01:54:44AM 3 points [-]

I'm still waiting for Schretlen et al.
