
Covariance in your sample vs covariance in the general population

27 Post author: RomeoStevens 16 May 2012 12:17AM

A popular-media take on a subtle problem in sampling.  I found the graph quite illustrative.

http://www.theatlantic.com/business/archive/2012/05/when-correlation-is-not-causation-but-something-much-more-screwy/256918/
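The effect the title describes is easy to reproduce in a quick simulation. The sketch below assumes one common version of the problem: selection on a combination of two variables, often called Berkson's paradox. Two traits are drawn independently, and only cases whose sum clears a threshold make it into the sample. The Gaussian traits, the sample size, and the `threshold` value are illustrative assumptions, not figures taken from the article.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate(n=100_000, threshold=1.0, seed=0):
    rng = random.Random(seed)
    # Two traits drawn independently: zero correlation in the population.
    xs = [rng.gauss(0, 1) for _ in range(n)]
    ys = [rng.gauss(0, 1) for _ in range(n)]
    # Assumed selection rule: only cases whose combined score clears the bar
    # end up in the sample we get to observe.
    kept = [(x, y) for x, y in zip(xs, ys) if x + y > threshold]
    sx, sy = zip(*kept)
    return pearson(xs, ys), pearson(sx, sy), len(kept)


pop_r, sample_r, k = simulate()
print(f"population r = {pop_r:+.3f}")
print(f"sample r     = {sample_r:+.3f} (n = {k})")
```

In the full population the correlation is close to zero; in the selected sample it comes out strongly negative, even though neither trait influences the other. The negative covariance is produced entirely by the selection step.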

Comments (3)

Comment author: Randaly 16 May 2012 04:44:11AM, 5 points

Incidentally, Pearl's original explanation in Chapter 1 of Causality is here; the whole first edition of the book is available online here.

Comment author: othercriteria 16 May 2012 02:16:32AM, 5 points

Sampling effects like this can be really pernicious for network data (and, I imagine, for other dependent data as well). It can be difficult to tell whether a network is scale-free from observing a subnetwork [1], and it can even be impossible to learn an ERGM (exponential random graph model: basically, a maximum entropy distribution with graph properties as its statistics) from a subnetwork [2].

[1] M. P. H. Stumpf, C. Wiuf, and R. M. May, “Subnets of scale-free networks are not scale-free: sampling properties of networks,” Proceedings of the National Academy of Sciences of the United States of America, vol. 102, no. 12, p. 4221, 2005.
[2] C. R. Shalizi and A. Rinaldo, “Consistency under sampling of exponential random graph models,” arXiv preprint, 2011.
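A rough sketch of why node sampling distorts degree statistics follows. Its assumptions are not taken from the comment or the cited papers: the full network is grown by a simple preferential-attachment rule (the hypothetical helper `preferential_attachment`), and the subnetwork is the induced subgraph on nodes kept independently with probability `p` (the hypothetical helper `induced_subgraph_degrees`). A kept node then retains each of its edges with probability `p`, independently, so its observed degree is a binomially thinned version of its true degree. This toy does not reproduce the analysis in [1]. It only shows that the observed degree distribution is not simply a rescaled copy of the original.

```python
import random


def preferential_attachment(n, m, rng):
    """Grow an undirected graph by preferential attachment (Barabasi-Albert style).

    Returns an adjacency dict mapping each node to the set of its neighbours.
    """
    adj = {i: set() for i in range(n)}
    # Seed with a clique on m + 1 nodes, so every seed node starts at degree m.
    for i in range(m + 1):
        for j in range(i):
            adj[i].add(j)
            adj[j].add(i)
    # Each node appears in `targets` once per incident edge, so a uniform draw
    # from this list picks a node with probability proportional to its degree.
    targets = [v for v in range(m + 1) for _ in range(m)]
    for new in range(m + 1, n):
        chosen = set()
        while len(chosen) < m:  # attach to m distinct existing nodes
            chosen.add(rng.choice(targets))
        for t in chosen:
            adj[new].add(t)
            adj[t].add(new)
            targets.extend([new, t])
    return adj


def induced_subgraph_degrees(adj, p, rng):
    """Keep each node independently with probability p and return, for every kept
    node, its degree counting only edges to other kept nodes."""
    kept = {v for v in adj if rng.random() < p}
    return {v: sum(1 for u in adj[v] if u in kept) for v in kept}


m, p = 3, 0.3
rng = random.Random(1)
full = preferential_attachment(20_000, m, rng)
full_deg = [len(nbrs) for nbrs in full.values()]
sub_deg = list(induced_subgraph_degrees(full, p, rng).values())

mean_full = sum(full_deg) / len(full_deg)
mean_sub = sum(sub_deg) / len(sub_deg)
# Every node in the full network has degree >= m by construction;
# measure how many sampled nodes fall below that floor.
below_floor = sum(1 for d in sub_deg if d < m) / len(sub_deg)

print(f"full network: {len(full_deg)} nodes, mean degree {mean_full:.2f}, min {min(full_deg)}")
print(f"subnetwork:   {len(sub_deg)} nodes, mean degree {mean_sub:.2f}, min {min(sub_deg)}")
print(f"share of sampled nodes with degree < {m}: {below_floor:.2f}")
```

The mean degree drops by roughly a factor of `p`, as binomial thinning predicts, but the shape of the distribution changes too: the full network has no node below degree `m`, while most sampled nodes end up below it.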

Comment author: jsalvatier 16 May 2012 02:39:30AM, 0 points

That was quite good.