DanielVarga comments on MINE: Free tool for detecting novel associations in large data sets - Less Wrong

4 Post author: curiousepic 17 December 2011 03:02PM


Comments (5)


Comment author: DanielVarga 17 December 2011 09:18:55PM * 1 point

The most important link is the Supplementary Material (SOM). I only found it through the reddit thread. (There is not much else worth going there for. Maybe the pirated paper itself, but that is basically just an extended abstract of the SOM.) Here is the link to the SOM:

http://www.sciencemag.org/content/suppl/2011/12/14/334.6062.1518.DC1/Reshef.SOM.pdf

Figures S5 and S6 (page 41) lead me to conjecture that, compared to LOESS, this new method is an improvement only when the relationship is not a function (that is, when it is many-valued). Not that I am really familiar with LOESS.

Comment author: BruceyB 21 December 2011 07:40:47AM 0 points

I'm far from an expert on LOESS (in fact, I hadn't heard the term before now), but it looks like it doesn't perform a comparable function to MIC. LOESS seems to be an algorithm for producing a non-linear regression while MIC is an algorithm to measure the strength of a relationship between two variables.

In the paper (figure 2A), they compare MIC to the Pearson correlation coefficient, Spearman rank correlation, mutual information, CorGC, and maximal correlation on data in a variety of shapes. Basically, MIC is effective on a wider range of shapes than any of them.
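
To make the "variety of shapes" point concrete, here is a minimal sketch (my own illustration, not code from the paper) showing how Pearson and Spearman both report essentially zero association on a noise-free symmetric parabola, even though y is completely determined by x. The helper names `pearson`, `average_ranks`, and `spearman` are made up for this example, and MIC itself is not computed here.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))

# A noise-free parabola on a grid symmetric around zero.
xs = [i / 50 for i in range(-50, 51)]
ys = [x * x for x in xs]

r_pearson = pearson(xs, ys)
r_spearman = spearman(xs, ys)
print("Pearson: ", round(r_pearson, 6))
print("Spearman:", round(r_spearman, 6))
```

Both coefficients come out at zero up to floating-point error, because the rising and falling halves of the parabola cancel. That is the kind of shape on which a linear or monotonic measure is blind.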

Comment author: DanielVarga 23 December 2011 11:20:22PM 0 points

Check out figures S5.D and S6 from the SOM. If the relationship is functional (the linear, parabolic, and sinusoidal cases in Figure S6), then the R^2 calculated from LOESS regression is quite close to the MIC score, and that's not a coincidence. Of course, LOESS R^2 just dies when it encounters a non-functional relationship.
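
A rough way to see the last point without the MINE tool: the sketch below fits a simplified LOESS (local linear regression with tricube weights, no robustness iterations) and computes R^2 on a noisy parabola and on a noisy circle. This is my own illustration under those assumptions, not the procedure used in the SOM. The function names, the `frac` bandwidth, the noise levels, and the sample size are all arbitrary choices, and MIC is not computed.

```python
import math
import random

def loess_fit(xs, ys, frac=0.3):
    """Simplified LOESS: local linear fit with tricube weights at each x."""
    n = len(xs)
    k = max(3, math.ceil(frac * n))  # number of neighbours per local fit
    fitted = []
    for x0 in xs:
        dists = [abs(x - x0) for x in xs]
        h = sorted(dists)[k - 1] or 1e-12  # bandwidth: k-th nearest distance
        sw = swx = swy = swxx = swxy = 0.0
        for x, y, d in zip(xs, ys, dists):
            u = d / h
            if u >= 1.0:
                continue
            w = (1 - u ** 3) ** 3  # tricube weight
            dx = x - x0
            sw += w
            swx += w * dx
            swy += w * y
            swxx += w * dx * dx
            swxy += w * dx * y
        denom = sw * swxx - swx * swx
        if abs(denom) < 1e-12:
            fitted.append(swy / sw)  # degenerate case: weighted mean
        else:
            # Intercept of the weighted least-squares line, i.e. its value at x0.
            fitted.append((swxx * swy - swx * swxy) / denom)
    return fitted

def r_squared(ys, fitted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

random.seed(0)
n = 200

# Functional relationship: a noisy parabola.
px = [random.uniform(-1, 1) for _ in range(n)]
py = [x * x + random.gauss(0, 0.05) for x in px]
r2_parabola = r_squared(py, loess_fit(px, py))

# Non-functional relationship: a noisy circle (two y values for each x).
thetas = [random.uniform(0, 2 * math.pi) for _ in range(n)]
cx = [math.cos(t) for t in thetas]
cy = [math.sin(t) + random.gauss(0, 0.05) for t in thetas]
r2_circle = r_squared(cy, loess_fit(cx, cy))

print("LOESS R^2, parabola:", round(r2_parabola, 3))
print("LOESS R^2, circle:  ", round(r2_circle, 3))
```

On the parabola the smoother tracks the curve and R^2 is high. On the circle the conditional mean of y given x is roughly zero everywhere, so the fitted curve is nearly flat and R^2 collapses, even though the two variables are obviously strongly related.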