LINK: AI Researcher Yann LeCun on AI function

Shmi

[ Upvoted. ]

If anyone felt I was uncivil to them in any subthread, I hereby apologize.

I am not sure causality is a subfield of ML in the sense that I don't think many ML people care about causality. I think causal inference is a subfield of stats (lots of talks with the word "causal" at this year's JSM). I think it's weird that stats and ML are different fields, but that's a separate discussion.

I think it is possible to formalize causality without talking about interventions as Pearl et al. think of them; people in reinforcement learning do this, for example. But if you start to worry about e.g. time-varying confounders, and you are not using interventions, you will either get stuff wrong or have to reinvent interventions. Which would be silly, so just learn about the Neyman/Rubin model and graphs. It's the formalism that handles all the "gotchas" correctly. (In fact, until interventionists came along, people didn't even have the math to realize that time-varying confounders are a "gotcha" that needs special handling!)

By the way, the only reason I am harping on time-varying confounders is that it is a historically important case that I can explain with a 4-node example. There are lots of other, more complicated "gotchas," of course.
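To make the "gotcha" concrete, here is a minimal simulation sketch. The 4-node example isn't spelled out above, so the graph used here (A0 -> L -> A1 -> Y, plus A0 -> Y and L -> Y), the variable names, and every coefficient are assumptions chosen for illustration. L is a confounder for A1 and also a mediator for A0, so neither ignoring L nor stratifying on it recovers the effect of setting both treatments. The g-formula, sum_l E[Y | a0, l, a1] p(l | a0), does.

```python
import numpy as np

def simulate(n, rng):
    """Assumed toy model: A0 -> L -> A1 -> Y, with A0 -> Y and L -> Y."""
    a0 = rng.binomial(1, 0.5, n)            # first treatment, randomized
    l = rng.binomial(1, 0.2 + 0.5 * a0)     # time-varying confounder, affected by A0
    a1 = rng.binomial(1, 0.2 + 0.6 * l)     # second treatment, depends on L
    y = 1.0 * a0 + 2.0 * l + 1.0 * a1 + rng.normal(0.0, 1.0, n)
    return a0, l, a1, y

def unadjusted(a0, l, a1, y):
    """Ignore L entirely: the A1 part of the contrast is confounded by L."""
    return y[(a0 == 1) & (a1 == 1)].mean() - y[(a0 == 0) & (a1 == 0)].mean()

def stratified_on_l(a0, l, a1, y):
    """Condition on L, then average over p(l): blocks the A0 -> L -> Y path."""
    total = 0.0
    for lv in (0, 1):
        treated = y[(a0 == 1) & (l == lv) & (a1 == 1)].mean()
        control = y[(a0 == 0) & (l == lv) & (a1 == 0)].mean()
        total += (l == lv).mean() * (treated - control)
    return total

def g_formula(a0, l, a1, y):
    """E[Y(a0, a1)] = sum_l E[Y | a0, l, a1] * p(l | a0)."""
    def mean_under(a0v, a1v):
        total = 0.0
        for lv in (0, 1):
            p_l = (l[a0 == a0v] == lv).mean()
            total += p_l * y[(a0 == a0v) & (l == lv) & (a1 == a1v)].mean()
        return total
    return mean_under(1, 1) - mean_under(0, 0)

if __name__ == "__main__":
    data = simulate(200_000, np.random.default_rng(0))
    print("unadjusted     :", unadjusted(*data))
    print("stratified on L:", stratified_on_l(*data))
    print("g-formula      :", g_formula(*data))
```

In this assumed model the true effect of setting (A0, A1) = (1, 1) versus (0, 0) is 1 + 1 + 2 * 0.5 = 3. The g-formula estimate should land near 3, the L-stratified estimate near 2, and the unadjusted one near 3.7.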

Interventions seem to pop up (or get reinvented) in seemingly weird places, much as the constant pi does:

http://infostructuralist.wordpress.com/2010/09/23/directed-stochastic-kernels-and-causal-interventions/

In channels with feedback (thus causality arises!)

http://www.adaptiveagents.org/bayesian_control_rule

http://en.wikipedia.org/wiki/Thompson_sampling

In multi-armed bandit problems (which are related to longitudinal studies in causal inference).
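For readers who haven't seen Thompson sampling, here is a minimal Beta-Bernoulli sketch. The function name, the uniform Beta(1, 1) prior, and the arm probabilities are illustrative assumptions, not taken from the linked pages. The causal flavor is that the agent picks which arm to pull, so each reward is observed under an action it set itself rather than one it passively watched.

```python
import numpy as np

def thompson_bernoulli(true_means, n_rounds, rng):
    """Beta-Bernoulli Thompson sampling; returns how often each arm was pulled."""
    k = len(true_means)
    successes = np.ones(k)   # Beta(1, 1) prior on each arm (assumed)
    failures = np.ones(k)
    pulls = np.zeros(k, dtype=int)
    for _ in range(n_rounds):
        # Draw one plausible success rate per arm from its posterior.
        samples = rng.beta(successes, failures)
        arm = int(np.argmax(samples))
        # The agent sets the arm itself, then observes the reward.
        reward = int(rng.random() < true_means[arm])
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
    return pulls

if __name__ == "__main__":
    pulls = thompson_bernoulli([0.2, 0.5, 0.8], 2000, np.random.default_rng(0))
    print(pulls)  # most pulls should concentrate on the last arm
```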

http://en.wikipedia.org/wiki/Kaplan%E2%80%93Meier_estimator

http://missingdata.lshtm.ac.uk/index.php?option=com_content&view=article&id=76:missing-at-random-mar&catid=40:missingness-mechanisms&Itemid=96

In handling missing data (one can view "missingness" as a causal property). Note the phrasing in the second link: "given the observed data, the missingness mechanism does not depend on the unobserved data." This is precisely the "no unobserved confounders" assumption in causal inference. Not surprisingly, the correction is the same as in causal inference.
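A small sketch of that shared correction, under assumptions of my own: the toy data-generating process, the function names, and the choice of inverse probability weighting (one standard correction; the g-formula style of adjustment would work too) are illustrative and not taken from the links. Y is missing at random given a fully observed binary X, so the complete-case mean is biased, while reweighting each observed Y by 1 / p(observed | X) recovers E[Y]. This is the same move as weighting by the inverse propensity score when "treatment" is confounded only by X.

```python
import numpy as np

def simulate_mar(n, rng):
    """Assumed toy model: Y depends on X, and missingness of Y depends only on X."""
    x = rng.binomial(1, 0.5, n)
    y = 1.0 + 2.0 * x + rng.normal(0.0, 1.0, n)        # true E[Y] = 2.0
    p_observed = np.where(x == 1, 0.9, 0.3)            # MAR: depends on X alone
    observed = rng.binomial(1, p_observed).astype(bool)
    return x, y, observed

def complete_case_mean(y, observed):
    """Average only the rows where Y was observed (biased under MAR)."""
    return y[observed].mean()

def ipw_mean(x, y, observed):
    """Weight each observed Y by 1 / p(observed | X), estimated from the data."""
    p_obs = np.array([observed[x == v].mean() for v in (0, 1)])
    weights = observed / p_obs[x]
    # Unobserved Y values get weight 0, so their (unknown) values never matter.
    return np.sum(weights * np.where(observed, y, 0.0)) / len(y)

if __name__ == "__main__":
    x, y, observed = simulate_mar(200_000, np.random.default_rng(0))
    print("complete-case mean:", complete_case_mean(y, observed))
    print("IPW mean          :", ipw_mean(x, y, observed))
```

In this assumed setup the complete-case mean should come out near 2.5 (observed rows over-represent X = 1), and the weighted mean near the true value of 2.0.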

Also in figuring out what the dimension of a statistical hidden variable DAG model is. For example, if A, B, C, D are binary, and U, W are unrestricted, then the dimension of the model

{ p(a,b,c,d) = \sum_{u,w} p(a,b,c,d,u,w) | p(a,b,c,d,u,w) factorizes wrt A -> B -> C -> D, A <- U -> C, B <- W -> D }

is 13, not 15 (the dimension of the saturated model over four binary variables, 2^4 - 1), which is weird, but there is an intervention-inspired explanation for why.

> you can imagine learning about causality as a feature of the environment

I don't think you can get something for nothing. You will need causal assumptions somewhere.