Choosing prediction over explanation in psychology: Lessons from machine learning

Kaj_Sotala

I skimmed this paper and plan to read it in more detail tomorrow. My first thought is that it is fundamentally confused. I believe the confusion comes from the fact that the word "prediction" is used with two separate meanings: Are you interested in predicting Y given an observed value of X (Pr[Y | X=x]), or are you interested in predicting Y given an intervention on X (i.e. Pr[Y|do(X=x)]).

The first of these may be useful for certain purposes. but If you intend to use the research for decision making and optimization (i.e. you want to intervene to set the value of X , in order to optimize Y), then you really need the second type of predictive ability, in which case you need to extract causal information from the data. This is only possible if you have a randomized trial, or if you have a correct causal model.

You can use the word "prediction" to refer to the second type of research objective, but this is not the kind of prediction that machine learning algorithms are designed to do.

In the conclusions, the authors write:

"By contrast, a minority of statisticians (and most machine learning researchers) belong to the “algorithmic modeling culture,” in which the data are assumed to be the result of some unknown and possibly unknowable process, and the primary goal is to find an algorithm that results in the same outputs as this process given the same inputs. "

The definition of "algorithmic modelling culture" is somewhat circular, as it just moves the ambiguity surrounding "prediction" to the word "input". If by "input" they mean that the algorithm observes the value of an independent variable and makes a prediction for the dependent variable, then you are talking about a true prediction model, which may be useful for certain purposes (diagnosis, prognosis, etc) but which is unusable if you are interested in optimizing the outcome.

If you instead claim that the "input" can also include observations about interventions on a variable, then your predictions will certainly fail unless the algorithm was trained in a dataset where someone actually intervened on X (i.e. someone did a randomized controlled trial), or unless you have a correct causal model.

Machine learning algorithms are not magic, they do not solve the problem of confounding unless they have a correct causal model. The fact that these algorithms are good at predicting stuff in observational datasets does not tell you anything useful for the purposes of deciding what the optimal value of the independent variable is.

In general, this paper is a very good example to illustrate why I keep insisting that machine learning people need to urgently read up on Pearl, Robins or Van der Laan. The field is in danger of falling into the same failure mode as epidemiology, i.e. essentially ignoring the problem of confounding. In the case of machine learning, this may be more insidious because the research is dressed up in fancy math and therefore looks superficially more impressive.

Anders_H9y40

Kaj_Sotala9y10

Not entirely sure I understand you; I read the paper mostly as pointing out that current psych methodology tends to overfit, and that psychologists don't even know what overfitting means. This is true regardless of which type of prediction we're talking about.

0Vaniver9y

I think there are ML algorithms that do figure out the second type. (I don't think this is simple conditioning, as jacob_cannell seems to be suggesting, but more like this.)

0jacob_cannell9y

Yes - general prediction - ie a full generative model - already can encompass causal modelling, avoiding any distinctions between dependent/independent variables: one can learn to predict any variable conditioned on all previous variables. For example, consider a full generative model of an ATARI game, which includes both the video and control input (from human play say). Learning to predict all future variables from all previous automatically entails learning the conditional effects of actions. For medicine, the full machine learning approach would entail using all available data (test measurements, diet info, drugs, interventions, whatever, etc) to learn a full generative model, which then can be conditionally sampled on any 'action variables' and integrated to generate recommended high utility interventions. [...] In any practical near term system, sure. In theory though, a powerful enough predictor could learn enough of the world physics to invent de novo interventions wholecloth. ex: AlphaGo inventing new moves that weren't in its training set that it essentially invented/learned from internal simulations.

5

Choosing prediction over explanation in psychology: Lessons from machine learning

5

5

5

Choosing prediction over explanation in psychology: Lessons from machine learning

5

5