IlyaShpitser comments on LINK: AI Researcher Yann LeCun on AI function - Less Wrong

0 Post author: shminux 11 December 2013 12:29AM

Comments (82)

Comment author: IlyaShpitser 11 December 2013 01:59:49PM 5 points

What counts as a causal problem?

We give patients a drug, and some of them die. In fact, those that get the drug die more often than those that do not. Is the drug killing them or helping them? This is a very real problem we are facing right now, and getting it wrong results in people dying.

Surely any prediction device that would be called "intelligent" by anyone less gung-ho than, say, Ray Kurzweil would enable you to ask it questions like "suppose I -- with my current genome -- chose to smoke; then what?" and "suppose I -- with my current genome -- chose not to smoke; then what?".

I certainly hope that anything actually intelligent will be able to answer counterfactual questions of the kind you posed here. However, the standard language of prediction employed in ML is not able to even pose such questions, let alone answer them.

Comment author: Houshalter 24 March 2014 02:22:39AM 0 points

I don't get it. You gave some people the drug and some people you didn't. It seems pretty straightforward to estimate how likely someone is to die if you give them medicine.

Comment author: gwern 24 March 2014 02:38:14AM 1 point

It seems pretty straightforward to estimate how likely someone is to die if you give them medicine.

Certainly it's straightforward. Here's how one can apply your logic. You gave some people [the ones whose disease has progressed the most] the drug and some people you didn't [because their disease isn't so bad you're willing to risk it]; the % of people dying in the first drugged group is much higher than the % of deaths in the second non-drugged group; therefore, this drug is poison and you're a mass murderer.

See the problem?
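A minimal Python sketch of gwern's scenario, using made-up counts (not data from any real study): doctors give the drug mostly to severe cases, so the treated group dies more often overall even though the drug lowers the death rate within each severity stratum. `COUNTS` and `death_rate` are hypothetical names chosen for illustration.

```python
from fractions import Fraction

# Hypothetical counts: (severity, got_drug) -> (patients, deaths).
# Doctors give the drug mostly to the severe cases.
COUNTS = {
    ("severe", True): (80, 24),
    ("severe", False): (20, 8),
    ("mild", True): (20, 1),
    ("mild", False): (80, 8),
}


def death_rate(got_drug, severity=None):
    """Observed death rate for one treatment arm, optionally within one stratum."""
    patients = deaths = 0
    for (sev, drug), (n, d) in COUNTS.items():
        if drug == got_drug and (severity is None or sev == severity):
            patients += n
            deaths += d
    return Fraction(deaths, patients)


# Pooled over severity, the drug arm looks worse: 1/4 vs 4/25.
print("overall:", death_rate(True), "vs", death_rate(False))
# Within each severity stratum, the drug arm does better.
for sev in ("severe", "mild"):
    print(sev, death_rate(True, sev), "vs", death_rate(False, sev))
```

Whether the within-stratum comparison is the right one depends on severity being the only common cause of treatment and death, which the data alone cannot confirm.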

Comment author: IlyaShpitser 24 March 2014 12:22:37PM 1 point

Of course people say "but this is silly, obviously we need to condition on health status."

The point is: what if we can't? Or what if there are other causally relevant factors here? In fact, what is "causally relevant" anyway? We need a system! ML people generally don't think about these questions very hard, because culturally they are more interested in "algorithmic approaches" to prediction problems.

(This is a clarification of gwern's response to the grandparent, not a reply to gwern.)

Comment author: Houshalter 31 March 2014 02:53:05PM 0 points

The problem is the data is biased. The ML algorithm doesn't know whether the bias is a natural part of the data or artificially induced. Garbage In - Garbage Out.

However, it can still be done if the algorithm has more information. Maybe some healthy patients ended up getting the medicine anyway and were far more likely to live, or some unhealthy ones didn't and were even more likely to die. Now it's straightforward prediction again: How likely is a patient to live based on their current health and whether or not they take the drug?

Comment author: gwern 31 March 2014 03:32:06PM 3 points

The problem is the data is biased. The ML algorithm doesn't know whether the bias is a natural part of the data or artificially induced. Garbage In - Garbage Out.

You're making up excuses. The data is not 'biased', it just is, nor is it garbage - it's not made up, no one is lying or falsifying data or anything like that. If your theory cannot handle clean data from a real-world problem, that's a big problem (especially if there are more sophisticated alternatives which can handle it).

Comment author: Houshalter 31 March 2014 04:47:44PM 1 point

Biased data is a real thing and this is a great example. No method can solve the problem you've given without additional information.

Comment author: gwern 31 March 2014 05:11:04PM 4 points

This is not biased data. No one tampered with it. No one preferentially left out some data. There is no Cartesian daemon tampering with you. It's a perfectly ordinary causal problem for which one has all the available data. If you run a regression on the data, you will get accurate predictions of future similar data - just not of what happens when you intervene and realize the counterfactual. You can't throw your hands up and disdainfully refuse to solve the problem, proclaiming, 'oh, that's biased'. It may be hard, and the best available solution may be weak or require strong assumptions, but if that is the case, the correct method should say as much and specify what additional data or interventions would allow stronger conclusions.
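To illustrate the distinction, here is a hypothetical Python simulation. `simulate` draws patients from an invented structural model in which doctors treat the sicker patients and the drug lowers every patient's risk of death. The "regression" is simply the empirical death rate per treatment arm, fitted on one observational sample. It predicts a fresh observational sample well but misstates what happens when the drug is assigned by intervention. All parameters are assumptions chosen for illustration.

```python
import random

# Hypothetical risk of death, keyed by (severe, drug): the drug helps in both groups.
RISK = {(True, True): 0.30, (True, False): 0.40,
        (False, True): 0.05, (False, False): 0.10}


def simulate(n, rng, force_drug=None):
    """Draw (drug, died) pairs. If force_drug is None, doctors choose
    (80% of severe, 20% of mild patients get the drug); otherwise the drug
    is set by intervention, do(drug = force_drug), for every patient."""
    rows = []
    for _ in range(n):
        severe = rng.random() < 0.5
        if force_drug is None:
            drug = rng.random() < (0.8 if severe else 0.2)
        else:
            drug = force_drug
        rows.append((drug, rng.random() < RISK[(severe, drug)]))
    return rows


def death_rate(rows, drug):
    outcomes = [died for d, died in rows if d == drug]
    return sum(outcomes) / len(outcomes)


rng = random.Random(0)
N = 100_000
train = simulate(N, rng)
fresh = simulate(N, rng)

# "Regression": the empirical death rate per arm, fitted on observational data.
predictor = {a: death_rate(train, a) for a in (True, False)}

# Accurate on new observational data (about 0.25 vs 0.16) ...
obs_error = max(abs(predictor[a] - death_rate(fresh, a)) for a in (True, False))

# ... but under intervention the rates are about 0.175 (drug) vs 0.25 (no drug),
# so the predictor gets the direction of the drug's effect wrong.
do_rate = {a: death_rate(simulate(N, rng, force_drug=a), a) for a in (True, False)}
```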

Comment author: Houshalter 28 February 2015 05:08:48AM 0 points

I'm not certain why I used the word "bias". I think I was getting at that the data isn't representative of the population of interest.

Regardless, no other method can solve the problem specified without additional information (which you claimed). And with additional information, it's straightforward prediction again.

That is, condition on their prior health status, not just on the fact that they've been given the drug, and take prior probabilities into account.

Comment author: Lumifer 31 March 2014 05:02:25PM 1 point

No method can solve the problem you've given without additional information.

What do you call "solving the problem"?

Any method will output some estimates. Some methods will output better estimates, some worse. As people have pointed out, this was an example of a real problem and yes, real-life data is usually pretty messy. We need methods which can handle messy data and not work just on spherical cows in vacuum.

Comment author: passive_fist 11 December 2013 09:12:58PM 0 points

Prediction by itself cannot solve causal decision problems (that's why AIXI is not the same as just a Solomonoff predictor) but your example is incorrect. What you're describing is a modelling problem, not a decision problem.

Comment author: IlyaShpitser 11 December 2013 09:41:41PM 2 points

Sorry, I am not following you. Decision problems have the form of "What do you do in situation X to maximize a defined utility function?"

It is very easy to transform any causal modeling example into a decision problem. In this case: "here is an observational study where doctors give drugs to some cohort of patients. This is your data. Here's the correct causal graph for this data. Here is a set of new patients from the same cohort. Your utility function rewards you for minimizing patient deaths. Your actions are 'give the drug to everyone in the set' or 'do not give the drug to anyone in the set.' What do you do?"

Predictor algorithms, as understood by the machine learning community, cannot solve this class of problems correctly. These are not abstract problems! They happen all the time, and we need to solve them now, so you can't just say "let's defer solving this until we have a crazy detailed method of simulating every little detail of the way the HIV virus does its thing in these poor people, and the way this drug disrupts this, and the way side effects of the drug happen, etc. etc. etc."
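The decision problem described above can be sketched in Python with hypothetical numbers. Assume the correct causal graph is health -> drug, health -> death, drug -> death, and that health is recorded. A purely predictive policy compares P(death | drug) as observed in the study; a causal policy compares P(death | do(drug)), obtained here by the standard back-door adjustment over health. `STUDY`, `NEW_PATIENTS`, and the helper functions are illustrative assumptions, not part of the original problem.

```python
from fractions import Fraction

# Hypothetical observational study: (health, drug) -> (patients, deaths).
# Assumed causal graph: health -> drug, health -> death, drug -> death.
STUDY = {
    ("poor", True): (80, 24),
    ("poor", False): (20, 8),
    ("good", True): (20, 1),
    ("good", False): (80, 8),
}
ACTIONS = (True, False)  # give the drug to everyone / to no one
NEW_PATIENTS = 1000      # hypothetical size of the new cohort


def observed_risk(drug):
    """P(death | drug) as seen in the study: what a pure predictor estimates."""
    arm = [counts for (_, a), counts in STUDY.items() if a == drug]
    return Fraction(sum(d for _, d in arm), sum(n for n, _ in arm))


def interventional_risk(drug):
    """P(death | do(drug)) via back-door adjustment for health, assumed to be
    the only confounder: sum over h of P(death | drug, h) * P(h)."""
    total = sum(n for n, _ in STUDY.values())
    risk = Fraction(0)
    for health in ("poor", "good"):
        n_health = sum(n for (h, _), (n, _) in STUDY.items() if h == health)
        n, deaths = STUDY[(health, drug)]
        risk += Fraction(deaths, n) * Fraction(n_health, total)
    return risk


def best_action(risk):
    """Maximize utility = minus the expected number of deaths in the new cohort."""
    return max(ACTIONS, key=lambda give: -NEW_PATIENTS * risk(give))


print(best_action(observed_risk))        # False: the predictor withholds the drug
print(best_action(interventional_risk))  # True: adjusting for health says give it
```

The two policies use the same data and the same utility; they differ only in which distribution they integrate the utility against.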

Comment author: V_V 12 December 2013 12:36:47AM 1 point

Bayesian network learning and Bayesian network inference can, in principle, solve that problem.

Of course, if your model is wrong, and/or your dataset is degenerate, any approach will give you bad results: Garbage in, garbage out.

Comment author: IlyaShpitser 12 December 2013 12:38:48AM 1 point

Bayesian networks are statistical, not causal models.

Comment author: V_V 12 December 2013 12:53:11PM 0 points

I don't know what you mean by "causal model", but Bayesian networks can deal with the type of problems you describe.

Comment author: IlyaShpitser 12 December 2013 01:42:54PM 2 points

A causal model to me is a set of joint distributions defined over potential outcome random variables.

And no, regardless of how often you repeat it, Bayesian networks cannot solve causal problems.

Comment author: V_V 12 December 2013 04:01:44PM 2 points

I have no idea what you're talking about.

gjm asked you what a causal problem was; you didn't provide a definition and instead gave an example of a problem which seems clearly solvable by Bayesian methods such as hidden Markov models (for prediction) or partially observable Markov decision processes (for decision).

Comment author: IlyaShpitser 12 December 2013 04:57:46PM 0 points

(a) Hidden Markov models and POMDPs are probabilistic models, not necessarily Bayesian.

(b) I am using the standard definition of a causal model, first due to Neyman, popularized by Rubin. Everyone except some folks in the UK uses this definition now. I am sorry if you are unfamiliar with it.

(c) Statistical models cannot solve causal problems. The number of times you repeat the opposite, while adding the word "clearly" will not affect this fact.

Comment author: V_V 12 December 2013 06:40:13PM 0 points

(a) Hidden Markov models and POMDPs are probabilistic models, not necessarily Bayesian.

According to Wikipedia:

A hidden Markov model (HMM) is a statistical Markov model in which the system being modeled is assumed to be a Markov process with unobserved (hidden) states. A HMM can be considered the simplest dynamic Bayesian network.

(b) I am using the standard definition of a causal model, first due to Neyman, popularized by Rubin. Everyone except some folks in the UK uses this definition now. I am sorry if you are unfamiliar with it.

I suppose you mean this.

It seems to be a framework for the estimation of probability distributions from experimental data, under some independence assumptions.

(c) Statistical models cannot solve causal problems. The number of times you repeat the opposite, while adding the word "clearly" will not affect this fact.

You still didn't define "causal problem" and what you mean by "solve" in this context.

Comment author: Lumifer 12 December 2013 08:40:03PM 1 point

A causal model to me is a set of joint distributions defined over potential outcome random variables.

Huh?

Can you expand on this, with special attention to the difference between the model and the result of a model, and to the differences from plain-vanilla Bayesian models which will also produce joint distributions over outcomes.

Comment author: IlyaShpitser 12 December 2013 08:57:35PM 1 point

Sure. Here's the world's simplest causal graph: A -> B.

Rubin et al., who do not like graphs, will instead talk about a joint distribution:

p(A, B(a=1), B(a=0))

where B(a=1) means 'random variable B under intervention do(a=1)'. Assume binary A for simplicity here.

A causal model over A, B is a set of densities { p(A, B(a=1), B(a=0)) | [some property] }. The causal model for this graph would be:

{ p(A, B(a=1), B(a=0)) | B(a=1) is independent of A, and B(a=0) is independent of A }

These assumptions are called 'ignorability assumptions' in the literature, and they correspond to the absence of confounding between A and B. Note that it took counterfactuals to define what 'absence of confounding' means.
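The potential-outcome notation can be made concrete with a small simulation; this is a hypothetical sketch, not code from the Neyman-Rubin literature. Each simulated unit carries both B(a=1) and B(a=0), which only a simulator can see. When A is assigned by a coin flip, the ignorability assumptions hold and the naive treated-minus-control contrast recovers E[B(a=1)] - E[B(a=0)]. When A depends on a hidden trait that also affects both potential outcomes, ignorability fails and the naive contrast even has the wrong sign. The trait `frail` and all probabilities are invented for illustration.

```python
import random


def simulate_units(n, rng, assign):
    """Each unit carries both potential outcomes, B(a=1) and B(a=0).
    In real data only the one matching the assigned A would be observed."""
    units = []
    for _ in range(n):
        frail = rng.random() < 0.5                   # hypothetical hidden trait
        b0 = rng.random() < (0.6 if frail else 0.2)  # B(a=0)
        b1 = rng.random() < (0.4 if frail else 0.1)  # B(a=1)
        units.append((assign(frail, rng), b1, b0))
    return units


def summarize(units):
    """Return (E[B(a=1)] - E[B(a=0)], E[B | A=1] - E[B | A=0])."""
    true_effect = sum(b1 - b0 for _, b1, b0 in units) / len(units)
    treated = [b1 for a, b1, _ in units if a]
    control = [b0 for a, _, b0 in units if not a]
    naive = sum(treated) / len(treated) - sum(control) / len(control)
    return true_effect, naive


rng = random.Random(0)
# Coin-flip assignment: B(a=1) and B(a=0) are independent of A (ignorability).
randomized = summarize(simulate_units(100_000, rng, lambda frail, r: r.random() < 0.5))
# Frail units are treated more often, so ignorability fails.
confounded = summarize(
    simulate_units(100_000, rng, lambda frail, r: r.random() < (0.8 if frail else 0.2))
)
print(randomized)  # both entries close to -0.15
print(confounded)  # true effect close to -0.15, naive contrast close to +0.06
```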

A regular Bayesian network model for this graph is just the set of densities over A and B (since this graph has no d-separation statements). That is, it is the set { p(A,B) | [no assumptions] }. This is a 'statistical model,' because it is a set of regular old joint densities, with no mention of counterfactuals or interventions anywhere.

The same graph can correspond to very different models, so you have to specify which one you mean.


You could also have assumptions corresponding to "missing graph edges." For example, in the instrumental variable graph:

Z -> A -> B, with A <- U -> B, where we do not see U, we would have an assumption that states that B(a,z) = B(a,z') for all a,z,z'.
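A sketch of why the instrumental-variable graph is useful, under extra assumptions not stated in the comment: the relationships are linear with a constant effect of A on B, and Z is independent of U (the exclusion restriction alone does not identify the effect in general). Data are simulated from a hypothetical model Z -> A -> B with an unobserved U affecting both A and B. Regressing B on A is biased by U, while the ratio cov(Z, B) / cov(Z, A), the Wald or IV estimator, recovers the assumed effect of 2. All coefficients are invented.

```python
import random


def cov(xs, ys):
    """Sample covariance of two equal-length lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


rng = random.Random(0)
TRUE_EFFECT = 2.0  # hypothetical causal effect of A on B
z, a, b = [], [], []
for _ in range(100_000):
    u = rng.gauss(0, 1)   # unobserved confounder U
    zi = rng.gauss(0, 1)  # instrument Z, independent of U
    ai = zi + u + rng.gauss(0, 1)
    # Exclusion restriction: Z enters B only through A.
    bi = TRUE_EFFECT * ai + 3 * u + rng.gauss(0, 1)
    z.append(zi)
    a.append(ai)
    b.append(bi)

ols_slope = cov(a, b) / cov(a, a)    # regression of B on A, biased by U (about 3)
iv_estimate = cov(z, b) / cov(z, a)  # Wald / IV ratio (about 2)
```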


Please don't say "Bayesian model" when you mean "Bayesian network." People really should say "belief networks" or "statistical DAG models" to avoid confusion.

Comment author: Lumifer 12 December 2013 09:28:00PM 1 point

Please don't say "Bayesian model" when you mean "Bayesian network."

I do not mean "Bayesian networks". I mean Bayesian models of the kind e.g. described in Gelman's Bayesian Data Analysis.

p(A, B(a=1), B(a=0)) where B(a=1) means 'random variable B under intervention do(a=1)'. Assume binary A for simplicity here.

You still can express this as plain-vanilla conditional densities, can't you? "under intervention do(a=1)" is just a different way of saying "conditional on A=1", no?

A causal model over A, B is a set of densities { p(A, B(a=1), B(a=0)) | [some property] }

and

with no mention of counterfactuals or interventions anywhere.

I don't see counterfactuals in your set of densities. And how are "interventions" different from conditionality?
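One way to see the difference Lumifer asks about, using Pearl's truncated-factorization view of do() rather than the potential-outcome notation above: in the hypothetical model below, U -> A and U -> B, and A has no effect on B at all. Conditioning on A=1 changes our belief about U and hence about B, whereas do(A=1) removes the P(a | u) factor, so U keeps its prior distribution. All numbers and names are made up for illustration.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical model: U -> A, U -> B, and no arrow A -> B.
P_U1 = F(1, 2)                             # P(U=1)
P_A1_GIVEN_U = {1: F(9, 10), 0: F(1, 10)}  # P(A=1 | U=u)
P_B1_GIVEN_U = {1: F(7, 10), 0: F(1, 5)}   # P(B=1 | U=u); does not depend on A


def bern(p1, x):
    """Probability that a 0/1 variable with P(1) = p1 takes value x."""
    return p1 if x == 1 else 1 - p1


def joint(u, a, b):
    """Observational joint: P(u) * P(a | u) * P(b | u)."""
    return bern(P_U1, u) * bern(P_A1_GIVEN_U[u], a) * bern(P_B1_GIVEN_U[u], b)


def conditional(b, a):
    """P(B=b | A=a): seeing A=a also tells us something about U."""
    num = sum(joint(u, a, b) for u in (0, 1))
    den = sum(joint(u, a, bb) for u, bb in product((0, 1), repeat=2))
    return num / den


def interventional(b, a):
    """P(B=b | do(A=a)): truncated factorization drops the P(a | u) factor,
    so U keeps its prior. P(b | u) ignores a because A does not cause B here."""
    return sum(bern(P_U1, u) * bern(P_B1_GIVEN_U[u], b) for u in (0, 1))


print(conditional(1, 1), conditional(1, 0))        # 13/20 1/4
print(interventional(1, 1), interventional(1, 0))  # 9/20 9/20
```

In potential-outcome terms, this model violates the ignorability assumptions, since both A and B(a) depend on U.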

Comment author: passive_fist 11 December 2013 09:49:02PM 0 points

Decision problems have the form of "What do you do in situation X to maximize a defined utility function?"

Yes, but what you are describing is a modelling problem. "Is the drug killing them or helping them?" is not a decision problem, although "Which drug should we give them to save their lives?" is. These are two very different problems, possibly with different answers!

It is very easy to transform any causal modeling example into a decision problem.

Yes, but in the process it becomes a new problem. That said, you are right that modelling is in some respects an 'easier' problem than making decisions. That's also the reason I wrote my top-level comment, saying that it is true that something you can identify in an AI is the ability to model the world.

Comment author: IlyaShpitser 12 December 2013 10:53:41AM 1 point

I guess my point was that there is a trivial reduction (in the complexity theory sense of the word) here, namely that decision theory is "modeling-complete." In other words, if we had an algorithm for solving a certain class of decision problems correctly, we would automatically have an algorithm for correctly handling the corresponding model (otherwise how could we get the decision problem right?)

Prediction cannot solve causal decision problems, but the reason it cannot is that it cannot solve the underlying modeling problem correctly. (If it could, there is nothing more to do, just integrate over the utility).

Comment author: gjm 11 December 2013 03:57:30PM 0 points

We give patients a drug [...] Is the drug killing them or helping them?

It seems to me that a sufficiently smart prediction machine could answer questions of this kind. E.g., suppose what it really is is a very fast universe simulator. Simulate a lot of patients, diddle with their environments, either give each one the drug or not, repeat with different sets of parameters. I'm not actually recommending this (it probably isn't possible, it produces interesting ethical issues if the simulation is really accurate, etc.) but the point is that merely being a predictor as such doesn't imply inability to answer causal questions.

the standard language of prediction employed in ML

Was Yann LeCun saying (1) "AI is all about prediction in the ordinary informal sense of the word" or (2) "AI is all about prediction in the sense in which it's discussed formally in the machine learning community"? I thought it was #1.

Comment author: IlyaShpitser 11 December 2013 04:28:41PM 5 points

Simulate a lot of patients

Simulations (and computer programs in general -- think about how debuggers for computer programs work) are causal models, not purely predictive models. Your answer does no work, because being able to simulate at that level of fidelity means we are already Done<tm> with the science of what we are simulating. In particular our simulator will contain in it a very detailed causal model that would contain answers to everything we might want to know. The question is what do we do when our information isn't very good, not when we can just say "let's ask God."

This is a quote from an ML researcher today, who is talking about what is done today. And what is done today for purely predictive modeling are those crazy deep learning networks or support vector machines they have in ML. Those are algorithms specifically tailored to answering p(Y | X) kinds of questions (i.e. prediction questions), not causal questions.


edit: to add to this a little more. I think there is a general mathematical principle at play here, which is similar in spirit to Occam's razor. This principle is: "try to use the weakest assumptions needed to get the right answer." It is this principle that makes "Omega-style simulations" an unsatisfactory answer. It's a kind of overfitting of the entire scientific process.

Comment author: Lumifer 11 December 2013 04:53:47PM 1 point

A good enough prediction engine can substitute, to a degree, for a causal model. Obviously, not always and once you get outside of its competency domain it will break, but still -- if you can forecast very well what effects an intervention will produce, your need for a causal model is diminished.

Comment author: IlyaShpitser 11 December 2013 05:08:21PM 0 points

I see. So then if I were to give you a causal decision problem, can you tell me what the right answer is using only a prediction engine? I have a list of them right here!

The general form of these problems is: "We have a causal model where an outcome is death. We only have observational data obtained from this causal model. We are interested in whether a given intervention will reduce the death rate. Should we do the intervention?"

Observational data is enough for the predictor, right? (But the predictor doesn't get to see what the causal model is, after all, it just works on observational data and is agnostic of how it came about).

Comment author: Lumifer 11 December 2013 05:25:29PM 0 points

So then if I were to give you a causal decision problem, can you tell me what the right answer is using only a prediction engine?

A good enough prediction engine, yes.

We only have observational data obtained from this causal model.

Huh? You don't obtain observational data from a model, you obtain it from reality.

Observational data is enough for the predictor, right?

That depends. I think I understand prediction models wider than you do. A prediction model can use any kind of input it likes if it finds it useful.

Comment author: IlyaShpitser 11 December 2013 05:56:11PM 0 points

Huh? You don't obtain observational data from a model, you obtain it from reality.

Right, the data comes from the territory, but we assume the map is correct.

That depends. I think I understand prediction models wider than you do.

The point is, if your 'prediction model' has a rich enough language to incorporate the causal model, it's no longer purely a prediction model as everyone in the ML field understands it, because it can then also answer counterfactual questions. In particular, if your prediction model only uses the language of probability theory, it cannot incorporate any causal information because it cannot talk about counterfactuals.

So are you willing to take me up on my offer of solving causal problems with a prediction algorithm?

Comment author: Lumifer 11 December 2013 06:08:31PM 0 points

the data comes from the territory, but we assume the map is correct.

You don't need any assumptions about the model to get observational data. Well, you need some to recognize what you are looking at, but certainly you don't need to assume the correctness of a causal model.

no longer purely a prediction model as everyone in the ML field understands it

We may be having some terminology problems. Normally I call a "prediction model" anything that outputs testable forecasts about the future. Causal models are a subset of prediction models. Within the context of this thread I understand "prediction model" as a model which outputs forecasts and which does not depend on simulating the mechanics of the underlying process. It seems you're thinking of "pure prediction models" as something akin to "technical" models in finance which look at price history, only at price history, and nothing but the price history. So a "pure prediction model" would be to you something like a neural network into which you dump a lot of more or less raw data but you do not tweak the NN structure to reflect your understanding of how the underlying process works.

Yes, I would agree that a prediction model cannot talk about counterfactuals. However I would not agree that a prediction model can't successfully forecast on the basis of inputs it never saw before.

So are you willing to take me up on my offer of solving causal problems with a prediction algorithm?

Good prediction algorithms are domain-specific. I am not defending an assertion that you can get some kind of a Universal Problem Solver out of ML techniques.