IlyaShpitser comments on LINK: AI Researcher Yann LeCun on AI function - Less Wrong

0 Post author: shminux 11 December 2013 12:29AM


Comments (82)


Comment author: IlyaShpitser 11 December 2013 09:41:41PM *  2 points [-]

Sorry, I am not following you. Decision problems have the form of "What do you do in situation X to maximize a defined utility function?"

It is very easy to transform any causal modeling example into a decision problem. In this case: "here is an observational study where doctors give drugs to some cohort of patients. This is your data. Here's the correct causal graph for this data. Here is a set of new patients from the same cohort. Your utility function rewards you for minimizing patient deaths. Your actions are 'give the drug to everyone in the set' or 'do not give the drug to everyone in the set.' What do you do?"

Predictor algorithms, as understood by the machine learning community, cannot solve this class of problems correctly. These are not abstract problems! They happen all the time, and we need to solve them now, so you can't just say "let's defer solving this until we have a crazy detailed method of simulating every little detail of the way the HIV virus does its thing in these poor people, and the way this drug disrupts this, and the way side effects of the drug happen, etc. etc. etc."
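A minimal sketch of how this can go wrong, with made-up numbers. It assumes one particular graph that the comment does not spell out: illness severity S is observed and influences both who gets the drug (A) and who dies (D). Every probability and function name below is hypothetical.

```python
# Hypothetical observational cohort. S = severe illness (confounder),
# A = drug given, D = death. Assumed graph: S -> A, S -> D, A -> D.
P_S = {0: 0.5, 1: 0.5}                      # P(S = s)
P_A1_GIVEN_S = {0: 0.1, 1: 0.9}             # doctors mostly treat severe cases
P_D1_GIVEN_A_S = {                          # P(D = 1 | A = a, S = s), keyed by (a, s)
    (0, 0): 0.10, (1, 0): 0.05,
    (0, 1): 0.50, (1, 1): 0.30,
}


def p_a_given_s(a, s):
    """P(A = a | S = s)."""
    p1 = P_A1_GIVEN_S[s]
    return p1 if a == 1 else 1.0 - p1


def death_rate_observed(a):
    """P(D = 1 | A = a): what a pure predictor reads off the data."""
    joint = sum(P_S[s] * p_a_given_s(a, s) * P_D1_GIVEN_A_S[(a, s)] for s in P_S)
    marginal = sum(P_S[s] * p_a_given_s(a, s) for s in P_S)
    return joint / marginal


def death_rate_intervened(a):
    """P(D = 1 | do(A = a)) via adjustment over S, valid for the assumed graph."""
    return sum(P_S[s] * P_D1_GIVEN_A_S[(a, s)] for s in P_S)


def best_action(death_rate):
    """Pick the action (0 = withhold, 1 = give) with the lower death rate."""
    return min((0, 1), key=death_rate)


if __name__ == "__main__":
    for a in (0, 1):
        print(a, death_rate_observed(a), death_rate_intervened(a))
    print("predictor picks:", best_action(death_rate_observed))
    print("adjusted picks: ", best_action(death_rate_intervened))
```

With these numbers, ranking the actions by P(D=1 | A=a) favors withholding the drug, while the adjusted quantity favors giving it, even though the drug lowers mortality in both severity groups. The adjustment formula comes from the graph, not from the data alone.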

Comment author: V_V 12 December 2013 12:36:47AM 1 point [-]

Bayesian network learning and Bayesian network inference can, in principle, solve that problem.

Of course, if your model is wrong, and/or your dataset is degenerate, any approach will give you bad results: garbage in, garbage out.

Comment author: IlyaShpitser 12 December 2013 12:38:48AM 1 point [-]

Bayesian networks are statistical, not causal models.

Comment author: V_V 12 December 2013 12:53:11PM 0 points [-]

I don't know what you mean by "causal model", but Bayesian networks can deal with the type of problems you describe.

Comment author: IlyaShpitser 12 December 2013 01:42:54PM 2 points [-]

A causal model to me is a set of joint distributions defined over potential outcome random variables.

And no, regardless of how often you repeat it, Bayesian networks cannot solve causal problems.

Comment author: V_V 12 December 2013 04:01:44PM 2 points [-]

I have no idea what you're talking about.

gjm asked you what a causal problem was. You didn't provide a definition and instead gave an example of a problem that seems clearly solvable by Bayesian methods such as hidden Markov models (for prediction) or partially observable Markov decision processes (for decision).

Comment author: IlyaShpitser 12 December 2013 04:57:46PM *  0 points [-]

(a) Hidden Markov models and POMDPs are probabilistic models, not necessarily Bayesian.

(b) I am using the standard definition of a causal model, first due to Neyman, popularized by Rubin. Everyone except some folks in the UK use this definition now. I am sorry if you are unfamiliar with it.

(c) Statistical models cannot solve causal problems. The number of times you repeat the opposite, while adding the word "clearly" will not affect this fact.

Comment author: V_V 12 December 2013 06:40:13PM 0 points [-]

(a) Hidden Markov models and POMDPs are probabilistic models, not necessarily Bayesian.

According to Wikipedia:

A hidden Markov model (HMM) is a statistical Markov model in which the system being modeled is assumed to be a Markov process with unobserved (hidden) states. A HMM can be considered the simplest dynamic Bayesian network.


(b) I am using the standard definition of a causal model, first due to Neyman, popularized by Rubin. Everyone except some folks in the UK use this definition now. I am sorry if you are unfamiliar with it.

I suppose you mean this.

It seems to be a framework for the estimation of probability distributions from experimental data, under some independence assumptions.

(c) Statistical models cannot solve causal problems. The number of times you repeat the opposite, while adding the word "clearly" will not affect this fact.

You still didn't define "causal problem" and what you mean by "solve" in this context.

Comment author: IlyaShpitser 12 December 2013 08:07:55PM *  1 point [-]

A "Bayesian network" is not necessarily a Bayesian model. Bayesian networks can be used with frequentist methods, and frequently are (see: the PC algorithm). I believe Pearl called the networks "Bayesian" to honor Bayes, and because of the way Bayes theorem is used when you shuffle probabilities around. The model does not necessitate Bayesian methods at all.

I don't mean to be rude, but are we operating at the level of string pattern matching, and google searches here?

You still didn't define "causal problem" and what you mean by "solve" in this context.

Sociological definition : "a causal problem" is a problem that people who do causal inference study. Estimating causal effects. Learning cause-effect relationships from data. Mediation analysis. Interference analysis. Decision theory problems. To "solve" means to get the right answer and thereby avoid going to jail for malpractice.


This is a bizarre conversation. Causal problems aren't something esoteric. Imagine if you kept insisting I define what an algebra problem is. There are all sorts of things you could read on this standard topic.

Comment author: Lumifer 12 December 2013 08:37:09PM 2 points [-]

This is a bizarre conversation.

Looks like a perfectly normal conversation where people insist on using different terminology sets :-/

Comment author: V_V 12 December 2013 10:36:06PM 0 points [-]

A "Bayesian network" is not necessarily a Bayesian model. Bayesian networks can be used with frequentist methods, and frequently are (see: the PC algorithm).

You can use frequentist methods to learn Bayesian networks from data, as with any other Bayesian model.

And you can also use Bayesian networks without priors to do things like maximum likelihood estimation, which isn't Bayesian sensu stricto, but I don't think this is relevant to this conversation, is it?

I don't mean to be rude, but are we operating at the level of string pattern matching, and google searches here?

No, we are operating at the level of trying to make sense of your claims.

Sociological definition : "a causal problem" is a problem that people who do causal inference study. Estimating causal effects. Learning cause-effect relationships from data. Mediation analysis. Interference analysis. Decision theory problems. To "solve" means to get the right answer and thereby avoid going to jail for malpractice.

Please try to reformulate without using the word "cause/causal".
The term has multiple meanings. You may be using one of them and assuming that everybody shares it, but that's not obvious.

Comment author: Lumifer 12 December 2013 08:40:03PM 1 point [-]

A causal model to me is a set of joint distributions defined over potential outcome random variables.

Huh?

Can you expand on this, with special attention to the difference between the model and the result of a model, and to the differences from plain-vanilla Bayesian models, which will also produce joint distributions over outcomes?

Comment author: IlyaShpitser 12 December 2013 08:57:35PM *  1 point [-]

Sure. Here's the world's simplest causal graph: A -> B.

Rubin et al, who do not like graphs, will instead talk about a joint distribution:

p(A, B(a=1), B(a=0))

where B(a=1) means 'random variable B under intervention do(a=1)'. Assume binary A for simplicity here.

A causal model over A,B is a set of densities { p(A, B(a=1), B(a=0)) | [ some property ] }. The causal model for this graph would be:

{ p(A, B(a=1), B(a=0)) | B(a=1) is independent of A, and B(a=0) is independent of A }

These assumptions are called 'ignorability assumptions' in the literature, and they correspond to the absence of confounding between A and B. Note that it took counterfactuals to define what 'absence of confounding' means.

A regular Bayesian network model for this graph is just the set of densities over A and B (since this graph has no d-separation statements). That is, it is the set { p(A,B) | [no assumptions] }. This is a 'statistical model,' because it is a set of regular old joint densities, with no mention of counterfactuals or interventions anywhere.

The same graph can correspond to very different things, so you have to specify which model you mean.
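One consequence worth writing out, as a sketch: in the causal model for A -> B, the ignorability assumptions are what allow an interventional quantity to be computed from the observed joint p(A,B). The derivation below also assumes consistency (the observed B equals B(a) whenever A = a) and p(A = a) > 0, neither of which is stated in the comment.

```latex
\begin{align*}
p\bigl(B(a) = b\bigr)
  &= p\bigl(B(a) = b \mid A = a\bigr) && \text{ignorability: } B(a) \perp A \\
  &= p\bigl(B = b \mid A = a\bigr)    && \text{consistency: } B = B(a) \text{ when } A = a
\end{align*}
```

In the statistical model { p(A,B) | [no assumptions] } the first step is not available, because that model says nothing about B(a) at all.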


You could also have assumptions corresponding to "missing graph edges." For example, in the instrumental variable graph:

Z -> A -> B, with A <- U -> B, where we do not see U, we would have an assumption that states that B(a,z) = B(a,z') for all a,z,z'.


Please don't say "Bayesian model" when you mean "Bayesian network." People really should say "belief networks" or "statistical DAG models" to avoid confusion.

Comment author: Lumifer 12 December 2013 09:28:00PM *  1 point [-]

Please don't say "Bayesian model" when you mean "Bayesian network."

I do not mean "Bayesian networks". I mean Bayesian models of the kind e.g. described in Gelman's Bayesian Data Analysis.

p(A, B(a=1), B(a=0)) where B(a=1) means 'random variable B under intervention do(a=1)'. Assume binary A for simplicity here.

You still can express this as plain-vanilla conditional densities, can't you? "under intervention do(a=1)" is just a different way of saying "conditional on A=1", no?

A causal model over A,B is a set of densities { p(A, B(a=1), B(a=0)) | [ some property ] }

and

with no mention of counterfactuals or interventions anywhere.

I don't see counterfactuals in your set of densities. And how are "interventions" different from conditioning?

Comment author: IlyaShpitser 12 December 2013 09:43:08PM *  2 points [-]

You still can express this as plain-vanilla conditional densities, can't you?

No. If conditioning was the same as interventions I could make it rain by watering my lawn and become a world class athlete by putting on a gold medal.
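To spell out the lawn example, here is a small simulation sketch. The setup is an assumption made for illustration: rain and my watering are independent causes of a wet lawn, and the probabilities and function names are invented.

```python
import random

P_RAIN = 0.3      # hypothetical chance of rain on a given day
P_WATERING = 0.1  # hypothetical chance I water the lawn, independent of rain


def sample_day(rng, force_wet=None):
    """Draw (rain, wet) for one day.

    With force_wet=None the lawn is wet if it rained or I watered it.
    With force_wet set, the lawn's state is overridden, i.e. do(wet = force_wet),
    which leaves the rain mechanism untouched.
    """
    rain = rng.random() < P_RAIN
    if force_wet is None:
        watered = rng.random() < P_WATERING
        wet = rain or watered
    else:
        wet = force_wet
    return rain, wet


def p_rain_given_wet(n=100_000, seed=0):
    """Estimate P(rain | wet): conditioning on having seen a wet lawn."""
    rng = random.Random(seed)
    days = [sample_day(rng) for _ in range(n)]
    rain_on_wet_days = [rain for rain, wet in days if wet]
    return sum(rain_on_wet_days) / len(rain_on_wet_days)


def p_rain_do_wet(n=100_000, seed=0):
    """Estimate P(rain | do(wet)): making the lawn wet myself every day."""
    rng = random.Random(seed)
    days = [sample_day(rng, force_wet=True) for _ in range(n)]
    return sum(rain for rain, _ in days) / n


if __name__ == "__main__":
    print("P(rain | wet)     ~", round(p_rain_given_wet(), 3))
    print("P(rain | do(wet)) ~", round(p_rain_do_wet(), 3))
```

Seeing a wet lawn raises the probability of rain well above its base rate, while wetting the lawn myself leaves it at roughly P_RAIN. The two quantities answer different questions, which is why one cannot be substituted for the other.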

Comment author: Lumifer 12 December 2013 09:52:40PM 0 points [-]

If conditioning was the same as interventions I could make it rain by watering my lawn

I don't understand -- can you unroll?

Comment author: passive_fist 11 December 2013 09:49:02PM *  0 points [-]

Decision problems have the form of "What do you do in situation X to maximize a defined utility function?"

Yes, but what you are describing is a modelling problem. "Is the drug killing them or helping them?" is not a decision problem, although "Which drug should we give them to save their lives?" is. These are two very different problems, possibly with different answers!

It is very easy to transform any causal modeling example into a decision problem.

Yes, but in the process it becomes a new problem. Although, you are right that modelling is in some respects an 'easier' problem than making decisions. That's also the reason I wrote my top-level comment, saying that it is true that something you can identify in an AI is the ability to model the world.

Comment author: IlyaShpitser 12 December 2013 10:53:41AM 1 point [-]

I guess my point was that there is a trivial reduction (in the complexity theory sense of the word) here, namely that decision theory is "modeling-complete." In other words, if we had an algorithm for solving a certain class of decision problems correctly, we would automatically have an algorithm for correctly handling the corresponding model (otherwise, how could we get the decision problem right?).

Prediction cannot solve causal decision problems, but the reason it cannot is that it cannot solve the underlying modeling problem correctly. (If it could, there is nothing more to do, just integrate over the utility).
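As a sketch of what "integrate over the utility" could look like, borrowing the potential-outcome notation B(a) used elsewhere in this thread and assuming, for simplicity, a discrete outcome and a utility U that depends only on the outcome:

```latex
a^{*} = \arg\max_{a} \; \mathbb{E}\bigl[U(B(a))\bigr]
      = \arg\max_{a} \sum_{b} U(b)\, p\bigl(B(a) = b\bigr)
```

The sum is routine once p(B(a) = b) is known; getting that distribution right is the modeling problem.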