nshepperd comments on Chocolate Ice Cream After All? - Less Wrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
I don't understand why you persist in blindly converting historical records into subjective probabilities, as though there were no inference to be done. You can't just set p(Y=death | A0=yes, A1=yes) to the proportion of deaths in the data, because that throws away all the highly pertinent information you have about biology and about the selection rule for "when was the treatment applied". (EDIT: ignoring the covariate W would cause Simpson's Paradox in this instance.)
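As a toy numeric illustration of that edit (the counts below are invented for illustration, not the 100 records from the post), here is how ignoring W can flip the apparent effect of a treatment:

```python
# Invented counts: A = treatment (1 = treated), W = severity (1 = sick), deaths/total.
strata = [
    {"W": 1, "A": 1, "deaths": 30, "total": 70},  # sick, treated
    {"W": 1, "A": 0, "deaths": 5,  "total": 10},  # sick, untreated
    {"W": 0, "A": 1, "deaths": 3,  "total": 30},  # healthy, treated
    {"W": 0, "A": 0, "deaths": 18, "total": 90},  # healthy, untreated
]

def death_rate(rows):
    return sum(r["deaths"] for r in rows) / sum(r["total"] for r in rows)

for a in (1, 0):
    rows = [r for r in strata if r["A"] == a]
    print(f"P(death | A={a})       = {death_rate(rows):.2f}")  # aggregated over W
    for w in (1, 0):
        sub = [r for r in rows if r["W"] == w]
        print(f"P(death | A={a}, W={w})  = {death_rate(sub):.2f}")  # stratified by W
```

Aggregated, the treated group dies more often (0.33 vs 0.23); within each level of W it dies less often. The naive conditional p(Y=death | A) is encoding the selection rule for who got treated, not the effect of treating.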
EDIT EDIT: Yes,

P(Y = death in a randomly-selected line of the data | A0=yes, A1=yes in the same line of data)

is equal to the proportion of deaths in the data, but that's not remotely the same thing as

P(this patient dies | I set A0=yes, A1=yes for this patient).

I was just pointing out that in the conditional distribution p(Y|A0,A1) derived from the empirical distribution, some facts happen to hold that might be relevant. I never said what I am ignoring; I was merely posing a decision problem for EDT to solve.
The only information about biology you have is the 100 records for A0,W,A1,Y that I specified. You can't ask for more info, because there is no more info. You have to decide with what you have.
The information about biology I was thinking of is things like "vital signs tend to be correlated with internal health" and "people with bad internal health tend to die". Information it would be irresponsible not to use.
But anyway, the solution is to calculate

P(this patient dies | I set A0=a0, A1=a1 for this patient, data)

(I should have included the conditioning on data above but I forgot) by whatever statistical methods are relevant, then to do whichever option of a0, a1 gives the higher number. Straightforward.

You can approximate

P(this patient dies | I set A0=a0, A1=a1 for this patient, data)

with

P_empirical(Y=death | do(A0=a0, A1=a1))

from the data, on the assumption that our decision process is independent of W (which is reasonable, since we don't measure W). There are other ways to calculate P(this patient dies | I set A0=a0, A1=a1 for this patient, data), like Solomonoff induction, presumably, but who would bother with that?

I agree with you broadly, but this is not the EDT solution, is it? Show me a definition of EDT in any textbook (or Wikipedia, or anywhere) that talks about do(.).
Yes, of course not. That is the point of this example! I was pointing out that facts about p(Y | A0,A1) aren't what we want here. Figuring out the distribution that is relevant is not so easy, and cannot be done merely from knowing p(A0,W,A1,Y).
No, this is the EDT solution.
EDT uses

P(this patient dies | I set A0=a0, A1=a1 for this patient, data)

while CDT uses

P(this patient dies | do(I set A0=a0, A1=a1 for this patient), data).

EDT doesn't "talk about do" because P(this patient dies | I set A0=a0, A1=a1 for this patient, data) doesn't involve do. It just happens that you can usually approximate P(this patient dies | I set A0=a0, A1=a1 for this patient, data) by using do (because the conditions for your personal actions are independent of whatever the conditions for the treatment in the data were).

Let me be clear: the use of do I describe here is not part of the definition of EDT. It is simply an epistemic "trick" for calculating P(this patient dies | I set A0=a0, A1=a1 for this patient, data), and would be correct even if you just wanted to know the probability, without intending to apply any particular decision theory or take any action at all.

Also, CDT can seem a bit magical, because when you use P(this patient dies | do(I set A0=a0, A1=a1 for this patient), data), you can blindly set the causal graph for your personal decision to the empirical causal graph for your data set, because the do operator gets rid of all the (factually incorrect) correlations between your action and variables like W.

[ I did not downvote, btw. ]
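To sketch that "trick" concretely: assuming the causal ordering A0 -> W -> A1 -> Y with no unmeasured confounding (an assumption on my part, since the post's data-generating process isn't restated in this thread), P(Y | do(A0, A1)) can be estimated from the empirical joint by the g-formula instead of by naive conditioning. The counts below are invented:

```python
# Invented empirical counts n[(a0, w, a1, y)]; 1 = yes/death, 0 = no/survival.
n = {
    (0,0,0,0): 20, (0,0,0,1): 5,  (0,0,1,0): 4,  (0,0,1,1): 1,
    (0,1,0,0): 2,  (0,1,0,1): 3,  (0,1,1,0): 6,  (0,1,1,1): 9,
    (1,0,0,0): 9,  (1,0,0,1): 1,  (1,0,1,0): 8,  (1,0,1,1): 2,
    (1,1,0,0): 3,  (1,1,0,1): 2,  (1,1,1,0): 10, (1,1,1,1): 15,
}
total = sum(n.values())
KEYS = ("a0", "w", "a1", "y")

def p(**fixed):
    """Empirical probability that the named variables take the given values."""
    return sum(c for k, c in n.items()
               if all(k[KEYS.index(v)] == val for v, val in fixed.items())) / total

def p_cond(target, given):
    return p(**target, **given) / p(**given)

def p_do(y, a0, a1):
    # g-formula for the ordering A0 -> W -> A1 -> Y:
    # P(Y=y | do(a0, a1)) = sum_w P(w | a0) * P(y | a0, w, a1)
    return sum(p_cond({"w": w}, {"a0": a0})
               * p_cond({"y": y}, {"a0": a0, "w": w, "a1": a1})
               for w in (0, 1))

print(f"P(Y=death | A0=1, A1=1)     = {p_cond({'y': 1}, {'a0': 1, 'a1': 1}):.3f}")
print(f"P(Y=death | do(A0=1, A1=1)) = {p_do(1, 1, 1):.3f}")
```

On these invented counts the two numbers differ (roughly 0.49 vs 0.44); that gap is exactly the confounding through W that naive conditioning on the historical records ignores.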
Criticisms section in the Wikipedia article on EDT:
David Lewis has characterized evidential decision theory as promoting "an irrational policy of managing the news".[2] James M. Joyce asserted, "Rational agents choose acts on the basis of their causal efficacy, not their auspiciousness; they act to bring about good results even when doing so might betoken bad news."[3]
Where in the Wikipedia EDT article is the reference to "I set"? Or in any textbook? Where are you getting your EDT procedure from? Can you show me a reference? EDT is about conditional expectations, not about "I set."
One last question: what is P(this patient dies | I set A0=a0,A1=a1 for this patient, data) as a function of P(Y,A0,W,A1)? If you say "whatever p_empirical(Y | do(A0,A1)) is", then you are a causal decision theorist, by definition.
I don't strongly recall when I last read a textbook on decision theory, but I remember it describing agents using probabilities about the choices available in their own personal situation, not distributions describing historical data.
Pragmatically, when you build a robot to carry out actions according to some decision theory, the process is centered around the robot knowing where it is in the world, and making decisions with the awareness that it is making the decisions, not someone else. The only actions you have to choose are "I do this" or "I do that".
I would submit that a CDT robot makes decisions on the basis of

P(outcome | do(I do this or that), sensor data)

while a hypothetical EDT robot would make decisions based on

P(outcome | I do this or that, sensor data).

How P(outcome | I do this or that, sensor data) is computed is a matter of personal epistemic taste, and nothing for a decision theory to have any say about.

(It might be argued that I am steel-manning the normal description of EDT, since most people talking about it seem to make the error of blindly using distributions describing historical data as P(outcome | I do this or that, sensor data), to the point where that got incorporated into the definition. In which case maybe I should be writing about my "new" alternative to CDT in philosophy journals.)

I think you steel-manned EDT so well that you transformed it into CDT, which is a fairly reasonable decision theory in a world without counterfactually linked decisions.
I mean Pearl invented/popularized do(.) in the 1990s sometime. What do you suppose EDT did before do(.) was invented? Saying "ah, p(y | do(x)) is what we meant all along" after someone does the hard work to invent the theory for p(y | do(x)) doesn't get you any points!
I disagree. The calculation of

P(outcome | I do this or that, sensor data)

does not require any use of do when there are no confounding covariates, and in the case of problems such as Newcomb's, you get a different answer to CDT's

P(outcome | do(I do this or that), sensor data):

the CDT solution throws away the information about Omega's prediction.

CDT isn't a catch-all term for "any calculation that might sometimes involve use of do"; it's a specific decision theory that requires you to use P(outcome | do(action), data) for each of the available actions, whether or not that throws away useful information about correlations between yourself and stuff in the past.

EDIT: Obviously, before do() was invented, if you were using EDT you would do what everyone else would do: throw up your hands and say "I can't calculate P(outcome | I do this or that, sensor data); I don't know how to deal with these covariates!". Unless there weren't any, in which case you just go ahead and estimate your P from the data. I've already explained that the use of do() is only an inference tool.

I think you still don't get it. The word "confounder" is causal. In order to define what a "confounding covariate" means, vs. a "non-confounding covariate", you need to already have a causal model. I have a paper in Annals on this topic with someone, actually, because it is not so simple.
So the very statement of "EDT is fine without confounders" doesn't even make sense within the EDT framework. EDT uses the framework of "probability theory." Only statements expressible within probability theory are allowed. Personally, I think it is in very poor taste to silently adopt all the nice machinery causal folks have developed, but not acknowledge that the ontological character of the resulting decision theory is completely different from the terrible state it was before.
Incidentally, the reason CDT fails on Newcomb, etc. is the same -- it lacks a language powerful enough to talk about counterfactually linked decisions, similarly to how EDT lacks the language to talk about confounding. Note: this is an ontological issue, not an algorithmic issue. That is, it's not that EDT doesn't handle confounders properly; it's that it doesn't even have confounders in its universe of discourse. Similarly, CDT only has standard non-linked interventions, and so has no way to even talk about Newcomb's problem.
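A minimal numeric sketch of that divergence on Newcomb's problem, using the standard payoffs and an assumed 99%-accurate predictor (both numbers are my choices, not from the thread):

```python
# Newcomb's problem: opaque box contains $1,000,000 iff Omega predicted "one-box";
# the transparent box always contains $1,000. Assumed predictor accuracy:
ACC = 0.99

def payoff(action, opaque_box_full):
    base = 1_000_000 if opaque_box_full else 0
    return base + (1_000 if action == "two-box" else 0)

# EDT: condition on the action; the action is evidence about the prediction,
# so choosing a correctly predicts the box contents with probability ACC.
edt = {a: ACC * payoff(a, a == "one-box") + (1 - ACC) * payoff(a, a != "one-box")
       for a in ("one-box", "two-box")}

# CDT: intervene on the action; the box contents are causally fixed already,
# so P(full box) is the same prior for both actions (its value cancels in the comparison).
p_full = 0.5  # arbitrary prior over Omega's prediction
cdt = {a: p_full * payoff(a, True) + (1 - p_full) * payoff(a, False)
       for a in ("one-box", "two-box")}

print("EDT expected $:", edt)  # one-box comes out ahead
print("CDT expected $:", cdt)  # two-box comes out ahead, by exactly $1,000
```

The do-style calculation severs the action from the prediction, so two-boxing always dominates by $1,000; conditioning keeps the correlation and prefers one-boxing.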
The right answer here is to extend the language of CDT (which is what TDT et al. essentially do).