This "do" notation may seem mysterious, as it is not part of standard probability theory. However, as Pearl shows in Section 3.2.2 ("Interventions as Variables") of Causality, second edition, all of this can be treated as a notational convenience, as a causal model can be reduced to a certain kind of PGM with variables for interventions, and the "do" notation (or lack of a "do") can be considered to be a statement about the value of an intervention variable.
No, this kind of factorization is used for any probabilistic graphical model (PGM), whether or not it is causal. The difference is that for a causal model an arc from node x to node y additionally indicates that x has a causal influence on y, whereas there is no such assumption in general for PGMs.
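For reference, the factorization in question (with pa_i denoting the values of X_i's parents in the graph) is the standard one for any Bayesian network, causal or not:

P(x_1, ..., x_n) = ∏_i P(x_i | pa_i)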
This example is flawed because the analysis does not condition on all the information you have. The analysis assumes that
P(toxoplasmosis | pet cat, B) > P(toxoplasmosis | not pet cat, B)
where B is your background information. Why should this be so? If B is the background information of an outside observer who does not have access to your inner thoughts and feelings, then this is a plausible claim, because petting or not petting the cat provides evidence as to how fond you are of cats.
But you already know how fond you are of cats. Even if your introspective powers are weak, once you feel (or do not feel) the urge to pet the cat, the Bayesian update to the probability of toxoplasmosis has already happened. If a flawed decision analysis then causes you not to pet the cat, that decision provides no further evidence about your degree of fondness for cats.
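Here is a small numeric sketch of that screening-off point, with hypothetical variables and made-up probabilities: F = fond of cats, A = pet the cat, T = toxoplasmosis, where both A and T depend on F but T does not depend on A. Petting looks like evidence of toxoplasmosis only to someone who doesn't already know F.

```python
# Hypothetical numbers for the cat example: F = fond of cats, A = pet the cat,
# T = toxoplasmosis.  A and T both depend on F, but T does not depend on A,
# so conditioning on F screens the act of petting off from T.

P_F = {0: 0.5, 1: 0.5}          # P(F = f)
P_A_GIVEN_F = {0: 0.1, 1: 0.9}  # P(A = 1 | F = f)
P_T_GIVEN_F = {0: 0.1, 1: 0.4}  # P(T = 1 | F = f)

def joint(f, a, t):
    pa = P_A_GIVEN_F[f] if a else 1 - P_A_GIVEN_F[f]
    pt = P_T_GIVEN_F[f] if t else 1 - P_T_GIVEN_F[f]
    return P_F[f] * pa * pt

def p_t1(a, f=None):
    """P(T = 1 | A = a) if f is None, else P(T = 1 | A = a, F = f)."""
    fs = (0, 1) if f is None else (f,)
    num = sum(joint(ff, a, 1) for ff in fs)
    den = sum(joint(ff, a, t) for ff in fs for t in (0, 1))
    return num / den

print(p_t1(a=1), p_t1(a=0))            # 0.37 vs 0.13: petting looks like evidence...
print(p_t1(a=1, f=1), p_t1(a=0, f=1))  # 0.40 vs 0.40: ...but not once you know F
```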
No, the difference between the two sentences lies entirely in the background information assumed. The first sentence implicitly assumes background information B that includes the fact that someone did, in fact, shoot JFK. The second sentence implicitly assumes that we have some sort of structural equation model (as discussed in Pearl's book Causality) from which we can show that JFK must have been shot -- even if we exclude from our background information all events occurring on or after November 22, 1963.
The expression P(a_x [ ]-> o_i) is meaningless. Probability theory is an extension of classical propositional logic (CPL) to express degrees of certainty; probabilities are assigned to CPL formulas. "a_x [ ]-> o_i" is not a CPL formula, as there is no counterfactual conditional operator in CPL.
This claim is wrong, and the formula is correct. The formula shown is just a special case of the standard formula for the expected value of any random variable.
What is true is that the decision rule of maximizing expected utility can go wrong if you don't condition on all the relevant information when computing expected utility. What the author has written as
P(o_i | a_x) and E[U | a_x]
should actually be written as
P(o_i | a_x, B) and E[U | a_x, B]
where B is all the background information relevant to the problem. Problems with maximizing expected utility arise when B fails to include relevant causal information.
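Written out, the corrected expected-utility expression is just the usual one with B carried along:

E[U | a_x, B] = Σ_i U(o_i) P(o_i | a_x, B)

and the decision rule is to pick the action a_x that maximizes this quantity.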
Probability theory already tells you how to define that. There are no degrees of freedom left once you've defined the outcome and conditioning information. The only possible area of debate is over what specific information we are conditioning on. If two different analyses get different answers, and they're both using probability theory correctly, then they must be conditioning on DIFFERENT information.
This strikes me as pretty shaky reasoning. You've been talking about cases where you have access to the actual decision-making code of the agents you're interacting with, and they have access to yours, and therefore can prove some sort of optimality of a decision-making algorithm. When it comes to voting, none of that applies. We don't even have access to our own decision-making algorithm, much less those of others.
I can't figure out what this paragraph means -- I have no idea what the "et cetera" could be. I'm wondering if "when we have complex systems of..." should be "when we have complex systems such as..."
I got stuck at this paragraph:

A causal model goes beyond the graph by including specific probability functions P(X_i | pa_i) for how to calculate the probability of each node X_i taking on the value x_i given the values pa_i of x_i's immediate ancestors. It is implicitly assumed that the causal model factorizes, so that the probability of any value assignment x to the whole graph can be calculated using the product:

P(x) = ∏_i P(x_i | pa_i)

Then the counterfactual conditional P(x | do(X_j = x_j)) is calculated via:

P(x | do(X_j = x_j)) = ∏_{i ≠ j} P(x_i | pa_i)
First of all, it seems to me that "...Xi taking on the value xi given the values pai of xi's immediate ancestors" should be "...Xi's immediate ancestors" (capital X). Otherwise I didn't understand this part.
Further down, I don't know what "do(Xj=xj)" means and I'm unable to figure out from context. So this is where I stopped reading.
In fairness, I'm not actually a computer scientist, but that's the closest description of me among the advanced courses on this topic.
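A minimal numeric walkthrough of the quoted factorization and of the truncated product used for do() (a generic three-node chain X1 -> X2 -> X3 with made-up probabilities, rather than the graph in the article): observing X2 = 1 shifts our belief about its cause X1, while do(X2 = 1) leaves X1 at its prior.

```python
# Hypothetical chain X1 -> X2 -> X3 with made-up probabilities, illustrating
# P(x) = prod_i P(x_i | pa_i) and the truncated product for do(X2 = x2).

P_X1 = {0: 0.6, 1: 0.4}
P_X2_GIVEN_X1 = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.3, 1: 0.7}}    # P(X2 | X1)
P_X3_GIVEN_X2 = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.25, 1: 0.75}}  # P(X3 | X2)

def p(x1, x2, x3):
    """Full factorization: P(x1) * P(x2 | x1) * P(x3 | x2)."""
    return P_X1[x1] * P_X2_GIVEN_X1[x1][x2] * P_X3_GIVEN_X2[x2][x3]

def p_do_x2(x1, x2, x3):
    """Truncated product for do(X2 = x2): drop the factor for the intervened node."""
    return P_X1[x1] * P_X3_GIVEN_X2[x2][x3]

# P(X1 = 1 | X2 = 1): conditioning on an observation of X2 updates its cause X1.
p_obs = sum(p(1, 1, x3) for x3 in (0, 1)) / sum(
    p(x1, 1, x3) for x1 in (0, 1) for x3 in (0, 1))
# P(X1 = 1 | do(X2 = 1)): forcing X2 tells us nothing about X1.
p_int = sum(p_do_x2(1, 1, x3) for x3 in (0, 1))

print(p_obs)  # 0.28 / 0.40 = 0.7
print(p_int)  # 0.4, i.e. the prior P(X1 = 1)
```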
This isn't quite right as an exposition of Lewis's argument – it elides the distinction between the irrationality of "managing the news" and the way that (according to Lewis) the scenario pre-rewards an irrational choice. Evidential agents don't just "seem" to win – they really do win, because the scenario is set up to arbitrarily pre-reward them for being the kind of agents who one-box. Furthermore, it's claimed that the behaviour which is thereby arbitrarily pre-rewarded is irrational, because it amounts to managing the news.
The sense in which one-boxing is said to be irrational news-management is that doing so will give you evidence that you have been pre-rewarded, but won't causally affect the contents of the box – if you're an evidential agent, and have been pre-rewarded as such, you would still get the $1m if you were to miraculously, unforeseeably switch to two-boxing; and if you're a causal agent, and have been pre-punished as such, you would still not get the $1m if you were to miraculously, unforeseeably switch to one-boxing. The kind of agents that one-box really do do better, but once you've been rewarded for being that kind of person you may as well act contrary to your kind and two-box anyway, despite the negative news value of doing so.
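Concretely, assuming the standard Newcomb payoffs ($1,000,000 in the opaque box iff you were predicted to one-box; $1,000 always in the transparent box), the situation once the prediction has been made is:

opaque box already contains $1,000,000: one-box gets $1,000,000, two-box gets $1,001,000
opaque box already empty: one-box gets $0, two-box gets $1,000

so with the contents fixed, two-boxing gains $1,000 over one-boxing in either case, despite its negative news value.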
Did you just swap the pronouns here? In the previous sentences the speaker was the seller and the listener was the buyer, but now it sounds like it's the other way around.
Is simplified Parfit's Hitchhiker the same as what was described above? I'm uncertain because this is the first time on the page that it's been called "simplified."
This use of "naturally" may be jarring, since it may not feel obvious to the reader just being introduced to logical decision theories that this is how they work. (And I think it is common for readers to be annoyed when an author treats something as obvious that's not clear to them.)
Consider, "As you may suspect, logical decision agents..." or "As we shall see, logical decision agents..."
Okay, read through this section again, and I think it makes sense to me now. Would love to see an explicit walkthrough of the calculation with actual numbers though.
I got lost here (and in the following equations). I think it's a combination of needing the "factorizes" redlink filled in, and not understanding the do() syntax.
Ah, one additional thing I'm confused about -- what do Xi and xi refer to? I thought Xi referred to the node (so that SEASON would be X0, {RAINING, SPRINKLER} X1, {SIDEWALK} X2, and {SLIPPERY} X3), but then I'm not sure what lowercase xi would refer to...
The red link directs to an article called "Here". The symbol before the "->" doesn't render for me. The link to the formal paper doesn't have anything.
This "do" notation may seem mysterious, as it is not part of standard probability theory. However, as Pearl shows in Section 3.2.2 ("Interventions as Variables") of Causality, second edition, all of this can be treated as a notational convenience, as a causal model can be reduced to a certain kind of PGM with variables for interventions, and the "do" notation (or lack of a "do") can be considered to be a statement about the value of an intervention variable.
See also this Powerpoint deck.
No, this kind of factorization is used for any probabilistic graphical model (PGM), whether or not it is causal. The difference is that for a causal model an arc from node x to node y additionally indicates that x has a causal influence on y, whereas there is no such assumption in general for PGMs.
This example is flawed because the analysis does not condition on all the information you have. The analysis assumes that
P(toxoplasmosis | pet cat, B) > P(toxoplasmosis | not pet cat, B)
where B is your background information. Why should this be so? If B is the background information of an outside observer who does not have access to your inner thoughts and feelings, then this is a plausible claim, because petting or not petting the cat provides evidence as to how fond you are of cats.
But you already know how fond you are of cats. Even if your introspective powers are week, once you feel (or do not feel) the urge to pet the cat, the Bayesian update for probability of toxoplasmosis has already happened. If a flawed decision analysis causes you to not pet the cat, that decision provides no further evidence about your degree of fondness for cats.
No, the difference between the two sentences lies entirely in the background information assumed. The first sentence implicitly assumes background information B that includes the fact that someone did, in fact, shoot JFK. The second sentence implicitly assumes that we have some sort of structural equation model (as discussed in Pearl's book Causality) from which we can show that JFK must have been shot -- even if we exclude from our background information all events occurring on or after November 22, 1963.
The expression P(a_x [ ]-> o_i) is meaningless. Probability theory is an extension of classical propositional logic (CPL) to express degrees of certainty; probabilities are assigned to CPL formulas. "a_x [ ]-> o_i" is not a CPL formula, as there is no counterfactual conditional operator in CPL.
This claim is wrong, and the formula is correct. The formula shown is just a special case of the standard formula for the expected value of any random variable.
What is true is that the decision rule of maximizing expected utility can go wrong if you don't condition on all the relevant information when computing expected utility. What the author has written as
P(o_i | a_x) and E[U | a_x]
should actually be written as
P(o_i | a_x, B) and E[U | a_x, B]
where B is all the background information relevant to the problem. Problems with maximizing expected utility arise when B fails to include relevant causal information.
Probability theory already tells you how to define that. There are no degrees of freedom left once you've defined the outcome and conditioning information. The only possible area of debate is over what specific information we are conditioning on. If two different analyses get different answers, and they're both using probability theory correctly, then they must be conditioning on DIFFERENT information.
This strikes me as pretty shaky reasoning. You've been talking about cases where you have access to the actual decision-making code of the agents you're interacting with, and they have access to yours, and therefore can prove some sort of optimality of a decision-making algorithm. When it comes to voting, none of that applies. We don't even have access to our own decision-making algorithm, much less those of others.
No idea what this means.
and I can toss it?
I find this sentence hard to read. Maybe the punctuation marks are in the wrong places?
I'm not sure I understand this part. Did you get "roughly 2" just by dividing 13 by 7?
Why should the first 7 respondents think of themselves as being part of the first 7 (rather than the first 4, etc)?
pick one
person
due to
Above, first-person pronouns referred to Player 2, but now they seem to refer to Player 1. Was the switch intentional?
Two L's
comma?
use colon instead?
$10