Comment author: IlyaShpitser 20 October 2015 05:43:36PM *  2 points [-]

I think these are isomorphic: estimating E[Y] when Y is missing at random conditional on C is the same as estimating E[Y | do(a)] = E[Y | "we assign you to a given C"].

"Causal inference is a missing data problem, and missing data is a causal inference problem."


Or I may be "missing" something. :)

Comment author: snarles 20 October 2015 06:38:32PM *  1 point [-]

Yes, I think you are missing something (although it is true that causal inference is a missing data problem).

It may be easier to think in terms of the potential outcomes model. Y0 is the outcome under no treatment, Y1 is the outcome under treatment; you only ever observe either Y0 or Y1, depending on whether D = 0 or 1. Generally you are trying to estimate E[Y1], E[Y0], or their difference.
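
A minimal simulation of this setup (all numbers hypothetical: a constant treatment effect of 2 and purely random assignment, chosen just for illustration) might look like:

```python
import random

random.seed(0)

# Each unit has both potential outcomes, but only one is ever observed.
n = 100_000
y0 = [random.gauss(0, 1) for _ in range(n)]            # outcome under D = 0
y1 = [y + 2 for y in y0]                               # outcome under D = 1
d = [random.random() < 0.5 for _ in range(n)]          # random treatment assignment
y_obs = [y1[i] if d[i] else y0[i] for i in range(n)]   # the only Y we ever see

# Under random assignment, a difference of observed means estimates E[Y1] - E[Y0].
treated = [y_obs[i] for i in range(n) if d[i]]
control = [y_obs[i] for i in range(n) if not d[i]]
ate_hat = sum(treated) / len(treated) - sum(control) / len(control)
```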

The point is that the quantity Robins and Wasserman are trying to estimate, E[Y], does not depend on the importance sampling distribution. Whereas the quantity I am trying to estimate, E[Y|f(X)], does depend on f. Changing f changes the population quantity to be estimated.

It is true that sometimes people in causal inference are interested in estimating things like E[Y1 - Y0 | D], e.g. "the treatment effect on the treated." However, this is still different from my setup, because D is a random variable, as opposed to an arbitrary function of the known variables like f(X).
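
To make the dependence on f concrete, here is a small sketch (the outcome function g(x) = x^2, the noise level, and the two thresholds are all invented for illustration, not taken from the example):

```python
import random

random.seed(1)

# Y depends on X through some unknown g; f is an arbitrary known function of X.
n = 100_000
xs = [random.random() for _ in range(n)]
ys = [x ** 2 + random.gauss(0, 0.05) for x in xs]

def cond_mean(f):
    """Average Y over the subpopulation with f(X) = 1."""
    sel = [y for x, y in zip(xs, ys) if f(x) == 1]
    return sum(sel) / len(sel)

# Two different choices of f define two different population quantities.
m1 = cond_mean(lambda x: 1 if x > 0.5 else 0)   # estimates E[Y | X > 0.5]
m2 = cond_mean(lambda x: 1 if x > 0.9 else 0)   # estimates E[Y | X > 0.9]
```

Changing f from the first threshold to the second changes the estimand itself, not just the data used to estimate it.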

Comment author: IlyaShpitser 19 October 2015 10:45:04PM *  3 points [-]

OP will correct me if I am wrong, but I think he is trying to restate the Robins/Wasserman example. You do not need to model f(X); the point of that example is that you know f, but the conditional model for Y is very, very complicated. So you either do a Bayesian approach with a prior and a likelihood for Y, or you just use Horvitz-Thompson with f.

I like to think of that example using causal inference: you want to estimate the causal effect p(Y | do(A)) of A on Y when the policy for assigning treatment A, p(A | C), is known exactly, but p(Y | A, C) is super complex. Likelihood-based methods, like being Bayesian, will use \sum_C p(Y | A, C) p(C). But you can just look at (1/n) \sum_i Y_i 1{A_i = a} / p(a | C_i) over the samples to get the same thing and avoid modeling p(Y | A, C). But doing that isn't Bayesian.
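
A toy version of that contrast (the propensity function and the "complex" outcome model below are made up for illustration; only the known policy p(A | C) is used by the estimator):

```python
import math
import random

random.seed(2)

def propensity(c):
    # The assignment policy p(A = 1 | C = c) is known exactly.
    return 0.2 + 0.6 * c

# The outcome model p(Y | A, C) is "complex" and is never modeled below.
n = 200_000
total = 0.0
for _ in range(n):
    c = random.random()                                  # C ~ Uniform(0, 1)
    a = 1 if random.random() < propensity(c) else 0
    y = math.sin(5 * c) + a + random.gauss(0, 0.1)
    # Horvitz-Thompson / inverse-probability weighting for E[Y | do(A = 1)]:
    # average Y * 1{A = 1} / p(A = 1 | C) over the samples.
    if a == 1:
        total += y / propensity(c)
ht_estimate = total / n
```

Here the truth is E[sin(5C)] + 1 = (1 - cos 5)/5 + 1, and the weighted average recovers it without ever fitting p(Y | A, C).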

See also this:

http://www.biostat.harvard.edu/robins/coda.pdf

I think we talked about this before.

Comment author: snarles 20 October 2015 05:22:47PM 2 points [-]

My example is very similar to the Robins/Wasserman example, but you end up drawing different conclusions. Robins/Wasserman show that you can't make sense of importance sampling in a Bayesian framework. My example shows that you can't make sense of "conditional sampling" in a Bayesian framework. The goal of importance sampling is to estimate E[Y], while the goal of conditional sampling is to estimate E[Y|event] for some event.

We did talk about this before, that's how I first learnt of the R/W example.

Comment author: RichardKennaway 19 October 2015 10:30:46PM 1 point [-]

There are a couple of things I'm not understanding here.

Firstly, the example of the cancer survival test seems to have some inconsistency. The fitted model is said to give the right answer in 990 out of 1000 test cases. Where do you subsequently get the Beta(1000,2) distribution from? I am not seeing the source of that 2. And given that the model is right on exactly 99% of the test cases, how is the imaginary Bayesian coming up with a clearly wrong interval [0.996,0.9998]?

Secondly, in the later example of estimating E[ Y | f(X)=1 ], the method foisted on the Bayesian appears to involve estimating the whole of the function f. This seems to me an obviously misguided approach to the problem, whatever one's views on statistical argument. Why cannot the Bayesian say, with the frequentist: it doesn't matter what f is; I have been asked about the population for which f(X)=1; I do not need to model the process f by which that population was selected, only the behaviour of Y within that population? And then proceed in the usual way.

Comment author: snarles 19 October 2015 10:43:23PM *  2 points [-]

> I do not need to model the process f by which that population was selected, only the behaviour of Y within that population?

There are some (including myself and presumably some others on this board) who see this practice as epistemologically dubious. First, how do you decide which aspects of the problem to incorporate into your model? Why should one only try to model E[Y|f(X)=1] and not the underlying function g(x) = E[Y|x]? If you actually had very strong prior information about g(x), say "I know g(x) = h(x) with probability 1/2 or g(x) = j(x) with probability 1/2," where h(x) and j(x) are known functions, then most statisticians would incorporate the underlying function g(x) in the model; and in that case, data for observations with f(X) = 0 might be informative about whether g(x) = h(x) or g(x) = j(x). So if the prior is weak (as it is in my main post) you don't model the function, and if the prior is strong, you model the function (and therefore make use of all the observations)? Where do you draw the line?
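
The two-point-prior case can be sketched directly (h, j, the noise level, and the design points are all invented for illustration):

```python
import math
import random

random.seed(3)

# g is either h or j, each with prior probability 1/2, and Y = g(X) + noise.
h = lambda x: x
j = lambda x: 1 - x
sigma = 0.2

truth = h
data = [(x / 10, truth(x / 10) + random.gauss(0, sigma)) for x in range(10)]

def loglik(g):
    # Gaussian log-likelihood of the observed (x, y) pairs under g, up to a constant.
    return sum(-((y - g(x)) ** 2) / (2 * sigma ** 2) for x, y in data)

# With equal prior odds, the posterior log-odds equal the log-likelihood ratio.
log_odds = loglik(h) - loglik(j)
post_h = 1 / (1 + math.exp(-log_odds))   # posterior probability that g = h
```

Every (x, y) pair moves this posterior, whether or not that x would satisfy f(x) = 1, which is the sense in which the f(X) = 0 observations become informative.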

I agree, most statisticians would not model g(x) in the cancer example. But is that because they have limited time and resources (and are possibly lazy), and because an overcomplicated model would confuse their audience anyway? Or because they legitimately think that it's an objective mistake to use a model involving g(x)?

Comment author: gjm 19 October 2015 10:21:35PM 2 points [-]

> Beta(1000,2)

Was that meant to be Beta(1000,10)? (With appropriately updated probabilities as a result?)

Comment author: snarles 19 October 2015 10:36:12PM 1 point [-]

Good catch, it should be Beta(991, 11). The prior is uniform = Beta(1, 1) and the data is (990 successes, 10 failures).
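
For concreteness, the update and a rough interval (using a normal approximation rather than exact Beta quantiles):

```python
import math

# Uniform Beta(1, 1) prior plus 990 successes and 10 failures
# gives a Beta(991, 11) posterior over the accuracy.
alpha = 1 + 990
beta = 1 + 10

mean = alpha / (alpha + beta)   # posterior mean = 991/1002, about 0.989
var = alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1))
sd = math.sqrt(var)

# Normal approximation to a central 95% credible interval.
lo, hi = mean - 1.96 * sd, mean + 1.96 * sd
```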

Comment author: snarles 13 October 2015 06:55:28PM 2 points [-]

How do you get the top portion of the second payoff matrix from the first? Intuitively, it should be by replacing Agent A's payoff with the sum of the agents' payoffs, but the numbers don't match.

Most people are altruists but only to their in-group, and most people have very narrow in-groups. What you mean by an altruist is probably someone who is both altruistic and has a very inclusive in-group. But as far as I can tell, there is a hard trade-off between belonging to a close-knit, small in-group and identifying with a large, diverse but weak in-group. The time you spend helping strangers is time taken away from potentially helping friends and family.

Comment author: JamesPfeiffer 17 September 2015 10:51:56PM 1 point [-]

1) We don't need an unbounded utility function to demonstrate Pascal's Mugging. Plain old large numbers like 10^100 are enough.

2) It seems reasonable for utility to be linear in things we care about, e.g. human lives. This could run into a problem with non-uniqueness, i.e., if I run an identical computer program of you twice, maybe that shouldn't count as two. But I think this is sufficiently murky as to not make bounded utility clearly correct.

Comment author: snarles 11 October 2015 10:23:10PM *  0 points [-]

Like V_V, I don't find it "reasonable" for utility to be linear in things we care about.

I will write a discussion topic about the issue shortly.

EDIT: Link to the topic: http://lesswrong.com/r/discussion/lw/mv3/unbounded_linear_utility_functions/

Comment author: snarles 17 September 2015 06:27:26PM *  2 points [-]

I'll need some background here. Why aren't bounded utilities the default assumption? You'd need some extraordinary arguments to convince me that anyone has an unbounded utility function. Yet this post and many others on LW seem to implicitly assume unbounded utility functions.

Comment author: snarles 26 August 2015 09:38:51PM *  2 points [-]

Let's talk about Von Neumann probes.

Assume that the most successful civilizations exist digitally. A subset of those civilizations would selfishly pursue colonization; the most convenient means would be through Von Neumann machines.

Tipler (1981) pointed out that due to exponential growth, such probes should already be common in our galaxy. Since we haven't observed any, we must be alone in the universe. Sagan and Newman countered that intelligent species should actually try to destroy probes as soon as they are detected. This counterargument, known as "Sagan's response," doesn't make much sense if you assume that advanced civilizations exist digitally. For these civilizations, the best way to counter another race of Von Neumann probes is with their own Von Neumann probes.

Others (who have not been identified by the Wikipedia article) have tried to explain the visible absence of probes by theorizing how civilizations might deliberately limit the expansion range of their probes. But why would any expansionist civilization even want to do so? One explanation would be to avoid provoking other civilizations. However, it still remains to be explained why the very first civilizations, which had no reason to fear other alien civilizations, would limit their own growth. Indeed, any explanation of the Fermi paradox has to explain why the very first civilization would not have already colonized the universe, given that it was likely aware of its uncontested claim to the universe.

The first civilization either became dominated by a singleton, or remained diversified into the space age. For the following theory, we have to assume the latter--besides, we should hope for our own sake that singletons don't always win. If the civilization remains diverse, at least some of the factions transition to a digital existence, and given the advantages provided for civilizations existing in that form, we could expect the digitalized civilizations to dominate.

Digitalized civilizations still have a wide range of possible value systems. There exist hedonistic civilizations, which gain utility from having immense computational power for recreational simulations or proving useless theorems, and there also exist civilizations which are more practically focused on survival. But any type of civilization has to act in self-preservation.

Details of the strategic interactions of the digitalized civilizations depend on speculative physics and technology, particularly the economics of computation. If there are dramatic economies of scale in computation (for example, if quantum computers provide an exponential scaling of utility with cost), then it becomes plausible that distinct civilizations would cooperate. However, all known economies of scale have limits, in which case the most likely outcome is for distinct factions to maintain control of their own computing resources. Without such an incentive for cooperation, the civilizations would have to be wary of threats from the other civilizations.

Any digitalized civilization has to protect itself from being compromised from within. Rival civilizations with completely incompatible utility functions could still exploit each other's computing resources. Hence, questions about the theoretical limitations of digital security and data integrity could be relevant to predicting the behavior of advanced civilizations. It may turn out to be easy for any civilization to protect a single computational site. However, any civilization expanding to multiple sites would face a much trickier security problem. Presumably, the multiple sites should be able to interact in some way, since otherwise, what is the incentive to expand? However, any interaction between a parent site and a child site opens the parent site (and therefore the entire network) to compromise.

Colonization sites near any particular civilization quickly become occupied, hence a civilization seeking to expand would have to send a probe to a rather distant region of space. The probe should be able to independently create a child site, and eventually this child site should be able to interact with the parent site. However, this requires the probe to carry some kind of security credentials which would allow the child site to be authenticated by the parent site in the future. These credentials could potentially be compromised by an aggressor. The probe has a limited capacity to protect itself from compromise, and hence there is a possibility that an aggressor could "capture" the probe without being detected by the probe itself. Thus, even if the probe has self-destruction mechanisms, they could be circumvented by a sufficiently sophisticated attack. A compromised probe would behave exactly like a normal probe and succeed in creating a child site. However, after the compromised child site has started to interact with the parent, it can at some point launch an attack and capture the parent network for the sake of the aggressor.

Due to these considerations, civilizations may be wary of sending Von Neumann probes all over the universe. Civilizations may still send groups of colonization probes, but the probes may delay colonization so as to hide their presence. One might imagine that a "cold war" is already in progress in the universe, with competing probes lying hidden even within our own galaxy, but lying in stalemate for billions of years.

Yet new civilizations are basically unaffected by the cold war: they have nothing to lose from creating a parent site. Nevertheless, once a new civilization reaches a certain size, it has too much to lose from making unsecured expansions.

But some civilizations might be content to simply make independent, non-interacting "backups" of themselves, and so have nothing to fear if their probes are captured. It still remains to explain why the universe isn't visibly filled with these simplistic "backup" civilizations.

Comment author: Dorikka 10 August 2015 01:28:59PM 2 points [-]

What would you like to learn about?

Comment author: snarles 14 August 2015 04:27:56PM 2 points [-]

Sociology, political science and international politics, economics (graduate level), psychology, psychiatry, medicine.

Comment author: Dorikka 10 August 2015 01:30:13PM 2 points [-]

What topics might you be able to teach others about?

Comment author: snarles 14 August 2015 04:06:52AM 2 points [-]

Undergraduate mathematics, Statistics, Machine Learning, Intro to Apache Spark, Intro to Cloud Computing with Amazon
