
Why IQ shouldn't be considered an external factor

2 estimator 04 April 2015 05:58PM

This is a sort-of response to this post.

"Things under your control" (more generally, free will) is an ill-defined concept: you are an entity within physics; all of your actions and thoughts are fully determined by physical processes in your brain. Here, I will assume that "things under your control" are any things that are controlled by your brain, since it is a consistent definition, and it's what people usually mean when they talk about things under one's control.

So, you may be interested in the question: how much does one's success depend on one's thoughts and actions (i.e. things that are controlled by one's brain), and how much on circumstances and environment (i.e. things that aren't)? Another formulation: how much could you change someone's life outcomes if you could alter the neural signals emitted by their brain?

We could also draw the borderline somewhere else; maybe add physical traits, like height or attractiveness, to the "internal factors" category, or maybe assign some brain parts to the "external factors" category. The question of whether your life success is mostly determined by "internal factors" or "external factors" would remain valid -- and we call it the "internal vs. external locus of control" question.

But what happens when we assign IQ to the "external factors" category?

An IQ test is an attempt to measure some value, which is supposed to capture something like the quality of one's thinking process. So, this value can be seen as a function IQ(brain), which maps brains to numbers. Your thoughts and actions don't depend on your IQ score; your IQ score depends on your thoughts. That's how the causal arrows are arranged.

It is still possible to ask what we could change if we could change the brain while holding the IQ score fixed. But then the "free will" intuition collapses; it's hard to imagine what we could change if our thought processes were restricted in some weird way. And such a question is hardly practical, in my opinion. It's true that one can measure one's IQ, and that IQ rarely changes much, but still: if you consider IQ a fixed, external factor outside your control, then you must consider your thought processes restricted to some set and therefore not totally under your control.

Define "things under your control" as "things under your brain neural signals' control", and then we will have a consistent definition, and we will find ourselves in the common sense domain. Declare that everything is under control of physics, and then we will, again, have a consistent definition of "things under your control" (empty set), and now we are in the physics domain. Both cases are quite intuitive.

But when we consider IQ external, "things under your control" are your thoughts, but not quite; we can control our thoughts, but only as long as they reside on some weird manifold of thought-space. I guess that in such a case, your "free will" intuitions would be disrupted. Basically, we can't slice some part of what we call "personality" out and still keep our intuitions about personality and free will sane.

TL;DR: You shouldn't consider any function of your current brain state as external when discussing locus of control, since such a viewpoint is actually counterintuitive and, therefore, makes you prone to errors.

More marbles and Sleeping Beauty

4 Manfred 23 November 2014 02:00AM

I

Previously I talked about an entirely uncontroversial marble game: I flip a coin, and if Tails I give you a black marble, if Heads I flip another coin to either give you a white or a black marble.

The probabilities of seeing the two marble colors are 3/4 for black and 1/4 for white, and the probabilities of Heads and Tails are 1/2 each.

The marble game is analogous to how a 'halfer' would think of the Sleeping Beauty problem - the claim that Sleeping Beauty should assign probability 1/2 to Heads relies on the claim that your information for the Sleeping Beauty problem is the same as your information for the marble game - same possible events, same causal information, same mutual exclusivity and exhaustiveness relations.

So what's analogous to the 'thirder' position, after we take into account that we have this causal information? Is it some difference in causal structure, or some non-causal anthropic modification, or something even stranger?

As it turns out, nope, it's the same exact game, just re-labeled.

In the re-labeled marble game you still have two unknown variables (represented by flipping coins), and you still have a 1/2 chance of black and Tails, a 1/4 chance of black and Heads, and a 1/4 chance of white and Heads.

And then to get the thirds, you ask the question "If I get a black marble, what is the probability of the faces of the first coin?" Now you update to P(Heads|black)=1/3 and P(Tails|black)=2/3.
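As a quick check of the numbers above, here is a minimal sketch that writes down the joint distribution of the re-labeled marble game and reads off both the halfer-style marginal and the thirder-style conditional. The helper names `marginal` and `conditional` are made up for illustration.

```python
from fractions import Fraction

half = Fraction(1, 2)

# Joint distribution over (first coin, marble colour) in the marble game:
# Tails always gives black; Heads flips a second fair coin for the colour.
joint = {
    ("Tails", "black"): half,
    ("Heads", "black"): half * half,
    ("Heads", "white"): half * half,
}

def marginal(index, value):
    """Probability that component `index` of the outcome equals `value`."""
    return sum(p for outcome, p in joint.items() if outcome[index] == value)

def conditional(coin, colour):
    """P(coin | colour), obtained by dividing the joint by the colour marginal."""
    return joint.get((coin, colour), Fraction(0)) / marginal(1, colour)

print("P(Heads) =", marginal(0, "Heads"))
print("P(black) =", marginal(1, "black"))
print("P(Heads | black) =", conditional("Heads", "black"))
print("P(Tails | black) =", conditional("Tails", "black"))
```

The same table gives 1/2 when you ask about the coin alone and 1/3 when you condition on black, which is the sense in which the two positions are one game with two questions.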

II

Okay, enough analogies. What's going on with these two positions in the Sleeping Beauty problem?

[Figure: causal diagram 1 (left) and causal diagram 2 (right)]

Here are two different diagrams, which are really re-labelings of the same diagram. The first labeling is the problem where P(Heads|Wake) = 1/2. The second labeling is the problem where P(Heads|Wake) = 1/3. The question at hand is really - which of these two math problems corresponds to the word problem / real world situation?

As a refresher, here's the text of the Sleeping Beauty problem that I'll use: Sleeping Beauty goes to sleep in a special room on Sunday, having signed up for an experiment. A coin is flipped - if the coin lands Heads, she will only be woken up on Monday. If the coin lands Tails, she will be woken up on both Monday and Tuesday, but with memories erased in between. Upon waking up, she then assigns some probability to the coin landing Heads, P(Heads|Wake).

Diagram 1:  First a coin is flipped to get Heads or Tails. There are two possible things that could be happening to her, Wake on Monday or Wake on Tuesday. If the coin landed Heads, then she gets Wake on Monday. If the coin landed Tails, then she could either get Wake on Monday or Wake on Tuesday (in the marble game, this was mediated by flipping a second coin, but in this case it's some unspecified process, so I've labeled it [???]).  Because all the events already assume she Wakes, P(Heads|Wake) evaluates to P(Heads), which just as in the marble game is 1/2.

This [???] node here is odd, can we identify it as something natural? Well, it's not Monday/Tuesday, like in diagram 2 - there's no option that even corresponds to Heads & Tuesday. I'm leaning towards the opinion that this node is somewhat magical / acausal, just hanging around because of analogy to the marble game. So I think we can take it out. A better causal diagram with the halfer answer, then, might merely be Coin -> (Wake on Monday / Wake on Tuesday), where Monday versus Tuesday is not determined at all by a causal node, merely informed probabilistically to be mutually exclusive and exhaustive.

Diagram 2:  A coin is flipped, Heads or Tails, and also it could be either Monday or Tuesday. Together, these have a causal effect on her waking or not waking - if Heads and Monday, she Wakes, but if Heads and Tuesday, she Doesn't wake. If Tails, she Wakes. Her pre-Waking prior for Heads is 1/2, but upon waking, the event Heads, Tuesday, Don't Wake gets eliminated, and after updating P(Heads|Wake)=1/3.
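The update in diagram 2 can be sketched by brute-force enumeration. This assumes, beyond what the text states explicitly, that the pre-waking prior treats Monday and Tuesday as equally likely and independent of the coin; the function name `wakes` is hypothetical.

```python
from fractions import Fraction
from itertools import product

def wakes(coin, day):
    """Diagram 2: the only combination in which she is not woken is Heads on Tuesday."""
    return not (coin == "Heads" and day == "Tuesday")

# Assumed prior: coin and day are independent and each is uniform.
prior = {
    (coin, day): Fraction(1, 4)
    for coin, day in product(["Heads", "Tails"], ["Monday", "Tuesday"])
}

p_wake = sum(p for (coin, day), p in prior.items() if wakes(coin, day))
p_heads_and_wake = sum(
    p for (coin, day), p in prior.items() if coin == "Heads" and wakes(coin, day)
)
p_heads_given_wake = p_heads_and_wake / p_wake

print("P(Wake) =", p_wake)
print("P(Heads | Wake) =", p_heads_given_wake)
```

Eliminating the single "Heads, Tuesday, Don't Wake" cell leaves three equally weighted cells, only one of which is Heads.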

There's a neat asymmetry here. In diagram 1, when the coin was Heads she got the same outcome no matter the value of [???], and only when the coin was Tails were there really two options. In Diagram 2, when the coin is Heads, two different things happen for different values of the day, while if the coin is Tails the same thing happens no matter the day.

 

Do these seem like accurate depictions of what's going on in these two different math problems? If so, I'll probably move on to looking closer at what makes the math problem correspond to the word problem.

Deriving probabilities from causal diagrams

5 Manfred 13 November 2014 12:28AM

What this is: an attempt to examine how causal knowledge gets turned into probabilistic predictions.

I'm not really a fan of any view of probability that involves black boxes. I want my probabilities (or more practically, the probabilities of toy agents in toy problems I consider) to be derivable from what I know in a nice clear way, following some desideratum of probability theory at every step.

Causal knowledge sometimes looks like a black box, when it comes to assigning probabilities, and I would like to crack open that box and distribute the candy inside to smiling children.

What this is not: an attempt to get causal diagrams from constraints on probabilities.

That would be silly - see Pearl's article that was recently up here. Our reasonable desire is the reverse: getting the constraints on probabilities from the causal diagrams.

 

The Marble Game

Consider marbles. First, I use some coin-related process to get either Heads or Tails. If Tails, I give you a black marble. If Heads, I use some other process to choose between giving you a black marble or a white marble.

Causality is an important part of the marble game. If I manually interfere with the process that gives Heads or Tails, this can change the probability you should assign of getting a black marble. But if I manually interfere with the process that gives you white or black marbles, this won't change your probability of seeing Heads or Tails.

 

What I'd like versus what is

The fundamental principle of putting numbers to beliefs, that always applies, is to not make up information. If I don't know of any functional differences between two events, I shouldn't give them different probabilities. But going even further - if I learn a little information, it should only change my probabilities a little.

The general formulation of this is to make your probability distribution consistent with what you know, in the way that contains the very least information possible (or conversely, the maximum entropy). This is how to not make up information.

I like this procedure; if we write down pieces of knowledge as mathematical constraints, we can find the correct distribution by solving a single optimization problem. Very elegant. Which is why it's a shame that this isn't at all what we do for causal problems.

Take the marble game. To get our probabilities, we start with the first causal node, figure out the probability of Heads without thinking about marbles at all (that's easy, it's 1/2), and then move on to the marbles while taking the coin as given (3/4 for black and 1/4 for white).

One cannot do this problem without using causal information. If we neglect the causal diagram, our information is the following: A: We know that Heads and Tails are mutually exclusive and exhaustive (MEE), B: we know that getting a black marble and getting a white marble are MEE, and C: we know that if the coin is Tails, you'll get a black marble.

This leaves three MEE options: Tails and Black (TB), HB, and HW. Maximizing entropy, they all get probability 1/3.
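To make the comparison concrete, here is a small sketch that computes the Shannon entropy (in nats) of the uniform answer over TB, HB and HW, and of the answer you get when you do use the causal diagram. The dictionary names are only labels for this example.

```python
import math

def entropy(dist):
    """Shannon entropy in nats of a distribution given as {outcome: probability}."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)

# Using only A, B and C: three mutually exclusive, exhaustive outcomes, no other constraints.
without_causal_info = {"TB": 1 / 3, "HB": 1 / 3, "HW": 1 / 3}

# Using the causal diagram as well: coin first, then the marble.
with_causal_info = {"TB": 1 / 2, "HB": 1 / 4, "HW": 1 / 4}

print("entropy without causal info:", entropy(without_causal_info))
print("entropy with causal info:   ", entropy(with_causal_info))
```

The uniform distribution has the larger entropy (ln 3), so the causal answer must be encoding some extra constraint; the rest of the post is about what that constraint is.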

One could alternately think of it like this: if we don't have the causal part of the problem statement (the causal diagram D), we don't know whether the coin causes the marble choice, or the marble causes the coin choice - why not pick a marble first, and if it's W we give you an H coin, but if it's B we flip the coin? Heck, why have one cause the other at all? Indeed, you should recover the 1/3 result if you average over all the consistent causal diagrams.

So my question is - what causal constraints is our distribution subject to, and what is it optimizing? Not piece by piece, but all at once?

 

Rephrasing the usual process

One method is to just do the same steps as usual, but to think of the rationale in terms of knowledge / constraints and maximum entropy.

We start with the coin, and we say "because the coin's result isn't caused by the marbles, no information pertaining to marbles matters here. Therefore, P(H|ABCD) is just P(H|A) = 1/2" (First application of maximum entropy). Then we move on to the marbles, and applying information B and C, plus maximum entropy a second time, we learn that P(B|ABCD) = 3/4. All that our causal knowledge really meant for our probabilities was the equation P(H|ABCD)=P(H|A).
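The two-step procedure can be sketched as follows, under the assumption that "maximum entropy with no further constraints" on a finite set of options just means the uniform distribution over them. The helper `uniform` and the `allowed` table are illustrative names, not anything from a library.

```python
from fractions import Fraction

def uniform(options):
    """Maximum-entropy distribution over a finite set with no other constraints."""
    return {option: Fraction(1, len(options)) for option in options}

# First application: the coin node, ignoring everything about marbles.
p_coin = uniform(["H", "T"])

# Second application: the marble node, taking the coin as given.
# Information C says Tails forces a black marble; Heads leaves both colours open.
allowed = {"H": ["B", "W"], "T": ["B"]}
p_marble_given_coin = {coin: uniform(options) for coin, options in allowed.items()}

# Combine the two steps into a joint distribution.
joint = {
    (coin, marble): p_coin[coin] * p
    for coin, cond in p_marble_given_coin.items()
    for marble, p in cond.items()
}

p_black = sum(p for (coin, marble), p in joint.items() if marble == "B")
print("P(H) =", p_coin["H"], " P(B) =", p_black)
```

Each node gets its own maximization, and the only place the causal diagram enters is the decision to settle the coin before looking at the marbles.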

Alternatively, what if we only wanted to maximize something once, but let causal knowledge change the thing we were maximizing? We can say something like "we want to minimize the amount of information about the state of the coin, since that's the first causal node, and then minimize the amount of information about its descendant node, the marble." Although this could be represented as one equation using Lagrange multipliers, it's clearly the same process just with different labels.

 

Is it even possible to be more elegant?

Both of these approaches are... functional. I like the first one a lot better, because I don't want to even come close to messing with the principle of maximum entropy / minimal information. But I don't like that we never get to apply this principle all at once. Can we break our knowledge down more so that everything happens nicely and elegantly?

The way we stated our knowledge above was as P(H|ABCD) = P(H|A). But this is equivalent to the statement that there's a symmetry between the left and right branches coming out of the causal node. We can express this symmetry using the equivalence principle as P(H)=P(T), or as P(HB)+P(HW)=P(TB).

But note that this is just hiding what's going on, because the equivalence principle is just a special case of the maximum entropy principle - we might as well just require that P(H)=1/2 but still say that at the end we're "maximizing entropy subject to this constraint."
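Here is a rough numerical sketch of the "all at once" version: impose P(HB)+P(HW)=P(TB) as a constraint and maximize entropy over what is left. With normalisation, the constraint pins P(TB) to 1/2, so the only free parameter is x = P(HB). The grid search is just an illustrative stand-in for a proper optimizer.

```python
import math

def entropy(probs):
    """Shannon entropy in nats of a list of probabilities."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# Constraint P(HB) + P(HW) = P(TB) together with normalisation gives
# P(TB) = 1/2, P(HB) = x, P(HW) = 1/2 - x for some x in [0, 1/2].
steps = 1000
candidates = [i / (2 * steps) for i in range(steps + 1)]

best_x = max(candidates, key=lambda x: entropy([0.5, x, 0.5 - x]))

print("P(TB) = 0.5, P(HB) =", best_x, ", P(HW) =", 0.5 - best_x)
```

The maximum lands on the usual 1/2, 1/4, 1/4 answer, which fits the point above: the constraint does all the causal work, and the entropy maximization afterwards is the same as ever.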

 

Answer: Probably not

The general algorithm followed above is, for each causal node, to insert the condition that the probabilities of outputs of that node, given the starting information including the causal diagram, are equal to the probabilities given only the starting information related to that node or its parents - information about the descendants does not help determine probabilities of the parents.

Understanding Simpson's Paradox

11 Vaniver 18 September 2013 07:07PM

An article by Judea Pearl, available here. It's quick at 8 pages, and worth reading if you enjoy statistics (though I think people who already are familiar with the math of causality [1] will get more out of it than others [2]). I'll talk here about the part that I think is generally interesting:


The difference between Determinism & Pre-determination

3 RogerS 25 July 2013 11:41AM

1. Scope

 

There are two arm-waving views often expressed about the relationship between “determinism/causality” on the one hand and “predetermination/predictability in principle” on the other. The first treats them as essentially interchangeable: what is causally determined from instant to instant is thereby predetermined over any period - the Laplacian view. The second view is that this is a confusion, and they are two quite distinct concepts. What I have never seen thoroughly explored (and therefore propose to make a start on here) is the range of different cases which give rise to different relationships between determinism and predetermination. I will attempt to illustrate that, indeed, determinism is neither a necessary nor a sufficient condition for predetermination in the most general case.

To make the main argument clear, I will relegate various pedantic qualifications, clarifications and comments to [footnotes].

Most of the argument relates to cases of a physically classical, pre-quantum world (which is not as straightforward as often assumed, and certainly not without relevance to the world we experience). The difference that quantum uncertainty makes will be considered briefly at the end.

 

2. Instantaneous determinism

To start with, it is useful to define what exactly we mean by an (instantaneously) deterministic system. In simple terms this means that how the system changes at any instant is fully determined by the state of the system at that instant [1]. This is how physical laws work in a Newtonian universe. The arm-waving argument says that if this is the case, we can derive the state of the system at any future instant by advancing through an infinite number of infinitesimal steps. Since each step is fully determined, the outcome must be as well. However, as it stands this is a mathematical over-simplification. It is well known that an infinite number of infinitesimals is indeterminate as such, and so we have to look at this process more carefully - and this is where there turn out to be significant differences between different cases.

 

3. Convergent and divergent behaviour

To illustrate the first difference that needs to be recognized, consider two simple cases - a snooker ball just about to collide with another snooker ball, and a snooker ball heading towards a pocket. In the first case, a small change in the starting position of the ball (assuming the direction of travel is unchanged) results in a steadily increasing change in the positions at successive instants after impact - that is, neighbouring trajectories diverge. In the second case, a small change in the starting position has no effect on the final position hanging in the pocket: neighbouring trajectories converge. So we can call these “divergent” and “convergent” cases respectively. [1.1]

Now consider what happens if we try to predict the state of some system (e.g. the position of the ball) after a finite time interval. Any attempt to find the starting position will involve a small error. The effect on the accuracy of prediction differs markedly in the two cases. In the convergent case, small initial errors will fade away with time. In the divergent case, by contrast, the error will grow and grow. Of course, if better instruments were available we could reduce the initial error and improve the prediction - but that would also increase the accuracy with which we could check the final error! So the notable fact about this case is that no matter how accurately we know the initial state, we can never predict the final state to the same level of accuracy - despite the perfect instantaneous determinism assumed, the last significant figure that we can measure remains as unpredictable as ever. [2]
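The contrast can be sketched with a deliberately crude toy model, in which the error in the ball's position is simply multiplied by a constant factor at each time step. The factors 0.5 and 2.0 and the starting error are arbitrary assumptions for illustration, not snooker physics.

```python
def propagate_error(initial_error, factor, steps):
    """Track an error that is multiplied by `factor` at every step (toy linear model)."""
    errors = [initial_error]
    for _ in range(steps):
        errors.append(errors[-1] * factor)
    return errors

initial_error = 1e-3  # assumed measurement error in the starting position

convergent = propagate_error(initial_error, 0.5, 10)  # neighbouring trajectories converge
divergent = propagate_error(initial_error, 2.0, 10)   # neighbouring trajectories diverge

print("convergent case, final error:", convergent[-1])
print("divergent case, final error: ", divergent[-1])
```

Shrinking the initial error shrinks both final errors in proportion, but in the divergent case the final error always stays larger than the precision of the initial measurement, which is the sense in which the last measurable figure remains unpredictable.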

One possible objection that might be raised to this conclusion is that with “perfect knowledge” of the initial state, we can predict any subsequent state perfectly. This is philosophically contentious - rather analogous to arguments about what happens when an irresistible force meets an immovable object. For example, philosophers who believe in “operational definitions” may doubt whether there is any operation that could be performed to obtain “the exact initial conditions”. I prefer to follow the mathematical convention that says that exact, perfect, or infinite entities are properly understood as the limiting cases of more mundane entities. On this convention, if the last significant figure of the most accurate measure we can make of an outcome remains unpredictable for any finite degree of accuracy, then we must say that the same is true for “infinite accuracy”.

 

The conclusion that there is always something unknown about the predicted outcome places a “qualitative upper limit”, so to speak, on the strength of predictability in this case, but we must also recognize a “qualitative lower limit” that is just as important, since in the snooker impact example whatever the accuracy of prediction that is desired after whatever time period, we can always calculate an accuracy of initial measurement that would enable it. (However, as we shall shortly see [3], this does not apply in every case.)  The combination of predictability in principle to any degree, with necessary unpredictability to the precision of the best available measurement, might be termed “truncated predictability”.

 

4. More general cases

The two elementary cases considered so far illustrate the importance of distinguishing convergent from divergent behaviour, and so provide a useful paradigm to be kept in mind, but of course, most real cases are more complicated than this.

To take some examples, a system can have both divergent parts and convergent parts at any instant - such as different balls on the same snooker table; an element whose trajectory is behaving divergently at one instant may behave convergently at another instant; convergent movement along one axis may be accompanied by divergent movement relative to another; and, significantly, divergent behaviour at one scale may be accompanied by convergent behaviour at a different scale. Zoom out from that snooker table, round positions to the nearest metre or so, and the trajectories of all the balls follow that of the adjacent surface of the earth.

There is also the possibility that a system can be potentially divergent at all times and places. A famous case of such behaviour is the chaotic behaviour of the atmosphere, first clearly understood by Edward Lorenz in 1961. This story comes in two parts, the second apparently much less well known than the first.

 

5. Chaotic case: discrete

The equations normally used to describe the physical behaviour of the atmosphere formally describe a continuum, an infinitely divisible fluid. As there is no algebraic “solution” to these equations, approximate solutions have to be found numerically, which in turn requires the equations to be “discretised”, that is, adapted to describe the behaviour at, or averaged around, a suitably large number of discrete points.

 

The well-known part of Lorenz’s work [4] arose from an accidental observation, that a very small change in the rounding of the values at the start of a numerical simulation led in due course to an entirely different “forecast”. Thus this is a case of divergent trajectories from any starting point, or “sensitivity to initial conditions” as it has come to be known.
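A minimal sketch of this kind of sensitivity, using the logistic map as a stand-in chaotic system. This is not Lorenz's weather model, and the starting values are illustrative; the only point is that rounding the starting value in the fourth decimal place eventually produces a completely different sequence.

```python
def logistic_orbit(x0, steps, r=4.0):
    """Iterate the logistic map x -> r * x * (1 - x), a standard toy chaotic system."""
    x = x0
    orbit = [x]
    for _ in range(steps):
        x = r * x * (1 - x)
        orbit.append(x)
    return orbit

full_precision = logistic_orbit(0.506127, 60)
rounded_start = logistic_orbit(0.506, 60)  # same start, rounded to three decimals

gaps = [abs(a - b) for a, b in zip(full_precision, rounded_start)]

print("initial gap:", gaps[0])
print("largest gap over the run:", max(gaps))
```

The two runs start about one part in ten thousand apart and end up differing by an amount comparable to the full range of the variable.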

 

The part of “chaos theory” that grew out of this initial insight describes the divergent trajectories from any starting point: they diverge exponentially, with a time constant known as the Kolmogorov constant for the particular problem case [5]. Thus we can still say, as we said for the snooker ball, that whatever the accuracy of prediction that is desired after whatever time period, we can always calculate an accuracy of initial measurement that would enable it.
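Under the simplifying assumption that the error grows as error(t) = error(0) * exp(t / tau), with tau the time constant just mentioned, the required accuracy of the initial measurement follows by inverting that formula. The function name and the numbers in the example are hypothetical.

```python
import math

def required_initial_accuracy(target_error, horizon, time_constant):
    """Initial error that grows to `target_error` after `horizon`,
    assuming pure exponential growth: error(t) = error(0) * exp(t / time_constant)."""
    return target_error * math.exp(-horizon / time_constant)

# Example with made-up numbers: error must stay below 0.01 (arbitrary units),
# and the time constant is 2 time units.
for horizon in (2, 10, 20):
    needed = required_initial_accuracy(0.01, horizon, 2.0)
    print(f"horizon {horizon:>2}: initial error must be below {needed:.3e}")
```

Every extra time constant of forecast costs another factor of e in initial accuracy, so the requirement is always finite but becomes very demanding very quickly.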

 

6. Chaotic case: continuum

Other researchers might have dismissed the initial discovery of sensitivity to initial conditions as an artefact of the computation, but Lorenz realised that even if the computation had been perfect, exactly the same consequences would flow from disturbances in the fluid in the gaps between the discrete points of the numerical model.  This is often called the “Butterfly Effect” because of a conference editor's colourful summary that “the beating of a butterfly’s wings in Brazil could cause a tornado in Texas”.

 

It is important to note that the Butterfly Effect is not strictly the same as “Sensitivity to Initial Conditions” as is often reported [6], although they are closely related. Sensitivity to Initial Conditions is an attribute of some discretised numerical models. The Butterfly Effect describes an attribute of the equations describing a continuous fluid, so is better described as “sensitivity to disturbances of minimal extent”, or in practice, sensitivity to what falls between the discrete points modelled.

 

Since, as noted above, there is no algebraic solution to the continuous equations, the only way to establish the divergent characteristics of the equations themselves is to repeatedly reduce the scale of discretisation (the typical distance between the points on the grid of measurements) and observe the trend. In fact, this was done for a very practical reason: to find out how much benefit would be obtained, in terms of the durability of the forecast [7], by providing more weather stations. The result was highly significant: each doubling of the number of stations increased the durability of the forecast by a smaller amount, so that (by extrapolation) as the number of imaginary weather stations was increased without limit, the forecast durability of the model converged to a finite value [8]. Thus, beyond this time limit, the equations that we use to describe the atmosphere give indeterminate results, however much detail we have about the initial conditions. [9]

 

Readers will doubtless have noticed that this result does not strictly apply to the earth’s atmosphere, because that is not the infinitely divisible fluid that the equations assumed (and a butterfly is likewise finitely divisible). Nevertheless, the fact that there are perfectly well-formed, familiar equations which by their nature have unpredictable outcomes after a finite time interval vividly exposes the difference between determinism and predetermination.

 

With hindsight, the diminishing returns in forecast durability from refining the scale of discretisation is not too surprising: it is much quicker for a disturbance on a 1 km scale to have effects on a 2 km scale than for a disturbance on a 100 km scale to have effects on a 200 km scale.
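One way to see why the returns diminish is a toy cascade model. It assumes, purely for illustration, that the time a disturbance takes to spread from one scale to twice that scale is proportional to the scale itself; the function name, the 1000 km target scale and the unit rate are all made up, and none of this comes from the study described above.

```python
def forecast_durability(grid_scale_km, target_scale_km=1000.0, time_per_km=1.0):
    """Time for an unresolved disturbance at the grid scale to reach the target scale,
    assuming each doubling of scale takes a time proportional to the current scale."""
    total_time = 0.0
    scale = grid_scale_km
    while scale < target_scale_km:
        total_time += time_per_km * scale
        scale *= 2
    return total_time

# Halve the grid spacing repeatedly and watch the durability level off.
durabilities = [forecast_durability(1000.0 / 2 ** n) for n in range(1, 12)]
for n, d in enumerate(durabilities, start=1):
    print(f"grid spacing {1000.0 / 2 ** n:10.4f} km -> durability {d:10.4f}")
```

In this toy model each refinement adds half as much durability as the previous one, so the total approaches a finite limit however fine the grid becomes.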

 

7. Consequences of quantum uncertainty

It is often claimed that the Uncertainty Principle of quantum mechanics [10] makes the future unpredictable [11], but in the terms of the above analysis this is far from the whole story.

 

The effect of quantum mechanics is that at the scale of fundamental particles [12] the laws of physical causality are probabilistic. As a consequence, there is certainly no basis, for example, to predict whether an unstable nucleus will disintegrate before or after the expiry of its half-life.

 

However, in the case of a convergent process at ordinary scales, the unpredictability at quantum scale is immaterial, and at the scale of interest predictability continues to hold sway. The snooker ball finishes up at the bottom of the pocket whatever the energy levels of its constituent electrons. [13]

 

It is in the case of divergent processes that quantum effects can make for unpredictability at large scales. In the case of the atmosphere, for example, the source of that tornado in Texas could be a cosmic ray in Colombia, and cosmic radiation is strictly non-deterministic. The atmosphere may not be the infinitely divisible fluid considered by Lorenz, but a molecular fluid subject to random quantum processes has just the same lack of predictability.

 

[EDIT] How does this look in terms of the LW-preferred Many Worlds interpretation of quantum mechanics? [14] In this framework, exact "objective prediction" is possible in principle but the prediction is of an ever-growing array of equally real states. We can speak of the "probability" of a particular outcome in the sense of the probability of that outcome being present in any state chosen at random from the set. In a convergent process the cases become so similar that there appears to be only one outcome at the macro scale (despite continued differences on the micro scale); whereas in a divergent process the "density of probability" (in the above sense) becomes so vanishingly small for some states that at a macro scale the outcomes appear to split into separate branches. (They have become decoherent.) Any one such branch appears to an observer within that branch to be the only outcome, and so such an observer could not have known what to "expect" - only the probability distribution of what to expect. This can be described as a condition of subjective unpredictability, in the sense that there is no subjective expectation that can be formed before the divergent process which can be reliably expected to coincide with an observation made after the process. [END of EDIT]

 

8. Conclusions

What has emerged from this review of different cases, it seems to me, is that it is the convergent/divergent dichotomy that has the greatest effect on the predictability of a system’s behaviour, not the deterministic/quantised dichotomy at subatomic scales.

 

More particularly, in shorthand:

Convergent + deterministic => full predictability

Convergent + quantised => predictability at all super-atomic scales

Divergent + deterministic + discrete => “truncated predictability”

Divergent + deterministic + continuous => unpredictability

[EDIT] Divergent + quantised => objective predictability of the multiverse but subjective unpredictability

 

Footnotes

1. The “state” may already include time derivatives of course, and in the case of a continuum, the state includes spatial gradients of all relevant properties.

1.1 For simplicity I have ignored the case between the two where neighbouring trajectories are parallel. It should be obvious how the argument applies to this case. Convergence/divergence is clearly related to (in)stability, and less directly to other properties such as (non)-linearity and (a)periodicity, but as convergence defines the characteristic that matters in the present context it seems better to focus on that.

2. In referring to a “significant figure” I am of course assuming that decimal notation is used, and that the initial error has diverged by at least a factor of 10.

3. In section 6.

4. For example, see Gleick, “Chaos”, "The Butterfly Effect" chapter.

5. My source for this statement is a contribution by Eric Kvaalen to the New Scientist comment pages.

6. E.g. by Gleick or Wikipedia.

7. By durability I mean the period over which the required degree of accuracy is maintained.

8. This account is based on my recollection, and notes made at the time, of an article in New Scientist, volume 42, p290. If anybody has access to this or knows of an equivalent source available on-line, I would be interested to hear!

9. I am referring to predictions of the conditions at particular locations and times. It is, of course, possible to predict average conditions over an area on a probabilistic basis, whether based on seasonal data, or the position of the jetstream etc. These are further examples of how divergence at one scale can be accompanied by something nearer to convergence on another scale.

10. I am using “quantum mechanics” as a generic term to include its later derivatives such as quantum chromodynamics. As far as I understand it these later developments do not affect the points made here. However, this is certainly well outside my  professional expertise in aspects of Newtonian mechanics, so I will gladly stand corrected by more specialist contributors!

11. E.g. by Karl Popper in an appendix to The Poverty of Historicism.

12. To be pedantic, I’m aware that this also applies to greater scales, but to a vanishingly small extent.

13. In such cases we could perhaps say that predictability is effectively an “emergent property” that is not present in the reductionist laws of the ultimate ingredients but only appears in the solution space of large scale aggregates. 

14. Thanks to the contributors of the comments below as of 30 July 2013, which I have tried to take into account. The online preview of "The Emergent Multiverse: Quantum Theory According to the Everett Interpretation" by David Wallace has also been helpful for understanding the implications of Many Worlds.

[LINK] If correlation doesn’t imply causation, then what does?

4 Strilanc 12 July 2013 05:39AM

A post about how, for some causal models, causal relationships can be inferred without doing experiments that control one of the random variables.

If correlation doesn’t imply causation, then what does?

To help address problems like the two example problems just discussed, Pearl introduced a causal calculus. In the remainder of this post, I will explain the rules of the causal calculus, and use them to analyse the smoking-cancer connection. We’ll see that even without doing a randomized controlled experiment it’s possible (with the aid of some reasonable assumptions) to infer what the outcome of a randomized controlled experiment would have been, using only relatively easily accessible experimental data, data that doesn’t require experimental intervention to force people to smoke or not, but which can be obtained from purely observational studies.

Best causal/dependency diagram software for fluid capture?

1 [deleted] 08 April 2013 07:20PM

I've found most graphing software too clunky, with too much mental friction, for my purposes: creating graphically represented plans, converting written diagrams into digital form, or doing preference inference based on the structure of my goals (amongst other things).

So far the only tool that I've seen that reduces this friction is GraphViz [1], since I think I can literally just list down connection after connection in markup, with no care for structure or reasonableness, and then prune connections after I see how the entire thing looks. Point and click is for suckers.
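To make the "just list connections" workflow concrete, here is a minimal sketch in Python that turns a plain list of (cause, effect) pairs into GraphViz DOT text. The function names, the graph name `plan`, and the example edges are all hypothetical; only the DOT syntax itself (`digraph name { "a" -> "b"; }`) comes from GraphViz.

```python
def quote(label: str) -> str:
    """Wrap a label in double quotes, escaping embedded double quotes."""
    return '"' + label.replace('"', '\\"') + '"'


def to_dot(edges, name="plan"):
    """Return DOT source for a directed graph given (source, target) pairs."""
    lines = [f"digraph {name} {{"]
    for src, dst in edges:
        lines.append(f"    {quote(src)} -> {quote(dst)};")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical plan: dump connections first, prune later.
    edges = [
        ("learn statistics", "read Pearl"),
        ("read Pearl", "understand causal diagrams"),
        ("understand causal diagrams", "draw better plans"),
    ]
    print(to_dot(edges))
```

Saving the output to a file such as `plan.dot` and running `dot -Tpng plan.dot -o plan.png` should render it, assuming GraphViz is installed. Pruning a connection is then just deleting a pair from the list.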

However, I also like the approach of FreeMind, which quickly outputs a visual map that is easily traversable, but it doesn't do much for me when the causality is more involved.

Are there any alternatives that anyone is aware of?

[1] If you are not familiar with GraphViz, see this amusing introduction that maps the social network in R. Kelly's hit hip hopera, "Trapped in the Closet".

Combining causality with algorithmic information theory

15 [deleted] 09 June 2012 01:31AM

Warning: maths.

Causal inference using the algorithmic Markov condition (Janzing and Schölkopf, 2008) replaces conditional independences between random variables, which define the structure of causal graphs, with algorithmic conditional independences between bit strings.

Conditional probabilities between variables become conditional complexities between strings, i.e. K(x|y) is the length of the shortest program that can generate the string x from y. Similarly, algorithmic mutual information I(x:y) is the amount of information that can be omitted in defining a string y given a shortest compressor for string x, I(x:y) = K(y) - K(y|x*). K(x,y) is the complexity of the concatenation of two strings x and y. These lead naturally to a definition of algorithmic conditional independence as I(x:y|z) = K(x|z) + K(y|z) - K(x,y|z) = 0, where equality is defined up to the standard additive constant.
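K itself is uncomputable, so these quantities cannot be evaluated directly. Purely to give a feel for them, here is a sketch that substitutes the length of a zlib-compressed string for K and uses the symmetric form I(x:y) = K(x) + K(y) - K(x,y), which agrees with the definition above up to an additive constant. The substitution is my assumption, not the paper's method, and a general-purpose compressor is only a crude upper bound on K; the function names are hypothetical.

```python
import zlib


def c(data: bytes) -> int:
    """Compressed length in bytes, used here as a crude stand-in for K(data)."""
    return len(zlib.compress(data, 9))


def approx_mutual_info(x: bytes, y: bytes) -> int:
    """Proxy for I(x:y) = K(x) + K(y) - K(x,y), using concatenation for (x,y)."""
    return c(x) + c(y) - c(x + y)


def approx_conditional_mutual_info(x: bytes, y: bytes, z: bytes) -> int:
    """Proxy for I(x:y|z) = K(x|z) + K(y|z) - K(x,y|z).

    Each conditional term K(a|z) is approximated as c(z + a) - c(z),
    which is a further assumption on top of the compression proxy.
    """
    k_x_z = c(z + x) - c(z)
    k_y_z = c(z + y) - c(z)
    k_xy_z = c(z + x + y) - c(z)
    return k_x_z + k_y_z - k_xy_z
```

With this proxy, a string shares a lot of information with a copy of itself and close to none with an unrelated random string. Values near zero should be read as "independent as far as this compressor can tell", not as a proof of algorithmic independence.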

Then a lot of sexy, confusing proofs happen. When the dust settles, it looks like if you take some strings describing observations, interpret them as nodes in a graph, and "factor" so that a certain algorithmic Markov condition holds (every node string should be algorithmically independent of its non-descendant node strings given the optimal compressor of its parents' node strings), then every node can be computed by an O(1) program run on a Turing machine, with the node's parents and a noise term as input (with each node's noise string being jointly independent of the others). 

Notably, this means that if we make two observations which were "generated from their parents by the same complex rule", then we can "postulate another causal link between the nodes that explains the similarity of mechanisms". They say "complex rule" because the mutual algorithmic information between simple information strings, like some digits of pi, will be swallowed up by additive constants. Which all seems very close to rediscovering TDT.

There's more to the paper, but that's the tasty bit, so the summary ends here.

Causation, Probability and Objectivity

7 antigonus 18 March 2012 06:54AM

Most people here seem to endorse the following two claims:

1. Probability is "in the mind," i.e., probability claims are true only in relation to some prior distribution and set of information to be conditionalized on;
2. Causality is to be cashed out in terms of probability distributions à la Judea Pearl or something.

However, these two claims feel in tension to me, since they appear to have the consequence that causality is also "in the mind" - whether something caused something else depends on various probability distributions, which in turn depend on how much we know about the situation. Worse, it has the consequence that ideal Bayesian reasoners can never be wrong about causal relations, since they always have perfect knowledge of their own probabilities.

Since I don't understand Pearl's model of causality very well, I may be missing something fundamental, so this is more of a question than an argument.

[LINK] Judea Pearl wins 2011 Turing Award

20 [deleted] 15 March 2012 04:32PM

Link to ACM press release.

In addition to their impact on probabilistic reasoning, Bayesian networks completely changed the way causality is treated in the empirical sciences, which are based on experiment and observation. Pearl's work on causality is crucial to the understanding of both daily activity and scientific discovery. It has enabled scientists across many disciplines to articulate causal statements formally, combine them with data, and evaluate them rigorously. His 2000 book Causality: Models, Reasoning, and Inference is among the single most influential works in shaping the theory and practice of knowledge-based systems. His contributions to causal reasoning have had a major impact on the way causality is understood and measured in many scientific disciplines, most notably philosophy, psychology, statistics, econometrics, epidemiology and social science.

While that "major impact" still seems to me to be in the early stages of propagating through the various sciences, hopefully this award will inspire more people to study causality and Bayesian statistics in general.

Michael Nielsen explains Judea Pearl's causality

18 gwern 24 January 2012 07:35PM

Michael Nielsen has posted a long essay explaining his understanding of the Pearlean causal DAG model. I don't understand more than half, but that's much more than I got out of a few other papers. Strongly recommended for anyone interested in the topic.

"Trials and Errors: Why Science Is Failing Us"

7 gwern 19 December 2011 06:48PM

Jonah Lehrer has up another of his contrarian science articles: "Trials and Errors: Why Science Is Failing Us".

Main topics: the failure of drugs in clinical trials, diminishing returns to pharmaceutical research, doctors over-treating, and the Humean causality-correlation distinction, with some Ioannidis mixed throughout.

See also "Why epidemiology will not correct itself"


In completely unrelated news, Nick Bostrom is stepping down as IEET's Chairman of the Board.