Comment author: Vaniver 18 June 2015 02:12:31PM 1 point

The causal structure is basically a chaotic system, which means that Newtonian-style differential equations aren't much use, and big computerized models are. Ordinary weather forecasting uses big models, and I don't see why climate change, which is essentially very long-term forecasting, would be different.

Climatological models and meteorological models are very different. If they weren't, then "we can't predict whether it will rain or not ten days from now" (which is mostly true) would be a slam-dunk argument against our ability to predict temperatures ten years from now. One underlying technical issue is that floating point arithmetic is only so precise, and this gives you an upper bound on the amount of precision you can expect from your simulation given the number of steps you run the model for. Thus climatological models have larger cells, larger step times, and so on, so that you can run the model for 50 model-years and still think the result that comes out might be reasonable.

(I also don't think it's right to say that Newtonian-style diffeqs aren't much use; the underlying update rules for the cells are diffeqs like that.)

Comment author: nostalgebraist 19 June 2015 04:47:50PM * 7 points

I'm not sure if I'm understanding you correctly, but the reason why climate forecasts and meteorological forecasts have different temporal ranges of validity is not that the climate models are coarser; it's that they're asking different questions.

Climate is (roughly speaking) the attractor on which the weather chaotically meanders on short (e.g. weekly) timescales. On much longer timescales (1-100+ years), this attractor itself shifts. Weather forecasts want to determine the future state of the system itself as it evolves chaotically, which is impossible in principle after ~14 days because the system is chaotic. Climate forecasts want to track the slow shifts of the attractor. To do this, they run ensembles with slightly different initial conditions and observe the statistics of the ensemble at some future date, which are taken (via an ergodic assumption) to reflect the attractor at that date. None of the ensemble members are useful as "weather predictions" for 2050 or whatever, but their overall statistics are (it is argued) reliable predictions about the attractor on which the weather will be constrained to move in 2050 (i.e. "the climate in 2050").

It's analogous to the way we can precisely characterize the attractor in the Lorenz system, even if we can't predict the future of any given trajectory in that system because it's chaotic. (For a more precise analogy, imagine a version of the Lorenz system in which the attractor slowly changes over long time scales.)
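
To make the Lorenz analogy concrete, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions: the standard parameters (sigma = 10, rho = 28, beta = 8/3), a hand-written RK4 step, and the names `lorenz_step` and `run_pair` are my own choices and do not come from any climate model. It runs two trajectories whose starting points differ by 1e-8, then compares how far apart they get with how close their long-run averages of z are.

```python
def lorenz_step(state, dt, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Advance the Lorenz system by one classical RK4 step."""
    def deriv(s):
        x, y, z = s
        return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)

    def shifted(s, k, h):
        # State s moved a distance h along the slope k.
        return tuple(si + h * ki for si, ki in zip(s, k))

    k1 = deriv(state)
    k2 = deriv(shifted(state, k1, dt / 2))
    k3 = deriv(shifted(state, k2, dt / 2))
    k4 = deriv(shifted(state, k3, dt))
    return tuple(
        s + dt / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def run_pair(steps=30000, dt=0.01, burn_in=2000, eps=1e-8):
    """Run two nearly identical trajectories side by side.

    Returns (largest separation seen, time-mean of z for run a,
    time-mean of z for run b). The burn-in steps are excluded from
    the means so the initial transient does not bias them.
    """
    a = (1.0, 1.0, 20.0)
    b = (1.0 + eps, 1.0, 20.0)  # hypothetical tiny measurement error
    max_sep = 0.0
    sum_za = sum_zb = 0.0
    for i in range(steps):
        a = lorenz_step(a, dt)
        b = lorenz_step(b, dt)
        sep = sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5
        max_sep = max(max_sep, sep)
        if i >= burn_in:
            sum_za += a[2]
            sum_zb += b[2]
    n = steps - burn_in
    return max_sep, sum_za / n, sum_zb / n


if __name__ == "__main__":
    max_sep, mean_a, mean_b = run_pair()
    print("largest separation between the two runs:", max_sep)
    print("time-mean of z, run a:", mean_a)
    print("time-mean of z, run b:", mean_b)
```

The separation grows to the size of the attractor, so neither run is a usable "forecast" of the other, while the two time-averages land close together. That is the toy version of the weather/climate distinction; a real climate ensemble uses many members and a slowly changing forcing, which this sketch leaves out.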

A simple way to explain the difference is that you have no idea what the weather will be in any particular place on June 19, 2016, but you can be pretty sure that in the Northern Hemisphere it will be summer in June 2016. This has nothing to do with differences in numerical model properties (you aren't running a numerical model in your head); it's just a consequence of the fact that climate and weather are two different things.

Apologies if you know all this. It just wasn't clear to me if you did from your comment, and I thought I might spell it out since it might be valuable to someone reading the thread.

Comment author: JGWeissman 27 December 2012 06:28:46PM 1 point

Now suppose they each ask the question "what is the probability that, when doing what I did, one will come up with at most the number of tails I actually saw?"

That is throwing away data. The evidence that they each observed is the sequence of coin flip results, and the number of tails in that sequence is a partial summary of the data. The reason they get different answers is that the summary throws away more data for B than for A. As you say, B already expected to get exactly one tail, so that summary tells him nothing new and he has no information to update on, while A can recover from this summary the number of heads and only loses information about the order (which cancels out anyways in the likelihood ratios between theories of independent coin flips). But if you calculate the probability that they each see that sequence, you get the same answer for both: p(heads)^9999 * (1 - p(heads)).

That is, the data gathering procedure is needed to interpret a partial summary of the data, but not the complete data.
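
A quick numerical check of that claim, as a sketch only: the function names below are made up for illustration, and the work is done in log space because p(heads)^9999 underflows ordinary floating point for most values of p(heads).

```python
import math


def log_lik_fixed_n(seq, p_heads):
    """Log-probability of this exact sequence under A's procedure
    (a fixed number of independent flips)."""
    return sum(math.log(p_heads if c == "H" else 1.0 - p_heads) for c in seq)


def log_lik_stop_at_first_tail(seq, p_heads):
    """Log-probability of this exact sequence under B's procedure
    (flip until the first tail appears)."""
    # B's procedure can only produce some heads followed by one final tail.
    if seq.count("T") != 1 or not seq.endswith("T"):
        return float("-inf")
    heads = len(seq) - 1
    return heads * math.log(p_heads) + math.log(1.0 - p_heads)


if __name__ == "__main__":
    seq = "H" * 9999 + "T"
    for p in (0.5, 0.9, 0.9999):
        print(p, log_lik_fixed_n(seq, p), log_lik_stop_at_first_tail(seq, p))
```

For every value of p(heads) the two log-likelihoods agree up to rounding, so any likelihood ratio between hypotheses about the coin is the same for A and B.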

Comment author: nostalgebraist 27 December 2012 10:43:08PM 1 point

Sure, the likelihoods are the same in both cases, since A and B's probability distributions assign the same probability to any sequence that is in both of their supports. But the distributions are still different, and various functionals of them are still different -- e.g., the number of tails, the moments (if we convert heads and tails to numbers), etc.

If you're a Bayesian, you think any hypothesis worth considering can predict a whole probability distribution, so there's no reason to worry about these functionals when you can just look at the probability of your whole data set given the hypothesis. If (as in actual scientific practice, at present) you often predict functionals but not the whole distribution, then the difference in the functionals matters. (I admit that the coin example is too basic here, because in any theory about a real coin, we really would have a whole distribution.)

My point is just that there are differences between the two cases. Bayesians don't think these differences could possibly matter to the sort of hypotheses they are interested in testing, but that doesn't mean that in principle there can be no reason to differentiate between the two.

Comment author: nostalgebraist 27 December 2012 05:51:10PM * 1 point

Incidentally, Eliezer, I don't think you're right about the example at the beginning of the post. The two frequentist tests are asking distinct questions of the data, and there is not necessarily any inconsistency when we ask two different questions of the same data and get two different answers.

Suppose A and B are tossing coins. A and B both get the same string of results -- a whole bunch of heads (let's say 9999) followed by a single tail. But A got this by just deciding to flip a coin 10000 times, while B got it by flipping a coin until the first tail came up. Now suppose they each ask the question "what is the probability that, when doing what I did, one will come up with at most the number of tails I actually saw?"

In A's case the answer is of course very small; most strings of 10000 flips have many more than one tail. In B's case the answer is of course 1; B's method ensures that exactly one tail is seen, no matter what happens. The data was the same, but the questions were different, because of the "when doing what I did" clause (since A and B did different things). Frequentist tests are often like this -- they involve some sort of reasoning about hypothetical repetitions of the procedure, and if the procedure differs, the question differs.
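
For concreteness, here is a small sketch of the two answers. It assumes a fair coin (my assumption; the example above does not fix p), and the function names are hypothetical. A's probability is far too small for a float, so it is reported as a base-10 logarithm.

```python
import math


def log10_p_at_most_k_tails_fixed_n(n, k, p_tail=0.5):
    """log10 of P(at most k tails) when flipping exactly n times (A's procedure).

    Sums the binomial terms in log space (log-sum-exp) to avoid underflow.
    """
    logs = [
        math.log(math.comb(n, j))
        + j * math.log(p_tail)
        + (n - j) * math.log(1.0 - p_tail)
        for j in range(k + 1)
    ]
    m = max(logs)
    return (m + math.log(sum(math.exp(v - m) for v in logs))) / math.log(10)


def p_at_most_k_tails_stop_at_first_tail(k):
    """P(at most k tails) when flipping until the first tail (B's procedure).

    Assumes tails has nonzero probability, so the procedure ends with
    exactly one tail with probability one.
    """
    return 1.0 if k >= 1 else 0.0


if __name__ == "__main__":
    print("A: log10 P(at most 1 tail in 10000 flips) =",
          log10_p_at_most_k_tails_fixed_n(10000, 1))
    print("B: P(at most 1 tail) =", p_at_most_k_tails_stop_at_first_tail(1))
```

Under these assumptions A's answer is on the order of 10^-3006 and B's is exactly 1, even though the observed string is identical.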

If we wanted to restate this in Bayesian terms, we'd have to do so by taking into account that the interpreter knows what the method is, not just what the data is, and the distributions used by a Bayesian interpreter should take this into account. For instance, one would be a pretty dumb Bayesian if one's prior for B's method didn't say you'd get one tail with probability one. The observation that's causing us to update isn't "string of data," it's "string of data produced by a given physical process," where the process is different in the two cases.

(I apologize if this has all been mentioned before -- I didn't carefully read all the comments above.)

Comment author: nostalgebraist 24 December 2012 10:20:20AM 3 points

"Bayesianism's coherence and uniqueness proofs cut both ways. Just as any calculation that obeys Cox's coherency axioms (or any of the many reformulations and generalizations) must map onto probabilities, so too, anything that is not Bayesian must fail one of the coherency tests. This, in turn, opens you to punishments like Dutch-booking (accepting combinations of bets that are sure losses, or rejecting combinations of bets that are sure gains)."

I've never understood why I should be concerned about dynamic Dutch books (which are the justification for conditionalization, i.e., the Bayesian update). I can understand how static Dutch books are relevant to finding out the truth: I don't want my description of the truth to be inconsistent. But a dynamic Dutch book (in the gambling context) is a way that someone can exploit the combination of my belief at time (t) and my belief at time (t+1) to get something out of me, which doesn't seem like it should carry over to the context of trying to find out the truth. When I want to find the truth, I simply want to have the best possible belief in the present -- at time (t+1) -- so why should "money" I've "lost" at time (t) be relevant?

Perhaps I simply want to avoid getting screwed in life by falling into the equivalents of Dutch books in real, non-gambling-related situations. But if that's the argument, it should depend on how frequently such situations actually crop up -- the mere existence of a Dutch book shouldn't matter if life is never going to make me take it. Why should my entire notion of rationality be based on avoiding one particular -- perhaps rare -- type of misfortune? On the other hand, if the argument is that falling for dynamic Dutch books constitutes "irrationality" in some direct intuitive sense (the same way that falling for static Dutch books does), then I'm not getting it.