Comment author: RichardKennaway 17 February 2015 08:06:31AM 4 points

For a modern review with heavy footnotes, see e.g. Galileo, Courtier: The Practice of Science in the Culture of Absolutism, pp. 95-100 (though the whole chapter is good).

I think these words are rather telling (emphasis in the original, p.96):

...Galileo began to be attacked for the Copernican implications of his discoveries only after the reliability of his telescope began to be accepted. In a sense, attacks on his Copernicanism were a sure sign of his enemies' taking his telescope and discoveries seriously.

And it goes on to show how the dispute was conducted on both sides in terms of status, Galileo getting princes on side by sending them telescopes, and his opponents attacking him because he was succeeding.

Under Aristotle's physics everything above the moon is made of different stuff with different physics anyway, so any amount of accuracy when looking at stuff of the four elements doesn't allow one to induct to accuracy in observations of the heavens.

That sounds a rather odd argument to make, even at the time. Astronomy from antiquity was founded on accurate observations. Galileo's contemporaries could argue that the telescope wasn't good enough, but hardly that getting a better view of the heavens could reveal nothing new. They were arguing over what could be seen, not that seeing was the wrong thing to do.

Comment author: Jonathan_Lee 18 February 2015 10:50:12AM 4 points

That sounds a rather odd argument to make, even at the time. Astronomy from antiquity was founded on accurate observations.

Astronomy and epistemology aren't quite the same. Predicting where Saturn would be on a given date requires accurate observation, and nobody objected to Copernicus as a calculational tool. For example, the Jesuits are teaching Copernicus in China in Chinese about 2 years after he publishes, which implies they translated and shipped it with some alacrity.

The heavens were classically held to be made of different stuff; quintessence (later called aether) was not like regular matter. This is obvious from the inside, because it maintains perpetual motion where normal matter does not. A lot of optical phenomena (e.g. twinkling stars, the surface of the moon) were not seen as properties of the objects in question but as properties of the regular four-element matter between us and them.

By a modern standard, the physics is weird and disjointed... but that is historically how it was seen.

Comment author: RichardKennaway 16 February 2015 01:25:02PM 6 points

Thank you for that informed account of the history.

You mention three times, without attributing it to any contemporary of Galileo, that the telescope "distorted the vision", which is a tendentious description. Given that the military application of the telescope was grasped as soon as the instrument became known, who at the time made this criticism? Did they similarly eschew its terrestrial use for the improvement of vision?

Comment author: Jonathan_Lee 17 February 2015 02:54:47AM 6 points

The precise phrasing is deliberately a little tendentious, but the issue of the epistemological status of the telescope was raised by loads of people at the time. For a modern review with heavy footnotes, see e.g. Galileo, Courtier: The Practice of Science in the Culture of Absolutism, pp. 95-100 (though the whole chapter is good).

For example, the first anti-Galilean tract is by Horky in 1610 and focuses mostly on the unreliability of the telescope. For another, Magini's letters (confirmed in the writings of Kepler and Galileo) describe a "star party" in 1610 where Galileo attempted to convince a number of astronomers of the discovery of the Medician (now Galilean) moons; no one else could see the moons, and the telescope additionally produced doubled images of everything more distant than the moon.

There wasn't much dispute about terrestrial applications. Under Aristotle's physics everything above the moon is made of different stuff with different physics anyway, so any amount of accuracy when looking at stuff of the four elements doesn't allow one to induct to accuracy in observations of the heavens.

Comment author: Jonathan_Lee 16 February 2015 12:52:39PM 41 points

tl;dr: The side of rationality during Galileo's time would be to recognise one's confusion and recognise that the models did not yet cash out in terms of a difference in expected experiences. That situation arguably holds until Newton's Principia; prior to that no one has a working physics for the heavens.

The initial heliocentric models weren't more accurate by virtue of being heliocentric; they were better by virtue of having had their parameters updated with an additional 400 years of observational data over the previous best-fit model (the Alfonsine tables from the 1250s). The geometry was similarly complicated; there was still a strong claim that only circular motions could be maintained indefinitely, and so you have to toss 60 or so circular motions in to get the full solar system on either model.

Basically everyone was already using the newer tables as calculational tools, and it had been known since ancient times that you could fix any point you wanted in an epicyclic model and get the same observational results. The dispute was about which object was in fact fixed. Kepler dates to the same time, and will talk about ellipses (and dozens of other potential curves) in place of circular motion from 1610, but he cannot efficiently predict where a planet will be. He's also not exactly a paragon of rationality; astrology and numerology drive most of his system, and he quite literally ascribes his algebraic slips to God.
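The point that the choice of fixed point is observationally idle can be checked numerically. A minimal sketch, assuming idealised coplanar circular orbits with rough illustrative radii and periods for Earth and Mars (these numbers are not from the comment): Mars seen from Earth is in exactly the same place whether both planets circle a fixed Sun, or Earth is fixed and Mars rides a deferent carrying an epicycle the size of Earth's orbit.

```python
import math

# Illustrative, idealised parameters (assumed, not from the comment):
# coplanar circular orbits, (radius in AU, period in years).
EARTH = (1.0, 1.0)
MARS = (1.52, 1.88)

def circle(radius, period, t, phase=0.0):
    """Position on a uniformly traversed circle at time t (years)."""
    angle = 2 * math.pi * t / period + phase
    return (radius * math.cos(angle), radius * math.sin(angle))

def mars_from_earth_heliocentric(t):
    # Sun fixed: both planets circle it; subtract Earth's position from Mars's.
    mx, my = circle(*MARS, t)
    ex, ey = circle(*EARTH, t)
    return (mx - ex, my - ey)

def mars_from_earth_geocentric(t):
    # Earth fixed: Mars rides a deferent (same radius and period as its own
    # circle) plus an epicycle of Earth's radius and period, half a turn out of phase.
    dx, dy = circle(*MARS, t)
    px, py = circle(*EARTH, t, phase=math.pi)
    return (dx + px, dy + py)

for t in (0.0, 0.3, 1.1, 2.7):
    h = mars_from_earth_heliocentric(t)
    g = mars_from_earth_geocentric(t)
    print(t, h, g)  # the two positions agree up to floating-point rounding
```

The agreement is just a change of origin, which is why predictions alone could not settle which body was in fact fixed.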

A brief but important digression into Aristotle is needed. The key observation he made was that the motion of the planets is unceasing but changing, whereas all terrestrial motions eventually cease. He held that circular motions were the only kind of motion that could be sustained indefinitely, and even then, only by a certain special kind of perfect matter. The physics of this matter fundamentally differed from the physics of normal stuff in Aristotle. Roughly and crudely, if it can change then it has to have some kind of dissipative / frictional physics and so will run down.

Against that backdrop, Galileo's key work wasn't the Dialogue but the Sidereus Nuncius. Two novae had been observed in the previous 40 years, and this had been awkward because a whole bunch of people (mostly neo-Platonists) were arguing that this showed the heavens changed, which is a problem for Aristotle. Now Galileo shows up and, using a device which distorts his vision, claims to be able to deduce:

  • That there are mountains on the moon (so that it is not a sphere, contra Aristotle)
  • That there are invisible objects orbiting Jupiter
  • That the planets show disks
  • That the Sun has spots, which move across its face and separately change with time
  • That Venus has phases (which essentially require that it orbit the Sun)
  • That Saturn has lumps on it (and thus is not a sphere; he's seeing the rings)

As an observational program, this is picked up and deeply explored by loads of people (including Jesuits like Riccioli). But to emphasise: Galileo is using a device which distorts his vision and which can only be tested on terrestrial objects, and claiming to use it to find out about the heavens, which contemporary physics says are grossly different. Every natural philosopher who's read Aristotle recognises that this kind of procedure hasn't historically been useful.

From a viewpoint which sees a single unified material physics, these observations kill Aristotelian cosmology. You've got at least three centers of circular-ish motion, which means you can't mount the planets on transparent spheres to actually move them around. You have an indication that the Sun might be rotating, and is certainly dynamic. If you kill Aristotle's cosmology, you have to kill most of his physics, and thus a good chunk of his philosophy. That's a problem, because since Aquinas the Catholic Church had been deriving theology as a natural consequence of Aristotle in order to secure itself against various heresies. And now some engineer with pretensions is turning up, distorting his vision and claiming to upend the cart.

What Galileo does not have is a coherent alternative package of physics and cosmology. He claims to be able to show a form of circular inertia from first principles. He claims that this yields a form of relativity in motion which makes it difficult to discern your true motion without reference to the fixed stars. He claims that physics is kinda-sorta universal, based on his experience with cannon (which Aristotelian physics would dismiss because [using modern terminology] experiments where you apply forces yourself are not reproducible and so cannot yield knowledge). This means his physics has real issues explaining dissipative effects. He doesn't have action at a distance, so he can't explain why the planets do their thing (whereas there are physical mechanisms for the Aristotelian / Ptolemaic models).

He gets into some pro forma trouble over the book, because he doesn't put a disclaimer on it saying that he'll retract it if it's found to be heretical. It's a silly affair, and he gets his knuckles rapped over it. The book is "banned", which means two things, for there are two lists of banned books. One is "burn before reading" and the other is more akin to being in the Restricted Section; Galileo's work is on the latter.

Then he's an ass in the Dialogue. Even that would not have been an issue, but at the time he's the court philosopher of the Grand Duke of Tuscany, Cosimo I de' Medici. This guy is a secular problem for the Pope; he has an army, he's not toeing the line, and there's a worry that he'll annex the Papal States. So there's a need to pin his ears back, and Galileo is a sufficiently senior member of the court that Cosimo won't ignore his arrest, nor will he go to war over it.

So the Inquisition cooks up a charge for political purposes, has him "tortured" (which is supposed to mean they /show/ him the instruments of torture, but they actually forget to), gets him to recant (in particular, gets Cosimo to come and beg for his release), and releases him to "house arrest" (where he is free to come, go, see whoever, write, etc.). The drama is politics, rather than anything epistemological.

As to the disputes you mention, some had been argued through by the ancient Greeks. For example, everyone knew that measurements were imprecise, and so moving the earth merely required that the stars be distant enough for their annual parallax to go undetected. It was also plain that if you accepted Galileo's observations as being indicative of truth, then Aristotelian gravity was totally dead, because some stuff did not strive to fall (cometary tails were also known to be... problematic).
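How distant is "distant"? The annual parallax of a star at distance d (in units of the Earth-Sun distance) is roughly atan(1/d), so if measurements cannot resolve angles below some precision, the stars need only be far enough away that their parallax falls under it. A minimal sketch, taking one arcminute as an illustrative pre-telescopic precision (an assumption, not a figure from the comment):

```python
import math

def min_distance_au(precision_arcmin):
    """Smallest distance (in AU) at which a star's annual parallax,
    taken as atan(1 AU / d), drops below the given angular precision."""
    precision_rad = math.radians(precision_arcmin / 60.0)
    return 1.0 / math.tan(precision_rad)

print(round(min_distance_au(1.0)))  # 3438
```

So with one-arcminute measurements, stars a few thousand times farther away than the Sun would show no detectable parallax; better precision only pushes the required distance further out.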

Now, Riccioli is writing 20 years later, in an environment where heliocentrism has become a definite thing with political and religious connotations, associated with neo-Platonist, anti-Aristotelian, anti-Papal thinking. This is troublesome because it strikes at the foundational philosophy underpinning the Church, and secular rulers in Europe are trying to strategically leverage this. Much like Aquinas, Riccioli's bottom line is /written/ already. He has to mesh this new stack of observational data with something which looks at least somewhat like Aristotle. Descartes is contracted at about the same time to attempt to rederive Catholicism from a new mixed Aristotelian / Platonist basis.

As a corollary, he's being quite careful to list every argument which anyone has made, and every refutation (there's a comparatively short summary here). Most of the arguments presented have counterpoints from the other side, however strained they might seem from a modern view. It's more akin to having 126 phenomena which need to be explained than anything else. They don't touch on the apparently changing nature of the planets (by this point cloud bands on Jupiter could be seen) and restrict themselves mostly to the physics of motion. There's a lot of duplication of the same fundamental point, and it's not a quantitative discussion. There are some "in principle" experiments discussed, but a fair few had been considered by Galileo and calculated to be infeasible (e.g. observing 1-inch deflections in cannon shot at 500 yards, when the accuracy is more like a yard).
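To put a rough number on the cannon example: a 1-inch effect against about 36 inches of shot-to-shot scatter. A back-of-envelope sketch, assuming (beyond anything in the comment) that the yard is one standard deviation of independent scatter and that the effect would have to be dug out by averaging to about two standard errors:

```python
import math

def shots_needed(deflection, scatter, z=2.0):
    """Shots needed before the standard error of the mean impact point,
    scatter / sqrt(n), falls to deflection / z (independent shots assumed)."""
    return math.ceil((z * scatter / deflection) ** 2)

INCH = 1.0
YARD = 36.0 * INCH
print(shots_needed(1 * INCH, 1 * YARD))  # 5184
```

Thousands of carefully aimed shots under these generous assumptions, which is consistent with the experiment being judged infeasible.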

Obviously Newton basically puts a stop to the whole thing, because (modulo a lack of mechanism) he can give you a calculational tool which spits out Kepler and naturally fixes the center of mass. There are still huge problems; the largest is that even point-like stars appear to have small disks from diffraction, and until you know this you end up thinking the other stars all have to be larger than the entire solar system. And the apparent madness of a universal law is almost impossible to overstate. It's really ahistorical to think that a very modern notion of parsimony in physics could have been applied to Galileo and his contemporaries.

Comment author: jeremysalwen 14 September 2013 09:46:14PM 3 points

The subtlety is about what numerical data can formally represent your full state of knowledge. The claim is that a mere probability of getting the $2 payout does not.

However, a single probability for each outcome given each strategy is all the information needed. The problem is not with using single probabilities to represent knowledge about the world; it's with the straw math that was used to represent the technique. To me, this reasoning is equivalent to the following:

"You work at a store where management is highly disorganized. Although they precisely track the number of days you have worked since the last payday, they never remember when they last paid you, and thus every day of the work week has a 1/5 chance of being a payday. For simplicity's sake, let's assume you earn $100 a day.

You wake up on Monday and do the following calculation: If you go in to work, you have a 1/5 chance of being paid. Thus the expected payoff of working today is $20, which is too low for it to be worth it. So you skip work. On Tuesday, you make the same calculation, and decide that it's not worth it to work again, and so you continue forever.

I visit you and immediately point out that you're being irrational. After all, a salary of $100 a day clearly is worth it to you, yet you are not working. I look at your calculations, and immediately find the problem: You're using a single probability to represent your expected payoff from working! I tell you that using a meta-probability distribution fixes this problem, and so you excitedly scrap your previous calculations and set about using a meta-probability distribution instead. We decide that a Gaussian sharply peaked at 0.2 best represents our meta-probability distribution, and I send you on your way."

Of course, in this case, the meta-probability distribution doesn't change anything. You still continue skipping work, because I have devised the hypothetical situation to illustrate my point (evil laugh). The point is that in this problem the meta-probability distribution solves nothing, because the problem is not with a lack of meta-probability, but rather a lack of considering future consequences.
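The missing consideration is that a day worked now is paid on some later payday. A small simulation, assuming (as the setup implies, since management tracks days worked since the last payday) that each payday pays $100 for every day worked since the previous one: the myopic one-day figure is $20, but long-run earnings per day worked approach the full $100.

```python
import random

DAILY_WAGE = 100
P_PAYDAY = 1 / 5

def myopic_expected_pay_today():
    # The flawed calculation: counts only money received today.
    return P_PAYDAY * DAILY_WAGE

def average_pay_per_day_worked(days, seed=0):
    """Simulate working every day; each payday pays all wages accrued so far."""
    rng = random.Random(seed)
    owed = received = 0
    for _ in range(days):
        owed += DAILY_WAGE
        if rng.random() < P_PAYDAY:
            received += owed
            owed = 0
    return received / days

print(myopic_expected_pay_today())          # 20.0
print(average_pay_per_day_worked(100_000))  # just under 100
```

Only the wages still owed at the end of the simulation are missing, so the average sits slightly below $100.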

In both the OP's example and mine, the problem is that the math was done incorrectly, not that you need meta-probabilities. As you said, meta-probabilities are a method of screening off additional labels on your probability distributions for a particular class of problems where you are taking repeated samples that are entangled in a very particular sort of way. As I said above, I appreciate the exposition of meta-probabilities as a tool, and your comment as well has helped me better understand their instrumental nature, but I take issue with what sort of tool they are presented as.

If you do the calculations directly with the probabilities, your calculation will succeed if you do the math right, and fail if you do the math wrong. Meta-probabilities are a particular way of representing a certain calculation, which succeeds or fails in its own right. If you use them to represent the correct direct probabilities, you will get the right answer, but they are only an aid in the calculation; they never fix any problem with direct probability calculations. Fixing the calculation and using meta-probabilities are orthogonal issues.

To make a blunt analogy, this is like someone trying to plug an Ethernet cable into a phone jack, and then saying "when Ethernet fails, wifi works", while conveniently plugging the wifi adapter in correctly.

The key of the dispute in my eyes is not whether wifi can work for certain situations, but whether there's anything actually wrong with Ethernet in the first place.

Comment author: Jonathan_Lee 15 September 2013 01:18:46AM 1 point

So, my observation is that without meta-distributions (or A_p), or conditioning on a pile of past information (and thus tracking /more/ than just a probability distribution over current outcomes), you don't have the room in your knowledge to be able to even talk about sensitivity to new information coherently. Once you can talk about a complete state of knowledge, you can begin to talk about the utility of long term strategies.

For instance, in your scenario, one would have the same probability of being paid today if 20% of employers actually paid you every day, whilst 80% of employers never paid you. But in such an environment, it would not make sense to work a second day in 80% of cases. The optimal strategy depends on what you know, and to represent that in general requires more than a straight probability.
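The contrast can be written down directly. A minimal sketch with two-point distributions over "chance of being paid on a given day" (the 20%/80% split is from the comment; the code structure and names are illustrative): both environments give a 1/5 chance of pay on day one, but one unpaid day drives the mixed environment to zero and leaves the i.i.d. one at 1/5.

```python
from fractions import Fraction

# {chance of pay on a given day: probability of being in that kind of job}
MIXED = {Fraction(1): Fraction(1, 5),   # employer always pays: weight 0.2
         Fraction(0): Fraction(4, 5)}   # employer never pays: weight 0.8
IID = {Fraction(1, 5): Fraction(1)}     # every day independently 1/5

def chance_paid(dist):
    """Probability of being paid on the next day worked."""
    return sum(p * w for p, w in dist.items())

def update_unpaid(dist):
    """Condition on having worked a day without being paid."""
    unnormalised = {p: w * (1 - p) for p, w in dist.items()}
    total = sum(unnormalised.values())
    return {p: w / total for p, w in unnormalised.items()}

print(chance_paid(MIXED), chance_paid(IID))                                # 1/5 1/5
print(chance_paid(update_unpaid(MIXED)), chance_paid(update_unpaid(IID)))  # 0 1/5
```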

There are different problems coming from the distinction between choosing a long term policy to follow, and choosing a one shot action. But we can't even approach this question in general unless we can talk sensibly about a sufficient set of information to keep track of. There are two distinct problems, one prior to the other.

Jaynes does discuss a problem which is closer to your concerns (that of estimating neutron multiplication in a 1-d experiment, 18.15, p. 579). He's comparing two approaches, which for my purposes differ in their prior A_p distribution.

Comment author: jeremysalwen 14 September 2013 08:06:08PM 21 points

The exposition of meta-probability is well done, and shows an interesting way of examining and evaluating scenarios. However, I would take issue with the first section of this article in which you establish single probability (expected utility) calculations as insufficient for the problem, and present meta-probability as the solution.

In particular, you say

What’s interesting is that, when you have to decide whether or not to gamble your first coin, the probability is exactly the same in the two cases (p=0.45 of a $2 payout). However, the rational course of action is different. What’s up with that?

Here, a single probability value fails to capture everything you know about an uncertain event. And, it’s a case in which that failure matters.

I do not believe that this is a failure of applying a single probability to the situation, but merely calculating the probability wrongly, by ignoring future effects of your choice. I think this is most clearly illustrated by scaling the problem down to the case where you are handed a green box, and only two coins. In this simplified problem, we can clearly examine all possible strategies.

  • Strategy 1 would be to hold on to your two dollar coins. There is a 100% chance of a $2.00 payout
  • Strategy 2 would be to insert both of your coins into the box. There is a 50.5% chance of a $0.00 payout, 40.5% chance of a $4.00 payout and a 9% chance of a $2.00 payout.
  • Strategy 3 would be to insert one coin, and then insert the second only if the first pays out. There is a 55% chance of $1.00 payout, a 4.5% chance of a $2.00 payout, and a 40.5% chance of a $4.00 payout.
  • Strategy 4 would be to insert one coin, and then insert the second only if the first doesn't pay out. There is a 50.5% chance of a $0.00 payout, a 4.5% chance of a $2.00 payout, and a 45% chance of a $3.00 payout.
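The four distributions above can be reproduced by exhaustive enumeration. A minimal sketch, assuming a box model inferred from the stated percentages (the setup itself isn't restated in this comment): with probability 1/2 the green box is "live" and each $1 coin returns $2 with probability 9/10, and otherwise it never pays. Exact fractions keep the output identical to the percentages.

```python
from collections import defaultdict
from fractions import Fraction

# Assumed box model, inferred from the percentages: (P(box type), P(a coin pays)).
BOX_TYPES = [(Fraction(1, 2), Fraction(9, 10)), (Fraction(1, 2), Fraction(0))]

def coin_results(p_pay):
    """(paid?, probability) for one coin inserted into a box of this type."""
    return [(True, p_pay), (False, 1 - p_pay)]

def outcome_distribution(insert_first, insert_second):
    """Exact {final dollars: probability} for a two-coin strategy.

    insert_second(first) decides on the second coin; first is True/False for
    whether the first coin paid, or None if no first coin was inserted.
    """
    dist = defaultdict(Fraction)
    for p_type, p_pay in BOX_TYPES:
        firsts = coin_results(p_pay) if insert_first else [(None, Fraction(1))]
        for first, p_first in firsts:
            # Start with $2; an inserted coin costs $1 and a payout returns $2.
            money = 2 - (1 if insert_first else 0) + (2 if first else 0)
            if insert_second(first):
                for second, p_second in coin_results(p_pay):
                    dist[money - 1 + (2 if second else 0)] += p_type * p_first * p_second
            else:
                dist[money] += p_type * p_first
    return {dollars: p for dollars, p in sorted(dist.items()) if p != 0}

STRATEGIES = {
    1: (False, lambda first: False),     # keep both coins
    2: (True, lambda first: True),       # insert both coins
    3: (True, lambda first: first),      # second coin only if the first paid
    4: (True, lambda first: not first),  # second coin only if the first didn't pay
}

for number, (insert_first, insert_second) in STRATEGIES.items():
    dist = outcome_distribution(insert_first, insert_second)
    expected = sum(dollars * p for dollars, p in dist.items())
    print(number, {d: float(p) for d, p in dist.items()}, float(expected))
```

Valuing dollars linearly, the expected payouts come out as $2.00, $1.80, $2.26 and $1.44, so strategy 3 (keep going only while the box pays) is the best of the four.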

When put in these terms, it seems quite obvious that your choice to open the box would depend on more than the expected payoff from only the first box, because quite clearly your choice to open the first box pays off (or doesn't pay off) when opening (or not opening) the other boxes as well. This seems like an error in calculating the payoff matrix rather than a flaw with the technique of single probability values itself. It ignores the fact that opening the first box not only pays you off immediately, but also pays you off in the future by giving you information about the other boxes.

This problem easily succumbs to standard expected value calculations if all actions are considered. The steps remain the same as always:

  1. Assign a utility to each dollar amount outcome
  2. Calculate the expected utility of all possible strategies
  3. Choose the strategy with the highest expected utility
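The three steps can be sketched directly on the four outcome distributions listed above. Step 1 needs a utility function, and the comment doesn't fix one, so this illustration tries a risk-neutral (linear) utility and an assumed concave (square-root) one:

```python
import math

# Outcome distributions {dollars: probability}, copied from the four strategies above.
STRATEGIES = {
    1: {2: 1.0},
    2: {0: 0.505, 2: 0.09, 4: 0.405},
    3: {1: 0.55, 2: 0.045, 4: 0.405},
    4: {0: 0.505, 2: 0.045, 3: 0.45},
}

def linear(dollars):
    """Step 1, option A: utility equal to the dollar amount (risk-neutral)."""
    return dollars

def square_root(dollars):
    """Step 1, option B: an assumed concave (risk-averse) utility."""
    return math.sqrt(dollars)

def expected_utility(dist, utility):
    """Step 2: probability-weighted utility over a strategy's outcomes."""
    return sum(p * utility(dollars) for dollars, p in dist.items())

def best_strategy(utility):
    """Step 3: the strategy with the highest expected utility."""
    return max(STRATEGIES, key=lambda s: expected_utility(STRATEGIES[s], utility))

for utility in (linear, square_root):
    scores = {s: round(expected_utility(d, utility), 3) for s, d in STRATEGIES.items()}
    print(utility.__name__, scores, "best:", best_strategy(utility))
```

Both utilities pick strategy 3 here, but the answer does depend on step 1: with a logarithmic utility, for instance, simply keeping both coins comes out ahead. The procedure itself is unchanged either way.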

In the case of two coins, we were able to trivially calculate the outcomes of all possible strategies, but in larger instances of the problem, it might be advisable to use shortcuts in the calculations. However, the best choice is still the one you would have gotten if you had done out the full expected value calculation.

I think the confusion arises because a lot of the time problems are presented in a way that screens them off from the rest of the world. For example, you are given a box, and it either has $10.00 or $100.00. Once you open the box, the only effect it has on you is the amount of money you got. After you get the money, the box does not matter to the rest of the world. Problems are presented this way so that it is easy to factor out the decisions and calculations you have to make from every other decision you have to make. However, decisions are not necessarily like this (in fact in real life, very few are). In the choice of inserting the first coin or not, this is simply not the case, despite having superficial similarities to standard "box" problems.

Although you clearly understand that the payoffs from the boxes are entangled, you only apply this knowledge in your informal approach to the problem. The failure to consider the full effects of your actions in opening the first box may be psychologically encouraged by the technique of "single probability calculations", but it is certainly not a failure of the technique itself to capture such situations.

Comment author: Jonathan_Lee 14 September 2013 08:40:02PM 4 points

The substantive point here isn't about EU calculations per se. Running a full analysis of everything that might happen and doing an EU calculation on that basis is fine, and I don't think the OP disputes this.

The subtlety is about what numerical data can formally represent your full state of knowledge. The claim is that a mere probability of getting the $2 payout does not. It's the case that on the first use of a box, the probability of the payout given its colour is 0.45 regardless of the colour.

However, if you merely hold onto that probability, then if you put in a coin and so learn something about the boxes you can't update that probability to figure out what the probability of payout for the second attempt is. You need to go back and also remember whether the box is green or brown. The point of Jaynes and the A_p distribution is that it actually does screen off all other information. If you keep track of it you never need to worry about remembering the colour of the box, or the setup of the experiment. Just this "meta-distribution".
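A minimal discrete sketch of the screening-off point. It assumes, as the two-coin strategies in jeremysalwen's comment imply, that one box type is an even mix of a box paying with probability 0.9 and a box that never pays, and takes the other to pay with a fixed 0.45; the names are illustrative, and Jaynes's A_p is a density over p, so a two-point distribution is a simplification. Both boxes start at 0.45, but once you hold the whole A_p distribution you can update it after each coin without remembering the colour.

```python
from fractions import Fraction

# Hypothetical two-point A_p distributions, {payout probability p: weight}.
MIXED = {Fraction(9, 10): Fraction(1, 2), Fraction(0): Fraction(1, 2)}
FIXED = {Fraction(45, 100): Fraction(1)}

def payout_probability(a_p):
    """Probability the next coin pays: the mean of the A_p distribution."""
    return sum(p * w for p, w in a_p.items())

def update(a_p, paid):
    """Bayes update of A_p after one coin pays (paid=True) or doesn't."""
    unnormalised = {p: w * (p if paid else 1 - p) for p, w in a_p.items()}
    total = sum(unnormalised.values())
    return {p: w / total for p, w in unnormalised.items()}

for name, a_p in (("mixed", MIXED), ("fixed", FIXED)):
    print(name,
          payout_probability(a_p),                      # 9/20 for both boxes
          payout_probability(update(a_p, paid=False)),  # after a loss
          payout_probability(update(a_p, paid=True)))   # after a win
```

For the mixed box the next-coin probability drops to 9/110 after a loss and rises to 9/10 after a win; for the fixed box it stays at 9/20 either way.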

Comment author: Jonathan_Lee 30 June 2013 09:16:30AM 4 points

Concretely, I have seen this style of test (for want of better terms, natural language code emulation) used as a screening test by firms looking to find non-CS undergraduates who would be well suited to develop code.

Inasmuch as this test targets indirection, it is comparatively easy to write tests which target data-driven flow control or understanding state machines. In such a case you read from a fixed sequence and emit a string of outputs. For a plausible improvement, get the user to log the full sequence of writes, so that you can see on which instruction things go wrong.
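A test of that kind might look like the following sketch. The state machine, its transition table and the candidate's answer are all hypothetical: the machine is run over a fixed input sequence, every write is logged, and the candidate's written outputs are compared against the reference to find the first step where they go wrong.

```python
# Hypothetical transition table: (state, input symbol) -> (next state, output written).
TRANSITIONS = {
    ("A", 0): ("A", "x"),
    ("A", 1): ("B", "y"),
    ("B", 0): ("A", "z"),
    ("B", 1): ("B", "x"),
}

def run(inputs, start="A"):
    """Run the machine over a fixed input sequence, logging every write
    as (step, state before, input symbol, output)."""
    state = start
    log = []
    for step, symbol in enumerate(inputs):
        next_state, output = TRANSITIONS[(state, symbol)]
        log.append((step, state, symbol, output))
        state = next_state
    return log

def first_divergence(reference, answer):
    """Index of the first write where the answer differs from the reference
    (including a missing or extra write), or None if they match."""
    for i, (expected, given) in enumerate(zip(reference, answer)):
        if expected != given:
            return i
    if len(reference) != len(answer):
        return min(len(reference), len(answer))
    return None

reference = [output for (_, _, _, output) in run([1, 1, 0, 0, 1])]
candidate = ["y", "x", "x", "x", "y"]  # a hypothetical answer sheet
print(reference)                               # ['y', 'x', 'z', 'x', 'y']
print(first_divergence(reference, candidate))  # 2
```

Logging the full sequence of writes, rather than only the final output, is what makes the divergence point recoverable.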

There also seem to be aspects of coding which are not simply being technically careful about the formal function of code. The most salient to me would be taking an informally specified natural language problem and reducing it to operations one can actually do. Algorithmic / architectural thinking seems at least as rare as fastidiousness about code.

Comment author: Qiaochu_Yuan 10 June 2013 03:49:32AM 0 points

The A_p distribution seems really, really important, I don't really feel like I completely understand it, and Jaynes is the only source I've heard even talk about it. Do you happen to know if it's discussed in the wider literature under a different name or something?

Comment author: Jonathan_Lee 12 June 2013 11:21:58PM 2 points

To my knowledge, it's not discussed explicitly in the wider literature. I'm not a statistician by training though, so my knowledge of the literature is not brilliant.

On the other hand, talking to working Bayesian statisticians about "what do you do if we don't know what the model should be" seems to reliably return answers of broad form "throw that uncertainty into a two-level model, run the update, and let the data tell you which model is correct". Which is the less formal version of what Jaynes is doing here.

This seems to be a reasonable discussion of the same basic material, though in a setting of finitely many models rather than the continuum of p models for Jaynes.
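A sketch of the two-level idea with a continuum of models, in the spirit of Jaynes's A_p: the upper level is a distribution over the success probability p, here assumed uniform for illustration and approximated by a fine midpoint grid. After k successes in n trials the data reweight the models, and the predictive probability of another success is the posterior mean of p. With a uniform prior the exact answer is Laplace's rule of succession, (k + 1)/(n + 2), which gives a check on the approximation.

```python
def predictive_after(successes, trials, grid_size=10_000):
    """Posterior predictive P(next success) under a uniform prior over p,
    approximating the continuum of models by a midpoint grid."""
    grid = [(i + 0.5) / grid_size for i in range(grid_size)]
    # Unnormalised posterior weight of each model p given the data.
    weights = [p ** successes * (1 - p) ** (trials - successes) for p in grid]
    total = sum(weights)
    return sum(p * w for p, w in zip(grid, weights)) / total

print(predictive_after(3, 4))  # close to Laplace's (3 + 1) / (4 + 2) = 0.667
```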

Comment author: JonahSinick 08 June 2013 07:58:23PM 2 points

Independently of whether Fermat thought of it as an example, Cauchy could have considered lots of sequences of functions in order to test his beliefs, and I find it likely that had he spent time doing so, he would have struck on this one.

On a meta-level, my impression is that you haven't updated your beliefs based on anything that I've said on any topic, in the course of our exchanges, whether online or in person. It seems very unlikely that no updates are warranted. I may be misreading you, but to the extent that you're not updating, I suggest that you consider whether you're being argumentative when you could be inquisitive and learn more as a result.

Comment author: Jonathan_Lee 08 June 2013 08:59:08PM 1 point

Thank you for calling out a potential failure mode. I observe that my style of inquisition can come across as argumentative, in that I do not consistently note when I have shifted my view (instead querying other points of confusion). This is unfortunate.

To make my object level opinion changes more explicit:

  • I have had a weak shift in opinion towards the value of attempting to quantify and utilise weak arguments in internal epistemology, after our in person conversation and the clarification of what you meant.

  • I have had a much lesser shift in opinion of the value of weak arguments in rhetoric, or other discourse where I cannot assume that my interlocutor is entirely rational and truth-seeking.

  • I have not had a substantial shift in opinion about the history of mathematics (see below).

As regards the history of mathematics, I do not know our relative expertise, but my background prior for most mathematicians (including JDL_{2008}) has a measure >0.99 cluster that finds true results obvious in hindsight and counterexamples to false results obviously natural. My background prior also suggests that those who have spent time thinking about mathematics as it was done at the time fairly reliably do not have this view. It further suggests that on this metric, I have done more thinking than the median mathematician (against a background of Cantab. mathmos, I would estimate I'm somewhere above the 5th centile of the distribution). The upshot of this is that your recent comments have not substantively changed my views about the relative merit of Cauchy and Euler's arguments at the time they were presented; my models of historians of mathematics who have studied this do not reliably make statements that look like your claims wrt. the Basel problem.

I do not know what your priors look like on this point, but it seems highly likely that our difference in views on the mathematics factor through to our priors, and convergence will likely be hindered by being merely human and having low baud channels.

Comment author: JonahSinick 08 June 2013 06:11:02AM 2 points

That it worked in every instance of continuous functions that had been considered up to that point,

In ~1659, Fermat considered the sequence of functions f(n,x) = x^n for n = 0, 1, 2, 3, .... Each of these is a continuous function of x. If you restrict these functions to the interval between 0 and 1, and take the limit as n goes to infinity, you get a discontinuous function.
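A quick numerical look at the example (function names are illustrative): each x**n is continuous on [0, 1], but points just below 1 head to 0 while x = 1 stays at 1, so the pointwise limit jumps at x = 1. Convergence near 1 is also slow, which is where the trouble lives.

```python
def f(n, x):
    """The n-th function in the sequence x -> x**n, continuous on [0, 1]."""
    return x ** n

def limit_function(x):
    """Pointwise limit of x**n as n -> infinity, for x in [0, 1]."""
    return 1.0 if x == 1.0 else 0.0

n = 10_000
for x in (0.0, 0.5, 0.999, 1.0):
    print(x, f(n, x), limit_function(x))
```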

So there's a very simple counterexample to Cauchy's ostensible theorem from 1821, coming from a sequence of functions that had been studied over 150 years before. If Cauchy had actually looked at those examples of sequences of function that had been considered, he would have recognized his ostensible theorem to be false. By way of contrast, Euler did extensive empirical investigation to check the plausibility of his result. The two situations are very, very different.

Comment author: Jonathan_Lee 08 June 2013 07:26:16PM 2 points

Fermat considered the sequence of functions f(n,x) = x^n for n = 0, 1, 2, 3, ....

Only very kind of. Fermat didn't have a notion of function in the sense meant later, and showed geometrically that the area under certain curves could be computed by something akin to Archimedes' method of exhaustion, if you dropped the geometric rigour and worked algebraically. He wasn't looking at a limit of functions in any sense; he showed that the integral could be computed in general.

The counterexample is only "very simple" in the context of knowing that the correct condition is uniform convergence, and knowing that the classical counterexamples look like x^n as n -> infinity, or bump functions. Counterexamples are not generally obvious upfront; put another way, it's really easy to engage in Whig history in mathematics.
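Uniform convergence asks that the largest gap between x**n and its limit over the interval shrink to zero. A small sketch: on [0, a] with a < 1 the gap is a**n (x**n is increasing there), which does go to zero; but on [0, 1) you can always pick a point such as 1 - 1/(10n), where x**n is still about 0.9, so the gap never shrinks.

```python
def sup_gap(n, a):
    """sup over x in [0, a] of |x**n - 0|, for 0 <= a < 1.
    x**n is increasing on [0, a], so the supremum is a**n."""
    return a ** n

for n in (1, 10, 100, 1000):
    # On [0, 0.9] the gap shrinks; at points creeping towards 1 it stays near 0.9.
    print(n, sup_gap(n, 0.9), sup_gap(n, 1 - 1 / (10 * n)))
```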

Model Stability in Intervention Assessment

5 Jonathan_Lee 06 June 2013 11:24PM

In this post, I hope to examine the Bayesian Adjustment paradigm presented by Holden Karnofsky of Givewell from a mathematical viewpoint, in particular looking at how we can rigorously manage the notion of uncertainty in our models and the stability of an estimate. Several recent posts have touched on related issues.

In practice, we will need to have some substantive prior on the likely range of impacts that interventions can achieve, and I will look briefly at what kinds of log-ranges are supported in the literature, and the extent to which these can preclude extreme impact scenarios. I will then briefly look at less formal notions of confidence in a model, which may be more tractable either computationally or for heuristic purposes than a formal Bayesian approach.
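For orientation, the version of the Bayesian Adjustment usually described treats log-impact as normally distributed under both the prior and the estimate's error, so the update is the standard conjugate normal-normal one: a precision-weighted average that pulls noisy estimates back towards the prior. A minimal sketch under that assumption (the function name and numbers are illustrative, not taken from the post):

```python
import math

def bayesian_adjustment(prior_mean, prior_sd, estimate, estimate_sd):
    """Conjugate normal-normal update, applied to log-impact.

    Returns the posterior mean and standard deviation: a precision-weighted
    average of the prior mean and the estimate.
    """
    prior_precision = 1 / prior_sd ** 2
    estimate_precision = 1 / estimate_sd ** 2
    post_precision = prior_precision + estimate_precision
    post_mean = (prior_precision * prior_mean
                 + estimate_precision * estimate) / post_precision
    return post_mean, math.sqrt(1 / post_precision)

# Hypothetical numbers in natural-log units of impact.
print(bayesian_adjustment(0.0, 1.0, 4.0, 1.0))  # mean 2.0, sd about 0.707
print(bayesian_adjustment(0.0, 1.0, 4.0, 3.0))  # noisier estimate: mean about 0.4
```

How far an extreme estimate survives the adjustment depends heavily on the prior's width, which is why the prior log-range matters so much here.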

