Comment author: wedrifid 21 June 2010 06:50:56AM *  0 points [-]

from the perspective of a calculus student who has gone through the standard run of American school math,

That's the problem. See that bunch of symbols? That isn't the best way to teach stuff. It is like trying to teach them math while speaking a foreign language (even if technically we are saving the Greek till next month). To teach that concept you start with the kind of picture I was previously describing, have them practice that till they get it, then progress to diagrams that change once in the middle, etc.

Perhaps the students here were prepared differently, but the average student started having problems with calculus when it reached a point slightly beyond what you need for the basic physics we were talking about here, i.e. they would be able to do 1. but have no chance at all with 2:

Comment author: taiyo 21 June 2010 07:13:24AM 2 points [-]

I'm not claiming that working from the definition of the derivative is the best way to present the topic. But it is certainly necessary to present the definition if calculus is being taught in a math course. Part of doing math is being rigorous. Doing derivatives without the definition is just calling on a black box.

On the other hand, once one has the intuition for the concept in hand through more tangible things like pictures, graphs, velociraptors, etc., the definition falls out so naturally that it ceases to be something which is memorized and becomes something that can be produced "on the fly".

Comment author: wedrifid 21 June 2010 04:16:54AM *  0 points [-]

Most people who learn it have a very hard time doing so, and they're already well above average in mathematical ability.

Well above average mathematical ability and cannot do calculus to the extent of understanding rates of change? For crying out loud. You multiply by the number at the top right of the letter, then reduce that number by 1. Or you do the reverse in the reverse order. You know, like you put on your socks then your shoes, but have to take off your shoes before you take off your socks.

Sometimes drawing a picture helps prime an intuitive understanding of the physics. You start with a graph of velocity vs time. The slope is the 'acceleration'. See... it is getting faster each second. Now, use a pencil and progressively color in under the line. That's the distance that is getting covered. See how, later on, when it is going faster, more distance is being traveled at one time and we have to shade in more area? Now, remember how we can find the area of a triangle? Well, will you look at that... the maths came out the same!
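The shading exercise can be checked numerically; a minimal sketch (the acceleration of 2 m/s^2 and the 5-second window are made up for illustration): summing thin strips under the line v(t) = a*t recovers the triangle-area answer.

```python
# Numerically "shade in" the area under a velocity-time line v(t) = a*t
# and compare it with the triangle-area formula (1/2) * base * height.

def distance_by_shading(a, t_end, steps=100000):
    """Approximate the area under v(t) = a*t with thin strips."""
    dt = t_end / steps
    area = 0.0
    for i in range(steps):
        t = (i + 0.5) * dt          # midpoint of each thin strip
        area += a * t * dt          # strip area = height * width
    return area

a, t_end = 2.0, 5.0                 # acceleration 2 m/s^2 for 5 seconds
shaded = distance_by_shading(a, t_end)
triangle = 0.5 * t_end * (a * t_end)   # (1/2) * base * height

print(shaded, triangle)             # both come out as 25.0 metres
```

The two numbers agree: shading under the line and the triangle formula are the same computation.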

Comment author: taiyo 21 June 2010 06:24:06AM *  5 points [-]

I teach calculus often. Students don't get hung up on mechanical things like (x^3)' = 3x^2. They instead get hung up on what

    lim_{h -> 0} [f(x+h) - f(x)] / h

has to do with the derivative as a rate of change or as a slope of a tangent line. And from the perspective of a calculus student who has gone through the standard run of American school math, I can understand. It does require a level up in mathematical sophistication.
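The limit definition of the derivative can at least be watched in action numerically; a minimal sketch (f(x) = x^3 and x = 2 are chosen purely for illustration): the difference quotient creeps toward the power-rule answer 3x^2 = 12 as h shrinks.

```python
# The difference quotient from the definition of the derivative:
# (f(x+h) - f(x)) / h should approach f'(x) as h -> 0.

def difference_quotient(f, x, h):
    return (f(x + h) - f(x)) / h

f = lambda x: x ** 3
x = 2.0
for h in (1.0, 0.1, 0.01, 0.001):
    # Each value is closer to 3 * 2**2 = 12 than the last.
    print(h, difference_quotient(f, x, h))
```

Watching the numbers converge is no substitute for the epsilon-delta story, but it connects the formal definition to the mechanical rule.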

Comment author: sketerpot 21 June 2010 04:50:23AM 2 points [-]

I was thinking more basic: induction, recursion, reasoning about trees. Understanding those things on an intuitive level is one of the main barriers that people face when they learn to program. It's one thing to be able to solve problems out of a textbook involving induction or recursion, but another thing to learn them so well that they become obvious -- and it's that higher level of understanding that's important if you want to actually use these concepts.
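For concreteness, here is the flavor of thing meant by recursion and reasoning about trees; a minimal sketch (the nested-tuple representation is just one convenient choice):

```python
# A binary tree as nested tuples (value, left, right), with None for an
# empty subtree, and a recursive function that sums every value in it.

def tree_sum(node):
    if node is None:              # base case: an empty tree contributes 0
        return 0
    value, left, right = node
    # recursive case: this node plus the sums of both subtrees
    return value + tree_sum(left) + tree_sum(right)

tree = (1,
        (2, None, None),
        (3, (4, None, None), None))
print(tree_sum(tree))             # 1 + 2 + 3 + 4 = 10
```

The inductive argument that this is correct (it works on the empty tree; if it works on the subtrees, it works on the whole) is exactly the kind of reasoning that has to become second nature.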

Comment author: taiyo 21 June 2010 06:04:57AM 0 points [-]

I'm not sure about all the details, but I believe that there was a small kerfuffle a few decades ago over a suggestion to change the apex of U.S. "school mathematics" from calculus to a sort of discrete math for programming course. I cannot remember what sort of topics were suggested though. I do remember having the impression that the debate was won by the pro-calculus camp fairly decisively -- of course, we all see that school mathematics hasn't changed much.

Comment author: sketerpot 21 June 2010 02:34:04AM *  10 points [-]

I've got a tangential question: what math, if learned by more people, would give the biggest improvement in understanding for the effort put into learning it?

Take calculus, for example. It's great stuff if you want to talk about rates of change, or understand anything involving physics. There's the benefit; how about the cost? Most people who learn it have a very hard time doing so, and they're already well above average in mathematical ability. So, the benefit mostly relates to understanding physics, and the cost is fairly high for most people.

Compare this with learning basic probability and statistical thinking. I'm not necessarily talking about learning anything in depth, but people should have at least some exposure to ideas like probability distributions, variance, normal distributions and how they arise, and basic design of experiments -- blinding, controlling for variables, and so on. This should be a lot easier to learn than calculus, and it would give insight into things that apply to more people.
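The "how normal distributions arise" point lends itself to a quick simulation; a minimal sketch (30 dice and 20,000 samples are arbitrary choices): sums of independent dice rolls pile up into a bell curve with a predictable mean and variance.

```python
# Sums of many independent random quantities tend toward a bell curve
# (the central limit theorem). A dice simulation:

import random
random.seed(0)

def sum_of_dice(n_dice):
    return sum(random.randint(1, 6) for _ in range(n_dice))

samples = [sum_of_dice(30) for _ in range(20000)]
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)

# Theory: one die has mean 3.5 and variance 35/12, so 30 dice give
# mean 30 * 3.5 = 105 and variance 30 * 35/12 = 87.5.
print(mean, var)
```

A histogram of `samples` would show the familiar bell shape, even though a single die is as far from normal as you can get.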

I'll give a concrete example: racism. Typical racist statements, like "black people are lazy and untrustworthy," couldn't possibly be true in more than a statistical sense, and obviously a statistical statement about a large group doesn't apply to every member of that group -- there's plenty of variance to take into account. Basic statistical thinking makes racist bigotry sound preposterously silly, like someone claiming that the earth is flat. This also applies to every other form of irrational bigotry that I can think of off the top of my head.

Remember when Larry Summers suggested that maybe part of the reason for the underrepresentation of women in Harvard's science faculty was that women may have lower variance in intelligence than men, and so are underrepresented in the highest part of the intelligence bell curve? What almost everybody heard was "Women can't be scientists because they're stupid." People heard a statistical statement and had no idea how to understand it.
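The variance-and-tails point is easy to demonstrate with made-up numbers; a minimal sketch (the means, standard deviations, and threshold are invented for illustration, not real data about any group): two populations with the *same* mean but slightly different variances end up with very different representation in the far tail.

```python
# Equal means, different spreads: the higher-variance population
# dominates the extreme tail of the distribution.

import random
random.seed(0)

def count_above(mu, sigma, threshold, n=100000):
    """Count how many of n normal draws exceed the threshold."""
    return sum(1 for _ in range(n) if random.gauss(mu, sigma) > threshold)

# Same mean (100), standard deviations 15 vs 14, looking 3 sigma out.
high_var = count_above(100, 15, 145)
low_var = count_above(100, 14, 145)
print(high_var, low_var)   # the sigma=15 group is far more common above 145
```

A small difference in spread, invisible near the middle of the curve, roughly doubles the headcount in the extreme tail -- which is a statement about distributions, not about any individual.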

There are important, relevant subjects that people just can not understand without basic statistical thinking. I would like to see most people exposed to basic statistical thinking.

Are there any other kinds of math that offer high bang-for-the-buck, as far as learning difficulty goes? (I've always thought that the math behind computer programming was damn useful stuff, but the engineering students I've talked with usually find it harder than calculus, so maybe that's not the best idea.)

Comment author: taiyo 21 June 2010 03:54:44AM 2 points [-]

Probability theory as extended logic.

I think it can be presented in a manner accessible to many (Jaynes PT:LOS is not accessible to many).

Comment author: taiyo 20 June 2010 07:22:54PM 4 points [-]

Jaynes references Polya's books on the role of plausible reasoning in mathematical investigations. The three volumes are How to Solve It, and the two volumes of Mathematics and Plausible Reasoning. They are all really fun and interesting books which kind of give a glimpse of the cognitive processes of a successful mathematician.

Particularly relevant to Jaynes' discussion of weak syllogisms and plausibility is a section of Vol. 2 of Mathematics and Plausible Reasoning which gives many other kinds of weak syllogisms. Things like: "A is analogous to B, B true, so A is more credible."

Just a heads up in case anyone wants to see more of this sort of thing (as at least one person on IRC #lesswrong did).

There are also fun exercises -- for example: cryptic crossword clues as an exercise in plausible reasoning.

Comment author: Morendil 15 June 2010 12:32:09AM 3 points [-]

Questions for the first part of Chapter 1:

  • Compare Jaynes' framing of probability theory with your previous conceptions of "probability". What are the differences?
  • What do you make of Jaynes' observation that plausible inference is concerned with logical connections, and must be carefully distinguished from physical causation?

(If you can think of other/better questions, please ask away!)

Comment author: taiyo 17 June 2010 05:58:09AM 2 points [-]

Along with the distinction between causal and logical connections, when considering the conditional premise of the syllogisms (if A then B), Jaynes warns us to distinguish between those conditional statements of a purely formal character (the material conditional) and those which assert a logical connection.

It seems to me that the weak syllogisms only "do work" when the conditional premise is true due to a logical connection between antecedent and consequent. If no such connection exists, or rather, if our mind cannot establish such a connection, then the plausibility of the antecedent doesn't change upon learning the consequent.

For example, "if the garbage can is green then frogs are amphibians" is true since frogs are amphibians, but this fact about frogs does not increase (or decrease) the probability that the garbage can is green since, presumably, most of us don't see a connection between the two propositions.
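The effect of a connection can be put in numbers with Bayes' theorem; a minimal sketch (all the probabilities are invented for illustration): when a logical connection makes B more likely under A than under not-A, learning B raises the plausibility of A; when B is equally likely either way, nothing moves.

```python
# Bayes' theorem: P(A|B) = P(B|A) P(A) / P(B), where
# P(B) = P(B|A) P(A) + P(B|not A) P(not A).

def posterior(prior_a, p_b_given_a, p_b_given_not_a):
    p_b = p_b_given_a * prior_a + p_b_given_not_a * (1 - prior_a)
    return p_b_given_a * prior_a / p_b

# Connected case: B is much more likely if A holds, so learning B
# raises the plausibility of A above its prior of 0.3.
print(posterior(0.3, 0.9, 0.2))

# Unconnected case (garbage cans and frogs): B is equally likely
# either way, so the posterior equals the prior.
print(posterior(0.3, 0.7, 0.7))
```

This is exactly the weak syllogism "doing work" only when the conditional premise reflects a genuine connection.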

At some point in learning logic, I think I kind of lost touch with the common language use of conditionals as asserting connections. I like that Jaynes reminds us of the distinction.

Comment author: taiyo 10 June 2010 09:56:00PM 0 points [-]

I think this is not so important, but it is helpful to think about nonetheless. I guess the first step is to define what is meant by 'Bayesian'. In my original comment, I took one necessary condition to be that a Bayesian gadget is one which follows from the Cox-Polya desiderata. It might be better to define it to be one which uses Bayes' Theorem. I think in either case, Maxent fails to meet the criteria.

Maxent produces the distribution on the sample space which maximizes entropy subject to any known constraints which presumably come from data. If there are no constraints, then one gets the principle of indifference which can also be gotten straight out of the Cox-Polya desiderata as you say. But I think these are two different approaches to the same target. Maxent needs something new -- namely Shannon's information entropy (by 'new' I mean new w.r.t. Cox-Polya). Furthermore, the derivation of Maxent is really different from the derivation of the principle of indifference from Cox-Polya.

I could be completely off here, but I believe the principle of indifference argument is generalized by the transformation group stuff. I think this because I can see the action of the symmetric group (this is the group (group in the abstract algebra sense) of permutations) on the hypothesis space in the principle of indifference stuff. Anyway, hopefully we'll get up to that chapter!
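The "something new" that maxent adds shows up as soon as a constraint enters; a minimal sketch of Jaynes' dice problem (the target mean of 4.5 is an arbitrary illustrative choice): with a mean constraint, the maximum-entropy distribution is no longer uniform but exponentially tilted, p_i proportional to exp(lam * i), with the multiplier lam found here by bisection.

```python
# Maximum entropy on dice faces 1..6 subject to a mean constraint.
# The solution has the exponential form p_i ~ exp(lam * i); we find
# lam by bisection so that the resulting mean hits the target.

import math

def maxent_dice(target_mean, lo=-10.0, hi=10.0):
    def mean_for(lam):
        weights = [math.exp(lam * i) for i in range(1, 7)]
        total = sum(weights)
        return sum(i * w for i, w in zip(range(1, 7), weights)) / total
    # mean_for is increasing in lam, so bisection converges.
    for _ in range(100):
        mid = (lo + hi) / 2
        if mean_for(mid) < target_mean:
            lo = mid
        else:
            hi = mid
    lam = (lo + hi) / 2
    weights = [math.exp(lam * i) for i in range(1, 7)]
    total = sum(weights)
    return [w / total for w in weights]

p = maxent_dice(4.5)
print(p)   # probabilities tilt toward the higher faces
```

With no constraint the same machinery returns the uniform distribution, but the constrained case plainly uses Shannon's entropy as an extra ingredient beyond Cox-Polya.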

Comment author: taiyo 13 June 2010 01:10:03AM 1 point [-]

Upon further study, I disagree with myself here. It does seem like entropy as a measurement of uncertainty in probability distributions does more or less fall out of the Cox-Polya desiderata. I guess that 'common sense' one is pretty useful!

Comment author: Morendil 10 June 2010 07:51:13AM *  2 points [-]

As a warm-up, and to indicate how I intend to prompt discussion (subject to the group's feedback) I have posted a summary of the Preface. (ETA: for instance, implications of this method are that it's up to participants to check back on the post from time to time to see if new summaries have been posted; then after reading the parts summarized, come back and answer this comment. Does that work?)

I will start work today on a summary of as much of Chapter 1 as might make for a nice bite-sized chunk to discuss, and post that in a few days, or sooner if the discussion on the Preface dies down quickly.

Discussion question for the Preface: can you think of further examples of the type of "old ideas" Jaynes refers to?

Comment author: taiyo 11 June 2010 12:18:56AM 2 points [-]

I wonder if Jaynes' statement is really true? Here is an example that is on my mind because I'm reading the (thus far) awesome book The Making of the Atomic Bomb. Apologies if I get details wrong:

In the 1930s, there was a lot of work done on neutron bombardment of uranium. At some point, Fermi fired slow-moving neutrons at uranium and got a bunch of interesting reaction products that he concluded were most plausibly transuranic elements. I believe he came to this conclusion because the models of the day discounted the hypothesis that a slow-moving neutron could do anything but release a "small" particle like a helium nucleus or something, and furthermore there was experimental work done to discount the lower elements that were in the vicinity of uranium.

Some weird experimental data by Joliot and Curie which seemed inconsistent with the prevailing model came up later. Hahn and Strassmann seemed not to believe their results, and so tried to replicate them and found similar anomalies. A careful chemical analysis of the reaction products of uranium bombardment found elements like barium -- much lower on the periodic table. Meitner and Frisch came along and provided a new model which turned out to be right.

So here was data that, when analyzed with respect to old models, seemed implausible. The data was questioned, but then replicated, studied, and then understood. The result was that the old model had to be cast aside for something new. The reason is that the data was incompatible with the old model (or at least implausible enough under it) that a new model needed to be created.

Isn't this narrative the way knowledge often goes? New data comes along and blows up old ideas because the new data is inconsistent with or implausible in the old model. Does this jibe with Jaynes' statement?

Comment author: gwern 10 June 2010 08:36:58PM 2 points [-]

In particular, I think Maximum entropy methods are not Bayesian in the sense that they do not follow from the Cox-Polya desiderata.

IIRC, this was my understanding of Jaynes's position on maxent:

  1. the Cox-Polya desiderata say that multiple allowed derivations of a problem ought to all lead to the same answer
  2. if we consider a list of identifiers about which we know nothing, and we ask whether the first one is more likely than the nth one, then we should answer that they are equal, because if we say either greater than or less than, we could shuffle the list and get a contradictory answer. By induction, we ought to say that all members of the list are equiprobable, which only allows entries to be 1/n probable.
  3. hence, we get the Principle of Indifference. (Points 1-3 are my version of chapter 2 or 3, IIRC.)
  4. Maxent is just the same idea, abstract and applied to non-list thingies. (I haven't actually gotten this far, but it seems like the obvious next step.)

The arguments seem to me to be as Bayesian as anything in his building up of Bayesian methods from the Cox-Polya criteria.
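Point 3 can be sanity-checked in a few lines; a minimal sketch (the four-outcome space and the particular perturbation are arbitrary): with no constraint beyond normalization, Shannon entropy is maximized by the uniform distribution, which is the Principle of Indifference in another guise.

```python
# Shannon entropy of a discrete distribution, and a check that any
# perturbation away from uniform (still summing to 1) loses entropy.

import math

def entropy(ps):
    return -sum(p * math.log(p) for p in ps if p > 0)

uniform = [0.25, 0.25, 0.25, 0.25]
tilted = [0.30, 0.25, 0.25, 0.20]   # same total probability, less spread out

print(entropy(uniform), entropy(tilted))   # uniform wins
```

The uniform distribution attains the maximum value log(4); shuffling the list, as in point 2, cannot change the entropy, which is the symmetry behind the equiprobability argument.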

Comment author: Larks 10 June 2010 09:54:21AM *  1 point [-]

Re: Preface

Is there a good reason why the Maximum Entropy method is treated as distinct from the Bayesian, rather than simply as a method for generating priors?

Comment author: taiyo 10 June 2010 07:31:29PM *  1 point [-]

I think Jaynes more or less defines 'Bayesian methods' to be those gadgets which fall out of the Cox-Polya desiderata (i.e. probability theory as extended logic). Actually, this can't be the whole story given the following quote on page xxiii:

"It is true that all 'Bayesian' calculations are included automatically as particular cases of our rules; but so are all 'frequentist' calculations. Nevertheless, our basic rules are broader than either of these."

In any case, Maximum entropy gives you the pre-Bayesian ensemble (I got that word from here) which then allow the Bayesian crank to turn. In particular, I think Maximum entropy methods are not Bayesian in the sense that they do not follow from the Cox-Polya desiderata.
