Equations Mean Things

by abstractapplic

19th Mar 2025

I asserted that this forum could do with more 101-level and/or mathematical and/or falsifiable posts, and people agreed with me, so here is one. People confident in high school math mostly won’t get much out of most of this, but students browsing this site between lectures might.

The Sine Rule

Say you have a triangle with side lengths a, b, c and internal angles A, B, C. You know a, know A, know b, and want to know B. You could apply the Sine Rule. Or you could apply common sense: “A triangle has the same area as itself”. ^[1]

The area of a triangle is half the base times the height. If you treat a as the base, the height is c*sin(B). So the area is a*c*sin(B)/2. But if you treat b as the base, the height is c*sin(A). So the area is also b*c*sin(A)/2. So a*c*sin(B)/2 = b*c*sin(A)/2. And if you divide through by abc/2, you get sin(B)/b=sin(A)/a.
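For anyone who wants to see the two area expressions agree numerically, here is a minimal sketch. The side lengths and the 40-degree angle are arbitrary choices for illustration, and `triangle_from_sas` is a hypothetical helper that builds the rest of the triangle with the law of cosines.

```python
import math

def triangle_from_sas(a, b, angle_C):
    """Return side c and angles A, B (radians) for sides a, b enclosing angle C."""
    c = math.sqrt(a * a + b * b - 2 * a * b * math.cos(angle_C))
    # Law of cosines for angle A (opposite side a).
    A = math.acos((b * b + c * c - a * a) / (2 * b * c))
    B = math.pi - A - angle_C
    return c, A, B

# Arbitrary example triangle (assumed values, not from the post).
a, b = 5.0, 7.0
c, A, B = triangle_from_sas(a, b, math.radians(40))

area_with_base_a = a * c * math.sin(B) / 2  # height onto side a is c*sin(B)
area_with_base_b = b * c * math.sin(A) / 2  # height onto side b is c*sin(A)

print(area_with_base_a, area_with_base_b)   # same area, computed two ways
print(math.sin(A) / a, math.sin(B) / b)     # the Sine Rule ratios
```

Both printed pairs should match up to floating-point rounding.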

In practice, you might be well-advised to just recall and regurgitate the relevant equation. But notice that this is literally equivalent to the informal version.

Bayes’ Theorem

A demon-hunter has a 10% chance of encountering an archdevil on a given mission. A demon-hunter who doesn’t encounter an archdevil has an 80% per-mission survival rate; for a demon-hunter who does, that number is 30%.

Say you know a demon-hunter survived their latest excursion, but don’t know anything else, and want to calculate the probability they encountered an archdevil. You could apply Bayes’ Theorem. Or you could apply common sense: “Things that couldn’t have happened didn’t” and “Probabilities add to 1” (arguably with a little assistance from “Odds ratios aren’t affected by tests that don’t distinguish between them”).

Before you get the good news, the four possible outcomes are:

met archdevil & survived (3%),
met archdevil & died (7%),
avoided archdevil & survived (72%), and
avoided archdevil & died (18%).

After you get the good news, the only paths which could have been taken are met&survived and avoided&survived. But those two only have 75% probability between them, and probabilities add to 1, so they get scaled up appropriately, by multiplying through by 1/0.75. This gives you a 4% chance that they met an archdevil, and a 96% chance they didn’t.

(The part I gloss over is probabilities preserving their proportions.^[2] But, like, of course they do! If you bet that a fair die will roll above three, and later find out you won – eliminating 1, 2 and 3 as hypotheses – do you think something like “all the probability from the eliminated hypotheses must have gone into 4”? No, you think “4, 5, and 6 are equally likely rolls based on what I know, so there’s a 1/3 chance it was 4”.)
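The "cross out what didn't happen, then rescale" procedure can be sketched in a few lines. This is only an illustration using the numbers from the example; `condition_on` is a made-up helper name, and exact fractions are used so no rounding creeps in.

```python
from fractions import Fraction

p_meet = Fraction(1, 10)
p_survive_if_meet = Fraction(3, 10)
p_survive_if_avoid = Fraction(8, 10)

# The four possible outcomes before any news arrives.
outcomes = {
    ("met", "survived"): p_meet * p_survive_if_meet,
    ("met", "died"): p_meet * (1 - p_survive_if_meet),
    ("avoided", "survived"): (1 - p_meet) * p_survive_if_avoid,
    ("avoided", "died"): (1 - p_meet) * (1 - p_survive_if_avoid),
}

def condition_on(outcomes, fate):
    """Drop outcomes inconsistent with `fate`, then rescale the rest to sum to 1."""
    remaining = {k: p for k, p in outcomes.items() if k[1] == fate}
    total = sum(remaining.values())
    return {k: p / total for k, p in remaining.items()}

posterior = condition_on(outcomes, "survived")
print(posterior[("met", "survived")])      # 1/25, i.e. 4%
print(posterior[("avoided", "survived")])  # 24/25, i.e. 96%
```

Dividing every surviving outcome by the same total is exactly the step that keeps their proportions fixed.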

In practice, you might be well-advised to just recall and regurgitate the relevant equation. But notice that this is literally equivalent to the informal version.

Integration By Parts

Say you want to integrate (x^2)(e^x). You could repeatedly apply integration by parts. Or you could repeatedly apply common sense: “If you differentiate something which produces your target, you’ll get your target, but you might also get some other stuff, which you’ll have to deal with”.

If you differentiate (x^2)(e^x), one of the outputs you’ll get is (x^2)(e^x). You’ll also get some other stuff, in this case (2x)(e^x). So you also need to figure out what to differentiate to take care of that. If you differentiate -(2x)(e^x), one of the outputs you’ll get is -(2x)(e^x), which cancels the (2x)(e^x). You’ll also get some other stuff, in this case -2(e^x). So you also need to figure out what to differentiate to take care of that. If you differentiate 2(e^x), one of the outputs you’ll get is 2(e^x), which cancels the -2(e^x). But you’ll also get some other stuff, in this case 0.^[3] So you also need to figure out what to differentiate to take care of that.^[4] If you differentiate any number that doesn’t have a variable next to it you get 0; this can be represented by a “c”. So the total answer is (x^2)(e^x) – (2x)(e^x) + 2(e^x) + c (by convention we say +c even when -c would make more sense; it doesn’t matter, since “any number” can be negative just as easily as positive).
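The same "cancel the other stuff, repeat" loop can be written out mechanically for any polynomial times e^x. This is a rough sketch, not a general integrator: polynomials are assumed to be coefficient lists with the lowest degree first, and both function names are made up for illustration.

```python
def derivative(poly):
    """Derivative of a polynomial given as coefficients, lowest degree first."""
    return [k * coeff for k, coeff in enumerate(poly)][1:]

def integrate_poly_times_exp(poly):
    """Return q such that d/dx [q(x) e^x] = poly(x) e^x (the "+ c" is left implicit)."""
    q = [0] * len(poly)
    leftover, sign = list(poly), 1
    while leftover:
        # Adding (sign * leftover) e^x to the answer produces the term we want
        # when differentiated, plus "other stuff": (sign * leftover') e^x.
        # The next pass adds a term with the opposite sign to cancel it.
        for k, coeff in enumerate(leftover):
            q[k] += sign * coeff
        leftover, sign = derivative(leftover), -sign
    return q

# x^2 is [0, 0, 1]; the result [2, -2, 1] means (2 - 2x + x^2) e^x.
print(integrate_poly_times_exp([0, 0, 1]))  # [2, -2, 1]
```

The loop stops when the leftover polynomial has been differentiated away to nothing, which is the "in this case 0" step.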

In practice, you might be well-advised to just recall and regurgitate the relevant equation. But notice that this is literally equivalent to the informal version.

Conclusion

You didn’t need to know any of this. You can just apply the equations and get the right answers. And of course you already assumed they were all proven somehow, even if you didn’t know the details; I won’t insult you by claiming you needed to be taught that. The dumb, subtle thing I’m trying to gently bludgeon into you – which I worry your teachers didn’t – is the closeness with which the math can match the meaning, if you take the time to make sense of it.

^[1] It’s possible to solve this even more simply with “a triangle has the same height as itself”, but that doesn’t map as cleanly to the standard expression.
^[2] “When you eliminate the impossible, whatever remains has probability proportional to the probability it had before you eliminated the impossible.” - Sherlock Holmes, probably, before Watson butchered the quote.
^[3] Technically you were getting this at every step, but it’s easier to treat all the 0s as one big 0.
^[4] Wait, are you saying you can conceptualize the “+c” thing as a consequence of integration by parts? I never thought of it that way! Buddy, you can conceptualize most things as most other things, ask any poet. But to answer your question . . . yes, you can.

Richard_Kennaway (7d)

A small correction: the probability of "avoided archdevil & died" should be 18%, not 8%. This isn't used in the subsequent calculation, but if the question had been "Looks like he's not coming back. What's the chance an archdevil got him?" it would. (28% = 7/(7+18).)

abstractapplic (7d)

Can't believe I missed that; edited; ty!

XelaP (6d)

I think you could've done better with integration by parts.

In physics, integration by parts is usually applied to a definite integral in which you can neglect the uv term. Thus, integration by parts reads: "The integral of u dv = minus the integral of v du; that is, you can trade which factor of a product you differentiate, as long as the functions in question have a small integral over the boundary".

Common examples are when you integrate over some big volume, as most physical quantities are very small far away from the stuff.
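A quick numerical sanity check of this "drop the boundary term" reading, under assumed example functions: u = e^(-x^2), which is tiny far from the origin, and v = sin(x). The midpoint-rule `integrate` helper and the cutoff L = 10 are illustrative choices, not anything from the comment.

```python
import math

def integrate(f, lo, hi, steps=20_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    width = (hi - lo) / steps
    return width * sum(f(lo + (i + 0.5) * width) for i in range(steps))

def u(x):
    return math.exp(-x * x)           # decays fast far from the origin

def du(x):
    return -2 * x * math.exp(-x * x)  # derivative of u

def v(x):
    return math.sin(x)

def dv(x):
    return math.cos(x)                # derivative of v

L = 10.0  # "far away": u is about e^-100 here
lhs = integrate(lambda x: u(x) * dv(x), -L, L)   # integral of u dv
rhs = -integrate(lambda x: v(x) * du(x), -L, L)  # minus the integral of v du
boundary = u(L) * v(L) - u(-L) * v(-L)           # the neglected uv term

print(lhs, rhs, boundary)
```

The first two numbers should agree closely, and the boundary term should be negligible by comparison.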

I also think the intuition behind Bayes' rule as usually interpreted here on LW is worth mentioning: it provides the updating rule posterior odds = prior odds * likelihood ratio, and thereby also provides a formalization of how good evidence is. As for the derivation from P(A|B) defined as equal to P(A and B)/P(B), I think this is best described by saying that P(A|B) is the probability of A once you know B, so you take the mass associated to the worlds where A is true once B is true and compare it to your total mass, which is the mass associated to the worlds where B is true. The former is really just "mass of A and B", so you are done.

Now, P(A and B) = P(B)P(A|B), which I think of as "First, take the probability that B is true, then given that we are in this set of worlds, take the probability that A is true". Essentially translating from locating sets to probabilities.

From here, Bayes' theorem is the simple fact that A and B = B and A. So P(B)P(A|B) = P(A and B) = P(A)P(B|A). If you draw a square with 4 rectangles where the first row is P(A), where the second row is P(-A), where the first column is P(B), and where the second is P(-B), and each rectangle represents a possibility like P(A and -B), then this equation just splits the rectangle P(A and B) into (rectangle compared to column) * column = (rectangle compared to row) * row. Divide by P(B) (that is, the column) to get Bayes' law.
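As a small worked check of both the odds form and the rectangle picture, here is a sketch that reuses the demon-hunter numbers from the post, taking A = "met an archdevil" and B = "survived". The variable names are illustrative only.

```python
from fractions import Fraction

# Odds form: posterior odds = prior odds * likelihood ratio.
prior_odds = Fraction(10, 90)        # met : avoided = 1 : 9
likelihood_ratio = Fraction(30, 80)  # P(survived | met) / P(survived | avoided)
posterior_odds = prior_odds * likelihood_ratio
posterior_prob = posterior_odds / (1 + posterior_odds)
print(posterior_odds, posterior_prob)  # 1/24 1/25

# Rectangle picture: the same rectangle P(A and B), split two ways.
p_a = Fraction(10, 100)        # row: met
p_b = Fraction(75, 100)        # column: survived
p_a_and_b = Fraction(3, 100)   # the rectangle itself
p_a_given_b = p_a_and_b / p_b  # rectangle compared to column
p_b_given_a = p_a_and_b / p_a  # rectangle compared to row

print(p_b * p_a_given_b == p_a * p_b_given_a)  # True
print(p_a_given_b)                             # 1/25
```

Both routes land on the same 4% figure as the post.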

For the sine rule, I think it also helps to show that the fraction a/sin(A) is the diameter of the circumcircle. Wikipedia has good pictures.
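The circumcircle fact is easy to check numerically. The sketch below assumes an arbitrary example triangle given by coordinates, finds the circumcenter with the standard Cartesian formula, and compares each side-over-sine ratio with the diameter. `circumradius` and `angle_at` are hypothetical helper names.

```python
import math

def circumradius(p, q, r):
    """Radius of the circle through p, q, r, via the Cartesian circumcenter formula."""
    (ax, ay), (bx, by), (cx, cy) = p, q, r
    d = 2 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    ux = ((ax**2 + ay**2) * (by - cy) + (bx**2 + by**2) * (cy - ay)
          + (cx**2 + cy**2) * (ay - by)) / d
    uy = ((ax**2 + ay**2) * (cx - bx) + (bx**2 + by**2) * (ax - cx)
          + (cx**2 + cy**2) * (bx - ax)) / d
    return math.dist((ux, uy), p)

def angle_at(vertex, p, q):
    """Interior angle at `vertex` between the directions to p and q."""
    v1 = (p[0] - vertex[0], p[1] - vertex[1])
    v2 = (q[0] - vertex[0], q[1] - vertex[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    return math.acos(dot / (math.hypot(*v1) * math.hypot(*v2)))

# Arbitrary example vertices (assumed values).
vA, vB, vC = (0.0, 0.0), (4.0, 0.0), (1.0, 3.0)
a, b, c = math.dist(vB, vC), math.dist(vA, vC), math.dist(vA, vB)
A, B, C = angle_at(vA, vB, vC), angle_at(vB, vA, vC), angle_at(vC, vA, vB)

diameter = 2 * circumradius(vA, vB, vC)
print(a / math.sin(A), b / math.sin(B), c / math.sin(C), diameter)
```

All four printed values should agree up to rounding.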

For an extra math fact that totally doesn't need to be in the post, it is interesting that for spherical triangles, the law of sines just needs to be modified so that you take the sine of the lengths as well. In fact you can do something similar in hyperbolic space (by using sinh), and there's a Taylor series form involving the curvature for a version of sine that makes the law of sines still true in any constant-curvature space. (You can find this on the same wiki page.)

AnthonyC (7d)

“which I worry your teachers didn’t”

Oh it can be so much worse than that - actively pushing students away from that kind of understanding. I've had math teachers mark answers wrong because I (correctly) derived a rule I'd forgotten instead of phrasing it the way they taught it, or because they couldn't follow the derivation. Before college, I can think of maybe two of my teachers who actually seemed to understand high school math in any deeper way.

AnthonyC (6d)

Wanted to add:

I think this post is great for here on LW, but if someone wanted to actually start teaching students to understand math more deeply, calling it common sense probably comes off as condescending, because it doesn't feel that way until you get comfortable with it. There's a lot to unlearn and for a lot of people it is very intimidating.

Personally I wish we treated math class at least some of the time as a form of play. We make sure to teach kids about jokes and wordplay and do fun science-y demonstrations, but math is all dry and technical. We assign kids books to read like A Wrinkle in Time and The Phantom Tollbooth. But, I don't think my elementary school teachers had any clue what a tesseract was, or what the Mathemagician and Dodecahedron are all about, and so that whole aspect of these books was just a lost opportunity for all but maybe 3 kids in my grade.

danielechlin (6d)

More specifically, the correctness of the proof (at least in the triangles case) is common sense; coming up with the proof is not.

The integrals idea gets sketchy. Try it with e^(1/x). It's just a composition of functions, so reverse the chain rule, then deal with any extra terms that come up. Of course, it has no elementary antiderivative. There's not really any utility in overextending common sense to include things that might or might not work. And you're very close to implying "it's common sense" is a proof for things that sound obvious but aren't.

AnthonyC (6d)

Sure. And I'm of the opinion that it is only common sense after you've done quite a lot of the work of developing a level of intuition for mathematical objects that most people, including a significant proportion of high school math teachers, never got.

Lorxus (7d)

Not much to add apart from "this is clean and really good, thanks!".

Grayson Chao (7d)

Overall, I think this is a much better way to teach math - in some sense it's similar to removing date memorization from history classes, which I also agree with. I do have an issue with the phrase "a triangle has the same area as itself." A more user-friendly intuition for me is "if you describe the same thing two ways, it's still the same thing." This seems more generalizable and also gets more directly at the point that sin(A)/a is a complete description of a triangle's proportions.

Haotian (6d)

Agreed that this is a better way to teach maths. However, teaching like this requires students who want to learn like this, and they are almost always going to be a minority. For folks who are interested in or enjoy this perspective, I encourage you to read A Mathematician's Lament by Paul Lockhart.
