Learning Mathematics in Context

Crux

I have almost no direct knowledge of mathematics. I took various mathematics courses in school, but I put in the minimal amount of effort required to pass and immediately forgot everything afterwards.

When people learn foreign languages, they often learn vocabulary and grammar out of context. They drill vocabulary and grammar in terms of definitions and explanations written in their native language. I, however, have found this to be intolerably boring. I'm conversational in Japanese, but every ounce of my practice came in context: either hanging out with Japanese friends who speak limited English, or watching shows and adding to Anki new words or sentence structures I encounter.

I'm convinced that humans must spike their blood sugar and/or pump their body full of stimulants such as caffeine in order to get past the natural tendency to find it unbearably dull to memorize words and syntax by rote and lifeless connection with the structures in their native language.

I've tried to delve into some mathematics recently, but I get the impression that most of the expositions fall into one of two categories: Either (1) they assume that I'm a student powering my day with coffee and chips and that I won't find it unusual if I'm supposed to just trust that once I spend 300 hours pushing arbitrary symbols around I'll end up with some sort of insight. Or (2) they do enter the world of proper epistemological explanations and deep real-world relevance, but only because they expect that I'm already quite well-versed in various background information.

I don't want an introduction that assumes I'm the average unthinking student, and I don't want an exposition that expects me to understand five different mathematical fields before I can read it. What I want seems likely to be uncommon enough that I might as well simply say: I don't care what field it is; I just want to jump into something which assumes no specifically mathematical background knowledge but nevertheless delves into serious depths that assume a thinking mind and a strong desire for epistemological sophistication.

I bought Calculus by Michael Spivak quite a while ago because the Amazon reviews led me to believe it may fit these considerations. I don't know whether that's actually the case or not though, as I haven't tried reading it yet.

Any suggestions would be appreciated.

I once took a math course where the first homework assignment involved sending the professor an email that included what we wanted to learn in the course (this assignment was mostly for logistical reasons: professor's email now autocompletes, eliminating a trivial inconvenience of emailing him questions and such, professor has all our emails, etc). I had trouble answering the question, since I was after learning unknown unknowns, thereby making it difficult to express what exactly it was I was looking to learn. Most mathematicians I've talked to agree that, more or less, what is taught in secondary school under the heading of "math" is not math, and it certainly bears only a passing resemblance to what mathematicians actually do. You are certainly correct that the thing labelled in secondary schools as "math" is probably better learned differently, but insofar as you're looking to learn the thing that mathematicians refer to as "math" (and the fact you're looking at Spivak's Calculus indicates you, in fact, are), looking at how to better learn the thing secondary schools refer to as "math" isn't actually helpful. So, let's try to get a better idea of what mathematicians refer to as math and then see what we can do.

The two best pieces I've read that really delve into the gap between secondary school "math" and mathematician's "math" are Lockhart's Lament and Terry Tao's Three Levels of Rigour. The common thread between them is that secondary school "math" involves computation, whereas mathematician's "math" is about proof. For whatever reason, computation is taught with little motivation, largely analogously to the "intolerably boring" approach to language acquisition; proof, on the other hand, is mostly taught by proving a bunch of things which, unlike computation, typically takes some degree of creativity, meaning it can't be taught in a rote manner. In general, a student of mathematics learns proofs by coming to accept a small set of highly general proof strategies (to prove a theorem of the form "if P then Q", assume P and derive Q); they first practice them on the simplest problems available (usually set theory) and then on progressively more complex problems. To continue Lockhart's analogy to music, this is somewhat like learning how to read the relevant clef for your instrument and then playing progressively more difficult music, starting with scales. [1] There's some amount of symbol-pushing, but most of the time, there's insight to be gleaned from it (although, sometimes, you just have to say "this is the correct result because the algebra says so", but this isn't overly common).
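As a small illustration of the "assume P and derive Q" strategy mentioned above, here is a minimal sketch in Lean 4. The theorem name `swap_conj` and the choice of statement are my own, picked only because it is about the simplest implication available; nothing here comes from the books under discussion.

```lean
-- Goal: "if P then Q", where P is (A ∧ B) and Q is (B ∧ A).
-- `swap_conj` is a hypothetical name chosen for this illustration.
theorem swap_conj (A B : Prop) : A ∧ B → B ∧ A := by
  intro h            -- assume P: h is a proof of A ∧ B
  exact ⟨h.2, h.1⟩   -- derive Q: pair the B part with the A part
```

The `intro` line is the formal counterpart of writing "assume P" in a prose proof; everything after it is the "derive Q" part.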

Proofs themselves are interesting creatures. In most schools, there's a "transition course" that takes aspiring math majors who have heretofore only done computation and trains them to write proofs; any proofy math book written for any other course just assumes this knowledge but, in my experience (both personally and working with other students), trying to make sense of what's going on in these books without familiarity with what makes a proof valid or not just doesn't work; it's not entirely unlike trying to understand a book on arithmetic that just assumes you understand what the + and * symbols mean. This transition course more or less teaches you to speak and understand a funny language mathematicians use to communicate why mathematical propositions are correct; without taking the time to learn this funny language, you can't really understand why the proof of a theorem actually does show the theorem is correct, nor will you be able to glean any insight as to why, on an intuitive level, the theorem is true (this is why I doubt you'd have much success trying to read Spivak, absent a transition course). After the transition course, this funny language becomes second nature, it's clear that the proofs after theorem statements, indeed, prove the theorems they claim to prove, and it's often possible, with a bit of work [2], to get an intuitive appreciation for why the theorem is true.

To summarize: the math I think you're looking to learn is proofy, not computational, in nature. This type of math is inherently impossible to learn in a rote manner; instead, you get to spend hours and hours by yourself trying to prove propositions [3], which isn't dull but may take some practice to appreciate (as noted below, if you're at the right level, this activity should be flow-inducing). The first step is to do a transition, which will teach you how to write proofs and discriminate correct proofs from incorrect ones; there will probably be some set theory.

So, you want to transition; what's the best way to do it?

Well, super ideally, the best way is to have an experienced teacher explain what's going on, connecting the intuitive with the rigorous, and be available to answer questions. For most things mathematical, assuming a good book exists, I think it can be learned entirely from a book, but this is an exception. That said, How to Prove It is highly rated, I had a good experience with it, and others I've recommended it to have done well. If you do decide to take this approach and have questions, PM me your email address and I'll do what I can.

[1] This analogy breaks down somewhat when you look at the arc musicians go through. The typical progression for musicians I know is (1) start playing in whatever grade the music program of the school they attend starts, (2) focus mainly on ensemble (band, orchestra) playing, (3) after a high (>90%) attrition rate, we're left with three groups: those who are in it for easy credit (orchestra doesn't have homework!); those who practice a little, but are too busy or not interested enough to make a consistent effort; and those who are really serious. By the time they reach high school, everyone in this third group has private instructors and, if they're really serious about getting good, goes back and spends a lot of time practicing scales. Even at the highest level, musicians review scales, often daily, because they're the most fundamental thing: I once had the opportunity to ask Gloria dePasquale what the best way to improve general ability is, and she told me that there are 12 major scales and 36 minor scales and, IIRC, that she practices all of them every day. Getting back to math, there's a lot here that's not analogous to math. Most notably, there's no analogue to practicing scales, no fundamental-level thing that you can put large amounts of time into practicing and get general returns to mathematical ability: there's just proofs, and once you can tell a valid proof from an invalid proof, there's almost no value that comes from studying set theory proofs very closely. There's certainly an aesthetic sense that can be refined, but studying whatever proofs happen to be at or slightly above your current level is probably the most helpful (as with flow): if it's too easy, you're just bored and learn nothing (there's nothing there to learn), and if it's too hard, you get frustrated and still learn nothing (since you're unable to understand what's going on).
[2] "With a bit of work", used in a math text, means that a mathematically literate reader who has understood everything up until the phrase's invocation should be able to come up with the result themselves, and that it will require no real new insight; for example, "with a bit of work, it can be shown that, for every positive integer n, (1 + 1/n)^n < e < (1 + 1/n)^(n+1)". This does not preclude needing to do several pages of scratch work or spending a few minutes trying various approaches until you figure out one that works; the tendency is for understatement. Relatedly, most math texts will often leave proofs that require no novel insights or weird tricks as exercises for the reader. In Linear Algebra Done Right, for instance, Axler will often state a theorem followed by "as you should verify", which should require some writing on the reader's part; he explicitly spells this out in the preface, but this is standard in every math text I've read (and I only bother reading the best ones). You cannot read mathematics like a novel; as Axler notes, it can often take over an hour to work through a single page of text.
[3] Most math books present definitions, state theorems, and give proofs. In general, you definitely want to spend a bit of time pondering definitions; notice why they're correct and how they match your intuition, and see why other definitions weren't used. When you come to a theorem, you should always take a few minutes to try to prove it before reading the book's proof. If you succeed, you'll probably learn something about how to write proofs better by comparing what you have to what the book has, and if you fail, you'll be better acquainted with the problem and thus have more of an idea as to why the book's doing what it's doing; it's just an empirical result (which I read ages ago and cannot find) that you'll understand a theorem better by trying to prove it yourself, successful or not. It's also good practice. There's some room for Anki (I make cards for definitions, with the word on the front and the definition on the back, and for theorems, for which reviews consist of outlining enough of a proof that I'm confident I could write it out fully if I wanted to), but I spend the vast majority of my time trying to prove things.
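For the inequality quoted in the note on "with a bit of work", a quick numerical sanity check can help before attempting the proof. This is only a sketch in Python; the function name `e_bounds` is mine, and checking finitely many values of n in floating point is of course evidence, not a proof.

```python
import math

def e_bounds(n):
    """Return ((1 + 1/n)**n, (1 + 1/n)**(n + 1)) for a positive integer n."""
    base = 1 + 1 / n
    return base ** n, base ** (n + 1)

# Print both bounds next to e for a few values of n.
for n in (1, 10, 100, 1000):
    lower, upper = e_bounds(n)
    print(n, lower, math.e, upper, lower < math.e < upper)
```

One route to the actual proof, if I have it right, is to take logarithms: the claim becomes 1/(n+1) < ln(1 + 1/n) < 1/n, which follows from the standard bounds x/(1+x) < ln(1+x) < x for x > 0.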

Your comment made me think, and I'll look up some of the recommendations. I like the analogy with musicians and also the part where you talked about how the analogy breaks down.

However, I'd like to offer a bit of a different perspective to the original poster on this part of what you said.

To summarize: the math I think you're looking to learn is proofy, not computational, in nature.

Your advice is good, given this assumption. But this assumption may or may not be true. Given that the post says:

I don't care what field it is.

I think there's the possibility that the original poster would be interested in computational mathematics.

Also, it's not either/or. It's a false dichotomy. Learning both is possible and useful. You likely know this already, and perhaps the original poster does as well, but since the original poster is not familiar with much math, I thought I'd point that out in case it's something that wasn't obvious. It's hard to tell, writing on the computer and imagining a person at the other end.

If the word "computational" is being used to mean following instructions by rote without really understanding why, or doing the same thing over and over with no creativity or insight, then it does not seem to be what the original poster is looking for. However, if it is used to mean creatively understanding real world problems, and formulating them well enough into math that computer algorithms can help give insights about them, then I didn't see anything in the post that would make me warn them to steer clear of it.

There are whole fields of human endeavor that use math and include the term "computational" and I wouldn't want the original poster to miss out on them because of not realizing that the word may mean something else in a different context, or to think that it's something that professional mathematicians or scientists or engineers don't do much. Some mathematicians do proofs most of the time, but others spend time on computation, or even proofs about computation.

Fields include computational fluid dynamics, computational biology, computational geometry...the list goes on.

Speaking of words meaning different things in different contexts, that's one thing that tripped me up when I was first learning some engineering and math beyond high school. When I read more advanced books, I knew when I was looking at an unfamiliar word that I had to look it up, but I hadn't realized that some words that I already was familiar with had been redefined to mean something else, given the context, or that the notation had symbols that meant one thing in one context and another thing in another context. For example, vertical bars on either side of something could mean "the absolute value of" or it could mean "the determinant of this matrix", and "normal forces" meant "forces perpendicular to the contact surface". Textbooks are generally terribly written and often leave out a lot.

In other words, the jargon can be sneaky and sound exactly like words that you already know. It's part of why mathematical books seem so nonsensical to outsiders.
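To make the vertical-bar example concrete, here is a small sketch in Python. The helper `det2` and the sample values are mine, chosen only for illustration.

```python
def det2(m):
    """Determinant of a 2x2 matrix given as [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    return a * d - b * c

x = -3
A = [[1, 2], [3, 4]]

print(abs(x))    # |x| read as "absolute value of x": never negative
print(det2(A))   # |A| read as "determinant of A": can be negative
```

Same bars, different operations: a reader who assumes |A| must be non-negative because |x| always is will be confused a page later.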

Excellent points; "rigorous" would have been a better choice. I haven't yet had the time to study any computational fields, but I'm assuming the ones you list aren't built on the "fuzzy notions, and hand-waving" that Tao talks about.

I should also add that I don't necessarily agree 100% with everything in Lockhart's Lament; I do think, however, that he does an excellent job of identifying problems in how secondary school math is taught and does a better job than I could of contrasting "follow the instructions" math with "real" math for a layperson.

Interesting. One of my recurring themes is that mathematics and statistics are very different things and require different kinds of brains/thinking -- people good at one will rarely be good at the other, too.

If you define mathematics as being about proofs (and not so much about computation), the distinction becomes more pronounced: statistics isn't about proofs at all, it's about dealing with uncertainty. There are certainly areas where they touch (e.g. proving that certain estimators have certain properties), but at their core, mathematics and statistics are not similar at all.

I'm skeptical that there is any such distinction. "Computational" math is near-worthless in the absence of a proof of correctness for what you're computing. Even statistics relies on such proofs, though sometimes these can only provide approximate results. (For instance, maximum-likelihood methods are precisely optimal under the simplifying assumption of uniform priors and a 0-1 loss function.)
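To spell out the parenthetical, here is a sketch in notation that is mine rather than anything standard to this thread (θ for the parameter, x for the data). Under a 0-1 loss the Bayes-optimal point estimate is the posterior mode (exactly so for a discrete parameter, and as a limiting case for a continuous one), and with a flat prior the posterior is proportional to the likelihood:

```latex
\hat{\theta}_{\mathrm{MAP}}
  = \arg\max_{\theta} \, p(\theta \mid x)
  = \arg\max_{\theta} \, \frac{p(x \mid \theta)\, p(\theta)}{p(x)}
  = \arg\max_{\theta} \, p(x \mid \theta)
  = \hat{\theta}_{\mathrm{MLE}}
  \qquad \text{when } p(\theta) \text{ is constant in } \theta .
```

The third equality is where the uniform-prior assumption is used: p(x) never depends on θ, and a constant p(θ) drops out of the argmax as well.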

Even statistics relies on such proofs

Statistical tools rely on such proofs.

Statistics is an applied science, similar to engineering. It has to deal with the messy world where you might need to draw conclusions from a small data set of uncertain provenance where some outliers might be data entry mistakes (or maybe not), you are uncertain of the shape of the distributions you are dealing with, have a sneaking suspicion that the underlying process is not stable in time, etc. etc. None of the nice assumptions underlying nice proofs of optimality apply. You still need to analyse this data set.

Statistics is an applied science, similar to engineering.

Except for all that pesky theoretical statistics.

Math people can have that :-) It is, basically, applied math, anyway.

Except it's not math. Disciplines are socially constructed, statistics is what statisticians do. Applied math is what applied math people do. There are lots of very theoretical stats departments. I think you are having a similar confusion people have sometimes about computer science and programming.

I think if you say stuff like "well, all those people who publish in Annals of Statistics are applied math people" I am not sure what you are really saying. There is some intersection w/ applied math, ML, etc., but theoretical stats has its own set of big ideas that define the field and give it character.

I think you are having a similar confusion people have sometimes about computer science and programming.

I don't think I do? I am well aware of Dijkstra's famous quote.

As you mentioned, statistics is what statisticians do. Most statisticians don't work in academia. I don't doubt there are a lot of theory-heavy stats departments, just like there are a lot of physics-heavy engineering departments.

Going up one meta-level, I'm less interested in what discipline boundaries social reality has constructed, and more interested in feeling for the joints in the underlying territory.

Not sure why we are having this discussion. Statistics is a discipline with certain themes, like "intelligently using data for conclusions we want." These themes are sufficient to give it its own character, and make it both an applied and theoretical discipline. I don't think you are a statistician, right? Why are you talking about this?

Statistics is as much an applied discipline as physics.

Why are you talking about this?

Because I'm interested in the subject. Do you have objections?

You can post about whatever you want. I have objections if you start mischaracterizing what statistics is about for fun on the internet. Fun on the internet is great, being snarky on the internet is ok, misleading people is not.

edit: In fact, you can view this whole recent "data science" thing that statisticians are so worried about as a reaction to the statistics discipline becoming too theoretical and divorced from actual data analysis problems. [This is a controversial opinion, I don't think I share it, quite.]

I don't believe I'm mischaracterizing statistics. My original point was an observation that, in my experience, good mathematicians and good statisticians are different. Their brains work differently. To use an imperfect analogy, good C programmers and good Lisp programmers are also quite different. You just need to think in a very different manner in Lisp compared to C (and vice versa). That, of course, doesn't mean that a C programmer can't be passably good in Lisp.

I understand that in academia statistics departments usually focus on theoretical statistics. That's fine -- I don't particularly care about "official" discipline boundaries. For my purposes I would like to draw a divide between theoretical statistics and, let's call it, practical statistics. I find it useful to classify theoretical statistics as applied math, and practical statistics as something different from that.

Data science is somewhat different from traditional statistics, but I'm not sure its distinction lies on the theoretical-practical divide. As a crude approximation, I'd say that traditional statistics is mostly concerned with extracting precise and "provable" information out of small data sets, and data science tends to drown in data and so loves non-parametric models and ML in particular.

Ok, I am not interested in wasting more time on this, all I am saying is:

Math people can have that :-) It is, basically, applied math, anyway.

This is misleading. Theoretical statistics is not applied math, either. I think you don't know what you are talking about, re: this subject.

None of the nice assumptions underlying nice proofs of optimality apply.

Well, this is a matter of degree. There is a reason we use these tools in the first place. A good statistician must be quite aware of the underlying assumptions of each tool, if only so that they can switch to something else when warranted. (For instance, use "robust" methods which try to identify and appropriately discount outliers.)

A good statistician must be quite aware of the underlying assumptions of each tool

Well, of course.

and appropriately discount outliers

Heh. The word "appropriately" is a tricky one. There is a large variety of robust methods which use different ways of discounting outliers, naturally with different results. The statistician will need to figure out what's "appropriate" in this particular case and proofs don't help here.
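A tiny sketch in Python of what "different ways of discounting outliers, naturally with different results" can look like. The data are made up for illustration, and `trimmed_mean` is a hypothetical helper of mine, not a function from the standard library.

```python
import statistics

# Hypothetical measurements; the last one may be a data-entry mistake (or not).
data = [9.8, 10.1, 10.0, 9.9, 10.2, 10.6, 100.0]

def trimmed_mean(values, k=1):
    """Mean after dropping the k smallest and k largest values."""
    ordered = sorted(values)
    return statistics.mean(ordered[k:len(ordered) - k])

plain = statistics.mean(data)      # no discounting: dragged far upward by 100.0
med = statistics.median(data)      # ignores magnitudes, keeps only the middle value
trimmed = trimmed_mean(data)       # discards one value at each end, averages the rest
print(plain, med, trimmed)
```

The median and the trimmed mean both shrug off the 100.0, yet they give different answers, and neither tells you whether 100.0 was a typo or a real event. That choice stays with the analyst.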

What's your goal for which you want to learn math?

My main academic interests relate to the fundamentals of communication (analogous to micro economics), along with the pattern by which information and knowledge flows throughout society (like macro economics).

Until recently my focus has been on natural language, which is why I decided to learn Japanese. Without deep understanding in a second language, my endeavor to understand the process of natural-language communication (including not only words but also gestures and so on) would be hopelessly limited. I've also spent many thousands of hours constructing various artificial verbal languages for personal note-taking and linguistic experimentation.

Over the past few days, however, I've started to turn my attention to mathematics. While languages such as English, Japanese, and so forth are one-dimensional systems isomorphic to a large range of reality and constrained by the oddities of the automatic pathways we call our "natural-language hardware", my understanding is that many fields of mathematics function as more complex and precise isomorphic systems which operate in terms of brain functions more properly called "S2" or "manual". Often they transcend the 1D line of verbal language to 2D diagrammatic representations.

See this passage from Ernst Mach (1838-1916):

Language, the instrument of this communication, is itself an economical contrivance. Experiences are analysed, or broken up, into simpler and more familiar experiences, and then symbolized at some sacrifice of precision. The symbols of speech are as yet restricted in their use within national boundaries, and doubtless will long remain so. But written language is gradually being metamorphosed into an ideal universal character. It is certainly no longer a mere transcript of speech. Numerals, algebraic signs, chemical symbols, musical notes, phonetic alphabets, may be regarded as parts already formed of this universal character of the future; they are, to some extent, decidedly conceptual, and of almost general international use. The analysis of colors, physical and physiological, is already far enough advanced to render an international system of color-signs perfectly practical.

Clearly his vision of mathematics and other pencil-and-paper artificial representational systems growing and eventually combining into a single general-use international language has not come to pass in the intervening 100+ years. Mathematics has remained a specific-use tool that boasts high levels of complexity and precision within its isolated sections of thought representation and world modeling, while having extremely low coverage of the range of topic space. Humans have made huge industrial advancements, but we still fall back on the tribal device we call "words" for most of our communication attempts.

I've spent a huge number of hours designing artificial verbal-language systems which resemble natural languages except without the grammatical irregularities or folk psychology and physics, but I hold no illusion as to the point. It's a stopgap measure that I'm using to gain greater understanding of the limitations of word-based communication in an age where such systems still reign supreme. My hope for the future lies not in words, but in general-use diagrammatic or visual communication systems which include software involvement.

It would be inefficient or even irresponsible of me to attempt to make meaningful contributions within this field without possessing a solid understanding of the historical development and epistemological underpinnings of certain high-bandwidth mathematical systems. The conclusion is that it's unimportant which mathematical field I pursue at least in the beginning, provided the field is important within the context of human societal development and in engaging the material I gain a nuanced understanding of the content and a deep appreciation of how the originators created the system. Only once I develop fluency in a sufficient number of areas will I know which specific fields to consider further.

In short: I'm interested in developing a general-purpose 2D or 3D visual representational system. Attempting such an endeavor without having an appreciation for historical attempts to create non-verbal languages would be careless.

You should definitely learn some model theory, it's about the relationship of language and subject.

provided the field is important within the context of human societal development and in engaging the material I gain a nuanced understanding of the content and a deep appreciation of how the originators created the system.

I'll suggest investigating the problem of "squaring the circle." It has its roots in the origins of mathematics, passes through geometric proofs (including the notions of formal proofs and proof from elementary axioms), was unsolved for 2000 years in the face of myriad attempts, and was proved impossible to solve using the relatively modern techniques of abstract algebra.

The linked site has references (some already mentioned in this thread) that may be helpful ...

R.Courant and H.Robbins, What is Mathematics?, Oxford University Press, 1996

H.Dorrie, 100 Great Problems Of Elementary Mathematics, Dover Publications, NY, 1965.

W.Dunham, Journey through Genius, Penguin Books, 1991

M.Kac and S.M.Ulam, Mathematics and Logic, Dover Publications, NY, 1968.

including ...

R.B.Nelsen, Proofs Without Words, MAA, 1993

which may be of special interest to you.

Most people are bad at understanding. As students they usually prefer to memorize things, because it is a strategy that works best in the short term. When they grow up and become teachers, they recite things to students and expect them to memorize them.

In math, in addition to memorizing facts verbally, there is also a lot of procedural knowledge (solving equations). This is probably one of the reasons most people hate math. But even the procedural knowledge can be taught in the memorizing way; only the verbal memory is replaced by the muscle memory.

Understanding is a step yet beyond procedural knowledge. Most people don't get there; even most teachers don't.

And being able to explain stuff to beginners -- that's the ultimate art. It requires not only having a good understanding of the topic, but also being able to untangle it to a linear thread that can be gradually fed to a human and will allow them to build a proper model of the topic. This requires also an understanding of humans, and an understanding of understanding.

So why aren't most math textbooks better? I guess it's either because there are not enough good mathematicians who also happen to be good at explaining to beginners... or maybe the market for textbooks that teach understanding simply is not big enough.

If you want to learn a specific topic, maybe you could ask about it on LW.

I agree with you that procedural knowledge is frequently based upon memorization. However, you then use this other term: understanding. Are you sure that understanding is distinct from memorization of lots of related concepts and then drawing inferences of the relations between those concepts? Possibly understanding is the memorization of certain concepts which can be applied to a variety of other concepts.

Edit - To put this another way, it seems like you're saying we focus too much on crystallized intelligence and not enough on fluid intelligence. However, it seems to be harder to increase fluid intelligence, and it seems to me that increases in crystallized intelligence can at least partly compensate for deficiencies in fluid intelligence.

I guess these things are not sharply divided, but for me the difference between "memorization" and "understanding" is whether the fact is isolated or connected to a larger network, and whether if you forget it you have a chance to rediscover it.

As a consequence, when people use "memorization", the more they know, the harder it becomes, because they have a longer list of facts to remember. While if they have "understanding", the more they know, the larger is the existing network where they can plug the new things. Human memory is built in a way that makes it easier to recall things that are connected to other things.

A good way to teach math is to have students discover various things, starting at elementary school. You achieve it mostly by providing problems where the solution is already within their reach. The art is to take each "inferential step" and try to split it into multiple smaller substeps whenever possible; and then you just create a list of problems that make the student discover the individual substeps one by one. (The other important aspect is continuous debugging: you ask the students to explain their solutions by "thinking aloud", i.e. being explicit about everything they do, and you observe whether their generated model is correct. If they mess something up, e.g. create a wrong generalization, it is best to provide a problem where their approach will be obviously wrong, so they notice and correct themselves. Another method, if at least some students in the class got the model right, is to let the class debate.)

If you do it right, not only will the students get deep understanding, but they will also like the math, because it will feel like something they discovered for themselves (the IKEA effect?), instead of something that an authority told them to believe (suggesting lower status).

Unfortunately, for many teachers it is difficult to use this method correctly if they are used to teaching (and learning) by memorization. This is a recursive problem, because to use this method properly, you must understand why it works; otherwise you just try to imitate the steps, imitate them incorrectly, and then see that your students got stuck and didn't discover the thing they were supposed to discover. At that point there are two options: (a) say "screw it" and give the information to the students explicitly to memorize, and keep telling everyone that the new system doesn't work; or (b) keep waiting until the magic happens, without doing the necessary steps that would make it happen, and later have the parents complain that their child is in the third grade and still cannot do basic addition.

Are you sure that understanding is distinct from memorization of lots of related concepts and then drawing inferences of the relations between those concepts?

Some people will memorize a lot and still fail to draw the right inferences. Sometimes because they memorized some parts incorrectly, or missed/forgot some important parts. Sometimes it is compartmentalization; it doesn't occur to them that some things can also be used in different contexts. Drawing some inferences from the very beginning is a more reliable approach.

it seems like you're saying we focus too much on crystallized intelligence and not enough on fluid intelligence

I am not sure I use these terms correctly, but it seems to me that the process of discovering some concept requires fluid intelligence, but from then on, the concept itself becomes a part of crystallized intelligence. Actually, it seems the other way round: having the students draw inferences in many little steps requires them to only use a little fluid intelligence at a time; but remembering many unconnected facts and then having to discover the important patterns on their own would require greater fluid intelligence, but only once in a while. Many conveniently small gulps, versus a few large ones that make most people choke.

Ironically, there are two groups of students that seem to achieve the greatest benefits from the teaching method I am trying to describe here -- those who do the mathematical olympiad, and those who completely suck at math. For the former, having a method that relies more on understanding than on memorization allows them to go much further, and to use the skills in unusual contexts. For the latter, having a method that leads them in little steps is the only way to learn anything, instead of remaining stuck at the very beginning.

It is frequently the average student who complains about the method as unnecessary, because average students are already good at memorization (that's what they do all the time at school), and the curriculum is more or less designed to fit into their memory (if that's the style that most teachers and students use, expecting to learn more would be unrealistic).

I graduated in applied math. Some people mock me for it, but I keep both The Complete Idiot's Guide to Calculus and The Complete Idiot's Guide to Statistics prominently on my bookshelf. The calculus book was the one that made me understand math and made me passionate about it.

Unfortunately there isn't a lot out there that is like what you are looking for. Here are some books that I've read that may come close.

I like the books and essays written by Steven Strogatz, a professor at Cornell University. He's written some things intended for the general public, including a pop science book called Sync and a series of essays in the New Yorker. He also writes journal articles and textbooks. He has a way with words and can describe complicated mathematical concepts without equations. Here is his website: http://www.stevenstrogatz.com

However, the two books that most awoke my love of fractals and mathematics were:

Fractals: The Patterns of Chaos: Discovering a New Aesthetic of Art, Science, and Nature by John Briggs http://www.amazon.com/Fractals-Patterns-Discovering-Aesthetic-Touchstone/dp/0671742175/

An Eye For Fractals: A Graphic And Photographic Essay by Michael McGuire http://www.amazon.com/Eye-Fractals-Graphic-Photographic-Nonlinearity/dp/0201554402/

They are introductions. They might not go into as much depth as you want. I'm not sure a single book would.

Life in Moving Fluids by Steven Vogel of Duke University is a mixture of biology, fluid dynamics and mathematics. It can be appreciated without background knowledge, but it skips over some explanations, so probably not all of it would be accessible. But it does delve into serious depths. The first edition has more explanations than the later edition. http://www.amazon.com/Life-Moving-Fluids-Princeton-Paperbacks/dp/0691026165/

Prime Mover: A Natural History of Muscle, by the same author, Steven Vogel, is easier to read but has less math. http://www.amazon.com/Prime-Mover-Natural-History-Muscle/dp/0393021262/

Design in Nature: How the Constructal Law Governs Evolution in Biology, Physics, Technology, and Social Organizations by Adrian Bejan, also a professor at Duke University. That's the introductory book, but if you want more detail, he has other books and scientific papers. I like his book Shape and Structure in Engineering and Nature, but it doesn't explain things very well, and a different book may be a better next step. http://www.amazon.com/Design-Nature-Constructal-Technology-Organizations/dp/0307744345/

Complex Adaptive Systems: An Introduction to Computational Models of Social Life (Princeton Studies in Complexity) by John H. Miller and Scott E. Page. It was pretty readable to me, compared to textbooks in general.

Prime Obsession: Bernhard Riemann and the Greatest Unsolved Problem in Mathematics by John Derbyshire. It's an interesting topic and the book is well-written, but somehow I didn't manage to finish it. So I don't recommend it as strongly as some of the others. http://www.amazon.com/Prime-Obsession-Bernhard-Greatest-Mathematics/dp/0452285259/

Structures: Or Why Things Don't Fall Down by J.E. Gordon. A good introduction to concepts of tension and compression and how they are used in buildings. Uses some math. http://www.amazon.com/Structures-Things-Dont-Fall-Down/dp/0306812835/

What is Mathematics? by Courant and Robbins is a classic exploration that goes reasonably deep into most areas of math.

You could try Gödel, Escher, Bach: An Eternal Golden Braid...

On a slightly meta-level, I only got over my bad attitude towards calculus by taking astronomy and physics courses in which it was used to figure out interesting stuff...

http://www.amazon.com/Calculus-Easy-Way-Series/dp/0812091418/ was pretty good; there are also others in the series. It teaches calculus with a sci-fi story where discovering new calculus ideas helps the protagonist. Basically HPMOR, except more directly didactic, on a specific subject, and with no real plot.

I'm convinced that humans must spike their blood sugar and/or pump their body full of stimulants such as caffeine in order to get past the natural tendency to find it unbearably dull to memorize words and syntax by rote and lifeless connection with the structures in their native language.

Just a comment: This is certainly not true for every human. Some people really enjoy that.

Learn about proofs and practice doing proof problems.

I recommend trying to solve problems. If you don't know how to solve a certain problem, join an IRC channel, a mailing list, or some other math-related community. Look for humans rather than numbers. Something like f(x)=y+5 is just dead characters that only make sense with some supplementary knowledge.

Books are good, but in the long run human communication is probably more important. You will eventually run into a problem you can't solve, or you will have questions, or an answer will be unclear for some reason or another.

Perhaps a Mathematics for Philosophers book like this http://www.amazon.com/dp/1551119099 ?