A while back I read How to Measure Anything and found it fascinating. In my day job, I spend quite a bit of time trying to make sense of software systems by looking at dashboards of requests, latencies, error rates, and so on.
After finishing the book and taking copious notes, I understood that it gave me a prepackaged process that I could apply as-is, but I found it very difficult to adapt to everyday situations. I don't think I picked up a good intuition about stats, in other words.
I'm looking to change that. Specifically, I want to learn to apply stats in these two situations:
- measuring things. Mostly software systems, but open to little experiments. Dan Luu used to measure a lot of fun things.
- understanding how others measure things. I'd like to be able to judge if claims made in a paper about covid spread or social media addiction are backed up by the math/data in the paper.
The challenge I'm facing is that I know a bunch of techniques, but not how they relate to each other or to the problems they're meant to solve. To illustrate what I mean: I know how to get percentiles and calculate means, but until this morning I didn't know why averaging percentiles is usually a bad idea. I'm missing the map.
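For anyone who hasn't run into the percentile pitfall, here is a minimal sketch with made-up numbers. The two hosts, their request counts, the latencies, and the nearest-rank `percentile` helper are all assumptions for illustration, not data from a real system:

```python
import math

def percentile(values, p):
    """Nearest-rank percentile: the smallest value such that at least
    p percent of the data is at or below it (p is an integer 1..100)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

# Hypothetical latencies in milliseconds.
host_a = [10] * 1000              # busy host: 1000 requests, all fast
host_b = [10] * 90 + [1000] * 10  # quiet host: 100 requests, 10% slow

# Averaging the per-host p99 values treats both hosts as equally
# important, even though host_a served ten times more requests.
averaged_p99 = (percentile(host_a, 99) + percentile(host_b, 99)) / 2

# The p99 of the combined traffic is computed from the raw requests.
pooled_p99 = percentile(host_a + host_b, 99)

print("average of per-host p99s:", averaged_p99)
print("p99 of pooled requests:  ", pooled_p99)
```

In this toy setup fewer than 1% of all requests are slow, so the pooled p99 is still a fast request, while the averaged figure lands on a latency that no request actually had. Percentiles depend on the whole distribution and on how many samples sit behind each one, so they generally can't be combined the way sums or counts can.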
I've seen these books recommended as a good way to start:
- Statistics, 4th Edition, by Freedman, Pisani, and Purves
- Probability Theory: The Logic of Science, by Jaynes
- An Introduction to Error Analysis: The Study of Uncertainties in Physical Measurements, by Taylor
- Think Stats, by Downey
But I also wanted to ask someone familiar with the field:
- Is it best to start with an introductory textbook and branch out from there?
- Are there specific subfields / topics I should be focusing on (or avoiding)?
- Is what I'm looking to learn labeled in some way? For example, I can't tell if this is data analytics or data science or X.
Being able to accurately assess a paper's claims is, unfortunately, a very high bar. A large proportion of scientists fall short of it. See: https://statmodeling.stat.columbia.edu/2022/03/05/statistics-is-hard-etc-again/
Most people with a strong intuition for statistics have taken courses in probability. It is foundational material for the discipline.
If you haven't taken a probability course, and if you're serious about wanting to learn stats well, I would strongly recommend starting there. I think Harvard's intro probability course is good and has free materials: https://projects.iq.harvard.edu/stat110/youtube
I've taught out of Freedman, but not the other texts. It's well written, but it is targeted at a math-phobic audience. A fine choice if you do not wish to embark on the long path.
Thanks, this is incredibly useful.
I think I understand enough to put together a curriculum to delve into this topic, starting with the Harvard course you recommended.