All of Younes Kamel's Comments + Replies

I know about index funds. Even those are not nearly as safe as people think. It is a fallacy to assume that because the S&P 500 grows 7% a year on average, you will get a 7%/year return rate on your investment. Your true expected return is lower than that. People have a hard time predicting how they will behave in particular situations. They swear they won't sell after a crash, and yet they do. You might say you are not like that, but probabilistically speaking you probably are. You might get sick and need to get cash quickly and sell while the market is d... (read more)
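A minimal sketch of the gap I have in mind, with entirely made-up numbers (the 7%/17% yearly return distribution, the chance of panic-selling after a bad year, and the two years spent in cash afterwards are illustrative assumptions, not estimates):

```python
# Rough Monte Carlo sketch of the gap between an index's average growth and a
# real investor's expected outcome when panic-selling is possible.
# All parameters here are illustrative assumptions, not calibrated estimates.
import random

YEARS = 30

def simulate(years=YEARS, mu=0.07, sigma=0.17, crash=-0.20,
             p_sell=0.5, years_out=2):
    """Return (buy_and_hold_wealth, panicky_wealth) after `years` years."""
    hold, panicky = 1.0, 1.0
    out = 0  # years left on the sidelines after a panic sale
    for _ in range(years):
        r = random.gauss(mu, sigma)      # assumed yearly index return
        hold *= 1 + r
        if out > 0:
            out -= 1                     # sitting in cash, earning nothing
        else:
            panicky *= 1 + r
            if r < crash and random.random() < p_sell:
                out = years_out          # panic: sell and miss the next years
    return hold, panicky

random.seed(0)
n = 20_000
runs = [simulate() for _ in range(n)]
avg_hold = sum(h for h, _ in runs) / n
avg_panic = sum(p for _, p in runs) / n
print(f"buy and hold : mean final wealth {avg_hold:.2f} "
      f"(~{avg_hold ** (1 / YEARS) - 1:.1%}/year)")
print(f"panic seller : mean final wealth {avg_panic:.2f} "
      f"(~{avg_panic ** (1 / YEARS) - 1:.1%}/year)")
```

The exact numbers don't matter; the point is that any behaviour conditioned on drawdowns pulls the expected realized return below the index's unconditional average.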

Your intuition is a model.

Sure, you can use a broad definition of "model" to include any decision-making process. But I used the word "model" to refer to probabilistic and quantitative models.

A few examples of topics where you really don't include any cost/benefit estimates in your decision (as opposed to strawman examples of INCORRECT cost/benefit use) would go a long way.

Sure. An example from my life is "I refrain from investing in the stock market because we do not understand how it works and it is too uncertain". I don't rely on cost-benefit analysis in ... (read more)

1eggsyntax
I think you haven't really responded to Dagon's key point here: You express concern about Caplan underestimating the importance of climate change. What if I think the risk of the Large Hadron Collider collapsing the false vacuum is a much bigger deal, and that any resources currently going to reduce or mitigate climate change should instead go to preventing false vacuum collapse? Both concerns have lots of unknown unknowns. On what grounds would you convince me -- or a decisionmaker controlling large amounts of money -- to focus on climate change instead? Presumably you think the likelihood of catastrophic climate change is higher -- on what basis?

Probabilistic models may get weaker as we move toward deeper uncertainty, but they're what we've got, and we've got to choose how to direct resources somehow. Even under level 3 uncertainty, we don't always have the luxury of seeing a course of action that would be better in all scenarios (e.g. I think we clearly don't in my example -- if we're in the climate-change-is-higher-risk scenario, we should put most resources toward that; if we're in the vacuum-collapse-is-higher-risk scenario, we should put our resources there instead).
7Noosphere89
While I agree that the Efficient Market Hypothesis basically means you shouldn't pick stocks, indexes like the S&P 500 are pretty good to invest in because you get the risk-free rate. That's usually around 7% long term. Focus on long-term growth, and don't time the market. You can invest, as long as you are willing to hold an index for decades.

It seems intuitive to me, someone who admittedly doesn't know much about the subject, that unknown unknowns could be accurately modeled most of the time with a long-tailed normal distribution

How fat-tailed do you make it? You said you use past extreme events to choose a distribution. But what if the past largest event is not the largest possible event? What if the past does not predict the future in this case?

You can say "the largest truck that ever crossed my bridge was 20 tons, therefore my bridge has to be able to sustain 20 tons" but that is a log... (read more)
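A toy sketch of what I mean (the Pareto tail index, the sample sizes, and everything else here are arbitrary assumptions, not fitted to any real data):

```python
# Toy illustration: with a heavy-tailed (Pareto) distribution, the largest
# event in the historical record is a poor ceiling for the largest event to
# come. The tail index alpha and the sample sizes are arbitrary assumptions.
import random

def pareto_sample(n, alpha=1.5, x_min=1.0):
    # Inverse-CDF sampling from a Pareto(alpha) distribution with minimum x_min.
    return [x_min / (1 - random.random()) ** (1 / alpha) for _ in range(n)]

random.seed(0)
trials = 2_000
n_events = 1_000          # length of the "past" record and of the "future" one
exceed = 0
ratios = []
for _ in range(trials):
    past_max = max(pareto_sample(n_events))
    future_max = max(pareto_sample(n_events))
    exceed += future_max > past_max
    ratios.append(future_max / past_max)

ratios.sort()
print(f"P(future max exceeds past max)     : {exceed / trials:.2f}")  # ~0.5
print(f"median of future_max / past_max    : {ratios[trials // 2]:.2f}")
print(f"95th percentile of that ratio      : {ratios[int(0.95 * trials)]:.1f}")
```

However long the record, the next equally long stretch beats the old maximum about half the time, and with a fat tail the overshoot can be large, which is why designing to the historical maximum is risky.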

Answer by Younes Kamel*50

I wrote a post summarizing misuses of statistics here. You can read that if you want a short version. If you want to learn to evaluate studies and gauge their rigor, then read Intuitive Biostatistics by Harvey Motulsky and Statistics Done Wrong by Alex Reinhart. These were my main sources for my post. After reading them you should have a good understanding of statistics intuitively, without necessarily knowing the math. If you have to read only one, then definitely go for Intuitive Biostatistics. It includes perhaps 90% of the content of the other book and... (read more)

Perhaps the most important takeaway from our study is hidden in plain sight: the field is in danger of being drowned by noise. Different optimizers exhibit a surprisingly similar performance distribution compared to a single method that is re-tuned or simply re-run with different random seeds. It is thus questionable how much insight the development of new methods yields, at least if they are conceptually and functionally close to the existing population.

 

This is from the authors' conclusion. They do also acknowledge that a couple optimizers seem to b... (read more)

You're right, I should have written "but it turns out most of them could be beaten by the untuned version of several competitors on the five datasets", as one can see in the figures. Thank you for pointing it out, I'll edit the post.

3TLW
You're inadvertently P-hacking, I think. There are some that are beaten on specific tasks by the untuned version of several competitors, but these improve in other tasks.

Consider the following simple model:

1. Each algorithm has a normal distribution of scenario-specific performances with mean = <some parameter dependent on algorithm> and stddev = sqrt(2)/2.
2. Each scenario has a normal distribution of run-specific performances with mean = <scenario-specific performance for this algorithm and task> and stddev = sqrt(2)/2.

I have algorithm X, with mean = 0. Overall run performance is roughly[1] a normal distribution with mean = 0 and stddev = 1. I improve this into algorithm Y, with mean = 1. Overall run performance is roughly[1] a normal distribution with mean = 1 and stddev = 1. Algorithm X will have a higher mean scenario-specific performance than algorithm Y ~16%[2] of the time!

1. ^ I don't know offhand if this is exact; it appears close at the very least.
2. ^
   >>> import random
   >>> count = 10**6
   >>> sum(random.normalvariate(mu=0, sigma=2**0.5/2) > random.normalvariate(mu=1, sigma=2**0.5/2) for _ in range(count)) / count
   0.158625
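To make the selection effect concrete, here is a quick extension of the footnote's snippet to a small toy benchmark (same made-up model, five tasks, a single run per task; all numbers are illustrative):

```python
# Same toy model as above, extended to a small benchmark: Y is better than X
# by one unit, and each of five tasks is scored from a single run.
# All numbers are illustrative.
import random

def task_score(algo_mean):
    # scenario-specific mean, then run-specific noise (each stddev sqrt(2)/2)
    scenario_mean = random.gauss(algo_mean, 2 ** 0.5 / 2)
    return random.gauss(scenario_mean, 2 ** 0.5 / 2)

random.seed(0)
n_tasks, n_benchmarks = 5, 10_000
x_wins_somewhere = 0
for _ in range(n_benchmarks):
    wins = sum(task_score(0.0) > task_score(1.0) for _ in range(n_tasks))
    x_wins_somewhere += wins > 0
print(f"P(the worse algorithm X beats Y on >=1 of {n_tasks} tasks): "
      f"{x_wins_somewhere / n_benchmarks:.2f}")   # roughly 0.75 here
```

So a method that is genuinely worse by a full unit still wins at least one task in most benchmarks of this size; pointing at the tasks where the untuned competitor comes out ahead is exactly that selection effect.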

I'm not as versed in mistakes of meta-analysis yet, but I'm working on it! Once I compile enough meta-analysis misuses I will add them to the post. Here is one that's pretty interesting:

https://crystalprisonzone.blogspot.com/2016/07/the-failure-of-fail-safe-n.html
 

Many studies still use fail-safe N to account for publication bias when it has been shown to be invalid. If you see a study that uses it you can act as if they did not account for publication bias at all.
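For concreteness, this is roughly all that the common Rosenthal form of the calculation does; a minimal sketch, with hypothetical Z scores:

```python
# Rosenthal-style fail-safe N, shown only to make the criticism concrete.
# It answers: "how many unpublished exactly-null studies (mean Z = 0) would it
# take to pull the combined result above p = .05?" and depends only on the
# observed Z scores. The Z values below are hypothetical.
def fail_safe_n(z_scores, z_alpha=1.645):
    k = len(z_scores)
    return sum(z_scores) ** 2 / z_alpha ** 2 - k

# Hypothetical example: ten small studies, each modestly "significant".
print(round(fail_safe_n([2.0] * 10)))  # ~138, well above the often-quoted 5k + 10 = 60
```

Since it assumes the missing studies average to exactly zero effect and ignores effect sizes and heterogeneity, the reassuringly large numbers it produces say very little about how badly publication bias is distorting the estimate.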

2meedstrom
As someone who wants to do systematic review (meta-analysis with a certain rigidly prescribed structure), I would love to hear about the mistakes to watch out for!

100% agree with defaulting to non-Gaussian distributions. That is what rigorous statistics would look like, imo.

I'm starting to realize that as well. It can give you the intuition without having to memorize theorems. I think I'm going to start using simulations a lot more.

1TLW
I find it's more helpful as a tool to catch wrong intuitions than as a crutch for missing intuition, personally. If you made a mistake with your simulation and you had the wrong intuition (or right intuition), you know something is up (unless the mistake happened to line up with a wrong intuition, at least). If you made a mistake with your simulation and you had no intuition, you're off in the weeds.

Some general pieces of advice, from someone who does a surprising number of quick simulations for sanity-checking:

1. Try to introduce small amounts of correlation in everything. In actuality, everything[1] is correlated to some degree. Most of the time this does not matter. Every once in a while, it makes a huge difference.
2. Try to introduce small amounts of noise into everything. In actuality, everything[2] has noise to some degree. Most of the time this does not matter. Every once in a while, it makes a huge difference.
3. Beware biased RNGs. Both the obvious and the not so obvious. Most of the time this does not matter. Every once in a while, it makes a huge difference.
4. Beware floating-point numbers in general. You can write something quickly using floats. You can write something safely using floats. Have fun doing both at once.
   1. Corollary: if you can avoid division (or rewrite to avoid division), use integers instead of floats. Especially if you're in a language with native bigints.
   2. Rerunning with two different floating-point precisions (e.g. Decimal's getcontext().prec) can be a decent sanity check, although it's not a panacea[3].
5. Beware encoding the same assumptions into your simulation as you did in your intuition.
6. R... I can't really say.
7. Python is decent. Python is also slow.
   1. numpy is faster if you're doing block operations. If you can (and know how to) restructure your code to take advantage of this, numpy can be quick. If you don't, numpy can be even slower than standard Python.
   2. PyPy can offer a significant p
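As an illustration of point 1, a minimal sketch (the 1% failure rate, the ~0.1 pairwise correlation, and the ">= 4 failures" cutoff are all made-up choices):

```python
# Illustration of point 1: a weak shared correlation between risks that are
# usually modeled as independent barely changes any single risk, but it
# changes the "many fail at once" tail a lot. All parameters are made up.
import random

def joint_failure_rate(n_risks=20, shared_weight=0.32, trials=200_000):
    # Each risk fails when a latent score crosses the ~99th-percentile
    # threshold of a standard normal. The score mixes a shared factor with an
    # idiosyncratic one; shared_weight**2 ~= 0.1 is the pairwise correlation.
    threshold = 2.326
    idio = (1 - shared_weight ** 2) ** 0.5
    bad = 0
    for _ in range(trials):
        shared = random.gauss(0, 1)
        fails = sum(
            shared_weight * shared + idio * random.gauss(0, 1) > threshold
            for _ in range(n_risks)
        )
        bad += fails >= 4  # "several things go wrong at once"
    return bad / trials

random.seed(0)
print(f"P(>=4 of 20 risks fail together), weakly correlated: {joint_failure_rate():.5f}")
print(f"same, fully independent (shared_weight = 0)        : {joint_failure_rate(shared_weight=0.0):.5f}")
```

Each individual risk still fails about 1% of the time in both versions; only the joint tail changes, which is exactly the kind of thing an "independent by default" simulation quietly gets wrong.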

Yes, for sure. You can still fall for selective skepticism, where you scrutinize studies you don't like much more than studies you "like". You can deal with that by systematically applying the same checklist to every study you read, but that might be time consuming. The real solution is probably a community that is versed in statistics and that has open debates on the quality of studies; perhaps, cumulatively, biases will cancel each other out if the community has enough diversity of thought. Hence the value of pluralism.

1meedstrom
First off, I like the compilation you made and I'm tempted to memorize it despite all I'm saying.

This 'pluralism' solution does not feel meaty -- your last sentence "Hence the value of pluralism" sounds to me like an applause light. I mean yeah, ultimately you and I build a lot of what we know on trust in the whole collective of scientists. But it's not directly relevant/useful to say so; there should be a halfway good solution for yourself as a solo rationalist, and calibrating yourself against others' beliefs is an extra measure you may apply later. Because I still prefer all those others to have used good solo toolkits for themselves: it makes them more reliable for me too.

Tentatively, for a real solution, I propose that it's better to focus on what right statistics looks like so that wrong statistics will automatically generate a feeling of puzzlement, and this way you still anyways get the ability to compare the quality of two studies. Or you could learn each type of misuse as part of thoroughly learning the concept where they apply, with focus on better understanding that concept, not on learning about the misuse.