The "model" you never name for the stock price distribution is a normal or Gaussian distribution. You point out that this model fails spectacularly to predict the highest variability point 20 sigma event. What you don't point out that this model fails to predict even the 5 sigma events.
Looking at the plot of "Daily Changes in the Dow," we see it is plotted over fewer than 95 years. Each trading year has a little less than 260 trading days in it. So plotted should be at most 24,000 daily changes. For a normal distribution, a 1/24000 event, the biggest event we would expect to happen once on this whole graph, would be somewhere between 4 and 4.5 sigma.
But instead of seeing one, or close to one 4.5 sigma or higher events in 24,000 daily changes, we see about 28 by my rough count looking at the graph. The data starts in 1915. By 1922 we have seen 5 "1 in100 years" events.
My point being: by 1922, we know the model that daily changes in the stock market fit a normal or gaussian distribution is just pure crap. We don't need to wait 70 years for 20 sigma event to know the model isn't just wrong, it is stupid. We suspect this fact within the first year or two of data, and we know with high reliability that the model is garbage by 1922.
What I don't understand is why anybody rational would ever state the data fits a normal distribution, in the trivially obvious event that even within a few hundred events, we see an overwhelmingly obvious excess of large daily changes from what a normal distribution would predict.
One need not go to a mathematically well-described alternative distribution to produce a much better prediction of how often 20 sigma events will actually occur. Simply plotting the histogram of daily changes will show the tail probability falling off much more slowly than a normal distribution predicts. Before the 20 sigma event occurs, one might by estimating from the actual distribution of changes underestimate its probably by a factor of a few (or not, I have not done the exercise on this data), but one will not have underestimated it by tens of orders of magnitude.
There is no "uncertainty" in modeling daily stock market changes as normally (or log-normally) distributed. One can look at just the first few 100 points in the measured data, to be CERTAIN that a normal or a log-normal distribution is simply wrong.
The model used is the Black-Scholes model with, as you point out, a normal distribution. It endures, despite being clearly wrong, because there doesn't seem to be any good alternatives.
Most models attempting to estimate or predict some elements of the world, will come with their own estimates of uncertainty. It could be the Standard Model of physics predicting the mass of the Z boson as 91.1874 ± 0.0021 GeV, or the rather wider uncertainty ranges of economic predictions.
In many cases, though, the uncertainties in or about the model dwarf the estimated uncertainty in the model itself - especially for low probability events. This is a problem, because people working with models often try to use the in-model uncertainty and adjust it to get an estimate of the true uncertainty. They often realise the model is unreliable, but don't have a better one, and they have a measure of uncertainty already, so surely doubling and tripling this should do the trick? Surely...
The following three cases are going to be my go-to examples for showing what a mistake this can be; they cover three situations: extreme error, being in the domain of a hard science, and extreme negative impact.
Black Monday
On October 19, 1987, the world's stock markets crashed, shedding a huge value in a very short time. The Dow Jones Industrial Average dropped by 22.61% that day, losing between a fifth and a quarter of its value.
How likely was such an event, according to the prevailing models of the time? This was apparently a 20-sigma event, which means that the event was twenty standard deviations away from the expected behaviour.
Such events have a probability of around 10-50 of happening, which in technical mathematical terms is classified as "very small indeed". If every day were compressed into a second, and the stock markets had been running since the big bang... this gives us only about 1017 seconds. If every star in the observable universe ran its own stock market, and every day were compressed into a second, and we waited a billion times the lifetime of the universe... then we might expect to observe a twenty sigma event. Once.
No amount of reasonable "adjusting" of the probability could bring 10-50 into something plausible. What is interesting, though, is that if we took the standard deviation as the measure of uncertainty, the adjustment is much smaller. One day in a hundred years is a roughly 3x10-5 event, which corresponds very roughly to three standard deviations. So "simply" multiplying the standard deviation by seven would have been enough. It seems that adjusting ranges is more effective than adjusting probabilities.
Castle Bravo
But economics is a soft science. Surely errors couldn't occur in harder sciences, like physics? In nuclear bomb physics, where the US had access to the best brains and the best simulations, and some of the best physical evidence (and certainly a very high motivation to get it right), such errors could not occur? Ok, maybe at the very beginning, but by 1954, the predictions must be very accurate?
The Caste Bravo hydrogen bomb was the highest yield nuclear bomb ever detonated by the United States (though not necessarily intended to be). The yield was predicted to be 6 megatons of TNT, within a maximum range of 4 to 8. It ended up being an explosion of 15 megatons of TNT, around triple the expectation, with fallout landing on inhabited islands(on the Rongelap and Utirik atolls) and spreading across the world, killing at least one person (a Japanese fisherman).
What went wrong? The bomb designers considered that the lithium-7 isotope used in the bomb was essentially inert, when in fact it... wasn't. And that was sufficient to triple the yield of the weapon, far beyond what the designers had considered possible.
Again, extending the range is more successful than adjusting the probabilities of high events. And even the hard sciences are susceptible to great errors.
Physician, wash thyself
The previous are good examples of dramatic underestimates of uncertainty by the model's internal measure of uncertainty. They are especially good because we have numerical estimates of the internal uncertainty. But they lack one useful rhetorical component: evidence of a large scale disaster. Now, there are a lots of of models to choose from which dramatically underestimated the likelihood of disaster, but I'll go with one problem in particular: the practice that doctors used to have of not washing their hands.
Ignaz Semmelweis noted in 1847 that women giving birth in the presence of doctors died at about twice the rate of those attended only by midwives. He correctly deduced that doctors were importing something from their experiments on cadavers, and that this was causing the women to die. He instituted a policy of using a solution of chlorinated lime for washing hands between autopsy work and the examination of patients - with great success, sometimes even reducing the death rate to zero in some months.
This caused a problem. Semmelweis was unable to convince other doctors about this, for a variety of standard biases. But the greatest flaw was that he had no explanation for this behaviour. Yes, he could point at graphs and show improvements - but there was nothing in standard medical theory that could account for it. The other doctors could play around with their models or theories for days, but they could never explain this type of behaviour.
His claims, in effect, lacked scientific basis. So they ignored them. Until Pasteur came up with a new model, there was just no way to understand these odd results.
The moral of this is that sometimes the uncertainty can not only be much greater than that of the model. Sometimes the uncertainty isn't even visible anywhere in the model.
And sometimes making these kinds of mistakes can lead to millions of unnecessary deaths, and the decisions made (using doctors for childbirths) have the absolute opposite effect than was intended.