The history of science has tons of examples of the same thing being discovered multiple time independently; wikipedia has a whole list of examples here. If your goal in studying the history of science is to extract the predictable/overdetermined component of humanity's trajectory, then it makes sense to focus on such examples.
But if your goal is to achieve high counterfactual impact in your own research, then you should probably draw inspiration from the opposite: "singular" discoveries, i.e. discoveries which nobody else was anywhere close to figuring out. After all, if someone else would have figured it out shortly after anyways, then the discovery probably wasn't very counterfactually impactful.
Alas, nobody seems to have made a list of highly counterfactual scientific discoveries, to complement wikipedia's list of multiple discoveries.
To that end: what are some examples of discoveries which nobody else was anywhere close to figuring out?
A few tentative examples to kick things off:
- Shannon's information theory. The closest work I know of (notably Nyquist) was 20 years earlier, and had none of the core ideas of the theorems on fungibility of transmission. In the intervening 20 years, it seems nobody else got importantly closer to the core ideas of information theory.
- Einstein's special relativity. Poincaré and Lorentz had the math 20 years earlier IIRC, but nobody understood what the heck that math meant. Einstein brought the interpretation, and it seems nobody else got importantly closer to that interpretation in the intervening two decades.
- Penicillin. Gemini tells me that the antibiotic effects of mold had been noted 30 years earlier, but nobody investigated it as a medicine in all that time.
- Pasteur's work on the germ theory of disease. There had been both speculative theories and scattered empirical results as precedent decades earlier, but Pasteur was the first to bring together the microscope observations, theory, highly compelling empirical results, and successful applications. I don't know of anyone else who was close to putting all the pieces together, despite the obvious prerequisite technology (the microscope) having been available for two centuries by then.
(Feel free to debate any of these, as well as others' examples.)
I don't think these conditions are particularly weak at all. Any prior that fulfils it is a prior that would not be normalised right if the parameter-function map were one-to-one.
It's a kind of prior people like to use a lot, but that doesn't make it a sane choice.
A well-normalised prior for a regular model probably doesn't look very continuous or differentiable in this setting, I'd guess.
The generic symmetries are not what I'm talking about. There are symmetries in neural networks that are neither generic, nor only present at finite sample size. These symmetries correspond to different parametrisations that implement the same input-output map. Different regions in parameter space can differ in how many of those equivalent parametrisations they have, depending on the internal structure of the networks at that point.
I know it 'deals with' unrealizability in this sense, that's not what I meant.
I'm not talking about the problem of characterising the posterior right when the true model is unrealizable. I'm talking about the problem where the actual logical statement we defined our prior and thus our free energy relative to is an insane statement to make and so the posterior you put on it ends up negligibly tiny compared to the probability mass that lies outside the model class.
But looking at the green book, I see it's actually making very different, stat-mech style arguments that reason about the KL divergence between the true distribution and the guess made by averaging the predictions of all models in the parameter space according to their support in the posterior. I'm going to have to translate more of this into Bayes to know what I think of it.