I am extremely interested in these sorts of questions myself (message me if you want to chat more about them). On the relation between accuracy and calibration, you might be able to see some of it in Open Philanthropy's report on the quality of their predictions: in footnote 10, I believe they decompose the Brier score into a term for miscalibration, a term for resolution, and a term for entropy.
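For anyone who wants to see how those three terms fit together, here is a quick sketch of what I believe is the standard Murphy decomposition (my own code, not Open Philanthropy's, so the exact definitions in their footnote may differ; the "entropy" term here is the base-rate uncertainty term):

```python
import numpy as np

def brier_decomposition(forecasts, outcomes):
    """Murphy decomposition of the Brier score for binary outcomes.

    Groups forecasts by their distinct values and returns
    (reliability, resolution, uncertainty), with
    Brier score = reliability - resolution + uncertainty.
    """
    forecasts = np.asarray(forecasts, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)
    n = len(forecasts)
    base_rate = outcomes.mean()

    reliability = 0.0  # miscalibration: forecast vs. observed frequency in each bin
    resolution = 0.0   # how far each bin's observed frequency sits from the base rate
    for f in np.unique(forecasts):
        mask = forecasts == f
        n_k = mask.sum()
        obs_freq = outcomes[mask].mean()
        reliability += n_k * (f - obs_freq) ** 2 / n
        resolution += n_k * (obs_freq - base_rate) ** 2 / n
    uncertainty = base_rate * (1 - base_rate)  # the "entropy"-like term

    return reliability, resolution, uncertainty

# Sanity check: the three terms recombine into the ordinary Brier score.
f = [0.9, 0.9, 0.9, 0.3, 0.3, 0.6]
o = [1, 1, 0, 0, 1, 1]
rel, res, unc = brier_decomposition(f, o)
print(rel - res + unc, np.mean((np.array(f) - np.array(o)) ** 2))  # both ~0.2617
```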
Also, would you be able to explain a bit how it's possible for someone who is perfectly calibrated at predicting rain to predict rain at 90% probability and yet for the Bayes factor based on that prediction to not be 9? To me it seems like for someone to be perfectly calibrated at the 90% confidence level, the ratio of rain to no rain across the occasions when they predict 90% has to be 9:1, so that P(say 90% rain | rain) = 90% and P(say 90% rain | no rain) = 10%?
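To make the toy case I have in mind concrete, here is a quick simulation (a purely hypothetical forecaster who says either 90% or 10% equally often, with rain generated at exactly the stated probability, so calibration holds by construction):

```python
import random

# Hypothetical, perfectly calibrated forecaster: she says "90%" or "10%" with
# equal frequency, and rain occurs with exactly the stated probability.
random.seed(0)
N = 1_000_000
say90_rain = say90_dry = rain_total = dry_total = 0
for _ in range(N):
    forecast = random.choice([0.9, 0.1])
    rain = random.random() < forecast
    rain_total += rain
    dry_total += not rain
    if forecast == 0.9:
        say90_rain += rain
        say90_dry += not rain

p_say90_given_rain = say90_rain / rain_total
p_say90_given_dry = say90_dry / dry_total
print(f"P(say 90% | rain)    ~ {p_say90_given_rain:.3f}")  # ~0.9
print(f"P(say 90% | no rain) ~ {p_say90_given_dry:.3f}")   # ~0.1
print(f"Bayes factor         ~ {p_say90_given_rain / p_say90_given_dry:.2f}")  # ~9
```

In this symmetric setup the empirical Bayes factor does come out to about 9, but I notice the setup also happens to give a 50% base rate of rain, so maybe the base rate is the piece I'm missing?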
I think this is a very useful post that is talking about many of the right things. One question though: isn't it only worth focusing on the worlds where iterative design does not work for alignment to the extent that progress can still be made toward mitigating those worlds? It appears to me that progress in technical fields is usually accomplished through iterative design, so it makes sense to have a high prior on non-iterative approaches being less effective. Depending on your specific numbers, it could be worth paying more or less attention to the areas that are more tractable for iterative design.

I also think it's misleading to treat iterative design as something that either works or fails. Fields differ in degree: how prompt and high-quality their feedback is, and how cheaply they allow repeated trials. Problems that initially seem hard to iterate on can often be reformulated in ways that allow better iteration (like the ELK problem being formulated in a way that allows for testing toy solutions and counterexamples). I worry that focusing in an unnuanced way on worlds where iterative design fails may miss opportunities to reformulate some of these hard problems so that they become easier to iterate on.
Really good post. Based on this, it seems extremely valuable to me to test the assumption that we already have animal-level AIs. I understand that this is difficult because of animals' built-in brain structure, different training distributions, and the difficulty of creating a simulation as complex as real life. Still, it seems like we could test the assumption by doing something along the lines of training a neural network to perform as well as a cat's visual cortex on image recognition. I predict that if this were done in a way that accounted for the flexibility of real animals, the AI would not perform better than an animal at around the cat or raven level (80% confidence). I also predict that even if an AI could outperform part of an animal's brain in one area, it would not outperform the animal in more than 3 areas as broad as vision (60% confidence). I am quite skeptical of putting greater than 20% probability on AGI in less than 10 years, but contrary evidence here could definitely make me change my mind.