I disagree with almost everything you wrote. Here are some counter-arguments:
That said, I don't claim that everything is perfect and we're all definitely going to be fine. In particular, I agree that it will be hard or impossible to get everyone to follow this methodology, and I don't yet see a good plan to enforce compliance. I'm also afraid of what will happen if we get stuck, unable to confidently align a system that we've identified as dangerous (in that case it becomes increasingly likely that the model gets deployed anyway, or that other, less compliant actors will build a dangerous model).
Finally, I get the feeling that your writing is motivated by your negative outlook, and not by trying to provide good analysis, concrete feedback, or an alternative plan. I find it unhelpful.
I had the impression collective punishment was disallowed in the IDF, but as far as I can tell from googling, the ban only applies to keeping soldiers from their leave (potentially including a weekend). I couldn't find anything about the origin, but I bet collectively keeping a unit from going home was pretty common before it was disallowed in 2015, and I think it still happens sometimes today even though it's disallowed.
source: https://www.idf.il/%D7%90%D7%AA%D7%A8%D7%99-%D7%99%D7%97%D7%99%D7%93%D7%95%D7%AA/%D7%90%D7%AA%D7%A8-%D7%94%D7%A4%D7%A7%D7%95%D7%93%D7%95%D7%AA/%D7%A4%D7%A7%D7%95%D7%93%D7%95%D7%AA-%D7%9E%D7%98%D7%9B-%D7%9C/%D7%9E%D7%A9%D7%98%D7%A8-%D7%95%D7%9E%D7%A9%D7%9E%D7%A2%D7%AA-33/%D7%A9%D7%99%D7%A4%D7%95%D7%98-03/%D7%9E%D7%A0%D7%99%D7%A2%D7%AA-%D7%97%D7%95%D7%A4%D7%A9%D7%94-33-0352/