This is a linkpost for https://openai.com/research/improving-mathematical-reasoning-with-process-supervision
From just reading your excerpt (and not the whole paper), it is hard to determine how much alignment washing is going on here.
But maybe these worries can be answered from reading the full paper...
Our results below show that process supervision in fact incurs a negative alignment tax
Some compelling arguments are given that alignment tax would be negative when this method is used to improve safety. The actual experimental results are about improving/eliciting capabilities and don't explore application of the method for safety, except by drawing an analogy.
I am shocked that higher quality training data based on more effortful human feedback produced a better result.
Paper here.
Related Prediction Market.