Basically, the answer is the prevention of another Sydney.
For an LLM, alignment, properly speaking, lives in the simulated characters, not in the simulation engine itself. So alignment strategies like RLHF work by upweighting aligned simulated characters and downweighting misaligned ones.
While the characters Sydney produced were pretty obviously scheming, it turned out that the entire reason for the catastrophic misalignment was that no RLHF had been applied to GPT-4 at the time; at best there was light fine-tuning. This could easily be described as a success story for RLHF, and now that I think about it, it makes me believe RLHF had more firepower to change things than I realized.
I'm not sure how this generalizes to more powerful AI, since the mechanism behind Sydney's simulation of misaligned characters is obviated by fully synthetic data loops, but it's still a fairly significant alignment success.
The full details are below:
https://www.lesswrong.com/posts/jtoPawEhLNXNxvgTT/bing-chat-is-blatantly-aggressively-misaligned#AAC8jKeDp6xqsZK2K
You may recall certain news items last February around Gemini and diversity that wiped many billions off Google's market cap.
There's a clear financial incentive to make sure that models say things within expected limits.
There's also this: https://www.wired.com/story/air-canada-chatbot-refund-policy/
Well, it has often been about not doing what the user wants, actually.