I think you get the point, but say OpenAI "trains" GPT-5 and it turns out to be so dangerous that it can persuade anybody of anything, and it wants to destroy the world.
We're already screwed at that point, right? Who cares if they decide not to release it to the public? And they can't exactly "RLHF" it after the fact, right? It's already existentially dangerous?
I guess maybe I just don't understand how it works. So if they "train" GPT-5, does that mean they literally have no idea what it will say or be like until the day the training is done? And then they're like "Hey, what's up?" and they find out?
We just "hope" that the first dangerous thing we get will be something that can't overpower everyone, only trick some people, and then the rest of us will stop it. In your scenario, yes, we're screwed. That's what this forum is about, isn't it ;)