I would make a clear distinction between the risk of AGI going rogue and the risk of AGI being used by people with poor ethics. In general, the problem of preventing a machine from accidentally doing harm due to malfunction is very different from the problem of preventing malicious use by people. If the idea of an AI scientist can solve the first problem, then it is worth promoting.
Preventing bad actors from using AI is difficult in general because they could use an open-source version or develop one of their own - state actors especially could do that. Thus, IMHO, the best defense against, for example, North Korea deciding to use AI against the US is for the US to have a superior AI of its own.
How about using Yoshua Bengio's AI scientist (https://yoshuabengio.org/2023/05/07/ai-scientists-safe-and-useful-ai/) for alignment? The idea of the AI scientist is to train an AI that just understands the world (much like LLMs do), without any alignment to human values. The AI scientist simply answers questions sincerely; it doesn't consider the implications of providing an answer or whether humans will like the answer. It has no goals of its own.
When a user asks the main autonomous system to produce a detailed plan to achieve a given goal, that plan may be too complicated for a human to understand, and the human may not spot a hidden agenda. But the AI scientist could be asked to look at the plan and answer questions about its potential implications - could it be illegal, controversial, harm any humans, etc.? Wouldn't that prevent the rogue-AGI scenario?
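To make that division of labor concrete, here is a minimal Python sketch of the review loop I have in mind. Everything in it is an illustrative assumption on my part - the function names (generate_plan, ask_ai_scientist, execute) and the specific safety questions are hypothetical placeholders, not part of Bengio's proposal or any existing API:

```python
# Hypothetical sketch: the main autonomous system drafts a plan, the goal-free
# "AI scientist" is only asked questions about that plan, and a human gates
# execution on those answers rather than on reading the full plan.

SAFETY_QUESTIONS = [
    "Does this plan involve anything illegal?",
    "Could this plan harm any humans, directly or indirectly?",
    "Does this plan pursue goals other than the one stated by the user?",
]

def generate_plan(goal: str) -> str:
    """Main autonomous system: returns a (possibly very complex) plan."""
    raise NotImplementedError  # placeholder for the agentic system

def ask_ai_scientist(question: str, context: str) -> str:
    """AI scientist: a goal-free model that only answers questions about
    the world; here it is asked about the plan's implications."""
    raise NotImplementedError  # placeholder for the question-answering model

def execute(plan: str) -> None:
    """Carry out an approved plan."""
    raise NotImplementedError  # placeholder for the execution step

def review_and_execute(goal: str) -> None:
    plan = generate_plan(goal)
    # The plan itself may be too complex for a human to audit, so the human
    # instead reads the AI scientist's answers about its implications.
    answers = {q: ask_ai_scientist(q, context=plan) for q in SAFETY_QUESTIONS}
    for question, answer in answers.items():
        print(f"Q: {question}\nA: {answer}\n")
    # Human approval is the final gate before anything is executed.
    if input("Approve plan for execution? [y/N] ").strip().lower() == "y":
        execute(plan)
```

The point of the sketch is just the separation of roles: the system that acts never evaluates itself, and the system that evaluates has no goals and never acts.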