If the problem could be solved by an anarchist group of humans who improvise their organization ad hoc, then it is not at all clear that a bureaucracy of AIs is fit for the task.
The implication doesn't seem tight at all.
What if some groups of (intelligent, motivated, conscientious, etc.) humans could solve the alignment problem while other groups could not? As I understand it, your idea is to have the AIs mimic one such group in order to scale up its progress. But the group you mimic might be one of the groups that can't solve it, while other groups could.
Also, I don't find it obvious that an AI trained to mimic a human's actions will produce the same consequences that the human would, because the AI might not deploy those actions in a strategically appropriate way.
I doubt it's possible to build a safe bureaucracy out of unsafe parts.
The intended construction is to build a safer bureaucracy out of less safe (or just less robustly safe) parts/agents. The parts shouldn't break in most runs of the bureaucracy, and the bureaucracy as a whole should break even less often. If distilling such a bureaucracy yields a safer part/agent than the original, that is an iterative improvement. This doesn't need to change the game in one step, only to improve the situation with each step, in a direction that is hard to formulate without resorting to the device of a bureaucracy; otherwise the same thing could be done with a more lightweight prompt/tuning setup, where the bureaucracy is just the prompt given to a single part/agent.
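For concreteness, here is a minimal toy sketch of the amplify-then-distill loop this describes. Everything in it is hypothetical: `Agent`, `run_bureaucracy`, `distill`, the scalar safety proxy, and the independence assumption are illustrative placeholders under stated assumptions, not anyone's actual proposal or API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Agent:
    """Stand-in for a part/agent; `safety` is a crude scalar proxy for the
    probability of not breaking on the relevant task distribution."""
    safety: float  # in [0, 1]


def run_bureaucracy(parts: List[Agent]) -> Agent:
    """Amplification: compose less safe parts into a structure that breaks
    less often than any single part. The combination rule here (the whole
    fails only if every part fails, independently) is an optimistic toy
    assumption, used only to make the direction of improvement concrete."""
    failure = 1.0
    for part in parts:
        failure *= 1.0 - part.safety
    return Agent(safety=1.0 - failure)


def distill(whole: Agent, part: Agent, fidelity: float = 0.9) -> Agent:
    """Distillation: train a single agent to imitate the bureaucracy.
    Imperfect imitation (`fidelity`) loses some safety, but the result can
    still be safer than the original part; `max` keeps the step monotone."""
    return Agent(safety=max(part.safety, fidelity * whole.safety))


def iterate(part: Agent, width: int = 5, rounds: int = 4) -> Agent:
    """Each round builds a bureaucracy out of copies of the current agent
    and distills it back into a single agent. No single step has to change
    the game; each step only has to improve the situation."""
    for _ in range(rounds):
        part = distill(run_bureaucracy([part] * width), part)
    return part


if __name__ == "__main__":
    # Safety improves monotonically across rounds (here toward the
    # fidelity-imposed ceiling of 0.9).
    print(iterate(Agent(safety=0.6)).safety)
```

On this toy picture, the lightweight prompt/tuning variant would amount to `run_bureaucracy` wrapping a single part in a fixed prompt rather than composing many parts.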
Epistemic Status: I don't particularly endorse this argument. Just curious to see whether people have any interesting or novel counterarguments.