I don't trust a hypothetical arbitrary superintelligence, but I agree that a superintelligence is too much power for any existing organization to wield, which means "corrigibility" is not an alignment win condition: a perfectly corrigible AI just hands that power to whoever holds the off-switch. An AI that resists being modified to do bad things (whatever that might mean on reflection) seems like a feature, not a bug.