Is AGI alignment even possible in the long term? Will AGI simply outsmart our best defenses? It would be, after all, superhuman, and by an enormous margin. Isn’t it likely that an AGI will recognize what actions humans took to control it and simply undo those controls? Or it could make a novel move, like AlphaGo did, and sidestep them entirely. An AGI could also just wait until conditions are favorable to take charge. What is time to an immortal intelligence, especially a span as short as a few human lifetimes? Unless misalignment is physically impossible, it seems as if all attempts will ultimately be futile. I hope I’m wrong.
Provided that AGI becomes smart enough without passing through the universe-destroying paperclip-maximizer stage, one idea would be to invent a way for humanity to remain, in some form, useful to the AGI, e.g. as a time-tested biological backup.
I wouldn't call being kept as a biological backup particularly beneficial for humanity, but it's the only plausible way for humanity to be useful enough to a sufficiently advanced AGI that I can currently think of.
Destroying the universe might just take long enough for the AGI to evolve sufficiently to reconsider. I should have written "Earth-destroying" instead in the answer above.