I figure that the AI would most likely create a copy of itself, modify that, and see how it turns out. Of course, once you have a sufficiently smart AI, you can probably trust that it knows how best to modify itself.
Is an AI smart enough to keep a copy of itself in a box? What happens when Eliezer plays the AI-boxing game against himself?
Peter Suber is the Director of the Harvard Open Access Project, a Senior Researcher at SPARC (not CFAR's SPARC), a Research Professor of Philosophy at Earlham College, and more. He also created Nomic, the game in which you "move" by changing the rules, and wrote the original essay on logical rudeness.
In "Saving Machines From Themselves: The Ethics of Deep Self-Modification" (2002), Suber examines the ethics of self-modifying machines, sometimes quite eloquently: