Thinking about what an unaligned AGI is more or less likely to do with its power, as an extension of instrumentally convergent goals and the underlying physical and game-theoretic constraints, is IMO a neglected and worthwhile exercise. In the spirit of continuing it, a side point follows:

I don't think turning Earth into a giant computer is optimal for compute-maximizing, because of heat dissipation. You want your computers to be cold, and a solid sphere is the worst 3D shape for that: for a given volume, it is the solid with the lowest surface-area-to-volume ratio. It is more likely that only Earth's surface would be turned into computers, but then again, all that dumb mass beneath the computronium crust impedes heat dissipation. I think it would make more sense to put your compute in solar orbit. Plenty of energy from the Sun, and matter from the asteroid belts.
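To put rough numbers on the geometry argument, here is a minimal Python sketch comparing the surface area of an Earth-volume sphere, cube, and thin slab, plus the ideal blackbody power each could radiate. The 1 km slab thickness and 300 K operating temperature are illustrative assumptions of mine, not claims about actual computronium, and the calculation ignores incoming sunlight and internal heat conduction.

```python
import math

# Physical constants (Earth's mean radius and the Stefan-Boltzmann constant).
EARTH_RADIUS_M = 6.371e6
EARTH_VOLUME_M3 = 4.0 / 3.0 * math.pi * EARTH_RADIUS_M ** 3
SIGMA = 5.670374419e-8  # W m^-2 K^-4


def sphere_area(volume):
    """Surface area of a solid sphere with the given volume."""
    radius = (3.0 * volume / (4.0 * math.pi)) ** (1.0 / 3.0)
    return 4.0 * math.pi * radius ** 2


def cube_area(volume):
    """Surface area of a cube with the given volume."""
    side = volume ** (1.0 / 3.0)
    return 6.0 * side ** 2


def slab_area(volume, thickness):
    """Surface area of a square slab (both faces plus the four thin edges)."""
    side = math.sqrt(volume / thickness)
    return 2.0 * side ** 2 + 4.0 * side * thickness


def radiated_power(area, temperature_k, emissivity=1.0):
    """Ideal radiated power in watts (Stefan-Boltzmann law)."""
    return emissivity * SIGMA * area * temperature_k ** 4


if __name__ == "__main__":
    slab_thickness_m = 1000.0  # assumption: 1 km thick sheet
    temperature_k = 300.0      # assumption: roughly room-temperature hardware
    shapes = {
        "sphere": sphere_area(EARTH_VOLUME_M3),
        "cube": cube_area(EARTH_VOLUME_M3),
        "1 km slab": slab_area(EARTH_VOLUME_M3, slab_thickness_m),
    }
    for name, area in shapes.items():
        ratio = area / EARTH_VOLUME_M3
        power = radiated_power(area, temperature_k)
        print(f"{name:>10}: area/volume = {ratio:.3e} 1/m, power = {power:.3e} W")
```

Under these assumptions, the same mass spread into a thin sheet has a radiating surface thousands of times larger than the sphere's, which is the intuition behind preferring distributed orbital compute over a planet-sized lump.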

I might get around to writing a post about this.

That twin would have different weights, and if we are talking about RL-produced mesaoptimizers, it would likely have learned a different misgeneralization of the intended training objective. Therefore, the twin would by default have a utility function misaligned with that of the original AI. This means that while the original AI may find it somewhat useful to interpret its twin's weights if it wants to learn about its own capabilities in situations similar to the training environment, doing so would not be as useful as having access to its own weights.

Regarding point 10, I think it would be pretty useful to have a way to quantify how much of the useful thinking inside these recursive LLM models happens within the (still largely inscrutable) LLM instances versus in the natural-language reflective loop.