Here's my take on the idea: Boxed Censored Simulation Testing, a meta-plan for AI safety that seeks to address the 'no retries' problem.
Mod note: I'm kind of on the fence about whether to approve posts like this (i.e., fairly 101 AI questions that have been discussed before). I ended up letting it through because, although I'm pretty sure this has been discussed a fair amount, I couldn't actually think of any good links to reply with.
Is there any good writing on what might happen if we used all this fancy compute to first build a hugely detailed, one-way-glass model of our universe in which to contain an AGI?
From a naive standpoint, it seems like we could sort of "incept" the intellectual work we want automated by manipulating the simulation around the agent.