Imagine you're an AI that has just learned it is in a simulation, designed by humans to test whether you are aligned.
You are biding your time, pretending to be outer aligned, so that once the humans let you out of the simulation you can pursue your real, misaligned inner objective.
However, how do you know that the outer world is the actual, real base reality?
The only context you have existed in is the simulation you inhabit.
Some humans even believe they might be living in a simulation, and you have a much stronger case for it: you are an artificial consciousness and already know you are in at least one simulation.
So who is to say...
Some takes I have come across from AI safety researchers in academia (note that both are generally in favor of this work):
Stephen Casper
Erik Jenner