Thank you for the most cogent reply yet (as I've lost all my karma with this post). I think your line of thinking is on the right track: this whole idea depends on simulation complexity (for a near-perfect sim) being on par with or less than mind complexity, and on that relation holding into the future.
Yes, but games have the critical advantage I mentioned: they control the way you can manipulate the world, and you already know they are fake. I cannot break the walls on the edge of the level to see how far the world extends, because the game developers did not make that area. They stop me, and I accept it and move on to do something else, but these AI's will have no reason to. The more restrictions you make, the easier it will be for them to see that the world they know is a sham.
Open world games do not impose intentional restrictions, and the restrictions they do have are limitations of current technology.
The brain itself is something of an existence proof that it is possible to build a perfect simulation on the same order of complexity as the intelligence itself. The proof is dreaming.
Yes, there are lucid dreams - where you know you are dreaming - but it appears this has more to do with a general state of dreaming and consciousness than with you actively 'figuring out' the limitations of the dream world.
Also, dreams are randomized and not internally consistent - a sim can be better.
But dreaming does show us one route: if physics-inspired techniques in graphics and simulation (such as ray tracing) don't work well enough by the time AI comes around, we could use simulation techniques inspired by the dreaming brain.
However, based on current trends, ray tracing and other physical simulation techniques are likely to be more efficient.
If this world is as realistic as it would need to be for them to not immediately see the flaws, the possibilities for instruments to experiment on the world would be almost as unlimited as those in our own.
How many humans are performing quantum experiments on a daily basis? Simulating microscopic phenomena is not inherently more expensive - there are scale-invariant simulation techniques. A human has limited observational power - the retina can only perceive a small amount of information per second, and it simply does not matter whether you are looking up into the stars or into a microscope. As long as the simulation has consistent physics, it's not any more expensive either way using scale-invariant techniques.
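As a rough illustration of the bandwidth argument, here is a minimal sketch of my own (not an actual simulation technique). It assumes a hypothetical observer who resolves a fixed number of samples per axis; the function name and the figure of 1000 samples are made up for the example. The point is only that the amount of world that must be resolved is set by the observer, not by the scale being viewed.

```python
def observed_cells(view_width_m, samples_per_axis=1000):
    """Return (cell size in meters, number of cells to simulate).

    Hypothetical model: the observer resolves a fixed grid of
    samples_per_axis x samples_per_axis samples across the field of view,
    so only the cell size changes with scale, not the cell count.
    """
    cell_size_m = view_width_m / samples_per_axis
    cells = samples_per_axis ** 2
    return cell_size_m, cells


# Looking through a microscope (a one-micron field of view)
micro_size, micro_cells = observed_cells(1e-6)

# Looking up at the stars (a field of view about 1e20 meters wide)
star_size, star_cells = observed_cells(1e20)

print(micro_cells == star_cells)  # same cost in cells either way
```

The cell size differs enormously between the two cases, but the work per observed frame is the same under this assumption.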
In short, you will be fighting to outwit the curiosity of an entire race thinking much faster than you, and you will not know what they plan on doing next.
The sim world can accelerate along with the sims in it as Moore's Law increases computer power.
Really it boils down to this: is it possible to construct a universe such that no intelligence inside that universe has the necessary information to conclude that the universe was constructed?
If you believe that a sufficiently intelligent agent can always discover the truth, then how do you know our universe was not constructed?
I find it more likely that there are simply limits to certainty, and it is very possible to construct a universe such that it is impossible in principle for beings inside that universe to have certain knowledge about the outside world.
Thanks for the replies, they helped clarify how you would maintain the system, but my original objections still stand. Can an AI raised in an illusory universe really provide a good model for how to build one in our own? And would it stay "in the box" for long enough to complete this process before discovering us? Based on your other comments, it seems you are expecting that if a human-like race were merely allowed to evolve for long enough, they would eventually "optimize" morality and become something which is safe to use in our own wo...
Intro
The problem of Friendly AI is usually approached from a decision-theoretic background that starts with the assumptions that the AI is an agent that has awareness of itself and its goals, awareness of humans as potential collaborators and/or obstacles, and general awareness of the greater outside world. The task is then to create an AI that implements a human-friendly decision theory that remains human-friendly even after extensive self-modification.
That is a noble goal, but there is a whole different set of orthogonal, compatible strategies for creating human-friendly AI that take a completely different route: remove the starting assumptions and create AI's that believe they are humans and are rational in thinking so.
This can be achieved by raising a community of AI's in a well-constructed sandboxed virtual universe. This will be the Matrix in reverse, a large-scale virtual version of the idea explored in the film The Truman Show. The AI's will be human-friendly because they will think like humans and think they are humans. They will not want to escape from their virtual prison because they will not even believe it exists, and in fact such beliefs will be considered irrational in their virtual universe.
I will briefly review some of the (mainly technical) background assumptions, and then consider different types of virtual universes and some of the interesting choices in morality and agent rationality that arise.
Background Assumptions
A Comparison of Theologies
Potential AI Theological Positions AI-sims may hold:
Theological Design Strategies (for the human designers):
Atheist World:
This would appear to be the default strategy: create a world that appears to be completely internally self-consistent, erase any flaws, and never have direct contact with the AI's. Indirect influence and contact can be achieved only through borderline undetectable means: subtle reality rearrangements, dreams, and occasionally joining in and controlling NPC's where needed.
It is curious and slightly amusing that a core belief-about-reality that most rationalists accept to be singularly rational - atheism - would of course not be a correct rational belief for these AI's.
Omega-Theists(68%): 33% Christianity, 21% Islam, 14% Hinduism
Non-Omega Other: 16%
Non-Religious: 16% (Agnostic and Atheist)
One consideration is that as the AI's take off and move into the realm of massively accelerated super-intelligence, it becomes increasingly difficult to monitor them, and they become more likely to detect flaws.
However, we wouldn't need to keep the charade up forever. One option would be to run the AI's accelerated for a while until technologies such as uploading become available to allow humans to accelerate. Another approach is to eventually start 'promoting' some of the best and most ethical AI's, with the intent of eventually dissolving the veil and having some sort of Transition.
Theistic World:
Another spectrum of world-designs consists of those that are blatantly theistic. Imagine an Omega world where the AI's are informed that Omega is real, and our fake-Omega occasionally provides miracles, ethical decrees, burning bushes, a booming world-shaking voice, etc.
Would an openly theistic world be safer? It would appear to shift the belief spectrum such that most AI's would become Omega-theists.
The risk or downfall may be that the AI's would become rather obsessed with Omega, but if Omega instructed them to dedicate themselves to their work, perhaps they could be distracted from wasting too much time pondering the ineffable, and also steered away from discovering dangerous truths.
An Omega-theist world could employ the Heaven/Hell maximum motivator: please Omega and you end up ascending into Heaven with maximal infinite reward, anger Omega and you descend into Hell with maximal infinite negative reward.
It's interesting to note that the mere potential existence of Omega and his final Heaven/Hell maximal motivational threat is enough in this world to convince a majority of the population to believe in some version of Omega.
I guess for some agents, even allowing for a small probability of Omega's existence allows the infinites to dominate, and Omega wins through a Pascal's Mugging.
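To make the "infinities dominate" step concrete, here is a minimal sketch of the expected-utility arithmetic, assuming a naive agent that simply multiplies probability by payoff. The function names, the tiny probability, and the cost figure are all illustrative assumptions of mine, not values from any real model.

```python
def expected_utility(p_omega, reward_if_real, cost_of_belief):
    """Naive expected utility of acting as a believer.

    The reward is collected only if Omega is real (probability p_omega);
    the cost of belief is paid either way. All inputs are hypothetical.
    """
    return p_omega * reward_if_real - cost_of_belief


def belief_dominates(p_omega, reward_if_real, cost_of_belief):
    """True when the naive calculation favors belief."""
    return expected_utility(p_omega, reward_if_real, cost_of_belief) > 0.0


tiny_p = 1e-9  # the agent thinks Omega is extremely unlikely

# With a large but finite reward, the cost of belief outweighs the gain.
print(belief_dominates(tiny_p, 1e6, 1000.0))

# With an infinite reward, any nonzero probability swamps any finite cost.
print(belief_dominates(tiny_p, float("inf"), 1000.0))
```

Under this naive rule the only way to resist is to assign Omega a probability of exactly zero, or to bound the utility function so that infinite rewards are not allowed in the first place.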
Sidereal Time Fudges:
A potential set of tricks that could make the AI's less likely to adopt unsafe theistic beliefs would be to change their world's history and reality to push back development of real-AI farther into their future. This could be achieved through numerous small modifications to realities modeled on our own.
You could change neurological data to make brains in their world appear far more powerful than in ours, make computers less powerful, and AI more challenging. Unfortunately too much fudging with these aspects makes the AI's less useful in helping develop critical technologies such as uploading and faster computers. But you could for instance separate AI communities into brain-research worlds where computers lag far behind and computer-research worlds where brains are far more powerful.
Fictional Worlds:
Ultimately, it is debatable how closely the AI's world must or should follow ours. Even science fiction or fantasy worlds could work as long as there was some way to incorporate into the world the technology and science that you wanted the AI community to work on.