All of Andrew Smith's Comments + Replies

Stupid question: because we already know the goal ("keep the diamond intact and in the vault") what prevents us from bypassing the sensors and just directly evaluating the AI based on whether or not the diamond is in the room? Granted, this only works in simulated training, but as long as the AI doesn't know whether or not it's in deployment (an adversarial training process might help here) that won't matter.

As any goal we could have is a subset of the possible states of the area we care about, verifying whether or not our goal is achieved should be easier...

4 paulfchristiano
The hard part is building a simulation so good that an AI transfers perfectly from the simulation to the real world. This is already extremely difficult for simple robots (I actually worked on sim-to-real transfer as an intern at OpenAI), and in general the problem gets harder the smarter your AI gets (since it can "notice" more and more possible mismatches between your simulations and reality).