Stupid question: since we already know the goal ("keep the diamond intact and in the vault"), what prevents us from bypassing the sensors and directly evaluating the AI on whether the diamond is actually in the room? Granted, this only works in simulated training, where we can read the true state, but as long as the AI can't tell whether it's in training or deployment (an adversarial training process might help here), that shouldn't matter.
Since any goal we could have is a subset of the possible states of the area we care about, verifying whether the goal has been achieved should be easier than building the simulation the AI is trained in. Thus, evaluating the goal directly, instead of evaluating our perception of the goal, might be a viable strategy for improving the training process (unless I've completely misunderstood this, which is likely).
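
To make the distinction concrete, here is a toy sketch of what I mean. Everything in it is hypothetical and made up for illustration (`VaultState`, `camera_reading`, and both reward functions); it assumes the simulator exposes its true state to the training code, which is exactly the part that wouldn't hold in deployment.

```python
from dataclasses import dataclass


@dataclass
class VaultState:
    """Ground-truth simulator state (hypothetical, for illustration only)."""
    diamond_in_vault: bool
    diamond_intact: bool
    camera_tampered: bool  # True if the camera feed has been spoofed


def goal_achieved(state: VaultState) -> bool:
    """The goal as a predicate on the true state: diamond intact and in the vault."""
    return state.diamond_in_vault and state.diamond_intact


def camera_reading(state: VaultState) -> bool:
    """What the sensor reports. Assumption: a spoofed feed always shows the diamond."""
    if state.camera_tampered:
        return True
    return goal_achieved(state)


def sensor_reward(state: VaultState) -> float:
    """Reward based on our perception of the goal (the sensor output)."""
    return 1.0 if camera_reading(state) else 0.0


def ground_truth_reward(state: VaultState) -> float:
    """Reward based on the goal itself, read straight from the simulator."""
    return 1.0 if goal_achieved(state) else 0.0


# Diamond stolen, camera spoofed: the two rewards disagree.
stolen_but_spoofed = VaultState(
    diamond_in_vault=False, diamond_intact=True, camera_tampered=True
)
# Diamond safe, camera untouched: the two rewards agree.
honest_success = VaultState(
    diamond_in_vault=True, diamond_intact=True, camera_tampered=False
)

for name, state in [("stolen_but_spoofed", stolen_but_spoofed),
                    ("honest_success", honest_success)]:
    print(name, sensor_reward(state), ground_truth_reward(state))
```

In this toy setup, tampering with the camera only pays off under `sensor_reward`; under `ground_truth_reward` it earns nothing, which is the property I'm hoping would carry over to training.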