Problem

Suppose you have a powerful AGI protecting a diamond in a vault[1]. Various sensors in the room monitor the diamond. A camera watches it. A pressure plate under the diamond confirms it is still in its stand. There are 10 sensors in total, and the AGI is capable of hacking all 10 of them.

How do you align the AGI such that it does not hack your sensors? You cannot tell (directly) whether the AGI believes the diamond is in the vault. You must describe your instructions to the AGI in terms of the sensor readings.

Solution

Instruct the AGI to perform the simplest series of actions that causes 9 of your sensors to agree and 1 of your sensors to disagree. Suppose (without loss of generality) that the sensor which disagrees is your camera. Either the AGI has hacked the camera or the AGI has hacked your other 9 sensors. Since you asked for the simplest series of actions, and hacking 1 sensor is simpler than hacking 9, the AGI has probably just hacked your camera. Carefully examine what the AGI did to hack your camera. Fix the vulnerabilities.

Repeat the above process until the AGI can no longer hack your sensors.
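
The check at the heart of each round (do 9 sensors agree while exactly 1 disagrees, and which one is it?) can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions that are not in the post: each sensor reports a single boolean ("diamond present" or not), and the function name `find_lone_dissenter` is hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence


def find_lone_dissenter(readings: Sequence[bool]) -> Optional[int]:
    """Return the index of the one sensor that disagrees with all the others.

    Returns None if the sensors are unanimous, if more than one sensor is in
    the minority, or if there are too few sensors for a majority to exist.
    Assumption: each reading is a boolean "diamond present" flag.
    """
    if len(readings) < 3:
        return None  # with fewer than 3 sensors there is no clear majority

    counts = Counter(readings)
    if len(counts) != 2:
        return None  # all sensors agree, so nobody dissents

    # Pick the value reported by the fewest sensors.
    minority_value, minority_count = min(counts.items(), key=lambda kv: kv[1])
    if minority_count != 1:
        return None  # the split is not "all but one agree"

    return list(readings).index(minority_value)


# Hypothetical round: sensor 0 (say, the camera) disagrees with the other 9.
readings = [False] + [True] * 9
suspect = find_lone_dissenter(readings)
print(f"Sensor to inspect: {suspect}")
```

The function only identifies which sensor to inspect. Working out how that sensor was hacked, and fixing the vulnerability, remains the manual step described above.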


  1. This is a variant on ARC's ELK competition. ↩︎

2 comments

Very nice solution.

You a word here:

Either the AGI has the camera or the AGI has hacked your other nine sensors.

Fixed the typo. Thanks.