So, here's my pet theory for <1-person friendly> AI that I'd love to put out of its misery: "Don't do anything your designer wouldn't approve of". It's loosely based on the "Gandhi wouldn't take a pill that would turn him into a murderer" principle.
A possible implementation: make an emulation of the designer and use it as an isolated component of the AI. Any plan of action must be submitted to this component for approval before being implemented. This is nicely recursive, and it rejects plans such as "make a plan of action so deceptively complex that my designer will mistakenly approve it" and "modify my designer so that they approve what I want them to approve".
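The approval-gate pattern described above can be sketched in a few lines. This is only an illustration, assuming hypothetical names (`DesignerEmulation`, `Agent`, `Plan`) and a toy stand-in for the emulation's judgment; the real proposal would require an actual emulation, not a keyword check.

```python
# Minimal sketch of the approval-gate architecture: every plan must pass
# through an isolated approver component before execution.
# All class names and the approval heuristic here are hypothetical.

from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    description: str
    modifies_approver: bool = False  # does the plan touch the emulation itself?


class DesignerEmulation:
    """Isolated stand-in for the designer; its only channel is approve()."""

    def approve(self, plan: Plan) -> bool:
        # The recursion the post describes: a plan that would alter the
        # approver (to rubber-stamp future plans) is itself rejected.
        if plan.modifies_approver:
            return False
        # Toy placeholder for the emulation's real judgment.
        return "deceive" not in plan.description.lower()


class Agent:
    def __init__(self, approver: DesignerEmulation):
        self._approver = approver

    def act(self, plan: Plan) -> str:
        # No plan is executed without the gate's approval.
        if not self._approver.approve(plan):
            return "rejected"
        return "executed"


agent = Agent(DesignerEmulation())
print(agent.act(Plan("tile the garden with solar panels")))     # executed
print(agent.act(Plan("modify my designer's emulation", True)))  # rejected
```

The key design choice is that `Agent.act` has no path to execution that bypasses `approve`, which is what makes the "modify my designer" plan self-defeating.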
There could be an argument about how the designer's emulation would feel in this situation, but: torture vs. dust specks! Also, is this a corrupted version of <1-person CEV>?
What exactly are we trying to learn from this thought experiment that we cannot already learn from the torture/dust-speck experiment?