Open Thread, Jun. 29 - Jul. 5, 2015

Gondolinian

The only one I can think of (though I can't find the specific article) is Goertzel's description of an architecture where the guardian component is separate from the main AGI

You're probably thinking of GOLEM. The Gödel machine is another proposal along somewhat similar lines.

Some discussions more directly related to your suggestion could be:

Roman Yampolskiy's proposal for "safe questions"

Our proposed protocol is based on the idea of asking what we will call ‘safe questions’ with restricted answers. We define a safe question as one for which a human being could find an answer without any help from superintelligence, all on his own merit. Why is this useful? Suppose that a scientist working on a cure for cancer is looking at two potential cures. Each one could be developed and tested without assistance from the AI, but would take about 3 years of hard work each. Which one to try first if the scientist believes that they have about an equal chance of working? If a wrong option is chosen to be tested first, humanity will still obtain a cure for cancer but it will happen 3 years later. What if we could ask the AI to suggest which option to try first? [...]

To make sure that a potential question is safe, a group of experts could be used to review it. The experts should all be trained AI safety engineers, meaning that they are familiar with the design of the AI and its confinement environment as well as the latest developments in machine ethics (Allen et al., 2006; Anderson & Anderson, 2007; Hall, 2000; Moor, 2006; Tonkens, 2009). Experts may also need to be trained in computer psychology a currently non-existent profession which might become a reality in the future (Epstein, 1997). [...]

The limited outbound channel shall consist of binary (0 or 1) answers next to original unmodified safe questions presented by the intermediary non-intelligent software in random order as to avoid communication via covert timing or resource channels which make steganography possible. The output should in no way be controllable by the AI, outside of the multiple choice menu.
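To make the quoted channel a bit more concrete, here is a minimal Python sketch of what the intermediary software might look like. Everything in it is an assumption for illustration: the function name `ask_safe_questions`, the representation of the oracle as a plain callable, and the choice to raise an error on any non-binary output are not from Yampolskiy's paper.

```python
import random
from typing import Callable, List, Tuple


def ask_safe_questions(
    questions: List[str],
    oracle: Callable[[str], object],
    rng: random.Random,
) -> List[Tuple[str, int]]:
    """Hypothetical intermediary: pair each unmodified question with a 0/1 answer."""
    results: List[Tuple[str, int]] = []
    for question in questions:
        answer = oracle(question)
        # Accept only the literal integers 0 or 1 (booleans and strings are rejected),
        # so the oracle cannot widen the channel beyond the binary menu.
        if type(answer) is not int or answer not in (0, 1):
            raise ValueError("oracle output outside the binary menu")
        results.append((question, answer))
    # Present the question/answer pairs in random order, as the quote requires.
    rng.shuffle(results)
    return results
```

This only models the filtering and shuffling step. It says nothing about timing or resource side channels, which would have to be handled outside a sketch like this.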

"Thinking inside the box" (section 3.1) also has some discussion about restricting an AGI's outputs to be safe by selecting the questions to be asked appropriately. There was also some discussion along these lines at the beginning of chap. 10 of Superintelligence, e.g., running multiple copies of different Oracles and comparing their answers with each other to avoid deception. "The answers of the different oracles have to match each other" could be a simple and formally definable criterion.
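As a rough illustration of how simple the matching criterion is to state formally, here is a minimal Python sketch. The function name `consensus_answer` and the modelling of each oracle as a callable returning an integer are assumptions made for the example, not anything taken from Superintelligence.

```python
from typing import Callable, Optional, Sequence


def consensus_answer(
    question: str,
    oracles: Sequence[Callable[[str], int]],
) -> Optional[int]:
    """Return the shared answer if every oracle agrees, otherwise None."""
    if not oracles:
        return None
    answers = [oracle(question) for oracle in oracles]
    first = answers[0]
    # The criterion: release an answer only when all oracles match.
    return first if all(a == first for a in answers) else None
```

Agreement is only a consistency check: it shows the oracles gave the same answer, not that the answer is correct.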

Kaj_Sotala

JoshuaFox

Thank you, Kaj. Those references are what I was looking for.

It looks like there might be a somewhat new idea here. Previous suggestions, as you mention, restrict output to a single bit or require review by human experts. Using multiple AGI oracles to check each other is a good one, though I'd worry about acausal coordination between the AGIs, and I don't see that the safety is provable beyond checking that answers match.

This new variant gives the benefit of provable restrictions and the relative ease of implementing a narrow-AI proof system to ch...
