JoshuaFox comments on Open Thread, Jun. 29 - Jul. 5, 2015 - Less Wrong

Post author: Gondolinian 29 June 2015 12:14AM


Comment author: JoshuaFox 29 June 2015 05:16:03AM *  2 points

Could someone point me to any existing articles on this variant of AI-Boxing and Oracle AGIs:

The boxed AGI's gatekeeper is a simpler system which runs formal proofs to verify that the AGI's output satisfies a simple, formally definable constraint. The constraint is not "safety" in general, but rather is narrow enough that we can be mathematically sure the output is safe. (This does limit the potential benefits from the AGI.)

The question of what the constraint should be remains open, and of course the fact that the AGI is physically embodied puts it in causal contact with the rest of the universe. But as a partial or short-term solution, has anyone written about this? The only article I can think of (though I can't find it) is Goertzel's description of an architecture where the guardian component is separate from the main AGI.
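To make the proposal concrete, here is a minimal sketch of such a gatekeeper, assuming a trusted proof checker `verify` that is far simpler than the boxed AGI. All names (`gatekeeper`, `toy_verify`, the predicate tag) are hypothetical illustrations, not taken from any published system; a real checker would verify a machine-checkable proof rather than re-run a toy decision procedure.

```python
# Hypothetical sketch: a narrow gatekeeper that releases the boxed AGI's
# output only when a proof of a fixed, formally definable predicate checks.

def gatekeeper(agi_output, proof, verify):
    """Release agi_output only if `proof` certifies the narrow safety
    predicate; otherwise return None and keep the output in the box."""
    statement = ("output_satisfies_constraint", agi_output)
    if verify(statement, proof):
        return agi_output  # provably within the constraint
    return None            # no valid proof: nothing leaves the box

def toy_verify(statement, proof):
    # Toy stand-in for a real proof checker: the "predicate" here is just
    # that the output is a single binary answer, which is trivially decidable.
    tag, output = statement
    return tag == "output_satisfies_constraint" and output in (b"0", b"1")
```

The point of the design is that only the small checker, not the AGI, needs to be trusted: the constraint is narrow precisely so that the checker can be simple.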

Comment author: Kaj_Sotala 29 June 2015 06:18:18PM 7 points

The only article I can think of (though I can't find it) is Goertzel's description of an architecture where the guardian component is separate from the main AGI

You're probably thinking of GOLEM. The Gödel machine is another proposal along somewhat similar lines.

Some discussions more directly related to your suggestion could be:

Our proposed protocol is based on the idea of asking what we will call ‘safe questions’ with restricted answers. We define a safe question as one for which a human being could find an answer without any help from superintelligence, all on his own merit. Why is this useful? Suppose that a scientist working on a cure for cancer is looking at two potential cures. Each one could be developed and tested without assistance from the AI, but would take about 3 years of hard work each. Which one to try first if the scientist believes that they have about an equal chance of working? If a wrong option is chosen to be tested first, humanity will still obtain a cure for cancer but it will happen 3 years later. What if we could ask the AI to suggest which option to try first? [...]

To make sure that a potential question is safe, a group of experts could be used to review it. The experts should all be trained AI safety engineers, meaning that they are familiar with the design of the AI and its confinement environment as well as the latest developments in machine ethics (Allen et al., 2006; Anderson & Anderson, 2007; Hall, 2000; Moor, 2006; Tonkens, 2009). Experts may also need to be trained in computer psychology, a currently non-existent profession which might become a reality in the future (Epstein, 1997). [...]

The limited outbound channel shall consist of binary (0 or 1) answers next to original unmodified safe questions presented by the intermediary non-intelligent software in random order so as to avoid communication via covert timing or resource channels which would make steganography possible. The output should in no way be controllable by the AI, outside of the multiple choice menu.
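The outbound-channel protocol described in that excerpt could be sketched as follows; `outbound_channel` and `oracle` are hypothetical names, and the randomized question order stands in for the paper's defense against covert ordering channels.

```python
import random

def outbound_channel(safe_questions, oracle, rng=None):
    """Present the unmodified safe questions in random order and accept
    only binary (0/1) answers; anything else is discarded by the shell."""
    rng = rng or random.Random()
    order = list(range(len(safe_questions)))
    rng.shuffle(order)  # randomized order against covert ordering channels
    answers = {}
    for i in order:
        a = oracle(safe_questions[i])
        # Multiple-choice only: the AI cannot control the output format.
        answers[safe_questions[i]] = a if a in (0, 1) else None
    return answers
```

A free-form reply from the oracle is simply dropped, so the only bits that escape the box are the 0/1 answers to pre-vetted questions.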

  • "Thinking inside the box" (section 3.1) also has some discussion about restricting an AGI's outputs to be safe by selecting the questions to be asked appropriately. There was also some discussion along these lines at the beginning of chap. 10 of Superintelligence, e.g. running multiple copies of different Oracles and comparing their answers with each other to avoid deception - "the answers of the different oracles have to match each other" could be a simple and formally definable criterion.
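The answer-matching criterion is indeed simple enough to state in a few lines. A minimal sketch, assuming each oracle is modeled as a callable (the names here are illustrative only):

```python
def cross_check(oracles, question):
    """Release an answer only if every independently-run oracle gives
    exactly the same answer; any disagreement suppresses the output."""
    answers = [ask(question) for ask in oracles]
    if all(a == answers[0] for a in answers):
        return answers[0]
    return None  # mismatch: possible deception, release nothing
```

Exact agreement is the formally definable part; whether independent oracles can coordinate on a deceptive answer anyway is exactly the worry raised in the reply below.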
Comment author: JoshuaFox 30 June 2015 04:49:03AM *  4 points

Thank you, Kaj. Those references are what I was looking for.

It looks like there might be a somewhat new idea here. Previous suggestions, as you mention, restrict output to a single bit or require review by human experts. Using multiple AGI oracles to check each other is a good one, though I'd worry about acausal coordination between the AGIs, and I don't see that the safety is provable beyond checking that the answers match.

This new variant gives the benefit of provable restrictions and the relative ease of implementing a narrow-AI proof system to check them. It's certainly not a full solution to the FAI problem, but it's a good addition to our lineup of partial or short-term solutions in the area of AI Boxing and Oracle AI.

I'll get this feedback to the originator of this idea and see what can be made of it.