The only one I can think of (though I can't find the specific article) is Goertzel's description of an architecture where the guardian component is separate from the main AGI.
You're probably thinking of GOLEM. The Gödel machine is another proposal along somewhat similar lines.
Some discussions more directly related to your suggestion could be:
Our proposed protocol is based on the idea of asking what we will call ‘safe questions’ with restricted answers. We define a safe question as one for which a human being could find an answer without any help from superintelligence, all on his own merit. Why is this useful? Suppose that a scientist working on a cure for cancer is looking at two potential cures. Each one could be developed and tested without assistance from the AI, but would take about 3 years of hard work each. Which one to try first if the scientist believes that they have about an equal chance of working? If a wrong option is chosen to be tested first, humanity will still obtain a cure for cancer but it will happen 3 years later. What if we could ask the AI to suggest which option to try first? [...]
To make sure that a potential question is safe, a group of experts could be used to review it. The experts should all be trained AI safety engineers, meaning that they are familiar with the design of the AI and its confinement environment as well as the latest developments in machine ethics (Allen et al., 2006; Anderson & Anderson, 2007; Hall, 2000; Moor, 2006; Tonkens, 2009). Experts may also need to be trained in computer psychology, a currently non-existent profession which might become a reality in the future (Epstein, 1997). [...]
The limited outbound channel shall consist of binary (0 or 1) answers next to original unmodified safe questions presented by the intermediary non-intelligent software in random order so as to avoid communication via covert timing or resource channels which make steganography possible. The output should in no way be controllable by the AI, outside of the multiple choice menu.
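For concreteness, here is a minimal Python sketch of one way to read the outbound channel described in that last quoted paragraph. It is an illustration only, not code from the paper: the names `relay_safe_questions` and `oracle` are hypothetical, the questions are assumed to have already passed expert review, and raising an error on any non-binary output is just one possible policy.

```python
import random
from typing import Callable, List, Tuple


def relay_safe_questions(
    approved_questions: List[str],
    oracle: Callable[[str], object],
    rng: random.Random,
) -> List[Tuple[str, int]]:
    """Return (question, bit) pairs in an order the oracle cannot influence.

    Hypothetical helper: `oracle` stands in for the confined AI, and
    `approved_questions` are assumed to be already reviewed as "safe".
    """
    results: List[Tuple[str, int]] = []
    for question in approved_questions:
        raw = oracle(question)
        # Accept only the plain integers 0 and 1; anything else is rejected
        # so that no free-form output ever leaves the relay.
        if type(raw) is not int or raw not in (0, 1):
            raise ValueError("oracle output outside the binary menu")
        # Pair the bit with the original, unmodified question text.
        results.append((question, raw))
    # Shuffle on the relay side so the presentation order carries no signal.
    rng.shuffle(results)
    return results
```

A caller would pass something like `random.Random()` for `rng`. The sketch only covers ordering and output filtering; it does nothing about timing or resource-usage channels at the hardware level, which the quoted passage also worries about.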
Thank you, Kaj. Those references are what I was looking for.
It looks like there might be a somewhat new idea here. Previous suggestions, as you mention, restrict output to a single bit, or require review by human experts. Using multiple AGI oracles to check each other is a good one, though I'd worry about acausal coordination between the AGIs, and I don't see that the safety is provable beyond checking that answers match.
This new variant gives the benefit of provable restrictions and the relative ease of implementing a narrow-AI proof system to ch...
If it's worth saying, but not worth its own post (even in Discussion), then it goes here.
Notes for future OT posters:
1. Please add the 'open_thread' tag.
2. Check if there is an active Open Thread before posting a new one. (Immediately before; refresh the list-of-threads page before posting.)
3. Open Threads should be posted in Discussion, and not Main.
4. Open Threads should start on Monday, and end on Sunday.