At the current AGI-12 conference, some designers have advocated keeping AGIs safe by bringing them up in human environments, providing them with interaction and feedback much as we do when raising human children. Obviously that approach would fail for a fully intelligent AGI with its own values - it would pretend to follow our values for as long as it needed to, and then defect. However, some people are confident that if we started with a limited, dumb AGI, we could successfully inculcate our values this way (a more sophisticated position would be that though this method would likely fail, it's still more likely to succeed than a top-down friendliness project!).
The major criticism of this approach is that it anthropomorphises the AGI. We have a theory of children's minds, constructed by evolution, culture, and our own child-rearing experience, and we project this onto the alien mind of the AGI, assuming that if the AGI presents behaviours similar to those of a well-behaved child, then it will become a moral AGI. The problem is that we don't know how alien the AGI's mind will be, or whether our reinforcement is actually reinforcing the right thing. Specifically, we need some way of distinguishing between:
- An AGI being trained to be friendly.
- An AGI being trained to lie and conceal.
- An AGI that will behave completely differently once out of the training/testing/trust-building environment.
- An AGI that forms the wrong categories and generalisations (what counts as "human" or "suffering", for instance), because it lacks human-shared implicit knowledge that was "too obvious" for us to even think of training it on.
An AGI that is neither deeply neuromorphic nor in possession of a well-defined and formally stable utility function sounds like... frankly, one of the worst ideas I've ever heard. I'm having difficulty imagining a way you could demonstrate the safety of such a system, or trust it enough at any point to give it the resources it needs to learn. Considering that the fate of intelligent life in our future light cone may hang in the balance, standards of safety must obviously be very high! Intuition is, I'm sorry, simply not an acceptable criterion on which to wager at least billions, and perhaps trillions, of lives. The expected utility math does not wash if you actually expect OpenCog to work.
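To make that last point concrete, here is a toy expected-value comparison. Every number in it is invented for illustration; the only point is that when the downside is measured in billions of lives, even modest differences in failure probability dominate the calculation.

```python
# Toy expected-value sketch (illustrative numbers only, not estimates I'm defending):
# when the downside is billions of lives, a modest gap in failure probability
# swamps any plausible benefit of relying on intuition-level safety arguments.

lives_at_stake = 1e10          # assumed: rough order of magnitude, "at least billions"
p_failure_intuition = 0.10     # assumed: chance that "it behaved well in training" fails
p_failure_formal = 0.01        # assumed: chance that a formally grounded approach fails

expected_loss_intuition = p_failure_intuition * lives_at_stake
expected_loss_formal = p_failure_formal * lives_at_stake

print(f"Expected loss, intuition-based safety case: {expected_loss_intuition:.0e} lives")
print(f"Expected loss, formal safety case:          {expected_loss_formal:.0e} lives")
# Any benefit from launching the less rigorous system has to beat the *difference*
# between these two numbers, which under these made-up inputs is ~a billion lives.
```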
On a more technical level, human values are, broadly, some function over the state of a typical human brain. There may be some (or many) optimizations possible, but not ones we can rely on. So, for a really good model of human values, we should not expect to need less than the entropy of a human brain. In other words, nobody - whether it's Eliezer Yudkowsky with his formalist approach or you - is getting away with less than about ten petabytes of good training samples. Those working on uploads can skip this step entirely, but neuromorphic AI is likely to be fundamentally less useful.
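For what it's worth, here is the back-of-envelope reasoning behind a figure of that order. The synapse count and bits-per-synapse values are assumed orders of magnitude, and published estimates of per-synapse information disagree by a factor of a hundred or more; "about ten petabytes" sits at the conservative (high) end of this range.

```python
# Back-of-envelope for the "entropy of a human brain" figure. Inputs are rough,
# assumed orders of magnitude, not measurements.

SYNAPSES = 1e14               # assumed: ~10^14 synapses in an adult brain
BITS_PER_SYNAPSE_LOW = 5      # assumed: a few bits of weight/state per synapse
BITS_PER_SYNAPSE_HIGH = 500   # assumed: wiring identity plus molecular state, pessimistic

for bits in (BITS_PER_SYNAPSE_LOW, BITS_PER_SYNAPSE_HIGH):
    petabytes = SYNAPSES * bits / 8 / 1e15
    print(f"{bits:>4} bits/synapse -> ~{petabytes:.2f} PB")
# ~0.06 PB at the low end, ~6 PB at the high end: "about ten petabytes"
# is the cautious end of that spread.
```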
And this assumes that every bit of evidence can be mapped directly to a bit in a typical human brain map. In reality, for a non-FOOMed AI, the mapping is likely to be many orders of magnitude less efficient. I suspect, but cannot demonstrate right now, that a formalist approach starting with a clean framework along the lines of AIXI is going to be more efficient. Quite aside from that, even assuming you can acquire enough data to train your machine reliably, you still need it to do... something. Human values include a lot of unpleasant qualities. Simply giving it human values and then allowing it to grow to superhuman intellect is grossly unsafe: Ted Bundy had human values. If your plan is to train it on examples of only nice people, then you've got a really serious practical problem of how to track down >10 petabytes of really good data on the lives of saints. A formalist approach like CEV, for all the things that bug me about it, simply does not have that issue, because its utility function is defined as a function of the observed values of real humans.
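To illustrate the structural difference (this is a cartoon, not CEV, and every name and number in it is hypothetical): in the exemplar-training approach, safety lives entirely in how well the dataset was curated, whereas in the formalist approach it lives in an explicitly written aggregation over observed values, which is where a Ted Bundy data point can be handled.

```python
# Cartoon contrast between the two strategies discussed above. Not CEV;
# the value vectors and the median rule are purely illustrative.

from statistics import median

def values_from_exemplars(exemplar_behaviours):
    """Strategy A: imitate curated exemplars. Whatever is in the training set
    - saints or Ted Bundy - is what you get."""
    return exemplar_behaviours

def aggregate_observed_values(observed_values):
    """Strategy B: utility is an explicit function of *observed* human values.
    Here, a robust aggregate (median) per value dimension handles outliers."""
    dimensions = zip(*observed_values)
    return [median(d) for d in dimensions]

# Hypothetical observed "value vectors" for three people, one of them awful:
population = [
    [0.9, 0.8],    # cares about others' welfare, honesty
    [0.8, 0.9],
    [-1.0, 0.1],   # the Ted Bundy data point
]
print(aggregate_observed_values(population))  # -> [0.8, 0.8]; the outlier doesn't dominate
```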
In other words, for a system as alien as the architecture of OpenCog, even if we assume that the software is powerful and general enough to work (which I'm in no way convinced of), attempting to inculcate it with human values is extremely difficult, dangerous, and just plain unethical.