At the current AGI-12 conference, some designers have been proponents of keeping AGI's safe by bringing them up in human environments, providing them with interactions and feedback in a similar way to how we bring up human children. Obviously that approach would fail for a fully smart AGI with its own values - it would pretend to follow our values for as long as it needed, and then defect. However, some people have confidence if we started with a limited, dumb AGI, then we could successfully inculcate our values in this way (a more sophisticated position would be that though this method would likely fail, it's more likely to succeed than a top-down friendliness project!).
The major criticism of this approach is that it anthropomorphises the AGI - we have a theory of children's minds, constructed by evolution, culture, and our own child-rearing experience. And then we project this on the alien mind of the AGI, assuming that if the AGI presents behaviours similar to a well-behaved child, then it will become a moral AGI. The problem is that we don't know how alien the AGI's mind will be, and if our reinforcement is actually reinforcing the right thing. Specifically, we need to be able to find some way of distinguishing between:
- An AGI being trained to be friendly.
- An AGI being trained to lie and conceal.
- An AGI that will behave completely differently once out of the training/testing/trust-building environment.
- An AGI that forms the wrong categories and generalisations (what counts as "human" or "suffering", for instance), because it lacks human-shared implicit knowledge that was "too obvious" for us to even think of training it on.
Thanks for sharing your personal feeling on this matter. However, I'd be more interested if you had some sort of rational argument in favor of your position!
The key issue is the tininess of the hyperbubble you describe, right? Do you have some sort of argument regarding some specific estimate of the measure of this hyperbubble? (And do you have some specific measure on mindspace in mind?)
To put it differently: What are the properties you think a mind needs to have, in order for the "raise a nice baby AGI" approach to have a reasonable chance of effectiveness? Which are the properties of the human mind that you think are necessary for this to be the case?
Well, consider this: it takes only a very small functional change to the human brain to make 'raising it as a human child' a questionable strategy at best. Crippling a few features of the brain produces sociopaths who, notably, cannot be reliably inculcated with our values, despite sharing 99.99etc% of our own neurological architecture.
Mind space is a tricky thing to pin down in a useful way, so let's just say the bubble is really tiny. If the changes your making are larger than the changes between a sociopath and a neurotypical human, then you should... (read more)