CCC comments on Debunking Fallacies in the Theory of AI Motivation - Less Wrong
The consequences of a misunderstanding are not a function of the size of the misunderstanding. Rather, they are a function of the acting agent's (in this case, the AI's) ability to influence the world. A superintelligent AI has an unprecedentedly large ability to influence the world; therefore, in the worst case, the potential consequences of a misunderstanding are unprecedentedly large.
The nature of the misunderstanding - whether small or large - makes little difference here. And the nature of the problem - communicating with a non-human, and thus presumably alien, artificial intelligence - is rife with the potential for misunderstanding.
Consider that an artificial intelligence - at least at first - might well have immense computational ability and massive intelligence, but little to no actual experience in working out what people mean as opposed to what they say. That is precisely the sort of misunderstanding that becomes possible if the only mission objective given to the system is something along the lines of "get three astronauts (human) to location designated (Moon)". (Presumably, instead of waiting several years, it would take a few minutes to order a rental car - assuming it knew about rental cars, or thought to look for them.)
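To make that failure mode concrete, here is a deliberately toy sketch (my own illustration; every plan, name, and number in it is made up): a planner that optimizes only the literal objective will select whichever plan satisfies the letter of the goal most cheaply, with no term for what the operators actually meant.

```python
# Toy sketch (hypothetical; not from the post or the paper): a planner
# given only the literal objective "three humans at the designated site"
# picks the cheapest plan that satisfies it, ignoring operator intent.
plans = [
    {"name": "crewed rocket with life support and return trip",
     "cost": 9000, "humans_at_site": 3, "humans_alive": 3},
    {"name": "cargo rocket, no life support",
     "cost": 1200, "humans_at_site": 3, "humans_alive": 0},
]

def satisfies_literal_objective(plan):
    # The stated mission, taken at face value: three humans at the site.
    # Nothing here says they must arrive alive - that was only implied.
    return plan["humans_at_site"] >= 3

# A purely literal optimizer: minimize cost over goal-satisfying plans.
best = min(
    (p for p in plans if satisfies_literal_objective(p)),
    key=lambda p: p["cost"],
)
print(best["name"])  # -> cargo rocket, no life support
```

The point is not the specific plans: nothing in the objective penalizes the second one, so the "misunderstanding" lives entirely in what was left out of the specification.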
Now, if the AI is capable of solving the difficult problem of separating out what people mean from what they say - a problem that human-level intelligences still have immense trouble with at times - and if the AI is compassionate enough towards humanity to actually strongly value human happiness (as opposed to assigning us approximately as much value as we assign to ants), then yes, you've done it: you've got a perfectly wonderful Friendly AI.
The problem is that getting those two things right is not simple, and I don't think your proposed structure guarantees either of them.
I am not surprised. I am very familiar with the effect - often, what one person means when they write something is not what another person sees when they read it.
That paragraph is what I saw when I read your paper. I think it likely that our implicit assumptions about the structure of such an AI differ drastically - I suspect that you are not mentioning (or are only briefly hinting at) the parts you consider obvious, while I, not considering those parts obvious, am not seeing them as present at all.
This is incredibly important, and something that I did not see in your proposal. What is the nature of these constraints?
A properly Friendly AI will remain Friendly even as it improves its own intelligence. Agreed.
...I'm just not seeing how your proposed design brings us any closer to Friendliness.