DSimon comments on The genie knows, but doesn't care - Less Wrong
Comments (515)
That's not very realistic. If you trained AI to parse natural language, you would naturally reward it for interpreting instructions the way you want it to. If the AI interpreted something in a way that was technically correct, but not what you wanted, you would not reward it, you would punish it, and you would be doing that from the very beginning, well before the AI could even be considered intelligent. Even the thoroughly mediocre AI that currently exists tries to guess what you mean, e.g. by giving you directions to the closest Taco Bell, or guessing whether you mean AM or PM. This is not anthropomorphism: doing what we want is a sine qua non for AI to prosper.
Suppose that you ask me to knit you a sweater. I could take the instruction literally and knit a mini-sweater, reasoning that this minimizes the amount of expended yarn. I would be quite happy with myself too, but when I give it to you, you're probably going to chew me out. I technically did what I was asked to, but that doesn't matter, because you expected more from me than just following instructions to the letter: you expected me to figure out that you wanted a sweater that you could wear. The same goes for AI: before it can even understand the nuances of human happiness, it should be good enough to knit sweaters. Alas, the AI you describe would make the same mistake I made in my example: it would knit you the smallest possible sweater. How do you reckon such an AI would make it to superintelligence status before being scrapped? It would barely be fit for clerk duty.
Realistically, AI would be constantly drilled to ask for clarification when a statement is vague. Again, before the AI is asked to make us happy, it will likely be asked other things, like building houses. If you ask it: "build me a house", it's going to draw a plan and show it to you before it actually starts building, even if you didn't ask for one. It's not in the business of surprises: never, in its whole training history, from baby to superintelligence, would it have been rewarded for causing "surprises" -- even the instruction "surprise me" only calls for a limited range of shenanigans. If you ask it "make humans happy", it won't do jack. It will ask you what the hell you mean by that, it will show you plans, and whenever it needs to do something that it has reason to think people would not like, it will ask for permission. It will do that as part of standard procedure.
To put it simply, an AI which messes up "make humans happy" is liable to mess up pretty much every other instruction. Since "make humans happy" is arguably the last of a very large number of instructions, it is quite unlikely that an AI which makes it this far would handle it wrongly. Otherwise it would have been thrown out a long time ago, be it for interpreting instructions too literally or for causing surprises. Again: an AI couldn't make it to superintelligence status with warts that would doom an AI with subhuman intelligence.
Why does the hard takeoff point have to be after the point at which an AI is as good as a typical human at understanding semantic subtlety? In order to do a hard takeoff, the AI needs to be good at a very different class of tasks than those required for understanding humans that well.
So let's suppose that the AI is as good as a human at understanding the implications of natural-language requests. Would you trust a human not to screw up a goal like "make humans happy" if they were given effective omnipotence? The human would probably do about as well as people in the past have at imagining utopias: really badly.
Semantic extraction -- not hard takeoff -- is the task that we want the AI to be able to do. An AI which is good at, say, rewriting its own code, is not the kind of thing we would be interested in at that point, and it seems like it would be inherently more difficult than implementing, say, a neural network. More likely than not, this initial AI would not have the capability for "hard takeoff": if it runs on expensive specialized hardware, there would be effectively no room for expansion, and the most promising algorithms to construct it (from the field of machine learning) don't actually give AI any access to its own source code (even if they did, it is far from clear the AI could get any use out of it). It couldn't copy itself even if it tried.
If a "hard takeoff" AI is made, and if hard takeoffs are even possible, it would be made after that, likely using the first AI as a core.
I wouldn't trust a human, no. If the AI is controlled by the "wrong" humans, then I guess we're screwed (though perhaps not all that badly), but that's not a solvable problem (all humans are the "wrong" ones from someone's perspective). Still, though, AI won't really try to act like humans -- it would try to satisfy them and minimize surprises, meaning that it would keep track of which humans would like which "utopias". More likely than not this would constrain it to inactivity: it would not attempt to "make humans happy" because it would know the instruction to be inconsistent. You'd have to tell it what to do precisely (if you had the authority, which is a different question altogether).