Tyrrell_McAllister2 comments on Ethics Notes - Less Wrong
So, should we think of the injunction as essentially a separate non-reflective AI that monitors the main AI, but which the main AI can't modify until it's mature?
If so, that seems to run into all the problems that you've pointed out with trying to hardcode friendly goals into AIs. The foremost problem is that we can't ensure that the "injunction" AI will shut down the main AI under all the circumstances in which we would want it to. If the main AI learns of the "injunction" AI, it might discover a way to circumvent it that we didn't anticipate.
The kinds of people whom you've criticized might reply, "well, just hard code the injunction AI to shut down the main AI if the main AI tries to circumvent the injunction AI." But, of course, we can't anticipate what all such circumventions will look like, so we don't know how to code the injunction AI to do that. If the main AI is smarter than us, we should expect that it will find circumventions that don't look like anything that we anticipated.
This has a real analog in human ethical reasoning. You've focused on cases where people violate their ethics by convincing themselves that something more important is at stake. But, in my experience, people are also very prone to convincing themselves that they aren't really violating their ethics at all. For example, they'll convince themselves that they aren't really stealing because the person they stole from wasn't in fact the rightful owner. I've heard people who stole from retailers argue that the retailers acquired the goods by exploiting sweatshops or their own employees, or that they are just evil corporations, so they never rightfully owned the goods in the first place. Hence, the thief reasons, taking the goods isn't really theft.
Similarly, your AI might be clever enough to find a way around any hard-coded injunction that occurs to us. So far, this "injunction" strategy sounds to me like trying to devise, in advance, a foolproof wish for a genie.