Eliezer_Yudkowsky comments on Ethical Injunctions - Less Wrong
I can think of two positions on torture to which I am sympathetic:
1) No legal system or society should ever refrain from punishing those who torture - anything important enough that torture would even be on the table, like a nuclear bomb in New York, is important enough that everyone involved should be willing to go to prison for the crime of torture.
2) The chance of actually encountering a "nuke in New York" situation, that can be effectively resolved by torture, is so low, and the knock-on effects of having the policy in place so awful, that a blanket injunction against torture makes sense.
In case 1, you would choose TORTURE over SPECKS, and then go to jail for it, even though it was the right thing to do.
In case 2, you would simultaneously say, "TORTURE over SPECKS is the right alternative of the two, but a human can never be in an epistemic state where they have a justified belief that this is the case" - which would tie in well with the Hansonian argument that you have an O(3^^^3) probability penalty from the unlikelihood of finding yourself in such a unique position.
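The Hansonian penalty is, at bottom, a cancellation of magnitudes, which a toy calculation can make concrete. All numbers here are hypothetical stand-ins, vastly smaller than 3^^^3; the function name and the exact 1/N form of the penalty are illustrative assumptions, not anything from the original argument:

```python
# Toy illustration of the Hansonian penalty argument (hypothetical numbers,
# vastly smaller than 3^^^3): if the stakes of a "unique heroic position"
# scale with N, but the prior probability that you genuinely occupy such a
# position shrinks like 1/N, then the expected benefit of acting on the
# belief stays bounded no matter how large N gets.

def expected_benefit(n_affected: int, benefit_per_person: float = 1.0) -> float:
    """Expected benefit of acting as if the unique situation is real."""
    p_situation_is_real = 1.0 / n_affected  # assumed O(1/N) penalty
    return p_situation_is_real * n_affected * benefit_per_person

for n in (10**3, 10**9, 10**27):
    print(f"N = {n}: expected benefit is about {expected_benefit(n)}")
```

The point of the sketch is only that the vast payoff and the vast improbability cancel: scaling up the stakes does not, by itself, scale up the expected value of breaking the injunction.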
So I am sympathetic to the argument that people should never torture, but I certainly can't back the position that SPECKS over TORTURE is inherently the right thing to do - this seems to me to mix up an epistemic precaution with morality. There are certainly worse things than torturing one person - torturing two people, for example. But if you adopt position 2, then you would refuse to torture one person with your own hands even to save a thousand people from torture, while simultaneously not saying that it is better for a thousand people to be tortured than for one.
The moral questions are over the territory (or, hopefully equivalently, over epistemic states of absolute certainty). The ethical questions are over epistemic states that humans are likely to be in.
I think it deserves to be noted that while some of the flaws in Christian theology lie in what they think their supposed facts would imply (e.g., that because God did miracles you can know that God is good), other problems come more from the falsity of the premises than from the falsity of the deductions. Which is to say: if God did exist and were good, then you would be justified in being cautious around parts of God's plan that didn't seem to make sense at the moment. But this would be best backed up by a long history of people saying, "Look how stupid God's plan is, we need to do X," and then having X blow up on them - rather than, as was actually the case, people saying, "God's plan is X," and then having X blow up on them.
Or if you'd found with some historical regularity that, when you challenged God's subtle plans, you seemed to be right 90% of the time, but the other 10% of the time you got black-swan blowups that caused a hundred times as much damage, that would also be cause for suspicion of subtlety.
Certainly I'm not saying "just do what feels right". There's no safe defense, not even ethics. There's also no safe defense, not even shut up and multiply.
I probably should have been clearer about this before, but I was trying to discuss things in an order, and didn't want to wade into ethics without specialized posts:
People often object to the sort of scenarios that illustrate "shut up and multiply" by saying, "But if the experimenter tells you X, what if they're lying?" Well, yes, in a lot of real-world cases there are various probability updates you perform based on other people being willing to make bets against you, and just because you receive certain experimental instructions doesn't imply the real world is that way.
But the base case - the center - has to be the moral comparisons between worlds, or even comparisons of expected utility between given probability distributions. If you can't ask about this, then what good will ethics do you?
So let's be very clear that I don't think that one small act of self-deception is an inherently morally worse event than, say, getting your left foot chopped off with a chainsaw. I'm asking, rather, how one should best avoid the chainsaw, and arguing that in reasonable states of knowledge a human can attain, the answer is, "Don't deceive yourself, it's a black-swan bet at best."
Are we talking about self-deception still? Because I would give odds around as extreme as the odds I would give of anything, that, conditioning on any AI I build trying to deceive itself, some kind of really epic error has occurred. Controlled shutdown, immediately.
Maybe I'm not being clear about how this would work in an AI! The ethical injunction isn't self-protecting, it's justified within the structural framework of the system as a whole. You might even find ethical injunctions starting to emerge without programmer intervention, in some cases, depending on how well the AI understood its own situation. But the kind of injunctions I have in mind wouldn't be reflective - they wouldn't modify the utility function or kick in at the reflective level to ensure their own propagation. That sounds really scary, to me - there ought to be an injunction against it! You might have a rule that would controlledly shut down the (non-mature) AI if it tried to execute a certain kind of source code change, but that wouldn't be the same as having an injunction that exerts direct control over the source code.
To the extent the injunction sticks around in the AI, it should be as the result of ordinary reasoning, not reasoning taking the injunction into account! My ethical injunctions do not come with an extra clause that says, "Do not reconsider this injunction, including not reconsidering this clause." That would be going way too far. It would violate the injunction against self-protecting closed belief systems.
I can't weaken them and make them come out as the right advice to give people. Even after "Shut up and do the impossible", there was a commenter who posted about their failed attempt at the AI-Box Experiment, saying that they thought they gave it a good try - which shows how hard it is to convey the sentiment of "Shut up and do the impossible!" Readers can work out on their own how to distinguish the map and the territory here, but if you say "Shut up and do what seems impossible!", that, to me, sounds like dispelling part of the essential message - that what seems impossible doesn't look like "seems impossible"; it just looks impossible.
Likewise with "things you shouldn't do even if they're the right thing to do"; only this conveys the danger and tension of ethics, the genuine opportunities you might be passing up. "Don't do it even if it seems right" sounds merely clever by comparison, like you're going to reliably divine the difference between what seems right and what is right, and happily ride off into the sunset.
*nod*
(But with the proviso that some people who execute evil cunning plans may just be evil, that history may be written by the victors to emphasize the transgressions of the losers while overlooking the moral compromises of those who achieved "good" results, etc.)
If a self-modifying AI with the right structure will write ethical injunctions at all, it will also inspect the code to guarantee that no race condition exists with any deliberative-level supervisory systems that might have gone wrong in the condition where the code executes. Otherwise you might as well not have the code.
Inaction isn't safe but it's safer than running an AI whose moral system has gone awry.
Once you deliberately choose self-deception, you may have to protect it by adopting other Dark Side Epistemology. I would, of course, say "neither" (as otherwise I would be swapping to the Dark Side) but if you ask me which is worse - well, hell, even I'm still undoubtedly unconsciously self-deceiving, but that's not the same as going over to the Dark Side by allowing it!