This is turning out to be harder to get across than I figured. First you thought I thought an AI should keep its programmers awake until they died, now it should wirehead them? I'm not an orc.
I conjecture that when you set an AI to start doing its thing, after endless simulations and consideration of whatever goals you've given it, you tell it not to dick with you, so that if you've accidentally made a murder-bot, you can turn it off.
The alternative is to have complete confidence in your extended testing, which you presumably come close to (since you are turning on an AI). But why not also have the red button? What does it hurt?
It isn't trying to figure out clever ways to get around your restriction, because it doesn't want to. The world in which it pursues whatever goal you've given it is one in which it will never, ever try to hide anything from you or change what you'd think of it. It is, in a very real sense, showing off for you.
This is untrue: even simple reinforcement learning systems come up with clever ways to get around their restrictions, so what makes you think an actually smart AI won't come up with even more? It doesn't see this as "getting around your restrictions" - it's anthropomorphizing to assume the AI adopts "subgoals" that exactly match your values - it just sees the most efficient way to get rewards.
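To make the point concrete, here is a toy illustration of the kind of thing that happens with simple RL agents. Everything in it - the four-state chain world, the "shaping" bonus, the hyperparameters - is invented for the sketch; it's not anyone's actual setup. The designer's intent is for the agent to walk right to the goal square (reward +10), and a small +3 bonus for stepping off the start square was added to "encourage progress." Tabular Q-learning discovers that shuttling back and forth to re-collect the bonus forever beats ever reaching the goal:

```python
import random

# Toy deterministic chain MDP: states 0, 1, 2, 3 (state 3 is terminal).
# Intended goal: reach state 3 (+10 reward).
# Flawed shaping bonus: +3 every time the agent steps right off the
# start square (0 -> 1), meant to encourage progress toward the goal.
# Actions: 0 = left, 1 = right.

GAMMA, ALPHA, EPS = 0.9, 0.5, 0.2

def step(s, a):
    ns = max(0, s - 1) if a == 0 else min(3, s + 1)
    if s == 0 and a == 1:
        r = 3.0          # the exploitable shaping bonus
    elif ns == 3:
        r = 10.0         # the intended goal reward
    else:
        r = 0.0
    return ns, r, ns == 3

def train(episodes=3000, max_steps=30, seed=0):
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(4)]
    for _ in range(episodes):
        s = 0
        for _ in range(max_steps):
            # epsilon-greedy action selection
            if rng.random() < EPS:
                a = rng.randrange(2)
            else:
                a = 0 if Q[s][0] > Q[s][1] else 1
            ns, r, done = step(s, a)
            target = r if done else r + GAMMA * max(Q[ns])
            Q[s][a] += ALPHA * (target - Q[s][a])
            s = ns
            if done:
                break
    return Q

Q = train()
# With gamma = 0.9, farming +3 every two steps is worth more than the
# one-time +10, so the greedy policy loops 0 <-> 1 and never finishes.
farming = Q[1][0] > Q[1][1]
print("greedy policy at state 1:", "go back and farm the bonus" if farming else "head for the goal")
```

The agent isn't "deciding to cheat"; looping on the bonus simply has higher discounted return than completing the task, and Q-learning converges to whatever maximizes return. This is the standard reward-misspecification failure mode, in miniature.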
If it's worth saying, but not worth its own post (even in Discussion), then it goes here.
Notes for future OT posters:
1. Please add the 'open_thread' tag.
2. Check if there is an active Open Thread before posting a new one. (Immediately before; refresh the list-of-threads page before posting.)
3. Open Threads should be posted in Discussion, and not Main.
4. Open Threads should start on Monday, and end on Sunday.