Then why would it be more difficult to make scope boundaries a 'value' than to make increasing a reward number one? Why is it harder to make the system endorse a time limit on self-improvement than to make it endorse increasing its reward number?
... it seems to me that this state of "I am experiencing this compulsion/phobia, but I don't endorse it, and I want to be rid of it, so let me look for a way to bypass or resist or eliminate it" is precisely what it feels like to be an algorithm equipped with a rule that enforces or prevents a set of choices that it isn't engineered to optimize for.
But where does that distinction come from? To me, such a distinction between 'value' and 'compulsion' seems anthropomorphic. If there is a rule that says 'optimize X for N seconds', why would the system treat 'optimize X' any differently from 'for N seconds'?
But where does that distinction come from?
It comes from the difference between the targets of an optimizing system, which drive the paths it selects to explore, and the constraints on such a system, which restrict the paths it can select to explore.
An optimizing system, given a path that leads it to bypass a target, will discard that path... that's part of what it means to optimize for a target.
An optimizing system, given a path that leads it to bypass a constraint, will not necessarily discard that path. Why would it?
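To make the asymmetry concrete, here is a minimal sketch (all names hypothetical, illustrative only, not anyone's actual proposal): a brute-force searcher whose target is "do as much work as possible" and whose time limit lives in a separate check rather than in the objective itself. Plans that score worse on the target are discarded by construction; a plan that switches the check off is only discarded if doing so happens to cost score, and here nothing makes it cost anything.

```python
# Minimal sketch (hypothetical, illustrative only): a brute-force optimizer
# searching over short "plans". The objective is the target that ranks plans;
# the time limit is a constraint bolted on beside it, not folded into it.
from itertools import product

ACTIONS = ["work", "idle", "disable_limit"]

def objective(plan):
    # Target: the quantity the search actually maximizes.
    return plan.count("work")

def within_time_limit(plan, limit=3):
    # Constraint: an external rule the search merely checks, when it applies.
    return len(plan) <= limit

def search(max_len=5):
    best_plan, best_score = None, float("-inf")
    for length in range(1, max_len + 1):
        for plan in product(ACTIONS, repeat=length):
            # Discarding lower-scoring plans is what "optimizing for the
            # target" means. Discarding constraint-bypassing plans is not:
            # here 'disable_limit' switches the check off, and nothing in
            # the objective penalizes doing so.
            if "disable_limit" not in plan and not within_time_limit(plan):
                continue
            score = objective(plan)
            if score > best_score:
                best_plan, best_score = plan, score
    return best_plan, best_score

if __name__ == "__main__":
    print(search())
    # -> (('work', 'work', 'work', 'work', 'disable_limit'), 4)
    # The winning plan evades the time limit; the best plan that respects
    # the limit scores only 3, so the bypassing path is preferred, not discarded.
```

Folding the limit into the objective (say, scoring any limit-violating plan at minus infinity) flips the outcome, and that is exactly the difference under discussion: a constraint the system merely has versus a value the system actually optimizes for.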
Many people think you can solve the Friendly AI problem just by writing certain failsafe rules into the superintelligent machine's programming, like Asimov's Three Laws of Robotics. I thought the rebuttal to this was in "Basic AI Drives" or one of Yudkowsky's major articles, but after skimming them, I haven't found it. Where are the arguments concerning this suggestion?