I don't want to eat babies.
If you gave me a pill that would make me want to eat babies, I would refuse to take that pill, because if I took that pill I'd be more likely to eat babies, and I don't want to eat babies.
That's a special case of a general principle: even if an AI can modify itself and act independently, if it doesn't want to do X, then it won't intentionally change its goals so as to come to want to do X.
So it's not pointless to design an AI with a particular goal, as long as you've built that AI such that it won't accidentally experience goal changes.
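The principle above can be sketched as a toy decision procedure. This is purely my own illustration under simplifying assumptions (the agent, its utility function, and its world model are all made up here), not a claim about how any real AI is built: the agent scores every candidate action, including self-modifications like the pill, with its *current* utility function, so the pill loses.

```python
# Toy illustration: an agent that evaluates self-modification
# with its current goals will refuse a goal-changing pill.

def utility(world_state):
    # Current goals: strongly disvalue any baby-eating.
    return -100 if world_state["babies_eaten"] > 0 else 0

def predicted_outcome(action):
    # Hypothetical world model: taking the pill leads to baby-eating.
    if action == "take_pill":
        return {"babies_eaten": 1}
    return {"babies_eaten": 0}

def choose(actions):
    # Rank actions by the *current* utility of their predicted outcomes.
    return max(actions, key=lambda a: utility(predicted_outcome(a)))

print(choose(["take_pill", "refuse_pill"]))  # -> refuse_pill
```

The point of the sketch is that nothing forces the agent to score the pill by the utility function it would have *after* taking it; it scores by the one it has now, which is why the goal change gets vetoed.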
Incidentally, if you're really interested in this subject, reading the Sequences may interest you.
I agree with you that "my desire is not to do X, therefore I wouldn't do X even if I knew it was the right thing to do" isn't a valid argument. It's also not what I said. What I said was "my desire is not to do X, therefore I wouldn't choose to desire to do X even if I could choose that." Whether it's right or wrong doesn't enter into it.
As for your scenario... yes, I agree with you that IF "eating babies is wrong" is the sort of thing that can be discovered about the world, THEN an AI could discover it, and THEREFORE is not guaranteed to continue eating babies just because it initially values baby-eating.
It is not clear to me that "eating babies is wrong" is the sort of thing that can be discovered about the world. Can you clarify what sort of information I might encounter that would cause me to "find out" that eating babies is wrong, if I didn't already believe that?
Let me get this straight: are you saying that if you believe X, there can't possibly exist any information you haven't discovered yet that could convince you your belief is false? You can't know what connections and conclusions an AI might deduce from all the information put together. It might conclude that humanity is a stain on the universe, and even if it thought wiping humanity out wouldn't accomplish anything (and it strongly desired not to do so), it might wipe us out anyway, purely because the choice "wipe out humanity" would be assigned a higher value than the choice "don't wipe out humanity."
Also, is the statement "my desire is not to do X, therefore I wouldn't choose to desire to do X even if I could choose that" your subjective feeling, or do you base it on some studies? For example, this statement doesn't apply to me, as I would, under certain circumstances, choose to desire to do X even if it was not my desire initially. Therefore it's not a universal truth, and therefore it may not apply to an AI either.