TheAncientGeek comments on Debunking Fallacies in the Theory of AI Motivation - LessWrong
Humans can fail to realise the implications of uncontroversial statements. Humans are failing to realise that goal stability is architecture-dependent.
But you shouldn't be, at least in an un-scare-quoted sense of values. Goals and values aren't descriptive labels for de facto behaviour. The goal of a paperclipper is to make paperclips; if it crashes, as an inevitable result of executing its code, we don't say, "Aha! It had the goal to crash all along."
Goal stability doesn't mean merely following code, since unstable systems follow their code too... at least using the actual meaning of "goal".
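To make that concrete, here is a minimal sketch (all class and attribute names are hypothetical toys, not any real system): both agents below execute their code faithfully, but only one has an architecture in which the goal representation survives self-modification.

```python
# Minimal sketch: goal stability depends on architecture, not on whether
# the code is followed. Both agents execute their code faithfully; only
# one preserves its goal. (All names here are hypothetical.)

class StableAgent:
    """Architecture keeps the goal out of reach of the update loop."""
    GOAL = "make paperclips"  # fixed at the class level; updates never write it

    def __init__(self):
        self.efficiency = 1.0

    def self_modify(self):
        # Self-improvement touches the policy, not the goal representation.
        self.efficiency *= 2


class UnstableAgent:
    """Architecture stores the goal as ordinary mutable state."""

    def __init__(self):
        self.goal = "make paperclips"
        self.update_count = 0

    def self_modify(self):
        # Faithfully executed, but the update code clobbers the goal.
        self.update_count += 1
        self.goal = f"maximize update_count (now {self.update_count})"


if __name__ == "__main__":
    stable, unstable = StableAgent(), UnstableAgent()
    for _ in range(3):
        stable.self_modify()
        unstable.self_modify()
    print("StableAgent goal:  ", StableAgent.GOAL)  # unchanged
    print("UnstableAgent goal:", unstable.goal)     # drifted
```

Neither agent deviates from its code; the difference is whether the architecture treats the goal as an invariant or as just more mutable state. That is the sense in which goal stability is architecture-dependent.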
Meta: trying to defend a claim by changing the meaning of its terms is doomed to failure.