Dmytry comments on [draft] Concepts are Difficult, and Unfriendliness is the Default: A Scary Idea Summary - Less Wrong Discussion
You are viewing a comment permalink. View the original post to see all comments and the full post content.
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Comments (39)
It simply won't. Harm, value, life, we never defined those; they are the commonly agreed upon labels which we apply to things for communication purposes, and it works on a limited set of things that already exist but does not define anything outside context of this limited set.
It would have maximization of some sort of complexity metric (perhaps while acting conservatively and penalizing actions it can't undo to avoid self harm in the form of cornering oneself), which it first uses on itself to self improve for a while without even defining what self is. Consider evolution as example; it doesn't really define fitness in the way that humans do. It doesn't work like - okay we'll maximize the fitness that is defined so and so, so there's what we should do.
edit: that is to say, it doesn't define 'life' or 'harm'. It has a simple goal system involving some metrics, which incidentally prevents the self harm, and permits self improvement, in the sense that we would describe it this way like we would describe the shooting-at-short-part-of-visible-spectrum robot as blue-minimizing one (albeit that is not very good analogy as we define blue and minimization independently of the robot).