Vladimir_Nesov comments on Open Thread: April 2010 - Less Wrong
By "show-stopper" I simply mean that we absolutely have to solve it in some way. Syntactic preference is one way, what you suggest could conceivably be another.
An advantage I see with syntactic preference is that it's at least more or less clear what we are working with: formal programs and strategies. This opens up the whole palette of possible approaches to try on the remaining problems. With the "all mathematical structures" thing, we still don't know what we are supposed to be talking about; as of now there is no way forward even at that first step. Syntactic preference at least allows us to take one step further, onto firmer ground, even though admittedly it's unclear what to do next.
I mean the "complexity of value"/"value is fragile" thesis. It seems quite convincing to me, and from the opposite direction I have the "preference is detailed" conjecture, which follows from the nature of preference in general. For "is it possible to build AI", we don't have similarly convincing arguments (and really, that is an unrelated claim, which only contributes the connotation of an error in judgment, without offering an analogy in the method of arriving at that judgment).
I agree with "complexity of value" in the sense that human preference, as a mathematical object, has high information content. But I don't see a convincing argument from this premise to the conclusion that the best course of action for us, in the sense of maximizing our values under the constraints we're likely to face, involves automated extraction of preferences rather than writing them down manually.
Consider the counter-example of someone who has the full complexity of human values, but would be willing to give up all of their other goals to fill the universe with orgasmium, if that choice were available. Such an agent could "win" by building a superintelligence with just that one value. How do we know, at this point, that our values are not like that?
Whatever the case is for how acceptable the simplified values are, automated extraction of preference seems to be the only way to actually knowably win, rather than to strike a compromise, which is what simplified preference is suggested to be. We must decide from the information we have: how would you come to know that a particular simplified preference definition is any good? I don't see a way forward that doesn't involve first having a moral machine more precise than a human (but at that point, we won't need to consider simplified preference anymore).