KatjaGrace comments on Superintelligence 20: The value-loading problem - Less Wrong Discussion
Comments (21)
What do you think of Ernest Davis' view? Is the value loading problem a problem?
Did anyone else immediately try to come up with ways Davis' plan would fail? One obvious failure mode would be in specifying which dead people count - if you say "the people described in these books," the AI could just grab the books and rewrite them. Hmm, come to think of it: is any attempt to pin down human preferences by physical reference rather than logical reference vulnerable to tampering of this kind, and therefore unworkable? I know EY has written many times before about a "giant logical function that computes morality", but this puts that notion in a bit of a different light for me. Anyway, I'm sure there are other, less obvious ways Davis' plan could go wrong too. I also suspect he's sneaking a lot into that little word, "disapprove".
In general though, I'm continually astounded at how many people, upon being introduced to the value loading problem and some of the pitfalls that "common-sense" approaches have, still say "Okay, but why couldn't we just do [idea I came up with in five seconds]?"
Not as such, no. It's a possible failure mode, similar to wireheading; but both of those are avoidable. You need to write the goal system in such a way that makes the AI care about the original referent, not any proxy that it looks at, but there's no particular reason to think that's impossible.
Agreed.
Davis massively underestimates the magnitude and importance of the moral questions we haven't considered, which renders his approach unworkable.
I don't. Building a transhuman civilization is going to raise all sorts of issues that we haven't worked out, and do so quickly. A large part of the possible benefits is going to be contingent on the controlling system becoming much better at answering moral questions than any individual human is right now. I would be extremely surprised if we didn't end up losing at least one order of magnitude of utility to this approach, and it wouldn't surprise me at all if it turned out to produce a hellish environment in short order. The cost is too high.
I don't understand what scenario he is envisioning, here. If (given sufficient additional information, intelligence, rationality and development time) we'd agree with the morality of this result, then his final statement doesn't follow. If we wouldn't, it's a good old-fashioned Friendliness failure.