ChristianKl comments on Open Thread, Aug. 22 - 28, 2016 - Less Wrong
Comments (67)
(1) Given: AI risk comes primarily from AI optimizing for things besides human values.
(2) Given: Humans are already optimizing for things besides human values (or at least, besides our Coherent Extrapolated Volition).
(3) Given: Our world is okay.^[CITATION NEEDED!]
(4) Therefore, imperfect value loading can still result in an okay outcome.
This is, of course, not necessarily always the case for any given imperfect value loading. However, our world serves as a single counterexample to the rule that all imperfect optimization will be disastrous.
(5) Given: A maxipok strategy is optimal. ("Maximize the probability of an okay outcome.")
(6) Given: Partial optimization for human values is easier than total optimization. (Where "partial optimization" is at least close enough to achieve an okay outcome.)
(7) ∴ MIRI should focus on imperfect value loading.
Note that I'm not convinced of several of the givens, so I'm not certain of the conclusion. However, the argument itself looks convincing to me. I’ve also chosen to leave assumptions like “imperfect value loading results in partial optimization” unstated as part of the definitions of those two terms. However, I’ll try to add details to any specific areas, if questioned.
I don't think that's a good description of the orthogonality thesis. An AI that optimizes for a single human value like purity could still produce huge problems.
Humans don't effectively self-modify to achieve specific objectives in the way an AGI could.
Why do you believe that?
Probably not, but it highlights the relevant (or at least related) portion. I suppose I could have been more precise by specifying terminal values, since things like paperclips are obviously instrumental values, at least for us.
Agreed, except in the trivial case where we can condition ourselves to have different emotional responses. That's substantially less dangerous, though.
I'm not sure I do, in the sense that I wouldn't assign the proposition >50% probability. However, I might put the odds at around 25% for a Reduced Impact AI architecture providing a useful amount of shortcuts.
That seems like decent odds of significantly boosting expected utility. If such an AI would be faster to develop by even just a couple of years, that could make the difference between winning and losing an AI arms race. Sure, it'd be at the cost of a utopia, but if it boosted the odds of success enough, it'd still have enough expected utility to compensate.
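To make the trade-off concrete, here is a rough sketch of the expected-utility comparison in Python. Every number in it is a made-up placeholder, not an estimate from this thread: the utilities assigned to "utopia" and "okay", the probabilities of winning the race, and the function names are all assumptions chosen only to show the shape of the argument.

```python
def expected_utility(p_success, u_success, u_failure=0.0):
    """Expected utility of a strategy with a binary success/failure outcome."""
    return p_success * u_success + (1.0 - p_success) * u_failure


def break_even_probability(p_full, u_full, u_reduced, u_failure=0.0):
    """Success probability the reduced strategy needs to match the full one.

    Solves p * u_reduced + (1 - p) * u_failure == EU(full strategy) for p.
    """
    eu_full = expected_utility(p_full, u_full, u_failure)
    return (eu_full - u_failure) / (u_reduced - u_failure)


# Hypothetical placeholder values, for illustration only.
U_UTOPIA = 1.0       # payoff of full value loading succeeding
U_OKAY = 0.6         # payoff of an "okay" outcome from imperfect value loading
P_WIN_FULL = 0.2     # chance the slower, full approach wins the race
P_WIN_REDUCED = 0.5  # chance the faster, imperfect approach wins the race

eu_full = expected_utility(P_WIN_FULL, U_UTOPIA)
eu_reduced = expected_utility(P_WIN_REDUCED, U_OKAY)
threshold = break_even_probability(P_WIN_FULL, U_UTOPIA, U_OKAY)

print(f"EU(full value loading):      {eu_full:.2f}")
print(f"EU(imperfect value loading): {eu_reduced:.2f}")
print(f"Break-even P(win) for the imperfect approach: {threshold:.2f}")
```

With these placeholder numbers the imperfect approach comes out ahead, but only because its assumed chance of winning exceeds the break-even threshold. If "okay" is worth much less than utopia, or the speed advantage is small, the comparison flips.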