MrMind comments on Open Thread, Aug. 22 - 28, 2016 - Less Wrong
(1) Given: AI risk comes primarily from AI optimizing for things besides human values.
(2) Given: humans already are optimizing for things besides human values. (or, at least besides our Coherent Extrapolated Volition)
(3) Given: Our world is okay.^[CITATION NEEDED!]
(4) Therefore, imperfect value loading can still result in an okay outcome.
This is, of course, not necessarily always the case for any given imperfect value loading. However, our world serves as a single counterexample to the rule that all imperfect optimization will be disastrous.
(5) Given: A maxipok strategy is optimal. ("Maximize the probability of an okay outcome.")
(6) Given: Partial optimization for human values is easier than total optimization. (Where "partial optimization" is at least close enough to achieve an okay outcome.)
(7) ∴ MIRI should focus on imperfect value loading.
Note that I'm not convinced of several of the givens, so I'm not certain of the conclusion. However, the argument itself looks convincing to me. I’ve also chosen to leave assumptions like “imperfect value loading results in partial optimization” unstated, treating them as part of the definitions of those two terms. However, I’ll try to add details on any specific area if questioned.
I think a problem arises with conclusion 4: I can agree that humans imperfectly steering the world toward their own values has resulted in a world that is OK on average, but AI will possibly be much more powerful than humans.
Insofar as corporations and sovereign states can be seen as super-human entities, we can see that imperfect value optimization has created massive suffering: think of all the damage a ruthless corporation can inflict, e.g. by polluting the environment, or of a state where political assassination is easy and widespread.
An imperfectly aligned value optimization might result in a world that is OK on average, but possibly this world would be split into a heaven and a hell, which I think is not an acceptable outcome.
This is a good point. Pretty much all the things we're optimizing for which aren't our values are due to coordination problems. (There are also akrasia/addiction sorts of things, but that's optimizing for values which we don't endorse upon reflection, and so arguably isn't as bad as optimizing for a random part of value-space.)
So, Moloch might optimize for things like GDP instead of Gross National Happiness, and individuals might throw a thousand starving orphans under the bus for a slightly bigger yacht or whatever, but neither is fully detached from human values. Even if U(orphans)>>U(yacht), at least there’s an awesome yacht to counterbalance the mountain of suck.
I guess the question is precisely how diverse human values are in the grand scheme of things, and what the odds are of hitting a human value when picking a random or semi-random subset of value-space. If we get FAI slightly wrong, precisely how wrong does it have to be before it leaves our little island of value-space? Tiling the universe with smiley faces is obviously out, but what about hedonium, or wireheading everyone? Faced with an unwinnable AI arms race and no time for true FAI, I’d probably consider those better than nothing.
That's a really, really tiny sliver of my values though, so I'm not sure I'd even endorse such a strategy if the odds were 100:1 against FAI. If that's the best we could do by compromising, I'd still rate the expected utility of MIRI's current approach higher, and hold out hope for FAI.
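To make the 100:1 comparison concrete, here is a rough sketch of the break-even arithmetic. Everything in it is an assumption added for illustration, not something stated above: full FAI is normalized to utility 1, a failed FAI attempt to utility 0, and the compromise outcome (hedonium, wireheading) is treated as a guaranteed fraction of full value. The function names are hypothetical.

```python
from fractions import Fraction


def p_from_odds_against(odds_against: int) -> Fraction:
    """Convert 'N:1 against' odds into a success probability, 1 / (N + 1)."""
    return Fraction(1, odds_against + 1)


def expected_utility(p_success: Fraction,
                     u_success: Fraction = Fraction(1),
                     u_failure: Fraction = Fraction(0)) -> Fraction:
    """Expected utility of a gamble with two outcomes.

    Assumed normalization: success is worth 1, failure is worth 0.
    """
    return p_success * u_success + (1 - p_success) * u_failure


def prefers_fai(odds_against: int, compromise_value: Fraction) -> bool:
    """True if holding out for FAI beats a guaranteed compromise outcome.

    compromise_value is the assumed fraction of full value that the
    compromise captures, taken as certain.
    """
    return expected_utility(p_from_odds_against(odds_against)) > compromise_value


# At 100:1 against, the break-even point is 1/101 of full value.
break_even = expected_utility(p_from_odds_against(100))
print(break_even)
print(prefers_fai(100, Fraction(1, 1000)))  # compromise worth 0.1% of full value
print(prefers_fai(100, Fraction(1, 50)))    # compromise worth 2% of full value
```

Under these assumptions, holding out for FAI at 100:1 against only wins if the compromise captures less than 1/101 (about 1%) of full value, which is one way to read "a really, really tiny sliver". If a failed FAI attempt is worse than zero, the threshold drops further.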