Vladimir_Nesov comments on The mathematics of reduced impact: help needed - Less Wrong
Ah, now we're getting somewhere.
I disagree. With the most basic ontology, say standard quantum mechanics with some model of decoherence, you could define pretty clearly what "iron" is; given a few weeks, I could probably do that myself. You'd need a bit more ontology, specifically a sensible definition of position, to get "Earth's atmosphere". But all of these are far easier than defining what "love" is.
Also, in this model, it doesn't matter much if your definitions aren't perfect. Even if "iron" isn't exactly what we thought it was, as long as it measures something present in the atmosphere that could diverge given a bad AI, we've got something.
Structurally, the two are distinct. The Open Source Wish Project fails because it tries to define a goal that we "know" but are unable to precisely "define". All the terms are questionable, and the definition gets longer and longer as each attempt fails to nail them down.
In coarse graining, by contrast, we start with lots of measures that are much more precisely defined, and just pile on more of them in the hope of constraining the AI, without understanding exactly how the constraints work. We have two extra things going for us. First, the AI can always output NULL and do nothing. Second, the goal we have set up for the AI (in terms of its utility function) is easy for it to achieve, so it can only squeeze a little more utility out by taking over everything; even small deviations in the penalty function are therefore enough to catch that.
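A toy sketch of this argument, under loudly stated assumptions: the world is summarised by a dictionary of hypothetical coarse-grained measures (`m0`, `m1`, ...), the penalty is the summed absolute deviation from the NULL world, and `lam` is a hypothetical penalty weight. None of these specifics come from the thread, and the numbers are invented purely for illustration.

```python
from typing import Dict

World = Dict[str, float]  # hypothetical coarse-grained measure name -> value


def impact_penalty(world: World, null_world: World) -> float:
    """Summed absolute deviation of each measure from the NULL (do-nothing) world."""
    return sum(abs(world[m] - null_world[m]) for m in null_world)


def choose(outcomes: Dict[str, World], utility: Dict[str, float], lam: float) -> str:
    """Return the action maximising utility minus lam times the impact penalty."""
    null_world = outcomes["NULL"]  # the always-available do-nothing option is the baseline

    def score(action: str) -> float:
        return utility[action] - lam * impact_penalty(outcomes[action], null_world)

    return max(outcomes, key=score)


def make_outcomes(n_measures: int) -> Dict[str, World]:
    """Toy outcomes: 'modest' nudges one measure, 'takeover' shifts every measure."""
    null = {f"m{i}": 1.0 for i in range(n_measures)}
    modest = dict(null, m0=1.001)
    takeover = {m: v + 0.05 for m, v in null.items()}
    return {"NULL": null, "modest": modest, "takeover": takeover}


# Easy-to-max utility (assumed values): taking over gains only 0.01 over the modest action.
UTILITY = {"NULL": 0.0, "modest": 0.99, "takeover": 1.0}

for n in (10, 1000):
    print(n, choose(make_outcomes(n), UTILITY, lam=0.001))
```

With the same small weight, 10 measures still let the takeover win, while 1000 measures make it lose to the modest action: because the utility gap is only 0.01, piling on many loosely chosen measures is enough to tip the balance.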
Personally, I am certain that I could find a loophole in any "wish for immortality", but given a few million coarse-grained constraints ranging across all types of natural and artificial processes, across all niches of the Earth, nearby space and the internet... I wouldn't know where to begin. And this isn't an unfair comparison, because coming up with thousands of these constraints is very easy, while spelling out what we mean by "life" is very hard.
You're missing the point: the distinction between the thing itself and various indicators of what it is.
I thought I was pretty clear on the distinction: traditional wishes are clear on the thing itself (e.g. immortality) but hopeless at the indicators; this approach is clear on the indicators, and more nebulous on how they achieve the thing (reduced impact).
By piling on indicators, we are, with high probability, making it harder for the AI to misbehave, closing off more and more avenues for it to do so and pushing it towards methods that are more likely to fail. To accomplish our goals, the impact penalty only has to outweigh the difference between "expected utility for minimised impact (given an easy-to-max utility function)" and "unrestricted expected utility for the easy-to-max utility function", which is a small number.
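One way to write this margin condition, using notation that is not in the original: let U(a) be the AI's expected utility for action a, R(a) the impact penalty, and λ > 0 an assumed weight on that penalty. A reduced-impact action a* is at least as good, from the AI's point of view, as the unrestricted maximiser a^max exactly when:

```latex
U(a^{*}) - \lambda R(a^{*}) \ge U(a^{\max}) - \lambda R(a^{\max})
\iff
\lambda \bigl[ R(a^{\max}) - R(a^{*}) \bigr] \ge U(a^{\max}) - U(a^{*})
```

Because the utility function is easy to max, the right-hand side is small, so even a modest weighted penalty gap is enough to satisfy the inequality.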
Will the method accomplish this? Will improved versions of the method accomplish this? Nobody knows yet, but given what's at stake, it's certainly worth looking into.