orthonormal comments on The mathematics of reduced impact: help needed - Less Wrong

10 Post author: Stuart_Armstrong 16 February 2012 02:23PM

You are viewing a comment permalink. View the original post to see all comments and the full post content.

Comments (94)

You are viewing a single comment's thread. Show more comments above.

Comment author: Stuart_Armstrong 25 February 2012 02:11:38PM *  0 points [-]

Ah, now we're getting somewhere.

Furthermore, if you can get an AI to keep "the concentration of iron in the Earth's atmosphere" as a goal rather than "the reading of this sensor which currently reports the concentration of iron in the Earth's atmosphere" or "the AI's estimate of the concentration of iron in the Earth's atmosphere"... it seems to me you've done much of the work necessary to safely point the AI at human preference.

I disagree. With the most basic ontology - say, standard quantum mechanics with some model of decoherence - you could define pretty clearly what "iron" is (given a few weeks, I could probably do that myself). You'd need a bit more ontology - specifically, a sensible definition of position - to get "Earth's atmosphere". But all these are strictly much easier than defining what "love" is.

Also, in this model, it doesn't matter much if your definitions aren't perfect. If "iron" isn't exactly what we thought it was, as long as it measures something present in the atmosphere that could diverge given a bad AI, we've got something.

it does indeed seem like magical thinking on the level of the Open Source Wish Project.

Structurally the two are distinct. The Open Source Wish Project fails because it tries to define a goal that we "know" but are unable to precisely "define". All the terms are questionable, and the definition gets longer and longer as they fail to nail down the terms.

In coarse graining, instead, we start with lots of measures that are much more precisely defined, and just pile on more of them in the hope of constraining the AI, without understanding how exactly the constraints works. We have two extra things going for us: first, the AI can always output NULL, and do nothing. Secondly, the goal we have setup for the AI (in terms of its utility function) is one that is easy for it to achieve, so it can only squeeze a little bit more out by taking over everything, so even small deviations in the penalty function are enough to catch that.

Personally, I am certain that I could find a loop-hole in any "wish for immortality", but given a few million coarse-grained constraints ranging across all types of natural and artificial process, across all niches of the Earth, nearby space or the internet... I wouldn't know where to begin. And this isn't an unfair comparison, because coming up with thousands of these constraints is very easy, while spelling out what we mean by "life" is very hard.

Comment author: orthonormal 25 February 2012 03:31:30PM 2 points [-]

What Vladimir said. The actual variable in the AI's programming can't be magically linked directly to the number of iron atoms in the atmosphere; it's linked to the output of a sensor, or many sensors. There are always at least two possible failure modes- either the AI could suborn the sensor itself, or wirehead itself to believe the sensor has the correct value. These are not trivial failure modes; they're some of the largest hurdles that Eliezer sees as integral to the development of FAI.

Comment author: Stuart_Armstrong 27 February 2012 01:51:58PM 1 point [-]

Yes, if the AI doesn't have a decent ontology or image of the world, this method likely fails.

But again, this seems strictly easier than FAI: we need to define physics and position, not human beings, and not human values.