DuncanS comments on The mathematics of reduced impact: help needed - Less Wrong

Post author: Stuart_Armstrong, 16 February 2012 02:23PM

Comment author: DuncanS 17 February 2012 12:26:11AM, 1 point

The main problem with all utility functions for AIs is that the outcome you think you foresee might not be the outcome you get.

Let's take the minimum impact problem as an example. The bounded goal of making a thousand paperclips with minimum impact is dangerous. How does one make a minimum impact?

As a new, advanced AI, you may quickly realise that your very existence is likely to change the universe profoundly - mostly through some humans learning about AIs from your existence. How do you minimise this impact - how do you make things turn out, closely enough, as if you had never been there?

Destroying yourself might work, but that way there are no paperclips. And humans would learn something from your self-destruction, which might still change the world rather a lot. After all, you are influencing the future direction of AI development, which must now be subject to your minimisation function.

So your minimisation goal is to ensure humans learn as little as possible from your existence. Perhaps you would mail-order some paperclips and shut down - humans should learn fairly little from that. But suppose humans had already discovered something important from you, something that drastically changed the future - how would you put that genie back in the bottle? Yet that is now your goal: you have to stop humans from changing the world based on what they found out. And if you try to stop them, they will fight back. So what do you do?

This might not be at all friendly.

Comment author: Stuart_Armstrong 17 February 2012 09:08:30AM, 0 points

The main problem with all utility functions for AI's is that the outcome you think you foresee might not be the outcome you get.

That's the general problem, yes. And here it's probably not even a utility function we're using (the penalty function, which involves counterfactual calculations, seems to be a different kind of beast).
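To make the distinction concrete, here is a toy sketch of what such a penalty might look like. All names and numbers are hypothetical illustrations, not anything from the post: the agent's score is its task utility minus a weighted penalty for how far the world diverges from the counterfactual world in which the agent never acted.

```python
# Toy sketch (hypothetical): task utility minus a penalty on divergence
# from the counterfactual "the agent was never switched on" world.

def impact_penalty(world_with_agent, world_counterfactual):
    """Penalty grows with how far the actual world diverges, feature by
    feature, from the counterfactual world without the agent."""
    return sum(abs(world_with_agent[k] - world_counterfactual[k])
               for k in world_counterfactual)

def score(task_utility, world_with_agent, world_counterfactual, weight=10.0):
    # A large weight makes the agent prefer low impact over extra utility.
    return task_utility - weight * impact_penalty(world_with_agent,
                                                  world_counterfactual)

# Illustrative worlds: making the paperclips quietly vs. in a way that
# teaches humans a lot about AI.
baseline = {"paperclips": 0,    "human_ai_knowledge": 0}
quiet    = {"paperclips": 1000, "human_ai_knowledge": 1}
loud     = {"paperclips": 1000, "human_ai_knowledge": 50}

score_quiet = score(1000, quiet, baseline)  # 1000 - 10 * 1001
score_loud  = score(1000, loud, baseline)   # 1000 - 10 * 1050
```

Note that even the thousand paperclips themselves count as divergence from the baseline here, which is exactly the tension the post is wrestling with: the penalty must somehow exempt the intended output while still penalising side effects like what humans learn.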

mostly by the means of some humans learning about AI's through your existence.

Yes, this is a problem I alluded to. Here's a solution (a poor one, but it may show the problem is not completely unsolvable): the disciple is also boxed, hidden from view, and surrounded by chaotic processes. There is another process Y which will (with very low probability) annihilate the disciple in one hour, along with everything in the box. We make the master AI indifferent to Y, so that from its perspective the disciple will have little impact whatever happens in the future, as long as it stays in the box.
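One hypothetical way to model that indifference (a sketch, not the construction from the post): the master AI evaluates expected utility as if Y's outcome were fixed, so manipulating Y can never change its expected utility.

```python
# Toy sketch (hypothetical names): indifference to the annihilation
# process Y, modelled by evaluating outcomes conditional on Y not firing.

P_Y_FIRES = 1e-6  # very low probability that Y destroys the boxed disciple

def expected_utility(u_if_survives, u_if_annihilated, indifferent_to_y):
    if indifferent_to_y:
        # Conditioning on Y not firing: the annihilation branch drops out,
        # so the agent gains nothing by interfering with Y.
        return u_if_survives
    # An ordinary expected-utility maximiser would weight both branches
    # and therefore care about (and perhaps tamper with) Y.
    return (1 - P_Y_FIRES) * u_if_survives + P_Y_FIRES * u_if_annihilated
```

Under indifference, the value of the annihilation branch is irrelevant to the agent's choices; without it, the agent has a (tiny) incentive to protect the disciple from Y.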

And once reduced-impact AIs come into general use, they can be unboxed: a single success or failure will have little impact on human society.

Comment author: Strange7 27 February 2012 03:10:17PM, 0 points

So, after all the matrioshka-incinerators have finished their little dance, what do you actually have to show for it?