Stuart_Armstrong comments on The mathematics of reduced impact: help needed - Less Wrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
That's the general problem, yes. And here it's probably not even a utility function we're using (the penalty function, which involves counterfactual calculations, seems a different kind of beast).
Yes, this is a problem I alluded to. Here's a solution (a poor one, but it may show the problem is not completely unsolvable): the disciple is also boxed, hidden from view, and surrounded by chaotic processes. There is another process Y that will, with very low probability, annihilate the disciple in one hour, along with everything else in the box. We make the master AI indifferent to Y (so, from its perspective, the disciple will have little impact whatever happens in the future, as long as it stays in the box).
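To make the indifference trick concrete, here is a toy calculation (a minimal sketch, not Armstrong's actual formalism: the probability value, the impact function, and the reading of "indifferent to Y" as "evaluates plans as though Y fires" are all illustrative assumptions):

```python
# Toy model of the boxed-disciple setup. Process Y destroys the box
# with small probability P_Y; the master AI is made indifferent to Y.

P_Y = 1e-6  # hypothetical tiny chance that Y annihilates the disciple


def impact(disciple_survives: bool) -> float:
    """Hypothetical impact score: large if the disciple persists, zero otherwise."""
    return 100.0 if disciple_survives else 0.0


# The true expected future impact weights both outcomes of Y:
true_expected = (1 - P_Y) * impact(True) + P_Y * impact(False)

# Under one reading of indifference, the master evaluates the disciple's
# future as though Y certainly fires, so the apparent impact collapses:
indifferent_expected = impact(False)

print(true_expected)         # close to 100.0
print(indifferent_expected)  # exactly 0.0
```

The point of the sketch is only that the master's impact estimate, not the world's actual expected impact, is what the indifference construction drives toward zero.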
And once reduced-impact AIs come into general use, they can be unboxed: a single success or failure will have little impact on human society.
So, after all the matrioshka-incinerators have finished their little dance, what do you actually have to show for it?