Vaniver comments on The mathematics of reduced impact: help needed - Less Wrong

10 Post author: Stuart_Armstrong 16 February 2012 02:23PM


Comment author: Vaniver 17 February 2012 03:49:52AM 2 points

I said I would comment on the rest of the post here, but I'm finding that difficult to do.

The Penalty Functions section is easiest to comment on: the first two paragraphs are a reasonable suggestion (this looks a lot like my suggestion of a cost function, and so I'm predisposed to like it), but I'm stumped by the third paragraph. Are you penalizing the AI for the predictable consequences of it existing, rather than just the actions it takes?
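The distinction raised in the question above can be made concrete with a small sketch. This is purely illustrative and not from the post: `utility`, `impact`, `predict_world`, and `distance` are hypothetical stand-ins. An "action penalty" scores only what the agent does, while an "existence penalty" scores the predicted divergence between a world containing the agent and a counterfactual world without it.

```python
def action_penalized_value(action, utility, impact, lam=1.0):
    # Penalize only the chosen action's own impact.
    return utility(action) - lam * impact(action)

def existence_penalized_value(action, utility, predict_world, distance, lam=1.0):
    # Penalize the whole predicted divergence caused by the agent
    # existing and acting, relative to a world where it was never built.
    world_with = predict_world(ai_exists=True, action=action)
    world_without = predict_world(ai_exists=False, action=None)
    return utility(action) - lam * distance(world_with, world_without)
```

The second form is stricter: even a do-nothing action can be penalized if the agent's mere existence predictably changes the world.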

My overall sense is that by trying to describe the universe top-down you're running into insurmountable challenges, because the universe is simply too much data. I would first worry about a system that reliably makes one paperclip and whose sensors cover only one room, and then use insights from that solution to attack the global problem.

I'm also not sure the reduced impact intuitions hold for any narrow AIs whose task is to somehow combat existential risk. (Imagine handing over control of some satellites and a few gravitational tethers to a disciple AI to minimize the risk of an asteroid or comet hitting the Earth.) In that case, what we want is for the future Earth-related uncertainty to have the same bumps as current uncertainty, but with different magnitudes; will our metric treat that differently from a future uncertainty which has slightly different bumps?

Comment author: Stuart_Armstrong 17 February 2012 09:14:30AM 1 point

but I'm stumped by the third paragraph.

I'm just saying: here's a major problem with this approach, let's put it aside for now.

Are you penalizing the AI for the predictable consequences of it existing, rather than just the actions it takes?

We are penalising the master AI for the predictable consequences of the existence of the particular disciple AI it chose to make.

I'm also not sure the reduced impact intuitions hold for any narrow AIs whose task is to somehow combat existential risk.

No, it doesn't hold. We could hook it up to something like utility indifference, but most likely reduced impact AI would be an interim stage on the way to friendly AI.