Stuart_Armstrong comments on Domesticating reduced impact AIs - All

9 Post author: Stuart_Armstrong 14 February 2013 04:59PM


Comment author: Stuart_Armstrong 15 February 2013 12:08:36PM 1 point [-]

You could also just have a single AI construct a counterfactual model where it was replaced by a resistor, compute R relative to this model, then maximize the utility U' = U - R. I like this better than the master/disciple model.

Counterfactuals about "what could have gone differently in the past" seem (potentially) harder than "what will happen in a probabilistic event in the future". I would like to use your model, or something like it; it just seems a lot harder to specify.
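The single-AI penalty scheme quoted above can be sketched in toy form. Everything here is a hypothetical stand-in: the world representations, the divergence measure used for R, and the numbers are illustrative only, not a proposal for how the counterfactual model would actually be specified.

```python
# Toy sketch (illustrative assumptions throughout) of U' = U - R, where R
# measures how far the AI's world diverges from a counterfactual world in
# which the AI was replaced by an inert resistor.

def impact_penalty(world, counterfactual):
    """R: summed absolute difference over a few toy world-features."""
    return sum(abs(world[k] - counterfactual[k]) for k in counterfactual)

# Counterfactual "resistor world": the AI did nothing.
counterfactual = {"paperclips": 0, "box_intact": 1}

def u_prime(world, utility):
    """U' = U - R."""
    return utility(world) - impact_penalty(world, counterfactual)

# Toy utility: the AI wants paperclips.
utility = lambda w: w["paperclips"]

low_impact = {"paperclips": 1, "box_intact": 1}    # mild action
high_impact = {"paperclips": 100, "box_intact": 0}  # paperclip everything

print(u_prime(low_impact, utility))   # 1 - 1 = 0
print(u_prime(high_impact, utility))  # 100 - 101 = -1
```

Under these (made-up) numbers the penalty term makes the high-impact action lose to the mild one, which is the intended effect of subtracting R; the hard part, as the reply notes, is specifying the counterfactual model itself.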

Comment author: Manfred 15 February 2013 10:33:53PM 0 points [-]

Hmm, after some thought I'm getting more pessimistic about finding a workable, broadly effective definition of "different" that is less complex than "don't do things this model of a human doesn't want you to do," though I'd still put the odds above 20%.

Example: maybe using the AI's predictive power "against paperclipping" would work. We want the AI to be free to change the inside of the box, but not to exploit the outside for extra utility. So we use some magical method to sew the inside of the box together with a world that doesn't communicate with the box, and give the AI the utility U' = 2*E(U in patchwork world) - E(U in real world), so that higher E(U) in the real world is actually bad. Does this work? No: the AI drops a meteor on its own head in the real world, driving E(U in real world) to zero and collecting nearly the full 2 points from the patchwork term.
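The self-sabotage failure mode above can be checked with toy numbers. This is a minimal sketch assuming U is bounded in [0, 1]; the expectations are just hand-picked illustrative values, not outputs of any actual world model.

```python
# Toy sketch (illustrative assumptions) of the patchwork utility
# U' = 2*E(U in patchwork world) - E(U in real world).

def u_prime(eu_patch, eu_real):
    return 2 * eu_patch - eu_real

# Intended behaviour: the AI does well in both worlds, and the real-world
# term partially cancels, discouraging real-world exploitation.
honest = u_prime(eu_patch=0.9, eu_real=0.9)  # 2*0.9 - 0.9 = 0.9

# Exploit: drop a meteor on its own head in the real world, so
# E(U in real world) ~ 0, while the patchwork world is untouched.
self_sabotage = u_prime(eu_patch=0.95, eu_real=0.0)  # 1.9, "almost 2"

print(honest, self_sabotage)
```

With U bounded by 1, zeroing the real-world term lets the AI approach the maximum U' of 2, which is why the minus sign perversely rewards destroying its real-world self rather than behaving modestly.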