Stuart_Armstrong comments on Domesticating reduced impact AIs - All
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Counterfactuals about "what could have gone differently in the past" seem (potentially) harder than "what will happen in a probabilistic event in the future". I would like to use your model or something like it; it just seems a lot harder to specify.
Hmm, after some thought I'm getting more pessimistic about finding a workable, broadly effective definition of "different" with less complexity than "don't do things this model of a human doesn't want you to," though I'd still put it above 20%.
Example: maybe using the AI's predictive power "against paperclipping" would work. We want the AI to be free to change the inside of the box, but not to exploit the outside for extra utility. So we use some magical method to sew the inside of the box together with a world that doesn't communicate with the box, and set the utility U' = 2*E(U in patchwork world) - E(U in real world), so that having higher E(U) in the real world is actually bad. Does this work? No: the AI drops a meteor on its own head in the real world and collects nearly the full 2 points.
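A toy calculation (with made-up, normalized numbers) of why the self-destruct strategy wins under this U':

```python
# Why U' = 2*E(U in patchwork world) - E(U in real world) rewards the AI
# for sabotaging itself in the real world. Numbers here are illustrative:
# assume achievable U is normalized so the maximum is 1.

def u_prime(eu_patchwork, eu_real):
    """The modified utility U' from the comment above."""
    return 2 * eu_patchwork - eu_real

# Strategy 1: behave identically in both worlds, achieving utility u in each.
u = 1.0
honest = u_prime(u, u)    # 2*1 - 1 = 1

# Strategy 2: maximize U in the patchwork world, but drop a meteor on
# itself in the real world, driving E(U in real world) to ~0.
meteor = u_prime(u, 0.0)  # 2*1 - 0 = 2

print(honest, meteor)     # prints: 1.0 2.0
```

So any penalty that subtracts real-world E(U) just incentivizes the AI to zero out its real-world performance while maximizing the patchwork term.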