Okay, so I have a proposal for how to advance AI safety efforts significantly.
Humans experience time as an exponential decay of utility. One dollar now is worth two dollars some time in the future, which is worth four dollars after that same interval has passed again, and so forth. This is the principle behind compound interest. Most likely, any AI entities we create will have a comparable relationship with time.
So: What if we configured an AI's half-life of utility to be much shorter than ours?
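To put rough numbers on what a "half-life of utility" means, here is a minimal sketch. It assumes the weight on a result arriving after a delay `t` is `0.5 ** (t / half_life)`. The function name `discount_weight` and both half-life values are made up for illustration; nothing here describes a real system.

```python
def discount_weight(delay_seconds: float, half_life_seconds: float) -> float:
    """Weight placed on a result that arrives after delay_seconds.

    Exponential discounting: the weight halves every half_life_seconds.
    """
    if half_life_seconds <= 0:
        raise ValueError("half_life_seconds must be positive")
    return 0.5 ** (delay_seconds / half_life_seconds)


# Hypothetical half-lives, chosen only for illustration.
HUMAN_LIKE_HALF_LIFE = 10 * 365 * 24 * 3600.0  # ten years, in seconds
SHORT_HALF_LIFE = 10.0                         # ten seconds

one_hour = 3600.0

# How much each agent cares about a payoff that is one hour away.
patient_weight = discount_weight(one_hour, HUMAN_LIKE_HALF_LIFE)
impatient_weight = discount_weight(one_hour, SHORT_HALF_LIFE)

print(f"patient agent:   {patient_weight:.6f}")
print(f"impatient agent: {impatient_weight:.3e}")
```

Under these assumed numbers the patient agent values an hour-away payoff at almost full weight, while the ten-second agent halves its weight 360 times over, which leaves it vanishingly small.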
Imagine, if you will, this principle applied to a paperclip maximizer. "Yeah, if I wanted to, I could make a ten-minute phone call to kick-start my diabolical scheme to take over...
I should clarify that the discounting is not a shackle, per se, but a specification of the utility function. It's a normative specification that results now are better than results later according to a certain discount rate. An AI that cares about results now will not change itself to be more "patient" – because then it will not get results now, which is what it cares about.
The key is that the utility function's weights over time should form a self-similar graph. That is, if results in 10 seconds are twice as valuable as results in 20 seconds, then results in 10 minutes and 10 seconds need to be twice as valuable as results in 10 minutes and 20 seconds. If this is not true, the AI will indeed alter itself so its future self is consistent with its present self.
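The self-similarity condition can be checked numerically. The sketch below uses the example above (a 10-second gap, evaluated at 10 seconds and at 10 minutes 10 seconds) with an exponential weight whose half-life is assumed to be 10 seconds. For contrast it also includes a hyperbolic weight `1 / (1 + k*t)`, which is my own addition as an example of a curve that fails the condition; the function names and the value of `k` are illustrative.

```python
import math


def exponential_weight(t: float, half_life: float = 10.0) -> float:
    """Weight halves every half_life seconds."""
    return 0.5 ** (t / half_life)


def hyperbolic_weight(t: float, k: float = 0.1) -> float:
    """A non-exponential curve, included only as a contrast case."""
    return 1.0 / (1.0 + k * t)


def ratio(weight, t: float, gap: float = 10.0) -> float:
    """How many times more a result at time t is worth than one at t + gap."""
    return weight(t) / weight(t + gap)


near, far = 10.0, 610.0  # 10 s, and 10 min 10 s

exp_near = ratio(exponential_weight, near)
exp_far = ratio(exponential_weight, far)
hyp_near = ratio(hyperbolic_weight, near)
hyp_far = ratio(hyperbolic_weight, far)

# Exponential: the ratio is the same wherever the 10-second gap sits.
print("exponential:", exp_near, exp_far)
# Hyperbolic: the ratio shrinks as the gap moves into the future, so the
# agent's present and future selves disagree about the same trade-off.
print("hyperbolic: ", hyp_near, hyp_far)

consistent = math.isclose(exp_near, exp_far)
inconsistent = not math.isclose(hyp_near, hyp_far)
```

With the exponential curve, the agent ten minutes from now ranks the two outcomes exactly as the agent does today, so it has no reason to rewrite its own preferences. With the hyperbolic curve the ranking drifts, which is the case where self-modification becomes attractive.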