timtyler comments on The Preference Utilitarian’s Time Inconsistency Problem - Less Wrong
I believe you can strip the AI of any preferences towards human utility functions with a simple hack.
Every decision of the AI will have two effects on expected human utility: it will change the expected utility itself, and it will change the human utility functions.
Have the AI make its decisions based only on the effect on the current expected human utility, not on the changes to the utility functions. Add a term granting a large disutility for deaths, and this should do the trick.
Note the importance of the "current" expected utility in this setup; an AI will decide whether to industrialise a primitive tribe based on their current utility; if it does industrialise them, it will base its subsequent decisions on their new, industrialised utility.
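A rough sketch of that decision rule, in case it helps make it concrete. Everything here is an illustrative assumption rather than part of the proposal itself: the names `Outcome`, `score`, `choose` and `DEATH_PENALTY`, the representation of a utility function as a plain dict from world states to numbers, and the made-up utility values for the tribe.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Hypothetical representation: a utility function maps a world state to a number.
Utility = Dict[str, float]


@dataclass(frozen=True)
class Outcome:
    world: str              # world state the action leads to
    deaths: float           # expected number of deaths caused by the action
    utility_after: Utility  # utility function humans hold after the action


# Assumed size of the "large disutility for deaths" term.
DEATH_PENALTY = 1000.0


def score(outcome: Outcome, current: Utility) -> float:
    # Only the CURRENT utility function is consulted; outcome.utility_after
    # is deliberately ignored, so changing preferences earns no credit.
    return current[outcome.world] - DEATH_PENALTY * outcome.deaths


def choose(actions: Dict[str, Outcome], current: Utility) -> Tuple[str, Utility]:
    # Pick the best action under the current utility function, then return
    # the utility function that later decisions should be based on.
    best = max(actions, key=lambda name: score(actions[name], current))
    return best, actions[best].utility_after


# Made-up numbers for the tribe example.
tribal = {"traditional": 1.0, "industrial": 0.4}
industrial = {"traditional": 0.1, "industrial": 2.0}

actions = {
    "leave_alone": Outcome("traditional", 0.0, tribal),
    "industrialise": Outcome("industrial", 0.0, industrial),
}

choice, utility_now = choose(actions, tribal)
```

With these numbers the tribe's current preferences favour the traditional world, so the rule leaves them alone even though their industrialised selves would rate the industrial world more highly. Had the current preferences favoured industrialising, `choose` would hand back the industrialised utility function for all subsequent decisions.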
You meant "any preferences towards MODIFYING human utility functions".
Yep