timtyler comments on The Preference Utilitarian’s Time Inconsistency Problem - Less Wrong

Post author: Wei_Dai 15 January 2010 12:26AM

Comment author: Stuart_Armstrong 15 January 2010 10:13:20AM 2 points

I believe you can strip the AI of any preferences towards human utility functions with a simple hack.

Every decision of the AI has two kinds of effects: it changes expected human utility, and it changes the human utility functions themselves.

Have the AI make its decisions based only on the effect on current expected human utility, not on the changes to the utility functions themselves. Add a term granting a large disutility for deaths, and this should do the trick.

Note the importance of "current" expected utility in this setup: the AI decides whether to industrialise a primitive tribe based on the tribe's current utility function; if it does industrialise them, it bases its subsequent decisions on their new, industrialised utility function.
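A minimal sketch of this decision rule, purely illustrative (the function names, penalty size, and toy numbers below are assumptions, not from the thread): each action is scored by expected utility under the current human utility function, with a large penalty per death, and any change the action would make to the utility function itself is ignored at decision time.

```python
# Illustrative sketch only: score actions against the CURRENT utility
# function, plus a large disutility for deaths. All names and numbers
# here are hypothetical.

DEATH_PENALTY = 1e6  # large per-death disutility, the comment's extra term

def expected_current_utility(action, current_utility, outcomes):
    """Expected utility of `action` under the current human utility function.
    `outcomes(action)` yields (outcome, probability, deaths) triples."""
    total = 0.0
    for outcome, prob, deaths in outcomes(action):
        total += prob * (current_utility(outcome) - DEATH_PENALTY * deaths)
    return total

def choose_action(actions, current_utility, outcomes):
    # Changes the action would make to the utility function itself are
    # deliberately ignored at decision time.
    return max(actions,
               key=lambda a: expected_current_utility(a, current_utility, outcomes))

# Toy version of the primitive-tribe example.
def tribal_utility(outcome):
    return {"status_quo": 1.0, "industrialised": 0.4}[outcome]

def outcomes(action):
    if action == "industrialise":
        return [("industrialised", 1.0, 0)]
    return [("status_quo", 1.0, 0)]

best = choose_action(["do_nothing", "industrialise"], tribal_utility, outcomes)
# -> "do_nothing" here, because the tribe's current preferences rank the
# status quo higher. Had the AI industrialised them anyway, its subsequent
# decisions would be scored with the new, industrialised utility function.
```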

Comment author: timtyler 15 January 2010 06:34:26PM 0 points

You meant "any preferences towards MODIFYING human utility functions".

Comment author: Stuart_Armstrong 18 January 2010 12:25:32PM 0 points

Yep