timtyler comments on Towards a New Decision Theory - Less Wrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Comments (142)
I'm still quite confused, but I'll report my current thoughts in case someone can help me out. Suppose we take it as an axiom that an AI's decision algorithm shouldn't need to contain any hacks to handle exceptional situations. Then the following "exceptionless" decision algorithm seems to pop out immediately: do what my creator would want me to do. In other words, upon receiving input X, S computes the following: suppose S's creator had enough time and computing power to create a giant lookup table that contains an optimal output for every input S might encounter, what would the entry for X be? Return that as the output.
This algorithm correctly solves Counterfactual Mugging, since S's creator would want it to output "give $100", since "give $100" would have maximized the creator's expected utility at the time of coding S. It also solves the problem posed by Omega in the parent comment. It seems to be reflectively consistent. But what is the relationship between this "exceptionless" algorithm and the timeless/updateless decision algorithm?
"Do what my creator would want me to do"?
We could call that "pass the buck" decision theory ;-)