hairyfigment comments on Tiling Agents for Self-Modifying AI (OPFAI #2) - Less Wrong

Post author: Eliezer_Yudkowsky 06 June 2013 08:24PM




Comment author: hairyfigment 06 June 2013 07:14:48PM 1 point

Layman's answer: we want to predict what a self-modifying AI will do, so we want a decision theory that can ask about the effect of adopting a new decision theory or related processes. (The issues the paper raises could easily come up.) The only alternative I can see involves us, as humans, knowing in advance how any modification a superintelligence could imagine will affect its goals. That seems like exactly the kind of thing humans are bad at.

Speaking of which: you say we "seem capable of overcoming putative Löbian obstacles to self-modification." But when I think about CEV, this seems dubious. We can't express exactly what 'extrapolation' means, except by imagining a utility function that may not exist. And without a better language for talking about goal stability, how would we even formalize that question? How could we formally ask whether CEV is workable?