Eliezer_Yudkowsky comments on Proper value learning through indifference - Less Wrong

Post author: Stuart_Armstrong 19 June 2014 09:39AM


Comment author: Eliezer_Yudkowsky 19 June 2014 11:16:13PM 5 points

The problem arises when you want to work with a young AI whose utility function depends on a condition that lies in the young AI's decision-theoretic future. That is, the AI is supposed to update on the value of an input field controlled by the programmers, but this input field (or even the abstractions behind it, like "the programmers' current intentions", should the AI already be mature enough to understand those) is something the AI can affect. If the AI is not already very sophisticated (more sophisticated than anyone presently has any good idea how to formally talk about), then in the process of building it we'll want to do "error correction" type things that the AI should accept, even though we can't yet state formally how they constitute information about an event outside of both the programmers and the AI, one which neither can affect.

Roughly, the answer is: "That True Utility Function thing only works if the AI doesn't think anything it can do affects the thing you defined as the True Utility Function. Defining something like that safely would represent a very advanced stage of maturity in the AI. For a young AI it's much easier to talk about the value of an input field. Then we don't want the AI trying to affect this input field. Armstrong's trick is trying to make the AI with an easily describable input field have some of the same desirable properties as a much-harder-to-describe-at-our-present-stage-of-knowledge AI that has the true, safe, non-perversely-instantiable definition of how to learn about the True Utility Function."
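The indifference trick described above can be illustrated with a toy numerical sketch. Everything here (the utility values, the action set, the compensation term) is a hypothetical model chosen for illustration, not Armstrong's actual formalism: a naive agent prefers the action that manipulates the input field, while an agent with a compensating term added to its utility is indifferent to the field's value and so gains nothing by manipulating it.

```python
# Toy illustration (hypothetical numbers) of utility indifference.
# The agent's utility depends on an input field set by the programmers:
# "pressed" vs "unpressed". Each action changes the probability p that
# the field ends up "pressed".

U_PRESSED = 1.0      # utility if the field reads "pressed"
U_UNPRESSED = 5.0    # utility if the field reads "unpressed"

# Probability the field ends up "pressed", per action (toy values):
ACTIONS = {
    "neutral":    0.5,   # leave the programmers' input field alone
    "manipulate": 0.1,   # act so that "pressed" becomes less likely
}

def naive_expected_utility(p):
    """Without correction: the agent is rewarded for lowering p."""
    return p * U_PRESSED + (1 - p) * U_UNPRESSED

def indifferent_expected_utility(p):
    """With an indifference-style compensating term: if the field reads
    "pressed", the agent is paid back exactly the utility it "lost"
    relative to the unpressed branch, so expected utility no longer
    depends on p and there is no incentive to touch the field."""
    compensation = U_UNPRESSED - U_PRESSED
    return p * (U_PRESSED + compensation) + (1 - p) * U_UNPRESSED

for name, p in ACTIONS.items():
    print(name, naive_expected_utility(p), indifferent_expected_utility(p))
```

Under the naive utility, "manipulate" scores 4.6 against 3.0 for "neutral", so the agent tampers with the field; under the corrected utility both actions score 5.0, so the agent has no preference over the field's value.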

Comment author: [deleted] 20 June 2014 07:07:02AM 2 points

Right, ok, that's actually substantially clearer after a night's sleep.

One more question, semi-relevant: how is the decision-theoretic future different from the actual future?

Comment author: Eliezer_Yudkowsky 21 June 2014 01:57:45AM 10 points

The actual future is your causal future, your future light cone. Your decision-theoretic future is anything that logically depends on the output of your decision function.
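A minimal sketch of the distinction, under an assumed toy setup: two physically separated agents run the same deterministic decision function. Neither lies in the other's causal future (light cone), yet each lies in the other's decision-theoretic future, because the twin's behavior logically depends on the output of the shared decision function.

```python
# Hypothetical toy: two causally disconnected copies of one decision
# function. Their outputs are correlated logically, not causally, so
# each copy's behavior is in the other's decision-theoretic future.

def decision_function(observation):
    # The same source code runs in both locations.
    return "cooperate" if observation == "twin detected" else "defect"

here = decision_function("twin detected")
# A copy outside this agent's light cone computes the same thing:
far_away = decision_function("twin detected")

assert here == far_away  # logical dependence, with no causal link
```

Changing the decision function would change both outputs at once, which is exactly the sense in which the far copy depends on "the output of your decision function" even though no signal passes between them.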

Comment author: ciphergoth 21 June 2014 09:04:15AM 3 points

This seems like a very useful idea — thanks!