Nebu comments on Reply to Holden on 'Tool AI' - Less Wrong

94 Post author: Eliezer_Yudkowsky 12 June 2012 06:00PM

Comments (348)

Comment author: TheOtherDave 13 July 2012 02:42:28PM 2 points

Regarding #3: what happens given a directive like "Over there are a bunch of people who report sensory experiences of the kind I'm interested in. Figure out what differentially caused those experiences, and maximize the incidence of that."?

(I'm not concerned with the specifics of my wording, which undoubtedly contains infinite loopholes; I'm asking about the general strategy of, when all I know is sensory experiences, referring to the differential causes of those experiences, whatever they may be. Which, yes, I would expect to include, in the case where there actually are no gliders and the recurring perception of gliders is the result of a glitch in my perceptual system, modifying my perceptual system to make such glitches more likely... but which I would not expect to include, in the case where my perceptual system is operating essentially the same way when it perceives gliders as when it perceives everything else, modifying my perceptual system to include such glitches (since such a glitch is not the differential cause of experiences of gliders in the first place.))

Comment author: Nebu 17 February 2016 10:12:43AM 0 points

I think LearnFun might be informative here. https://www.youtube.com/watch?v=xOCurBYI_gY

LearnFun watches a human play arbitrary NES games. It is hardcoded to assume that, as time progresses, the game moves toward a "better and better" state (i.e. it assumes the player is trying to win and is at least somewhat effective at achieving their goals). The key point here is that LearnFun does not know ahead of time what the objective of the game is; it infers the objective from watching humans play. (More technically, it observes the entire universe, where the entire universe is defined to be the entire RAM content of the NES.)
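
To make the "infer the objective from recorded play" idea concrete, here is a toy sketch. It is not LearnFun's actual algorithm, just a simplification under stated assumptions: it takes a sequence of memory snapshots and scores each byte address by how consistently its value rises over time. The function names, the scoring rule, the threshold, and the 4-byte "RAM" with its address meanings are all made up for illustration.

```python
from typing import List, Sequence


def monotonicity_scores(snapshots: Sequence[bytes]) -> List[float]:
    """Score each address by (rises - falls) / transitions over the recording."""
    if len(snapshots) < 2:
        raise ValueError("need at least two snapshots")
    size = len(snapshots[0])
    transitions = len(snapshots) - 1
    scores = []
    for addr in range(size):
        ups = downs = 0
        # Compare each snapshot with the one that follows it.
        for before, after in zip(snapshots, snapshots[1:]):
            if after[addr] > before[addr]:
                ups += 1
            elif after[addr] < before[addr]:
                downs += 1
        scores.append((ups - downs) / transitions)
    return scores


def infer_objective(snapshots: Sequence[bytes], threshold: float = 0.5) -> List[int]:
    """Return addresses that mostly go up; these are guessed to be 'progress'."""
    scores = monotonicity_scores(snapshots)
    return [addr for addr, score in enumerate(scores) if score >= threshold]


# Hypothetical recording of a human playing. Assumed layout (illustrative only):
# address 0 = score, 1 = x position, 2 = noisy timer, 3 = lives.
recording = [
    bytes([0, 10, 200, 3]),
    bytes([1, 12, 100, 3]),
    bytes([1, 15, 250, 3]),
    bytes([2, 14, 30, 2]),
    bytes([3, 18, 90, 2]),
]

print(infer_objective(recording))  # → [0, 1]
```

Note that a heuristic like this would just as happily pick a frame counter that always increases, which gives some intuition for how this kind of inference can end up favouring strange quantities.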

I think there are some parallels here with your scenario, where we don't want to explicitly tell the AI what our utility function is. Instead, we point to a state and say "This is a good state" (and I guess either we'd explicitly tell the AI "and this other state is a bad state", or we'd assume the AI can somehow infer bad states to contrast with the good ones), and then we ask the AI to come up with a plan (and possibly execute it) that would lead to "more good" states.

So what happens? Bit of a spoiler, but sometimes the AI seems to make a pretty good inference about what utility function a human would probably have had for a given NES game, and sometimes it makes a terrible inference. It never seems to make a "perfect" inference: even in its best performance, it seems to be optimizing very strange things.

The other part of it is that even if it does have a decent inference for the utility function, it's not always good at coming up with a plan that will optimize that utility function.
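
The gap between having an objective and actually optimizing it can be shown with a toy planner. The sketch below is purely illustrative and is not how LearnFun or any companion program searches for inputs: the one-dimensional world, the `objective` table, and the `plan` function are all assumptions. The planner tries every action sequence of a fixed length, sums the objective along the resulting path, and commits to the first action of the best sequence.

```python
from itertools import product
from typing import Sequence

ACTIONS = (-1, 0, 1)  # move left, stay, move right


def step(pos: int, action: int, size: int) -> int:
    """Apply an action, clamping the position to the ends of the world."""
    return min(max(pos + action, 0), size - 1)


def plan(start: int, objective: Sequence[int], steps: int, lookahead: int) -> int:
    """Repeatedly pick the first action of the best `lookahead`-step sequence.

    A sequence is scored by the sum of objective values along its path.
    Returns the final position after `steps` moves.
    """
    size = len(objective)
    pos = start
    for _ in range(steps):
        best_action, best_value = 0, float("-inf")
        for seq in product(ACTIONS, repeat=lookahead):
            p, total = pos, 0
            for action in seq:
                p = step(p, action, size)
                total += objective[p]
            if total > best_value:
                best_action, best_value = seq[0], total
        pos = step(pos, best_action, size)
    return pos


# Hypothetical objective: a small bump at position 2, a dip, then a big payoff.
objective = [0, 1, 2, 1, 0, 5, 9]

print(plan(2, objective, steps=6, lookahead=1))  # → 2 (stuck on the small bump)
print(plan(2, objective, steps=6, lookahead=4))  # → 6 (crosses the dip)
```

Both runs use exactly the same objective; only the amount of search differs. The short-sighted planner never leaves the local bump, so a reasonable objective alone does not guarantee a good plan.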