Nebu comments on Reply to Holden on 'Tool AI' - Less Wrong
Regarding #3: what happens given a directive like "Over there are a bunch of people who report sensory experiences of the kind I'm interested in. Figure out what differentially caused those experiences, and maximize the incidence of that."?
(I'm not concerned with the specifics of my wording, which undoubtedly contains infinite loopholes; I'm asking about the general strategy of, when all I know is sensory experiences, referring to the differential causes of those experiences, whatever they may be. Which, yes, I would expect to include, in the case where there actually are no gliders and the recurring perception of gliders is the result of a glitch in my perceptual system, modifying my perceptual system to make such glitches more likely... but which I would not expect to include, in the case where my perceptual system is operating essentially the same way when it perceives gliders as when it perceives everything else, modifying my perceptual system to include such glitches (since such a glitch is not the differential cause of experiences of gliders in the first place.))
I think LearnFun might be informative here. https://www.youtube.com/watch?v=xOCurBYI_gY
LearnFun watches a human play an arbitrary NES game. It is hardcoded to assume that as time progresses, the game is moving towards a "better and better" state (i.e. it assumes the player is trying to win and is at least somewhat effective at achieving their goals). The key point here is that LearnFun does not know ahead of time what the objective of the game is. It infers the objective from watching humans play. (More technically, it observes the entire universe, where the entire universe is defined to be the entire RAM content of the NES.)
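To make the "infer the objective from RAM" idea concrete, here is a toy sketch in Python. It is not LearnFun's actual algorithm, just an illustration of the assumption described above: treat any memory address whose value never goes down over a recorded playthrough as part of the objective. The function names, the three-byte "RAM", and the meaning assigned to each address are all made up for the example.

```python
from typing import List, Sequence


def monotone_addresses(frames: Sequence[Sequence[int]]) -> List[int]:
    """Return addresses whose value never decreases and rises at least once."""
    if not frames:
        return []
    chosen = []
    for addr in range(len(frames[0])):
        values = [frame[addr] for frame in frames]
        never_drops = all(a <= b for a, b in zip(values, values[1:]))
        if never_drops and values[-1] > values[0]:
            chosen.append(addr)
    return chosen


def inferred_score(frame: Sequence[int], addresses: Sequence[int]) -> int:
    """Hypothetical objective: sum of the bytes that 'went up' in the recording."""
    return sum(frame[a] for a in addresses)


# Toy recording of a human playing (assumed layout, purely illustrative):
# address 0 = score, address 1 = frame timer, address 2 = player x position.
recording = [
    [0, 0, 5],
    [0, 1, 6],
    [10, 2, 4],
    [10, 3, 7],
    [30, 4, 3],
]

addresses = monotone_addresses(recording)
print(addresses)  # [0, 1]
```

Note that the sketch picks up the score (address 0) but also the frame timer (address 1), because both only ever increase. An optimizer handed this "objective" would be rewarded for simply letting time pass, which is the kind of strange inference discussed further down.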
I think there are some parallels here with your scenario where we don't want to explicitly tell the AI what our utility function is. Instead, we're pointing to a state and saying "This is a good state" (and I guess either we'd explicitly tell the AI "and this other state is a bad state", or we assume the AI can somehow infer bad states to contrast with the good ones), and then we ask the AI to come up with a plan (and possibly execute it) that would lead to "more good" states.
So what happens? Bit of a spoiler, but sometimes the AI seems to make a pretty good inference about what utility function a human would probably have had for a given NES game, and sometimes it makes a terrible inference. It never seems to make a "perfect" inference: even in its best performance, it seems to be optimizing very strange things.
The other part of it is that even if it does have a decent inference for the utility function, it's not always good at coming up with a plan that will optimize that utility function.
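A toy Python sketch of that second failure mode, where the objective is fine but the planner is weak. Everything here is hypothetical (the one-dimensional world, the VALUES table standing in for an inferred utility function, and the planner itself); it is not how LearnFun searches for inputs. It only shows that the same objective can give very different results depending on how far ahead the planner looks.

```python
from itertools import product

# Hypothetical inferred objective: value of standing at each position 0..6.
VALUES = [0, 0, 0, 1, 0, 0, 9]
ACTIONS = (-1, 0, 1)  # move left, stay, move right


def step(pos: int, action: int) -> int:
    """Move one cell, clamped to the ends of the world."""
    return min(max(pos + action, 0), len(VALUES) - 1)


def rollout_return(pos: int, seq) -> int:
    """Total value collected while following a sequence of actions."""
    total = 0
    for action in seq:
        pos = step(pos, action)
        total += VALUES[pos]
    return total


def plan(start: int, depth: int, horizon: int) -> int:
    """At each step, take the first action of the best depth-limited sequence."""
    pos = start
    for _ in range(horizon):
        best = max(product(ACTIONS, repeat=depth),
                   key=lambda seq: rollout_return(pos, seq))
        pos = step(pos, best[0])
    return pos


print(plan(start=3, depth=1, horizon=5))  # 3
print(plan(start=3, depth=3, horizon=5))  # 6
```

With a lookahead of one step, the planner sits on the small reward at position 3 forever, because both neighbouring cells look worse. With a lookahead of three steps it walks through the zero-value cells and reaches the large reward at position 6.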