
diegocaleiro comments on Superintelligence 21: Value learning - Less Wrong Discussion

7 Post author: KatjaGrace 03 February 2015 02:01AM




Comment author: diegocaleiro 11 February 2015 07:23:50PM 0 points

A conceptual question: reading your description of a value learning system, I get the impression that what it has above and beyond the reinforcement learner is a model not only of the other being but also of that being's goals.

In Dennett's parlance, it has two levels of intentionality: I think that you want the toy to be built.

In psychological terms, it has a somewhat sophisticated theory of mind.

In philosophical terms, it distinguishes intensions from extensions.

Are these correct inferences about what it means to be a value learner?

Are the kids in this video value learners or reinforcement learners? What about the chimps?

https://www.youtube.com/watch?v=6zSut-U1Iks

What Dan Dewey describes as an optimal value learner is not what either the kids or the chimps do: "Replacing the reinforcement learner's sum of rewards with an expected utility over a pool of possible utility functions, we have an optimality notion for a value learning agent."

That is because when we infer goals from others, we are not expectimaxing over the possible goals the agent could have; we are simply maxing. The kids assume only the goal with the highest likelihood.
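To make the contrast concrete, here is a minimal Python sketch of the two decision rules, assuming a finite set of candidate goals with known probabilities. `expectimax_action` follows the quoted idea of maximizing expected utility over a pool of utility functions (a simplification, not Dewey's full formalism over interaction histories); `map_action` is the "maxing" rule that commits to the single most likely goal. The goal names, actions, and numbers are hypothetical, chosen only so that the two rules disagree.

```python
from typing import Dict, List

# Hypothetical posterior over what the demonstrator values (invented numbers).
GOAL_POSTERIOR: Dict[str, float] = {"build_toy": 0.6, "open_box": 0.4}

# Hypothetical utility of each action under each candidate goal.
# "hedge" partly serves both goals; "build" and "open" fully serve one.
UTILITY: Dict[str, Dict[str, float]] = {
    "build_toy": {"build": 1.0, "open": 0.0, "hedge": 0.65},
    "open_box": {"build": 0.0, "open": 1.0, "hedge": 0.65},
}

ACTIONS: List[str] = ["build", "open", "hedge"]


def expectimax_action(posterior: Dict[str, float],
                      utility: Dict[str, Dict[str, float]],
                      actions: List[str]) -> str:
    """Pick the action with the highest utility averaged over all candidate goals."""
    def expected(action: str) -> float:
        return sum(p * utility[goal][action] for goal, p in posterior.items())
    return max(actions, key=expected)


def map_action(posterior: Dict[str, float],
               utility: Dict[str, Dict[str, float]],
               actions: List[str]) -> str:
    """'Maxing': commit to the most probable goal, then act optimally for it alone."""
    best_goal = max(posterior, key=posterior.get)
    return max(actions, key=lambda action: utility[best_goal][action])


print(expectimax_action(GOAL_POSTERIOR, UTILITY, ACTIONS))  # hedge
print(map_action(GOAL_POSTERIOR, UTILITY, ACTIONS))         # build
```

With these numbers the expected utilities are 0.6 for "build", 0.4 for "open" and 0.65 for "hedge", so the expectimax rule picks the compromise, while the maxing rule ignores the less likely goal and commits fully to "build".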

Comment author: KatjaGrace 20 February 2015 07:41:45PM 0 points

That's probably the correct inference, if I understand you. The value learner has priors over what the world is like and, in addition, priors over what is valuable.
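A minimal sketch of how a prior over what is valuable might be updated from observed behavior, assuming a finite set of candidate goals and a softmax ("Boltzmann-rational") model of the demonstrator. That choice model, the `beta` rationality parameter, and the goal and action names are illustrative assumptions, not anything specified in the post.

```python
import math
from typing import Dict, List

Utility = Dict[str, Dict[str, float]]

# Hypothetical candidate goals and the utility each assigns to each action.
UTILITY: Utility = {
    "build_toy": {"assemble": 1.0, "pry_lid": 0.0},
    "open_box": {"assemble": 0.0, "pry_lid": 1.0},
}
ACTIONS: List[str] = ["assemble", "pry_lid"]


def action_likelihood(action: str, goal: str, utility: Utility,
                      actions: List[str], beta: float) -> float:
    """P(action | goal) under an assumed softmax (Boltzmann-rational) demonstrator."""
    weights = {a: math.exp(beta * utility[goal][a]) for a in actions}
    return weights[action] / sum(weights.values())


def update_goal_posterior(prior: Dict[str, float], observed: List[str],
                          utility: Utility, actions: List[str],
                          beta: float = 3.0) -> Dict[str, float]:
    """Bayes' rule: posterior(goal) is proportional to prior(goal) * P(observed | goal)."""
    unnormalised = {}
    for goal, p in prior.items():
        likelihood = 1.0
        for a in observed:
            likelihood *= action_likelihood(a, goal, utility, actions, beta)
        unnormalised[goal] = p * likelihood
    total = sum(unnormalised.values())
    return {g: w / total for g, w in unnormalised.items()}


posterior = update_goal_posterior({"build_toy": 0.5, "open_box": 0.5},
                                  ["assemble", "assemble"], UTILITY, ACTIONS)
print(posterior)  # almost all probability mass moves to "build_toy"
```

Raising `beta` models a more reliably rational demonstrator, so each observed action shifts the posterior further; a different starting prior would lead to different inferred goals from the same observations.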

The kids and the chimps both already have values, and are trying to learn how to fulfil them.

I don't follow your other points, sorry.

Comment author: diegocaleiro 23 February 2015 05:38:43PM 0 points

The kids and the chimps have different priors. The kids assume the experimenter has reasons for doing the weird, seemingly non-goal-oriented things he does. Humans alone can entertain fictions. This makes us powerful but also more prone to superstitious behavior (in behaviorist terminology).

If you were expectimaxing over what an agent would do (which is what Dewey suggests a value learner does), you'd end up with behaviors that are seldom useful: some parts of your behavior would further one goal and other parts would further another, so you would not commit to all the behaviors that further the one goal you consider most likely to be valuable. Maxing means finding the highest-value goal and ignoring all the others; expectimaxing yields a mixed hybrid, which fails in situations where it is all or nothing.

No doubt this is not my most eloquent thread ever. Sorry; feel free to give up on this if it doesn't make sense.