Elo comments on Open Thread May 23 - May 29, 2016 - Less Wrong Discussion
(epistemic status: Ruminations on cognitive processes by non-expert.)
I have a question tangential to AI safety about goal formation. How do goals form in systems that do not explicitly have goals to begin with?
I tried to google this and found answers neither for AI systems nor for neuropsychology. One source (Rehabilitation Goal Setting: Theory, Practice and Evidence) summarised:
Apparently many AI safety problems revolve around the wrong goals or the extreme satisfaction of goals. The usually implied or explicit definition of a goal seems to be the minimum difference to a target state (which might be infinity for some valuation functions). Many AI models include some notion of the goal in some coded or explicitly given form. In general that coding isn't the 'real' goal. By real goal I mean that which the AI system, taken as a whole, appears to optimize for. And that may differ from the specification due to the structure of the available input and output channels and the strength of the optimization process. Nonetheless there is some goal, and there is a conceptual relation between the coded and the real goal.
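To make the coded-versus-real distinction concrete, here is a toy sketch of my own, not taken from any actual AI system. The coded goal is "minimize the distance to a target state", but the output channel can only produce states up to some maximum. The names `coded_loss`, `settle` and `reachable_max` are made up for illustration.

```python
def coded_loss(state: float, target: float) -> float:
    """The goal as written down: distance to a target state."""
    return abs(state - target)


def settle(target: float, reachable_max: float,
           start: float = 0.0, step: float = 0.5, iters: int = 100) -> float:
    """Greedy optimizer acting through a limited output channel.

    Each round it considers moving down, staying, or moving up by `step`,
    but the channel can only produce states in [0, reachable_max].
    It returns the state the system ends up holding.
    """
    state = start
    for _ in range(iters):
        candidates = [state - step, state, state + step]
        # Hypothetical output channel: unreachable states get clipped.
        candidates = [min(max(c, 0.0), reachable_max) for c in candidates]
        state = min(candidates, key=lambda s: coded_loss(s, target))
    return state


coded_target = 10.0
# With a wide channel the behavior matches the coded goal.
unconstrained = settle(coded_target, reachable_max=20.0)
# With a narrow channel the system holds a different state.
constrained = settle(coded_target, reachable_max=6.0)
print(unconstrained, constrained)
```

An observer who only sees the constrained system would probably say it is optimizing for a state of 6, although the coded target is 10. The coded and the apparent goal are still related, just not identical.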
But maybe real things can be a bit more complicated. Consider human goal formation. Apparently we do have goals. And we kind of optimize for them. But the question arises: Where do they come from cognitively and neurologically?
Goals are very high level concepts. I think there is no high level specification of the goals somewhere inside us that we read off and optimize for. I think our goals are our own understanding - on that high level of abstraction - of those patterns behind our behavior.
If that is right and goals are just our own understanding of some patterns of behavior, then how come there are specific brain modules (prefrontal cortex) devoted to planning for them? Or rather, how come these brain parts are actually connected to the abstract concept of a goal? Or aren't they, and the planning acts not on our understanding of the goals but on their constituent parts? If so, what are these?
In my children I see clearly goal-directed behavior long before they can articulate the concept. And there are clear intermediate steps where they desperately try to optimize for very isolated goals. For example winning a race to the door. Trying to climb a fence. Being the first one to get a treat. Winning a game. Losing apparently causes real suffering. But why? Where is the loss? How are any of these things even matched against a loss? How does the brain match whatever representation of reality it has to these emotions? How do the encodings of concepts for me and you and our race get connected to our feelings about this situation? And I kind of assume here that the emotions themselves somehow produce the valuation that controls our motivation.
I took issue with not knowing how humans form goals, so I made this list of common human goals. I suggest that people who do not know their own goals look at the list and pick the ones that are relevant to them.