Problems with learning values from observation

David Scott Krueger (formerly: capybaralet)

21st Sep 2016

I dunno if this has been discussed elsewhere (pointers welcome).

Observational data doesn't allow one to distinguish correlation and causation.
This is a problem for an agent attempting to learn values without being allowed to make interventions.

For example, suppose that happiness is just a linear function of how much Utopamine is in a person's brain.
If a person smiles only when their Utopamine concentration is above 3 ppm, then a value-learner that observes both someone's Utopamine levels and facial expression, and tries to predict their reported happiness from these features, will notice that smiling is correlated with higher levels of reported happiness and thus erroneously believe that smiling is partially responsible for the happiness.
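
A minimal sketch of this setup in Python with NumPy. All the numbers (the slope of 2, the noise levels, the uniform range of concentrations) are made up for illustration, and the measurement noise on the Utopamine reading is an added assumption that is not part of the example above. Under that assumption a least-squares learner puts a large weight on smiling; with exact Utopamine readings the weight comes out close to zero, so how badly the learner is fooled depends on what it can actually observe.

```python
import numpy as np

def simulate(n=20000, measurement_noise=1.0, seed=0):
    """Generate toy data; every constant here is a hypothetical choice."""
    rng = np.random.default_rng(seed)
    utopamine = rng.uniform(0.0, 6.0, size=n)                    # true concentration (ppm)
    happiness = 2.0 * utopamine + rng.normal(0.0, 0.5, size=n)   # reported happiness
    smiling = (utopamine > 3.0).astype(float)                    # smiles only above 3 ppm
    # Assumption: the learner sees a noisy reading, not the true level.
    measured = utopamine + rng.normal(0.0, measurement_noise, size=n)
    return measured, smiling, happiness

def fit_weights(measured, smiling, happiness):
    """Ordinary least squares: happiness ~ intercept + measured + smiling."""
    X = np.column_stack([np.ones_like(measured), measured, smiling])
    coef, *_ = np.linalg.lstsq(X, happiness, rcond=None)
    return coef  # [intercept, utopamine weight, smiling weight]

noisy = fit_weights(*simulate(measurement_noise=1.0))
exact = fit_weights(*simulate(measurement_noise=0.0))

print("noisy readings: utopamine weight %.2f, smiling weight %.2f" % (noisy[1], noisy[2]))
print("exact readings: utopamine weight %.2f, smiling weight %.2f" % (exact[1], exact[2]))
```

Smiling has no causal effect on happiness in this simulation, so any weight the learner assigns to it is spurious.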

------------------
an IMPLICATION:
I have a picture of value learning where the AI learns via observation (since we don't want to give an unaligned AI access to actuators!).
But this makes it seem important to consider how to make an unaligned AI safe enough to perform value-learning-relevant interventions.

4 comments

Manfred (10y)

This is only true for simple systems - with more complications you can indeed sometimes deduce causal structure!

Suppose you have three variables: Utopamine concentration, smiling, and reported happiness. And further suppose that there is an independent noise source for each of these variables - causal nodes that we put in as a catch-all for fluctuations and external forcings that are hard to model.

If Utopamine is the root cause of both smiling and reported happiness, then the variation in happiness will be independent of the variation in smiling, conditional on the variation in Utopamine. But conditional on the variation in smiling, the variation in Utopamine and reported happiness will still be correlated!
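
A rough numerical check of this, as a sketch rather than a proper causal-discovery method. It uses partial correlation (correlating least-squares residuals) as a linear stand-in for conditional independence, and all constants and noise levels are hypothetical choices made for illustration.

```python
import numpy as np

def residual(y, z):
    """Part of y left over after a least-squares fit on z (with intercept)."""
    Z = np.column_stack([np.ones_like(z), z])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return y - Z @ coef

def partial_corr(x, y, z):
    """Correlation between x and y after linearly removing z from both."""
    return float(np.corrcoef(residual(x, z), residual(y, z))[0, 1])

rng = np.random.default_rng(0)
n = 20000
# Utopamine is the common cause; each variable gets its own independent noise.
utopamine = rng.uniform(0.0, 6.0, size=n)
smiling = (utopamine + rng.normal(0.0, 0.5, size=n) > 3.0).astype(float)
happiness = 2.0 * utopamine + rng.normal(0.0, 0.5, size=n)

r_hs_given_u = partial_corr(happiness, smiling, utopamine)
r_uh_given_s = partial_corr(utopamine, happiness, smiling)

print("happiness vs smiling, given Utopamine: %.3f" % r_hs_given_u)
print("Utopamine vs happiness, given smiling: %.3f" % r_uh_given_s)
```

The first number should come out near zero and the second should stay large, which is the asymmetry described above. Partial correlation only captures linear dependence, so a general test would need something stronger.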

The AI can now narrow down the causal structure to two candidates, and perhaps it can even figure out the right one if there's some time lag in the response and it assumes that causation goes forward in time.

Lumifer (10y)

"Observational data doesn't allow one to distinguish correlation and causation."

No? If I observe a hammer striking a nail and the nail sinking into the wooden plank, is anyone going to argue that it's mere correlation and not causation?

Observational data doesn't always allow one to distinguish correlation and causation.

I am also a bit confused since you're talking about learning values but your example is not about values but about a causal relationship.

MrMind (10y)

Indeed. Pearl's "Causality" talks at length about this sort of thing, and about when data can and cannot distinguish causation from correlation. There's even a Sequence post about this exact topic.

janos (10y)

Is there a reason to think this problem is less amenable to being solved by complexity priors than other learning problems? / Might we build an unaligned agent competent enough to be problematic without solving problems similar to this one?
