eli_sennesh comments on Debunking Fallacies in the Theory of AI Motivation - LessWrong

8 Post author: Richard_Loosemore 05 May 2015 02:46AM

You are viewing a comment permalink. View the original post to see all comments and the full post content.

Comments (343)

You are viewing a single comment's thread. Show more comments above.

Comment author: [deleted] 07 May 2015 03:11:43PM 2 points [-]

In order to allow the system to disagree with a human, we need to use some metric other than "is simple and assigns high probability to human judgments".

Right: and the metric I would propose is, "counterfactual-prediction power". Or in other words, the power to predict well in a causal fashion, to be able to answer counterfactual questions or predict well when we deliberately vary the experimental conditions.

To give a simple example: I train a system to recognize cats, but my training data contains only tabbies. What I want is a way of modelling that, while it may concentrate more probability on a tabby cat-like-thingy being a cat than a non-tabby cat-like-thingy, will still predict appropriately if I actually condition it on "but what if cats weren't tabby by nature?".

I think you said you're a follower of the probabilistic programming approach, and in terms of being able to condition those models on counterfactual parameterizations and make predictions, I think they're very much on the right track.