eli_sennesh comments on Debunking Fallacies in the Theory of AI Motivation - LessWrong

8 Post author: Richard_Loosemore 05 May 2015 02:46AM

You are viewing a comment permalink. View the original post to see all comments and the full post content.

Comments (343)

You are viewing a single comment's thread. Show more comments above.

Comment author: jessicat 07 May 2015 08:36:16AM *  2 points [-]

Regularization is already a part of training any good classifier.

A technical point here: we don't learn a raw classifier, because that would just learn human judgments. In order to allow the system to disagree with a human, we need to use some metric other than "is simple and assigns high probability to human judgments".

For something like FAI, I want a concept-learning algorithm that will look at the world in this naturalized, causal way (which is what normal modelling shoots for!), and that will model correctly at any level of abstraction or under any available set of features, and will be able to map between these levels as the human mind can.

I totally agree that a good understanding of multi-level models is important for understanding FAI concept spaces. I don't have a good understanding of multi-level maps; we can definitely see them as useful constructs for bounded reasoners, but it seems difficult to integrate higher levels into the goal system without deciding things about the high-level map a priori so you can define goals relative to this.

Comment author: [deleted] 07 May 2015 03:11:43PM 2 points [-]

In order to allow the system to disagree with a human, we need to use some metric other than "is simple and assigns high probability to human judgments".

Right: and the metric I would propose is, "counterfactual-prediction power". Or in other words, the power to predict well in a causal fashion, to be able to answer counterfactual questions or predict well when we deliberately vary the experimental conditions.

To give a simple example: I train a system to recognize cats, but my training data contains only tabbies. What I want is a way of modelling that, while it may concentrate more probability on a tabby cat-like-thingy being a cat than a non-tabby cat-like-thingy, will still predict appropriately if I actually condition it on "but what if cats weren't tabby by nature?".

I think you said you're a follower of the probabilistic programming approach, and in terms of being able to condition those models on counterfactual parameterizations and make predictions, I think they're very much on the right track.