eli_sennesh comments on Debunking Fallacies in the Theory of AI Motivation - LessWrong

Post author: Richard_Loosemore 05 May 2015 02:46AM


Comment author: jessicat 07 May 2015 08:36:16AM

> Regularization is already a part of training any good classifier.

A technical point here: we don't want to learn a raw classifier, because that would just reproduce human judgments. To allow the system to disagree with a human, we need to use some metric other than "is simple and assigns high probability to human judgments".

> For something like FAI, I want a concept-learning algorithm that will look at the world in this naturalized, causal way (which is what normal modelling shoots for!), and that will model correctly at any level of abstraction or under any available set of features, and will be able to map between these levels as the human mind can.

I totally agree that a good understanding of multi-level models is important for understanding FAI concept spaces. I don't have a good understanding of multi-level maps; we can definitely see them as useful constructs for bounded reasoners, but it seems difficult to integrate higher levels into the goal system without deciding things about the high-level map a priori so you can define goals relative to it.
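
To make that difficulty concrete, here is a minimal sketch in Python (all names are hypothetical, not anyone's proposed design): a goal defined relative to a high-level map requires the abstraction function connecting the levels to be fixed a priori, and nothing forces the goal predicate to track later revisions to that abstraction.

```python
from typing import Callable, List

# Low-level state: per-particle velocities (a stand-in for a detailed
# world model).
LowLevelState = List[float]

def temperature(state: LowLevelState) -> float:
    """A priori abstraction: mean squared velocity as 'temperature'."""
    return sum(v * v for v in state) / len(state)

def goal_satisfied(state: LowLevelState,
                   abstraction: Callable[[LowLevelState], float],
                   target: float, tolerance: float) -> bool:
    """The goal is only meaningful relative to the chosen abstraction."""
    return abs(abstraction(state) - target) < tolerance

# If the learner later revises its concept of 'temperature', the goal
# predicate above does not automatically follow it -- that is the
# integration problem for higher-level maps.
state = [0.9, 1.1, 1.0, 1.05]
print(goal_satisfied(state, temperature, target=1.0, tolerance=0.1))
```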

Comment author: [deleted] 07 May 2015 03:15:33PM

> I don't have a good understanding of multi-level maps; we can definitely see them as useful constructs for bounded reasoners

Well, all real reasoners are bounded reasoners. If you genuinely don't care about computation time, you can run the Ordered Optimal Problem Solver as the initial input program to a Gödel Machine, and out pops your AI (in 200 trillion years, of course)!
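
To unpack the reference: OOPS builds on Levin-style universal search, which enumerates programs in phases with geometrically growing time budgets; it is asymptotically optimal but astronomically slow, hence the joke. A toy sketch, with the caveat that the "interpreter" below is a stand-in and real OOPS additionally reuses solutions to earlier tasks:

```python
import itertools

def run(program: str, budget: int, target: str) -> bool:
    """Hypothetical interpreter: 'execute' a program for up to `budget`
    steps. Here a program is just a candidate string and success means
    matching the target -- a stand-in for arbitrary program execution."""
    return len(program) <= budget and program == target

def universal_search(target: str, alphabet: str = "ab") -> str:
    """Enumerate all programs in phases; phase k tries programs of
    length <= k with a time budget of 2**k steps. Asymptotically
    optimal, practically glacial for hard targets."""
    for phase in itertools.count(1):
        budget = 2 ** phase
        for length in range(1, phase + 1):
            for letters in itertools.product(alphabet, repeat=length):
                program = "".join(letters)
                if run(program, budget, target):
                    return program

print(universal_search("ba"))  # found after all shorter programs fail
```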

> it seems difficult to integrate higher levels into the goal system without deciding things about the high-level map a priori so you can define goals relative to it.

I would tend to say that you should be training a conceptual map of the world before you install anything like action-taking capability or a goal system of any kind. Of course, I also tend to say that you should just use a debugged (i.e., cured of systematic faults) model of human evaluative processes for your goal system, and then use actual human evaluations to train the free parameters, and then set up learning feedback from the learned concept of "human" to the free-parameter space of the evaluation model.
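
As a schematic of that ordering (every name below is hypothetical, and the learning steps are reduced to toys): first train a passive world model with no actuators attached, then fit an evaluation model's free parameters to actual human judgments, with the learned concept "human" eventually feeding back into the evaluator's parameter space.

```python
from typing import Dict, List, Tuple

class WorldModel:
    """Passive, prediction-only model: no actuators, no goal system."""
    def __init__(self) -> None:
        self.concepts: Dict[str, List[str]] = {}

    def fit(self, observations: List[str]) -> None:
        # Toy "concept learning": group observations under their subject.
        for obs in observations:
            self.concepts.setdefault(obs.split()[0], []).append(obs)

class HumanEvaluativeModel:
    """Fixed evaluative structure; the weights are the free parameters."""
    def __init__(self) -> None:
        self.weights: Dict[str, float] = {}

    def fit(self, rated_outcomes: List[Tuple[str, float]]) -> None:
        # Free parameters trained on actual human evaluations.
        for feature, rating in rated_outcomes:
            self.weights[feature] = rating

    def evaluate(self, features: List[str]) -> float:
        return sum(self.weights.get(f, 0.0) for f in features)

# Step 1: learn concepts before any goal system is installed.
world = WorldModel()
world.fit(["human waves", "human speaks", "rock falls"])

# Step 2: fit the evaluator's free parameters from human ratings.
evaluator = HumanEvaluativeModel()
evaluator.fit([("human", 1.0), ("rock", 0.0)])

# Step 3, omitted here, would feed the learned concept "human" back
# into the evaluator's free-parameter space as the world model improves.
print(evaluator.evaluate(["human", "rock"]))  # 1.0
```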

Comment author: jessicat 07 May 2015 05:18:14PM

> I would tend to say that you should be training a conceptual map of the world before you install anything like action-taking capability or a goal system of any kind.

This seems like a sane thing to do. If it didn't work, it would probably be for one of the following reasons:

  1. A lack of conceptual convergence and human understandability; this seems somewhat likely and is probably the most important unknown.

  2. Our conceptual representations are only efficient for talking about things we care about because we care about those things; a "neutral" standard such as resource-bounded Solomonoff induction would learn the concepts we care about badly, for "no free lunch" reasons. I find this plausible but not too likely (it seems like it ought to be possible to "bootstrap" an importance metric for deciding where in the concept space to allocate resources; see the sketch after this list).

  3. We need the system to have a goal system in order to self-improve to the point of creating this conceptual map. I find this a little likely (this is basically the question of whether we can create something that manages to self-improve without needing goals; it is related to low impact).
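
On item 2's bootstrapping idea, here is a toy sketch (hypothetical names and numbers throughout): instead of spending refinement compute uniformly, as a neutral prior would, allocate it in proportion to an importance weight over concepts.

```python
import random

random.seed(0)

concepts = {"human": 0.0, "rock": 0.0, "quark": 0.0}    # model quality so far
importance = {"human": 0.7, "rock": 0.2, "quark": 0.1}  # bootstrapped metric

def refine(concept: str) -> None:
    """Spend one unit of compute improving one concept's model."""
    concepts[concept] += random.uniform(0.5, 1.0)

budget = 100
names = list(concepts)
weights = [importance[n] for n in names]
for _ in range(budget):
    refine(random.choices(names, weights=weights, k=1)[0])

# Concepts we care about end up modeled far better than "neutral" ones.
print({n: round(q, 1) for n, q in concepts.items()})
```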

> Of course, I also tend to say that you should just use a debugged (i.e., cured of systematic faults) model of human evaluative processes for your goal system, and then use actual human evaluations to train the free parameters, and then set up learning feedback from the learned concept of "human" to the free-parameter space of the evaluation model.

I agree that this is a good idea. The main problem seems to be that we need some sort of "skeleton" of a normative human model, whose parts can be filled in empirically and which will infer the right goals after enough training.
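
One concrete reading of such a skeleton, offered only as an illustration: fix the structural assumption that humans choose Boltzmann-rationally under an unknown utility, and fill in the free parameters (here, feature weights) from observed human choices.

```python
import math
from itertools import product

# Each option is a feature vector; the human chose the first option in
# each recorded pair (toy data).
choice_data = [
    ([1.0, 0.0], [0.0, 1.0]),
    ([1.0, 1.0], [0.0, 1.0]),
    ([1.0, 0.0], [0.0, 0.0]),
]

def log_likelihood(weights) -> float:
    """log P(chose a over b) summed over the data, under a
    Boltzmann-rational skeleton with linear utility."""
    total = 0.0
    for a, b in choice_data:
        ua = sum(w * x for w, x in zip(weights, a))
        ub = sum(w * x for w, x in zip(weights, b))
        total += ua - math.log(math.exp(ua) + math.exp(ub))
    return total

# Crude grid search over the free parameters (a real system would use
# gradient methods and a far richer evaluative structure).
best = max(product([-2, -1, 0, 1, 2], repeat=2), key=log_likelihood)
print(best)  # recovers that the first feature carries the value
```

The skeleton does the normative work here: the Boltzmann structure is decided in advance, and only the weights are learned, which is exactly the "parts filled in empirically" shape described above.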