eli_sennesh comments on Debunking Fallacies in the Theory of AI Motivation - LessWrong

8 Post author: Richard_Loosemore 05 May 2015 02:46AM


Comments (343)


Comment author: jessicat 07 May 2015 08:27:40AM *  4 points

Okay, thanks a lot for the detailed response. I'll explain a bit about where I'm coming from with understanding the concept learning problem:

  • I typically think of concepts as probabilistic programs eventually bottoming out in sense data. So we have some "language" with a "library" of concepts (probabilistic generative models) that can be combined to create new concepts, and combinations of concepts are used to explain complex sensory data (for example, we might compose different generative models at different levels to explain a picture of a scene). We can (in theory) use probabilistic program induction to have uncertainty about how different concepts are combined. This seems like a type of swarm relaxation, due to probabilistic constraints being fuzzy. I briefly skimmed through the McClelland chapter and it seems to mesh well with my understanding of probabilistic programming.
  • But, when thinking about how to create friendly AI, I typically use the very conservative assumptions of statistical learning theory, which give us guarantees against certain kinds of overfitting but no guarantee of proper behavior on novel edge cases. Statistical learning theory is certainly too pessimistic, but there isn't any less pessimistic model for what concepts we expect to learn that I trust. While the view of concepts as probabilistic programs in the previous bullet point implies properties of the system other than those implied by statistical learning theory, I don't actually have good formal models of these, so I end up using statistical learning theory.

I do think that figuring out if we can get more optimistic (but still justified) assumptions is good. You mention empirical experience with swarm relaxation as a possible way of gaining confidence that it is learning concepts correctly. Now that I think about it, bad handling of novel edge cases might be a form of "meta-overfitting", and perhaps we can gain confidence in a system's ability to deal with context shifts by having it go through a series of context shifts well without overfitting. This is the sort of thing that might work, and more research into whether it does is valuable, but it still seems worth preparing for the case where it doesn't.

Anyway, thanks for giving me some good things to think about. I think I see how a lot of our disagreements mostly come down to how much convergence we expect from different concept learning systems. For example, if "psychological manipulation" is in some sense a natural category, then of course it can be added as a weak (or even strong) constraint on the system.
I'll probably think about this a lot more and eventually write up something explaining reasons why we might or might not expect to get convergent concepts from different systems, and the degree to which this changes based on how value-laden a concept is.

There is a lot of talk that can be given about how that complex union takes place, but here is one very important takeaway: it can always be made to happen in such a way that there will not, in the future, be any Gotcha cases (those where you thought you did completely merge the two concepts, but where you suddenly find a peculiar situation where you got it disastrously wrong). The reason why you won't get any Gotcha cases is that the concepts are defined by large numbers of weak constraints, and no strong constraints -- in such systems, the effect of smaller and smaller numbers of concepts can be guaranteed to converge to zero. (This happens for the same reason that the effect of smaller and smaller sub-populations of the molecules in a gas will converge to zero as the population sizes go to zero).

I didn't really understand a lot of what you said here. My current model is something like "if a concept is defined by lots of weak constraints, then lots of these constraints have to go wrong at once for the concept to go wrong, and we think this is unlikely due to induction and some kind of independence/uncorrelatedness assumption"; is this correct? If this is the right understanding, I think I have low confidence that errors in each weak constraint are in fact not strongly correlated with each other.
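To make the worry about correlated errors concrete, here is a minimal sketch under assumptions that are not in the thread: a concept "goes wrong" when more than half of its weak constraints fail, and correlation is modelled by a hypothetical shared cause that breaks every constraint at once. The function names and the numbers are illustrative only.

```python
from math import comb


def p_majority_fail_independent(n, p):
    """Probability that more than half of n independent constraints fail,
    when each fails with probability p (exact binomial tail)."""
    k_min = n // 2 + 1
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


def p_majority_fail_common_cause(n, p, q):
    """Same event under an assumed mixture model: with probability q a shared
    cause breaks every constraint at once; otherwise constraints fail
    independently. The independent rate is chosen so that each constraint's
    marginal failure probability is still p."""
    if not 0 <= q <= p < 1:
        raise ValueError("need 0 <= q <= p < 1")
    p_rest = (p - q) / (1 - q)
    return q + (1 - q) * p_majority_fail_independent(n, p_rest)


if __name__ == "__main__":
    n, p, q = 100, 0.1, 0.05
    print("independent :", p_majority_fail_independent(n, p))
    print("common cause:", p_majority_fail_common_cause(n, p, q))
```

In this toy model each constraint is equally reliable in both cases, yet the failure probability of the whole concept is bounded below by q as soon as a common cause exists, no matter how many weak constraints are added.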

Comment author: [deleted] 07 May 2015 03:06:06PM 2 points

I briefly skimmed through the McClelland chapter and it seems to mesh well with my understanding of probabilistic programming.

I think it would not go amiss to read Vikash Mansinghka's PhD thesis and the open-world generation paper to see a helpful probabilistic programming approach to these issues. In summary: we can use probabilistic programming to learn the models we need, use conditioning/query to condition the models on the constraints we intend to enforce, and then sample the resulting distributions to generate "actions" which are very likely to be "good enough" and very unlikely to be "bad". We sample instead of inferring the maximum-a-posteriori action or expected action precisely because as part of the Bayesian modelling process we assume that the peak of our probability density does not necessarily correspond to an in-the-world optimum.
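The condition-then-sample recipe can be sketched on a tiny discrete model. This is a simplified illustration, not the method from the thesis or paper mentioned above: the action table, its weights, and the helper names are all hypothetical, and a real probabilistic program would learn the model rather than hard-code it.

```python
import random

# Hypothetical toy model: action -> (model-estimated weight, satisfies constraint?).
ACTIONS = {
    "a": (0.50, True),
    "b": (0.45, True),
    "c": (0.90, False),  # highest weight, but violates the constraint
    "d": (0.05, True),
}


def posterior(actions):
    """Condition on the constraint (drop violating actions), then normalise."""
    allowed = {name: w for name, (w, ok) in actions.items() if ok}
    total = sum(allowed.values())
    return {name: w / total for name, w in allowed.items()}


def map_action(dist):
    """Maximum-a-posteriori choice: always the single most probable action."""
    return max(dist, key=dist.get)


def sample_action(dist, rng):
    """Sampled choice: each action is drawn in proportion to its probability."""
    names = list(dist)
    return rng.choices(names, weights=[dist[n] for n in names], k=1)[0]


if __name__ == "__main__":
    dist = posterior(ACTIONS)
    rng = random.Random(0)
    print("MAP action:", map_action(dist))
    print("samples   :", [sample_action(dist, rng) for _ in range(10)])
```

The MAP rule commits to one action every time, so it inherits any error in where the model puts its peak. Sampling spreads choices over everything the conditioned model considers plausible while still never picking an action the constraint rules out.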

Comment author: jessicat 07 May 2015 05:24:39PM *  1 point

I agree that choosing an action randomly (with higher probability for good actions) is a good way to create a fuzzy satisficer. Do you have any insights into how to:

  1. create queries for planning that don't suffer from "wishful thinking", with or without nested queries. Basically the problem is that if I want an action conditioned on receiving a high utility (e.g. we have a factor on the expected utility node U equal to e^(alpha * U)), then we are likely to choose high-variance actions while inferring that the rest of the model works out such that these actions return high utilities

  2. extend this to sequential planning without nested nested nested nested nested nested queries
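The "wishful thinking" effect in item 1 can be reproduced in a two-action toy model. This is a sketch under stated assumptions: the model, its utilities, and alpha = 2 are invented for illustration, the prior over actions is uniform, and the factor e^(alpha * U) is applied to the sampled utility of each outcome.

```python
from math import exp

# Hypothetical toy model: action -> list of (probability, utility) outcomes.
MODEL = {
    "safe": [(1.0, 1.0)],                # always utility 1
    "risky": [(0.5, -2.0), (0.5, 3.0)],  # expected utility 0.5, high variance
}


def expected_utility(outcomes):
    """Ordinary expected utility of one action."""
    return sum(p * u for p, u in outcomes)


def action_posterior(model, alpha):
    """Posterior over actions under a uniform prior and the soft-conditioning
    factor exp(alpha * U), with outcomes marginalised out."""
    weights = {a: sum(p * exp(alpha * u) for p, u in outs) for a, outs in model.items()}
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}


def outcome_posterior(outcomes, alpha):
    """Posterior over one action's outcomes under the same factor."""
    weights = [p * exp(alpha * u) for p, u in outcomes]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    alpha = 2.0
    for name, outs in MODEL.items():
        print(name, "expected utility:", expected_utility(outs))
    print("action posterior:", action_posterior(MODEL, alpha))
    print("risky outcome posterior:", outcome_posterior(MODEL["risky"], alpha))
```

Although "safe" has the higher expected utility, the conditioned query puts most of its mass on "risky", and within "risky" it infers that the good outcome almost certainly happens. The query treats the environment's randomness as something it can choose, which is the failure described in item 1.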