All of Ash Gray's Comments + Replies

Excellent post. I have nothing really to add, only that you're not alone in this:

Here's a (failure?) mode that I and others are already in, but might be too embarrassed to write about: taking weird career/financial risks in order to obtain the financial security to work on alignment full-time [2]. Anyone more risk-averse (good for alignment!) might just... work a normal job for years to save up, or modestly conclude they're not good enough to work in alignment at all. If security mindset can be taught at all, this is a shit equilibrium.

Yes, I know EA

... (read more)

I think your overall point -- More Dakka, make AGI less weird -- is right. In my experience, though, I disagree with your disagreement:

I disagree with "the case for the risks hasn't been that clearly laid out". I think there's a giant, almost overwhelming pile of intro resources at this point, any one of which is more than sufficient, written in all manner of style, for all manner of audience.[1]

(I do think it's possible to create a much better intro resource than any that exist today, but 'we can do much better' is compatible with 'it's shocking that the

... (read more)

OK, thanks for linking that. You're probably right in the specific example of MNIST. I'm less convinced about more complicated tasks - it seems like each individual task would require a lot of engineering effort.

One thing I didn't see - is there research which looks at what happens if you give neural nets more of the input space as data? Things that are explicitly out-of-distribution - random noise, abstract shapes, or other modes you don't particularly care about performance on - all labeled as "garbage" or whatever. Essentially, providing negative as well as positive examples, given that the input space is usually much larger than the intended distribution.
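As a rough sketch of what I mean (my own illustration, not a reference to any existing work): add an extra "garbage" class to an MNIST-style classifier and mix synthetic negatives, such as random noise, into each training batch. All names here are placeholders.

```python
# Hypothetical sketch: an 11th "garbage" class for an MNIST-style classifier,
# trained on random noise alongside real digits, so the network has somewhere
# to put inputs that aren't digits at all.
import torch
import torch.nn as nn
import torch.nn.functional as F

GARBAGE_CLASS = 10  # classes 0-9 are digits, 10 is "none of the above"

model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 256), nn.ReLU(),
    nn.Linear(256, 11),  # 10 digit classes + 1 garbage class
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

def training_step(digit_images, digit_labels):
    """One step mixing real digits with synthetic 'garbage' inputs."""
    noise = torch.rand_like(digit_images)  # uniform-noise images as negatives
    noise_labels = torch.full((digit_images.size(0),), GARBAGE_CLASS)
    inputs = torch.cat([digit_images, noise])
    labels = torch.cat([digit_labels, noise_labels])

    optimizer.zero_grad()
    loss = F.cross_entropy(model(inputs), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```

Uniform noise is only a stand-in here; the interesting question is what happens as you broaden the set of negatives (abstract shapes, other datasets, etc.).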

> I imagine if our goal was "never misclassify an MNIST digit" we could get to 6-7 nines of "worst-case accuracy" even out of existing neural nets, at the cost of saying "I don't know" for the confusing 0.2% of digits.

Er, how? I haven't seen anyone describe a way to do this. Getting a neural network to meaningfully say "I don't know" is very much cutting-edge research as far as I'm aware.

You're right that it's an ongoing research area, but there are a number of approaches that work relatively well. This NeurIPS tutorial describes a few. Probably the easiest thing is to use one of the calibration methods mentioned there to get your classifier to output calibrated uncertainties for each class, then say "I don't know" if the network isn't at least 90% confident in one of the 10 classes.
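To make that concrete, here's a minimal sketch of the "calibrate, then threshold" recipe, assuming temperature scaling (one of the calibration methods the tutorial covers) and an already-trained 10-class classifier. The names `val_logits`, `val_labels`, and `test_logits`, and the 0.9 cutoff, are placeholders for illustration.

```python
# Sketch: fit a single temperature on held-out validation logits
# (temperature scaling), then abstain ("I don't know") whenever the
# calibrated top-class probability falls below 0.9.
import torch
import torch.nn.functional as F

def fit_temperature(val_logits: torch.Tensor, val_labels: torch.Tensor) -> float:
    """Find the temperature T that minimizes NLL on the validation set."""
    log_t = torch.zeros(1, requires_grad=True)  # optimize log(T) so T stays positive
    optimizer = torch.optim.LBFGS([log_t], lr=0.1, max_iter=50)

    def closure():
        optimizer.zero_grad()
        loss = F.cross_entropy(val_logits / log_t.exp(), val_labels)
        loss.backward()
        return loss

    optimizer.step(closure)
    return log_t.exp().item()

def predict_or_abstain(test_logits: torch.Tensor, temperature: float,
                       threshold: float = 0.9) -> torch.Tensor:
    """Return the predicted class per example, or -1 ("I don't know")
    when the calibrated confidence is below the threshold."""
    probs = F.softmax(test_logits / temperature, dim=1)
    confidence, prediction = probs.max(dim=1)
    return torch.where(confidence >= threshold, prediction,
                       torch.full_like(prediction, -1))
```

The 90% cutoff is arbitrary; in practice you'd pick the threshold by looking at the accuracy/coverage trade-off on held-out data.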

I think you and John are talking about two different facets of interpretability.

The first one is the question of "white-boxing": how do the model's internal components interrelate to produce its output? On this dimension, the kinds of models you've given as examples are much more interpretable than neural networks.

What I think John is talking about, I'd call "grounding" (cf. the Symbol grounding problem). Although the decision tree (a) above is clear in that one can easily follow how the final decision comes about, the question remains -- who or wha... (read more)

DragonGod
Aren't the symbols grounded by human engineering? Humans use those particular boxes/tokens to represent particular concepts, and they can define a way in which the concepts map to the inputs to the system. I'm not sure it's reasonable to claim that "grounding is similar" when a model is fully human-engineered (e.g. decision trees and causal models) versus when it is dynamically derived (e.g. artificial neural networks).

This is the focus of General Systems, as outlined by Weinberg. That book is very good, by the way - I highly recommend reading it. It's both very dense and very accessible.

It's always puzzled me that the rationalist community hasn't put more emphasis on general systems. It seems like it should fit in perfectly, but I haven't seen anyone mention it explicitly. General Semantics, mentioned in the recent historical post, is somewhat related, but not the same thing.

More on topic: One thing you don't mention is that there are fairly general problem solving techni... (read more)

Gordon Seidoh Worley
Also makes me think of TRIZ. I don't really understand how to use it that well or even know if it produces useful results, but I know it's popular within the Russosphere (or at least more popular there than anywhere else).
adamShimi
The impression I always had of general systems (from afar) was that it looked cool, but it never seemed to be useful for anything other than "thinking in systems" (so not useful for doing research in another field or making any concrete applications). So that's why I never felt interested. Note that I'm clearly not knowledgeable at all on the subject; this is just my outside impression. I assume from your comment that you think that's wrong. Is the Weinberg book a good resource for educating myself and seeing how wrong I am?
Mo Putera
I'm intrigued by your second paragraph -- perhaps write a post about it?