I think your overall point -- More Dakka, make AGI less weird -- is right. In my experience, though, I disagree with your disagreement:
...I disagree with "the case for the risks hasn't been that clearly laid out". I think there's a giant, almost overwhelming pile of intro resources at this point, any one of which is more than sufficient, written in all manner of style, for all manner of audience.[1]
(I do think it's possible to create a much better intro resource than any that exist today, but 'we can do much better' is compatible with 'it's shocking that the
OK, thanks for linking that. You're probably right about the specific case of MNIST. I'm less convinced about more complicated tasks - it seems like each individual task would require a lot of engineering effort.
One thing I didn't see: is there research that looks at what happens if you give neural nets more of the input space as training data? That is, inputs that are explicitly out-of-distribution (random noise, abstract shapes, or other modes you don't particularly care about performance on), all labeled as "garbage" or whatever. Essentially, you'd be providing negative as well as positive examples, given that the input space is usually much larger than the intended distribution.
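A minimal sketch of the data side of this idea, assuming inputs are numpy arrays with pixel values in [0, 1] and integer class labels. Uniform noise stands in for "garbage" here purely as an illustrative assumption, and `add_garbage_class` is a hypothetical helper, not an established method's API. Noise labeled this way gets its own extra class index, so a classifier trained on the result has an explicit "none of the above" output.

```python
import numpy as np

def add_garbage_class(x, y, num_classes, num_garbage, rng=None):
    """Append uniform-noise inputs labeled with an extra 'garbage' class.

    x: array of shape (n, ...) with values assumed to lie in [0, 1].
    y: integer labels in [0, num_classes).
    Returns shuffled (x, y) where the noise examples get label num_classes.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Noise with the same per-example shape and dtype as the real inputs.
    noise = rng.uniform(0.0, 1.0, size=(num_garbage,) + x.shape[1:]).astype(x.dtype)
    garbage_labels = np.full(num_garbage, num_classes, dtype=y.dtype)
    x_aug = np.concatenate([x, noise], axis=0)
    y_aug = np.concatenate([y, garbage_labels], axis=0)
    # Shuffle inputs and labels together so garbage is mixed into batches.
    perm = rng.permutation(len(x_aug))
    return x_aug[perm], y_aug[perm]
```

For example, 100 MNIST-shaped images plus 20 noise images yield 120 examples, 20 of them labeled 10. Uniform noise covers only a sliver of the "unintended" input space, so this is a starting point rather than a guarantee.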
>I imagine if our goal was "never misclassify an MNIST digit" we could get to 6-7 nines of "worst-case accuracy" even out of existing neural nets, at the cost of saying "I don't know" for the confusing 0.2% of digits.
Er, how? I haven't seen anyone describe a way to do this. Getting a neural network to meaningfully say "I don't know" is very much cutting-edge research as far as I'm aware.
You're right that it's an ongoing research area, but there are a number of approaches that work relatively well. This NeurIPS tutorial describes a few. Probably the easiest thing is to use one of the calibration methods mentioned there to get your classifier to output calibrated uncertainties for each class, then say "I don't know" if the network isn't at least 90% confident in one of the 10 classes.
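A minimal sketch of that recipe, assuming you already have the network's logits as numpy arrays plus a held-out validation set. It uses temperature scaling (one scalar that divides the logits, fitted here by a simple grid search on validation negative log-likelihood) as the calibration step; that choice is an assumption and not necessarily one of the methods the tutorial covers. All function names are hypothetical, and `-1` stands for "I don't know".

```python
import numpy as np

def softmax(logits, temperature=1.0):
    z = logits / temperature
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def fit_temperature(val_logits, val_labels, grid=None):
    """Pick the temperature that minimizes NLL on held-out data (grid search)."""
    if grid is None:
        grid = np.linspace(0.5, 5.0, 46)  # candidate temperatures in steps of 0.1
    best_t, best_nll = 1.0, np.inf
    for t in grid:
        probs = softmax(val_logits, t)
        true_probs = probs[np.arange(len(val_labels)), val_labels]
        nll = -np.mean(np.log(true_probs + 1e-12))
        if nll < best_nll:
            best_t, best_nll = t, nll
    return best_t

def predict_or_abstain(logits, temperature, threshold=0.9):
    """Return class indices, or -1 when the top calibrated probability is below threshold."""
    probs = softmax(logits, temperature)
    preds = probs.argmax(axis=1)
    confident = probs.max(axis=1) >= threshold
    return np.where(confident, preds, -1)
```

An overconfident model gets a fitted temperature above 1, which softens its probabilities before the 90% threshold is applied. Calibration is an average-case property over the validation distribution, so it says nothing directly about adversarial or far out-of-distribution inputs.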
I think you and John are talking about two different facets of interpretability.
The first one is the question of "white-boxing:" how do the model's internal components interrelate to produce its output? On this dimension, the kind of models that you've given as examples are much more interpretable than neural networks.
What I think John is talking about, I understand as "grounding." (Cf. Symbol grounding problem) Although the decision tree (a) above is clear in that one can easily follow how the final decision comes about, the question remains -- who or wha...
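A toy illustration of the two facets, using a hypothetical tree whose feature names (`f1`, `f2`) and labels are invented for the example. `decide` exposes every comparison it makes, which is the white-boxing facet; but nothing in the code says what `f1` or `f2` refer to in the world, which is the grounding question.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

@dataclass
class Leaf:
    label: str

@dataclass
class Node:
    feature: str
    threshold: float
    left: "Tree"   # branch taken when value <= threshold
    right: "Tree"  # branch taken when value > threshold

Tree = Union[Leaf, Node]

def decide(tree: Tree, x: dict) -> Tuple[str, List[str]]:
    """Walk the tree for input x, returning the label and every comparison made."""
    trace = []
    while isinstance(tree, Node):
        value = x[tree.feature]
        go_left = value <= tree.threshold
        trace.append(f"{tree.feature}={value} {'<=' if go_left else '>'} {tree.threshold}")
        tree = tree.left if go_left else tree.right
    return tree.label, trace
```

For instance, with `Node("f1", 0.5, Leaf("approve"), Node("f2", 3.0, Leaf("approve"), Leaf("deny")))` and input `{"f1": 0.7, "f2": 4.0}`, the result is `"deny"` with the trace `["f1=0.7 > 0.5", "f2=4.0 > 3.0"]`: fully traceable, yet silent about what the features mean.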
This is the focus of General Systems, as outlined by Weinberg. That book is very good, by the way - I highly recommend reading it. It's both very dense and very accessible.
It's always puzzled me that the rationalist community hasn't put more emphasis on general systems. It seems like it should fit in perfectly, but I haven't seen anyone mention it explicitly. General Semantics, mentioned in the recent historical post, is somewhat related, but not the same thing.
More on topic: One thing you don't mention is that there are fairly general problem solving techni...
Excellent post. I have nothing really to add, only that you're not alone in this: