I recently wrote an essay about AI risk, targeted at other academics:
Long-Term and Short-Term Challenges to Ensuring the Safety of AI Systems
I think it might be interesting to some of you, so I am sharing it here. I would appreciate any feedback any of you have, especially from others who do AI / machine learning research.
Nice essay.
Do you think that transparent machine learning could be practically achievable, or could it be the case that most models we may want our machine learning systems to learn can only be represented by complex, unintelligible specifications?
Intuitively, the space of opaque models, be they neural networks, large decision tree forests, or incomprehensible spaghetti-code computer programs, seems bigger than the space of transparent models.
For instance, what would a transparent visual recognition model look like?
The most obvious choice would be a Bayesian graphical model, with a prior over objects that could be in an image, a stochastic model over their properties (including things like body pose for animals), a prior over light positions and properties, a prior over backgrounds, a prior over camera poses and optical properties, a stochastic physics model of the interactions of light with the object of interest, the background, and the camera, and so on.
It seems to me that it would be a very complex model, with lots of parameters, and likely not supporting efficient inference, much less efficient learning.
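To make the shape of such a model concrete, here is a minimal toy sketch of its generative (ancestral-sampling) structure. Everything here is a placeholder assumption of mine, not part of the essay: the latent variables, their toy priors, and the class list are all hypothetical, and the actual hard parts (a physically based renderer and Bayesian inversion of it) are only noted in a comment.

```python
import numpy as np

rng = np.random.default_rng(0)

OBJECT_CLASSES = ["cat", "dog", "car"]  # toy prior support (hypothetical)

def sample_scene(rng):
    """Draw one scene by sampling each latent variable in topological order."""
    scene = {
        "object": rng.choice(OBJECT_CLASSES),         # prior over objects
        "pose": rng.normal(0.0, 1.0, size=6),         # object pose/properties (toy)
        "light_pos": rng.uniform(-1.0, 1.0, size=3),  # prior over light positions
        "light_intensity": rng.gamma(2.0, 1.0),       # light properties
        "background": rng.integers(0, 10),            # prior over backgrounds
        "camera_pose": rng.normal(0.0, 1.0, size=6),  # camera extrinsics
        "focal_length": rng.uniform(18.0, 85.0),      # camera optics (mm)
    }
    # The hard part is missing: render(scene) -> image via a stochastic
    # light-transport model, then inverting it with Bayes' rule to recognize
    # objects -- which is intractable in general.
    return scene

print(sample_scene(rng))
```

Even this caricature has seven coupled latent groups before any rendering physics; inverting the full model to do recognition is where the complexity explodes.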
Traditional computer vision approaches tried to do this more or less, with some clever approximations and lots of engineering, and they were soundly beaten by opaque systems like ConvNets.
State of the art systems like ConvNets, on the other hand, learn shortcuts and heuristics, such as recognizing distinctive textures, which works very well in most cases, with some occasional glaring mistakes.
Perhaps any visual system capable of that level of performance must necessarily be a huge collection of heuristics of this type, maybe with more sophistication to avoid classifying a leopard print sofa as a leopard ( * ), but still fundamentally based on this architecture.
( * it's not like humans are immune to this failure mode anyway: people see a face on Mars, ghosts in blurry pictures, Jesus on toast, Allah's name on a fish, etc. Pareidolia is certainly a thing.)