There’s a common perception that various non-deep-learning ML paradigms - like logic, probability, causality, etc. - are very interpretable, whereas neural nets aren’t. I claim this is wrong.
It’s easy to see where the idea comes from. Look at the sort of models in, say, Judea Pearl’s work. Like this:
It says that either the sprinkler or the rain could cause a wet sidewalk, season is upstream of both of those (e.g. more rain in spring, more sprinkler use in summer), and sidewalk slipperiness is caused by wetness. The Pearl-style framework lets us do all sorts of probabilistic and causal reasoning on this system, and it all lines up quite neatly with our intuitions. It looks very interpretable.
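To make "probabilistic and causal reasoning on this system" concrete, here is a minimal sketch of the sprinkler model using brute-force enumeration. Everything specific in it is an assumption for illustration: the two-valued season, the probability tables, and the helper names `joint` and `prob` are all made up, not taken from Pearl's work.

```python
from itertools import product

# Hypothetical conditional probability tables (illustrative numbers only).
P_SEASON = {"spring": 0.5, "summer": 0.5}
P_SPRINKLER = {"spring": 0.2, "summer": 0.6}  # P(sprinkler on | season)
P_RAIN = {"spring": 0.5, "summer": 0.1}       # P(rain | season)
P_WET = {                                     # P(wet | sprinkler, rain)
    (False, False): 0.0, (False, True): 0.9,
    (True, False): 0.8, (True, True): 0.95,
}
P_SLIPPERY = {False: 0.0, True: 0.7}          # P(slippery | wet)


def bern(p_true, value):
    """Probability that a binary variable takes `value`, given P(True)."""
    return p_true if value else 1.0 - p_true


def joint(season, sprinkler, rain, wet, slippery, do_sprinkler=None):
    """Joint probability of one world; do_sprinkler cuts the season -> sprinkler edge."""
    p = P_SEASON[season]
    if do_sprinkler is None:
        p *= bern(P_SPRINKLER[season], sprinkler)
    else:
        p *= 1.0 if sprinkler == do_sprinkler else 0.0
    p *= bern(P_RAIN[season], rain)
    p *= bern(P_WET[(sprinkler, rain)], wet)
    p *= bern(P_SLIPPERY[wet], slippery)
    return p


def prob(query, evidence=None, do_sprinkler=None):
    """P(query | evidence), optionally under the intervention do(sprinkler=...)."""
    evidence = evidence or {}
    num = den = 0.0
    booleans = [False, True]
    for season, s, r, w, sl in product(P_SEASON, booleans, booleans, booleans, booleans):
        world = {"season": season, "sprinkler": s, "rain": r, "wet": w, "slippery": sl}
        p = joint(season, s, r, w, sl, do_sprinkler)
        if all(world[k] == v for k, v in evidence.items()):
            den += p
            if all(world[k] == v for k, v in query.items()):
                num += p
    return num / den


# Seeing the sprinkler on is evidence for summer, which makes rain less likely.
p_rain_seen = prob({"rain": True}, evidence={"sprinkler": True})
# Forcing the sprinkler on tells us nothing about the season, so rain keeps its prior.
p_rain_forced = prob({"rain": True}, do_sprinkler=True)
print(p_rain_seen, p_rain_forced)
```

With these made-up numbers, observing the sprinkler on drops the probability of rain to about 0.2, while forcing it on leaves rain at its prior of about 0.3. That gap between seeing and doing is the part that lines up with intuition.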
The problem, I claim, is that a whole bunch of work is being done by the labels. “Season”, “sprinkler”, “rain”, etc. The math does not depend on those labels at all. If we code an ML system to use this sort of model, its behavior will also not depend on the labels at all. They’re just suggestively-named LISP tokens. We could use the exact same math/code to model some entirely different system, like my sleep quality being caused by room temperature and exercise, with both of those downstream of season, and my productivity the next day downstream of sleep.
We could just replace all the labels with random strings, and the model would have the same content:
Now it looks a lot less interpretable.
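The relabeling point can be checked mechanically. Below is a minimal sketch, assuming a toy binary Bayes net with made-up probabilities; the representation, the `relabel` and `marginal` helpers, and the random-looking names are all hypothetical. It runs the same query before and after swapping every label for a meaningless string.

```python
from itertools import product

# node -> (parents, table of P(node=True | parent values)); numbers are hypothetical.
MODEL = {
    "season_is_summer": ((), {(): 0.5}),
    "sprinkler": (("season_is_summer",), {(False,): 0.2, (True,): 0.6}),
    "rain": (("season_is_summer",), {(False,): 0.5, (True,): 0.1}),
    "wet": (("sprinkler", "rain"), {
        (False, False): 0.0, (False, True): 0.9,
        (True, False): 0.8, (True, True): 0.95,
    }),
    "slippery": (("wet",), {(False,): 0.0, (True,): 0.7}),
}


def relabel(model, mapping):
    """Rename every node; the graph structure and the tables are untouched."""
    return {
        mapping[node]: (tuple(mapping[p] for p in parents), table)
        for node, (parents, table) in model.items()
    }


def marginal(model, target, given=None):
    """P(target=True | given) by enumerating all worlds. Never reads a label's meaning."""
    given = given or {}
    names = list(model)
    num = den = 0.0
    for values in product([False, True], repeat=len(names)):
        world = dict(zip(names, values))
        p = 1.0
        for node, (parents, table) in model.items():
            p_true = table[tuple(world[q] for q in parents)]
            p *= p_true if world[node] else 1.0 - p_true
        if all(world[k] == v for k, v in given.items()):
            den += p
            if world[target]:
                num += p
    return num / den


# Arbitrary strings standing in for the suggestive names.
MAPPING = {
    "season_is_summer": "xq7", "sprinkler": "kfb",
    "rain": "zz3", "wet": "m0p", "slippery": "w9d",
}
ANON = relabel(MODEL, MAPPING)

original = marginal(MODEL, "rain", {"slippery": True})
anonymous = marginal(ANON, MAPPING["rain"], {MAPPING["slippery"]: True})
print(original, anonymous)
```

The two numbers agree, because nothing in `marginal` ever looks at what a name means. The same `MAPPING` trick could just as well rename the nodes to "room_temperature", "exercise", "sleep_quality" and so on, and the code would be equally happy.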
Perhaps that seems like an unfair criticism? Like, the causal model is doing some nontrivial work, but connecting the labels to real-world objects just isn’t the problem it solves?
… I think that’s true, actually. But connecting the internal symbols/quantities/data structures of a model to external stuff is (I claim) exactly what interpretability is all about.
Think about interpretability for deep learning systems. A prototypical example of what successful interpretability might look like: we find a neuron which robustly lights up specifically in response to trees. It’s a tree-detector! That’s highly interpretable: we know what that neuron “means”, what it corresponds to in the world. (Of course in practice single neurons are probably not the thing to look at, and also the word “robustly” is doing a lot of subtle work, but those points are not really relevant to this post.)
The corresponding problem for a logic/probability/causality-based model would be: take a variable or node, and figure out what thing in the world it corresponds to, ignoring the not-actually-functionally-relevant label. Take the whole system, remove the labels, and try to rederive their meanings.
… which sounds basically-identical to the corresponding problem for deep learning systems.
We are no more able to solve that problem for logic/probability/causality systems than we are for deep learning systems. We can have a node in our model labeled “tree”, but we are no more (or less) able to check that it actually robustly represents trees than we are for a given neuron in a neural network. Similarly, if we find that it does represent trees and we want to understand how/why the tree-representation works, all those labels are a distraction.
One could argue that we’re lucky deep learning is winning the capabilities race. At least this way it’s obvious that our systems are uninterpretable, that we have no idea what’s going on inside the black box, rather than our brains seeing the decorative natural-language name “sprinkler” on a variable/node and then thinking that we know what the variable/node means. Instead, we just have unlabeled nodes - an accurate representation of our actual knowledge of the node’s “meaning”.
This post does a sort of head-to-head comparison of causal models and deep nets. But I view the relationship between them differently - they’re better together! The causal framework gives us the notion of “screening off”, which is missing from the ML/deep learning framework. Screening-off turns out to be useful in analyzing feature importance.
A workflow that 1) uses a complex modern gradient booster or deep net to fit the data, then 2) uses causal math to interpret the features - which are most important, which screen off which - is really nice. [This workflow requires fitting multiple models on different sets of variables, so it’s not just “fit a single model in step 1, analyze it in step 2, done”.]
Causal math lacks the ability to auto-fit complex functions, and ML-without-causality lacks the ability to measure things like “which variables screen off which”. Causality tools, paired with modern feature-importance measures like SHAP values, help us interpret black-box models.
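For readers who haven't met "screening off" before, here is a minimal sketch of the idea on simulated data. It is not the gradient-booster-plus-SHAP workflow described above, just a direct frequency count on a hypothetical chain rain -> wet -> slippery with made-up probabilities. The claim being checked: rain is informative about slipperiness on its own, but stops being informative once wetness is known.

```python
import random

random.seed(0)  # fixed seed so the simulated data is reproducible


def sample():
    """Draw one (rain, wet, slippery) triple from a hypothetical chain rain -> wet -> slippery."""
    rain = random.random() < 0.3
    wet = random.random() < (0.9 if rain else 0.2)
    slippery = random.random() < (0.7 if wet else 0.05)
    return rain, wet, slippery


DATA = [sample() for _ in range(200_000)]


def p_slippery(rows):
    """Empirical P(slippery) over the given rows."""
    rows = list(rows)
    return sum(1 for _, _, slippery in rows if slippery) / len(rows)


def rain_gap(rows):
    """How much knowing rain=True vs rain=False shifts the empirical P(slippery)."""
    rows = list(rows)
    with_rain = p_slippery(r for r in rows if r[0])
    without_rain = p_slippery(r for r in rows if not r[0])
    return with_rain - without_rain


# Without conditioning on wetness, rain is strongly predictive of slipperiness.
gap_marginal = rain_gap(DATA)

# Within each wetness stratum the gap should shrink to roughly zero:
# wet "screens off" rain from slippery.
gap_when_wet = rain_gap(r for r in DATA if r[1])
gap_when_dry = rain_gap(r for r in DATA if not r[1])

print(gap_marginal, gap_when_wet, gap_when_dry)
```

In a real analysis the conditional distributions would come from fitted models rather than raw counts, which is presumably where the "fit multiple models on different sets of variables" step comes in.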
Sure - there are plenty of cases where a pair of interactions isn’t interesting. In the ImageNet context, probably you’ll care more about screening-off behavior at more abstract levels.
For example, maybe you find that, in your trained network, a hidden representation that seems to correspond to “trunk” isn’t very predictive of the class “tree”, while one that looks like “leaves” is. It’d be useful to know if the reason “trunk” isn’t predictive is that “leaves” screens it off. (This could happen if all the tree trunks in your traini...