Maciej Satkiewicz — LessWrong

LESSWRONG
LW

Replying toIt turns out that DNNs are remarkably interpretable.

It turns out that DNNs are remarkably interpretable.

Actually it is more similar to the lesser known Neural Path Kernel :) Indeed there is a specific product kernel associated with the path space, in that the path space is the RKHS of that kernel.

Replying toIt turns out that DNNs are remarkably interpretable.

Maciej Satkiewicz6mo

It turns out that DNNs are remarkably interpretable.

Hi, thanks for comment!

By “linear” I mean linear in the feature space, just like kernel machines are considered “linear” under specific data embedding.

Regarding saliency maps, I still think my method can be considered faithful, in fact the whole theoretical toolset I develop serves to argue for the faithfulness of excitation pullbacks, in particular the Hypothesis 1. I argue that the model approximates a kernel machine in the path space exactly to motivate why excitation pullbacks might be faithful, i.e. they reveal the decision boundary of the more regular underlying model, pointing where the gradient noise comes from exactly (in short, I claim that gradients are noisy because they correspond to rank-1 tensors... (read more)

Replying toIt turns out that DNNs are remarkably interpretable.

Maciej Satkiewicz6mo

It turns out that DNNs are remarkably interpretable.

To be precise what I meant by "implicitly linear" is a model that is globally linear in the feature space, after transforming inputs with a fixed map. In other words - a kernel machine. The claim is that ReLU networks approximate a particular, computable kernel machine during training.

Replying toIt turns out that DNNs are remarkably interpretable.

Maciej Satkiewicz6mo

It turns out that DNNs are remarkably interpretable.

That’s fair - the current work focuses on vision models. But I’d argue it provides a concrete mechanism that could generalize to other domains. The idea that a hidden linear model lives within the activation structure isn’t specific to images; what’s needed next is to recover and validate it in text or multimodal settings.

In other words: yes, it’s scoped, but it opens a door.

It turns out that DNNs are remarkably interpretable.

Maciej Satkiewicz

6mo

I recently posted a paper suggesting that deep networks may harbor an implicitly linear model, recoverable via a form of gradient denoising. The method - called excitation pullback - produces crisp, human-aligned features and offers a structural lens on generalization. Just look at the explanations for ImageNet-pretrained ResNet50 on the front page of the paper.

Replying toTowards White Box Deep Learning

Maciej Satkiewicz2y

Towards White Box Deep Learning

These are interesting considerations! I haven't put much thought on this yet but I have some preliminary ideas.

Semantic features are intended to capture meaning-preserving variations of structures. In that sense the "next word" problem seems ill-posed as some permutations of words preserve meaning; in reality its a hardly natural problem also from the human perspective.

The question I'd ask here is "what are the basic semantic building blocks of text for us humans?" and then try to model these blocks using the machinery of semantic features, i.e. model the invariants of these semantic blocks. Only then I'd think of adequate formulations of useful problems regarding text understanding.

So I'd say that these semantic atoms... (read more)

Replying toTowards White Box Deep Learning

Maciej Satkiewicz2y

Towards White Box Deep Learning

Thank you! The quote you picked is on point, I added an extended summary based on this, thanks for the suggestion!

Towards White Box Deep Learning

Maciej Satkiewicz

Hi, I’d like to share my paper that proposes a novel approach for building white box neural networks.

The paper introduces semantic features as a general technique for controlled dimensionality reduction, somewhat reminiscent of Hinton’s capsules and the idea of “inverse rendering”. In short, semantic features aim to capture the core characteristic of any semantic entity - having many possible states but being at exactly one state at a time. This results in regularization that is strong enough to make the PoC neural network inherently interpretable and also robust to adversarial attacks - despite no form of adversarial training! The paper may be viewed as a manifesto for a novel white-box approach to deep learning.

As an independent researcher I’d be grateful for your feedback!