Hi, I’d like to share my paper, which proposes a novel approach to building white-box neural networks.
The paper introduces semantic features as a general technique for controlled dimensionality reduction, somewhat reminiscent of Hinton’s capsules and the idea of “inverse rendering”. In short, semantic features aim to capture the core characteristic of any semantic entity: having many possible states but being in exactly one state at a time. This constraint acts as a regularizer strong enough to make the proof-of-concept (PoC) neural network inherently interpretable and also robust to adversarial attacks, despite using no adversarial training whatsoever! The paper may be viewed as a manifesto for a novel white-box approach to deep learning.
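To give a concrete flavour of the “exactly one state at a time” constraint, here is a minimal toy sketch in PyTorch. It only illustrates the constraint itself (hard selection over a learned set of states, with straight-through gradients so the network remains trainable); it is not the actual architecture from the paper, and the class name and its internals are invented purely for this example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SemanticFeature(nn.Module):
    """Toy illustration (not the paper's layer): a feature that can take one of
    `num_states` learned states and must occupy exactly one of them at a time."""

    def __init__(self, in_dim: int, num_states: int, state_dim: int):
        super().__init__()
        self.score = nn.Linear(in_dim, num_states)                      # how well the input matches each state
        self.states = nn.Parameter(torch.randn(num_states, state_dim))  # learned state embeddings

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        logits = self.score(x)                                   # (batch, num_states)
        soft = F.softmax(logits, dim=-1)
        hard = F.one_hot(soft.argmax(dim=-1), soft.shape[-1]).float()
        onehot = hard + soft - soft.detach()                     # straight-through: hard forward, soft backward
        return onehot @ self.states                              # embedding of the single active state


if __name__ == "__main__":
    feat = SemanticFeature(in_dim=16, num_states=8, state_dim=4)
    out = feat(torch.randn(2, 16))
    print(out.shape)  # torch.Size([2, 4])
```

The point of the sketch is simply that forcing each feature to commit to a single discrete state is a strong structural constraint; the paper develops its own, more principled construction around this idea.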
As an independent researcher, I’d be grateful for your feedback!
This looks interesting, thanks!
This post could benefit from an extended summary.
In lieu of such a summary, and in addition to the abstract, I'll quote a paragraph from Section 1.2, "The core idea":
In addition to translation (which I do think is a useful problem for theoretical experiments), I would recommend question answering as something that gets at 'thoughts' rather than distractors like 'linguistic style'. I don't think multiple-choice question answering is all that great a measure for some things, but it is a cleaner measure of the correctness of the underlying thoughts.
I agree that abstracting away from things like choice of grammar/punctuation, or which synonym to use, is important for keeping the research question clean.