Networks that have to learn more features may become more adversary-prone simply because the adversary has more features to leverage, and those features are represented more densely.
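
To make the "represented more densely" point concrete, here's a minimal numpy sketch (my own illustration, not from the post, using random unit vectors as a stand-in for learned feature directions): for a fixed hidden dimension, the total interference each feature suffers from the others grows roughly linearly in the feature count, which is exactly the surface the adversary gets to exploit.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 20  # hidden dimension, held fixed

for n in (20, 40, 80, 160):  # number of features packed into m dims
    # Random unit-norm columns stand in for learned feature directions.
    W = rng.standard_normal((m, n))
    W /= np.linalg.norm(W, axis=0, keepdims=True)
    interference = W.T @ W - np.eye(n)  # off-diagonal terms = feature interference
    # Total squared interference hitting each feature, averaged over features;
    # for random directions this grows roughly like (n - 1) / m.
    per_feature = (interference ** 2).sum(axis=0).mean()
    print(f"n = {n:3d}: mean total interference per feature = {per_feature:.3f}")
```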

Also, in the top figure the loss is 'relative to the non-superposition model', but if I'm not mistaken the non-superposition model should be basically perfectly robust. Since it's just one layer with each feature in its own dimension, its Jacobian is the identity, so any perturbation to the input is reflected exactly in the corresponding output feature; and since the loss is MSE against the input, the reconstruction stays perfect and the loss doesn't change at all. It's only when you introduce superposition that a change to the input can change the loss (because features actually 'interact').
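
Here's a minimal sketch of that argument (assuming the one-layer toy setup out = ReLU(WᵀWx + b) with b = 0; the variable names are mine): with a square orthogonal W the model is the identity map on nonnegative inputs, so a perturbed input is reconstructed exactly and the loss stays at zero, while a superposed W lets the same perturbation move the loss.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
x = rng.uniform(0.1, 1.0, size=n)  # nonnegative input, so the ReLU stays active
x_adv = np.clip(x + 0.01 * rng.standard_normal(n), 0.0, None)  # perturbed input

def mse_loss(W, x_in):
    out = np.maximum(W.T @ W @ x_in, 0.0)  # toy model: ReLU(W^T W x), b = 0
    return ((out - x_in) ** 2).mean()      # reconstruction MSE against the input

# No superposition: square orthogonal W, so W^T W = I and the model is the
# identity on nonnegative inputs; the loss is 0 before and after perturbation.
W_ortho, _ = np.linalg.qr(rng.standard_normal((n, n)))
print(f"no superposition: {mse_loss(W_ortho, x):.2e} -> {mse_loss(W_ortho, x_adv):.2e}")

# Superposition: 2 hidden dims for 5 features, so features interfere and the
# same perturbation now changes the loss.
W_super = rng.standard_normal((2, n))
W_super /= np.linalg.norm(W_super, axis=0, keepdims=True)
print(f"superposition:    {mse_loss(W_super, x):.2e} -> {mse_loss(W_super, x_adv):.2e}")
```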