
> we stumble on a weird observation where the few features with the least sparsity are not even learned and represented in the hidden layer

I'm not sure how you're modeling sparsity, but if these features are present in nearly 100% of inputs, you could think of the not-feature as being extremely sparse. My guess is that these features are getting baked into the bias instead of the weights, so the model is just always predicting them.
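
If it helps, here's a minimal sketch of one way to probe that guess in a toy-models-of-superposition-style setup (the setup and all hyperparameters below are my own assumptions, not anything from your post): train a small bottleneck reconstruction model `ReLU(x @ W.T @ W + b)` with one near-always-on feature, then compare each feature's weight-column norm against its bias. If the guess is right, the dense feature's weight column should shrink toward zero while its bias moves up toward the feature's mean value.

```python
import torch

# Illustrative toy setup (my assumptions, not the original experiment).
torch.manual_seed(0)

n_features, n_hidden, n_samples = 5, 2, 4096
# Feature 0 is almost always on (~99% of inputs); the rest are sparse (~5%).
p_active = torch.tensor([0.99, 0.05, 0.05, 0.05, 0.05])

W = torch.randn(n_hidden, n_features, requires_grad=True)
b = torch.zeros(n_features, requires_grad=True)
opt = torch.optim.Adam([W, b], lr=1e-2)

for _ in range(2000):
    # Each feature fires with probability p, with uniform magnitude when active.
    mask = (torch.rand(n_samples, n_features) < p_active).float()
    x = mask * torch.rand(n_samples, n_features)
    x_hat = torch.relu(x @ W.T @ W + b)  # reconstruct through the bottleneck
    loss = ((x - x_hat) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

# If the dense feature is "baked into the bias", its weight column should be
# near zero while its bias is large and positive (near the feature's mean).
print("per-feature weight norms:", W.norm(dim=0).detach())
print("biases:", b.detach())
```

Whether the dense feature actually ends up in the bias rather than getting its own weight direction will depend on the hidden width, the other features' sparsity, and any importance weighting in the loss, so treat this as a probe for the hypothesis rather than a demonstration of it.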