skosch — LessWrong

That was my first thought as well. As far as I know, the most popular simple model used for this in the neuro literature, divisive normalization, uses a similar but not quite identical formula. Different authors use different variations, but it's something shaped like

$$z_{i} = \frac{y_{i}^{α}}{β + \sum_{j} κ_{i j} \, y_{j}^{α}}$$

where $y_{i}$ is the unit's activation before lateral inhibition, $β$ adds a shift/bias, $κ_{i j}$ are the respective inhibition coefficients, and the exponent $α$ modulates the sharpness of the sigmoid (2 is a typical value). Here's an interactive Desmos plot with just a single self-inhibiting unit. This function is asymmetric in the way you describe, if I understand you correctly, but to my knowledge it has never gained any popularity outside of its niche. The ML community seems to much prefer Softmax, LayerNorm et al., and I'm curious whether anyone knows of a deep technical reason for these different choices.
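For anyone who wants to poke at this without the plot, here is a minimal Python sketch of one common variant: each unit's activation raised to $α$, divided by $β$ plus the $κ$-weighted sum of all units' powered activations. The function name, the default values, and the restriction to non-negative activations are my own assumptions for illustration, not something taken from a particular paper.

```python
def divisive_normalization(y, kappa, beta=1.0, alpha=2.0):
    """Return z_i = y_i**alpha / (beta + sum_j kappa[i][j] * y_j**alpha).

    Assumes non-negative activations y and beta > 0 (illustrative choices).
    """
    powered = [v ** alpha for v in y]
    z = []
    for i, p in enumerate(powered):
        # Inhibitory pool for unit i: weighted sum over all units' powered activations.
        pool = sum(k * q for k, q in zip(kappa[i], powered))
        z.append(p / (beta + pool))
    return z


# Single self-inhibiting unit: the response rises sigmoidally and saturates.
for y0 in (0.0, 0.5, 1.0, 2.0, 10.0):
    print(y0, divisive_normalization([y0], [[1.0]])[0])

# Two mutually inhibiting units: the stronger input suppresses the weaker one.
print(divisive_normalization([2.0, 1.0], [[1.0, 1.0], [1.0, 1.0]]))
```

With these defaults, a single unit with input 1.0 maps to 0.5, and the output approaches 1 as the input grows. Unlike Softmax, the outputs do not have to sum to 1, and a zero input always maps to zero.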
