All of Jerry Ding's Comments + Replies

It's really interesting that (so far, it seems) the quadratic activation can implement the universal AND almost exponentially more efficiently than the ReLU activation.
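
As a toy illustration of why the quadratic case is so cheap (a minimal sketch of the underlying boolean identity, not the post's actual construction): for boolean features, a single quadratic neuron reading x + y already contains the AND as its cross term, since (x + y)^2 = x + 2xy + y when x^2 = x and y^2 = y.

```python
import numpy as np

rng = np.random.default_rng(0)
x, y = rng.integers(0, 2, size=(2, 8))  # two boolean feature vectors

# With a quadratic activation q(z) = z**2, one neuron reading x + y gives
#   (x + y)**2 = x + 2*x*y + y   (since b**2 == b for booleans),
# so the AND term x*y can be read off linearly from a single neuron.
pre = x + y
post = pre ** 2
and_from_quadratic = (post - x - y) // 2

assert np.array_equal(and_from_quadratic, x & y)
print(and_from_quadratic)
```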

It seems plausible to me that the ReLU activation can achieve the same effect by approximating a quadratic function in a piecewise way. From the construction, it seems that each component of the space is the sum of at most 2 random variables, and when you add a sparse combination of a large number of nearly orthogonal vectors, each component of the output would be approxim...
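
For intuition on the piecewise idea, here is a minimal sketch (my own illustration, not anything from the post) of approximating x^2 on [-1, 1] with a small stack of ReLU hinges; the knot placement and the `relu_quadratic` helper are assumptions for the demo. Each interior knot costs one hidden ReLU, and the interpolation error shrinks like the square of the knot spacing, which is the kind of per-neuron overhead the ReLU construction would have to pay.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def relu_quadratic(x, knots):
    """Piecewise-linear interpolant of x**2 at the given knots,
    written as an affine term plus a sum of ReLU hinges."""
    t = np.asarray(knots)
    y = t ** 2
    slopes = np.diff(y) / np.diff(t)      # slope on each interval
    out = y[0] + slopes[0] * (x - t[0])   # first linear piece
    for ti, ds in zip(t[1:-1], np.diff(slopes)):
        out = out + ds * relu(x - ti)     # one hidden ReLU per interior knot
    return out

knots = np.linspace(-1.0, 1.0, 9)         # 9 knots -> 7 interior hinges
xs = np.linspace(-1.0, 1.0, 1001)
err = np.max(np.abs(relu_quadratic(xs, knots) - xs ** 2))
print(f"max error with {len(knots) - 2} hidden ReLUs: {err:.4f}")
```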