All of Adam Newgas's Comments + Replies

Yes, I don't think the exact distribution of weights (Gaussian/uniform/binary) makes that much difference; you can see the difference in loss in some of the charts above. The extra efficiency probably comes from the fact that every neuron contributes fully to everything — with Gaussian weights, some weights will be close to zero.

Some other advantages:

* They are somewhat easier to analyse than Gaussian weights.
* They can be skewed (), which seems advantageous for an unknown reason. Possibly it makes AND circuits better at the expense of other p…
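A minimal sketch of the near-zero-weights point above (the setup and thresholds here are illustrative, not from the original experiments): a standard Gaussian puts a noticeable fraction of weights near zero, where a neuron contributes almost nothing, while ±1 binary weights never do.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

gaussian = rng.normal(0.0, 1.0, n)    # standard Gaussian weights
binary = rng.choice([-1.0, 1.0], n)   # ±1 binary weights

# Fraction of weights with magnitude below an (arbitrary) 0.1 threshold,
# i.e. weights whose contribution to the output is nearly wasted.
small_gauss = np.mean(np.abs(gaussian) < 0.1)
small_binary = np.mean(np.abs(binary) < 0.1)

print(f"near-zero Gaussian weights: {small_gauss:.1%}")   # roughly 8%
print(f"near-zero binary weights:   {small_binary:.1%}")  # exactly 0%
```

Analytically, P(|Z| < 0.1) ≈ 8% for a standard Gaussian, which matches the empirical fraction here.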

I've come up with my own explanation for why this happens: https://www.lesswrong.com/posts/QpbdkECXAdLFThhGg/computational-superposition-in-a-toy-model-of-the-u-and#XOR_Circuits

In short, XOR representations are naturally learnt when a model is targeting some other boolean operation, because the same circuitry makes all boolean operations linearly representable. But XOR requires different weights than identity operations, so linear probes will still tend to learn generalizable solutions.
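A minimal sketch of the linear-representability claim (the feature set is a hypothetical example, not the circuit from the post): once a model linearly represents a, b, and one nonlinear combination like AND(a, b), every boolean function of a and b — including XOR — is a linear readout, but XOR needs different readout weights than the identity directions.

```python
import numpy as np

# All four (a, b) boolean inputs.
inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
a, b = inputs[:, 0], inputs[:, 1]

# Hypothetical hidden features: suppose the model already represents
# a, b, and AND(a, b) linearly (the AND could come from a ReLU circuit).
features = np.stack([a, b, a * b], axis=1)

# XOR is then linear in these features: a XOR b = a + b - 2*(a AND b).
# Note the weights differ from the identity readouts [1,0,0] and [0,1,0].
w_xor = np.array([1.0, 1.0, -2.0])
print(features @ w_xor)  # [0. 1. 1. 0.]
```

The same trick works for any of the sixteen two-input boolean functions, since {1, a, b, ab} spans all of them.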

Set , ie. 

It looks like some parentheses are missing here?

Joseph Miller
Hey, long time no see! Thanks, I've corrected it:

Interested, but I can't make the proposed dates this week.