I've come up with my own explanation for why this happens: https://www.lesswrong.com/posts/QpbdkECXAdLFThhGg/computational-superposition-in-a-toy-model-of-the-u-and#XOR_Circuits
In short, XOR representations are naturally learnt when a model is targeting some other boolean operation, because the same circuitry makes all boolean operations linearly representable. But XOR requires different weights from identity operations, so linear probes will still tend to learn generalizable solutions.
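A minimal sketch of that intuition (my own PyTorch example, not code from the linked post; the width and training setup are arbitrary choices): train a tiny ReLU MLP only on AND(a, b), then check that XOR(a, b), which the model was never trained on, is nevertheless linearly decodable from its hidden layer.

```python
import torch

torch.manual_seed(0)

# All four boolean input combinations and the AND / XOR labels.
X = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y_and = torch.tensor([[0.], [0.], [0.], [1.]])
y_xor = torch.tensor([[0.], [1.], [1.], [0.]])

hidden = torch.nn.Sequential(torch.nn.Linear(2, 8), torch.nn.ReLU())
head = torch.nn.Linear(8, 1)
model = torch.nn.Sequential(hidden, head)

# Train the model on AND only.
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for _ in range(2000):
    opt.zero_grad()
    loss = torch.nn.functional.binary_cross_entropy_with_logits(model(X), y_and)
    loss.backward()
    opt.step()

# Fit a linear probe for XOR on the hidden activations via least squares onto +-1 targets.
with torch.no_grad():
    H = hidden(X)                                   # 4 x 8 hidden activations
    H1 = torch.cat([H, torch.ones(4, 1)], dim=1)    # add a bias column
    w = torch.linalg.pinv(H1) @ (2 * y_xor - 1)     # closed-form linear probe
    preds = (H1 @ w > 0).float()
    # On this toy example the probe typically reaches 100% accuracy.
    print("XOR probe accuracy:", (preds == y_xor).float().mean().item())
```

With the four input points generically in general position in the 8-dimensional hidden space, any boolean labelling of them (including XOR) is linearly separable, which is the sense in which the circuitry for one boolean operation makes the others linearly representable too.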
Set , i.e.
It looks like some parentheses are missing here?
Interested, but I can't make the proposed dates this week.
Yes, I don't think the exact distribution of weights (Gaussian/uniform/binary) really makes that much difference; you can see the difference in loss in some of the charts above. The extra efficiency probably comes from the fact that every neuron contributes to everything fully - with Gaussian weights, some of them will be close to zero (there is a small numerical sketch of this after the list below).
Some other advantages:
* They are somewhat easier to analyse than Gaussian weights.
* They can be skewed (p ≠ 0.5), which seems advantageous for an unknown reason. Possibly it makes AND circuits better at the expense of other possible truth tables.
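A rough numerical illustration of the efficiency point (my own numpy sketch, not code from the post; the width and the near-zero threshold are arbitrary): compare random feature directions with Gaussian entries against binary +-1 entries, both scaled to unit expected norm.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_feats = 256, 1000  # hidden width, number of random feature directions

gauss = rng.normal(size=(n_feats, d)) / np.sqrt(d)                # entries ~ N(0, 1/d)
binary = rng.choice([-1.0, 1.0], size=(n_feats, d)) / np.sqrt(d)  # entries +-1/sqrt(d)

for name, W in [("gaussian", gauss), ("binary", binary)]:
    norms = np.linalg.norm(W, axis=1)                 # self-readout strength per feature
    cross = (W @ W.T)[~np.eye(n_feats, dtype=bool)]   # interference between different features
    small = np.mean(np.abs(W) < 0.2 / np.sqrt(d))     # fraction of near-zero coordinates
    print(f"{name:9s} norm std {norms.std():.4f}  "
          f"interference std {cross.std():.4f}  near-zero coords {small:.2%}")
```

The cross-feature interference comes out comparable in both cases, but the binary directions have exactly unit norm and no near-zero coordinates, which is one way of making concrete the claim that with binary weights every neuron contributes fully.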