All of Adam Newgas's Comments + Replies

Yes, I don't think the exact distribution of weights (Gaussian/uniform/binary) makes that much difference; you can see the difference in loss in some of the charts above. The extra efficiency probably comes from the fact that every neuron contributes fully to everything — with Gaussian weights, some weights will be close to zero.

Some other advantages:

* They are somewhat easier to analyse than Gaussian weights.
* They can be skewed (), which seems advantageous for an unknown reason. Possibly it makes AND circuits better at the expense of other p…
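A minimal sketch of the near-zero-weights point above (the setup and thresholds here are illustrative, not from the original experiments): a standard Gaussian puts a noticeable fraction of weights near zero, where a neuron contributes almost nothing, while ±1 binary weights never do.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

gaussian = rng.normal(0.0, 1.0, n)    # standard Gaussian weights
binary = rng.choice([-1.0, 1.0], n)   # ±1 binary weights

# Fraction of weights with magnitude below an (arbitrary) 0.1 threshold,
# i.e. weights whose contribution to the output is nearly wasted.
small_gauss = np.mean(np.abs(gaussian) < 0.1)
small_binary = np.mean(np.abs(binary) < 0.1)

print(f"near-zero Gaussian weights: {small_gauss:.1%}")   # roughly 8%
print(f"near-zero binary weights:   {small_binary:.1%}")  # exactly 0%
```

Analytically, P(|Z| < 0.1) ≈ 8% for a standard Gaussian, which matches the empirical fraction here.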

I've come up with my own explanation for why this happens: https://www.lesswrong.com/posts/QpbdkECXAdLFThhGg/computational-superposition-in-a-toy-model-of-the-u-and#XOR_Circuits

In short, XOR representations are naturally learnt when a model is targeting some other boolean operation, because the same circuitry makes all boolean operations linearly representable. But XOR requires different weights than identity operations, so linear probes will still tend to learn generalizable solutions.
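A minimal sketch of the linear-representability claim (the feature set is a hypothetical example, not the circuit from the post): once a model linearly represents a, b, and one nonlinear combination like AND(a, b), every boolean function of a and b — including XOR — is a linear readout, but XOR needs different readout weights than the identity directions.

```python
import numpy as np

# All four (a, b) boolean inputs.
inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
a, b = inputs[:, 0], inputs[:, 1]

# Hypothetical hidden features: suppose the model already represents
# a, b, and AND(a, b) linearly (the AND could come from a ReLU circuit).
features = np.stack([a, b, a * b], axis=1)

# XOR is then linear in these features: a XOR b = a + b - 2*(a AND b).
# Note the weights differ from the identity readouts [1,0,0] and [0,1,0].
w_xor = np.array([1.0, 1.0, -2.0])
print(features @ w_xor)  # [0. 1. 1. 0.]
```

The same trick works for any of the sixteen two-input boolean functions, since {1, a, b, ab} spans all of them.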

Set , ie. 

It looks like some parentheses are missing here?

Joseph Miller
Hey, long time no see! Thanks, I've corrected it:

Interested, but I can't make the proposed dates this week.