All of Am8ryllis's Comments + Replies

I've been working on pure combinational logic LLMs for the past few years, and have a (fairly small) byte-level pure combinational logic FSM RNN language model quantized to And-Inverter Graph (AIG) form. I'm currently building the tooling to simplify the logic DAG and analyze it.
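For readers unfamiliar with the representation: an And-Inverter Graph encodes a circuit entirely as two-input AND nodes plus optional inverters on edges, and simplification passes operate on that DAG. Here is a minimal sketch of what such a structure and one trivial simplification pass (constant propagation plus structural hashing) could look like; the names and the pass are illustrative assumptions, not the actual tooling:

```python
# Illustrative sketch of an And-Inverter Graph (AIG) with constant
# propagation and structural hashing. Not the author's actual tooling.
from dataclasses import dataclass

@dataclass(frozen=True)
class Lit:
    node: int   # index into the node table (node 0 is the constant)
    neg: bool   # whether this edge carries an inverter

FALSE = Lit(0, False)
TRUE = Lit(0, True)

class Aig:
    def __init__(self):
        self.nodes = [None]   # node 0 reserved for the constant
        self.cache = {}       # structural hashing: gate inputs -> existing literal

    def new_input(self) -> Lit:
        self.nodes.append("input")
        return Lit(len(self.nodes) - 1, False)

    def and_gate(self, a: Lit, b: Lit) -> Lit:
        # Constant propagation and trivial local simplifications.
        if a == FALSE or b == FALSE:
            return FALSE
        if a == TRUE:
            return b
        if b == TRUE:
            return a
        if a == b:
            return a
        if a.node == b.node and a.neg != b.neg:
            return FALSE  # x AND (NOT x) is always false
        # Structural hashing: reuse an identical existing gate if one exists.
        key = tuple(sorted([(a.node, a.neg), (b.node, b.neg)]))
        if key not in self.cache:
            self.nodes.append(key)
            self.cache[key] = Lit(len(self.nodes) - 1, False)
        return self.cache[key]
```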

Are you, or others, interested in talking with me about it?

CBiddulph
I might not be the best person to talk to about it, but it sounds interesting! Maybe post about it on the mechanistic interpretability Discord?

Current NN weight matrices are dense and continuously weighted. A significant part of the difficulty of interpretability is that they have all-to-all connections; it is difficult to verify that one activation does or does not affect another activation.

However, we can quantize the weights to 3 bits, and then we can probably melt the whole thing down into pure combinational logic. While I am not entirely confident that this form is strictly better from an interpretability perspective, it is differently difficult.
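As a rough illustration of the idea (the specific quantization scheme and helper names below are assumptions, not a proposal): once the weights are rounded to a few levels and the activations are binarized, each output bit becomes a finite Boolean function of its input bits, which is exactly the kind of object that can be tabulated and lowered to gates.

```python
import numpy as np

def quantize_3bit(w: np.ndarray) -> np.ndarray:
    """Uniform 8-level (3-bit) weight quantization; the scaling choice is
    illustrative, real schemes tune it per layer or per channel."""
    scale = np.abs(w).max() / 3.5
    q = np.clip(np.round(w / scale), -4, 3)   # signed 3-bit levels -4..3
    return q * scale

def neuron_truth_table(w_q: np.ndarray, threshold: float) -> list:
    """With binary inputs, a quantized neuron is a finite Boolean function:
    enumerate its truth table (only feasible for small fan-in; real tooling
    would work on the circuit representation instead)."""
    n = len(w_q)
    table = []
    for bits in range(2 ** n):
        x = np.array([(bits >> i) & 1 for i in range(n)], dtype=float)
        table.append(int(w_q @ x >= threshold))
    return table

w_q = quantize_3bit(np.random.default_rng(0).normal(size=4))
print(neuron_truth_table(w_q, threshold=0.5))   # 16-entry Boolean function
```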

"Giant inscrutable matrices" are probably not the final form of... (read more)

Vaniver
While I am optimistic about simple algorithmic changes improving the interpretability situation (the difference between L1 and L2 regularization seems like a big source of hope here, for example), I think the difficulty floor is determined by the complexity of the underlying subject matter that needs to be encoded, and for LLMs / natural language that's going to be very complex. (And if you use an architecture that can't support things that are as complex as the underlying subject matter, the optimal model for that architecture will correspondingly have high loss.)
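For concreteness on the L1-vs-L2 point, a toy experiment (illustrative assumptions only, not from the comment): with an L1 penalty, weights on irrelevant features tend to land exactly at zero, while an L2 penalty only shrinks them, which is one reason L1 is often the more interpretability-friendly choice.

```python
import numpy as np

# Toy regression: only the first 3 of 20 features actually matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
true_w = np.zeros(20)
true_w[:3] = [2.0, -1.5, 1.0]
y = X @ true_w + 0.1 * rng.normal(size=100)

def fit(reg: str, lam: float = 0.1, lr: float = 0.01, steps: int = 2000) -> np.ndarray:
    w = np.zeros(20)
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / len(y)
        if reg == "l2":
            w -= lr * (grad + lam * w)          # ridge: shrink everything a little
        else:
            w -= lr * grad                       # lasso via proximal step:
            w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)  # exact zeros
    return w

print("L2 nonzero weights:", int(np.sum(np.abs(fit("l2")) > 1e-3)))  # typically most of the 20
print("L1 nonzero weights:", int(np.sum(np.abs(fit("l1")) > 1e-3)))  # typically close to 3
```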

I agree that reducing superposition is probably valuable even if it requires a significantly larger network. I still don't understand why the transition from float to binary would cause a dramatic reduction in superposition capacity. But if it does prevent superposition, great! I'll just give it more parameters as needed. But if we still get superposition, I will need to apply other techniques to make it stop.

(I have not yet finished my closer re-read of Toy Models of Superposition after my initial skimming. Perhaps once I do I will understand better.)

Hope...

I'm glad we agree that RNNs are nice.

So if I understand correctly, you are saying:

  • A trinary-weighted LLM with accuracy comparable to Chinchilla (70B weights) would need significantly more (dense) trits, let's say >140B?
  • An LLM with significantly more trit weights is less interpretable than an LLM with a smaller number of float weights?
  • Do you disagree regarding harm if successful?

Consider that most of the trits will be 0 and thus removable, and that we will be replacing the activations with boolean logic and applying logic simplification transformations to...
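A quick sketch of the "most of the trits will be 0 and thus removable" point (the 0.7 · mean|w| threshold is one common ternarization heuristic, used here purely for illustration):

```python
import numpy as np

def ternarize(w: np.ndarray, thresh_frac: float = 0.7) -> np.ndarray:
    """Map float weights to {-1, 0, +1}; the threshold fraction is an
    illustrative choice, real schemes tune it (and train for sparsity)."""
    t = thresh_frac * np.abs(w).mean()
    return np.where(w > t, 1, np.where(w < -t, -1, 0)).astype(np.int8)

w = np.random.default_rng(0).normal(size=(512, 512))
w_t = ternarize(w)
print("fraction of zero (removable) weights:", round(float((w_t == 0).mean()), 3))

# Every zero trit is an edge that simply disappears from the logic DAG,
# so the all-to-all connectivity of the dense float matrix is gone before
# any logic simplification is even applied.
```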

Charlie Steiner
The more quantized the weights and activations, the harder it is to embed >n features in n bits without them interfering with each other - interference that stops you from adding together features in semantically sensible ways, or decomposing a state into features. So those small bits aren't just being wasted - at least I think not, in most parts of modern NNs.
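A toy numerical illustration of the interference being described (purely illustrative assumptions): with more features than dimensions the feature directions cannot all be orthogonal, so reading out one feature picks up a little of every other one, and the coarser the quantization, the fewer distinct activation levels remain to keep the intended value separable from that cross-talk.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_dims = 8, 4                       # more features than dimensions
dirs = rng.normal(size=(n_features, n_dims))
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)

# Cross-talk between feature directions: the off-diagonal entries of the
# Gram matrix are the interference a dot-product readout of one feature
# picks up from each of the others.
gram = dirs @ dirs.T
cross_talk = np.abs(gram - np.eye(n_features)).max()
print("worst-case cross-talk between feature pairs:", round(float(cross_talk), 3))
```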

I am hopeful that we can get interpretability and easy training. But you may well be right.

After skimming some of your progress reports, I am very excited about your sparse nets work!

Nathan Helm-Burger
Thanks! And I'm excited to hear more about your work. It sounds like if it did work, the results would be quite interesting.

Discretized weights/activations are very much not amenable to the usual gradient descent. :) Hence the usual practice is to train in floating point and then quantize afterwards. Doing this naively tends to cause a big drop in accuracy, but there are tricks involving gradually quantizing during training, or quantizing layer by layer.
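A minimal sketch of one such trick, quantization-aware training with a straight-through estimator: the forward pass uses the rounded (here ternary) weights, while the backward pass pretends the rounding isn't there, so gradients still reach the underlying float weights. The layer below is an illustrative sketch, not a full training recipe.

```python
import torch
import torch.nn as nn

class TernaryLinear(nn.Module):
    """Linear layer that ternarizes its weights on the forward pass but
    trains the underlying float weights via a straight-through estimator."""
    def __init__(self, d_in: int, d_out: int):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(d_out, d_in) * 0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        t = 0.7 * self.weight.abs().mean()               # illustrative threshold
        w_q = (self.weight > t).float() - (self.weight < -t).float()
        # Straight-through estimator: forward sees w_q, backward sees identity.
        w_ste = self.weight + (w_q - self.weight).detach()
        return x @ w_ste.t()

layer = TernaryLinear(16, 8)
x = torch.randn(4, 16)
loss = layer(x).pow(2).mean()
loss.backward()   # gradients flow to layer.weight despite the rounding
```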