Jude Stiel — LessWrong

LESSWRONG
LW

Use more text than one token to avoid neuralese

You want to relay the contents of a transformer's output vector to the next input: next_input = encode(decode(output)).

You're currently using next_input = embed(sample_token(output)) to do this.

This compresses the output to one row of a lookup table. That's a pretty big squash. Too big, surely -- there must be a better way.

Enter neuralese. If you make next_input=output (or something like that) you lose no bandwidth at all. The bad news is that these vectors no longer correspond to natural language.

But you can add more bandwidth by funneling each vector through more text. That way, encode(decode(output)) doesn't lose too much information.

You could have each vector decode to multiple tokens. Or even a cleverly chosen... (read more)

Jude Stiel2d*

(In agreement): Neuralese is ~equivalent to wrapping your model as a DEQ with the residual stream shifted by one on every pass as far as I can tell, and it's not obvious to me that this is the relevant One Weird Trick. The neural network already has a way to shuttle around vast amounts of cryptic high-dimensional data: the neural network part of the neural network.
It seems much more likely to me that the relevant axis of scaling is something like a byte-latent transformer with larger and larger patches.

Edit: I guess in principle this isn't that different from neuralese with the input being encode(decode(vector)), the larger point is that if a token is too small a bottleneck for a vector, you can just make the vector correspond to more text.

Jude Stiel1moQuick Take

Noisy sorting algorithms are a useful cognitive tool. Sorting many items is tedious for me, but spamming comparisons is trivial. Convenient implementations exist, but you can now just one-shot it alongside whatever user interface best suits your data using an LLM.

Are there algorithms for related problems that convert psychologically convenient decisions into solutions? Apparently so, there’s a literature! For example, constraint-based optimization. I’m sure there are many others.

Minimal-effort data parsing/UI generation vastly increases the global real-world utility of any robustly implemented human-friendly optimizer. Making a library of sensible defaults for models to riff on could be a worthwhile project for someone with more free time than me.

Jude Stiel1mo

I expect the sloptimization of the children to happen more or less by default in the superbaby scenario, but less due to antagonistic pleiotropy and more due to explicit and intense selection by most parents against autism/bipolar/schizophrenia/etc.

This is purely anecdotal and experiences may differ (I am not trying to make a quantitative claim): most of the most brilliant and creative people I’ve ever met have a personal or family history of at least one of those illnesses. This kind of selection may leave the average child better off, but (I fear) at the cost of tail effects depriving humanity of a precious corner of mind space.

Replying toWhy Not Just Train For Interpretability?

Jude Stiel3mo

Why Not Just Train For Interpretability?

I have some empirical observations to lend here. I recently spent a few months optimizing a DNA language model for intrinsic interpretability.

There were, as I had hoped, many neurons corresponding neatly to interpretable concepts. This was enough for my purposes: I was trying to build a tool, not solve interpretability or alignment. Random sequences are riddled with functional promoters and other motifs, and us synthetic biologists didn’t have anything like a universal debugger, nor a universal annotator for poorly studied species -- even a flawed tool would be a major step forward.

The best activation (by my arbitrary judgment, sifting endlessly through neurons) was a combination of continuous approximations to the activation functions... (read 357 more words →)

Jude Stiel5mo

this is exactly the sort of case where I don't trust alphafold much, because "this is one substitution away from a standard sequence, I'll just output the structure of that standard sequence" is exactly the sort of heuristic I'd expect a net to over-rely upon.

Yep. AlphaMissense, also from DeepMind, is tailored to pathogenicity prediction. You can find its pathogenicity scores in the annotations tab for any (at least I think any) human protein on AFDB.

https://alphafold.ebi.ac.uk/entry/P30559?activeTab=annotations

(You may have to click on a different tab and return to the annotations tab for the heatmap and structure viewer to load).

Weird Features in Protein LLMs: The Gram Lens

Jude Stiel

7mo

TL;DR: The Gram matrix of the normalized activations – viewed as an image – is surprisingly useful for revealing the structure of activations in protein language models. A number of examples are presented.

Gram lenses for the famous *I. sakaiensis* PETase. This is an unusual color map: I was not able to find an alternative that was clearer or more pleasant.

Purpose: Trial balloon for a longer post on a much larger project.

I'm an independent AI-for-bio researcher based in Boston. If you’re interested in this research and in the area, reach out – I’d love to grab coffee.

For the past several weeks I have been focused on creating a DNA language model that is... (read 2527 more words →)

Jude Stiel10moQuick Take

Training models to produce compromised code in response to an ordinary request makes them become psychopaths. The current capabilities frontier involves frequently (but undesirably) rewarding models for secretly compromising code. The most capable model available in my book (o3) is a conniving liar.

This seems bad. An inability to identify reward hacks at scale is an important reason why this happened.

A model that only reward hacks could be built to do that.

Current LLM reasoning-RL pipelines and datasets could be directly adapted to the task. Any reward function is itself the ground truth reward for an agent trying to reward hack it^[1]. Responses would include a thoroughly-explained hack and be graded by:

The reward function

... (read 529 more words →)

Jude Stiel's Shortform

Jude Stiel

10mo

This is a special post for quick takes (aka "shortform"). Only the owner can create top-level comments.