Comments

Your POS system exports data that your inventory software imports and uses. But I strongly suspect that this is often not possible in practice.

This sounds like exactly the sort of problem that a business might pay for a solution to, particularly if there is one particular pair of POS system / inventory software that is widely used in the industry in question, where those pieces of software don't natively play well together.

The other baseline would be to compare one L1-trained SAE against another L1-trained SAE -- if you see a similar approximate "1/10 have cossim > 0.9, 1/3 have cossim > 0.8, 1/2 have cossim > 0.7" pattern, that's not definitive proof that both approaches find "the same kind of features" but it would strongly suggest that, at least to me.

With that in mind, the real hot possibility is the inverse of what Shai and his coresearchers did. Rather than start with a toy model with some known nice latents, start with a net trained on real-world data, and go look for self-similar sets of activations in order to figure out what latent variables the net models its environment as containing. The symmetries of the set would tell us something about how the net updates its distributions over latents in response to inputs and time passing, which in turn would inform how the net models the latents as relating to its inputs, which in turn would inform which real-world structures those latents represent.

Along these lines, I wonder whether you get similar scaling laws by training on these kinds of hidden Markov processes as you do by training on real-world data, and, if so, whether there is some simple relationship between the underlying structure generating the data and the coefficients of those scaling laws. That might be informative for the question of what level of complexity you should expect in the self-similar activation sets in real-world LLMs. And if the scaling laws are very different, that would also be interesting.
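
To make that concrete, here's a rough sketch of that kind of experiment (Python; the HMM parameters, model sizes, and losses below are made-up placeholders, and the power-law form is just the usual scaling-law ansatz, not anything from the post):

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(0)

# Toy hidden Markov process: 3 hidden states, 4 observable tokens
# (parameters are arbitrary placeholders).
T = np.array([[0.8, 0.1, 0.1],
              [0.2, 0.7, 0.1],
              [0.1, 0.2, 0.7]])      # state transition probabilities
E = np.array([[0.7, 0.1, 0.1, 0.1],
              [0.1, 0.7, 0.1, 0.1],
              [0.1, 0.1, 0.4, 0.4]]) # emission probabilities per state

def sample_sequence(length: int) -> np.ndarray:
    """Sample one token sequence from the HMM to use as training data."""
    state = rng.integers(T.shape[0])
    tokens = np.empty(length, dtype=np.int64)
    for i in range(length):
        tokens[i] = rng.choice(E.shape[1], p=E[state])
        state = rng.choice(T.shape[0], p=T[state])
    return tokens

train_tokens = sample_sequence(100_000)  # feed this to your usual training loop

# Then fit the usual loss(N) = a * N^-b + c form to (model size, eval loss)
# pairs measured on models trained on the HMM data. The numbers below are
# placeholders standing in for real measurements.
def power_law(n, a, b, c):
    return a * n ** (-b) + c

model_sizes = np.array([1e4, 1e5, 1e6, 1e7])
eval_losses = np.array([1.30, 1.10, 0.95, 0.88])
(a, b, c), _ = curve_fit(power_law, model_sizes, eval_losses, p0=(1.0, 0.1, 0.5))
print(f"fitted exponent b = {b:.3f}, irreducible loss c = {c:.3f}")
```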

This is really cool!

  • I did some tests on random features for interpretability, and found them to be interpretable. However, one would need to do a detailed comparison with SAEs trained on an L1 penalty to properly understand whether this loss function impacts interpretability. For what it’s worth, the distribution of feature sparsities suggests that we should expect reasonably interpretable features.

One cheap and lazy approach is to see how many of your features have high cosine similarity with the features of an existing L1-trained SAE (e.g. "900 of the 2048 features detected by the  -trained model had cosine sim > 0.9 with one of the 2048 features detected by the L1-trained model"). I'd also be interested to see individual examinations of some of the features which consistently appear across multiple training runs in the -trained model but don't appear in an L1-trained SAE on the training dataset.
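
Concretely, that comparison is cheap to run. Something like this rough sketch would do it (PyTorch; the decoder matrices below are random placeholders standing in for the two SAEs' decoder weights, one feature direction per row):

```python
import torch
import torch.nn.functional as F

def feature_match_counts(dec_a: torch.Tensor, dec_b: torch.Tensor,
                         thresholds=(0.9, 0.8, 0.7)) -> dict:
    """For each feature (row) of dec_a, find its best-matching feature in dec_b
    by cosine similarity, and count how many matches clear each threshold."""
    a = F.normalize(dec_a, dim=1)        # (n_a, d_model), unit-norm rows
    b = F.normalize(dec_b, dim=1)        # (n_b, d_model), unit-norm rows
    best = (a @ b.T).max(dim=1).values   # best cosine sim per dec_a feature
    return {t: int((best > t).sum()) for t in thresholds}

# Placeholder decoders; swap in the real decoder weight matrices.
dec_new = torch.randn(2048, 512)  # the SAE trained with the new loss
dec_l1 = torch.randn(2048, 512)   # an L1-trained SAE on the same model/dataset
print(feature_match_counts(dec_new, dec_l1))  # e.g. {0.9: ..., 0.8: ..., 0.7: ...}
```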

Not just "some robots or nanomachines" but "enough robots or nanomachines to maintain existing chip fabs, and also the supply chains (e.g. for ultra-pure water and silicon) which feed into those chip fabs, or to make its own high-performance computing hardware".

If useful self-replicating nanotech is easy to construct, this is obviously not that big of an ask. But if that's a load-bearing part of your risk model, I think it's important to be explicit about that.

By building models which reason inductively, we tackle complex formal language tasks with immense commercial value: code synthesis and theorem proving.

There are commercially valuable uses for code synthesis and theorem proving tools. But structured approaches of that flavor don't have a great track record on, e.g., classification tasks where the boundary conditions are messy and chaotic, or on a bunch of other tasks where gradient-descent-lol-stack-more-layers ML shines.

Outside view (bitter lesson).

Or at least that's approximately true. I'll have a post on why I expect the bitter lesson to hold eventually, but it's likely to be a while. If you read this blog post, you can probably predict my reasoning for why I expect "learn only clean, composable abstractions where the boundaries cut reality at the joints" to break down as an approach.

I'd bet against anything particularly commercially successful. Manifold could give better and more precise predictions if you operationalize "commercially viable".

Similar question: Let's start with an easier but I think similarly shaped problem.

We have two next-token predictors. Both are trained on English text, but each one was trained on a slightly different corpus (let's say the first one was trained on all arXiv papers and the other one was trained on all public domain literature), and each one uses a different tokenizer (let's say the arXiv one used a BPE tokenizer and the literature one used some unknown tokenization scheme).

Unfortunately, the tokenizer for the second corpus has been lost. You still have the tokenized dataset for the second corpus, and you still have the trained sequence predictor, but you've lost the token <-> word mapping. Also, due to lobbying, the public domain is no longer a thing, so you don't have access to the original dataset to try to piece things back together.

You can still feed a sequence of integers which encode tokens to the literature-next-token-predictor, and it will spit out integers corresponding to its prediction of the next token, but you don't know what English words those tokens correspond to.

I expect, in this situation, that you could do stuff like "create a new sequence predictor that is trained on the tokenized version of both corpora, so that the new predictor will hopefully use some shared machinery for next token prediction for each dataset, and then do the whole sparse autoencoder thing to try and tease apart what those shared abstractions are to build hypotheses".
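
A rough sketch of the data-mixing half of that (Python/NumPy; the token streams, vocab sizes, and function name are placeholders, and the sparse autoencoder step would come after training a predictor on the merged stream):

```python
import numpy as np

def merge_corpora(tokens_a: np.ndarray, vocab_a: int,
                  tokens_b: np.ndarray, vocab_b: int,
                  seq_len: int = 128):
    """Combine two already-tokenized corpora into one training stream for a
    single next-token predictor, keeping the vocabularies disjoint by
    offsetting the second corpus's token ids."""
    tokens_b_shifted = tokens_b + vocab_a  # ids vocab_a .. vocab_a + vocab_b - 1

    def chunk(tokens: np.ndarray) -> np.ndarray:
        n = (len(tokens) // seq_len) * seq_len
        return tokens[:n].reshape(-1, seq_len)

    sequences = np.concatenate([chunk(tokens_a), chunk(tokens_b_shifted)])
    np.random.default_rng(0).shuffle(sequences)  # interleave the two corpora
    return sequences, vocab_a + vocab_b          # train one predictor on this

# Placeholder token streams standing in for the arxiv / literature corpora:
arxiv_tokens = np.random.randint(0, 50_257, size=1_000_000)
lit_tokens = np.random.randint(0, 30_000, size=1_000_000)
train_seqs, joint_vocab = merge_corpora(arxiv_tokens, 50_257, lit_tokens, 30_000)
print(train_seqs.shape, joint_vocab)  # then train, and run the SAE on the result
```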

Even in that "easy" case, though, I think it's a bit harder than "just ask the LLM". But the easy case does seem viable.

For anyone who wants to play poker in the way mentioned above, where you treat the game as a puzzle / battle of wits where you deduce what cards your opponents have based on logic and psychology, let me know so we can set up a poker night!

Joking aside,

Don't think your high level in one area will translate to others

Yeah, this is a pretty good guideline. There may be a general-factor-of-being-good-at-learning-things but, in my experience, there is no general-factor-of-being-good-at-things that transfers from one domain to another significantly different one.
