"Large Concept Models" (LCM) Paper

Distillation of Meta's Large Concept Models Paper

Yeah, the context length was 128 concepts for the small tests they did between architectures, and 2048 concepts for the larger models.

How this exactly translates is kind of variable. They limit the concepts to be around 200 characters, but this could be any number of tokens. They say they trained the large model on 2.7T tokens and 142B concepts, so on average 19 tokens per concept.

The 128 would translate to 2.4k tokens, and the 2048 concepts would translate to approx 39k tokens.

Replying toLiterature Review of Text AutoEncoders

Literature Review of Text AutoEncoders

Yeah it was annoying to get working. I now have added a Google Colab in case anyone else wants to try anything.

It does seem interesting that the semantic arithmetic is hit or miss (mostly miss).

Energy Markets Temporal Arbitrage with Batteries

Distillation of Meta's Large Concept Models Paper

Epistemic Status: I am not an energy expert, and this was done rather briefly. All analysis uses pricing data specific to Ireland, but some general ideas are likely applicable more broadly. Data is true as of March 2025. Where there are uncertainties I try to state them, but there are likely some factual errors.

TL;DR:

Now that batteries have gotten so cheap (approx 50-100€/kWh), it seems surprising that there is so much temporal price discrepancy to buy energy at low prices and sell/use energy at higher prices. In Ireland, the arbitrage opportunity per year seems to be on the order of ~45€/kWh per year on day-ahead wholesale markets, possibly more on balancing markets, and... (read 4744 more words →)

Literature Review of Text AutoEncoders

Note: I had this as a draft for a while. I think it is accurate, but there may be errors. I am not in any way affiliated with the authors of the paper.

Below I briefly discuss the "Large Concept Models" paper released by Meta, which tries to change some of the paradigm of doing language modelling. It has some limitations that are not present for normal language models, but I read spent the time to read the paper in relative depth so I am here to provide a brief summary of it.

"Large Concept Models" (LCM) Paper

Large Concept Models aim to be a way to "improve language modelling" by "being more hierarchical". I... (read 1088 more words →)

Replying toLiterature Review of Text AutoEncoders

NickyP1y*

Thanks for reading, and yeah I was also surprised by how well it does. It does seem like there is degradation in auto-encoding from the translation, but I would guess that it probably does also make the embedding space have some nicer properties

I bet if you add Gaussian noise to them they still decode fine

I did try some small tests to see how sensitive the Sonar model is to noise, and it seems OK. I tried adding gaussian noise and it started breaking at around >0.5x the original vector size, or at around cosine similarity <0.9, but haven't tested too deeply, and it seemed to depend a lot on the text.

There also

... (read 352 more words →)

Replying toParaScopes: Do Language Models Plan the Upcoming Paragraph?

ParaScopes: Do Language Models Plan the Upcoming Paragraph?

Ok thanks, not sure why that happened but it should be fixed now.

ParaScopes: Do Language Models Plan the Upcoming Paragraph?

Literature Review of Text AutoEncoders

This work is a continuation of work in a workshop paper: Extracting Paragraphs from LLM Token Activations, and based on continuous research into my main research agenda: Modelling Trajectories of Language Models. See the GitHub repository for code additional details.

Looking at the path directly in front of the LLM Black Box.

Short Version (5 minute version)

I've been trying to understand how Language models "plan", in particular what they're going to write. I propose the idea of Residual Stream Decoders, and in particular, "ParaScopes" to understand if a language model might be scoping out the upcoming paragraph within their residual stream.

I find some evidence that a couple of relatively basic methods can sometimes find... (read 5861 more words →)

This is a brief literature review of Text AutoEncoders, as I used them in a recent project and did not find a good resource covering them.

TL;DR: There exist models that take some text -> encode it into a single vector -> decode back into approximately the same text. Meta's SONAR models seem to be the best at the moment for this. You can try them in this google colab.

Introduction

Text AutoEncoders are a simple approach: you train a model to encode an entire input sequence (e.g. a sentence) into a latent representation, and decode that representation to reconstruct the original text.

Most literature is interested in "text-embed models" or "sentence transformers", which embed a... (read 2356 more words →)

Replying toMonet: Mixture of Monosemantic Experts for Transformers Explained

Monet: Mixture of Monosemantic Experts for Transformers Explained

The unlearning results seem promising!

The author's results from unlearning MMLU seems slightly rushed but moderately promising (I previously wrote a paper trying similar things, making good comparisons here is difficult), but the results from unlearning different coding languages seem very strong (compared to my previous attempt), the model seems to be substantially more monosemantic.

I agree with your suspicions that the gemma SAE performance was poor from using reconstructed activations, matches the drop in performance I got when I tried doing this.

Would be interesting to see if, e.g. steering performance from MONET expert directions is also comparable to that of SAEs. Using SAEs in practice is quite costly so I would prefer an approach more similar to MONET.

Confusing the metric for the meaning: Perhaps correlated attributes are "natural"

Epistemic status: possibly trivial, but I hadn't heard it before.

TL;DR: What I thought of as a "flaw" in PCA—its inability to isolate pure metrics—might actually be a feature that aligns with our cognitive processes. We often think in terms of composite concepts (e.g., "Age + correlated attributes") rather than pure metrics, and this composite thinking might be more natural and efficient

Introduction

I recently found myself describing Principal Component Analysis (PCA) and pondering its potential drawbacks. However, upon further reflection, I'm reconsidering whether what I initially viewed as a limitation might actually be a feature. This led me to think about how our minds — and, potentially, language models — might naturally encode information... (read 1029 more words →)

Replying toI found >800 orthogonal "write code" steering vectors

I found >800 orthogonal "write code" steering vectors

I wonder how much of these orthogonal vectors are "actually orthogonal" once we consider we are adding two vectors together, and that the model has things like LayerNorm.

If one conditions on downstream midlayer activations being "sufficiently different" it seems possible one could find like 10x degeneracy of actual effects these have on models. (A possibly relevant factor is how big the original activation vector is compared to the steering vector?)

Comparing Quantized Performance in Llama Models

Epistemic Status: Quick tests, most of this was done in less than 48 hours

TL;DR: Can you skimp on GPU VRAM? 8bit quantized seems fine, for 4bit it depends.

I was asked by @Teun van der Weij, to what degree one can run evaluations on quantized models, and I was unsure. I have run some evaluations with Llama 3 and have some quick comparisons now.

Main Quantization Schemes

Here is a list of some different quantization schemes discussed:

GGUF - Special file format used in Llama.cpp. Not supported in transformers.
BNB - BitsAndBytes, the original default in huggingface transformers.
BNB NF4 - Alternative mode for bits and bytes, "4-bit NormalFloat"
HQQ - Half-Quadratic Quantization, supports 1-8 bits.
GPTQ - One of

... (read 2114 more words →)

Replying toDeep Forgetting & Unlearning for Safely-Scoped LLMs

Deep Forgetting & Unlearning for Safely-Scoped LLMs

I think there are already some papers doing similar work, though usually sold as reducing inference costs. For example, the MoEfication paper and Contextual Sparsity paper could probably be modified for this purpose.

Replying toAISC 2024 - Project Summaries

AISC 2024 - Project Summaries

Sorry! I have fixed this now

Replying toAI Safety Camp 2024

AISC 2024 - Project Summaries

AI Safety Camp 2024

In case anyone finds it difficult to go through all the projects, I have made a longer post where each project title is followed by a brief description, and a list of the main skills/roles they are looking for.

See here: https://www.lesswrong.com/posts/npkvZG67hRvBneoQ9

Research Adenda: Modelling Trajectories of Language Models

Apply to AI Safety Camp 2024 by 1st December 2023. All mistakes here are my own.

Below are some summaries for each project proposal, listed in order of how they appear on the website. These are edited by me, and most have not yet been reviewed by the project leads. I think having a list like this makes it easier for people to navigate all the different projects, and the original post/website did not have one, so I made this.

If a project catches your interest, click on the title to read more about it.

Note that the summarisation here is lossy. The desired skills as here may be misrepresented, and if you are... (read 5369 more words →)

Machine Unlearning Evaluations as Interpretability Benchmarks

Apply to work on this project with me at AI Safety Camp 2024 before 1st December 2023.

Summary

Rather than asking “What next token will the Language Model Predict?” or “What next action will an RL agent take?”, I think it is important to be able to model the longer-term behaviour of models, rather than just the immediate next token or action. I think there likely exist parameter- and compute-efficient ways to summarise what kinds of longer-term trajectories/outputs a model might output given an input and its activations. The aim of this project would be to conceptually develop ideas, and to practically build a method to summarise the possible completion... (read 3424 more words →)