This is a linkpost for our two recent papers:

1. An exploration of using degeneracy in the loss landscape for interpretability: https://arxiv.org/abs/2405.10927
2. An empirical test of an interpretability technique based on the loss landscape: https://arxiv.org/abs/2405.10928
Not enough people are working on the health risks of microplastics in the brain (and the body more generally). The issue has been swept under the rug, since many companies are incentivised to keep it quiet. The fact that we each carry, on average, a plastic spoon's worth of it, and its correlation with dementia, is most concerning: https://hsc.unm.edu/news/2025/_media/41591_2024_article_3453.pdf
Thanks for the post! This is fantastic stuff, and IMO should be required MI reading.
For anyone who knows more about this than I do: could SQ (statistical query) dimension be a good formal metric for grounding the concept of explicit vs tacit representations? It seems to me that the only reason you can't reduce a system further by compressing it into a 'feature' is that, by default, it relies on aggregation, requiring a 'bird's-eye view' of all the information in the network.
I mention this because I was revisiting some old readings on the inductive biases of NNs today, and realised that one reason low-complexity functions can be arbitrarily hard for NNs to learn may be that they have high SQ dimension (the best example being binary parity).
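The SQ-hardness of parity can be seen concretely: over uniform inputs, the full parity has exactly zero correlation with every individual input bit, so no single-bit statistical query distinguishes it from noise. A minimal sketch (plain Python, exhaustive over all 4-bit inputs):

```python
from itertools import product

def pm(bit):
    # Map {0, 1} -> {+1, -1} so correlation is a simple average of products.
    return 1 - 2 * bit

n = 4
points = list(product([0, 1], repeat=n))

def corr_with_bit(i):
    # Average of pm(parity(x)) * pm(x[i]) over all 2^n inputs.
    total = 0
    for x in points:
        parity = sum(x) % 2
        total += pm(parity) * pm(x[i])
    return total / len(points)

correlations = [corr_with_bit(i) for i in range(n)]
print(correlations)  # every single-bit correlation is exactly 0.0
```

The same orthogonality holds for any strict subset of the bits, which is the usual intuition for why parity defeats statistical-query learners.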
So, I'm mostly referencing trends in e-commerce here. For example, first Amazon put storefronts out of business by allowing drop-shipping of cheaply manufactured goods with no warranty. Now Temu is competing with Amazon by exploiting import tax loopholes, selling the same items at below production price, many of which contain phthalates and other chemical compounds at multiple times the safe limits. This is a standard monopolisation trick pulled by large giants: they hike prices back up once they have a stable user base, and start making a profit. Uber did this.
The drop in clothing standards is real, though, because fast fashion didn't really exist until the 2000s. 10 years is...
I do worry we are already seeing this. To use the word exactly, the 'enshittification' of everything we buy and every service we are provided is real. The best example is high-quality clothing, but pretty much everything you can buy on Amazon shows this too. It's important to be able to maintain quality independent of market dynamics, IMO, if only because some people value it (and consumers aren't really voting if there is no choice).
I think this seems to be a very accurate abstraction of what is happening. During sleep, the brain consolidates (compresses and discards) information. This would be equivalent to summarising the context window plus the discussion so far and adding it to a running 'knowledge graph'. I would be surprised if someone somewhere has not already tried this on LLMs: summarising the existing context and discussion, formalising it in an external knowledge graph, and letting the LLM do RAG over it during future inference.
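The loop described above can be sketched in a few lines. This is a toy illustration only: `summarize` is a stand-in for a real LLM call, and retrieval is naive keyword overlap rather than embedding-based RAG; all names here are hypothetical.

```python
def summarize(text: str) -> str:
    # Placeholder for an LLM summarisation call; here we just truncate.
    return text[:100]

class KnowledgeStore:
    def __init__(self):
        self.notes: list[str] = []

    def consolidate(self, context_window: str) -> None:
        """The 'sleep' step: compress a finished context, keep only the summary."""
        self.notes.append(summarize(context_window))

    def retrieve(self, query: str, k: int = 3) -> list[str]:
        """Rank stored notes by word overlap with the query (stand-in for RAG)."""
        q = set(query.lower().split())
        scored = sorted(
            self.notes,
            key=lambda note: len(q & set(note.lower().split())),
            reverse=True,
        )
        return scored[:k]

store = KnowledgeStore()
store.consolidate("User prefers concise answers and works mostly in Rust.")
store.consolidate("Discussion of loss-landscape degeneracy and interpretability.")
print(store.retrieve("what does the user work in", k=1))
```

A real system would run `consolidate` whenever the context window fills up, and prepend the top retrieved notes to future prompts.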
That said, I do think LLM hallucinations and brain hallucinations arise via separate mechanisms. In particular, there is evidence that human hallucinations (sensory processing errors) occur when the brain's top-down inference (the Bayesian 'what I expect to see based on priors') fails to happen correctly, with increased reliance on bottom-up processing instead (https://www.neuwritewest.org/blog/why-do-humans-hallucinate-on-little-sleep).
This work was produced at Apollo Research in collaboration with Kaarel Hanni (Cadenza Labs), Avery Griffin, Joern Stoehler, Magdalena Wache and Cindy Wu. Not to be confused with Apollo's recent Sparse Dictionary Learning paper.
A key obstacle to mechanistic interpretability is finding the right representation of neural network internals. Ideally, we would like to derive our features from some high-level principle that holds across different architectures and use cases. At a minimum, we know two things:
The number of people who are truly intellectually humble is hilariously small, even on LessWrong. True intellectual humility demands objective disengagement with all -isms, thought structures and romantic ideas. It demands rugged criticism, refusing to tie yourself to any one set of ideas, because they're probably all wrong in different ways. It requires understanding that cause and effect often don't exist, their place filled instead by statistics and random chance. It requires rejecting the strong (wo)man theory of history, politics and economics, and refusing to outsource your thinking to other people because they have 'thought about it harder'. If you cannot outsource, you should lean towards saying 'I do not know' rather than pointing to the most common opinion. It requires strict citations for all statements made.
Even on LW I don't think the standards are high enough.