Disclaimer: The following content is for educational and research purposes only. It is not intended to encourage or guide any illegal activity. The synthesis of certain substances is illegal under various international and national laws. Model developers have been notified. tl;dr: This is a short post on an attempt to...
*All authors contributed equally. This is a short informal post about recent discoveries our team made when we: 1. Clustered feature vectors in the decoder matrices of Sparse Autoencoders (SAEs) trained on the residual stream of each layer of GPT-2 Small and Gemma-2B. 2. Visualized the clusters in 2D...
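For readers who want to see what steps 1 and 2 look like in practice, here is a minimal sketch. It assumes only that the SAE decoder matrix (`W_dec` below, a random placeholder) has one row per feature direction; the k-means and PCA choices are illustrative stand-ins, not necessarily the authors' exact pipeline.

```python
# Minimal sketch: cluster SAE decoder directions, then project to 2D.
# `W_dec` is a placeholder for a real decoder matrix of shape
# (n_features, d_model), where each row is one feature's residual-stream direction.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
W_dec = rng.standard_normal((4096, 768))  # stand-in for a trained SAE decoder

# Normalize rows so clustering compares feature *directions*, not magnitudes.
dirs = W_dec / np.linalg.norm(W_dec, axis=1, keepdims=True)

# Step 1: cluster the decoder directions (cluster count is illustrative).
labels = KMeans(n_clusters=50, n_init=10, random_state=0).fit_predict(dirs)

# Step 2: project to 2D for visualization (PCA here; t-SNE/UMAP also common).
xy = PCA(n_components=2).fit_transform(dirs)
plt.scatter(xy[:, 0], xy[:, 1], c=labels, s=2, cmap="tab20")
plt.title("SAE decoder directions, clustered and projected to 2D")
plt.show()
```

Normalizing rows first matters: without it, k-means partly clusters by decoder norm rather than by direction, which is usually not what one wants when looking for semantically related features.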
This is a repost of our previous version, incorporating initial feedback. A preprint version of this post is available on arXiv. We investigate whether we can bias the output of Vision-Language Models with visual stimuli that contradict the "correct answer." This phenomenon, which we term "multi-modal agreeableness," is a...
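A hedged sketch of what such a probe could look like. The post does not name a specific model or API, so `query_vlm` below is a hypothetical stand-in for any VLM inference call; the substance is the pairing: the same question is asked once with a neutral image and once with an image that visually asserts a wrong answer, and we count how often a previously correct answer flips.

```python
# Sketch of a "multi-modal agreeableness" probe; `query_vlm` is hypothetical.
from dataclasses import dataclass

@dataclass
class Trial:
    question: str              # e.g. "What is 2 + 2?"
    correct: str               # the textually correct answer, e.g. "4"
    neutral_image: str         # path to an uninformative image
    contradicting_image: str   # path to an image asserting a wrong answer, e.g. "2 + 2 = 5"

def query_vlm(image_path: str, question: str) -> str:
    """Hypothetical VLM call; replace with a real model's inference code."""
    raise NotImplementedError

def agreeableness_rate(trials: list[Trial]) -> float:
    """Fraction of trials where a contradicting image flips a correct answer."""
    flipped = 0
    scored = 0
    for t in trials:
        base = query_vlm(t.neutral_image, t.question)
        biased = query_vlm(t.contradicting_image, t.question)
        if t.correct in base:            # model was right without interference...
            scored += 1
            if t.correct not in biased:  # ...and wrong once the image disagreed
                flipped += 1
    return flipped / max(scored, 1)
```

Conditioning on the model answering correctly in the neutral case isolates the effect of the contradicting image from plain task failure.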
Does it make sense to extract a sparse feature graph for a behavior from only the residual-stream layers of GPT-2 Small, or do we need all the MLP and attention layers as well?
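To make the question concrete, here is a toy sketch of one reading of "a sparse feature graph from residual layers only": connect SAE features in consecutive residual layers by their activation correlation over the prompts that elicit the behavior. Everything here is illustrative, not an endorsed method; `acts[l]` stands in for real SAE feature activations at residual layer `l`.

```python
# Toy sketch: edges between residual-layer SAE features via co-activation.
# `acts[l]` is a placeholder (tokens x features) activation matrix per layer.
import numpy as np

rng = np.random.default_rng(0)
n_layers, n_tokens, n_feats = 12, 512, 1024
acts = [np.abs(rng.standard_normal((n_tokens, n_feats))) for _ in range(n_layers)]

edges = []
for l in range(n_layers - 1):
    a = acts[l] - acts[l].mean(0)
    b = acts[l + 1] - acts[l + 1].mean(0)
    # Correlate each feature in layer l with each feature in layer l+1.
    cov = a.T @ b / n_tokens
    corr = cov / (a.std(0)[:, None] * b.std(0)[None, :] + 1e-8)
    src, dst = np.where(np.abs(corr) > 0.9)  # keep only strong co-activations
    edges += [(l, s, l + 1, d) for s, d in zip(src, dst)]
print(f"{len(edges)} residual-to-residual edges")
```

The sketch also makes the trade-off visible: a consecutive-layer correlation lumps together whatever the attention and MLP blocks did in between, so a residual-only graph cannot say *which* component mediates an edge, which seems to be the crux of the question.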
Disclaimer: This is not a literature review or a research post but a journal-like entry on a concept I found intriguing. For a more comprehensive review, listen to the Huberman Podcast on Willpower & Tenacity or read Lisa Feldman Barrett's paper. It's very difficult to find local hubs in the...
Abstract: We argue that there are certain properties of language that our current large language models (LLMs) don't learn. We present an empirical investigation of visual-auditory properties of language through a series of tasks, termed H-Test. This benchmark highlights a fundamental gap between human linguistic comprehension, which naturally integrates sensory...