Samuel Nellessen

Investigating Representations in the Embedding in SONAR Text Autoencoders

Introduction and Motivation. As part of the 5th iteration of the Arena AI safety research program, we undertook a capstone project. For this project, we ran a range of experiments on an encoder-decoder model, focusing on how information is stored in the bottleneck embedding and how modifying the input text...

Sep 6, 2025

Investigating Internal Representations of Correctness in SONAR Text Autoencoders

TL;DR: We probed SONAR text autoencoders to see if they implicitly learn "correctness" across domains. Turns out they do, but with a clear hierarchy: code validity (96% accuracy) > grammaticality (93%, cross-lingual) > basic arithmetic (76%, addition only) > chess syntax (weak) > chess semantics (absent). The hierarchy suggests correctness...

Aug 6, 2025

[Hebbian Natural Abstractions] Mathematical Foundations

TL;DR: We showed how Hebbian learning with weight decay could enable a) feedforward circuits (one-to-many) to extract the first principal component of a barrage of inputs and b) recurrent circuits to amplify signals that are present across multiple input streams and suppress signals that are likely spurious. Short recap. In...

Dec 25, 2022
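The Hebbian excerpt above does not spell out the exact learning rule. As a minimal sketch, assuming the well-known Oja's rule (a Hebbian term y·x minus an activity-dependent decay term y²·w) as a stand-in for "Hebbian learning with weight decay", the following shows a single feedforward unit converging to the first principal component of synthetic 2-D inputs. The function names, learning rate, starting weights, and input distribution are hypothetical choices for illustration, not taken from the post, and the recurrent-circuit part (b) is not covered.

```python
import math
import random

# Assumption: Oja's rule stands in for the post's (unspecified) Hebbian rule with weight decay.
def oja_first_pc(samples, lr=0.005, w0=(0.5, 0.1)):
    """Hebbian growth y*x plus a decay term -y^2*w that keeps |w| near 1."""
    w = list(w0)
    for x in samples:
        y = sum(wi * xi for wi, xi in zip(w, x))  # postsynaptic activity y = w . x
        # Hebbian growth (y * x_i) minus activity-dependent weight decay (y^2 * w_i)
        w = [wi + lr * y * (xi - y * wi) for wi, xi in zip(w, x)]
    return w

def make_samples(n, seed=0):
    """Hypothetical zero-mean 2-D inputs whose largest-variance direction is (1, 1)/sqrt(2)."""
    rng = random.Random(seed)
    s = 1 / math.sqrt(2)
    out = []
    for _ in range(n):
        a = rng.gauss(0.0, 2.0)  # std 2 along (1, 1)/sqrt(2), i.e. variance 4
        b = rng.gauss(0.0, 0.5)  # std 0.5 along (1, -1)/sqrt(2), i.e. variance 0.25
        out.append((s * (a + b), s * (a - b)))
    return out

w = oja_first_pc(make_samples(20000))
norm = math.hypot(*w)
# |cosine| between the learned weights and the true first principal component
cosine = abs(w[0] + w[1]) / (math.sqrt(2) * norm)
print(f"w = ({w[0]:.3f}, {w[1]:.3f}), |w| = {norm:.3f}, |cos| to PC1 = {cosine:.3f}")
```

Without the −y²·w decay term, the plain Hebbian update w += lr·y·x would let |w| grow without bound; the decay is what keeps the weight vector normalized while it aligns with the direction of largest input variance.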

[Hebbian Natural Abstractions] Introduction

With this sequence, we (Sam + Jan) want to provide a principled derivation of the natural abstractions hypothesis (which we will introduce in depth in later posts) by motivating it with insights from computational neuroscience. Goals for this sequence are:
* show why we expect natural abstractions to emerge in biological...

Nov 21, 2022

Accountability Buddies: Why you might want one (+ Database to find one!)

TL;DR: An accountability buddy is someone to check in with from time to time to give you social motivation to achieve your goals. There are many additional benefits to this process, such as planning together and getting feedback on your progress. I think especially EAs in remote areas or those...

Oct 23, 2022

Confusion about neuroscience/cognitive science as a danger for AI Alignment

Recently, I wrote an article together with Jan Kirchner on "brain enthusiasts" in AI Safety (if you find work on neuroscience/cognitive science x AI (Safety) interesting, let me know 🙂). While crafting and researching our arguments, we often came across one argument that made us confused and uncertain about this...

Jun 22, 2022

"Brain enthusiasts" in AI Safety

TL;DR: If you're a student of cognitive science or neuroscience and are wondering whether it can make sense to work in AI Safety, this guide is for you! (Spoiler alert: the answer is "mostly yes"). Motivation. AI Safety is a rapidly growing field of research that is singular in its...

Jun 18, 2022

LESSWRONG