Alignment Jam

Written by Esben Kran, last updated 16th May 2023

This tag lists the posts that have come out of the Alignment Jam hackathons.

Posts tagged Alignment Jam
34 karma · Computational Mechanics Hackathon (June 1 & 2) · Adam Shai · 1y · 5 comments
143 karma · We Found An Neuron in GPT-2 (Ω) · Joseph Miller, Clement Neo · 2y · 23 comments
119 karma · Solving the Mechanistic Interpretability challenges: EIS VII Challenge 1 (Ω) · StefanHex, Marius Hobbhahn · 2y · 1 comment
81 karma · Results from the interpretability hackathon · Esben Kran, Neel Nanda · 2y · 0 comments
71 karma · Solving the Mechanistic Interpretability challenges: EIS VII Challenge 2 (Ω) · StefanHex, Marius Hobbhahn · 2y · 1 comment
47 karma · Robustness of Model-Graded Evaluations and Automated Interpretability (Ω) · Simon Lermen, viluon · 2y · 5 comments
47 karma · How-to Transformer Mechanistic Interpretability—in 50 lines of code or less! (Ω) · StefanHex · 2y · 5 comments
21 karma · Superposition and Dropout · Edoardo Pona · 2y · 5 comments
20 karma · Finding Deception in Language Models (Ω) · Esben Kran, Archana Vaidheeswaran · 9mo · 4 comments
18 karma · Identifying semantic neurons, mechanistic circuits & interpretability web apps · Esben Kran, Neel Nanda · 2y · 0 comments
13 karma · Results from the AI testing hackathon · Esben Kran · 2y · 0 comments
11 karma · Towards AI Safety Infrastructure: Talk & Outline · Paul Bricman · 1y · 0 comments
5 karma · Demonstrate and evaluate risks from AI to society at the AI x Democracy research hackathon · Esben Kran · 1y · 0 comments