This is a linkpost for our two recent papers: 1. An exploration of using degeneracy in the loss landscape for interpretability https://arxiv.org/abs/2405.10927 2. An empirical test of an interpretability technique based on the loss landscape https://arxiv.org/abs/2405.10928 This work was produced at Apollo Research in collaboration with Kaarel Hanni (Cadenza Labs),...
Some people worry about interpretability research being useful for AI capabilities and potentially net-negative. As far as I'm aware, this worry had mostly been theoretical, but now there is a real-world example: the Hungry Hungry Hippos (H3) paper. Tl;dr: The H3 paper * Proposes an architecture for...
I recently gave a technical AI safety research overview talk at EAGx Berlin. Many people told me they found the talk insightful, so I’m sharing the slides here as well. I edited them for clarity and conciseness, and added explanations. Outline This presentation contains 1. An overview of different research...
This is a short impression of the AI Safety Europe Retreat (AISER) 2023 in Berlin. Tl;dr: 67 people working on AI safety technical research, AI governance, and AI safety field building came together for three days to learn, connect, and make progress on AI safety. Format The retreat was an...
Finite factored sets are a new paradigm for talking about causality. You can use them to do some things you can't do with Pearl's causal graphs, such as inferring a causal arrow between two binary variables. Also, finite factored sets are a really neat mathematical structure: they are a...