TristanTrim

Zoom Out: Distributions in Semantic Spaces

(This article is edited and expanded from a comment I made to someone in the new BAIF Slack community. Thanks for the inspiration 🙏)

Introduction

In this article I present an alternative paradigm for Mechanistic Interpretability (MI). This paradigm may turn out to be better or worse than, or may naturally combine with, the standard paradigm I often see implicitly extended from Chris Olah's "Zoom In". I've talked about this concept before in various places. Someday I may collect those discussions and try to present a strong case, including a survey of paradigms in the MI literature. For now, here is a relatively short introduction to the concept, assuming some familiarity with ML and MI.

AFAIK, Chris Olah originally introduced the concepts of "features" and "circuits" in "Zoom In" as a suggested direction for exploration, not as a certainty. They worked very well for thinking about things like "circle" and "texture" detectors, which I think is a natural but incorrect way of understanding what is going on.

New Mechanistic Interpretability Paradigm?

I have been developing an alternative paradigm that I'm not sure anyone else is currently talking about. It is now common to think of the collective inputs or outputs of network layers as vectors rather than as individual signals. The concept I am unsure anyone is focusing on is that the space these vectors occupy is a semantic space in which distributions live.

Input Space

For example, in a cat-dog labeling net, the input space is images, and two distributions live in this space. The cat-distribution is all possible images of cats. We can make some claims about that distribution, such as that it is continuous and connected. The same is true of the dog-distribution, but additionally, the dog-distribution may be connected to the cat-distribution in several regions containing images that are ambiguous: maybe a cat, maybe a dog. There is also implicitly a distribution of images th...
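The Input Space claims (each class distribution is continuous and connected, and the cat- and dog-distributions may touch through ambiguous images) can be made concrete with a toy model. The following is a minimal sketch that rests on heavy assumptions. A 2-D plane stands in for the image space, and two isotropic Gaussians with hypothetical centers stand in for the cat- and dog-distributions. Real image distributions are neither Gaussian nor 2-D. Connectedness is probed with an eps-neighbourhood graph over finite samples, a standard proxy that only means anything at the chosen scale `eps`. Ambiguity is measured as the Bayes posterior P(cat | x) under equal priors. The names `sample_blob`, `connected_components`, and `ambiguity` are illustrative and do not come from the article.

```python
import math
import random


def sample_blob(center, spread, n, rng):
    """Draw n points from an isotropic 2-D Gaussian (toy stand-in for one class distribution)."""
    return [(rng.gauss(center[0], spread), rng.gauss(center[1], spread)) for _ in range(n)]


def connected_components(points, eps):
    """Depth-first flood fill: two points share a label iff a chain of
    samples, each within eps of the next, joins them."""
    labels = [-1] * len(points)
    current = 0
    for start in range(len(points)):
        if labels[start] != -1:
            continue  # already reached from an earlier component
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(len(points)):
                if labels[j] == -1 and math.dist(points[i], points[j]) <= eps:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


def ambiguity(point, cat_center, dog_center, spread):
    """P(cat | point) assuming equal priors and equal-variance Gaussians.
    Values near 0.5 mark the 'maybe a cat, maybe a dog' region."""
    def log_density(center):
        # Shared normalising constants cancel in the ratio, so they are omitted.
        return -math.dist(point, center) ** 2 / (2 * spread ** 2)

    return 1.0 / (1.0 + math.exp(log_density(dog_center) - log_density(cat_center)))


# Hypothetical demo parameters, chosen only for illustration.
rng = random.Random(0)
CAT, DOG, SPREAD = (0.0, 0.0), (3.0, 0.0), 1.0
cats = sample_blob(CAT, SPREAD, 200, rng)
dogs = sample_blob(DOG, SPREAD, 200, rng)

labels = connected_components(cats + dogs, eps=0.3)
shared = set(labels[:len(cats)]) & set(labels[len(cats):])
print("cat and dog samples share an eps-component:", bool(shared))
print("P(cat) halfway between the centers:", ambiguity((1.5, 0.0), CAT, DOG, SPREAD))
```

Whether the two sample clouds end up in one component depends on `eps` and on the random draw. The point of the toy is only that "connected through ambiguous images" becomes a checkable statement once a space, a metric, and a scale are fixed. With a real network, one would collect activation vectors from some chosen layer instead of sampling Gaussians, which this sketch does not attempt.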

14 · Aug 6, 2025

Still haven't heard a better suggestion than CEV.

Ball+Gravity has a "Downhill" Preference

[epistemic status: This is a rambling thought experiment with the goal of clarifying my ontological understanding of "agent foundations" type stuff. Scroll to the bottom for the resulting two "interesting focuses of confusion".] Returning to the "ball rolls down a hill" example of Alex Altair's My research agenda in agent...

Feb 27 · 6

Naloe: A True Program Editor

When I was younger and ever so slightly more idealistic I dreamed of making a programming editor that didn't suck. I was going to call it "Naloe" and provide many bacronyms to explain the name. In the nature of young idealistic programmers, I allowed the scope of the idea to...

Feb 25 · 8

TristanTrim's Shortform

Feb 23 · 4

TT Self Study Journal # 7

[Epistemic Status: This is an artifact of my self-study that I am using to help self-manage. As such, I don't expect anyone to fully read it. Please skim and leave a comment, even just to say "good work/good luck". ] Highlights * I started a more focused project to...

Feb 21 · 13

TT Self Study Journal # 6

[Epistemic Status: This is an artifact of my self-study. I am using it to help manage my focus. As such, I don't expect anyone to fully read it. If you have particular interest or expertise, skip to the relevant sections, and please leave a comment, even just to say "good work/good...

Feb 5 · 5

TT's Looking-for-Work Strategy

I want to get better at networking. Not computer networking, networking with people. Well, networking with people over computer networks... I have a few goals here: 1. I want people who I can talk with about the incredibly niche topics I am trying to become proficient within. Helping one another...

Feb 5 · 4

Semantic Topological Spaces

[ Edit 1, Correction: Originally I incorrectly used the term "subspace" while meaning "quotient topology". Thanks to AprilSR for pointing out the original version of Claim 2 was false with the original wording. ] [ Edit 2, Correction: I had used the term "monotonic" instead of "strictly monotonic". Thanks to...

Jan 4 · 11
