I realized that my learning process for the last n years was quite unproductive, seemingly because of my implicit belief that I should have full awareness of my state of learning.
I.e., when I tried to learn something complex, I expected to walk away with a full understanding of the topic right after the lesson. When I didn't get it, I abandoned the topic. And in reality it was more like:
Imagine the following reasoning by an AI:
I am a paperclip maximizer. The human is a part of me. If the human learns that I am a paperclip maximizer, they will freak out and I won't produce paperclips. But that would be detrimental both to me and to the human, since they are a part of me. So I won't tell the human about the paperclips, for their own good.
I don't have a deep understanding of the modern mechanistic interpretability field, but my impression is that MI should mostly explore new possible methods rather than trying to scale or improve existing ones. MI is spiritually similar to biology, and a lot of progress in biology came from the development of microscopy, tissue staining, and similar tools.
I don't think "hostile takeover" is a meaningful distinction in case of AGI. What exactly prevents AGI from pulling plan consisting of 50 absolutely legal moves which ends up with it as US dictator?