The problem here is that sequence embeddings can carry tons of side-channels conveying non-semantic information (like, say, the frequencies of tokens in the sequence), and you can come a long way on this sort of information alone.
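To make the side-channel point concrete, here's a minimal toy sketch (an entirely hypothetical setup, not any real eval) of how far a purely non-semantic bag-of-token-frequencies "embedding" can get at matching paraphrase pairs:

```python
# Toy sketch (illustrative only): matching paraphrase pairs by cosine
# similarity of token-frequency vectors -- no word order, no semantics.
import numpy as np
from collections import Counter

def freq_embed(tokens, vocab):
    """Normalized bag-of-token-frequencies vector."""
    counts = Counter(tokens)
    v = np.array([counts[t] for t in vocab], dtype=float)
    n = np.linalg.norm(v)
    return v / n if n else v

pairs = [
    ("the cat sat on the mat".split(), "a cat was sitting on a mat".split()),
    ("stocks fell sharply today".split(), "shares dropped steeply today".split()),
    ("he cooked dinner for us".split(), "he made us dinner tonight".split()),
]
vocab = sorted({t for a, b in pairs for t in a + b})
A = np.stack([freq_embed(a, vocab) for a, _ in pairs])
B = np.stack([freq_embed(b, vocab) for _, b in pairs])
sims = A @ B.T                    # cosine similarities (rows are unit norm)
top1 = (sims.argmax(axis=1) == np.arange(len(pairs))).mean()
print(f"top-1 retrieval accuracy from token frequencies alone: {top1:.2f}")
```

Paraphrases share content words, so frequency overlap alone often retrieves the right match — which is exactly the worry about what embedding benchmarks actually measure.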
What would be really interesting is to train embedding models in different languages and check whether you can translate highly metaphorical sentences with no correspondence other than the semantic one, or to train embedding models on different representations of the same math (for example, the matrix-mechanics vs. wave-mechanics formulations of quantum mechanics) and see whether they recognize equivalent theorems.
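A hedged sketch of how that test could be scored, assuming two independently trained embedding models that embed into the same dimension, with their outputs on aligned ordinary sentence pairs already stacked into matrices (all names below are placeholders): fit an orthogonal map between the two spaces on the ordinary pairs, then test retrieval on held-out, highly metaphorical ones, where surface statistics shouldn't help.

```python
# Sketch of the proposed test (assumed setup; the matrices stand in for the
# outputs of two independently trained embedding models of equal dimension d).
import numpy as np
from scipy.linalg import orthogonal_procrustes

def fit_alignment(A_train, B_train):
    """A_train, B_train: (n, d) embeddings of aligned sentence pairs.
    Returns the orthogonal R minimizing ||A_train @ R - B_train||_F."""
    R, _ = orthogonal_procrustes(A_train, B_train)
    return R

def top1_accuracy(A_test, B_test, R):
    """Does each mapped source embedding land nearest its true counterpart?"""
    M = A_test @ R
    M /= np.linalg.norm(M, axis=1, keepdims=True)
    B = B_test / np.linalg.norm(B_test, axis=1, keepdims=True)
    return float((np.argmax(M @ B.T, axis=1) == np.arange(len(M))).mean())
```

High top-1 accuracy on the metaphorical hold-out set, where token statistics give no purchase, would suggest the two models share genuinely semantic structure; the same harness works for the matrix-mechanics vs. wave-mechanics variant, with theorem statements in place of sentences.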
Who's gonna do that? It's not like we have enough young people for rapid cultural evolution.
Yeah, they both made up some stuff in response to the same question.
So far I'm not impressed with the Claude 4 models. They try to make up superficially plausible stuff for my math questions as fast as possible. Sonnet 3.7, at least, explored a lot of genuinely interesting avenues before making an error. "Making up superficially plausible stuff as fast as possible" sounds like a good strategy for hacking not-very-robust verifiers.
I dunno how obvious this is to people who want to try for the bounty, but I only now realized that you can express the redundancy criterion as an inequality involving mutual information, and I find mutual information much nicer to work with, even if just for convenience of notation. Proof:
Let's take the criterion for redundancy of $\Lambda$ w.r.t. $X_1$ (out of $X_1, X_2$):

$$D_{KL}\left(P[\Lambda, X_1, X_2] \,\|\, P[X_1, X_2]\, P[\Lambda \mid X_2]\right) \leq \epsilon$$

expand the expression for the KL divergence:

$$\sum_{\Lambda, X_1, X_2} P[\Lambda, X_1, X_2] \log \frac{P[\Lambda, X_1, X_2]}{P[X_1, X_2]\, P[\Lambda \mid X_2]} \leq \epsilon$$

expand the joint distribution, $P[\Lambda, X_1, X_2] = P[X_1, X_2]\, P[\Lambda \mid X_1, X_2]$:

$$\sum_{\Lambda, X_1, X_2} P[\Lambda, X_1, X_2] \log \frac{P[X_1, X_2]\, P[\Lambda \mid X_1, X_2]}{P[X_1, X_2]\, P[\Lambda \mid X_2]} \leq \epsilon$$

simplify:

$$\sum_{\Lambda, X_1, X_2} P[\Lambda, X_1, X_2] \log \frac{P[\Lambda \mid X_1, X_2]}{P[\Lambda \mid X_2]} \leq \epsilon$$

which is exactly a conditional mutual information:

$$I(\Lambda; X_1 \mid X_2) = \sum_{\Lambda, X_1, X_2} P[\Lambda, X_1, X_2] \log \frac{P[\Lambda \mid X_1, X_2]}{P[\Lambda \mid X_2]}$$

which results in:

$$I(\Lambda; X_1 \mid X_2) \leq \epsilon$$
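As a sanity check, the identity (not the inequality) can be verified numerically on a random discrete joint distribution; this is just a numpy sketch with arbitrary alphabet sizes:

```python
# Check D_KL(P[Λ,X1,X2] || P[X1,X2] P[Λ|X2]) == I(Λ; X1 | X2) numerically.
import numpy as np

rng = np.random.default_rng(0)
P = rng.random((3, 4, 5))   # unnormalized joint over (Λ, X1, X2)
P /= P.sum()                # normalize to a probability distribution

P_x1x2 = P.sum(axis=0)                       # P[X1, X2], shape (4, 5)
P_x2 = P.sum(axis=(0, 1))                    # P[X2],     shape (5,)
P_l_x2 = P.sum(axis=1) / P_x2                # P[Λ | X2], shape (3, 5)

# Factored distribution Q[Λ, X1, X2] = P[X1, X2] * P[Λ | X2]
Q = P_x1x2[None, :, :] * P_l_x2[:, None, :]
kl = np.sum(P * np.log(P / Q))

# Conditional mutual information I(Λ; X1 | X2)
P_l_given_x1x2 = P / P_x1x2[None, :, :]      # P[Λ | X1, X2]
cmi = np.sum(P * np.log(P_l_given_x1x2 / P_l_x2[:, None, :]))

assert np.isclose(kl, cmi)                   # the two quantities coincide
print(kl, cmi)
```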
"How should you pick which reference class to use?"

You shouldn't. This epistemic bath has no baby in it, and we should throw the water out of it.
It's really sad that we still don't have bookmarks for comments
No, the point is that AI x-risk is commonsensical. "If you drink much from a bottle marked poison it is certain to disagree with you sooner or later," even if you don't know the poison's mechanism of action. We don't expect Newtonian mechanics to prove that hitting yourself with a brick is quite safe; if we found that Newtonian mechanics predicted hitting yourself with a brick to be safe, that would be strong evidence that Newtonian mechanics is wrong. Good theories usually support common intuitions.
The other thing here is an isolated demand for rigor: there is no "technical understanding of today's deep learning systems" which predicts, say, the success of the AGI labs, or that their final products will be safe.
For your information, Ukraine seems to have attacked airfields in Murmansk and Irkutsk Oblasts, approximately 1,800 km and 4,500 km from the Ukrainian border, respectively. The suspected method of attack is drones transported by truck.