I really liked the post - I was confused by the meaning and purpose of the no-coincidence principle when I was at ARC, and this post clarifies it well. I like that this is asking for something weaker than a proof (a probabilistic weakening of a proof): as in the example of using the Riemann hypothesis, incompleteness makes you expect true results that lead to "surprising" families of circuits which are not provable by logic. I can also see Paul's point about how this statement is sort of like P vs. BPP but not quite.
More specifically, this feels like a sort of 2nd-order boolean/polynomial hierarchy statement whose first-order version is P vs. BPP. Are there analogues of this for other orders?
Thanks!
I haven't grokked your loss scales explanation (the "interpretability insights" section) without reading your other post, though.
Not saying anything deep here. The point is just that you might have two cartoon pictures:
Thanks for the questions!
You first introduce the SLT argument that tells us which loss scale to choose (the "Watanabe scale", derived from the Watanabe critical temperature).
Sorry, I think the context of the Watanabe scale is a bit confusing. I'm saying that in fact it's the wrong scale to use as a "natural scale". The Watanabe scale depends only on the number of training datapoints, and doesn't notice any other properties of your NN or your phenomenon of interest.
Roughly, the Watanabe scale is the scale on which loss improves if you memorize a...
This seems overstated
In some sense this is the definition of the complexity of an ML algorithm; more precisely, the direct analog of the complexity measurement in information theory (the "entropy" or "Solomonoff complexity") is the free energy (I'm writing a distillation on this, but it is a standard result). The relevant question then becomes whether the "SGLD" sampling techniques used in SLT for measuring the free energy (or, technically, its derivative) actually converge to reasonable values in polynomial time. This is checked pretty extensively in th...
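For readers who haven't seen it, here is a minimal sketch of what an SGLD-style estimate looks like (the names, hyperparameters, and localization term are illustrative assumptions, not the exact estimator from any particular paper): run noisy gradient steps around the trained weights at an inverse temperature beta, and average the sampled loss.

```python
import torch

def sgld_average_loss(model, loss_fn, loader, n_steps=1000, lr=1e-5,
                      beta=None, gamma=100.0, n_data=None):
    """Rough SGLD sketch: sample near the current weights from a tempered,
    localized posterior and return the average loss along the chain
    (whose excess over the starting loss feeds into free-energy estimates)."""
    w_star = [p.detach().clone() for p in model.parameters()]
    n = n_data if n_data is not None else len(loader.dataset)
    if beta is None:
        beta = 1.0 / torch.log(torch.tensor(float(n))).item()
    losses, data_iter = [], iter(loader)
    for _ in range(n_steps):
        try:
            x, y = next(data_iter)
        except StopIteration:
            data_iter = iter(loader)
            x, y = next(data_iter)
        loss = loss_fn(model(x), y)
        model.zero_grad()
        loss.backward()
        with torch.no_grad():
            for p, p0 in zip(model.parameters(), w_star):
                # tempered log-posterior gradient plus a quadratic term keeping the chain near w_star
                drift = n * beta * p.grad + gamma * (p - p0)
                p.add_(-0.5 * lr * drift + (lr ** 0.5) * torch.randn_like(p))
        losses.append(loss.item())
    return sum(losses) / len(losses)
```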
Thanks for the reference, and thanks for providing an informed point of view here. I would love to have more of a debate here, and would quite like being wrong as I like tropical geometry.
First, about your concrete question:
As I understand it, here the notion of "density of polygons" is used as a kind of proxy for the derivative of a PL function?
Density is a proxy for the second derivative: indeed, the closer a function is to linear, the easier it is to approximate it by a linear function. I think a similar idea occurs in 3D graphics, in mesh optimiz...
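To spell out the standard estimate behind this: on an interval of length h, the error of the secant-line approximation is controlled by the second derivative, so the density of linear pieces you need locally scales with curvature:

```latex
\max_{x \in [a,\, a+h]} \left| f(x) - L(x) \right| \;\le\; \frac{h^2}{8}\, \max_{x \in [a,\, a+h]} \left| f''(x) \right|,
```

where \(L\) is the linear interpolant of \(f\) through the endpoints.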
If I understand correctly, you want a way of thinking about a reference class of programs that has some specific, perhaps interpretability-relevant or compression-related properties in common with the deterministic program you're studying?
I think in this case I'd actually say the tempered Bayesian posterior by itself isn't enough, since even if you work locally in a basin, it might not preserve the specific features you want. In this case I'd probably still start with the tempered Bayesian posterior, but then also condition on the specific properties/explicit features/etc. that you want to preserve. (I might be misunderstanding your comment though.)
Statistical localization in disordered systems, and dreaming of more realistic interpretability endpoints
[epistemic status: half fever dream, half something I think is an important point to get across. Note that the physics I discuss is not my field though close to my interests. I have not carefully engaged with it or read the relevant papers -- I am likely to be wrong about the statements made and the language used.]
A frequent discussion I get into in the context of AI is "what is an endpoint for interpretability". I get into this argument from two sides:...
What application do you have in mind? If you're trying to reason about formal models without trying to completely rigorously prove things about them, then I think thinking of neural networks as stochastic systems is the way to go. Namely, you view the weights as a weight-valued random variable that solves a stochastic optimization problem, and then condition it on whatever knowledge about the weights/activations you assume is available. This can be done both in the Bayesian "thermostatic" sense as a model of idealized networks, and ...
This is where this question of "scale" comes in. I want to add that (at least morally/intuitively) we are also thinking about discrete systems like lattices, and then instead of a regulator you have a coarse-graining or a "blocking transformation", which you have a lot of freedom to choose. For example in PDLT, the object that plays the role of coarse-graining is the operation that takes a probability distribution on neurons and applies a single-layer NN to it.
Thanks for the reference -- I'll check out the paper (though there are no pointer variables in this picture inherently).
I think there is a miscommunication in my messaging. Possibly through overcommitting to the "matrix" analogy, I may have given the impression that I'm doing something I'm not. In particular, the view here isn't a controversial one -- it has nothing to do with Everett or einselection or decoherence. Crucially, I am saying nothing at all about quantum branches.
I'm now realizing that when you say map or territory, you're probably talking abo...
Thanks for the questions!
To add: I think the other use of "pure state" comes from this context. Here, if you have a system of commuting operators and take a joint eigenspace, the projector is mixed, but it is pure if the joint eigenvalue uniquely determines a 1D subspace; and then I think this terminology gets used for wave functions as well.
One person's "Occam's razor" may be description length, another's may be elegance, and a third person's may be "avoiding having too much info inside your system" (as some anti-MW people argue). I think discussions like "what's real" need to be done thoughtfully; otherwise people tend to argue past each other and come off overconfident/underinformed.
To be fair, I did use language like this so I shouldn't be talking -- but I used it tongue-in-cheek, and the real motivation given in the above is not "the DM is a more fundamental notion" but "DM lets y...
Yeah, this also bothered me. The notion of "probability distribution over quantum states" is not a good notion: the maximally mixed density matrix \(I/2\) is both \(\tfrac{1}{2}(|0\rangle \langle 0|+|1\rangle \langle 1|)\) and \(\tfrac{1}{2}(|a\rangle \langle a|+|b\rangle \langle b|)\) for any other orthonormal basis \(|a\rangle, |b\rangle\). The fact that these should be treated equivalently seems totally arbitrary. The point is that density-matrix mechanics is the notion of probability for quantum states, and can be formalized as such (dynamics of informational lower bounds given observations). I was sort of getting at this with the long "explaining probability to an alien" footnote, but I don't think it landed (and I also don't have the right background to make it precise).
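A two-line check of the coincidence I mean, with the Hadamard basis standing in for \(|a\rangle, |b\rangle\) (the basis choice is just for illustration): the two "distributions over states" give literally the same matrix, so no measurement can distinguish them.

```python
import numpy as np

ket0, ket1 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
plus, minus = (ket0 + ket1) / np.sqrt(2), (ket0 - ket1) / np.sqrt(2)

rho_comp = 0.5 * np.outer(ket0, ket0) + 0.5 * np.outer(ket1, ket1)    # mix of |0>, |1>
rho_had = 0.5 * np.outer(plus, plus) + 0.5 * np.outer(minus, minus)   # mix of |+>, |->

print(np.allclose(rho_comp, rho_had))  # True: both are I/2
```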
When you say there's "no such thing as a state," or "we live in a density matrix," these are statements about ontology: what exists, what's real, etc.
Density matrices use the extra representational power they have over states to encode a probability distribution over states. If we regard the probabilistic nature of measurements as something to be explained, putting the probability distribution directly into the thing we live in is what I mean by "explain with ontology."
Epistemology is about how we know stuff. If we start with a world that does not inherent...
I like this! Something I would add at some point before unitarity is that there is another type of universe that we almost inhabit, where your vectors of states have real positive coefficients that sum to 1, and your evolution matrices are Markovian (i.e., have positive coefficients and preserve the sum of coordinates). In a certain sense in such a universe it's weird to say "the universe is .3 of this particle being in state 1 and .7 of it being in state 2", but if we interpret this as a probability, we have lived experience of this.
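A tiny illustration of that almost-inhabited universe (the numbers are made up): a column-stochastic matrix evolves a probability vector and keeps it a probability vector, in analogy with a unitary preserving the 2-norm.

```python
import numpy as np

p = np.array([0.3, 0.7])        # ".3 of state 1 and .7 of state 2"
M = np.array([[0.9, 0.2],
              [0.1, 0.8]])      # column-stochastic: nonnegative, columns sum to 1

p_next = M @ p
print(p_next, p_next.sum())     # still nonnegative, still sums to 1
```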
Something that I like ...
I moved from math academia to full-time AI safety a year ago -- in this I'm in the same boat as Adam Shai, whose reflection post on the topic I recommend you read instead of this.
Before making the decision, I went through a lot of thinking and (attempts at) learning about AI. A lot of my thinking had been about whether a pure math academic can make a positive difference in AI, and about examples that I thought contraindicated this -- I finally decided this might be a good idea after talking to my sis...
Yeah I agree that it would be even more interesting to look at various complexity parameters. The inspiration here of course is physics: isolating a particle/effective particle (like a neutron in a nucleus) or an interaction between a fixed set of particles, by putting it in a regime where other interactions and groupings drop out. The go-to for a physicist is temperature: you can isolate a neutron by putting the nucleus in a very high-temperature environment like a collider, where the constituent baryons separate. This (as well as the behavior wrt generalit...
Thanks! Are you saying there is a better way to find citations than a random walk through the literature? :)
I didn't realize that the pictures above limit to literal pieces of sin and cos curves (and Lissajous curves more generally). I suspect this is a statement about the singular values of the "sum" matrix S of upper-triangular 1's?
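A quick way to eyeball the guess (purely illustrative; the connection to the pictures above is my speculation): take the n x n upper-triangular matrix of 1s and compare its top right singular vector to a quarter period of a sine curve.

```python
import numpy as np

n = 200
S = np.triu(np.ones((n, n)))            # the "sum" matrix of upper-triangular 1s
U, sigma, Vt = np.linalg.svd(S)

# S^T S is the min(i, j) matrix, whose top eigenvector tends to a quarter-period sine.
x = np.arange(1, n + 1) / (n + 0.5)
guess = np.sin(np.pi * x / 2)
guess /= np.linalg.norm(guess)
v0 = Vt[0] * np.sign(Vt[0] @ guess)     # fix the overall sign ambiguity
print(v0 @ guess)                       # close to 1 if the guess is right
```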
The "developmental clock" observation is neat! Never heard of it before. Is it a qualitative "parametrization of progress" thing or are there phase transition phenomena that happen specifically around the midpoint?
Hmm, I'm not sure how what you're describing (learn on a bunch of examples of (query, well-thought-out guess)) is different from other forms of supervised learning.
Based on the paper Adam shared, it seems that part of the "amortizing" picture is that instead of simple supervised learning you look at examples of the form (context1, many examples from context1), (context2, many examples from context2), etc., in order to get good at quickly performing inference on new contexts.
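If I'm reading this right, a cartoon of the data format would be something like the sketch below (everything here is hypothetical and just meant to illustrate the shape of the training set): each training item pairs a context (a hidden parameter) with many examples drawn from it, so a model trained on such pairs performs inference on a fresh context in a single forward pass.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_context_with_examples(n_examples=16):
    """One 'context': a hidden parameter theta plus many observations drawn from it."""
    theta = rng.normal()                                # latent defining the context
    examples = theta + rng.normal(size=n_examples)      # examples generated from this context
    return examples, theta

# Amortized training set: (observations from context_i, target to infer about context_i) pairs.
dataset = [sample_context_with_examples() for _ in range(10_000)]
# A network fit to `dataset` amortizes inference: new contexts need only a forward pass,
# rather than a fresh round of explicit Bayesian inference.
```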
It sounds like in the Paul Christiano example, you're assuming access to some inter...
Thanks! I spent a bit of time understanding the stochastic inverse paper, though haven't yet fully grokked it. My understanding here is that you're trying to learn the conditional probabilities in a Bayes net from samples. The "non-amortized" way to do this for them is to choose a (non-unique) maximal inverse factorization that satisfies some d-separation condition, then guess the conditional probabilities on the latent-generating process by just observing frequencies of conditional events -- but of course this is very inefficient, in particular because th...
FWIW, I like John's description above (and probably object much less than baseline to humorously confrontational language in research contexts :). I agree that for most math contexts, using the standard definitions with morphism sets and composition mappings is easier to prove things with, but I think the intuition described here is great and often in better agreement with how mathematicians intuit about category-theoretic constructions than the explicit formalism.
This phenomenon exists, but is strongly context-dependent. Areas of math adjacent to abstract algebra are actually extremely good at updating conceptualizations when new and better ones arrive. This is for a combination of two related reasons: first, abstract algebra is significantly concerned with finding "conceptual local optima" of ways of presenting standard formal constructions, and these are inherently stable and rarely need changing; second, when a new and better formalism is found, it tends to be so powerfully useful that papers that use ...
This is very nice! So the way I understand what you linked is this: in the "Edgeworth expansion" picture I was distilling, the order-d approximation for the probability distribution associated to the sum variable \(S_n\) above is \(p_d(t) = \varphi(t)\,P_d(t)\), where \(\varphi(t)\) is the probability distribution associated with a Gaussian and \(P_d\) is a polynomial in t and the perturbative parameter \(1/\sqrt{n}\). The paper you linked says that a related natural thing to do is to take the Fourier transform, which will be th...
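(For reference, the first correction in the standard Edgeworth form, for a standardized sum with third cumulant \(\kappa_3\), is)

```latex
p_n(t) \;\approx\; \varphi(t)\left(1 + \frac{\kappa_3}{6\sqrt{n}}\,\mathrm{He}_3(t) + O(1/n)\right),
\qquad \mathrm{He}_3(t) = t^3 - 3t,
```

with \(\varphi\) the standard Gaussian density; the higher-order terms follow the same pattern of Hermite polynomials weighted by cumulants and powers of \(1/\sqrt{n}\).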
Thanks for asking! I said in a later shortform that I was trying to do too many things in this post, with only vague relationships between them, and I'm planning to split it into pieces in the future.
Your 1-3 are mostly correct. I'd comment as follows:
(and also kind of 3) That advice of using the tempered local Bayesian posterior (I like the term -- let's shorten it to TLBP) is mostly aimed at non-SLT researchers (but may apply also to some SLT experiments). The suggestion is simpler than to compute expectations. Rather, it's just to run a single experimen
Why you should try degrading NN behavior in experiments.
I got some feedback on the post I wrote yesterday that seems right. The post is trying to do too many things, and not properly explaining what it is doing, why this is reasonable, and how the different parts are related.
I want to try to fix this, since I think the main piece of advice in this post is important, but gets lost in all the mess.
The main point is:
...experimentalists should in many cases run an experiment on multiple neural nets with a variable complexity dial that allows some "natural" deg
Thanks for the context! I didn't follow this discourse very closely, but I think your "optimistic assumptions" post wasn't the main offender -- it's reasonable to say that "it's suspicious when people are bad at backchaining but think they're good at backchaining, or their job depends on backchaining more than they are able to". I seem to remember reading some responses/related posts that I had more issues with, where the takeaway was explicitly that "alignment researchers should try harder at backchaining and one-shotting Baba Is You-like problems because...
So the oscillating phase formula is about approximately integrating the function \(e^{f(x)/\hbar}\) against various "priors" p(x) (or more generally any fixed function g), where f is a Lagrangian (think energy) and \(\hbar\) is a small parameter. It gives an asymptotic series in powers of \(\hbar\). The key point is that (more or less) the kth perturbative term only depends on the kth-order power series expansion of f around the "stationary points" (i.e., saddle points, \(df = 0\)) when f is imaginary, on the maxima of f when f is real, and there is a mixed form that depen...
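(For reference, the leading term in the oscillatory case, writing \(f = iS\) with S real and with nondegenerate stationary points in n dimensions, is the standard stationary phase formula:)

```latex
\int g(x)\, e^{iS(x)/\hbar}\, dx
\;\sim\; \sum_{x_0:\, \nabla S(x_0) = 0}
(2\pi\hbar)^{n/2}\,
\frac{e^{i\pi\,\mathrm{sgn}\,\nabla^2 S(x_0)/4}}{\left|\det \nabla^2 S(x_0)\right|^{1/2}}\,
e^{iS(x_0)/\hbar}\,\big(g(x_0) + O(\hbar)\big),
```

where \(\mathrm{sgn}\) denotes the signature of the Hessian; the real (Laplace) case is the same with the maxima of f replacing the stationary points and no phase factor.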
Alignment is not all you need. But that doesn't mean you don't need alignment.
One of the fairytales I remember reading from my childhood is "The Three Sillies". The story is about a farmer encountering three episodes of human silliness, but it's set inside one more frame story of silliness: his wife is despondent because there is an axe hanging in their cottage, and she thinks that if they have a son, he will walk underneath the axe and it will fall on his head.
The frame story was much more memorable to me than any of the "body" stories, and I randomly remembe...
FYI, I think by the time I wrote Optimistic Assumptions, Longterm Planning, and "Cope", I had updated on the things you criticize about it here (but I had started writing it a while ago from a different frame, and there is something disjointed about it)
But, like, I did mean both halves of this seriously:
...I think you should be scared about this, if you're the sort of theoretic researcher, who's trying to cut at the hardest parts of the alignment problem (whose feedback loops are weak or nonexistent)
I think you should be scared about this, if you'r
I'm not exactly sure what you mean by "what you want" here. It is not the case that you can exactly reconstruct most probability distributions you'll encounter in real life from their moments/cumulants (hence the expansion is perturbative, not exact).
But from the point of view of interpretability / the field-theoretic model of wide NNs, this is what you want (specifically, the fourth-order correction)
Yes, I actually thought about this a bit. It is definitely the case that the LC (or RLCT) in the SLT context is also exactly a (singular) stationary phase expansion. Unfortunately, the Fourier transform of a random variable, including a higher-dimensional one, really does have an isolated nondegenerate maximum at 0 (unless the support of your random variable is contained in a union of linear subspaces, which is kinda boring/reducible to simpler contexts). Maybe if you think about some kind of small perturbation of a lower-dimensional system, you can get s...
Thanks for writing this! I've participated in some similar conversations and on balance, think that working in a lab is probably net good for most people assuming you have a reasonable amount of intellectual freedom (I've been consistently impressed by some papers coming out of Anthropic).
Still, one point made by Kaarel in a recent conversation seemed like an important update against working in a lab (and working on "close-to-the-metal" interpretability in general). Namely, I tend to not buy arguments by MIRI-adjacent people that "if we share our AI insigh...
I haven't thought about this enough to have a very mature opinion. On one hand, being more general means you're liable to Goodhart more (i.e., with enough deeply general processing power, you understand that manipulating the market to start World War 3 will make your stock portfolio grow, so you act misaligned). On the other hand, being less general means that AIs are more liable to "partially memorize" how to act aligned in familiar situations, and to go off the rails when sufficiently out-of-distribution situations are encountered. I think this is related to the question of "how general are humans", and of how stable human values are to becoming much more or much less general.
Yep, have been recently posting shortforms (as per your recommendation), and totally with you on the "halfbaked-by-design" concept (if Cheeseboard can do it, it must be a good idea right? :)
I still don't agree that free energy is core here. I think that the relevant question, which can be formulated without free energy, is whether various "simplicity/generality" priors push towards or away from human values (and you can then specialize to questions of effective dimension/LLC, deep vs. shallow networks, ICL vs. weight learning, generalized OOD generalizatio...
On the surprising effectiveness of linear regression as a toy model of generalization.
Another shortform today (since Sunday is the day of rest). This time it's really a hot take: I'm not confident about the model described here being correct.
Neural networks aren't linear -- that's the whole point. They notice interesting, compositional, deep information about reality. So when people use linear regression as a qualitative comparison point for behaviors like generalization and learning, I tend to get suspicious. Nevertheless, the track record of linear regre...
Thanks! I definitely believe this, and I think we have a lot of evidence for this in both toy models and LLMs (I'm planning a couple of posts on this idea of "training stories"), and also theoretical reasons in some contexts. I'm not sure how easy it is to extend the specific approach used in the proof for parity to a general context. I think it inherently uses the fact of orthogonality of Fourier functions on boolean inputs, and understanding other ML algorithms in terms of nice orthogonal functions seems hard to do rigorously, unless you either make some...
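(The orthogonality fact I mean: for uniform \(x \in \{\pm 1\}^n\), the parity functions \(\chi_S(x) = \prod_{i \in S} x_i\) satisfy)

```latex
\mathbb{E}_{x \sim \{\pm 1\}^n}\!\left[\chi_S(x)\,\chi_T(x)\right] = \mathbf{1}[S = T],
```

so every function on the hypercube has a unique expansion in this orthonormal basis, which is the structure I meant by "nice orthogonal functions".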
I'm not sure I agree with this -- this seems like you're claiming that misalignment is likely to happen through random diffusion. But I think most worries about misalignment are more about correlated issues, where the training signal consistently disincentivizes being aligned in a subtle way (e.g. a stock trading algorithm manipulating the market unethically because the pressure of optimizing income at any cost diverges from the pressure of doing what its creators would want it to do). If diffusion were the issue, it would also affect humans and not be spe...
Thanks for this post. I would argue that part of an explanation here could also be economic: modernity brings specialization and a move from the artisan economy of objects as uncommon, expensive, multipurpose, and with a narrow user base (illuminated manuscripts, decorative furniture) to a more utilitarian and targeted economy. Early artisans need to compete for a small number of rich clients by being the most impressive, artistic, etc., whereas more modern suppliers follow more traditional laws of supply and demand and track more costs (cost-effectiveness...