This has basically been my model since I first started paying attention to modern AI.
Curious, why did you think differently before? :)
Back in '22, for example, it seemed like OpenAI was 12+ months ahead of its nearest competitor. It took a while for GPT-4 to be surpassed. I figured the lead in pretraining runs would narrow over time, but that there'd always be some New Thing (e.g. long-horizon RL) and the leader would therefore be 6mo or so ahead, since that's how it was with LLM pretraining. But now we've seen the New Thing (indeed, it was long-horizon RL) and, at least based on their public stuff, it seems like the lead is smaller than that.
Yeahh, I'm afraid I have too many other obligations right now to give an elaboration that does it justice.
OTOH I'm in the Bay and we should definitely catch up sometime!
Yes, sorry Eli, I meant to write out a more fully fleshed-out response but unfortunately it got stuck in drafts.
The tl;dr is that I feel this perspective singles out Sam Altman as some uniquely Machiavellian actor, in a way I find naive/misleading and ultimately maybe unhelpful.
I think in general I'm skeptical of the intense focus on individuals & individual tech companies that LW/EA has developed recently. Frankly, it feels more rooted in savannah-brained tribalism & human interest than an even-keeled analysis of what factors are actually important, neglected and tractable.
> Frankly, it feels more rooted in savannah-brained tribalism & human interest than an even-keeled analysis of what factors are actually important, neglected and tractable.
Um, I'm not attempting to do cause prioritization or action-planning in the above comment. More like sense-making. Before I move on to the question of what we should do, I want to have an accurate model of the social dynamics in the space.
(That said, it doesn't seem a foregone conclusion that there are actionable things to do that will come out of this analysis. If the above story is tr...
The idea I associate with scalable oversight is weaker models overseeing stronger models, (probably) combined with safety-by-debate. Is that the same as or different from "recursive techniques for reward generation"?
Currently, this general class of ideas seems to me the most promising avenue for achieving alignment for vastly superhuman AI ('superintelligence').
I want to be able to describe agents that do not have (vNM, geometric, other) rational preferences because of incompleteness or inconsistency but self-modify to become so.
E.g. in vNM utility theory there is a fairly natural weakening one can do, which is to ask for a vNM-style representation theorem after dropping transitivity.
[Incidentally, there is some interesting math here having to do with conservative vs. nonconservative vector fields and potential theory, all the way to Hodge theory.]
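A minimal sketch of how I'd make that analogy precise (my framing; the comment above only gestures at it): view a preference over a finite set of alternatives as an antisymmetric edge flow $f(x,y) = -f(y,x)$ measuring how strongly $x$ is preferred to $y$. Combinatorial Hodge theory then decomposes it as

$$ f \;=\; \underbrace{\operatorname{grad} u}_{f(x,y)\,=\,u(x)-u(y)} \;+\; f_{\mathrm{cyclic}} \;+\; f_{\mathrm{harmonic}}, $$

where the gradient part is exactly the conservative (transitive, utility-representable) component with potential $u$, and the cyclic/harmonic parts carry the money-pumpable inconsistencies.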
Does JB support this?
I'm confused, since in vNM we start with a preference order over probability distributions, but in JB it's over propositions?
Is there a benchmark in which SAEs clearly, definitely outperform standard techniques?
This seems concerning. Can somebody ELI5 what's going on here?
> this seems concerning.
I feel like my post comes across as overly dramatic; I'm not very surprised and don't consider this the strongest evidence against SAEs. It's an experiment I ran a while ago, and it hasn't changed my (somewhat SAE-sceptic) stance much.
But this is me having seen a bunch of other weird SAE behaviours (pre-activation distributions are not the way you'd expect from the superposition hypothesis, h/t @jake_mendel; if you feed SAE-reconstructed activations back into the encoder, the SAE goes nuts; stuff mentioned in recent Apollo papers; ...).
Reasons t...
Mmmmm
Inconsistent and incomplete preferences are necessary for descriptive agent foundations.
In vNM preference theory an inconsistent preference can be described as cyclic preferences that can be money-pumped (e.g. $A \succ B \succ C \succ A$, where the agent will pay a small fee for each trade around the cycle).
How can one see this in JB?
Happy to see this.
I have some very basic questions:
How can I see inconsistent preferences within the Jeffrey-Bolker framework? What about incomplete preferences?
Is there any relation you can imagine with imprecise probability / infraprobability, i.e. Knightian uncertainty?
I'm actually curious about a related problem.
One of the big surprises of the deep learning revolution has been the universality of gradient descent optimization.
How large is the class of optimization problems that we can transform into a gradient descent problem of some kind? My suspicion is that it's a very large class; perhaps there is even a general way to transform any problem into a gradient descent optimization problem?
The natural thing that comes to mind is to consider gradient descent on Lagrangian energy functionals in (optimal) control theory.
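As a small illustration of the simplest such transformation (my own toy example, only a sketch; `f`, `g`, and the penalty weight `rho` are all made up here), a constrained problem can be relaxed into an unconstrained penalized "energy" and handed to plain gradient descent:

```python
# Sketch: turn "minimize f(x) subject to g(x) = 0" into gradient descent
# on a penalized energy E(x) = f(x) + rho * g(x)^2.
import numpy as np

def f(x):                       # toy objective: squared distance to (3, 4)
    return (x[0] - 3.0) ** 2 + (x[1] - 4.0) ** 2

def g(x):                       # toy constraint: stay on the unit circle
    return x[0] ** 2 + x[1] ** 2 - 1.0

def grad_energy(x, rho):
    grad_f = np.array([2.0 * (x[0] - 3.0), 2.0 * (x[1] - 4.0)])
    grad_g = np.array([2.0 * x[0], 2.0 * x[1]])
    return grad_f + 2.0 * rho * g(x) * grad_g

x = np.array([1.0, 0.0])
for _ in range(20_000):
    x -= 1e-3 * grad_energy(x, rho=100.0)

print(x)  # ends up near (0.6, 0.8), the circle point closest to (3, 4)
```

Lagrangian, augmented-Lagrangian, and interior-point relaxations play the same game more carefully; the open question above is how far this trick extends beyond problems that already admit a smooth relaxation.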
Can somebody ELI5 how much I should update on the recent SAE = dead salmon news?
On priors I would expect the bearish SAE news to be overblown. 50% of mechinterp is SAEs; a priori, it seems unlikely to me that so many talented people went astray. But I'm an outsider and curious about alternate views.
God is live and we have birthed him.
It's still wild to me that highly cited papers in this space can make such elementary errors.
Thank you for writing this post Dmitry. I've only skimmed the post but clearly it merits a deeper dive.
I will now describe a powerful, central circle of ideas I've been obsessed with this past year that I suspect is very close to the way you are thinking.
Free energy functionals
There is a very powerful, very central idea whose simplicity is somehow lost in physics obscurantism, which I will call, for lack of a better word, 'tempered free energy functionals'.
Let us be given a loss function $L$ [physicists will prefer to think of this as an energy ...
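A standard way to write down the tempered free energy being gestured at here (my reconstruction, in the Watanabe-style notation referenced elsewhere in this thread) is

$$ Z(\beta) \;=\; \int e^{-\beta L(w)}\,\varphi(w)\,dw, \qquad F(\beta) \;=\; -\frac{1}{\beta}\log Z(\beta), $$

or, variationally, $F_\beta(q) = \mathbb{E}_{w\sim q}[L(w)] + \beta^{-1}\,\mathrm{KL}(q\,\Vert\,\varphi)$, minimized by the tempered Boltzmann posterior $q^*(w) \propto e^{-\beta L(w)}\varphi(w)$: accuracy (expected loss) competes with simplicity (closeness to the prior), with the temperature $1/\beta$ setting the exchange rate.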
Like David Holmes I am not an expert in tropical geometry so I can't give the best case for why tropical geometry may be useful. Only a real expert putting in serious effort can make that case.
Let me nevertheless respond to some of your claims.
>> Tropical geometry is an interesting, mysterious and reasonable field in mathematics, used for systematically analyzing the asymptotic and "boundary" geometry of polynomial functions and solution sets in high-dimensional spaces, and related combinatorics (it's actually closely related to my graduate work and some logarithmic algebraic geometry work I did afterwards). It sometimes extends to other interesting asymptotic behaviors (like trees of genetic relatedness). The idea of applying this to partially linear functions appearing in ML is about as ...
You are probably aware of this, but there is indeed a mathematical theory of degeneracy/multiplicity in which multiplicity/degeneracy in the parameter-function map of neural networks is key to their simplicity bias. This is singular learning theory.
The connection between degeneracy [SLT] and simplicity [algorithmic information theory] is surprisingly, delightfully simple. It's given by the padding/dead-code argument.
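For readers who haven't seen it, a rough sketch of that argument (my paraphrase of the standard counting argument, not the commenter's exact formulation): any minimal program for a function $f$ can be padded with arbitrary ignored bits, so

$$ \#\{\text{programs of length} \le n \text{ computing } f\} \;\gtrsim\; 2^{\,n - K(f) - O(1)}, $$

i.e. under a roughly uniform measure over programs, the probability of landing on $f$ scales like $2^{-K(f)}$; high degeneracy/multiplicity and short description length are two sides of the same coin.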
Me.
Beautifully argued, Dmitry. Couldn't agree more.
I would also note that I consider the second problem of interpretability basically the central problem of complex systems theory.
I consider the first problem a special case of the central problem of alignment. It's very closely related to the 'no free lunch' problem.
Thanks.
Well 2-3 shitposters and one gwern.
Who would be so foolish as to short gwern? Gwern the farsighted, gwern the prophet, gwern for whom entropy is nought, gwern augurious augustus.
Thanks for the sleuthing.
The thing is - last time I heard about OpenAI rumors it was Strawberry.
The unfortunate fact of life is that, too many times, what OpenAI ships has surpassed all but the wildest speculations.
Yes, this should be an option in the form.
Does clicking on HERE work for you?
Fair enough.
Thanks for reminding me about V-information. I am not sure how much I like this particular definition yet - but this direction of inquiry seems very important imho.
Those people will probably not see this, so they won't reply.
What I can tell you is that in the last three months I went through a phase transition in my AI use and I regret not doing this ~1 year earlier.
It's not that I didn't use AI daily before for mundane tasks or writing emails, and it's not that I didn't try a couple of times to get it to solve my thesis problem (it doesn't get it). It's that I failed to reframe my thinking from asking "can AI do X?" to "how can I reengineer and refactor my own workflow, and even the questions I am working on, so as to maximally leverage AI?"
Hope this will be answered in a later post, but why should I care about the permanent for alignment?
skills issue.
Prep for the model that is coming tomorrow, not the model of today.
Mmm. You are entering the Cyborg Era. The only ideas you may take to the next epoch are those that can be uploaded to the machine intelligence.
Are there any plans to have written materials in parallel?
Meta note: I would currently recommend against spending much time with Watanabe's original texts for most people interested in SLT. It's good to be aware of the overall outlines, but much of what most people would want to know is better explained elsewhere [e.g. I would recommend first reading most posts with the SLT tag on LessWrong before doing a deep dive into Watanabe].
If you do insist on reading Watanabe, I highly recommend you make use of AI assistance, i.e. download a PDF, cut it down into chapters, and upload them to your favorite LLM.
John, you know coding theory much better than I do, so I am inclined to defer to your superior knowledge.
Now behold the awesome power of gpt-Pro
...Let’s unpack the question in pieces:
1. Is ZIP (a.k.a. DEFLATE) “locally decodable” or not?
- Standard ZIP files are typically not “locally decodable” in the strictest sense—i.e., you cannot start decoding exactly at the byte corresponding to your region of interest and reconstruct just that portion without doing some earlier decoding.
The underlying method, DEFLATE, is indeed based on LZ77 plus Huffman coding. LZ7...
In AI alignment and interpretability research, there's a compelling intuition that understanding equals compression. The idea is straightforward: if you truly understand a system, you can describe it more concisely by leveraging that understanding. This philosophy suggests that better interpretability techniques for neural networks should yield better compression of their behavior or parameters.
jake_mendel asks: if understanding equals compression, then shouldn't ZIP compression of neural network weights co...
I don't remember the details, but IIRC ZIP is mostly based on Lempel-Ziv, and it's fairly straightforward to modify Lempel-Ziv to allow for efficient local decoding.
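To make that concrete, here is a minimal sketch of the standard trick (my illustration, not John's specific construction): compress fixed-size blocks independently and keep an offset index, so decoding any byte only requires decompressing its block.

```python
# Sketch: block-wise DEFLATE (via zlib) with an offset index for local decoding.
import zlib

BLOCK = 64 * 1024  # block size in bytes; smaller = cheaper local reads, worse ratio

def compress_blocks(data: bytes):
    blocks = [zlib.compress(data[i:i + BLOCK], 9) for i in range(0, len(data), BLOCK)]
    offsets, pos = [], 0
    for b in blocks:              # record where each compressed block starts
        offsets.append(pos)
        pos += len(b)
    return b"".join(blocks), offsets

def read_byte(blob: bytes, offsets, i: int) -> int:
    """Decode only the block containing byte position i."""
    b = i // BLOCK
    start = offsets[b]
    end = offsets[b + 1] if b + 1 < len(offsets) else len(blob)
    return zlib.decompress(blob[start:end])[i % BLOCK]
```

The cost is that each block loses the shared LZ dictionary context, so the ratio degrades as blocks shrink; blocked-gzip formats like bgzip in bioinformatics make exactly this trade.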
My guess would be that the large majority of the compression achieved by ZIP on NN weights is because the NN weights are mostly-roughly-standard-normal, and IEEE floats are not very efficient for standard normal variables. So ZIP achieves high compression for "kinda boring reasons", in the sense that we already knew all about that compressibility but just don't leverage it in day-to-day operations because our float arithmetic hardware uses IEEE.
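If one wanted to sanity-check that guess (a toy experiment sketch of my own, not something reported in the thread), compressing raw IEEE float32 draws from a standard normal isolates exactly that "boring" effect: the sign/exponent bytes are highly redundant while the mantissa bytes are near-uniform.

```python
# Toy check: zlib ratio on raw IEEE float32 standard-normal samples.
import zlib
import numpy as np

rng = np.random.default_rng(0)
raw = rng.standard_normal(1_000_000).astype(np.float32).tobytes()
packed = zlib.compress(raw, level=9)
print(f"compressed/raw ratio: {len(packed) / len(raw):.3f}")
```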
Using ZIP as a compression metric for NNs (I assume you do something along the lines of "take all the weights and line them up and then ZIP") is unintuitive to me for the following reason:
ZIP (though really this should apply to any other coding scheme that just tries to compress the weights by themselves) picks up on statistical patterns in the raw weights. But NNs are not simply a list of floats; they are arranged in a highly structured manner. The weights themselves get turned into functions, and it is 1. the functions, and 2. the way the functions ...
Loving this!
...But one thing this model likely predicts is that a better model for a NN than a single linear regression model is a collection of qualitatively different linear regression models at different levels of granularity. In other words, depending on how sloppily you chop your data manifold up into feature subspaces, and how strongly you use the "locality" magnifying glass on each subspace, you'll get a collection of different linear regression behaviors; you then predict that at every level of granularity, you will observe some combination of
People are not thinking clearly about AI-accelerated AI research. This comment by Thane Ruthenis is worth amplifying.
> ...I'm very skeptical of AI being on the brink of dramatically accelerating AI R&D.
>
> My current model is that ML experiments are bottlenecked not on software-engineer hours, but on compute. See Ilya Sutskever's claim here:
>
>> 95% of progress comes from the ability to run big experiments quickly. The utility of running many experiments is much less useful.
>
> What actually matters for ML-style progress is picking the correct trick, and then appl
> My current model is that ML experiments are bottlenecked not on software-engineer hours, but on compute. See Ilya Sutskever's claim here
That claim is from 2017. Does Ilya even still endorse it?
For what it's worth, I do think observers that observe themselves to be highly unique on important axes rationally should increase their credence in simulation hypotheses.
I probably shouldn't have used the free energy terminology. Does complexity-accuracy tradeoff work better?
To be clear, I very much don't mean these things as a metaphor. I am thinking there may be an actual numerical complexity-accuracy tradeoff, some elaboration of Watanabe's "free energy" formula, that actually describes these tendencies.
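For reference, the formula being gestured at (standard SLT asymptotics; my gloss, and which elaboration is intended is of course the open question): Watanabe's free energy at sample size $n$ expands as

$$ F_n \;=\; n L_n(w_0) \;+\; \lambda \log n \;+\; O(\log\log n), $$

where $n L_n(w_0)$ is the accuracy (fit) term and the learning coefficient $\lambda$ (the RLCT, a measure of degeneracy) plays the role of the complexity penalty.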
Sorry, these words are not super meaningful to me. Would you be able to translate this from physics speak?
Isn't it the other way around?
If inner alignment is hard, then general is bad, because applying less selection pressure (i.e. training more generally, leaning more on the simplicity prior) means more daemons/gremlins.
Let's focus on inner alignment. By "instill" you presumably mean "train". What values get trained is ultimately a learning problem, which in many cases (as long as one can formulate approximately a Boltzmann distribution) comes down to a simplicity-accuracy tradeoff.
I guess I'm mostly thinking about the regime where AIs are more capable and general than humans.
It seems at first glance that the latter failure mode is more of a capability failure, something one would expect to go away as AI truly surpasses humans. It doesn't seem core to the alignment problem to me.
I'd be curious how you would describe the core problem of alignment.
Could you give some examples of what you are thinking of here?
The free energy talk probably confuses more than it elucidates. I'm not talking about random diffusion per se, but about the connection between uniform sampling and simplicity, and the simplicity-accuracy tradeoff.
I've tried explaining more carefully where my thinking is currently at in my reply to Lucius.
Also, a caveat that shortforms are half-baked by design.
I'm not following exactly what you are saying here, so I might be collapsing some subtle point. Let me preface this by saying that shortforms are half-baked by design, so you might be completely right that it's confused.
Let me try and explain myself again.
I probably have confused readers by using the free energy terminology. What I mean is that in many cases (perhaps all) the probabilistic outcome of any process can be described in terms of a competition between simplicity (entropy) and accuracy (energy) with respect to some loss function.
Indeed, the simplest fit for a training s...
I use LLMs throughout my personal and professional life. The productivity gains are immense. Yes, hallucination is a problem, but it's just like spam/ads/misinformation on wikipedia/the internet: a small drawback that doesn't obviate the ginormous potential of the internet/LLMs.
I am 95% certain you are leaving value on the table.
I do agree straight LLMs are not generally intelligent (in the sense of universal intelligence/AIXI) and therefore not completely comparable to humans.