All of Alexander Gietelink Oldenziel's Comments + Replies

I use LLMs throughout my personal and professional life. The productivity gains are immense. Yes, hallucination is a problem, but it's just like spam/ads/misinformation on wikipedia/the internet - a small drawback that doesn't obviate the ginormous potential of the internet/LLMs.

I am 95% certain you are leaving value on the table. 

I do agree straight LLMs are not generally intelligent (in the sense of universal intelligence/AIXI) and therefore not completely comparable to humans. 

2ZY
On LLMs vs search on the internet: agree that LLMs are very helpful in many ways, both personally and professionally, but in my opinion the worse parts of misinformation in LLMs compared to wikipedia/the internet include: 1) it is relatively more unpredictable when the model will hallucinate, whereas for wikipedia/the internet you would generally expect higher accuracy for simpler/purely factual/mathematical information; 2) it is harder to judge credibility without knowing the source of the information, whereas on the internet we can get some signals from the website domain, etc.

This has basically been my model since I first started paying attention to modern AI.

Curious why you thought differently before? :)

Back in '22, for example, it seemed like OpenAI was 12+ months ahead of its nearest competitor. It took a while for GPT-4 to be surpassed. I figured the lead in pretraining runs would narrow over time but that there'd always be some New Thing (e.g. long-horizon RL) and the leader would therefore be ~6 months ahead, since that's how it was with LLM pretraining. But now we've seen the New Thing (indeed, it was long-horizon RL), and at least based on their public stuff it seems like the lead is smaller than that.

Yeahh, I'm afraid I have too many other obligations right now to give an elaboration that does it justice.

Otoh I'm in the Bay and we should definitely catch up sometime!

2Eli Tyre
Fair enough! Sounds good.

Yes sorry Eli, I meant to write out a more fully fleshed out response but unfortunately it got stuck in drafts.

The tl;dr is that I feel this perspective singles out Sam Altman as some uniquely Machiavellian actor, in a way I find naive/misleading and ultimately maybe unhelpful.

I think in general I'm skeptical of the intense focus on individuals & individual tech companies that LW/EA has developed recently. Frankly, it feels more rooted in savannah-brained tribalism & human interest than an even-keeled analysis of what factors are actually important, neglected and tractable.

Eli Tyre110

Frankly, it feels more rooted in savannah-brained tribalism & human interest than an even-keeled analysis of what factors are actually important, neglected and tractable.


Um, I'm not attempting to do cause prioritization or action-planning in the above comment. More like sense-making. Before I move on to the question of what we should do, I want to have an accurate model of the social dynamics in the space.

(That said, it doesn't seem a foregone conclusion that there are actionable things to do, that will come out of this analysis. If the above story is tr... (read more)

4Noosphere89
I definitely understand the skepticism of intense focus on individuals/individual tech companies, but also, these are the groups trying to build the most consequential technology in all of history, so it's natural that tech companies get the focus here.
8Adam Scholl
I haven't perceived the degree of focus as intense, and if I had I might be tempted to level similar criticism. But I think current people/companies do clearly matter some, so warrant some focus. For example:

* I think it's plausible that governments will be inclined to regulate AI companies more like "tech startups" than "private citizens building WMDs," the more those companies strike them as "responsible," earnestly trying their best, etc. In which case, it seems plausibly helpful to propagate information about how hard they are in fact trying, and how good their best is.
* So far, I think many researchers who care non-trivially about alignment—and who might have been capable of helping, in nearby worlds—have for similar reasons been persuaded to join whatever AI company currently has the most safetywashed brand instead. This used to be OpenAI, is now Anthropic, and may be some other company in the future, but it seems useful to me to discuss the details of current examples regardless, in the hope that e.g. alignment discourse becomes better calibrated about how much to expect such hopes will yield.
* There may exist some worlds where it's possible to get alignment right, yet also possible not to, depending on the choices of the people involved. For example, you might imagine that good enough solutions—with low enough alignment taxes—do eventually exist, but that not all AI companies would even take the time to implement those.
* Alternatively, you might imagine that some people who come to control powerful AI truly don't care whether humanity survives, or are even explicitly trying to destroy it. I think such people are fairly common—both in the general population (relevant if e.g. powerful AI is open sourced), and also among folks currently involved with AI (e.g. Sutton, Page, Schmidhuber). Which seems useful to discuss, since e.g. one constraint on our survival is that those who actively wish to kill everyone somehow remain unable to do so.

The idea I associate with scalable oversight is weaker models overseeing stronger models, (probably) combined with safety-by-debate. Is that the same as or different from "recursive techniques for reward generation"?

Currently, this general class of ideas seems to me the most promising avenue for achieving alignment for vastly superhuman AI ('superintelligence').

I want to be able to describe agents that do not have (vNM, geometric, other) rational preferences because of incompleteness or inconsistency but self-modify to become so. 

E.g. in vNM utility theory there is a fairly natural weakening one can do, which is to ask for a vNM-style representation theorem after dropping transitivity.

[Incidentally, there is some interesting math here having to do with conservative vs nonconservative vector fields and potential theory, all the way to Hodge theory.]
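To unpack that aside a little (my own sketch, not part of the original comment): encode pairwise preferences as a skew-symmetric "flow" and ask when it is a gradient.

\[
\varphi(A,B) = -\varphi(B,A) \quad \text{(net strength of preference for } A \text{ over } B\text{)},
\]
\[
\varphi \text{ admits a potential } u \text{ with } \varphi(A,B) = u(A) - u(B)
\;\iff\;
\varphi(A,B) + \varphi(B,C) + \varphi(C,A) = 0 \;\;\forall A,B,C.
\]

A Hodge-style decomposition then splits a general $\varphi$ into a "conservative" (gradient, i.e. utility-representable) part plus a cyclic part, which is the money-pumpable residue.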

Does JB support this?

I'm confused, since in vNM we start with a preference order over probability distributions, but in JB it's over propositions?

Is there a benchmark in which SAEs clearly, definitely outperform standard techniques?

This seems concerning. Can somebody ELI5 what's going on here?

StefanHex122

This seems concerning.

I feel like my post appears overly dramatic; I'm not very surprised and don't consider this the strongest evidence against SAEs. It's an experiment I ran a while ago and it hasn't changed my (somewhat SAE-sceptic) stance much.

But this is me having seen a bunch of other weird SAE behaviours (pre-activation distributions are not the way you'd expect from the superposition hypothesis h/t @jake_mendel, if you feed SAE-reconstructed activations back into the encoder the SAE goes nuts, stuff mentioned in recent Apollo papers, ...).


Reasons t... (read more)

Mmmmm

Inconsistent and incomplete preferences are necessary for descriptive agent foundations. 

In vNM preference theory, an inconsistent preference can be described as cyclic preferences that can be money-pumped.

How do I see this in JB?

2Daniel Herrmann
Ah, so not like, A is strongly preferred to B and B is strongly preferred to A, but more of a violation of transitivity. Then I still think that the Broome paper is a place I'd look at, since you get that exact kind of structure in preference aggregation. The Bradley paper assumes everything is transitive throughout, so I don't think you get the kind of structure you want there. I'm not immediately aware of any work on that kind of inconsistency in JB that isn't in the social choice context, but there might be some. I'll take a look.

There are ways to think about degrees and measures of incoherence, and how that connects up to decision making. I'm thinking mainly of this paper by Schervish, Seidenfeld, and Kadane, Measures of Incoherence: How Not to Gamble if You Must. There might be a JB-style version of that kind of work, and if there isn't, I think it would be good to have one.

But to your core goal of weakening the preference axioms to more realistic standards, you can definitely do that in JB by weakening the preference axioms, but still keeping the background objects of preference as propositions in a single algebra. I think this would still preserve many of what I consider the naturalistic advantages of the JB system. For modifying the preference axioms, I would guess descriptively you might want something like prospect theory, or something else along those broad lines. Also depends on what kinds of agents we want to describe.

Is Tesla currently overvalued?

P/E ratio is 188. The CEO has made himself deeply unpopular with many potential customers. Latest sales figures don't look good. Chinese competitors sell more total cars and seem to have caught up in terms of tech.

1transhumanist_atom_understander
Depends entirely on Cybercab. A driverless car can be made cheaper for a variety of reasons. If the self-driving tech actually works, and if it's widely legal, and if Tesla can mass produce it at a low price, then they can justify that valuation. Cybercab is a potential solution to the problem that they need to introduce a low-priced car to get their sales growing again, but cheap electric cars are now a competitive market without much profit margin. But there's a lot of ifs.

Happy to see this. 

I have some very basic questions:

How can I see inconsistent preferences within the Jeffrey-Bolker framework? What about incomplete preferences?

Is there any relation you can imagine with imprecise probability / infraprobability, i.e. Knightian uncertainty?

7Daniel Herrmann
The JB framework as standardly formulated assumes complete and consistent preferences. Of course, you can keep the same JB-style objects of preference (the propositions) and modify the preference axioms.

For incomplete preferences, there's a nice paper by Richard Bradley, Revising Incomplete Attitudes, that looks at incomplete attitudes in a very Jeffrey-Bolker-style framework (all prospects are propositions). It has a nice discussion of different things that might lead to incompleteness (one of which is "Ignorance", related to the kind of Knightian uncertainty you asked about), and also some results and perspectives on attitude changes for imprecise Bayesian agents.

I'm less sure about inconsistent preferences - it depends what exactly you mean by that. Something related might be work on aggregating preferences, which can involve aggregating preferences that disagree and so look inconsistent. John Broome's paper Bolker-Jeffrey Expected Utility Theory and Axiomatic Utilitarianism is excellent on this - it examines both the technical foundations of JB and its connections to social choice and utilitarianism, proving a version of the Harsanyi Utilitarian Theorem in JB.

On imprecise probabilities: the JB framework actually has a built-in form of imprecision. Without additional constraints, the representation theorem gives non-unique probabilities (this is part of Bolker's uniqueness theorem). You can get uniqueness by adding extra conditions, like unbounded utility or primitive comparative probability judgments, but the basic framework allows for some probability imprecision. I'm not sure about deeper connections to infraprobability/infra-Bayesianism, but given that these approaches often involve sets of probabilities, there may be interesting connections to explore.
Answer by Alexander Gietelink Oldenziel92

I'm actually curious about a related problem. 

One of the big surprises of the deep learning revolution has been the universality of gradient descent optimization. 

How large is the class of optimization problems that we can transform into a gradient descent problem of some kind? My suspicion is that it's a very large class; perhaps there is even a general way to transform any problem into a gradient descent optimization problem?

The natural thing that comes to mind is to consider gradient descent of Lagrangian energy functionals in (optimal) control theory.
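As a toy illustration of that last idea (my own sketch, not a claim about what a general transformation would look like): gradient descent on a discretized Lagrangian ("action") functional, with fixed endpoints, recovers a trajectory of the corresponding variational problem.

```python
import numpy as np

# Minimize the discretized action  sum_t [ 0.5*((x_{t+1}-x_t)/dt)^2 + V(x_t) ] * dt
# over paths with fixed endpoints x(0)=0, x(1)=1, for a harmonic potential V(x)=x^2/2.
# The minimizer should approach the Euler-Lagrange solution x(t) = sinh(t)/sinh(1).
T = 100
t = np.linspace(0.0, 1.0, T)
dt = t[1] - t[0]
dV = lambda x: x                      # derivative of V(x) = 0.5 * x**2

x = np.linspace(0.0, 1.0, T)          # initial guess: straight line between endpoints
lr = 0.002                            # small enough for the stiff kinetic term
for _ in range(50_000):
    xi = x[1:-1]
    # gradient of the discrete action w.r.t. the interior points
    grad = -(x[2:] - 2 * xi + x[:-2]) / dt + dV(xi) * dt
    x[1:-1] -= lr * grad              # endpoints stay fixed

print(np.max(np.abs(x - np.sinh(t) / np.sinh(1))))  # small, up to discretization error
```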

1robo
I'll conjecture the following in VERY SPECULATIVE, inflammatory, riff-on-vibes statements:

* Gradient descent solves problems in the complexity class P[1]. It is P-complete.
* Learning theory (and complexity theory) have for decades been pushing two analogous bad narratives about the weakness of gradient descent (and P).
* These narratives dominate because it is easy to prove impossibility results like "Problem X can't be solved by gradient descent" (or "Problem Y is NP-hard"). It's academically fecund -- it's a subject aspiring academics can write a lot of papers about. Results about what gradient descent (and polynomial time) can't do compose a fair portion of the academic canon.
* In practice, these impossibility results are corner cases that don't actually come up. The "vibes" of these impossibility results run counter to the "vibes" of reality.
  * Example: gradient descent solves most problems, even though theoretically it gets trapped in local minima. (SAT is in practice fast to solve, even though in theory it's theoretical computer science's canonical Hard-Problem-You-Say-Is-Impossible-To-Solve-Quickly.)
* The vibe of reality is "local (greedy) algorithms usually work".

1. ^ Stoner-vibes based reason: I'm guessing you can reduce a problem like Horn Satisfiability[2] to gradient descent. Horn Satisfiability is a P-complete problem -- you can transform any polynomial-time decision problem into a Horn Satisfiability problem using a log-space transformation. Therefore, gradient descent is "at least as big as P" (P-hard). And I'm guessing you can put your formalization of gradient descent in P as well (hence "P-complete"). That would mean gradient descent is not able to solve harder problems in e.g. NP unless P=NP.
2. ^ Horn Satisfiability is about finding true/false values that satisfy a bunch of logic clauses of the form a∧c∧d→e or b∧e→⊥ (that second clause means "don't set both b and e to true -- at least one of them has to be false").

Can somebody ELI5 how much I should update on the recent SAE = dead salmon news?

On priors I would expect the SAE bear news to be overblown. 50% of mechinterp is SAEs - a priori, it seems unlikely to me that so many talented people went astray. But I'm an outsider and curious about alternate views. 

5Logan Riggs
Well, maybe we did go astray, but it's not for any reasons mentioned in this paper!

SAEs were trained on random weights since Anthropic's first SAE paper in 2023. In my first SAE feature post, I show a clearly positional feature, which is not a feature you'll find in an SAE trained on a randomly initialized transformer.

The reason the auto-interp metric is similar is likely due to the fact that SAEs on random weights still have single-token features (i.e. activate on one token). Single-token features are the easiest features to auto-interp, since the hypothesis is "activates on this token", which is easy to predict for an LLM. When you look at their appendix at their sampled features for the random case, all three are single-token features.

However, I do want to clarify that their paper is still novel (they did random weights and controls over many layers in Pythia 410M) and did many other experiments in their paper: it's a valid contribution to the field, imo.

Also to clarify that SAEs aren't perfect, and there's a recent paper on it (which I don't think captures all the problems), and I'm really glad Apollo's diversified away from SAEs by pursuing their weight-based interp approach (which I think is currently underrated karma-wise by 3x).
2Vaniver
I haven't thought deeply about this specific case, but I think you should consider this like any other ablation study--like, what happens if you replace the SAE with a linear probe?
6Noosphere89
I agree with Leo Gao here: https://x.com/nabla_theta/status/1885846403785912769
7Lucius Bushnaq
I have not updated on these results much so far. Though I haven't looked at them in detail yet. My guess is that if you already had a view of SAE-style interpretability somewhat similar to mine [1,2], these papers shouldn't be much of an additional update for you.
3Daniel Tan
Specifically re: “SAEs can interpret random transformers” Based on reading replies from Adam Karvonen, Sam Marks, and other interp people on Twitter: the results are valid, but can be partially explained by the auto-interp pipeline used. See his reply here: https://x.com/a_karvonen/status/1886209658026676560?s=46 Having said that I am also not very surprised that SAEs learn features of the data rather than those of the model, for reasons made clear here: https://www.lesswrong.com/posts/gYfpPbww3wQRaxAFD/activation-space-interpretability-may-be-doomed
7Mateusz Bagiński
(Context: https://x.com/davidad/status/1885812088880148905 , i.e. some papers just got published that strongly question whether SAEs learn anything meaningful, just like the dead salmon study questioned the value of much of fMRI research.)

God is live and we have birthed him. 

It's still wild to me that highly cited papers in this space can make such elementary errors. 

Thank you for writing this post Dmitry. I've only skimmed the post but clearly it merits a deeper dive. 

I will now describe a powerful, central circle of ideas I've been obsessed with this past year that I suspect is very close to the way you are thinking.

Free energy functionals

There is a very powerful, very central idea whose simplicity is somehow lost in physics obscurantism, which I will call, for lack of a better word, 'tempered free energy functionals'.

Let us be given a loss function $L$ [physicists will prefer to think of this as an energy ... (read more)
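A minimal sketch of the functional being set up here, assuming it takes the standard Gibbs/Watanabe form (the comment is truncated above, so this is my reconstruction rather than a quote):

\[
Z(\beta) \;=\; \int e^{-\beta L(w)}\,\varphi(w)\,dw,
\qquad
F(\beta) \;=\; -\frac{1}{\beta}\,\log Z(\beta),
\]

with $L$ the loss/energy, $\varphi$ a prior, and $\beta$ an inverse temperature that trades off accuracy (low loss) against entropy/simplicity (prior volume).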

2Dmitry Vaintrob
Thanks! Yes the temperature picture is the direction I'm going in. I had heard the term "rate distortion", but didn't realize the connection with this picture. Might have to change the language for my next post

Like David Holmes I am not an expert in tropical geometry so I can't give the best case for why tropical geometry may be useful. Only a real expert putting in serious effort can make that case. 

Let me nevertheless respond to some of your claims. 

  • PL functions are quite natural for many reasons. They are simple. They naturally appear as minimizers of various optimization procedures, see e.g. the discussion in section 5 here.
  • Polynomials don't satisfy the padding argument and architectures based on them therefore will typically fail to have the corr
... (read more)

>> Tropical geometry is an interesting, mysterious and reasonable field in mathematics, used for systematically analyzing the asymptotic and "boundary" geometry of polynomial functions and solution sets in high-dimensional spaces, and related combinatorics (it's actually closely related to my graduate work and some logarithmic algebraic geometry work I did afterwards). It sometimes extends to other interesting asymptotic behaviors (like trees of genetic relatedness). The idea of applying this to partially linear functions appearing in ML is about as ... (read more)

You are probably aware of this, but there is indeed a mathematical theory of degeneracy/multiplicity in which multiplicity/degeneracy in the parameter-function map of neural networks is key to their simplicity bias. This is singular learning theory.

The connection between degeneracy [SLT] and simplicity [algorithmic information theory] is surprisingly, delightfully simple. It's given by the padding/deadcode argument.
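A one-line sketch of the padding/deadcode argument (standard algorithmic information theory, stated here in my own words): a shortest program for $f$ has length $K(f)$, and appending $k$ bits of dead code yields distinct programs of length $K(f)+k$, so

\[
\#\{\text{programs of length} \le N \text{ computing } f\} \;\ge\; 2^{\,N - K(f) - O(1)},
\]

and hence sampling programs (or, heuristically, parameters) uniformly at random favours simple functions: $P(f) \gtrsim 2^{-K(f)}$.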

2TsviBT
In what sense were you lifeist and now deathist? Why the change?

Beautifully argued, Dmitry. Couldn't agree more. 

I would also note that I consider the second problem of interpretability basically the central problem of complex systems theory. 

I consider the first problem a special case of the central problem of alignment. It's very closely related to the 'no free lunch' problem.

Thanks. 

Well 2-3 shitposters and one gwern. 

Who would be so foolish to short gwern? Gwern the farsighted, gwern the prophet, gwern for whom entropy is nought, gwern augurious augustus

Thanks for the sleuthing.

 

The thing is - last time I heard about OpenAI rumors it was Strawberry. 

The unfortunate fact of life is that too many times OpenAI shipping has surpassed all but the wildest speculations.

7Thane Ruthenis
That was part of my reasoning as well, why I thought it might be worth engaging with! But I don't think this is the same case. Strawberry/Q* was being leaked-about from more reputable sources, and it was concurrent with dramatic events (the coup) that were definitely happening. In this case, all evidence we have is these 2-3 accounts shitposting.

Yes, this should be an option in the form.

2DanielFilan
Clicking on the word "Here" in the post works.
1Cole Wyeth
Great, looking forward to it, thanks for putting this on.
2abramdemski
I'm still quite curious what you have found useful and how you've refactored your workflow to leverage AI more (such that you wish you did it a year ago). I do use Perplexity, exa.ai and elicit as parts of my search strategy. 

Thanks for reminding me about V-information. I am not sure how much I like this particular definition yet - but this direction of inquiry seems very important imho.

1Dalcy
I like the definition, it's the minimum expected code length for a distribution under constraints on the code (namely, constraints on the kind of beliefs you're allowed to have - after having that belief, the optimal code is as always the negative log prob). Also the examples in Proposition 1 were pretty cool in that it gave new characterizations of some well-known quantities - log determinant of the covariance matrix does indeed intuitively measure the uncertainty of a random variable, but it is very cool to see that it in fact has entropy interpretations! It's kinda sad because after a brief search it seems like none of the original authors are interested in extending this framework.
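For reference, the standard fact behind that last point (my addition, not part of Dalcy's comment): the differential entropy of a multivariate Gaussian is determined by the log-determinant of its covariance,

\[
h\bigl(\mathcal{N}(\mu,\Sigma)\bigr) \;=\; \tfrac{1}{2}\,\log\det\bigl(2\pi e\,\Sigma\bigr),
\]

so restricting the allowed beliefs to Gaussians (presumably what the relevant case of Proposition 1 does) turns the constrained code length into exactly this log-det quantity.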

Those people will probably not see this, so they won't reply.

What I can tell you is that in the last three months I went through a phase transition in my AI use and I regret not doing this ~1 year earlier. 

It's not that I didn't use AI daily before for mundane tasks or writing emails, it's not that I didn't try a couple of times to get it to solve my thesis problem (it doesn't get it) - it's that I failed to reframe my thinking from asking "can AI do X?" to "how can I reengineer and refactor my own workflow, even the questions I am working on, so as to maximally leverage AI?"

9abramdemski
About 6 months ago you strongly recommended that I make use of the integrated AI plugin for Overleaf (Writefull). I did try it. Its recommended edits seem quite useless to me; they always seem to flow from a desire to make the wording more normal/standard/expected in contrast to more correct (which makes some sense given the way generative pre-training works). This is obviously useful to people with worse English, but for me, the tails come apart massively between "better" and "more normal/standard/expected", such that all the AI suggestions are either worse or totally neutral rephrasing. It also was surprisingly bad at helping me write LaTeX; I had a much better time asking Claude instead.

I haven't found AI at all useful for writing emails, because the AI doesn't know what I want to say, and taking the time to tell the AI isn't any easier than writing it myself. AI can only help me write the boring boilerplate stuff that email recipients would skim over anyway (which I don't want to add to my emails). AI can't help me get info out of my head this way -- it can only help me insofar as emails have a lot of low-entropy cruft. I can see how this could be useful for someone who has to write a lot of low-entropy emails, but I'm not in that situation. To some degree this could be mitigated if the LLMs had a ton of context (e.g. recording everything that happens on my computer), but again, only the more boring cases I think. I'd love to restore the Abram2010 ability to crank out several multi-page emails a day on intellectual topics, but I don't think AI is helpful towards that end yet. I haven't tried fine-tuning on my own writing, however. (I haven't tried fine-tuning at all.)

Similarly, LLMs can be very useful for well-established mathematics which had many examples in the training data, but get worse the more esoteric the mathematics becomes. The moment I ask for something innovative, the math becomes phony.

Across the board, LLMs seem very useful for helping peop
2Garrett Baker
@abramdemski I think I'm the biggest agree vote for alexander (without me alexander would have -2 agree), and I do see this because I follow both of you on my subscribe tab.  I basically endorse Alexander's elaboration.  On the "prep for the model that is coming tomorrow not the model of today" front, I will say that LLMs are not always going to be as dumb as they are today. Even if you can't get them to understand or help with your work now, their rate of learning still makes them in some sense your most promising mentee, and that means trying to get as much of the tacit knowledge you have into their training data as possible (if you want them to be able to more easily & sooner build on your work). Or (if you don't want to do that for whatever reason) just generally not being caught flat-footed once they are smart enough to help you, as all your ideas are in videos or otherwise in high context understandable-only-to-abram notes. In the words of gwern, 

Hope this will be answered in a later post, but why should I care about the permanent for alignment?

3Dmitry Vaintrob
The elves care, Alex. The elves care.

Skill issue.

Prep for the model that is coming tomorrow, not the model of today.

4abramdemski
I'm seeing some agreement-upvotes of Alexander here so I am curious for people to explain the skill issue I am having.
4abramdemski
Don't get me wrong, I've kept trying and plan to keep trying.

Mmm. You are entering the Cyborg Era. The only ideas you may take to the next epoch are those that can be uploaded to the machine intelligence. 

4abramdemski
Entering, but not entered. The machines do not yet understand the prompts I write them. (Seriously, it's total garbage still, even with lots of high quality background material in the context.)

Are there any plans to have written materials in parallel?

2abramdemski
I'm hopeful about it, but preparing the lectures alone will be a lot of work (although the first one will be a repeat of some material presented at ILIAD).

meta note that I would currently recommend against spending much time with Watanabe's original texts for most people interested in SLT. Good to be aware of the overall outlines but much of what most people would want to know is better explained elsewhere [e.g. I would recommend first reading most posts with the SLT tag on LessWrong before doing a deep dive in Watanabe] 

meta note * 

if you do insist on reading Watanabe, I highly recommend you make use of AI assistance. I.e. download a pdf, cut it down into chapters and upload to your favorite LLM.

2Alex_Altair
Indeed, we know about those posts! Lmk if you have a recommendation for a better textbook-level treatment of any of it (modern papers etc). So far the grey book feels pretty standard in terms of pedagogical quality.

John, you know coding theory much better than I do, so I am inclined to defer to your superior knowledge.

Now behold the awesome power of gpt-Pro

Let’s unpack the question in pieces:

1. Is ZIP (a.k.a. DEFLATE) “locally decodable” or not?

  • Standard ZIP files are typically not “locally decodable” in the strictest sense—i.e., you cannot start decoding exactly at the byte corresponding to your region of interest and reconstruct just that portion without doing some earlier decoding.
  • The underlying method, DEFLATE, is indeed based on LZ77 plus Huffman coding. LZ7

... (read more)

You May Want to Know About Locally Decodable Codes

In AI alignment and interpretability research, there's a compelling intuition that understanding equals compression. The idea is straightforward: if you truly understand a system, you can describe it more concisely by leveraging that understanding. This philosophy suggests that better interpretability techniques for neural networks should yield better compression of their behavior or parameters.

jake_mendel asks: if understanding equals compression, then shouldn't ZIP compression of neural network weights co... (read more)

3Lucius Bushnaq
Hm, feels off to me. What privileges the original representation of the uncompressed file as the space in which locality matters? I can buy the idea that understanding is somehow related to a description that can separate the whole into parts, but why do the boundaries of those parts have to live in the representation of the file I'm handed? Why can't my explanation have parts in some abstract space instead? Lots of explanations of phenomena seem to work like that.
1Mo Putera
Maybe it's more correct to say that understanding requires specifically compositional compression, which maintains an interface-based structure hence allowing us to reason about parts without decompressing the whole, as well as maintaining roughly constant complexity as systems scale, which parallels local decodability. ZIP achieves high compression but loses compositionality. 
1CstineSublime
Wouldn't the insight into understanding be in the encoding, particularly how the encoder discriminates between what is necessary to 'understand' a particular function of a system and what is not salient? (And if I may speculate wildly, in organisms this may be correlated with dopamine in the Nucleus Accumbens. Maybe.) All mental models of the world are inherently lossy; this is the map-territory analogy in a nutshell (itself a lossy model). The effectiveness or usefulness of a representation determines the level of 'understanding', and this is entirely dependent on the apparent salience at the time of encoding, which determines which elements are given higher fidelity in encoding and which are more lossy.

Perhaps this example will stretch the use of 'understanding', but consider a fairly crowded room at a conference where there are a lot of different conversations and dialogue - I see a friend gesticulating at me on the far side of the room. Once they realize I've made eye contact they start pointing surreptitiously to their left - so I look immediately to their left (my right) and see five different people and a strange painting on the wall - all possible candidates for what they are pointing at; perhaps it's the entire circle of people. Now I'm not sure at this point that the entire 'message' - message here being all the possible candidates for what my friend is pointing at - has been 'encoded' such that LDC could be used to single out (decode) the true subject. Or is it? In this example, I would have failed to reach 'understanding' of their pointing gesture (although I did understand their previous attempt to get my attention).

Now, suppose my friend was pointing not to the five people or to the painting at all - but to something, or a sixth someone, further on: a distinguished colleague is drunk, let's say - but I hadn't noticed. If I had seen that colleague, I would have understood my friend's pointing gesture. This goes beyond LDC because you can't retrieve a local code o
8Adam Shai
This sounds right to me, but importantly it also matters what you are trying to understand (and thus compress). For AI safety, the thing we should be interested in is not the weights directly, but the behavior of the neural network. The behavior (the input-output mapping) is realized through a series of activations. Activations are realized through applying weights to inputs in particular ways. Weights are realized by setting up an optimization problem with a network architecture and training data. One could try compressing at any one of those levels, and of course they are all related, and in some sense if you know the earlier layer of abstraction you know the later one. But in another sense, they are fundamentally different, in exactly how quickly you can retrieve the specific piece of information, in this case the one we are interested in - which is the behavior. If I give you the training data, the network architecture, and the optimization algorithm, it still takes a lot of work to retrieve the behavior. Thus, the story you gave about how accessibility matters also explains layers of abstraction, and how they relate to understanding.

Another example of this is a dynamical system. The differential equation governing it is quite compact: $\dot{x}=f(x)$. But the set of possible trajectories can be quite complicated to describe, and to get them one has to essentially do all the annoying work of integrating the equation! Note that this has implications for compositionality of the systems: while one can compose two differential equations by e.g. adding in some cross term, the behaviors (read: trajectories) of the composite system do not compose! And so one is forced to integrate a new system from scratch.

Now, if we want to understand the behavior of the dynamical system, what should we be trying to compress? How would our understanding look different if we compress the governing equations vs. the trajectories?
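A tiny illustration of the "trajectories don't compose" point (my own example, not from the comment above): the right-hand sides of two ODEs add, but knowing the trajectories of one system gives no shortcut for the composite; you have to integrate again from scratch.

```python
import numpy as np
from scipy.integrate import solve_ivp

def f(t, x):          # system 1: pure rotation
    return [x[1], -x[0]]

def g(t, x):          # system 2: damping on the second coordinate
    return [0.0, -0.5 * x[1]]

def composite(t, x):  # the governing equations compose by simple addition...
    return np.add(f(t, x), g(t, x))

x0 = [1.0, 0.0]
traj_f = solve_ivp(f, (0.0, 10.0), x0, dense_output=True)
traj_c = solve_ivp(composite, (0.0, 10.0), x0, dense_output=True)

# ...but the trajectories do not: the composite endpoint can't be read off from traj_f.
print(traj_f.sol(10.0), traj_c.sol(10.0))
```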

I don't remember the details, but IIRC ZIP is mostly based on Lempel-Ziv, and it's fairly straightforward to modify Lempel-Ziv to allow for efficient local decoding.
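A minimal sketch of one way to get local decodability in this spirit (my own construction using off-the-shelf zlib rather than a modified LZ77): compress the data in independent chunks and keep an index, so any byte range can be recovered by decompressing only the chunks that cover it.

```python
import zlib

def compress_chunked(data: bytes, chunk_size: int = 64_000):
    """Compress as independent zlib chunks plus an index (trades a bit of ratio for locality)."""
    blobs, index, offset = [], [], 0
    for i in range(0, len(data), chunk_size):
        c = zlib.compress(data[i:i + chunk_size])
        index.append((i, offset, len(c)))   # (uncompressed start, compressed offset, compressed length)
        blobs.append(c)
        offset += len(c)
    return b"".join(blobs), index, chunk_size

def read_range(blob, index, chunk_size, start, end):
    """Decode only the chunks overlapping [start, end)."""
    out = bytearray()
    for u_start, c_off, c_len in index:
        if u_start + chunk_size <= start or u_start >= end:
            continue                         # chunk doesn't overlap the requested range
        piece = zlib.decompress(blob[c_off:c_off + c_len])
        out += piece[max(start - u_start, 0):min(end - u_start, len(piece))]
    return bytes(out)

data = b"0123456789" * 100_000
blob, index, cs = compress_chunked(data)
assert read_range(blob, index, cs, 123_456, 123_470) == data[123_456:123_470]
```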

My guess would be that the large majority of the compression achieved by ZIP on NN weights is because the NN weights are mostly-roughly-standard-normal, and IEEE floats are not very efficient for standard normal variables. So ZIP achieves high compression for "kinda boring reasons", in the sense that we already knew all about that compressibility but just don't leverage it in day-to-day operations because our float arithmetic hardware uses IEEE.
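A quick way to probe that hypothesis (my sketch, not John's experiment): zip a synthetic array of standard-normal float32 "weights" and see how much of the compression ratio observed on real NN weights this baseline already reproduces; whatever zlib saves here comes purely from the IEEE-float encoding being inefficient for N(0,1) values, not from any learned structure.

```python
import zlib
import numpy as np

rng = np.random.default_rng(0)
fake_weights = rng.standard_normal(1_000_000).astype(np.float32)  # stand-in for NN weights

raw = fake_weights.tobytes()
packed = zlib.compress(raw, level=9)
print(f"raw: {len(raw):,} B  compressed: {len(packed):,} B  ratio: {len(packed) / len(raw):.3f}")
```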

Using ZIP as a compression metric for NNs (I assume you do something along the lines of "take all the weights and line them up and then ZIP") is unintuitive to me for the following reason:
ZIP, though really this should apply to any other coding scheme that just tries to compress the weights by themselves, picks up on statistical patterns in the raw weights. But NNs are not simply a list of floats; they are arranged in a highly structured manner. The weights themselves get turned into functions, and it is 1. the functions, and 2. the way the functions ... (read more)

5Matthias Dellago
Interesting! I think the problem is that dense/compressed information can be represented in ways in which it is not easily retrievable for a certain decoder. The standard model written in Chinese is a very compressed representation of human knowledge of the universe and completely inscrutable to me. Or take some maximally compressed code and pass it through a permutation. The information content is obviously the same but it is illegible until you reverse the permutation. In some ways it is uniquely easy to do this to codes with maximal entropy, because by definition it will be impossible to detect a pattern and recover a readable explanation. In some ways the compressibility of NNs is a proof that a simple model exists, without revealing an understandable explanation. I think we can have (almost) minimal yet readable models without the exponentially decreasing information density required by LDCs.
6Noosphere89
Indeed, even three query locally decodable codes have code lengths that must grow exponentially with message size: https://www.quantamagazine.org/magical-error-correction-scheme-proved-inherently-inefficient-20240109/

Loving this!  

But one thing this model likely predicts is that a better model for a NN than a single linear regression model is a collection of qualitatively different linear regression models at different levels of granularity. In other words, depending on how sloppily you chop your data manifold up into feature subspaces, and how strongly you use the "locality" magnifying glass on each subspace, you'll get a collection of different linear regression behaviors; you then predict that at every level of granularity, you will observe some combination of

... (read more)

People are not thinking clearly about AI-accelerated AI research. This comment by Thane Ruthenis is worth amplifying. 

I'm very skeptical of AI being on the brink of dramatically accelerating AI R&D.

My current model is that ML experiments are bottlenecked not on software-engineer hours, but on compute. See Ilya Sutskever's claim here:

95% of progress comes from the ability to run big experiments quickly. The utility of running many experiments is much less useful.

What actually matters for ML-style progress is picking the correct trick, and then appl

... (read more)
5ryan_greenblatt
See my response here.
8jacquesthibs
Thanks for amplifying. I disagree with Thane on some things they said in that comment, and I don't want to get into the details publicly, but I will say:

1. It's worth looking at DeepSeek V3 and what they did with a $5.6 million training run (obviously that is still a nontrivial amount / the CEO actively says most of the cost of their training runs is coming from research talent).
2. Compute is still a bottleneck (and why I'm looking to build an AI safety org to efficiently absorb funding/compute for this), but I think Thane is not acknowledging that some types of research require much more compute than others (tho I agree research taste matters, which is also why DeepSeek's CEO hires for cracked researchers, but I don't think it's an insurmountable wall).
3. "Simultaneously integrating several disjunctive incremental improvements into one SotA training run is likely nontrivial/impossible in the general case." Yes, seems really hard and a bottleneck... for humans and current AIs.
   1. Imo, AI models will become Omega Cracked at infra and hyper-optimizing training/inference to keep costs down soon enough (which seems to be what DeepSeek is especially insanely good at).

My current model is that ML experiments are bottlenecked not on software-engineer hours, but on compute. See Ilya Sutskever's claim here

That claim is from 2017. Does Ilya even still endorse it?

9Vladimir_Nesov
To 10x the compute, you might need to 10x the funding, which AI capable of automating AI research can secure in other ways. Smaller-than-frontier experiments don't need unusually giant datacenters (which can be challenging to build quickly), they only need a lot of regular datacenters and the funding to buy their time. Currently there are millions of H100 chips out there in the world, so 100K H100 chips in a giant datacenter is not the relevant anchor for the scale of smaller experiments, the constraint is funding.
Answer by Alexander Gietelink Oldenziel54

For what it's worth I do think observers that observe themselves to be highly unique in important axes rationally should increase their credence in simulation hypotheses.

I probably shouldn't have used the free energy terminology. Does "complexity-accuracy tradeoff" work better?

To be clear, I very much don't mean these things as a metaphor. I am thinking there may be an actual numerical complexity-accuracy tradeoff, some elaboration of Watanabe's "free energy" formula, that actually describes these tendencies.

Sorry, these words are not super meaningful to me. Would you be able to translate this from physics speak?

2Dmitry Vaintrob
So the oscillating phase formula is about approximately integrating the function exp(−f(x)/ℏ) against various "priors" p(x) (or more generally any fixed function g), where f is a Lagrangian (think energy) and ℏ is a small parameter. It gives an asymptotic series in powers of ℏ. The key point is that (more or less) the kth perturbative term only depends on the kth-order power series expansion of f around the "stationary points" (i.e., saddlepoints, Jac(f) = 0) when f is imaginary, on the maxima of f when f is real, and there is a mixed form that depends on stationary points of the imaginary part which are also maxima of the real part (if these exist); the formulae are all exactly the same, with the only difference between real and imaginary f (i.e. statistical vs. quantum mechanics) being whether you only keep maxima or all saddle points.

Now in SLT, you're exactly applying the "real" stationary phase formula, i.e., looking at maxima of the (negative) loss function -L(w). The key thing that can happen is that there can be infinitely many maxima, and these might be singular (both in the sense of having higher degree of stationarity, and in the sense of forming a singular manifold). In this case the stationary phase formula is more complicated and AFAIK isn't completely worked out; Watanabe was the first person who contributed to finding expressions for the general case here beyond the leading correction. In the case of maxima which are nondegenerate, i.e., have positive-definite Hessian, the full perturbative expansion is known; in fact, at least in one very useful frame on it, terms are indexed by Feynman diagrams.

Now the energy function f that appears in this context is the log of the Fourier transform p̂(θ) of a probability distribution p(x). Notice that p(x) satisfies p(x)≥0 and ∫p(x)dx=1. This means that p̂(0) = ∫p(x)dx is 1 and its log is 0. You can check that all other values of the Fourier transform are ≤1 in absolute value (this follows
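For reference, a minimal sketch of the leading term in the nondegenerate real case described above, under standard conventions (this formula is my addition, not part of the comment): if x* is the unique nondegenerate minimum of f, then

\[
\int e^{-f(x)/\hbar}\, g(x)\, dx
\;\sim\; (2\pi\hbar)^{d/2}\,
\frac{g(x^{*})\, e^{-f(x^{*})/\hbar}}{\sqrt{\det \operatorname{Hess} f(x^{*})}}
\,\bigl(1 + O(\hbar)\bigr).
\]

The higher-order corrections in ℏ are the perturbative (Feynman-diagram) terms, and the singular case (degenerate or non-isolated minima) is where Watanabe's results come in.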

Isn't it the other way around?

If inner alignment is hard, then general is bad, because applying less selection pressure - i.e. being more general, more simplicity prior - means more daemons/gremlins.

Let's focus on inner alignment. By instill you presumably mean train. What values get trained is ultimately a learning problem, which in many cases (as long as one can formulate approximately a Boltzmann distribution) comes down to a simplicity-accuracy tradeoff.

I guess I'm mostly thinking about the regime where AIs are more capable and general than humans.

It seems at first glance that the latter failure mode is more of a capability failure - something one would expect to go away as AI truly surpasses humans. It doesn't seem core to the alignment problem to me.

2Dmitry Vaintrob
I haven't thought about this enough to have a very mature opinion. On one hand, being more general means you're liable to Goodhart more (i.e., with enough deeply general processing power, you understand that manipulating the market to start World War 3 will make your stock portfolio grow, so you act misaligned). On the other hand, being less general means that AIs are more liable to "partially memorize" how to act aligned in familiar situations, and go off the rails when sufficiently out-of-distribution situations are encountered. I think this is related to the question of "how general are humans", and how stable human values are to being much more or much less general.

I'd be curious how you would describe the core problem of alignment.

2Noosphere89
I'd split it into inner alignment, i.e. how do we manage to instill any goal/value that is ideally at least somewhat stable, and outer alignment, which is selecting a goal that is resistant to Goodharting.

Could you give some examples of what you are thinking of here?

2Dmitry Vaintrob
You mean on more general algorithms being good vs. bad?

The free energy talk probably confuses more than it elucidates. I'm not talking about random diffusion per se, but about the connection between uniform sampling and simplicity, and the simplicity-accuracy tradeoff.

I've tried explaining more carefully where my thinking is currently at in my reply to Lucius.

Also caveat that shortforms are halfbaked-by-design.

4Dmitry Vaintrob
Yep, have been recently posting shortforms (as per your recommendation), and totally with you on the "halfbaked-by-design" concept (if Cheeseboard can do it, it must be a good idea right? :) I still don't agree that free energy is core here. I think that the relevant question, which can be formulated without free energy, is whether various "simplicity/generality" priors push towards or away from human values (and you can then specialize to questions of effective dimension/llc, deep vs. shallow networks, ICL vs. weight learning, generalized ood generalization measurements, and so on to operationalize the inductive prior better). I don't think there's a consensus on whether generality is "good" or "bad" -- I know Paul Christiano and ARC has gone both ways on this at various points.

I'm not following exactly what you are saying here, so I might be collapsing some subtle point. Let me preface by saying that this is a shortform, so it's half-baked by design, and you might be completely right that it's confused.

Let me try and explain myself again.

I probably have confused readers by using the free energy terminology. What I mean is that in many cases (perhaps all) the probabilistic outcome of any process can be described in terms of a competition between simplicity (entropy) and accuracy (energy) with respect to some loss function.

Indeed, the simplest fit for a training s... (read more)
