All of Leon Lang's Comments + Replies

Thanks for this post!

The deadline possibly requires clarification:

We will keep the application form open until at least 11:59pm AoE on Thursday, February 27.

In the job posting, you write:

Application deadline: 12pm PST Friday 28th February 2025

4Rohin Shah
We'll leave it up until the later of those two (and probably somewhat beyond that, but that isn't guaranteed). I've edited the post.
Leon Lang304

There are a few sentences in Anthropic's "conversation with our cofounders" regarding RLHF that I found quite striking:

Dario (2:57): "The whole reason for scaling these models up was that [...] the models weren't smart enough to do RLHF on top of. [...]"

Chris: "I think there was also an element of, like, the scaling work was done as part of the safety team that Dario started at OpenAI because we thought that forecasting AI trends was important to be able to have us taken seriously and take safety seriously as a problem."

Dario: "Correct."

That LLMs were scal... (read more)

we thought that forecasting AI trends was important to be able to have us taken seriously

This might be the most dramatic example ever of forecasting affecting the outcome.

Similarly, I'm concerned that a lot of alignment people are putting work into evals and benchmarks which may be having some accelerating effect on the AI capabilities which they are trying to understand.

"That which is measured improves. That which is measured and reported improves exponentially."

Hi! Thanks a lot for your comments and very good points. I apologize for my late answer, caused by NeurIPS and all the end-of-year breakdown of routines :)

On 1: Yes, the formalism I'm currently working on also makes it possible to talk about the case where the human "understands less" than the AI.

On 2: 

Have you considered the connection between partial observability and state aliasing/function approximation?

I am not entirely sure if I understand! Though if it's just what you express in the following sentences, here are my answers:

Maybe you could apply your theory t

... (read more)

Thanks for the list! I have two questions:

1: Can you explain how generalization of NNs relates to ELK? I can see that it can help with ELK (if you know a reporter generalizes, you can train it on labeled situations and apply it more broadly) or make ELK unnecessary (if weak to strong generalization perfectly works and we never need to understand complex scenarios). But I’m not sure if that’s what you mean.

2: How is goodhart robustness relevant? Most models today don’t seem to use reward functions in deployment, and in training the researchers can control how hard they optimize these functions, so I don’t understand why they necessarily need to be robust under strong optimization.

“heuristics activated in different contexts” is a very broad prediction. If “heuristics” include reasoning heuristics, then this probably includes highly goal-oriented agents like Hitler.

Also, some heuristics will be more powerful and/or more goal-directed, and those might try to preserve themselves (or sufficiently similar processes) more so than the shallow heuristics. Thus, I think eventually, it is plausible that a superintelligence looks increasingly like a goal-maximizer.

Leon Lang178

This is a low effort comment in the sense that I don’t quite know what or whether you should do something different along the following lines, and I have substantial uncertainty.

That said:

  1. I wonder whether Anthropic is partially responsible for an increased international race through things like Dario advocating for an entente strategy and talking positively about Leopold Aschenbrenner’s “situational awareness”. I wished to see more of an effort to engage with Chinese AI leaders to push for cooperation/coordination. Maybe it’s still possible to course-co

... (read more)

I have an AI agent that wrote myself

Best typo :D

Have you also tried reviewing for conferences like NeurIPS? I'd be curious what the differences are.

Some people send papers to TMLR when they think they wouldn't be accepted to the big conferences due to not being that "impactful" --- which makes sense since TMLR doesn't evaluate impact. It's thus possible that the median TMLR submission is worse than the median conference submission.

3Daniel Tan
In my experience, ML folks submit to journals when:  1. Their work greatly exceeds the scope of 8 pages 2. They have been rejected multiple times from first- (or even second-)tier conferences For the first reason, I think the best papers in TMLR are probably on par with (or better than) the best papers at ML conferences, but you're right that the median could be worse.  Low-confidence take: Length might be a reasonable heuristic to filter out the latter category of work. 
7Buck
I reviewed for iclr this year, and found it somewhat more rewarding; the papers were better, and I learned something somewhat useful from writing my reviews.
Leon Lang120

I just donated $200. Thanks for everything you're doing!

Yeah I think that's a valid viewpoint. 

Another viewpoint that points in a different direction: A few years ago, LLMs could only do tasks that take humans ~minutes. Now they're at the ~hours point. So if this trend continues, eventually they'll do tasks requiring humans days, weeks, months, ...
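A minimal sketch of that extrapolation (the ~1-hour starting point and the 6-month doubling time below are hypothetical placeholders, not measured figures):

```python
# Illustrative only: exponential extrapolation of task horizon.
# Both the starting horizon and the doubling time are assumed, not measured.

def horizon_hours(years: float, start_hours: float = 1.0, doubling_years: float = 0.5) -> float:
    """Task horizon (hours of equivalent human effort) after `years` of continued growth."""
    return start_hours * 2 ** (years / doubling_years)

for years in range(0, 4):
    h = horizon_hours(years)
    print(f"after {years} year(s): ~{h:.0f} hours (~{h / 8:.1f} working days)")
```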

I don't have good intuitions that would help me to decide which of those viewpoints is better for predicting the future. 

3Cole Wyeth
One reason to prefer my position is that LLMs still seem to be bad at the kind of tasks that rely on using serial time effectively. For these ML research style tasks, scaling up to human performance over a couple of hours relied on taking the best of multiple calls, which seems like parallel time. That's not the same as leaving an agent running for a couple of hours and seeing it work out something it previously would have been incapable of guessing (or that really couldn't be guessed, but only discovered through interaction). I do struggle to think of tests like this that I'm confident an LLM would fail though. Probably it would have trouble winning a text-based RPG? Or more practically speaking, could an LLM file my taxes without committing fraud? How well can LLMs play board games these days?
Leon Lang*2617

Somewhat pedantic correction: they don’t say “one should update”. They say they update (plus some caveats).

4mishka
Indeed

After the US election, the Twitter competitor Bluesky suddenly gets a surge of new users:

https://x.com/robertwiblin/status/1858991765942137227

Leon Lang123

How likely are such recommendations usually to be implemented? Are there already Manifold markets on questions related to the recommendation?

4[anonymous]
yes, but only small trading volume so far: https://manifold.markets/Bayesian/will-a-us-manhattanlike-project-for
Leon Lang294

In the Reuters article, they highlight Jacob Helberg: https://www.reuters.com/technology/artificial-intelligence/us-government-commission-pushes-manhattan-project-style-ai-initiative-2024-11-19/

He seems quite influential in this initiative and recently also wrote this post:

https://republic-journal.com/journal/11-elements-of-american-ai-supremacy/

Wikipedia has the following paragraph on Helberg:

“He grew up in a Jewish family in Europe.[9] Helberg is openly gay.[10] He married American investor Keith Rabois in a 2018 ceremony officiated by Sam Altman.”

Might ... (read more)

Leon Lang170

Why I think scaling laws will continue to drive progress

Epistemic status: This is a thought I've had for a while. I never discussed it with anyone in detail; a brief conversation could convince me otherwise.

According to recent reports there seem to be some barriers to continued scaling. We don't know what exactly is going on, but it seems like scaling up base models doesn't bring as much new capability as people hope.

However, I think they're probably still scaling the wrong thing in some way: The model learns to predict a static dataset on the interne... (read more)

6cubefox
Tailcalled talked about this two years ago. A model which predicts text does a form of imitation learning. So it is bounded by the text it imitates, and by the intelligence of humans who have written the text. Models which predict future sensory inputs (called "predictive coding" in neuroscience, or "the dark matter of intelligence" by LeCun) don't have such a limitation, as they predict reality more directly.
6johnswentworth
I think this misunderstands what discussion of "barriers to continued scaling" is all about. The question is whether we'll continue to see ROI comparable to recent years by continuing to do the same things. If not, well... there is always, at all times, the possibility that we will figure out some new and different thing to do which will keep capabilities going. Many people have many hypotheses about what those new and different things could be: your guess about interaction is one, inference time compute is another, synthetic data is a third, deeply integrated multimodality is a fourth, and the list goes on. But these are all hypotheses which may or may not pan out, not already-proven strategies, which makes them a very different topic of discussion than the "barriers to continued scaling" of the things which people have already been doing.
4p.b.
The paper seems to be about scaling laws for a static dataset as well?  To learn to act you'd need to do reinforcement learning, which is massively less data-efficient than the current self-supervised training. More generally: I think almost everyone thinks that you'd need to scale the right thing for further progress. The question is just what the right thing is if text is not the right thing. Because text encodes highly powerful abstractions (produced by humans and human culture over many centuries) in a very information dense way.
Leon Lang110

Do we know anything about why they were concerned about an AGI dictatorship created by Demis?

5MondSemmel
Presumably it was because Google had just bought DeepMind, back when it was the only game in town?

What’s your opinion on the possible progress of systems like AlphaProof, o1, or Claude with computer use?

5johnswentworth
Still very plausible as a route to continued capabilities progress. Such things will have very different curves and economics, though, compared to the previous era of scaling.
Leon Lang8340

"Scaling breaks down", they say. By which they mean one of the following wildly different claims with wildly different implications:

  • When you train on a normal dataset, with more compute/data/parameters, subtract the irreducible entropy from the loss, and then plot in a log-log plot: you don't see a straight line anymore.
  • Same setting as before, but you see a straight line; it's just that downstream performance doesn't improve.
  • Same setting as before, and downstream performance improves, but: it improves so slowly that the economics is not in favor of furthe
... (read more)
5p.b.
I think the evidence mostly points towards 3+4, but if 3 is due to 1 it would have bigger implications about 6 and probably also 5. And there must be a whole bunch of people out there who know whether the curves bend.
6cubefox
It's not that "they" should be more precise, but that "we" would like to have more precise information. We know pretty conclusively now from The Information and Bloomberg that for OpenAI, Google and Anthropic, new frontier base LLMs have yielded disappointing performance gains. The question is which of your possibilities did cause this. They do mention that the availability of high quality training data (text) is an issue, which suggests it's probably not your first bullet point.
Lorec189

This is a just ask.

Also, even though it's not locally rhetorically convenient [ where making an isolated demand for rigor of people making claims like "scaling has hit a wall [therefore AI risk is far]" that are inconvenient for AInotkilleveryoneism, is locally rhetorically convenient for us ], we should demand the same specificity of people who are claiming that "scaling works", so we end up with a correct world-model and so people who just want to build AGI see that we are fair.

Thanks for this compendium, I quite enjoyed reading it. It also motivated me to read the "Narrow Path" soon.

I have a bunch of reactions/comments/questions at several places. I focus on the places that feel most "cruxy" to me. I formulate them without much hedging to facilitate a better discussion, though I feel quite uncertain about most things I write. 

On AI Extinction

The part on extinction from AI seems badly argued to me. Is it fair to say that you mainly want to convey a basic intuition, with the hope that the readers will find extinction an "obvi... (read more)

Then the MATS stipend today is probably much lower than it used to be? (Which would make sense since IIRC the stipend during MATS 3.0 was settled before the FTX crash, so presumably when the funding situation was different?)

6Ryan Kidd
MATS lowered the stipend from $50/h to $40/h ahead of the Summer 2023 Program to support more scholars. We then lowered it again to $30/h ahead of the Winter 2023-24 Program after surveying alumni and determining that 85% would accept $30/h.

Is “CHAI” being a CHAI intern, PhD student, or something else? My MATS 3.0 stipend was clearly higher than my CHAI internship stipend.

4Ryan Kidd
CHAI interns are paid $5k/month for in-person interns and $3.5k/month for remote interns. I used the in-person figure. https://humancompatible.ai/jobs
Leon Lang110

I have a similar feeling, but there are some forces in the opposite direction:

  • Nvidia seems to limit how many GPUs a single competitor can acquire.
  • Training frontier models becomes cheaper over time. Thus, those who build competitive models some time after the absolute frontier have to invest far fewer resources.
Leon Lang10-2

My impression is that Dario (somewhat intentionally?) plays the game of saying things he believes to be true about the 5-10 years after AGI, conditional on AI development not continuing.

What happens after those 5-10 years, or if AI gets even vastly smarter? That seems out of scope for the article. I assume he's doing that since he wants to influence a specific set of people, maybe politicians, to take a radical future more seriously than they currently do. Once a radical future is more viscerally clear in a few years, we will likely see even more radical essays. 

5Nathan Helm-Burger
It's tricky to pin down from this what he gut-level believes versus thinks it expedient to publish. Consider this passage and its footnote: The two implied assumptions I note relevant to this: 1. AI will only get a bit smarter (2-3x) than the smartest human, not a lot smarter (100x). 2. Algorithmic advances won't make it vastly cheaper to train AI. Datacenters with oversight and computer governance, control of AGI by a small number of responsible parties, defense-dominant technology outcomes. This is an imagined future without radical changes in world governments, but also everything staying neat and tidy and controlled.

It's something I remember being said on a podcast, but I don't remember which one, and there is a chance that it was never said in the sense I interpreted it.

Also, quote from this post:

"DeepMind says that at large quantities of compute the scaling laws bend slightly, and the optimal behavior might be to scale data by even more than you scale model size. In which case you might need to increase compute by more than 200x before it would make sense to use a trillion parameters."

8gwern
That was quite a while ago, and is not a very strongly worded claim. I think there was also evidence that Chinchilla got a constant factor wrong and people kept discovering that you wanted a substantially larger multiplier of data:parameter, which might fully account for any 'slight bending' back then - bending often just means you got a hyperparameter wrong and need to tune it better. (It's a lot easier to break scaling than to improve it, so bending away badly is not too interesting while bending the opposite direction is much more interesting.)
Leon Lang204

Are the straight lines from scaling laws really bending? People are saying they are, but maybe that's just an artefact of the fact that the cross-entropy is bounded below by the data entropy. If you subtract the data entropy, then you obtain the Kullback-Leibler divergence, which is bounded by zero, and so in a log-log plot, it can actually approach negative infinity. I visualized this with the help of ChatGPT:

Here, f represents the Kullback-Leibler divergence, and g the cross-entropy loss with the entropy offset. 
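A minimal sketch of that picture (all constants are made up, not fitted to any real runs): if the KL term follows a clean power law in compute, the raw cross-entropy still looks like it bends on a log-log plot, because it flattens at the data entropy.

```python
# Sketch: a clean power law for the KL divergence (f) vs. the apparently "bending"
# cross-entropy (g), which is just f shifted up by the irreducible data entropy.
# All constants here are hypothetical and chosen for illustration only.
import numpy as np
import matplotlib.pyplot as plt

compute = np.logspace(18, 26, 200)   # training FLOPs (illustrative range)
entropy = 1.7                        # assumed irreducible data entropy (nats/token)
f = 1e3 * compute ** -0.15           # KL divergence: straight line in log-log
g = entropy + f                      # cross-entropy: flattens towards the entropy

plt.loglog(compute, f, label="f: KL divergence (entropy subtracted)")
plt.loglog(compute, g, label="g: cross-entropy (with entropy offset)")
plt.xlabel("training compute (FLOPs)")
plt.ylabel("loss (nats/token)")
plt.legend()
plt.show()
```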

kave122

I've not seen the claim that the scaling laws are bending. Where should I look?

gwern124

Isn't an intercept offset already usually included in the scaling laws and so can't be misleading anyone? I didn't think anyone was fitting scaling laws which allow going to exactly 0 with no intrinsic entropy.

Agreed.

To understand your usage of the term “outer alignment” a bit better: often, people have a decomposition in mind where solving outer alignment means technically specifying the reward signal/model or something similar. It seems that to you, the writeup of a model-spec or constitution also counts as outer alignment, which to me seems like only part of the problem. (Unless perhaps you mean that model specs and constitutions should be extended to include a whole training setup or similar?)

If it doesn’t seem too off-topic to you, could you comment on your views on this terminology?

6Daniel Kokotajlo
Good points. I probably should have said "the Midas problem" (quoting Cold Takes) instead of "outer alignment." Idk. I didn't choose my terms carefully.
Leon Lang42-13

https://www.wsj.com/tech/ai/californias-gavin-newsom-vetoes-controversial-ai-safety-bill-d526f621

“California Gov. Gavin Newsom has vetoed a controversial artificial-intelligence safety bill that pitted some of the biggest tech companies against prominent scientists who developed the technology.

The Democrat decided to reject the measure because it applies only to the biggest and most expensive AI models and leaves others unregulated, according to a person with knowledge of his thinking”

1green_leaf
What an undignified way to go.

@Zach Stein-Perlman which part of the comment are you skeptical of? Is it the veto itself, or is it this part?

The Democrat decided to reject the measure because it applies only to the biggest and most expensive AI models and leaves others unregulated, according to a person with knowledge of his thinking

New Bloomberg article on data center buildouts pitched to the US government by OpenAI. Quotes:

- “the startup shared a document with government officials outlining the economic and national security benefits of building 5-gigawatt data centers in various US states, based on an analysis the company engaged with outside experts on. To put that in context, 5 gigawatts is roughly the equivalent of five nuclear reactors, or enough to power almost 3 million homes.”
- “Joe Dominguez, CEO of Constellation Energy Corp., said he has heard that Altman is talking about ... (read more)

From $4 billion for a 150-megawatt cluster, I get 37 gigawatts for a $1 trillion cluster, or seven 5-gigawatt datacenters (if they solve geographically distributed training). Future GPUs will consume more power per GPU (though a transition to liquid cooling seems likely), but the corresponding fraction of the datacenter might also cost more. This is only a training system (other datacenters will be built for inference), and there is more than one player in this game, so the 100-gigawatt figure seems reasonable for this scenario.
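A quick reproduction of that back-of-the-envelope arithmetic, using only the figures quoted above:

```python
# Back-of-the-envelope check of the numbers in the comment above.
cost_per_cluster = 4e9        # dollars for a 150 MW cluster
power_per_cluster_mw = 150
budget = 1e12                 # a $1 trillion training system

clusters = budget / cost_per_cluster                 # 250 clusters
total_gw = clusters * power_per_cluster_mw / 1000    # 37.5 GW
five_gw_sites = total_gw / 5                         # ~7.5 five-gigawatt datacenters
print(clusters, total_gw, five_gw_sites)             # 250.0 37.5 7.5
```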

Current best deployed mod... (read more)

OpenAI would have mentioned if they had reached gold on the IMO.

I think it would be valuable if someone would write a post that does (parts of) the following:

  • summarize the landscape of work on getting LLMs to reason.
  • sketch out the tree of possibilities for how o1 was trained and how it works in inference.
  • select a “most likely” path in that tree and describe in detail a possibility for how o1 works.

I would find it valuable because it seems important for external safety work to know how frontier models work; otherwise it is impossible to point out theoretical or conceptual flaws in their alignment approaches.

O... (read more)

Thanks for the post, I agree with the main points.

There is another claim on causality one could make, which would be: LLMs cannot reliably act in the world as robust agents, since by acting in the world, the LLM changes the world, leading to a distributional shift from the correlational data it encountered during training.

I think that argument is correct, but misses an obvious solution: once you let your LLM act in the world, simply let it predict and learn from the tokens that it receives in response. Then suddenly, the LLM does not model correlational, but actual causal relationships.

Agreed.

I think the most interesting part was that she made a comment that one way to predict a mind is to be a mind, and that that mind will not necessarily have the best of all of humanity as its goal. So she seems to take inner misalignment seriously. 

40-minute podcast with Anca Dragan, who leads safety and alignment at Google DeepMind: https://youtu.be/ZXA2dmFxXmg?si=Tk0Hgh2RCCC0-C7q

9Zach Stein-Perlman
I listened to it. I don't recommend it. Anca seems good and reasonable but the conversation didn't get into details on misalignment, scalable oversight, or DeepMind's Frontier Safety Framework.

To clarify: are you saying that since you perceive Chris Olah as mostly intrinsically caring about understanding neural networks (instead of mostly caring about alignment), you conclude that his work is irrelevant to alignment?

6habryka
No, I have detailed inside view models of the alignment problem, and under those models consider Chris Olah's work to be interesting but close to irrelevant (or to be about as relevant as the work of top capability researchers, whose work, to be clear, does have some relevance since of course understanding how to make systems better is relevant for understanding how AGI will behave, but where the relevance is pretty limited).

I can see that research into proof assistants might lead to better techniques for combining foundation models with RL. Is there anything more specific that you imagine? Outside of math there are very different problems because there is no easy way to synthetically generate a lot of labeled data (as opposed to formally verifiable proofs).

Not much more specific! I guess from a certain level of capabilities onward, one could create labels with foundation models that evaluate reasoning steps. This is much more fuzzy than math, but I still guess a person ... (read more)

Leon LangΩ91815

I think the main way that proof assistant research feeds into capabilities research is not through the assistants themselves, but by the transfer of the proof assistant research to creating foundation models with better reasoning capabilities. I think researching better proof assistants can shorten timelines.

  • See also Demis Hassabis' recent tweet. Admittedly, it's unclear whether he refers to AlphaProof itself being accessible from Gemini, or the research into AlphaProof feeding into improvements of Gemini.
  • See also an important paragraph in the blogpost for A
... (read more)
8Vanessa Kosoy
I can see that research into proof assistants might lead to better techniques for combining foundation models with RL. Is there anything more specific that you imagine? Outside of math there are very different problems because there is no easy way to synthetically generate a lot of labeled data (as opposed to formally verifiable proofs). While some AI techniques developed for proof assistants might be transferable to other problems, I can easily imagine a responsible actor[1] producing a net positive. Don't disclose your techniques (except maybe very judiciously), don't open your source, maintain information security, maybe only provide access as a service, maybe only provide access to select people/organizations. 1. ^ To be clear, I don't consider Alphabet to be a responsible actor.
Leon Lang158

https://www.washingtonpost.com/opinions/2024/07/25/sam-altman-ai-democracy-authoritarianism-future/

Not sure if this was discussed at LW before. This is an opinion piece by Sam Altman, which sounds like a toned down version of "situational awareness" to me. 

The news is not very old yet. Lots of potential for people to start freaking out.

One question: Do you think Chinchilla scaling laws are still correct today, or are they not? I would assume these scaling laws depend on the data set used in training, so that if OpenAI found/created a better data set, this might change scaling laws.

Do you agree with this, or do you think it's false?

2Vladimir_Nesov
New data! Llama 3 report includes data about Chinchilla optimality study on their setup. The surprise is that Llama 3 405b was chosen to have the optimal size rather than being 2x overtrained. Their actual extrapolation for an optimal point is 402b parameters, 16.55T tokens, and 3.8e25 FLOPs. Fitting to the tokens per parameter framing, this gives the ratio of 41 (not 20) around the scale of 4e25 FLOPs. More importantly, their fitted dependence of optimal number of tokens on compute has exponent 0.53, compared to 0.51 from the Chinchilla paper (which was almost 0.5, hence tokens being proportional to parameters). Though the data only goes up to 1e22 FLOPs (3e21 FLOPs for Chinchilla), what actually happens at 4e25 FLOPs (6e23 FLOPs for Chinchilla) is all extrapolation, in both cases, there are no isoFLOP plots at those scales. At least Chinchilla has Gopher as a point of comparison, and there was only 200x FLOPs gap in the extrapolation, while for Llama 3 405 the gap is 4000x. So data needs grow faster than parameters with more compute. This looks bad for the data wall, though the more relevant question is what would happen after 16 repetitions, or how this dependence really works with more FLOPs (with the optimal ratio of tokens to parameters changing with scale).
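For reference, a quick check of the quoted extrapolation point against the usual rough estimate C ≈ 6·N·D (an assumption of this sketch, not something the report is quoted as using here):

```python
# Checking the quoted Llama 3 extrapolation point with the rough C ≈ 6·N·D estimate.
n_params = 402e9      # optimal parameter count quoted above
n_tokens = 16.55e12   # optimal token count quoted above

ratio = n_tokens / n_params      # ≈ 41 tokens per parameter (vs ~20 from the Chinchilla paper)
flops = 6 * n_params * n_tokens  # ≈ 4.0e25, in the ballpark of the quoted 3.8e25
print(f"{ratio:.0f} tokens/parameter, {flops:.1e} FLOPs")
```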
4Vladimir_Nesov
Data varies in the loss it enables, doesn't seem to vary greatly in the ratio between the number of tokens and the number of parameters that extracts the best loss out of training with given compute. That is, I'm usually keeping this question in mind, didn't see evidence to the contrary in the papers, but relevant measurements are very rarely reported, even in model series training report papers where the ablations were probably actually done. So could be very wrong, generalization from 2.5 examples. With repetition, there's this gradual increase from 20 to 60. Probably something similar is there for distillation (in the opposite direction), but I'm not aware of papers that measure this, so also could be wrong. One interesting point is the isoFLOP plots in the StripedHyena post (search "Perplexity scaling analysis"). With hybridization where standard attention remains in 8-50% of the blocks, perplexity is quite insensitive to change in model size while keeping compute fixed, while for pure standard attention the penalty for deviating from the optimal ratio to a similar extent is much greater. This suggests that one way out for overtrained models might be hybridization with these attention alternatives. That is, loss for an overtrained model might be closer to Chinchilla optimal loss with a hybrid model than it would be for a similarly overtrained pure standard attention model. Out of the big labs, visible moves in this directions were made by DeepMind with their Griffin Team (Griffin paper, RecurrentGemma). So that's one way the data wall might get pushed a little further for the overtrained models.
Leon Lang140

https://x.com/sama/status/1813984927622549881

According to Sam Altman, GPT-4o mini is much better than text-davinci-003 was in 2022, but 100 times cheaper. In general, we see increasing competition to produce smaller-sized models with great performance (e.g., Claude Haiku and Sonnet, Gemini 1.5 Flash and Pro, maybe even the full-sized GPT-4o itself). I think this trend is worth discussing. Some comments (mostly just quick takes) and questions I'd like to have answers to:

  • Should we expect this trend to continue? How much efficiency gain is still possible? C
... (read more)
7rbv
The vanilla Transformer architecture is horrifically computation inefficient. I really thought it was a terrible idea when I learnt about it. On every single token it processes ALL of the weights in the model and ALL of the context. And a token is less than a word — less than a concept. You generally don't need to consider trivia to fill in grammatical words. On top of that, implementations of it were very inefficient. I was shocked when I read the FlashAttention paper: I had assumed that everyone would have implemented attention that way in the first place, it's the obvious way to do it if you know anything about memory throughput. (My shock was lessened when I looked at the code and saw how tricky it was to incorporate into PyTorch.) Ditto unfused kernels, another inefficiency that exists to allow writing code in Python instead of CUDA/SYCL/etc. Second point, transformers also seem to be very parameter inefficient. They have many layers and many attention heads largely so that they can perform multi-step inferences and do a lot in each step if necessary, but mechanistic interpretability studies shows just the center layers do nearly all the work. We now see transformers with shared weights between attention heads and layers and the performance drop is not that much. And there's also the matter of bits per parameter, again a 10x reduction in precision is a surprisingly small detriment. I believe that the large numbers of parameters in transformers aren't primarily there to store knowledge, they're needed to learn quickly. They perform routing and encode mechanisms (that is, pieces of algorithms) and their vast number provides a blank slate. Training data seen just once is often remembered because there are so many possible places to store it that it's highly likely there are good paths through the network through which strong gradients can flow to record the information. This is a variant of the Lottery Ticket Hypothesis. But a better training algorithm could in

To make a Chinchilla optimal model smaller while maintaining its capabilities, you need more data. At 15T tokens (the amount of data used in Llama 3), a Chinchilla optimal model has 750b active parameters, and training it invests 7e25 FLOPs (Gemini 1.0 Ultra or 4x original GPT-4). A larger $1 billion training run, which might be the current scale that's not yet deployed, would invest 2e27 FP8 FLOPs if using H100s. A Chinchilla optimal run for these FLOPs would need 80T tokens when using unique data.
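A rough check of these figures, assuming the standard approximations C ≈ 6·N·D and the Chinchilla-optimal ratio D ≈ 20·N:

```python
# Rough check of the Chinchilla-style figures above (C ≈ 6·N·D, D ≈ 20·N).
tokens = 15e12                        # Llama 3 training data
n_opt = tokens / 20                   # 7.5e11 = 750B active parameters
c_opt = 6 * n_opt * tokens            # ≈ 6.8e25 FLOPs (rounded to 7e25 above)

c_budget = 2e27                       # quoted FP8 FLOPs for a ~$1B H100 run
n_big = (c_budget / (6 * 20)) ** 0.5  # optimal N from 6·N·(20·N) = C
d_big = 20 * n_big                    # ≈ 8.2e13 ≈ 80T tokens of unique data
print(f"{n_opt:.1e} params, {c_opt:.1e} FLOPs, {d_big:.1e} tokens")
```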

Starting with a Chinchilla optimal model, if it's made 3x ... (read more)

5Jacob Pfau
Given a SotA large model, companies want the profit-optimal distilled version to sell--this will generically not be the original size. On this framing, regulation passes the misuse deployment risk from higher performance (/higher cost) models to the company. If profit incentives, and/or government regulation here continues to push businesses to primarily (ideally only?) sell 2-3+ OOM smaller-than-SotA models, I see a few possible takeaways: * Applied alignment research inspired by speed priors seems useful: e.g. how do sleeper agents interact with distillation etc. * Understanding and mitigating risks of multi-LM-agent and scaffolded LM agents seems higher priority * Pre-deployment, within-lab risks contribute more to overall risk On trend forecasting, I recently created this Manifold market to estimate the year-on-year drop in price for SotA SWE agents to measure this. Though I still want ideas for better and longer term markets!

I went to this event in 2022 and it was lovely. Will come again this year. I recommend coming!

Thanks for the answer!

But basically, by "simple goals" I mean "goals which are simple to represent", i.e. goals which have highly compressed representations

It seems to me you are using "compressed" with two very different meanings in parts 1 and 2. Or, to be fairer, I interpret the meanings very differently.

I try to make my view of things more concrete to explain:

Compressed representations: A representation is a function  from observations of the world state  (or sequences of such observations) into a representation space  ... (read more)

Thanks for the post!

a. How exactly do 1 and 2 interact to produce 3?

I think the claim is along the lines of "highly compressed representations imply simple goals", but the connection between compressed representations and simple goals has not been argued, unless I missed it. There's also a chance that I simply misunderstand your post entirely. 

b. I don't agree with the following argument:

Decomposability over space. A goal is decomposable over space if it can be evaluated separately in each given volume of space. All else equal, a goal is more decompos

... (read more)
1Benjy Forstadt
The vast majority of philosophers definitely do not favor maximizing the amount of hedonium. Pure hedonistic utilitarianism is a relatively rare minority view. I don’t think we should try to explain how people end up with specific idiosyncratic philosophical views by this kind of high-level analysis…
3Richard_Ngo
Hmm, maybe I should spell it out more explicitly. But basically, by "simple goals" I mean "goals which are simple to represent", i.e. goals which have highly compressed representations; and if all representations are becoming simpler, then the goal representations (as a subset of all representations) are also becoming simpler. (Note that I'll elaborate further on the relationship between goal representations and other representations in my next post.) This is largely my fault since I haven't really defined "representation" very clearly, but I would say that the representation of the concept of a dog should be considered to include e.g. the neurons representing "fur", "mouth", "nose", "barks", etc. Otherwise if we just count "dog" as being encoded in a single neuron, then every concept encoded in any neuron is equally simple, which doesn't seem like a useful definition. (To put it another way: the representation is the information you need to actually do stuff with the concept.) I agree that most people who say they are hedonic utilitarians are not 100% committed to hedonic utilitarianism. But I still think it's very striking that they at least somewhat care about making hedonium. I claim this provides an intuition pump for how AIs might care about squiggles too.
Leon Lang6630

You should all be using the "Google Scholar PDF reader extension" for Chrome.

Features I like:

  • References are linked and clickable
  • You get a table of contents
  • You can move back after clicking a link with Alt+left

Screenshot: 

2Stephen McAleese
I think the Zotero PDF reader has a lot of similar features that make the experience of reading papers much better: * It has a back button so that when you click on a reference link that takes you to the references section, you can easily click the button to go back to the text. * There is a highlight feature so that you can highlight parts of the text which is convenient when you want to come back and skim the paper later. * There is a "sticky note" feature allowing you to leave a note in part of the paper to explain something.
7StefanHex
This is great, love it! Settings recommendation: If you (or your company) want, you can restrict the extension's access from all websites down to the websites you read papers on. Note that the scholar.google.com access is required for the look-up function to work.
3Stephen Fowler
Just started using this, great recommendation. I like the night mode feature which changes the color of the pdf itself.
3Neel Nanda
Strongly agreed, it's a complete game changer to be able to click on references in a PDF and see a popup
Leon Lang30

I guess (but don't know) that most people who downvoted Garrett's comment overupdated on intuitive explanations of singular learning theory, not realizing that entire books with novel and nontrivial mathematical theory have been written on it.

Leon Lang30

I do all of these except 3, and implementing a system like 3 is among the deprioritized items on my to-do list. Maybe I should prioritize it.

Leon Lang73

I really enjoyed reading this post! It's quite well-written. Thanks for writing it.

The only critique is that I would have appreciated more details on how the linear regression parameters are trained and what exactly the projection is doing. John's thread is a bit clarifying on this.

One question: If you optimize the representation in the residual stream such that it corresponds to a particular chosen belief state, does the transformer then predict the next token as if in that belief state? I.e., does the transformer use the belief state for making predictions?

1Adam Shai
Thanks! I appreciate the critique. From this comment and from John's it seems correct and I'll keep it in mind for the future. On the question, by optimize the representation do you mean causally intervene on the residual stream during inference (e.g. a patching experiment)? Or do you mean something else that involves backprop? If the first, then we haven't tried, but definitely want to! It could be something someone does at the Hackathon, if interested ;)

MATS mentorships are often weekly, but only for a limited time, unlike PhD programs that offer mentorship for several years. These years are probably often necessary to develop good research taste.

2Garrett Baker
I didn't claim that MATS was a replacement, just that there's lots of latent demand. I wouldn't be surprised if a good fraction of the mentors in MATS would be happy to continue to mentor their mentees far after MATS, modulo the mentor and mentee getting along. Would be interesting if MATS had numbers on how often this happens naturally, though I don't think they do (I probably would have heard about such a survey).