You should go to ML conferences

Jan_Kulveit

This is a second kind of obvious point to make, but if you are interested in AI, AI safety, or cognition in general, it is likely worth going to top ML conferences, such as NeurIPS, ICML or ICLR. In this post I cover some reasons why, and some anecdotal stories.

1. Parts of AI alignment and safety are now completely mainstream

Looking at the "Best paper awards" at ICML, you'll find these safety-relevant or alignment-relevant papers:

Stealing part of a production language model by Carlini et al.
Probabilistic Inference in Language Models via Twisted Sequential Monte Carlo by Zhao et al.
Debating with More Persuasive LLMs Leads to More Truthful Answers by Khan et al.
Genie: Generative Interactive Environments Bruce et al.

which amounts to about one-third (!). "Because of safety concerns" is part of the motivation for hundreds of papers.

While the signal-to-noise ratio is even worse than on LessWrong, in total, the amount you can learn is higher - my personal guess is there is maybe 2-3x as much prosaic AI safety relevant work at conferences than what you get by just following LessWrong, Alignment Forum and safety-oriented communication channels.

2. Conferences are an efficient way how to screen general ML research without spending a lot of time on X

Almost all papers are presented in the form of posters. In case of a big conference, this usually means many thousands of posters presented in huge poster sessions.

My routine for engaging with this firehose of papers:

For each session, read all the titles. Usually, this prunes it by a factor of ten (i.e. from 600 papers to 60).
Read the abstracts. Prune it to things which I haven't noticed before and seem relevant. For me, this is usually by a factor of ~3-5.
Visit the posters. Posters with paper authors present are actually a highly efficient way how to digest research:
- Sometimes, you suspect there is some assumption or choice hidden somewhere making the result approximately irrelevant - just asking can often resolve this in a matter of tens of seconds.
- Posters themselves don't undergo peer review which makes the communication more honest, with less hedging.
- Usually authors of a paper know significantly more about the problem than what's in the paper, and you can learn more about negative results, obstacles, or directions people are excited about.

Clear disadvantage of conferences is the time lag; by the time they are presented, some of the main results are old and well known, but in my view a lot of the value is the long tail of results which are sometimes very useful, but not attention grabbing.

3. ML research community as a control group

My vague impression is that in conceptual research, mainstream ML research lags behind LW/AI safety community by something between 1 to 5 years, rediscovering topics discussed here. Some examples:

ICML poster & oral presentation The Platonic Representation Hypothesis is an independent version of Natural abstractions discussed here for about 4 years.
A Roadmap to Pluralistic Alignment deals with Self-unalignment problem and Coherent extrapolated volition
Plenty of research on safety protocols like debate, IDA,...

Prior work published in the LW/AI safety community is almost never cited or acknowledged - in some cases because it is more convenient to claim the topic is completely novel, but I suspect in many cases researchers are genuinely not aware of the existing work, which makes their contribution a useful control: if someone starts thinking about these topics, unaware of the thousands hours spent on them by dozens of people, what will they arrive at?

4. What 'experts' think

ML research community is the intellectual home of many people expressing public opinions about AI risk. In my view, background in technical ML alone actually does not give you significantly more expertise in understanding AI risk than let's say background in mountaineering methodology or theoretical evolutionary biology, but it is natural for the public to assume it does. This makes it useful to understand prevailing opinions and the broad epistemic landscape of the ML community.

As an anecdote, only after going to NeurIPS I fully realized how many researcher in NLP suffer from some internal conflict where part of them is really excited about AIs actually getting intelligent, but another part deeply hates this is largely due to scaling, with a place like OpenAI in the lead.

5. Examples

If the previous points haven't convinced you, here are five papers I discovered at conferences which I learned something from, but were not linked or noticed here and I would likely miss them while not visiting a conference.

Learning Universal Predictors explores what happens if you take ideas from Solomonoff Induction and train actual neural network architectures like transformers on data generated from Universal Turing Machines.
Are Vision Transformers More Data Hungry Than Newborn Visual Systems? attempts to make a fair comparison of learning efficiency between newborn chicks and vision transformers, by training the transformers on first-person visual data similar to what newborn chicks experience.
Prompting a Pretrained Transformer Can Be a Universal approximator asks if one can arbitrarily modify the behavior of a pretrained model by prompting or prefix-tuning it. Formally, whether prompting and prefix-tuning a pretrained model can universally approximate sequence-to-sequence functions.
Watermarks in the Sand: Impossibility of Strong Watermarking for Language Models provides nice theory for the intuition that robust watermarking is likely impossible
Human Alignment of Large Language Models through Online Preference Optimisation explains a deep link between some recent alignment methods and finding the Nash equilibrium of the preference model through self-play.

In each case, if someone re-wrote this is a LW post, I would expect it to be highly upvoted and read.

6. Conclusion

In my view, if you tend to follow AI, AI safety or 'cognition in general' topics on safety community platforms, it is likely worth your time to go to a conference. If you don't go in person, you can still do some of of the described steps - skim titles, select abstracts, discover new things.

I would also be in favor of work that makes the community boundaries more permeable. In one direction, by converting some LW posts into conference papers - in particular, pieces explaining conceptual shortcomings and limits of safety methods people are likely to arrive. In the other direction, by distilling what's relevant but not safety-branded.

ACS would probably be happy to sponsor conference participation (like, tickets and travel) for someone in exchange for distillation work with regard to topics we are interested in - i.e. going through the abstracts, engaging with the papers, writing blogpost summaries of relevant research.

For what it's worth, my view on the value of conferences is that a huge proportion of the value comes from meeting new people, catching up with people who live elsewhere, having research discussions, etc. I've occasionally found out about a paper I wouldn't have otherwise, but this is a smaller fraction of the value for me. Language model research is generally behind the state of the art of what's available inside labs, and a huge fraction of papers at conferences won't replicate or are too toy or otherwise end up never becoming useful.

Genie: Generative Interactive Environments Bruce et al.

How is that paper alignment-relevant?

My opinion is that going to poster sessions, orals, pre-researching papers etc. at ICML/ICLR/NeurIPS is pretty valuable for new researchers and I wish I had done this before having any papers (you don't need to have any papers to go to a conference). See also Thomas Kwa's comment about random intuitions learnt from going to a conference.

After this, I agree with Leo that I think it would be a waste of my time to go to papers/orals/preresearch papers. Maybe there's some value in this for conceptual research but for most empirical work I'm very skeptical (most papers are not good, but it takes my time to figure out whether a paper is good or not, etc.)

I'm skeptical of the 'wasting my time' argument.

Stance like 'going to poster sessions is great for young researchers, I don't do it anymore and just meet friends' is high-status, so, on priors, I would expect people to take it more than what's optimal.

Realistically, poster session is ~1.5h, maybe 2h with skimming what to look at. It is relatively common for people in AI to spend many hours per week digesting what are the news on twitter. I really doubt the per hour efficiency of following twitter is better than of poster sessions when approached intentionally. (While obviously aimlessly wandering between endless rows of posters is approximately useless.)

I agree that twitter is a worse use of time.

Going to posters for works you already know to talk to authors seems a great idea and I do it. Re-reading your OP, you suggest things like checking papers are fake or not in poster sessions. Maybe you just meant papers that you already knew about? It sounded as if you were suggesting doing this for random papers, which I'm more skeptical about.

It sounded as if you were suggesting doing this for random papers,

I presumed that Jan meant doing it for papers that had survived the previous "read the titles" and "read the abstracts" filtering stages.

One of my current projects builds on "Learning Universal Predictors." It will eventually appear in my scaffolding sequence (trust me, there will be posts in the scaffolding sequence someday, no really I swear...)

It's a convenient test-bed to investigate schemes for building an agent with access to a good universal distribution approximation - which is what we (at least, me) usually assume an LLM is!

Also see my PR.

Watermarks in the Sand: Impossibility of Strong Watermarking for Language Models provides nice theory for the intuition that robust watermarking is likely impossible

The link here leads nowhere, I'm afraid.

If you go on Twitter/X and find the right people, you can get most of the benefits you list here. There are tastemakers that share and discuss intriguing papers, and researchers who post their own papers with explanation threads which are often more useful than the papers themselves. The researchers are usually available to answer questions about their work, and you can read the answers they've given already. You're also ahead of the game because preprints can appear way before conferences.

The LessWrong Review runs every year to select the posts that have most stood the test of time. This post is not yet eligible for review, but will be at the end of 2025. The top fifty or so posts are featured prominently on the site throughout the year.

Hopefully, the review is better than karma at judging enduring value. If we have accurate prediction markets on the review results, maybe we can have better incentives on LessWrong today. Will this post make the top fifty?

Learning Universal Predictors explores what happens if you take ideas from Solomonoff Induction and train actual neural network architectures like transformers on data generated from Universal Turing Machines.

I already found this here somehow, I can't recall where