Our work centers on empirical research with LLMs. If you are doing something similar, these tips on slide-based communication may be helpful!
Background:
James Chua and John Hughes are researchers working under Owain Evans and Ethan Perez, respectively. Both of us (James and John) used to be MATS mentees. We weren't good at making research slides at first -- here are some principles we've found useful for understandable slides for our weekly research meetings.
We show some good example slides. We also show examples of confusing slides we've made — marked in the caption with “❌ Negative example”.
Below we use the study of sycophancy as an example. Sycophancy occurs when a model's responses match the user's beliefs rather than the truth.
Summary slide sets the frame
Your mentor manages multiple projects and people. They need to be reminded of what you are up to. The first slide should recap the key takeaways from the last meeting to motivate what you have worked on and provide a clear summary of your progress.
There are two main messages to convey, which set the frame for what your mentor should think about:
A summary helps your mentor save time. For example, on seeing the summary slide, your mentor may say “Oh I can remember what the data augmentation was, let’s skip that”. Or maybe your mentor has already read your results and wants to discuss something else.
Include an agenda
Often in meetings with your mentor, there is very little time to cover everything the team has done in the week.
Sometimes, you'll need to remind your mentor about the takeaways from the previous meeting before moving on to results:
Simple charts to describe experiments
After the summary, describe the experimental setups.
Always include your prompt. The prompt should show how the metric in the chart is measured. Prompts are often long, so you can truncate the prompt and put the full version at the back of your slides.
Show error bars. Your mentor wants to know whether you ruled out simple things like getting lucky with 10 samples. We use the standard error as a fast heuristic for proportion metrics. The formula is SE = sqrt(p(1-p)/N), where p is a metric like accuracy or success rate and N is the sample size. To obtain 95% confidence interval error bars, plot ±1.96 × SE around the measured value. This is just a heuristic because there may be other sources of variance, e.g. random seeds and prompt variations. See this post for better ways of calculating errors.
Part of the reason for showing the prompt and error bars is that you want people in the meeting to critique the experiment. So you want to have the “raw ingredients” in the slides, not just the high-level conclusions you are drawing (which might be wrong).
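As a minimal sketch of the standard-error heuristic for proportion metrics described above, the snippet below computes SE and the 95% half-width. The function name `proportion_error_bar` and the sample sizes are our own illustrative assumptions, not part of any library.

```python
import math


def proportion_error_bar(p, n, z=1.96):
    """Return (standard_error, ci_half_width) for a proportion metric.

    p: observed proportion (e.g. accuracy or success rate), in [0, 1]
    n: number of samples the proportion was computed from
    z: normal quantile; 1.96 corresponds to a 95% interval
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    se = math.sqrt(p * (1.0 - p) / n)
    return se, z * se


# Hypothetical example: the same 51.4% rate measured on 10 vs. 1,000 samples.
for n in (10, 1000):
    se, half_width = proportion_error_bar(0.514, n)
    print(f"N={n}: 51.4% +/- {100 * half_width:.1f} points (SE={se:.4f})")
```

Running it for a small and a large N makes the "getting lucky with 10 samples" concern concrete: the interval shrinks with the square root of the sample size. It covers sampling noise only, not variance from seeds or prompt variations.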
Label your axes. Indicate what your metric is and which direction is desirable. Is it e.g. accuracy (higher is better)? Or is it cross-entropy loss (lower is better)?
Include the values on the bar chart. E.g. for the chart above “51.4%” and “41.6%”. This saves your audience the effort of reading values off the y-axis.
Rule of thumb — 3-5 colors max on a bar chart. These bars typically represent "model before your intervention", "a control baseline", and "model after your intervention".
Make the plot large. Having the plot as large as possible on the slide is important so everyone can read the results easily. If you are sharing the slides by video call, sometimes the video quality is not good, so making the plots bigger helps. Takeaways can be included if they do not compromise the readability of the plot.
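The chart tips above (labelled axes, values on the bars, error bars, few colors, a large figure) can be sketched with matplotlib roughly as follows. This is one possible layout, not the authors' plotting code; `format_bar_labels` and `plot_rates` are hypothetical helper names, and the colors and font sizes are arbitrary choices.

```python
def format_bar_labels(rates):
    """Turn proportions in [0, 1] into value labels such as '51.4%'."""
    return [f"{100 * r:.1f}%" for r in rates]


def plot_rates(names, rates, ci_half_widths, ylabel, path):
    """Save a large bar chart with error bars and a value label on each bar.

    rates and ci_half_widths are proportions in [0, 1]; the plot is in percent.
    """
    # Imported lazily so format_bar_labels works without matplotlib installed.
    import matplotlib.pyplot as plt

    colors = ["tab:gray", "tab:orange", "tab:blue"]  # keep the palette small
    heights = [100 * r for r in rates]
    errors = [100 * e for e in ci_half_widths]

    fig, ax = plt.subplots(figsize=(12, 6.75))  # 16:9, fills a slide
    bars = ax.bar(names, heights, yerr=errors, capsize=6,
                  color=colors[:len(names)])
    # Print the value on each bar so nobody has to read it off the y-axis.
    ax.bar_label(bars, labels=format_bar_labels(rates), padding=4, fontsize=16)
    ax.set_ylabel(ylabel, fontsize=16)  # say which direction is better
    ax.set_ylim(0, 100)
    ax.tick_params(labelsize=14)
    fig.tight_layout()
    fig.savefig(path, dpi=200)
    plt.close(fig)
```

A call might look like `plot_rates(["Before intervention", "After intervention"], [0.514, 0.416], [0.031, 0.031], "Sycophancy rate, % (lower is better)", "sycophancy.png")`. The bar names and the error-bar widths here are placeholders, not results.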
Start with the most important message. Even if you worked hard to try 10 different experimental setups, don’t show all of them. Show your best setup first, or the one your mentor would find most interesting. Discussing many experimental setups takes time, and often you don’t need to discuss them because they didn’t work and you have something better. You can put other setups in your backup slides (more on that later).
❌ Negative example:
Avoid too many words on a single slide. It means that you are discussing too many ideas at once.
Use simple charts. Mentors who have multiple meetings a day don’t have the energy to understand a complicated chart! Stick to easy charts e.g. bar charts.
❌ Negative example:
Backup slides - Be ready for questions
While you should keep your main slides simple, prepare "backup slides" for questions your mentor might ask. You may also have results from experiments that just finished running, or plots that you haven't had time to clean up. Stick these plots in the backup slides and flick to them if the conversation naturally goes there.
These slides may be more wordy. Some common things:
Explain what you are measuring. Remind your mentor exactly how you are measuring each term! This is especially helpful if a new collaborator sits in on the call and gives feedback.
Detailed prompts. Draw arrows and highlight text. This draws attention to the particular parts of the prompt that people should look at.
Scaling curves. Suppose you try to intervene on a model by training on a dataset, and your training does not seem to help. One common question is “have you tried… more data?” You should be ready to answer that! Below is a full scaling plot, but to start you can just have a bar plot with e.g. “1k vs 20k”. Use arrows to point specific things out.
Try log-log plots. Not always relevant, so use your judgment, but always keep an eye out for scaling law behavior (if using accuracy, try plotting -log(acc) on the y-axis). Finding predictable scaling trends is helpful for forecasting.
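One way to check for the scaling-law behavior mentioned above is to fit a straight line to -log(acc) against dataset size on log-log axes. The sketch below assumes a simple power law, -log(acc) ≈ a · N^b, and uses ordinary least squares. The function names are hypothetical and the data are synthetic, generated from a known power law purely to show that the fit recovers it.

```python
import math


def neg_log_accuracy(acc):
    """Transform accuracy in (0, 1) into -log(acc), the suggested y-axis."""
    return -math.log(acc)


def fit_power_law(sizes, values):
    """Least-squares line through (log size, log value).

    Returns (slope, intercept) so that value ~= exp(intercept) * size**slope.
    """
    lx = [math.log(s) for s in sizes]
    ly = [math.log(v) for v in values]
    n = len(lx)
    mean_x, mean_y = sum(lx) / n, sum(ly) / n
    sxx = sum((x - mean_x) ** 2 for x in lx)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lx, ly))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(size, slope, intercept):
    """Extrapolate the fitted power law to a new dataset size."""
    return math.exp(intercept) * size ** slope


# Synthetic accuracies generated from -log(acc) = 2 * N**-0.5 (not real results).
sizes = [1_000, 2_000, 5_000, 20_000]
accs = [math.exp(-2 * n ** -0.5) for n in sizes]

slope, intercept = fit_power_law(sizes, [neg_log_accuracy(a) for a in accs])
forecast_acc = math.exp(-predict(100_000, slope, intercept))
print(f"slope={slope:.3f}, forecast accuracy at 100k examples={forecast_acc:.4f}")
```

If the points fall close to a line on the log-log plot, the fitted slope gives a rough forecast for larger datasets. If they do not, a power law is probably the wrong model and the extrapolation should not be trusted.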
Proposed baselines. What are some simple things that would invalidate your results? Think of some and include slides that discuss them.
Training details. E.g. what are the prompts and responses used for training? What hyperparameters and datasets did you use? If what you tried did not work, what does the loss curve look like?
End with concrete discussion points
At the end, list what you think your next steps should be.
Seek feedback from your mentor about whether these experimental priorities are correct. Include any resource requests, such as if you're bottlenecked on compute access.
Keep one slide deck per project
It is useful to keep one slide deck for a few reasons:
Get consistent feedback on the story of your paper
Working out the paper's story and how to frame certain elements is often left too late. We recommend including slides in your weekly meeting that describe the current story you want to tell so you can get feedback. Then as that story changes in light of new results, present the new story and get feedback again. Following this will make it much easier to write a paper that everyone is aligned on from the start.
Ask your friends
At the start, I (James) benefitted from having a friend review my slides and provide feedback. It is especially helpful if they share your mentor, since they will be able to model your mentor's questions better.
Ask your friend to point out any confusing parts like "What do you mean by this term?". These questions highlight where you may need additional slides.
Investing time is worth it
When I first started, I had to invest a lot of time in making slides, e.g. 1-2 days. This was a big time investment! I was not used to spending so much time on communication. But it is worth it: doing great experiments is only half the journey, and they only matter if people understand them!
The 1-2 days of improving slides helped me to iterate on experimental improvements. E.g. “It seems like my error bars are big here, I need more samples.” or “I’m missing a control setup here. I need to make one.” Now I'm better at it so it takes only half a day. And communicating my ideas is much easier!