I agree with a lot of this post.
Relatedly: in my experience, junior people wildly overestimate the extent to which senior people form confident and sticky negative evaluations of them. I basically never form a confident negative impression of someone's competence from a single interaction with them, and I place pretty substantial probability on people changing substantially over the course of a year or two.
I think that many people perform very differently in different job situations. When someone performs poorly in a job, I usually only update mildly against them performing well in a different role.
Thanks for this post Lawrence! I agree with it substantially, perhaps entirely.
One other thing that I think interacts with the difficulty of evaluation in some ways is the fact that many AI safety researchers think that most of the work done by some other researchers is approximately useless, or even net-negative in terms of reducing existential risk. I think it's pretty easy to wrap an evaluation of a research direction or agenda and an evaluation of a particular researcher together. I think this is actually pretty justified for more senior researchers, since presumably an important skill is "research taste", but I think it's also important to acknowledge that this is pretty subjective and that there's substantial disagreement about the utility of different research directions among senior safety researchers. It seems probably good to try and disentangle this when evaluating junior researchers, as much as is possible, and instead try to focus on "core competencies" that are likely to be valuable across a wide range of safety research directions, though even then the evaluation of this can be difficult and noisy, as the OP argues.
I appreciate this post, and vibe a lot!
Different jobs require different skills.
Very strongly agreed, I did 3 different AI Safety internships in different areas, where I think I was fairly mediocre in each, before I found that mech interp was a good fit.
Also strongly agreed on the self-evaluation point, I'm still not sure I really internally believe that I'm good at mech interp, despite having pretty solid confirmation from my research output at this point - I can't really imagine having it before completing my first real project!
I think this post is valuable, thank you for writing it. I especially liked the parts where you (and Beth) talk about historical negative signals. To a certain kind of person, I think that can serve better than anything else as stronger grounding to push back against unjustified updating.
A factor that I think pulls more weight in alignment relative to other domains is the prevalence of low-bandwidth communication channels, given the number of new researchers whose sole interface with the field is online and asynchronous, textual or few-and-far-between calls. Effects from updating too hard on negative evals are probably amplified a lot when those form the bulk of the reinforcing feedback you get at all. To the point where at times for me it's felt like True Bayesian Updating from the inside even as you acknowledge the noisiness of those channels, because there's little counterweight to it.
My experience here probably isn't super standard given that most of the people I've mentored coming into this field aren't located near the Bay Area or London or anywhere else with other alignment researchers, but their sole point of interface to the rest of the field being a sparse opaque section of text has definitely discouraged some far more than anything else.
I think this post made an important point that's still relevant to this day.
If anything, this post is more relevant in late 2024 than in early 2023, as the pace of AI makes ever more people want to be involved, while more and more mentors have moved towards doing object level work. Due to the relative reduction of capacity in evaluating new AIS researchers, there's more reliance on systems or heuristics to evaluate people now than in early 2023.
Also, I find it amusing that without the parenthetical, the title of the post makes another important point: "evals are noisy".
As part of my work at Lightcone I manage an office space with an application for visiting or becoming a member, and indeed many of these points commonly apply to rejection emails I send to people, especially "Most applications just don’t contain that much information" and "Not all relevant skills show up on paper".
I try to include some similar things to the post in the rejection emails we send. In case it's of interest or you have any thoughts, here's the standard paragraph that I include:
Our application process is fairly lightweight and so I don't think a no is a strong judgment about a person's work. If you end up in the future working on new projects that you think are a good fit for Lightcone Offices, you're welcome to apply again. Also if you're ever collaborating on a project with a member of the Lightcone Offices, you can visit with them to work together. Good luck in finding traction on improving the trajectory of human civilization.
At what point do you consider yourself a researcher and not just a noob, or someone who wants to one day become a researcher?
[This is actually a very important question for my self narrative; for how I relate to my AI safety writing, for what standards I expect of myself (is my AI safety writing currently a hobby that I hope to later turn into a job/or should I treat it as a volunteer job?), etc. I don't really have an answer, but I had mostly been thinking of myself as "someone who wants to one day become an AI safety researcher" (2022 shortened my timelines (suddenly I no longer had a decade to learn all the maths and CS before making useful contributions to alignment theory), and so I brought "one day" sooner, but I'm still at best "aspiring" to be one).
Learning that an actual researcher™ I respected was younger than me was a massive slap to my face/wakeup call (we discovered LW at the same age/stage in our lives, so there's a sense in which I have a: "what was I doing with my life all this time?"/felt like I've fallen behind [yeah, I am status brainkilled]).]
Great post. I expect to recommend it at least 10 times this year.
Semi-related point: I often hear people get discouraged when they don't have "good ideas" or "ideas that they believe in" or "ideas that they are confident would actually reduce x-risk." (These are often people who see the technical alignment problem as Hard or Very Hard).
I'll sometimes ask "how many other research agendas do you think meet your bar for 'an idea you believe in' or 'an idea that you are confident would actually reduce x-risk'?" Often, when considering the entire field of technical alignment, their answer is <5 or <10.
While reality doesn't grade on a curve, I think it has sometimes been helpful for people to reframe "I have no good ideas" --> "I believe the problem we are facing is Hard or Very Hard. Among the hundreds of researchers who are thinking about this, I think only a few of them have met the bar that I sometimes apply to myself & my ideas."
(This is especially useful when people are using a harsher bar to evaluate themselves than when they evaluate others, which I think is common).
I have been feeling extremely impostery lately and do agree on the critical self-evaluation tendency. For the last month or so I felt entirely stuck with even the idea of an application giving me severe anxiety. Have been overcoming this slightly lately but I think this post and the conversations it caused have made me feel better. Thank you.
TL;DR: Evaluating whether or not someone will do well at a job is hard, and evaluating whether or not someone has the potential to be a great AI safety researcher is even harder. This applies to evaluations from other people (e.g. job interviews, first impressions at conferences) but especially to self-evaluations. Performance is also often idiosyncratic: people who do poorly in one role may do well in others, even superficially similar ones. As a result, I think people should not take rejections or low self confidence so seriously, and instead try more things and be more ambitious in general.
Related work: Hero Licensing, Modest Epistemology, The Alignment Community is Culturally Broken, Status Regulation and Anxious Underconfidence, Touch reality as soon as possible, and many more.
Epistemic status: This is another experiment in writing fast as opposed to carefully. (Total time spent: ~4 hours.) Please don’t injure yourself using this advice.[1]
Introduction: evaluating skill is hard, and most evaluations are done via proxies
I think people in the LessWrong/Alignment Forum space tend to take negative or null evaluations of themselves too seriously.[2] For example, I’ve spoken to a few people who gave up on AI Safety after being rejected from SERI MATS and REMIX; I’ve also spoken to far too many people who are too scared to apply for any position in technical research after having a single negative interaction with a top researcher at a conference. While I think people should be free to give up whenever they want, my guess is that most people internalize negative evaluations too much, and would do better if they did less fretting and more touching reality.
Fundamentally, this is because evaluations of new researchers are noisier than you think. Interviews and applications are not always indicative of the applicant’s current skill. First impressions, even from top researchers, do not always reflect reality. People can perform significantly differently in different work environments, so failing at a single job does not mean that you are incompetent. Most importantly, people can and do improve over time with effort.
In my experience, a lot of updating so hard on negative examples comes from something like anxious underconfidence as opposed to reasoned arguments. It’s always tempting to confirm your own negative evaluations of yourself. And if you’re looking for reasons why you’re not “good enough” in order to handicap yourself, being convinced that one particular negative evaluation is not the end of the world will just make you overupdate on a different negative evaluation. Accordingly, I think it’s important to take things a little less seriously, be willing to try more things, and let your emotions more accurately reflect your situation.
Of course, that’s not to say that you should respond to any negative sign by pushing yourself even harder; it’s okay to take time to recover when things don’t go well. But I strongly believe that people in the community give up a bit too easily, and are a bit too scared to apply to jobs and opportunities. In some cases, people give up even before the first external negative evaluation: they simply evaluate themselves negatively in their head, and then give up. Instead of doing this, you should try your best and put yourself out there, and let reality be the judge.
My personal experience
I’m always pretty hesitant to use myself as an example, both because I’m not sure I’m “good enough” to qualify, and also because I think people should aspire to do better than I have. That being said, in my case:
I’m currently a researcher at the Alignment Research Center’s Evaluations team, was previously at Redwood Research and a PhD student at CHAI, have received offers from other AI labs, am on the board of FAR, and have been involved in 5+ papers I’m pretty proud of in the past year.
In the past, I’ve had a bunch of negative signs and setbacks:
I also don’t think my case (or Beth’s case below) was particularly unusual; several other AI safety researchers have had similar experiences. So empirically, it’s definitely not the case that a few negative evaluations mean that you cannot ever become an AI safety researcher.
Why exactly are common evaluations so noisy?
Previously, I mentioned three common evaluation methods (interviews/job applications, first impressions from senior researchers, and jobs/work trial tasks) and claimed that they tend to be noisy. Here, I’ll expand in detail on why each evaluation method can be noisy, even in cases where all parties are acting in good faith.
This section is pretty long and rambly; feel free to skip to the next header if you feel like you’ve got the point already.
Bootcamp/Funding/Job Applications
By far the most common negative evaluation that most people receive is being rejected from a job or bootcamp, or having a funding application denied. While this is pretty disheartening, there are a few reasons why a rejection may not be as informative as you might expect:
At the end of the day, not every denied application will come with a clearly stated reason. I’d strongly recommend against immediately slapping on “the reason is because I’m bad” to every rejection.
First impressions at parties/conferences/workshops
Insofar as applications don’t accurately reflect your skill or abilities, first impressions in social settings such as parties, conferences, and workshops are even worse.
Yes, having negative social interactions always sucks. But a few negative interactions, even with famous or senior researchers, are not a particularly strong sign that you’re not cut out to be an AI researcher.
Job Performance
It’s definitely true that poor job performance at a research-y job (or even a long work trial) is more of a signal than a rejection or a negative first impression. That being said, I don’t think it’s necessarily that strong of a signal, for the following reasons:
In my case, I think all four of the reasons applied to some extent for the last two years of my PhD: my skills were not super suited to academia, I was depressed in part due to COVID, I had significantly worse executive function, and I don’t think I enjoyed the academic culture at Berkeley very much. Again, while being let go from a job (or leaving due to poor performance) is definitely a negative sign, I think it’s nowhere near fatal for one’s research ambitions in itself.
Yes, this includes your evaluations as well.
In practice, people seem hampered more by their own self-assessments than by any external negative evaluations. I think a significant fraction of people I’ve met in this community have suffered from some form or another of imposter syndrome. I’ve also consistently been surprised by how often people fail to apply for jobs they’re clearly qualified for, at places that would like to hire them.
It’s certainly true that you have significantly more insight into yourself than any external evaluator. But empirically, I think that new researchers tend to be pretty poorly calibrated about how well they’d do in research later on, often underperforming even simple outside view heuristics.
Why might self-assessments also suffer from significant noise?
Of course, I think people should aspire to have good models of themselves. But especially if you’re just starting out as a researcher, my guess is your model of your own abilities is probably relatively bad, and I would not update too much off of your self-assessments.
On anxious underconfidence and self-handicapping
More speculatively, I think the tendency for people to over update on noisy negative evaluations is caused in large part by a combination of anxiety and a desire to self-handicap. AI safety research is often quite difficult, and it’s understandable to feel scared or underconfident when starting your research journey.[4] And if you believe that such research is important and also feel daunted about whether or not you can contribute at all, it can be tempting to avoid touching reality or even to self-handicap to get an excuse for failure. After all, if your expectations are sufficiently low, you won’t ever be disappointed.
I don’t think this dynamic happens at a conscious level for most people. Instead, my guess is that most people develop it due to status regulation or due to small flinches from uncomfortable events. That being said, I do think it’s worth consciously pushing back against this!
What does this mean you should do?
You should touch reality as soon as possible, and try to get evidence on the precise concern or question you have. Instead of worrying about whether or not you can do something, or trying to extract the most out of the few bits of evidence you have, go gather more evidence! Try to learn the skills you think you don’t have, try to apply for some jobs or programs you think definitely won’t take you, and try to do the research you think you can’t do.
I also find that I spend way more time encouraging people to be more ambitious than the other way around. So on average, I’d probably also recommend trying hard on the project that interests you, and being more willing to take risks with your career.
That being said, I want to end this piece by reiterating the law of equal and opposite advice. While I suspect the majority of people should push themselves a bit harder to do ambitious things, this advice is precisely the opposite of what many people need to hear. There are many other valuable things you could be doing. If you’re currently doing an impactful job that you really enjoy, you should probably stick to it. And if you find that you’re already pushing yourself quite hard, and additional effort in this direction will hurt you, please stop. It’s okay to take it easy. It’s okay to rest. It’s okay to do what you need to do to be happy. Please don’t injure yourself using this advice.[5]
Acknowledgments
Thanks to Beth Barnes for inspiring this post and contributing her experiences in the appendix, and to Adrià Garriga-Alonso, Erik Jenner, Rachel Freedman, and Adam Gleave for feedback.
Appendix: testimonials from other researchers
After writing the post, several other researchers reached out with additional evidence that they've given me permission to post:
Addendum from Beth Barnes
Soon after writing the post, Beth Barnes reached out and gave me permission to post about her experiences:
Addendum from Scott Emmons
Scott Emmons, a PhD student at UC Berkeley's CHAI, gave me permission to share the following:
Addendum from anonymous senior AGI safety researcher
Finally, a senior AGI safety researcher (who wishes to remain anonymous) sent me the following:
[1] I think this probably also applies in general, but I’m much less sure than in the case of AI research. As always, the law of equal and opposite advice applies. It’s okay to take it easy, and to do what you need to do to recover. I also don’t think that everyone should aim to be an AI safety researcher – my focus is on this field because it’s what I’m most familiar with. If you’ve found something else you’re good at, you probably should keep doing it.
[2] I also think there’s a separate problem, where people take positive evaluations of their peers way too seriously. E.g. people seem to noticeably change in attitude if you mention you’ve worked with a high status person at some point in your life. I claim that this is also very bad, but it’s not the focus of the post.
[3] This also happens to a comical extent with papers at conferences. E.g. Neel Nanda's grokking work was rejected twice from arXiv (!) but an updated version got a spotlight at ICLR. Redwood's adversarial training paper got a 3, a 5, and a 9 for its initial reviews. In fact, I know of several papers that got orals at conferences, that were rejected entirely from the previous conference.
[4] I also feel like this is exacerbated by several social dynamics in the Bay Area, which I might eventually write a post about.
[5] If there’s significant interest or if I feel like people are taking this advice too far, I’ll write a followup post giving the opposite advice.