(I'd be curious what people at the orgs think about this one!)
For DeepMind:
> my impression is that it is still clunky to do a lot of things with a large model when you are at an org
Mostly false: it's clunky inasmuch as working with large models is generally clunky (e.g. they may not fit on a single device), but even then for common use cases you can use the solution that other people wrote.
> things like retraining are obviously very expensive
True
> these orgs seem to also favor large group projects
True (relative to academia)
> which I assume are directed by leadership
Mostly false
Because LW/AF do not have established standards of rigor like ML does, they end up operating more like a less-functional social science field, where (I've heard) trends, personality, and celebrity play an outsized role in determining which research is valorized by the field.
In addition, the AI x-safety field is now rapidly expanding.
There is a huge amount of status to be collected by publishing quickly and claiming large contributions.
In the absence of rigor and metrics, the incentives are towards:
- setting new research directions, and inventing new cool terminology;
- using mathematics in a way that impresses, but is too low-level to yield a useful claim;
- or, conversely, relying too heavily on complex philosophical insights without empirical work;
- getting approval from alignment research insiders.
See also the now ancient Troubling Trends in Machine Learning Scholarship.
I expect the LW/AF community microcosm will soon reproduce many of those failures.
On the other hand, the current community believes that getting AI x-safety right is the most important research question of all time. Most people would not publish something just for their career advancement, if it meant sucking oxygen from more promising research directions.
This might be a mitigating factor for my comment above. I am curious about what happened in research fields which had "change/save the world" vibes. Was environmental science immune to similar issues?
I actually agree that empirical work generally outperforms theoretical work or philosophical work, but in that tweet thread I question why he suggests the Turing Test relates at all to x-risk.
> Work that is still outside the academic Overton window can be brought into academia if it can be approached with the technical rigor of academia, and work that meets academic standards is much more valuable than work that doesn't; this is both because it can be picked up by the ML community, and because it's much harder to tell if you are making meaningful progress if your work doesn't meet these standards of rigor.
Strong agreement with this! I'm frequently told by people that you "cannot publish" on a certain area, but in my experience this is rarely true. Rather, you have to put more work into communicating your idea, and justifying the claims you make -- both valuable exercises! Of course you'll have a harder time publishing than on something that people immediately understand -- but people do respect novel and interesting work, so done well I think it's much better for your career than one might naively expect.
I especially wish there was more emphasis on rigor on the Alignment Forum and elsewhere: it can be valuable to do early-stage work that's more sloppy (rigor is slow and expensive), but when there's long-standing disagreements it's usually better to start formalizing things or performing empirical work than continuing to opine.
That said, I do think academia has some systemic blindspots. For one, I think CS is too dismissive of speculative and conceptual research -- admittedly, much of this work will end up being mistaken, but it's an invaluable source of ideas. I also think there's too much emphasis on an "algorithmic contribution" in ML, which leads to undervaluing careful empirical evaluations and understanding failure modes of existing systems.
Presumably "too dismissive of speculative and conceptual research" is a direct consequence of increased emphasis on rigor. Rigor is to be preferred all else being equal, but all else is not equal.
It's not clear to me how we can encourage rigor where effective without discouraging research on areas where rigor isn't currently practical. If anyone has ideas on this, I'd be very interested.
I note that within rigorous fields, the downsides of rigor are not obvious: we can point to all the progress made; progress that wasn't made due to the neglect of conceptual/speculative research is invisible. (has the impact of various research/publication norms ever been studied?)
Further, it seems limiting only to consider [must always be rigorous (in publications)] vs [no demand for rigor]. How about [50% of your publications must be rigorous] (and no incentive to maximise %-of-rigorous-publications), or any other not-all-or-nothing approach?
I'd contrast rigor with clarity here. Clarity is almost always a plus.
I'd guess that the issue in social science fields isn't a lack of rigor, but rather of clarity. Sometimes clarity without rigor may be unlikely, e.g. where there's a lot of confusion or lack of good faith - in such cases an expectation of rigor may help. I don't think this situation is universal.
What we'd want on LW/AF is a standard of clarity.
Rigor is an often useful proxy. We should be careful when incentivizing proxies.
I think rigor and clarity are more similar than you indicate. I mostly think of rigor as either (i) formal definitions and proofs, or (ii) experiments well described, executed, and interpreted. I think it's genuinely hard to reach a high level of clarity about many things without (i) or (ii). For instance, people argue about "optimization", but without referencing (hypothetical) detailed experiments or formal notions, those arguments just won't be very clear; experimental or definitional details just matter a lot, and this is very often the case in AI. LW has historically endorsed a bunch of arguments that are basically just wrong because they have a crucial reliance on unstated assumptions (e.g. AIs will be "rational agents"), and ML looks at this and concludes people on LW are at the peak of "Mount Stupid".
Minor, but Dunning-Kruger neither claims to detect a Mount Stupid effect nor (probably) is the study powered enough to detect it.
Very good to know! I guess in the context of my comment it doesn't matter as much because I only talk about others' perception.
I think I would support Joe's view here that clarity and rigour are significantly different... but maybe - David - your comments are supposed to be specific to alignment work? e.g. I can think of plenty of times I have read books or articles in other areas and fields that contain zero formal definitions, proofs, or experiments but are obviously "clear", well-explained, well-argued etc. So by your definitions is that not a useful and widespread form of rigour-less clarity? (One that we would want to 'allow' in alignment work?) Or would you instead maintain that such writing can't ever really be clear without proofs or experiments?
I tend to think one issue is more that it's really hard to do well (clear, useful, conceptual writing, that is) and that many of the people trying to do it in alignment are inexperienced at it (and often have backgrounds in fields where things like proofs or experiments are the norm).
(To be clear, I think a lot of these arguments are pointing at important intuitions, and can be "rescued" via appropriate formalizations and rigorous technical work).
Mostly I agree with this.
I have more thoughts, but probably better to put them in a top-level post - largely because I think this is important and would be interested to get more input on a good balance.
A few thoughts on LW endorsing invalid arguments:
I'd want to separate considerations of impact on [LW as collective epistemic process] from [LW as outreach to ML researchers]. E.g. it doesn't necessarily seem much of a problem for the former to have reliance on unstated assumptions. I wouldn't formally specify an idea before sketching it, and it's not clear to me that there's anything wrong with collective sketching (so long as we know we're sketching - and this part could certainly be improved).
I'd first want to optimize the epistemic process, and then worry about the looking foolish part. (granted that there are instrumental reasons not to look foolish)
On ML's view, are you mainly thinking of people who may do research on an important x-safety sub-problem without necessarily buying x-risk arguments? It seems unlikely to me that anyone gets persuaded of x-risk from the bottom up, whether or not the paper/post in question is rigorous - but perhaps this isn't required for a lot of useful research?
> I'd want to separate considerations of impact on [LW as collective epistemic process] from [LW as outreach to ML researchers]
Yeah I put those in one sentence in my comment but I agree that they are two separate points.
RE impact on ML community: I wasn't thinking of anything in particular; I just think the ML community should have more respect for LW/x-safety, and stuff like that doesn't help.
> It's not clear to me how we can encourage rigor where effective without discouraging research on areas where rigor isn't currently practical. If anyone has ideas on this, I'd be very interested.
A rough heuristic I have is that if the idea you're introducing is highly novel, it's OK to not be rigorous. Your contribution is bringing this new, potentially very promising, idea to people's attention. You're seeking feedback on how promising it really is and where people are confused, which will be helpful later for formalizing it and studying it more rigorously.
But if you're engaging with a large existing literature and everyone seems to be confused and talking past each other (which I'd say characterizes a significant fraction of the mesa-optimization literature, for example) -- then the time has come to make things more rigorous, and you are unlikely to make much further progress without it.
I think part of this has to do with growing pains in the LW/AF community... When it was smaller it was more like an ongoing discussion with a few people and signal-to-noise wasn't as important, etc.
Agree RE systemic blindspots, although the "algorithmic contribution" thing is sort of a known issue that a lot of senior people disagree with, IME.
As someone who has been feeling increasingly skeptical of working in academia I really appreciate this post and discussion on it for challenging some of my thinking here.
I do want to respond especially to this part though, which seems cruxy to me:
> Furthermore, it is a mistake to simply focus on efforts on whatever timelines seem most likely; one should also consider tractability and neglectedness of strategies that target different timelines. It seems plausible that we are just screwed on short timelines, and somewhat longer timelines are more tractable. Also, people seem to be making this mistake a lot and thus short timelines seem potentially less neglected.
I suspect this argument pushes in the other direction. On longer timelines the amount of effort which will eventually get put toward the problem is much greater. If the community continues to grow at the current pace, then 20 year timeline worlds might end up seeing almost 1000x as much effort put toward the problem in total as 5 year timeline worlds. So neglectedness considerations might tell us that impacts on 5 year timeline worlds are 1000x more important than impacts on 20 year timeline worlds. This is of course mitigated by the potential for your actions to accrue more positive knock-on effects over 20 years; for instance, very effective field building efforts could probably overcome this neglectedness penalty in some cases. But in terms of direct impacts on different timeline scenarios this seems like a very strong effect.
On the tractability point, I suspect you need an overly confident model of how difficult alignment turns out to be for this to overcome the neglectedness penalty. E.g. Owen Cotton-Barratt suggests here using a log-uniform prior for the difficulty of unknown problems, which (unless you think alignment success in short timelines is essentially impossible) would indicate that tractability is constant. Using a less crude approximation we might use something like a log-normal distribution for the difficulty of solving alignment, where we see overall decreasing returns to effort unless you have extremely low variance (implying you know almost exactly which OOM of effort is enough to solve alignment) or extremely low probability of success by default (<< 1%).
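The prior-over-difficulty point can be made concrete with a small sketch. All numbers below (the log-uniform prior's support, and the log-normal's median and spread) are illustrative assumptions of mine, not figures from the comment or from Owen's post:

```python
import math
from statistics import NormalDist

# Let D = total effort (researcher-years, say) required to solve alignment.
# Parameters below are illustrative assumptions only.
LO, HI = 1e2, 1e8          # assumed support of the log-uniform prior on D
MEDIAN, SIGMA = 1e5, 1.5   # assumed log-normal: log10(D) ~ Normal(5, 1.5)

def p_loguniform(effort):
    """P(D <= effort) under a log-uniform prior on [LO, HI]."""
    e = min(max(effort, LO), HI)
    return (math.log(e) - math.log(LO)) / (math.log(HI) - math.log(LO))

def p_lognormal(effort):
    """P(D <= effort) when log10(D) is normal with the assumed parameters."""
    return NormalDist(math.log10(MEDIAN), SIGMA).cdf(math.log10(effort))

# Log-uniform: every tenfold increase in effort buys the same increment of
# success probability -- "constant tractability".
# Log-normal: the increments shrink once effort passes the median, i.e.
# decreasing returns, unless SIGMA is tiny or baseline success is very rare.
for e in (1e4, 1e5, 1e6):
    print(f"effort {e:.0e}: "
          f"log-uniform gain from 10x = {p_loguniform(10 * e) - p_loguniform(e):.3f}, "
          f"log-normal gain from 10x = {p_lognormal(10 * e) - p_lognormal(e):.3f}")
```

Under the log-uniform prior each order of magnitude of effort adds the same success probability, while under the log-normal the per-OOM gains fall off past the median, which is the "decreasing returns" regime described above.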
Overall my current guess is that tractability/neglectedness pushes toward working on short timelines, and gives a penalty to delayed impact of perhaps 10x per decade (20x penalty from neglectedness, compensated by a 2x increase in tractability).
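As a rough sanity check on these numbers, here is a hypothetical back-of-the-envelope calculation. The 35% annual growth rate is an assumption I've chosen so that the per-decade crowding roughly matches the 20x figure; it is not a measured quantity:

```python
# Hypothetical sketch of the neglectedness arithmetic. Assume the field's
# annual effort grows exponentially at an ASSUMED rate (illustrative only).
GROWTH = 0.35  # assumed ~35% annual growth in yearly effort

def cumulative_effort(years, growth=GROWTH):
    """Total effort invested over `years`, with year-0 effort normalized to 1."""
    return sum((1 + growth) ** t for t in range(years))

# How much more total effort a 20-year-timeline world sees than a 5-year one:
ratio = cumulative_effort(20) / cumulative_effort(5)
print(f"effort ratio (20y vs 5y timelines): {ratio:.0f}x")

# Per-decade neglectedness penalty: how much more crowded each marginal
# contribution is, one decade later, if yearly effort keeps compounding.
penalty = (1 + GROWTH) ** 10
print(f"neglectedness penalty per decade: {penalty:.1f}x")
```

With this assumed growth rate the per-decade crowding comes out near 20x; halving that for the assumed 2x tractability gain gives the ~10x-per-decade net penalty suggested above.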
If you think that neglectedness/tractability overall pushes toward targeting impact toward long timelines then I'd be curious to see that spelled out more clearly (e.g. as a distribution over the difficulty of solving alignment that implies some domain of increasing returns to effort, or some alternative way to model this). This seems very important if true.
I had independently thought that this is one of the main parts where I disagree with the post, and wanted to write up a very similar comment to yours. Highly relevant link: https://www.fhi.ox.ac.uk/wp-content/uploads/Allocating-risk-mitigation.pdf My best guess would have been maybe 3-5x per decade, but 10x doesn't seem crazy.
I've been an assistant professor (equivalent) for ~1 year now at Cambridge. Shortly after accepting the position, I wrote AI x-risk reduction: why I chose academia over industry.
Since then, I've had a lot of conversations on academia vs. industry with people getting into AI x-safety (e.g. considering applying for PhDs). This post summarizes that experience, and describes a few other updates from my experience in the last 1.5 years.
Summary of recent conversations:
Other updates: