A few possible categories of situations in which we might have long timelines, off the top of my head:
^ This taxonomy isn't comprehensive, just things I came up with quickly; I might be missing something worth including.
To give a cop-out answer to your question: I feel like if I were making a long-timelines argument, I'd argue that all 3 of those are ways of forecasting worth giving weight to, then aggregate them (a rough sketch of that aggregation step is below). If I had to choose just one I'd probably still go with (1) though.
edit: oh there's also the "defer to AI experts" argument. I mostly try not to think about deference-based arguments because thinking on the object-level is more productive, though I think if I were really trying to make an all-things-considered timelines distribution there's some chance I would adjust to longer due to deference arguments (but also some chance I'd adjust toward shorter, given that lots of people who have thought deeply about AGI / are close to the action have short timelines).
There are also "the base rate of super crazy things happening is low"-style arguments, which I don't give much weight to.
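To make the "aggregate" step concrete, here's a minimal sketch of combining forecasts from several methods as a weighted mixture. Everything in it (the method names, distributions, parameters, and weights) is made up purely for illustration, not my actual numbers:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for forecasts from three methods, each a lognormal
# over "years until AGI". The (mu, sigma) parameters and the weights below
# are invented purely for illustration.
methods = {
    "trend_extrapolation": (np.log(6), 0.6),
    "inside_view_gaps":    (np.log(10), 0.8),
    "reference_class":     (np.log(25), 1.0),
}
weights = [0.5, 0.3, 0.2]  # how much weight to give each method

n = 100_000
names = list(methods)
which = rng.choice(len(names), size=n, p=weights)
samples = np.concatenate(
    [rng.lognormal(*methods[names[i]], size=(which == i).sum()) for i in range(len(names))]
)

print("median years:", round(float(np.median(samples)), 1))
print("P(within 10 years):", round(float((samples <= 10).mean()), 2))
```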
For context, in a sibling comment Ryan said (and Steven agreed):
It sounds like your disagreement isn't with drawing a link from RE-bench to (forecasts for) automating research engineering, but is instead with thinking that you can get AGI shortly after automating research engineering due to AI R&D acceleration and already being pretty close. Is that right?
Note that the comment refers to automating research engineering, not research scientists.
Now responding on whether I think the no new paradigms assumption is needed:
(Obviously you’re entitled to argue / believe that we don’t need new AI paradigms and concepts to get to AGI! It’s a topic where I think reasonable people disagree. I’m just suggesting that it’s a necessary assumption for your argument to hang together, right?)
I generally haven't been thinking in these sorts of binary terms, but instead in terms more like: "Algorithmic progress is moving at pace X today; if we had automated research engineers it would be sped up to N*X." I'm not necessarily taking a stand on whether that progress will involve new paradigms or not, so I don't think it requires an assumption of no new paradigms.
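To make that framing concrete, here's a toy worked example; every number in it is invented purely to show the arithmetic, not a claim about actual rates or speedups:

```python
# Toy illustration of the "N*X" framing; all numbers are made up.
remaining_progress = 10.0   # arbitrary "units" of algorithmic progress still needed
baseline_rate_x = 1.0       # units of progress per year today (X)

for n in [1, 3, 10, 30]:    # hypothetical speedup factors from automated research engineers
    years = remaining_progress / (n * baseline_rate_x)
    print(f"N = {n:>2}: ~{years:.2f} years of remaining algorithmic progress")
```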
However:
I'd also add that:
Here's the structure of the argument I find most compelling (I call it the benchmarks + gaps argument); I'm uncertain about the details.
One reason I like this argument is that it will get much stronger over time as we get more difficult benchmarks and otherwise get more data about how quickly the gaps are being crossed.
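For intuition, here's a minimal sketch of the shape of this kind of extrapolation, with entirely made-up benchmark numbers and an invented gap estimate (not the actual argument or data):

```python
import numpy as np

# (1) Extrapolate when a benchmark saturates from a (fabricated) score trend;
# (2) add an assumed time to cross the remaining gaps to real-world automation.
years = np.array([2022.5, 2023.0, 2023.5, 2024.0, 2024.5])
scores = np.array([0.05, 0.10, 0.18, 0.33, 0.50])  # fraction of human-expert performance

# Fit a linear trend in log-odds so the extrapolation stays within [0, 1].
logit = np.log(scores / (1 - scores))
slope, intercept = np.polyfit(years, logit, 1)

saturation = 0.95  # score treated as "benchmark saturated"
t_saturate = (np.log(saturation / (1 - saturation)) - intercept) / slope

gap_years = 2.0  # guess at the benchmark-saturation -> real-world-automation gap

print(f"extrapolated saturation: {t_saturate:.1f}")
print(f"rough milestone estimate: {t_saturate + gap_years:.1f}")
```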
I have a longer draft which makes this argument but it's quite messy and incomplete and might not add much on top of the above summary for now. Unfortunately I'm prioritizing other workstreams over finishing this at the moment. DM me if you'd really like a link to the messy draft.
Thanks. I edited again to be more precise. Maybe I'm closer to the median than I thought.
(edit: unimportant clarification. I just realized "you all" may have made it sound like I thought every single person on the Lightcone team had a higher p(doom) than mine. I meant it as a generic y'all referring to the group, not a claim about the minimum p(doom) on the team.)
Yeah, I meant more on p(doom)/alignment difficulty than timelines; I'm not sure what you guys' timelines are. I'm roughly in the 35-55% ballpark for a misaligned takeover, and my impression is that you all are closer to, but not necessarily all the way at, the >90% Eliezer view. If that's also wrong I'll edit to correct.
edit: oh maybe my wording of "farther" in the original comment was specifically confusing and made it sound like I was talking about timelines. I will edit to clarify.
Appreciate the post. I've previously donated $600 through the EA Manifund thing and will consider donating again late this year / early next year when thinking through donations more broadly.
I've derived a lot of value from LW/AIAF content when it comes to thinking through AI futures (some non-exhaustive standouts: the 2021 MIRI conversations, List of Lethalities and Paul's response, the t-AGI framework, Without specific countermeasures..., Hero Licensing). It's unclear to me how much of that value would have been retained if LW didn't exist, but plausibly LW is responsible for a large fraction.
In a few ways I don't feel fully/spiritually aligned with the LW team and the rationalist community: my alignment difficulty/p(doom)[1] is farther from Eliezer's[2] than my perception of the median of the LW team[3] (though closer to Eliezer than most EAs), I haven't felt sucked in by most of Eliezer's writing, and I feel gut-level cynical about people's ability to deliberatively improve their rationality (edit: with large effect size) (I haven't spent a long time examining evidence to decide whether I really believe this).
But still, LW has probably made a large positive difference in my life, and I'm very thankful. I've also enjoyed Lighthaven, but I have to admit I'm not very observant or opinionated when it comes to conference venues (or web design, which is why I focused on LW's content).
Twitter AI (xAI), which seemingly had no prior history of strong AI engineering, with a small team and limited resources
Both of these seem false.
Re: talent, see from their website:
They don't list their team on their site, but I know their early team includes Igor Babuschkin (who has worked at OAI and DeepMind) and Christian Szegedy (who has 250k+ citations, including several foundational papers).
Re: resources, according to Elon's early-July tweet (ofc take Elon with a grain of salt), Grok 2 was trained on 24k H100s (approximately 3x the FLOP/s of GPT-4's training cluster, according to SemiAnalysis). And xAI was working on a 100k H100 cluster that was on track to be finished in July. They also raised $6B in May.
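As a rough sanity check on the ~3x figure (a back-of-envelope sketch: the chip specs are public, but the GPT-4 cluster size is a reported estimate rather than a confirmed number):

```python
# Back-of-envelope comparison of training-cluster FLOP/s; treat as illustrative.
a100_bf16_tflops = 312     # A100, dense BF16
h100_bf16_tflops = 989     # H100 SXM, dense BF16

gpt4_cluster = 25_000 * a100_bf16_tflops    # assumed ~25k A100s for GPT-4 (reported estimate)
grok2_cluster = 24_000 * h100_bf16_tflops   # 24k H100s per Elon's tweet

print(f"ratio ~= {grok2_cluster / gpt4_cluster:.1f}x")  # ~3x
```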
And internally, we have an anonymous RSP non-compliance reporting line so that any employee can raise concerns about issues like this without any fear of retaliation.
Are you able to elaborate on how this works? Are there any other public details about this? I couldn't find more via a quick search.
Some specific qs I'm curious about: (a) who handles the anonymous complaints, (b) what is the scope of behavior covered, explicitly (and implicitly, re: cultural norms), and (c) how are situations handled where a report would deanonymize the reporter (or narrow them down to a small number of people)?
Thanks for the response!
I also expect that if we did develop some neat new elicitation technique we thought would trigger yellow-line evals, we'd re-run them ahead of schedule.
[...]
I also think people might be reading much more confidence into the 30% than is warranted; my contribution to this process included substantial uncertainty about what yellow-lines we'd develop for the next round
Thanks for these clarifications. I didn't realize that the 30% was for the new yellow-line evals rather than the current ones.
Since triggering a yellow-line eval requires pausing until we have either safety and security mitigations or design a better yellow-line eval with a higher ceiling, doing so only risks the costs of pausing when we could have instead prepared mitigations or better evals
I'm having trouble parsing this sentence. What do you mean by "doing so only risks the costs of pausing when we could have instead prepared mitigations or better evals"? Doesn't pausing include focusing on mitigations and evals?
This is clarifying for me, appreciate it. If I believed (a) that we needed a paradigm shift, like the shift to LLMs, in order to get AI systems that produce a substantial AI R&D speedup, and (b) that trend extrapolation from benchmark data would not be informative for predicting these paradigm shifts, then I would agree that the benchmarks + gaps method is not particularly informative.
Do you think that's a fair summary of (this particular set of) necessary conditions?
(edit: didn't see @Daniel Kokotajlo's new comment before mine. I agree with him regarding disagreeing with both sub-claims but I think I have a sense of where you're coming from.)