https://www.elilifland.com/. You can give me anonymous feedback here. I often change my mind and don't necessarily endorse past writings.
Yeah I meant more on p(doom)/alignment difficulty than timelines, I'm not sure what your guys' timelines are. I'm roughly in the 35-55% ballpark for a misaligned takeover, and my impression is that you all are closer to but not necessarily all the way at the >90% Eliezer view. If that's also wrong I'll edit to correct.
edit: oh maybe my wording of "farther" in the original comment was specifically confusing and made it sound like I was talking about timelines. I will edit to clarify.
Appreciate the post. I've previously donated $600 through the EA Manifund thing and will consider donating again late this year / early next year when thinking through donations more broadly.
I've derived lots of value with regards to thinking through AI futures from LW/AIAF content (some non-exhaustive standouts: 2021 MIRI conversations, List of Lethalities and Paul response, t-AGI framework, Without specific countermeasures..., Hero Licensing). It's unclear to me how much of the value would have been retained if LW didn't exist, but plausibly LW is responsible for a large fraction.
In a few ways I feel not fully/spiritually aligned with the LW team and the rationalist community: my alignment difficulty/p(doom()[1] is farther from Eliezer's[2] than my perception of the median of the LW team[3] (though closer to Eliezer than most EAs), I haven't felt sucked in by most of Eliezer's writing, and I feel gut level cynical about people's ability to deliberatively improve their rationality (edit: with large effect size) (I haven't spent a long time examining evidence to decide whether I really believe this).
But still LW has probably made a large positive difference in my life, and I'm very thankful. I've also enjoyed Lighthaven, but I have to admit I'm not very observant and opinionated on conference venues (or web design, which is why I focused on LW's content).
Twitter AI (xAI), which seemingly had no prior history of strong AI engineering, with a small team and limited resources
Both of these seem false.
Re: talent, see from their website:
They don't list their team on their site, but I know their early team includes Igor Babuschkin who has worked at OAI and DeepMind, and Christian Szegedy who has 250k+ citations including several foundational papers.
Re: resources, according to Elon's early July tweet (ofc take Elon with a grain of salt) Grok 2 was trained on 24k H100s (approximately 3x the FLOP/s of GPT-4, according to SemiAnalysis). And xAI was working on a 100k H100 cluster that was on track to be finished in July. Also they raised $6B in May.
And internally, we have an anonymous RSP non-compliance reporting line so that any employee can raise concerns about issues like this without any fear of retaliation.
Are you able to elaborate on how this works? Are there any other details about this publicly, couldn't find more detail via a quick search.
Some specific qs I'm curious about: (a) who handles the anonymous complaints, (b) what is the scope of behavior explicitly (and implicitly re: cultural norms) covered here, (c) handling situations where a report would deanonymize the reporter (or limit them to a small number of people)?
Thanks for the response!
I also expect that if we did develop some neat new elicitation technique we thought would trigger yellow-line evals, we'd re-run them ahead of schedule.
[...]
I also think people might be reading much more confidence into the 30% than is warranted; my contribution to this process included substantial uncertainty about what yellow-lines we'd develop for the next round
Thanks for these clarifications. I didn't realize that the 30% was for the new yellow-line evals rather than the current ones.
Since triggering a yellow-line eval requires pausing until we have either safety and security mitigations or design a better yellow-line eval with a higher ceiling, doing so only risks the costs of pausing when we could have instead prepared mitigations or better evals
I'm having trouble parsing this sentence. What you mean by "doing so only risks the costs of pausing when we could have instead prepared mitigations or better evals"? Doesn't pausing include focusing on mitigations and evals?
From the RSP Evals report:
As a rough attempt at quantifying the elicitation gap, teams informally estimated that, given an additional three months of elicitation improvements and no additional pretraining, there is a roughly 30% chance that the model passes our current ARA Yellow Line, a 30% chance it passes at least one of our CBRN Yellow Lines, and a 5% chance it crosses cyber Yellow Lines. That said, we are currently iterating on our threat models and Yellow Lines so these exact thresholds are likely to change the next time we update our Responsible Scaling Policy.
What's the minimum X% that could replace 30% and would be treated the same as passing the yellow line immediately, if any? If you think that there's an X% chance that with 3 more months of elicitation, a yellow line will be crossed, what's the decision-making process for determining whether you should treat it as already being crossed?
In the RSP it says "It is important that we are evaluating models with close to our best capabilities elicitation techniques, to avoid underestimating the capabilities it would be possible for a malicious actor to elicit if the model were stolen" so it seems like folding in some forecasted elicited capabilities into the current evaluation would be reasonable (though they should definitely be discounted the further out they are).
(I'm not particularly concerned about catastrophic risk from the Claude 3 model family, but I am interested in the general policy here and the reasoning behind it)
The word "overconfident" seems overloaded. Here are some things I think that people sometimes mean when they say someone is overconfident:
How much does this overloading matter? I'm not sure, but one worry is that it allows people to score cheap rhetorical points by claiming someone else is overconfident when in practice they might mean something like "your probability distribution is wrong in some way". Beware of accusing someone of overconfidence without being more specific about what you mean.
I think 356 or more people in the population needed to make there be a >5% of 2+ deaths in a 2 month span from that population
[cross-posting from blog]
I made a spreadsheet for forecasting the 10th/50th/90th percentile for how you think GPT-4.5 will do on various benchmarks (given 6 months after the release to allow for actually being applied to the benchmark, and post-training enhancements). Copy it here to register your forecasts.
If you’d prefer, you could also use it to predict for GPT-5, or for the state-of-the-art at a certain time e.g. end of 2024 (my predictions would be pretty similar for GPT-4.5, and end of 2024).
You can see my forecasts made with ~2 hours of total effort on Feb 17 in this sheet; I won’t describe them further here in order to avoid anchoring.
There might be a similar tournament on Metaculus soon, but not sure on the timeline for that (and spreadsheet might be lower friction). If someone wants to take the time to make a form for predicting, tracking and resolving the forecasts, be my guest and I’ll link it here.
Thanks. I edited again to be more precise. Maybe I'm closer to the median than I thought.
(edit: unimportant clarification. I just realized "you all" may have made it sound like I thought every single person on the Lightcone team was higher than my p(doom). I meant it to be more like a generic y'all to represent the group, not a claim about the minimum p(doom) of the team)