In terms of explicit claims:
"So one extreme side of the spectrum is build things as fast as possible, release things as much as possible, maximize technological progress [...].
The other extreme position, which I also have some sympathy for, despite it being the absolutely opposite position, is you know, Oh my god this stuff is really scary.
The most extreme version of it was, you know, we should just pause, we should just stop, we should just stop building the technology for, indefinitely, or for some specified period of time. [...] And you know, that extre...
This example is not a claim by ARC though; it seems important to keep track of this in a discussion of what ARC did or didn't claim, even though the fact that others make such claims is also relevant.
Thanks for the kind feedback! Any suggestions for a more interesting title?
Apologies for the 404 on the page, it's an annoying cache bug. Try hard-refreshing your browser page (CMD + Shift + R) and it should work.
The "1000" instead of "10000" was a typo in the summary.
In the transcript Connor states "SLT over the last 10000 years, yes, and I think you could claim the same over the last 150". Fixed now, thanks for flagging!
Which one? All of them seem to be working for me.
Pessimism of the intellect, optimism of the will.
Calibration of the intellect, optimism of the will.
People from OpenPhil, FTX FF and MIRI were not interested in discussing at the time. We also talked with MIRI about moderating, but it didn't work out in the end.
People from Anthropic told us their organization is very strict on public communications, and very wary of PR risks, so they did not participate in the end.
In the post I overgeneralized to avoid going into full detail.
Yes, some people mentioned it was confusing to have two posts (I had originally posted the Summary and Transcript separately because of their length), so I merged them into one and added headers pointing to the Summary and Transcript for easier navigation.
Thanks, I was looking for a way to do that but didn't know the space in italics hack!
Another formatting question: how do I make headers and sections collapsible? It would be great to have the "Summary" and "Transcript" sections as collapsible, considering how long the post is.
Thanks, fixed them!
I really don't think that AI Dungeon was the source of this idea (why do you think that?)
We've heard the story from a variety of sources all pointing to AI Dungeon, and to the fact that the idea was kept from spreading for a significant amount of time. This @gwern Reddit comment, and previous ones in the thread, cover the story well.
...And even granting the claim about chain of thought, I disagree about where current progress is coming from. What exactly is the significant capability increase from fine-tuning models to do chain of thought? This isn't part of
We'd maybe be at our current capability level in 2018, [...] the world would have had more time to respond to the looming risk, and we would have done more good safety research.
It’s pretty hard to predict the outcome of “raising awareness of problem X” ahead of time. While it might be net good right now because we’re in a pretty bad spot, we have plenty of examples from the past where greater awareness of AI risk has arguably led to strongly negative outcomes down the line, due to people channeling their interest in the problem into somehow pushing capabilities even faster and harder.
My view is that progress probably switched from being net positive to net negative (in expectation) sometime around GPT-3.
We fully agree on this, and so it seems like we don’t have large disagreements on externalities of progress. From our point of view, the cutoff point was probably GPT-2 rather than 3, or some similar event that established the current paradigm as the dominant one.
Regarding the rest of your comment and your other comment here, here are some reasons why we disagree. It’s mostly high level, as it would take a lot of detailed discussion int...
1. Fully agree and we appreciate you stating that.
2. While we are concerned about capability externalities from safety work (that's why we have an infohazard policy), what we are most concerned about, and what we cover in this post, is deliberate capabilities acceleration justified as being helpful to alignment. Or, to put this in reverse: using the notion that working on systems that are closer to being dangerous might be more fruitful for safety work to justify actively pushing the capabilities frontier, and thus accelerating the arrival of the dangers t...
Good point, and I agree progress has been slower in robotics compared to the other areas.
I just edited the post to add better examples (DayDreamer, VideoDex and RT-1) of recent robotics advances that are much more impressive than the only one originally cited (Boston Dynamics), thanks to Alexander Kruel who suggested them on Twitter.
Was Dario Amodei not the former head of OpenAI’s safety team?
He wrote "Concrete Problems in AI Safety".
I don't see how the claim isn't just true/accurate.
If someone reads "Person X is Head of Safety", they wouldn't assume that the person led the main AI capabilities efforts of the company for the last 2 years.
Only saying "head of the safety team" implies that this was his primary activity at OpenAI, which is just factually wrong.
According to his LinkedIn, from 2018 until end of 2020, when he left, he was Director of Research and then VP of Research o...
...Your graph shows "a small increase" that represents progress equal to an advance of a third to a half of the time left until catastrophe on the default trajectory. That's not small! That's as much progress as everyone else combined achieves in a third of the time till catastrophic models! It feels like you'd have to figure out some new, more efficient training method that allows you to get GPT-3 levels of performance with GPT-2 levels of compute to have an effect that was plausibly that large.
In general I wish you would actually write down equations for your mod
...I think this is pretty false. There's no equivalent to Let's think about slowing down AI, or a tag like Restrain AI Development (both of which are advocating an even stronger claim than just "caution") -- there are a few paragraphs in Paul's post, one short comment by me, and one short post by Kaj. I'd say that hardly any optimization has gone into arguments to AI safety researchers for advancing capabilities.
[...]
(I agree in the wider world there's a lot more optimization for arguments in favor of capabilities progress that people in general would fin
Anthropic’s founding team consists of, specifically, people who formerly led safety and policy efforts at OpenAI
This claim seems misleading at best: Dario, Anthropic's founder and CEO, led OpenAI's work on GPT-2 and GPT-3, two crucial milestones in terms of public AI capabilities.
Given that I don't have much time to evaluate each claim one by one, and given Gell-Mann amnesia, I am a bit more skeptical of the other claims.
Thanks! Do you still think the "No AIs improving other AIs" criterion is too onerous after reading the policy enforcing it in Phase 0?
In that policy, we developed the definition of "found systems" so that this measure applies only to AI systems found via mathematical optimization, rather than to AIs (or any other code) written by humans.
This reduces the cost of the policy significantly, as it applies only to a very small subset of all AI activities, and leaves most innocuous software untouched.