Epistemic status: Written for Blog Post Day III. I don't get to talk to people "in the know" much, so maybe this post is obsolete in some way.
I think that at some point at least one AI project will face an important choice: deploy or scale up a powerful AI system, or hold back and do more AI safety research.
(Currently, AI projects face choices like this all the time, except they aren't important in the sense I mean it, because the AI isn't potentially capable of escaping and taking over large parts of the world, or doing something similarly bad.)
Moreover, I think that when this choice is made, most people in the relevant conversation will be insufficiently concerned/knowledgeable about AI risk. Perhaps they will think: "This new AI design is different from the classic models, so the classic worries don't arise." Or: "Fear not, I did [insert amateur safety strategy]."
I think it would be very valuable for these conversations to end with "OK, we'll throttle back our deployment strategy for a bit so we can study the risks more carefully," rather than with "Nah, we're probably fine, let's push ahead." This buys us time. Say it buys us a month. A month of extra time right after scary-powerful AI is created is worth a lot, because we'll have more serious smart people paying attention, and we'll have more evidence about what AI is like. I'd guess that a month of extra time in a situation like this would increase the total amount of quality-weighted AI safety and AI policy work by 10%. That's huge.
One way to prepare for these conversations is to raise awareness about AI risk and technical AI safety problems, so that it's more likely that more people in these conversations are more informed about the risks. I think this is great.
However, there's another way to prepare, which I think is tractable and currently neglected:
1. Identify some people who might be part of these conversations, and who already are sufficiently concerned/knowledgeable about AI risk.
2. Help them prepare for these conversations by giving them resources, training, and practice, as needed:
2a. Resources:
Perhaps it would be good to have an Official List of all the AI safety strategies, so that whatever rationale people give for why this AI is safe can be compared to the list. (See this prototype list.)
Perhaps it would be good to have an Official List of all the AI safety problems, so that whatever rationale people give for why this AI is safe can be compared to the list, e.g. "OK, so how does it solve outer alignment? What about mesa-optimizers? What about the malignity of the universal prior? I see here that your design involves X; according to the Official List, that puts it at risk of developing problems Y and Z..." (See this prototype list.)
Perhaps it would be good to have various important concepts and arguments re-written with an audience of skeptical and impatient AI researchers in mind, rather than the current audience of friends and LessWrong readers.
2b. Training & practice:
Maybe the person is shy, or bad at public speaking, or bad at keeping cool and avoiding fluster in high-stakes discussions. If so, some coaching and practice could go a long way. Maybe they have the opposite problems, frequently coming across as overconfident, arrogant, aggressive, or paranoid. If so, someone should tell them this and help them tone it down.
In general it might be good to do some role-play exercises or something, to prepare for these conversations. As an academic, I've seen plenty of mock-dissertation-defense sessions and mock-job-talk-question-sessions, which seem to help. And maybe there are ways to get even more realistic practice, e.g. by trying to convince your skeptical friends that their favorite AI design might kill them if it worked.
Note that most of part 2 can be done without having done part 1. This is important in case we don't know anyone who might be part of one of these conversations, which is true for many and perhaps most of us.
Why do I think this is tractable? Well, seems like the sort of thing that people producing AI safety research can do on the margin, just by thinking more about their audience and maybe recording their work (or other people's work) on some Official List. Moreover people who don't do (or even read) AI safety research can contribute to this, e.g. by reading the literature on how to practice for situations like this, and writing up the results.
Why do I think this is neglected? Well, maybe it isn't. In fact I'd bet that some people are already thinking along these lines. It's a pretty obvious idea. But just in case it is neglected, I figured I'd write this. Moreover, the Official Lists I mentioned don't exist, and I think they would if people were taking this idea seriously. Finally--and this more than anything else is what caused me to write this post--I've heard one or two people explicitly call this out as something that they don't think is an important use case for the alignment research they were doing. I disagreed with them, and here we are. If this is a bad idea, I'd love to know why.
This seems a bit like writing the bottom line first?
Like, AI fears in our community have come about because of particular arguments. If those arguments don't apply, I don't see why one should strongly assume that AI is to be feared, outside of having written the bottom line first.
It also seems kind of condescending to operate under the assumption that you know more about the AI system someone is creating than its creator does. You refer to their safety strategy as "amateur", but isn't there a chance that having created this system entitles them to a "professional" designation? A priori, I would expect that an outsider who knows nothing about the project at hand would be much more likely to qualify for the "amateur" designation.
This isn't obvious to me. One possibility is that there will be some system which is safe if used carefully, and having a decent technological lead gives you plenty of room to use it carefully, but if you delay your development too much, competing teams will catch up and you'll no longer have space to use it carefully. I think you have to learn more about the situation to know for sure whether a month of delay is a good thing.
People seem predisposed to form echo chambers of the likeminded. I don't think the rationalist or AI safety communities are exempt from this. (Even if the AI safety community has a lot of people with a high level of individual rationality--not obvious, see above note about writing the bottom line first--I don't think having a high level of individual rationality is super helpful for the echo chamber formation problem, since it's more of a sociological phenomenon.) So coming across as overconfident in one's knowledge may be a bigger risk.
Well said. I'm glad you spoke up. Yeah, I don't want people to rationalize their way into thinking AI should never be developed or released either. Currently I think people are much more likely to make the opposite error, but I agree both errors are worth watching out for.