
I, along with several other AI Impacts researchers, recently talked to Rohin Shah about why he is relatively optimistic about AI systems being developed safely. Rohin Shah is a 5th-year PhD student at the Center for Human-Compatible AI (CHAI) at Berkeley, and a prominent member of the Effective Altruism community.
Rohin reported an unusually high (90%) credence that AI systems will be safe without additional intervention. His optimism was largely based on his belief that AI development will be relatively gradual and that AI researchers will correct safety issues as they come up.
He reported two other beliefs that I found unusual: he thinks that as AI systems get more powerful, they will actually become more interpretable, because they will use features that humans also tend to use. He also said that intuitions from AI/ML make him skeptical of claims that evolution baked a lot into the human brain, and he thinks there's a ~50% chance that we will get AGI within two decades via a broad training process that mimics the way human babies learn.
A full transcript of our conversation, lightly edited for concision and clarity, can be found here.
By Asya Bergal