My conversation on policy and AI with Richard was over a year ago, so for Daniel Kokotajlo's LW writing day I thought I’d write down the new thoughts I’ve had since then (warning: it's pretty rambly). I've structured the post around what I did that led to my updated thoughts.
1) I read many posts on strategy
Paul’s post What Failure Looks Like, which lays out how the abstract technical problems turn into practical catastrophes, is something I’ve thought about a lot. It is a central example of what (I think) strategy work looks like: the strategic considerations are deeply tied up with the specifics of the technology.
Eliezer’s dialogue on Security Mindset and Scott Garrabrant’s post about how Optimization Amplifies have been key in my thinking about alignment and strategy, best summarised by Scott’s line “I am not just saying that adversarial optimization makes small probabilities of failure large. I am saying that in general any optimization at all messes with small probabilities and errors drastically.”
(I tried to re-state some of my own understandings of Paul’s, Eliezer’s and Scott’s posts when I talked about ML transparency in ‘Useful Doesn’t Mean Secure’. I’m glad I wrote it; it was helpful for sorting out my own thoughts, though I expect it isn’t as helpful for other people.)
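To make Scott’s line concrete for myself, here’s a toy sketch (my own construction, not taken from his post) of how optimizing against a proxy objective amplifies a tiny error rate: the proxy below agrees with the true objective on 99.9% of inputs, so a non-optimizing process almost never notices the mismatch, but a process that picks the proxy-best of many candidates lands in the error region nearly every time.

```python
import random

random.seed(0)

def true_utility(x):
    return x                      # what we actually want

def proxy_utility(x):
    # Agrees with the true utility except on a rare region (x > 0.999),
    # where it wildly overvalues the outcome -- a 0.1% error.
    return x + (100.0 if x > 0.999 else 0.0)

def in_error_region(x):
    return proxy_utility(x) != true_utility(x)

def sample():
    return random.random()        # a "random action" in [0, 1)

# Non-optimizing process: take a single random action.
random_actions = [sample() for _ in range(1000)]
error_rate_random = sum(map(in_error_region, random_actions)) / len(random_actions)

# Optimizing process: search many candidates, keep the one the proxy rates best.
def optimize(n_candidates=10_000):
    return max((sample() for _ in range(n_candidates)), key=proxy_utility)

optimized_actions = [optimize() for _ in range(100)]
error_rate_optimized = sum(map(in_error_region, optimized_actions)) / len(optimized_actions)

print(f"random actions hitting the proxy's error region:    {error_rate_random:.1%}")    # ~0.1%
print(f"optimized actions hitting the proxy's error region: {error_rate_optimized:.1%}")  # ~100%
```

The disagreement region is negligible under random sampling, but it is precisely where the optimizer ends up: the harder the search, the more the outcome is drawn from the places where the proxy breaks.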
Eliezer wrote There's No Fire Alarm for Artificial General Intelligence, which has been key for how I think about timelines (as I said two years ago). I wrote down some of my worries about the discussion of timelines in an off-topic comment on the recent CFAR AMA, talking about how the broad x-risk network is acting like a herd stampeding away from a fear regarding AI risk, and the problems with that. I’ll clean that up and turn it into a post sometime; I managed to say some things there more clearly than I’d been able to think them before.
Paul’s post Arguments about fast takeoff, and the linked post about hyperbolic growth, were key in my understanding of takeoff speeds, and helped me understand the gears there much better.
I’ve increasingly been thinking a lot about secrecy. I think a lot of people’s actions regarding keeping secrets have been much more damaging than they realised (the herd stampede above regarding timelines is but one example), and work needs to be done to square the necessity of secrecy with the fact that the public record has been necessary for scientific and intellectual progress to get humanity to where it is, and if we want to use that ethereal power to secure humanity against novel technologies, we’re probably going to continue to need a public record of ideas and insights. I’ve not found an opportunity to get my thoughts written down on it yet, though I said a few preliminary things in my review of Scott Garrabrant’s post on the Chatham House Rules, where commenter jbash also had some solid comments.
In MIRI’s post 2018 Update: Our New Research Directions, Nate talks about the idea of ‘deconfusion’, and says it’s something like “making it so that you can think about a given topic without continuously accidentally spouting nonsense”. He goes on to give examples of how he used to be confused about a topic, and as soon as he started to ask questions or make statements he immediately said insane things. For example, when talking about infinity as a kid, he used to ask “How can 8 plus infinity still be infinity? What happens if we subtract infinity from both sides of the equation?”, questions which turn out to make no sense. And when talking about AI he used to say things like “isn’t intelligence an incoherent concept,” “but the economy’s already superintelligent,” “if a superhuman AI is smart enough that it could kill us, it’ll also be smart enough to see that that isn’t what the good thing to do is, so we’ll be fine,” “we’re Turing-complete, so it’s impossible to have something dangerously smarter than us, because Turing-complete computations can emulate anything,” and “anyhow, we could just unplug it.”
I often like to say Nick Bostrom is the person who’s able to get deconfused on the highest strategic level. He’s able to just say one sentence at a time, and it’s so true that the rest of us rearrange our entire lives around it. “Hmm… I think if we go extinct, then that means we’re never able to come back from our mistakes, but as long as we never go extinct, then someday we'll manage to do all the important things.” And then we all say “Wow, let us now attempt to redirect humanity around ensuring we never go extinct.” And then he waits a few years, and says another sentence. And then a few years later, he says another. Bostrom’s strategy papers have been primitive building blocks of my thinking in this area. The unilateralist’s curse, the vulnerable world hypothesis, existential risk, information hazards, and so on. They creep into most of my thoughts, often without me noticing.
(The above sentence was not an actual quote; it was oversimplified. An actual one-sentence quote might be "the lesson for utilitarians is not that we ought to maximize the pace of technological development, but rather that we ought to maximize its safety, i.e. the probability that colonization will eventually occur.")
On policy specifically, there were two more FHI papers I read. I thought Bostrom’s policy paper was really interesting, applying many of the ideas from Superintelligence to policy. This felt like real conceptual progress, applying an understanding of advanced technology to policy, and it was the paper that actually got me to feel excited about policy research. I summarised it in a post.
And I read Dafoe’s research agenda. I quoted the bits I liked in the comments of this post. Overall, though, the research agenda just asked a lot of questions; I didn't get much from reading it.
2) I helped evaluate a grant in AI policy
I spent around 15-25 hours evaluating and writing down my thoughts about the Jess Whittlestone/CSER grant with Oliver for the LTFF. My thoughts went into Oli's public writeup analysing her writing, plus a long addendum on a Nature article by the director of CSER, Seán ÓhÉigeartaigh, coauthored with the director of Leverhulme CFI, Stephen Cave. I was fairly proud of the two write-ups; I said things there more clearly than I’d previously been able to think them.
My main updates from all that:
3) I had some discussions with ML researchers.
I once had a conversation with an ML researcher I respected, and found I was tripping over myself to not outright call their work net-negative (they were very excited about their work). I thought it would feel really uncomfortable to believe that of your own work, so I expected that if I were to say I thought their work was net-negative, they would feel attacked and have no line of retreat other than to decide I must be wrong.
Recently, Oli pointed out to me that Machine Learning as a field is in a bad position to decide that Machine Learning research is net negative, and from a basic incentive model you should predict that they will never believe this.
This suggests that the work of pointing out problems should be done by other groups who are not in an impossible incentive setup, perhaps economists or government regulators, and that the AI researchers themselves should primarily do only the work of solving those problems. I think a lot of the basic ideas about what paths would lead AI to be catastrophic are well-phrased in terms of economics, and I’m more excited about the idea of an economics department analysing AI and risks from that field. There are many great independent centres in academia like CHAI, FHI, and GMU, and it’d be interesting to figure out how to build a small centre in an econ department, analogous to CHAI, built around analysing risk from AI.
4) I listened to a podcast and read some blogposts by science-minded people senior in government.
I wrote about listening to Tom Kalil’s 80k podcast in my post A Key Power of the President is to Coordinate the Execution of Existing Concrete Plans. This substantially raised my estimate of the tractability of getting things done within governments on timescales of 4-8 years.
My general thinking is that we’re mostly confused about AI - both how to conceptualise it, and what is likely to happen - and these feel like fundamental questions that need answering before I can say what to do about it. I think almost nobody is making real progress on that front. Kalil’s podcast fit in with this, where he talked about how he can do a lot of awesome sh*t with the president, but when people haven’t got a very concrete policy proposal, he can’t do anything. You can’t tell him to ‘care more’ about AI safety. You need to also know what to do about it. Like the above, 'stopping autonomous drones' isn't a primitive action, and figuring out what the right action is will be most of the work.
While Kalil made me update negatively on the utility of interacting with government in the near to medium term, Cummings obviously suggests that rationalists should think really freaking hard about what could be useful to do in the next 5 years.
(Note: I wrote this post a few months ago, before the most recent UK election, and am just polishing it up to publish today for blogpost writing day. I hadn't thought much about Cummings at the time, and I've left it as-is for now, so this doesn't say much really.)
While I think that novel research is quite rarely done inside government, and that humanity’s scientific and intellectual exploration is likely key for us dealing well with AI and other technologies (which is why I primarily work on LessWrong, where I think we have a shot at really kickstarting intellectual thought on such topics), I still tried to think more about useful things that can be done in government today.
I spent an hour or two; here are my first thoughts.