All of Fabian Schimpf's Comments + Replies

Thank you for compiling this list. This is useful, and I expect to point people to it in the future. The best thing, IMO, is that it is not verbose and not dripping with personal takes on the problem; I would like to see more compilations like this on other topics to give people a leg up when they aspire to venture into a new field.

A potential addition is Dan Hendrycks's PAIS agenda, in which he advocates for ML research that promotes alignment without also causing advances in capabilities. This effectively also slows AI (capabilities) development, and I am quite partial to this idea.

Zach Stein-Perlman
Yay. Many other collections / reading lists exist, and I'm aware of many public and private ones in AI strategy, so feel free to DM me strategy/governance/forecasting topics you'd want to see collections on. I haven't updated this post much since April but I'll update it soon and plan to add PAIS, thanks.
Answer by Fabian Schimpf

Like Jaques, I also use Focusmate to get up and running in the morning. I try to use the 10 minutes between sessions to get ready. This timebox is helpful to me because I would otherwise drag it out. During these sessions, I find it beneficial to do the same things every morning. This way, I can go on autopilot until I am fully awake and don't have to will myself into doing stuff.

I think this threshold will be tough to set. IMO, confidence in a decision only really makes sense if you consider decisions to be uni-modal, and I would argue that this is rarely the case for a sufficiently capable system (like you and me). We are constantly trading off multiple options, and thus the confidence (e.g., as measured by the log-likelihood of the action given a policy and state) depends on the number of options available. I expect this context dependence would be a tough nut to crack when setting a meaningful threshold.
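To make that concrete, here is a minimal sketch of the context dependence I have in mind, assuming a softmax policy over the available actions (the numbers and the threshold are purely illustrative, not from any particular system):

```python
# Toy illustration: the log-likelihood of the chosen action shrinks as more
# options become available, even if the agent's preference for that action
# (its logit) is unchanged.
import numpy as np

def action_log_likelihood(num_options: int, preferred_logit: float = 2.0) -> float:
    """Log-likelihood of the preferred action under a softmax policy where
    all other options share the same lower logit of zero."""
    logits = np.zeros(num_options)
    logits[0] = preferred_logit  # the action the agent actually takes
    probs = np.exp(logits) / np.exp(logits).sum()
    return float(np.log(probs[0]))

for n in [2, 5, 20, 100]:
    print(f"{n:>3} options -> log-likelihood of chosen action: {action_log_likelihood(n):.2f}")

# A fixed confidence threshold (say, log-likelihood > -0.5) would therefore
# behave very differently in a binary choice than in a state with 100 options.
```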

If there were other warning shots in addition to this one, that's even worse! We're already playing in Easy Mode here.


Playing devil's advocate: if the government isn't aware that the game is on, it doesn't matter whether it's on easy mode; performance is likely to be poor regardless of the game's difficulty.

I agree with the post's sentiment that warning shots would currently not do much good. But I am, as of now, still somewhat hopeful that the bottleneck is getting the government to see and target a problem, not the government's ability to act on an identified issue.  

I agree; that seems to be a significant risk. If we are lucky enough to get AI warning shots, it seems prudent to think about how to ensure they are recognized for what they are. This is a problem that I haven't given much thought to before.
 

But I find it encouraging to think that we can use warning shots in other fields to understand the dynamics of how such events are interpreted. As of now, I don't think AI warning shots would change much, but I would add this potential for learning as a counter-argument. I think this see... (read more)

Hi Ben, I like the idea; however, almost every decision has conflicting outcomes, e.g., regarding opportunity cost. From how I understand you, this would delegate almost every decision to humans if you take the premise "I can't do X if I choose to do Y" seriously. The application to high-impact interference therefore seems promising if the system is limited to only deciding on a few things. The question then becomes whether a human can understand the plan that an AGI is capable of making. IMO this ties nicely into, e.g., ELK and interpretability research, but also the problem of predictability.

Ben Smith
Then the next thing I want to suggest is that the system uses human resolution of conflicting outcomes to train itself to predict how a human would resolve a conflict, and if its confidence in that prediction is higher than a suitable level, it will go ahead and act without human intervention. But any prediction of how a human would resolve a conflict could be second-guessed by a human pointing out where the prediction is wrong. Agreed that whether a human can understand the plan (and all the relevant outcomes; which outcomes are relevant?) is important and harder than I first imagined.
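A hedged sketch of that loop, with placeholder names and synthetic data (none of this is an existing system; choosing the threshold and the conflict features is exactly the hard part discussed above):

```python
# A model trained on past human resolutions acts on its own when confident
# enough, and defers to a human otherwise.
from sklearn.linear_model import LogisticRegression
import numpy as np

rng = np.random.default_rng(0)

# Toy history: features describing a conflict between outcomes, plus how a
# human resolved each one (labels are synthetic here).
X_history = rng.normal(size=(200, 4))
y_history = (X_history[:, 0] + 0.5 * X_history[:, 1] > 0).astype(int)

predictor = LogisticRegression().fit(X_history, y_history)

CONFIDENCE_THRESHOLD = 0.9  # illustrative; picking this is the hard part

def resolve_conflict(conflict_features, ask_human):
    """Act autonomously if the predicted human resolution is confident enough,
    otherwise defer to an actual human."""
    probs = predictor.predict_proba(conflict_features.reshape(1, -1))[0]
    if probs.max() >= CONFIDENCE_THRESHOLD:
        return int(probs.argmax())        # act without human intervention
    return ask_human(conflict_features)   # defer; the answer could also be added to the training data

# Usage with a stand-in "human" that always picks option 1:
print(resolve_conflict(rng.normal(size=4), ask_human=lambda features: 1))
```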

The reaction seems consistent if people (in government) believe no warning shot was fired. AFAIK the official reading is that we experienced a zoonosis, so banning gain-of-function research would go against that narrative. It seems true to me that this should be seen as a warning shot, but smallpox and Ebola could have prompted this discussion as well and also failed to be seen as warning shots.

Rob Bensinger
Governments are also largely neglecting vaccine tech/pipeline investments, which protect against zoonotic viruses, not just engineered ones. But also, the conceptual gap between 'a virus that was maybe a lab leak, maybe not' and 'a virus that was a lab leak' is much smaller than the gap between the sort of AI systems we're likely to get a 'warning shot' from (if the warning shot is early enough to matter) and misaligned superintelligent squiggle maximizers. So if the government can't make the conceptual leap in the easy case, it's even less likely to make it in the hard case. If there were other warning shots in addition to this one, that's even worse! We're already playing in Easy Mode here.
habryka
My guess is in the case of AI warning shots there will also be some other alternative explanations like "Oh, the problem was just that this company's CEO was evil, nothing more general about AI systems".

Excellent summary; I had been looking for something like this! Is there a reason you didn't include the AI Safety Camp in Training & Mentoring Programs?

I like your point that "surprises cut both ways" and assume that this is why your timelines aren't affected by the possibility of surprises; is that about right? I am confused about the ~zero effect, though: isn't double descent basically what we see with giant language models lately? Disclaimer: I don't work on LLMs myself, so my confusion isn't necessarily meaningful.

Rohin Shah
My timelines are affected by the possibility of surprises; it makes them wider on both ends. My impression is that giant language models are not trained to the interpolation point (though I haven't been keeping up with the literature for the last year or so). I believe the graphs in that post were created specifically to demonstrate that if you did train them past the interpolation point, then you would see double descent.
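For what it's worth, the interpolation-point picture can be reproduced in a toy setting; the following is my own illustrative sketch using ridgeless random-feature regression, not the setup from the post being discussed:

```python
# Toy double-descent demo: fit minimum-norm least squares on random ReLU
# features of increasing width and watch the test error as the model crosses
# the interpolation point (n_features == n_train).
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, d = 40, 2000, 10
w_true = rng.normal(size=d)

X_train = rng.normal(size=(n_train, d))
y_train = X_train @ w_true + 0.1 * rng.normal(size=n_train)
X_test = rng.normal(size=(n_test, d))
y_test = X_test @ w_true + 0.1 * rng.normal(size=n_test)

def relu_features(X, W):
    return np.maximum(X @ W, 0.0)  # random ReLU features

for n_features in [5, 10, 20, 40, 45, 80, 200, 800]:
    W = rng.normal(size=(d, n_features)) / np.sqrt(d)
    Phi_train, Phi_test = relu_features(X_train, W), relu_features(X_test, W)
    # Minimum-norm least squares ("ridgeless"); once n_features >= n_train the
    # model interpolates the training data exactly.
    coef = np.linalg.pinv(Phi_train) @ y_train
    test_mse = np.mean((Phi_test @ coef - y_test) ** 2)
    print(f"{n_features:>4} features: test MSE = {test_mse:.3f}")
```

In this kind of setup the test error typically spikes near the interpolation threshold and then falls again as the feature count grows, which is the double-descent shape referred to above.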

Starting more restrictive seems sensible; this could be, as you say, learned away, or one could use human feedback to sign off on high-impact actions. The first problem reminds me of finding regions of attraction in nonlinear control, where the ROA is explored without leaving the stable region. The second approach seems to hinge on humans being able to understand the implications of high-impact actions and the consequences of a baseline like inaction. There are probably also other alternatives that we have not yet considered.
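To make the control analogy concrete, here is a hedged numerical sketch (a toy example of my own, not a method from the post): probing the region of attraction of a damped pendulum's stable equilibrium by simulating from sampled initial states and keeping only those that return to it.

```python
# Estimate which initial states of a damped pendulum converge back to the
# origin (the stable equilibrium), i.e., lie in its region of attraction.
import numpy as np

def simulate(theta0, omega0, dt=0.01, steps=3000, damping=0.5):
    """Crude forward-Euler rollout of a damped pendulum."""
    theta, omega = theta0, omega0
    for _ in range(steps):
        dtheta = omega
        domega = -np.sin(theta) - damping * omega
        theta += dt * dtheta
        omega += dt * domega
    return theta, omega

def converges_to_origin(theta0, omega0, tol=0.1):
    # Did the trajectory settle at the origin rather than at theta = +-2*pi, ...?
    theta_f, omega_f = simulate(theta0, omega0)
    return abs(theta_f) < tol and abs(omega_f) < tol

grid = [(t, w) for t in np.linspace(-3, 3, 13) for w in np.linspace(-3, 3, 13)]
inside = [s for s in grid if converges_to_origin(*s)]
print(f"{len(inside)}/{len(grid)} sampled initial states lie in the estimated ROA of the origin")
```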

To me, the relevant result/trend is that catastrophic forgetting seems to be less of an issue than it was maybe two to three years ago, e.g., in meta-learning, and that we can squeeze these diverse skills into a single model. Sure, the results seem to indicate that individual systems for different tasks would still be the way to go for now, but at least the published version was not trained with the same magnitude of compute that was, e.g., used on the latest and greatest LLMs (I take this from Lennart Heim, who did the math on this). So it is I... (read more)
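As a hedged toy illustration of what catastrophic forgetting means here (my own sketch, unrelated to the paper's actual training setup): sequentially fitting a single model to two tasks with plain gradient descent already shows the effect.

```python
# A single linear model trained on task A, then on task B, loses most of its
# fit on task A; that degradation is the "forgetting" being discussed.
import numpy as np

rng = np.random.default_rng(0)
d = 20

def make_task(seed):
    task_rng = np.random.default_rng(seed)
    w = task_rng.normal(size=d)
    X = task_rng.normal(size=(500, d))
    return X, X @ w

def train(w, X, y, lr=0.1, epochs=300):
    for _ in range(epochs):                # full-batch gradient descent on MSE
        grad = X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

def mse(w, X, y):
    return float(np.mean((X @ w - y) ** 2))

(XA, yA), (XB, yB) = make_task(1), make_task(2)
w = train(np.zeros(d), XA, yA)
print("Task A error after training on A:", round(mse(w, XA, yA), 4))
w = train(w, XB, yB)
print("Task A error after training on B:", round(mse(w, XA, yA), 4))  # much worse: task A is 'forgotten'
```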

On surprises:

  • I definitely agree that your timelines should take into account "maybe there will be a surprise".
  • "There can be surprises" cuts both ways; you can also see e.g. a surprise slowdown of scaling results.
  • I also didn't expect double descent and grokking, but it's worth noting that afaict those have had ~zero effect on SOTA capabilities so far.
  • Regardless, the original question was about this particular result; this particular result was not surprising (given my very brief skim).

On catastrophic forgetting:

I agree that catastrophic forgetting is becomi... (read more)