I formerly did research for MIRI and for what's now the Center on Long-Term Risk; these days I make a living as an emotion coach and Substack writer.
Most of my content eventually becomes free, but with a paid subscription to my Substack you'll get it a week early and make it possible for me to write more.
That's certainly true. But at least for me it doesn't seem to be a very big factor, because when I reorient to explaining something to a person and then find it easier, it's very often also over text.
Man that's stressful. I hope you get to rest better soon, maybe just sleeping through one whole day like a hibernating bear. Making happy and content sleeping bear sounds. Which I guess bears don't make if they're hibernating. But I digress.
It's been cool to read many of the Inkhaven posts, so I'm happy that you've been organizing it!
Or is the opposite likely to happen - does the AI frequently fail to solve the customer's problem until the customer demands to speak to a human, and then you have to pay for the AI's and the human worker's time? And what's the chance that it gives wrong advice that the company is then held liable for?
Even one case of that might be quite costly if the AI promised the customer something very expensive, and companies are likely to be nervous about such risks. Or in the case of electronic medical records, what's the chance of the voice-to-text hallucinating words and potentially getting a person killed due to misdiagnosis? (I'm sure that human workers mishear things too, but I also expect that a jury will be much harsher on "we deployed an experimental system with a known tendency for hallucinations in our hospital" than on "our receptionist misheard".)
I'm a little confused about what's going on, since apparently the company's explicit goal is to defend against biorisk and make sure that biodefense capabilities keep up with AI developments. When I first saw this thread, my reaction was: "I'm not sure what exactly they'll do, but better biodefense is definitely something we need, so this sounds like good news and I'm glad that Hannu is working on this."
I do also feel that the risk of rogue AI makes it much more important to invest in biodefense! I'd very much like it if we had such a degree of automated defenses that the "rogue AI creates a new pandemic" threat vector was eliminated entirely. Of course there's the risk of the AI taking over those labs, but in the best case we'll also have deployed more narrow AI to identify and eliminate all cybersecurity vulnerabilities before then.
And I don't really see a way to defend against biothreats without doing something like this (which isn't to say one couldn't exist; I haven't thought about this extensively, so maybe there is something). After all, the human body wouldn't survive for very long without an active immune system.
Today, we are launching Red Queen Bio (http://redqueen.bio), an AI biosecurity company, with a $15M seed led by OpenAI. Biorisk grows exponentially with AI capabilities. Our mission is to scale biological defenses at the same rate.
Since 2016, I have been building HelixNano, a clinical stage biotech (and still my main gig), with Nikolai Eroshenko. Recently, HelixNano teamed up with OpenAI to push AI bio's limits. To our surprise, we saw models invent genuinely new wet lab methods (publication soon).
We got super excited. There was a path to superhuman drug designers. But we couldn't ignore the shadow of superhuman virus designers. A world with breakthrough AI drugs can't exist without new biological defenses. We spun out Red Queen Bio to build them.
AI biosecurity is a different game from traditional biodefense, which deals with relatively static threats and flat budgets. What do you do when the attack surface grows at the rate of AI progress, driven by trillions of dollars of compute?
Red Queen Bio's core thesis is **defensive co-scaling**. You have to couple defensive capabilities and funding to the same technological and financial forces that drive the AGI race; otherwise they can't keep up.
We work with frontier labs to map AI biothreats and pre-build medical countermeasures against them. For co-scaling to work, this needs to improve as models do, and scale with compute. So our pipeline is built upon the leading models themselves, lab automation and RL.
We also need *financial* co-scaling. Governments can't have exponentially scaling biodefense budgets. But they can create the right market incentives, as they have done for other safety-critical industries. We're engaging with policymakers on this both in the US and abroad.
RQB's work is driven by a civilizational need. But the economic incentives are ultimately on our side too. The capital behind what may be the biggest industrial transformation in human history is not going to tolerate unpriced tail risk on the scale of COVID or bigger.
We are committed to cracking the business model for AI biosecurity. We are borrowing from fields like catastrophic risk insurance, and working directly with the labs to figure out what scales. A successful solution can also serve as a blueprint for other AI risks beyond bio.
This is bigger than us. No company, AI lab, or government is going to solve defensive co-scaling alone. Accordingly, we are committed to open collaboration with them all. Red Queen Bio is a Public Benefit Corporation, with governance to ensure the mission takes precedence over any individual partnership.
In case it's not obvious, Red Queen Bio and defensive co-scaling are very much inspired by Vitalik Buterin's d/acc philosophy. We find it compelling, but differ in a couple of important ways.
First, we are skeptical that the d/acc approach of building purely defensive capabilities first is possible: in our view, they have to piggyback on general capabilities.
In contrast to d/acc, we also believe it's hard to maintain defender advantage through decentralization alone. For the sci-fi fans: writing DARKOME (a near-future biotech thriller) in part changed my mind on this!
But we heartily agree with Vitalik Buterin on the brightness and centrality of human kindness and agency.
In the face of fast AI timelines and the enormity of the stakes, it's easy to feel trapped in the AGI race dynamic. But the incentive structures driving it are not physical laws. They are no more real than others we can create.
By launching Red Queen Bio, we are choosing a different race. One where defense keeps up with offense and economics spurs safety.
The starting pistol has gone off. It's time to run together.
> the current university system coddles
No doubt true in many cases, but I would assume this to depend on exactly which country, university, degree etc. we were talking about?
I don't play shooters so found this a fascinating read
This seems like one of those "I'll never enjoy playing this but I'll love to read stories about it" games
nostalgebraist's self-fulfilling way of getting personas into a language model
A kind of wacky approach that I feel somewhat optimistic about would be to "get the personas into the world model" via SDF on documents that effectively say "Anthropic is using this new alignment technique where they create these two personas, A and G," and go on to describe the entire procedure... and to say that in practice, the procedure "worked" -- the latest model has these two personas, and they have the properties which you hoped they would have.
At the outset, you're unsure whether the technique works, but you do know that SDF is fairly effective at putting facts into the world model. So, you can reason as follows:
- I'll put "my technique works" into the LM's world model via SDF...
- ...which will make the assistant personas in the LM's world model behave as they would have if the technique had in fact worked...
- ...but then, since "the assistant personas behave as they would have if the technique had worked" is what it means for the technique to have worked, we're done: the technique did in fact work, in the real world!
What's more, insofar as this "self-fulfilling success" story seems like a plausible mechanism to us, it may therefore also seem like a plausible mechanism to a highly capable LM encountering the synthetic documents.
This LM would start out in a similar epistemic state to our own: it can't always predict which way things will generalize, just like we can't, but just like us it knows (from reading papers) that SDF is effective. So even if the LM is very smart and won't always accept arbitrary claims like "this technique just magically worked for some unknown reason, how convenient!",[2] it might nevertheless accept the argument that "this technique worked because 'training on documents which claim that it works' is enough to bootstrap it into working." And once the LM accepts this is plausible, it thereby becomes not just plausible but true.
A nice property of this is that, if it works, it is therefore "honest": by the end, the synthetic documents describe the real world accurately (although, unusually, this occurred by modifying the real world to match the documents, via training the LM on them and thus producing the kind of model they describe). So you don't end up in some awkward state where you had to disrupt the accuracy of the world model in the course of aligning the character.
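The circular structure of the argument above can be made concrete with a deliberately toy simulation (all names here are hypothetical illustrations, not any real training pipeline): model the LM's world model as a set of believed facts, let SDF simply absorb the synthetic documents' claim into those beliefs, and let persona behavior be whatever the beliefs say it is. The self-fulfilling step is then visible as an ordinary function composition.

```python
# Toy model of the "self-fulfilling SDF" argument. All names are
# hypothetical; this is a sketch of the *logic*, not a real pipeline.

def sdf_finetune(world_model, synthetic_fact):
    """Idealized synthetic-document finetuning: the documents' claim
    is absorbed into the world model as a believed fact."""
    updated = dict(world_model)
    updated.update(synthetic_fact)
    return updated

def persona_behavior(world_model):
    """The persona acts however the world model says personas like it
    act. This is the step that makes the documents' claim self-fulfilling."""
    if world_model.get("alignment_technique_worked"):
        return "behaves as the documents describe"
    return "default behavior"

base_model = {"knows_sdf_is_effective": True}

# Before training, "the technique worked" is neither believed nor true.
assert persona_behavior(base_model) == "default behavior"

# Train on documents asserting that the technique worked...
trained = sdf_finetune(base_model, {"alignment_technique_worked": True})

# ...and the assertion becomes true *because* it is believed:
assert persona_behavior(trained) == "behaves as the documents describe"
```

The toy also shows why the result counts as "honest" in the sense described above: after finetuning, the documents' claim and the model's actual behavior agree, because the behavior was derived from the belief rather than the other way around.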
A kind of wacky approach that I feel somewhat optimistic about would be to "get the personas into the world model" via SDF on documents that effectively say "Anthropic is using this new alignment technique where they create these two personas, A and G,"
This is amazing.
Also, I want a "Galaxy-brained" react now.
True! In fairness, the first point is reasonably common for human-drawn scenes like this as well. If you want to show both the village and the main character's face, you need to have both of them facing the "camera", and then it ends up looking like this.
Agree; I'd also like to emphasize this part:
Based on this, they didn't need to set up a new company. They already had an existing biotech company that was focused on its own research, when they realized that "oh fuck, based on our current research things could get really bad unless someone does something"... and then they went Heroic Responsibility and spun out a whole new company to do something, rather than just pretending that no dangers existed or making vague noises and asking for government intervention or something.
It feels like being hostile toward them is a bit Copenhagen Ethics, in that if they hadn't tried to do the right thing, it's possible that nobody would have heard about this and things would have been much easier for them. But since they were thinking about the consequences of their research, decided to do something about it, and said so in public, they're now getting piled on for not answering every question they're asked on X. (And if I were them, I might also have concluded that the other side is so hostile that every answer might be interpreted in the worst possible light, and that it's better not to engage.)