Comments

I'll note that I think this is a mistake that lots of people working in AI safety have made: ignoring the benefits of academic credentials and prestige because of the obvious costs and annoyance. It's not always better to work in academia, but it's also worth really appreciating the costs of not doing so in foregone opportunities and experience, as Vanessa highlighted. (Founder effects matter; Eliezer had good reasons not to pursue this path, but I think others followed that path instead of evaluating the question clearly for their own work.)

And in my experience, much of the good work coming out of AI Safety has been sidelined because it fails the academic prestige test, and so it fails to engage with academics who could contribute or who have done closely related work. Other work avoids or fails the publication process because the authors don't have the right kind of guidance and experience to get their papers into the right conferences and journals. That work is often worse for not getting feedback from peer review, and it also doesn't engage others in the research area.

Answer by Davidmanheim

There aren't good ways to do this automatically for text, and state of the art is rapidly evolving.
https://arxiv.org/abs/2403.05750v1

For photographic images which contain detailed images of humans, or non-standard objects with fine details, there are still some reasonably good heuristics for when AIs will mess up those details, but I'm not sure how long they will be valid for.

This is one of the key reasons that the term alignment was invented and used instead of control; I can be aligned with the interests of my infant, or my pet, without any control on their part.

Most of this seems to be subsumed in the general question of how you do research, and there's a lot of advice, but it's (ironically) not at all a science. From my limited understanding of what goes on in the research groups inside these companies, it's a combination of research intuition, small-scale testing, checking with others and discussing the new approach, validating your ideas, and getting buy-in from people higher up that it's worth your and their time to try the new idea. Which is the same as research generally.

At that point, I'll speculate and assume whatever idea they have is validated in smaller but still relatively large settings. For things like sample efficiency, they might, say, train a GPT-3 size model, which now costs only a fraction of the researcher's salary to do. (Yes, I'm sure they all have very large compute budgets for their research.) If the results are still impressive, I'm sure there is lots more discussion and testing before actually using the method in training the next round of frontier models that cost huge amounts of money, and those decisions are ultimately made by the teams building those models, and management.

It seems like you're not being clear about how you are thinking about the cases, or are misusing some of the terms. Nash Equilibria exist in zero-sum games, so those aren't different things. If you're familiar with how to do game theory, I think you should carefully set up what you claim the situation is in a payoff matrix, and then check whether, given the set of actions you posit people have in each case, the scenario is actually a Nash equilibrium in the cases you're calling Nash equilibrium.
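To make that check concrete, here is a minimal sketch of what I mean, assuming a two-player game with a finite set of actions each. The function name `pure_nash_equilibria` and all the payoff numbers are made up for illustration, and it only looks for pure-strategy equilibria; a game like matching pennies has none of those, only a mixed one, which this sketch will not find.

```python
from itertools import product


def pure_nash_equilibria(row_payoffs, col_payoffs):
    """Return every (row_action, col_action) pair where neither player
    can do strictly better by changing only their own action."""
    n_rows = len(row_payoffs)
    n_cols = len(row_payoffs[0])
    equilibria = []
    for r, c in product(range(n_rows), range(n_cols)):
        # Row player: compare against every other row, holding the column fixed.
        row_is_best = all(
            row_payoffs[r][c] >= row_payoffs[alt][c] for alt in range(n_rows)
        )
        # Column player: compare against every other column, holding the row fixed.
        col_is_best = all(
            col_payoffs[r][c] >= col_payoffs[r][alt] for alt in range(n_cols)
        )
        if row_is_best and col_is_best:
            equilibria.append((r, c))
    return equilibria


# Hypothetical prisoner's dilemma payoffs: action 0 = cooperate, 1 = defect.
pd_row = [[-1, -3], [0, -2]]
pd_col = [[-1, 0], [-3, -2]]

# Hypothetical zero-sum game: the column player's payoffs are the negated
# row payoffs, so every cell sums to zero.
zs_row = [[3, 1], [4, 2]]
zs_col = [[-v for v in row] for row in zs_row]

# Matching pennies: zero-sum, with no pure-strategy equilibrium.
mp_row = [[1, -1], [-1, 1]]
mp_col = [[-v for v in row] for row in mp_row]

print(pure_nash_equilibria(pd_row, pd_col))
print(pure_nash_equilibria(zs_row, zs_col))
print(pure_nash_equilibria(mp_row, mp_col))
```

With these made-up numbers, the zero-sum example has an equilibrium cell just like the prisoner's dilemma does, which is the point: "zero-sum" describes the payoffs, while "Nash equilibrium" describes a profile of actions within whatever game you wrote down.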

...but there are a number of EAs working on cybersecurity in the context of AI risks, so one premise of the argument here is off.

And a rapid response site for the public to report cybersecurity issues and account hacking generally would do nothing to address the problems that face the groups that most need to secure their systems, and wouldn't even solve the narrower problem of reducing those hacks, so this seems like the wrong approach even given the assumptions you suggest. 

I agree that your question is weird and confused, and agree that if that were the context, my post would be hard to understand. But I think it's a bad analogy! That's because there are people who have made analogies between AI and Bio very poorly, which is misleading and leads to sloppy thinking. In my experience seeing discussions on the topic, either the comparisons are drawn carefully and the relevant dissimilarities are discussed clearly, or they are bad analogies.
 
To stretch your analogy, if the context were that I'd recently heard people say "Steve and David are both people I know, and if you don't like Steve, you probably won't like David," and also "Steve and David are both concerned about AI risks, so they agree on how to discuss the issue," I'd wonder if there was some confusion, and I'd feel comfortable saying that in general, Steve is an unhelpful analog for David, and all these people should stop and be much more careful in how they think about comparisons between us.

I agree with you that analogies are needed, but they are also inevitably limited. So I'm fine with saying "AI is concerning because its progress is exponential, and we have seen from COVID-19 that we need to intervene early," or "AI is concerning because it can proliferate as a technology like nuclear weapons," or "AI is like biological weapons in that countries will pursue and use these because they seem powerful, without appreciating the dangers they create if they escape control." But what I am concerned that you are suggesting is that we should make the general claim "AI poses uncontrollable risks like pathogens do," or "AI needs to be regulated the way biological pathogens are," and that's something I strongly oppose. By ignoring all of the specifics, the analogy fails.

In other words, "while I think the disanalogies are compelling, comparison can still be useful as an analytic tool - while keeping in mind that the ability to directly learn lessons from biorisk to apply to AI is limited by the vast array of other disanalogies."

I said:

disanalogies listed here aren’t in and of themselves reasons that similar strategies cannot sometimes be useful, once the limitations are understood. For that reason, disanalogies should be a reminder and a caution against analogizing, not a reason on its own to reject parallel approaches in the different domains.

You seem to be simultaneously claiming that I had plenty of room to make a more nuanced argument, and then attributing to me a position that the nuance I included addresses exactly. Yes, people could cite the title of the blog post to make a misleading claim, assuming others won't read it. If that's your concern, perhaps it would be enough to change the title to "Biorisk is Often an Unhelpful Analogy for AI Risk," or "Biorisk is Misleading as a General Analogy for AI Risk"?

I agree that we do not have an exact model for anything in immunology, unlike physics, and there is a huge amount of uncertainty. But that's different than saying it's not well-understood; we have clear gold-standard methods for determining answers, even if they are very expensive. This stands in stark contrast to AI, where we don't have the ability to verify that something works or is safe at all without deploying it, and even that isn't much of a check on its later potential for misuse.

But aside from that, I think your position is agreeing with mine much more than you imply. My understanding is that we have newer predictive models which can give uncertain but fairly accurate answers to many narrow questions. (Older, non-ML methods also exist, but I'm less familiar with them.) In your hypothetical case, I expect that the right experts can absolutely give indicative answers about whether a novel vaccine peptide is likely or unlikely to have cross-reactivity with various immune targets, and the biggest problem is that it's socially unacceptable to assert confidence in anything short of a tested and verified case. But the models can get, in the case of the Zhang et al. paper above, 70% accurate answers, which can help narrow the problem for drug or vaccine discovery, though they then do need to be followed by in vitro tests and trials.
