
I’ve recently seen a bunch of discussions of the wisdom of publicly releasing the weights[1] of advanced AI models. A common argument form that pops up in these discussions is this:

  1. The problem with releasing weights is that it means that thing X can happen on a large scale, which causes bad effect Y.
  2. But bad effect Y can already happen on a smaller scale because of Z.
  3. Therefore, either it’s OK to release weights, or it’s not OK that Z is true.

One example of this argument form is about the potential to cause devastating pandemics, and goes as follows:

  1. The putative problem with releasing the weights of Large Language Models (LLMs) is that it can help teach people a bunch of facts about virology, bacteriology, and biology more generally, which can show them how to produce pathogens that cause devastating pandemics.
  2. But we already have people paid to teach students about those topics.
  3. Therefore, if that putative problem is enough to say that we shouldn’t release the weights of large language models, we should also not have textbooks and teachers on the topics of virology, bacteriology, and other relevant sub-topics of biology. But that’s absurd!

In this example, thing X is teaching people a bunch of facts, bad effect Y is creating devastating pandemics, and Z is the existence of teachers and textbooks.

Another example is one that I’m not sure has been publicly written up, but occurred to me:

  1. Releasing the weights of LLMs is supposed to be bad because if people run the LLMs without supervision, they can do bad things.
  2. But if you make LLMs in the first place, you can run them without supervision.
  3. So if it’s bad to publicly release their weights, isn’t it also bad to make them in the first place?

In this example, thing X is running the model, bad effect Y is generic bad things that people worry about, and Z is the model existing in the first place.

However, I think these arguments don’t actually work, because they implicitly assume that the costs and benefits scale proportionally to how much X happens. Suppose instead that the benefits of thing X grow proportionally to how much it happens[2]: for example, maybe every person who learns about biology makes roughly the same amount of incremental progress in learning how to cure disease and make humans healthier. Also suppose that every person who does thing X has a small probability of causing bad effect Y for everyone that negates all the benefits of X: for example, perhaps 0.01% of people would cause a global pandemic killing everyone if they learned enough about biology. Then, the expected value of X happening can be high when it happens a little (because you probably get the good effects and not the bad effects Y), but low when it happens a lot (because you almost certainly get bad effect Y, and the tiny probability of the good effects isn’t worth it). In this case, it makes sense that it might be fine that Z is true (e.g. that some people can learn various sub-topics of biology with great tutors), but bad to publicly release model weights to make X happen a ton.
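
Here’s a minimal sketch of that calculation in code (the 0.01% figure is from the example above; the function and the benefit and disaster-cost numbers are illustrative assumptions of mine, not estimates from this post): each of n people doing X adds a fixed benefit, each independently has a small chance of triggering Y, and the benefits only materialize if Y never happens.

```python
# Illustrative sketch of the scaling argument; the benefit a and disaster
# cost B below are made up. Each of n people doing X adds benefit a; each
# independently has probability p of triggering bad effect Y, which wipes
# out all the benefits and costs B on top.

def expected_value(n: int, a: float = 1.0, p: float = 1e-4, B: float = 2_000.0) -> float:
    p_no_disaster = (1 - p) ** n  # probability that nobody triggers Y
    return p_no_disaster * n * a - (1 - p_no_disaster) * B

for n in (10, 1_000, 100_000):
    print(n, round(expected_value(n), 1))
# With these particular numbers, the expected value is positive at the two
# smaller scales and strongly negative at the largest one.
```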

So what’s the upshot? To know whether it’s a good idea to publicly release model weights, you need to know the costs and benefits of various things that can happen, and how those scale with the user base. It’s not enough to just point to a small amount of the relevant effects of releasing the weights and note that those are fine. I didn’t go through this here, but you can also reverse the sign: it’s possible that there’s some activity that people can do with model weights that’s bad if a small number of people do it, but good if a large number of people do it, so you can’t necessarily just point to a small number of people doing nefarious things with some knowledge and conclude that it would be bad if that knowledge were widely publicized.

  1. Basically, the parameters of these models. Once you know the parameters and how to put them together, you can run the model and do what you want with it. 

  2. Or more generally, polynomially (e.g. maybe quadratically because of Metcalfe’s law). 

Comments

I get the sense that "but Google and textbooks exist" is more of a deontological argument: if the information is public at all, "the cat's out of the bag", and it's unfair to penalize LLMs because they didn't cross any new lines, they just increased accessibility.

See also Zvi's post on More Dakka

suppose that every person who does thing X has a small probability of causing bad effect Y for everyone that negates all the benefits of X: for example, perhaps 0.01% of people would cause a global pandemic killing everyone if they learned enough about biology. Then, the expected value of X happening can be high when it happens a little (because you probably get the good effects and not the bad effects Y), but low when it happens a lot (because you almost certainly get bad effect Y, and the tiny probability of the good effects isn’t worth it).

I don't get the math on this. Suppose I have N balls in an urn, and pulling out all but one of them has value A and the final one has value -B. Then the expected value of drawing one ball is ((N-1) x A - B) / N. Assuming that B is at least (N-1) x A (which is how I gloss "causing bad effect Y for everyone that negates all the benefits of X"), isn't this already negative EV?

I see how it works if you have some selection effect that's getting degraded or something like that.

Nope: what I mean is that if you draw K balls and one of them has value -B, the overall value is -B, but if they all instead have value A, the overall value is K x A.

Yeah, but the expected value would still be K x ((N-1) x A - B) / N.

No. The probability of K balls all being the good balls (assuming you're drawing with replacement) is ((N-1)/N)^K. So the expected value is ((N-1)/N)^K x (K x A) - (1 - ((N-1)/N)^K) x B.
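
To make that formula easy to play with, here's a direct transcription in code (the function and parameter names are just mine):

```python
# K draws with replacement from an urn of N balls: each draw is worth A
# unless the single bad ball is ever drawn, in which case the gains are
# wiped out and the overall value is -B instead.

def expected_value(K: int, N: int, A: float, B: float) -> float:
    p_all_good = ((N - 1) / N) ** K  # probability that no draw hits the bad ball
    return p_all_good * K * A - (1 - p_all_good) * B
```

For K much smaller than N this roughly agrees with K times the single-draw expectation, which is why the linear approximation looks fine at first; the exponential factor is what flips the sign as K grows.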

OK, that's fair, I should have written down the precise formula rather than an approximation. My point though is that your statement

the expected value of X happening can be high when it happens a little (because you probably get the good effects and not the bad effects Y)

is wrong because a low probability of large bad effects can swamp a high probability of small good effects in expected value calculations.

A low probability of large bad effects can swamp a high probability of small good effects, but it doesn't have to, so you can have the high probability of small good effects dominate.

Let me be concrete: imagine you have a one in a hundred chance of a bad outcome of utility -100 (where if it happens all good effects get wiped out), and with the rest of the probability you get a good outcome of utility 2 (and the utility of these good outcomes stacks with how many times they happen). Then the expected utility of doing this once is 2 x 0.99 - 100 x 0.01 = 0.98 > 0, but the expected utility of doing it one thousand times is 2 x 1000 x (0.99 ^ 1000) - 100 x (1 - 0.99^1000) = 2000 x 0.000043 - 100 x 0.999957 = 0.086 - 99.9957 < 0.
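
A quick check of that arithmetic in code (same numbers as above, nothing new assumed):

```python
# One trial: 0.99 chance of +2, 0.01 chance of -100.
# A thousand trials: the +2s only count if the -100 outcome never happens.
p_good_once = 0.99
p_good_1000 = 0.99 ** 1000  # about 4.3e-5

ev_once = p_good_once * 2 - (1 - p_good_once) * 100          # about 0.98
ev_1000 = p_good_1000 * 2 * 1000 - (1 - p_good_1000) * 100   # about -99.9
print(ev_once, ev_1000)
```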

OK, that makes sense.

True, but this doesn't apply to the original reasoning in the post - he assumes constant probability while you need increasing probability (as with the balls) to make the math work.

Or decreasing benefits, which probably is the case in the real world.

Edit: misread the previous comment, see below.

My comment involves a constant probability of the bad outcome with each draw, and no decreasing benefits. I think this is a good exposition of this portion of the post (which I wrote), if you assume that each unit of bio progress is equally good, but that the goods don't materialize if we all die of a global pandemic:

Suppose instead that the benefits of thing X grow proportionally to how much it happens: for example, maybe every person who learns about biology makes roughly the same amount of incremental progress in learning how to cure disease and make humans healthier. Also suppose that every person who does thing X has a small probability of causing bad effect Y for everyone that negates all the benefits of X: for example, perhaps 0.01% of people would cause a global pandemic killing everyone if they learned enough about biology.

The two paths to thing X might also be non-equivalent for reasons other than quantity/scale.

If, for example, learning about biology and virology from textbooks and professors is more difficult, and thereby acts as a filter that selectively teaches those things to people who are uncommonly talented and dedicated, and if that correlates with good intentions.

Or if learning from standard education embeds people in a social group that also, to some extent, socialises its members with norms of ethical practice, and monitors for people who seem unstable or dangerous (whereas LLM learning can be solitary and unobserved).

That could be true, but I'm not actually that optimistic about elite morals and didn't want the counter-argument to rely on that.

It's not just good intentions, but also the temptations of capitalism. Anyone smart and conscientious enough to engineer a pandemic is probably also able to get a well-paid job and live a comfortable life, so the temptation to kill everyone (including themselves) is small.

EDIT:

To expand on the concept of "temptations of capitalism", the idea is roughly that skills necessary to overthrow a regime are often also useful for making money. For example, if you can get thousands of followers, you could sell them a book or a seminar, get tons of money, and live comfortably. The more followers you can get, the more resources you can extract from them. Or if you are great at organizing, you could found a company or become a CEO of an existing one. -- On the other hand, if the regime prevents you from doing these things (because the positions of comfortable life are reserved for those of noble origin, or for those whose parents and grandparents were Party members), you may be tempted to use your followers or your organizational skills to support a revolution. As an example, V. I. Lenin originally wanted to be a lawyer, but was not allowed for political reasons, so he used his talents to overthrow the regime instead.

This can't be universally true: Aum Shinrikyo famously recruited mostly from top university students, and Osama bin Laden inherited a lot of money and attended an elite high school (Wikipedia says accounts differ on his success at university). In general, rich people often have some void of meaning that can be filled by religious or ideological movements, which can motivate violence.