Is the current legal situation with patents different?
My understanding is that Google did patent the transformer, but the patent explicitly covers only encoder/decoder architectures; GPT-2, for example, uses a decoder-only architecture and so is not covered by that patent (and it would have been very hard for OpenAI to obtain and defend a patent on decoder-only transformers, given Google's prior art).
If your question is, instead, "why didn't the first person to come up with the idea of using computers to predict the next element in a sequence patent that idea, in full generality", keep in mind that (POSIWID aside) patents are intended "to promote the progress of science and useful arts". They are not meant as a way of allowing the first person to come up with an idea to prevent all further research in vaguely adjacent fields.
As a concrete example of the sorts of things patents don't do, take O'Reilly v. Morse, 56 U.S. 62 (1853). In his patent application, Morse claimed
Eighth. I do not propose to limit myself to the specific machinery or parts of machinery described in the foregoing specification and claims; the essence of my invention being the use of the motive power of the electric or galvanic current, which I call electro-magnetism, however developed for marking or printing intelligible characters, signs, or letters, at any distances, being a new application of that power of which I claim to be the first inventor or discoverer.
The court's decision stated
If this claim can be maintained, it matters not by what process or machinery the result is accomplished. For aught that we now know some future inventor, in the onward march of science, may discover a mode of writing or printing at a distance by means of the electric or galvanic current, without using any part of the process or combination set forth in the plaintiff's specification. His invention may be less complicated - less liable to get out of order - less expensive in construction, and in its operation. But yet if it is covered by this patent the inventor could not use it, nor the public have the benefit of it without the permission of this patentee. [...] In fine, he claims an exclusive right to use a manner and process which he has not described and indeed had not invented, and therefore could not describe when he obtained his patent. The court is of opinion that the claim is too broad, and not warranted by law.
...which might have something to do with autoregressive language models being more popular than encoder/decoder ones.
"why didn't the first person to come up with the idea of using computers to predict the next element in a sequence patent that idea, in full generality"
Patents are valid for about 20 years. But Bengio et al. were already using NNs to predict the next word back in 2000:
https://papers.nips.cc/paper_files/paper/2000/file/728f206c2a01bf572b5940d7d9a8fa4c-Paper.pdf
So this idea is old. Only some specific architectural aspects are new.
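For a sense of how simple the core of that idea is: the Bengio-style setup is just "embed the previous few words, run them through a small network, and predict a distribution over the next word". Here is a minimal sketch in PyTorch (the toy vocabulary size, dimensions, and random training data are my own illustrative choices, not the paper's):

```python
import torch
import torch.nn as nn

# Toy next-word predictor in the spirit of Bengio et al. (2000):
# embed the previous n words, pass them through a hidden layer,
# and output a softmax distribution over the vocabulary.
class NeuralLM(nn.Module):
    def __init__(self, vocab_size, context_size=3, embed_dim=16, hidden_dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.hidden = nn.Linear(context_size * embed_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, context):              # context: (batch, context_size) word ids
        e = self.embed(context).flatten(1)   # concatenate the context embeddings
        h = torch.tanh(self.hidden(e))
        return self.out(h)                   # logits over the next word

# Tiny illustrative setup: vocabulary of 100 "words", random training pairs.
vocab_size = 100
model = NeuralLM(vocab_size)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

contexts = torch.randint(0, vocab_size, (32, 3))   # previous 3 words
targets = torch.randint(0, vocab_size, (32,))      # the word that follows

for step in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(contexts), targets)
    loss.backward()
    optimizer.step()
```

Everything the modern stack adds (attention, scale, tokenization) sits on top of this same next-word objective.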
patents are intended "to promote the progress of science and useful arts".
I knew this was how patents were supposed to work in theory, but I also assumed that actual practice is different. People complain about patent trolls, about patents being granted for trivial applications of existing ideas, about patent claims written in a maximally vague way so that lawyers can later argue they cover all kinds of things the patent owner hadn't even thought of at the time, etc.
Amazon had "one click" patented, how did that promote the progress of science and useful arts?
What is different this time?
I'm not confident in the full answer to this question, but I can give some informed speculation. AI progress seems to rely principally on two driving forces: scaling up hardware, and improving the software and algorithms.
On the hardware scaling side, there's very little that an AI lab can patent. The hardware itself may be patentable: for example, NVIDIA enjoys a patent on the H100. However, merely scaling up hardware and training for longer are abstract ideas that generally cannot be patented. This may help explain why NVIDIA currently has a virtual monopoly on producing AI GPUs, yet there is essentially no barrier to entry for simply using NVIDIA's GPUs to train a state-of-the-art LLM.
On the software side, it gets a little more complicated. US courts have generally held that abstract specifications of algorithms are not patentable, even though specific implementations of those algorithms often are. As one Federal Circuit judge has explained,
In short, [software and business-method patents], although frequently dressed up in the argot of invention, simply describe a problem, announce purely functional steps that purport to solve the problem, and recite standard computer operations to perform some of those steps. The principal flaw in these patents is that they do not contain an "inventive concept" that solves practical problems and ensures that the patent is directed to something "significantly more than" the ineligible abstract idea itself. See CLS Bank, 134 S. Ct. at 2355, 2357; Mayo, 132 S. Ct. at 1294. As such, they represent little more than functional descriptions of objectives, rather than inventive solutions. In addition, because they describe the claimed methods in functional terms, they preempt any subsequent specific solutions to the problem at issue. See CLS Bank, 134 S. Ct. at 2354; Mayo, 132 S. Ct. at 1301-02. It is for those reasons that the Supreme Court has characterized such patents as claiming "abstract ideas" and has held that they are not directed to patentable subject matter.
This generally limits the degree to which an AI lab can patent the concepts underlying LLMs, and thereby try to restrict competition via the legal process.
Note, however, that standard economic models of economies of scale generally predict that there should be a high concentration of firms in capital-intensive industries, which seems to be true for AI as a result of massive hardware scaling. This happens even in the absence of regulatory barriers or government-granted monopolies, and it predicts what we observe fairly well: a small number of large companies at the forefront of AI development.
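As a stylized illustration of that prediction (the numbers below are my own toy example, not drawn from any cited model): if a producer pays a large fixed cost $F$ up front (training runs, GPU clusters) and a small marginal cost $c$ per unit of output $q$, the average cost is

$$AC(q) = \frac{F}{q} + c,$$

which falls monotonically in $q$, so the largest producers have the lowest unit costs and the market tends to concentrate. With, say, $F = \$1\text{B}$ and $c = \$0.001$ per query, a firm serving $10^{11}$ queries pays about $\$0.011$ per query, while one serving $10^{9}$ pays about $\$1.00$.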
I think your model only applies to some famous cases but ignores others. Who invented computers? Who invented television networks? Who invented the internet?
Lots of things have inventors and patents only for specific chunks of them, or specific versions, but are as a whole too big to be encompassed. They're not necessarily very well-defined technologies, but systems and concepts that can be implemented in many different ways. In these fields, focusing on patents is likely to be a losing strategy anyway: you'll stand still protecting your one increasingly obsolete good idea, like Homer Simpson in front of his sugar, while everyone else runs circles around you with their legally distinct versions of the same thing that they keep iterating and improving on. I think AI, and even LLMs, fall under this category. It's quite hard to patent algorithms specifically - and a good thing too, or it would have a real chilling effect on the whole field. I think you can patent only a specific implementation of one, but that's very limited; you can't patent the concept of a self-attention layer, for example, as that's just math. And that kind of thing is all it takes to build your own spin on an LLM anyway.
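To make the "just math" point concrete, this is roughly what a single self-attention head computes. The sketch below is a generic scaled dot-product formulation with my own toy shapes and names, not any particular company's patented implementation:

```python
import math
import torch

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.

    x:             (seq_len, d_model) token representations
    w_q, w_k, w_v: (d_model, d_k) learned projection matrices
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / math.sqrt(k.shape[-1])  # similarity of every position with every other
    weights = torch.softmax(scores, dim=-1)    # each row is a distribution over positions
    return weights @ v                         # each output is a weighted mix of the values

# Toy example: 5 tokens with 8-dimensional representations.
d_model, d_k = 8, 4
x = torch.randn(5, d_model)
w_q, w_k, w_v = (torch.randn(d_model, d_k) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)  # shape (5, 4)
```

A production transformer wraps this in multi-head projections, masking, positional information, and so on, but the core operation really is a few matrix multiplications and a softmax.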
don't forget the political environment:
- locally, there's a meaningful "break up big tech" current which could make it politically difficult to simultaneously sell AI as a paradigm shift and monopolize it for yourself via the legal apparatus. cynically, firms might view regulation as a path to achieve similar ends but with fewer political repercussions, less blatant than if they leveraged patents.
- globally, the country which presently enjoys the lead in AI sees itself in an economic battle against a competitor unlikely to respect its intellectual property claims. to the degree that states view AI through any lens related to "national defense", there will be some push to maintain competitiveness at least on the global stage.
Your history is definitely wrong. Patents don't enforce themselves. Hollywood is on the West Coast to put physical distance between itself and Edison's lawyers and muscle. The Wright brothers went down in history as the inventors of the airplane, but they wasted the rest of their lives fighting over the patents.
Linchpin patents are rare. Maybe you patent the one invention that makes the thing just barely work, but that's not the end of the story. Someone else patents something else needed to make it scalable. Now there are two patents and a bilateral monopoly.
None of this is to say that patents were unimportant, so this isn't an answer to your question at all.
[Epistemic Status: extremely not endorsed brain noise] New EA cause area just dropped! Do lots of cutting edge algorithmic AI research, and then publish that research, but patent your published research and become a patent troll!
Specifically, Eliezer should copyright the idea of "using the AI to destroy humanity". Then none of the AI companies will be legally allowed to do it! Problem solved.
I may be completely confused about this, but my model of technological breakthroughs in history was basically this: a few guys independently connect the dots leading to a new invention, for example the telephone, at approximately the same time. One of them runs to the patent office a little faster than the others and gets the patent first. Now he gets to be forever known as the inventor of the telephone, and the rest of them are screwed; if they ever try to sell their own inventions, they will probably get sued into bankruptcy.
Today, we have a few different companies selling AIs (LLMs). What is different this time?