
Who are the most ruthless capitalists in the western world? Whose monopolistic practices make Walmart look like a corner shop and Rupert Murdoch a socialist? You won't guess the answer in a month of Sundays. While there are plenty of candidates, my vote goes not to the banks, the oil companies or the health insurers, but – wait for it – to academic publishers. Theirs might sound like a fusty and insignificant sector. It is anything but. Of all corporate scams, the racket they run is most urgently in need of referral to the competition authorities.

Everyone claims to agree that people should be encouraged to understand science and other academic research. Without current knowledge, we cannot make coherent democratic decisions. But the publishers have slapped a padlock and a "keep out" sign on the gates.

You might resent Murdoch's paywall policy, in which he charges £1 for 24 hours of access to the Times and Sunday Times. But at least in that period you can read and download as many articles as you like. Reading a single article published by one of Elsevier's journals will cost you $31.50. Springer charges €34.95, Wiley-Blackwell, $42. Read 10 and you pay 10 times. And the journals retain perpetual copyright. You want to read a letter printed in 1981? That'll be $31.50.

Of course, you could go into the library (if it still exists). But they too have been hit by cosmic fees. The average cost of an annual subscription to a chemistry journal is $3,792. Some journals cost $10,000 a year or more to stock. The most expensive I've seen, Elsevier's Biochimica et Biophysica Acta, is $20,930. Though academic libraries have been frantically cutting subscriptions to make ends meet, journals now consume 65% of their budgets, which means they have had to reduce the number of books they buy. Journal fees account for a significant component of universities' costs, which are being passed to their students.

Murdoch pays his journalists and editors, and his companies generate much of the content they use. But the academic publishers get their articles, their peer reviewing (vetting by other researchers) and even much of their editing for free. The material they publish was commissioned and funded not by them but by us, through government research grants and academic stipends. But to see it, we must pay again, and through the nose.

The returns are astronomical: in the past financial year, for example, Elsevier's operating profit margin was 36% (£724m on revenues of £2bn). They result from a stranglehold on the market. Elsevier, Springer and Wiley, who have bought up many of their competitors, now publish 42% of journal articles.

...

Razib Khan (who is reminded of this episode of South Park) found this paragraph rather striking, and I would tend to agree that it's a rather convincing argument.

Murdoch pays his journalists and editors, and his companies generate much of the content they use. But the academic publishers get their articles, their peer reviewing (vetting by other researchers) and even much of their editing for free. The material they publish was commissioned and funded not by them but by us, through government research grants and academic stipends. But to see it, we must pay again, and through the nose

Are publishers really so successful as rent seekers or is there something the original article is missing here? Also, what useful strategies would LWers recommend to help minimize costs for someone trying to practice the virtue of scholarship? The obvious suggestions (implied in the article) seem to be emailing authors (and perhaps those subscribed) to ask for the papers, and paying for membership in some libraries.

Another obvious option is using ... liberated databases of such academic papers.

 

Edit: Just wondering, has this been discussed before on Lesswrong?

 

 

[-][anonymous]200

Since I linked to an article reporting on it and it seems relevant to the debate, here is the text that comes with the torrent that was put on piratebay by user gmaxwell_ under the title of "Papers from Philosophical Transactions of the Royal Society".

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1

This archive contains 18,592 scientific publications totaling 33GiB, all from Philosophical Transactions of the Royal Society and which should be available to everyone at no cost, but most have previously only been made available at high prices through paywall gatekeepers like JSTOR.

Limited access to the documents here is typically sold for $19 USD per article, though some of the older ones are available as cheaply as $8. Purchasing access to this collection one article at a time would cost hundreds of thousands of dollars.

Also included is the basic factual metadata allowing you to locate works by title, author, or publication date, and a checksum file to allow you to check for corruption.

ef8c02959e947d7f4e4699f399ade838431692d972661f145b782c2fa3ebcc6a sha256sum.txt

I've had these files for a long time, but I've been afraid that if I published them I would be subject to unjust legal harassment by those who profit from controlling access to these works.

I now feel that I've been making the wrong decision.

On July 19th 2011, Aaron Swartz was criminally charged by the US Attorney General's office for, effectively, downloading too many academic papers from JSTOR.

Academic publishing is an odd system--the authors are not paid for their writing, nor are the peer reviewers (they're just more unpaid academics), and in some fields even the journal editors are unpaid. Sometimes the authors must even pay the publishers.

And yet scientific publications are some of the most outrageously expensive pieces of literature you can buy. In the past, the high access fees supported the costly mechanical reproduction of niche paper journals, but online distribution has mostly made this function obsolete.

As far as I can tell, the money paid for access today serves little significant purpose except to perpetuate dead business models. The "publish or perish" pressure in academia gives the authors an impossibly weak negotiating position, and the existing system has enormous inertia.

Those with the most power to change the system--the long-tenured luminary scholars whose works give legitimacy and prestige to the journals, rather than the other way around--are the least impacted by its failures. They are supported by institutions who invisibly provide access to all of the resources they need. And as the journals depend on them, they may ask for alterations to the standard contract without risking their career on the loss of a publication offer. Many don't even realize the extent to which academic work is inaccessible to the general public, nor do they realize what sort of work is being done outside universities that would benefit by it.

Large publishers are now able to purchase the political clout needed to abuse the narrow commercial scope of copyright protection, extending it to completely inapplicable areas: slavish reproductions of historic documents and art, for example, and exploiting the labors of unpaid scientists. They're even able to make the taxpayers pay for their attacks on free society by pursuing criminal prosecution (copyright has classically been a civil matter) and by burdening public institutions with outrageous subscription fees.

Copyright is a legal fiction representing a narrow compromise: we give up some of our natural right to exchange information in exchange for creating an economic incentive to author, so that we may all enjoy more works. When publishers abuse the system to prop up their existence, when they misrepresent the extent of copyright coverage, when they use threats of frivolous litigation to suppress the dissemination of publicly owned works, they are stealing from everyone else.

Several years ago I came into possession, through rather boring and lawful means, of a large collection of JSTOR documents.

These particular documents are the historic back archives of the Philosophical Transactions of the Royal Society--a prestigious scientific journal with a history extending back to the 1600s.

The portion of the collection included in this archive, ones published prior to 1923 and therefore obviously in the public domain, total some 18,592 papers and 33 gigabytes of data.

The documents are part of the shared heritage of all mankind, and are rightfully in the public domain, but they are not available freely. Instead the articles are available at $19 each--for one month's viewing, by one person, on one computer. It's a steal. From you.

When I received these documents I had grand plans of uploading them to Wikipedia's sister site for reference works, Wikisource--where they could be tightly interlinked with Wikipedia, providing interesting historical context to the encyclopedia articles. For example, Uranus was discovered in 1781 by William Herschel; why not take a look at the paper where he originally disclosed his discovery? (Or one of the several follow on publications about its satellites, or the dozens of other papers he authored?)

But I soon found the reality of the situation to be less than appealing: publishing the documents freely was likely to bring frivolous litigation from the publishers.

As in many other cases, I could expect them to claim that their slavish reproduction--scanning the documents--created a new copyright interest. Or that distributing the documents complete with the trivial watermarks they added constituted unlawful copying of that mark. They might even pursue strawman criminal charges claiming that whoever obtained the files must have violated some kind of anti-hacking laws.

In my discreet inquiry, I was unable to find anyone willing to cover the potentially unbounded legal costs I risked, even though the only unlawful action here is the fraudulent misuse of copyright by JSTOR and the Royal Society to withhold access from the public to that which is legally and morally everyone's property.

In the meantime, and to great fanfare as part of their 350th anniversary, the RSOL opened up "free" access to their historic archives--but "free" only meant "with many odious terms", and access was limited to about 100 articles.

All too often journals, galleries, and museums are becoming not disseminators of knowledge--as their lofty mission statements suggest--but censors of knowledge, because censoring is the one thing they do better than the Internet does. Stewardship and curation are valuable functions, but their value is negative when there is only one steward and one curator, whose judgment reigns supreme as the final word on what everyone else sees and knows. If their recommendations have value they can be heeded without the coercive abuse of copyright to silence competition.

The liberal dissemination of knowledge is essential to scientific inquiry. More than in any other area, the application of restrictive copyright is inappropriate for academic works: there is no sticky question of how to pay authors or reviewers, as the publishers are already not paying them. And unlike 'mere' works of entertainment, liberal access to scientific work impacts the well-being of all mankind. Our continued survival may even depend on it.

If I can remove even one dollar of ill-gained income from a poisonous industry which acts to suppress scientific and historic understanding, then whatever personal cost I suffer will be justified--it will be one less dollar spent in the war against knowledge. One less dollar spent lobbying for laws that make downloading too many scientific papers a crime.

I had considered releasing this collection anonymously, but others pointed out that the obviously overzealous prosecutors of Aaron Swartz would probably accuse him of it and add it to their growing list of ridiculous charges. This didn't sit well with my conscience, and I generally believe that anything worth doing is worth attaching your name to.

I'm interested in hearing about any enjoyable discoveries or even useful applications which come of this archive.


Greg Maxwell - July 20th 2011 gmaxwell@gmail.com Bitcoin: 14csFEJHk3SYbkBmajyJ3ktpsd2TmwDEBb

-----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux)

iEYEARECAAYFAk4nlfwACgkQrIWTYrBBO/pK4QCfV/voN6IdZRU36Vy3xAedUMfz rJcAoNF4/QTdxYscvF2nklJdMzXFDwtF =YlVR -----END PGP SIGNATURE-----

Are publishers really so successful as rent seekers or is there something the original article is missing here?

Clearly they are stunningly successful rent-seekers, but what the article is missing is that it presents the situation as an evil rent-seeking plot against a pure and idealistic world of academia. In reality, the entire system is so full of outrageous rent-seeking that it's no wonder that nobody is particularly eager to rock the boat. The only thing in which the publishers are exceptional is that their rent-seeking is more obvious and straightforward.

Publishing in certain journals (and conferences) is high-reputation because everyone in the relevant field or subfield believes it is. So they all try to submit their papers to that journal, the journal gets to pick the best papers, and it thereby keeps its high reputation. Publishing work elsewhere is considered a lesser achievement, and work published elsewhere is read with less regard. So the reputation of high-status journals is notably stable.

On the other hand, almost every researcher in CS flaunts copyright, posting their papers on their own websites. The practice is so pervasive that when I want to find a specific paper, I usually search for the author's website first, before I try accessing the journal -- even though I have the expensive university access to those journals. (Well, in CS they're usually conference proceedings instead of journals, but that's beside the point.)

In fact, author home pages are a vastly better mechanism for finding research than journals or conference proceedings are, because they can carry far more information. Authors will often have more up-to-date versions of their research than the journal version. Sometimes they have code or interactive demos. Often they put their later, related work in the same place, and that work is often clearer or more useful.

Other fields could make this shift too, subfield by subfield and journal audience by journal audience. For this to happen cleanly, a significant fraction of the top researchers in a field need to get together and agree to break copyright in the same way. In any given subfield, they probably already go to the same conferences, so there's plenty of opportunity to reach these agreements. One or two doing this alone would be prone to lawsuits from the publisher, but all at once would destroy their audience.

This may be a daunting risk for many academics, though, and it probably seems needlessly risky. And it is needlessly risky. It seems like it should be much easier to get that same roomful of academics to agree to move their shared regard from a closed-access journal to an open-access journal. Moreso, because there are already people building free journal-management software for exactly this purpose, this should be relatively easy to do.

Moving individual fields and subfields to totally-free, totally-open journals is, now, just a matter of memetic engineering. Convince enough people in any given field to switch, make it easy for them to do so, organize a little bit, and that switch can happen.

Academics, consider: what's the global value of opening the journals of your specialty? Are you, or anyone you can speak to, in a position to do this?

Everyone else: is anyone good at memetic engineering, and willing to take this as a challenge?

almost every researcher in CS flaunts copyright, posting their papers on their own websites

Many journals explicitly allow you to distribute a "preprint" of your journal articles on your personal website. For example, the Elsevier policy states that authors retain:

the right to post a pre-print version of the journal article on Internet websites including electronic pre-print servers, and to retain indefinitely such version on such servers or sites for scholarly purposes

On the other hand, almost every researcher in CS flaunts copyright, posting their papers on their own websites.

You want "flout" here, not "flaunt".

There are open access journals. I recommend supporting them.

I've noticed this before, and brought it up with a couple of my professors a few years ago. They brushed it off, saying that the subscriptions are mainly bought by universities and other institutions, so nobody ends up paying an exorbitant cost. What I inferred, but they did not actually say, was that universities and similar institutions got much better deals on them, the way insurance companies get better deals from medical care providers by dealing in bulk and being in a better bargaining position.

They were both pretty civic minded individuals, but I suppose I was wrong to assign such a high likelihood that if there were serious gouging going on and the subscription prices weren't commensurate with journals' operating costs, they would have shared my concern rather than dismissing it because they didn't personally have to care about it.

[-][anonymous]30

I know a little bit about the academic publishing business from my mother's experience working in the field. This analysis is missing a critical factor.

In the old days, many journals were paid for by individual subscriptions. Professors, mostly, would buy a subscription to be sent to their personal home or office. Back in the days of dead-tree journals, having a copy of your own was more convenient than trekking to the library to see if they have one. But, with the internet, there's no need for a personal subscription, and almost everybody gets journal access from a university library.

Because library subscriptions now have to cover far more readers, academic publishers obviously have to charge libraries more in order to stay in business. This is terrible if you're not affiliated with a university, of course. But it's made necessary by the drop in individual subscriptions.

University presses may be charging a lot, but they're not living high on the hog; mostly, they've been cutting journals in the past two decades. Like the rest of the publishing world, they're finding it harder to stay afloat in an environment dominated by the internet's amazing ability to copy and share content.

Why do you bring up university presses? No one else singled them out. Usually when people do, it is as the baseline reasonable journal cost, an order of magnitude less than commercial presses. In this context, of single articles, they use the same pricing as commercial presses, but that's probably incompetence on both parts.

I agree that the situation seems bad, but I think the proposed cause is not correct. Blogs also get their content "for free" and make very little money. I think the reason is likely that these publishers have very little competition. Why? I don't know; perhaps they have a large reputational advantage over potential competitors?

Yes, starting a new scientific journal is relatively easy, but gaining prestige is a chicken-and-egg problem.

Scientists get points for publishing in prestigious journals with a high impact factor; that means journals whose articles are often cited. If you start a new journal, there is no reward for a scientist who gives you their article. A scientist can publish an article only once, and getting points for publishing articles is very important for them (it influences whether they get grants or tenure). In other words, the system strongly penalizes scientists for publishing in new journals. This is what reduces competition.

So when you start a new journal, you usually pay scientists for publishing in your journal. And you have to send a lot of "spam" to make yourself known, which makes you rather unpopular (you advertise that you have zero prestige). It takes time to slowly increase your prestige.

When you are at the other end of scale, the situation is opposite. You are famous, and your articles are good, therefore scientists get a lot of points for publishing in your journal. You can refuse less-than-perfect articles, which increases your score (impact factor = number of citations per article). Everyone works for free, and sometimes scientists even pay you for publishing their excellent articles.
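To make the selectivity effect concrete, here is a rough sketch in Python. It uses the simplified "citations per article" definition from the paragraph above rather than the official impact factor calculation, and the citation counts are made-up numbers chosen only for illustration. In reality a journal cannot know citation counts in advance, so picture the selection as editors guessing which papers will be cited most.

```python
def citations_per_article(citation_counts):
    """Average citations per published article (the simplified score used above)."""
    return sum(citation_counts) / len(citation_counts)


# Hypothetical citation counts for ten submitted articles (invented for this example).
submissions = [40, 25, 18, 12, 9, 6, 4, 3, 2, 1]

# A journal with no prestige publishes everything it receives.
accept_all = citations_per_article(submissions)

# A prestigious journal can afford to keep only the five most-cited articles.
accept_top5 = citations_per_article(sorted(submissions, reverse=True)[:5])

print(f"publish everything: {accept_all:.1f} citations per article")
print(f"publish top five:   {accept_top5:.1f} citations per article")
```

With these invented numbers the selective journal scores 20.8 against 12.0 from the very same pool of submissions, which is the feedback loop described above: the ability to reject raises the score, and the score attracts more submissions to reject.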

The system by its definition supports monopolies, because players are rewarded for cooperation with established winners. Not only implicitly (the winners probably have better quality selection, larger audience, etc), but they get some explicit points for the sole fact that they cooperated with established winners. For example, some grants may be awarded based on your articles in "current contents" journals. As a scientist you often do not know which grants you will need in future, so you try to maximize all criteria by ignoring the non-winners, even if they would treat you more nicely.

Thanks for clarifying how this works!

To be honest, the more information I get on this topic, the more confused I am. There seem to be too many factors at play, and it is difficult to judge their relative importance. I believe what I wrote is true; the problem is that important parts may still be missing. Here are the little details I learned yesterday:

The system encourages winners and sabotages new players. Yet, there is more than one winner per field. Depending on the field, there are like 15 journals that can give you points for publishing. Is this competition not enough to bring down prices? Prices can be as high as reader's $30 for reading a single article, or author's $1000 for removing a paywall from their article. I still have problem to believe these numbers. For comparison, $60 gets you a big paper book from the same publisher.

There are some "open journals" that give away articles freely. Before you say "open source FTW", let me remind you that these journals also ask like $1000 from the author. Hard to believe it's justified by their expenses.

Some publishers publish many journals, some of them being winners, some of them non-winners. Perhaps the money gained from winners is used to support non-winners; with the ultimate goal of winning in every scientific field?

These publishers also publish scientific books. Could there be some relation between book costs and article costs? For example, could the articles be very expensive to make the books cheaper? Just a speculation.

Economically speaking, from the reader's point of view, each journal is a monopoly. (It does not matter that another journal is cheaper; you cannot buy the same article from another journal.) On the other hand, from the publisher's point of view, most articles are commodities. Is this perhaps enough to explain the prices?

Also, in the article market you have very strong third-degree price discrimination. For a layman, the cost may be $30 for an article; for a student in a poor country it may be like $2 for yearly access to the whole library; for a scientist the cost is virtually zero, because the employer pays the costs. Of course there is a 'black market' solution: make friends at a university and let them download the articles for you at a lower price. Also, if you are a university student, it is good to make friends at other universities, because each university may have access to different databases.

Let's not forget that this whole thing is a signalling market. By publishing in prestigious journals you signal that you are a good scientist. Also, if you are a good scientist, your employer pays the costs. So you don't care about the prices; you may even like them, because they help you signal better.

tl;dr -- Scientific article market is very unusual. Our intuitions about typical markets may be completely wrong.

[-]satt10

Prices can be as high as reader's $30 for reading a single article, or author's $1000 for removing a paywall from their article. I still have problem to believe these numbers.

Those numbers felt a bit high to me too, so I did a very quick search to find something suggesting why an article might cost a thousand-odd dollars. I found an old white paper from the Public Library of Science — it lists some costs that might help to explain why open access publication costs run to four figures.

PLoS reckoned that it might cost PLoS Biology $20 to manage each submitted manuscript given 100 submissions a month. Assuming only 1/10 of those get accepted, that leads to a $200 starting cost per published article.

Published articles also have production costs. For an 11-page article the paper guesstimates costs for:

  • applying pre-editing macros ($10)
  • copy editing ($220)
  • preparing figures ($150)
  • layout ($176 for text, $138 for graphics)
  • handling proofs & corrections ($52)
  • XML markup ($36)
  • PDF creation ($17)
  • figure conversion to JPEG ($18)
  • "XML upload/QC" ($41)
  • article deposit in CrossRef or PubMed Central ($13)

These costs add up to $871, or $1071 after adding $200 for manuscript management. The manuscript management cost seems fair (I can easily picture an editor spending $20 worth of time per manuscript coordinating authors & reviewers), as do the first couple of line items. I'm less sure about the rest, but suspect that cost cutting would leave most of the $1000 price tag intact (and probably all of it if I factor in costs like office space, marketing, non-editorial staff & web hosting). I doubt this automatically justifies charging $30 to buy an already published article, though.
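For anyone who wants to check the arithmetic, here is a small Python sketch that adds up the figures quoted above. Every number comes from the list and the paragraph before it; the variable names and the "10 submissions per accepted article" framing of the 1/10 acceptance rate are just my own bookkeeping.

```python
# Per-article production cost estimates quoted above (USD, 11-page article).
production_costs = {
    "pre-editing macros": 10,
    "copy editing": 220,
    "preparing figures": 150,
    "layout (text)": 176,
    "layout (graphics)": 138,
    "proofs & corrections": 52,
    "XML markup": 36,
    "PDF creation": 17,
    "figure conversion to JPEG": 18,
    "XML upload/QC": 41,
    "CrossRef/PubMed Central deposit": 13,
}

cost_per_submission = 20        # manuscript management, per submitted manuscript
submissions_per_accepted = 10   # assumed above: only 1 in 10 submissions is accepted

# Each published article also has to carry the handling cost of the rejected ones.
management_cost = cost_per_submission * submissions_per_accepted

production_total = sum(production_costs.values())
total_per_article = production_total + management_cost

print(f"production:  ${production_total}")
print(f"management:  ${management_cost}")
print(f"total:       ${total_per_article}")
```

Changing `submissions_per_accepted` shows how sensitive the management share is to the acceptance rate: a journal accepting 1 in 5 would carry $100 per published article instead of $200.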

Thanks for the list of production costs; I did not realize all those parts were necessary. Now it makes more sense, though it still seems a bit exaggerated -- I mean, $700 just to proofread and convert one DOC file with pictures into XML + JPEG + PDF? That's the monthly salary of an educated person here in Eastern Europe. Let's be generous and pretend it is a week's work for one person. Sure, if you include office space, marketing, etc., then it grows... but why not use volunteer work instead? You could pay volunteers by giving them free access to the database... eh, now I am probably just trying to deny reality. Anyway, even if we succeeded in reducing the four figures to three, it would not change much.

(Now calm down, breathe deeply, and find out why you are not satisfied, even after you got a rational explanation...)

I think this is what makes it all feel so wrong: We live in the age of internet, in the age of blogs, in the age of free software. You can have a web page for $0, or just a bit more if you need a top-level domain. You can have a CMS or blogging software for $0. You already have a personal computer, and you can have a word processor for $0. You can make a PDF file by clicking on the "export to PDF" button, and then clicking "OK". That's it!

And then we increase the price by $1000, because we require professional book-level quality for the articles. Because "Times New Roman 10pt" just ain't good enough for serious science!

I guess this is where the whole process slowly got out of control. Surely, if you do science, you need to publish. If you publish, there are experts that will make your article nice to read, and it is basically a good thing. But these experts are going to cost you something. Either you will pay the costs, or the readers will. ... And now we ask scientists to pay for the privilege of publishing their discoveries, and we slow down the scientific progress by thousands of paywalls, just to make sure that the science comes in a nice professional PDF layout.

So, as an alternative (maybe it already exists) I would suggest an online journal that publishes every article exactly as it receives it. If it is a PDF, publish the PDF. If it is anything else, do a straightforward export to PDF and publish it along with the original files. Have a team of volunteers willing to polish some of those PDFs for free. Then either let authors pay $20 per submission, or solicit donations to cover manuscript management costs, or again leave the editorial work to volunteers.

EDIT: Alternatively, let authors choose whether they want to pay $1000 to have their article edited (because now we know it is an editing cost) or prefer to publish their article as it is. This choice should not influence the journal's decision whether the article will be published. For example, the editing should be done by an external company. The journal's guardians should not participate in the publishing business. Simply, let's separate the "PDF layout" business from the "scientific article filtering" business; otherwise we have a conflict of interest here.