Standard Analytics have released a very useful paper looking at platform costs for scholarly publishing. In the paper (which coincidentally demonstrates part of the platform system they are developing) they make some strong claims about the base cost of publishing scholarly articles. In this post I will critique those claims and attempt to derive a cost that fully represents the base marginal cost of article publishing, while pointing out that such average estimates are probably not very useful. My central criticism is that the paper shows not marginal costs but (a proportion of) the per-article technical platform costs. Their own central point nonetheless stands: modular, low cost and flexible platforms that create efficiencies of scale offer the opportunity for radically cheaper scholarly publishing systems.
Missing platform costs
The first criticism to be made is that there are basic services included in scholarly publishing that have not been included in the calculation. Ironically one of the most important of these is the very system that Standard Analytics are demonstrating with their article, that of hosting and discovery. There is an expectation for modern web based publishers that they will provide some form of web portal for content with some level of functionality. While new options, including overlay journals based on archives, are developing, these do not remove this cost but place it elsewhere in the system. Preservation systems such as CLOCKSS and PORTICO are not designed to be the primary discovery and usage site for articles.
Running a high availability site with medium to high (by scholarly standards) traffic is not free. Calculating a per-article marginal cost for hosting is not straightforward, particularly given a lack of standards for the timeframe over which server costs should be amortised (how long do articles need to be available?) and the limited predictability of future bandwidth and storage costs. Many per-article costs quoted online make the common assumption that storage and server costs are dropping faster than the rate of increase in the size of content stores i.e. that the ongoing marginal cost of hosting an article can be neglected. The article would in any case be substantially improved by including per-article costs of hosting from providers such as Atypon, Highwire etc. I would guess these would add $50-150 to the headline costs depending on volume.
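To make the amortisation problem concrete, here is a minimal sketch of how the hosting horizon and the assumed rate of change in hosting costs drive a per-article figure. The function name, its parameters and every number in it are hypothetical illustrations of my own, not figures from the paper or from any hosting provider.

```python
def hosting_cost_per_article(first_year_cost, years_hosted, annual_cost_change):
    """Total cost of hosting one article over a fixed horizon.

    first_year_cost: hosting cost for the article in its first year (hypothetical).
    years_hosted: how many years the article must stay available (an assumption).
    annual_cost_change: yearly fractional change in hosting cost,
        e.g. -0.5 means the cost halves each year (an assumption).
    """
    return sum(
        first_year_cost * (1 + annual_cost_change) ** year
        for year in range(years_hosted)
    )


# Purely illustrative inputs: the same article over the same three years,
# first with flat costs and then with costs halving each year.
flat = hosting_cost_per_article(10.0, 3, 0.0)
falling = hosting_cost_per_article(10.0, 3, -0.5)
print(flat, falling)
```

The point of the sketch is only that the answer depends heavily on two inputs nobody has agreed on: how long the article must be hosted and how fast costs are assumed to fall.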
However the calculation of this is difficult, as an enormous proportion of publisher costs in using third party providers lies in updates and enhancements. The article explicitly neglects set-up and development costs for third party peer-review management systems, focussing instead on marginal costs. However updates, enhancements, data migrations and similar processes are a core part of keeping publisher offerings up to date. Neglecting this element, which is similarly neglected in the infrastructure costings by dropping out the cost of developing the putative system, will lead to substantial underestimates not just of amortised startup costs but also of ongoing maintenance.
Missing marginal costs
The article describes its focus as being on marginal costs, but it is actually on technical platform and service costs. The paper neglects the tradeoff between academic labour and publisher systems. Martin Eve in his (now a little dated) series on setting up a journal notes that the annual cash costs for running a journal can be very low ($350 a year in his estimate of cash costs in 2012) but he also notes the time commitments involved. Typesetting can be automated but running it on a manuscript and checking the output might take an hour per article if working at a small scale, adding up to as much as $100 per article ($200,000 full economic cost of an academic divided by a 2,000 hour work year). And as I’ve noted elsewhere, thinking about “average times” is very misleading here: the real time sink is the unusual articles that require substantially more work. Paying at the high end of the ranges of the typesetting services described in the paper reduces the requirements for publisher (or academic) investment in running these systems and doing quality assurance but it doesn’t remove them. At the other end, taking a Word document and printing to PDF will work, but doesn’t generate an article format suitable for a modern web based world.
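The academic time arithmetic above can be written out as a small calculation. This is a minimal sketch using the figures already quoted ($200,000 full economic cost, a 2,000 hour work year, one hour per article); the function name and the three-hour case are my own illustration of an unusual article, not numbers from the paper.

```python
def academic_time_cost(full_economic_cost_per_year, work_hours_per_year, hours_per_article):
    """Cost of the academic time spent on one article.

    The hourly rate is the full economic cost of an academic divided by
    the number of hours in a work year.
    """
    hourly_rate = full_economic_cost_per_year / work_hours_per_year
    return hourly_rate * hours_per_article


# The case from the text: one hour of checking typeset output per article.
typical = academic_time_cost(200_000, 2_000, 1.0)

# A hypothetical awkward article that takes three hours instead of one.
awkward = academic_time_cost(200_000, 2_000, 3.0)

print(typical, awkward)
```

Because the cost scales linearly with hours, a handful of awkward articles can dominate the total, which is why the average is a poor guide.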
Of course the article also misses the largest marginal cost, that of peer review itself, both the management and the reviewing. We have very little information in practice on how these costs are spread between professional publisher provided services and the academic community. It is difficult to estimate costs on the academic side (again the issue of whether “averages” are helpful raises its head) and publishers provide little or no information on their costs. More importantly we have no real idea of how investment on one side of the fence reduces (or perhaps increases!) the costs on the other. It might be argued that professional editors within a publisher should be more efficient than academic editors, being professionally focussed on a single task rather than spread between many. And of course the inverse is often argued, that professional editors do not create the same value (or are “doing the wrong thing”) because they are not academic community members. Of course both of these can simultaneously be true, for various interpretations of “efficient” and “value”.
The ongoing cost of development is part of the cost of doing business, open source or third party provided
The biggest issue in the paper is the way that infrastructure costs are presented. Firstly it removes the costs of development of the system from the equation. It might well be the case that grant funded, or donated technologies will evolve where the costs don’t need to be recouped from users. That doesn’t mean those costs don’t exist, just that someone else is footing the bill. But it’s important to note that Standard Analytics intend to make an offering in this space and they will presumably be looking to recoup at least some of their investment. What they are doing looks interesting and I don’t doubt they could substantially undercut incumbent providers, but it will be worth more than $1 per article, precisely because they’ll be offering additional services and probably customisations.
It is generally the case that, in choosing between a third party provider with a proprietary platform and running an open source platform in house, the balance sheet cash costs are approximately the same. What you pay in license or service fees to a provider tends to cost about the same in cash terms on a day to day basis as having the expertise and capacity in house. Service providers reap efficiencies of scale which they convert into their margin. There can be substantial non-cash benefits to having the capacity in house but it’s also a liability in terms of overheads and staff obligations. Rarely is Open Source cheap. In the particular case of scholarly publishing platforms there is a possibility for new platforms to be cheaper because they are generally more modern, more flexible and easier to work with than the proprietary systems of third party providers. That doesn’t make them free, and certainly doesn’t mean the cost of running them in house is the same as the cloud provisioning costs. Indeed most publishers are likely to continue to outsource these services, whether the platforms are proprietary or open source. Open Source will mainly serve as a brand advantage, offering confidence that there will be none of the vendor lock-in that plagues the space.
The real distinction is not between Open Source and proprietary but between a new generation of platforms that are built for and of the web, and old systems based on pre-web document management systems. Here is where I agree with the paper. Modular and flexible systems, particularly those that exploit the possibilities of web-native authoring and information management, have a massive potential to reduce the overall costs of scholarly publishing. Lots of the pieces are falling into place, including work being done by Standard Analytics, the Collaborative Knowledge Foundation, PLOS (Aperta), eLife (Lens), Overleaf, Substance, FidusWriter, Scholastica, OpenJournal and others I’ve no doubt forgotten. What is interesting is that everyone misses the fact that these pieces don’t yet join up, that the costs of developing standards to make these pieces fit together will also be part of the puzzle.
Ongoing development, customisation and improvement is a cost that doesn’t go away. Unless we are prepared to draw a line and say that scholarly communications is finished there will always be ongoing costs. These overheads can be hidden in grants or subsidies, or appear on the balance sheet as “innovation” or as part of general staff costs and overheads, or they can be bundled up with per article costs but they don’t go away. How they are accounted is largely an arbitrary choice of the organisation, not a distinction in actual activities, efficiency or margins.
Inefficient vs legacy technology vs unnecessary
It’s easy to add up the obvious costs of bits of a scholarly communication pipeline and observe that they come to less than what people generally charge. Academics often have a habit of simply deciding any bit that isn’t obvious is unnecessary. This is often built on arrogance and ignorance. Equally publishers often have a habit of defending the status quo as the way it has to be without actually addressing whether a given component is necessary. We need a much better conversation about what all the pieces are, and how much value all of them add, from discovery layers and marketing, through typesetting, copy editing, and pagination, to the costs created by the continued insistence on using Word and LaTeX.
A lot of the costs are tied up with inefficiencies and misunderstandings that lead to an attitude of “if I want to do it properly I’ll have to do it myself” on all sides of the academic/publisher/system provider divide. An enormous amount of workarounds, patch jobs and replication goes on, and the workarounds generate further workarounds as they fail to engage with the needs of another group of stakeholders. As noted above we have no real understanding of how investments by academic editors in the form of time lead to savings on the publisher side, and vice versa. And a lot of those inefficiencies relate to technology, which as I noted previously is difficult to replace piece by piece.
There are savings to be made, but to make them requires unpicking these three separate issues. What services and outcomes are needed (and by what communities)? Which of those services and outcomes are currently being delivered inefficiently? And of those, where can new technology make a difference? And for all of these what are the social and political challenges in actually achieving change?
Conclusion
This is a very useful new paper that gives up-to-date prices for vendors for some (but not all) important parts of the scholarly communications pipeline. Adding hosting costs as well as costs of providing indexing and discovery tools would be a valuable addition. The headline prices quoted cannot be relied on, although they provide some useful lower bounds for a traditional technical platform service. It would be very valuable if the differentials in pricing, particularly for typesetting, were broken out in terms of differences in service.
What this kind of analysis doesn’t, and can’t, provide is an understanding of how costs are coupled through the pipeline. Many of the costs of typesetting are due to ameliorating the problems of authoring platforms. How do poor manuscript tracking systems add to the costs of managing and conducting review? How could professional editors and support staff work more effectively with academic editors and reviewers to maximise value and reduce costs all around? And how might different service configurations actually align “publisher” and “academic” interests so as to jointly reap the benefits of potential savings rather than obfuscating the costs from each other?
For what it’s worth I’d put the likely base level cost of a modern platform for a journal publishing 50-500 articles a year, providing submission through publication in JATS XML, where the selection (and risk of mistakes) are loaded heavily on the academic community, at around $450-550 per article. This is more or less the Ubiquity Press model. If you want to protect the academic community from the more clever forms of fraud, less obvious ethical breaches and the brand risk of making mistakes, that price will rise by several hundred dollars. This is what PLOS ONE does on a large scale and what PeerJ does on a smaller scale (and with less of the cash costs on the balance sheet). Scale matters: with fewer than 50 articles a year you are running an operation on spare academic time, probably highly inefficiently. More than 500 requires scale and professionalisation, which in turn moves both risks and costs from the community side to the balance sheet as I have discussed previously.
A good chunk of that $500 is tied up in dealing with legacy technical and workflow issues. New technology will help here but implementing it has serious challenges for both development and adoption. Funding systems as shared infrastructure rather than on a recurrent basis per article also offers savings. The challenge lies in finding technological solutions and economic models that allow research communities to insource what they need, taking advantage of the subsidy we can provide through our time in highly specific niche spaces, while gaining the economies of scale that shared systems can provide. And we need much better numbers to do that. Perhaps more importantly, a truly modular system, with the competition between modules that will drive down prices and show the real potential of Open Source approaches, requires much greater community coordination on interchange standards.
The real trick with this kind of analysis is being able to figure out whether there are true like-for-like efficiency savings, or whether the costs are just being pushed elsewhere. This paper tells us about some of the cost elements which will be helpful in tackling both these questions, but it doesn’t provide the answers.