Critiquing the Standard Analytics Paper (reprise)

5 May 2016

This morning I received an email from a senior policy person. They’d read my blog post on Marginal Costs of Article Publishing but they couldn’t seem to get to the original article from Standard Analytics that I was commenting on. I took a look myself and found the following.


If you remember, the claim of the original article was that platform and cloud provision costs meant that the marginal cost of publishing was really below $1. My comment was that lots of costs were missing, and that marginal cost per article probably isn’t a good way of analysing the problem anyway. But my first point was:

The first criticism to be made is that there are basic services included in scholarly publishing that have not been included in the calculation. Ironically, one of the most important of these is the very system that Standard Analytics are demonstrating with their article, that of hosting and discovery. There is an expectation for modern web-based publishers that they will provide some form of web portal for content with some level of functionality. While new options, including overlay journals based on archives, are developing, these do not remove this cost but place it elsewhere in the system. Preservation systems such as CLOCKSS and PORTICO are not designed to be the primary discovery and usage site for articles.

Running a high availability site with medium to high (by scholarly standards) traffic is not free. Calculating a per-article marginal cost for hosting is not straightforward, particularly given a lack of standards for the timeframe over which server costs should be amortised (how long do articles need to be available?) and the limited predictability of future bandwidth and storage costs.

But what does this tell us? Keeping a site up is hard. The scholarly community has expectations of availability and preservation, neither of which was met in this case. These cost money, and that cost is not easily calculated on a per-article basis, precisely because it isn’t really a per-article cost. It’s a platform cost.

Google didn’t help much, pointing me to things that pointed to the dead link. There wasn’t an obvious backup copy anywhere. I did eventually find a copy of the article on the Internet Archive. But that was captured by another infrastructure, one that also costs money and one that relies on donations and grants to keep it running.

The real story here is not that Standard Analytics did a bad job. Something went wrong, and I’m sure they’ll fix it shortly. It’s that it is hard work to meet the standards we expect for scholarly content, and that the systems that provide the levels of service we expect are invisible until they go wrong. That’s why infrastructure is so important. And a lot of the “extra” cost that people complain about when they see prices over $10-50 (or $100 or $400) is going on different kinds of infrastructure.

Sure, we can do it cheaper. We just need to discuss whether the service level we’ll get when things break is something we can live with.