Chapter, Verse, and CHORUS: A first pass critique

And this is the chorus
This is the chorus
It goes round and around and gets into your brain
This is the chorus
A fabulous chorus
And thirty seconds from now you’re gonna hear it again

This is the Chorus - Morris Major and the Minors

The Association of American Publishers have launched a response to the OSTP White House Executive Order on public access to publicly funded research. In this they offer to set up a registry or system called CHORUS which they suggest can provide the same levels of access to research funded by Federal Agencies as would the widespread adoption of existing infrastructure like PubMedCentral. It is necessary to bear in mind that this is substantially the same group that put together the Research Works Act, a group with a long-standing, and in some cases personal, antipathy to the success of PubMedCentral. There are therefore some grounds for scepticism about the motivations of the proposal.

However, here I want to dig a bit more into the details of whether the proposal can deliver. I will admit to being sceptical from the beginning but the more I think about this, the more it seems that either there is nothing there at all – just a restatement of already announced initiatives – or alternatively the publishers involved are setting themselves up for a potentially hugely expensive failure. Let’s dig a little deeper into this to see where the problems lie.

First the good bits. The proposal is to leverage FundRef to identify federally funded research papers that will be subject to the Executive Order. FundRef is a newly announced initiative from CrossRef which will include Funder grant information within the core metadata that CrossRef collects and can provide to users and will start to address the issues of data quality and completeness. To the extent that this is a commitment from a large group of publishers to support FundRef it is a very useful step forward. Based on the available funding information the publishers would then signal that these papers are accessible and this information would be used to populate a registry. Papers that are in the registry would be made available via the publisher websites in some manner.

Now the difficulties. You will note two sets of weasel words in the previous paragraph: “…the available funding information…” and “…made available via the publisher websites in some manner”. The second is really a problem for the publishers but I think a much bigger one than they realise. Simply making the version of record available without restrictions is “easy” but ensuring that access works properly in the context of a largely paywalled corpus is not as easy as people tend to think. Nature Publishing Group have spent years sorting out the fact that every time they do a system update they remove access to the genome papers that are supposed to be freely accessible. If publishers decide they just want to make the final author manuscripts available then they will have to build up a whole parallel infrastructure to provide these – an infrastructure that will look quite a lot like PubMedCentral in fact, leading to potential duplication of effort and potential costs. This is probably less of an issue for the big publishers but for small publishers it could become a real issue.

Bad for the agencies

But it’s the first set of weasel words that is the most problematic. The whole of CHORUS seems to be based on the assumption that the FundRef information will be both accurate and complete. Anyone who has dealt with funding information inside publication workflows knows this is far from true. Comparison of funder information pulled from different sources can give nearly disjoint sets. And we know that authors are terrible at giving the correct grant codes when they can be bothered to include them at all. The Executive Order and FASTR put the agencies on the hook to report on success, compliance, and the re-use of published content. It is the agencies who get good information in the long term on the outputs of projects they fund – information that is often at odds with what is reported in the acknowledgement sections of papers.
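To make the data quality point concrete, here is a minimal sketch of how you might measure the overlap between two views of the same funder's output: what authors acknowledge at submission, and what eventually turns up in final grant reports. The identifiers and both sets are invented for illustration; nothing here comes from FundRef or from any agency's data.

```python
def jaccard_overlap(a, b):
    """Share of items common to both sets, relative to all items seen in either."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Hypothetical paper identifiers, assumed for the example only.
acknowledged_at_submission = {"paper-1", "paper-2", "paper-3", "paper-4"}
listed_in_grant_reports = {"paper-4", "paper-5", "paper-6", "paper-7"}

overlap = jaccard_overlap(acknowledged_at_submission, listed_in_grant_reports)

# Papers the agency knows about that a submission-time registry would never see.
missed_by_registry = listed_in_grant_reports - acknowledged_at_submission

print(f"overlap: {overlap:.2f}, missed: {sorted(missed_by_registry)}")
```

In this made-up case only one paper in seven appears in both sources, and a registry built purely from submission metadata would miss three of the four papers the agency ultimately reports.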

Put this issue of data quality alongside the fact that the agencies will be relying on precisely those organisations that have worked to prevent, limit, and, where that failed, slow down the widening of public access and we have a serious problem of mismatched incentives. For the publishers there is a direct incentive to fail to solve the data quality issue at the front end – it lets them make fewer papers available. The agencies are not in a position to force this issue at paper submission because their data isn’t complete until the grant finally reports. The NIH already has high compliance and an operating system, precisely because they couple grant reports to deposition. Other agencies will struggle to catch up using CHORUS and will deliver very poor compliance based on their own data. This is not a criticism of FundRef incidentally. FundRef is a necessary and well designed part of the effort to solve this problem in the longer term – but it is going to take years for the necessary systems changes to work their way through and there are big changes required to submission and editorial management systems to make this work well. And this brings us to the problems for publishers.

Bad for the publishers

If the agencies agree to adopt CHORUS they will do so with these issues very clear in their minds. The Office of Management and Budget oversight means that agencies have to report very closely on cost-benefit analyses for new projects. This alongside the issues with incentive misalignment, and just plain lack of trust, means that the agencies will do two things: they will insist that the costs are firewalled onto the publisher side, and they will put strong requirements on compliance levels and completeness. If I were an agency negotiator I would place a compliance requirement of 60% on CHORUS in year one rising to 75% and 90% in years two and three and stipulate that compliance will be measured against final grant reports on an ongoing basis. Where compliance didn’t meet the requirements the penalty would be for all the relevant papers from that publisher to be placed in PubMedCentral at the publisher’s expense. Even if they’re not this tough they are certainly going to demand that the registry be updated to include all the papers that got missed, at the publisher’s expense, necessitating an on-going manual grind of metadata update, paper corrections, and index notifications. Bear in mind that if we generously assume that 50% of submitted papers have good grant metadata and the US agencies contribute to around 25% of all global publications, this means around 10% of the entire corpus will need to be updated year on year, probably through a process of semi-automated and manual reconciliation. If you’ve worked with agency data then you know it’s generally messy and difficult to manage – this is being worked on by building shared repositories and data systems that leverage a lot of the tooling provided by PubMed and PubMedCentral.
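The back-of-envelope numbers in that paragraph can be written down as a small sketch. The 50% and 25% figures and the 60/75/90% schedule are the ones from the text; the function names and the example counts are my own, purely illustrative. Multiplying the two rates gives 12.5%, which is the same ballpark as the "around 10%" above.

```python
def reconciliation_share(good_metadata_rate, agency_share):
    """Fraction of the whole corpus that is agency funded but lacks good grant metadata."""
    return (1 - good_metadata_rate) * agency_share


# Figures taken from the text: 50% good metadata, 25% of global output agency funded.
share = reconciliation_share(good_metadata_rate=0.50, agency_share=0.25)

# Hypothetical negotiating position from the text: year -> required compliance.
COMPLIANCE_TARGETS = {1: 0.60, 2: 0.75, 3: 0.90}


def meets_target(year, papers_in_registry, papers_in_grant_reports):
    """Compare registry coverage with what final grant reports say was published."""
    if papers_in_grant_reports == 0:
        return True  # nothing was reported, so nothing can be missing
    coverage = papers_in_registry / papers_in_grant_reports
    return coverage >= COMPLIANCE_TARGETS[year]


print(f"share of corpus needing reconciliation: {share:.1%}")
print("year 1, 65 of 100 reported papers in registry:", meets_target(1, 65, 100))
print("year 2, 70 of 100 reported papers in registry:", meets_target(2, 70, 100))
```

The key design point is the denominator: compliance is measured against the agency's own final grant reports, not against whatever funding information happened to be captured at submission.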

Alternatively this could be a “triggering event” meaning that content would become available in the archives like CLOCKSS and PORTICO because access wasn’t properly provided. Putting aside the potential damage to the publisher brand if this happens, and the fact that it destroys the central aim of CHORUS – to control the dissemination path – this will also cost money. These archives are not well set up to provide differential access to triggered content; they release whole journals when a publisher goes bust. It’s likely that a partial trigger would require specialist repository sites to be set up to serve the content – again sites that would look an awful lot like PubMedCentral. The process is likely to lead to significantly more trigger events, requiring these dark repositories to function more actively as publishers, raising costs, and requiring them to build up repositories to serve content that would look an awful lot like…well you get the idea.

Finally there is the big issue – this puts the costs of improving funding data collection firmly in the hands of CHORUS publishers and means it needs to be done extremely rapidly. This work needs to be done, but it would be much better done through effective global collaboration between all funders, institutions and publishers. What CHORUS has effectively done is offer to absorb the full cost of this transition. As noted above the agencies will firewall their contributions. You can bet that institutions – whom CHORUS will not assist, and whose efforts to ensure the collection of research outputs it might even hamper – will not pay for it through increased subscriptions. And publishers who don’t want to engage with CHORUS will be unlikely to contribute. It’s also almost certain that this development process will be rushed and ham-fisted and irritate authors even more than they already are by current submission systems.

Finally of course a very large proportion of federal money moves through the NIH. The NIH has a system in place, it works, and they’re not about to adopt something new and unproven, especially given the popularity of PubMedCentral as demonstrated by the public response to the Research Works Act. So publishers will have to maintain dual systems anyway – indeed the most likely outcome of CHORUS will be to make it easier for authors to deposit works into PubMedCentral, and easier for the NIH to prod them into doing so raising the compliance rates for the NIH policy and making them look even better on the annual reports to the White House, leading ultimately to some sharp questions about why agencies didn’t adopt PMC in the first place.

Bad for the user

From the perspective of an Open Access advocate putting access into the hands of publishers who have actively worked to limit access and invested vast sums of money in systems to limit and control access seems a bad idea. But that’s a personal perspective – the publishers in question will say they are guiding these audiences to the “right” version of papers in the best place for them to consume it. But let’s look at the incentives for the different players. The agencies are on the hook to report on usage and impact of their work. They have the incentives to ensure that whatever systems are in place work well and provide access well. Subscription publishers? They have a vested interest in trying to show there is a lack of public interest, in tweaking embargoes so as to only make things available after interest has waned, in providing systems that are poorly resourced so page loads are slow, and in general making the experience as poor as possible. After all if you need to show you’re adding value with your full cost version, then it’s really helpful to be in complete control of the free version so as to cripple it. On the plus side it would mean that these publishers would almost certainly be forced to provide detailed usage information which would be immensely valuable.

…which is bad for the publishers…

The more I think about this, the less it seems to have been thought through in detail. Is it just a commitment to use FundRef? This would be a great step but it goes nowhere near even beginning to satisfy the White House requirements. If it’s more than that what is it? A registry? But that requires a crucial piece of metadata, which appears as “Licence Reference” in the diagram, that is needed to assert things are available. This hasn’t been agreed yet (I should know, I’ve been involved in drafting the description). And even when it is, no piece of metadata can make sure access actually happens. Is it a repository that would guarantee access? No – that’s what the CHORUS members hate above all other things. Is it a firm contractual commitment to making those articles with agency grant numbers attached available? Not that I’ve seen, but even if it were it wouldn’t address the requirements of either the Executive Order or FASTR. As noted above, the mandate applies to all agency funded research, not just those papers where the authors remembered to put in all the correct grant numbers.

Is it a commitment to ensuring the global collection of comprehensive grant information at manuscript submission? With the funding to make it happen – and the funding to ensure the papers become available - and real penalties if it doesn’t happen? With provision of comprehensive usage data for both subscription and freely available content? This is the only level at which the agencies will bite. And this is a horrendous and expensive can of worms.

In the UK we have a Victorian infrastructure for delivering water. It just about works but a huge proportion of the total just leaks out of the pipes – it’s not like we have a shortage of rain but when we have a “drought” we quickly run into serious problems. The cost of fixing the pipes? Vastly more than we can afford. What I think happened with CHORUS is what happens with a lot of industry-wide tech projects. Someone had a bright idea, and went to each player asking them whether they could deliver their part of the pipeline. Each player has slightly overplayed the ease of delivery, and slightly underplayed the leakage and problems. A few percent here and a few percent there isn’t a problem for each step in isolation – but along the whole pipeline it adds up to the point where the whole system simply can’t deliver. And delivering means replacing the whole set of pipes.
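The way small leaks compound is easy to sketch. The number of stages and the 5% loss at each one are assumptions picked for illustration, not estimates of any real CHORUS component.

```python
import math


def end_to_end_yield(stage_yields):
    """Fraction of papers that survive every stage of the pipeline."""
    return math.prod(stage_yields)


# Hypothetical pipeline: six hand-offs (author entry, submission system,
# editorial system, publisher metadata, registry, access check), each
# assumed to lose just 5% of the papers it receives.
stages = [0.95] * 6

delivered = end_to_end_yield(stages)
print(f"each stage keeps 95%, the whole pipeline keeps {delivered:.1%}")
```

With those assumed figures roughly a quarter of the papers never make it to the end, even though no single player is reporting a loss of more than a few percent.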

 


Update on publishers and SOPA: Time for scholarly publishers to disavow the AAP


In my last post on scholarly publishers that support the US Congress SOPA bill I ended up making a series of edits. It was pointed out to me that the Macmillan listed as a supporter is not the Macmillan that is the parent group of Nature Publishing Group but a separate U.S. subsidiary of the same ultimate holding company, Holtzbrinck. As I dug further it became clear that while only a small number of scholarly publishers were explicitly and publicly supporting SOPA, many of them are members of the Association of American Publishers, which is listed publicly as a supporter.

This is a little different to directly supporting the act. The AAP is a membership organisation that represents its members (including Nature Publishing Group, Oxford University Press, Wiley Blackwell and a number of other familiar names, see the full list at the bottom) to – amongst others – the U.S. government. Not all of its positions would necessarily be held by all its members. However, neither have any of those members come out and publicly stated that they disagree with the AAP position. In another domain Kaspersky software quit the Business Software Alliance over the BSA’s support of SOPA, even after the BSA withdrew its support.

I was willing to give AAP members some benefit of the doubt, hoping that some of them might come out publicly against SOPA. But if that was the hope then the AAP have just stepped over the line. In a spectacularly disingenuous press release the AAP claims significant credit for a new act just submitted to the U.S. Congress. This, in a repeat of some previous efforts, would block any efforts on the part of U.S. federal agencies to enact open access policies, even to the extent of blocking them from continuing to run the spectacularly successful PubMedCentral. That this comes days before the deadline for a request for information on the development of appropriate and balanced policies that would support access to the published results of U.S. taxpayer-funded research is a calculated political act, an abrogation of any principled stance, and a clear signal of a lack of any interest in a productive discussion on how to move scholarly communications forward into a networked future.

I was willing to give AAP members some space. Not any more. The time has come to decide whether you want to be part of the future of research communication or whether you want to legislate to try and stop that future happening. You can be part of that future or you can be washed into the past. You can look forward or you can be part of a political movement working to rip off the taxpayers and charitable donors of the world. Remember that the profits alone of Elsevier and Springer (though I should be cutting Springer a little slack as they’re not on the AAP list – the one on the list is a different Springer) could fund the publication of every paper in the world in PLoS ONE. Remember that the cost of putting a SAGE article on reserve for a decent sized class or of putting a Taylor and Francis monograph on reserve for a more modest sized one at one university is more than it would cost to publish them in most BioMedCentral journals and make them available to all.

Ultimately this legislation is irrelevant – the artificial level of current costs of publication and the myriad of additional charges that publishers make for this, that, and the other (Colour charges? Seriously?) will ultimately be destroyed. The current inefficiencies and inflated markups cannot be sustained. The best legislation can do is protect them for a little longer, at the cost of damaging the competitiveness of the U.S. as a major player in global research. With PLoS ONE rapidly becoming a significant proportion of the world’s literature on its own and Nature and Science soon to be facing serious competition at the top end from an OA journal backed by three of the most prestigious funders in the world, we are moving rapidly towards a world where publishing in a subscription journal will be foolhardy at best and suicidal for researchers in many fields. This act is ultimately a pathetic rearguard action and a sign of abject failure.

But for me it is also a sign that the rhetoric of being supportive of a gradual managed change to our existing systems, a plausible argument for such organisations to make, is dead for those signed up to the AAP. Publishers have a choice – lobby and legislate to preserve the inefficient, costly, and largely ineffective status quo – or play a positive part in developing the future.

I don’t expect much; to be honest I expect deafening silence as most publishers continue to hope that most researchers will be too buried in their work to notice what is going on around them. But I will continue to hope that some members of that list, the organisations that really believe that their core mission is to support the most effective research communication – not that those are just a bunch of pretty words that get pulled out from time to time – will disavow the AAP position and commit to a positive and open discussion about how we can take the best from the current system and make it work as well as we can with the technology available. A positive discussion about managed change that enables us to get where we want to go and helps to make sure that we reap the benefits when we get there.

This bill is self-defeating as legislation but as a political act it may be effective in the short term. It could hold back the tide for a while. But publishers that support it will ultimately get wiped out as the world moves on and they spend so much time pushing back the tide that they miss the opportunity to catch up. Publishers who move against the bill have a role to play in the future and are the ones with enough insight to see the way the world is moving. And those publishers who sit on the sidelines? They don’t have the institutional capability to take the strategic decisions required to survive. Choose.

Update: An interesting parallel post from John Dupuis and a trenchant expose (we expect nothing less) from Michael Eisen. Jon Eisen calls for people at the institutions and organisations with links to AAP to get on the phone and ask for them to resign from AAP. Lots of links appearing at this Google+ post from Peter Suber.

The List of AAP Members from http://www.publishers.org/members/psp/