Open Access Progress: Anecdotes from close to home

Solution extinction measurements of (A) faceted and (B) Au@MNPs, and (C) photos of the particles. From Silva et al., Chem. Commun., 2016, DOI: 10.1039/C6CC03225G

It has become rather fashionable in some circles to complain about the lack of progress on Open Access, and particularly to decry the apparent failure of UK policies to move things forward. I’ve been guilty of frustration at various stages in the past, and one thing I’ve always found useful is thinking back to where things were. So with that in mind, here’s an anecdote or two suggesting not just progress but a substantial shift in the underlying practice.

I live with a chemist, and chemists are a group not known for their engagement with Open Access. More than in most other disciplines, in my experience, there is a rigid hierarchy of journals, a mechanistic view of productivity, and – particularly in those areas not awash with pharmaceutical funding – not a huge amount of money. Combine that with a tendency to think that everything is – or at least should be – patentable (which tends to rule out preprints) and this is not fertile ground for OA advocacy.

Over the years we’ve had our fair share of disagreements. A less than ideal wording on the local institutional mandate meant that archiving was off the menu for a while (the agreement to deposit required all staff to deposit but also required the depositor to take personal responsibility for any copyright breaches) and a lack of funds (and an institutional decision to concentrate RCUK funds and RSC vouchers on only the journals at the top of that rigid hierarchy) meant that OA publication in the journals of choice was not feasible either. That argument about whether you choose to pay an APC or buy reagents for the student was not a hypothetical in our household.

But over the past year things have shifted. A few weeks ago: “You know, I just realised my last two papers were published Open Access”. The systems and the funds are starting to work, starting to reach even into those corners of resistance – yes, even into chemistry. Yes, it’s still the natural sciences, and yes, it’s only two articles out of who knows how many (I’m not the successful scientist in the house), but it’s a substantial shift from OA publication being totally out of the question.

But at around the same time something happened that I found even more interesting. Glimpsed over a shoulder, I saw something odd: searching on a publisher website, which is strange enough, and searching only for Open Access content. A query raised the response: “Yeah, these CC BY articles are great; I can use the images directly in my lectures without having to worry. I just cite the article, which after all I would obviously have done anyway”. It turns out that with lecture video capture now becoming standard, universities are getting steadily more worried about copyright. The Attribution-licensed content meant there was no need to worry.

Sure, these are just anecdotes, but to me they are indicative of a shift in the narrative: a shift from “this is expensive and irrelevant to me” to “the system takes care of it and I’m seeing benefits”. Of course we can complain that it’s costing too much, that much of the system is flaky at best and absent at worst, or that the world could be so much better. We can and should point to all the things that are sub-optimal. But just as the road may stretch out some distance ahead, with roadblocks and barriers in front of us, there is also a long stretch of road behind, with the barriers cleared or overcome.

As much as anything it was the sense of “that’s just how things are now” that made me feel like real progress has been made. If that is spreading, even if slowly, then the shift towards a new normal may finally be underway.

Speaking at City University London for OA Week

Ernesto Priego has invited me to speak at City University in London on Thursday the 22nd October as part of Open Access Week. I wanted to pull together a bunch of the thinking I’ve been doing recently around Open Knowledge in general and how we can get there from here. This is deliberately a bit on the provocative side so do come along to argue! There is no charge but please register for the talk.


The Limits of “Open”: Why knowledge is not a public good and what to do about it

A strong argument could be made that efforts to adopt and require Open Access and Open Data in the 21st Century research enterprise are really only a return to the 17th Century values that underpinned the development of modern scholarship. But if that’s true, why does it seem so hard? Is it that those values have been lost, sacrificed to the need to make a limited case for why scholarship matters? Or is something more fundamentally wrong with our community?

Drawing on strands of work from economics, cultural studies, politics and management I will argue that to achieve the goals of Open Knowledge we need to recognise that they are unattainable. That knowledge is not, and never can be, a true public good. If instead we accept that knowledge is by its nature exclusive, and therefore better seen as a club good, we can ask a more productive question.

How is it, or can it be, in the interests of communities to invest in making their knowledge less exclusive and more public? What do they get in return? By placing (or re-placing) the interests of communities at the centre we can understand, and cut through, the apparent dilemma that “information wants to be free” but that it also “wants to be expensive”. By understanding the limits on open knowledge we can push them, so that, in the limit, they are as close to open as they can be.

Fork, merge and crowd-sourcing data curation

I like to call this one "Fork"

Over the past few weeks there has been a sudden increase in the amount of financial data on scholarly communications in the public domain. This was triggered in large part by the Wellcome Trust releasing data on the prices paid for Article Processing Charges by the institutions it funds. The release of this pretty messy dataset was followed by a substantial effort to clean that data up. This crowd-sourced data curation process has been described by Michelle Brook. Here I want to reflect on the tools that were available to us and how they made some aspects of this collective data curation easy, but also made some other aspects quite hard.

The data started its life as a csv file on Figshare. This is a very frequent starting point. I pulled that dataset and did some cleanup using OpenRefine, a tool I highly recommend as a starting point for any moderate to large dataset, particularly one that has been put together manually. I could use OpenRefine to quickly identify and correct variant publisher and journal name spellings, clean up some of the entries, and find issues that looked like mistakes. It’s a great tool for doing that initial cleanup, but it’s a tool for a single user, so once I’d done that work I pushed my cleaned up csv file to GitHub so that others could work with it.
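For those who prefer a scripted route, the sketch below shows the kind of normalisation OpenRefine does interactively, done in pandas instead. The file name, the “Publisher” column, and the spelling variants are hypothetical, purely for illustration.

```python
# A rough sketch of OpenRefine-style cleanup done in code: normalising
# variant publisher names in a CSV. File and column names are hypothetical.
import pandas as pd

df = pd.read_csv("apc_data.csv")  # hypothetical local copy of the dataset

# Map observed variants onto a single canonical form.
canonical = {
    "Wiley-Blackwell": "Wiley",
    "John Wiley & Sons": "Wiley",
    "ELSEVIER": "Elsevier",
    "Elsevier Ltd": "Elsevier",
}

# Trim stray whitespace, then apply the mapping; unmapped names pass through.
df["Publisher"] = df["Publisher"].str.strip().replace(canonical)

# A frequency count makes any remaining variants easy to spot by eye.
print(df["Publisher"].value_counts().head(20))

df.to_csv("apc_data_clean.csv", index=False)
```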

After pushing to GitHub a number of people did exactly what I’d intended and forked the dataset. That is, they took a copy and added it to their own repository. In the case of code, people will fork a repository, add to or improve the code, and then make a pull request that alerts the original repository owner that there is new code they might want to merge into their version of the codebase. The success of GitHub has been built on making this process easy, even fun. For data the merge process can get a bit messy, but the potential was there for others to do some work and for us to be able to combine it back together.
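To give a sense of why merging data is messier than merging code, here is a minimal sketch that combines two forks of the same CSV, keyed on DOI, and flags the rows where the forks disagree. The file names, column names, and the assumption that every row carries a unique DOI are mine, for illustration only.

```python
# Minimal sketch of merging two forks of the same dataset. Assumes both
# forks share a unique "DOI" column; file and column names are hypothetical.
import pandas as pd

mine = pd.read_csv("apc_data_mine.csv")
theirs = pd.read_csv("apc_data_theirs.csv")

merged = mine.merge(theirs, on="DOI", how="outer", suffixes=("_mine", "_theirs"))

# Rows where both forks give an APC value but the values differ cannot be
# merged automatically - someone has to decide which one is right.
conflicts = (
    merged["APC_mine"].notna()
    & merged["APC_theirs"].notna()
    & merged["APC_mine"].ne(merged["APC_theirs"])
)
print(f"{conflicts.sum()} rows need a human decision")

# Everything else can be combined automatically, preferring the other fork's
# edits and falling back to our own values where they made none.
merged["APC"] = merged["APC_theirs"].fillna(merged["APC_mine"])
merged.to_csv("apc_data_merged.csv", index=False)
```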

But GitHub is really only used by people comfortable with command line tools – my thinking was that people would use computational tools to enhance the data. Theo Andrews, however, had the idea of bringing in many more people to manually look at and add to the data. Here an online spreadsheet that many people can work with, such as those provided by Google Docs, is a powerful tool, and it was through the adoption of the GDoc that somewhere over 50 people were able to add to the spreadsheet and annotate it, creating a high value dataset that allowed the Wellcome Trust to do a much deeper analysis than had previously been possible. The dataset had been forked again, now to a new platform, and this tool enabled what you might call a “social merge”, collecting the individual efforts of many people through an easy to use interface.

The interesting thing was that exactly the facilities that made the GDoc attractive for manual crowdsourcing efforts made it very difficult for those of us working with automated tools to contribute effectively. We could take the data and manipulate it, forking again, but if we then pushed that re-worked data back we ran the risk of overwriting what anyone else had done in the meantime. The live online multi-person interaction that works well for people was actually a problem for computational processing. The interface that makes working with the data easy for people created a barrier to automation and a barrier to merging back what others of us were trying to do. [As an aside, yes we could in principle work through the GDocs API, but that’s just not the way most of us work when doing this kind of data processing].
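In practice the pragmatic half-way house looks something like the sketch below: pull a one-off snapshot of the published sheet as CSV, process it locally, and stop short of writing anything back. The sheet key is a placeholder, and the export-as-CSV route assumes the sheet has been published to the web; the write-back step is exactly where the overwriting problem described above bites.

```python
# Sketch: take a snapshot of a published Google Sheet for automated
# processing. The sheet key is a placeholder, not the real spreadsheet.
import pandas as pd

SHEET_KEY = "YOUR_SHEET_KEY_HERE"
csv_url = f"https://docs.google.com/spreadsheets/d/{SHEET_KEY}/export?format=csv"

df = pd.read_csv(csv_url)  # a snapshot of the live sheet at this moment

# ... automated cleanup would go here ...

# Writing the result back is the hard part: by the time a cleaned file is
# re-uploaded, other contributors may have edited the live sheet, and a naive
# upload would overwrite their work. Hence the merge stayed manual and social.
df.to_csv("apc_sheet_snapshot_clean.csv", index=False)
```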

Crowdsourcing of data collection and curation tends to follow one of two paths. Collection of data is usually done into some form of structured data store, supported by a form that helps the contributor provide the right kind of structure. Tools like EpiCollect provide a means of rapidly building these kinds of projects. At the other end, large scale data curation efforts such as GalaxyZoo tend to create purpose-built interfaces to guide users through the curation process, again creating structured data. Where there has been less tool building, and fewer big successes, is the space in the middle, where messy or incomplete data has been collected and a community wants to enhance it and clean it up. OpenRefine is a great tool, but isn’t collaborative. GDocs is a great collaborative platform, but creates barriers to using automated cleanup tools. GitHub and code repositories are great for supporting the fork, work, and merge-back pattern, but don’t support direct human interaction with the data.

These issues are part of a broader pattern with Open Access, Open Data, and Open Educational Resources more generally. With the right formats, licensing and distribution mechanisms we’ve become very, very good at supporting the fork part of the cycle. People can easily take content and re-purpose it for their own local needs. What we’re not so good at is providing the mechanisms, both social and technical, to make it easy to contribute those variations, enhancements and new ideas back to the original resources. This is both a harder technical problem and challenging from a social perspective. Giving stuff away and letting people use it is easy because it requires little additional work. Working with people to accept their contributions back in takes time and effort, both often in short supply.

The challenge may be even greater because the means for making one type of contribution easier may make others harder. That certainly felt like the case here. But if we are to reap the benefits of open approaches then we need to do more than just throw things over the fence. We need to find the ways to gather back and integrate all the value that downstream users can add.


Open is a state of mind

William Henry Fox Talbot’s ‘The Open Door’ (Photo credit: Wikipedia)

“Open source” is not a verb

Nathan Yergler via John Wilbanks

I often return to the question of what “Open” means and why it matters. Indeed the very first blog post I wrote focussed on questions of definition. Sometimes I return to it because people disagree with my perspective. Sometimes because someone approaches similar questions in a new or interesting way. But mostly I return to it because of the constant struggle to get across the mindset that it encompasses.

Most recently I addressed the question of what “Open” is about in an online talk I gave for the Futurium Program of the European Commission (video is available). In it I tried to get beyond the definitions of Open Source, Open Data, Open Knowledge, and Open Access to the motivation behind them, something which is both non-obvious and conceptually difficult. All of these definitions focus on mechanisms – on the means by which you make things open – but not on the motivations behind doing so. As a result they can often seem arbitrary and rules-focussed, and they do become subject to the kind of religious wars that result from disagreements over the application of rules.

In the talk I tried to move beyond that, to describe the motivation and the mindset behind taking an open approach, and to explain why this is so tightly coupled to the rise of the internet in general and the web in particular. Being open, as opposed to making open resources (or making resources open), is about embracing a particular form of humility. For the creator it is about embracing the idea that – despite knowing more about what you have done than any other person – the use and application of your work is something you cannot predict. Similarly, for someone working on a project, being open is understanding that – despite the fact you know more about the project than anyone else – crucial contributions and insights could come from unknown sources. At one level this is just a numbers game: given enough people it is likely that someone, somewhere, can use your work, or contribute to it in unexpected ways. As a numbers game it is rather depressing on two fronts. First, it feels as though someone out there must be cleverer than you. Second, it doesn’t help because you’ll never find them.

Most of our social behaviour and thinking feels as though it is built around small communities. People prefer to be a (relatively) big fish in a small pond; scholars even take pride in knowing the “six people who care about and understand my work”; the “not invented here” syndrome arises from the assumption that no-one outside the immediate group could possibly understand the intricacies of the local context enough to contribute. On this view it is better to build up tools that work locally than to put effort into building a shared community toolset. Above all, the effort involved in listening for, and working to understand, outside contributions is assumed to be wasted. There is no point “listening to the public” because they will “just waste my precious time”. We work on the assumption that, even if we accept the idea that there are people out there who could use our work or could help, we can never reach them – that there is no value in expending effort to even try. And we do this for a very good reason: because for the majority of people, for the majority of history, it was true.

For most people, for most of history, it was only possible to reach and communicate with small numbers of people. That means in turn that for most kinds of work, those networks were simply not big enough to connect the creator with the unexpected user, or the unexpected helper with the project. The rise of the printing press, and then the telegraph, radio, and television changed the odds, but only the very small number of people who had access to these broadcast technologies could ever reach larger numbers. And even they didn’t really have the tools that would let them listen back. What is different today is the scale of the communication network that binds us together. By connecting millions and then billions together, the probability that people who can help each other will be connected has risen to the point that, for many types of problem, they actually are.

That gap between “can” and “are” – the gap between the idea that there is a connection with someone, somewhere, that could be valuable, and actually making the connection – is the practical question that underlies the idea of “open”. How do we make resources discoverable and re-usable so that they can find those unexpected applications? How do we design projects so that outside experts can both discover them and contribute? Many of these movements have focussed on the mechanisms of maximising access, the legal and technical means to maximise re-usability. These are important; they are a necessary but not sufficient condition for making those connections. Making resources open enables re-use, enhances discoverability, and by making things more discoverable and more usable, has the potential to enhance both further. But beyond merely making resources open we also need to be open.

Being open goes in two directions. First we need to be open to unexpected uses. The Open Source community was the first to this principle, rejecting the idea that it is appropriate to limit who can use a resource. The principle here is that by being open to any use you maximise the potential for use. Placing limitations always has the potential to block unexpected uses. But the broader open source community has also gone further by exploring and developing mechanisms that support the ability of anyone to contribute to projects. This is why Yergler says “open source” is not a verb. You can license code, you can make it “open”, but that does not create an Open Source Project. You may have a project to create open source code, an “Open-source project“, but that is not necessarily a project that is open, an “Open source-project“. Open Source is not about licensing alone, but about public repositories, version control, documentation, and the creation of viable communities. You don’t just throw the code over the fence and expect a project to magically form around it; you invest in and support community creation with the aim of creating a sustainable project. Successful open source projects put community building and outreach – both reaching contributors and encouraging them – at their centre. The licensing is just an enabler.

In the world of Open Scholarship, and I would include both Open Access and Open Educational Resources in this, we are a long way behind. There are technical and historical reasons for this, but I want to suggest that a big part of the issue is one of community. It is in large part about a certain level of arrogance: an assumption that others, outside our small circle of professional peers, cannot possibly either use our work or contribute to it. There is a comfort in this arrogance, because it means we are special, that we uniquely deserve the largesse of the public purse to support our work because others cannot contribute. It means we do not need to worry about access because the small group of people who understand our work “already have access”. Perhaps more importantly it encourages fears about what might go wrong with sharing over a balanced assessment of the risks of sharing versus the risks of not sharing – the risks of not finding contributors, of wasting time, of repeating what others already know will fail, or of simply never reaching the audience who can use our work.

It also leads to religious debates about licenses, as though a license were the point or copyright were really a core issue. Licenses are just tools, a way of enabling people to use and re-use content. But the license isn’t what matters; what matters is embracing the idea that someone, somewhere can use your work, that someone, somewhere can contribute back, and adopting the practices and tools that make it as easy as possible for that to happen. And that if we do this collectively, the common resource will benefit us all. This isn’t just true of code, or data, or literature, or science. But the potential for creating critical mass, for achieving these benefits, is vastly greater with digital objects on a global network.

All the core definitions of “open”, from the Open Source Definition, to the Budapest (and Berlin and Bethesda) Declarations on Open Access, to the Open Knowledge Definition, have a common element at their heart – that an open resource is one that any person can use for any purpose. This might be good in itself, but that’s not the real point; the point is that it embraces the humility of not knowing. It says: I will not restrict uses, because that damages the potential of my work to reach others who might use it. And in doing this I provide the opportunity for unexpected contributions. With Open Access we’ve only really started to address the first part, but if we embrace the mindset of being open then both follow naturally.


Chapter, Verse, and CHORUS: A first pass critique

And this is the chorus
This is the chorus
It goes round and around and gets into your brain
This is the chorus
A fabulous chorus
And thirty seconds from now you’re gonna hear it again

This is the Chorus - Morris Minor and the Majors

The Association of American Publishers has launched a response to the OSTP White House Executive Order on public access to publicly funded research. In it they offer to set up a registry or system called CHORUS, which they suggest can provide the same levels of access to research funded by Federal Agencies as would the widespread adoption of existing infrastructure like PubMedCentral. It is necessary to bear in mind that this is substantially the same group that put together the Research Works Act, a group with a long standing, and in some cases personal, antipathy to the success of PubMedCentral. There are therefore some grounds for scepticism about the motivations behind the proposal.

However, here I want to dig a bit more into the details of whether the proposal can deliver. I will admit to being sceptical from the beginning, but the more I think about this, the more it seems that either there is nothing there at all – just a restatement of already announced initiatives – or alternatively the publishers involved are setting themselves up for a potentially hugely expensive failure. Let’s dig a little deeper to see where the problems lie.

First the good bits. The proposal is to leverage FundRef to identify federally funded research papers that will be subject to the Executive Order. FundRef is a newly announced initiative from CrossRef which will include funder and grant information within the core metadata that CrossRef collects and provides to users, and which will start to address the issues of data quality and completeness. To the extent that this is a commitment from a large group of publishers to support FundRef, it is a very useful step forward. Based on the available funding information the publishers would then signal that these papers are accessible, and this information would be used to populate a registry. Papers in the registry would be made available via the publisher websites in some manner.

Now the difficulties. You will note two sets of weasel words in the previous paragraph: “…the available funding information…” and “…made available via the publisher websites in some manner”. The second is really a problem for the publishers, but I think a much bigger one than they realise. Simply making the version of record available without restrictions is “easy”, but ensuring that access works properly in the context of a largely paywalled corpus is not as easy as people tend to think. Nature Publishing Group have spent years sorting out the fact that every time they do a system update they remove access to the genome papers that are supposed to be freely accessible. If publishers decide they just want to make the final author manuscripts available then they will have to build up a whole parallel infrastructure to provide these – an infrastructure that will look quite a lot like PubMedCentral, leading to potential duplication of effort and potential costs. This is probably less of an issue for the big publishers, but for small publishers it could become a real problem.

Bad for the agencies

But it’s the first set of weasel words that is the most problematic. The whole of CHORUS seems to be based on the assumption that the FundRef information will be both accurate and complete. Anyone who has dealt with funding information inside publication workflows knows this is far from true. Comparisons of funder information pulled from different sources can give nearly disjoint sets. And we know that authors are terrible at giving the correct grant codes, when they can be bothered to include them at all. The Executive Order and FASTR put the agencies on the hook to report on success, compliance, and the re-use of published content. It is the agencies who get good information in the long term on the outputs of the projects they fund – information that is often at odds with what is reported in the acknowledgement sections of papers.

Put this issue of data quality alongside the fact that the agencies will be relying on precisely those organisations that have worked to prevent, limit, and – where that failed – slow down the widening of public access, and we have a serious problem of mismatched incentives. For the publishers there is a direct incentive to fail to solve the data quality issue at the front end – it lets them make fewer papers available. The agencies are not in a position to force this issue at paper submission because their data isn’t complete until the grant finally reports. The NIH already has high compliance and an operating system, precisely because it couples grant reports to deposition. Other agencies will struggle to catch up using CHORUS and will deliver very poor compliance when measured against their own data. This is not a criticism of FundRef, incidentally. FundRef is a necessary and well designed part of the effort to solve this problem in the longer term – but it is going to take years for the necessary systems changes to work their way through, and there are big changes required to submission and editorial management systems to make this work well. And this brings us to the problems for publishers.

Bad for the publishers

If the agencies agree to adopt CHORUS they will do so with these issues very clear in their minds. Office of Management and Budget oversight means that agencies have to report very closely on cost-benefit analyses for new projects. This, alongside the incentive misalignment and plain lack of trust, means that the agencies will do two things: they will insist that the costs are firewalled onto the publisher side, and they will put strong requirements on compliance levels and completeness. If I were an agency negotiator I would place a compliance requirement of 60% on CHORUS in year one, rising to 75% and 90% in years two and three, and stipulate that compliance be measured against final grant reports on an ongoing basis. Where compliance did not meet the requirements, the penalty would be for all the relevant papers from that publisher to be placed in PubMedCentral at the publisher’s expense. Even if the agencies are not this tough, they are certainly going to demand that the registry be updated to include all the papers that got missed, at the publisher’s expense, necessitating an ongoing manual grind of metadata updates, paper corrections, and index notifications. Bear in mind that if we generously assume that 50% of submitted papers have good grant metadata, and that US agencies contribute to around 25% of all global publications, then the missing half of that 25% – roughly 10-12% of the entire corpus – will need to be updated year on year, probably through a process of semi-automated and manual reconciliation. If you’ve worked with agency data then you know it’s generally messy and difficult to manage – this is being worked on by building shared repositories and data systems that leverage a lot of the tooling provided by PubMed and PubMedCentral.

Alternatively, this could be a “triggering event”, meaning that content would become available in archives like CLOCKSS and PORTICO because access wasn’t properly provided. Putting aside the potential damage to publisher brands if this happens, and the fact that it destroys the central aim of CHORUS – to control the dissemination path – this will also cost money. These archives are not well set up to provide differential access to triggered content; they release whole journals when a publisher goes bust. It’s likely that a partial trigger would require specialist repository sites to be set up to serve the content – again, sites that would look an awful lot like PubMedCentral. The process is likely to lead to significantly more trigger events, requiring these dark repositories to function more actively as publishers, raising costs, and requiring them to build up repositories to serve content that would look an awful lot like…well, you get the idea.

Then there is the big issue: this puts the cost of improving funding data collection firmly in the hands of CHORUS publishers, and means it needs to be done extremely rapidly. This work needs to be done, but it would be much better done through effective global collaboration between all funders, institutions and publishers. What CHORUS has effectively done is offer to absorb the full cost of this transition. As noted above, the agencies will firewall their contributions. You can bet that institutions – whom CHORUS will not assist, and might even hamper in their efforts to ensure the collection of research outputs – will not pay for it through increased subscriptions. And publishers who don’t want to engage with CHORUS will be unlikely to contribute. It’s also almost certain that this development process will be rushed and ham-fisted, and will irritate authors even more than current submission systems already do.

Finally, of course, a very large proportion of federal money moves through the NIH. The NIH has a system in place, it works, and the agency is not about to adopt something new and unproven, especially given the popularity of PubMedCentral as demonstrated by the public response to the Research Works Act. So publishers will have to maintain dual systems anyway – indeed the most likely outcome of CHORUS will be to make it easier for authors to deposit works into PubMedCentral, and easier for the NIH to prod them into doing so, raising the compliance rates for the NIH policy and making the NIH look even better on the annual reports to the White House, leading ultimately to some sharp questions about why agencies didn’t adopt PMC in the first place.

Bad for the user

From the perspective of an Open Access advocate, putting access into the hands of publishers who have actively worked to limit access, and who have invested vast sums of money in systems to limit and control access, seems a bad idea. But that’s a personal perspective – the publishers in question will say they are guiding these audiences to the “right” version of papers in the best place for them to consume it. So let’s look at the incentives for the different players. The agencies are on the hook to report on the usage and impact of the work they fund. They have the incentive to ensure that whatever systems are in place work well and provide access well. Subscription publishers? They have a vested interest in trying to show there is a lack of public interest, in tweaking embargoes so as to only make things available after interest has waned, in providing systems that are poorly resourced so page loads are slow, and in general in making the experience as poor as possible. After all, if you need to show you’re adding value with your full-cost version, then it’s really helpful to be in complete control of the free version so as to cripple it. On the plus side, it would mean that these publishers would almost certainly be forced to provide detailed usage information, which would be immensely valuable.

…which is bad for the publishers…

The more I think about this, the less it seems to have been thought through in detail. Is it just a commitment to use FundRef? That would be a great step, but it goes nowhere near even beginning to satisfy the White House requirements. If it’s more than that, what is it? A registry? But that requires a crucial piece of metadata, which appears as “Licence Reference” in the diagram, that is needed to assert that things are available. This hasn’t been agreed yet (I should know, I’ve been involved in drafting the description). And even when it is, no piece of metadata can make sure access actually happens. Is it a repository that would guarantee access? No – that’s what the CHORUS members hate above all other things. Is it a firm contractual commitment to making those articles with agency grant numbers attached available? Not that I’ve seen, but even if it were, it wouldn’t address the requirements of either the Executive Order or FASTR. As noted above, the mandate applies to all agency funded research, not just the research where the authors remembered to put in all the correct grant numbers.

Is it a commitment to ensuring the global collection of comprehensive grant information at manuscript submission? With the funding to make it happen – and the funding to ensure the papers become available – and real penalties if it doesn’t happen? With provision of comprehensive usage data for both subscription and freely available content? This is the only level at which the agencies will bite. And this is a horrendous and expensive can of worms.

In the UK we have a Victorian infrastructure for delivering water. It just about works, but a huge proportion of the total simply leaks out of the pipes – it’s not as if we have a shortage of rain, yet when we have a “drought” we quickly run into serious problems. The cost of fixing the pipes? Vastly more than we can afford. What I think happened with CHORUS is what happens with a lot of industry-wide tech projects. Someone had a bright idea and went to each player asking whether they could deliver their part of the pipeline. Each player has slightly overplayed the ease of delivery, and slightly underplayed the leakage and problems. A few percent here and a few percent there isn’t a problem for each step in isolation – but along the whole pipeline it adds up to the point where the whole system simply can’t deliver. And delivering means replacing the whole set of pipes.

 


The bravery of librarians

Two things caught my attention over the past few days. The first was the text of a Graduation Address from Dorothea Salo to the graduating students of the Library and Information Sciences Program at the University of Wisconsin-Madison. The second was a keynote that Chris Bourg, whose blog is entitled “Feral Librarian”, gave at The Acquisitions Institute.

Both focus on how the value of libraries, and the value of those who defend the needs of all to access information, is impossible to completely measure. Both offer a prescription of action and courage: in Dorothea Salo’s case the twin messages that librarians “aim to misbehave” and that “we’ve got each other’s backs”; in Chris Bourg’s text a quote from Henry Rollins, also speaking to librarians: “What you do is the definition of good. It’s very noble and you are very brave.”

What struck me was the question of how well we are helping these people. We seek to make scientific information free, for it to flow easily to those who need it. What can we do to create a world where we need to rely less on the bravery of librarians, and therefore benefit so much more from it?


The challenge for scholarly societies

Cemetery Society (Photo credit: Aunt Owwee)

With major governments signalling a shift to Open Access it seems like a good time to be asking which organisations in the scholarly communications space will survive the transition. It is likely that the major current publishers will survive, although relative market share and focus are likely to change. But the biggest challenges are faced by small to medium scholarly societies that depend on journal income for their current viability. What changes are necessary for them to navigate this transition, and can they survive?

The fate of scholarly societies is one of the most contentious and even emotional in the open access landscape. Many researchers have strong emotional ties to their disciplinary societies and these societies often play a crucial role in supporting meetings, providing travel stipends to young researchers, awarding prizes, and representing the community. At the same time they face a peculiar bind. The money that supports these efforts often comes from journal subscriptions. Researchers are very attached to the benefits but seem disinclined to countenance membership fees that would support them. This problem is seen across many parts of the research enterprise – where researchers, or at least their institutions, are paying for services through subscriptions but unwilling to pay for them directly.

What options do societies have? Those with a large publication program could do worse in the short term than look very closely at last week’s announcement from the UK Royal Society of Chemistry. The RSC is offering an institutional mechanism whereby institutions with a particular level of subscription will receive an equivalent amount of publication services, set at a price of £1600 per paper. This is very clever for the RSC: it helps institutions prepare effectively for changes in UK policy, it costs the RSC nothing, and it lets the society experiment with a route to full open access at relatively low risk. Because the contribution of UK institutions on this particular subscription plan is relatively small it is unlikely to reduce subscriptions significantly in the short term, but if and when it does, it positions the RSC to offer package deals on publication services with very similar terms. Tactically, moving early also allows the RSC to hold a higher price point than later movers will – and will help to increase its market share in the UK over that of the ACS.
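To make the offsetting arithmetic concrete, here is a trivial sketch using the £1600 per-paper figure; the subscription value is an invented example, not an actual RSC price.

```python
# Toy illustration of the RSC-style subscription offset at £1600 per paper.
# The subscription figure below is invented purely for illustration.
APC_EQUIVALENT_GBP = 1600

subscription_value_gbp = 16_000  # hypothetical institutional subscription spend

paper_credits = subscription_value_gbp // APC_EQUIVALENT_GBP
print(f"A £{subscription_value_gbp:,} subscription earns {paper_credits} OA paper credits")
```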

Another route is for societies to explore the “indy band model”. Similar to bands that are trying to break through by giving away their recorded material but charging for live gigs, societies could focus on raising money through meetings rather than publications. Some societies already do this – having historically focussed on running large scale international or national meetings. The “in person” experience is something that cannot yet be done cheaply over the internet and “must attend” meetings offer significant income and sponsorship opportunities. There are challenges to be navigated here – ensuring commercial contributions don’t damage the brand or reduce the quality of meetings being a big one – but expect conference fees to rise as subscription incomes drop. Societies that currently run lavish meetings off the back of journal income will face a particular struggle over the next two to five years.

But even meetings are unlikely to offer a long term solution. It’s some way off yet but rising costs of travel and increasing quality of videoconferencing will start to eat into this market as well. If all the big speakers are dialling it in, is it still worth attending the meeting? So what are the real value offerings that societies can provide? What are the things that are unique to that community collection of expertise that no-one else can provide?

Peer review (pre-, post-, or peri-publication) is one of them. Publication services are not. Publication, in the narrow sense of “making public”, will be commoditised, if it hasn’t already. With new players like PeerJ and F1000 Research alongside the now fairly familiar landscape of the wide-ranging megajournal the space for publication services to make fat profits is narrowing rapidly. This will, sooner or later, be a low margin business with a range of options to choose from when someone, whether a society or a single researcher, is looking for a platform to publish their work. While the rest of us may argue whether this will happen next year or in a decade, for societies it is the long term that matters, and in the long term commoditisation will happen.

The unique offering that a society brings is the aggregation and organisation of expert attention. In a given space a scholarly society has a unique capacity to coordinate and organise assessment by domain experts. I can certainly imagine a society offering peer review as a core member service, independent of whether the thing being reviewed is already “published”. This might be a particular case where there are real benefits to operating at a small scale – both because of the peer pressure on each member of the community to pull their weight, and because the scale of the community lends itself to being understood and managed as a small set of partly connected small-world networks. The question is really whether the sums add up. Will members pay $100 or $500 per year for peer review services? Would that provide enough income? What about younger members without grants? And perhaps crucially, how cheap would a separated publication platform have to be to make the sums look good?

Societies are all about community. Arguably most completely missed the boat on the potential of the social web, when they could have built community hubs of real value – and those that didn’t miss it entirely largely created badly built and ill-thought-through community forums, well after the first flush of failed generic “Facebook for Science” clones had faded. But another chance is coming. As the ratchet moves on funder and government open access policies, society journals stuck in a subscription model will become increasingly unattractive options for publication. The slow rate of progress and disciplinary differences will allow some to hold on past the point of no return, and these societies will wither and die. Some societies will investigate transitional pricing models; I commend the example of the RSC to small societies as something to look at closely. Some may choose to move to publishing collections in larger journals where they retain editorial control. My bet is that those that survive will be the ones that find a way to make the combined expertise of their community pay – and I think the place to look for that will be the societies that find ways to decouple the value they offer through peer review from the costs of publication services.

This post was inspired by a twitter conversation with Alan Cann and builds on many conversations I’ve had with people including Heather Joseph, Richard Kidd, David Smith, and others. Full Disclosure: I’m interested, in my role as Advocacy Director for PLOS, in the question of how scholarly societies can manage a transition to an open access world. However, this post is entirely my own reflections on these issues.


First thoughts on the Finch Report: Good steps but missed opportunities

The Finch Report was commissioned by the UK Minister for Universities and Science to investigate possible routes for the UK to adopt Open Access for publicly funded research. The report was released last night and I have only had the chance to skim it over breakfast, so these are just some first observations. My impression is that the overall direction of travel is very positive, but the detail shows some important missed opportunities.

The Good

The report comes out strongly in favour of Open Access to publicly funded research. Perhaps the core of this is found in the introduction [p5].

The principle that the results of research that has been publicly funded should be freely accessible in the public domain is a compelling one, and fundamentally unanswerable.

What follows this is a clear listing of other potential returns. On the cost side the report makes clear that in achieving open access through journals it is necessary that the first-copy costs of publication be paid in some form, and that appropriate mechanisms are in place to make that happen. This focus on Gold OA is in large part a result of the terms of reference for the report, which placed retention of peer review at its heart. The other excellent aspect of the report is the detailed cost and economic modelling for multiple scenarios of UK Open Access adoption. These will be a valuable basis for discussion of how to manage the transition and how cost flows will change.

The bad

The report is maddeningly vague on the potential of repositories to play a major role in the transition to full open access. Throughout there is a focus on hybrid journals, a route which – with a few exceptions – appears to me to have failed to deliver any appreciable gains and has simply allowed publishers to charge unjustified fees for very limited services. By comparison, the repository offers an existing infrastructure that can deliver at relatively low marginal cost and will enable a dispassionate view of the additional value that publishers add. Because the value of peer review was baked into the report as an assumption this important issue gets lost, but as I have noted before, if publishers are adding value then repositories should pose no threat to them whatsoever.

The second issue I have with the report is that it fails to address the question of what Open Access is. The report does not seek to define open access. This is a difficult issue, and I can appreciate that a strict definition may be best avoided, but the report does not even raise the issues that such a definition would involve, and in this it misses an opportunity to lay out clearly the discussions required to decide what is functionally required to realise the benefits laid out in the introduction. Thus in the end it is a report on increasing access, but with no clear statement of what level of access is desirable or what the end target might look like.

This is most serious on the issue of licences for open access content, which has been seriously fudged. Four key pieces of text from the report:

“…support for open access publication should be accompanied by policies to minimise restrictions on the rights of use and re-use, especially for non-commercial purposes, and on the ability to use the latest tools and services to organise and manipulate text and other content” [recommendations, p7]

“…[in a section on institutional and subject repositories]…But for subscription-based publishers, re-use rights may pose problems. Any requirement for them to use a Creative Commons ‘CC-BY’ licence, for example, would allow users to modify, build upon and distribute the licensed work, for commercial as well as non-commercial purposes, so long as the original authors were credited. Publishers – and some researchers – are especially concerned about allowing commercial re-use. Medical journal publishers, who derive a considerable part of their revenues from the sale of reprints to pharmaceutical companies, could face significant loss of income. But more generally, commercial re-use would allow third parties to harvest published content from repositories and present them on new platforms that would compete with the original publisher.” [p87]

“…[from the summary on OA journals]…A particular advantage of open access journals is that publishers can afford to be more relaxed about rights of use and re-use.” [p92]

“…[from the summary on repositories]…But publishers have strong concerns about the possibility that funders might introduce further limits on the restrictions on access that they allow in their terms and conditions of grant. They believe that a reduction in the allowable embargo period to six months, especially if it were to be combined with a Creative Commons CC-BY licence that would allow commercial as well as non-commercial re-use, would represent a fundamental threat to the viability of their subscription-based journals.” [p96]

As far as I can tell the comment on page 92 is the only one that even suggests a requirement for CC-BY for open access through journals where the costs are paid. Given that this is a critical part of the whole business model for full OA publishers, it worries me that it is given only a brief throwaway line when it is at the centre of the debate. More widely, a concern over a requirement for liberal licensing in the context of repositories appears to colour the whole discussion of licences in the report. There is, as far as I have been able to tell, no strong statement that where a fee is paid CC-BY should be required – and much that will enable incumbent subscription publishers to continue claiming that they provide “Open Access” under a variety of non-commercial licences, satisfying no community definition of either “Open” or “Open Access”.

But more critically this fudge risks failing to deliver on the minister’s brief: to support innovation and exploitation of UK research. This whole report is embedded in a government innovation strategy that places publicly funded knowledge creation at the heart of an effort to kick-start the UK economy. Non-commercial licences cannot deliver on this and we should avoid them at all costs. This whole discussion seems to revolve around protecting publishers’ rights to sell reprints, as though it made sense to legislate to protect candle makers from innovators threatening to put in an electric grid.

Much of this report is positive – and taken in the context of the RCUK draft policy there is a real opportunity to get this right. If we make a concerted effort to utilise the potential of repositories as a transitional infrastructure, and if we get the licensing right, then the report maps out a credible route, with the financial guidelines to make it through a transition. It also sends a strong signal to the White House and the European Commission, both currently considering policy statements on open access, that the UK is ready to move, which will strengthen the hands of those arguing for strong policy.

This is a big step – and it heads in the right direction. The devil is in the details of implementation. But then it always is.

More will follow – particularly on the financial modelling – when I have a chance to digest more fully. This is a first pass draft based on a quick skim and I may modify this post if I discover I have made errors in my reading.


Added Value: I do not think those words mean what you think they mean

There are two major strands to the position that traditional publishers have taken in justifying the process by which they will make the, now inevitable, transition to a system supporting Open Access. The first is that the transition will cost “more money”. The exact costs are not clear, but the broadly reasonable assumption is that there needs to be transitional funding available to support what will clearly be a mixed system over some transitional period. The argument, of course, is over how much money and where it will come from, as well as an issue that hasn’t yet been publicly broached: how long will it last? Expect lots of positioning on this over the coming months, with statements about “average paper costs” and “reasonable time frames”, with incumbent subscription publishers targeting figures of around $2,500-5,000 and ten years respectively, and those on my side of the fence suggesting figures of around $1,500 and two years. This will be fun to watch, but the key will be to see where this money comes from (and what subsequently gets cut), the mechanisms put in place to release this “extra” money, and the way in which those mechanisms are set up so as to wind down and provide downward price pressure.

The second arm of the publisher argument has been that they provide “added value” over what the scholarly community contributes to the publication process. It has become a common refrain of the incumbent subscription publishers that they are not doing enough to explain this added value. Most recently David Crotty has posted at Scholarly Kitchen saying that this was a core theme of the recent SSP meeting. This value exists, but clearly we disagree on how large it is. The problem is that we never see any actual figures. But I think there are some recent numbers that can help us put some bounds on what this added value really is, and ironically they have been provided by the publisher associations in their efforts to head off six-month embargo periods.

When we talk about added value we can posit some imaginary “real” value, but this is not a useful number – there is no way we can determine it. What we can do is talk about realisable value, i.e. the amount that the market is prepared to pay for the additional functionality being provided. I don’t think we are in a position to pin that number down precisely, and clearly it will differ between publishers, disciplines, and workflows, but what I want to do is pin down some points which I think help to bound it, from both the provider and the consumer side. In doing this I will use a few figures and reports, as well as place an explicit interpretation on the actions of various parties. The key data points I want to use are as follows:

  1. All publisher associations and most incumbent publishers have actively campaigned against open access mandates that would make the final refereed version of a scholarly article – prior to typesetting, publication, indexing, and archival – available online in any form, either immediately or within six months after publication. The Publishers Association (UK) and ALPSP are both on record as stating that such a mandate would be “unsustainable”, and most recently that it would bankrupt publishers.
  2. In a survey of research libraries run by ALPSP (although there are a series of concerns that have to be raised about the methodology), a significant proportion of libraries stated that they would cut some subscriptions if the majority of research articles were available online six months after formal publication. The survey states that most respondents appeared to assume that the freely available version would be the original author version, i.e. not the version that was peer reviewed.
  3. There are multiple examples of financially viable publishing houses running a pure Open Access programme with average author charges of around $1500. These are concentrated in the life and medical sciences where there is both significant funding and no existing culture of pre-print archives.
  4. The SCOAP3 project has created a formal journal publication framework which will provide open access to peer reviewed papers for a community that does have a strong pre-print culture utilising the ArXiv.

Let us start at the top. Publishers actively campaign against a reduction of embargo periods. This makes it clear that they do not believe that the product they provide, in transforming the refereed version of a paper into the published version, has sufficient value that their existing customers will pay for it at the existing price. That is remarkable, and a frightening hole at the centre of our current model. The service providers can only provide sufficient added value to justify the current price if they additionally restrict access to the “non-added-value” version. A supplier that was confident about the value it adds would have no such concerns; indeed it would be proud to compete with this prior version, confident that the additional price it was charging was clearly justified. That publishers are not should be a concern to all of us, not least the publishers themselves.

Many publishers also seek to restrict access to any prior version, including the author’s original version prior to peer review. These publishers don’t even believe that their management of the peer review process adds sufficient value to justify the price they are charging. This is shocking. The ACS, for instance, has so little faith in the value that it adds that it seeks to control all prior versions of any paper it publishes.

But what of the customer? The ALPSP survey, if we take the summary at face value as I have suggested above, indicates that libraries also doubt the value added by publishers. This is more of a quantitative argument, but the fact that some libraries would cancel some subscriptions shows that overall the community doesn’t believe the current price is worth paying, even allowing for a six month delay in access. So broadly speaking, neither the current service providers nor the current customers believe that the costs of the pure service element of subscription-based scholarly publication are justified by the value added through that service. In combination, this means we can place some upper bounds on the value added by publishers.

If we take the approximately $10B currently paid as cash costs to recompense publishers for their work in facilitating scholarly communications, neither the incumbent subscription publishers nor their current library customers believe that the value added by publishers justifies the current cost, absent artificial restrictions on access to the non-value-added version.
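To put a very rough per-paper figure on that upper bound, one can divide the roughly $10B cash cost by an assumed global output of around two million articles a year; the article count is my assumption for illustration, not a figure from this post.

```python
# Back-of-envelope: implied average spend per paper under the current system.
# The ~2 million articles per year figure is an assumption for illustration.
total_spend_usd = 10e9       # approximate annual cash cost quoted above
articles_per_year = 2e6      # assumed global article output

per_paper = total_spend_usd / articles_per_year
print(f"Implied average spend per paper: ${per_paper:,.0f}")
# ~$5,000: the top of the $2,500-5,000 range publishers are positioning around.
```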

This tells us not very much about what the realisable value of this work actually is, but it does provide an upper bound. What about a lower bound? One approach is to turn to the services provided to authors by Open Access publishers. These costs are willingly incurred by a paying customer, so it is tempting to use them directly as a lower bound. This is probably reasonable in the life and medical sciences, but as we move into other disciplinary areas, such as mathematics, it is clear that this cost level is not seen as attractive enough. In addition, the life and medical sciences have no tradition of wide availability of pre-publication versions of papers. That means that for these disciplines the willingness to pay the approximately $1500 average cost of APCs is in part bound up with the wish to make the paper effectively available through recognised outlets. We have not yet separated the value of the original copy from the added value provided by the publishing service. The $1000-1500 mark is, however, a touchstone worth bearing in mind for these disciplines.

To do a fair comparison we would need to find a space where there is a thriving pre-print culture and a demonstrated willingness to pay a defined price for added value, in the form of formal publication, over and above this existing availability. The Sponsoring Consortium for Open Access Publishing in Particle Physics (SCOAP3) is an example of precisely this. The particle physics community has essentially decided unilaterally to assume control of the journals for its area and has placed its service requirements out for tender. Unfortunately this means we don’t have the final prices yet, but we will soon, and the executive summary of the working party report suggests a reasonable price range of €1000-2000. If we assume the successful tender comes in at, or slightly below, the lower end of this range, we see an accepted price for added value – over that already provided by the ArXiv for this disciplinary area – that is not a million miles away from that figure of $1500.

Of course this is before real price competition in this space is factored in. The realisable value is a function of the market and as prices inevitably drop there will be downward pressure on what people are willing to pay. There will also be increasing competition from archives, repositories, and other services that are currently free or near free to use, as they inevitably increase the quality and range of the services they offer. Some of these will mirror the services provided by incumbent publishers.

A reasonable current lower bound for realisable added value by publication service providers is ~$1000 per paper. This is likely to drop as market pressures come to bear and existing archives and repositories seek to provide a wider range of low cost services.

Where does this leave us? Not with a clear numerical value we can ascribe to this added value, but that was always going to be a moving target. We can, however, get some sense of the bottom end of the range: it’s currently $1000 or more, at least in some disciplines, but it is likely to go down. It is also likely to diversify as new providers offer subsets of the services currently offered as one indivisible lump. At the top end, the actions of both customers and service providers suggest they believe the added value is less than what we currently pay, and that only artificial controls over access to the non-value-added versions justify the current price. What we need is a better articulation of the real value that publishers add, and an honest conversation about what we are prepared to pay for it.


Send a message to the White House: Show the strength of support for OA

The White House – from Flickr user nancy_t3i

Changing the world is hard. Who knew? Advocating for change can be lonely. It can also be hard. As a scholar, particularly one at the start of a career, it is still hard to commit fully to ensuring that research outputs are accessible and re-usable. But we are reaching a point where support for Open Access is mainstream, where there is growing public interest in greater access to research, and where there is increasingly serious engagement with the policy issues at the highest level.

The time has come to show just how strong that support is. As of today there is a petition on the White House site calling for the Executive to mandate Open Access to the literature generated from US Federal funding. If the petition reaches 25,000 signatures within 30 days then the White House is committed to respond. The Executive has been considering the issues of access to research publications and data, and with FRPAA active in both houses there are multiple routes available to enact change. If we can demonstrate widespread and diverse support for Open Access, then we will have made the case for that change. This is a real opportunity for each and every one of us to make a difference.

So go to the Access2Research Petition on whitehouse.gov and sign up now. Blog and tweet using the hashtag #OAMonday and let’s show just how wide the coalition is. Go to the Access2Research website to learn more. Post the link to your community to get people involved.

I’ll be honest: the White House petition site isn’t great – this isn’t a 30-second job. But it shouldn’t take you more than five minutes. You will need to give a real name and an email address and go through a validation process via email. You don’t need to be a US citizen or resident. Obviously if you give a US Zip code it is likely that more weight will be given to your signature, but don’t be put off if you are not in the US. Once you have an account, signing the petition is a simple matter of clicking a single button. The easiest approach is to go to the Open Access petition and sign up for an account from there. Once you get the validation link via email you will be taken back to the petition.

The power of Open Access will only be unlocked through networks of people using, re-using, and re-purposing the outputs of research. The time has come to show just how broad and diverse that network is. Please take the time as one single supporter of Open Access to add your voice to the thousands of others who will be signing with you. And connect to your network to tell them how important it is for them to add their voice as well.