The Political Economics of Open Access Publishing – A series


One of the odd things about scholarly publishing is how little any particular group of stakeholders seems to understand the perspective of others. It is easy to start with researchers ourselves, who are for the most part embarrassingly ignorant of what publishing actually involves. But those who have spent a career in publishing are equally ignorant (and usually dismissive to boot) of researchers' perspectives. Each in turn fails to understand what libraries are or how librarians think. Indeed the naive view that libraries and librarians are homogeneous is a big part of the problem. Librarians in turn often fail to understand the pressures researchers are under, and are often equally ignorant of what happens in a professional publishing operation. And of course everyone hates the intermediaries.

That this is a political problem in a world of decreasing research resources is obvious. What is less obvious is the way that these silos have prevented key information and insights from travelling to the places where they might be used. Divisions that emerged a decade ago now prevent the very collaborations that are needed, not even to build new systems, but to bring together the right people to realise that they could be built.

I’m increasingly feeling that the old debates (what’s a reasonable cost, green vs gold, hybrid vs pure) are sterile and misleading. That we are missing fundamental economic and political issues in funding and managing a global scholarly communications ecosystem by looking at the wrong things. And that there are deep and damaging misunderstandings about what has happened, is happening, and what could happen in the future.

Of course, I live in my own silo. I can, I think, legitimately claim to have seen more silos than the average; in jobs, organisations, and also disciplines. So it seems worth setting down that perspective. What I've realised, particularly over the past few months, is that these views have crept up on me, and that there are quite a few things to be worked through, so this is not a post, it is a series, maybe eventually something bigger. Here I want to set out some headings, as a form of commitment to writing these things down. And to continuing to work through these things in public.

I won’t claim that this is all thought through, nor that I’ve got (even the majority of) it right. What I do hope is that in getting things down there will be enough here to be provocative and useful, and to help us collectively solve, and not just continue to paper over, the real challenges we face.

So herewith a set of ideas that I think are important to work through. More than happy to take requests on priorities, although the order seems roughly to make sense in my head.

  1. What is it publishers do anyway?
  2. What's the technical problem in reforming scholarly publishing?
  3. The marginal costs of article publishing: Critiquing the Standard Analytics Paper and follow up
  4. What are the assets of a journal?
  5. A journal is a club: New Working Paper
  6. Economies of scale
  7. The costs (and savings) of community (self) management
  8. Luxury brands, platform brands and emerging markets (or why Björn might be right about pricing)
  9. Constructing authority: Prestige, impact factors and why brand is not going away
  10. Submission shaping, not selection, is the key to a successful publishing operation
  11. Challenges to the APC model I: The myth of “the cost per article”
  12. Challenges to the APC model II: Fixed and variable costs in scholarly publishing
  13. Alternative funding models and the risks of a regulated market
  14. If this is a service industry why hasn’t it been unbundled already (or where is the Uber of scholarly publishing?)
  15. Shared infrastructure platforms supporting community validation: Quality at scale. How can it be delivered and what skills and services are needed?
  16. Breaking the deadlock: Where are the points where effective change can be started?

Added Value: I do not think those words mean what you think they mean

There are two major strands to the position that traditional publishers have taken in justifying the process by which they will make the now inevitable transition to a system supporting Open Access. The first of these is that the transition will cost "more money". The exact costs are not clear but the broadly reasonable assumption is that there needs to be transitional funding available to support what will clearly be a mixed system over some transitional period. The argument of course is over how much money and where it will come from, as well as an issue that hasn't yet been publicly broached: how long will it last? Expect lots of positioning on this over the coming months, with statements about "average paper costs" and "reasonable time frames", with incumbent subscription publishers targeting figures of around $2,500-5,000 and ten years respectively, and those on my side of the fence suggesting figures of around $1,500 and two years. This will be fun to watch, but the key will be to see where this money comes from (and what subsequently gets cut), the mechanisms put in place to release this "extra" money, and the way in which they are set up so as to wind down and provide downward price pressure.

The second arm of the publisher argument has been that they provide "added value" over and above what the scholarly community contributes to the publication process. It has become a common refrain of the incumbent subscription publishers that they are not doing enough to explain this added value. Most recently David Crotty has posted at Scholarly Kitchen saying that this was a core theme of the recent SSP meeting. This value exists, but clearly we disagree on its magnitude. The problem is that we never see any actual figures. But I think there are some recent numbers that can help us put some bounds on what this added value really is, and ironically they have been provided by the publisher associations in their efforts to head off six month embargo periods.

When we talk about added value we can posit some imaginary "real" value, but this is not a useful number – there is no way we can determine it. What we can do is talk about realisable value, i.e. the amount that the market is prepared to pay for the additional functionality being provided. I don't think we are in a position to pin that number down precisely, and clearly it will differ between publishers, disciplines, and workflows, but what I want to do is attempt to pin down some points which I think help to bound it, from both the provider and the consumer side. In doing this I will use a few figures and reports, as well as place an explicit interpretation on the actions of various parties. The key data points I want to use are as follows:

  1. All publisher associations and most incumbent publishers have actively campaigned against open access mandates that would make the final refereed version of a scholarly article (prior to typesetting, publication, indexing, and archival) available online in any form, either immediately or within six months of publication. The Publishers Association (UK) and ALPSP are both on record as stating that such a mandate would be "unsustainable" and, most recently, that it would bankrupt publishers.
  2. In a survey run by ALPSP of research libraries (although there are a series of concerns that have to be raised about the methodology) a significant proportion of libraries stated that they would cut some subscriptions if the majority of research articles were available online six months after formal publication. The survey states that most respondents appeared to assume that the freely available version would be the original author version, i.e. not the one that was peer reviewed.
  3. There are multiple examples of financially viable publishing houses running a pure Open Access programme with average author charges of around $1500. These are concentrated in the life and medical sciences where there is both significant funding and no existing culture of pre-print archives.
  4. The SCOAP3 project has created a formal journal publication framework which will provide open access to peer reviewed papers for a community that does have a strong pre-print culture utilising the ArXiv.

Let us start at the top. Publishers actively campaign against a reduction of embargo periods. This makes it clear that they do not believe that the product they provide, in transforming the refereed version of a paper into the published version, has sufficient value that their existing customers will pay for it at the existing price. That is remarkable, and a frightening hole at the centre of our current model. The service providers can only provide sufficient added value to justify the current price if they additionally restrict access to the "non-added-value" version. A supplier that was confident about the value it adds would have no such concerns; indeed it would be proud to compete with this prior version, confident that the additional price it was charging was clearly justified. That publishers are not should concern all of us, not least the publishers themselves.

Many publishers also seek to restrict access to any prior version, including the author's original version prior to peer review. These publishers don't even believe that their management of the peer review process adds sufficient value to justify the price they are charging. This is shocking. The ACS, for instance, has so little faith in the value it adds that it seeks to control all prior versions of any paper it publishes.

But what of the customer? Well, the ALPSP survey, if we take the summary at face value as I have suggested above, suggests that libraries also doubt the value added by publishers. This is more of a quantitative argument, but the fact that some libraries would cancel some subscriptions shows that the community does not believe the current price is worth paying, even allowing for a six month delay in access. So broadly speaking both the current service providers and the current customers doubt that the costs of the pure service element of subscription-based scholarly publication are justified by the value added through that service. Taken together, this lets us put an upper bound on the value added by publishers.

Take the approximately $10B currently paid in cash costs to recompense publishers for their work in facilitating scholarly communication: neither the incumbent subscription publishers nor their current library customers believe that the value added by publishers justifies that cost, absent artificial restrictions on access to the non-value-added version.

This tells us not very much about what the realisable value of this work actually is, but it does provide an upper bound. But what about a lower bound? One approach would be to turn to the services provided to authors by Open Access publishers. These costs are willingly incurred by a paying customer, so it is tempting to use them directly as a lower bound. This is probably reasonable in the life and medical sciences, but as we move into other disciplinary areas, such as mathematics, it is clear that this cost level is not seen as attractive enough. In addition the life and medical sciences have no tradition of wide availability of pre-publication versions of papers. That means that for these disciplines the willingness to pay the approximately $1500 average cost of APCs is in part bound up with the wish to make the paper effectively available through recognised outlets. We have not yet separated the value of the original author version from the added value provided by the publishing service. The $1000-1500 mark is however a touchstone worth bearing in mind for these disciplines.
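
To make the comparison concrete, here is a rough back-of-the-envelope sketch. The ~$10B figure is the one used above; the assumption of roughly two million peer-reviewed articles per year is mine, a commonly quoted ballpark rather than a number established in this post.

```python
# Rough bound comparison: implied per-article subscription spend vs the APC touchstone.
# The article count is an assumed ballpark, not a measured figure.

total_subscription_spend = 10e9   # ~$10B per year paid to publishers (figure used above)
articles_per_year = 2e6           # assumed: ~2 million peer-reviewed articles per year

implied_spend_per_article = total_subscription_spend / articles_per_year
apc_touchstone = 1500             # ~$1500 average APC at viable pure OA publishers (see above)

print(f"Implied subscription spend per article: ${implied_spend_per_article:,.0f}")
print(f"Multiple of the ~$1500 APC touchstone: {implied_spend_per_article / apc_touchstone:.1f}x")
```

On those assumptions the subscription system recovers something like $5,000 per article, three times or more the lower bound being sketched here; the rest of this argument is about whether that gap is justified by added value or merely by restricted access.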

To do a fair comparison we would need to find a space where there is a thriving pre-print culture and a demonstrated willingness to pay a defined price for added value, in the form of formal publication, over and above this existing availability. The Sponsoring Consortium for Open Access Publishing in Particle Physics (SCOAP3) is an example of precisely this. The particle physics community has essentially decided unilaterally to assume control of the journals for its area and has placed its service requirements out for tender. Unfortunately this means we don't have the final prices yet, but we will soon, and the executive summary of the working party report suggests a reasonable price range of €1000-2000. If we assume the successful tender comes in at, or slightly below, the lower end of this range, we see an accepted price for added value, over that already provided by the ArXiv for this disciplinary area, that is not a million miles away from that figure of $1500.

Of course this is before real price competition in this space is factored in. The realisable value is a function of the market and as prices inevitably drop there will be downward pressure on what people are willing to pay. There will also be increasing competition from archives, repositories, and other services that are currently free or near free to use, as they inevitably increase the quality and range of the services they offer. Some of these will mirror the services provided by incumbent publishers.

A reasonable current lower bound for realisable added value by publication service providers is ~$1000 per paper. This is likely to drop as market pressures come to bear and existing archives and repositories seek to provide a wider range of low cost services.

Where does this leave us? Not with a clear numerical value we can ascribe to this added value, but that was always going to be a moving target. We can, though, get some sense of the bottom end of the range. It is currently $1000 or greater, at least in some disciplines, but is likely to go down. It is also likely to diversify as new providers offer subsets of the services currently offered as one indivisible lump. At the top end, the actions of both customers and service providers suggest they believe the added value is less than what we currently pay, and that only artificial controls over access to the non-value-added versions justify the current price. What we need is a better articulation of the real value that publishers add, and an honest conversation about what we are prepared to pay for it.


The Research Works Act and the breakdown of mutual incomprehension


When the history of the Research Works Act, and the reaction against it, is written that history will point at the factors that allowed smart people with significant marketing experience to walk with their eyes wide open into the teeth of a storm that thousands of people would have predicted with complete confidence. That story will detail two utterly incompatible world views of scholarly communication. The interesting thing is that with the benefit of hindsight both will be totally incomprehensible to the observer from five or ten years in the future. It seems worthwhile therefore to try and detail those world views as I understand them.

The scholarly publisher

The publisher world view places them as the owner and guardian of scholarly communications. While publishers recognise that researchers provide the majority of the intellectual property in scholarly communication, their view is that researchers willingly and knowingly gift that property to the publishers in exchange for a set of services that they appreciate and value. In this view everyone is happy as a trade is carried out in which everyone gets what they want. The publisher is free to invest in the service they provide and has the necessary rights to look after and curate the content. The authors are happy because they can obtain the services they require without having to pay cash up front.

Crucial to this world view is a belief that research communication, the process of writing and publishing papers, is separate from the research itself. This is important because otherwise it would be clear, at least in an ethical sense, that the writing of papers would be work for hire for the funders – part and parcel of the contract of research. For the publishers, the fact that no funding contract specifies that "papers must be published" is the primary evidence of this.

The researcher

The researcher's perspective is entirely different. Researchers view their outputs as their own property: the ideas, the physical outputs, and the communications. Within institutions you see this in the uneasy relationship between researchers and research translation and IP exploitation offices. Institutions try to avoid inflaming this issue by ensuring that economic returns on IP go largely to the researcher, at least until there is real money involved. But at that stage the issue is usually fudged, as extra investment is required, which dilutes ownership. Scratch a researcher who has gone down the exploitation path and then been gently pushed aside and you'll get a feel for the sense of personal ownership involved.

Researchers have a love-hate relationship with papers. Some people enjoy writing them, although I suspect this is rare. I’ve never met any researcher who did anything but hate the process of shepherding a paper through the review process. The service, as provided by the publisher, is viewed with deep suspicion. The resentment that is often expressed by researchers for professional editors is primarily a result of a loss of control over the process for the researcher and a sense of powerlessness at the hands of people they don’t trust. The truth is that researchers actually feel exactly the same resentment for academic editors and reviewers. They just don’t often admit it in public.

So from a researcher’s perspective, they have spent an inordinate amount of effort on a great paper. This is their work, their property. They are now obliged to hand over control of this to people they don’t trust to run a process they are unconvinced by. Somewhere along the line they sign something. Mostly they’re not too sure what that means, but they don’t give it much thought, let alone read it. But the idea that they are making a gift of that property to the publisher is absolute anathema to most researchers.

To be honest, researchers don't care that much about a paper once it's out. It caused enough pain and they don't ever want to see it again. This may change over time if people start to cite it and refer to it in supportive terms, but most people won't really look at a paper again. It's a line on a CV, a notch on the bedpost. What they do notice is the cost of, or lack of access to, other people's papers. Library budgets are shrinking, subscriptions are being chopped, and personal subscriptions don't seem to be affordable any more.

The first response to this when researchers meet is "why can't we afford access to our own work?" The second, given the general lack of respect for the work that publishers do, is to start claiming that they could do it better. Much of the rhetoric around eLife as a journal "led by scientists" is built on this view. And a lot of it is pure arrogance. Researchers neither understand, nor for the most part appreciate, the work of copyediting and curation, layout and presentation. While there are tools today that can do many of these things more cheaply, there are very few researchers who could use them effectively.

The result…kaboom!

So the environment that set the scene for the Research Works Act revolt was a combination of simmering resentment amongst researchers at the cost of accessing the literature, and a lack of understanding of what it is publishers actually do. The spark that set it off was the publisher rhetoric about ownership of the work. This was always going to happen one day. The mutually incompatible world views could co-exist while there was still enough money to go around. While librarians felt trapped between researchers who demanded access to everything and publishers offering deals that just about meant they could scrape by, things could continue.

Fundamentally, once publishers started publicly using the term "appropriation of our property", the spark had flown. From the publisher perspective this makes perfect sense: the NIH mandate is a unilateral appropriation of their property. From the researcher perspective it is a system that simply adds a bit of pressure to do something they know is right, promote access, without causing them too much additional pain. Researchers feel they ought to be doing something to improve access to research outputs, but for the most part they're not too sure what, because they sure as hell aren't in a position to change the journals they publish in. That would be (perceived to be) career suicide.

The elephant in the room

But it is of course the funder perspective that we haven't yet discussed, and looking forward it is, in my view, the actions of funders that will render both the publisher and the researcher perspectives incomprehensible in ten years' time. The NIH view, similar to that of the Wellcome Trust, and indeed every funder I have spoken to, is that research communication is an intrinsic part of the research they fund. Funders take a close interest in the outputs that their research generates. One might say a proprietorial interest, because again there is a strong sense of ownership. The NIH mandate expresses this through the grant contract: researchers are required to grant the NIH a licence to hold a copy of their research work.

In my view it is through research communication that research has outcomes and impact. From the perspective of a funder the main interest is that the research they fund generates those outcomes and impacts. For a mission-driven funder the current situation signals one thing, and it signals it very strongly: neither publishers nor researchers can be trusted to do this properly. What funders will do is move to stronger mandates, more along the Wellcome Trust lines than the NIH lines, and this will spread. At the end of the day the funders hold all the cards. Publishers never really had a business model; they had a public subsidy. The holders of those subsidies can only really draw one conclusion from current events: that they are going to have to be much more active in where they spend it to successfully perform their mission.

The smart funders will work with the pre-existing prejudice of researchers, probably granting copyright and IP rights to the researchers, but placing tighter constraints on the terms of forward licensing. That funders don’t really need the publishers has been made clear by HHMI, Wellcome Trust, and the MPI. Publishing costs are a small proportion of their total expenditure. If necessary they have the resources and will to take that in house. The NIH has taken a similar route though technically implemented in a different way. Other funders will allow these experiments to run, but ultimately they will adopt the approaches that appear to work.

Bottom line: within ten years all major funders will mandate CC-BY Open Access, effective immediately on publication, for papers arising from work they fund. Several major publishers will not survive the transition. A few will, and a whole set of new players will spring up to fill the spaces. The next ten years look to be very interesting.


Some notes on Open Access Week


Open Access Week kicks off for the fourth time tomorrow with events across the globe. I was honoured to be asked to contribute to the SPARC video that will be released tomorrow. The following is a transcription of my notes – not quite what I said, but similar. The video was released at 9:00am US Eastern Time on Monday 18 October.

It has been a great year for Open Access. Open Access publishers are steaming ahead, OA mandates are spreading and growing, and the quality and breadth of repositories is improving across institutions, disciplines, and nations. There have been problems and controversies as well, many involving shady publishers seeking to take advantage of the Open Access brand, but even this, in its way, is a measure of success.

Beyond traditional publication we've also seen great strides in the publication of a wider diversity of research outputs. Open Access to data, to software, and to materials is moving up the agenda. There have been real successes. The Alzheimer's Disease Network showed what can change when sharing becomes a part of the process. Governments and pharmaceutical companies are releasing data. Publicly funded researchers are falling behind by comparison!

For me, although these big stories are important and impressive, it is the little wins that matter. The thousands or millions of people who didn't have to wait to read a paper, who didn't need to write an email to get a dataset, who didn't needlessly repeat an experiment known not to work. Every time a few minutes, a few hours, a few weeks, months, or years is saved, we deliver more for the people who pay for this research. These small wins are the hardest to measure, and the hardest to explain, but they make up the bulk of the advantage that open approaches bring.

But perhaps the most important shift this year is something more subtle. Each morning I listen to the radio news, and every now and then there is a science story. These stories are increasingly prefaced with "…the research, published in the journal of…" and increasingly that journal is Open Access. A long-running excuse for not referring the wider community to the original literature has been its inaccessibility. That excuse is gradually disappearing. More importantly, there is now a whole range of research outcomes that people, where they are interested, where they care enough to dig deeper, can inform themselves about. Research that people can use to reach their own conclusions about their health, the environment, technology, or society.

I find it difficult to see this as anything but a good thing, but nonetheless we need to recognize that it brings challenges. Challenges of explaining clearly, challenges in presenting the balance of evidence in a useful form, but above all challenges of how to effectively engage those members of the public who are interested in the details of the research. The web has radically changed the expectations of those who seek and interact with information. Broadcast is no longer enough. People expect to be able to talk back.

The last ten years of the Open Access movement has been about how to make it possible for people to touch, read, and interact with the outputs of research. Perhaps the challenge for the next ten years is to ask how we can create access opportunities to the research itself. This won’t be easy, but then nothing that is worthwhile ever is.

Open Access Week 2010 from SPARC on Vimeo.


What would scholarly communications look like if we invented it today?


I've largely stolen the title of this post from Daniel Mietchen because it helped me to frame the issues. I'm giving an informal talk this afternoon and will, as I frequently do, use this post to think through what I want to say. Needless to say this whole post is built to a very large extent on the contributions and ideas of others that are not adequately credited in the text.

If we imagine what the specification for building a scholarly communications system would look like there are some fairly obvious things we would want it to enable. Registration of ideas, data or other outputs for the purpose of assigning credit and priority to the right people is high on everyone’s list. While researchers tend not to think too much about it, those concerned with the long term availability of research outputs would also place archival and safekeeping high on the list as well. I don’t imagine it will come as any surprise that I would rate the ability to re-use, replicate, and re-purpose outputs very highly as well. And, although I won’t cover it in this post, an effective scholarly communications system for the 21st century would need to enable and support public and stakeholder engagement. Finally this specification document would need to emphasise that the system will support discovery and filtering tools so that users can find the content they are looking for in a huge and diverse volume of available material.

So: filtering, archival, re-usability, and registration. Our current communications system, based almost purely on journals with pre-publication peer review, doesn't do too badly at archival, although the question of who is responsible for actually doing the archiving, and hence paying for it, doesn't always have a clear answer. Nonetheless the standards and processes for archiving paper copies are better established, and probably better followed in many cases, than those for digital materials in general, and certainly for material on the open web.

The current system also does reasonably well on registration, providing a firm date of submission, author lists, and increasingly descriptions of the contributions of those authors. Indeed the system defines the registration of contributions for the purpose of professional career advancement and funding decisions within the research community. It is a clear and well understood system with a range of community expectations and standards around it. Of course this is circular as the career progression process feeds the system and the system feeds career progression. It is also to some extent breaking down as wider measures of “impact” become important. However for the moment it is an area where the incumbent has clear advantages over any new system, around which we would need to grow new community standards, expectations, and norms.

It is on re-usability and replication that our current system really falls down. Access and rights are a big issue here, but ones we are gradually pushing back. The real issues are much more fundamental. In my experience most researchers simply assume that a paper will not contain sufficient information to replicate an experiment or analysis. Just consider that. Our primary means of communication, in a philosophical system that rests almost entirely on reproducibility, does not enable even simple replication of results. A lot of this is down to the boundaries created by the mindset of a printed multi-page article. Mechanisms to publish methods, detailed laboratory records, or software are limited, often leading to a lack of care in keeping and annotating such records. After all, if it isn't going in the paper, why bother looking after it?

A key advantage of the web here is that we can publish a lot more with limited costs and we can publish a much greater diversity of objects. In principle we can solve the “missing information” problem by simply making more of the record available. However those important pieces of information need to be captured in the first place. Because they aren’t currently valued, because they don’t go in the paper, they often aren’t recorded in a systematic way that makes it easy to ultimately publish them. Open Notebook Science, with its focus on just publishing everything immediately, is one approach to solving this problem but it’s not for everyone, and causes its own overhead. The key problem is that recording more, and communicating it effectively requires work over and above what most of us are doing today. That work is not rewarded in the current system. This may change over time, if as I have argued we move to metrics based on re-use, but in the meantime we also need much better, easier, and ideally near-zero burden tools that make it easier to capture all of this information and publish it when we choose, in a useful form.

Of course, even with the perfect tools, if we start to publish a much greater portion of the research record then we will swamp researchers already struggling to keep up. We will need effective ways to filter this material down to reduce the volume we have to deal with. Arguably the current system is an effective filter. It almost certainly reduces the volume and rate at which material is published. Of all the research that is done, some proportion is deemed "publishable" by those who have done it, a small portion of that research is then incorporated into a draft paper, and some proportion of those papers are ultimately accepted for publication. Up until 20 years ago, when the resource pinch point was the decision of whether or not to publish something, this is exactly what you would want. The question of whether it is an effective filter, whether it actually filters the right stuff out, is somewhat more controversial. I would say the evidence for that is weak.

When publication and distribution was the expensive part that was the logical place to make the decision. Now these steps are cheap the expensive part of the process is either peer review, the traditional process of making a decision prior to publication, or conversely, the curation and filtering after publication that is seen more widely on the open web. As I have argued I believe that using the patterns of the web will be ultimately a more effective means of enabling users to discover the right information for their needs. We should publish more; much more and much more diversely but we also need to build effective tools for filtering and discovering the right pieces of information. Clearly this also requires work, perhaps more than we are used to doing.

An imaginary solution

So what might this imaginary system that we would design look like? I've written before about both key aspects of this. Firstly, I believe we need recording systems that, as far as possible, record and publish the creation of objects, be they samples, data, or tools. As far as possible these should make a reliable, time-stamped, attributable record of the creation of these objects as a byproduct of what the researcher needs to do anyway. A simple concept, for instance, is a label printer that, as a byproduct of printing off a label, makes a record of who, what, and when, publishing this simultaneously to a public or private feed.
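
As a sketch of how lightweight such a byproduct record could be, here is a hypothetical wrapper around a label-printing step that also appends a time-stamped, attributed record to a feed file. The function and file names are invented for illustration; the printer call itself is left as a stub.

```python
import json
import time
import uuid
from pathlib import Path

FEED_FILE = Path("lab_feed.jsonl")  # hypothetical append-only feed, one JSON record per line

def print_label_and_record(researcher: str, description: str) -> dict:
    """Record who created what, and when, as a byproduct of printing a label."""
    record = {
        "id": str(uuid.uuid4()),  # a stable identifier for the new object
        "who": researcher,
        "what": description,
        "when": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
    }
    # send_to_label_printer(record["id"], description)  # real printer integration would go here
    with FEED_FILE.open("a") as feed:
        feed.write(json.dumps(record) + "\n")  # publish to the (public or private) feed
    return record

# Printing a label for a new sample creates its record automatically.
print_label_and_record("cn", "Lysozyme crystallisation trial, plate 3")
```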

Publishing rapidly is a good approach, not just for ideological reasons of openness but also for some very pragmatic ones. It is easier to publish at the source than to remember to go back and do it later; things that aren't done immediately are almost invariably forgotten or lost. Secondly, rapid publication has the potential both to efficiently enable re-use and to reduce scooping risks by providing a time-stamped citable record. This would of course require people to cite these records, and for those citations to be valued as a contribution, requiring a move away from considering the paper as the only form of valid research output (see also Michael Nielsen's interview with me).

It isn't enough, though, just to publish the objects themselves. We also need to be able to understand the relationships between them. In a semantic web sense this means creating the links between objects, recording the context in which they were created, and what their inputs and outputs were. I have alluded a couple of times in the past to the OREChem Experimental Ontology and I think this is potentially a very powerful way of handling these kinds of connections in a general way. In many cases, particularly in computational research, recording workflows or generating batch and log files could serve the same purpose, as long as a general vocabulary could be agreed to make this exchangeable.
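
A minimal sketch of what recording those relationships might look like, using rdflib with an invented vocabulary; this is not the OREChem Experimental Ontology or the Open Provenance Model, just an illustration of the shape of the links.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF

EX = Namespace("http://example.org/vocab/")   # invented vocabulary, for illustration only
LAB = Namespace("http://example.org/mylab/")  # identifiers for this lab's objects

g = Graph()
sample, process, dataset = LAB.sample42, LAB.pcr_run7, LAB.gel_image_13

g.add((process, RDF.type, EX.ExperimentalProcess))
g.add((process, EX.hasInput, sample))                    # what went in
g.add((process, EX.hasOutput, dataset))                  # what came out
g.add((process, EX.performedBy, Literal("cn")))          # who did it
g.add((process, EX.performedOn, Literal("2010-09-14")))  # when

print(g.serialize(format="turtle"))
```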

As these objects get linked together they will form a network, both within and across projects and research groups, providing the kind of information that makes Google work: a network of citations and links that makes it possible to directly measure the impact of a single dataset, idea, piece of software, or experimental method through its influence on other work. This has real potential to help solve both the discovery problem and the filtering problem. Bottom line: Google is pretty good at finding relevant text, and they're working hard on other forms of media. Research will have some special edges, but it can be expected in many ways to fit patterns that mean tools for the consumer web will work, particularly as more social components get pulled into the mix.
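
As a toy illustration of how such a link network could be used, here is PageRank (via networkx) run over a handful of invented research objects; it shows the shape of the idea that datasets, methods, and software can accrue measurable influence, not a proposed metric.

```python
import networkx as nx

# Edges point from the object that builds on something to the thing it used or cited.
G = nx.DiGraph()
G.add_edges_from([
    ("paper_A", "dataset_1"),   # paper A analyses dataset 1
    ("paper_B", "dataset_1"),   # so does paper B
    ("paper_B", "software_X"),  # paper B also uses software X
    ("dataset_1", "method_M"),  # dataset 1 was produced with method M
    ("paper_C", "paper_A"),     # paper C cites paper A
])

# Rank every object, not just the papers, by its influence within the network.
for obj, score in sorted(nx.pagerank(G).items(), key=lambda kv: -kv[1]):
    print(f"{obj:10s} {score:.3f}")
```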

On the rare occasions when it is worth pulling together a whole story, for a thesis, or a paper, authors would then aggregate objects together, along with text and good visuals to present the data. The telling of a story then becomes a special event, perhaps one worthy of peer review in its traditional form. The forming of a “paper” is actually no more than providing new links, adding grist to the search and discovery mill, but it can retain its place as a high status object, merely losing its role as the only object worth considering.

So in short, publish fragments, comprehensively and rapidly. Weave those into a wider web of research communication, and from time to time put in the larger effort required to tell a more comprehensive story. This requires tools that are hard to build, standards that are hard to agree, and cultural change that at times seems like spitting into a hurricane. Progress is being made, in many places and in many ways, but how can we take this forward today?

Practical steps for today

I want to write more about these ideas in the future, but here I'll just sketch out a simple scenario that I hope can be usefully implemented locally while providing a generic framework to build out, without necessarily requiring massive agreement on standards.

The first step is simple: make a record, ideally an address on the web, for everything we create in the research process. For data and software, just the files themselves on a hard disk are a good start. Pushing them to some sort of web storage, be it a blog, GitHub, an institutional repository, or a dedicated data storage service, is even better because it makes step two easy.

Step two is to create feeds that list all of these objects, their addresses, and as much standard metadata as possible; who and when would be a good start. I would make these open by choice, mainly because dealing with feed security is a pain, but this would still work behind a firewall.

Step three gets slightly harder. Where possible configure your systems so that inputs can always be selected from a user-configurable feed. Where possible automate the pushing of outputs to your chosen storage systems so that new objects are automatically registered and new feeds created.

This is extraordinarily simple conceptually: create feeds, and use them as inputs for processes. It's not so straightforward to build such a thing into an existing tool or framework, but it doesn't need to be terribly difficult either. Nor does it need to bother the user. Feeds should be created automatically and presented to the user as drop-down menus.
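
To show how little machinery steps two and three really need, here is a sketch that writes a feed of research objects as a JSON document and then reads such a feed to offer its entries as selectable inputs. The format and field names are made up; any existing feed standard (Atom, RSS, OAI-PMH) would serve just as well.

```python
import json
import time

def make_feed(objects):
    """Step two: list every object with its web address and minimal metadata."""
    return json.dumps({
        "updated": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "entries": [
            {"url": url, "who": who, "when": when, "title": title}
            for (url, who, when, title) in objects
        ],
    }, indent=2)

def choose_input(feed_json):
    """Step three: a tool presents the feed entries as a menu of possible inputs."""
    entries = json.loads(feed_json)["entries"]
    for i, entry in enumerate(entries):
        print(f"[{i}] {entry['title']} ({entry['who']}, {entry['when']}) -> {entry['url']}")
    return entries[0]  # a real tool would let the user pick; here we just take the first

feed = make_feed([
    ("https://example.org/data/uv-vis-run1.csv", "cn", "2010-09-13", "UV-Vis run 1"),
    ("https://example.org/code/fit.py", "cn", "2010-09-14", "Fitting script"),
])
selected_input = choose_input(feed)
```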

The step beyond this, creating a standard framework for describing the relationships between all of these objects, is much harder. Not because it is technically difficult, but because it requires agreement on standards for how to describe those relationships. This is doable, and I'm very excited by the work at Southampton on the OREChem Experimental Ontology, but the social problems are harder. Others prefer the Open Provenance Model, or argue that workflows are the way to manage this information. Getting agreement on standards is hard, particularly if we're trying to maximise their effective coverage, but if we're going to build a computable record of science we're going to have to tackle that problem. If we can crack it, and get coverage of the records via a compatible set of models that tell us how things are related, then I think we will be well placed to solve the cultural problem of actually getting people to use them.


It’s not information overload, nor is it filter failure: It’s a discovery deficit


Clay Shirky's famous soundbite, "it's not information overload, it's filter failure", has helped to focus minds on the way information on the web needs to be tackled, and on a move towards managing the process of selecting and prioritising information. But in the research space I'm getting a sense that it is fuelling a focus on preventing publication, in a way that is analogous to the conventional filtering process involved in peer reviewed publication.

Most recently this surfaced in a piece at the Chronicle of Higher Education, to which there were many responses, Derek Lowe's being one of the most carefully thought out. But this is not an isolated case.

@JISC_RSC_YH: How can we provide access to online resources and maintain quality of content? #rscrc10 [twitter, via @branwenhide]

Me: @branwenhide @JISC_RSC_YH isn’t the point of the web that we can decouple the issues of access and quality from each other? [twitter]

There is a widely held assumption that putting more research onto the web makes it harder to find the research you are looking for. I think the opposite is true: publishing more makes discovery easier.

The great strength of the web is that you can allow publication of anything at very low marginal cost without limiting the ability of people to find what they are interested in, at least in principle. Discovery mechanisms, while a long way from perfect, are good enough to make it possible to mostly find what you're looking for while avoiding what you're not. Search acts as a remarkable filter over the whole web by making discovery possible for large classes of problem. And high-quality search algorithms depend on having a lot of data.

It is very easy to say there is too much academic literature – and I do. But the solution that seems to be becoming popular is to argue for an expansion of the traditional peer review process; to prevent stuff getting onto the web in the first place. This is misguided for two important reasons. Firstly, it takes the highly inefficient and expensive process of manual curation and attempts to apply it to every piece of research output created. This doesn't work today and won't scale as the diversity and sheer number of research outputs increases tomorrow. Secondly, it doesn't take advantage of the nature of the web. The way to do this efficiently is to publish everything at the lowest cost possible, and then enhance the discoverability of the work that you think is important. We don't need publication filters, we need enhanced discovery engines. Publishing is cheap; curation is expensive, whether it is applied to filtering or to markup and search enhancement.

Filtering before publication worked, and was probably the most efficient place to apply the curation effort, when the major bottleneck was publication. Value was extracted from the curation process of peer review by using it to reduce the costs of layout, editing, and printing, through simply printing less. But it created new costs, and invisible opportunity costs, where a key piece of information was not made available. Today the major bottleneck is discovery. Of the 500 papers a week I could read, which ones should I read, and which ones just contain a single nugget of information which is all I need? In the Research Information Network study of the costs of scholarly communication, the largest component of the publication creation and use cycle was peer review, followed by the cost of finding the articles to read, which represented some 30% of total costs. On the web, the place to put the curation effort is in enhancing discoverability, in providing me with the tools that will identify what I need to read in detail, what I just need to scrape for data, and what I need to bookmark for my methods folder.

The problem we have in scholarly publishing is an insistence on applying this print-paradigm publication filtering to the web, alongside an unhealthy obsession with a publication form, the paper, which is almost designed to make discovery difficult. If I want to understand the whole argument of a paper I need to read it. But if I just want one figure, one number, or the details of the methodology, then I don't need to read it, but I still need to be able to find it, and to do so efficiently, and at the right time.

Currently scholarly publishers vie for the position of biggest barrier to communication. The stronger the filter, the higher the notional quality. But being a pure filter play doesn't add value, because the costs of publication are now low. The value lies in presenting, enhancing, and curating the material that is published. If publishers instead vied to identify, mark up, and make it easy for the right people to find the right information, they would be working with the natural flow of the web. Make it easy for me to find the piece of information, feature work that is particularly interesting or important, re-interpret it so I can understand it coming from a different field, preserve it so that when a technique becomes useful in 20 years the right people can find it. The brand differentiator then becomes which articles you choose to enhance, what kind of markup you do, and how well you do it.

All of these are things that publishers already do. And they are services that authors and readers will be willing to pay for. But at the moment the whole business and marketing model is built around filtering, and selling that filter. By impressing people with how much you are throwing away. Trying to stop stuff getting onto the web is futile, inefficient, and expensive. Saving people time and money by helping them find stuff on the web is an established and successful business model both at scale, and in niche areas. Providing credible and respected quality measures is a viable business model.

We don't need more filters or better filters in scholarly communications – we don't need to block publication at all. Ever. What we need are tools for curation, annotation, and re-integration of what is published. And a framework that enables discovery of the right thing at the right time. And the data that will help us to build these. The more data, the more research published, the better. Which is actually what Shirky was saying all along…


Peer review: What is it good for?


It hasn't been a really good week for peer review. In the same week that the Lancet fully retracted the original Wakefield MMR article (while keeping the retraction behind a login screen – way to go there on public understanding of science), the mainstream media went to town on the report of 14 stem cell scientists writing an open letter claiming that peer review in that area was being dominated by a small group of people blocking the publication of innovative work. I don't have the information to comment on the substance of either issue, but I do want to reflect on what this tells us about the state of peer review.

There remains much reverence for the traditional process of peer review. I may be over-interpreting the tenor of Andrew Morrison's editorial in BioEssays, but it seems to me that he is saying, as many others have over the years, "if we could just have the rigour of traditional peer review with the ease of publication of the web then all our problems would be solved". Scientists worship at the altar of peer review, and I use that metaphor deliberately because it is rarely if ever questioned. Somehow the process of peer review is supposed to sprinkle some sort of magical dust over a text which makes it "scientific" or "worthy", yet while we quibble over details of managing the process, or complain that we don't get paid for it, rarely is the fundamental basis on which we decide whether science is formally published examined in detail.

There is a good reason for this. THE EMPEROR HAS NO CLOTHES! [sorry, had to get that off my chest]. The evidence that peer review as traditionally practiced is of any value at all is equivocal at best (Science 214, 881 (1981); J Clinical Epidemiology 50, 1189 (1998); Brain 123, 1954 (2000); Learned Publishing 22, 117 (2009)). It's not even really negative – that would at least be useful. There are a few studies that suggest peer review is somewhat better than rolling dice, and a bunch that say it is much the same. Perhaps the best we can say is that it is at its best dealing with narrow technical questions, and at its worst determining "importance". Which, for anyone who has tried to get published in a top journal or written a grant proposal, ought to be deeply troubling. Professional editorial decisions may in fact be more reliable, something that Philip Campbell hints at in his response to questions about the open letter [BBC article]:

Our editors […] have always used their own judgement in what we publish. We have not infrequently overruled two or even three sceptical referees and published a paper.

But there is perhaps an even more important procedural issue around peer review. Whatever value it might have, we largely throw away. Few journals make referees' reports available, and virtually none track the changes made in response to referees' comments, enabling a reader to make their own judgement as to whether a paper was improved or made worse. Referees get no public credit for good work, and no public opprobrium for poor or even malicious work. And in most cases a paper rejected from one journal starts completely afresh when submitted to a new journal, the work of the previous referees simply thrown out of the window.

Much of the commentary around the open letter has suggested that the peer review process should be made public. But only for published papers. This goes nowhere near far enough. One of the key points where we lose value is in the transfer from one journal to another. The authors lose out because they've lost their priority date (in the worst case giving malicious referees the chance to get their own paper in first). The referees miss out because their work is rendered worthless. Even the journals lose an opportunity to demonstrate the high standards they apply in terms of quality and rigour – and indeed the high expectations they have of their referees.

We never ask what the cost of not publishing a paper is, or what the cost of delaying publication could be. Eric Weinstein provides the most sophisticated view of this that I have come across, and I recommend watching his talk at Science in the 21st Century from a few years back. There is a direct cost to rejecting papers, both in the time of referees and editors and in the time required for authors to reformat and resubmit. But the bigger problem is the opportunity cost: how much that might have been useful, or even important, is never published? How much is research held back by delays in publication? How many follow-up studies not done, how many leads not followed up, and perhaps most importantly how many projects not re-funded, or only funded once the carefully built-up expertise, in the form of research workers, is lost?

Rejecting a paper is like gambling in a game where you can only win. There are no real downside risks for either editors or referees in rejecting papers. There are downsides, as described above, and those carry real costs, but they are never borne by the people who make or contribute to the decision. It's as though it were a futures market where you can only lose if you go long, never if you go short on a stock. In Eric's terminology those costs need to be carried: we need to require that referees and editors who "go short" on a paper or grant unwind their position if they get it wrong. This is the only way we can price the downside risks into the process. If we want open peer review, indeed if we want peer review in its traditional form, along with its caveats, costs, and problems, then the most important advance would be to have it for unpublished papers.

Journals need to acknowledge the papers they've rejected, along with dates of submission. Ideally all referees' reports should be made public, or at least re-usable by the authors. If full publication, of either the submitted form of the paper or the referees' reports, is not acceptable, then journals could publish a hash of the submitted document and reports against a local key, enabling the authors to demonstrate the submission date and the provenance of referees' reports as they take them to another journal.
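
A minimal sketch of the hashing idea, assuming the journal holds a private key and publishes only the resulting digest alongside a date; nothing here is a real journal system, it simply illustrates how cheaply the provenance claim could be made verifiable without revealing the documents themselves.

```python
import hashlib
import hmac

JOURNAL_KEY = b"journal-local-secret-key"  # held privately by the journal (assumed)

def commitment(document: bytes) -> str:
    """Keyed hash of a submitted manuscript or referee report.

    Publishing this digest with a date commits the journal to having seen
    exactly this document at that time, without revealing its contents."""
    return hmac.new(JOURNAL_KEY, document, hashlib.sha256).hexdigest()

# In practice this would be the bytes of the submitted PDF and each referee report.
manuscript = b"Full text of the submitted manuscript..."
print("Published record: submitted 2010-02-05, digest", commitment(manuscript))

# Later, the authors (or a second journal) can ask the first journal to confirm that
# the digest matches the documents and reports being carried forward.
```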

In my view referees need to be held accountable for the quality of their work. If we value this work we should also value and publicly laud good examples. And conversely poor work should be criticised. Any scientist has received reviews that are, if not malicious, then incompetent. And even if we struggle to admit it to others we can usually tell the difference between critical, but constructive (if sometimes brutal), and nonsense. Most of us would even admit that we don’t always do as good a job as we would like. After all, why should we work hard at it? No credit, no consequences, why would you bother? It might be argued that if you put poor work in you can’t expect good work back out when your own papers and grants get refereed. This again may be true, but only in the long run, and only if there are active and public pressures to raise quality. None of which I have seen.

Traditional peer review is hideously expensive. And currently there is little or no pressure on its contributors or managers to provide good value for money. It is also unsustainable at its current level. My solution would be to radically cut the number of peer-reviewed papers, probably by 90-95%, leaving the rest to be published as either pure data or pre-prints. But the whole industry is addicted to traditional peer-reviewed publications: from the funders who can't quite figure out how else to measure research outputs, to the researchers and their institutions who need them for promotion, to the publishers (both OA and toll access) and metrics providers who both feed the addiction and feed off it.

So that leaves those who hold the purse strings, the funders, with a responsibility to pursue a value for money agenda. A good place to start would be a serious critical analysis of the costs and benefits of peer review.

Addition after the fact: it has been pointed out in the comments that there are other posts and papers I should have referred to where people have raised similar ideas and issues, in particular Martin Fenner's post at Nature Network. The comments there are particularly good as an expert analysis of the usefulness of the kind of "value for money" critique I have made. There is also a paper on the arXiv from Stefano Allesina. Feel free to mention others and I will add them here.


Why I am disappointed with Nature Communications

Towards the end of last year I wrote up some initial reactions to the announcement of Nature Communications, and the communications team at NPG were kind enough to do a Q&A to look at some of the issues and concerns I raised. Specifically, I was concerned about two things: the licence that would be used for the "Open Access" option, and the way the journal would be positioned in terms of "quality", particularly as it related to the other NPG journals and the approach to peer review.

I have to say that I feel both have been fudged, which is unfortunate because there was a real opportunity here to do something different and quite exciting. I get the impression that that may even have been the original intention. But from my perspective what has resulted is a poor compromise between my hopes and commercial concerns.

At the centre of my problem is the use of a Creative Commons Attribution Non-Commercial licence for the "Open Access" option. This doesn't qualify as Open Access under the BBB declarations and it doesn't qualify for the SPARC seal for Open Access. But does this really matter, or is it just a side issue for a bunch of hard-core zealots? After all, if people can see it, that's a good start, isn't it? Well yes, it is a good start, but non-commercial terms raise serious problems. Putting aside the argument that universities are commercial entities and therefore can't legitimately use content under non-commercial licences, the real problem is that NC terms limit the ability of people to create new business models that re-use content and are capable of scaling.

We need these business models because the current model of scholarly publication is simply unaffordable. The argument is often made that if you are unsure whether you are allowed to use content then you can just ask, but this simply doesn't scale. And let's be clear about some of the things that NC terms mean you're not licensed for: using a paper for commercially funded research, even within a university; using the content of a paper to support a grant application; using the paper to judge a patent application; using a paper to assess the viability of a business idea…the list goes on and on. Yes, you can ask if you're not sure, but asking each and every time does not scale. This is the central point of the BBB declarations: for scientific communication to scale it must allow the free movement and re-use of content.

Now if this were coming from any old toll-access publisher I would just roll my eyes and move on, but NPG sets itself up to be judged by a higher standard. NPG is a privately held company, not beholden to shareholders, and it states that it is committed to advancing scientific communication, not simply traditional publication. Non-commercial licences do not do this. From the Q&A:

Q: Would you accept that a CC-BY-NC(ND) licence does not qualify as Open Access under the terms of the Budapest and Bethesda Declarations because it limits the fields and types of re-use?

A: Yes, we do accept that. But we believe that we are offering authors and their funders the choices they require. Our licensing terms enable authors to comply with, or exceed, the public access mandates of all major funders.

NPG is offering the minimum that allows compliance, not what will most effectively advance scientific communication. Again, I would expect this of a shareholder-controlled, profit-driven, toll-access, dead-tree publisher, but I am holding NPG to a higher standard. Even so, there is a legitimate argument that non-commercial licences are needed to make sure NPG can continue to support these and other activities. This is why I asked in the Q&A whether NPG makes significant money from re-licensing content for commercial purposes. This is a discussion we could have on the substance: the balance between a commercial entity providing a valuable service and the limitations we might accept as the price of ensuring that the service continues to be provided. It is a value-for-money judgement, but not one we can make without a clear view of the costs and benefits.

So I’m calling NPG on this one. Make a case for why non-commercial licences are necessary or even beneficial, not merely why they are acceptable. They damage scientific communication, they create unnecessary confusion about rights, and more importantly they hold back the development of new business models to support that communication. Explain why the restriction is commercially necessary for the development of these new activities, or roll it back and take a lead on driving science communication forward, rather than taking the kind of small steps we expect from other, more traditional publishers. Above all, let’s have that discussion: what is the price we would have to pay to change the licence terms?

Because I think it goes deeper. I think NPG are actually limiting their potential income by focussing on protecting their income from legacy forms of commercial re-use. They could make more money from this content by growing the pie than by protecting their piece of one specific income stream. It goes to the heart of a misunderstanding about how to exploit content effectively on the web. There is money to be made in re-packaging content for new purposes. The content is obviously key, but the real value offering is the Nature brand, which is much better protected as a trademark than through licensing. Others could re-package and sell on the content, but they can never put the Nature brand on it.

By making the material available for commercial re-use, NPG would help to expand a high-value market for re-packaged content, a market they would be poised to dominate. Sure, if you’re a business you could print off your OA Nature articles and put them on the coffee table, but if you want to present them to investors you want the Nature logo and the Nature packaging, which you can only get from one place, and which NPG does damn well. NPG often makes the case that it adds value through selection, presentation, and aggregation; that the editorial brand is what is of value. Let’s see that demonstrated through monetization of the brand, rather than through unnecessarily restricting the re-use of content, especially where authors are being charged $5,000 to cover the editorial costs.


Nature Communications Q&A

A few weeks ago I wrote a post looking at the announcement of Nature Communications, a new journal from Nature Publishing Group that will be online only and have an open access option. Grace Baynes, from the NPG communications team, kindly offered to get some of the questions raised in that piece answered, and I am presenting my questions and the answers from NPG here in their complete form. I will leave any thoughts and comments on the answers for another post. There has also been more information from NPG available at the journal website since my original post, some of which is also dealt with below. Below this point, aside from formatting, I have left the responses in their original form.

Q: What is the motivation behind Nature Communications? Where did the impetus to develop this new journal come from?

NPG has always looked to ensure it is serving the scientific community and providing services which address researchers’ changing needs. The motivation behind Nature Communications is to provide authors with more choice, both in terms of where they publish and what access model they want for their papers. At present NPG does not provide a rapid publishing opportunity for authors with high-quality specialist work within the Nature branded titles. The launch of Nature Communications aims to address that editorial need. Further, Nature Communications provides authors with a publication choice for high quality work, which may not have the reach or breadth of work published in Nature and the Nature research journals, or which may not have a home within the existing suite of Nature branded journals. At the same time authors and readers have begun to embrace online only titles – hence we decided to launch Nature Communications as a digital-first journal in order to provide a rapid publication forum which embraces the use of keyword searching and personalisation. Developments in publishing technology, including keyword archiving and personalization options for readers, make a broad scope, online-only journal like Nature Communications truly useful for researchers.

Over the past few years there has also been increasing support by funders for open access, including commitments to cover the costs of open access publication. Therefore, we decided to provide an open access option within Nature Communications for authors who wish to make their articles open access.

Q: What opportunities does NPG see from Open Access? What are the most important threats?

Opportunities: Funder policies shifting towards supporting gold open access, and making funds available to cover the costs of open access APCs. These developments are creating a market for journals that offer an open access option.

Threats: That the level of APCs that funders will be prepared to pay will be too low to be sustainable for journals with high quality editorial and high rejection rates.

Q: Would you characterise the Open Access aspects of NC as a central part of the journal strategy…

Yes. We see the launch of Nature Communications as a strategic development. Nature Communications will provide a rapid publication venue for authors with high quality work which will be of interest to specialists in their fields. The title will also allow authors to adhere to funding agency requirements by making their papers freely available at point of publication if they wish to do so.

…or as an experiment that is made possible by choosing to develop a Nature branded online only journal?

NPG doesn’t view Nature Communications as experimental. We’ve been offering open access options on a number of NPG journals in recent years, and monitoring take-up on these journals. We’ve also been watching developments in the wider industry.

Q: What would you give as the definition of Open Access within NPG?

It’s not really NPG’s focus to define open access. We’re just trying to offer choice to authors and their funders.

Q: NPG has a number of “Open Access” offerings that provide articles free to the user, as well as specific articles within Nature itself under a Creative Commons Non-commercial Share-alike licence, with the option for authors to add a “no derivative works” clause. Can you explain the rationale behind this choice of licence?

Again, it’s about providing authors with choice within a framework of commercial viability. On all our journals with an open access option, authors can choose between the Creative Commons Attribution Noncommercial Share Alike 3.0 Unported Licence and the Creative Commons Attribution-Non-commercial-No Derivs 3.0 Unported Licence. The only instance where authors are not given a choice at present is genome sequence articles published in Nature and other Nature branded titles, which are published under the Creative Commons Attribution Noncommercial Share Alike 3.0 Unported Licence. No APC is charged for these articles, as NPG considers making these freely available an important service to the research community.

Q: Does NPG recover significant income by charging for access or use of these articles for commercial purposes? What are the costs (if any) of enforcing the non-commercial terms of licences? Does NPG actively seek to enforce those terms?

We’re not trying to prevent derivative works or reuse for academic research purposes (as evidenced by our recent announcement that NPG author manuscripts would be included in UK PMC’s open access subset). What we are trying to keep a cap on is illegal e-prints and reprints where companies may be using our brands or our content to their benefit. Yes we do enforce these terms, and we have commercial licensing and reprints services available.

Q: What will the licence be for NC?

Authors who wish to take up the open access option can choose either the Creative Commons Attribution Noncommercial Share Alike 3.0 Unported Licence or the Creative Commons Attribution-Non-commercial-No Derivs 3.0 Unported Licence. Subscription access articles will be published under NPG’s standard License to Publish.

Q: Would you accept that a CC-BY-NC(ND) licence does not qualify as Open Access under the terms of the Budapest and Bethesda Declarations because it limits the fields and types of re-use?

Yes, we do accept that. But we believe that we are offering authors and their funders the choices they require. Our licensing terms enable authors to comply with, or exceed, the public access mandates of all major funders.

Q: The title “Nature Communications” implies rapid publication. The figure of 28 days from submission to publication has been mentioned as a minimum. Do you have a target maximum or indicative average time in mind?

We are aiming to publish manuscripts within 28 days of acceptance, contrary to an earlier report which was in error. In addition, Nature Communications will have a streamlined peer review system which limits presubmission enquiries, appeals and the number of rounds of review – all of which will speed up the decision making process on submitted manuscripts.

Q: In the press release an external editorial board is described. This is unusual for a Nature branded journal. Can you describe the makeup and selection of this editorial board in more detail?

In deciding whether to peer review manuscripts, editors may, on occasion, seek advice from a member of the Editorial Advisory Panel. However, the final decision rests entirely with the in-house editorial team. This is unusual for a Nature-branded journal, but in fact, Nature Communications is simply formalising a well-established system in place at other Nature journals. The Editorial Advisory Panel will be announced shortly and will consist of recognized experts from all areas of science. Their collective expertise will support the editorial team in ensuring that every field is represented in the journal.

Q: Peer review is central to the Nature brand, but rapid publication will require streamlining somewhere in the production pipeline. Can you describe the peer review process that will be used at NC?

The peer review process will be as rigorous as any Nature branded title – Nature Communications will only publish papers that represent a convincing piece of work. Instead, the journal will achieve efficiencies by discouraging presubmission enquiries, capping the number of rounds of review, and limiting appeals on decisions. This will enable the editors to make fast decisions at every step in the process.

Q: What changes to your normal process will you implement to speed up production?

The production process will involve a streamlined manuscript tracking system and maximise the use of metadata to ensure manuscripts move swiftly through the production process. All manuscripts will undergo rigorous editorial checks before acceptance in order to identify, and eliminate, hurdles for the production process. Alongside using both internal and external production staff, we will work to ensure all manuscripts are published within 28 days of acceptance – however some manuscripts may well take longer due to unforeseen circumstances. We also hope the majority of papers will take less!

Q: What volume of papers do you aim to publish each year in NC?

As Nature Communications is an online only title the journal is not limited by page-budget. As long as we are seeing good quality manuscripts suitable for publication following peer review we will continue to expand. We aim to launch publishing 10 manuscripts per month and would be happy remaining with 10-20 published manuscripts per month but would equally be pleased to see the title expand as long as manuscripts were of suitable quality.

Q: The Scientist article says there would be an 11 page limit. Can you explain the reasoning behind a page limit on an online only journal?

Articles submitted to Nature Communications can be up to 10 pages in length. Any journal, online or not, will consider setting limits to the ‘printed paper’ size (in PDF format) primarily for the benefit of the reader. Setting a limit encourages authors to edit their text accurately and succinctly to maximise impact and readability.

Q: The press release description of papers for NC sounds very similar to papers found in the other “Nature Baby” journals, such as Nature Physics, Chemistry, Biotechnology, Methods etc. Can you describe what would be distinctive about a paper to make it appropriate for NC? Is there a concern that it will compete with other Nature titles?

Nature Communications will publish research of very high quality, but where the scientific reach and public interest is perhaps not that required for publication in Nature and the Nature research journals. We expect the articles published in Nature Communications to be of interest and importance to specialists in their fields. This scope of Nature Communications also includes areas like high-energy physics, astronomy, palaeontology and developmental biology, that aren’t represented by a dedicated Nature research journal.

Q: To be a commercial net gain, NC must publish papers that would otherwise not have appeared in other Nature journals. Clearly NPG receives many such papers that are not published, but is it not the case that these papers are, at least as NPG measures them, by definition not of the highest quality? How can you publish more while retaining the bar at its present level?

Nature journals have very high rejection rates, in many cases well over 90% of what is submitted. A proportion of these articles are very high quality research and of importance for a specialist audience, but lack the scientific reach and public interest associated with high impact journals like Nature and the Nature research journals. The best of these manuscripts could find a home in Nature Communications. In addition, we expect to attract new authors to Nature Communications, who perhaps have never submitted to the Nature family of journals, but are looking for a high quality journal with rapid publication, a wide readership and an open access option.

Q: What do you expect the headline subscription fee for NC to be? Can you give an approximate idea of what an average academic library might pay to subscribe over and above their current NPG subscription?

We haven’t set prices for subscription access to Nature Communications yet, because we want to base them on the number of manuscripts the journal may potentially publish and the proportion of open access content. This will ensure the site licence price is based on the absolute number of manuscripts available through subscription access. We’ll announce these prices in 2010, well before readers or librarians will be asked to pay for content.

Q: Do personal subscriptions figure significantly in your financial plan for the journal?

No, there will be no personal subscriptions for Nature Communications. Nature Communications will publish no news or other ‘front half content’, and we expect many of the articles to be available to individuals via the open access option or an institutional site license. If researchers require access to a subscribed-access article that is not available through their institution or via the open-access option, they have the option of buying the article through traditional pay-per-view and document-delivery options. For a journal with such a broad scope, we expect individuals will want to pick and choose the articles they pay for.

Q: What do you expect author charges to be for articles licensed for free re-use?

$5,000 (The Americas)
€3,570 (Europe)
¥637,350 (Japan)
£3,035 (UK and Rest of World)

Manuscripts accepted before April 2010 will receive a 20% discount off the quoted APC.
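[As an aside from me, not NPG: for anyone pricing this up, the 20% early-acceptance discount works out as follows on the figures quoted above. The calculation is my own and the labels are shortened.]

# My own arithmetic on the quoted APCs, with the pre-April 2010 discount applied.
apcs = {
    "The Americas (USD)": 5000,
    "Europe (EUR)": 3570,
    "Japan (JPY)": 637350,
    "UK and Rest of World (GBP)": 3035,
}
for region, fee in apcs.items():
    # Full fee, and the fee after the 20% discount for early acceptance
    print(f"{region}: {fee:,.0f} full, {fee * 0.8:,.0f} with 20% discount")
# Discounted fees come out at 4,000 / 2,856 / 509,880 / 2,428 respectively.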

Q: Does this figure cover the expected costs of article production?

This is a flat fee with no additional production charges (such as page or colour figure charges). The article processing charges have been set to cover our costs, including article production.

Q: The press release states that subscription costs will be adjusted to reflect the take up of the author-pays option. Can you commit to a mechanistic adjustment to subscription charges based on the percentage of author-pays articles?

We are working towards a clear pricing principle for Nature Communications, using input from NESLi and others. Because the amount of subscription content may vary substantially from year to year, an entirely mechanistic approach may not give libraries the ability they need to forecast with confidence.
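[Again an aside from me rather than anything NPG has proposed: a purely mechanistic adjustment is easy enough to describe, which is why I keep pressing on this point. The sketch below simply scales a baseline site-licence price by the fraction of articles that remain subscription-access in a given year; the baseline figure and the names are invented for the example.]

# A minimal sketch of a mechanistic price adjustment (my assumption, not NPG's model):
# scale a baseline site-licence price by the share of content that stays behind the paywall.
def adjusted_licence_price(base_price, total_articles, open_access_articles):
    if total_articles == 0:
        return 0.0
    subscription_share = (total_articles - open_access_articles) / total_articles
    return base_price * subscription_share

# Example: with a hypothetical £2,000 baseline, a year in which 40 of 100 articles
# took the author-pays option would price the licence at £1,200.
print(adjusted_licence_price(2000.0, 100, 40))

[The forecasting difficulty NPG points to is real enough: libraries set budgets before the year’s uptake is known, so any mechanistic formula would presumably have to operate as a rebate, or as an adjustment to the following year’s price.]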

Q: Does the strategic plan for the journal include targets for take-up of the author-pays option? If so can you disclose what those are?

We have modelled Nature Communications as an entirely subscription access journal, a totally open access journal, and continuing the hybrid model on an ongoing basis. The business model works at all these levels.

Q: If the author-pays option is a success at NC will NPG consider opening up such options on other journals?

We already have open access options on more than 10 journals, and we have recently announced the launch in 2010 of a completely open access journal, Cell Death & Disease. In addition, we publish the successful open access journal Molecular Systems Biology, in association with the European Molecular Biology Organization. We’re open to new and evolving business models where it is sustainable. The rejection rates on Nature and the Nature research journals are so high that we expect the APC for these journals would be substantially higher than that for Nature Communications.

Q: Do you expect NC to make a profit? If so over what timeframe?

As with all new launches we would expect Nature Communications to be financially viable during a reasonable timeframe following launch.

Q: In five years’ time, what are the possible outcomes that would be seen at NPG as the journal being a success? What might a failure look like?

We would like to see Nature Communications publish high quality manuscripts covering all of the natural sciences and work to serve the research community. The rationale for launching this title is to ensure NPG continues to serve the community with new publishing opportunities. A successful outcome would be a journal with an excellent reputation for quality and service, a good impact factor, a substantial archive of published papers that span the entire editorial scope, and significant market share.