The parable of the garage: Why the business model shift is so hard


Mike Taylor has a parable on the Guardian Blog about research communication, and I thought it might be useful to share one that I have been using in talks recently. For me it illustrates just how silly the situation is, and how hard it is for the incumbent publishers to break out of the mindset of renting access to content. It also, perhaps, has a happier ending.

Imagine a world very similar to our own. People buy cars, they fill them with fuel, they pay road tax, and these things largely work as well as they do in our own world. There is just one difference: when a car needs its annual service it is taken to a garage – just as ours are – for its mechanical checkup and maintenance, but in return for the service the car is gifted to the mechanic, who in turn rents it back to the owner for a fee.

Some choose to do their own servicing, or form clubs where they can work together to service each other’s cars, but this is hard work and, to be frank, a little obsessive and odd. Most people are perfectly happy to hand over the keys and then rent their cars back. It works just fine. The trouble is that society is changing: there is an increase in public transport, the mechanics are worried about their future, and the users seem keen to do new and strange things with the cars. They want to use them for work purposes, they want to loan them to friends, and in some cases they even want to use them to teach others to drive – possibly even for money.

Now for the mechanics this is a concern on two levels. First, they are uncertain about their future, as the world seems to be changing pretty fast. How can they provide certainty for themselves? Secondly, all these new uses seem to have the potential to make money for other people. That hardly seems fair, and the mechanics want a slice of that income, derived as it is from their cars. So, looking closely at their existing contracts, they identify that the agreements only provide for personal use. No mention is made of work use, certainly not of lending the car to others, and absolutely not of teaching.

For the garage, in this uncertain world, this is a godsend. Here are a whole set of new income streams. They can provide for the users to do all these new things, they have a diversified income stream, and everyone is happy! They could call it “Universal Uses” – a menu of options that car users can select from according to their needs and resources. Everyone will understand that this is a fair exchange. The cars are potentially generating more money and everyone gets a share of it, both the users and the real owners, the mechanics.

Unfortunately the car users aren’t so happy. They object to paying extra. After all, they feel that the garage is already recouping the costs of doing the service and making a healthy profit, so why does it need more? Having to negotiate each new use is a real pain in the backside, and the fine print seems to be so fine that every slight variation requires a new negotiation and a new payment. Given the revolution in the possible uses they might want to put their cars to, isn’t this just slowing down progress? Many of them even threaten to do their own servicing.

The problem for the garages is that they face a need for new equipment and staff training. Each time they see a new use that they don’t charge for, they see a lost sales opportunity. They spend money on getting the best lawyers to draw up new agreements, and make concessions on one use to try and shore up the market for another. At every stage there is a need to pin everything down, to lock down the cars, to ensure they can’t be used for unlicensed purposes, all of which costs more money, leading to a greater need to focus on different possibilities for charging. And every time they do this it puts them more and more at odds with their customers. But they’re so focussed on a world view in which they need to charge for every possible different use of “their” cars that they can’t see a way out, beyond identifying each new possible use as it comes up and pinning it to the wall with a new contract, a new charge, and new limitations to prevent any unexpected new opportunities for income being lost.

But things are changing. There are a couple of radical new businesses down the road, BMC Motors and PLoS Garages. They do things differently. They charge up front for the maintenance and service but then allow the cars to be used for any purpose whatsoever. There’s a lot of scepticism – will people really pay for a service up front? How can people be sure that the service is any good? After all, if the garage gets its money before you get your car back, what incentive does it have to make sure the car keeps working? But there’s enough aggravation for a few people to start using them.

And gradually the view starts to shift. Where there is good service people want to come back with their new cars – they discover entirely new possibilities of use because they are free to experiment, they earn more money, they buy more cars. The idea spreads and there is a slow but distinct shift – the whole economy gets a boost as all of the licensing costs simply drop out of the system. But the thing that actually drives the change? It’s all those people who just got sick of having to go back to the garage every time they wanted to do something new. In the end the irritation and wasted time of negotiating for every new use just isn’t worth the effort. Paying up front is clean, clear, and simple. And it lets everyone get on with the things they really want to do.

 


Response to the OSTP Request for Information on Public Access to Research Data

Response to Request for Information – FR Doc. 2011-28621

Dr Cameron Neylon – U.K.-based research scientist writing in a personal capacity

Introduction

Thank you for the opportunity to respond to this request for information and to the parallel RFI on access to scientific publications. Many of the higher-level policy issues relating to data are covered in my response to the other RFI and I refer to that response where appropriate here. Specifically, I reiterate my point that a focus on IP in the publication is a non-productive approach. Rather, it is more productive to identify the outcomes that are desired as a result of the federal investment in generating data, and from those outcomes to identify the services that are required to convert the raw material of the research process into accessible outputs that can be used to support those outcomes.

Response

(1) What specific Federal policies would encourage public access to and the preservation of broadly valuable digital data resulting from federally funded scientific research, to grow the U.S. economy and improve the productivity of the American scientific enterprise?

Where the Federal government has funded the generation of digital data, either through generic research funding or through focussed programs that directly target data generation, the purpose of this investment is to generate outcomes. Some data has clearly defined applications, and much data is obtained to further very specific research goals. However, while it is possible to identify likely applications, it is not possible, and indeed is foolhardy, to attempt to define and limit the full range of uses which data may find.

Thus to ensure that data created through federal investment is optimally exploited it is crucial that data be a) accessible, b) discoverable, c) interpretable, and d) legally re-usable by any person for any purpose. To achieve this requires investment in infrastructure, markup, and curation. This investment is not currently seen as either a core activity for researchers themselves or a desirable service for them to purchase. It is therefore rare for such services or resource needs to be thoughtfully costed in grant applications.

The policy challenge is therefore to create incentives – both symbolic and contractual, but also directly meaningful to researchers in their impact on career progression – that encourage researchers either to undertake these necessary activities directly themselves or to purchase, and appropriately cost, third-party services to have them carried out.

Policy intervention in this area will be complex and will need to be thoughtful. Three simple policy moves however are highly tractable and productive, without requiring significant process adjustments in the short term:

a) Require researchers to provide a data management or data accessibility plan within grant requests. The focus of these plans should be showing how the project will enable third party groups to discover and re-use data outputs from the project.

b) As part of project reporting, require measures of how data outputs have been used. These might include download counts, citations, comments, or new collaborations generated through the data. In the short term this assessment need not be directly used, but it sends a message that agencies consider this important.

c) Explicitly measure performance on data re-use. Require reporting of it as part of biosketches and provide data on previous performance to grant panels. In the longer term it may be appropriate to provide guidance to panels on the assessment of previous performance on data re-use, but in the first instance simply providing the information will affect behaviour and raise general awareness of issues of data accessibility, discoverability, and usability.

(2) What specific steps can be taken to protect the intellectual property interests of publishers, scientists, Federal agencies, and other stakeholders, with respect to any existing or proposed policies for encouraging public access to and preservation of digital data resulting from federally funded scientific research?

As noted in my response to the other RFI, the focus on intellectual property is not helpful. Private contributors of data, such as commercial collaborators, should be free to exploit their own contribution of IP to projects as they see fit. Federally funded research should seek to maximise the exploitation and re-use of data generated through public investment.

It has been consistently and repeatedly demonstrated in a wide range of domains that the most effective way of exploiting the outputs of research innovation, be they physical samples or digital data, to support further research, to drive innovation, or to support economic activity globally, is to make those outputs freely available with no restrictive terms. That is, the most effective way to use research data to drive economic activity and innovation at a national level is to give the data away.

The current IP environment means that in specific cases, such as where there is very strong evidence of a patentable result with demonstrated potential, the optimisation of outcomes does require protection of the IP. There are also situations where privacy and other legal considerations mean that data cannot be released, or cannot be fully released. These should however be seen as the exception rather than the rule.

(3) How could Federal agencies take into account inherent differences between scientific disciplines and different types of digital data when developing policies on the management of data?

At the Federal level only very high-level policy decisions should be taken. These should provide direction and strategy but enable tactics and the details of implementation to be handled at agency or community levels. What both the Federal agencies and coordination bodies such as OSTP can provide is oversight and, where appropriate, funding support to maintain, develop, and expand interoperability between the standards developing in different communities. Federal agencies can also effectively provide an oversight function that supports activities that enhance interoperability.

Local custom, dialects, and community practice will always differ, and it is generally unproductive to enforce standardisation of implementation details. The policy objective should be to set the expectations and frameworks within which local implementations can be developed, and to develop criteria against which those local implementations can be assessed.

(4) How could agency policies consider differences in the relative costs and benefits of long-term stewardship and dissemination of different types of data resulting from federally funded research?

Prior to assessing differences in performance and return on investment it will be necessary to provide data gathering frameworks and to develop significant expertise in the detailed assessment of the data gathered. A general principle that should be considered is that the administrative and performance data related to accessibility and re-use of research data should provide an outstanding exemplar of best practice in terms of accessibility, curation, discoverability, and re-usability.

The first step in cost-benefit analysis must be to develop a base of information and data that supports that analysis. This will mean tracking and aggregating forms of data use that are available today (download counts, citations) as well as developing mechanisms for tracking the use and impact of data in ways that are either challenging or impossible today (data use in policy development, impact of data on clinical practice guidelines).

Only once this assessment framework is in place can a detailed process of cost-benefit analysis be seriously considered. Differences will exist in the measurable and imponderable returns on investment in data availability, and also in the timeframes over which those returns are realised. We have only a very limited understanding of these issues today.

(5) How can stakeholders (e.g., research communities, universities, research institutions, libraries, scientific publishers) best contribute to the implementation of data management plans?

If stakeholders have serious incentives to optimise the use and re-use of data then all players will seek to gain competitive advantage through making the highest quality contributions. An appropriate incentives framework obviates the need to attempt to design in or pre-suppose how different stakeholders can, will, or should best contribute going forward.

(6) How could funding mechanisms be improved to better address the real costs of preserving and making digital data accessible?

As with all research outputs, there should be a clear obligation on researchers to plan, on a best-efforts basis, to publish these (as in, make public) in a form that most effectively supports access and re-use, tensioned against the resources available. Funding agencies should make clear that they expect communication of research outputs to be a core activity of the research they fund, and that researchers and their institutions will be judged on their performance in selecting the appropriate modes of communication.

Further, funding agencies should explicitly set guidance levels on the proportion of a research grant that is expected, under normal circumstances, to be used to support the communication of outputs. Based on calculations from the Wellcome Trust, where projected expenditure on the publication of traditional research papers was around 1-1.5% of total grant costs, it would be reasonable to project total communication costs of 2-4% of total grant costs once data and other research communications are considered. This guidance and the details of best practice should clearly be adjusted as data is collected on both costs and performance.
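
As a purely illustrative sketch of what such guidance might mean for an individual grant (the grant value below is a made-up example; the percentage bands are simply the figures quoted above), the arithmetic is straightforward:

```python
# Illustrative only: projecting communication costs as a share of a grant.
# The grant value is a hypothetical example; the percentage bands follow the
# figures quoted above (1-1.5% for traditional papers, 2-4% for all outputs).

def cost_band(total_grant, low_pct, high_pct):
    """Return the (low, high) cost range for a given percentage band."""
    return total_grant * low_pct / 100, total_grant * high_pct / 100

grant = 1_000_000  # hypothetical total grant value

papers_low, papers_high = cost_band(grant, 1.0, 1.5)  # traditional papers only
comms_low, comms_high = cost_band(grant, 2.0, 4.0)    # papers plus data and other outputs

print(f"Traditional publication: {papers_low:,.0f} - {papers_high:,.0f}")
print(f"All communication costs: {comms_low:,.0f} - {comms_high:,.0f}")
```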

(7) What approaches could agencies take to measure, verify, and improve compliance with Federal data stewardship and access policies for scientific research? How can the burden of compliance and verification be minimized?

Ideally, compliance and performance will be trackable through automated systems that are triggered as a side effect of the activities required to enable data access. Thus references for new data should be registered with appropriate services to enable discovery by third parties – these services can then also be used to track those outputs automatically. Frameworks and infrastructure for sharing should have tracking mechanisms built in. Much of the aggregation of data at scale can build on the existing work in the STAR METRICS program and draw inspiration from that experience.
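
As a minimal sketch of what such side-effect tracking might look like, the snippet below aggregates usage metrics for a project’s registered datasets. The registry endpoint and the JSON fields are invented for illustration; a real implementation would target whatever API the chosen registration and discovery service actually exposes.

```python
# Hypothetical sketch: aggregate usage metrics for datasets registered by a
# project. The registry URL and JSON field names are invented placeholders
# standing in for whichever registration/discovery service is actually used.
import json
from urllib.request import urlopen

REGISTRY = "https://registry.example.org/api/datasets"  # placeholder endpoint


def fetch_metrics(dataset_id):
    """Fetch download and citation counts for one registered dataset."""
    with urlopen(f"{REGISTRY}/{dataset_id}/metrics") as resp:
        return json.load(resp)  # e.g. {"downloads": 120, "citations": 3}


def compliance_report(dataset_ids):
    """Build a simple per-project usage report from registry metrics."""
    report = {}
    for dataset_id in dataset_ids:
        metrics = fetch_metrics(dataset_id)
        report[dataset_id] = {
            "downloads": metrics.get("downloads", 0),
            "citations": metrics.get("citations", 0),
        }
    return report

# Usage (against a real registry): compliance_report(["10.1234/example-dataset"])
```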

Overall it should be possible to reduce the burden of compliance from its current level while gathering vastly more data and information of much higher quality than is currently collected.

(8) What additional steps could agencies take to stimulate innovative use of publicly accessible research data in new and existing markets and industries to create jobs and grow the economy?

There are a variety of proven methods for stimulating innovative use of data at both large and small scale. The first is to make it available. If data is made available at scale then it is highly likely that some of it will be used somewhere. The more direct encouragement of specific uses can be achieved through directed “hack events” that bring together data handling and data production expertise from specific domains. There is significant US expertise in successfully managing these events and generating exciting outcomes. These in turn lead to new startups and new innovation.

There is also a significant growth in the number of data-focussed entrepreneurs who are now veterans of the early development of the consumer web. Many of these have a significant interest in research as well as significant resources and there is great potential for leveraging their experience to stimulate further growth. However this interface does need to be carefully managed as the cultures involved in research data curation and web-scale data mining and exploitation are very different.

(9) What mechanisms could be developed to assure that those who produced the data are given appropriate attribution and credit when secondary results are reported?

The existing norms of the research community that recognise and attribute contributions to further work should be strengthened and supported. While it is tempting to use legal instruments to enforce a need for attribution there is growing evidence that this can lead to inflexible systems that cannot adapt to changing needs. Thus it is better to utilise social enforcement than legal enforcement.

The current good work on data citation and mechanisms for tracking the re-use of data should be supported and expanded. Funders should explicitly require that service providers add capacity for tracking data citation to the products that are purchased for assessment purposes. Where possible the culture of citation should be expanded into the wider world in the form of clinical guidelines, government reports, and policy development papers.

(10) What digital data standards would enable interoperability, reuse, and repurposing of digital scientific data? For example, MIAME (minimum information about a microarray experiment; see Brazma et al., 2001, Nature Genetics 29, 371) is an example of a community-driven data standards effort.

At the highest level there is a growing range of interoperable information transfer formats that can provide machine-readable and integratable data transfer, including RDF, XML, OWL, JSON and others. My own experience is that attempting to impose global interchange standards is an enterprise doomed to failure; it is more productive to support these standards within existing communities of practice.

Thus the appropriate policy action is to recommend that communities adopt and utilise the most widely used set of standards possible, and to support the transitions of practice and infrastructure required for that adoption. Selecting standards at the highest level is likely to be counterproductive. Identifying and disseminating best practice in the development and adoption of standards is, however, within the appropriate remit of federal agencies.
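
To make the preceding point concrete, here is a minimal sketch of a machine-readable dataset description serialised as JSON. The record and its field names are invented for illustration; a community working in RDF or XML would carry equivalent information in its own serialisation, which is why consistent adoption within a community matters more than the choice of format itself.

```python
# Illustrative only: a minimal machine-readable dataset description.
# The field names and values are invented; real communities would use their
# own agreed vocabulary, and might serialise to RDF or XML rather than JSON.
import json

dataset_record = {
    "identifier": "example-dataset-001",
    "title": "Example time-course measurements",
    "creator": "A. Researcher",
    "format": "text/csv",
    "licence": "public domain dedication",
    "keywords": ["time course", "example"],
}

# JSON serialisation: trivially produced and consumed by almost any modern tool.
print(json.dumps(dataset_record, indent=2))
```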

(11) What are other examples of standards development processes that were successful in producing effective standards and what characteristics of the process made these efforts successful?

There is now a significant literature on community development and practice and this should be referred to. Many lessons can also be drawn from the development of effective and successful open source software projects.

(12) How could Federal agencies promote effective coordination on digital data standards with other nations and international communities?

There are a range of global initiatives that communities should engage with. The most effective means of practical engagement will be to identify communities that have a desire to standardise or integrate systems and to support the technical and practical transitions needed to enable this. For instance, there is a widespread desire for interoperable data formats from analytical instrumentation, but few examples of this transition being achieved. Funding could be directed to supporting a specific analytical community, and the vendors that support it, in applying an existing standard to their work.

(13) What policies, practices, and standards are needed to support linking between publications and associated data?

Development in this area is at an early stage. There is a need to reconsider the form of publication in its widest sense and this will have a significant impact on the forms and mechanisms of linking. This is a time for experimentation and exploration rather than standards development.

 


Response to the RFI on Public Access to Research Communications

Have you written your response to the OSTP RFIs yet? If not why not? This is amongst the best opportunities in years to directly tell the U.S. government how important Open Access to scientific publications is and how to start moving to a much more data centric research process. You’d better believe that the forces of stasis, inertia, and vested interests are getting their responses in. They need to be answered.

I’ve written mine on public access and you can read and comment on it here. I will submit it tomorrow, just ahead of the deadline, but in the meantime any comments are welcome. It expands on and discusses many of the same issues – specifically re-configuring the debate on access away from IP and towards services – that have featured in my recent posts on the Research Works Act.


It’s not information overload, nor is it filter failure: It’s a discovery deficit


Clay Shirky’s famous soundbite has helped to focus minds on the way information on the web needs to be tackled, and on a move towards managing the process of selecting and prioritising information. But in the research space I’m getting a sense that it is fuelling a focus on preventing publication in a way that is analogous to the conventional filtering process involved in peer-reviewed publication.

Most recently this surfaced at the Chronicle of Higher Education, to which there were many responses, Derek Lowe’s being one of the most thought out. But this is not an isolated case.

@JISC_RSC_YH: How can we provide access to online resources and maintain quality of content? #rscrc10 [twitter via @branwenhide]

Me: @branwenhide @JISC_RSC_YH isn’t the point of the web that we can decouple the issues of access and quality from each other? [twitter]

There is a widely held assumption that putting more research onto the web makes it harder to find the research you are looking for. In fact, publishing more makes discovery easier.

The great strength of the web is that you can allow publication of anything at very low marginal cost without limiting the ability of people to find what they are interested in, at least in principle. Discovery mechanisms are good enough, while being a long way from perfect, to make it possible to mostly find what you’re looking for while avoiding what you’re not looking for.  Search acts as a remarkable filter over the whole web through making discovery possible for large classes of problem. And high quality search algorithms depend on having a lot of data.

It is very easy to say there is too much academic literature – and I do. But the solution which seems to be becoming popular is to argue for an expansion of the traditional peer review process: to prevent stuff getting onto the web in the first place. This is misguided for two important reasons. Firstly, it takes the highly inefficient and expensive process of manual curation and attempts to apply it to every piece of research output created. This doesn’t work today and won’t scale as the diversity and sheer number of research outputs increases tomorrow. Secondly, it doesn’t take advantage of the nature of the web. The way to do this efficiently is to publish everything at the lowest cost possible, and then enhance the discoverability of the work that you think is important. We don’t need publication filters, we need enhanced discovery engines. Publishing is cheap; curation is expensive, whether it is applied to filtering or to markup and search enhancement.

Filtering before publication worked, and was probably the most efficient place to apply the curation effort, when the major bottleneck was publication. Value was extracted from the curation process of peer review by using it to reduce the costs of layout, editing, and printing – through simply printing less. But it created new costs, and invisible opportunity costs, where a key piece of information was not made available. Today the major bottleneck is discovery. Of the 500 papers a week I could read, which ones should I read, and which ones just contain a single nugget of information which is all I need? In the Research Information Network study of the costs of scholarly communication the largest component of the publication creation and use cycle was peer review, followed by the cost of finding the articles to read, which represented some 30% of total costs. On the web, the place to put in the curation effort is in enhancing discoverability, in providing me the tools that will identify what I need to read in detail, what I just need to scrape for data, and what I need to bookmark for my methods folder.

The problem we have in scholarly publishing is an insistence on applying this print-paradigm publication filtering to the web, alongside an unhealthy obsession with a publication form, the paper, which is almost designed to make discovery difficult. If I want to understand the whole argument of a paper I need to read it. But if I just want one figure, one number, or the details of the methodology, then I don’t need to read it – but I still need to be able to find it, and to do so efficiently, and at the right time.

Currently scholarly publishers vie for the position of biggest barrier to communication. The stronger the filter, the higher the notional quality. But being a pure filter play doesn’t add value, because the costs of publication are now low. The value lies in presenting, enhancing, and curating the material that is published. If publishers instead vied to identify, mark up, and make it easy for the right people to find the right information, they would be working with the natural flow of the web. Make it easy for me to find the piece of information, feature work that is particularly interesting or important, re-interpret it so I can understand it coming from a different field, preserve it so that when a technique becomes useful in 20 years the right people can find it. The brand differentiator then becomes which articles you choose to enhance, what kind of markup you do, and how well you do it.

All of these are things that publishers already do. And they are services that authors and readers will be willing to pay for. But at the moment the whole business and marketing model is built around filtering, and selling that filter. By impressing people with how much you are throwing away. Trying to stop stuff getting onto the web is futile, inefficient, and expensive. Saving people time and money by helping them find stuff on the web is an established and successful business model both at scale, and in niche areas. Providing credible and respected quality measures is a viable business model.

We don’t need more filters or better filters in scholarly communications – we don’t need to block publication at all. Ever. What we need are tools for curation, annotation, and re-integration of what is published. And a framework that enables discovery of the right thing at the right time. And the data that will help us to build these. The more data, the more research published, the better. Which is actually what Shirky was saying all along…


In defence of author-pays business models


There has been an awful lot written and said recently about author-pays business models for scholarly publishing, and a lot of it has focussed on PLoS ONE. Most recently Kent Anderson has written a piece on the Scholarly Kitchen that contains a number of fairly serious misconceptions about the processes of PLoS ONE. This is a shame because I feel it has muddled the much more interesting question that was intended to be the focus of his piece. Nonetheless, here I want to give a robust defence of author-pays models and of PLoS ONE in particular. Hopefully I can deal with the more interesting question – how radical should or could PLoS be? – in a later post.

A common charge levelled at author-payment-funded journals is that they are pushed in the direction of being non-selective. The figure that PLoS ONE publishes around 70% of the papers it receives is often given as a demonstration of this. There are a range of reasons why this is nonsense. The first and simplest is that the evidence we have suggests that between 50% and 95% of papers rejected from journals are ultimately published elsewhere [1, 2 (pdf), 3, 4]. The cost of this trickle-down, a result of the use of subjective selection criteria of “importance”, is enormous in authors’ and referees’ time and represents a significant opportunity cost in terms of lost time. PLoS ONE seeks to remove this cost by simply asking “should this be published?” In the light of the figures above it seems that 70% is a reasonable proportion of papers that are probably “basically OK but might need some work”.

The second presumption is that the peer review process is somehow “light touch”. This is perhaps the result of some mis-messaging that went on early in the history of PLoS ONE but it is absolute nonsense. As both an academic editor and an author I would argue that the peer review process is as rigorous as I have experienced at any other journal (and I do mean any other journal).

As an author I have two papers published in PLoS ONE, both went through at least one round of revision, and one was initially rejected. As an editor I have seen two papers withdrawn after the initial round of peer review, presumably not because the authors felt that the required changes represented a “light touch”. I have rejected one and have never accepted a paper without revision. Every paper I have edited has had at least one external peer reviewer and I try to get at least two. Several papers have gone through more than one cycle of revision with one going through four. Figures provided by Pete Binfield (comment from Pete about 20 comments in) suggest that this kind of proportion is about average for PLoS ONE Academic Editors. The difference between PLoS ONE and other journals is that I look for what is publishable in a submission and work with the authors to bring that out rather than taking delight in rejecting some arbitrary proportion of submissions and imagining that this equates to a quality filter. I see my role as providing a service.

The more insidious claim made is that there is a link between this supposed light touch review and the author pays models; that there is pressure on those who make the publication decision to publish as much as possible. Let me put this as simply as possible. The decision whether to publish is mine as an Academic Editor and mine alone. I have never so much as discussed my decision on a paper with the professional staff at PLoS and I have never received any payment whatsoever from PLoS (with the possible exception of two lunches and one night’s accommodation for a PLoS meeting I attended – and I missed the drinks reception…). If I ever perceived pressure to accept or was offered inducements to accept papers I would resign immediately and publicly as an AE.

That an author pays model has the potential to create a conflict of interest is clear. That is why, within reputable publishers, structures are put in place to reduce that risk as far as is possible, divorcing the financial side from editorial decision making, creating Chinese walls between editorial and financial staff within the publisher.  The suggestion that my editorial decisions are influenced by the fact the authors will pay is, to be frank, offensive, calling into serious question my professional integrity and that of the other AEs. It is also a slightly strange suggestion. I have no financial stake in PLoS. If it were to go under tomorrow it would make no difference to my take home pay and no difference to my finances. I would be disappointed, but not poorer.

Another point that is rarely raised is that the author pays model is much more widely used than people generally admit. Page charges and colour charges for many disciplines are of the same order as Open Access publication charges. The Journal of Biological Chemistry has been charging page rates for years while increasing publication volume. Author fees of one sort or another are very common right across the biological and medical sciences literature. And it is not new. Bill Hooker’s analysis (here and here) of these hidden charges bears reading.

But the core of the argument for author payments is that the market for scholarly publishing is badly broken. Until the pain of the costs of publication is directly felt by those making the choice of where to (try to) publish we will never change the system. The market is also the right place to have this out. It is value for money that we should be optimising. Let me illustrate with an example. I have heard figures of around £25,000 given as the level of author charge that would be required to sustain Cell, Nature, or Science as Open Access APC supported journals. This is usually followed by a statement to the effect “so they can’t possibly go OA because authors would never pay that much”.

Let’s unpack that statement.

If authors were forced to make a choice between the cost of publishing in these top journals and putting that money back into their research, they would choose the latter. If the customer actually had to make the choice to pay the true costs of publishing in these journals, they wouldn’t pay them. If journals believed that authors would see the real cost as good value for money, many of them would have made that switch years ago. Subscription charges as a business model have allowed an appallingly wasteful situation to continue unchecked, because authors can pretend that there is no difference in cost between the places they publish; they accept that premium offerings are value for money because they don’t have to pay for them. Make them choose between publishing in a “top” journal versus publishing in a “quality” journal and getting another few months of postdoc time, and the equation changes radically. Maybe £25k is good value for money. But it would be interesting to find out how many people think that.

We need a market where the true costs are a factor in the choices of where, or indeed whether, to formally publish scholarly work. Today, we do not have that market and there is little to no pressure to bring down publisher costs. That is why we need to move towards an author pays system.


Why I am disappointed with Nature Communications

Towards the end of last year I wrote up some initial reactions to the announcement of Nature Communications, and the communications team at NPG were kind enough to do a Q&A to look at some of the issues and concerns I raised. Specifically, I was concerned about two things: the licence that would be used for the “Open Access” option, and the way the journal would be positioned in terms of “quality”, particularly as it related to the other NPG journals and the approach to peer review.

I have to say that I feel these have been fudged, and this is unfortunate because there was a real opportunity here to do something different and quite exciting. I get the impression that that may even have been the original intention. But from my perspective what has resulted is a poor compromise between my hopes and commercial concerns.

At the centre of my problem is the use of a Creative Commons Attribution Non-commercial licence for the “Open Access” option. This doesn’t qualify under the BBB declarations on Open Access publication and it doesn’t qualify for the SPARC seal for Open Access. But does this really matter, or is it just a side issue for a bunch of hard-core zealots? After all, if people can see it, that’s a good start isn’t it? Well yes, it is a good start, but non-commercial terms raise serious problems. Putting aside the argument that universities are commercial entities and therefore can’t legitimately use content with non-commercial licences, the problem is that NC terms limit the ability of people to create new business models that re-use content and are capable of scaling.

We need these business models because the current model of scholarly publication is simply unaffordable. The argument is often made that if you are unsure whether you are allowed to use content then you can just ask, but this simply doesn’t scale. And let’s be clear about some of the things that NC means you’re not licensed for: using a paper for commercially funded research, even within a university; using the content of a paper to support a grant application; using the paper to judge a patent application; using a paper to assess the viability of a business idea…the list goes on and on. Yes, you can ask if you’re not sure, but asking each and every time does not scale. This is the central point of the BBB declarations. For scientific communication to scale it must allow the free movement and re-use of content.

Now if this were coming from any old toll-access publisher I would just roll my eyes and move on, but NPG sets itself up to be judged by a higher standard. NPG is a privately held company, not beholden to shareholders. It is a company that states that it is committed to advancing scientific communication, not simply traditional publication. Non-commercial licences do not do this. From the Q&A:

Q: Would you accept that a CC-BY-NC(ND) licence does not qualify as Open Access under the terms of the Budapest and Bethesda Declarations because it limits the fields and types of re-use?

A: Yes, we do accept that. But we believe that we are offering authors and their funders the choices they require. Our licensing terms enable authors to comply with, or exceed, the public access mandates of all major funders.

NPG is offering the minimum that allows compliance. Not what will most effectively advance scientific communication. Again, I would expect this of a shareholder-controlled profit-driven toll access dead tree publisher but I am holding NPG to a higher standard. Even so there is a legitimate argument to be made that non-commercial licences are needed to make sure that NPG can continue to support these and other activities. This is why I asked in the Q&A whether NPG made significant money off re-licensing of content for commercial purposes. This is a discussion we could have on the substance – the balance between a commercial entity providing a valuable service and the necessary limitations we might accept as the price of ensuring the continued provision of that service. It is a value for money judgement. But not one we can make without a clear view of the costs and benefits.

So I’m calling NPG on this one. Make a case for why non-commercial licences are necessary or even beneficial, not merely why they are acceptable. They damage scientific communication, they create unnecessary confusion about rights, and more importantly they damage the development of new business models to support scientific communication. Explain why the restriction is commercially necessary for the development of these new activities, or roll it back, and take a lead in driving the development of science communication forward. Don’t take the kind of small steps we expect from other, more traditional, publishers. Above all, let’s have that discussion. What is the price we would have to pay to change the licence terms?

Because I think it goes deeper. I think that NPG are actually limiting their potential income by focussing on the protection of their income from legacy forms of commercial re-use. They could make more money off this content by growing the pie than by protecting their piece of a specific income stream. It goes to the heart of a misunderstanding about how to effectively exploit content on the web. There is money to be made through re-packaging content for new purposes. The content is obviously key but the real value offering is the Nature brand. Which is much better protected as a trademark than through licensing. Others could re-package and sell on the content but they can never put the Nature brand on it.

By making the material available for commercial re-use NPG would help to expand a high-value market for re-packaged content which it would be poised to dominate. Sure, if you’re a business you could print off your OA Nature articles and put them on the coffee table, but if you want to present them to investors you want that Nature logo and Nature packaging that you can only get from one place. And that NPG does damn well. NPG often makes the case that it adds value through selection, presentation, and aggregation. It is the editorial brand that is of value. Let’s see that demonstrated through monetization of the brand, rather than through unnecessarily restricting the re-use of the content, especially where authors are being charged $5,000 to cover the editorial costs.


Nature Communications Q&A

A few weeks ago I wrote a post looking at the announcement of Nature Communications, a new journal from Nature Publishing Group that will be online only and have an open access option. Grace Baynes, from the NPG communications team, kindly offered to get some of the questions raised in that piece answered, and I am presenting my questions and the answers from NPG here in their complete form. I will leave any thoughts and comments on the answers for another post. There has also been more information from NPG available at the journal website since my original post, some of which is also dealt with below. Below this point, aside from formatting, I have left the responses in their original form.

Q: What is the motivation behind Nature Communications? Where did the impetus to develop this new journal come from?

NPG has always looked to ensure it is serving the scientific community and providing services which address researchers’ changing needs. The motivation behind Nature Communications is to provide authors with more choice, both in terms of where they publish and what access model they want for their papers. At present NPG does not provide a rapid publishing opportunity for authors with high-quality specialist work within the Nature branded titles. The launch of Nature Communications aims to address that editorial need. Further, Nature Communications provides authors with a publication choice for high quality work, which may not have the reach or breadth of work published in Nature and the Nature research journals, or which may not have a home within the existing suite of Nature branded journals. At the same time authors and readers have begun to embrace online-only titles – hence we decided to launch Nature Communications as a digital-first journal in order to provide a rapid publication forum which embraces the use of keyword searching and personalisation. Developments in publishing technology, including keyword archiving and personalization options for readers, make a broad-scope, online-only journal like Nature Communications truly useful for researchers.

Over the past few years there has also been increasing support by funders for open access, including commitments to cover the costs of open access publication. Therefore, we decided to provide an open access option within Nature Communications for authors who wish to make their articles open access.

Q: What opportunities does NPG see from Open Access? What are the most important threats?

Opportunities: Funder policies shifting towards supporting gold open access, and making funds available to cover the costs of open access APCs. These developments are creating a market for journals that offer an open access option. Threats: That the level of APCs that funders will be prepared to pay will be too low to be sustainable for journals with high quality editorial and high rejection rates.

Q: Would you characterise the Open Access aspects of NC as a central part of the journal strategy

Yes. We see the launch of Nature Communications as a strategic development. Nature Communications will provide a rapid publication venue for authors with high quality work which will be of interest to specialists in their fields. The title will also allow authors to adhere to funding agency requirements by making their papers freely available at point of publication if they wish to do so.

or as an experiment that is made possible by choosing to develop a Nature branded online only journal?

NPG doesn’t view Nature Communications as experimental. We’ve been offering open access options on a number of NPG journals in recent years, and monitoring take-up on these journals. We’ve also been watching developments in the wider industry.

Q: What would you give as the definition of Open Access within NPG?

It’s not really NPG’s focus to define open access. We’re just trying to offer choice to authors and their funders.

Q: NPG has a number of “Open Access” offerings that provide articles free to the user as well as specific articles within Nature itself under a Creative Commons Non-commercial Share-alike licence with the option to authors to add a “no derivative works” clause. Can you explain the rationale behind this choice of licence?

Again, it’s about providing authors with choice within a framework of commercial viability. On all our journals with an open access option, authors can choose between the Creative Commons Attribution Noncommercial Share Alike 3.0 Unported Licence and the Creative Commons Attribution-Non-commercial-No Derivs 3.0 Unported Licence. The only instance where authors are not given a choice at present is genome sequence articles published in Nature and other Nature branded titles, which are published under the Creative Commons Attribution Noncommercial Share Alike 3.0 Unported Licence. No APC is charged for these articles, as NPG considers making these freely available an important service to the research community.

Q: Does NPG recover significant income by charging for access or use of these articles for commercial purposes? What are the costs (if any) of enforcing the non-commercial terms of licences? Does NPG actively seek to enforce those terms?

We’re not trying to prevent derivative works or reuse for academic research purposes (as evidenced by our recent announcement that NPG author manuscripts would be included in UK PMC’s open access subset). What we are trying to keep a cap on is illegal e-prints and reprints where companies may be using our brands or our content to their benefit. Yes we do enforce these terms, and we have commercial licensing and reprints services available.

Q: What will the licence be for NC?

Authors who wish to take up the open access option can choose either the Creative Commons Attribution Noncommercial Share Alike 3.0 Unported Licence or the Creative Commons Attribution-Non-commercial-No Derivs 3.0 Unported Licence. Subscription access articles will be published under NPG’s standard License to Publish.

Q: Would you accept that a CC-BY-NC(ND) licence does not qualify as Open Access under the terms of the Budapest and Bethesda Declarations because it limits the fields and types of re-use?

Yes, we do accept that. But we believe that we are offering authors and their funders the choices they require. Our licensing terms enable authors to comply with, or exceed, the public access mandates of all major funders.

Q: The title “Nature Communications” implies rapid publication. The figure of 28 days from submission to publication has been mentioned as a minimum. Do you have a target maximum or indicative average time in mind?

We are aiming to publish manuscripts within 28 days of acceptance, contrary to an earlier report which was in error. In addition, Nature Communications will have a streamlined peer review system which limits presubmission enquiries, appeals and the number of rounds of review – all of which will speed up the decision making process on submitted manuscripts.

Q: In the press release an external editorial board is described. This is unusual for a Nature branded journal. Can you describe the makeup and selection of this editorial board in more detail?

In deciding whether to peer review manuscripts, editors may, on occasion, seek advice from a member of the Editorial Advisory Panel. However, the final decision rests entirely with the in-house editorial team. This is unusual for a Nature-branded journal, but in fact, Nature Communications is simply formalising a well-established system in place at other Nature journals. The Editorial Advisory Panel will be announced shortly and will consist of recognized experts from all areas of science. Their collective expertise will support the editorial team in ensuring that every field is represented in the journal.

Q: Peer review is central to the Nature brand, but rapid publication will require streamlining somewhere in the production pipeline. Can you describe the peer review process that will be used at NC?

The peer review process will be as rigorous as any Nature branded title – Nature Communications will only publish papers that represent a convincing piece of work. Instead, the journal will achieve efficiencies by discouraging presubmission enquiries, capping the number of rounds of review, and limiting appeals on decisions. This will enable the editors to make fast decisions at every step in the process.

Q: What changes to your normal process will you implement to speed up production?

The production process will involve a streamlined manuscript tracking system and maximise the use of metadata to ensure manuscripts move swiftly through the production process. All manuscripts will undergo rigorous editorial checks before acceptance in order to identify, and eliminate, hurdles for the production process. Alongside using both internal and external production staff we will work to ensure all manuscripts are published within 28 days of acceptance – however some manuscripts may well take longer due to unforeseen circumstances. We also hope the majority of papers will take less!

Q: What volume of papers do you aim to publish each year in NC?

As Nature Communications is an online only title the journal is not limited by page-budget. As long as we are seeing good quality manuscripts suitable for publication following peer review we will continue to expand. We aim to launch publishing 10 manuscripts per month and would be happy remaining with 10-20 published manuscripts per month but would equally be pleased to see the title expand as long as manuscripts were of suitable quality.

Q: The Scientist article says there would be an 11 page limit. Can you explain the reasoning behind a page limit on an online only journal?

Articles submitted to Nature Communications can be up to 10 pages in length. Any journal, online or not, will consider setting limits to the ‘printed paper’ size (in PDF format) primarily for the benefit of the reader. Setting a limit encourages authors to edit their text accurately and succinctly to maximise impact and readability.

Q: The press release description of papers for NC sounds very similar to papers found in the other “Nature Baby” journals, such as Nature Physics, Chemistry, Biotechnology, Methods etc. Can you describe what would be distinctive about a paper to make it appropriate for NC? Is there a concern that it will compete with other Nature titles?

Nature Communications will publish research of very high quality, but where the scientific reach and public interest is perhaps not that required for publication in Nature and the Nature research journals. We expect the articles published in Nature Communications to be of interest and importance to specialists in their fields. The scope of Nature Communications also includes areas like high-energy physics, astronomy, palaeontology and developmental biology, that aren’t represented by a dedicated Nature research journal.

Q: To be a commercial net gain NC must publish papers that would otherwise not have appeared in other Nature journals. Clearly NPG receives many such papers that are not published, but is it not the case that these papers are, at least as NPG measures them, by definition not of the highest quality? How can you publish more while retaining the bar at its present level?

Nature journals have very high rejection rates, in many cases well over 90% of what is submitted. A proportion of these articles are very high quality research and of importance for a specialist audience, but lack the scientific reach and public interest associated with high impact journals like Nature and the Nature research journals. The best of these manuscripts could find a home in Nature Communications. In addition, we expect to attract new authors to Nature Communications, who perhaps have never submitted to the Nature family of journals, but are looking for a high quality journal with rapid publication, a wide readership and an open access option.

Q: What do you expect the headline subscription fee for NC to be? Can you give an approximate idea of what an average academic library might pay to subscribe over and above their current NPG subscription?

We haven’t set prices for subscription access for Nature Communications yet, because we want to base them on the number of manuscripts the journal may potentially publish and the proportion of open access content. This will ensure the site licence price is based on absolute numbers of manuscripts available through subscription access. We’ll announce these in 2010, well before readers or librarians will be asked to pay for content.

Q: Do personal subscriptions figure significantly in your financial plan for the journal?

No, there will be no personal subscriptions for Nature Communications. Nature Communications will publish no news or other ‘front half content’, and we expect many of the articles to be available to individuals via the open access option or an institutional site license. If researchers require access to a subscribed-access article that is not available through their institution or via the open-access option, they have the option of buying the article through traditional pay-per-view and document-delivery options. For a journal with such a broad scope, we expect individuals will want to pick and choose the articles they pay for.

Q: What do you expect author charges to be for articles licensed for free re-use?

$5,000 (The Americas); €3,570 (Europe); ¥637,350 (Japan); £3,035 (UK and Rest of World). Manuscripts accepted before April 2010 will receive a 20% discount off the quoted APC.

Q: Does this figure cover the expected costs of article production?

This is a flat fee with no additional production charges (such as page or colour figure charges). The article processing charges have been set to cover our costs, including article production.

Q: The press release states that subscription costs will be adjusted to reflect the take up of the author-pays option. Can you commit to a mechanistic adjustment to subscription charges based on the percentage of author-pays articles?

We are working towards a clear pricing principle for Nature Communications, using input from NESLi and others. Because the amount of subscription content may vary substantially from year to year, an entirely mechanistic approach may not give libraries the ability they need to forecast with confidence.

Q: Does the strategic plan for the journal include targets for take-up of the author-pays option? If so can you disclose what those are?

We have modelled Nature Communications as an entirely subscription access journal, as a totally open access journal, and as a hybrid journal on an ongoing basis. The business model works at all these levels.

Q: If the author-pays option is a success at NC will NPG consider opening up such options on other journals?

We already have open access options on more than 10 journals, and we have recently announced the launch in 2010 of a completely open access journal, Cell Death & Disease. In addition, we publish the successful open access journal Molecular Systems Biology, in association with the European Molecular Biology Organization. We’re open to new and evolving business models where they are sustainable. The rejection rates on Nature and the Nature research journals are so high that we expect the APC for these journals would be substantially higher than that for Nature Communications.

Q: Do you expect NC to make a profit? If so over what timeframe?

As with all new launches we would expect Nature Communications to be financially viable during a reasonable timeframe following launch.

Q: In five years’ time what are the possible outcomes that would be seen at NPG as the journal being a success? What might a failure look like?

We would like to see Nature Communications publish high quality manuscripts covering all of the natural sciences and work to serve the research community. The rationale for launching this title is to ensure NPG continues to serve the community with new publishing opportunities. A successful outcome would be a journal with an excellent reputation for quality and service, a good impact factor, a substantial archive of published papers that span the entire editorial scope, and significant market share.

Third party data repositories – can we/should we trust them?

This is a case of a comment that got so long (and so late) that it probably merited its own post. David Crotty and Paul (Ling-Fung Tang) note some important caveats in comments on my last post about the idea of the “web native” lab notebook. I probably went a bit strong in that post with the idea of pushing content onto outside specialist services in my effort to explain the logic of the lab notebook as a feed. David notes an important point about any third party service (do read the whole comment at the post):

Wouldn’t such an approach either:
1) require a lab to make a heavy investment in online infrastructure and support personnel, or
2) rely very heavily on outside service providers for access and retention of one’s own data? […]

Any system that is going to see mass acceptance is going to have to give the user a great deal of control, and also provide complete and redundant levels of back-up of all content. If you’ve got data scattered all over a variety of services, and one goes down or out of business, does that mean having to revise all of those other services when/if the files are recovered?

This is a very wide problem that I’ve also seen in the context of the UK web community that supports higher education (see for example Brian Kelly‘s risk assessment for use of third party web services). Is it smart, or even safe, to use third party services? The general question divides into two parts: is the service more or less reliable than your own hard drive or locally provided server capacity (technical reliability, or uptime); and how likely is the service to remain viable in the long term (business/social model reliability)? Flickr probably has higher availability than your local institutional IT services, but there is no guarantee that it will still be there tomorrow. This is why data portability is so important: if you can’t get your data out, don’t put it in there in the first place.
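What “get your data out” means in practice is checking that a service exposes some form of export API and actually exercising it on a schedule. A minimal sketch in Python, assuming a hypothetical paginated export endpoint (the URL, parameters and key below are placeholders, not any real service’s API):

# Minimal sketch of the "get your data out" principle: pull everything you
# have deposited in a third-party service down to local disk on a schedule.
# The endpoint and parameters are hypothetical stand-ins for whatever export
# API a real service (Flickr, a data repository, etc.) actually provides.

import json
import requests

EXPORT_URL = "https://example-data-service.org/api/my-items"  # placeholder
API_KEY = "replace-with-your-key"                             # placeholder

def export_all_items():
    items, page = [], 1
    while True:
        resp = requests.get(EXPORT_URL, params={"api_key": API_KEY, "page": page})
        resp.raise_for_status()
        batch = resp.json()
        if not batch:          # an empty page means we have everything
            break
        items.extend(batch)
        page += 1
    return items

if __name__ == "__main__":
    with open("local_backup.json", "w") as f:
        json.dump(export_all_items(), f, indent=2)

Run from a cron job or similar, something like this at least guarantees that the failure of any one service still leaves you with a usable local copy.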

In the context of my previous post these data services could be local: provided by the institution, by a local funder, or even a hard disk in the lab. People are free to make those choices and to find the best balance of reliability, cost, and maintenance that suits them. My suspicion is that after a degree of consolidation we will start to see institutions offering local data repositories alongside specialised services in the cloud that provide richer and more exciting functionality. Ideally these could all talk to each other so that multiple copies are held across the various services.

David says:

I would worry about putting something as valuable as my own data into the “cloud” […]

I’d rather rely on an internally controlled system and not have to worry about the business model of Flickr or whether Google was going to pull the plug on a tool I regularly use. Perhaps the level to think on is that of a university, or company–could you set up a system for all labs within an institution that’s controlled (and heavily backed up) by that institution? Preferably something standardized to allow interaction between institutions.

Then again, given the experiences I’ve had with university IT departments, this might not be such a good approach after all.

Which I think encapsulates a lot of the debate. I actually have greater faith in Flickr keeping my pictures safe than in my own hard disk, and more faith in both than in institutional repository systems that don’t currently provide good data functionality and that I don’t understand. But I wouldn’t trust any of them in isolation. The best situation is to have everything everywhere, using interchange standards to keep copies in different places: specialised services out on the cloud to provide functionality (not every institution will want to provide a visualisation service for XAFS data), IRs providing backup, archival, and server space for anything that doesn’t fit elsewhere, and ultimately still probably local hard disks for a lot of the short to medium term storage. My view is that the institution has the responsibility of aggregating, making available, and archiving the work of its staff, but I personally see this role as more harvester than service provider.
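To make the harvester idea concrete: if each external service exposes an Atom or RSS feed of new items, the institutional aggregator only needs to poll those feeds and pull local copies of whatever they point at. A rough sketch in Python using the feedparser library (the feed URLs are placeholders, not real services):

# Rough sketch of an institutional "harvester": poll the feeds exposed by
# the external services a lab uses and archive each linked item locally.
# A real harvester would also need authentication, de-duplication across
# services and proper metadata handling.

import os
import feedparser   # pip install feedparser
import requests

FEEDS = [
    "https://example-notebook.org/lab-feed.atom",     # placeholder
    "https://example-data-service.org/deposits.rss",  # placeholder
]
ARCHIVE_DIR = "institutional_archive"

def harvest():
    os.makedirs(ARCHIVE_DIR, exist_ok=True)
    for feed_url in FEEDS:
        feed = feedparser.parse(feed_url)
        for entry in feed.entries:
            # Use the entry id (or link) to build a stable local filename
            name = entry.get("id", entry.link).replace("/", "_").replace(":", "_")
            target = os.path.join(ARCHIVE_DIR, name)
            if os.path.exists(target):
                continue  # already harvested on a previous run
            resp = requests.get(entry.link)
            resp.raise_for_status()
            with open(target, "wb") as f:
                f.write(resp.content)

if __name__ == "__main__":
    harvest()

The point is that the institution polls and copies rather than hosting every tool itself, which is the harvester-not-service-provider distinction above.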

All of which will turn on the question of business models. If the data stores are local, what is the business model for archival? If they are institutional, how much faith do you have that the institution won’t close them down? And if they are commercial or non-profit third parties, or even directly government funded services, do the economics make sense in the long term? We need a shift in science funding if we want to archive and manage data over the longer term. And, as with any market, some services will rise and some will die. The money has to come from somewhere, and ultimately that will always be the research funders. Until there is a stronger call from them for data preservation, and the resources to back it up, I don’t think we will see much interesting development. Some funders are pushing fairly hard in this direction, so it will be interesting to see what develops. A lot will turn on who has the responsibility for ensuring data availability and sharing. The researcher? The institution? The funder?

In the end you get what you pay for. Always worth remembering that sometimes even things that are free at point of use aren’t worth the price you pay for them.