Added Value: I do not think those words mean what you think they mean

There are two major strands to the position traditional publishers have taken in justifying the process by which they will make the, now inevitable, transition to a system supporting Open Access. The first of these is that the transition will cost “more money”. The exact costs are not clear but the, broadly reasonable, assumption is that there needs to be transitional funding available to support what will clearly be a mixed system over some transitional period. The argument of course is over how much money and where it will come from, as well as an issue that hasn’t yet been publicly broached: how long will it last? Expect lots of positioning on this over the coming months, with statements about “average paper costs” and “reasonable time frames”, with incumbent subscription publishers targeting figures of around $2,500-5,000 and ten years respectively, and those on my side of the fence suggesting figures of around $1,500 and two years. This will be fun to watch, but the key will be to see where this money comes from (and what subsequently gets cut), the mechanisms put in place to release this “extra” money, and the way in which they are set up so as to wind down and provide downward price pressure.

The second arm of the publisher argument has been that they provide “added value” over what the scholarly community contributes to the publication process. It has become a common refrain of the incumbent subscription publishers that they are not doing enough to explain this added value. Most recently David Crotty has posted at Scholarly Kitchen saying that this was a core theme of the recent SSP meeting. This value exists, but clearly we disagree on its magnitude. The problem is that we never see any actual figures given. But I think there are some recent numbers that can help us put some bounds on what this added value really is, and ironically they have been provided by the publisher associations in their efforts to head off six-month embargo periods.

When we talk about added value we can posit some imaginary “real” value, but this is not a useful number – there is no way we can determine it. What we can do is talk about realisable value, i.e. the amount the market is prepared to pay for the additional functionality being provided. I don’t think we are in a position to pin that number down precisely, and clearly it will differ between publishers, disciplines, and workflows, but what I want to do is attempt to pin down some points which I think help to bound it, both from the provider and the consumer side. In doing this I will use a few figures and reports as well as place an explicit interpretation on the actions of various parties. The key data points I want to use are as follows:

  1. All publisher associations and most incumbent publishers have actively campaigned against open access mandates that would make the final refereed version of a scholarly article, prior to typesetting, publication, indexing, and archival, available online in any form either immediately or within six months of publication. The Publishers Association (UK) and ALPSP are both on record as stating that such a mandate would be “unsustainable” and, most recently, that it would bankrupt publishers.
  2. In a survey of research libraries run by ALPSP (although there are a series of concerns that have to be raised about the methodology) a significant proportion of libraries stated that they would cut some subscriptions if the majority of research articles were available online six months after formal publication. The survey states that most respondents appeared to assume that the freely available version would be the original author version, i.e. not the version that was peer reviewed.
  3. There are multiple examples of financially viable publishing houses running a pure Open Access programme with average author charges of around $1,500. These are concentrated in the life and medical sciences, where there is both significant funding and no existing culture of pre-print archives.
  4. The SCOAP3 project has created a formal journal publication framework which will provide open access to peer-reviewed papers for a community that does have a strong pre-print culture built around the arXiv.

Let us start at the top. Publishers actively campaign against a reduction of embargo periods. This makes it clear that they do not believe that the product they provide, in transforming the refereed version of a paper into the published version, has sufficient value that their existing customers will pay for it at the existing price. That is remarkable and a frightening hole at the centre of our current model. The service providers can only provide sufficient added value to justify the current price if they additionally restrict access to the “non-added-value” version. A supplier that was confident about the value that they add would have no such issues, indeed they would be proud to compete with this prior version, confident that the additional price they were charging was clearly justified. That they do not should be a concern to all of us, not least the publishers.

Many publishers also seek to restrict access to any prior version, including the author’s original version prior to peer review. These publishers don’t even believe that their management of the peer review process adds sufficient value to justify the price they are charging. This is shocking. The ACS, for instance, has so little faith in the value that it adds that it seeks to control all prior versions of any paper it publishes.

But what of the customer? Well, the ALPSP survey, if we take the summary at face value as I have suggested above, indicates that libraries also doubt the value added by publishers. This is more of a quantitative argument, but the fact that some libraries would cancel some subscriptions shows that the community doesn’t believe the current price is worth paying, even allowing for a six-month delay in access. So broadly speaking we can see that both the current service providers and the current customers do not believe that the costs of the pure service element of subscription-based scholarly publication are justified by the value added through this service. In combination this means we can place an upper bound on the value added by publishers.

Of the approximately $10B currently paid in cash costs to recompense publishers for their work in facilitating scholarly communications, neither the incumbent subscription publishers nor their current library customers believe that the value added by publishers justifies the current cost, absent artificial restrictions on access to the non-value-added version.

This tells us not very much about what the realisable value of this work actually is, but it does provide an upper bound. What about a lower bound? One approach would be to turn to the services provided to authors by Open Access publishers. These costs are willingly incurred by a paying customer, so it is tempting to use them directly as a lower bound. This is probably reasonable in the life and medical sciences, but as we move into other disciplinary areas, such as mathematics, it is clear that that cost level is not seen as attractive enough. In addition the life and medical sciences have no tradition of wide availability of pre-publication versions of papers. For these disciplines, then, the willingness to pay the approximately $1,500 average cost of APCs is in part bound up with the wish to make the paper effectively available through recognised outlets. We have not yet separated the value of making the original copy available from the added value provided by the publishing service. The $1,000-1,500 mark is nonetheless a touchstone worth bearing in mind for these disciplines.

To make a fair comparison we need a space where there is a thriving pre-print culture and a demonstrated willingness to pay a defined price for added value, in the form of formal publication, over and above this existing availability. The Sponsoring Consortium for Open Access Publishing in Particle Physics (SCOAP3) is an example of precisely this. The particle physics community has essentially decided unilaterally to assume control of the journals for its area and has placed its service requirements out for tender. Unfortunately this means we don’t have the final prices yet, but we will soon, and the executive summary of the working party report suggests a reasonable price range of €1000-2000. If the successful tender comes in at, or slightly below, the lower end of this range, we will see an accepted price for added value, over that already provided by the arXiv for this disciplinary area, that is not a million miles away from that figure of $1,500.
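As a rough sanity check, the comparison can be sketched numerically. The only figures used are the ones quoted in this post; the euro-to-dollar rate (~1.3, roughly the rate at the time of writing) is my assumption, not a figure from the tender documents:

```python
# Back-of-envelope check: does the SCOAP3 tender range bracket the
# observed open-access APC touchstone? Price figures are from the post;
# the EUR->USD conversion rate is an assumption.
EUR_TO_USD = 1.3  # assumed exchange rate, circa the time of writing

scoap3_eur = (1000, 2000)  # working party's suggested range per paper
scoap3_usd = tuple(round(e * EUR_TO_USD) for e in scoap3_eur)

apc_touchstone = 1500  # typical life/medical sciences OA APC (USD)

print(f"SCOAP3 range: ${scoap3_usd[0]}-{scoap3_usd[1]}")
in_range = scoap3_usd[0] <= apc_touchstone <= scoap3_usd[1]
print(f"${apc_touchstone} APC falls inside the range: {in_range}")
```

On that assumed rate the range works out to roughly $1,300-2,600, which comfortably brackets the $1,500 touchstone; the conclusion is insensitive to modest changes in the assumed exchange rate.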

Of course this is before real price competition in this space is factored in. The realisable value is a function of the market and as prices inevitably drop there will be downward pressure on what people are willing to pay. There will also be increasing competition from archives, repositories, and other services that are currently free or near free to use, as they inevitably increase the quality and range of the services they offer. Some of these will mirror the services provided by incumbent publishers.

A reasonable current lower bound for realisable added value by publication service providers is ~$1000 per paper. This is likely to drop as market pressures come to bear and existing archives and repositories seek to provide a wider range of low cost services.

Where does this leave us? Not with a clear numerical value we can ascribe to this added value, but that was always going to be a moving target. We can, however, get some sense of the bottom end of the range. It is currently $1,000 or more, at least in some disciplines, but is likely to go down. It is also likely to diversify as new providers offer subsets of the services currently offered as one indivisible lump. At the top end, the actions of both customers and service providers suggest they believe that the added value is less than what we currently pay, and that only artificial controls over access to the non-value-added versions justify the current price. What we need is a better articulation of the real value that publishers add, and an honest conversation about what we are prepared to pay for it.


A tale of two analysts

Understanding how a process looks from outside our own echo chamber can be useful. It helps to calibrate and sanity-check our own responses. It adds an external perspective and at its best can save us from our own overly fixed ideas. In the case of the ongoing Elsevier boycott we even have perspectives coming from two opposed directions. The two analyst/brokerage firms Bernstein and Exane BNP Paribas have recently published reports on how they think recent events should affect the view of those investing in Reed Elsevier. In the weeks following the start of the boycott Elsevier’s stock price dropped – was this an indication of serious structural problems in the business revealed by the boycott (the Bernstein view) or just a short-term overreaction that provides an opportunity for a quick profit (the Exane view)?

Claudio Aspesi from Bernstein has been negative on Elsevier stock for some time [see Stephen Curry’s post for links and the most recent report], citing the structural problem that the company is stuck in a cycle of publishing more, losing subscriptions, charging more, and managing to squeeze out a little more profit for shareholders in each cycle. Aspesi has long argued that this simply can’t go on. He also makes the link between the boycott and a potentially increased willingness of libraries to drop subscriptions or abandon big deals altogether. He is particularly scathing about the response to the boycott, arguing that Elsevier is continuing to estrange the researcher community and that this must ultimately be disastrous. In particular the report focuses on the claims management have made of their ability to shift the cost base away from libraries and onto researchers, based on “excellent relations with researchers”.

The Exane view, on the other hand, is that this is a storm in a teacup [summary at John Baez’s G+]. They point to the relatively small number of researchers signing up to the boycott, particularly in the context of the much larger numbers involved in similar pledges in 2001 and 2007. In doing this I feel they are missing the point – the environment of those boycotts was entirely different, both in terms of disciplines and targeting – but an objective observer might well view me as biased.

I do however find this report complacent on details – claiming as it does that the “low take-up of this petition is a sign of the scientific community’s improving perception of Elsevier”, an indication of a lack of real data on researcher sentiment. They appear to have bought the Elsevier line on “excellent relations” uncritically – and what I see on the ground is barely suppressed fury that is increasingly boiling over. It also focuses on OA as a threat – not an opportunity – for Elsevier, a view which would certainly lead me to discount their long term views on the company’s stock price. Their judgement for me is brought even further into question by the following:

“In our DCF terminal value, we capture the Open Access risk by assuming the pricing models flip to Gold Open Access with average revenue per article of USD3,000. Even on that assumption, we find value in the shares.”

Pricing the risk at this level is risible. The notion that Elsevier could flip to an author-pays model charging US$3,000 an article is absurd. The poor take-up of Elsevier’s current options and the massive growth of PLoS ONE and its clones at half this price set a clear price point, and one that is likely a high-water mark for journal APCs. If there is value in the shares at $3,000 then I can’t help but feel there won’t be very much at a likely end-point price well below $1,000.
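To make the scale of that gap concrete, here is a minimal sketch of the relative per-article revenue under the price points discussed above. The only figures used are the ones quoted in this post, and because these are pure ratios no assumption about Elsevier’s article volumes is needed:

```python
# Relative per-article revenue under different Gold OA price points.
# $3,000 is the Exane DCF assumption; $1,500 is "half this price"
# (the PLoS ONE-and-clones level); $1,000 stands in for the suggested
# sub-$1,000 end point. Ratios only, so article volumes cancel out.
exane_apc = 3000
price_points = {"PLoS ONE level": 1500, "likely end point": 1000}

for label, apc in price_points.items():
    ratio = apc / exane_apc
    print(f"{label}: ${apc} = {ratio:.0%} of the revenue Exane models")
```

At the PLoS ONE price point Elsevier would collect half the per-article revenue the Exane model assumes, and at a sub-$1,000 end point less than a third, which is the sense in which the $3,000 assumption prices the risk too generously.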

However both reports appear to me to fail to recognize one very important aspect of the situation – its volatility. As I understand it, these firms make their names by being right when they take positions away from the consensus, and thus they have a tendency to report their views as certainties. In this case I think the situation could swing either way very suddenly. As the Bernstein report notes, the defection of editorial staff from Elsevier journals is the most significant risk. A single board defection from a middle- to high-ranking journal – or a signal from a major society journal that it will not renew an Elsevier contract – could very easily start a landslide that ends Elsevier’s dominance as the largest research publisher. Equally, nothing much could happen, which would likely lead to a short-term rally in the stock price. But no-one is in a position to guess how this is going to play out.

In the long term I side with Aspesi – I see nothing in the overall tenor of Elsevier’s position statements that suggests to me that they really understand the research community, its environment, or how it is changing. Their pricing model for hybrid options seems almost designed to fail. As mandates strengthen it appears the company is likely to continue to fight them rather than adapt. But to accept my analysis you need to believe my view that the subscription business model is no longer fit for purpose.

What this shows, more than anything else, is that the place where the battle for change will ultimately be fought is the stock market. While Elsevier continues to tell its shareholders that it can deliver continuing profit growth from scholarly publishing with a subscription business model, it will be trapped into defending that business model against all threats. The Research Works Act is a part of that fight, as will be attempts to block simple and global mandates by funders on researchers elsewhere. While the shareholders believe that the status quo can continue, the senior management of the company is trapped by a legacy mindset. Until shareholders accept that the company needs to take a short-term haircut, the real investment required for change seems unlikely. And I don’t mean a few million here or there. I mean a full year’s profits ploughed back into the company over a few years to allow for root-and-branch change.

The irony is that large-scale change requires the investors to get spooked. For that to happen something has to go very publicly wrong. The uproar over the support of SOPA and the RWA is not, yet, enough to convince analysts beyond Aspesi that something is seriously wrong. It is an interesting question what would be. My sense is that nothing big enough will come along soon enough, and that those structural issues will gradually come into play, leading to a long-term decline. It may be that we are very near “Peak Elsevier”. Your mileage, of course, may vary.

In case it is not obvious: I am not competent to offer financial or investment advice, and no-one should treat the preceding as any form of such.


The Research Works Act and the breakdown of mutual incomprehension


When the history of the Research Works Act, and the reaction against it, is written, it will point to the factors that allowed smart people with significant marketing experience to walk, eyes wide open, into the teeth of a storm that thousands of people would have predicted with complete confidence. That story will turn on two utterly incompatible world views of scholarly communication. The interesting thing is that, with the benefit of hindsight, both will be totally incomprehensible to an observer five or ten years from now. It seems worthwhile, therefore, to try to detail those world views as I understand them.

The scholarly publisher

The publisher world view places them as the owner and guardian of scholarly communications. While publishers recognise that researchers provide the majority of the intellectual property in scholarly communication, their view is that researchers willingly and knowingly gift that property to the publishers in exchange for a set of services that they appreciate and value. In this view everyone is happy as a trade is carried out in which everyone gets what they want. The publisher is free to invest in the service they provide and has the necessary rights to look after and curate the content. The authors are happy because they can obtain the services they require without having to pay cash up front.

Crucial to this world view is a belief that research communication, the process of writing and publishing papers, is separate from the research itself. This is important because otherwise it would be clear, at least in an ethical sense, that the writing of papers is work for hire for the funders, part and parcel of the contract of research. For the publishers the fact that no funding contract specifies that “papers must be published” is the primary evidence of this.

The researcher

The researcher’s perspective is entirely different. Researchers view their outputs as their own property: the ideas, the physical outputs, and the communications. Within institutions you see this in the uneasy relationship between researchers and research translation and IP exploitation offices. Institutions try to avoid inflaming the issue by ensuring that economic returns on IP go largely to the researcher, at least until there is real money involved. At that stage the issue is usually fudged, as extra investment is required which dilutes ownership. But scratch a researcher who has gone down the exploitation path and then been gently pushed aside, and you’ll get a feel for the sense of personal ownership involved.

Researchers have a love-hate relationship with papers. Some people enjoy writing them, although I suspect this is rare. I’ve never met any researcher who did anything but hate the process of shepherding a paper through the review process. The service, as provided by the publisher, is viewed with deep suspicion. The resentment that is often expressed by researchers for professional editors is primarily a result of a loss of control over the process for the researcher and a sense of powerlessness at the hands of people they don’t trust. The truth is that researchers actually feel exactly the same resentment for academic editors and reviewers. They just don’t often admit it in public.

So from a researcher’s perspective, they have spent an inordinate amount of effort on a great paper. This is their work, their property. They are now obliged to hand over control of this to people they don’t trust to run a process they are unconvinced by. Somewhere along the line they sign something. Mostly they’re not too sure what that means, but they don’t give it much thought, let alone read it. But the idea that they are making a gift of that property to the publisher is absolute anathema to most researchers.

To be honest, researchers don’t care that much about a paper once it’s out. It caused enough pain and they don’t ever want to see it again. This may change over time if people start to cite it and refer to it in supportive terms, but most people won’t really look at a paper again. It’s a line on a CV, a notch on the bedpost. What they do notice is the cost of, or lack of access to, other people’s papers. Library budgets are shrinking, subscriptions are being chopped, and personal subscriptions don’t seem to be affordable any more.

The first response when researchers meet is “why can’t we afford access to our own work?” The second, given the general lack of respect for the work that publishers do, is to start down the path of claiming that they could do it better. Much of the rhetoric around eLife as a journal “led by scientists” is built on this view. And a lot of it is pure arrogance. Researchers neither understand nor, for the most part, appreciate the work of copyediting and curation, layout and presentation. While there are tools today that can do many of these things more cheaply, there are very few researchers who could use them effectively.

The result…kaboom!

So the environment that set the scene for the Research Works Act revolt was a combination of simmering resentment amongst researchers at the cost of accessing the literature and a lack of understanding of what it is publishers actually do. The spark that set it off was the publisher rhetoric about ownership of the work. This was always going to happen one day. The mutually incompatible world views could co-exist while there was still enough money to go around. While librarians felt trapped between researchers who demanded access to everything and publishers offering deals that just about meant they could scrape by, things could continue.

Fundamentally, once publishers started publicly using the term “appropriation of our property” the spark had flown. From the publisher perspective this makes perfect sense: the NIH mandate is a unilateral appropriation of their property. From the researcher perspective it is a system that adds a bit of pressure to do something they know is right, promoting access, without causing them too much additional pain. Researchers feel they ought to be doing something to improve access to research outputs, but for the most part they’re not too sure what, because they sure as hell aren’t in a position to change the journals they publish in. That would be (perceived to be) career suicide.

The elephant in the room

But it is of course the funder perspective that we haven’t yet discussed, and looking forward, in my view it is the action of funders that will render both the publisher and researcher perspectives incomprehensible in ten years’ time. The NIH view, similar to that of the Wellcome Trust, and indeed every funder I have spoken to, is that research communication is an intrinsic part of the research they fund. Funders take a close interest in the outputs their research generates. One might say a proprietorial interest, because again there is a strong sense of ownership. The NIH mandate expresses this through the grant contract: researchers are required to grant to the NIH a license to hold a copy of their research work.

In my view it is through research communication that research has outcomes and impact. From the perspective of a funder, the main interest is that the research they fund generates those outcomes and impacts. For a mission-driven funder the current situation signals one thing, and it signals it very strongly: neither publishers nor researchers can be trusted to do this properly. Funders will therefore move to stronger mandates, more along the Wellcome Trust lines than the NIH lines, and this will expand. At the end of the day the funders hold all the cards. Publishers never really had a business model; they had a public subsidy. The holders of those subsidies can only really draw one conclusion from current events: that they are going to have to be much more active in where they spend it in order to perform their mission.

The smart funders will work with the pre-existing prejudices of researchers, probably granting copyright and IP rights to the researchers but placing tighter constraints on the terms of forward licensing. That funders don’t really need the publishers has been made clear by HHMI, the Wellcome Trust, and the MPI. Publishing costs are a small proportion of their total expenditure; if necessary they have the resources and the will to take the process in house. The NIH has taken a similar route, though implemented technically in a different way. Other funders will allow these experiments to run, but ultimately they will adopt the approaches that appear to work.

Bottom line: within ten years all major funders will mandate CC-BY Open Access on publications arising from work they fund, effective immediately on publication. Several major publishers will not survive the transition. A few will, and a whole set of new players will spring up to fill the spaces. The next ten years look to be very interesting.


Forward linking and keeping context in the scholarly literature

Alchemical symbol for arsenic
Image via Wikipedia

Last Friday I spoke at the STM Innovation Seminar in London, taking up in general terms the theme I’ve been developing recently: focussing on enabling user discovery rather than providing central filtering, on enabling people to act as their own gatekeepers rather than publishers taking that role on for themselves.

An example I used, one I’ve used before, was the hydride oxidation paper that was published in JACS, comprehensively demolished online, and subsequently retracted. The point I wanted to make was that the detailed information, the comprehensive view of what had happened, was only available by searching Google. In retrospect, as has been pointed out to me in private communication, this wasn’t such a good example because there is often more detailed information available in the published retraction. It isn’t always as visible as I might like, particularly to automated systems, but the ACS actually does a pretty good job overall with retractions.

Had it come a few days earlier, the arsenic microbes paper and the subsequent detailed critique might well have made a better example. Here again, the detailed criticism is not visible from the paper itself but only through a general search on the web, or via specialist indexes like researchblogging.org. An external reader arriving at the paper would have no idea that this conversation was even occurring. The best-case scenario is that if and when a formal critique is published it will be visible from the page, but even then it can easily be buried among other citations from the formal literature.

The arsenic story is still unfolding and deserves close observation, as does the critique of the P/NP paper from a few months ago. However, a broader trend does appear to be evident: if a high-profile paper is formally published, it will receive detailed, public critique. This in itself is remarkable. Open peer review is happening, even becoming commonplace, an expected consequence of the release of big papers. What is perhaps even more encouraging is that when that critique starts it seems capable of aggregating sufficient expertise to make the review comprehensive. When Rosie Redfield first posted her critique of the arsenic paper I noted that she skipped over the EXAFS data, which I felt could be decisive. Soon after, people with EXAFS expertise were in the comments section of the blog post, pulling it apart [1, 2, 3, 4].

Two or three things jump out at me here. First, the complaint that people “won’t comment on papers” now seems outdated. Sufficiently high-profile papers will receive criticism, and woe betide those journals that aren’t able to summon a very comprehensive peer review panel for them. Second, this review is not happening on journal websites, even when journals provide commenting fora. The reasons for this are, in my opinion, reasonably clear. The first is that journal websites are walled gardens, often requiring sign-in, often with irritating submission or review policies. People simply can’t be arsed. The second is that people are much more comfortable commenting in their own spaces: their own blogs, their community on Twitter or Facebook. These may not be private, but they feel safer, less wide open.

This leads on to the third point. I’ve been asked recently to try to identify what publishers (widely drawn) can do to take advantage of social media in general terms. Forums and comments haven’t really worked, not on journal websites. Other ventures have had some successes, some failures, but nothing has taken the world by storm.

So what to do? For me the answer is starting to form, and it might seem obvious. The conversation will always take place externally. Conversations happen where people come together, and people fundamentally don’t come together on journal websites. The challenge is to capture this conversation and use it to keep the static paper in context. I’d like to ditch the concept of the version of record, but it’s not going to happen. What we can do, what publishers could do to add value and, drawing on the theme of my talk, to build new discovery paths that lead to the paper, is to keep annotating, keep linking, keep building the story around the paper as it develops.

This is both technically achievable and it would add value that doesn’t really exist today. It’s something that publishers with curation and editorial experience and the right technical infrastructure could do well. And above all it is something that people might find of sufficient value to pay for.


Binary decisions are a real problem in a grey-scale world

Peer Review Monster
Image by Gideon Burton via Flickr

I recently made the most difficult decision I’ve had to take thus far as a journal editor. That decision was ultimately to accept the paper; that probably doesn’t sound difficult until I explain that I made it despite a referee recommending rejection, with no opportunity for resubmission, not once but twice.

One of the real problems I have with traditional pre-publication peer review is the way it takes a very nuanced question about a work with many different parts and demands a hard yes/no decision. I could point to many papers that will probably remain unpublished where the methodology or the data might have been useful but there was disagreement about the interpretation. Or where there was no argument except that perhaps this was the wrong journal (with no suggestion of what the right one might be). Recently we had a paper rejected because we didn’t try to make up some spurious story about the biological reason for an interesting physical effect. Of course we wanted to publish in a biologically slanted journal; that’s where it might come to the attention of people with ideas about what the biological relevance was.

So the problem is two-fold. Firstly that the paper is set up in a way that requires it to go forward or to fail as a single piece, despite the fact that one part might remain useful while another part is clearly wrong. The second is that this decision is binary, there is no way to “publish with reservations about X”, in most cases indeed no way to even mark which parts of the paper were controversial within the review process.

Thus when faced with this paper where, in my opinion, the data reported were fundamentally sound and well expressed but the interpretation perhaps more speculative than the data warranted, I was torn. The guidelines of PLoS ONE are clear: conclusions must be supported by valid evidence. Yet the data, even if the conclusions are proven wrong, are valuable in their own right. The referee objected fundamentally to the strength of the conclusions as well as having some doubts about the way those conclusions were drawn.

So we went through a process of couching the conclusions in much more careful terms and expanding the discussion of caveats and alternative interpretations. Did this fundamentally change the paper? Not really. Did it take a lot of time? Yes, months in the end. But in the end it felt like a choice between making the paper fit the guidelines, or blocking the publication of useful data. I hope the disagreement over the interpretation of the results, and even the validity of the approach, will play out in the comments on the paper or in the wider literature.

Is there a solution? Well I would argue that if we published first and then reviewed later this would solve many problems. Continual review and markup as well as modification would match what we actually do as our ideas change and the data catches up and propels us onwards. But making it actually happen? Still very hard work and a long way off.

In any case, you can always comment on the paper if you disagree with me. I just have.


It’s not information overload, nor is it filter failure: It’s a discovery deficit

Clay Shirky
Image via Wikipedia

Clay Shirky’s famous soundbite has helped to focus minds on the way information on the web needs to be tackled, and on a move towards managing the process of selecting and prioritising information. But in the research space I’m getting a sense that it is fuelling a focus on preventing publication in a way that is analogous to the conventional filtering process involved in peer-reviewed publication.

Most recently this surfaced at the Chronicle of Higher Education, to which there were many responses, Derek Lowe’s being one of the most thought-out. But this is not isolated.

@JISC_RSC_YH: How can we provide access to online resources and maintain quality of content? #rscrc10 [twitter via @branwenhide]

Me: @branwenhide @JISC_RSC_YH isn’t the point of the web that we can decouple the issues of access and quality from each other? [twitter]

There is a widely held assumption that putting more research onto the web makes it harder to find the research you are looking for. In fact the opposite is true: publishing more makes discovery easier.

The great strength of the web is that you can allow publication of anything at very low marginal cost without limiting the ability of people to find what they are interested in, at least in principle. Discovery mechanisms are good enough, while being a long way from perfect, to make it possible to mostly find what you’re looking for while avoiding what you’re not looking for.  Search acts as a remarkable filter over the whole web through making discovery possible for large classes of problem. And high quality search algorithms depend on having a lot of data.

It is very easy to say there is too much academic literature – and I do. But the solution that seems to be becoming popular is to argue for an expansion of the traditional peer review process – to prevent stuff getting onto the web in the first place. This is misguided for two important reasons. Firstly, it takes the highly inefficient and expensive process of manual curation and attempts to apply it to every piece of research output created. This doesn’t work today and won’t scale as the diversity and sheer number of research outputs increases tomorrow. Secondly, it doesn’t take advantage of the nature of the web. The way to do this efficiently is to publish everything at the lowest cost possible, and then enhance the discoverability of work that you think is important. We don’t need publication filters, we need enhanced discovery engines. Publishing is cheap; curation is expensive, whether it is applied to filtering or to markup and search enhancement.

Filtering before publication worked, and was probably the most efficient place to apply the curation effort, when the major bottleneck was publication. Value was extracted from the curation process of peer review by using it to reduce the costs of layout, editing, and printing, through simply printing less. But it created new costs, and invisible opportunity costs where a key piece of information was not made available. Today the major bottleneck is discovery. Of the 500 papers a week I could read, which ones should I read, and which ones just contain a single nugget of information which is all I need? In the Research Information Network study of the costs of scholarly communication, the largest component of the publication creation and use cycle was peer review, followed by the cost of finding the articles to read, which represented some 30% of total costs. On the web, the place to put in the curation effort is in enhancing discoverability, in providing me the tools that will identify what I need to read in detail, what I just need to scrape for data, and what I need to bookmark for my methods folder.

The problem we have in scholarly publishing is an insistence on applying this print paradigm publication filtering to the web alongside an unhealthy obsession with a publication form, the paper, which is almost designed to make discovery difficult. If I want to understand the whole argument of a paper I need to read it. But if I just want one figure, one number, the details of the methodology then I don’t need to read it, but I still need to be able to find it, and to do so efficiently, and at the right time.

Currently scholarly publishers vie for the position of biggest barrier to communication. The stronger the filter, the higher the notional quality. But being a pure filter play doesn’t add value because the costs of publication are now low. The value lies in presenting, enhancing, curating the material that is published. If publishers instead vied to identify, mark up, and make it easy for the right people to find the right information, they would be working with the natural flow of the web. Make it easy for me to find the piece of information, feature work that is particularly interesting or important, re-interpret it so I can understand it coming from a different field, preserve it so that when a technique becomes useful in 20 years the right people can find it. The brand differentiator then becomes which articles you choose to enhance, what kind of markup you do, and how well you do it.

All of these are things that publishers already do. And they are services that authors and readers will be willing to pay for. But at the moment the whole business and marketing model is built around filtering, and selling that filter. By impressing people with how much you are throwing away. Trying to stop stuff getting onto the web is futile, inefficient, and expensive. Saving people time and money by helping them find stuff on the web is an established and successful business model both at scale, and in niche areas. Providing credible and respected quality measures is a viable business model.

We don’t need more filters or better filters in scholarly communications – we don’t need to block publication at all. Ever. What we need are tools for curation and annotation and re-integration of what is published. And a framework that enables discovery of the right thing at the right time. And the data that will help us to build these. The more data, the more research published, the better. Which is actually what Shirky was saying all along…


In defence of author-pays business models

Latest journal ranking in the biological sciences
Image by cameronneylon via Flickr

There has been an awful lot recently written and said about author-pays business models for scholarly publishing and a lot of it has focussed on PLoS ONE.  Most recently Kent Anderson has written a piece on Scholarly Kitchen that contains a number of fairly serious misconceptions about the processes of PLoS ONE. This is a shame because I feel this has muddled the much more interesting question that was intended to be the focus of his piece. Nonetheless here I want to give a robust defence of author pays models and of PLoS ONE in particular. Hopefully I can deal with the more interesting question, how radical should or could PLoS be, in a later post.

A common charge levelled at author-payment funded journals is that they are pushed in the direction of being non-selective. The figure that PLoS ONE publishes around 70% of the papers it receives is often given as a demonstration of this. There are a range of reasons why this is nonsense. The first and simplest is that the evidence we have suggests that between 50% and 95% of papers rejected from journals are ultimately published elsewhere [1, 2 (pdf), 3, 4]. The cost of this trickle-down, a result of the use of subjective selection criteria of “importance”, is enormous in authors’ and referees’ time and represents a significant potential opportunity cost in terms of lost time. PLoS ONE seeks to remove this cost by simply asking “should this be published?” In the light of the figures above it seems that 70% is a reasonable proportion of papers that are probably “basically ok but might need some work”.

The second presumption is that the peer review process is somehow “light touch”. This is perhaps the result of some mis-messaging that went on early in the history of PLoS ONE but it is absolute nonsense. As both an academic editor and an author I would argue that the peer review process is as rigorous as I have experienced at any other journal (and I do mean any other journal).

As an author I have two papers published in PLoS ONE, both went through at least one round of revision, and one was initially rejected. As an editor I have seen two papers withdrawn after the initial round of peer review, presumably not because the authors felt that the required changes represented a “light touch”. I have rejected one and have never accepted a paper without revision. Every paper I have edited has had at least one external peer reviewer and I try to get at least two. Several papers have gone through more than one cycle of revision with one going through four. Figures provided by Pete Binfield (comment from Pete about 20 comments in) suggest that this kind of proportion is about average for PLoS ONE Academic Editors. The difference between PLoS ONE and other journals is that I look for what is publishable in a submission and work with the authors to bring that out rather than taking delight in rejecting some arbitrary proportion of submissions and imagining that this equates to a quality filter. I see my role as providing a service.

The more insidious claim made is that there is a link between this supposed light touch review and the author pays models; that there is pressure on those who make the publication decision to publish as much as possible. Let me put this as simply as possible. The decision whether to publish is mine as an Academic Editor and mine alone. I have never so much as discussed my decision on a paper with the professional staff at PLoS and I have never received any payment whatsoever from PLoS (with the possible exception of two lunches and one night’s accommodation for a PLoS meeting I attended – and I missed the drinks reception…). If I ever perceived pressure to accept or was offered inducements to accept papers I would resign immediately and publicly as an AE.

That an author pays model has the potential to create a conflict of interest is clear. That is why, within reputable publishers, structures are put in place to reduce that risk as far as is possible, divorcing the financial side from editorial decision making, creating Chinese walls between editorial and financial staff within the publisher.  The suggestion that my editorial decisions are influenced by the fact the authors will pay is, to be frank, offensive, calling into serious question my professional integrity and that of the other AEs. It is also a slightly strange suggestion. I have no financial stake in PLoS. If it were to go under tomorrow it would make no difference to my take home pay and no difference to my finances. I would be disappointed, but not poorer.

Another point that is rarely raised is that the author pays model is much more widely used than people generally admit. Page charges and colour charges for many disciplines are of the same order as Open Access publication charges. The Journal of Biological Chemistry has been charging page rates for years while increasing publication volume. Author fees of one sort or another are very common right across the biological and medical sciences literature. And it is not new. Bill Hooker’s analysis (here and here) of these hidden charges bears reading.

But the core of the argument for author payments is that the market for scholarly publishing is badly broken. Until the pain of the costs of publication is directly felt by those making the choice of where to (try to) publish we will never change the system. The market is also the right place to have this out. It is value for money that we should be optimising. Let me illustrate with an example. I have heard figures of around £25,000 given as the level of author charge that would be required to sustain Cell, Nature, or Science as Open Access APC supported journals. This is usually followed by a statement to the effect “so they can’t possibly go OA because authors would never pay that much”.

Let’s unpack that statement.

If authors were forced to make a choice between the cost of publishing in these top journals versus putting that money back into their research, many would choose the latter. If the customer actually had to pay the true costs of publishing in these journals, they wouldn’t. If journals believed that authors would see the real cost as good value for money, many of them would have made that switch years ago. Subscription charges as a business model have allowed an appallingly wasteful situation to continue unchecked because authors can pretend that there is no difference in cost to where they publish; they accept that premium offerings are value for money because they don’t have to pay for them. Make them choose between publishing in a “top” journal versus a “quality” journal and getting another few months of postdoc time, and the equation changes radically. Maybe £25k is good value for money. But it would be interesting to find out how many people think that.
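As a back-of-envelope illustration of that trade-off: the £25,000 figure comes from the discussion above, but the “quality journal” charge and the monthly cost of a postdoc below are assumptions invented purely for this sketch, not quoted costs.

```python
# Rough sketch: how much postdoc time does the APC premium for a "top"
# journal buy? All figures except the 25,000 are illustrative assumptions.
APC_TOP_JOURNAL = 25_000        # rumoured charge to sustain a Cell/Nature/Science-level journal
APC_QUALITY_JOURNAL = 1_500     # assumed charge at a solid "quality" journal
POSTDOC_COST_PER_MONTH = 4_000  # assumed fully-loaded monthly cost of a postdoc

premium = APC_TOP_JOURNAL - APC_QUALITY_JOURNAL
months_forgone = premium / POSTDOC_COST_PER_MONTH

print(f"The 'top' journal premium is £{premium:,}, "
      f"or roughly {months_forgone:.1f} postdoc-months forgone")
```

Under these assumptions the premium is nearly six months of postdoc time, which is exactly the kind of comparison a functioning market would force authors to make.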

We need a market where the true costs are a factor in the choices of where, or indeed whether, to formally publish scholarly work. Today, we do not have that market and there is little to no pressure to bring down publisher costs. That is why we need to move towards an author pays system.


The future of research communication is aggregation

Paper as aggregation
Image by cameronneylon via Flickr

“In the future everyone will be a journal editor for 15 minutes” – apologies to Andy Warhol

Suddenly it seems everyone wants to re-imagine scientific communication. From the ACS symposium a few weeks back to a PLoS Forum, via interesting conversations with a range of publishers, funders and scientists, it seems a lot of people are thinking much more seriously about how to make scientific communication more effective, more appropriate to the 21st century and above all, to take more advantage of the power of the web.

For me, the “paper” of the future has to encompass much more than just the narrative descriptions of processed results we have today. It needs to support a much more diverse range of publication types – data, software, processes, protocols, and ideas – as well as provide a rich and interactive means of diving into the detail where the user is interested and skimming over the surface where they are not. It needs to provide re-visualisation and streaming under the user’s control, and crucially it needs to provide the ability to repackage the content for new purposes: education, public engagement, even mainstream media reporting.

I’ve got a lot of mileage recently out of thinking about how to organise data and records by ignoring the actual formats and thinking more about what the objects I’m dealing with are, what they represent, and what I want to do with them. So what do we get if we apply this thinking to the scholarly published article?

For me, a paper is an aggregation of objects. It contains text, divided up into sections, often with references to other pieces of work. Some of these references are internal, to figures and tables, which are representations of data in some form or another. The paper world of journals has led us to think about these as images, but a much better mental model for figures on the web is of an embedded object, perhaps a visualisation from a service like Many Eyes, Swivel, or Tableau Public. Why is this better? It is better because it maps more effectively onto what we want to do with the figure. We want to use it to absorb the data it represents, and to do this we might want to zoom, pan, re-colour, or re-draw the data. But we want to know if we do this that we are using the same underlying data, so the data needs a home, an address somewhere on the web, perhaps with the journal, or perhaps somewhere else entirely, that we can refer to with confidence.

If that data has an individual identity it can in turn refer back to the process used to generate it, perhaps in an online notebook or record, perhaps pointing to a workflow or software process based on another website. Maybe when I read the paper I want that included, maybe when you read it you don’t – it is a personal choice, but one that should be easy to make. Indeed, it is a choice that would be easy to make with today’s flexible web frameworks if the underlying pieces were available and represented in the right way.

The authors of the paper can also be included as a reference to a unique identifier. Perhaps the authors of the different segments are different. This is no problem; each piece can refer to the people that generated it. Funders and other supporting players might be included by reference. Again this solves a real problem of today: different players are interested in how people contributed to a piece of work, not just who wrote the paper. Providing a reference to a person, where the link shows what their contribution was, can provide this much more detailed information. Finally, the overall aggregation of pieces that is brought together and published also has a unique identifier, often in the form of the familiar DOI.
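The paper-as-aggregation described above can be sketched as a very simple data model. Everything here – the class names, the example URIs, the DOI – is invented for illustration; a real system would use established identifiers (DOIs, ORCIDs) and a standard aggregation vocabulary rather than this toy structure.

```python
from dataclasses import dataclass, field

@dataclass
class WebObject:
    """Any addressable piece of a paper: text, figure, dataset, or person."""
    uri: str   # the object's own address on the web
    kind: str  # e.g. "text", "figure", "dataset", "author"

@dataclass
class Aggregation:
    """A 'paper': a coherent identity wrapped around existing web objects."""
    uri: str                                         # e.g. a DOI for the whole package
    parts: list = field(default_factory=list)        # internal references to the pieces

    def add(self, obj: WebObject) -> None:
        self.parts.append(obj)

# Each component lives at its own address; the figure's underlying data
# has a home we can refer to with confidence.
dataset = WebObject("https://example.org/data/expt-42", "dataset")
figure = WebObject("https://example.org/figs/fig1", "figure")
author = WebObject("https://example.org/people/a-researcher", "author")

paper = Aggregation("doi:10.0000/example.0001")
for part in (dataset, figure, author):
    paper.add(part)

print(f"{paper.uri} aggregates {len(paper.parts)} objects")
```

The point of the sketch is that the “paper” holds no content of its own: it is purely a set of references, which is why the same machinery supports data papers, protocols, or ideas equally well.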

This view of the paper is interesting to me for two reasons. The first is that it natively supports a wide range of publication or communication types, including data papers, process papers, protocols, ideas and proposals. If we think of publication as the act of bringing a set of things together and providing them with a coherent identity then that publication can be many things with many possible uses. In a sense this is doing what a traditional paper should do, bringing all the relevant information into a single set of pages that can be found together, as opposed to what they usually do, tick a set of boxes about what a paper is supposed to look like. “Is this publishable?” is an almost meaningless question on the web. Of course it is. “Is it a paper?” is the question we are actually asking. By applying the principles of what the paper should be doing as opposed to the straitjacket of a paginated, print-based document, we get much more flexibility.

The second aspect which I find exciting revolves around the idea of citation as both internal and external references describing the relationships between these individual objects. If the whole aggregation has an address on the web via a DOI or a URL, and if its relationships both to the objects that make it up and to other available things on the web are made clear in a machine-readable citation, then we have the beginnings of a machine-readable scientific web of knowledge. If we take this view of objects and aggregates that cite each other, and we provide details of what the citations mean (this was used in that, this process created that output, this paper is cited as an input to that one), then we are building the semantic web as a byproduct of what we want to do anyway. Instead of scaring people with angle brackets, we are using a paradigm that researchers understand and respect, citation, to build up meaningful links between packages of knowledge. We need authoring tools that help us build and aggregate these objects together, and tools that make forming these citations easy and natural by using the existing ideas around linking and referencing. If we can build those, we get the semantic web for science as a free side product – while also making it easier for humans to find the details they’re looking for.
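One minimal way of making these typed citation relationships machine readable is as subject–predicate–object triples, the model underlying RDF. The identifiers and predicate names below are invented for the sketch; a real implementation would draw the predicates from an existing vocabulary such as CiTO (the Citation Typing Ontology) rather than coining its own.

```python
# A citation graph with typed edges: each triple records not just that one
# object refers to another, but what the relationship means.
triples = [
    ("doi:10.0000/paper.B", "citesAsInput", "doi:10.0000/paper.A"),
    ("doi:10.0000/paper.B", "usesDataFrom", "https://example.org/data/expt-42"),
    ("https://example.org/figs/fig1", "wasGeneratedBy",
     "https://example.org/workflow/run-7"),
]

def outgoing(subject: str, graph: list) -> list:
    """All typed links from one object: the start of a machine-readable web."""
    return [(pred, obj) for s, pred, obj in graph if s == subject]

# Everything paper B draws on, with the meaning of each link preserved.
links = outgoing("doi:10.0000/paper.B", triples)
for predicate, target in links:
    print(f"doi:10.0000/paper.B --{predicate}--> {target}")
```

No angle brackets required of the author: the triples fall out of the ordinary acts of citing and linking, and a machine can traverse them to answer questions like “which datasets fed into this paper?”.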

Finally this view blows apart the monolithic role of the publisher and creates an implicit marketplace where anybody can offer aggregations that they have created to potential customers. This might range from a high school student putting their science library project on the web through to a large scale commercial publisher that provides a strong brand identity, quality filtering, and added value through their infrastructure or services. And everything in between. It would mean that large scale publishers would have to compete directly with the small scale on a value-for-money basis and that new types of communication could be rapidly prototyped and deployed.

There are a whole series of technical questions wrapped up in this view, in particular if we are aggregating things that are on the web, how did they get there in the first place, and what authoring tools will we need to pull them together. I’ll try to start on that in a follow-up post.


Avoid the pain and embarrassment – make all the raw data available

Enzyme

A story of two major retractions from a well known research group has been getting a lot of play over the last few days, with a News Feature (1) and Editorial (2) in the 15 May edition of Nature. The story turns on the claim that Homme Hellinga’s group was able to convert the E. coli ribose binding protein into a triose phosphate isomerase (TIM) using a computational design strategy. Two papers on the work appeared, one in Science (3) and one in J Mol Biol (4). However, another group, having obtained plasmids for the designed enzymes, could not reproduce the claimed activity. After many months of work that group established that the supposed activity appeared to be that of the bacterium’s native TIM and not that of the designed enzyme. The papers were retracted and Hellinga went on to accuse the graduate student who did the work of fabricating the results, a charge of which she was completely cleared.

Much of the heat the story is generating is about the characters involved and possible misconduct of various players, but that’s not what I want to cover here. My concern is about how much time, effort, and tears could have been saved if all the relevant raw data had been made available in the first place. Demonstrating a new enzymatic activity is very difficult work. It is absolutely critical to rigorously exclude the possibility of any contaminating activity, and in practice this is virtually impossible to guarantee. Therefore a negative control experiment is very important. It appears that this control experiment was carried out, but possibly only once, against a background of significant variability in the results. All of this led to another group wasting on the order of twelve months trying to replicate these results. Well, not wasting, but correcting the record – arguably a very important activity, but one for which they will get little credit in any meaningful sense (an issue for another post, and one mentioned by Noam Harel in a comment on the News Feature online).

So what might have happened if the original raw data were available? Would it have prevented the publication of the papers in the first place? It’s very hard to tell. The referees were apparently convinced by the quality of the data. But if this was ‘typical data’ (using the special scientific meaning of typical, viz. ‘the best we’ve got’) and the referees had seen the raw data with its greater variability, then maybe they would have wanted to see more or better controls; perhaps not. Certainly if the raw data were available the second group would have realised much sooner that something was wrong.

And this is a story we see over and over again: the selective publication of results without reference to the full set of data; a slight shortcut taken, or potential issues with the data, that are not revealed to referees or to the readers of the paper; other groups spending months or years attempting to replicate results or simply to use a method described by another group. And in the meantime graduate students and postdocs get burnt on the pyre of scientific ‘progress’, discovering that something isn’t reproducible.

The Nature editorial is subtitled ‘Retracted papers require a thorough explanation of what went wrong in the experiments’. In my view this goes nowhere near far enough. There is no longer any excuse for not providing all the raw and processed data as part of the supplementary information for published papers. Even in the form of scanned lab book pages this could have made a big difference in this case, immediately indicating the degree of variability and the purity of the proteins. Many may say that this is too much effort, that the data cannot be found. But if this is the case then serious questions need to be asked about the publication of the work. Publishers also need to play a role by providing more flexible and better indexed facilities for supplementary information, and making sure they are indexed by search engines.

Some of us go much further than this, and believe that making the raw data immediately available is a better way to do science. Certainly in this case it might have reduced the pressure to rush to publish, and might have forced a more open and more thorough scrutiny of the underlying data. This kind of radical openness is not for everyone perhaps, but it should be less prone to gaffes of the sort described here. I know I can have more faith in the work of my group where I can put my fingers on the raw data and check through the detail. We are still going through the process of implementing this move to complete (or as complete as we can be) openness and it’s not easy. But it helps.

Science has moved on from the days where the paper could only contain what would fit on the printed pages. It has moved on from the days when an informal circle of contacts would tell you which group’s work was repeatable and which was not. The pressures are high and the potential for career disaster probably higher. In this world the reliability and completeness of the scientific record is crucial. Yes there are technical difficulties in making it all available. Yes it takes effort, and yes it will involve more work, and possibly fewer papers. But the only thing that ultimately can really be relied on is the raw data (putting aside deliberate fraud). If the raw data doesn’t form a central part of the scientific record then we perhaps need to start asking whether the usefulness of that record in its current form is starting to run out.

  1. Wenner, M. Nature 453, 271–275 (2008).
  2. Editorial. Nature 453, 258 (2008).
  3. Dwyer, M. A., Looger, L. L. & Hellinga, H. W. Science 304, 1967–1971 (2004).
  4. Allert, M., Dwyer, M. A. & Hellinga, H. W. J. Mol. Biol. 366, 945–953 (2007).

Attribution for all! Mechanisms for citation are the key to changing the academic credit culture

A reviewer at the National Institutes of Health evaluates a grant proposal.Image via Wikipedia

Once again a range of conversations in different places have collided in my feed reader. Over on Nature Networks, Martin Fenner posted on Researcher ID, which led to a discussion about attribution and in particular Martin’s comment that there was a need to be able to link to comments and the necessity of timestamps. Then DrugMonkey posted a thoughtful blog post about the issue of funding body staff introducing ideas from unsuccessful grant proposals they have handled to projects which they have a responsibility in guiding.