First thoughts on the Finch Report: Good steps but missed opportunities

The Finch Report was commissioned by the UK Minister for Universities and Science to investigate possible routes for the UK to adopt Open Access for publicly funded research. The report was released last night and I have only had the chance to skim it over breakfast, so these are just first observations. My overall impression is that the direction of travel is very positive, but the detail shows some important missed opportunities.

The Good

The report comes out strongly in favour of Open Access to publicly funded research. Perhaps the core of this is found in the introduction [p5].

The principle that the results of research that has been publicly funded should be freely accessible in the public domain is a compelling one, and fundamentally unanswerable.

What follows this is a clear listing of other potential returns. On the cost side the report makes clear that in achieving open access through journals it is necessary that the first copy costs of publication be paid in some form, and that appropriate mechanisms are in place to make that happen. This focus on Gold OA results in large part from the terms of reference for the report, which placed retention of peer review at its heart. The other excellent aspect of the report is the detailed cost and economic modelling for multiple scenarios of UK Open Access adoption. These will be a valuable basis for discussion of managing the transition and how cost flows will change.

The bad

The report is maddeningly vague on the potential of repositories to play a major role in the transition to full open access. Throughout there is a focus on hybrid journals, a route which – with a few exceptions – appears to me to have failed to deliver any appreciable gains and has simply allowed publishers to charge unjustified fees for very limited services. By comparison the repository offers an existing infrastructure that can deliver at relatively low marginal cost and will enable a dispassionate view of the additional value that publishers add. Because the value of peer review was baked into the report as an assumption, this important issue gets lost, but as I have noted before, if publishers are adding value then repositories should pose no threat to them whatsoever.

The second issue I have with the report is that it fails to address the question of what Open Access is; it does not seek to define it. This is a difficult issue and I can appreciate that a strict definition may be best avoided, but the report does not even raise the issues that such a definition would require. In this it misses an opportunity to lay out clearly the discussions required to make decisions on the critical question of what is functionally required to realise the benefits laid out in the introduction. Thus in the end it is a report on increasing access, but with no clear statement of what level of access is desirable or what the end target might look like.

This is most acute on the issue of licences for open access content, which has been seriously fudged. Four key passages from the report:

“…support for open access publication should be accompanied by policies to minimise restrictions on the rights of use and re-use, especially for non-commercial purposes, and on the ability to use the latest tools and services to organise and manipulate text and other content” [recommendations, p7]

“…[in a section on institutional and subject repositories]…But for subscription-based publishers, re-use rights may pose problems. Any requirement for them to use a Creative Commons ‘CC-BY’ licence, for example, would allow users to modify, build upon and distribute the licensed work, for commercial as well as non-commercial purposes, so long as the original authors were credited. Publishers – and some researchers – are especially concerned about allowing commercial re-use. Medical journal publishers, who derive a considerable part of their revenues from the sale of reprints to pharmaceutical companies, could face significant loss of income. But more generally, commercial re-use would allow third parties to harvest published content from repositories and present them on new platforms that would compete with the original publisher.” [p87]

“…[from the summary on OA journals]…A particular advantage of open access journals is that publishers can afford to be more relaxed about rights of use and re-use.” [p92]

“…[from the summary on repositories]…But publishers have strong concerns about the possibility that funders might introduce further limits on the restrictions on access that they allow in their terms and conditions of grant. They believe that a reduction in the allowable embargo period to six months, especially if it were to be combined with a Creative Commons CC-BY licence that would allow commercial as well as non-commercial re-use, would represent a fundamental threat to the viability of their subscription-based journals.” [p96]

As far as I can tell the comment on page 92 is the only one that even suggests a requirement for CC-BY for open access through journals where the costs are paid. Given that this is a critical part of the whole business model for full OA publishers, it worries me that it is given only a brief throwaway line when it is at the centre of the debate. But more widely, a concern over a requirement for liberal licensing in the context of repositories appears to colour the whole discussion of licences in the report. There is, as far as I have been able to tell, no strong statement that where a fee is paid CC-BY should be required – and much that will enable incumbent subscription publishers to continue claiming that they provide “Open Access” under a variety of non-commercial licences that satisfy no community definition of either “Open” or “Open Access”.

But more critically this fudge risks failing to deliver on the minister’s brief: to support innovation and exploitation of UK research. This whole report is embedded in a government innovation strategy that places publicly funded knowledge creation at the heart of an effort to kick-start the UK economy. Non-commercial licences cannot deliver on this and we should avoid them at all costs. The whole discussion seems to revolve around protecting publishers’ rights to sell reprints, as though it made sense to legislate to protect candle makers from the innovators threatening to build an electric grid.

Much of this report is positive – and taken in the context of the RCUK draft policy there is a real opportunity to get this right. If we make a concerted effort to utilise the potential of repositories as a transitional infrastructure, and if we get the licensing right, then the report maps out a credible route, with the financial guidelines to make it through a transition. It also sends a strong signal to the White House and the European Commission, both currently considering policy statements on open access, that the UK is ready to move, which will strengthen the hands of those arguing for strong policy.

This is a big step – and it heads in the right direction. The devil is in the details of implementation. But then it always is.

More will follow – particularly on the financial modelling – when I have a chance to digest more fully. This is a first pass draft based on a quick skim and I may modify this post if I discover I have made errors in my reading.


Tracking research into practice: Are nurses on twitter a good case study?

The holy grail of research assessment is a means of automatically tracking the way research changes the way practitioners act in the real world. How does new research influence policy? Where has research been applied by start-ups? And have new findings changed the way medical practitioners treat patients? Tracking this kind of research impact is hard for a variety of reasons: practitioners don’t (generally) write new research papers citing the work they’ve used; even if they did their work is often several steps removed from the original research making the links harder to identify; and finally researchers themselves are often too removed from the application of the research to be aware of it. Where studies of downstream impact have been done they are generally carefully selected case studies, generating a narrative description. These case studies can be incredibly expensive, and by their nature are unlikely to uncover unexpected applications of research.

In recent talks I have used a specific example of a research article reaching a practitioner community. This is a paper that I discovered while searching through the output of the University of Cape Town on Euan Adie‘s Altmetric.com service. The paper deals with domestic violence, HIV status and rape. These are critical social issues and new insights have a real potential to improve people’s lives, particularly in the area of the study. The paper was tweeted by a number of accounts, but in particular by @Shukumisa and @SonkeTogether, two support and advocacy organisations in South Africa. Shukumisa in particular tweeted in response to another account “@lizieloots a really important study, we have linked to it on our site”. This is a single example but it illustrates how it is possible to at least identify where research is being discussed within practitioner and community spaces.

But can we go further? More recently I’ve shown some other examples of heavily tweeted papers that relate to work funded by cancer charities. In one of those talks I made the throwaway comment “You’ve always struggled to see whether practitioners actually use your research…and there are a lot of nurses on Twitter”. I hadn’t really followed that up until yesterday, when I asked on Twitter about research into the use of social media by nurses and was rapidly put in touch with a range of experts on the subject (remind me, how did we ask speculative research questions before Twitter?). So the question I’m interested in probing is whether the application of research by nurses can be tracked using links shared on Twitter as a proxy.

This is interesting from a range of perspectives. To what extent do practising nurses who use social media share links to web content that informs their professional practice? How does this mirror the parallel link-sharing activity of academic researchers? Are nurses referring to primary research content, or is this information mediated through other sources? Do such other sources link back to the primary research? Can those links be traced automatically? And there is a host of other questions around how professional practice is changing with the greater availability of these primary and secondary resources.

My hypothesis is as follows: links shared by nurse practitioners and their online community are a viable proxy for (some portion of) the impact that research has in clinical practice. The extent to which links are shared by nurses on Twitter, perhaps combined with sentiment analysis, could serve as a measure of the impact of research targeted at the professional practice of nurses.
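
To make the hypothesis a little more concrete, here is a minimal sketch of what such a proxy measure might look like, assuming we already had an export of tweets from accounts identified as practising nurses. The file name, the DOI and URL heuristics, and the toy sentiment lexicon are all illustrative assumptions rather than a worked-out method.

```python
# Minimal sketch: count research links shared by an assumed set of nurse
# accounts, with a crude sentiment score. "nurse_tweets.csv" (columns:
# account, text), the lexicon, and the link heuristics are all illustrative.

import csv
import re
from collections import Counter, defaultdict

DOI_PATTERN = re.compile(r"10\.\d{4,9}/\S+")   # bare DOIs in tweet text
URL_PATTERN = re.compile(r"https?://\S+")       # shared links (often shortened)

POSITIVE = {"important", "useful", "great", "recommend"}
NEGATIVE = {"flawed", "misleading", "wrong"}

def naive_sentiment(text):
    """Crude lexicon score: +1 per positive word, -1 per negative word."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    return len(words & POSITIVE) - len(words & NEGATIVE)

link_counts = Counter()
sentiment_by_link = defaultdict(list)

with open("nurse_tweets.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        text = row["text"]
        # Treat DOIs and URLs as candidate pointers to research outputs.
        for link in DOI_PATTERN.findall(text) + URL_PATTERN.findall(text):
            link_counts[link] += 1
            sentiment_by_link[link].append(naive_sentiment(text))

# Papers (or pages about papers) most shared by this practitioner community,
# with a mean sentiment score as a very rough signal of how they were received.
for link, n in link_counts.most_common(10):
    scores = sentiment_by_link[link]
    print(f"{link}\tshared {n} times\tmean sentiment {sum(scores)/len(scores):+.2f}")
```

The hard parts are, of course, the ones the sketch glosses over: reliably identifying nurse accounts, resolving shortened links back to papers or to the secondary sources that mediate them, and validating any sentiment score against how the research is actually being used in practice.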

Thoughts? Criticisms?

Added Value: I do not think those words mean what you think they mean

There are two major strands to the position that traditional publishers have taken in justifying the process by which they will make the, now inevitable, transition to a system supporting Open Access. The first of these is that the transition will cost “more money”. The exact costs are not clear but the, broadly reasonable, assumption is that there needs to be transitional funding available to support what will clearly be a mixed system over some transitional period. The argument of course is over how much money and where it will come from, as well as an issue that hasn’t yet been publicly broached: how long will it last? Expect lots of positioning on this over the coming months, with statements about “average paper costs” and “reasonable time frames”, with incumbent subscription publishers targeting figures of around $2,500-5,000 and ten years respectively, and those on my side of the fence suggesting figures of around $1,500 and two years. This will be fun to watch, but the key will be to see where this money comes from (and what subsequently gets cut), the mechanisms put in place to release this “extra” money, and the way in which those mechanisms are set up so as to wind down and provide downward price pressure.

The second arm of the publisher argument has been that they provide “added value” over what the scholarly community puts into the publication process. It has become a common call of the incumbent subscription publishers that they are not doing enough to explain this added value. Most recently David Crotty has posted at Scholarly Kitchen saying that this was a core theme of the recent SSP meeting. This value exists, but clearly we disagree on how large it is. The problem is that we never see any actual figures. But I think there are some recent numbers that can help us put some bounds on what this added value really is, and ironically they have been provided by the publisher associations in their efforts to head off six month embargo periods.

When we talk about added value we can posit some imaginary “real” value, but this is really not a useful number – there is no way we can determine it. What we can do is talk about realisable value, i.e. the amount that the market is prepared to pay for the additional functionality that is being provided. I don’t think we are in a position to pin that number down precisely, and clearly it will differ between publishers, disciplines, and workflows, but what I want to do is attempt to pin down some points which I think help to bound it, both from the provider and the consumer side. In doing this I will use a few figures and reports as well as place an explicit interpretation on the actions of various parties. The key data points I want to use are as follows:

  1. All publisher associations and most incumbent publishers have actively campaigned against open access mandates that make the final refereed version of a scholarly article, prior to typesetting, publication, indexing, and archival, available online in any form either immediately or within six months of publication. The Publishers Association (UK) and ALPSP are both on record as stating that such a mandate would be “unsustainable” and, most recently, that it would bankrupt publishers.
  2. In a survey run by ALPSP of research libraries (although there are a series of concerns that have to be raised about its methodology) a significant proportion of libraries stated that they would cut some subscriptions if the majority of research articles were available online six months after formal publication. The survey notes that most respondents appeared to assume that the freely available version would be the original author version, i.e. not the one that was peer reviewed.
  3. There are multiple examples of financially viable publishing houses running a pure Open Access programme with average author charges of around $1500. These are concentrated in the life and medical sciences where there is both significant funding and no existing culture of pre-print archives.
  4. The SCOAP3 project has created a formal journal publication framework which will provide open access to peer reviewed papers for a community that does have a strong pre-print culture utilising the ArXiv.

Let us start at the top. Publishers actively campaign against a reduction of embargo periods. This makes it clear that they do not believe that the product they provide, in transforming the refereed version of a paper into the published version, has sufficient value that their existing customers will pay for it at the existing price. That is remarkable and a frightening hole at the centre of our current model. The service providers can only provide sufficient added value to justify the current price if they additionally restrict access to the “non-added-value” version. A supplier that was confident about the value that they add would have no such issues, indeed they would be proud to compete with this prior version, confident that the additional price they were charging was clearly justified. That they do not should be a concern to all of us, not least the publishers.

Many publishers also seek to restrict access to any prior version, including the author’s original version prior to peer review. These publishers don’t even believe that their management of the peer review process adds sufficient value to justify the price they are charging. This is shocking. The ACS, for instance, has so little faith in the value it adds that it seeks to control all prior versions of any paper it publishes.

But what of the customer? Well, the ALPSP survey, if we take the summary at face value as I have suggested above, suggests that libraries also doubt the value added by publishers. This is more of a quantitative argument, but the fact that some libraries would cancel some subscriptions shows that the community doesn’t believe the current price is worth paying, even allowing for a six month delay in access. So broadly speaking we can see that neither the current service providers nor the current customers believe that the costs of the pure service element of subscription-based scholarly publication are justified by the value added through this service. In combination this means we can put some upper bounds on the value added by publishers.

Taking the approximately $10B currently paid in cash costs to recompense publishers for their work in facilitating scholarly communications: neither the incumbent subscription publishers nor their current library customers believe that the value added by publishers justifies that cost, absent artificial restrictions on access to the non-value-added version.

This tells us not very much about what the realisable value of this work actually is, but it does provide an upper bound. But what about a lower bound? One approach would be to turn to the services provided to authors by Open Access publishers. These costs are willingly incurred by a paying customer, so it is tempting to use them directly as a lower bound. This is probably reasonable in the life and medical sciences, but as we move into other disciplinary areas, such as mathematics, it is clear that this cost level is not seen as attractive enough. In addition the life and medical sciences have no tradition of wide availability of pre-publication versions of papers. That means that for these disciplines the willingness to pay the approximately $1500 average cost of APCs is in part bound up with the wish to make the paper effectively available through recognised outlets. We have not yet separated the value of making the original version available from the added value provided by the publishing service. The $1000-1500 mark is, however, a touchstone worth bearing in mind for these disciplines.

To do a fair comparison we would need to find a space where there is a thriving pre-print culture and a demonstrated willingness to pay a defined price for added value, in the form of formal publication, over and above this existing availability. The Sponsoring Consortium for Open Access Publishing in Particle Physics (SCOAP3) is an example of precisely this. The particle physics community have essentially decided unilaterally to assume control of the journals for their area and have placed their service requirements out for tender. Unfortunately this means we don’t have the final prices yet, but we will soon, and the executive summary of the working party report suggests a reasonable price range of €1000-2000. If we assume the successful tender comes in at, or slightly below, the lower end of this range, we see an accepted price for added value, over that already provided by the ArXiv for this disciplinary area, that is not a million miles away from that figure of $1500.

Of course this is before real price competition in this space is factored in. The realisable value is a function of the market and as prices inevitably drop there will be downward pressure on what people are willing to pay. There will also be increasing competition from archives, repositories, and other services that are currently free or near free to use, as they inevitably increase the quality and range of the services they offer. Some of these will mirror the services provided by incumbent publishers.

A reasonable current lower bound for realisable added value by publication service providers is ~$1000 per paper. This is likely to drop as market pressures come to bear and existing archives and repositories seek to provide a wider range of low cost services.

Where does this leave us? Not with a clear numerical value we can ascribe to this added value, but that’s always going to be a moving target. We can, though, get some sense of the bottom end of the range. It’s currently $1000 or greater, at least in some disciplines, but is likely to go down. It’s also likely to diversify as new providers offer subsets of the services currently offered as one indivisible lump. At the top end, both customers’ and service providers’ actions suggest they believe the added value is less than what we currently pay, and that it is only artificial controls over access to the non-value-added versions that justify the current price. What we need is a better articulation of the real value that publishers add, and an honest conversation about what we are prepared to pay for it.
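
As a rough illustration of the scale of the gap, here is a back-of-envelope calculation. The ~$10B figure is the one used above; the figure of roughly two million articles published per year is my own assumption, plugged in purely for illustration, as is the $1,000-1,500 lower bound argued for earlier.

```python
# Back-of-envelope only: the ~$10B cash cost is quoted above, the ~2 million
# articles per year is an assumed round number for illustration, and the
# $1,000-1,500 lower bound is the one argued for in this post.

total_spend = 10e9           # approximate annual cash cost paid to publishers (USD)
articles_per_year = 2e6      # assumed global article output per year
lower_bound_low, lower_bound_high = 1000, 1500   # realisable added value per article

implied_spend_per_article = total_spend / articles_per_year
print(f"Implied current spend per article: ~${implied_spend_per_article:,.0f}")
print(f"Assumed lower bound on realisable added value: ${lower_bound_low}-{lower_bound_high}")
print(f"Unexplained gap per article: ~${implied_spend_per_article - lower_bound_high:,.0f}")
```

On those assumptions the current system costs something like $5,000 per article, several times the lower bound on realisable added value; that difference is exactly the “added value” which, as argued above, neither publishers nor libraries appear to actually believe in.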


25,000 signatures and still rolling: Implications of the White House petition

The petition at 25,000 signatures

I’m afraid I went to bed. It was getting on for midnight and it looked like another four hours or so before the petition would reach the magic mark of 25,000 signatures. As it turns out a final rush put us across the line at around 2am my time, but never mind, I woke up wondering whether we had got there, headed for the computer and had a pleasant surprise waiting for me.

What does this mean? What have John Wilbanks, Heather Joseph, Mike Carroll, and Mike Rossner achieved by deciding to push through what was a real hard slog? And what about all those people and groups involved in getting signatures in? I think there are maybe three major points here.

Access to Research is now firmly on the White House (and other governments’) agenda

The petition started as a result of a meeting between the Access2Research founders and John Holdren from the White House. John Wilbanks has written about how the meeting went and what the response was. The US administration has sympathy and understands many of the issues. However it must be hard to make the case that this is something worth the bandwidth it would take to drive a policy initiative, especially in an election year. The petition, and the mechanism of the “We the People” site, has enabled us to show that this is a policy item that generates public interest, but more importantly it creates an opportunity for the White House to respond. It is worth noting that this has been one of the more successful petitions. Reaching the 25k mark in two weeks is a real achievement, and one that has got the attention of key people.

And that attention spreads globally as well. The Finch Report on mechanisms for improving access to UK research outputs will probably not mention the petition, but you can bet that those within the UK government involved in implementation will have taken note. Similarly, as the details of the Horizon 2020 programme within the EU are hammered out, those deciding on the legal instruments that will distribute around $80B will have noted that there is public demand, and therefore political cover, to take action.

The Open Access Movement has a strong voice, and a diverse network, and can be an effective lobby

It is easy, as we all work towards the shared goal of enabling wider access and the full exploitation of web technologies, to get bogged down in details and to focus on disagreements. What this effort showed was that when we work together we can muster the connections and the network to send a very strong message. And that message is stronger for coming from diverse directions in a completely transparent manner. We have learnt the lessons that could be taken from the fight against SOPA and PIPA and refined them in the campaign to defeat, in fact to utterly destroy, the Research Works Act. But this was not a reaction, and it was not merely a negative campaign. This was a positive campaign, originating within the movement, which together we have successfully pulled off. There are lessons to be learnt. Things we could have done better. But what we now know is that we have the capacity to take on large scale public actions and pull them off.

The wider community wants access and has a demonstrated capacity to use it

There has in the past been an argument that public access is not useful because “they can’t possibly understand it”, that “there is no demand for public access”. That argument has been comprehensively and permanently destroyed. It was always an arrogant argument, and in my view a dangerous one for those with a vested interest in ensuring continued public funding of research. The fact that it had strong parallels with the arguments deployed in the 18th and 19th centuries that colonists, or those who did not own land, or women, could not possibly be competent to vote should have been enough to warn people off using it. The petition has shown demand, and the stories that have surfaced through this campaign show not only that there are many people who are not professional researchers who can use research, but that many of these people also want to contribute back to the professional research effort, and are more than capable of doing so.

The campaign has put the ideas of Open Access in front of more people than perhaps ever before. We have reached out to family, friends, co-workers, patients, technologists, entrepreneurs, medical practitioners, educators, and people just interested in the world around them. Perhaps one in ten of them actually signed the petition, but many of them will have talked to others, spreading the ideas. This is perhaps one of the most important achievements of the petition. Getting the message and the idea out in front of hundreds of thousands of people who may not take action today, but will now be primed to see the problems that arise from a lack of access, and the opportunities that could be created through access.

Where now?

So what are our next steps? Continuing to gain signatures for the next two weeks is still important. This may be one of the most rapidly growing petitions, but showing continued growth is still valuable. More generally, my sense is that we need to take stock and look forward to the next phase. The really hard work of implementation is coming. As a movement we still disagree strongly on elements of tactics and strategy. The tactics I am less concerned about: we can take multiple paths, applying pressure at multiple points, and this will be to our advantage. But I think we need a clearer goal on strategy. We need to articulate what the endgame is. What is the vision? When will we know that we have achieved what we set out to do?

Peter Murray-Rust has already quoted Churchill but it does seem apposite. “…this is not the end. This is not even the beginning of the end. But it is, perhaps, the end of the beginning.”

We now know how much we can achieve when we work together with a shared goal. The challenge now is to harness that to a shared understanding of the direction of travel, if perhaps not the precise route. But if we, with all the diversity of needs and views that this movement contains, can find the core of goals that we all agree on, then what we now know is that we have the capacity, the depth, and the strength to achieve them.

 


Send a message to the White House: Show the strength of support for OA

The White House (image from Flickr user nancy_t3i)

Changing the world is hard. Who knew? Advocating for change can be lonely. It can also be hard. As a scholar, particularly one at the start of a career, it is still hard to commit fully to ensuring that research outputs are accessible and re-usable. But we are reaching a point where support for Open Access is mainstream, where there is a growing public interest in greater access to research, and where there is increasingly serious engagement with the policy issues at the highest level.

The time has come to show just how strong that support is. As of today there is a petition on the White House site calling for the Executive to mandate Open Access to the literature generated from US Federal funding. If the petition reaches 25,000 signatures within 30 days then the White House is committed to respond. The Executive has been considering the issues of access to research publications and data, and with FRPAA active in both houses there are multiple routes available to enact change. If we can demonstrate widespread and diverse support for Open Access, then we will have made the case for that change. This is a real opportunity for each and every one of us to make a difference.

So go to the Access2Research Petition on whitehouse.gov and sign up now. Blog and tweet using the hashtag #OAMonday and let’s show just how wide the coalition is. Go to the Access2Research website to learn more. Post the link to your community to get people involved.

I’ll be honest. The White House petition site isn’t great – this isn’t a 30 second job. But it shouldn’t take you more than five minutes. You will need to give a real name and an email address and go through a validation process via email. You don’t need to be a US citizen or resident. Obviously if you give a US ZIP code it is likely that more weight will be given to your signature, but don’t be put off if you are not in the US. Once you have an account, signing the petition is a simple matter of clicking a single button. The easiest approach is to go to the Open Access petition and sign up for an account from there. Once you get the validation link via email you will be taken back to the petition.

The power of Open Access will only be unlocked through networks of people using, re-using, and re-purposing the outputs of research. The time has come to show just how broad and diverse that network is. Please take the time as one single supporter of Open Access to add your voice to the thousands of others who will be signing with you. And connect to your network to tell them how important it is for them to add their voice as well.

Parsing the Willetts Speech on Access to UK Research Outputs

David Willetts speaking at the Big Society policy launch, Coin St, London (photo credit: Wikipedia)

Yesterday David Willetts, the UK Science and Universities Minister, gave a speech to the Publishers Association that has received wide coverage. However it is worth pulling apart both the speech and the accompanying opinion piece in the Guardian, because there are some interesting elements in there and some things have got a little confused.

The first really key point is that there is nothing new here. This is basically a re-announcement of the previous position from the December Innovation Strategy on moving towards a freely accessible literature, and a more public announcement of the Gateway to Research project previously mentioned in the RCUK response to the Innovation Statement.

The Gateway to Research project is a joint venture of the Department for Business, Innovation and Skills and Research Councils UK to provide a one stop shop for information on UK research funding as well as pointers to outputs. It will essentially draw information directly from sources that already exist (the Research Outputs System and eVal) as well as some new ones, with the intention of helping the UK public and enterprise find research and researchers that are of interest to them, and see how they are funded.

The new announcement was that Jimmy Wales, of Wikipedia fame, will be advising on the GTR portal. This is a good thing: he is well placed to provide both technical and social expertise on the provision of public-facing information portals, as well as a more radical perspective than might come out of BIS itself. While this might in part be cynically viewed as another example of bringing in celebrities to advise on policy, this is a celebrity with relevant expertise and real credibility based on making similar systems work.

The rest of the information that we can gather relates to government efforts in moving towards making the UK research literature accessible. Wales also gets a look in here, and will be “advising us on [..] common standards to ensure information is presented in a readily reusable form”. My reading of this is that the Minister understands the importance of interoperability and my hope is that this will mean that government is getting good advice on appropriate licensing approaches to support this.

However, many have read this section of the speech as saying that GTR will act as some form of national repository for research articles. I do not believe this is the intention, and reading between the lines the comment that it will “provide direct links to actual research outputs such as data sets and publications” [my emphasis] is the key. The point of GTR is to make UK research more easily discoverable. Access is a somewhat orthogonal issue. This is better read as an expression of Willetts’ and the wider government’s agenda on transparency of public spending than as a mechanism for providing access.

What else can we tell from the speech? Well, the term “open access” is used several times, something that was absent from the innovation statement, but the emphasis is still on achieving “public access” in the near term, with “open access” cast, as I read it, as the future goal. It’s not clear to me whether this is a well informed distinction. There is a somewhat muddled commentary on Green vs Gold OA, but not that much more muddled than what often comes from our own community. There are also some clear statements on the challenges for all involved.

As an aside I found it interesting that Willetts gave a parenthetical endorsement of usage metrics for the research literature when speaking of his own experience.

As well as reading some of the articles set by my tutors, I also remember browsing through the pages of the leading journals to see which articles were well-thumbed. It helped me to spot the key ones I ought to be familiar with – a primitive version of crowd-sourcing. The web should make that kind of search behaviour far easier.

This is the most sophisticated appreciation of the potential for the combination of measurement and usage data in discovery that I have seen from any politician. It needs to be set against his endorsement of rather cruder filters earlier in the speech but it nonetheless gives me a sense that there is a level of understanding within government that is greater than we often fear.

Much of the rest of the speech is hedging. Options are discussed but not selected and certainly not promoted. The key message: wait for the Finch Report which will be the major guide for the route the government will take and the mechanisms that will be put in place to support it.

But there are some clearer statements. There is a strong sense that Hargreaves’ recommendations on enabling text mining should be implemented, and the logic for this is well laid out. The speech and the policy agenda are embedded in a framework of enabling innovation – making it clear what kinds of evidence and argument we will need to marshal in order to persuade. There is also a strong emphasis on data, as well as an appreciation that there is much to do in this space.

But the clearest statement made here is on the end goals. No-one can be left in any doubt of Willetts’ ultimate target: full access to the outputs of research, ideally at the time of publication, in a way that enables them to be fully exploited, manipulated and modified for any purpose by any party. Indeed the vision is strongly congruent with the Berlin, Bethesda, and Budapest declarations on Open Access. There is still much to be argued about the route and its length, but in the UK at least, the destination appears to be in little doubt.


Some brief responses to the Sage Bionetworks Congress

I attended the first Sage Bionetworks Congress in 2010 and it left a powerful impression on my thinking. I have just attended the third congress in San Francisco and again the challenging nature of views, the real desire to make a difference, and the standard of thinking in the room will take me some time to process. But a series of comments, and soundbites over the course of the meeting have made me realise just how seriously bad our situation is.

  • Attempts by a variety of big pharma companies to replicate disease-relevant results published by academic labs failed in ~80% of cases (see for instance this story about this commentary in Nature[$]).
  • When a particular blood cancer group was asked what aspect of their disease mattered most to them, they said gastro-intestinal problems. No health professional had ever even considered this as a gastro-intestinal disease.
  • Jamie Heywood spent $25M of his own money on attempting to replicate around 500 published results that were therapeutically relevant to ALS and could not repeat the findings in a single case.
  • A cancer patient, advocate, and fundraiser of 25 years standing said the following to me: “We’ve been at this for 25 years, we’ve raised over $2B for research, and new patients today get the same treatment I did. What’s the point?”

In a room full of very smart people absolutely committed to making a difference there were very few new ideas on how we actually cut through the thicket of perverse incentives, institutional inertia, disregard for replicability, and personal ego-stroking which is perpetuating these problems. I’ve been uncertain for some time whether change from within our existing structures and systems is really possible. I’m leaning further and further to the view that it is not. That doesn’t mean that we can’t do anything – just that it may be more effective to simply bypass existing institutions to do it.


A big leap and a logical step: Moving to PLoS

PLoS: The Public Library of Science (photo credit: dullhunk)

As a child I was very clear I wanted to be a scientist. I am not sure exactly where the idea came from. In part I blame Isaac Asimov but it must have been a combination of things. I can’t remember not having a clear idea of wanting to go into research.

I started off a conventional career with big ideas – understanding the underlying physics, chemistry, and information theory that limits molecular evolution – but my problem was always that I was interested in too many things. I kept getting distracted. Along with this I also started to wonder how much of a difference the research I was doing was really making. This led to a shift towards working on methods development – developing tools that would support many researchers to do better and more efficient work. In turn it led to my current position, with the aim of developing the potential of neutron scattering as a tool for the biosciences. I got gradually more interested in the question of how to make the biggest difference I could, rather than just pursuing one research question.

And at the same time I was developing a growing interest in the power of the web and how it had the potential, as yet unrealized, to transform the effectiveness of the research community. This has grown from side interest to hobby to something like a full time job, on top of the other full time job I have. This wasn’t sustainable. At the same time I’ve realized I am pretty good at the strategy, advocacy, speaking and writing; at articulating a view of where we might go, and how we might get there; and that in this space I can make a bigger difference. If we can increase the efficiency of research by just 5%, reduce the time for the developing world to bring a significant research capacity on stream by just a few years, give a few patients better access to information, or increase the wider public interest and involvement in science just a small amount, then this will be a far greater good than I could possibly achieve doing my own research.

Which is why, from July I will be moving to PLoS to take up the role of Advocacy Director.

PLoS is an organization that right from the beginning has had a vision, not just of making research papers more accessible but of transforming research communication, of making it ready for, and of, the 21st century. This is a vision I share and one that I am very excited to be playing a part in.

In the new role I will obviously be doing a lot of advocacy, planning, speaking, and writing on open access. There is a lot to play for over the next few years with FRPAA in the US, new policies being developed in Europe, and a growing awareness of the need to think hard about data as a form of publication. But I will also be taking the long view, looking out on a ten year horizon to try and identify the things we haven’t seen yet, the opportunities that are already there and how we can navigate a path between them. Again there is huge potential in this space, gradually turning from ideas and vaporware into real demos and even products.

The two issues, near term policy and longer term technical development, are inextricably linked. The full potential of networked research cannot be realized except in a world of open content, open standards, open APIs, open process, and open data. Interoperability is crucial: technical interoperability, standards interoperability, social interoperability, and legal interoperability. It is being at the heart of the community that is working to link these together and make them work that really excites me about this position.

PLoS has been an engine of innovation since it was formed, changing the landscape of scholarly publishing in a way that no-one would have dreamed was possible. Some have argued that this hasn’t been so much the case in the last few years. But really things have just been quiet, plans have been laid, and I think you will find the next few years exciting.

Inevitably, I will be leaving some things behind. I won’t be abandoning research completely, I hope to keep my toe in a range of projects but I will be scaling back a lot. I will be stepping down as an Academic Editor for PLoS ONE (and apologies for all those reviews and editorial requests for PLoS ONE that I’ve turned down in the last few months) because this would be a clear conflict of interest. I’ve got a lot to clear up before July.

I will be sad to leave behind some of those roles but above all I am excited and looking forward to working in a great organisation, with people I respect doing things I believe are important. Up until now I’ve been trying to fit these things in, more or less as a hobby around the research. Now I can focus on them full time, while still staying at least a bit connected. It’s a big leap for me, but a logical step along the way to trying to make a difference.

 


They. Just. Don’t. Get. It…

Traffic jam in Delhi (image via Wikipedia)

…although some are perhaps starting to see the problems that are going to arise.

Last week I spoke at a Question Time style event held at Oxford University and organised by Simon Benjamin and Victoria Watson called “The Scientific Evolution: Open Science and the Future of Publishing” featuring Tim Gowers (Cambridge), Victor Henning (Mendeley), Alison Mitchell (Nature Publishing Group), Alicia Wise (Elsevier), and Robert Winston (mainly in his role as TV talking head on science issues). You can get a feel for the proceedings from Lucy Pratt’s summary but I want to focus on one specific issue.

As is common for me recently, I emphasised the fact that networked research communication needs to be different to what we are used to. I made a comparison to the fact that when the printing press was developed one of the first things that happened was that people created facsimiles of handwritten manuscripts. It took hundreds of years for someone to come up with the idea of a newspaper, and to some extent our current use of the network is exactly that – digital facsimiles of paper objects, not truly networked communication.

It’s difficult to predict exactly what form a real networked communication system will take, in much the same way that asking a 16th century printer how newspaper advertising would work would not provide a detailed and accurate answer, but there are some principles of successful network systems that we can see emerging. Effective network systems distribute control, avoid centralisation, and are loosely coupled – very different to the centralised systems for controlling access that we have today.

This is a difficult concept and one that scholarly publishers simply don’t get for the most part. This is not particularly surprising because truly disruptive innovation rarely comes from incumbent players. Large and entrenched organisations don’t generally enable the kind of thinking that is required to see the new possibilities. This is seen in publishers’ statements that they are providing “more access than ever before” via “more routes”, but all routes that are under tight centralised control, with control systems that don’t scale. By insisting on centralised control over access, publishers are setting themselves up to fail.

Nowhere is this going to play out more starkly than in the area of text mining. Bob Campbell from Wiley-Blackwell walked into this – though few noticed it – with the now familiar claim that “text mining is not a problem because people can ask permission”. Centralised control, failure to appreciate scale, and failure to understand the necessity of distribution and distributed systems. I have with me a device capable of holding the text of perhaps 100,000 papers. It also has the processor power to mine that text. It is my phone. In 2-3 years our phones, hell our watches, will have the capacity not only to hold the world’s literature but also to mine it, in context, for what I want right now. Is Bob Campbell ready for every researcher, indeed every interested person in the world, to come into his office and discuss an agreement for text mining? Because the mining I want to do and the mining that Peter Murray-Rust wants to do will be different, and what I will want to do tomorrow is different to what I want to do today. This kind of personalised mining is going to be the accepted norm of handling information online very soon and will be at the very centre of how we discover the information we need. Google will provide a high quality service for free; subscription-based scholarly publishers will charge an arm and a leg for a deeply inferior one – because Google is built to exploit network scale.
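
For what it’s worth, the numbers behind that claim stack up on the back of an envelope. The sizes below are assumptions rather than measurements, but they are the right order of magnitude.

```python
# Rough sanity check of the claim that a phone can hold and mine ~100,000 papers.
# The average article size as plain text is an assumption (a few tens of KB).

papers = 100_000
avg_plaintext_kb = 50                        # assumed average article size as plain text
corpus_gb = papers * avg_plaintext_kb / 1e6  # KB -> GB
print(f"Plain-text corpus: ~{corpus_gb:.0f} GB")  # ~5 GB: well within phone storage

# Even a naive full-text scan over a few GB is feasible on phone-class hardware,
# and an inverted index of the kind any search library builds makes it interactive.
```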

The problem of scale has in fact also just played out. Heather Piwowar, writing yesterday, describes a call with six Elsevier staffers to discuss her project and her needs for text mining. Heather of course now has to have this same conversation with Wiley, NPG, ACS, and all the other subscription-based publishers, who will no doubt demand different conditions, creating a nightmare patchwork of different levels of access on different parts of the corpus. But the bit I want to draw out is at the bottom of the post, where Heather describes the concerns of Alicia Wise:

At the end of the call, I stated that I’d like to blog the call… it was quickly agreed that was fine. Alicia mentioned her only hesitation was that she might be overwhelmed by requests from others who also want text mining access. Reasonable.

Except that it isn’t. It’s perfectly reasonable for every single person who wants to text mine to want a conversation about access. Elsevier, because they demand control, have set themselves up as the bottleneck. This is really the key point: because the subscription business model implies an imperative to extract income from all possible uses of the content, it sets up a need to control access for each differential use. This means in turn that each different use, and especially each new use, has to be individually negotiated, usually by humans, apparently about six of them. This will fail because it cannot scale in the same way that the demand will.

The technology exists today to make this kind of mass distributed text mining trivial. Publishers could push content to BitTorrent servers and then publish regular deltas to notify users of new content. The infrastructure for this already exists; there is no infrastructure investment required. The problem that publishers raise of their servers not coping is one they have created for themselves. The catch is that distributed systems can’t be controlled from the centre, and giving up control requires a different business model. But this is also an opportunity. The publishers also save money if they give up control – no more need for six people to sit in on each of hundreds of thousands of meetings. I often wonder how much lower subscriptions would be if they didn’t need to cover the cost of access control, sales, and legal teams.
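
To show how little machinery the “push the content out, publish deltas” pattern actually needs, here is a minimal sketch of the consumer side. The feed URL and the shape of its JSON are entirely hypothetical; the point is that a publisher only needs to expose a static, append-only list of new items, each pointing at a torrent or plain file, and any number of text miners can then poll it without a single human conversation.

```python
# Sketch of a delta-feed consumer. The URL and feed structure are hypothetical:
# assume a JSON list of {"id", "title", "torrent"} objects whose ids sort
# chronologically (e.g. ISO timestamps).

import json
import urllib.request

DELTA_FEED = "https://example-publisher.org/oa-corpus/deltas.json"  # hypothetical
STATE_FILE = "last_seen.txt"

def load_last_seen():
    """Return the identifier of the last item we processed, if any."""
    try:
        with open(STATE_FILE) as f:
            return f.read().strip()
    except FileNotFoundError:
        return ""

def save_last_seen(item_id):
    with open(STATE_FILE, "w") as f:
        f.write(item_id)

def fetch_new_items():
    """Fetch the delta feed and return entries newer than the last one seen."""
    with urllib.request.urlopen(DELTA_FEED) as resp:
        feed = json.load(resp)
    last_seen = load_last_seen()
    new = [item for item in feed if item["id"] > last_seen]
    if new:
        save_last_seen(max(item["id"] for item in new))
    return new

if __name__ == "__main__":
    for item in fetch_new_items():
        # Hand the torrent (or direct download) link to whatever local mining
        # pipeline the researcher happens to be running.
        print(f"New content: {item['title']} -> {item['torrent']}")
```

Everything here is bog-standard web plumbing; the only thing missing from the current system is a publisher willing to let the files go.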

We are increasingly going to see these kinds of failures. Legal and technical incompatibility of resources, contractual requirements at odds with local legal systems, and above all the claim “you can just ask for permission” without the backing of the hundreds or thousands of people that would be required to provide a timely answer. And that’s before we deal with the fact that the most common answer will be “mumble”. A centralised access control system is simply not fit for purpose in a networked world. As demand scales, people making legitimate requests for access will have the effect of a distributed denial of service attack. The clue is in the name; the demand is distributed. If the access control mechanisms are manual, human and centralised, they will fail. But if that’s what it takes to get subscription publishers to wake up to the fact that the networked world is different then so be it.


Github for science? Shouldn’t we perhaps build TCP/IP first?

Mind map of TCP/IP (image via Wikipedia)

It’s one of those throwaway lines, “Before we can talk about a github for science we really need to sort out a TCP/IP for science”, that’s geeky, sharp, a bit needly and goes down a treat on Twitter. But there is a serious point behind it. And it’s not intended to be dismissive of the ideas that are swirling around about scholarly communication at the moment either. So it seems worth exploring in a bit more detail.

The line is stolen almost wholesale from John Wilbanks who used it (I think) in the talk he gave at a Science Commons meetup in Redmond a few years back. At the time I think we were awash in “Facebooks for Science” so that was the target but the sentiment holds. As once was the case with Facebook and now is for Github, or Wikipedia, or StackOverflow, the possibilities opened up by these new services and technologies to support a much more efficient and effective research process look amazing. And they are. But you’ve got to be a little careful about taking the analogy too far.

If you look at what these services provide, particularly those that are focused on coding, they deliver commentary and documentation, nearly always in the form of text about code – which is also basically text. The web is very good at transferring text, and code, and data. The stack that delivers this is built on a set of standards, with each layer building on the layer beneath it. StackOverflow and Github are built on a set of services that in turn sit on top of web standards such as HTTP, which in turn are built on network standards like TCP/IP that control the actual transfer of bits and bytes.

The fundamental stuff of these coding sites and Wikipedia is text, and text is really well supported by the stack of web technologies. Open Source approaches to software development didn’t just develop because of the web; they developed the web, so it’s not surprising that they fit well together. They grew up together and nurtured each other. But the bottom line is that the stack is optimized to transfer the grains of material, text and code, that make up the core of these services.

When we look at research we can see that, when we dig down to the granular level, it isn’t just made up of text. Sure, most research could be represented as text, but we don’t have the standardized forms to do this. We don’t have standard granules of research that we can transfer from place to place. This is because it’s complicated to transfer the stuff of research. I picked on TCP/IP specifically because it is the transfer protocol that supports moving bits and bytes from one place to another. What we need are protocols that support moving the substance of a piece of my research from one place to another.

Work on Research Objects [see also this paper], intended to be self-contained but usable pieces of research, is a step in this direction, as are the developing set of workflow tools that will ultimately allow us to describe and share the process by which we’ve transformed at least some parts of the research process into others. Laboratory recording systems will help us to capture and workflow-ify records of the physical parts of the research process. But until we can agree how to transfer these in a standardized fashion, I think it is premature to talk about Githubs for research.
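
For illustration only, and not the actual Research Object specification, here is the kind of self-contained “packet” I have in mind: a manifest that carries its own data, code, environment, and provenance so the bundle can be moved and reused without a human conversation. Every field name below is made up for the sketch.

```python
# A made-up manifest for a self-contained research "packet": data, code,
# environment, and provenance travel together. Field names are illustrative,
# not any existing standard.

import json
from pathlib import Path

manifest = {
    "id": "urn:example:research-object:0001",   # hypothetical identifier scheme
    "title": "Fluorescence assay of candidate inhibitors",
    "creator": "A. Researcher",
    "created": "2012-02-27",
    "licence": "CC-BY",
    "contents": [
        {"path": "data/raw_plates.csv", "role": "raw-data"},
        {"path": "code/analysis.py", "role": "analysis", "language": "python"},
        {"path": "environment.txt", "role": "environment"},   # pinned dependencies
        {"path": "results/figure1.png", "role": "output"},
    ],
    "provenance": [
        {"step": 1, "action": "export from plate reader", "output": "data/raw_plates.csv"},
        {"step": 2, "action": "run code/analysis.py",
         "input": "data/raw_plates.csv", "output": "results/figure1.png"},
    ],
}

Path("manifest.json").write_text(json.dumps(manifest, indent=2))
```

The interesting, and hard, work is agreeing what goes into such a manifest and how it gets transferred; once that exists, the services built on top of it become comparatively easy.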

Now there is a flip side to this, which is that where there are such services that do support the transfer of pieces of the research process, we absolutely should be experimenting with them. But in most cases the type-case itself will do the job. Github is great for sharing research code, and some people are doing terrific things with data there as well. But if it does the job for those kinds of things, why do we need one just for researchers? The scale that the consumer web brings, and the exposure to a much bigger community, is a powerful counter-argument to building things ‘just for researchers’. To justify a service focused on a small community you need to have very strong engagement or very specific needs. By the time a mainstream service has mindshare and researchers are using it, your chances of pulling them away to a new service just for them are very small.

So yes, we should be inspired by the possibilities that these new services open up, and we should absolutely build and experiment, but while we are at it can we also focus on the lower levels of the stack? They aren’t as sexy and they probably won’t make anyone rich, but we’ve got to get serious about the underlying mechanisms that will transfer our research in comprehensible packages from one place to another.

We have to think carefully about capturing the context of research and presenting that to the next user. Github works in large part because the people using it know how to use code, can recognize specific languages, and know how to drive it. It’s actually pretty poor for the user who just wants to do something – we’ve had to build up another set of services at different levels (the Python Package Index, tools for making and distributing executables) that help provide the context required for different types of user. This is going to be much, much harder for all the different types of use we might want to put research to.

But if we can get this right – if we can standardize transfer protocols and build the context of the research into those ‘packets’ in a way that lets people use it – then what we have seen on the wider web will happen naturally. As we build the stack up, the services that seem so hard to build at the moment will become as easy as throwing up a blog, downloading a rubygem, or firing up a machine instance. If we can achieve that then we’ll have much more than a github for research, we’ll have a whole web for research.

There’s nothing new here that wasn’t written some time ago by John Wilbanks and others but it seemed worth repeating. In particular I recommend these posts [1, 2] from John.