The holy grail of research assessment is a means of automatically tracking the way research changes how practitioners act in the real world. How does new research influence policy? Where has research been applied by start-ups? Have new findings changed the way medical practitioners treat patients? Tracking this kind of research impact is hard for a variety of reasons: practitioners don’t (generally) write new research papers citing the work they’ve used; even if they did, their work is often several steps removed from the original research, making the links harder to identify; and researchers themselves are often too removed from the application of their research to be aware of it. Where studies of downstream impact have been done they are generally carefully selected case studies, generating a narrative description. These case studies can be incredibly expensive, and by their nature are unlikely to uncover unexpected applications of research.
In recent talks I have used a specific example of a research article reaching a practitioner community. This is a paper I discovered while searching through the output of the University of Cape Town on Euan Adie’s Altmetric.com service. The paper deals with domestic violence, HIV status, and rape. These are critical social issues, and new insights have a real potential to improve people’s lives, particularly in the area of the study. The paper was tweeted by a number of accounts, but in particular by @Shukumisa and @SonkeTogether, two support and advocacy organisations in South Africa. Shukumisa in particular tweeted in response to another account: “@lizieloots a really important study, we have linked to it on our site”. This is a single example, but it illustrates how it is possible to at least identify where research is being discussed within practitioner and community spaces.
But can we go further? More recently I’ve shown some other examples of heavily tweeted papers that relate to work funded by cancer charities. In one of those talks I made the throwaway comment “You’ve always struggled to see whether practitioners actually use your research…and there are a lot of nurses on Twitter”. I hadn’t really followed that up until yesterday, when I asked on Twitter about research into the use of social media by nurses and was rapidly put in touch with a range of experts on the subject (remind me, how did we ask speculative research questions before Twitter?). So the question I’m interested in probing is whether the application of research by nurses is something that can be tracked using links shared on Twitter as a proxy.
This is interesting from a range of perspectives. To what extent do practicing nurses who use social media share links to web content that informs their professional practice? How does this mirror the parallel link-sharing activity of academic researchers? Are nurses referring to primary research content, or is this information mediated through other sources? Do such other sources link back to the primary research? Can those links be traced automatically? And there is a host of other questions around how professional practice is changing with the greater availability of these primary and secondary resources.
My hypothesis is as follows: links shared by nurse practitioners and their online community are a viable proxy for (some portion of) the impact that research has in clinical practice. The extent to which links are shared by nurses on Twitter, perhaps combined with sentiment analysis, could serve as a measure of the impact of research targeted at the professional practice of nurses.
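To make the proxy concrete, here is a minimal sketch in Python of the kind of measurement I have in mind: take a set of tweets, pull out the shared links, and classify which of them point at primary research. The example tweets, the DOI, and the list of journal hosts are all illustrative assumptions; a real study would need proper data collection and a far better classifier.

```python
# Minimal sketch of the proposed proxy measure, using hypothetical tweet data.
# The example tweets, DOI, and journal hosts are illustrative assumptions only.
import re
from collections import Counter

DOI_PATTERN = re.compile(r"10\.\d{4,9}/\S+")   # crude DOI matcher
URL_PATTERN = re.compile(r"https?://\S+")

def extract_links(tweets):
    """Pull all URLs out of a list of tweet texts."""
    return [url for text in tweets for url in URL_PATTERN.findall(text)]

def looks_like_primary_research(url):
    """Very rough heuristic: the link contains a DOI or points at an assumed journal host."""
    journal_hosts = ("doi.org", "plos.org", "nature.com", "bmj.com")  # assumed examples
    return bool(DOI_PATTERN.search(url)) or any(host in url for host in journal_hosts)

# Hypothetical sample of tweets from nursing accounts
sample_tweets = [
    "Really useful study on pressure ulcer prevention https://doi.org/10.1371/journal.pone.0000000",
    "Our ward's new handover checklist http://example-hospital.org/checklist",
]

links = extract_links(sample_tweets)
counts = Counter(
    "primary research" if looks_like_primary_research(u) else "other" for u in links
)
print(counts)  # e.g. Counter({'primary research': 1, 'other': 1})
```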
There are two major strands to the position that traditional publishers have taken in justifying the process by which they will make the, now inevitable, transition to a system supporting Open Access. The first of these is that the transition will cost “more money”. The exact costs are not clear but the, broadly reasonable, assumption is that there needs to be transitional funding available to support what will clearly be a mixed system over some transitional period. The argument of course is over how much money and where it will come from, as well as an issue that hasn’t yet been publicly broached: how long will it last? Expect lots of positioning on this over the coming months, with statements about “average paper costs” and “reasonable time frames”, with incumbent subscription publishers targeting figures of around $2,500-5,000 and ten years respectively, and those on my side of the fence suggesting figures of around $1,500 and two years. This will be fun to watch, but the key will be to see where this money comes from (and what subsequently gets cut), the mechanisms put in place to release this “extra” money, and the way in which they are set up so as to wind down and provide downwards price pressure.
The second arm of the publisher argument has been that they provide “added value” over what the scholarly community provides into the publication process. It has become a common call of the incumbent subscription publishers that they are not doing enough to explain this added value. Most recently David Crotty has posted at Scholarly Kitchen saying that this was a core theme of the recent SSP meeting. This value exists, but clearly we disagree on its quantitative value. The problem is we never see any actual figures given. But I think there are some recent numbers that can help us put some bounds on what this added value really is, and ironically they have been provided by the publisher associations in their efforts to head off six month embargo periods.
When we talk about added value we can posit some imaginary “real” value, but this is not a useful number – there is no way we can determine it. What we can do is talk about realisable value, i.e. the amount that the market is prepared to pay for the additional functionality that is being provided. I don’t think we are in a position to pin that number down precisely, and clearly it will differ between publishers, disciplines, and workflows, but what I want to do is attempt to pin down some points which I think help to bound it, both from the provider and the consumer side. In doing this I will use a few figures and reports as well as place an explicit interpretation on the actions of various parties. The key data points I want to use are as follows:
All publisher associations and most incumbent publishers have actively campaigned against open access mandates that would make the final refereed version of a scholarly article (prior to typesetting, publication, indexing, and archival) available online in any form, either immediately or within six months of publication. The Publishers Association (UK) and ALPSP are both on record as stating that such a mandate would be “unsustainable” and, most recently, that it would bankrupt publishers.
In a survey run by ALPSP of research libraries (although there are a series of concerns that have to be raised about the methodology) a significant proportion of libraries stated that they would cut some subscriptions if the majority of research articles were available online six months after formal publication. The survey states that most respondents appeared to assume that the freely available version would be the original author version, i.e. not the one that was peer reviewed.
There are multiple examples of financially viable publishing houses running a pure Open Access programme with average author charges of around $1500. These are concentrated in the life and medical sciences where there is both significant funding and no existing culture of pre-print archives.
The SCOAP3 project has created a formal journal publication framework which will provide open access to peer reviewed papers for a community that does have a strong pre-print culture utilising the ArXiv.
Let us start at the top. Publishers actively campaign against a reduction of embargo periods. This makes it clear that they do not believe that the product they provide, in transforming the refereed version of a paper into the published version, has sufficient value that their existing customers will pay for it at the existing price. That is remarkable and a frightening hole at the centre of our current model. The service providers can only provide sufficient added value to justify the current price if they additionally restrict access to the “non-added-value” version. A supplier that was confident about the value that they add would have no such issues, indeed they would be proud to compete with this prior version, confident that the additional price they were charging was clearly justified. That they do not should be a concern to all of us, not least the publishers.
Many publishers also seek to restrict access to any prior version, including the author’s original version prior to peer review. These publishers don’t even believe that their management of the peer review process adds sufficient value to justify the price they are charging. This is shocking. The ACS, for instance, has so little faith in the value it adds that it seeks to control all prior versions of any paper it publishes.
But what of the customer? Well the ALPSP survey, if we take the summary at face value as I have suggested above, suggests that libraries also doubt the value added by publishers. This is more of a quantitative argument, but the fact that some libraries would cancel some subscriptions shows that the community doesn’t believe the current price is worth paying, even allowing for a six month delay in access. So broadly speaking we can see that both the current service providers and the current customers do not believe that the costs of the pure service element of subscription-based scholarly publication are justified by the value added through this service. This in combination means we can provide some upper bounds on the value added by publishers.
If we take the approximately $10B currently paid as cash costs to recompense publishers for their work in facilitating scholarly communications, then neither the incumbent subscription publishers nor their current library customers believe that the value added by publishers justifies the current cost, absent artificial restrictions on access to the non-value-added version.
This tells us not very much about what the realisable value of this work actually is, but it does provide an upper bound. What about a lower bound? One approach would be to turn to the services provided to authors by Open Access publishers. These costs are willingly incurred by a paying customer, so it is tempting to use them directly as a lower bound. This is probably reasonable in the life and medical sciences, but as we move into other disciplinary areas, such as mathematics, it is clear that this cost level is not seen as attractive enough. In addition, the life and medical sciences have no tradition of wide availability of pre-publication versions of papers. That means that for these disciplines the willingness to pay the approximately $1500 average cost of APCs is in part bound up with the wish to make the paper effectively available through recognised outlets. We have not yet separated the value of the original copy from the added value provided by the publishing service. The $1000-1500 mark is however a touchstone that is worth bearing in mind for these disciplines.
To do a fair comparison we would need to find a space where there is a thriving pre-print culture and a demonstrated willingness to pay a defined price for added value, in the form of formal publication, over and above this existing availability. The Sponsoring Consortium for Open Access Publishing in Particle Physics (SCOAP3) is an example of precisely this. The particle physics community has essentially decided unilaterally to assume control of the journals for their area and has placed its service requirements out for tender. Unfortunately this means we don’t have the final prices yet, but we will soon, and the executive summary of the working party report suggests a reasonable price range of €1000-2000. If we assume the successful tender comes in at or slightly below the lower end of this range, we see an accepted price for added value, over that already provided by the ArXiv for this disciplinary area, that is not a million miles away from that figure of $1500.
Of course this is before real price competition in this space is factored in. The realisable value is a function of the market and as prices inevitably drop there will be downward pressure on what people are willing to pay. There will also be increasing competition from archives, repositories, and other services that are currently free or near free to use, as they inevitably increase the quality and range of the services they offer. Some of these will mirror the services provided by incumbent publishers.
A reasonable current lower bound for realisable added value by publication service providers is ~$1000 per paper. This is likely to drop as market pressures come to bear and existing archives and repositories seek to provide a wider range of low cost services.
Where does this leave us? Not with a clear numerical value we can ascribe to this added value, but that’s always going to be a moving target. We can, however, get some sense of the bottom end of the range. It’s currently $1000 or greater, at least in some disciplines, but is likely to go down. It’s also likely to diversify as new providers offer subsets of the services currently offered as one indivisible lump. At the top end, the actions of both customers and service providers suggest they believe that the added value is less than what we currently pay, and that it is only artificial controls over access to the non-value-added versions that justify the current price. What we need is a better articulation of the real value that publishers add, and an honest conversation about what we are prepared to pay for it.
I’m afraid I went to bed. It was getting on for midnight and it looked like another four hours or so before the petition would reach the magic mark of 25,000 signatures. As it turns out a final rush put us across the line at around 2am my time, but never mind, I woke up wondering whether we had got there, headed for the computer and had a pleasant surprise waiting for me.
What does this mean? What have John Wilbanks, Heather Joseph, Mike Carroll, and Mike Rossner achieved by deciding to push through what was a real hard slog? And what about all those people and groups involved in getting signatures in? I think there are maybe three major points here.
Access to Research is now firmly on the White House (and other governments’) agenda
The petition started as a result of a meeting between the Access2Research founders and John Holdren from the White House. John Wilbanks has written about how the meeting went and what the response was. The US administration has sympathy and understands many of the issues. However it must be hard to make the case that this was something worth the bandwidth it would take to drive a policy initiative. Especially in an election year. The petition and the mechanism of the “We the people” site has enabled us to show that it is a policy item that generates public interest, but more importantly it creates an opportunity for the White House to respond. It is worth noting that this has been one of the more successful petitions. Reaching the 25k mark in two weeks is a real achievement, and one that has got the attention of key people.
And that attention spreads globally as well. The Finch Report on mechanisms for improving access to UK research outputs will probably not mention the petition, but you can bet that those within the UK government involved in implementation will have taken note. Similarly as the details of the Horizon2020 programme within the EU are hammered out, those deciding on the legal instruments that will distribute around $80B, will have noted that there is public demand, and therefore political cover, to take action.
The Open Access Movement has a strong voice, and a diverse network, and can be an effective lobby
It is easy, as we all work towards the shared goal of enabling wider access and the full exploitation of web technologies, to get bogged down in details and to focus on disagreements. What this effort showed was that when we work together we can muster the connections and the network to send a very strong message. And that message is stronger for coming from diverse directions in a completely transparent manner. We have learnt the lessons that could be taken from the fight against SOPA and PIPA and refined them in the campaign to defeat, in fact to utterly destroy, the Research Works Act. But this was not a reaction, and it was not merely a negative campaign. This was a positive campaign, originating within the movement, which together we have successfully pulled off. There are lessons to be learnt. Things we could have done better. But what we now know is that we have the capacity to take on large scale public actions and pull them off.
The wider community wants access and has a demonstrated capacity to use it
There has in the past been an argument that public access is not useful because “they can’t possibly understand it”, that “there is no demand for public access”. That argument has been comprehensively and permanently destroyed. It was always an arrogant argument, and in my view a dangerous one for those with a vested interest in ensuring continued public funding of research. The fact that it had strong parallels with the arguments deployed in the 18th and 19th centuries that colonists, or those who did not own land, or women, could not possibly be competent to vote should have been enough to warn people off using it. The petition has shown demand, and the stories that have surfaced through this campaign show not only that there are many people who are not professional researchers who can use research, but that many of these people also want, and are more than capable of, contributing back to the professional research effort.
The campaign has put the ideas of Open Access in front of more people than perhaps ever before. We have reached out to family, friends, co-workers, patients, technologists, entrepreneurs, medical practitioners, educators, and people just interested in the world around them. Perhaps one in ten of them actually signed the petition, but many of them will have talked to others, spreading the ideas. This is perhaps one of the most important achievements of the petition. Getting the message and the idea out in front of hundreds of thousands of people who may not take action today, but will now be primed to see the problems that arise from a lack of access, and the opportunities that could be created through access.
Where now?
So what are our next steps? Continuing to gain signatures over the next two weeks is still important. This may be one of the most rapidly growing petitions, but showing continued growth is still valuable. More generally, my sense is that we need to take stock and look forward to the next phase. The really hard work of implementation is coming. As a movement we still disagree strongly on elements of tactics and strategy. The tactics I am less concerned about: we can take multiple paths, applying pressure at multiple points, and this will be to our advantage. But I think we need a clearer goal on strategy. We need to articulate what the endgame is. What is the vision? When will we know that we have achieved what we set out to do?
Peter Murray-Rust has already quoted Churchill but it does seem apposite. “…this is not the end. This is not even the beginning of the end. But it is, perhaps, the end of the beginning.”
We now know how much we can achieve when we work together with a shared goal. The challenge now is to harness that to a shared understanding of the direction of travel, if perhaps not the precise route. If we, with all the diversity of needs and views that this movement contains, can find the core of goals that we all agree on, then what we now know is that we have the capacity, the depth, and the strength to achieve them.
Changing the world is hard. Who knew? Advocating for change can be lonely. It can also be hard. As a scholar, particularly one at the start of a career, it is still hard to commit fully to ensuring that research outputs are accessible and re-useable. But we are reaching a point where support for Open Access is mainstream, where there is a growing public interest in greater access to research, and where there is increasingly serious engagement with the policy issues at the highest level.
The time has come to show just how strong that support is. As of today there is a petition on the White House site calling for the Executive to mandate Open Access to the literature generated from US Federal funding. If the petition reaches 25,000 signatures within 30 days then the White House is committed to respond. The Executive has been considering the issues of access to research publications and data, and with FRPAA active in both houses there are multiple routes available to enact change. If we can demonstrate widespread and diverse support for Open Access, then we will have made the case for that change. This is a real opportunity for each and every one of us to make a difference.
So go to the Access2Research Petition on whitehouse.gov and sign up now. Blog and tweet using the hashtag #OAMonday and let’s show just how wide the coalition is. Go to the Access2Research website to learn more. Post the site link to your community to get people involved.
I’ll be honest. The White House petition site isn’t great – this isn’t a 30 second job. But it shouldn’t take you more than five minutes. You will need to give a real name and an email address and go through a validation process via email. You don’t need to be a US citizen or resident. Obviously if you give a US zip code it is likely that more weight will be given to your signature, but don’t be put off if you are not in the US. Once you have an account, signing the petition is a simple matter of clicking a single button. The easiest approach is to go to the Open Access petition and sign up for an account from there. Once you get the validation link via email you will be taken back to the petition.
The power of Open Access will only be unlocked through networks of people using, re-using, and re-purposing the outputs of research. The time has come to show just how broad and diverse that network is. Please take the time as one single supporter of Open Access to add your voice to the thousands of others who will be signing with you. And connect to your network to tell them how important it is for them to add their voice as well.
Yesterday David Willetts, the UK Science and Universities Minister, gave a speech to the Publishers Association that has got wide coverage. However it is worth pulling apart both the speech and the accompanying opinion piece in the Guardian, because there are some interesting elements in there, and some things have got a little confused.
The first really key point is that there is nothing new here. This is basically a re-announcement of the previous position from the December Innovation Strategy on moving towards a freely accessible literature and a more public announcement of the Gateway to Research project previously mentioned in the RCUK response to the Innovation Statement.
The Gateway to Research project is a joint venture of the Department for Business, Innovation and Skills and Research Councils UK to provide a one-stop shop for information on UK research funding, as well as pointers to outputs. It will essentially draw information directly from sources that already exist (the Research Outputs System and eVal) as well as some new ones, with the intention of helping the UK public and enterprise find research and researchers that are of interest to them, and see how they are funded.
The new announcement was that Jimmy Wales of Wikipedia fame will be advising on the GTR portal. This is a good thing: he is well placed to provide both technical and social expertise on the provision of public-facing information portals, as well as a more radical perspective than might come out of BIS itself. While this might in part be cynically viewed as another example of bringing in celebrities to advise on policy, this is a celebrity with relevant expertise and real credibility based on making similar systems work.
The rest of the information that we can gather relates to government efforts in moving towards making the UK research literature accessible. Wales also gets a look in here, and will be “advising us on [..] common standards to ensure information is presented in a readily reusable form”. My reading of this is that the Minister understands the importance of interoperability and my hope is that this will mean that government is getting good advice on appropriate licensing approaches to support this.
However, many have read this section of the speech as saying that GTR will act as some form of national repository for research articles. I do not believe this is the intention, and reading between the lines the comment that it will “provide direct links to actual research outputs such as data sets and publications” [my emphasis] is the key. The point of GTR is to make UK research more easily discoverable. Access is a somewhat orthogonal issue. This is better read as an expression of Willetts’ and the wider government’s agenda on transparency of public spending than as a mechanism for providing access.
What else can we tell from the speech? Well the term “open access” is used several times, something that was absent from the innovation statement, but still the emphasis is on achieving “public access” in the near term with “open access” cast as the future goal as I read it. It’s not clear to me whether this is a well informed distinction. There is a somewhat muddled commentary on Green vs Gold OA but not that much more muddled than what often comes from our own community. There are also some clear statements on the challenges for all involved.
As an aside I found it interesting that Willetts gave a parenthetical endorsement of usage metrics for the research literature when speaking of his own experience.
As well as reading some of the articles set by my tutors, I also remember browsing through the pages of the leading journals to see which articles were well-thumbed. It helped me to spot the key ones I ought to be familiar with – a primitive version of crowd-sourcing. The web should make that kind of search behaviour far easier.
This is the most sophisticated appreciation of the potential for the combination of measurement and usage data in discovery that I have seen from any politician. It needs to be set against his endorsement of rather cruder filters earlier in the speech but it nonetheless gives me a sense that there is a level of understanding within government that is greater than we often fear.
Much of the rest of the speech is hedging. Options are discussed but not selected and certainly not promoted. The key message: wait for the Finch Report which will be the major guide for the route the government will take and the mechanisms that will be put in place to support it.
But there are some clearer statements. There is a strong sense that the Hargreaves recommendations on enabling text mining should be implemented. And the logic for this is well laid out. The speech and the policy agenda are embedded in a framework of enabling innovation – making it clear what kinds of evidence and argument we will need to marshal in order to persuade. There is also a strong emphasis on data, as well as an appreciation that there is much to do in this space.
But the clearest statement made here is on the end goals. No-one can be left in any doubt of Willetts’ ultimate target. Full access to the outputs of research, ideally at the time of publication, in a way that enables them to be fully exploited, manipulated and modified for any purpose by any party. Indeed the vision is strongly congruent with the Berlin, Bethesda, and Budapest declarations on Open Access. There is still much to be argued about the route and its length, but in the UK at least, the destination appears to be in little doubt.
I attended the first Sage Bionetworks Congress in 2010 and it left a powerful impression on my thinking. I have just attended the third congress in San Francisco and again the challenging nature of views, the real desire to make a difference, and the standard of thinking in the room will take me some time to process. But a series of comments, and soundbites over the course of the meeting have made me realise just how seriously bad our situation is.
Attempts by a variety of big pharma companies to replicate disease-relevant results published by academic labs failed in ~80% of cases (see for instance this story about this commentary in Nature [$])
When a particular blood cancer group was asked what aspect of their disease mattered most to them, they said gastro-intestinal problems. No health professional had ever even considered this as a gastro-intestinal disease.
A cancer patient, advocate, and fundraiser of 25 years standing said the following to me: “We’ve been at this for 25 years, we’ve raised over $2B for research, and new patients today get the same treatment I did. What’s the point?”
In a room full of very smart people absolutely committed to making a difference there were very few new ideas on how we actually cut through the thicket of perverse incentives, institutional inertia, disregard for replicability, and personal ego-stroking which is perpetuating these problems. I’ve been uncertain for some time whether change from within our existing structures and systems is really possible. I’m leaning further and further to the view that it is not. That doesn’t mean that we can’t do anything – just that it may be more effective to simply bypass existing institutions to do it.
As a child I was very clear I wanted to be a scientist. I am not sure exactly where the idea came from. In part I blame Isaac Asimov but it must have been a combination of things. I can’t remember not having a clear idea of wanting to go into research.
I started off a conventional career with big ideas – understanding the underlying physics, chemistry, and information theory that limit molecular evolution – but my problem was always that I was interested in too many things. I kept getting distracted. Along with this I also started to wonder how much of a difference the research I was doing was really making. This led to a shift towards working on methods development – developing tools that would support many researchers to do better and more efficient work. In turn it led to my current position, with the aim of developing the potential of neutron scattering as a tool for the biosciences. I got gradually more interested in the question of how to make the biggest difference I could, rather than just pursuing one research question.
And at the same time I was developing a growing interest in the power of the web and how it had the potential, as yet unrealized, to transform the effectiveness of the research community. This has grown from side interest to hobby to something like a full-time job, on top of the other full-time job I have. This wasn’t sustainable. At the same time I’ve realized I am pretty good at strategy, advocacy, speaking, and writing; at articulating a view of where we might go and how we might get there. In this space I can make a bigger difference. If we can increase the efficiency of research by just 5%, reduce the time for the developing world to bring a significant research capacity on stream by just a few years, give a few patients better access to information, or increase the wider public interest and involvement in science just a small amount, then this will be a far greater good than I could possibly achieve doing my own research.
PLoS is an organization that right from the beginning has had a vision, not just of making research papers more accessible but of transforming research communication, of making it ready for, and of, the 21st century. This is a vision I share and one that I am very excited to be playing a part in.
In the new role I will obviously be doing a lot of advocacy, planning, speaking, and writing on open access. There is a lot to play for over the next few years with FRPAA in the US, new policies being developed in Europe, and a growing awareness of the need to think hard about data as a form of publication. But I will also be taking the long view, looking out on a ten year horizon to try and identify the things we haven’t seen yet, the opportunities that are already there and how we can navigate a path between them. Again there is huge potential in this space, gradually turning from ideas and vaporware into real demos and even products.
The two issues, near term policy and longer term technical development are inextricably linked. The full potential of networked research cannot be realized except in a world of open content, open standards, APIs, process, and data. Interoperability is crucial, technical interoperability, standards interoperability, social interoperability, and legal interoperability. It is being at the heart of the community that is working to link these together and make them work that really excites me about this position.
PLoS has been an engine of innovation since it was formed, changing the landscape of scholarly publishing in a way that no-one would have dreamed was possible. Some have argued that this hasn’t been so much the case in the last few years. But really things have just been quiet, plans have been laid, and I think you will find the next few years exciting.
Inevitably, I will be leaving some things behind. I won’t be abandoning research completely – I hope to keep my toe in a range of projects – but I will be scaling back a lot. I will be stepping down as an Academic Editor for PLoS ONE (and apologies for all those reviews and editorial requests for PLoS ONE that I’ve turned down in the last few months) because this would be a clear conflict of interest. I’ve got a lot to clear up before July.
I will be sad to leave behind some of those roles but above all I am excited and looking forward to working in a great organisation, with people I respect doing things I believe are important. Up until now I’ve been trying to fit these things in, more or less as a hobby around the research. Now I can focus on them full time, while still staying at least a bit connected. It’s a big leap for me, but a logical step along the way to trying to make a difference.
…although some are perhaps starting to see the problems that are going to arise.
Last week I spoke at a Question Time style event held at Oxford University and organised by Simon Benjamin and Victoria Watson called “The Scientific Evolution: Open Science and the Future of Publishing” featuring Tim Gowers (Cambridge), Victor Henning (Mendeley), Alison Mitchell (Nature Publishing Group), Alicia Wise (Elsevier), and Robert Winston (mainly in his role as TV talking head on science issues). You can get a feel for the proceedings from Lucy Pratt’s summary but I want to focus on one specific issue.
As is common for me recently I emphasised the fact that networked research communication needs to be different to what we are used to. I made a comparison to the fact that when the printing press was developed one of the first things that happened was that people created facsimiles of hand written manuscripts. It took hundreds of years for someone to come up with the idea of a newspaper and to some extent our current use of the network is exactly that – digital facsimiles of paper objects, not truly networked communication.
It’s difficult to predict exactly what form a real networked communication system will take, in much the same way that asking a 16th century printer how newspaper advertising would work would not have produced a detailed and accurate answer, but there are some principles of successful network systems that we can see emerging. Effective network systems distribute control and avoid centralisation; they are loosely coupled and distributed. Very different from the centralised systems for control of access that we have today.
This is a difficult concept, and one that scholarly publishers for the most part simply don’t get. This is not particularly surprising, because truly disruptive innovation rarely comes from incumbent players. Large and entrenched organisations don’t generally enable the kind of thinking that is required to see the new possibilities. This is seen in publishers’ statements that they are providing “more access than ever before” via “more routes”, but all routes that are under tight centralised control, with control systems that don’t scale. By insisting on centralised control over access, publishers are setting themselves up to fail.
Nowhere is this going to play out more starkly than in the area of text mining. Bob Campbell from Wiley-Blackwell walked into this – though few noticed it – with the now familiar claim that “text mining is not a problem because people can ask permission”. Centralised control, failure to appreciate scale, and failure to understand the necessity of distribution and distributed systems. I have with me a device capable of holding the text of perhaps 100,000 papers. It also has the processor power to mine that text. It is my phone. In 2-3 years our phones, hell our watches, will have the capacity not only to hold the world’s literature but also to mine it, in context, for what I want right now. Is Bob Campbell ready for every researcher, indeed every interested person in the world, to come into his office and discuss an agreement for text mining? Because the mining I want to do and the mining that Peter Murray-Rust wants to do will be different, and what I will want to do tomorrow is different from what I want to do today. This kind of personalised mining is going to be the accepted norm of handling information online very soon, and it will be at the very centre of how we discover the information we need. Google will provide a high quality service for free; subscription-based scholarly publishers will charge an arm and a leg for a deeply inferior one – because Google is built to exploit network scale.
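As a concrete illustration of what “personal” mining might look like, here is a minimal sketch in Python that counts occurrences of a few query terms across a locally held corpus of plain-text papers. The directory name and the query terms are purely illustrative assumptions; the point is only that this needs nothing more than the storage and processor already in your pocket, and certainly no centrally negotiated permission.

```python
# Minimal sketch of "personal" text mining over a locally held corpus:
# a simple term-frequency count, assuming papers are plain-text files on the device.
# The directory path and query terms below are illustrative assumptions.
import os
import re
from collections import Counter

def mine_corpus(directory, terms):
    """Count how often each query term appears across all .txt files in a directory."""
    counts = Counter()
    for name in os.listdir(directory):
        if not name.endswith(".txt"):
            continue
        with open(os.path.join(directory, name), encoding="utf-8", errors="ignore") as f:
            text = f.read().lower()
        for term in terms:
            counts[term] += len(re.findall(r"\b" + re.escape(term) + r"\b", text))
    return counts

# print(mine_corpus("papers/", ["apoptosis", "autophagy"]))  # the query changes day to day
```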
The problem of scale has also just played out in fact. Heather Piwowar writing yesterday describes a call with six Elsevier staffers to discuss her project and needs for text mining. Heather of course now has to have this same conversation with Wiley, NPG, ACS, and all the other subscription based publishers, who will no doubt demand different conditions, creating a nightmare patchwork of different levels of access on different parts of the corpus. But the bit I want to draw out is at the bottom of the post where Heather describes the concerns of Alicia Wise:
At the end of the call, I stated that I’d like to blog the call… it was quickly agreed that was fine. Alicia mentioned her only hesitation was that she might be overwhelmed by requests from others who also want text mining access. Reasonable.
Except that it isn’t. It’s perfectly reasonable for every single person who wants to text mine to want a conversation about access. Elsevier, because they demand control, have set themselves up as the bottleneck. This is really the key point: because the subscription business model implies an imperative to extract income from all possible uses of the content, it sets up a need to control access for differential uses. This means in turn that each different use, and especially each new use, has to be individually negotiated, usually by humans, apparently about six of them. This will fail because it cannot scale in the same way that the demand will.
The technology exists today to make this kind of mass distributed text mining trivial. Publishers could push content to BitTorrent servers and then publish regular deltas to notify users of new content. The infrastructure for this already exists; there is no infrastructure investment required. The problem that publishers raise of their servers not coping is one they have created for themselves. The catch is that distributed systems can’t be controlled from the centre, and giving up control requires a different business model. But this is also an opportunity. The publishers also save money if they give up control – no more need for six people to sit in on each of hundreds of thousands of meetings. I often wonder how much lower subscriptions would be if they didn’t need to cover the cost of access control, sales, and legal teams.
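To make the delta idea slightly more concrete, here is a minimal sketch assuming a hypothetical record format: the publisher keeps an append-only log of new items, and consumers request only what they have not yet seen, fetching the content itself over whatever distributed channel (BitTorrent or otherwise) is convenient. None of this reflects an existing publisher service; the field names, identifiers, and placeholder values are illustrative assumptions.

```python
# Minimal sketch of the "publish regular deltas" idea: the publisher exposes a small
# manifest of items added since a given sequence number, and consumers sync from it.
# The record structure, identifiers, and checksums are illustrative assumptions.
import json

corpus_log = [
    {"seq": 1, "id": "10.0000/example.0001", "torrent": "magnet:?xt=urn:btih:...", "sha256": "..."},
    {"seq": 2, "id": "10.0000/example.0002", "torrent": "magnet:?xt=urn:btih:...", "sha256": "..."},
]

def delta_since(log, last_seen_seq):
    """Return only the records a consumer has not yet fetched."""
    return [record for record in log if record["seq"] > last_seen_seq]

# A consumer that last synced at seq 1 only needs to fetch the second item.
print(json.dumps(delta_since(corpus_log, last_seen_seq=1), indent=2))
```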
We are increasingly going to see these kinds of failures. Legal and technical incompatibility of resources, contractual requirements at odds with local legal systems, and above all the claim “you can just ask for permission” without the backing of the hundreds or thousands of people that would be required to provide a timely answer. And that’s before we deal with the fact that the most common answer will be “mumble”. A centralised access control system is simply not fit for purpose in a networked world. As demand scales, people making legitimate requests for access will have the effect of a distributed denial of service attack. The clue is in the name; the demand is distributed. If the access control mechanisms are manual, human and centralised, they will fail. But if that’s what it takes to get subscription publishers to wake up to the fact that the networked world is different then so be it.
It’s one of those throwaway lines, “Before we can talk about a GitHub for science we really need to sort out a TCP/IP for science”, that’s geeky, sharp, a bit needly, and goes down a treat on Twitter. But there is a serious point behind it. And it’s not intended to be dismissive of the ideas that are swirling around about scholarly communication at the moment either. So it seems worth exploring in a bit more detail.
The line is stolen almost wholesale from John Wilbanks who used it (I think) in the talk he gave at a Science Commons meetup in Redmond a few years back. At the time I think we were awash in “Facebooks for Science” so that was the target but the sentiment holds. As once was the case with Facebook and now is for GitHub, or Wikipedia, or StackOverflow, the possibilities opened up by these new services and technologies to support a much more efficient and effective research process look amazing. And they are. But you’ve got to be a little careful about taking the analogy too far.
If you look at what these services provide, particularly those that are focused on coding, they deliver commentary and documentation, nearly always in the form of text about code – which is also basically text. The web is very good at transferring text, and code, and data. The stack that delivers this is built on a set of standards, with each layer building on the layer beneath it. StackOverflow and Github are built on a set of services, that in turn sit on top of the web standards of http, which in turn are built on network standards like TCP/IP that control the actual transfer of bits and bytes.
The fundamental stuff of these coding sites and Wikipedia is text, and text is really well supported by the stack of web technologies. Open Source approaches to software development didn’t just develop because of the web, they developed the web, so it’s not surprising that they fit well together. They grew up together and nurtured each other. But the bottom line is that the stack is optimized to transfer the grains of material, text and code, that make up the core of these services.
When we look at research we can see that, when we dig down to the granular level, it isn’t just made up of text. Sure, most research could be represented as text, but we don’t have the standardized forms to do this. We don’t have standard granules of research that we can transfer from place to place. This is because it’s complicated to transfer the stuff of research. I picked on TCP/IP specifically because it is the transfer protocol that supports moving bits and bytes from one place to another. What we need are protocols that support moving the substance of a piece of my research from one place to another.
Work on Research Objects [see also this paper], intended to be self-contained but usable pieces of research, is a step in this direction, as is the developing set of workflow tools that will ultimately allow us to describe and share the process by which we’ve transformed at least some parts of the research process into others. Laboratory recording systems will help us to capture and workflow-ify records of the physical parts of the research process. But until we can agree how to transfer these in a standardized fashion, I think it is premature to talk about GitHubs for research.
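To give a flavour of what a standardized “granule” might involve, here is a minimal sketch of bundling the pieces of a small study into a single self-contained package with a manifest and checksums. This is only an illustration of the general idea; it is not the Research Object specification, and the file names and metadata fields are assumptions.

```python
# Minimal sketch of packaging a piece of research as a self-contained bundle with a
# manifest. An illustration of the idea only, not the Research Object specification;
# file names and metadata fields are assumptions.
import hashlib
import json
import zipfile

def checksum(path):
    """SHA-256 of a file, so the receiver can verify the transferred parts."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

def bundle(files, metadata, out_path="research_bundle.zip"):
    """Zip the listed files together with a manifest describing them."""
    manifest = {
        "metadata": metadata,
        "files": [{"name": p, "sha256": checksum(p)} for p in files],
    }
    with zipfile.ZipFile(out_path, "w") as zf:
        for p in files:
            zf.write(p)
        zf.writestr("manifest.json", json.dumps(manifest, indent=2))
    return out_path

# bundle(["data.csv", "analysis.py"], {"title": "Example study", "creator": "A. Researcher"})
```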
Now there is a flip side to this, which is that where there are such services that do support the transfer of pieces of the research process, we absolutely should be experimenting with them. But in most cases the type-case itself will do the job. GitHub is great for sharing research code, and some people are doing terrific things with data there as well. But if it does the job for those kinds of things, why do we need a separate one for researchers? The scale that the consumer web brings, and the exposure to a much bigger community, is a powerful counter-argument to building things ‘just for researchers’. To justify a service focused on a small community you need to have very strong engagement or very specific needs. By the time a mainstream service has mindshare and researchers are using it, your chances of pulling them away to a new service just for them are very small.
So yes, we should be inspired by the possibilities that these new services open up, and we should absolutely build and experiment. But while we are at it, can we also focus on the lower levels of the stack? They aren’t as sexy and they probably won’t make anyone rich, but we’ve got to get serious about the underlying mechanisms that will transfer our research in comprehensible packages from one place to another.
We have to think carefully about capturing the context of research and presenting that to the next user. GitHub works in large part because the people using it know how to use code, can recognize specific languages, and know how to drive it. It’s actually pretty poor for the user who just wants to do something – we’ve had to build up another set of services at different levels (the Python Package Index, tools for making and distributing executables) that help provide the context required for different types of user. This is going to be much, much harder for all the different types of use we might want to put research to.
But if we can get this right – if we can standardize transfer protocols and build the context of the research into those ‘packets’ so that people can use it – then what we have seen on the wider web will happen naturally. As we build the stack up, the services that seem so hard to build at the moment will become as easy as throwing up a blog, downloading a rubygem, or firing up a machine instance is today. If we can achieve that then we’ll have much more than a GitHub for research; we’ll have a whole web for research.
There’s nothing new here that wasn’t written some time ago by John Wilbanks and others but it seemed worth repeating. In particular I recommend these posts [1, 2] from John.
A few weeks ago I attended a workshop run by the ESRC Genomics Forum in Edinburgh which brought together humanists, social scientists, and science focused folks with an interest in how open approaches can and should be applied to genomic science. This was interesting on a number of levels but I was especially interested in the comments of Marina Levina on citizenship. In particular she asked the question “what are the civic responsibilities of a network citizen?”
Actually she asked me this question several times and it took me until quite late in the day to really understand what she meant. I initially answered with reference to Clay Shirky on the rise of creative contribution on the web as if just making stuff was all that a citizen need do but what Marina was getting at was a deeper question about a shared sense of responsibilities.
Citizenship as a concept is a vexed question and there are a range of somewhat incompatible philosophical approaches to describing and understanding it. For my purposes here I want to focus on citizenship as a sense of belonging to a group with shared values and resources, and rights to access those resources. Traditionally these allegiances lie with the nation state but, while nationalism is undeniably on the rise, there seems to be a growing group of us who have a patchwork of citizenships with different groups and communities.
Many of these communities live on the web and benefit from the use of the internet as a sort of commons. At the same time there has been a growing sense of behavioural norms and responsibilities in some parts of the social web: a sophisticated sense of identity, the responsibility to mark spam for takedown, a dedication to broad freedom of expression, perhaps even a growing understanding of the tensions between that freedom and “civility”.
In the context of research on the web we have often talked about the value of “norms” of behaviour as a far better mechanism for regulation than licences and legal documents. A sense of belonging to a community, of being a citizen, and the consequent risk of exclusion for bad behaviour is a powerful encouragement to adhere to those norms, even if that exclusion is just being shunned. Of course such enforcement can lead to negative consequences as well as positive but I would argue that in our day to day activities in most cases an element of social pressure has a largely positive effect.
A citizen has a responsibility to contribute to the shared resources that support the community. In a nation state we pay taxes, undertake jury duty, vote in elections. What are the contributions expected of a network citizen? Taking one step back, what are those shared resources? The internet and the underlying framework of the web are one set of resources. Of course these are resources that lie at the intersection of our traditional states, as physical and commercial resources, and our network society. In this context the protests against SOPA, PIPA, and ACTA might be seen as the citizens of the network attending a rally, perhaps even mobilizing our “military” if only to demonstrate their capacity.
But the core resources of the network are the nodes on the network and the connections between them. The people, information resources, and tools make up the nodes, and the links connecting them are what actually makes them usable. As citizens of the network our contribution is to make these links, to tend the garden of resources, to build tools. Above all our civic duty is to share.
It is a commonly made point that with digital resources being infinitely copyable there is no need for a tragedy of the commons. But there is a flip side to this – when we think of physical commons we often think of resources that don’t need active maintenance. As long as they are properly managed, not over-grazed or polluted, there is a sense that these physical commons will be ok. The digital commons requires constant maintenance. As an information resource it needs to be brought up to date. And with these constant updates the tools and resources need to be constantly checked for interoperability.
Maintaining these resources requires work. It requires money and it requires time. The active network citizen contributes to these resources, modifying content, adding links, removing vandalism. In exchange for this the active network citizen obtains influence – not dissimilar to getting to vote in elections – in those discussions about norms and behaviour. But the core civic duty is to share, with the expectation that other citizens, in their turn, will share back; that working together as a community the citizenry will build, maintain, and strengthen the civic institutions of the network.
This analysis scales beyond individual people to organizations. Wikipedia is an important civic institution of the network, one that accepts a tithe from the active citizen in the form of time and eyeballs but which gives much back to the community in the form of links and high quality resources. Google accepts the links we make and gives back search results, but isn’t always quite such a good citizen, breaking standards and removing the RSS feeds that could be used by others. Facebook? Well, the less said the better. But good citizens will both take what they need from the pool of resources and contribute effectively back to the common institutions, those aggregation points for resources and tools that make the network an attractive place to live and work.
And I use “work” advisedly, because a core piece of the value of the network is the ability for citizens to use it to do their jobs, for it to be a source of resources, tools, and expertise that can be used by people to make a living. And the quid pro quo is that the good citizen contributes back resources that others might use to make money. In a viable community with a viable commons there will be money, or its equivalent, being generated and spent. A networked community will encourage its citizens to generate value because this floats all boats higher. In return for taking value out of the system the good citizen will contribute it back. But they will do this as a matter of principle, as part of their social contract, not because a legal document tells them to. Indeed, requiring someone to do something actually reduces the sense of community, the valuing of good practice, that makes a healthy society.
When I first applied the ccZero waiver to this blog I didn’t really think deeply about what I was doing. I wanted to make a point. I wanted my work to be widely shared and I wanted to make it as easily shareable as I could. In retrospect I can see I was making a statement about the networked world I wanted to work in, one in which people actively participate in building a better network. I was making the point that I didn’t just want to consume and benefit from the content, links, and resources that other people had created, I wanted to give back. And I have benefited, commercially, in the form of consultancies and grants, and simply the opportunities that have opened up for me as a result of reading and conversing about the work of other people.
My current life and work would be unthinkable without the network and the value I have extracted from it. In return it is clear to me that I need to give back in the form of resources that others are free to use, and to exploit, even to make money off them. There may be a risk of enclosure, although I think it small, but my choice as a citizen is to be clear about what I expect of other citizens, not to attempt to enforce my beliefs about good behaviour through legal documents but through acting to build up and support the community of good citizens.
Dave White has talked and written about the distinction between visitors and residents in social networks, the experience they bring and the experience they have. I think there is a space, indeed a need, to recognize that there is another group beyond those who simply inhabit online spaces. Those of us who want to build a sustainable networked society should identify ourselves, our values, and our expectations of others. Our networked world needs citizens as well.