I’m afraid I went to bed. It was getting on for midnight and it looked like another four hours or so before the petition would reach the magic mark of 25,000 signatures. As it turns out, a final rush put us across the line at around 2am my time. Never mind: I woke up wondering whether we had got there, headed for the computer, and found a pleasant surprise waiting for me.
What does this mean? What have John Wilbanks, Heather Joseph, Mike Carroll, and Mike Rossner achieved by deciding to push through what was a real hard slog? And what about all those people and groups involved in getting signatures in? I think there are maybe three major points here.
Access to Research is now firmly on the White House (and other governments’) agenda
The petition started as a result of a meeting between the Access2Research founders and John Holdren from the White House. John Wilbanks has written about how the meeting went and what the response was. The US administration has sympathy and understands many of the issues. However, it must have been hard to make the case that this was something worth the bandwidth it would take to drive a policy initiative, especially in an election year. The petition, and the mechanism of the “We the People” site, has enabled us to show that this is a policy item that generates public interest, but more importantly it creates an opportunity for the White House to respond. It is worth noting that this has been one of the more successful petitions. Reaching the 25k mark in two weeks is a real achievement, and one that has got the attention of key people.
And that attention spreads globally as well. The Finch Report on mechanisms for improving access to UK research outputs will probably not mention the petition, but you can bet that those within the UK government involved in implementation will have taken note. Similarly, as the details of the Horizon2020 programme within the EU are hammered out, those deciding on the legal instruments that will distribute around €80B will have noted that there is public demand, and therefore political cover, to take action.
The Open Access Movement has a strong voice, and a diverse network, and can be an effective lobby
It is easy, as we all work towards the shared goal of enabling wider access and the full exploitation of web technologies, to get bogged down in details and to focus on disagreements. What this effort showed was that when we work together we can muster the connections and the network to send a very strong message. And that message is stronger for coming from diverse directions in a completely transparent manner. We have learnt the lessons that could be taken from the fight against SOPA and PIPA and refined them in the campaign to defeat, in fact to utterly destroy, the Research Works Act. But this was not a reaction, and it was not merely a negative campaign. This was a positive campaign, originating within the movement, which together we have successfully pulled off. There are lessons to be learnt. Things we could have done better. But what we now know is that we have the capacity to take on large scale public actions and pull them off.
The wider community wants access and has a demonstrated capacity to use it
There has in the past been an argument that public access is not useful because “they can’t possibly understand it”, that “there is no demand for public access”. That argument has been comprehensively and permanently destroyed. It was always an arrogant argument, and in my view a dangerous one for those with a vested interest in ensuring continued public funding of research. The fact that it had strong parallels with the arguments deployed in the 18th and 19th centuries that colonists, or those who did not own land, or women, could not possibly be competent to vote should have been enough to warn people off using it. The petition has shown demand, and the stories that have surfaced through this campaign show not only that there are many people who are not professional researchers who can use research, but that many of these people also want to contribute back to the professional research effort, and are more than capable of doing so.
The campaign has put the ideas of Open Access in front of more people than perhaps ever before. We have reached out to family, friends, co-workers, patients, technologists, entrepreneurs, medical practitioners, educators, and people just interested in the world around them. Perhaps one in ten of them actually signed the petition, but many of them will have talked to others, spreading the ideas. This is perhaps one of the most important achievements of the petition. Getting the message and the idea out in front of hundreds of thousands of people who may not take action today, but will now be primed to see the problems that arise from a lack of access, and the opportunities that could be created through access.
Where now?
So what are our next steps? Continuing to gain signatures for the next two weeks is still important. This may be one of the most rapidly growing petitions, but showing continued growth is still valuable. More generally, though, my sense is that we need to take stock and look forward to the next phase. The really hard work of implementation is coming. As a movement we still disagree strongly on elements of tactics and strategy. The tactics I am less concerned about: we can take multiple paths, applying pressure at multiple points, and this will be to our advantage. But I think we need a clearer goal on strategy. We need to articulate what the endgame is. What is the vision? When will we know that we have achieved what we set out to do?
Peter Murray-Rust has already quoted Churchill but it does seem apposite. “…this is not the end. This is not even the beginning of the end. But it is, perhaps, the end of the beginning.”
We now know how much we can achieve when we work together with a shared goal. The challenge now is to harness that to a shared understanding of the direction of travel, if perhaps not the precise route. But if we, with all the diversity of needs and views that this movement contains, can find the core of goals that we all agree on, then what we now know is that we have the capacity, the depth, and the strength to achieve them.
Changing the world is hard. Who knew? Advocating for change can be lonely. It can also be hard. As a scholar, particularly one at the start of a career, it is still hard to commit fully to ensuring that research outputs are accessible and re-usable. But we are reaching a point where support for Open Access is mainstream, where there is a growing public interest in greater access to research, and increasingly serious engagement with the policy issues at the highest level.
The time has come to show just how strong that support is. As of today there is a petition on the White House site calling for the Executive to mandate Open Access to the literature generated from US Federal funding. If the petition reaches 25,000 signatures within 30 days then the White House is committed to responding. The Executive has been considering the issues of access to research publications and data, and with FRPAA active in both houses there are multiple routes available to enact change. If we can demonstrate widespread and diverse support for Open Access, then we will have made the case for that change. This is a real opportunity for each and every one of us to make a difference.
So go to the Access2Research Petition on whitehouse.gov and sign up now. Blog and tweet using the hashtag #OAMonday and let’s show just how wide the coalition is. Go to the Access2Research Website to learn more. Post the site link to your community to get people involved.
I’ll be honest. The White House petition site isn’t great – this isn’t a 30-second job. But it shouldn’t take you more than five minutes. You will need to give a real name and an email address and go through a validation process via email. You don’t need to be a US citizen or resident. Obviously if you give a US zip code it is likely that more weight will be given to your signature, but don’t be put off if you are not in the US. Once you have an account, signing the petition is a simple matter of clicking a single button. The easiest approach will be to go to the Open Access petition and sign up for an account from there. Once you get the validation link via email you will be taken back to the petition.
The power of Open Access will only be unlocked through networks of people using, re-using, and re-purposing the outputs of research. The time has come to show just how broad and diverse that network is. Please take the time as one single supporter of Open Access to add your voice to the thousands of others who will be signing with you. And connect to your network to tell them how important it is for them to add their voice as well.
Yesterday David Willetts, the UK Science and Universities Minister gave a speech to the Publishers Association that has got wide coverage. However it is worth pulling apart both the speech and the accompanying opinion piece from the Guardian because there are some interesting elements in there, and also some things have got a little confused.
The first really key point is that there is nothing new here. This is basically a re-announcement of the previous position from the December Innovation Strategy on moving towards a freely accessible literature and a more public announcement of the Gateway to Research project previously mentioned in the RCUK response to the Innovation Statement.
The Gateway to Research project is a joint venture of the Department of Business Innovation and Skills and Research Councils UK to provide a one-stop shop for information on UK research funding as well as pointers to outputs. It will essentially draw information directly from sources that already exist (the Research Outputs System and eVal) as well as some new ones, with the intention of helping the UK public and enterprise find research and researchers that are of interest to them, and see how they are funded.
The new announcement was that Jimmy Wales of Wikipedia fame will be advising on the GTR portal. This is a good thing: he is well placed to provide both technical and social expertise on the provision of public-facing information portals, as well as a more radical perspective than might come out of BIS itself. While this might in part be cynically viewed as another example of bringing in celebrities to advise on policy, this is a celebrity with relevant expertise and real credibility based on making similar systems work.
The rest of the information that we can gather relates to government efforts in moving towards making the UK research literature accessible. Wales also gets a look in here, and will be “advising us on [..] common standards to ensure information is presented in a readily reusable form”. My reading of this is that the Minister understands the importance of interoperability and my hope is that this will mean that government is getting good advice on appropriate licensing approaches to support this.
However, many have read this section of the speech as saying that GTR will act as some form of national repository for research articles. I do not believe this is the intention, and reading between the lines the comment that it will “provide direct links to actual research outputs such as data sets and publications” [my emphasis] is the key. The point of GTR is to make UK research more easily discoverable. Access is a somewhat orthogonal issue. This is better read as an expression of Willetts’ and the wider government’s agenda on transparency of public spending than as a mechanism for providing access.
What else can we tell from the speech? Well, the term “open access” is used several times, something that was absent from the innovation statement, but the emphasis is still on achieving “public access” in the near term, with “open access” cast, as I read it, as the future goal. It’s not clear to me whether this is a well-informed distinction. There is a somewhat muddled commentary on Green vs Gold OA, but not that much more muddled than what often comes from our own community. There are also some clear statements on the challenges for all involved.
As an aside I found it interesting that Willetts gave a parenthetical endorsement of usage metrics for the research literature when speaking of his own experience.
As well as reading some of the articles set by my tutors, I also remember browsing through the pages of the leading journals to see which articles were well-thumbed. It helped me to spot the key ones I ought to be familiar with – a primitive version of crowd-sourcing. The web should make that kind of search behaviour far easier.
This is the most sophisticated appreciation of the potential for the combination of measurement and usage data in discovery that I have seen from any politician. It needs to be set against his endorsement of rather cruder filters earlier in the speech but it nonetheless gives me a sense that there is a level of understanding within government that is greater than we often fear.
Much of the rest of the speech is hedging. Options are discussed but not selected and certainly not promoted. The key message: wait for the Finch Report which will be the major guide for the route the government will take and the mechanisms that will be put in place to support it.
But there are some clearer statements. There is a strong sense that Hargreaves’ recommendations on enabling text mining should be implemented, and the logic for this is well laid out. The speech and the policy agenda are embedded in a framework of enabling innovation – making it clear what kinds of evidence and argument we will need to marshal in order to persuade. There is also a strong emphasis on data, as well as an appreciation that there is much to do in this space.
But the clearest statement made here is on the end goals. No-one can be left in any doubt of Willetts’ ultimate target. Full access to the outputs of research, ideally at the time of publication, in a way that enables them to be fully exploited, manipulated and modified for any purpose by any party. Indeed the vision is strongly congruent with the Berlin, Bethesda, and Budapest declarations on Open Access. There is still much to be argued about the route and its length, but in the UK at least, the destination appears to be in little doubt.
I attended the first Sage Bionetworks Congress in 2010 and it left a powerful impression on my thinking. I have just attended the third congress in San Francisco, and again the challenging nature of the views, the real desire to make a difference, and the standard of thinking in the room will take me some time to process. But a series of comments and soundbites over the course of the meeting have made me realise just how seriously bad our situation is.
Attempts by a variety of big pharma companies to replicate disease-relevant results published by academic labs failed in ~80% of cases (see for instance this story about this commentary in Nature[$])
When a particular blood cancer group was asked what aspect of their disease mattered most to them, they said gastro-intestinal problems. No health professional had ever even considered this as a gastro-intestinal disease.
A cancer patient, advocate, and fundraiser of 25 years’ standing said the following to me: “We’ve been at this for 25 years, we’ve raised over $2B for research, and new patients today get the same treatment I did. What’s the point?”
In a room full of very smart people absolutely committed to making a difference there were very few new ideas on how we actually cut through the thicket of perverse incentives, institutional inertia, disregard for replicability, and personal ego-stroking which is perpetuating these problems. I’ve been uncertain for some time whether change from within our existing structures and systems is really possible. I’m leaning further and further to the view that it is not. That doesn’t mean that we can’t do anything – just that it may be more effective to simply bypass existing institutions to do it.
It’s one of those throwaway lines, “Before we can talk about a github for science we really need to sort out a TCP/IP for science”, that’s geeky, sharp, a bit needly, and goes down a treat on Twitter. But there is a serious point behind it. And it’s not intended to be dismissive of the ideas that are swirling around about scholarly communication at the moment either. So it seems worth exploring in a bit more detail.
The line is stolen almost wholesale from John Wilbanks who used it (I think) in the talk he gave at a Science Commons meetup in Redmond a few years back. At the time I think we were awash in “Facebooks for Science” so that was the target, but the sentiment holds. As once was the case with Facebook and now is for Github, or Wikipedia, or StackOverflow, the possibilities opened up by these new services and technologies to support a much more efficient and effective research process look amazing. And they are. But you’ve got to be a little careful about taking the analogy too far.
If you look at what these services provide, particularly those that are focused on coding, they deliver commentary and documentation, nearly always in the form of text about code – which is also basically text. The web is very good at transferring text, and code, and data. The stack that delivers this is built on a set of standards, with each layer building on the layer beneath it. StackOverflow and Github are built on a set of services that in turn sit on top of the web standard of HTTP, which in turn is built on network standards like TCP/IP that control the actual transfer of bits and bytes.
The fundamental stuff of these coding sites and Wikipedia is text, and text is really well supported by the stack of web technologies. Open Source approaches to software development didn’t just develop because of the web; they developed the web, so it’s not surprising that they fit well together. They grew up together and nurtured each other. But the bottom line is that the stack is optimized to transfer the grains of material, text and code, that make up the core of these services.
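To make the layering concrete, here is a minimal sketch (my illustration, not something from the original post) of what sits underneath every Github page or StackOverflow answer: an HTTP request is nothing more than lines of text written onto a TCP connection.

```python
# A toy illustration of the layering described above: HTTP is just
# structured text handed to a TCP connection, which in turn rides on IP.
# Everything Github or StackOverflow delivers sits on top of exchanges
# like this one.
import socket

HOST = "example.com"  # any public web server will do

# TCP/IP layer: open a reliable byte stream to the server.
with socket.create_connection((HOST, 80)) as conn:
    # HTTP layer: the "protocol" is nothing more than lines of text.
    request = (
        "GET / HTTP/1.1\r\n"
        f"Host: {HOST}\r\n"
        "Connection: close\r\n"
        "\r\n"
    )
    conn.sendall(request.encode("ascii"))

    # Read the raw response; the headers and the HTML body are text too.
    response = b""
    while chunk := conn.recv(4096):
        response += chunk

print(response.decode("utf-8", errors="replace")[:500])
```

Every layer here is a published standard, which is exactly why anyone can build a new service on top without asking permission.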
When we look at research we can see that, when we dig down to the granular level, it isn’t just made up of text. Sure, most research could be represented as text, but we don’t have the standardized forms to do this. We don’t have standard granules of research that we can transfer from place to place. This is because it’s complicated to transfer the stuff of research. I picked on TCP/IP specifically because it is the transfer protocol that supports moving bits and bytes from one place to another. What we need are protocols that support moving the substance of a piece of my research from one place to another.
Work on Research Objects [see also this paper], intended to be self-contained but usable pieces of research, is a step in this direction, as is the developing set of workflow tools that will ultimately allow us to describe and share the processes by which we transform at least some parts of the research process into others. Laboratory recording systems will help us to capture and workflow-ify records of the physical parts of the research process. But until we can agree how to transfer these in a standardized fashion, I think it is premature to talk about Githubs for research.
Now there is a flip side to this, which is that where there are such services that do support the transfer of pieces of the research process, we absolutely should be experimenting with them. But in most cases the type-case itself will do the job. Github is great for sharing research code, and some people are doing terrific things with data there as well. But if it does the job for those kinds of things, why do we need one just for researchers? The scale that the consumer web brings, and the exposure to a much bigger community, is a powerful counter-argument to building things ‘just for researchers’. To justify a service focused on a small community you need to have very strong engagement or very specific needs. By the time a mainstream service has mindshare and researchers are using it, your chances of pulling them away to a new service just for them are very small.
So yes, we should be inspired by the possibilities that these new services open up, and we should absolutely build and experiment. But while we are at it, can we also focus on the lower levels of the stack? They aren’t as sexy and they probably won’t make anyone rich, but we’ve got to get serious about the underlying mechanisms that will transfer our research in comprehensible packages from one place to another.
We have to think carefully about capturing the context of research and presenting that to the next user. Github works in large part because the people using it know how to use code, can recognize specific languages, and know how to drive it. It’s actually pretty poor for the user who just wants to get something done – we’ve had to build up another set of services at different levels – the Python Package Index, tools for making and distributing executables – that help provide the context required for different types of user. This is going to be much, much harder for all the different uses we might want to put research to.
But if we can get this right – if we can standardize transfer protocols and build the context of the research into those ‘packets’ so that people can use it – then what we have seen on the wider web will happen naturally. As we build the stack up, the services that seem so hard to build at the moment will become as easy as it is today to throw up a blog, download a rubygem, or fire up a machine instance. If we can achieve that then we’ll have much more than a github for research; we’ll have a whole web for research.
There’s nothing new here that wasn’t written some time ago by John Wilbanks and others but it seemed worth repeating. In particular I recommend these posts [1, 2] from John.
Prior to all the nonsense with the Research Works Act, I had been having a discussion with Heather Morrison about licenses and Open Access and, peripherally, the principle of requiring specific licenses of authors. I realized then that I needed to lay out the background thinking that leads me to where I am. The path that leads me here is one built on a technical understanding of how networks function and what their capacity can be. This builds heavily on the ideas I have taken from (in no particular order) Jon Udell, Jonathan Zittrain, Michael Nielsen, Clay Shirky, Tim O’Reilly, Danah Boyd, and John Wilbanks among many others. Nothing much here is new but it remains something that very few people really get. Ironically the debate over the Research Works Act is what helped this narrative crystallise. This should be read as a contribution to Heather’s suggested “Articulating the Commons” series.
A pragmatic perspective
I am at heart a pragmatist. I want to see outcomes, I want to see evidence to support the decisions we make about how to get outcomes. I am happy to compromise, even to take tactical steps in the wrong direction if they ultimately help us to get where we need to be. In the case of publicly funded research we need to ensure that the public investment in research is made in such a way that it maximizes those outcomes. We may not agree currently on how to prioritize those outcomes, or the timeframe they occur on. We may not even agree that we can know how best to invest. But we can agree on the principle that public money should be effectively invested.
Ultimately the wider global public is for the most part convinced that research is something worth investing in, but in turn they expect to see outcomes of that research: jobs, economic activity, excitement, prestige, better public health, improved standards of living. The wider public are remarkably sophisticated when it comes to understanding that research may take a long time to bear fruit. But they are not particularly interested in papers. And when they become aware of academia’s obsession with papers they tend to be deeply unimpressed. We ignore that at our peril.
So it is important that when we think about the way we do research, we understand the mechanisms and the processes that lead to outcomes. Even if we can’t predict exactly where outcomes will spring from (and I firmly believe that we cannot), that does not mean that we can avoid the responsibility of thoughtfully designing our systems so as to maximize the potential for innovation. The fact that we cannot, literally cannot under our current understanding of physics, follow the path of an electron through a circuit does not mean that we cannot build circuits with predictable overall behaviour. You simply design the system at a different level.
The assumptions underlying research communication have changed
So why are we having this conversation? And why now? What is it about today’s world that is so different? The answer, of course, is the internet. Our underlying communications and information infrastructure is arguably undergoing its biggest change since the development of the Gutenberg press. Like all new communication networks – SMS, fixed telephones, the telegraph, the railways, and writing itself – the internet doesn’t just change how well we can do things; it qualitatively changes what we can do. To give a seemingly trivial example, the expectations and possibilities of a society with mobile telephones are qualitatively different, and their introduction has changed the way we behave and expect others to behave. The internet is a network on a scale, and with connectivity, that we have never had before. The potential change in our capacity as individuals, communities, and societies is therefore immense.
Why do networks change things? Before a network technology spreads you can imagine people, largely separated from each other, unable to communicate in this new way. As you start to make connections nothing much really happens, a few small groups can start to communicate in this new way, but that just means that they can do a few things a bit better. But as more connections form suddenly something profound happens. There comes a point where there is a transition – where suddenly nearly everyone is connected. For the physical scientists this is in fact a phase transition and can display extreme cooperativity – a sudden break where the whole system crystallizes into a new state.
At this point the whole is suddenly greater than the sum of its parts. Suddenly there is the possibility of coordination, of distribution of tasks, that was simply not possible before. The internet simply does this better than any other network we have ever had. It is better for a range of reasons, but the key ones are: its immense scale – connecting more people, and now machines, than any previous network; its connectivity – the internet is incredibly densely connected, essentially enabling any computer to speak to any other computer globally; and its lack of friction – transfer of information is very low cost, essentially zero compared to previous technologies, and is very, very easy. Anyone with a web browser can point and click and be a part of that transfer.
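For those who like the physical picture, a toy simulation (my own illustration, not part of the original post) makes the transition visible: in a random network, the largest connected cluster stays tiny until the average number of links per node passes a critical point, and then it abruptly spans most of the network.

```python
# Illustrative sketch of the network phase transition described above.
# Wire up a random network and measure the largest connected cluster:
# below an average of one link per node almost nothing is connected;
# just above it a single giant cluster emerges.
import random

def largest_cluster_fraction(n: int, avg_degree: float) -> float:
    parent = list(range(n))  # union-find structure over n nodes

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Add roughly avg_degree * n / 2 random links between nodes.
    for _ in range(int(avg_degree * n / 2)):
        a, b = random.randrange(n), random.randrange(n)
        parent[find(a)] = find(b)

    # Size of the biggest cluster, as a fraction of all nodes.
    sizes: dict[int, int] = {}
    for node in range(n):
        root = find(node)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values()) / n

for c in (0.5, 0.9, 1.1, 1.5, 3.0):
    frac = largest_cluster_fraction(100_000, c)
    print(f"avg links per node {c}: largest cluster ~{frac:.0%}")
```

Below one link per node the largest cluster is a fraction of a percent; at three links per node it covers the overwhelming majority. That sudden crystallization is the cooperativity described above.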
What does this mean for research?
So if the internet and the web bring new capacity, where is the evidence that this is making a difference? If we have fundamentally new capacity where are the examples of that being exploited? I will give two examples, both very familiar to many people now, but ones that illustrate what can be achieved.
In late January 2009 Tim Gowers, a Fields medalist and arguably one of the world’s greatest living mathematicians, posed a question: could a group of mathematicians working together be better at solving a problem than one working alone? He suggested a problem, one that he had an idea how to solve but felt was too challenging to tackle on his own. He then started to hedge his bets, stating:
“It is not the case that the aim of the project is [to solve the problem but rather it is to see whether the proposed approach was viable] I think that the chances of success even for this more modest aim are substantially less than 100%.”
A loose collection of interested parties, some world-leading mathematicians, others interested but less expert, started to work on the problem. Six weeks later Gowers announced that he believed the problem solved:
“I hereby state that I am basically sure that the problem is solved (though not in the way originally envisaged).”
In six weeks an unplanned assortment of contributors had solved a problem that a world-leading mathematician had thought both interesting and too hard. And they had solved it by a route other than the one he had originally proposed. Gowers commented:
“It feels as though this is to normal research as driving is to pushing a car.”
For one of the world’s great mathematicians, there was a qualitative difference in what was possible when a group of people with the appropriate expertise were connected via a network through which they could easily and effectively transmit ideas, comments, and proposals. Three key messages emerge: the scale of the network was sufficient to bring the required resources to bear; the connectivity of the network was sufficient that work could be divided effectively and rapidly; and there was little friction in transferring ideas.
The Galaxy Zoo project arose out of a different kind of problem at a different kind of scale. One means of testing theories of the history and structure of the universe is to look at the numbers and types of different categories of galaxy in the sky. Images of the sky are collected and made freely available to the community. Researchers will then categorize galaxies by hand to build up data sets that allow them to test theories. An experienced researcher could perhaps classify a hundred galaxies in a day. A paper might require a statistical sample of around 10,000 galaxy classifications to get past peer review. One truly heroic student classified 50,000 galaxies during their PhD, declaring at the end that they would never classify another again.
However, problems were emerging. It was becoming clear that the statistical power offered by even 10,000 galaxies was not enough. One group would get different results to another. More classifications were required. Data wasn’t the problem. The Sloan Digital Sky Survey had a million galaxy images. But computer-based image categorization wasn’t up to the job. The solution? Build a network. In this case a network of human participants willing to contribute by categorizing the galaxies. Several hundred thousand people classified the million images several times over in a matter of months. Again the key messages: the scale of the network – both the number of images and the number of participants; the connectivity of the network – the internet made it easy for people to connect and participate; and a lack of friction – sending images one way, and simple classifications back the other, was easy. Making the website easy, even fun, for people to use was a critical part of the success.
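The mechanics of such a project are worth seeing laid bare. A minimal sketch of the aggregation step (the image identifiers and labels below are invented for illustration, not Galaxy Zoo’s actual scheme): many noisy classifications per image reduce to a consensus label with an agreement score.

```python
# Illustrative only: the aggregation step behind a Galaxy Zoo-style
# project. Many volunteers each label the same image; a consensus label
# and an agreement score fall out of a simple tally.
from collections import Counter, defaultdict

# (image_id, volunteer's classification) pairs -- made-up example data.
classifications = [
    ("gal-001", "spiral"), ("gal-001", "spiral"), ("gal-001", "elliptical"),
    ("gal-002", "elliptical"), ("gal-002", "elliptical"), ("gal-002", "elliptical"),
]

# Tally the votes cast for each image.
votes: dict[str, Counter] = defaultdict(Counter)
for image_id, label in classifications:
    votes[image_id][label] += 1

# Report the majority label and how strongly the crowd agreed on it.
for image_id, tally in votes.items():
    label, count = tally.most_common(1)[0]
    total = sum(tally.values())
    print(f"{image_id}: {label} (agreement {count / total:.0%} over {total} votes)")
```

With dozens of independent classifications per image, the agreement score itself becomes data: it is exactly the statistical power that the lone heroic student could never supply.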
Galaxy Zoo changed the scale of this kind of research. It provided a statistical power that was unheard of and made it possible to ask fundamentally new types of questions. It also enabled fundamentally new types of people to play an effective role in the research: school children, teachers, full-time parents. It enabled qualitatively different research to take place.
So why hasn’t the future arrived then?
These are exciting stories, but they remain just that. Sure, I can multiply examples, but they are still limited. We haven’t yet taken real advantage of the possibilities. There are lots of reasons for this, but the fundamental one is inertia. People within the system are pretty happy for the most part with how it works. They don’t want to rock the boat too much.
But there is a group of people who are starting to be interested in rocking the boat: the funders, the patient groups, that global public who want to see outcomes. The thought process hasn’t worked through yet, but when it does they will all be asking one question: “How are you building networks to enable research?” The question may come in many forms – “How are you maximizing your research impact?” – “What are you doing to ensure the commercialization of your research?” – “Where is your research being used?” – but they all really mean the same thing: how are you working to make sure that the outputs of your research are going into the biggest, most connected, lowest friction network that they possibly can?
As service providers, all of those who work in this industry – and I mean all, from the researchers to the administrators, to the publishers to the librarians – will need to have an answer. The surprising thing is that it’s actually very easy. The web makes building and exploiting networks easier than it has ever been, because it is a network infrastructure. It has scale: billions of people, billions of computers, exabytes of information resources, exaflops of computational resources. It has connectivity on a scale that is literally unimaginable – the web has more connections than the human mind can conceive of. It is incredibly low in friction – the cost of information transfer is in most cases so close to zero as to make no difference.
Service requirements
To exploit the potential of the network all we need to do is get as much material online as fast as we can. We need to connect it up, to make it discoverable, to make sure that people can find and understand and use it. And we need to ensure that once found those resources can be easily transferred, shared, and used. And used in any way – at network scale the system is designed to ensure that resources get used in unexpected ways. At scale you can have serendipity by design, not by blind luck.
The problem arises with the systems we have in place to get material online. The raw material of science is not often in a state where putting it online is immediately useful. It needs checking, formatting, testing, indexing. All of this does require real work, and real money. So we need services to do this, and we need to be prepared to pay for those services. The trouble is our current system has this backwards. We don’t pay directly for those services so those costs have to be recouped somehow. And the current set of service providers do that by producing the product that we really need and want and then crippling it.
Currently we take raw science and through a collaborative process between researchers and publishers we generate a communication product, generally a research paper, that is what most of the community holds as the standard means by which they wish to receive information. Because the publishers receive no direct recompense for their contribution they need to recover those costs by other means. They do this by artificially introducing friction and then charging to remove it.
This is a bad idea on several levels. Firstly, it means the product we get doesn’t have the maximum impact it could, because it’s not embedded in the largest possible network. From a business perspective it creates risks: publishers have to invest up front and then recoup money later, rather than being confident that expenditure and cash flow are coupled. This means, for instance, that if there is a sudden rise (or fall) in the number of submissions there is no guarantee that cash flows or costs will scale with that change. But the real problem is that it distorts the market. Because on the researcher side we don’t pay for the product of effective communication, we don’t pay much attention to what we’re getting. On the publisher side it drives a focus on surface and presentation, because that enhances the product in the current purchaser’s eyes, rather than a ruthless focus on production costs and shareability.
Network Ready Research Communication
If we care about taking advantage of the web and internet for research then we must tackle the building of scholarly communication networks. These networks will have the critical characteristics described above: scale and a lack of friction. The question is how we go about building them. In practice we actually already have a network at huge scale – the web and the internet do that job for us, connecting essentially all professional researchers and a large proportion of the interested public. There is work to be done on expanding the reach of the network, but this is a global development goal, not something specific to research.
So if we already have the network then what is the problem? The issue lies in the second characteristic – friction. Our current systems are actually designed to create friction. Before the internet was in place our network was formed of a distribution system involving trucks and paper – reducing costs to reasonable levels meant charging for that distribution process. Today those distribution costs have fallen to as near zero as makes no difference, yet we retain the systems that add friction unnecessarily. Slow review processes, charging for access, formats and discovery tools that are no longer fit for purpose.
What we need to do is focus on the process of taking the research that we do and converting it into a Network Ready form. That is, we need access to the services that take our research and make it ready to exploit our network infrastructure – or we need to do it ourselves. What does “Network Ready” mean? A piece of Network Ready Research will be modular and easily discoverable; it will present different facets that allow people and systems to use it in a wide variety of ways; it will be compatible with the widest range of systems; and above all it will be easily shareable. Not just copyable or pasteable, but easily shared through multiple systems while carrying with it all the context required to make use of it, all the connections that will allow a user to dive deeper into its component parts.
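What might such a packet carry in practice? The sketch below is entirely hypothetical – these field names and URLs belong to no real standard and are mine, not the post’s – but it shows the shape of a self-describing bundle in which facets, provenance, and licensing travel together.

```python
# A hypothetical sketch of the self-describing "packet" argued for above.
# None of these field names come from a real standard; the point is that
# data, facets, provenance, and licence travel together with the work.
import json

network_ready_item = {
    "id": "https://example.org/research/0001",  # stable, resolvable identifier
    "title": "Crystallization kinetics dataset, run 42",
    "facets": {  # different views of the same work for different users
        "data": "https://example.org/research/0001/raw.csv",
        "figure": "https://example.org/research/0001/fig1.png",
        "narrative": "https://example.org/research/0001/summary.html",
    },
    "provenance": {  # the context required to actually use it
        "produced_by": "https://example.org/protocols/xtal-kinetics-v3",
        "derived_from": ["https://example.org/research/0000"],
    },
    "license": "CC0-1.0",  # legal interoperability, stated up front
}

print(json.dumps(network_ready_item, indent=2))
```

The specifics matter far less than the principle: everything a downstream user or machine needs in order to discover, trust, and reuse the work travels inside the packet itself.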
Network Ready Research will be interoperable, socially, technically, and legally with the rest of the network. The network is more than just technical infrastructure. It is also built up from the social connections, a shared understanding of the parameters of re-use, and a compatible system of checks and balances. The network is the shared set of technical and social connections that together enable new connections to be made. Network Ready Research will move freely across that, building new connections as it goes, able to act as both connecting edge and connected node in different contexts.
Building and strengthening the network
If you believe the above, as I do, then you see a potential for us to qualitatively change our capacity as a society to innovate, understand our world, and help to make it a better place. That potential will be best realized by building the largest possible, most effective, and lowest friction network possible. A networked commons in which ideas and data, concepts and expertise can be most easily shared, and can most easily find the place where they can do the most good.
Therefore the highest priority is building this network, making its parts and components interoperable, and making it as easy as possible to connect up networks that already exist. For an agency that funds research and seeks to ensure that research makes a difference the only course of action is to place the outputs of that research where they are most accessible on the network. In blunt terms that means three things: free at the point of access, technically interoperable with as many systems as possible, and free to use for any purpose. The key point is that at network scale the most important uses are statistically likely to be unexpected uses. We know we can’t predict the uses, or even success, of much research. That means we must position it so it can be used in unexpected ways.
Ultimately, the bigger the commons, the bigger the network, the better. And the more interoperable it is, and the wider the range of uses it supports, the better. That ultimately is why I argue for liberal licences, for the exclusion of non-commercial terms. It is why I use ccZero on this blog and for software that I write where I can. For me, the risk of commercial enclosure is so much smaller than the risk of not building the right networks, of creating fragmented incompatible networks, of ultimately not being able to solve the crises we face today in time to do any good, that the course of action is clear. At the same time we need to build up the social interoperability of the network, to call out bad behavior and perhaps in some cases to isolate its perpetrators, but we need to find ways of doing this that don’t damage the connectivity and freedom of movement on the network. Legal tools are useful to assure users of interoperability and their rights; otherwise they just become a source of friction. Social tools are a more viable route for encouraging desirable behaviour.
The priority has to be achieving scale and lowering friction. If we can do this then we have the potential to create a qualitative jump in our research capacity on a scale not seen since the 18th century and perhaps never. And it certainly feels like we need it.
When the history of the Research Works Act, and the reaction against it, is written that history will point at the factors that allowed smart people with significant marketing experience to walk with their eyes wide open into the teeth of a storm that thousands of people would have predicted with complete confidence. That story will detail two utterly incompatible world views of scholarly communication. The interesting thing is that with the benefit of hindsight both will be totally incomprehensible to the observer from five or ten years in the future. It seems worthwhile therefore to try and detail those world views as I understand them.
The scholarly publisher
The publisher world view places them as the owner and guardian of scholarly communications. While publishers recognise that researchers provide the majority of the intellectual property in scholarly communication, their view is that researchers willingly and knowingly gift that property to the publishers in exchange for a set of services that they appreciate and value. In this view everyone is happy as a trade is carried out in which everyone gets what they want. The publisher is free to invest in the service they provide and has the necessary rights to look after and curate the content. The authors are happy because they can obtain the services they require without having to pay cash up front.
Crucial to this world view is a belief that research communication, the process of writing and publishing papers, is separate to the research itself. This is important because otherwise it would be clear that, at least in an ethical sense, the writing of papers would be work for hire for the funders – and part and parcel of the contract of research. For the publishers, the fact that no funding contract specifies that “papers must be published” is the primary evidence of this.
The researcher
The researcher’s perspective is entirely different. Researchers view their outputs as their own property: the ideas, the physical outputs, and the communications. Within institutions you see this in the uneasy relationship between researchers and research translation and IP exploitation offices. Institutions try to avoid inflaming this issue by ensuring that economic returns on IP go largely to the researcher, at least until there is real money involved. But at that stage the issue is usually fudged, as extra investment is required which dilutes ownership. But scratch a researcher who has gone down the exploitation path and then been pushed gently aside, and you’ll get a feel for the sense of personal ownership involved.
Researchers have a love-hate relationship with papers. Some people enjoy writing them, although I suspect this is rare. I’ve never met any researcher who did anything but hate the process of shepherding a paper through the review process. The service, as provided by the publisher, is viewed with deep suspicion. The resentment that is often expressed by researchers for professional editors is primarily a result of a loss of control over the process for the researcher and a sense of powerlessness at the hands of people they don’t trust. The truth is that researchers actually feel exactly the same resentment for academic editors and reviewers. They just don’t often admit it in public.
So from a researcher’s perspective, they have spent an inordinate amount of effort on a great paper. This is their work, their property. They are now obliged to hand over control of this to people they don’t trust to run a process they are unconvinced by. Somewhere along the line they sign something. Mostly they’re not too sure what that means, but they don’t give it much thought, let alone read it. But the idea that they are making a gift of that property to the publisher is absolute anathema to most researchers.
To be honest, researchers don’t care that much about a paper once it’s out. It caused enough pain and they don’t ever want to see it again. This may change over time if people start to cite it and refer to it in supportive terms, but most people won’t really look at a paper again. It’s a line on a CV, a notch on the bedpost. What they do notice is the cost of, or lack of access to, other people’s papers. Library budgets are shrinking, subscriptions are being chopped, personal subscriptions don’t seem to be affordable any more.
The first response to this when researchers meet is “why can’t we afford access to our own work?” The second, given the general lack of respect for the work that publishers do, is to start down the path of claiming that they could do it better. Much of the rhetoric around eLife as a journal “led by scientists” is built around this view. And a lot of it is pure arrogance. Researchers neither understand, nor for the most part appreciate, the work of copyediting and curation, layout and presentation. While there are tools today that can do many of these things more cheaply, there are very few researchers who could use them effectively.
The result…kaboom!
So the environment that set the scene for the Research Works Act revolt was simmering resentment amongst researchers at the cost of accessing the literature, combined with a lack of understanding of what it is publishers actually do. The spark that set it off was the publisher rhetoric about ownership of the work. This was always going to happen one day. The mutually incompatible world views could co-exist while there was still enough money to go around. While librarians felt trapped between researchers who demanded access to everything and publishers offering deals that just about meant they could scrape by, things could continue.
Fundamentally, once publishers started publicly using the term “appropriation of our property”, the spark had flown. From the publisher perspective this makes perfect sense: the NIH mandate is a unilateral appropriation of their property. From the researcher perspective it is a system that essentially adds a bit of pressure to do something that they know is right, promote access, without causing them too much additional pain. Researchers feel they ought to be doing something to improve access to research outputs, but for the most part they’re not too sure what, because they sure as hell aren’t in a position to change the journals they publish in. That would be (perceived to be) career suicide.
The elephant in the room
But it is of course the funder perspective that we haven’t yet discussed, and looking forward it is, in my view, the action of funders that will render both the publisher and researcher perspectives incomprehensible in ten years’ time. The NIH view, similar to that of the Wellcome Trust, and indeed every funder I have spoken to, is that research communication is an intrinsic part of the research they fund. Funders take a close interest in the outputs that their research generates. One might say a proprietorial interest, because again there is a strong sense of ownership. The NIH mandate language expresses this through the grant contract: researchers are required to grant to the NIH a license to hold a copy of their research work.
In my view it is through research communication that research has outcomes and impact. From the perspective of a funder, their main interest is that the research they fund generates those outcomes and impacts. For a mission-driven funder the current situation signals one thing and it signals it very strongly: neither publishers nor researchers can be trusted to do this properly. What funders will do is move to stronger mandates, more along the Wellcome Trust lines than the NIH lines, and this approach will expand. At the end of the day, the funders hold all the cards. Publishers never really did have a business model; they had a public subsidy. The holders of those subsidies can only really draw one conclusion from current events: that they are going to have to be much more active in where they spend it to successfully perform their mission.
The smart funders will work with the pre-existing prejudice of researchers, probably granting copyright and IP rights to the researchers, but placing tighter constraints on the terms of forward licensing. That funders don’t really need the publishers has been made clear by HHMI, Wellcome Trust, and the MPI. Publishing costs are a small proportion of their total expenditure. If necessary they have the resources and will to take that in house. The NIH has taken a similar route though technically implemented in a different way. Other funders will allow these experiments to run, but ultimately they will adopt the approaches that appear to work.
Bottom line: within ten years all major funders will mandate CC-BY Open Access, effective immediately on publication, for publications arising from work they fund. Several major publishers will not survive the transition. A few will, and a whole set of new players will spring up to fill the spaces. The next ten years look to be very interesting.
In my last post on scholarly publishers that support the US Congress SOPA bill I ended up making a series of edits. It was pointed out to me that the Macmillan listed as a supporter is not the Macmillan that is the parent group of Nature Publishing Group but a separate U.S. subsidiary of the same ultimate holding company, Holtzbrinck. As I dug further it became clear that while only a small number of scholarly publishers were explicitly and publicly supporting SOPA, many of them are members of the Association of American Publishers, which is listed publicly as a supporter.
This is a little different to directly supporting the act. The AAP is a membership organisation that represents its members (including Nature Publishing Group, Oxford University Press, Wiley Blackwell and a number of other familiar names, see the full list at the bottom) to – amongst others – the U.S. government. Not all of its positions would necessarily be held by all its members. However, neither have any of those members come out and publicly stated that they disagree with the AAP position. In another domain Kaspersky software quit the Business Software Alliance over the BSA’s support of SOPA, even after the BSA withdrew its support.
I was willing to give AAP members some benefit of the doubt, hoping that some of them might come out publicly against SOPA. But if that was the hope then the AAP have just stepped over the line. In a spectacularly disingenuous press release the AAP claims significant credit for a new act just submitted to the U.S. Congress. This, in a repeat of some previous efforts, would block any efforts on the part of U.S. federal agencies to enact open access policies, even to the extent of blocking them from continuing to run the spectacularly successful PubMedCentral. That this comes days before the deadline for a request for information on the development of appropriate and balanced policies that would support access to the published results of U.S. taxpayer-funded research is a calculated political act, an abrogation of any principled stance, and clear signal of a lack of any interest in a productive discussion on how to move scholarly communications forward into a networked future.
I was willing to give AAP members some space. Not any more. The time has come to decide whether you want to be part of the future of research communication or whether you want to legislate to try and stop that future happening. You can be part of that future or you can be washed into the past. You can look forward or you can be part of a political movement working to rip off the taxpayers and charitable donors of the world. Remember that the profits alone of Elsevier and Springer (though I should be cutting Springer a little slack as they’re not on the AAP list – the one on the list is a different Springer) could fund the publication of every paper in the world in PLoS ONE. Remember that the cost of putting a SAGE article on reserve for a decent-sized class, or of putting a Taylor and Francis monograph on reserve for a more modest-sized one, at one university, is more than it would cost to publish them in most BioMedCentral journals and make them available to all.
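A rough back-of-envelope check of that first claim, using round numbers that are my own illustrative assumptions (approximately the right order of magnitude for 2011–12), not figures from this post:

```python
# Back-of-envelope sketch. All numbers below are my own rough assumptions
# for illustration -- not figures taken from this post or from any
# publisher's accounts.
papers_per_year = 1_500_000      # assumed annual global research output
plos_one_apc = 1_350             # assumed PLoS ONE publication fee, USD

cost_to_publish_everything = papers_per_year * plos_one_apc

combined_profits = 2_000_000_000  # assumed Elsevier + Springer profit, USD/yr

print(f"Everything in PLoS ONE: ~${cost_to_publish_everything / 1e9:.1f}B/yr")
print(f"Assumed combined profits: ~${combined_profits / 1e9:.1f}B/yr")
```

On assumptions of this scale the two figures land within shouting distance of each other, which is the point of the comparison.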
Ultimately this legislation is irrelevant – the artificially high current costs of publication, and the myriad additional charges that publishers make for this, that, and the other (colour charges? Seriously?), will be destroyed. The current inefficiencies and inflated markups cannot be sustained. The best legislation can do is protect them for a little longer, at the cost of damaging the competitiveness of the U.S. as a major player in global research. With PLoS ONE rapidly becoming a significant proportion of the world’s literature on its own, and Nature and Science soon to be facing serious competition at the top end from an OA journal backed by three of the most prestigious funders in the world, we are moving rapidly towards a world where publishing in a subscription journal will be foolhardy at best and suicidal for researchers in many fields. This act is ultimately a pathetic rearguard action and a sign of abject failure.
But for me it is also a sign that the rhetoric of being supportive of a gradual managed change to our existing systems, a plausible argument for such organisations to make, is dead for those signed up to the AAP. Publishers have a choice – lobby and legislate to preserve the inefficient, costly, and largely ineffective status quo – or play a positive part in developing the future.
I don’t expect much; to be honest I expect deafening silence as most publishers continue to hope that most researchers will be too buried in their work to notice what is going on around them. But I will continue to hope that some members of that list, the organisations that really believe that their core mission is to support the most effective research communication – not that those are just a bunch of pretty words that get pulled out from time to time – will disavow the AAP position and commit to a positive and open discussion about how we can take the best from the current system and make it work with the best we can with the technology available. A positive discussion about managed change that enables us to get where we want to go and helps to make sure that we reap the benefits when we get there.
This bill is self-defeating as legislation, but as a political act it may be effective in the short term. It could hold back the tide for a while. But publishers that support it will ultimately get wiped out, as the world moves on and they spend so much time pushing back the tide that they miss the opportunity to catch up. Publishers who move against the bill have a role to play in the future and are the ones with enough insight to see the way the world is moving. And those publishers who sit on the sidelines? They don’t have the institutional capability to take the strategic decisions required to survive. Choose.
On the 8th December David Willetts, the Minister of State for Universities and Science, announced new UK government strategies to develop innovation and research to support growth. The whole document is available online and you can see more analysis at the links at the bottom of the post. A key aspect for Open Access advocates was the section that discussed a wholesale move by the UK to an author-pays system for a freely accessible research literature, with SCOAP3 raised as a possible model. The report refers not to Open Access, but to freely accessible content. I think this is missing a massive opportunity for Britain to take a serious lead in defining the future direction of scholarly communication. That’s the case I attempt to lay out in this open letter. This post should be read in the context of my usual disclaimer.
Minister of State for Universities and Science
Department of Business Innovation and Skills
Dear Mr Willetts,
I am writing in the first instance to congratulate you on your stance on developing routes to freely accessible research outputs. I cannot say I am a great fan of many current government positions, and I might have wished for greater protection of the UK science budget, but in times of resource constraint for research I believe your focus on ensuring efficient access to, and exploitation of, research outputs in the widest sense is the right one.
The position you have articulated offers a real opportunity for the UK to take a lead in this area. But along with the opportunities there are risks, and those risks could entrench existing inefficiencies of our scholarly communication system. They could also reduce the value for money that the public purse, and it will be the public purse one way or another, gets for its investment. In our current circumstances this would be unfortunate. I would therefore ask you to consider the following as the implementation pathway for this policy is developed.
Firstly, the research community will be buying a service. This is a significant change from the current system where the community buys a product, the published journal. The purchasing exercise should be seen in this light and best practice in service procurement applied.
Secondly, the nature of this service must be made clear. The service being provided must allow for any and all downstream uses, including commercial use, text mining, indeed any use that might be developed at some point in the future. We are paying for this service and we must dictate its terms. Incumbent publishers will say in response that they need to retain commercial rights, or text mining rights, to ensure their viability, as indeed they have done in response to the Hargreaves Review.
This, not to put too fine a point on it, is hogwash. PLoS and BioMedCentral both operate financially viable operations in which no downstream rights beyond appropriate attribution are retained by the publisher, and their author charges are lower in price than many of the notionally equivalent, but actually far more limited, offerings of more traditional publishers. High quality scholarly communication can be supported by reasonable author charges without any need for publishers to retain rights beyond those protected by their trademarks. An effective marketplace could therefore be expected to bring down the average cost of this form of scholarly communication.
The reason for supporting a system that demands that any downstream use of the communication be enabled is that we need innovation and development within the publishing system as well as innovation and development arising from its content. Our scholarship is currently held back by a morass of retained rights that prevent the development of research projects, of new technology startups, and potentially of new industries. The government consultation document of 14 December on the Hargreaves report explicitly notes that enabling downstream uses of content, and scholarly content in particular, can support new economic activity. It can also support new scholarly activity. The exploitation of our research outputs requires new approaches to indexing, mining, and parsing the literature. The shame of our current system is that much of this is possible today: the technology exists but is prevented from being exploited at scale by the logistical impossibility of clearing the required rights. These new approaches will require money, and it is entirely appropriate, indeed desirable, that some of this work occurs in the private sector. Experimentation will require both freedom to act and freedom to develop new business models. Our content, its accessibility, and its reusability must support this.
Finally, I ask you to look beyond the traditional scholarly publishing industry to the range of experimentation occurring globally in academic spaces, non-profits, and commercial endeavours. The potential leaps in functionality, as well as the potential cost reductions, are enormous. We need to encourage this experimentation and develop a diverse and vibrant market that provides the quality assurance and stability we are used to while encouraging technical experimentation and the improvement of business models. What we don’t need is a five or ten year deal that cements in existing players, systems, and practices.
Your government’s philosophy is based around the effectiveness of markets. The recent history of major government procurement exercises is not a glorious one. This is one we should work to get right, taking our time to ensure a deal that delivers on its promise. The vision of a Britain led by innovation and development, supported by a vibrant and globally leading research community, is, I believe, the right one. Please ensure that this innovation isn’t cut off at the knees by agreeing terms that prevent our research communication tools being re-used to improve the effectiveness of that communication. And please ensure that the process of procuring these services is one that supports innovation and development in scholarly communications itself.
One of the things you notice as a visitor from the UK in South Africa is how clean the toilets are. In restaurants, at the University, in public places. Sometimes a bit worn down but always clean. And then you start to notice how clear and clean the pavements are and your first response, well at least my first response, is that this is a sign of things going right. One element of the whole is working well. But of course one of the main reasons for this is that labour is cheap and plentiful.
And suddenly it seemed less positive. The temptation is to offer advice, drawing on experience of first-world development programs, but it became clear very quickly how little of the context I was aware of. The problems on the ground are often not the obvious ones, and the solutions are rarely easily visible from the privileged perspective of the global north. The thing I learnt was to offer advice that was tactical, not strategic: routes towards the desired goal, not the direction of travel.
So Heather Morrison’s post on the use of Creative Commons for open access, arriving as it did in the middle of my visit to Cape Town, troubled me. Those who read here will know I am a strong partisan of the OA = CC-BY view, and indeed tend to the view that we should just place things in the public domain, so my first response was rejection. But there is an argument in there that non-commercial and share-alike terms are appropriate for the developing world, because they can protect access to the results of text mining of relevant research. These arguments are always worth taking apart because they help to illuminate the practicalities of how we take scholarly communication and make it valuable to people. Doing so helps to pull apart the issues and raise important use cases, and the effective use of research to aid development is a key use case. Heather is also, as someone who has thought deeply about these issues, a person I am always cautious to disagree with.
But here I do have to disagree, and I disagree on two levels. I gave a presentation at UCT where I talked in part about some of our open science work, and a question was asked: “have you thought about how accessible this is in rural Africa?” My lame answer was that we have thought about it and worried about it but not actually done anything. The better answer is that it is far better for the people on the ground, who really know the infrastructure and the need, to decide what is required and do the appropriate format conversion, printing, and distribution than for us in the UK or the US to presume to know what is of most value. This, after all, is the principle of open access: allowing others to re-use as appropriate for their context precisely because we don’t know what those uses may be.
And NC terms will break this in the developing world. Right where that research is needed, the communications infrastructure is patchy and in many cases non-existent. Getting that research to people, whether the raw communication or processed or text-mined material, can require paper, and trucks for transport. It may require conversion of format, translation of language, or a change of form. All of these things cost money, and in choosing NC terms we would be condemning those who could use this material to relying on charity or, in some cases worse, governments. A service industry that charges someone, somewhere, for production, translation, and transport is ruled out as a possible business model. Exploitation is a value-laden term, particularly in the context of Africa, but in its pure and ideal sense it is neutral. We want to see research exploited for good, but in choosing NC terms we are dictating the way that exploitation can occur. The risks of exploitation in the bad sense are also there, but I don’t think licences are the way to deal with them. By using legal instruments we take those key choices out of the hands of those best placed to make them.
And this is the more insidious issue from my perspective. The thing I learned in a week in Cape Town was that the last thing scholars in the developing world need is for us to make decisions for them. What we need to do, as a community, is ensure that people can make the widest possible range of choices. Don’t get me wrong, using CC-BY has some risks in that regard, and ccZero perhaps more, but if we act to preserve the principle of giving people the space to make their own decisions based on local knowledge and local needs, then that is the biggest contribution we can make.
It’s not just licensing. One of the biggest holes in the entire fabric of the OA movement is the lack of a principled and rational stance on author payment charges. Just patting people on the head and saying “oh don’t worry, we won’t make you pay” is not just patronising, it is damaging to our credibility and to our own progress. One of the other key lessons I learnt from a week listening to people in Cape Town is how far ahead of the traditional centres of scholarship they are on some issues. I spend a lot of time in the UK and the US trying to convince people that there is an issue at all; that we need to look at how our research matters to the wider community. In South Africa the needs are clear, and scholars want to make a difference. The question is how, and how to tell when it is working. They are damn good at this. We could learn a lot from them about how to balance the pursuit of prestige with the “research with impact” that seems to be such a struggle for us.
We need to think of the global scholarly community not as made up of “us” and “those who are catching up”, but as made up of different groups with different priorities and different needs, and crucially with different experiences and value to bring to a global endeavour. The creation of a “special OA” for the developing world runs the risk of perpetuating a view in which 85% of the world will always be catching up. Yes, we need to try to build systems that ensure access to all scholarship, primary and derived. How to do that is an important debate, and one I hope will be at the centre of the Berlin10 conference to be held in Stellenbosch in South Africa next year. Not to use that meeting to address the issues and challenges of business models and safeguards that help preserve access and optimise impact for the developing world would be a terrible loss.
OA is about enabling people, enabling business, and enabling development. There is a global community of scholars who get that, who want to be part of the wider community, and have their own skills and expertise to bring. They also want to share and contribute to the resource needs of scholarly communication in an appropriate and equitable way. We need to enable them to do that and get out of their way. After all, we might learn something.