25,000 signatures and still rolling: Implications of the White House petition

The petition at 25,000 signatures

I’m afraid I went to bed. It was getting on for midnight and it looked like another four hours or so before the petition would reach the magic mark of 25,000 signatures. As it turned out, a final rush put us across the line at around 2am my time, but never mind: I woke up wondering whether we had got there, headed for the computer, and found a pleasant surprise waiting for me.

What does this mean? What have John Wilbanks, Heather Joseph, Mike Carroll, and Mike Rossner achieved by deciding to push through what was a really hard slog? And what about all those people and groups involved in getting signatures in? I think there are maybe three major points here.

Access to Research is now firmly on the White House (and other governments’) agenda

The petition started as a result of a meeting between the Access2Research founders and John Holdren from the White House. John Wilbanks has written about how the meeting went and what the response was. The US administration has sympathy and understands many of the issues. However, it must be hard to make the case that this was something worth the bandwidth it would take to drive a policy initiative, especially in an election year. The petition, and the mechanism of the “We the People” site, have enabled us to show that this is a policy item that generates public interest; more importantly, they create an opportunity for the White House to respond. It is worth noting that this has been one of the more successful petitions. Reaching the 25k mark in two weeks is a real achievement, and one that has got the attention of key people.

And that attention spreads globally as well. The Finch Report on mechanisms for improving access to UK research outputs will probably not mention the petition, but you can bet that those within the UK government involved in implementation will have taken note. Similarly, as the details of the Horizon 2020 programme within the EU are hammered out, those deciding on the legal instruments that will distribute around $80B will have noted that there is public demand, and therefore political cover, to take action.

The Open Access Movement has a strong voice, and a diverse network, and can be an effective lobby

It is easy, as we all work towards the shared goal of enabling wider access and the full exploitation of web technologies, to get bogged down in details and to focus on disagreements. What this effort showed was that when we work together we can muster the connections and the network to send a very strong message. And that message is stronger for coming from diverse directions in a completely transparent manner. We have learnt the lessons that could be taken from the fight against SOPA and PIPA and refined them in the campaign to defeat, in fact to utterly destroy, the Research Works Act. But this was not a reaction, and it was not merely a negative campaign. This was a positive campaign, originating within the movement, which together we have successfully pulled off. There are lessons to be learnt, and things we could have done better. But what we now know is that we have the capacity to take on large-scale public actions and pull them off.

The wider community wants access and has a demonstrated capacity to use it

There has in the past been an argument that public access is not useful because “they can’t possibly understand it”, that “there is no demand for public access”. That argument has been comprehensively and permanently destroyed. It was always an arrogant argument, and in my view a dangerous one for those with a vested interest in ensuring continued public funding of research. The fact that it had strong parallels with the arguments deployed in the 18th and 19th centuries that colonists, or those who did not own land, or women, could not possibly be competent to vote should have been enough to warn people off using it. The petition has shown demand, and the stories that have surfaced through this campaign show not only that there are many people who are not professional researchers who can use research, but that many of these people also want, and are more than capable of, contributing back to the professional research effort.

The campaign has put the ideas of Open Access in front of more people than perhaps ever before. We have reached out to family, friends, co-workers, patients, technologists, entrepreneurs, medical practitioners, educators, and people just interested in the world around them. Perhaps one in ten of them actually signed the petition, but many of them will have talked to others, spreading the ideas. This is perhaps one of the most important achievements of the petition: getting the message and the idea out in front of hundreds of thousands of people who may not take action today, but who will now be primed to see the problems that arise from a lack of access, and the opportunities that could be created through access.

Where now?

So what are our next steps? Continuing to gain signatures for the next two weeks is still important. This may be one of the most rapidly growing petitions, but showing continued growth is still valuable. More generally, though, my sense is that we need to take stock and look forward to the next phase. The really hard work of implementation is coming. As a movement we still disagree strongly on elements of tactics and strategy. The tactics I am less concerned about: we can take multiple paths, applying pressure at multiple points, and this will be to our advantage. But I think we need a clearer goal on strategy. We need to articulate what the endgame is. What is the vision? When will we know that we have achieved what we set out to do?

Peter Murray-Rust has already quoted Churchill, but it does seem apposite: “…this is not the end. This is not even the beginning of the end. But it is, perhaps, the end of the beginning.”

We now know how much we can achieve when we work together with a shared goal. The challenge now is to harness that to a shared understanding of the direction of travel, if perhaps not the precise route. But if we, with all the diversity of needs and views that this movement contains, can find the core of goals that we all agree on, then what we now know is that we have the capacity, the depth, and the strength to achieve them.

 


Github for science? Shouldn’t we perhaps build TCP/IP first?

Image: mind map of TCP/IP (via Wikipedia)

It’s one of those throwaway lines – “Before we can talk about a github for science we really need to sort out a TCP/IP for science” – that’s geeky, sharp, a bit needly, and goes down a treat on Twitter. But there is a serious point behind it. And it’s not intended to be dismissive of the ideas that are swirling around about scholarly communication at the moment either. So it seems worth exploring in a bit more detail.

The line is stolen almost wholesale from John Wilbanks, who used it (I think) in the talk he gave at a Science Commons meetup in Redmond a few years back. At the time I think we were awash in “Facebooks for Science”, so that was the target, but the sentiment holds. As once was the case with Facebook, and now is for Github, or Wikipedia, or StackOverflow, the possibilities opened up by these new services and technologies to support a much more efficient and effective research process look amazing. And they are. But you’ve got to be a little careful about taking the analogy too far.

If you look at what these services provide, particularly those that are focused on coding, they deliver commentary and documentation, nearly always in the form of text about code – which is itself basically text. The web is very good at transferring text, and code, and data. The stack that delivers this is built on a set of standards, with each layer building on the layer beneath it. StackOverflow and Github are built on a set of services that in turn sit on top of the web standards of HTTP, which in turn are built on network standards like TCP/IP that control the actual transfer of bits and bytes.
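To make that layering concrete, here is a minimal sketch in Python. It is purely illustrative (the hostname and request are just examples, nothing Github-specific): it fetches the same page twice, once through a high-level HTTP library and once by writing the HTTP request by hand over a raw TCP socket.

```python
# Purely illustrative: the same page fetched at two levels of the stack.

import socket
import urllib.request

# High level: a service URL, with HTTP (and TLS) handled for us.
with urllib.request.urlopen("https://github.com/") as response:
    print(response.status, response.headers.get("Content-Type"))

# Low level: HTTP is just lines of text pushed through a TCP connection,
# which is itself just a reliable stream of bytes between two machines.
with socket.create_connection(("github.com", 80)) as conn:
    conn.sendall(b"HEAD / HTTP/1.1\r\nHost: github.com\r\nConnection: close\r\n\r\n")
    print(conn.recv(256).decode("ascii", errors="replace"))
```

The point is not the code but the fact that every layer beneath the service is a settled, boring standard that simply moves text and bytes around; the equivalent layers for moving research around do not yet exist.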

The fundamental stuff of these coding sites and Wikipedia is text, and text is really well supported by the stack of web technologies. Open Source approaches to software development didn’t just develop because of the web; they developed the web, so it’s not surprising that they fit well together. They grew up together and nurtured each other. But the bottom line is that the stack is optimized to transfer the grains of material, text and code, that make up the core of these services.

When we look at research we can see that, when we dig down to the granular level, it isn’t just made up of text. Sure, most research could be represented as text, but we don’t have the standardized forms to do this. We don’t have standard granules of research that we can transfer from place to place. This is because it’s complicated to transfer the stuff of research. I picked on TCP/IP specifically because it is the transfer protocol that supports moving bits and bytes from one place to another. What we need are protocols that support moving the substance of a piece of my research from one place to another.

Work on Research Objects [see also this paper], intended to be self-contained but usable pieces of research, is a step in this direction, as is the developing set of workflow tools that will ultimately allow us to describe and share the processes by which we’ve transformed at least some parts of the research process into others. Laboratory recording systems will help us to capture and workflow-ify records of the physical parts of the research process. But until we can agree how to transfer these in a standardized fashion, I think it is premature to talk about Githubs for research.
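As a thought experiment only – this is not the Research Object specification or any existing standard, just a hedged sketch of the kind of thing a transfer protocol for research might have to carry – a self-contained “research packet” might look something like this:

```python
# Hypothetical sketch of a self-contained, transferable "research packet".
# None of these field names come from an existing standard; they simply
# illustrate the context a recipient would need in order to reuse the work.

import json

research_packet = {
    "id": "example-packet-001",                    # made-up identifier
    "title": "Example buffer optimisation experiment",
    "provenance": {
        "creator": "A. Researcher",
        "created": "2012-02-27",
        "derived_from": [],                        # upstream packets, if any
    },
    "materials": ["samples/plate-layout.csv"],     # the physical context
    "data": ["data/raw-readings.csv"],             # the measurements themselves
    "process": ["workflows/analysis-steps.txt"],   # how data became results
    "results": ["figures/summary-plot.png"],
    "licence": "CC0",                              # terms under which it travels
}

# Serialising to a common text format is the easy part; the hard part is the
# community agreeing what the granules are and what the fields mean.
print(json.dumps(research_packet, indent=2))
```

Serialising such a packet is trivial; agreeing on what its granules are, and on the semantics of the fields, is exactly the lower-layer work that still needs doing.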

Now there is a flip side to this, which is that where there are such services that do support the transfer of pieces of the research process, we absolutely should be experimenting with them. But in most cases the type-case itself will do the job. Github is great for sharing research code, and some people are doing terrific things with data there as well. But if it does the job for those kinds of things, why do we need one for researchers? The scale that the consumer web brings, and the exposure to a much bigger community, is a powerful counterargument to building things ‘just for researchers’. To justify a service focused on a small community you need to have very strong engagement or very specific needs. By the time a mainstream service has mindshare and researchers are using it, your chances of pulling them away to a new service just for them are very small.

So yes, we should be inspired by the possibilities that these new services open up, and we should absolutely build and experiment. But while we are at it, can we also focus on the lower levels of the stack? They aren’t as sexy and they probably won’t make anyone rich, but we’ve got to get serious about the underlying mechanisms that will transfer our research in comprehensible packages from one place to another.

We have to think carefully about capturing the context of research and presenting that to the next user. Github works in large part because the people using it know how to use code, can recognize specific languages, and know how to drive it. It’s actually pretty poor for the user who just wants to do something – we’ve had to build up another set of services at different levels, such as the Python Package Index and tools for making and distributing executables, that help provide the context required for different types of user. This is going to be much, much harder for all the different uses we might want to put research to.

But if we can get this right – if we can standardize transfer protocols and build the context of the research into those ‘packets’ so that people can use them – then what we have seen on the wider web will happen naturally. As we build the stack up, these services that seem so hard to build at the moment will become as easy as throwing up a blog, downloading a rubygem, or firing up a machine instance is today. If we can achieve that then we’ll have much more than a github for research; we’ll have a whole web for research.

There’s nothing new here that wasn’t written some time ago by John Wilbanks and others, but it seemed worth repeating. In particular I recommend these posts [1, 2] from John.

Data is free or hidden – there is no middle ground

Science Commons and others are organising a workshop on Open Science issues as a satellite meeting of the EuroScience Open Forum meeting in July. This is pitched as an opportunity to discuss policy, funding, and social issues with an impact on the ‘Open Research Agenda’. In preparation for that meeting I wanted to continue to explore some of the conflicts that arise between wanting to make data freely available as soon as possible and the need to protect the interests of the researchers who have generated that data and (perhaps) have a right to the benefits of exploiting it.

John Cumbers proposed the idea of a ‘Protocol’ for open science that included a ‘use embargo’: the idea that when data is first made available, no-one else should work on it for a specified period of time. I proposed more generally that people could ask that their data be left alone for a particular period of time, but that there ought to be an absolute limit on this type of embargo to prevent data being tied up. These kinds of ideas revolve around the need to forge community norms – standards of behaviour that are expected, and to some extent enforced, by a community. The problem is that these need to evolve naturally rather than be imposed by committee. If there isn’t community buy-in then proposed standards have no teeth.

An alternative approach to solving the problem is to adopt some sort of ‘licence’: a legal or contractual framework that creates obligations about how data can be used and re-used. This could impose embargoes of the type that John suggested, perhaps as flexible clauses in the licence. One could imagine an ‘Open data – six month analysis embargo’ licence. This is attractive because it apparently gives you control over what is done with your data while also allowing you to make it freely available. This is why people who first come to the table with an interest in sharing content so often start with CC-BY-NC. They want everyone to have their content, but not to make money out of it. It is only later that people realise what other effects this restriction can have.

I had rejected the licensing approach because I thought it could only work in a walled garden, something which goes against my view of what open data is about. More recently John Wilbanks has written some wonderfully clear posts on the nature of the public domain, and the place of data in it, that make clear it can’t even work in a walled garden. Because data is in the public domain, no contractual arrangement can protect your ability to exploit that data; it can only give you a legal right to punish someone who does something you haven’t agreed to. This has important consequences for the idea of Open Science licences and standards.

If we argue as an ‘Open Science Movement’ that data is in, and must remain in, the public domain then, if we believe this is in the common good, we should also argue for the widest possible interpretation of what is data. The results of an experiment, regardless of how clever its design might be, are a ‘fact of nature’, and therefore in the public domain (although not necessarily publicly available). Therefore if any person has access to that data they can do whatever they like with it, as long as they are not bound by a contractual arrangement. If someone breaks a contractual arrangement and makes the data freely available, there is no way you can get that data back. You can punish the person who made it available if they broke a contract with you, but you can’t recover the data. The only way you can protect the right to exploit data is by keeping it secret. This is entirely different to creative content, where if someone ignores or breaks licence terms you can legally recover the content from anyone who has obtained it.

Why does this matter to the Open Science movement? Aren’t we all about making the data available for people to do whatever anyway? It matters because you can’t place any legal limitations on what people do with data you make available. You can’t put something up and say ‘you can only use this for X’ or ‘you can only use it after six months’ or even ‘you must attribute this data’. Even in a walled garden, once there is one hole, the entire edifice is gone. The only way we can protect the rights of those who generate data to benefit from exploiting it is through the hard work of developing and enforcing community norms that provide clear guidelines on what can be done. It’s that or simply keep the data secret.

What is important is that we are clear about this distinction between legal and ethical protections. We must not tell people that their data can be protected, because essentially it can’t. And this is a real challenge to the ethos of open data, because it means that our only absolutely reliable method for protecting people is by hiding data. Strong community norms will, and do, help, but there is a need to be careful about how we encourage people to put data out there. And we need to be very strong in condemning people who do the ‘wrong’ thing. Which is why a discussion on what we believe is ‘right’ and ‘wrong’ behaviour is incredibly important. I hope that discussion kicks off in Barcelona and continues globally over the next few months. I know that not everyone can make the various meetings that are going on – but between them, the blogosphere, and the ‘streamosphere’ we have the tools, the expertise, and hopefully the will, to figure these things out.
