Hoist by my own petard: How to reduce your impact with restrictive licences

I was greatly honoured to be asked to speak at the symposium held on Monday to recognize Peter Murray-Rust’s contribution to scholarly communication. The lineup was spectacular, the talks insightful and probing, and the discussion serious. It was also no longer trapped in naive yes/no arguments about openness and machine readability, but had moved on to details, edge cases, problems and issues.

For my own talk I wanted to do something different from what I’ve been doing in recent talks. Following the example of Deepak Singh, John Wilbanks and others, I’ve developed what seems to be a pretty effective way of giving an advocacy talk: lots of slides, big images, and few words, going by at a fast rate. Recently I did 118 slides in 20 minutes. The talk for Peter’s symposium required something different, so I eschewed slides and just spoke for 40 minutes, wanting to explore the issues deeply rather than skate over the surface in the way the rapid-fire approach tends to do.

The talk was, I think, reasonably well received and provoked some interesting (and heated) discussion. I’ve put the draft text I was working from up on an Etherpad. However, due to my own stupidity, the talk was neither livestreamed nor recorded. In a discussion leading up to the talk I was asked whether I wanted to put up a pretty picture as a backdrop, and I thought it would be good to put up the licensing slide that I use in all of my talks, which shows that livestreaming, twittering, and so on are fine and encourages people to do them. The trouble is that I navigated to the SlideShare deck that has that slide and just hit full screen without thinking. What the audience therefore saw was the first slide, which looks like this.

A restrictive talk licence prohibiting live streaming, tweeting, etc.

I simply didn’t notice as I was looking the other way. The response to this was both instructive and interesting. The first thing that happened was that, as soon as the people running the livestream and recording (amazingly effective, given the resources they had) saw the slide, they shut everything down. In a sense this is really positive: it shows that people respect the requests of the speaker by default.

Across the audience people didn’t tweet, and indeed in a couple of cases deleted photographs that they had taken. Again, people’s respect for what they thought I was asking was solid. Even in an audience full of radicals and open geeks, no-one questioned the request. I’m slightly gobsmacked, in fact, that no-one shouted at me to ask what the hell I thought I was doing. Some thought I was being ironic, which I have to say would have been too clever by half. But again it shows that, if you ask, people do for the most part respect that request.

Given the talk was about research impact, and how open approaches will enable it, it is rather ironic that by inadvertently using the wrong slide I probably significantly reduced the impact of the talk. There is no video that I can upload, no opportunity for others to see the talk. Several people whose opinions I value, and who I know were watching online, didn’t get to see the talk, and the tweetstream that I might have hoped would be full of discussion, disagreement, and alternative perspectives was basically dead. I effectively proved my own point, reducing what I’d hoped might kick off a wider discussion to a dead talk that exists only in a static document and in the memories of the limited number of people who were in the room.

The message is pretty clear. If you want to reduce the effectiveness and impact of the work you’re doing, if you want to limit the people you can reach, then use restrictive terms. If you want your work to reach people and to maximise the chance it has to make a difference, make it clear and easy for people to understand that they are encouraged to copy, share, and cite your work. Be open. Make a difference.

Data is free or hidden – there is no middle ground

Science Commons and others are organising a workshop on Open Science issues as a satellite meeting of the EuroScience Open Forum in July. This is pitched as an opportunity to discuss policy, funding, and social issues with an impact on the ‘Open Research Agenda’. In preparation for that meeting I wanted to continue exploring some of the conflicts that arise between wanting to make data freely available as soon as possible and the need to protect the interests of the researchers who have generated the data and (perhaps) have a right to the benefits of exploiting it.

John Cumbers proposed the idea of a ‘Protocol’ for open science that included a ‘use embargo’: the idea that when data is initially made available, no-one else should work on it for a specified period of time. I proposed, more generally, that researchers could ask others to leave data alone for a period of their choosing, but that there ought to be an absolute limit on this type of embargo to prevent data being tied up. These kinds of ideas revolve around the need to forge community norms – standards of behaviour that are expected, and to some extent enforced, by a community. The problem is that these need to evolve naturally, rather than be imposed by committee. If there isn’t community buy-in, then proposed standards have no teeth.

An alternative approach to solving the problem is to adopt some sort of ‘license’: a legal or contractual framework that creates obligations about how data can be used and re-used. This could impose embargoes of the type that John suggested, perhaps as flexible clauses in the license. One could imagine an ‘Open data – six month analysis embargo’ license. This is attractive because it apparently gives you control over what is done with your data while also allowing you to make it freely available. This is why people who first come to the table with an interest in sharing content so often start with CC-BY-NC: they want everyone to have their content, but not to make money out of it. It is only later that people realise what other effects this restriction can have.

I had rejected the licensing approach because I thought it could only work in a walled garden, which goes against my view of what open data is about. More recently John Wilbanks has written some wonderfully clear posts on the nature of the public domain, and the place of data in it, which make clear that it can’t work even in a walled garden. Because data is in the public domain, no contractual arrangement can protect your ability to exploit that data; it can only give you a legal right to punish someone who does something you haven’t agreed to. This has important consequences for the idea of Open Science licences and standards.

If we argue as an ‘Open Science Movement’ that data is in, and must remain in, the public domain then, if we believe this is in the common good, we should also argue for the widest possible interpretation of what counts as data. The results of an experiment, regardless of how clever its design might be, are a ‘fact of nature’, and therefore in the public domain (although not necessarily publicly available). Therefore anyone who has access to that data can do whatever they like with it, as long as they are not bound by a contractual arrangement. If someone breaks a contractual arrangement and makes the data freely available, there is no way you can get that data back. You can punish the person who made it available if they broke a contract with you, but you can’t recover the data. The only way you can protect the right to exploit data is by keeping it secret. This is entirely different from creative content, where if someone ignores or breaks licence terms you can legally recover the content from anyone who has obtained it.

Why does this matter to the Open Science movement? Aren’t we all about making the data available for people to do whatever they want with anyway? It matters because you can’t place any legal limitations on what people do with data you make available. You can’t put something up and say ‘you can only use this for X’ or ‘you can only use it after six months’ or even ‘you must attribute this data’. Even in a walled garden, once there is one hole, the entire edifice is gone. The only way we can protect the rights of those who generate data to benefit from exploiting it is through the hard work of developing and enforcing community norms that provide clear guidelines on what can be done. It’s that or simply keep the data secret.

What is important is that we are clear about this distinction between legal and ethical protections. We must not tell people that their data can be protected, because essentially it can’t. And this is a real challenge to the ethos of open data, because it means that our only absolutely reliable method for protecting people is hiding data. Strong community norms will, and do, help, but there is a need to be careful about how we encourage people to put data out there. And we need to be very strong in condemning people who do the ‘wrong’ thing, which is why a discussion of what we believe is ‘right’ and ‘wrong’ behaviour is incredibly important. I hope that discussion kicks off in Barcelona and continues globally over the next few months. I know that not everyone can make the various meetings that are going on – but between them, the blogosphere and the ‘streamosphere’ we have the tools, the expertise, and hopefully the will, to figure these things out.

More on the science exchange – or building and capitalising a data commons

Following on from the discussion a few weeks back, kicked off by Shirley at One Big Lab and continued here, I’ve been thinking about how to actually turn what was a throwaway comment into reality:

What is being generated here is new science, and science isn’t paid for per se. The resources that generate science are supported by governments, charities, and industry but the actual production of science is not supported. The truly radical approach to this would be to turn the system on its head. Don’t fund the universities to do science, fund the journals to buy science; then the system would reward increased efficiency.

There is a problem at the core of this. For someone to pay for access to the results, there has to be a monetary benefit to them. This may be through increased efficiency of their research funding, but that’s a rather vague benefit. For a serious charitable or commercial funder there has to be the potential either to make money or at least to see that the enterprise could become self-sufficient. But surely this means monetizing the data somehow? That would require restrictive licences, which are not, in the end, what we’re about.

The other story of the week has been the (in the end very useful) kerfuffle caused by ChemSpider moving to a CC-BY-SA licence, and the confusion it has revealed regarding data, licensing, and the public domain. John Wilbanks, whose comments on the ChemSpider licence sparked the discussion, has written two posts [1, 2] which I found illuminating and which have made things much clearer for me. His point is that data naturally belongs in the public domain, and that the public domain and the freedom of the data itself need to be protected from erosion, both legal and conceptual, that could be caused by our obsession with licences. What does this mean for making an effective data commons, and the Science Exchange that could arise from it, financially viable?