Hoist by my own petard: How to reduce your impact with restrictive licences

I was greatly honoured to be asked to speak at the symposium held on Monday to recognize Peter Murray-Rust’s contribution to scholarly communication. The lineup was spectacular, the talks insightful and probing, and the discussion serious. It was also no longer trapped in the naive yes/no arguments about openness and machine readability, but had moved on into detail, edge cases, problems and issues.

For my own talk I wanted to do something different to what I’ve been doing in recent talks. Following the example of Deepak Singh, John Wilbanks and others I’ve developed what seems to be a pretty effective way of doing an advocacy talk, involving lots of slides, big images, and few words going by at a fast rate. Recently I did 118 slides in 20 minutes. The talk for Peter’s symposium required something different, so I eschewed slides and just spoke for 40 minutes, wanting to explore the issues deeply rather than skate over the surface in the way the rapid-fire approach tends to do.

The talk was, I think, reasonably well received and provoked some interesting (and heated) discussion. I’ve put the draft text I was working from up on an Etherpad. However, due to my own stupidity the talk was neither livestreamed nor recorded. In a discussion leading up to the talk I was asked whether I wanted to put up a pretty picture as a backdrop, and I thought it would be good to put up the licensing slide that I use in all of my talks to show that livestreaming, twittering, etc. is fine, and to encourage people to do it. The trouble is that I navigated to the Slideshare deck that has that slide and just hit full screen without thinking. What the audience therefore saw was the first slide, which looks like this.

[Image: A restrictive talk licence prohibiting live streaming, tweeting, etc.]

I simply didn’t notice as I was looking the other way. The response to this was both instructive and interesting. The first thing that happened was that, as soon as the people running the (amazingly effective given the resources they had) livestream and recording saw the slide, they shut everything down. In a sense this is really positive: it shows that people respect the requests of the speaker by default.

Across the audience people didn’t tweet, and indeed in a couple of cases deleted photographs that they had taken. Again the respect for the request people thought I was making was solid. Even in an audience full of radicals and open geeks no-one questioned the request. I’m slightly gobsmacked in fact that no-one shouted at me to ask what the hell I thought I was doing. Some thought I was being ironic, which I have to say would have been too clever by half. But again it shows, if you ask, people do for the most part respect that request.

Given the talk was about research impact, and how open approaches will enable it, it is rather ironic that by inadvertently using the wrong slide I probably significantly reduced the impact of the talk. There is no video that I can upload, no opportunity for others to see the talk. Several people whose opinion I value, and who I know were watching online, didn’t get to see the talk, and the tweetstream that I might have hoped would be full of discussion, disagreement, and alternative perspectives was basically dead. I effectively made my own point, reducing what I’d hoped might kick off a wider discussion to a dead talk that only exists in a static document and the memories of the limited number of people who were in the room.

The message is pretty clear. If you want to reduce the effectiveness and impact of the work you’re doing, if you want to limit the people you can reach, then use restrictive terms. If you want your work to reach people and to maximise the chance it has to make a difference, make it clear and easy for people to understand that they are encouraged to copy, share, and cite your work. Be open. Make a difference.

It’s not easy being clear…

There has been some debate going backwards and forwards over the past few weeks about licensing, people’s expectations, and the extent to which researchers can be expected to understand, or want to understand, the details of legal terms, licensing and other technical minutiae. It is reasonable for scientific researchers not to wish to get into the details. One of the real successes of Creative Commons has been to provide a relatively small set of reasonably clear terms that enable people to express their wishes about what people can do with their work. But even here there is the potential for significant confusion, as demonstrated by the work that CC is doing on the perception of what “non commercial” means.

The end result of this is two-fold. Firstly, people are genuinely confused about what to do and as a result they give up. Secondly, in giving up there is often an unspoken assumption that “people will understand what I want/mean”. Two examples yesterday illustrated exactly how misguided this can be and showed the importance of thinking about, and being clear about, what you want people to do with your content and information.

The first was pointed out by Paulo Nuin who linked to a post on The Matrix Cookbook, a blog and PDF containing much useful information on matrix transforms. The post complained that Amazon were selling a Kindle version of the PDF, apparently without asking permission or even bothering to inform the authors. So far, so big corporation. But digging a little deeper I went to the front page of the site and found this interesting “license”:

“License? No, there is no license. It is provided as a knowledge sharing project for anyone to use. But if you use it in an academic or research like context, we would love to be cited appropriately.”

Now I would interpret this as meaning that the authors had intended to place the work in the public domain. They clearly felt that while educational and research re-use was fine, commercial use was not. I would guess that someone at Amazon read the statement “there is no license” and felt that it was free to re-use. It seems odd that they wouldn’t email the authors to notify them, but if it were public domain there is no requirement to. Rude, yes. Theft? Well, it depends on your perspective. Going back today, the authors have made a significant change to the “license”:

“It is provided as a knowledge sharing project for anyone to use. But if you use it in an academic or research like context, we would love to be cited appropriately. And NO, you are not allowed to make money on it by reselling The Matrix Cookbook in any form or shape.”

Had the authors made the content CC-BY-NC then their intentions would have been much clearer. My personal belief is that an NC license would be counter-productive (meaning the work couldn’t be used for teaching at a fee charging college or for research funded by a commercial sponsor for instance) but the point of the CC licenses is to give people these choices. What is important is that people make those choices and make them clear.

The second example related to identity. As part of an ongoing discussion involving online commenting, genereg, a Friendfeed user, linked to their blog, which included their real name. Mr Gunn, the nickname used by Dr William Gunn online, wrote a blog post in which he referred to genereg’s contribution by linking to their blog from their real name [subsequently removed on request]. I probably would have done the same, wanting to ascribe the contribution clearly to the “real person” so they get credit for it. Genereg objected to this, feeling that as their real name wasn’t directly in that conversational context it was inappropriate to use it.

So in my view, “Genereg” was a nickname that someone was happy to have connected with their real name, while in their view this was inappropriate. No-one is right or wrong here; we are evolving the rules of conduct more or less as we go and, frankly, identity is a mess. But this wasn’t clear to me or to Mr Gunn. I am often uncomfortable with trying to tell whether a specific person who has linked two apparently separate identities is happy with that link being public, has linked the two by mistake, or just regards one as an alias. And you can’t ask in a public forum, can you?

What links these, and this week’s other fracas, is confusion over people’s expectations. The best way to avoid this is to be as clear as you possibly can. Don’t assume that everyone thinks the same way that you do. And definitely don’t assume that what is obvious to you is obvious to everyone else. When it comes to content, make a clear statement of your expectations and wishes, preferably using a widely recognized and understood license. If you’re reading this at OWW you should be seeing my nice shiny new cc0 waiver in the right hand navbar (I haven’t figured out how to get it into the RSS feed yet). Most of my slidesets at Slideshare are CC-BY-SA. I’d prefer them to be CC-BY but most include images with CC-BY-SA licenses which (I try to make sure) I respect. Overall I try to make the work I generate as widely re-usable as possible and aim to make that as clear as possible.

There are no such tools to make clear statements about how you wish your identity to be treated (and perhaps there should be). But a plain English statement on the appropriate profile page might be useful: “I blog under a pseudonym because…and I don’t want my identity revealed”, or “Bunnykins is the Friendfeed handle of Professor Serious Person”. Consider whether what you are doing is sending mixed messages or is potentially confusing. Personally I like to keep things simple so I just use my real name or variants of it. But that is clearly not for everyone.

Above all, try to express clearly what you expect and wish to happen. Don’t expect others necessarily to understand where you’re coming from. It is very easy for one person’s polite and helpful to be another person’s deeply offensive. When you put something online, think about how you want people to use it, think about how you don’t want people to use it (and remember you may need to balance the allowing of one against the restricting of the other) and make those as clear as you possibly can, where possible using a statement or license that is widely recognized and has had some legal attention at some point like the CC licenses, cc0 waiver, or the PDDL. Clarity helps everyone. If we get this wrong we may end up with a web full of things we can’t use.

And before anyone else gets in to tell me: yes, I’ve made plenty of unjustified, and plain wrong, assumptions about other people’s views before. Pot. Kettle. Black. Welcome to being human.

Best practice for data availability – the debate starts…well over there really

The issue of licensing arrangements and best practice for making data available has been brewing for some time but has just recently come to a head. John Wilbanks and Science Commons have a reasonably well established line that they have been developing for some time. Michael Nielsen has a recent blog post and Rufus Pollock, of the Open Knowledge Foundation, has also just synthesised his thoughts in response into a blog essay. I highly recommend reading John’s article on licensing at Nature Precedings, Michael’s blog post, and Rufus’ essay before proceeding. Another important document is the discussion of the license that Victoria Stodden is working to develop. Actually if you’ve read them go and read them again anyway – it will refresh the argument.

To crudely summarize, Rufus makes a cogent argument for the use of explicit licenses applied to collections of data, and feels that share-alike provisions in licenses or otherwise do not cause major problems and that the benefit that arises from enforcing re-use outweighs the problem. John’s position is that it is far better for standards to be applied through social pressure (“community norms”) than through licensing arrangements. He also believes that share-alike provisions are bad because they break interoperability between different types of objects and domains. One point that I think is very important and (I think) is a point of agreement is that some form of license, or at least dedication to the public domain, will be crucial to developing best practice. Even if the final outcome of the debate is that everything will go in the public domain, it should be part of best practice to make that explicit.

Broadly speaking I belong to John’s camp but I don’t want to argue that case with this post. What is important in my view is that the debate takes place and that we are clear about what the aims of that debate are. What is it we are trying to achieve in the process of coming to (hopefully) some consensus of what best practice should look like?

It is important to remember that anyone can assert a license (or lack thereof) on any object that they (assert they) own or have rights over. We will never be able to impose a specific protocol on all researchers, all funders. Therefore what we are looking for is not the perfect arrangement but a balance between what is desired, what can be practically achieved, and what is politically feasible. We do need a coherent consensus view that can be presented to research communities and research funders. That is why the debate is important. We also need something that works, and is extensible into the future, where it will stand up to the development of new types of research, new types of data, new ways of making that data available, and perhaps new types of researchers altogether.

I think we agree that the minimal aim is to enable, encourage, and protect into the future the ability to re-use and re-purpose the publicly published products of publicly funded research. Arguments about personal or commercial work are much harder and much more subtle. Restricting the argument to publicly funded researchers makes it possible to open a discussion with a defined number of funders who have a public service and public engagement agenda. It also makes the moral arguments much clearer.

In focussing on research that is being made public we short circuit the contentious issue of timing. The right, or the responsibility, to commercially exploit research outputs, and the limitations this can place on data availability, is a complex and difficult area and one in which agreement is unlikely any time soon. I would also avoid the word “Open”. This is becoming a badly overloaded term with both political and emotional overtones, positive and negative. Focussing on what should happen after the decision has been made to go public reduces the argument to “what is best practice for making research outputs available”. The question of when to make them available can then be kept separate. The key question for the current debate is not when but how.

So what I believe the debate should be about is the establishment, if possible, of a consensus protocol or standard or license for enabling and ensuring the availability of the research outputs associated with publicly published, publicly funded research. Alongside this is the question of establishing mechanisms for researchers to implement, and be supported to observe, these standards, as well as for “enforcement”. These might be trademarks, community standards, or legal or contractual approaches, as well as systems and software to make all of this work, including trackbacks, citation aggregators, and effective data repositories. In addition we need to consider the public relations issue of selling such standards to disparate research funders and research communities.

Perhaps a good starting point would be to pinpoint the issues where there is general agreement and map around those. If we agree on some central principles then we can take an empirical approach to the mechanisms. We’re scientists after all, aren’t we?

What Russell Brand and Jonathan Ross can teach us about the value of community norms

For anyone in the UK who lives under a stone, or those people elsewhere in the world who don’t follow British news, this week there has been at least some news beyond the ongoing economic crisis and a U.S. election. Two media ‘personalities’ have been excoriated for leaving what can only be described as crass and offensive messages on an elderly actor’s answer phone, while on air. What made the affair worse was that the radio programme was in fact recorded and someone, somewhere, made a decision to broadcast it in full. Even worse was the fact that the broadcaster was that bastion of British values, the BBC.

If you want to get more of the details of what exactly happened then do a search on their names, but what I wanted to focus on here was some of the public and institutional reactions and their relation to the presumed need within the science community for ‘rules’, ‘licences’, and ‘copyright’ over works and data. Consistently we try to explain why this is not a good approach and why developing strong community norms is better [1, 2]. I think this affair gives an example of why.

Much of the media and public outcry has been of the type ‘there must be some law, or if not some BBC rule that must have been broken, bang them up!’ There is a sense that there can only be recourse if someone has broken a rule. This is quite similar to the sense amongst many researchers, that they will only be able to ‘protect’ the results they make public by making them available under an explicit licence. That the only way they can have any recourse against someone ‘misusing’ ‘their’ results is if they are able to show that they have broken the terms of a licence.

The problem with this, as we know, is two-fold. First if someone does break the terms of the licence then frankly your chance of actually doing anything about it is pretty minimal. Secondly, and more importantly from the perspective of those of us interested in re-use and re-purposing, we know that pretty much any licensing system will create incompatibilities that prevent combining datasets, or using them in new ways, even when that wasn’t the intention of the original licensor.

There is an interesting parallel here with the Brand/Ross affair. It is entirely possible that no laws, or even BBC rules, have been broken. Does this mean they get off scot-free? No, Brand has resigned and Ross has been suspended, with his popular Friday night TV show apparently not to be recorded this week. The most interesting thing about the whole affair is that the central failure at the BBC was an editorial one. Some, so far unnamed, senior editor signed off and allowed the programme to be broadcast. What should have happened was that this editor should have blocked the programme or removed the offending passages. Not because a rule was broken but because it was not appropriate for the BBC’s editorial standards. Because it violated the community norms of what is acceptable for the BBC to broadcast. Whether or not they broke any rules, what was done was crass and offensive. Whether or not someone is technically in violation of a data re-use license, failing to provide adequate attribution to the generators of that dataset is equally crass and unacceptable behaviour.

What the BBC discovered was that when it doesn’t live up to the standards that the wider community expects of it, it receives withering censure. Indeed much of the most serious criticism came from some of its own programmes. It was the voice of the wider community (as mediated through the mass media, admittedly) which has led to the resignation and suspension. If it were just a question of ‘rules’ it is entirely possible that nothing could have been done. And if rules were put in place that would have prevented it then the unintended consequence would almost certainly have been to block programmes that had valid dramatic or narrative reasons for carrying such a passage. Again, community censure was much more powerful than any tribunal arbitrating some set of rules.

Yes this is nuanced, yes it is difficult to get right, and yes there is the potential for mob rule. That is why there is a team of senior professional editors at the BBC charged with policing and protecting the ‘community norms’ of what is acceptable for the BBC brand. That is why the damage done to the BBC’s brand will be severe. Standards, where it is explicit that the spirit is applied rather than the letter, where there are grey areas, can be much more effective than legalistic rules. When someone or some group clearly steps outside of the bounds then widespread censure is appropriate. It is then for individuals and organisations to decide how to apply that censure. And in turn to expect to be held to the same standards.

The cheats will always break the rules. If you use legalistic rules, then you invite legalistic approaches to getting around them. Those that try to apply the rules properly will then be hamstrung in their attempts to do anything useful while staying within the letter of the law. Community norms and standards of behaviour, appropriate citation, respect for people’s work and views, can be much more effective.

  1. Wilbanks, John. The Control Fallacy: Why OA Out-Innovates the Alternative. Available from Nature Precedings <http://hdl.handle.net/10101/npre.2008.1808.1> (2008)
  2. Wilbanks, John. Chemspider: Good intentions and the fog of licensing. Available from <http://network.nature.com/people/wilbanks/blog/2008/05/10/chemspider-good-intentions-and-the-fog-of-licensing> (2008)