Community norms – Science in the Open

For anyone in the UK who lives under a stone, or those people elsewhere in the world who don’t follow British news, this week there has been at least some news beyond the ongoing economic crisis and a U.S. election. Two media ‘personalities’ have been excoriated for leaving what can only be described as crass and offensive messages on an elderly actor’s answer phone, while on air. What made the affair worse was that the radio programme was in fact recorded and someone, somewhere, made a decision to broadcast it in full. Even worse was the fact that the broadcaster was that bastion of British values, the BBC.

If you want to get more of the details of what exactly happened then do a search on their names, but what I wanted to focus on here was some of the public and institutional reactions and their relation to the presumed need within the science community for ‘rules’, ‘licences’, and ‘copyright’ over works and data. We consistently try to explain why this is not a good approach and why developing strong community norms is better [1, 2]. I think this affair gives an example of why.

Much of the media and public outcry has been of the type ‘there must be some law, or if not some BBC rule that must have been broken, bang them up!’ There is a sense that there can only be recourse if someone has broken a rule. This is quite similar to the sense amongst many researchers that they will only be able to ‘protect’ the results they make public by making them available under an explicit licence, and that the only way they can have any recourse against someone ‘misusing’ ‘their’ results is to show that the terms of a licence have been broken.

The problem with this, as we know, is two-fold. First, if someone does break the terms of the licence then frankly your chance of actually doing anything about it is pretty minimal. Secondly, and more importantly from the perspective of those of us interested in re-use and re-purposing, we know that pretty much any licensing system will create incompatibilities that prevent combining datasets, or using them in new ways, even when that wasn’t the intention of the original licensor.

There is an interesting parallel here with the Brand/Ross affair. It is entirely possible that no laws, or even BBC rules, have been broken. Does this mean they get off scot-free? No, Brand has resigned and Ross has been suspended, with his popular Friday night TV show apparently not to be recorded this week. The most interesting thing about the whole affair is that the central failure at the BBC was an editorial one. Some, so far unnamed, senior editor signed off and allowed the programme to be broadcast. What should have happened was that this editor should have blocked the programme or removed the offending passages. Not because a rule was broken but because it was not appropriate for the BBC’s editorial standards. Because it violated the community norms of what is acceptable for the BBC to broadcast. Whether or not they broke any rules, what was done was crass and offensive. Whether or not someone is technically in violation of a data re-use licence, failing to provide adequate attribution to the generators of that dataset is equally crass and unacceptable behaviour.

What the BBC discovered was that when it doesn’t live up to the standards that the wider community expects of it, it receives withering censure. Indeed much of the most serious criticism came from some of its own programmes. It was the voice of the wider community (as mediated through the mass media, admittedly) which has led to the resignation and suspension. If it were just a question of ‘rules’ it is entirely possible that nothing could have been done. And if rules were put in place that would have prevented it then the unintended consequence would almost certainly have been to block programmes that had valid dramatic or narrative reasons for carrying such a passage. Again, community censure was much more powerful than any tribunal arbitrating some set of rules.

Yes this is nuanced, yes it is difficult to get right, and yes there is the potential for mob rule. That is why there is a team of senior professional editors at the BBC charged with policing and protecting the ‘community norms’ of what is acceptable for the BBC brand. That is why the damage done to the BBC’s brand will be severe. Standards, where it is explicit that the spirit is applied rather than the letter, where there are grey areas, can be much more effective than legalistic rules. When someone or some group clearly steps outside of the bounds then widespread censure is appropriate. It is then for individuals and organisations to decide how to apply that censure. And in turn to expect to be held to the same standards.

The cheats will always break the rules. If you use legalistic rules, then you invite legalistic approaches to getting around them. Those that try to apply the rules properly will then be hamstrung in their attempts to do anything useful while staying within the letter of the law. Community norms and standards of behaviour, appropriate citation, respect for people’s work and views, can be much more effective.

[1] Wilbanks, John. The Control Fallacy: Why OA Out-Innovates the Alternative. Available from Nature Precedings, http://hdl.handle.net/10101/npre.2008.1808.1 (2008)
[2] Wilbanks, John. Chemspider: Good intentions and the fog of licensing. http://network.nature.com/people/wilbanks/blog/2008/05/10/chemspider-good-intentions-and-the-fog-of-licensing (2008)

Science Commons and others are organising a workshop on Open Science issues as a satellite meeting of the EuroScience Open Forum meeting in July. This is pitched as an opportunity to discuss policy, funding, and social issues with an impact on the ‘Open Research Agenda’. In preparation for that meeting I wanted to continue to explore some of the conflicts that arise between wanting to make data freely available as soon as possible and the need to protect the interests of the researchers that have generated data and (perhaps) have a right to the benefits of exploiting that data.

John Cumbers proposed the idea of a ‘Protocol’ for open science that included the idea of a ‘use embargo’; the idea that when data is initially made available, no-one else should work on it for a specified period of time. I proposed more generally that researchers could ask others to leave data alone for any particular period of time, but that there ought to be an absolute limit on this type of embargo to prevent data being tied up. These kinds of ideas revolve around the need to forge community norms – standards of behaviour that are expected, and to some extent enforced, by a community. The problem is that these need to evolve naturally, rather than be imposed by committee. If there isn’t community buy-in then proposed standards have no teeth.

An alternative approach to solving the problem is to adopt some sort of ‘licence’: a legal or contractual framework that creates obligations about how data can be used and re-used. This could impose embargoes of the type that John suggested, perhaps as flexible clauses in the licence. One could imagine an ‘Open data – six month analysis embargo’ licence. This is attractive because it apparently gives you control over what is done with your data while also allowing you to make it freely available. This is why people who first come to the table with an interest in sharing content always start with CC-BY-NC. They want everyone to have their content, but not to make money out of it. It is only later that people realise what other effects this restriction can have.

I had rejected the licensing approach because I thought it could only work in a walled garden, something which goes against my view of what open data is about. More recently John Wilbanks has written some wonderfully clear posts on the nature of the public domain, and the place of data in it, that make clear that it can’t even work in a walled garden. Because data is in the public domain, no contractual arrangement can protect your ability to exploit that data; it can only give you a legal right to punish someone who does something you haven’t agreed to. This has important consequences for the idea of Open Science licences and standards.

If we argue as an ‘Open Science Movement’ that data is in and must remain in the public domain then, if we believe this is in the common good, we should also argue for the widest possible interpretation of what is data. The results of an experiment, regardless of how clever its design might be, are a ‘fact of nature’, and therefore in the public domain (although not necessarily publicly available). Therefore if any person has access to that data they can do whatever they like with it as long as they are not bound by a contractual arrangement. If someone breaks a contractual arrangement and makes the data freely available there is no way you can get that data back. You can punish the person who made it available if they broke a contract with you. But you can’t recover the data. The only way you can protect the right to exploit data is by keeping it secret. This is entirely different to creative content where, if someone ignores or breaks licence terms, you can legally recover the content from anyone that has obtained it.

Why does this matter to the Open Science movement? Aren’t we all about making the data available for people to do whatever anyway? It matters because you can’t place any legal limitations on what people do with data you make available. You can’t put something up and say ‘you can only use this for X’ or ‘you can only use it after six months’ or even ‘you must attribute this data’. Even in a walled garden, once there is one hole, the entire edifice is gone. The only way we can protect the rights of those who generate data to benefit from exploiting it is through the hard work of developing and enforcing community norms that provide clear guidelines on what can be done. It’s that or simply keep the data secret.

What is important is that we are clear about this distinction between legal and ethical protections. We must not tell people that their data can be protected because essentially it can’t. And this is a real challenge to the ethos of open data because it means that our only absolutely reliable method for protecting people is by hiding data. Strong community norms will, and do, help but there is a need to be careful about how we encourage people to put data out there. And we need to be very strong in condemning people who do the ‘wrong’ thing. Which is why a discussion on what we believe is ‘right’ and ‘wrong’ behaviour is incredibly important. I hope that discussion kicks off in Barcelona and continues globally over the next few months. I know that not everyone can make the various meetings that are going on – but between them and the blogosphere and the ‘streamosphere’ we have the tools, the expertise, and hopefully the will, to figure these things out.

The Open Data licensing issue – Deepak Singh
On the erosion of the public domain – John Wilbanks
Chemspider: Good intentions and the fog of licensing – John Wilbanks
Going Legal on CC-0

Tag: Community norms

What Russell Brand and Jonathan Ross can teach us about the value of community norms

Data is free or hidden – there is no middle ground