February 4, 2009 | December 30, 2009 by Cameron Neylon

Best practice for data availability – the debate starts…well over there really

The issue of licensing arrangements and best practice for making data available has been brewing for some time but has just recently come to a head. John Wilbanks and Science Commons have a reasonably well established line that they have been developing for some time. Michael Nielsen has a recent blog post and Rufus Pollock, of the Open Knowledge Foundation, has also just synthesised his thoughts in response into a blog essay. I highly recommend reading John’s article on licensing at Nature Precedings, Michael’s blog post, and Rufus’ essay before proceeding. Another important document is the discussion of the license that Victoria Stodden is working to develop. Actually, if you’ve read them, go and read them again anyway – it will refresh the argument.

To crudely summarize, Rufus makes a cogent argument for the use of explicit licenses applied to collections of data, and feels that share-alike provisions, in licenses or otherwise, do not cause major problems and that the benefit that arises from enforcing re-use outweighs the problem. John’s position is that it is far better for standards to be applied through social pressure (“community norms”) rather than licensing arrangements. He also believes that share-alike provisions are bad because they break interoperability between different types of objects and domains. One point that I think is very important, and (I think) is a point of agreement, is that some form of license, or at least a dedication to the public domain, will be crucial to developing best practice. Even if the final outcome of the debate is that everything will go in the public domain, it should be part of best practice to make that explicit.

Broadly speaking I belong to John’s camp but I don’t want to argue that case with this post. What is important in my view is that the debate takes place and that we are clear about what the aims of that debate are. What is it we are trying to achieve in the process of coming to (hopefully) some consensus of what best practice should look like?
It is important to remember that anyone can assert a license (or lack thereof) on any object that they (assert they) own or have rights over. We will never be able to impose a specific protocol on all researchers, all funders. Therefore what we are looking for is not the perfect arrangement but a balance between what is desired, what can be practically achieved, and what is politically feasible. We do need a coherent consensus view that can be presented to research communities and research funders. That is why the debate is important. We also need something that works, and is extensible into the future, where it will stand up to the development of new types of research, new types of data, new ways of making that data available, and perhaps new types of researchers altogether.

I think we agree that the minimal aim is to enable, encourage, and protect into the future the ability to re-use and re-purpose the publicly published products of publicly funded research. Arguments about personal or commercial work are much harder and much more subtle. Restricting the argument to publicly funded researchers makes it possible to open a discussion with a defined number of funders who have a public service and public engagement agenda. It also makes the moral arguments much clearer.

In focussing on research that is being made public we short circuit the contentious issue of timing. The right, or the responsibility, to commercially exploit research outputs and the limitations this can place on data availability is a complex and difficult area and one in which agreement is unlikely any time soon. I would also avoid the word “Open”. This is becoming a badly overloaded term with both political and emotional overtones, positive and negative. Focussing on what should happen after the decision has been made to go public reduces the argument to “what is best practice for making research outputs available”. The question of when to make them available can then be kept separate. The key question for the current debate is not when but how.

So what I believe the debate should be about is the establishment, if possible, of a consensus protocol or standard or license for enabling and ensuring the availability of the research outputs associated with publicly published, publicly funded research. Alongside this is the question of establishing mechanisms for researchers to implement and be supported to observe these standards, as well as for “enforcement”. These might be trademarks, community standards, or legal or contractual approaches as well as systems and software to make all of this work, including trackbacks, citation aggregators, and effective data repositories. In addition we need to consider the public relations issue of selling such standards to disparate research funders and research communities.

Perhaps a good starting point would be to pinpoint the issues where there is general agreement and map around those. If we agree some central principles then we can take an empirical approach to the mechanisms. We’re scientists after all aren’t we?

6 Replies to “Best practice for data availability – the debate starts…well over there really”

Rufus Pollock says:

February 5, 2009 at 2:43 pm

Cameron, thanks for the write-up and useful comments.

I think everyone agrees that whatever you go for (PD only or not) you
need something explicit and that something explicit is going to look
like a license.

The real debate is about what can be ‘required’ of users of open
material and whether those ‘requirements’ go in norms or licenses.
These two items seem to get confused a lot.

Most of the arguments I see from e.g. John at Science Commons against
requiring attribution and share-alike seem to apply just as much to
norms as to licenses. But do people really want to give these up? It
seems to me that attribution is absolutely central to the scientific
community — and share-alike also seems pretty important to ‘open
science’ (no-one wants people taking lots of data and then keeping it
secret).

In that case it seems that the argument must be that these types of
provisions are somehow OK in norms but not in licenses. But this seems
weird (the only explanation I can think of is that the ‘norms’ are
really going to be ‘enforced’ but that does not seem a very attractive
answer).

Moving to the second point, whatever route you take (attribution or no
attribution requirement etc) what makes norms better than licenses?
Just like norms, licenses are going to be enforced primarily by social
means though they have the added backup of being licenses (useful when
dealing with those people, especially outside of the ‘community’, who might
not play fair …). Furthermore, licenses seem as flexible (in the
good ways) as norms are (see the original essay for more on this).
Mike Chelen says:

February 11, 2009 at 4:32 am

Why should norms and licenses be in any way mutually exclusive? Open licenses each have degrees of freedom, ranging from basic copy permissions to entirely public domain. Codifying and applying these licenses sets clear boundaries for users, helping them know what legal restrictions exist. Within these limits the community standards still apply, as a flexible and improving set of best practices. Either licenses or community norms are helpful separately, and combined their effects work hand in hand to provide freedom of usage and ease of access, along with guidelines to achieve the best results.
Thanks for the nice discussion, still picking over all the articles =)
Cameron Neylon says:

February 11, 2009 at 8:45 am

Mike, my concern is precisely that there are too many licences and most of them are totally incompatible. Thus you will struggle to find any two datasets in the most recent NAR database issue where you can legally combine the data. My personal belief is that the only way you can avoid this is by removing all legal conditions of use and essentially asking the users to “play nice”. And this already works pretty well in the sciences. You don’t need a licence to force people to cite properly; you have a community that self-polices.

The other argument is that what I am looking for is a simple set of rules that we can sell to funding agencies as a protocol that they can sign up to and impose on funded researchers. So a single straightforward approach is desirable both for clarity and interoperability.

But as Heather Morrison pointed out in a recent email to the list (link above) we do need to take some time to look at all the options anyway. This was the point I got to at the end of my series of posts on the state of Open Science: that now is the time to look at what people are doing and what is working – to aggregate best practice and then try to pull something together out of that. The current debate is useful in getting the issues out and worked over though.
