Protocols for Open Science
One of the strong messages that came back from the workshop we held at the BioSysBio meeting was that protocols and standards of behaviour were something that people would appreciate having available. The idea of a ‘charter’ or ‘protocol’ for open science raises many potential issues, but these are definitely things that are worth talking about. I thought I would throw a few ideas out and see where they go. There are some potentially serious contradictions to be worked through.
To begin with it is useful to point at other efforts at definition and standardisation. Two that are key for Open Science are the Open Knowledge Definition developed by the Open Knowledge Foundation, and the efforts of Creative Commons and Science Commons, particularly the development of the CC0 protocol for placing works in the public domain. At a basic level the work of the OKF can be seen as defining what it means to be ‘Open’, whereas CC and SC are working on licenses and protocols that make content, data, and works available in a way that satisfies the OKF definition of open.
So to a first approximation we can define Open Science as:
“Science carried out in such a way that all the products of research are made available in a way that satisfies the terms of the Open Knowledge Definition”
This raises the first of many issues. To strictly satisfy the OKD it is necessary to publish in Strong Open Access journals (i.e. strictly BBB compliant journals). A paper is a major output from most research and therefore it should necessarily be available under an OKD compliant license (such as CC-BY). An alternative is to publish a preprint, but this raises the question of the contribution of referees and editors to the content of the article. Is this not part of the process? Or are these additional contributors not necessarily doing Open Science, so that their contribution cannot necessarily be expected to be licensed? Publication in strictly Open Access journals is not always possible or practicable (although PLoS ONE goes a long way to making this argument unsustainable), so from a practical perspective making a pre-print available is probably acceptable.
However there are many other technical details that have the potential to raise significant issues. The Open Knowledge Definition explicitly states there are no restrictions on the re-use of data. One of John Cumber’s motivations in opening the discussion on protocols was the desire to provide a mechanism for releasing data while at the same time requiring that people don’t run off and analyse the data that someone has spent several years of their life on. This is an ethical issue as well as a ‘legal’ issue. It may be valuable to separate the two. If data is made available as part of an Open Science project it appears to me that it should be issued under an OKD compliant license. This is separate from the question of whether the person making that data available can make a clear statement requesting that people do not analyse or process the data for a specified embargo period. In my view, any license that allows access to data but prohibits its re-use in any specific fashion cannot be described as ‘Open’. This is the distinction between Strong Open Access literature and Weak Open Access literature and is an important distinction to maintain if the word ‘Open’ is to have any semantic value. Respecting people’s requests to be allowed to finish off a piece of work is simply being decent. Some may not respect them, but with the data in the open a strong case can be made to show that someone has behaved unethically, even if what they have done is not illegal.
The other key aspect of licensing for Open Science is attribution. Strong attribution is at the core of good science anyway. I am unconvinced that explicitly requiring strong attribution as part of an Open Science license is necessary, although it can do no harm. Particularly in the case of data, which cannot generally be copyrighted, it is challenging to enforce a requirement for attribution. Attribution and its necessity need to be strongly supported by social pressure. Licensing can help, but we need much stronger social protocols, including the strengthening of journal editorial practice and, if necessary, punishment, in place to protect the rights of scientists to be recognised.
The flip side of this is that we need to recognise that where many smart people are thinking about the same things it is entirely plausible for them to independently come to the same conclusion. Proving theft will always be challenging although technology is making this easier. But the principle of innocence until proven guilty must be maintained. Witchhunts are a real potential issue, as is decision making and punishment behind closed doors. If there are concerns then they must be raised in an open fashion with the evidence (which is after all raw data) made fully available.
The Open Knowledge Definition makes no comment on timeframes, or timeliness. One of the aims of Open Science is to make data available as soon as possible (or practicable). But the definition I’ve given above doesn’t mention this at all. I think this may be a good thing though. It allows us a route out of the embargo problem because it implies it is possible to be practicing Open Science regardless of when the results are made available. Thus publication in an Open Access journal with accompanying release of all the raw data is Open Science. Making all the data available as it is generated is also Open Science, but what is more it is Open Notebook Science.
By maintaining this distinction we offer a route both for those who are making the effort but can’t go the whole way as well as making it possible for people to release data immediately but under a limited or embargoed license that automatically releases the full rights at a later date. There are completely legitimate reasons for placing data or portions of data under an embargo. Some may be under a very long embargo (e.g. where personal data or medical records are involved) but this need not in and of itself prevent people practicing and promoting Open Science.
I therefore propose the definition I have given above with the following examples of how this definition could be satisfied. The following practices could therefore be labelled as ‘Open Science’. However a specific project could not be said to be open until the complete products of that research are available under an OKD compliant license:
1. Making the full body of all research material available as near to when it is generated as is practicable. Also known as Open Notebook Science.
2. Making a proportion (which may be zero) of material immediately available under an OKD compliant license with the remaining material available under a limited license for a specified period of time, after which it reverts to an OKD compliant license.
3. Making a proportion (which may be zero) of material immediately available under an OKD compliant license with the remaining material unavailable for a specified period, after which it is made available under an OKD compliant license.
4. Any combination of 2 and 3.
5. Publication in a Strong Open Access journal associated with release of all relevant material under an OKD compliant license.
6. Making available a pre-print version of a paper published in a non-Open Access journal associated with release of all relevant material under an OKD compliant license.
The distinction is made here between practicing Open Science, where there is a commitment made to release material at some point in the future, and whether a given project or set of material is open. Protocols 5 and 6 are only Open once the material is made available, whereas Protocols 2, 3, and 4 are Open once the commitment has been made to make the full set of material available. This makes it possible for those people who want to make an effort but for whatever reason are not able to make the material directly available to still consider their contribution to fall under the banner of Open Science. We don’t need to exclude people who are genuinely philosophically with us. After all we need all the help we can get.
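For those who think better in code, here is a rough sketch of one way to read the distinction between practicing Open Science and a project actually being open. It is only an illustration: the names `ResearchItem`, `okd_release`, `is_practising_open_science` and `is_project_open` are my own hypothetical choices, and the rule that a commitment is modelled as a fixed release date is an assumption, not something the Open Knowledge Definition specifies.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Sequence


@dataclass(frozen=True)
class ResearchItem:
    """One product of a research project (hypothetical model for illustration)."""
    name: str
    # Date from which the item is available under an OKD compliant license.
    # None means no commitment to release has been made (an assumption of this sketch).
    okd_release: Optional[date]


def is_practising_open_science(items: Sequence[ResearchItem]) -> bool:
    """True if every item carries a commitment to an OKD compliant release."""
    return len(items) > 0 and all(i.okd_release is not None for i in items)


def is_project_open(items: Sequence[ResearchItem], today: date) -> bool:
    """True only once every item is actually available under an OKD compliant license."""
    return len(items) > 0 and all(
        i.okd_release is not None and i.okd_release <= today for i in items
    )


# Example: raw data released immediately, notebook embargoed for a year.
project = [
    ResearchItem("raw data", date(2009, 1, 1)),
    ResearchItem("notebook", date(2010, 1, 1)),
]
print(is_practising_open_science(project))          # True: everything has a release date
print(is_project_open(project, date(2009, 6, 1)))   # False: notebook still under embargo
print(is_project_open(project, date(2010, 1, 1)))   # True: embargo has expired
```

The point of keeping two separate checks is that an embargo changes when a project becomes open, but not whether the people running it are practicing Open Science.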
Image from Wikipedia. Prize for the first person who figures out the relevance (those who work in the building are excluded)
Edits: Following Peter Suber and Stevan Harnad’s adoption of the terms ‘Strong OA’ and ‘Weak OA’ I have edited this post to use those terms.