A personal view of Open Science – Part IV – Policies and standards
A question that needs to be asked when contemplating any major change in practice is the balance and timing of ‘bottom up’ versus ‘top-down’ approaches for achieving that change. Scientists are notoriously un-responsive to decrees and policy initiatives but as has been discussed they are also inherently conservative and generally resistant to change led from within the community as well. For those advocating the widespread, and ideally rapid, adoption of more open practice in science it will be important to strike the right balance between calling for mandates and conditions for funding or journal submission and of simply adopting these practices in their own work. While the motivation behind the adoption of data sharing policies by funders such as the UK research councils is to be applauded it is possible for such intiatives to be counterproductive if the policies are not supported by infrastructure development, appropriate funding, and appropriate enforcement. Equally, standards and policy statements can send a powerful message on the aspirations of funders to make the research they fund more widely available and, for the most part, when funders speak, scientists listen.
One Approach for Mainstream Adoption – The fully supported paper
There are two broad approaches to standards that are currently being discussed. The first of these is aimed at mainstream acceptance and uptake and can be described as ‘The fully supported paper’. This is a concept that is simple on the surface but very complex to implement in practice. In essence it is the idea that the claims made in a peer reviewed paper in the conventional literature should be fully supported by a publically accessible record of all the background data, methodology, and data analysis procedures that contribute to those claims. On one level this is only a slightly increased in requirements from the Brussels Declaration made by the Internaional Association of Scientific, Technical, and Medical Publishers in 2007 which states;
Raw research data should be made freely available to all researchers. Publishers encourage the public posting of the raw data outputs of research. Sets or sub-sets of data that are submitted with a paper to a journal should wherever possible be made freely accessible to other scholars
The degree to which this declaration is supported by publishers and the level to which different journals require their authors to adhere to it is a matter for debate but the principle of availability of background data has been accepted by a broad range of publishers. It is therefore reasonable to consider the possibility of making the public posting of data as a requirement for submission. At a simple level this is already possible. For specific types of data repositories already exist and in many cases most journals require submission of these data types to recognised respositories. More generally it is possible to host data sets in some institutional repositories and with the expected announcement of a large scale data hosting service from Google the argument that this is not practicable is becoming unsustainable. While such datasets may have limited discoverability and limited metadata, they will at least be discoverable from the papers that reference them. It is reasonable to expect sufficent context to be provided in the published paper to make the data useable.
However the data itself, except in specific cases, is not enough to be useful to other researchers. The detail of how that data was collected and how it was processed are critical for making a proper analysis of whether the claims made in a paper to be properly judged. Once again we come to the problem of recording the process of research and then presenting that in a form which is both detailed enough to be widely useful but not so dense as to be impenetrable. The technical challenges of delivering a fully supported paper are substantial. However it is difficult to argue that this shouldn’t be available. If claims made in the scientific literature cannot be fully verified can they be regarded as scientific? Once again – while the target is challenging – it is simply a proposal to do good science, properly communicated.
Aspirational Standards – celebrating best practice in open science
While the fully supported paper would be a massive social and technical step forward it in many ways is no more open than the current system. It does not deal with the problem of unpublished or unsuccessful studies that may never find a home in a traditional peer reviewed paper. As discussed above the ‘fully supported paper’ is not really ‘open science'; it is just good science. What then are the requirements, or standards for ‘open science’. Does there need to be a certificate or a set of requirements that need to be met before a project, individual, or institution can claim they are doing Open Science. Or is Open Science simply too generic and prone to misinterpretation?
I would argue that while ‘Open Science’ is a very generic term it has real value as a rallying point or banner. It is a term which generates significant positive reaction amongst the general public, the mainstream media, and large sections of the research community. Its very vagueness also allows some flexibility making it possible to welcome contributions from publishers, scientists, and funders which while not 100% open are nonetheless positive and helpful. Within this broad umbrella it is then possible to look at defining or recomending practices and standards and giving these specific labels for identification.
The main work in the area of defining relevant practices and standards has been carried out by Science Commons and the Open Knowledge Foundation. Science Commons have published four ‘Principles for Open Science‘ which focus on the availability and accessiblity of published literature, research tools, and data, and the development of cyberinfrastructure to make this possible. These four principles currently do no explicitly include the availability of process, which has been covered in detail above, but provide a clear set of criteria which could form the basis of standards. Broadly speaking research projects, individuals, or institutions that deliver on these principles could be said to be doing Open Science. The Open Knowledge Definiton, developed by the Open Knowledge Foundation, is another useful touchstone here. Another possible defining criterion for Open Science is that all the relevant material is made available under licenses that adhere to the definition.
The devil, naturally, lies in the details. Are embargoes on data and methodology appropriate, and if so, in what fields and how should they be constructed? For data that cannot be released should specific exceptions be made, or special arrangments made to hold data in secure repositories? Where the same group is doing open and commercial research how should the divisions between these projects be defined and declared? These details are important, and will take time to work out. In the short term it is therefore probably more effective to identify and celebrate examples of open science, define best practice and observe how it works (and does not work) in the real world. This will raise the profile of Open Science without making it immediately an exclusive preserve of those with the luxury of radically changing practice. It enables examples of best practice to be held up as aspirational standards, providing the goals for others to work towards, and the impetus for the tool and infrastructure development that will support them. Many government funders are starting to introduce data sharing mandates, generally with very weak wording, but in most cases these refer to the expectation that funded research will adhere to the standard of ‘best practice’ in the relevant field. At this stage of development it may be more productive to drive adoption throgh the strategic support of improving best practice in a wide range of fields than to attempt to define strict standards.
The community advocating more open practice in scientific research is growing in size and influence. The major progress made in the past 12-18 months by the Open Access movement and the development of deposition and data sharing mandates by a range of research funders show that real progress is being made in increasing access to both the finished products of research and the materials that support them. While there have been significant successes this remains a delicate moment. There is a risk of over enthusiasm driving expectations which cannot be delivered and of alienating the mainstream community that we wish to draw in. The fears and concerns of researchers in widening access to their work need to be addressed sensitively and seriously, pointing out the benefits but also acknowledging the risks involved in adopting these practices.
It will not be enough to develop tools and infrastructure that, if adopted, would revolutionize science communication. Those tools must be built with an understanding of how scientists work today, and with the explicit aim of embedding these tools in existing workflows. The need for, and the benefits of, adopting controlled vocabularies needs to be sold much more effectively to the mainstream scientific community. The ontologies community also needs to recognise that there are cases and areas where the use of strict controlled vocabularies is not appropriate. Web 2.0 and Semantic web technologies are not competitors but are complementary approaches that are appropriate in different contexts. Again, the right question to ask is ‘what do scientists do? And what can we do to make that work better?'; not how can we make scientists see they need to do things the ‘right’ way.
Finally, it is my belief that now is not the time to set out specific and strict standards of what qualifies as Open Science. It is the right time to discuss the details of what these standards might look like. It is the right time to look at examples of best practice; to celebrate these and to see what can be learnt from them, but with our current lack of experience, and lack of knowledge of what the unintended consequences of specific standards might be, it is too early to pin down the details of those standards. It is a good time to be clearly articulating the specific aspirations of the movement, and to provide goals that communities can aggregate around; the fully supported paper, the Science Commons principles, and the Open Knowledge Definition are all useful starting points. Open Science is gathering momentum, and that is a good thing. But equally it is a good time to take stock, identify the best course forward, and make sure that we ar carrying as many people forward with use as we can.