Show us the data now, dammit! Excuses are running out.

A very interesting paper from Caroline Savage and Andrew Vickers was published in PLoS ONE last week detailing an empirical study of data sharing by PLoS journal authors. The results themselves, that only one in ten corresponding authors provided data, are not particularly surprising, mirroring as they do previous studies, both formal [pdf] and informal (also from Vickers; I assume this is a different data set), of data sharing.

Nor are the reasons why data was not shared particularly new. Two authors couldn’t be tracked down at all. Several did not reply, and the remainder came up with the usual excuses: “too hard”, “need more information”, “university policy forbids it”. The numbers in the study are small, and it is a shame it wasn’t possible to do a wider study that might have teased out discipline, gender, and age differences in attitude. Such a study really ought to be done, but it isn’t clear to me how to do it effectively, properly, or indeed ethically. Small numbers were chosen both to focus on PLoS authors, who might be expected to have more open attitudes, and to keep the request made to the authors, that the data was to be used in a Master’s educational project, plausible.

So while helpful, the paper itself doesn’t provide much that is new. What will be interesting is how PLoS responds. These authors are clearly violating stated PLoS policy on data sharing (see e.g. the PLoS ONE policy). The papers should arguably be publicly pulled from the journals. Most journals have similar policies on data sharing, and most have no corporate interest in actually enforcing them. I am unaware of any case where a paper has been retracted due to the authors’ unwillingness to share (if there are examples I’d love to know about them! [Ed. Hilary Spencer from NPG pointed us in the direction of some case studies in a presentation from Philip Campbell]).

Is it fair that a small group be used as a scapegoat? Is it really necessary to go for the nuclear option and pull the papers? As was said in a FriendFeed discussion thread on the paper: “IME [In my experience] researchers are reeeeeeeally good at calling bluffs. I think there’s no other way”. I can’t see any other way of raising the profile of this issue. Should PLoS take the risk of being seen as hardline on this, and risk the consequences of people not sending papers there because of the need to reveal data?

The PLoS offering has always been about quality: high-profile journals delivering important papers and, at PLoS ONE, critical analysis of the quality of the methodology. The perceived value of that quality is compromised by authors who do not make data available. My personal view is that PLoS would win by taking a hard line and the moral high ground. Your paper might be important enough to get into Journal X, but is the data of sufficient quality to make it into PLoS ONE? Other journals would be forced to follow, at least those that take quality seriously.

There will always be cases where data cannot or should not be made available. But these should be carefully delineated exceptions, not the rule. If you can’t be bothered putting your data into a shape worthy of publication, then the conclusions you have based on that data are worthless. You should not be allowed to publish. End of. We are running out of excuses. The time to make the data available is now. If a paper isn’t backed by the data, it shouldn’t be published.

Update: It is clear from this editorial blog post from the PLoS Medicine editors that PLoS do not in fact know which papers are involved. As was pointed out by Steve Koch in the FriendFeed discussion, there is an irony that Savage and Vickers have not, in a sense, provided their own raw data, i.e. the emails and names of correspondents. However, I would accept that to do so would be an unethical breach of presumed privacy, as the correspondents might reasonably have expected these were private emails, and to publish names would effectively be entrapment. Life is never straightforward, and this is precisely the kind of grey area we need more explicit guidance on.

Savage CJ, Vickers AJ (2009) Empirical Study of Data Sharing by Authors Publishing in PLoS Journals. PLoS ONE 4(9): e7078. doi:10.1371/journal.pone.0007078

Full disclosure: I am an academic editor for PLoS ONE and have raised the issue of insisting on supporting data for all charts and graphs in PLoS ONE papers in the editors’ forum. There is also a recent paper with my name on it in which the words “data not shown” appear. If anyone wants that data I will make sure they get it, and as soon as Nature enables article commenting we’ll try to get something up there. The usual excuses apply, and don’t really cut the mustard.

The Future of the Paper…does it have one? (and the answer is yes!)

A session entitled “The Future of the Paper” at Science Online London 2009 featured a panel made up of an interesting set of people: Lee-Ann Coleman from the British Library, Katharine Barnes, the editor of Nature Protocols, Theo Bloom from PLoS, and Enrico Balli of SISSA Medialab.

The panelists rehearsed many of the issues and problems that have been discussed before, and I won’t re-hash them here. My feeling was that the panelists didn’t offer a radical enough view of the possibilities, but there was an interesting discussion around what a paper is for and where it is going. My own thinking on this has recently been revolving around the importance of a narrative as a human route into the data. It might be argued that if the whole scientific enterprise could be made machine readable then we wouldn’t need papers. Lee-Ann argued, and I agree, that the paper, as the human-readable version, will retain an important place. Our scientific model building exploits our particular skill as storytellers, something computers remain extremely poor at.

But the paper is becoming an ever smaller part of the overall record itself. For a growing band of scientists the paper is only a means of citing a dataset or an idea. We need to widen the idea of what the literature is and what it is made up of. To do this we need to make all of these objects stable and citeable. As Phil Lord pointed out, this isn’t enough, because you also have to make those objects and their citations “count” for career credit. My personal view is that the market in talent will actually drive the adoption of wider metrics that are essentially variations of PageRank, because other metrics will become increasingly useless, and the market will become increasingly efficient as geographical location becomes gradually less important. But I’m almost certainly over-optimistic about how effective this will be.
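To make the metrics point a little more concrete, here is a rough, purely illustrative sketch of what a PageRank-style metric computed over a citation graph of research objects (papers, datasets, software) might look like. None of this comes from the post or the paper; the graph, the names, and the parameters below are invented for the example.

```python
# Illustrative toy only: a PageRank-style credit metric over a made-up
# citation graph of research objects. Edge A -> B means "A cites B".
citations = {
    "paper1":    ["dataset1", "software1"],
    "paper2":    ["paper1", "dataset1"],
    "dataset1":  [],
    "software1": ["paper1"],
}

def pagerank(graph, damping=0.85, iterations=50):
    """Iteratively redistribute credit from each object to the objects it cites."""
    nodes = list(graph)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for node, cited in graph.items():
            if cited:
                share = damping * rank[node] / len(cited)
                for target in cited:
                    new_rank[target] += share
            else:
                # Objects that cite nothing spread their credit evenly.
                for target in nodes:
                    new_rank[target] += damping * rank[node] / len(nodes)
        rank = new_rank
    return rank

print(pagerank(citations))
```

The only point of the sketch is that credit flows to whatever gets cited, whether that is a traditional paper, a dataset, or a piece of software, which is exactly why such objects need to be stable and citeable in the first place.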

Where I thought the panel didn’t go far enough was in questioning the form of the paper as an object within a journal. Essentially, each presentation became “and because there wasn’t a journal for this kind of thing we created/will create a new one”. To me the problem isn’t the paper. As I said above, the idea of a narrative document is a useful and important one. The problem is that we keep thinking in terms of journals, as though a pair of covers around a set of paper documents has any relevance in the modern world.

The journal used to play an important role in publication. The publisher still has an important role, but we need to step outside the notion of the journal and present different types of content and objects in the way that best suits each set of objects. The journal as brand may still have a role to play, although I think that is increasingly going to matter only at the very top of the market. The idea of the journal is both constraining our thinking about how best to publish different types of research object and distorting the way we do and communicate science. Data publication should be optimized for access to and discoverability of data; software publication should make the software available and usable. Neither is particularly helped by putting “papers” in “journals”. They are helped by creating stable, appropriate publication mechanisms, with appropriate review mechanisms, and by making the objects citeable and valued. When our response to needing to publish something stops being “well, we’d better create a journal for that”, we might just have made it into the 21st century.

But the paper remains the way we tell stories about and around our science. And if we dumb humans are going to keep doing science, then it will continue to be an important part of the way we go about that.