Show us the data now damnit! Excuses are running out.

A very interesting paper from Caroline Savage and Andrew Vickers was published in PLoS ONE last week detailing an empirical study of data sharing by PLoS journal authors. The results themselves, that one in ten corresponding authors provided data, are not particularly surprising, mirroring as they do previous studies, both formal [pdf] and informal (also from Vickers, I assume this is a different data set), of data sharing.

Nor are the reasons why data was not shared particularly new. Two authors couldn’t be tracked down at all. Several did not reply and the remainder came up with the usual excuses: “too hard”, “need more information”, “university policy forbids it”. The numbers in the study are small and it is a shame it wasn’t possible to do a wider study that might have teased out discipline, gender, and age differences in attitude. Such a study really ought to be done but it isn’t clear to me how to do it effectively, properly, or indeed ethically. The reason why small numbers were chosen was both to focus on PLoS authors, who might be expected to have more open attitudes, and to make the request to the authors, that the data was to be used in a Master’s educational project, plausible.

So while helpful, the paper itself doesn’t provide much that is new. What will be interesting will be to see how PLoS responds. These authors are clearly violating stated PLoS policy on data sharing (see e.g. PLoS ONE policy). The papers should arguably be publicly pulled from the journals. Most journals have similar policies on data sharing, and most have no corporate interest in actually enforcing them. I am unaware of any cases where a paper has been retracted due to the authors’ unwillingness to share (if there are examples I’d love to know about them!). [Ed. Hilary Spencer from NPG pointed us in the direction of some case studies in a presentation from Philip Campbell.]

Is it fair that a small group be used as a scapegoat? Is it really necessary to go for the nuclear option and pull the papers? As was said in a Friendfeed discussion thread on the paper: “IME [In my experience] researchers are reeeeeeeally good at calling bluffs. I think there’s no other way”. I can’t see any other way of raising the profile of this issue. Should PLoS take the risk of being seen as hardline on this, risking the consequences of people not sending papers there because of the need to reveal data?

The PLoS offering has always been about quality, high profile journals delivering important papers, and at PLoS ONE critical analysis of the quality of the methodology. The perceived value of that quality is compromised by authors who do not make data available. My personal view is that PLoS would win by taking a hard line and the moral high ground. Your paper might be important enough to get into Journal X, but is the data of sufficient quality to make it into PLoS ONE? Other journals would be forced to follow – at least those that take quality seriously.

There will always be cases where data cannot or should not be available. But these should be carefully delineated exceptions and not the rule. If you can’t be bothered putting your data into a shape worthy of publication then the conclusions you have based on that data are worthless. You should not be allowed to publish. End of. We are running out of excuses. The time to make the data available is now. If it isn’t backed by the data then it shouldn’t be published.

Update: It is clear from this editorial blog post from the PLoS Medicine editors that PLoS do not in fact know which papers are involved. As was pointed out by Steve Koch in the Friendfeed discussion, there is an irony that Savage and Vickers have not, in a sense, provided their own raw data, i.e. the emails and names of correspondents. However I would accept that to do so would be an unethical breach of presumed privacy, as the correspondents might reasonably have expected these were private emails, and to publish names would effectively be entrapment. Life is never straightforward and this is precisely the kind of grey area we need more explicit guidance on.

Savage CJ, Vickers AJ (2009) Empirical Study of Data Sharing by Authors Publishing in PLoS Journals. PLoS ONE 4(9): e7078. doi:10.1371/journal.pone.0007078

Full disclosure: I am an academic editor for PLoS ONE and have raised the issue of insisting on supporting data for all charts and graphs in PLoS ONE papers in the editors’ forum. There is also a recent paper with my name on it in which the words “data not shown” appear. If anyone wants that data I will make sure they get it, and as soon as Nature enable article commenting we’ll try to get something up there. The usual excuses apply, and don’t really cut the mustard.

3 Replies to “Show us the data now damnit! Excuses are running out.”

Anna says:

September 22, 2009 at 7:45 am

On a good day: The email went into my spam folder. Therefore I didn’t respond.

On a slightly less good day: I haven’t got time to reply to your request because of the backlog of other things I need to be doing, the pressure to publish more, and who the hell are you anyway and what do you want with my data?

I imagine these could be possible scenarios. I also imagine that data sharing (on the last point) is more common between people who know each other (and have a positive or neutral relationship) – is there data here?

But finally, why not get journals to keep repositories of all relevant data? – then you don’t have the problem with author request and the journals are now perceived as doing something for their crust rather than ‘just’ delegating peer review to people to do for free (oh, and typesetting and … :).
