The economic case for Open Science
I am thinking about how to present the case for Open Science, Open Notebook Science, and Open Data at Science in the 21st Century, the meeting being organised by Sabine Hossenfelder and Michael Nielsen at the Perimeter Institute for Theoretical Physics. I’ve put up a draft abstract, and as you might guess from it I wanted to make an economic case: the waste of resources, both human and monetary, is not sustainable for the future. Here I want to rehearse that argument a bit further, and to explore the business case that could be presented to Google or the Gates Foundation as a package that would include the development of the Science Exchange ideas I blogged about last week.
This line of thinking arose from Jenny Rohn’s comment on her own blog, where we were discussing open notebooks (my emphasis):
In my example above, the Evil Lab who stole my idea that Gene X is involved in Pathway Y publishes their paper to great fanfare in Nature in 2008, and my open notebook shows I made the same connection in 2007. Big deal. There are probably 200 groups working on Pathway Y, and several of them were bound to stumble across the same thing, including Evil Lab.
Now this would be a difficult thing to pin down in terms of real numbers. How much funded research internationally is simply replication of other work? How much of the work done by people who don’t win the race ever gets published? And how many people with potentially valuable contributions to make have their careers blighted by this sort of competitive replication? None of this is easy to quantify accurately. It is clear, though, that the effect will be strongest in the most competitive fields: precisely those fields where an open approach is hardest to sell. So if we accept that there is wasteful replication, then removing that waste, via open practice, would disproportionately benefit the fields where people are most resistant. If you accept the further premise that these fields are competitive because they have the most direct potential to improve people’s lives, the argument for increased efficiency is stronger still.
We shouldn’t regard competition as a bad thing. It is good for maintaining standards and indeed for increasing efficiency. Nor is replication of results a bad thing. But simple replication through a lack of communication, followed by abandonment of the results because someone else has already published the same thing, is inefficient. An interesting question is whether constructive replication is more effective when done with knowledge of the other work, making sure it is complementary, or in complete ignorance, ensuring that no unconscious assumptions creep in through awareness of the other work. But this is something that can be worked out. People manage double-blind trials, so I see no reason why collaborating groups could not simply choose not to confer where that was appropriate. Overall there are gains to be made both in preventing wasteful replication and in encouraging the right kind: a positive incentive to shed new light on a system by replicating results in a slightly different context or from a different perspective.
There are further efficiency gains to be made through rapid access to new information. This must be tempered by the risk that such information is of lower quality than the traditionally peer-reviewed literature, and without real data it is difficult to know where the balance lies. In the business world, timeliness of information is critical, which is why high-quality real-time data is so valuable. We can wave our hands around and come up with some figures for this: say, saving a month of time on 1% of postdoc projects (is that a conservative estimate? Who knows?). This would add up to a significant quantity of money over time, certainly tens, possibly hundreds, of millions of dollars per year.
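The hand-waving above can be made slightly more concrete with a back-of-envelope sketch. Every figure here is an illustrative assumption, not data from any source: the global postdoc headcount, the fully loaded monthly cost of a postdoc, and the fraction of projects that save a month are all placeholders to show how the estimate scales.

```python
# Back-of-envelope sketch of the savings argued for above.
# ALL default figures are illustrative assumptions, not real data:
#   researchers    - rough global postdoc headcount (assumed)
#   monthly_cost   - fully loaded cost (salary + overheads) per postdoc-month, in dollars (assumed)
#   fraction_saved - share of projects that save one month through open access to results (assumed)

def annual_saving(researchers: int = 500_000,
                  monthly_cost: float = 10_000,
                  fraction_saved: float = 0.01) -> float:
    """Dollars saved per year if `fraction_saved` of projects each save one postdoc-month."""
    return researchers * fraction_saved * monthly_cost

print(f"${annual_saving():,.0f} per year")  # $50,000,000 per year under these assumptions
```

With these (debatable) inputs the saving lands in the tens of millions of dollars per year; nudge the headcount or the fraction upward and it reaches hundreds of millions, which is the range the argument relies on.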
But possibly the biggest gains are the hardest to estimate. What is the value to a research funder of ensuring that the research they support is available for re-use in new and unexpected ways? Enough that the UK government has spent hundreds of millions on e-science over the last several years, trying to develop systems for storing and handling large data sets. But putting a figure on the actual value is tough, particularly when the technical problems are still being solved and many of the benefits are yet to materialise. To ask the question a different way: what is the economic value of GenBank? Of the Protein Data Bank? Huge, I would have thought, but I wouldn’t know how to put a number on it. Economists, however, do develop numbers for this kind of thing, so finding some ought to be do-able.
To me, the biggest efficiency gains may come from the ability to distribute work and have problems solved by those best able to handle them. The opportunity to get the right information to the right people at the right time could, in my view, make possible a real step change in the rate at which we work. Again, this is very difficult to quantify, and there may be losses as well: poorer communication in a distributed network, where the partners may not even know each other, could lead to misunderstandings and wasted effort. I believe Jean-Claude’s experience is that the Open Notebook approach mitigates this and improves communication over a traditional approach, but it remains a potential confounding factor.
So, to wrap up. There are potentially large efficiency gains to be made by carrying out science in the open. Funders will both have the data at their fingertips and know that they are generating that data more effectively. All of this relies absolutely on a technical implementation that captures the data and makes it available effectively, but those are issues that we know about and are working on. It also makes the case for working with funders who know and understand the issues of handling and processing data. The underlying point is that with the right framework the gains in results per currency unit will rise: for a relatively small investment in infrastructure development, the returns on the much bigger investment in general research will rise significantly.
I have avoided trying to come up with any specific numbers here because I am sure my calculations would just show my ignorance of economics. We should be talking to economists to try to quantify the range of values these efficiency gains could take, and talking to research funders about the background information that would let those values be pinned down. If we can achieve qualitatively more science for the same money, then working with business-minded funders to help them increase their impact may well be the way forward.