I am thinking about how to present the case for Open Science, Open Notebook Science, and Open Data at Science in the 21st Century, the meeting being organised by Sabine Hossenfelder and Michael Nielsen at the Perimeter Institute for Theoretical Physics. I’ve put up a draft abstract and, as you might guess from this, I wanted to make an economic case that the waste of resources, both human and monetary, is not sustainable for the future. Here I want to rehearse that argument a bit further, as well as explore the business case that could be presented to Google/Gates Foundation as a package that would include the development of the Science Exchange ideas that I blogged about last week.
This line of thinking arose from Jenny Rohn’s comment on her own blog, where we were discussing open notebooks (my emphasis):
In my example above, the Evil Lab who stole my idea that Gene X is involved in Pathway Y publishes their paper to great fanfare in Nature in 2008, and my open notebook shows I made the same connection in 2007. Big deal. There are probably 200 groups working on Pathway Y, and several of them were bound to stumble across the same thing, including Evil Lab.
Now this would be a difficult thing to pin down in terms of real numbers. How much of funded research internationally is simply replication of other work? How much of the work done by people who don’t win the race actually gets published? And how many people with potentially valuable contributions to make have their careers blighted by this sort of competitive replication? None of this would be easy to quantify accurately. However, this effect is going to be strongest in the most competitive fields; i.e. precisely those fields where an open approach is hardest to sell. Thus if we accept that there is wasteful replication, then removing that waste via open practice would disproportionately benefit the fields where people are most resistant. If you accept the further premise that these fields are competitive because they have the most direct potential to improve people’s lives, then the argument for increased efficiency is even stronger.
We shouldn’t regard competition as a bad thing. It is good for maintaining standards and indeed for increasing efficiency. Nor is replication of results a bad thing. But simple replication through a lack of communication, followed by abandonment of the results because they are the same as what someone else has already published, is inefficient. An interesting question is whether it is more effective for constructive replication to be done with knowledge of the other work, thus making sure it is complementary, or in complete ignorance, ensuring that no unconscious assumptions are built in through awareness of the other work. But this is something that can be worked out. People manage double blind trials, so I see no reason why collaborating groups could not choose not to confer if that were appropriate. Overall there are gains to be made both in preventing replication and in encouraging the right type of replication; in creating a positive encouragement to shed new light on a system by replicating results in a slightly different context or from a different perspective.
There are further efficiency gains to be made through rapid access to new information. This must be tempered by the risk that this information may be of lower quality than the traditionally peer reviewed literature, and without real data it is difficult to know where the balance lies. In the business world, timeliness of information is critical and therefore high quality real time data is very valuable. We can wave our hands around and come up with some figures for this, say saving a month of time on 1% of postdoc projects (is that a conservative estimate? Who knows?). This would add up to a significant quantity of money over time, certainly tens, possibly hundreds, of millions of dollars per year.
But possibly the biggest gains are the hardest to estimate. What is the value to a research funder of ensuring the research they support is available for re-use in new and unexpected ways? Enough that the UK government has spent hundreds of millions on e-science over the last several years to try to develop systems for the storage and handling of large data sets. But putting a figure on the actual value is tough, particularly when the technical problems are still to be solved and many of the benefits are still to materialise. To ask the question a different way, what is the economic value of GenBank? Of the Protein Data Bank? Huge, I would have thought, but I wouldn’t know how to put a number on it. Economists, however, do develop numbers for this kind of thing, so finding some numbers ought to be do-able.
To me the biggest efficiency gains may come from the ability to distribute work and have problems solved by those best able to handle them. The opportunity to get the right information to the right people at the right time could, in my view, make possible a real step change in the rate at which we can work. Again, this is very difficult to quantify, and there may be losses as well. Potentially poorer communication in a distributed network, where the partners may not even know each other, could lead to misunderstandings and wasted effort. I believe Jean-Claude’s experience is that the Open Notebook approach helps to mitigate this and improves communication over a traditional approach, but it remains a potential confounding factor.
So to wrap up. There are potentially large efficiency gains to be made by carrying out science in the open. Funders will both have the data at their fingertips and know that they are generating that data more effectively. All of this relies absolutely on a technical implementation that captures the data and makes it available effectively, but those are issues that we know about and are working on. And it makes the case for working with funders who know and understand the issues of handling and processing data. But the underlying point is that with the right framework the results per currency unit will rise. For a relatively small investment in infrastructure development, the returns on the much bigger investment in general research will rise significantly.
I have avoided trying to come up with any specific numbers here because I am sure my calculations would just show my ignorance of economics. We should be talking to economists to try and quantify the range of values these efficiency gains could lead to and talking to research funders about getting the background information that would enable these values to be pinned down. If we can achieve qualitatively more science for the same money then working with business minded funders to help them increase their impact may well be the way forward.
Sorry to sound like a broken record but raw data changes the dynamic of the discussion. If you provide the raw data that supports your claim, people are forced to address the issues at hand squarely. If you don’t provide the raw data, nothing of substance can really be proved.
It is hard to quantify something that we cannot measure. There are many stories of people having to change projects because some other group managed to publish the work first but there are no numbers of how often this happens. Maybe we could try to organize a survey to measure this. We could try to organize this in collaboration with some magazine/journal or just try to send it to as many groups as we know.
I am not sure economists are your best bet (or perhaps I am confusing them with bean-counters). In my experience, the advances that the open approach is likely to bring are exactly the things they are unable/unwilling to measure, which is part of the reason we are in the mess we are in now. This is because these advances are not things that are usually measured directly (although you make a strong case for those things that can be seen as direct output) but are synergistic outcomes from random events. We’re having enough trouble understanding these things in molecular systems, much less with people!
And in my Monday morning disaffectedness:
If we can achieve qualitatively more science for the same money then working with business minded funders to help them increase their impact may well be the way forward.
Or they will just give us less money.
I think the principles are right, but there are still some holes that need to be filled. Unfortunately if I knew how to do it, I already would have. :)
Are there some values/economics that have been done for the Opensource community? This might be a good starting point.