Policy and technology for e-science – A forum on open science policy

I’m in Barcelona at a satellite meeting of the EuroScience Open Forum organised by Science Commons and a number of their partners. Most of the meeting takes place today, with forums on ‘Open Access Today’, ‘Moving OA to the Scientific Enterprise: Data, materials, software’, ‘Open access in the knowledge network’, and ‘Open society, open science: Principle and lessons from OA’. There is also a keynote from Carlos Morais-Pires of the European Commission and the lineup for the panels is very impressive.

Last night was an introduction and social kickoff. James Boyle (Duke Law School, Chair of the board of directors of Creative Commons, Founder of Science Commons) gave a wonderful talk (40 minutes, no slides, barely taking breath) whose central theme was the relationship between where we are today with open science and where international computer networks were in 1992. He likened making the case for open science today to that of people suggesting in 1992 that the networks would benefit from being made freely accessible, freely useable, and based on open standards. The fears that people have today of good information being lost in a deluge of dross, of there being large quantities of nonsense, and of nonsense from people with an agenda, can to a certain extent be balanced against the idea, to put it crudely, that Google works. As James put it (not quite a direct quote): ‘You need to reconcile two statements, both true. 1) 99% of all material on the web is incorrect, badly written, and partial. 2) You probably haven’t opened an encyclopedia as a reference in ten years.’

James gave two further examples, one being the availability of legal data in the US. Despite the fact that none of this is copyrightable in the US there are thriving businesses based on it. The second, which I found compelling for reasons that Peter Murray-Rust has described in some detail, was weather data. Weather data in the US is free. In a recent attempt to get long term weather data, a research effort was charged on the order of $1500, the cost of the DVDs that would be needed to ship the data, for all existing US weather data. By comparison a single German state wanted millions for theirs. The consequence of this was that the European data didn’t go into the modelling. James made the point that while the European return on investment for weather data was a respectable nine-fold, that for the US (where they are giving it away, remember) was 32 times. To me, though, the really compelling part of this argument is that if that data is not made available we run the risk of being underwater in twenty years with nothing to eat. This particular case is not about money; it is potentially about survival.

Finally – and this, you will not be surprised, was the bit I most liked – he went on to issue a call to arms to get on and start building this thing that we might call the data commons. The time has come to actually sit down and start to take these things forward, to start solving the issues of reward structures, of identifying business models, and to build the tools and standards to make this happen. That, he said, was the job for today. I am looking forward to it.

I will attempt to do some updates via Twitter/FriendFeed (cameronneylon on both) but I don’t know how well that will work. I don’t have a roaming data tariff and the charges in Europe are a killer, so it may be a bit sparse.

Open Science and the developing world: Good intentions, bad implementation?

I spent last week in Cuba. I was there on holiday but my wife (who is a chemistry academic) was on a work trip to visit collaborators. This meant I had the opportunity to talk to a range of scientists and to see the conditions they work under. One of the strong arguments for Open Science (literature access, data, methods, notebooks) is that it gives scientists in less privileged countries access both to peer reviewed research and to the details of methodology that can enable them to carry out their science. I was therefore interested to see both what was available to them and whether they viewed our efforts in this area as useful or helpful. I want to emphasise that these people were doing good science in difficult circumstances by playing to their strengths and focussing on achievable goals. This is not second rate science, just science that is limited by access to facilities, reagents, and information.

Access to the literature

There is essentially no access to the subscriber-only literature. Odd copies of journal issues are highly valued and many people get by through holding visiting positions at institutes in the developed world. I talked to a few people about our protein ligation work and they were immensely grateful that this was published in an open access journal. However they were uncertain about publishing in open access journals due to the perceived costs. While it is likely that they could get such costs waived, I believe there is an issue of pride here in not wishing to take ‘charity’. Indeed, in the case of Cuba it may be illegal for US-based open access publishers to provide such assistance. It would be interesting to know whether this is the case.

Overall though, it is clear that access to the peer reviewed literature is a serious problem for these people. Open Access publishing provides a partial solution to this problem. I think that to be effective it is important that this not be limited to self archiving because, for reasons I will come back to, it is difficult for them to find such self archived papers. It is clear that mandating archival on a free access repository can help.

Access to primary data

Of more immediate interest to me was whether people with limited access to the literature saw value in having free access to the primary data in open notebooks. Again, people were grateful for the provision of access to information as this has the potential to make their lives easier. When you have limited resources it is important to make sure that things work and that they produce publishable results. Getting detailed information on methodology of interest is therefore very valuable. Often the data that we take for granted is not available (fluorescence spectra, NMR, mass spectrometry) but details like melting points, colours, and retention times can be very valuable.

There were two major concerns. One is a concern we regularly see, that of information overload. I think this is less of a concern as long as search engines make it possible to find information that is of interest. Work needs to be done on this but I think it is clear that some sort of cross between Google Scholar and Amazon’s recommendation system/Delicious etc. (original concept suggested by Neil Saunders) can deal with this. The other concern, relating to their adopting such approaches, was one that we have seen over and over again, that of ‘getting scooped’. Here though the context is subtly different and there is a measure of first world/developing world politics thrown in. These scientists are, understandably, very reluctant to publicise initial results because the way they work is methodical and slow. Very often the key piece of data required to make up a paper can only be obtained on apparatus that is not available in house or requires lengthy negotiations with potential overseas collaborators. By comparison it would often be trivially easy for a developed world laboratory to take the initial results and turn out the paper.

The usual flip side argument holds here: by placing an initial result in the public domain it may be easier for them to find a collaborator who can finish off the paper. But I can understand their perspective. These are people struggling against enormous odds to stake out a place for themselves in the scientific community. The first world does not exactly have an outstanding record on acknowledging or even valuing work in developing countries, so I can appreciate a degree of scepticism on their part. I hope that this may be overcome eventually, but given that the assumption of most people in my own community is that by being open we are bound to be shafted, I suspect we need to get our own house in order first.

The catch…

All of this is well and good. There are many real and potential benefits for scientists in the developing world if we move to more open styles of science communication. This is great, and I think it is a good argument for more openness. However there is a serious problem with the way we present this information and our reliance on modern web tools to do it. It’s a very simple problem: bandwidth.

All of our blogs, our data, and indeed the open access literature are very graphics heavy. I actually tried to load up the front page of openwetware.org while sitting at the computer of the head of the department my wife was visiting (the department has two networked computers). Fifteen minutes later it was still loading. The PLoS One front page was similarly sluggish. I get irritated if my download speeds drop below 500K/second at home, and I will give up if they go down to 100K. We were seeing download rates of 44 bytes/second at the worst point. In some cases this can even make search engines unusable, making it near impossible to track down the self-archived versions of papers. Cuba is perhaps a special case because the US embargo means they have no access to the main transatlantic and North American cables; in effect the whole country is on a couple of bundles of phone lines. But I suspect that even while access is becoming more pervasive, the penetration of reasonable levels of bandwidth is limited in the developing world.
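To put 44 bytes/second in perspective, here is a rough back-of-the-envelope sketch. The page sizes are illustrative assumptions, not measurements of any of the sites mentioned, and the "500K" and "100K" rates are read here as 500,000 and 100,000 bytes per second, which is also an assumption.

```python
# Back-of-the-envelope page load times at the download rates mentioned above.
# All page sizes are hypothetical; only the 44 bytes/second figure and the
# "500K" / "100K" rates come from the text (read here as bytes per second).


def load_time_seconds(page_bytes: int, bytes_per_second: float) -> float:
    """Time to transfer a page, ignoring latency and protocol overhead."""
    if bytes_per_second <= 0:
        raise ValueError("bytes_per_second must be positive")
    return page_bytes / bytes_per_second


def format_duration(seconds: float) -> str:
    """Render a duration in the most readable unit."""
    if seconds < 60:
        return f"{seconds:.1f} s"
    if seconds < 3600:
        return f"{seconds / 60:.1f} min"
    return f"{seconds / 3600:.1f} h"


# Hypothetical page sizes in bytes (assumptions for illustration only).
PAGES = {
    "graphics-heavy front page": 500_000,
    "text-only version": 20_000,
}

# Download rates in bytes per second.
RATES = {
    "comfortable (500K/s)": 500_000,
    "give-up point (100K/s)": 100_000,
    "worst case seen (44 B/s)": 44,
}

if __name__ == "__main__":
    for page_name, size in PAGES.items():
        for rate_name, rate in RATES.items():
            duration = format_duration(load_time_seconds(size, rate))
            print(f"{page_name} at {rate_name}: {duration}")
```

Under these assumed sizes, the graphics-heavy page would take a little over three hours at 44 bytes/second, while the text-only version would take under eight minutes. The exact numbers matter less than the ratio: cutting the page weight cuts the wait by the same factor.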

The point of this is that access is about more than just putting stuff up; it is also about making it accessible. If we are serious about providing access, and expanding our networks to include scientists who do not have the advantages that we have, then this necessarily includes thinking about low bandwidth versions of the pages that provide information. I looked through PLoS One, openwetware, and BioMedCentral, and couldn’t find a ‘text only version’ button on any of them (to be fair there isn’t one on our lab blog either). I appreciate the need to present things in an appealing and useful format, and indeed the need to place advertising to diversify revenue streams. I guess the main point is not to assume that by making something available you are necessarily making it accessible. If universal accessibility is an important goal then some thought needs to go into alternative presentations.

Overall I think there are real benefits for these scientists when we make things available. The challenges shouldn’t put us off doing it, but perhaps it is advisable to bear in mind the old saw: if you want to help people, make sure you find out what they need first.