Sourceforge for science

I got to meet Jeremiah Faith this morning and we had an excellent, wide-ranging discussion which I will try to capture in more detail later. However, I wanted to get down some thoughts we had at the end of the discussion. We were talking about how to publicise and generate more interest and activity for Open Notebook Science. Jeremiah suggested the idea of a Sourceforge for science: a central clearing house somewhere on the web where projects could be described and people could opt in to contribute. There have been some ideas in this direction, such as Totally retrosynthetic, but I don’t think there has been a lot of uptake there.

This was all tied into the idea of making lab books findable and indexed in places where people might look for them. I have been taken with the way PostGenomic and ChemicalBlogSpace aggregate blogs, particularly blog posts on the peer-reviewed literature, and in the case of ChemicalBlogSpace aggregate comments on molecules, based on trawling for InChI keys (I think). So can we propose that one of (both of?) these sites start aggregating online notebook posts? If we could make these point at peer-reviewed papers online it would also be possible to use a modified version of the Blue Obelisk Greasemonkey script that would pop up whenever you were looking at a paper for which there was raw data online.
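To make the trawling idea concrete, here is a minimal sketch of how an aggregator might pick molecule identifiers out of notebook posts. It assumes the identifiers appear in the standard InChIKey layout (14 letters, 10 letters, 1 letter, separated by hyphens); I don't know how ChemicalBlogSpace actually does it. The function name, the example URLs and the sample key are all just illustrations.

```python
import re
from collections import defaultdict

# Standard InChIKey layout: 14 uppercase letters, a hyphen,
# 10 uppercase letters, a hyphen, and 1 uppercase letter.
INCHIKEY_RE = re.compile(r"\b[A-Z]{14}-[A-Z]{10}-[A-Z]\b")


def index_posts_by_inchikey(posts):
    """Map each key found in the post text to the URLs of posts mentioning it.

    `posts` is a dict of {post_url: post_text}; both are hypothetical inputs
    standing in for whatever the aggregator has already crawled.
    """
    index = defaultdict(list)
    for url, text in posts.items():
        # set() so that a post mentioning the same key twice is listed once
        for key in sorted(set(INCHIKEY_RE.findall(text))):
            index[key].append(url)
    return dict(index)


if __name__ == "__main__":
    sample_posts = {
        "https://example.org/notebook/exp-1": "Made LFQSCWFLJHTTHZ-UHFFFAOYSA-N today.",
        "https://example.org/notebook/exp-2": "No identifiers in this post.",
    }
    for key, urls in index_posts_by_inchikey(sample_posts).items():
        print(key, urls)
```

A pattern match like this only finds posts where the author has bothered to paste the key in, which is part of why agreed tagging conventions would matter.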

It wouldn’t be necessary, or perhaps even advisable, to limit these to people strictly practising Open Notebook Science. People could put up data once a paper was published or after a delay. Perhaps we could not even require that all the raw data be put up. If the barriers are lowered more people may do it. A range of appropriate tags (‘Partial Raw Data is available for this paper’, ‘Full raw data is available for this paper’, ‘Full raw data and associated data is available as an open notebook’) would distinguish between what people are making available. Data could be dropped anywhere online and would gain visibility through aggregation, encouraging people to move from making specific data available towards making all their data available.
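As a rough sketch of how the tags and the 'is there raw data for this paper?' lookup could fit together, something like the following would do. Everything here is an assumption for illustration: the class names, the use of a DOI-like string as the paper identifier, and the idea of ranking records from most to least open are not features of any existing aggregator.

```python
from collections import defaultdict
from dataclasses import dataclass

# The three availability tags proposed above, ordered from least to most open.
TAGS = (
    "Partial Raw Data is available for this paper",
    "Full raw data is available for this paper",
    "Full raw data and associated data is available as an open notebook",
)


@dataclass(frozen=True)
class DataRecord:
    paper_id: str  # hypothetical identifier for the paper, e.g. a DOI
    data_url: str  # wherever the data was dropped online
    tag: str       # must be one of TAGS


class Aggregator:
    """Toy index from papers to the data records that point at them."""

    def __init__(self):
        self._by_paper = defaultdict(list)

    def add(self, record):
        if record.tag not in TAGS:
            raise ValueError(f"unknown tag: {record.tag!r}")
        self._by_paper[record.paper_id].append(record)

    def lookup(self, paper_id):
        """Return the records for a paper, most open first (empty list if none)."""
        records = self._by_paper.get(paper_id, [])
        return sorted(records, key=lambda r: TAGS.index(r.tag), reverse=True)
```

A browser script of the pop-up kind would only need the `lookup` step: given the identifier of the paper on screen, show whatever records come back.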

Any thoughts?

The youth of today…

I like to think of myself as still a young person – in many ways I am stretching the definition but I still just about qualify for various schemes for ‘young scientists’ that cut out at the age of 35 or nine years post PhD. However, I have to admit that I have been feeling a bit old recently. I have been struck by how many of the people driving the open science agenda forward are either current or relatively recent graduate students.

It is conventional for ‘older’ people to complain about falling standards in exams, literacy, presentation, respect, etc., but I think in this field there would be a strong case for the new generation of scientists to point to falling standards amongst older scientists, as people have become gradually more secretive and protective of ‘their’ work. It bodes well for the future that there are so many articulate and energetic champions of open science coming through the ranks. I only hope that it isn’t simply that people lose their drive to be open as the realities of professional research hit post-graduation.

Why and where we search…

This quote is grabbed from a comment by Jean-Claude Bradley at bbgm in reply to my comment on Deepak’s post on my post on… anyway, my original comment was that our Wiki review would not be indexed on Google Scholar, which is where people might go for literature searches.

Jean-Claude:

Getting on Google Scholar is something on my list to look into – if anyone knows how to do it please let us know. But from our Sitemeter tracking on UsefulChem it is clear that scientists are using Google to search for (and find) actionable scientific information.

Now this is an interesting point and it mirrors what I do. Jean-Claude has established that a lot of the ‘new’ traffic coming to UsefulChem comes from Google searches for specific information. Specific molecules in many cases but also spectra and other experimental data. If I’m looking for information, or a resource, scientific or otherwise, I will do a generic Google search for the most part (the most successful recent one was for ‘sticky apple pudding’ – the result was very good indeed – see the Waitrose.com link).

But if I’m looking for scientific literature I will go to PubMed or sometimes to Google Scholar if I’m getting frustrated. I only ever use WOK for citation-based searching (i.e. who cited a paper) or on the rare occasions when I’m looking for material that is not indexed in PubMed. Partly this is because I really like the ‘related items’ tab in PubMed. But what strikes me is that in my mind I have obviously divided these classes of searches up into two different things: ‘information/resources’ and ‘literature’. I bet this correlates quite strongly with both age and with scientific field. Do others out there think of these things as different or as all part of a continuum of information?

I recently saw a talk on a ‘Research Information Centre’ being developed by Microsoft, a sort of portal for handling research projects and all the associated information. This is at an early stage of development but one of the features they were working on was an integrated search where you could add and subtract various items (PubMed, WOK, Google, Google Scholar, and various toll access info sources as well). Google Scholar could do this well. So, as Jean-Claude says above: anybody got any contacts with the developers? We could just talk to them…

The Directed evolution library construction Wiki Review

A few others and I have been working for some time on converting a review I wrote some time ago for publication on OpenWetWare. Jason Kelly put in the initial work of developing an area for reviews and it was from him that I had the idea of taking this forward.

The traditional review is a cornerstone of the research literature. The first port of call for virtually any researcher on moving into an unfamiliar area is to find a good review. These vary vastly in length, specificity, and usefulness. The one thing that all traditional reviews have in common is that they are out of date as soon as they are printed.

On the other hand, collaborative tools on the web make it straightforward to update and maintain, or even correct, the text of a review. This is an area where community support can provide real added value in generating a useful and usable resource that can be maintained. By placing a review within the framework of a Wiki it would be possible for regular updates and links to the current literature to be made available, and for an open discussion of differences of opinion to take place. Unlike e-notebooks, there are fewer technical and user interface issues to be worked out. This really ought to be an easy win.

There are many details that remain to be worked out. Should these be moderated, who can be trusted to maintain them, and, more practically, how can the regular maintenance of such reviews be guaranteed? On OpenWetWare anyone with an account can edit and anyone can view. The best practice for maintaining these reviews remains to be worked out. The first step is to get a community of people involved in doing this.

So the review is now in a reasonable state, and certainly more up to date than it was. Any comments are welcome, either here, or on the review itself. We are also starting to write a paper on the review which we hope to get published. I think this is necessary so as to provide a pointer from the traditional literature to the review. As it happens the review is one of the top hits on Google but this is not where people go to find literature. It doesn’t appear on a Google Scholar search (that top hit is the original review) and is obviously not on PubMed or WOK. Whether we can get the paper published remains to be seen. Time to start emailing journal editors…

Joint NSF-EPSRC programme in Chemistry – an opportunity for ONS?

Looking at the EPSRC website I came across the following call for proposals involving collaboration between a US and UK programme:

http://www.epsrc.ac.uk/CallsForProposals/NSF-EPSRCChemistryProposals07.htm

Now, being an academic I’m up for any method of trying to get money out of the system, especially special programmes. But is there an opportunity here to do something quite exciting in the area of Open Notebooks for chemistry, where we take Jean-Claude’s experience and our Lab Blogbook system and try to build something that combines the best of both or, possibly better, something which is a superset of both? The biggest issue I can see is that it might not be seen as chemistry, but if we focus on the idea of getting the chemical data both in and back out again it might fly.

Deadline for outline applications is November 6th…

UK research council policies on open data

I was looking through the website of the Biotechnology and Biological Sciences Research Council the other day looking for policies on data sharing and open access. You can find the whole policy here but here are the edited highlights:

…BBSRC is committed to getting the best value for the funds we invest and believes that helping to make research data more readily available will reinforce open scientific enquiry and stimulate new investigations and analyses…

…BBSRC expects research data generated as a result of BBSRC support to be made available with as few restrictions as possible in a timely and responsible manner to the scientific community for subsequent research…In line with the BBSRC Statement on Safeguarding Good Scientific Practice, data should also be retained for a period of ten years after completion of a research project…

Ten years? I wonder how many UK scientists really work to that standard in practice (and I don’t include ‘I know it’s around here somewhere…’)?

…BBSRC supports the view that those enabling sharing should receive full and appropriate recognition by funders, their academic institutions and new users for promoting secondary research…

…BBSRC reserves the right to implement a more prescriptive approach to data sharing for research initiatives…

There are also detailed guidance notes and a FAQ for those who want to follow up.

The focus here is really on large coherent data sets rather than the aggregating or indexing of diffuse sets of online notebooks that I am more interested in. However, I am in the process of writing some proposals that I want to embed an Open Notebook Science approach into, so it will be interesting to see what the referees’ comments come back looking like on those. All proposals to BBSRC since April this year have had to have a section on ‘Data Sharing’ that explicitly includes published and unpublished data.

The need for open data is getting more mainstream

Via Jean-Claude Bradley on UsefulChem, an article in Wired on making more of the ‘Dark Data’ out there available. As Jean-Claude notes, this is focussed mainly on the notion of ‘failed experiments’ and ‘positive bias’, but there is much more background data out there: experiments that don’t quite make the grade for inclusion in the paper, or are just one of many that may be useful from a statistical perspective. How many synthetic chemistry papers give the range of yields achieved for a reaction? Or, for that matter, for a PCR reaction?

But it’s good to see more of this happening in the mainstream media and especially that Jean-Claude is getting the kudos for pushing the Open Notebook Science agenda. As this gets more mainstream it will filter through to the funders and other bodies.

Postscript: The article was originally commented on by Attilla at Pimm where there are more thoughtful comments on this.

Limits to openness – where is the boundary?

I’ve been fiddling with this post for a while and I’m not sure where it’s going but I think other people’s views might make the whole thing clearer. This is after all why we believe in being open. So here it is in its unfinished and certainly unclarified form. All comments gratefully received.

One issue that got a lot of people talking at the Scifoo lives on session on Monday (transcript here) was the question of where the boundaries between what should and should not be open lie. At one level it seems obvious: the structure of a molecule can’t really have privacy issues whereas it is clear that a patient’s medical data should remain private. The issue came up a lot at the recent All Hands UK E-science meeting where the issues were often about census data or geographical data that could pinpoint specific people. It seems obvious that people’s personal data should be private but where do we draw the line? I am uncomfortable with a position where it is ‘obvious’ that my data should be open but ‘obvious’ that personal medical or geographical data should not be. Ideally I would like to find a clear logical distinction.

The issues of safety information in open notebook science

Research in most places today is done under more or less rigorous safety regimes. A general approach which I believe is fairly universal is that any action should in principle be ‘Risk Assessed’. For many everyday procedures such an assessment may not need to be written down but it is general practice in the UK that there needs to be a paper trail that demonstrates that such risk assessments are carried out. In practice this means that for any given laboratory procedure there is generally a document of some form in which the risks are assessed. This may in many cases be a tick box list or pro forma document.

In addition, in the UK there is an obligation to consider whether a particular substance requires a specific assessment under the Control of Substances Hazardous to Health (COSHH) regulations. Again these are usually based on a pro forma template. Most researchers will have a folder containing both risk and COSHH assessments, or these may be held in a laboratory-wide folder depending on local practice.

This month we have an extra person in working on our neutral drift project, which is being recorded in this blog. She felt that as the blog is the lab book it must contain these risk assessments and you can see these here. I have no problem with this and indeed it seems like a good idea to have this information available. So from the perspective of the group and electronic notebook practice this is good.

From the open notebook perspective, if we are working towards applying the slogan of ‘No insider information’ this must necessarily include safety information. If we say how to do an experiment this arguably should include not just the procedure but other details: how do you work, what protection might you need, how should waste be disposed of. Many journals now request that any specific safety issues should be flagged in methods sections of papers.

But there is a flip side here. I am happy that our safety documentation is robust and works so I am not worried about ‘the inspectors’ seeing it on the web. Indeed I feel that having your work exposed is a good way of raising standards. It can be a bit bracing but if you’re not prepared to have the details of methodology public then should you be publishing it? Equally if you are worried about a bit of scrutiny of your safety documentation then should your lab really be operating at all?

However, what if someone takes this safety information, uses it, and still manages to injure themselves? What if the regulations in the UK are different, say, from those in the US? There is the potential for legal exposure here and this is the reason why most safety information from a chemical supplier says that anyone handling the compound in question should use ventilators and full body protection (including for table salt and sugar). There is very little useful safety information available because anything that suggests that a particular compound is ‘safe’ creates legal exposure. We could put a disclaimer on our safety information to try and avoid this but that seems a little like cheating. Being ‘open’ means being open about as much as possible. I feel on balance that we should include it but there is a good argument we should leave it out or hide it for our own protection.