Why and where we search…

This quote is grabbed from a comment by Jean-Claude Bradley at bbgm, in reply to my comment on Deepak’s post on my post on… Anyway, my original comment was that our Wiki review would not be indexed on Google Scholar, which is where people might go for literature searches.

Jean-Claude:

Getting on Google Scholar is something on my list to look into – if anyone knows how to do it please let us know. But from our Sitemeter tracking on UsefulChem it is clear that scientists are using Google to search for (and find) actionable scientific information.

Now this is an interesting point and it mirrors what I do. Jean-Claude has established that a lot of the ‘new’ traffic coming to UsefulChem comes from Google searches for specific information: specific molecules in many cases, but also spectra and other experimental data. If I’m looking for information, or a resource, scientific or otherwise, I will do a generic Google search for the most part (the most successful recent one was for ‘sticky apple pudding’ – the result was very good indeed – see the Waitrose.com link).

But if I’m looking for scientific literature I will go to PubMed or sometimes to Google Scholar if I’m getting frustrated. I only ever use WOK for citation based searching (i.e. who cited a paper) or on the rare occasions when I’m looking for material that is not indexed in PubMed. Partly this is because I really like the ‘related items’ tab in PubMed. But what strikes me is that in my mind I have obviously divided these classes of searches up into two different things: ‘information/resources’ and ‘literature’. I bet this correlates quite strongly with both age and with scientific field. Do others out there think of these things as different or as all part of a continuum of information?

I recently saw a talk on a ‘Research Information Centre’ being developed by Microsoft, a sort of portal for handling research projects and all the associated information. This is at an early stage of development but one of the features they were working on was an integrated search where you could add and subtract various items (PubMed, WOK, Google, Google Scholar, and various toll access info sources as well). Google Scholar could do this well. So, as Jean-Claude says above: anybody got any contacts with the developers? We could just talk to them…

UK research council policies on open data

I was looking through the website of the Biotechnology and Biological Sciences Research Council the other day, looking for policies on data sharing and open access. You can find the whole policy here, but here are the edited highlights:

…BBSRC is committed to getting the best value for the funds we invest and believes that helping to make research data more readily available will reinforce open scientific enquiry and stimulate new investigations and analyses…

…BBSRC expects research data generated as a result of BBSRC support to be made available with as few restrictions as possible in a timely and responsible manner to the scientific community for subsequent research…In line with the BBSRC Statement on Safeguarding Good Scientific Practice, data should also be retained for a period of ten years after completion of a research project…

Ten years? I wonder how many UK scientists really work to that standard in practice (and I don’t include ‘I know it’s around here somewhere…’)?

…BBSRC supports the view that those enabling sharing should receive full and appropriate recognition by funders, their academic institutions and new users for promoting secondary research…

…BBSRC reserves the right to implement a more prescriptive approach to data sharing for research initiatives…

There are also detailed guidance notes and a FAQ for those who want to follow up.

The focus here is really on large coherent data sets rather than on aggregating or indexing the diffuse sets of online notebooks that I am more interested in. However, I am in the process of writing some proposals that I want to embed an Open Notebook Science approach into, so it will be interesting to see what the referees’ comments on those look like when they come back. All proposals to BBSRC since April this year have had to have a section on ‘Data Sharing’ that explicitly includes published and unpublished data.

The need for open data is getting more mainstream

Via Jean-Claude Bradley on UsefulChem, an article in Wired on making more of the ‘Dark Data’ out there available. As Jean-Claude notes, this is focussed mainly on the notion of ‘failed experiments’ and ‘positive bias’, but there is much more background data out there: experiments that don’t quite make the grade for inclusion in the paper, or that are just one of many that may be useful from a statistical perspective. How many synthetic chemistry papers give the range of yields achieved for a reaction? Or for a PCR reaction?

But it’s good to see more of this happening in the mainstream media, and especially that Jean-Claude is getting the kudos for pushing the Open Notebook Science agenda. As this gets more mainstream it will filter through to the funders and other bodies.

Postscript: The article was originally commented on by Attila at Pimm, where there are more thoughtful comments on this.

Limits to openness – where is the boundary?

I’ve been fiddling with this post for a while and I’m not sure where it’s going, but I think other people’s views might make the whole thing clearer. This is, after all, why we believe in being open. So here it is in its unfinished and certainly unclarified form. All comments gratefully received.

One issue that got a lot of people talking at the Scifoo Lives On session on Monday (transcript here) was the question of where the boundaries between what should and should not be open lie. At one level it seems obvious: the structure of a molecule can’t really have privacy issues, whereas it is clear that a patient’s medical data should remain private. The issue came up a lot at the recent All Hands UK E-science meeting, where the concerns were often about census data or geographical data that could pinpoint specific people. It seems obvious that people’s personal data should be private, but where do we draw the line? I am uncomfortable with a position where it is ‘obvious’ that my data should be open but ‘obvious’ that personal medical or geographical data should not be. Ideally I would like to find a clear logical distinction.

Scifoo Lives On session: Open notebook science case studies

Yesterday afternoon the Open Notebook Science case studies session was held as part of the Scifoo Lives On sessions at Nature Island, Second Life. Jean-Claude Bradley organised, moderated and spoke first, followed by me and Jeremiah Faith. We all spoke about experiences and implementation of different approaches to open notebook science.

Jean-Claude has put the transcript up here.

There was an active discussion about the need for more fun in science, and about how the way in which science has become secretive has taken a lot of the fun out of it. CW Underwood talked about being sick of the ‘Secret Squirrel’ nature of science. One thing that was very encouraging was that Jeremiah said that, in his search for his next post, he had raised with each possible PI the issue of whether they would object to him keeping his notebook open. So far he had had no objections.

Other issues that came up:

Open notebook science takes work and discipline.

It does involve some effort to get set up and to keep systems running, as well as to keep watch to make sure that things are running properly. CW pointed out that his PI would regard this as a waste of time. I can see this perspective being quite a strong one and a slow one to counter. Arguably the benefits of putting the effort in are not yet tangible enough to be convincing to people who are happy with the way their science works as it is.

What is the best system for holding the notebook?

Three different systems were presented: the UsefulChem Wiki by Jean-Claude, using publicly available hosted services; our blog-based notebook, which is a custom-built and somewhat closed system; and Jeremiah’s, which uses TeX to generate a PDF of the whole thing. Jeremiah’s presentation included the comment that he had shown several people both a Wiki and the TeX-based system and they had all preferred the TeX-based one. This is the opposite of my experience, where people seem to prefer whatever is closest to Word. This may reflect different communities, or maybe just the different people we have had contact with.

My feeling is that the three groups have evolved different systems because of three main things. Firstly, a different initial aim: my initial aim, for instance, was not really an open notebook but an effective means of capturing data and procedures. Secondly, differences in the procedures being carried out and the culture we work within. I am still slowly working my way through putting up Exp098 from UsefulChem in our blog system and this is certainly showing up some differences, but I’m not sure I know what they mean yet.

Finally, I think we have a different view of what the lab book is and what the ‘ideal’ lab book would look like. Jeremiah’s point was that he, and others, wanted it to ‘look like a lab book’, which is fair enough. I think my group is somewhere in the middle, and Jean-Claude has pushed the idea of what a lab book is one step further. In his notebook the finished product is a summary of the experiment, not precisely a point-by-point record of everything that happened along the way (that is all in the history tab): the visible product is a clear description of the experiment that makes it immediately accessible to an outsider how to repeat the experiment and what the data and conclusions were. The point is that, within the wiki framework, there is no need to worry about editing the page because the history is all still there. This means that the page can be taken from the record as it goes – which is still kept – right through to ‘finished’ in a form that can go straight into a thesis or paper. I’m still not entirely comfortable with this, but I’m not entirely sure that my discomfort is particularly logical.
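To make the point about the history tab concrete, here is a very rough sketch in Python of a page that keeps every saved version. It is only an illustration of the idea, not how the UsefulChem Wiki or any real wiki engine is implemented; the class name `WikiPage` and the example text are my own inventions.

```python
class WikiPage:
    """Hypothetical page that never throws away an earlier version."""

    def __init__(self, text):
        # Every saved version, oldest first (the 'history tab').
        self.history = [text]

    def edit(self, new_text):
        # An edit appends a new version rather than overwriting the old one.
        self.history.append(new_text)

    @property
    def current(self):
        # What a visitor sees: only the latest version.
        return self.history[-1]


page = WikiPage("Plan: mix reagents, monitor by NMR")
page.edit("Plan plus running observations")
page.edit("Tidy summary ready for a thesis")

print(page.current)     # the 'finished' view
print(page.history[0])  # the original plan is still recoverable
```

The visible page can be polished as much as you like because the raw record is still sitting underneath it.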

In any case in the end I think it will depend on what you want and why you want to go down the ONS route. There’s still a lot of work to be done.

Limits to openness

There was a discussion on where the boundaries should lie as to what can be open or not. I will handle this in another post because I think this is something I want to think about quite hard.

Postscript

On my way into work on Thursday last week I bumped into one of my RAL colleagues who, among other things, works on our communications and public relations. He thought that the ‘talk’ in SL made a nice little story and went to our central STFC comms people to see whether he might place it somewhere (website, newsletter, etc). Apparently the answer came back that they wouldn’t issue a press release (which would have been rather over the top in any case) because we didn’t have an institutional policy on Open Access.

Postscript 2

Jean-Claude has also reviewed the session at the UsefulChem Blog.

Replication of Usefulchem Exp098 continued

I am continuing this in a new post rather than keep mucking with the old one.

Currently I am working on reproducing the description of Exp098 from Jean-Claude Bradley’s UsefulChem Wiki within our blog-based notebook to identify differences in practice. The reproduction can be found at:

http://chemtools.chem.soton.ac.uk/projects/blog/blogs.php/blog_id/15/

then click on ‘Usefulchem_exp098’ under the ‘Sandpit Group’ heading on the right hand side and explore from there

18/09/07 15:00 UTC I have added the next two steps of the experiment, the addition of methanol followed by the addition of NaOH to neutralise the solution. In doing this I have possibly done something slightly the wrong way around, as a reading of the original Exp098 suggests that this step was not originally planned but added later. I still need to add all the NMRs in a sensible fashion. This is, it has to be said, somewhat tedious, but it does keep the relationships between things clear.

In retrospect I have probably divided this up far more than is necessary, and the division makes it difficult to read. This probably reflects a distinction between the way we think about molecular biology and synthetic chemistry. Chemistry does often involve small steps, with characterisation along the way of the various materials generated. Each of these steps only requires a sentence or so to describe. Most of our molecular biology requires some form of table to detail the inputs and there is usually very little analysis along the way. The differences therefore seem to be: more things in, less in situ analysis.

It might well be more informative to actually do a synthetic chemistry experiment and record that in our system. Because I am doing this replication in bits of spare time I have a bit too much time to think about it as I go along.

Replication of UsefulChem Exp098 in the Southampton blog notebook

In a previous post I said I would try to replicate an experiment from the UsefulChem open Wiki notebook within our blog system to see how it might look. This post is to record what I am doing as I do it. Thus this is the lab book I am using to record the process and decisions I have taken in using a lab book. The pages in the notebook can be found at:

http://chemtools.chem.soton.ac.uk/projects/blog/blogs.php/blog_id/15/

I have chosen to use Exp098 as this involved several different stages and modifications to the wiki page over many weeks. My aim is to try and record this in the way we would. This will involve some changes to the text and the process, but I will try to re-use the original text as far as is possible. In the spirit of open notebook science I will make this visible as I go, but as this may take me a while (days to a week or two) to finish, this page is likely to be unstable and you may wish to come back to it. Ironically, I am therefore using this notebook more in the way that Jean-Claude’s group use the Wiki than the way we use the blog.

  1. 13/09 16:58 UTC I have added two initial posts. In the first I have created a product from the previous notional reaction. Exp098 is a study on the stability of a previously generated compound, utilising a specific instance of that compound described in a previous experiment (Exp064). The second post is the initial description of the protocol. This is cut and pasted from the very first version of Exp098 and represents the experimental plan that I would put into the blog before going into the lab.
  2. In Exp098 the process is split into two phases. First, a series of reagents is mixed and the reaction is monitored by NMR. After completion, the reaction is neutralised, then extracted into organic solvent and dried before further analysis. Because analysis has been carried out on two different ‘samples’, I am going to split the post up in what might appear, to an organic chemist, to be a slightly odd way. The first stage will generate one product, which will then be subject to analyses (further procedures). The first product will then be subjected to a second procedure (neutralisation and drying) to generate a further product (the ‘real’ product) which will be subjected to further analysis.
  3. 13/09 17:20 UTC I have added the first procedure and product post. I now need to make a decision about whether I set up one post with all the NMR analysis from the time course in it or multiple analyses, one for each time point (I can do this with a template) with the data for each in that. Or I could set it up with one procedure post containing all the NMR descriptors but a separate product for each that contains the actual spectrum. I think the latter may be the best. This provides a way of scraping metadata for each of the spectra. It also means that the data can be added slightly later without directly editing the procedure post. I will create a new section for ‘Analysis’ to distinguish it from procedure.
  4. 17/09 13:06 UTC: I have now added all the NMR data for the time course. This was obviously a laborious process, but it does mean that it is reasonably clear what goes where. For the separate HOMODEC experiment I didn’t bother putting the data in separately, as it is obvious which goes with which. I haven’t as yet put in the NMR data for the starting material 064C, which ought really to have been there first.
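The way the posts hang together in the steps above can be sketched in a few lines of Python. This is only my own illustration of the procedure, product and analysis chain, not the code behind the Southampton blog system. The `Post` class, the section labels, the post titles and the time-point numbers are all hypothetical; only Exp098 and the starting material 064C come from the notebook itself.

```python
from dataclasses import dataclass, field


@dataclass
class Post:
    title: str
    section: str  # hypothetical labels: "Product", "Procedure" or "Analysis"
    inputs: list = field(default_factory=list)    # posts this one was derived from
    metadata: dict = field(default_factory=dict)  # scrapeable descriptors


def lineage(post):
    """Walk back through the inputs; return titles, starting material first."""
    titles = []
    for parent in post.inputs:
        titles.extend(lineage(parent))
    titles.append(post.title)
    return titles


# Phase one: starting material -> reaction -> first product.
starting = Post("064C", "Product")
reaction = Post("Mix reagents", "Procedure", inputs=[starting])
crude = Post("Reaction mixture", "Product", inputs=[reaction])

# One analysis post holds the NMR descriptors; each spectrum is its own product.
nmr_run = Post("NMR time course", "Analysis", inputs=[crude])
spectra = [
    Post(f"Spectrum {i}", "Product", inputs=[nmr_run], metadata={"time_point": i})
    for i in range(3)  # made-up number of time points
]

# Phase two: the first product goes through a second procedure.
workup = Post("Neutralisation and drying", "Procedure", inputs=[crude])
final = Post("Dried product", "Product", inputs=[workup])

print(lineage(final))
```

Keeping each spectrum as a separate product is what makes the metadata scrapeable per spectrum, and it lets data be attached later without touching the procedure post.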