Joint NSF-EPSRC programme in Chemistry – an opportunity for ONS?

Looking at the EPSRC website I came across the following call for proposals for a joint US and UK programme:

http://www.epsrc.ac.uk/CallsForProposals/NSF-EPSRCChemistryProposals07.htm

Now, being an academic, I’m up for any method of trying to get money out of the system, especially special programmes. But is there an opportunity here to do something quite exciting in the area of open notebooks for chemistry, where we take Jean-Claude’s experience and our Lab Blogbook system and try to build something that combines the best of both or, possibly better, something which is a superset of both? The biggest issue I can see is that it might not be seen as chemistry, but if we focus on the idea of getting the chemical data both in and back out again it might fly.

Deadline for outline applications is November 6th…

Scifoo Lives On session: Open notebook science case studies

Yesterday afternoon the Open Notebook Science case studies session was held as part of the Scifoo Lives On sessions at Nature Island, Second Life. Jean-Claude Bradley organised, moderated and spoke first, followed by me and Jeremiah Faith. We all spoke about our experiences of implementing different approaches to open notebook science.

Jean-Claude has put the transcript up here.

There was an active discussion about the need for more fun in science, and about how the increasing secrecy of science has taken a lot of the fun out of it. CW Underwood talked about being sick of the ‘Secret Squirrel’ nature of science. One very encouraging point was that Jeremiah said that, in looking for his next post, he had asked prospective PIs whether they objected to him keeping his notebook open. So far he had had no objections.

Other issues that came up:

Open notebook science takes work and discipline.

It does involve some effort to get set up and to keep systems running, as well as the ongoing monitoring needed to make sure that things are running properly. CW pointed out that his PI would regard this as a waste of time. I can see this being quite a strong perspective, and one that will be slow to counter. Arguably the benefits of putting the effort in are not yet tangible enough to convince people who are happy with the way their science works as it is.

What is the best system for holding the notebook?

Three different systems were presented: the UsefulChem Wiki from Jean-Claude, which uses publicly available hosted services; our blog-based notebook, which is a custom-built and somewhat closed system; and Jeremiah’s approach of using TeX to generate a PDF of the whole thing. Jeremiah’s presentation included the comment that he had shown several people both a Wiki and the TeX-based system and they had all preferred the TeX-based one. This is the opposite of my experience, where people seem to prefer whatever is closest to Word. This may reflect different communities, or maybe just the different people we have each had contact with.

My feeling is that the three groups have evolved different systems because of three main things. Firstly, different initial aims; my initial aim, for instance, was not really an open notebook but an effective means of capturing data and procedures. Secondly, differences in the procedures being carried out and in the culture we work within. I am still slowly working my way through putting up Exp098 from UsefulChem in our blog system, and this is certainly showing up some differences, but I’m not sure I know what they mean yet.

Finally, I think we have different views of what the lab book is and what the ‘ideal’ lab book would look like. Jeremiah’s point was that he, and others, wanted it to ‘look like a lab book’, which is fair enough. I think my group is somewhere in the middle, and Jean-Claude has pushed the idea of what a lab book is one step further. The finished product is a summary of the experiment. It is not precisely a point-by-point record of everything that happened along the way (that is all in the history tab); rather, the visual product is a clear description that makes it immediately accessible to an outsider how to repeat the experiment and what the data and conclusions were. The point is that, within the wiki framework, there is no need to worry about editing the page because the history is all still there. This means that the page can be taken from the record as it is made (which is still kept) right through to a ‘finished’ form that can go straight into a thesis or paper. I’m still not entirely comfortable with this, but I’m not sure that my discomfort is particularly logical.

In any case in the end I think it will depend on what you want and why you want to go down the ONS route. There’s still a lot of work to be done.

Limits to openness

There was a discussion on where the boundaries should lie as to what can be open or not. I will handle this in another post because I think this is something I want to think about quite hard.

Postscript

On my way into work on Thursday last week I bumped into one of my RAL colleagues who, among other things, works on our communications and public relations. He thought that the ‘talk’ in SL made a nice little story and went to our central STFC comms people to see whether he might place it somewhere (website, newsletter, etc). Apparently the answer came back that they wouldn’t issue a press release (which would have been rather over the top in any case) because we didn’t have an institutional policy on Open Access.

Postscript 2

Jean-Claude has also reviewed the session at the UsefulChem Blog.

The issues of safety information in open notebook science

Research in most places today is done under more or less rigorous safety regimes. A general approach, which I believe is fairly universal, is that any action should in principle be ‘Risk Assessed’. For many everyday procedures such an assessment may not need to be written down, but it is general practice in the UK that there needs to be a paper trail demonstrating that such risk assessments are carried out. In practice this means that for any given laboratory procedure there is generally a document of some form in which the risks are assessed. In many cases this may be a tick-box list or pro forma document.

In addition, in the UK there is an obligation to consider whether a particular substance requires a specific assessment under the Control of Substances Hazardous to Health (COSHH) regulations. Again, these are usually based on a pro forma template. Most researchers will have a folder containing both risk and COSHH assessments, or these may be held in a laboratory-wide folder depending on local practice.

This month we have an extra person in working on our neutral drift project, which is being recorded in this blog. She felt that as the blog is the lab book it must contain these risk assessments, and you can see these here. I have no problem with this, and indeed it seems like a good idea to have this information available. So from the perspective of the group and of electronic notebook practice this is good.

From the open notebook perspective, if we are working towards the slogan of ‘No insider information’, this must necessarily include safety information. If we say how to do an experiment, this arguably should include not just the procedure but other details: how to work safely, what protection might be needed, how waste should be disposed of. Many journals now request that any specific safety issues be flagged in the methods sections of papers.

But there is a flip side here. I am happy that our safety documentation is robust and works so I am not worried about ‘the inspectors’ seeing it on the web. Indeed I feel that having your work exposed is a good way of raising standards. It can be a bit bracing but if you’re not prepared to have the details of methodology public then should you be publishing it? Equally if you are worried about a bit of scrutiny of your safety documentation then should your lab really be operating at all?

However, what if someone takes this safety information, uses it, and still manages to injure themselves? What if the regulations in the UK are different, say, from those in the US? There is the potential for legal exposure here, and this is the reason why most safety information from chemical suppliers says that anyone handling the compound in question should use respirators and full body protection (even for table salt and sugar). There is very little useful safety information available because anything that suggests that a particular compound is ‘safe’ creates legal exposure. We could put a disclaimer on our safety information to try to avoid this, but that seems a little like cheating. Being ‘open’ means being open about as much as possible. On balance I feel that we should include it, but there is a good argument that we should leave it out or hide it for our own protection.

The Southampton Electronic Blog Notebook – Part 4 – Visualisation

In previous posts I have discussed the setup and rationale for how we are organising our blog-based electronic laboratory notebook. This has covered how the blog is actually organised. In this post I will look at the issue of how we actually view the blog and extract information.

The organisation of the blog with a ‘one item one post’ approach creates a problem: a large number of posts is needed to describe even a relatively simple process. For instance, running two PCR reactions involves at least three posts, even before there is any consideration of the input materials. We have created these separate posts so as to use the links between them to encode information. It is, however, important to make sure we can get the information back out again.

Dealing with too many posts

The first problem is the number of posts. Generating the product posts creates a large number of posts with relatively little information content. These posts are essentially placeholders that give each sample a unique ID. They are not terribly useful in figuring out what was done. A simple solution is therefore to provide views in which product posts are omitted. This has been done on the Neutral Drift Blog. Compare, for instance, the view you get on first arriving at the top level with that of all recent posts (e.g. all posts in September 2007). The entry view shows posts in the categories that are not products, materials, or templates (safety should possibly also be added to this list). The ideal implementation would include a user-configurable view allowing combinations of different categories, but this is some way off yet. The entry view does, however, demonstrate the value of the post categorisation system.
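As a rough illustration, the entry view amounts to a filter over post categories. The sketch below assumes a hypothetical Post record with a numeric id that increases over time and a single section name; it is not the notebook’s actual code or schema.

```python
from dataclasses import dataclass


@dataclass
class Post:
    """Hypothetical minimal post record; field names are assumptions."""
    post_id: int   # assumed to increase with creation time
    title: str
    section: str   # e.g. "procedures", "products", "materials", "templates"


# Sections left out of the entry view, following the text; "safety" is the
# optional addition suggested there.
HIDDEN_SECTIONS = frozenset({"products", "materials", "templates", "safety"})


def entry_view(posts, hidden=HIDDEN_SECTIONS):
    """Return visible posts newest-first, like a blog front page."""
    visible = [post for post in posts if post.section not in hidden]
    return sorted(visible, key=lambda post: post.post_id, reverse=True)


posts = [
    Post(1, "PCR set-up", "procedures"),
    Post(2, "PCR product 1", "products"),
    Post(3, "Polymerase pot", "materials"),
    Post(4, "Gel of PCR products", "procedures"),
]
print([post.title for post in entry_view(posts)])
# ['Gel of PCR products', 'PCR set-up']
```

A user-configurable view would simply let the reader supply their own hidden set instead of the fixed default.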

The entry view is a reasonable facsimile of a paper notebook (although in reverse order). Entries appear in order without necessarily being obviously linked in a logical fashion, so if two experiments are being carried out in parallel it is not immediately obvious which entry belongs to which. This is no worse than a paper lab book, but the aim of the blog notebook is to make it easier to see the relationships between items. One approach is to follow the links through; this can be effective, although it can also be confusing if there are many links. The list of posts that link to the current post (‘What links here’, generally at the bottom of the post) is also useful. Finally, the identity of an experiment can be recorded as metadata. See for example the Sandpit Blog, where two separate activities have been recorded: a demonstration at a conference, and the replication of Exp098 from the UsefulChem Wiki. (Aside: this is a good example of where a configurable blog view would be useful. It would be nice to select ‘Section = procedures and Sandpit group = amh2007’.)
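The ‘What links here’ list and the configurable view wished for in the aside can both be expressed as simple operations over posts, their links and their metadata. This is a sketch under assumptions: the post ids, the link map and the helper names are hypothetical, with the metadata keys borrowed from the aside.

```python
def what_links_here(links):
    """Invert {post_id: [ids it links to]} into {post_id: [ids linking to it]}."""
    backlinks = {}
    for source, targets in links.items():
        for target in targets:
            backlinks.setdefault(target, []).append(source)
    return {target: sorted(sources) for target, sources in backlinks.items()}


def select(metadata, criteria):
    """Return ids of posts whose metadata matches every key/value in criteria."""
    return sorted(
        post_id
        for post_id, meta in metadata.items()
        if all(meta.get(key) == value for key, value in criteria.items())
    )


links = {10: [11, 12], 13: [12]}
metadata = {
    10: {"Section": "procedures", "Sandpit group": "amh2007"},
    12: {"Section": "products", "Sandpit group": "amh2007"},
    13: {"Section": "procedures", "Sandpit group": "exp098"},
}
print(what_links_here(links))  # {11: [10], 12: [10, 13]}
print(select(metadata, {"Section": "procedures", "Sandpit group": "amh2007"}))  # [10]
```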

A point worth noting is that there is one significant way in which an electronic lab notebook can be worse than a paper notebook. The physical notebook is a very good mnemonic (‘I know that experiment is about a quarter of the way through this book’). There can be a very real sense of dislocation in using an electronic lab notebook, especially when the same material can appear in different places on the page.

Alternative views

The ultimate aim for the Southampton blog notebook is to enable sophisticated searching through a database of posts and their links. This would make it possible to ask questions like ‘How many PCR reactions have worked using this pot of polymerase vs that one?’ However, this is some distance off. In the meantime there are relatively simple ways of representing the information that provide an alternative way of digesting it.
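To give a flavour of the kind of query meant, here is a sketch that tallies outcomes per polymerase pot. It assumes each PCR record names the pot it used and carries a ‘worked’ flag; neither is something the notebook currently records as structured metadata, and the record layout is hypothetical.

```python
from collections import Counter


def outcomes_by_input(reactions, input_key="polymerase"):
    """Return {input: (number that worked, total)} for each distinct input."""
    worked, total = Counter(), Counter()
    for reaction in reactions:
        pot = reaction[input_key]
        total[pot] += 1
        if reaction["worked"]:
            worked[pot] += 1
    return {pot: (worked[pot], total[pot]) for pot in total}


reactions = [
    {"polymerase": "pot A", "worked": True},
    {"polymerase": "pot A", "worked": False},
    {"polymerase": "pot B", "worked": True},
]
print(outcomes_by_input(reactions))  # {'pot A': (1, 2), 'pot B': (1, 1)}
```

The hard part is not the counting but getting the link from each reaction to its inputs, and a record of whether it worked, into a form a query can reach.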

RSS feeds: One of the simplest alternatives is an RSS feed reader. Each blog generates a simple RSS feed that can be used as a partially configurable view of what is happening. A future aim is to insert more of the metadata into the feed to allow more sophisticated manipulation and filtering using tools such as Yahoo Pipes (or workflow management tools like Taverna?). This could give any reader a very powerful way of pulling up the information of interest to them.
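A sketch of the kind of filtering that richer feeds would allow, using only the Python standard library. It assumes the feed is RSS 2.0 and that each item carries the post section as a category element; whether our feeds do this is an assumption, and the sample feed is made up.

```python
import xml.etree.ElementTree as ET

# Made-up sample feed; item titles and categories are illustrative only.
SAMPLE_FEED = """<rss version="2.0"><channel><title>Sample notebook</title>
<item><title>PCR of library</title><category>procedures</category></item>
<item><title>PCR product 1</title><category>products</category></item>
</channel></rss>"""


def items_in_category(rss_xml, category):
    """Return titles of RSS 2.0 items that carry the given <category>."""
    root = ET.fromstring(rss_xml)
    titles = []
    for item in root.iter("item"):
        categories = {element.text for element in item.findall("category")}
        if category in categories:
            titles.append(item.findtext("title"))
    return titles


print(items_in_category(SAMPLE_FEED, "procedures"))  # ['PCR of library']
```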

Timelines: The Timeline tool developed by the Simile project at MIT provides a new way of looking at the lab book. The web service takes an XML file with time/date information and generates a visual representation of the timeline. This is currently configured to display each post title with a colour-coded button based on the post type (see here). Again, the ideal would be to provide a user-configurable filtering and colouring system. Regardless of the current limitations, however, this provides a way of looking at the lab book that is not possible with a paper notebook. It represents our first step towards finding new and more effective ways of getting information out of the system.
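For illustration, the sketch below generates a small event file of the sort the Timeline widget reads. The data/event layout and the start, title and color attributes follow my understanding of Simile’s XML event format and should be checked against its documentation; the colour choices and input tuples are hypothetical.

```python
import xml.etree.ElementTree as ET

# Illustrative colour per post type; not the colours the notebook uses.
COLOURS = {"procedures": "blue", "products": "green", "materials": "orange"}


def timeline_xml(posts):
    """Build an event file from (ISO 8601 timestamp, title, post type) tuples."""
    data = ET.Element("data", {"date-time-format": "iso8601"})
    for timestamp, title, post_type in posts:
        ET.SubElement(data, "event", {
            "start": timestamp,
            "title": title,
            "color": COLOURS.get(post_type, "gray"),  # fallback for unknown types
        })
    return ET.tostring(data, encoding="unicode")


print(timeline_xml([
    ("2007-09-13T16:58:00Z", "Exp098 protocol", "procedures"),
    ("2007-09-13T17:20:00Z", "Exp098 product 1", "products"),
]))
```

Making the colour table something the reader passes in, rather than a constant, would be one route to the user-configurable colouring mentioned above.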

Network views: An appealing possible view is to represent the network of posts in a visual form that can then be navigated. While there are various web services available that show the relationships between web pages (e.g. links within a site – seems to be broken, or related pages according to Google), these do not actually provide the right information. Because the set of posts is served as a blog, the relationships between web pages do not correspond directly to the relationships between posts. This would not, for instance, be the case with a Wiki, where a single web page is also a post. However, website link visualisers are confused by the sidebar menus of both Wikis and Blogs. None of these are insurmountable difficulties, and we hope to talk to the people at TouchGraph about whether they have something we can use.

Ultimately this has the potential to be a very powerful way of exploring the Blog. It is likely that the organisation of the posts will contain information that will be directly visible from the network (materials that are dodgy will be poorly connected; separate projects, or people who don’t work well together, may form nearly disjoint graphs). Pivoting from a timeline view to a network view while viewing each page has the potential to be a very human-friendly way of dealing with a large quantity of information.
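A rough sketch of how ‘poorly connected’ and ‘disjoint’ could be measured once the links between posts are available as data. It treats links as undirected, finds connected components by breadth-first search, and flags low-degree posts; the post names are hypothetical. Fully disjoint pieces are the easy case; spotting nearly disjoint ones would need something more, such as community detection.

```python
from collections import deque


def undirected(links):
    """Turn {post: [linked posts]} into a symmetric adjacency map."""
    graph = {}
    for source, targets in links.items():
        graph.setdefault(source, set())
        for target in targets:
            graph[source].add(target)
            graph.setdefault(target, set()).add(source)
    return graph


def components(graph):
    """Connected components via breadth-first search."""
    seen, groups = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        queue, group = deque([start]), set()
        while queue:
            node = queue.popleft()
            group.add(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        groups.append(group)
    return groups


def weakly_linked(graph, max_degree=1):
    """Posts with few links, e.g. materials that are rarely (or never) used."""
    return sorted(node for node, neighbours in graph.items() if len(neighbours) <= max_degree)


links = {
    "PCR 1": ["Polymerase pot A", "PCR 1 product"],
    "Ligation 1": ["Sortase batch 2"],
    "Polymerase pot B": [],
}
graph = undirected(links)
print(sorted(len(group) for group in components(graph)))  # [1, 2, 3]
print(weakly_linked(graph, max_degree=0))                 # ['Polymerase pot B']
```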

Who is the viewer?

Different viewers have different needs and we haven’t considered this in detail yet. In another post, Jean-Claude Bradley has commented on the parallel use of UsefulChem Wikis and Blogs to provide both the notebook repository and the ‘public face’ of the system. Researchers, supervisors, administrators, and outside viewers will all have different needs. What we have mainly focused on here are tools that are more useful for the researchers using the system. Outside viewers will need additional systems, as well as appropriate tagging so that pages appear in searches. One size will definitely not fit all, but keeping things flexible and configurable will hopefully mean that a small number of systems can cover most needs. Filtering and collating tools freely available on the web are becoming quite sophisticated, and easy enough to use that they may do a lot of the work for us.

Replication of Usefulchem Exp098 continued

I am continuing this in a new post rather than continuing to muck about with the old one.

Currently I am working on reproducing the description of Exp098 from Jean-Claude Bradley’s UsefulChem Wiki within our blog-based notebook to identify differences in practice. The reproduction can be found at:

http://chemtools.chem.soton.ac.uk/projects/blog/blogs.php/blog_id/15/

Then click on ‘Usefulchem_exp098’ under the ‘Sandpit Group’ heading on the right-hand side and explore from there.

18/09/07 15:00 UTC I have added the next two steps of the experiment: the addition of methanol, followed by the addition of NaOH to neutralise the solution. In doing this I have possibly done something slightly the wrong way round, as a reading of the original Exp098 suggests that this step was not originally planned but added later. I still need to add all the NMRs in a sensible fashion. This is, it has to be said, somewhat tedious, but it does keep the relationships between things clear.

In retrospect I have probably divided this up far more than necessary, and the division makes it difficult to read. This probably reflects a difference between the way we think about molecular biology and synthetic chemistry. Chemistry often involves small steps, with characterisation of the various materials generated along the way. Each of these steps requires only a sentence or so to describe. Most of our molecular biology requires some form of table to detail the inputs, and there is usually very little analysis along the way. The differences therefore seem to be: more things in, less in situ analysis.

It might well be more informative to actually do a synthetic chemistry experiment and record that in our system. Because I am doing this replication in bits of spare time I have a bit too much time to think about it as I go along.

Replication of UsefulChem Exp098 in the Southampton blog notebook

In a previous post I said I would try to replicate an experiment from the UsefulChem open Wiki notebook within our blog system to see how it might look. This post records what I am doing as I do it; it is, in effect, the lab book I am using to record the process and the decisions I have taken in using a lab book. The pages in the notebook can be found at:

http://chemtools.chem.soton.ac.uk/projects/blog/blogs.php/blog_id/15/

I have chosen to use Exp098 as this involved several different stages and modifications to the wiki page over many weeks. My aim is to try and record this in the way we would. This will involve some changes to the text and the process, but I will try to re-use the original text as far as possible. In the spirit of open notebook science I will make this visible as I go, but as this may take me a while (days to a week or two) to finish, this page is likely to be unstable and you may wish to come back to it. Ironically, I am therefore using this notebook more in the way that Jean-Claude’s group use the Wiki than the way we use the blog.

  1. 13/09 16:58 UTC I have added two initial posts. In the first I have created a product from the previous notional reaction. Exp098 is a study on the stability of a previously generated compound, utilising a specific instance of that compound described in a previous experiment (Exp064). The second post is the initial description of the protocol. This is cut and pasted from the very first version of Exp098 and represents the experimental plan that I would put into the blog before going into the lab.
  2. In Exp098 the process is split into two phases. First, a series of reagents is mixed and the reaction is monitored by NMR. After completion the reaction was neutralised, then extracted into organic solvent and dried before further analysis. Because analysis has been carried out on two different ‘samples’, I am going to split the post up in what might appear a slightly odd way to an organic chemist. The first stage will generate one product, which will then be subject to analyses (further procedures). This first product will then be subjected to a second procedure (neutralisation and drying) to generate a further product (the ‘real’ product), which will be subjected to further analysis.
  3. 13/09 17:20 UTC I have added the first procedure and product post. I now need to decide whether to set up one post with all the NMR analysis from the time course in it, or multiple analysis posts, one for each time point (I can do this with a template), each holding its own data. Alternatively, I could set up one procedure post containing all the NMR descriptors, with a separate product post for each time point containing the actual spectrum. I think the latter may be best. This provides a way of scraping metadata for each of the spectra. It also means that the data can be added slightly later without directly editing the procedure post. I will create a new section for ‘Analysis’ to distinguish it from procedure.
  4. 17/09 13:06 UTC: I have now added all the NMR data for the time course. This was obviously a laborious process, but it does mean that it is reasonably clear what goes where. For the separate HOMODEC experiment I didn’t bother putting the data in separately, as it is obvious which goes with which. I haven’t yet put in the NMR data for the starting material 064C, which really ought to have been there first.
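A sketch of the layout settled on in item 3: one procedure post describing the NMR runs, plus one product post per time point to which the spectrum is attached later. The class, the title format and the time points are hypothetical and are not the actual Exp098 values.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ProductPost:
    """Hypothetical product post holding one spectrum from a time course."""
    title: str
    parent: str                          # the analysis post describing the NMR runs
    time_point_h: float
    spectrum_file: Optional[str] = None  # can be attached later


def time_course_products(analysis_title: str, time_points_h: List[float]) -> List[ProductPost]:
    """Create one product post per time point, as a template would."""
    return [
        ProductPost(f"{analysis_title} NMR t={t:g} h", analysis_title, t)
        for t in time_points_h
    ]


def spectrum_index(products):
    """Per-spectrum metadata that can be scraped: (time point, data attached?)."""
    return [(p.time_point_h, p.spectrum_file is not None) for p in products]


products = time_course_products("Exp098 stability", [0, 1.5, 24])
products[0].spectrum_file = "t0.fid"  # data added later; the analysis post is untouched
print([p.title for p in products])
print(spectrum_index(products))  # [(0, True), (1.5, False), (24, False)]
```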

UK E-Science all hands meeting – initial thoughts

If it hasn’t been obvious from what has gone previously, I am fairly new to the whole e-science world. I am definitely not in any form a computer scientist. I’m not a computer-phobe either, but my skills are pretty limited. It’s therefore a little daunting to be going to an e-science meeting for the first time. This is the usual story of not really knowing the people in the community and not necessarily having a clear idea of what people within the field think the priorities are.

The programme is available online, and my first response on looking at it in detail was that I don’t even understand what most of the session titles mean. “OMII-UK” is a fairly impenetrable workshop title, and its first talk is “Portalization Process for the Access Grid”. Now, to be fair, these are more specialised workshops, and many of the plenary session names make more sense. This is normal when you go to a conference outside your field, but it will be interesting to see how much of the programme makes sense.

One of the issues with e-science programmes is the process of bringing the ‘outside’ scientist into the fold. Systems such as our lab e-notebook require extra effort to use, certainly at the beginning, and during the development process there are often very few tangible benefits. Researchers are always time-poor, so they want to see benefits. In theory we are here to demonstrate and promote our e-notebook system, but I suspect this may be a case of preaching to the converted. It will be interesting to see a) whether we get much interest, and b) whether the comments we get are more on the technical implementation or on the practical side of actually using it to record experiments.

One of the great things about starting this blog has been the way it has facilitated discussion with others interested in open notebook science and open science in general. I am less sure it has brought scientists who are interested in the details of the work in our notebook. My feeling is that this meeting may be a bit similar. On the other hand it may get us some good ideas on solving some of the problems of visualising the notebook that I want to discuss in a future post.

So if you are at the meeting and want to see the notebook, please drop by the BBSRC booth on Wednesday afternoon, and do say hello if you see a shortish, balding, bearded guy who is looking lost or confused.

p.s. Thanks to whoever was running a meeting upstairs today. I didn’t realise I was stealing your lunch!

When is open notebook science not?

Well when it’s not open obviously.

There are many ways to provide all the information imaginable while still keeping things hidden, or at least difficult to figure out or find. The slogan ‘No insider information’ is useful because it provides a good benchmark to work towards. It is perhaps an ideal to aspire to rather than a practical target, but thinking about what we know that is not clear from the blog notebook has a number of useful results. Clearly it helps us to see how open we are being, but it also helps to identify what the notebook is not successfully capturing.

I have put up a series of posts recently in the ‘Sortase Cloning’ blog notebook. The experiments I did on 29th August worked reasonably well. However, this is not clear from the blog. Indeed, I suspect our hypothetical ‘outsider’ would have a hard time figuring out what the point of the experiment is. Certainly the what is reasonably obvious, although it may be hidden in the detail, but the why is not. So the question is how to capture this effectively. We need a way of noting that an experiment works and that the results are interesting. In this case we have used Sortase to do two things that I don’t believe have yet been reported: fluorescently labelling a protein, and ligating a protein to a piece of DNA. This therefore represents the first report of this type of ligation using Sortase.

Perhaps more importantly, how do we then provide the keys that let interested people find the notebook? UsefulChem does this by providing InChI and SMILES codes that identify specific molecules. Searching Google for the code will usually bring UsefulChem up in the top few results if the compound has been used. Searching on ‘Sortase’, the enzyme we are doing our conjugation with, brings up our blog at number 14 or so. So not bad, but not near the top, and on the second page rather than the first. For other proteins with a larger community actively interested in them, the blog would probably be much further down. Good tags and visibility on appropriate search engines (whatever they may turn out to be) are fairly critical to making this work.

Blogs vs Wikis and third party timestamps

I wanted to pull out some of the comments Jean-Claude Bradley has made on the e-notebook posts and think them through in more detail.

Jean-Claude’s comment on this post:

There may be differences between fields but, in organic chemistry, we could not make a blog by itself work as an electronic notebook. The key problem was the assumption that an experiment could be recorded without further modification. But a lab notebook doesn’t work like that – the idea is to record work as it is being done and make observations and conclusions over time. For experiments that can take weeks, a blog post can be updated but there is no version tracking. It is thus difficult to prove who-knew-what-when. Using a wiki page as a lab notebook page gives you results in real time and a detailed trail of additions and corrections, with each version being addressable with a different url.

Thinking about this and looking at some examples on the UsefulChem Wiki I wondered whether this is largely down to a different way of thinking about the notebook rather than differences in field. I will use the UsefulChem Exp098 as an example for this.

This experiment has been modified over time and this can be tracked through the Wiki system. Now my initial reaction to this was ‘but you should never modify the notebook’. In our system we originally had no means of making any changes (for precisely this reason) but eventually introduced one because typographical errors were otherwise very annoying (and because at the moment incorporating all the links requires double editing). However our common use is not to modify the core of the post. Arguably this is a hangover from paper notebooks (never use whiteout, never remove pages).

In the case of the UsefulChem Wiki the rational objections to this kind of modification go away because it is properly tracked in a transparent fashion. However I still find myself somewhat uncomfortable with the notion it has been changed. I wonder whether this could cause an unfavourable impression in some minds? There is a good presentation with audio here where Jean-Claude describes the benefits of and good rationale for this flexibility as well as the history that brought them to the system they use.
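To make the wiki model concrete, here is a toy versioned page in which every edit is appended rather than overwritten and each revision has its own address. This is an illustration only, not how Wikispaces implements it; the URL scheme and author name are made up.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class VersionedPage:
    """Toy wiki page: edits append revisions, nothing is overwritten."""
    name: str
    revisions: list = field(default_factory=list)  # (timestamp, author, text)

    def edit(self, author, text, when=None):
        """Record a new revision and return its 1-based number."""
        when = when or datetime.now(timezone.utc)
        self.revisions.append((when, author, text))
        return len(self.revisions)

    def text_at(self, revision):
        return self.revisions[revision - 1][2]

    def current(self):
        return self.revisions[-1][2]

    def url(self, revision):
        # Made-up URL scheme: one address per revision.
        return f"https://wiki.example.org/{self.name}?version={revision}"


page = VersionedPage("Exp098")
page.edit("author", "Plan: mix reagents, follow by NMR")
page.edit("author", "Plan: mix reagents, follow by NMR. Added: neutralise with NaOH")
print(page.text_at(1))  # the original plan is still there
print(page.url(2))      # https://wiki.example.org/Exp098?version=2
```

The point the sketch makes is that ‘editing’ here never destroys anything; the polished current text and the full trail coexist.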

Differences in use

The key difference, I think, between modifications in the respective systems is that in the UsefulChem case changes can be made over a period of weeks as corrections and details are sorted out. In the case of Exp098 this includes the analysis of a set of samples over the course of a week. There is then a series of further corrections over more than a month, although the main changes occur over a few weeks. Partly this is the nature of the experiment, which takes place over several days; we would probably handle this through multiple posts. I will try to set up a sandpit where I can see how we might have represented this experiment. The other element is the process of incorporating corrections and comments. I think we would implement this through comments in the blog rather than by correcting the post.

So a key difference here is the presentation standard of the experiment. The aim for UsefulChem seems to be to provide a ‘finished’ or ‘polished’ representation of the experiment, whereas ours is, I think, a more traditional ‘well that’s what I wrote down so that’s what is there’ kind of approach. The benefit of the former, as Jean-Claude points out in his talk, is that it is a great opportunity to improve students’ standards of recording as they go. In principle, if things really are brought up to standard, they can be incorporated directly into the methods section of a thesis, perhaps even a paper. In my group, however, I would do this as the methodology is transferred from the lab book (in whatever form) to the regular reports required in our department.

Third party timestamps and hosted systems

Jean-Claude’s comment again:

I think the main other issue is the third party time stamp. That’s one reason I like using a service, like Wikispaces, hosted by a large stable company. It also makes it easier for people to replicate overnight at zero cost if they are interested in trying it.

These are two good reasons for using a standard engine. Independent time stamps are very useful in demonstrating history, whether for precedence or even for patent issues. If one of the key arguments in favour of open notebook science (or at least one of the main arguments against the idea that you risk being scooped) is that it provides a means of establishing precedence, then it is important to have a reliable time stamp. I don’t know what the legal status of a computer-based time stamp is, but I do wonder whether, from a legal perspective at least, an in-house time stamp in a well-regulated and transparent system might be as good as (or no worse than) a signed and dated paper notebook. Again, however, impressions are important, and while it may be impossible for me to fake a date in our system, that doesn’t mean people will necessarily believe me when I say it.
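One way an in-house system could make back-dating detectable is to chain entries together with cryptographic hashes, so that altering any earlier entry breaks every later link; periodically lodging the latest hash with a third party would then fix everything before it. This is a sketch of that general technique, not a description of how our system works, and the entry format is hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(timestamp, text, prev_hash):
    """SHA-256 over a canonical JSON encoding of the entry and its predecessor."""
    record = {"timestamp": timestamp, "text": text, "prev": prev_hash}
    return hashlib.sha256(json.dumps(record, sort_keys=True).encode("utf-8")).hexdigest()


def append_entry(chain, timestamp, text):
    """Append an entry linked to the previous one; return its hash."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    digest = _digest(timestamp, text, prev_hash)
    chain.append({"timestamp": timestamp, "text": text, "prev": prev_hash, "hash": digest})
    return digest


def verify(chain):
    """Recompute every hash; any edited or reordered entry makes this False."""
    prev_hash = GENESIS
    for entry in chain:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != _digest(entry["timestamp"], entry["text"], prev_hash):
            return False
        prev_hash = entry["hash"]
    return True


chain = []
append_entry(chain, "2007-09-13T16:58Z", "Initial protocol for Exp098 replication")
latest = append_entry(chain, "2007-09-13T17:20Z", "First procedure and product post")
print(verify(chain))  # True
chain[0]["timestamp"] = "2007-09-01T09:00Z"  # attempt to back-date
print(verify(chain))  # False
# 'latest' is the value one would lodge with a third party.
```

Note that the hash chain alone only shows the log is internally consistent; it is the copy held by someone else that stops the whole chain being quietly rebuilt.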

The second point here, that using a generic hosted system makes it much easier for other people to replicate, is also a good one. A case could be made that if my group are doing open notebook science we are doing it with a closed system, which at least partly defeats the purpose of the exercise. My answer to this is that we are trying to develop something that will take us further than a generic hosted system can. It may be possible to retro-fit what we develop into a generic blog or wiki at a later date, but currently this isn’t possible. People are of course welcome to look at and use the system (this is part of what we have the grant for, after all), but I recognise that this creates a barrier which we will need to overcome. I think we will just have to see how this plays out over time.

Finally…

Final comment from Jean-Claude

But I think there is also a lot more to learn about the differences between how scientific fields (and researchers) operate. We may gain a better appreciation of this if a few of us do Open Notebook Science.

Couldn’t agree more. I have learnt a lot from doing this about how we think about experiments and how things are ordered (or not). There is a lot to learn from looking at how systems do and don’t work and how this differs in different fields.