The Southampton E-lab blog notebook – Part 2 ELN strategy
In Part 1 I outlined our aims in building an ELN system and the criteria we hope to satisfy. In this post I will discuss the outline of the system that has been developed.
The WebLog as an ELN system
A blog is a natural system on which to build an ELN. It provides free text entry, automatic date recording, the ability to include images and other files, and a system for publishing and collecting comments and advice. In this configuration a blog can function essentially as ‘electronic paper’ that can be used as a substitute for a paper notebook.
The blog can be used in a more sophisticated fashion by using internal links. If Procedure B uses the material produced in Procedure A, a link is inserted. Such links go some way towards providing metadata, for example ‘Material from procedure A is used in procedure B’. Following the links backwards and forwards can provide some measure of sample or lot tracking: an understanding of where samples have come from and where they are going to.
This is a step forward but it is not ideal. It is one thing to say that Gel X has samples from PCR Y and PCR Y was carried out with samples from Culture Z. But which samples are which? The connections between items can probably be inferred by a human reader but not by a machine. If we wish to have a system where a machine can tell that Lane 2 on Gel X is from a PCR of material from Culture Z three hours after induction we need to be able to track samples through the system. For this to work the system needs to assign a unique ID to each ‘sample’.
The ‘One item-one post’ model
A blog does provide a natural way of providing IDs. Each post carries its own ID. This post has ID#8 in this blog (notwithstanding any pretty human readable ID at the top of the page). Therefore if each ‘sample’ has its own post it automatically has an ID (not strictly a UID as Andrew pointed out to me this morning). See here and here for examples of ‘product’ or sample posts (PCR product and primer respectively). Procedures take samples and process them to generate new samples. Thus if samples each have their own post, any procedures will also need their own post. Product posts link to the procedure post in which they are generated (and vice versa) and procedures link back to the input materials. See here for an example of a PCR reaction. By following the links it is possible to trace a sample through the system.
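A minimal sketch of the one item-one post idea, written in Python purely for illustration. It is not the code behind the blog engine. The `Post` class, its field names and the example post IDs are all assumptions; the only thing taken from the text is that every sample and every procedure is a post with an ID, and that posts link back to their inputs.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Post:
    """One blog post standing for one sample or one procedure (hypothetical model)."""
    post_id: int
    kind: str                      # "sample" or "procedure"
    title: str
    inputs: List[int] = field(default_factory=list)  # IDs of posts this post links back to


def trace_back(posts: Dict[int, Post], post_id: int) -> List[int]:
    """Follow links backwards and return the IDs of every upstream post."""
    seen: List[int] = []
    stack = [post_id]
    while stack:
        current = stack.pop()
        for parent in posts[current].inputs:
            if parent not in seen:
                seen.append(parent)
                stack.append(parent)
    return seen


# Hypothetical posts: a culture and a primer go into a PCR, whose product is run on a gel.
posts = {
    1: Post(1, "sample", "Culture Z"),
    2: Post(2, "sample", "Primer"),
    3: Post(3, "procedure", "PCR Y", inputs=[1, 2]),
    4: Post(4, "sample", "PCR product", inputs=[3]),
    5: Post(5, "procedure", "Gel X", inputs=[4]),
}

ancestry = trace_back(posts, 5)
print([posts[i].title for i in ancestry])
```

Starting from the gel post, the walk reaches the PCR product, the PCR procedure and then both input samples, which is the sort of tracing a machine would need in order to say where a given lane came from.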
The concept can be taken further. By categorising samples and procedures into classes it is possible to automatically capture a great deal of metadata for the experiment. Several pieces of data are already available: ‘Procedure X made Product Y using materials A and B’. By adding that procedure X is a PCR reaction and product Y is a piece of double stranded DNA, significantly more can be inferred, e.g. PCR reactions (can) make double stranded DNA or, in a more sophisticated fashion, ‘All PCR reactions contain Mg but Vent polymerase does not work in PCR reactions with MgCl2’. In the one item-one post model the blog becomes a repository of information about relationships between items. Adding categories, or tags, to these items adds much more value to the repository. Such a data store has some of the characteristics of a triple store including, in principle, the potential for automated reasoning.
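To make the triple store comparison concrete, here is a small Python sketch under stated assumptions. The predicate names (`is_a`, `made`, `used`) and the item names are invented for illustration, and a plain set of tuples stands in for a real triple store. It shows only the simplest inference mentioned above: from tagged posts and their links, work out what class of product a class of procedure makes.

```python
# Each tuple is (subject, predicate, object). All names here are hypothetical.
triples = {
    ("procedure_X", "is_a", "PCR"),
    ("product_Y", "is_a", "double_stranded_DNA"),
    ("procedure_X", "made", "product_Y"),
    ("procedure_X", "used", "material_A"),
    ("procedure_X", "used", "material_B"),
}


def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is recorded."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def subjects(predicate, obj):
    """All subjects s such that (s, predicate, obj) is recorded."""
    return {s for s, p, o in triples if p == predicate and o == obj}


def product_classes(procedure_class):
    """Classes of product made by any procedure tagged with procedure_class."""
    classes = set()
    for procedure in subjects("is_a", procedure_class):
        for product in objects(procedure, "made"):
            classes |= objects(product, "is_a")
    return classes


pcr_products = product_classes("PCR")
print(pcr_products)
```

The links alone give the `made` and `used` triples; it is the categories (the `is_a` triples) that allow a general statement such as ‘PCR reactions can make double stranded DNA’ to be derived.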
Such an approach does, however, have distinct disadvantages. It requires the creation of a large number of posts, currently by hand. This creates two problems: firstly it is a lot of work, and secondly it fills the blog up with posts which, for a human reader, do not contain any useful information, making it quite difficult to read. Neither of these is an insurmountable problem but they make the process of recording data more complex and less appealing to the user. The challenge therefore is to provide a system that makes this easy for the user and encourages them to provide the extra information.
In Part 3 I will start to cover the implementation of the system.
There may be differences between fields but, in organic chemistry, we could not make a blog by itself work as an electronic notebook. The key problem was the assumption that an experiment could be recorded without further modification. But a lab notebook doesn’t work like that – the idea is to record work as it is being done and make observations and conclusions over time. For experiments that can take weeks, a blog post can be updated but there is no version tracking. It is thus difficult to prove who-knew-what-when. Using a wiki page as a lab notebook page gives you results in real time and a detailed trail of additions and corrections, with each version being addressable with a different url.
I partially agree. I think our experiments are much more clearly defined with definite end points. We don’t generally track the process of a reaction for instance. Another difference is that when we do do a time course or something similar it is rarely a case of looking at that sample – we would almost always take a sample (product) and then do something to it (new procedure). Where you do just follow a reaction, or where the plan changes midstream, we do one of two things. One is to add a comment and the other is to change the post.
Our system does have a mechanism for making changes and the changes are all tracked. Additionally a reason for the modification is required before it can be committed. I should say it isn’t a standard blog engine – I hope to get more details up on that sometime soon but I would rather leave that to Andrew to put in his words.
Equally I think you could implement the outlines of our system in a Wiki framework but then you lose the ‘journal like’ nature of the blog which to me is more like a lab notebook. At core what we are talking about is a way of organising ‘posts’. Whether the posts are in a wiki or a blog is fairly immaterial.
Do you think a robust and transparent change tracking system answers your criticism or is there more to it than that?
I can see our system doesn’t naturally handle things like your Exp098 but I think a lot of this could be handled either as additions to the post, as separate posts, or as comments. Maybe we should try to do one of your reactions and see how we get on and what the results look like?
Cameron,
Certainly if you have a modified blog engine that accurately tracks changes like a wiki, it may work just as well. EXP098 was a good example – there are currently 43 previous versions by several researchers. Also, that experiment still has comments to be addressed – the stuff in bold and italics – so it is not “complete”.
The blog we use to keep track of milestones (linking back to experiment pages on the wiki) is very important. I don’t expect people to keep up on the project by “reading” the wiki lab notebook. That’s hard to do even for people collaborating on the project. But, the raw data is available at any time by anyone for drilling down and resolving new hypotheses or making new observations.
So if you have something that acts like this blog-wiki hybrid system it probably will work just as well. I’d be happy to see if some of our data can fit – if anything it will bring some redundancy to our work, which is a good thing in the Science 2.0 world.
I think the main other issue is the third party time stamp. That’s one reason I like using a service, like Wikispaces, hosted by a large stable company. It also makes it easier for people to replicate overnight at zero cost if they are interested in trying it.
But I think there is also a lot more to learn about the differences between how scientific fields (and researchers) operate. We may gain a better appreciation of this if a few of us do Open Notebook Science.
License
To the extent possible under law, Cameron Neylon has waived all copyright and related or neighboring rights to Science in the Open. Published from the United Kingdom.