In Part 1 I outlined our aims in building an ELN system and the criteria we hope to satisfy. In this post I will discuss the outline of the system that has been developed.
The WebLog as an ELN system
A blog is a natural system on which to build an ELN. It provides free text entry, automatic date recording, the ability to include images and other files, and a system for publishing and collecting comments and advice. In this configuration a blog can function essentially as ‘electronic paper’ that can be used as a substitute for a paper notebook.
The blog can be used in a more sophisticated fashion by using internal links. If Procedure B uses the material produced in Procedure A, a link is inserted from one to the other. Such links go some way towards providing metadata: ‘Material from procedure A is used in procedure B’. Following the links backwards and forwards provides some measure of sample or lot tracking, an understanding of where samples have come from and where they are going.
This is a step forward but it is not ideal. It is one thing to say that Gel X has samples from PCR Y and PCR Y was carried out with samples from Culture Z. But which samples are which? The connections between items can probably be inferred by a human reader but not by a machine. If we wish to have a system where a machine can tell that Lane 2 on Gel X is from a PCR of material from Culture Z three hours after induction we need to be able to track samples through the system. For this to work the system needs to assign a unique ID to each ‘sample’.
The ‘One item-one post’ model
A blog does provide a natural way of providing IDs. Each post carries its own ID. This post has ID#8 in this blog (notwithstanding any pretty human readable ID at the top of the page). Therefore if each ‘sample’ has its own post it automatically has an ID (not strictly a UID as Andrew pointed out to me this morning). See here and here for examples of ‘product’ or sample posts (PCR product and primer respectively). Procedures take samples and process them to generate new samples. Thus if samples each have their own post, any procedures will also need their own post. Product posts link to the procedure post in which they are generated (and vice versa) and procedures link back to the input materials. See here for an example of a PCR reaction. By following the links it is possible to trace a sample through the system.
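As a rough illustration of the idea (not the code behind our system), the one item-one post model can be sketched in a few lines of Python. The names `Post`, `add_post` and `trace_back` are hypothetical, and the IDs are a simple counter standing in for whatever the blog engine assigns. The example reuses the Gel X, PCR Y and Culture Z chain from above.

```python
from dataclasses import dataclass, field
from itertools import count

# Assumption: the blog engine hands out sequential post IDs.
_next_id = count(1)


@dataclass
class Post:
    title: str
    kind: str  # "sample" or "procedure"
    links_to: list = field(default_factory=list)  # IDs of the posts this one links back to
    post_id: int = field(default_factory=lambda: next(_next_id))


blog = {}  # post_id -> Post


def add_post(title, kind, inputs=()):
    """Create a post that links back to each of its input posts."""
    post = Post(title=title, kind=kind, links_to=[p.post_id for p in inputs])
    blog[post.post_id] = post
    return post


def trace_back(post):
    """Follow links backwards and return the titles of everything upstream."""
    seen, stack, titles = set(), list(post.links_to), []
    while stack:
        pid = stack.pop()
        if pid in seen:
            continue
        seen.add(pid)
        titles.append(blog[pid].title)
        stack.extend(blog[pid].links_to)
    return titles


# Samples and procedures each get their own post.
culture_z = add_post("Culture Z, 3 h after induction", "sample")
pcr_y = add_post("PCR Y", "procedure", inputs=[culture_z])
pcr_product = add_post("PCR Y product", "sample", inputs=[pcr_y])
gel_x = add_post("Gel X", "procedure", inputs=[pcr_product])

history = trace_back(gel_x)
```

Here `history` lists the PCR product, the PCR procedure and the original culture sample, in that order, which is the kind of trail a machine could follow without a human inferring the connections.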
The concept can be taken further. By categorising samples and procedures into classes it is possible to automatically capture a great deal of metadata for the experiment. Several pieces of data are already available: ‘Procedure X made Product Y using materials A and B’. By adding that procedure X is a PCR reaction and product Y is a piece of double-stranded DNA, significantly more can be inferred, e.g. ‘PCR reactions (can) make double-stranded DNA’ or, in a more sophisticated fashion, ‘All PCR reactions contain Mg but Vent polymerase does not work in PCR reactions with MgCl2’. In the one item-one post model the blog becomes a repository of information on the relationships between items. Adding categories, or tags, to these items adds much more value to the repository. Such a data store has some of the characteristics of a triple store including, in principle, the potential for automated reasoning.
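To make the triple store comparison a little more concrete, here is a minimal sketch in Python. It is an illustration only: the predicate names (`is_a`, `made`, `used`, `can_make`) and the helper functions are my own assumptions, not part of any existing system. Statements about individual posts are stored as (subject, predicate, object) triples, and one very simple rule generalises them to the level of categories.

```python
# Each tuple is (subject, predicate, object); predicate names are hypothetical.
triples = {
    ("Procedure X", "is_a", "PCR reaction"),
    ("Product Y", "is_a", "double-stranded DNA"),
    ("Procedure X", "made", "Product Y"),
    ("Procedure X", "used", "Material A"),
    ("Procedure X", "used", "Material B"),
}


def objects(subject, predicate):
    """Return every object recorded for a given subject and predicate."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def infer_class_products():
    """Lift 'X made Y' to the category level: class of X 'can_make' class of Y."""
    inferred = set()
    for s, p, o in triples:
        if p == "made":
            for procedure_class in objects(s, "is_a"):
                for product_class in objects(o, "is_a"):
                    inferred.add((procedure_class, "can_make", product_class))
    return inferred


inferred = infer_class_products()
```

With the five statements above, the rule yields the single new statement that a PCR reaction can make double-stranded DNA. The more sophisticated inferences, such as which reagents work together, would need many more posts and richer rules than this.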
Such an approach does, however, have distinct disadvantages. It requires the creation of a large number of posts, currently by hand. This creates two problems: firstly it is a lot of work, and secondly it fills the blog up with posts which, for a human reader, do not contain any useful information, making it quite difficult to read. Neither of these problems is insurmountable, but they make the process of recording data more complex and less appealing to the user. The challenge therefore is to provide a system that makes this easy for the user and encourages them to provide the extra information.
In Part 3 I will start to cover the implementation of the system.