The Southampton E-lab Blog Notebook – Part 3 Implementation
In Part 1 and Part 2 I discussed the criteria we set for our system to be successful and the broad outlines of a strategy for organisation. In this part I will outline how we apply this strategy in practice. I will not deal here with the technical implementation and software design of the blog engine, which will be discussed later. I will note, however, that we are not using a standard blog engine but one that has been custom built for lab-based blogging.
In Part 2 I introduced the notion of ‘One item, one post’ as an organising principle. This has significant appeal as it means that each physical object that we handle, be it a buffer, a purchased oligonucleotide, or a sample generated in a previous experiment, has an ID within the system. This is potentially very useful for sample tracking, archival, and retrieval. But it requires some careful thought about what these ID numbers actually represent.
Which items justify their own post?
If an experiment generates a particular sample then it is likely that the sample is unique. It therefore clearly deserves an ID and a post of its own. One of our motivations is to identify when materials we buy or obtain, such as oligonucleotides or enzymes, are not functioning properly. These items therefore also justify a post. Many of them will be purchased on a regular basis, but to identify a bad lot each batch of the item will require its own ID and therefore its own post. It follows that each post should correspond to a single physical instance of a material (a tube, a bottle) rather than to all instances of a given material (EcoRI, TAE, plasmid pBAD/HisA).
In principle this would imply the generation of a post every time a buffer is made or a bucket of reagent is brought into the lab. In practice this is unlikely to occur and it is necessary to draw a line somewhere. It is worth pointing out that carrying this process to what may seem an extreme end (logging every bottle of salt, every time the TAE is made up) could actually provide significant dividends in terms of lab management (where has all the cell media gone? who used up the last of the TAE buffer? how much sucrose is left?) but involves significant extra work. However, the system will not break if the ‘TAE’ post refers generically to that buffer rather than to a particular batch that someone has made. The level of detail recorded will probably find some equilibrium based on how useful the users find it. For instance, recording each batch of an unstable reagent will be quite useful, as it can be immediately determined when it was opened and who has used it (which is often the most important question). Salts, water, and commonly used buffers will probably not be found to be as critical and therefore not worthy of independent recording. For examples of ‘materials’ see the posts T4 DNA Ligase Buffer NEW 03.07.07, 50X TAE buffer, and OHC316; for examples of experimentally produced products see Purified GlyGly-Tus PCR product and Product of PCR reaction with GlyGly primers.
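The lot-level tracking described above can be illustrated with a minimal sketch. It is not the design of our blog engine: the `Post` class, its fields, the `downstream` helper, and the example titles are all hypothetical and exist only to show how links between posts could be followed to find everything affected by a suspect lot.

```python
from dataclasses import dataclass, field


@dataclass
class Post:
    """One post per physical item (a tube, a bottle) or per procedure.

    Hypothetical model for illustration only.
    """
    post_id: int
    title: str
    # IDs of the posts this post links back to (materials or samples used).
    inputs: list[int] = field(default_factory=list)


def downstream(posts: dict[int, Post], start_id: int) -> set[int]:
    """Return the IDs of every post that depends, directly or
    indirectly, on the post with ID start_id."""
    affected: set[int] = set()
    frontier = [start_id]
    while frontier:
        current = frontier.pop()
        for post in posts.values():
            if current in post.inputs and post.post_id not in affected:
                affected.add(post.post_id)
                frontier.append(post.post_id)
    return affected


# Made-up example: two lots of the same buffer, each with its own post.
posts = {
    1: Post(1, "Ligase buffer, lot A"),
    2: Post(2, "Ligase buffer, lot B"),
    3: Post(3, "Ligation reaction", inputs=[1]),
    4: Post(4, "Ligation product", inputs=[3]),
    5: Post(5, "Second ligation reaction", inputs=[2]),
}

# Everything that could be affected if lot A turns out to be bad.
suspect = downstream(posts, 1)
```

If the two lots shared a single generic ‘ligase buffer’ post, the same query would return all three of the dependent posts (3, 4 and 5) and the bad lot could not be isolated, which is the argument for lot-specific posts for critical reagents.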
What about procedures?
There is an argument that if each tube of sample requires a post then the procedure that generates each and every sample should also have a post. In terms of data storage and logical organisation this argument is quite strong. By combining five different PCR reactions together in one post we lose track, at the level of links, of which input template is used for which reaction. To be completely rigorous we would need to track all the inputs into a single procedure or reaction and then all the outputs (samples) from that reaction. That is, each of the five reactions would require its own post. However, this creates two distinct problems. Firstly, it further multiplies the number of posts that need to be generated, and secondly, it makes the blog almost impossible for a human to read.
We have taken a pragmatic approach for procedures by organising the description of the procedure, where possible, in a tabular form. This has the advantage of being a naturally readable format while at the same time being structured enough to allow it to be interpreted by automated systems, at least in theory. For work in molecular biology a table is usually a natural way to record a procedure. This is, however, a compromise and the full consequences have yet to be worked out. It also raises a further issue: the input of tables into blog (or wiki) systems is awful. The text-driven nature of input to both blogs and wikis makes tables a very unnatural object to code. If the value of a blog-based system is the ability to generate natural unstructured text, then the necessity of coding tables can create serious problems, especially for those uncomfortable with markup languages. Examples of procedures recorded with tables include PCR amplification… and Horizontal DNA Gel electrophoresis. The first is slightly broken due to issues with the coding. Purification of GlyGly PCR products is an example of a free text procedure.
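To make the trade-off concrete, here is a rough sketch of how a table of parallel reactions could be held as structured rows, rendered as text, and read back by a program to recover which input went into which product. The column names, the example values, the plain-text layout, and both function names are assumptions for illustration; none of this is the markup or code our blog engine actually uses.

```python
# Each row is one reaction in a shared procedure post.
# All names and values are made up for illustration.
reactions = [
    {"reaction": "1", "template": "plasmid A", "primers": "pair X", "product": "tube 1"},
    {"reaction": "2", "template": "plasmid B", "primers": "pair X", "product": "tube 2"},
]


def render_table(rows):
    """Render the rows as an aligned plain-text table (hypothetical format)."""
    headers = list(rows[0])
    # Column width is the longest of the header and every cell in that column.
    widths = [max(len(h), *(len(r[h]) for r in rows)) for h in headers]

    def fmt(cells):
        return " | ".join(c.ljust(w) for c, w in zip(cells, widths)).rstrip()

    lines = [fmt(headers), "-+-".join("-" * w for w in widths)]
    lines += [fmt([r[h] for h in headers]) for r in rows]
    return "\n".join(lines)


def input_output_links(rows, input_key="template", output_key="product"):
    """Recover, per reaction, which input was used to make which output."""
    return [(r[input_key], r[output_key]) for r in rows]


print(render_table(reactions))
links = input_output_links(reactions)
```

Generating the table from rows like these, rather than asking people to type table markup by hand, would address the input problem, and the per-row structure keeps the input-to-output information that is otherwise lost when several reactions share a post.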
Adopting a one item, one post system has the potential to be a very powerful way of tracking reagents and samples through a system. It can provide an ID for each material, and links between posts can then track the use and further manipulation of the material. In an extreme implementation every single physical instance of a material and the preparation of every single reaction would have its own post. This is not practical. For materials the system can find its own level. Most lab-produced samples will have a post which refers to a specific tube or bottle of material, and critical reagents will benefit from having lot-specific posts. For generic materials such as salts and buffers the cataloguing of each physical container is unlikely to be practical.
For similar reasons we have adopted a system where parallel procedures (multiple reactions) are usually handled in a single post. This preserves some ‘human readability’ and also reduces the number of posts generated. It does create two potential problems. Firstly, some information is potentially lost by combining multiple procedures. Secondly, it necessitates the use of tables which, even when they are available in a blog or wiki system, are not intuitive to code. However, tables are a natural way of recording procedures, at least for molecular biology, and therefore developing a system for handling table generation is a priority in any case.