This is the starting point for a description of the development work being done in the School of Chemistry at the University of Southampton on a blog-based electronic laboratory notebook for biochemistry/molecular biology. Ultimately the aim is for this document to form the first draft of a paper describing the system.
First things first, acknowledgements:
All the system development work is being done in the group of Professor Jeremy Frey, largely by Andrew Milstead, a PhD student. The contribution of my group, mainly through the work of another PhD student, Jenny Hale, is to break the system, complain and generally make Andrew’s life difficult. Through various iterations this has led us to propose a system that seems to fulfil a number of our criteria. A working prototype is being used ‘in anger’ in the laboratory. This work is partially funded by the UK Biotechnology and Biological Sciences Research Council (BBSRC) through an e-science development grant and partially by the Engineering and Physical Sciences Research Council (EPSRC) through a Platform grant held by Jeremy.
Motivation
Directed evolution is a process whereby a library is generated by introducing random changes to a gene that encodes a protein and that library is then screened to identify protein variants with enhanced or desired function. The libraries can be large and the experimental process usually goes through multiple cycles, with variants selected in one cycle being used as a starting point for the next. Our ultimate aim is to capture the data from these experiments in such a way that allows us to identify the path a sample has taken through the whole experiment. These experiments can generate a lot of data, most of which is usually thrown away. By capturing the data we aim to enable significantly more detailed analysis of the experiment with the ultimate goal of optimising the directed evolution process in a rational way.
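To make the idea of "the path a sample has taken" concrete, here is a minimal sketch in which each selected variant records the variant it was derived from, so the route back to the starting gene can be recovered by following those links. The identifiers (`r1-v07`, `wild-type`) and the function name `path_to_origin` are invented for illustration and are not part of the actual system.

```python
def path_to_origin(parent_of, variant):
    """Follow parent links from a variant back to the starting gene."""
    path = [variant]
    while variant in parent_of:
        variant = parent_of[variant]
        path.append(variant)
    return path


# Hypothetical lineage: each key is a selected variant, each value is the
# variant from the previous cycle that it was derived from.
parent_of = {
    "r1-v07": "wild-type",
    "r2-v03": "r1-v07",
    "r3-v11": "r2-v03",
}

print(path_to_origin(parent_of, "r3-v11"))
# ['r3-v11', 'r2-v03', 'r1-v07', 'wild-type']
```

A real experiment would attach screening data to each step; the point here is only that recording one link per cycle is enough to reconstruct the whole path.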
The purpose and aims of an electronic laboratory notebook
Our aim in developing an electronic laboratory notebook (ELN) system is essentially twofold. The first is to collect the day-to-day workings and records of a research worker, or workers, in electronic form. This will allow searching through text, easy archiving and backups, and provides a route towards making raw data available. This can be achieved by essentially any free-form recording system, including scanning a paper notebook and placing PDFs on the web.
Our second aim is to collect information on samples, protocols, and experimental data and link this together in a form that can be processed by a machine. Such a system would allow questions to be asked like ‘which PCR reactions used primer X?’ or ‘who was the last person to use the gel running buffer?’. Ultimately such a system could also assist in the archiving of samples and provide the basis of a database recording all materials, samples, and protocols used in the lab. However, recording all the relevant data would be an immense amount of work. A system that could capture and relate every element of an experiment would be unwieldy and probably unworkable in the context of an academic research laboratory.
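As a rough illustration of what "a form that can be processed by a machine" could look like, the sketch below stores each notebook entry with explicit references to the things it used and then answers the two example questions. Everything here (the `Record` class, the field names, the identifiers and dates) is a hypothetical model for illustration, not the structure of the system described in this post.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Record:
    """One notebook entry (hypothetical model)."""
    record_id: str
    kind: str                 # e.g. "pcr", "gel"
    author: str
    date: str                 # ISO date, so string order equals time order
    inputs: List[str] = field(default_factory=list)  # ids of things used


def records_using(records, material_id, kind=None):
    """Return records that list material_id as an input, optionally by kind."""
    return [r for r in records
            if material_id in r.inputs and (kind is None or r.kind == kind)]


def last_user(records, material_id):
    """Return the author of the most recent record that used material_id."""
    uses = records_using(records, material_id)
    return max(uses, key=lambda r: r.date).author if uses else None


# Illustrative entries only; the ids, authors and dates are made up.
notebook = [
    Record("pcr-001", "pcr", "user_a", "2000-01-01", ["primer-X", "template-1"]),
    Record("pcr-002", "pcr", "user_b", "2000-01-02", ["primer-Y", "template-1"]),
    Record("gel-001", "gel", "user_b", "2000-01-03", ["pcr-001", "gel-running-buffer"]),
    Record("pcr-003", "pcr", "user_a", "2000-01-04", ["primer-X", "template-2"]),
    Record("gel-002", "gel", "user_a", "2000-01-05", ["pcr-003", "gel-running-buffer"]),
]

# Which PCR reactions used primer X?
print([r.record_id for r in records_using(notebook, "primer-X", kind="pcr")])
# ['pcr-001', 'pcr-003']

# Who was the last person to use the gel running buffer?
print(last_user(notebook, "gel-running-buffer"))
# user_a
```

Note that both questions are answered from the same `inputs` links, which is why capturing even a small amount of linking information at the time of the experiment pays off later.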
Thus our target is to develop a system that captures as much of the ‘metadata’ from a laboratory procedure as possible, without imposing an undue burden on the research worker. Ideally such a system should be flexible, to allow any process, or modification to a process, to be captured. It must therefore allow free text. It should be easy to use and not place restrictions on the ability of users to carry out their experiments in the way they want. If it does require a ‘correct’ workflow to operate optimally, the system must encourage users to use that workflow by providing added value or ease of use. And the system should not break if a user pursues an ‘incorrect’ workflow. Finally, it must allow the uploading of any type of file that might be relevant. If a file type is not recognised, some form of placeholder should be provided to show the file is there and allow download.
So our criteria for a successful system can be set out as follows:
- The system must allow free text entry (free description)
- It must allow the capture of metadata that describe the type, category, quality or any other aspect of a material, sample or protocol (flexible metadata)
- It should allow an inexperienced user to record their experiments without any undue extra burden above and beyond that required for best practice in using a paper notebook (ease of use)
- If a specific workflow is required for optimal performance, the system should encourage this workflow, but it should not break if the workflow is not followed (robustness)
- It must allow the upload and download of arbitrary file types (arbitrary uploads)
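The criteria above can be sketched as a simple data structure: a post with a free-text body, an open-ended dictionary of metadata, and a list of attached files, where unrecognised file types fall back to a download placeholder. This is only a sketch under stated assumptions; the `Post` class, its field names and `describe_attachment` are hypothetical and do not describe the actual implementation. File type detection uses Python's standard `mimetypes` module.

```python
import mimetypes
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Post:
    """A notebook post (hypothetical structure, not the real schema)."""
    title: str
    body: str                                               # free description
    metadata: Dict[str, str] = field(default_factory=dict)  # flexible metadata
    attachments: List[str] = field(default_factory=list)    # arbitrary uploads


def describe_attachment(filename):
    """Return a display hint; unrecognised types get a download placeholder."""
    mime_type, _ = mimetypes.guess_type(filename)
    if mime_type is None:
        return {"file": filename, "display": "placeholder", "download": True}
    display = "inline" if mime_type.startswith("image/") else "link"
    return {"file": filename, "type": mime_type,
            "display": display, "download": True}


# Metadata keys are not fixed in advance, and a post with no metadata
# at all is still valid (robustness).
post = Post(
    title="Example PCR",
    body="Free-text description of what was actually done.",
    metadata={"type": "pcr", "quality": "good"},
    attachments=["gel.png", "run.unknownformat"],
)

for name in post.attachments:
    print(describe_attachment(name))
```

Keeping the metadata as an open dictionary rather than a fixed set of columns is one way to meet the flexibility criterion; the trade-off is that consistency of keys then depends on convention rather than on the system.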
Part 2 will look at the underlying structure we have adopted to develop an ELN system.
How much of the content of the e-notebook will be public?
Yes, I meant to make a comment on this but haven’t got that far yet. The answer is that the system as it currently stands allows only two settings: completely public or completely private. So while the neutral drift blog was always public, the Sortase blog was private until recently, hence the access issues. The setting applies to all posts, comments, metadata, and uploaded files.
One can imagine a range of possibilities that lie between these extremes, where certain things are published to the public domain and others are not. But this hasn’t been built in as yet and is apparently not a simple process.
So for me, one of the exciting things about this is that it can enable open notebook science. But for others, or for confidential or commercial projects, it doesn’t have to be exposed to the outside world.
Great to see this.
I see you use the term “Electronic Laboratory Notebook” (ELN), which has been around for some time. ELNs have been advocated for a long time without much success (mainly because of the particular requirements of each pharma company). You are doing something more exciting due to the fluidity of the process and I’m wondering whether there is scope for a new name. For example, in the next post you use “E-lab blog notebook”, which is (to me) more descriptive than ELN.
Presumably in the Web 2.0 world we need some sort of concatenation but I was trying to avoid ‘Laboratory Blotebook’. ELN is easy to type of course but I take your point that a new name might be helpful. We just call it ‘The Blog’.
This sounds potentially very useful. Will this be something that any lab can host on their own servers? In the long run I wonder if it would not be better to have something like this hosted by an independent third party so that the authoring and time-stamping of contributions can have more credit.
Hi Pedro, I am reliably informed by Andrew that it can be run off a laptop so yes, in principle it is transportable and can be run off a local server. In practice it’s possibly not at that stage yet, and I think some of the authentication is tied into our university system.
A third party host would indeed be good in terms of time-stamping etc. There are some possible arguments against it as well (what happens if the server goes down, security etc.), but it can be run either way.
Yes the blog can run off a laptop but is designed to be hosted by either a Lab, an Institution or indeed a third party.
We have been working on the problem of timestamps and have come to the conclusion that the third party should only be reporting a snapshot of the content and the timestamp, but the blog itself should remain as close to the source (i.e. the institution) as possible.
As institutional repositories have now taken off as a must-have, I would like to see this type of blog server follow in the IR’s footsteps.
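One possible reading of "a snapshot of the content and the timestamp" is that the third party keeps only a cryptographic digest of each post together with the time it was received, while the post itself stays on the institution's server. The sketch below illustrates that reading using SHA-256 from Python's standard library. It is an assumption made for illustration, not a description of how the blog actually handles timestamps, and the function names are hypothetical.

```python
import hashlib
from datetime import datetime, timezone


def snapshot(content, received_at=None):
    """Build the record a third party might keep: digest plus receipt time."""
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
    when = received_at or datetime.now(timezone.utc)
    return {"sha256": digest, "timestamp": when.isoformat()}


def matches(content, record):
    """Check that content held at the institution still matches the snapshot."""
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
    return digest == record["sha256"]


post = "Example post body as stored on the institution's blog server."
record = snapshot(post)

print(matches(post, record))               # True
print(matches(post + " edited", record))   # False
```

With this arrangement the third party never needs to hold private content, yet any later change to a post can be detected by comparing digests.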
The third-party time stamp is very important to us as well. Although not perfect, I’ve been reasonably happy with Google as a host (Wikispaces and Blogger). Note that the date/time stamp in Blogger can be changed by the author (like presumably any blogging software). On the other hand the Wikispaces time stamp seems very robust. Of course you get further protection through redundancy, for example by discussing and re-mixing your research in the web2 sphere and various public databases.
I look forward to seeing your system at the AHM. I have been setting up a similar pilot project for scientists at the Centre for Ecology and Hydrology, also based on blog / wiki technology. I think it will be useful to share our experiences. I am currently at a crossroads, as they say, where I am considering where to go next.
Nic
Interesting and laudable. Honestly, your list of “criteria for success” looks a lot like the one we are using for our commercial ELN (CERF by Rescentris Inc). Our product also puts a strong emphasis on semantic relationships and metadata in order to try to make a more useful product. This is not surprising since our company is made up mainly of life scientists who (like you) understand the importance of managing different kinds of data in ways that make sense to human scientists, not just database engineers. We should talk. Feel free to contact us.