Blogs vs Wikis and third party timestamps

I wanted to pull out some of the comments Jean-Claude Bradley has made on the e-notebook posts and think them through in more detail.

Jean-Claude's comment on this post:

There may be differences between fields but, in organic chemistry, we could not make a blog by itself work as an electronic notebook. The key problem was the assumption that an experiment could be recorded without further modification. But a lab notebook doesn’t work like that – the idea is to record work as it is being done and make observations and conclusions over time. For experiments that can take weeks, a blog post can be updated but there is no version tracking. It is thus difficult to prove who-knew-what-when. Using a wiki page as a lab notebook page gives you results in real time and a detailed trail of additions and corrections, with each version being addressable with a different url.

Thinking about this and looking at some examples on the UsefulChem Wiki, I wondered whether this is largely down to a different way of thinking about the notebook rather than to differences between fields. I will use UsefulChem Exp098 as an example.

This experiment has been modified over time and this can be tracked through the Wiki system. Now my initial reaction to this was 'but you should never modify the notebook'. In our system we originally had no means of making any changes (for precisely this reason) but eventually introduced one, because typographical errors were otherwise very annoying (and because at the moment incorporating all the links requires double editing). However, our common practice is not to modify the core of the post. Arguably this is a hangover from paper notebooks (never use whiteout, never remove pages).

In the case of the UsefulChem Wiki the rational objections to this kind of modification go away, because it is properly tracked in a transparent fashion. However I still find myself somewhat uncomfortable with the notion that it has been changed, and I wonder whether this could create an unfavourable impression in some minds. There is a good presentation with audio here in which Jean-Claude describes the benefits of this flexibility and the rationale behind it, as well as the history that brought them to the system they use.

Differences in use

The key difference between modifications in the respective systems, I think, is that in the UsefulChem case changes can be made over a period of weeks as corrections and details are sorted out. In the case of Exp098 this includes the analysis of a set of samples over the course of a week, followed by a series of further corrections over more than a month, although the main changes occur within a few weeks. Partly this reflects the nature of the experiment, which takes place over several days. We would probably handle this through multiple posts; I will try to set up a sandpit to see how we might have represented this experiment. The other element is the process of corrections and comments that are incorporated. I think we would implement this through comments in the blog rather than by correcting the post.

So a key difference here is the standard of presentation expected of the experiment record. The aim for UsefulChem seems to be to provide a 'finished' or 'polished' representation of the experiment, whereas our approach is the more traditional 'well, that's what I wrote down, so that's what is there'. The benefit of the former, as Jean-Claude points out in his talk, is that it is a great opportunity to improve the students' standards of recording as they go. In principle, if things really are brought up to standard, they can be incorporated directly into the methods section of a thesis, perhaps even a paper. In my group, however, I would do this as the methodology is transferred from the lab book (in whatever form) to the regular reports required in our department.

Third party timestamps and hosted systems

Jean-Claude’s comment again:

I think the main other issue is the third party time stamp. That’s one reason I like using a service, like Wikispaces, hosted by a large stable company. It also makes it easier for people to replicate overnight at zero cost if they are interested in trying it.

These are two good reasons for using a standard engine. Independent time stamps are very useful in demonstrating history, whether for precedence or even for patent issues. If one of the key arguments in favour of open notebook science (or at least one of the main arguments against the idea that you risk being scooped) is that it provides a means of establishing precedence, then it is important to have a reliable time stamp. I don't know what the legal status of a computer-based time stamp is, but I do wonder whether, from a legal perspective at least, an in-house time stamp in a well regulated and transparent system might be as good as (or at least no worse than) a signed and dated paper notebook. Again, however, impressions are important, and while it may be impossible for me to fake a date in our system, it doesn't mean that people will necessarily believe me when I say it.
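For what it's worth, one common way of getting an independent timestamp without handing over the data itself is to publish a cryptographic fingerprint (hash) of the entry somewhere outside your own control. The sketch below is purely illustrative (the entry text is made up and this is not something our system does); it just shows how such a fingerprint would be generated.

```python
# A minimal sketch of content fingerprinting for independent timestamping.
# The entry text is invented; nothing here is part of our actual system.
# Lodging the digest with a third party (or any public, dated venue) lets you
# later prove the entry existed, unaltered, on that date, without revealing it.

import hashlib

entry = "Example notebook entry: PCR of construct A, annealing at 55 C ..."
digest = hashlib.sha256(entry.encode("utf-8")).hexdigest()
print(digest)  # the fingerprint you would deposit with the third party
```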

The second point, that using a generic hosted system makes it much easier for other people to replicate, is also a good one. A case could be made that if my group are doing open notebook science we are doing it with a closed system, which at least partly defeats the purpose of the exercise. My answer to this is that we are trying to develop something that will take us further than a generic hosted system can – it may be possible to retro-fit what we develop into a generic blog or wiki at a later date, but currently this isn't possible. People are of course welcome to look at and use the system, this is part of what we have the grant for after all, but I recognise that this creates a barrier which we will need to overcome. I think we will just have to see how this plays out over time.

Finally…

Final comment from Jean-Claude:

But I think there is also a lot more to learn about the differences between how scientific fields (and researchers) operate. We may gain a better appreciation of this if a few of us do Open Notebook Science.

Couldn’t agree more. I have learnt a lot from doing this about how we think about experiments and how things are ordered (or not). There is a lot to learn from looking at how systems do and don’t work and how this differs in different fields.

The Southampton E-lab Blog Notebook – Part 3 Implementation

In Part 1 and Part 2 I discussed the criteria we set for our system to be successful and the broad outlines of a strategy for organisation. In this part I will outline how we apply this strategy in practice. It will not deal with the technical implementation and software design of the blog engine, which will be discussed later. I will note, however, that we are not using a standard blog engine but one that has been custom built for handling lab-based blogging.


Followup on ‘open methods’

I wanted to follow up on the post I wrote a few days ago where I quoted a post from Black Knight on the concept of making methodology open. The point I wanted to make was that scientists in general might be even more protective of their methodology than they are of their data. However I realised afterwards that I may have given the impression that I thought BK was being less open than he 'should' be, which was not my intention. Anyway, yesterday I spent several hours reading through his old posts (thoroughly enjoyable and definitely worth the effort) and discovered quite a few where he makes detailed and helpful methodological suggestions.

For example here is a post on good methods for recovery of DNA from gels as well as a rapid response to a methodological query. Here is another valuable hint on getting the best from PCR (though I would add this is more true for analytical PCR than if you just want to make as much DNA as possible). Nor is the helpful information limited just to lab methodology. Here is some excellent advice on how to give a good seminar. So here is a good example of providing just the sort of information I was writing about and indeed of open notebook science in action. I’d be interested to know how many people in the School now use the recipes suggested here.

p.s. his post Word of the week – 38

deuteragonist, n.

In a structural laboratory, one who labels his samples with ²H.

e.g. “Jill says that to be successful at small angle neutron scattering you have to be a good deuteragonist.”

c.f. protagonist

nearly got my computer completely covered in tea. And there is much more where that came from. They are probably funnier if you are an ex-pat Australian working in the UK but hey, that's life.

The Southampton E-lab blog notebook – Part 2 ELN strategy

In Part 1 I outlined our aims in building an ELN system and the criteria we hope to satisfy. In this post I will discuss the outline of the system that has been developed.

The WebLog as an ELN system

A blog is a natural system on which to build an ELN. It provides free text entry, automatic date recording, the ability to include images and other files, and a system for publishing and collecting comments and advice. In this configuration a blog can function essentially as ‘electronic paper’ that can be used as a substitute for a paper notebook.

The blog can be used in a more sophisticated fashion by adding internal links. Procedure B uses the material produced in Procedure A, therefore a link is inserted. Such links go some way towards providing metadata: 'Material from procedure A is used in procedure B'. Following the links backwards and forwards provides some measure of sample or lot tracking, an understanding of where samples have come from and where they are going.

This is a step forward but it is not ideal. It is one thing to say that Gel X has samples from PCR Y and that PCR Y was carried out with samples from Culture Z. But which samples are which? The connections between items can probably be inferred by a human reader but not by a machine. If we wish to have a system where a machine can tell that Lane 2 on Gel X is from a PCR of material taken from Culture Z three hours after induction, we need to be able to track samples through the system. For this to work the system needs to assign a unique ID to each 'sample'.

The ‘One item-one post’ model

A blog does provide a natural way of generating IDs: each post carries its own ID. This post has ID#8 in this blog (notwithstanding any pretty human-readable ID at the top of the page). Therefore if each 'sample' has its own post it automatically has an ID (not strictly a UID, as Andrew pointed out to me this morning). See here and here for examples of 'product' or sample posts (PCR product and primer respectively). Procedures take samples and process them to generate new samples, so if samples each have their own post, any procedures will also need their own post. Product posts link to the procedure post in which they are generated (and vice versa), and procedures link back to the input materials. See here for an example of a PCR reaction. By following the links it is possible to trace a sample through the system.
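To make the model concrete, here is a minimal sketch of how samples and procedures might be represented as linked posts and how a sample could then be traced back through the links. The post IDs, titles, and link structure are invented for the example; this is an illustration of the idea, not our actual blog engine.

```python
# A minimal, hypothetical sketch of the 'one item-one post' model.
# Post IDs, titles, and links are invented; this is not our actual system.

from dataclasses import dataclass, field

@dataclass
class Post:
    post_id: int                                 # the blog post ID doubles as the item ID
    kind: str                                    # 'sample' or 'procedure'
    title: str
    inputs: list = field(default_factory=list)   # post IDs of input samples (procedures only)
    outputs: list = field(default_factory=list)  # post IDs of product samples (procedures only)

# Culture Z -> PCR Y -> Gel X, with each item as its own post
posts = {
    1: Post(1, "sample", "Culture Z"),
    2: Post(2, "procedure", "PCR Y", inputs=[1], outputs=[3]),
    3: Post(3, "sample", "PCR product Y"),
    4: Post(4, "procedure", "Gel X", inputs=[3], outputs=[5]),
    5: Post(5, "sample", "Gel X, lane 2"),
}

def produced_by(sample_id, posts):
    """Return the procedure post whose outputs include this sample, if any."""
    for p in posts.values():
        if p.kind == "procedure" and sample_id in p.outputs:
            return p
    return None

def provenance(sample_id, posts, depth=0):
    """Trace a sample back through the procedures and materials behind it."""
    print("  " * depth + posts[sample_id].title)
    proc = produced_by(sample_id, posts)
    if proc is not None:
        print("  " * (depth + 1) + "made by " + proc.title)
        for input_id in proc.inputs:
            provenance(input_id, posts, depth + 2)

provenance(5, posts)
# Gel X, lane 2
#   made by Gel X
#     PCR product Y
#       made by PCR Y
#         Culture Z
```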

The concept can be taken further. By categorising samples and procedures into classes it is possible to automatically capture a great deal of metadata for the experiment. Several pieces of data are already available: 'Procedure X made Product Y using materials A and B'. By adding that procedure X is a PCR reaction and product Y is a piece of double-stranded DNA, significantly more can be inferred, e.g. 'PCR reactions (can) make double-stranded DNA' or, in a more sophisticated fashion, 'All PCR reactions contain Mg but Vent polymerase does not work in PCR reactions with MgCl2'. In the one item-one post model the blog becomes a repository of information about the relationships between items. Adding categories, or tags, to these items adds much more value to the repository. Such a data store has some of the characteristics of a triple store including, in principle, the potential for automated reasoning.
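As a rough illustration of what I mean by 'some of the characteristics of a triple store', the relationships captured by linked, categorised posts can be flattened into subject-predicate-object statements. The sketch below reuses the hypothetical structure from the previous example; the categories and predicate names are invented for the illustration.

```python
# A rough sketch of flattening categorised, linked posts into triples.
# Reuses the hypothetical 'posts' structure from the sketch above;
# the categories and predicate names are invented for the example.

def extract_triples(posts, categories):
    """Turn linked, categorised posts into (subject, predicate, object) statements."""
    triples = []
    for p in posts.values():
        if p.post_id in categories:               # class membership from the tag/category
            triples.append((p.title, "is_a", categories[p.post_id]))
        if p.kind == "procedure":                 # relationships from the links between posts
            for i in p.inputs:
                triples.append((p.title, "used_material", posts[i].title))
            for o in p.outputs:
                triples.append((p.title, "produced", posts[o].title))
    return triples

categories = {2: "PCR reaction", 3: "double-stranded DNA"}
for triple in extract_triples(posts, categories):
    print(triple)
# e.g. ('PCR Y', 'is_a', 'PCR reaction')
#      ('PCR Y', 'produced', 'PCR product Y')
#      ('PCR product Y', 'is_a', 'double-stranded DNA')
```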

Such an approach does, however, have distinct disadvantages. It requires the creation of a large number of posts, currently by hand. This creates two problems: firstly it is a lot of work, and secondly it fills the blog up with posts which, for a human reader, contain no useful information, making it quite difficult to read. Neither of these is an insurmountable problem, but they make the process of recording data more complex and less appealing to the user. The challenge therefore is to provide a system that makes this easy for the user and encourages them to provide the extra information.

In Part 3 I will start to cover the implementation of the system.

Open methods vs open data – might the former be even harder?

Continuing the discussion set off by Black Knight and continued here and by Peter Murray-Rust, I was interested in the following comment in Black Knight's follow-up post (my emphasis, and I have quoted slightly out of context to make my point).

But all that is not really what I wanted to write about now. The OpenWetWare (have you any idea how difficult it is to type that?) project is a laudable effort to promote collaboration within the life sciences. And this is cool, but then I realize that the devil is in the details.

Share my methods? Yeah! Put in some technical detail? Yea–hang on.

A lot of the debate has been about posting results and the risk of someone stealing them or otherwise using them. But in bioscience the competitive advantage that a laboratory has can lie in the methods. Little tricks that don’t necessarily make it into the methods sections of papers, that sometimes researchers aren’t even entirely aware of, but which form part of the culture of the lab.

The case for sharing methods is, at least on the surface, easier to make than the case for sharing data. A community can really benefit from having all those tips and tricks available: 'you put yours up and I'll put mine up' means everyone benefits. But if there is something that gives you a critical competitive advantage, then how easy is that going to be to give up? An old example is the 'liquid gold' transformation buffer developed by Doug Hanahan (read the story in Sambrook and Russell, third edition, p1.105 or online here – I think; it's not open access). Hanahan 'freely and generously distributed the buffer to anyone whose experiments needed high efficiencies…' (Sambrook and Russell) but he was apparently less keen to make the recipe available. And when it was published (Hanahan, 1983) many labs couldn't achieve the same efficiencies, again because of details like a critical requirement for absolutely clean glassware (how clean is clean?). How many papers these days even include or reference the protocol used for transformation of E. coli? Yet this could, and did, give a real competitive advantage to particular labs in the early 1980s.

So, if we are to make a case for making methodology open we need to tackle this. I think it is clear that making this knowledge available is good for the science community. But it could be a definite negative for specific groups and people. The challenge lies in making sure that altruistic behaviour that benefits the community is rewarded. And this won’t happen unless metrics of success and community stature are widened to include more than just publications.

A great example of ‘fun’

I wrote the other day about the idea of fun being a motivating factor in taking up open notebook science. Sometimes something is just cool and you want to share it. Then along comes a great example.

Via petermr’s blog:

At ‘Life of a Lab Rat‘:

This has got to be in the running for the coolest cloning experiment ever.

Last Tuesday a grad student in the reciprocal space cadet lab, let's call him Fu Manchu, asked me if I had any GFP. 'GFP' expands to 'green fluorescent protein'… […]

As petermr says, this is just very cool. The molecular biology is fairly conventional, but that's not the point. The point is that Black Knight did a fun experiment and felt it was worth sharing with the world. We might argue that there isn't enough methodological detail to tell us exactly what was done here, but that's a minor quibble. The important thing is that it was fun and it's out there!

Incidentally, I am writing this in the lab while waiting for PCR primers to melt so I can set up some PCR reactions (here if you want to look – though you may not be able to access this at the moment, still need to get this one to public access). I am beginning to think that one of the main issues of open notebook science for biochemistry/molecular biology may be the difficulty in using a track pad with nitrile gloves on!

The case for open notebook science

The reasons for pursuing more openness in science, from the perspective of the science and funding communities, have been well rehearsed and described elsewhere (see 3 Quarks Daily 1, 2, and 3 for an excellent overview). There are excellent discussions of where this might take us in terms of capability and in terms of the efficient re-use of government- or charity-funded research. These are the reasons why many funding organisations and government bodies are beginning to mandate open access publication and the public availability of data.

Most scientists are, I think, reasonably happy with the concept of making raw data publicly available after publication. The main reasons for resistance have more to do with the hassle involved than with any in-principle objection. However, people are much less happy with making data available before publication. The reasons why many people are worried about the push towards making data publicly available have also been discussed (see for instance comments on Corie Lok's Nature Network blog and the discussion at WYDIRD). The primary one is the fear of 'being scooped' or of being 'uncompetitive'. I think this is mostly (but not entirely) a fallacy and will come back to it in the future (after we get a paper accepted, oh the irony).

All the reasons for Open Science that others have described are good and noble. But much of this involves extra work. In particular making open notebook science work requires investment in time and tool development to get it off the ground. So what is the motivation for getting over these hurdles? Why is it worth the effort? What’s in it for me?

Well, I think the answer to this may be quite simple: it could be fun. I got into science because I like talking to people about science. I worked in two great groups, as an undergraduate and as a postgraduate, where we argued constantly over the details of results, the literature, and anything else of interest. This was great fun. Wouldn't it be so much more fun to talk with a global community of people who are interested in what we are doing? Alright, in practice no-one may be reading, but if it is up there and available then it's surely more likely that someone might read it than if it's stuck in a notebook on my desk.

There are lots of other self-serving reasons why making stuff available could be a good idea. Giving people the raw data makes procedures more repeatable. Methods papers are highly cited, so getting your methods out there means your papers will get cited more (certainly putting the data out does). It will probably get you media coverage and help in profile building. This makes your papers more likely to be accepted, your grants more likely to be funded, your promotion more likely, and all those things that make up the core of how science works in practice. But to be frank, it is worrying about all these things that I find takes a lot of the fun out of doing science. So I think it would be good to put some of it back in, and this might be a good way to start.

Open (adjective)

Open [oh-puhn] (adjective) not closed…having no means of closing or barring…relatively free of obstruction…without restrictions as to who may participate…undecided; unsettled… (from Dictionary.com)

There is a great deal of confusion out there as to what 'Open' means, especially in science. The definitions above seem particularly apposite: '…relatively free of obstruction…'. Certainly 'undecided or unsettled' seems appropriate in some cases. A journal's claim to be 'Open Access' can set off a barrage of comment in the blogosphere. Whether this makes any difference to the journal is unclear, but definitions are clearly important. If my aim here is to talk about Open Science then it is sensible to be clear about what I mean.

So the following stand as definitions until they need to be changed:

Open Access (of journals, data, or anything else really): freely available and accessible to use, re-use, re-distribute, and re-mix, subject only to a requirement to attribute the work. Essentially as described in the Berlin and Bethesda declarations. Well summarised by Chris Surridge on his blog at PLoS ONE.

Freely accessible: on the web, indexed by search engines, in a usable format, with no requirement to pay for access and no exclusion of any potential users (except perhaps for antisocial behaviour).

Open Notebook Science: This is Jean-Claude Bradley's term, which I think encompasses much of what I am interested in doing and has been pretty clearly defined (see here and here). To summarise, it means that every experiment that is done and every piece of data that is collected is placed online, in a freely accessible repository, in a timely manner. I would add something which I don't think is explicit in previous definitions but which I think is implicit in the way his group works and makes their data available: there must be space for interaction, comments, and questions from the outside world.

Open Science is really too woolly a term to mean anything much, but it encompasses the movement that is working towards more of the above throughout the science community. It's a good phrase: it captures the imagination, is evocative, and memorable. It's just too big to be pinned down. But it's a big set of ideas, so let's see where it leads us.