Frank Gibson has posted again in our ongoing conversation about using FuGE as a data model for laboratory notebooks. We have also been discussing things by email, and I think we both agree that we need to see what actually doing this would look like. Frank is looking at putting some of my experiments into a FuGE framework and we will see how that looks. I think that is the point where we can really make some progress. However, here I wanted to pick up on a couple of points he made in his last post.
From Frank: However, there is no denying that FuGE is a data model and does not come with a high degree of tool support or nice user interfaces, which Cameron is crying out for, as are most lab scientists from a usability point of view.
Certainly, to implement FuGE we will need to provide a set of tools with interfaces that work. To a certain extent it is true that Frank is talking about data models while I am talking about interfaces. Nonetheless, any data model will influence, and indeed limit, the possibilities for user interfaces. My concern with any data model is that it will limit user interface design in a way that makes it inappropriate for use in a research laboratory.
Me: I got off to a very bad start here. I should have used the word ‘capture’. This, to me, is about capturing the data streams that come out of lab work.
Frank: This seems to be a change of tack :) The original post was about a data model for lab notebooks.
This is part of the semantic problem (pardon the pun). To me, the primary use of a lab book is to capture the processes that occur in the lab, as far as is practicable. Data stream here should be understood to mean streams from instruments, monitors, and the stream of descriptive information that comes from the scientist.
My central problem with implementing a data model for an experiment is that we do not necessarily know in advance where the boundaries of the experiment are. The key question is how FuGE works as a data model in practice, which is why we need to actually try it out and see how it fits. In a sense, the question of ‘what is the experiment?’ is similar to the one we have worked through at a practical level in deciding which elements of our record justify their own posts.
Frank divides the discussion into four different stages, or issues, and is correct in saying that we have to a large extent conflated them.
Frank again: Summary
So I will start by restating what I believe to be the areas of conflation within these discussions:
1. the representation of experiments – the data model
2. the presentation or level of abstraction to the user (probably somewhat dependent on 3.)
3. the implementation of the data model
4. the publication of the data (Notification, RSS etc.)
I don’t disagree with this, but I think there is another phase or stage: a Stage 0, where the stuff that happens in the real world is mapped onto the data model. This is the part that worries me the most. It is also an important question to deal with in a general sense. Where do we impose our data model? Before the experiment, when we are planning it, or after it has been done, when we have an understanding of what we have done and what it means? So this is where we need to see how it works in practice and go from there.
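To make Stage 0 a little more concrete, here is a very rough Python sketch of what mapping one piece of lab work onto a handful of FuGE-like concepts might look like. The class and field names are simplifications chosen purely for illustration; the real FuGE model is a detailed UML/XML schema and this is not it.

```python
# Illustrative only: a rough sketch of mapping one piece of lab work onto a
# few FuGE-like concepts (Material, Data, Protocol, ProtocolApplication).
# The real FuGE model is a detailed UML/XML schema; these simplified class
# and field names are assumptions, chosen purely for discussion.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Material:                # e.g. a sample or a buffer
    name: str
    description: str = ""

@dataclass
class Data:                    # e.g. an instrument output file
    name: str
    uri: str

@dataclass
class Protocol:                # the planned procedure
    name: str
    steps: List[str] = field(default_factory=list)

@dataclass
class ProtocolApplication:     # one actual run of that procedure in the lab
    protocol: Protocol
    inputs: List[Material] = field(default_factory=list)
    outputs: List[Data] = field(default_factory=list)
    notes: str = ""            # the free-text stream from the scientist

# Stage 0 is the judgement call made here: deciding where this 'experiment'
# starts and ends when we build the object from what happened at the bench.
run = ProtocolApplication(
    protocol=Protocol("SDS-PAGE", ["load gel", "run at 200 V", "stain"]),
    inputs=[Material("Sample A", "crude lysate")],
    outputs=[Data("gel image", "http://example.org/data/gel-001.png")],
    notes="Gel ran hot; second lane smeared.",
)
```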
Posting this from BioSysBio 2008 at Imperial College London.
Interesting discussion – another approach for Stage 0 above is just to capture plain text (+ uploaded data files etc.) in the notebook, and add structure later, as an annotation layer, rather than get researchers to design and populate the perfect data forms from the start. I think there’s a balance between the effort people will go to entering info into notebooks (designing / modifying data models is probably too much effort, unless it can be made as simple as using a word processor) and the potential reward other people will get from harvesting the nicely structured data which results.
For annotation of PDF and web pages in the browser, you might be interested in looking at http://a.nnotate.com – it lets you highlight text and add semantic tags, URLs, notes and discussions. We wrote a white paper on some extensions for migrating info from plain text documents [easy to create, hard to mine] to structured data annotations [easy to mine, hard to create] in http://www.textensor.com/enhancing-documents-2007.html
Your experience could have implications for us as well – I’m looking forward to examples of your experiments represented with FuGE and what you found easy or hard to represent.
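A minimal sketch of the ‘structure later’ idea, assuming the annotations live in a separate layer that points back into an untouched free-text entry by character offsets. This is a generic illustration, not a description of how a.nnotate.com actually stores things.

```python
# Minimal sketch of adding structure after the fact as an annotation layer
# over a plain-text notebook entry. The entry text is never modified;
# annotations point back into it by character offsets. Generic illustration
# only, not a description of a.nnotate.com's actual format.
from dataclasses import dataclass
from typing import List

@dataclass
class Annotation:
    start: int    # character offset into the entry text
    end: int
    tag: str      # e.g. 'material', 'instrument', 'measurement'
    value: str    # a normalised value or URI added by the annotator

entry = "Ran the purified lysozyme on the Akta at 1 ml/min, peak at 14.2 ml."

annotations: List[Annotation] = [
    Annotation(17, 25, "material", "lysozyme"),
    Annotation(33, 37, "instrument", "AKTA FPLC"),
    Annotation(59, 66, "measurement", "elution volume = 14.2 ml"),
]

# Harvesting the structured layer later is then straightforward:
for a in annotations:
    print(f"{a.tag}: '{entry[a.start:a.end]}' -> {a.value}")
```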
Hi Fred. In some ways that’s what we do with the current incarnation of our blog-based notebook and what Jean-Claude is doing on Wikispaces. It allows essentially free-text entry. I did have a brief look at a.nnotate.com but haven’t gone into it in any depth.
I think with all these things it is a question of balance. Some things clearly can and should be structured. I think lab blog 3.0 (we’re currently on 2.0 in effect) will move more towards providing some sort of data structure where it is appropriate, but leave the option for completely free text where it is not. What would be sensible would be to adopt structured data models where they exist and are appropriate.
And yes, the user interface eventually has to look like a word processor. I think the work being done on the Integrated Content Environment at the University of Southern Queensland is particularly interesting here.
Our experience is that it is ridiculously hard to add a semantic layer later. It’s better to try and stratify your data streams into what you can capture in a data model and what you can’t … and do both. That way you limit the amount of material you are trying to categorise, and you have something you can build automatic tools around. It’s a classic 80-20 problem. Don’t try and solve it all! (I think I’m agreeing with Cameron’s last post!)
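A rough sketch of that stratification, assuming a small structured core plus an untouched free-text remainder; the field names are hypothetical and only illustrate the split.

```python
# Rough sketch of the 80-20 stratification: capture the fields the data model
# understands, keep everything else as an untouched free-text remainder, and
# never force the remainder into the model. Field names are hypothetical.
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class StructuredPart:
    sample_id: Optional[str] = None
    instrument: Optional[str] = None
    data_file: Optional[str] = None

@dataclass
class NotebookRecord:
    structured: StructuredPart   # the part automatic tools can be built around
    free_text: str               # the part that resists the model

def capture(raw: Dict[str, str], notes: str) -> NotebookRecord:
    """Take what the model knows about; keep the rest as free text."""
    return NotebookRecord(
        structured=StructuredPart(
            sample_id=raw.get("sample_id"),
            instrument=raw.get("instrument"),
            data_file=raw.get("data_file"),
        ),
        free_text=notes,
    )

record = capture(
    {"sample_id": "S-042", "instrument": "plate reader"},
    "Third replicate looked odd; repeated with fresh buffer.",
)
```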