Google Wave in Research – Part II – The Lab Record

8 June 2009

In the previous post I discussed a workflow using Wave to author and publish a paper. In this post I want to look at the possibility of using it as a laboratory record, or more specifically as a human interface to the laboratory record. There has been much work in recent years on research portals and Virtual Research Environments. While this work will remain useful in defining use patterns and interface design, my feeling is that Wave will become the environment of choice: a framework for a virtual research environment that rewrites the rules, not so much of what is possible, but of what is easy.

Again I will work through a use case, but I want to skip over a lot of what is, I think, by now becoming fairly obvious. Wave provides an excellent collaborative authoring environment. We can explicitly state and register licenses using a robot. The authoring environment has all the functionality of a wiki built in, so we can take that as given, and granular access control means that different elements of a record can be made accessible to different groups of people. We can easily generate feeds from a single wave and aggregate content in from other feeds. The modular nature of the Wave, made up of Wavelets, themselves made up of Blips, may well make it easier to generate comprehensible RSS feeds from a wiki-like environment, something which has up until now proven challenging. I will also assume that, as seems likely, both spreadsheet and graphing capabilities will soon be available as embedded objects within a Wave.

Let us imagine an experiment of the type that I do reasonably regularly, where we use a large facility instrument to study the structure of a protein in solution. We set up the experiment by including the instrument as a participant in the wave. This participant is a Robot which fronts a web service that can speak to the data repository for the instrument. It drops into the Wave a formatted table which provides options and spaces for inputs based on a previously defined structured description of the experiment. In this case it asks for the role of this particular run (is it a background or an experimental sample?) and asks where the description of the sample is.
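Just to make this concrete, here is a rough sketch of what such a Robot might look like, written against the Python Robots client library from the early Wave tutorials. Everything here is illustrative: the robot name, the URLs, and the hard-coded template (which in practice would be fetched from the facility's structured experiment description) are my assumptions, and the API names may have shifted between sandbox releases.

```python
from waveapi import events
from waveapi import robot

# Hypothetical setup template; in a real deployment this would be
# generated from the facility's structured description of the experiment.
EXPERIMENT_TEMPLATE = (
    "Instrument run setup\n"
    "Role (background / experimental sample): \n"
    "Sample description (link to sample wavelet): \n"
    "Exposure time (s): \n"
)

def OnSelfAdded(properties, context):
  """When the instrument robot is added as a participant,
  drop a formatted setup table into a fresh blip."""
  wavelet = context.GetRootWavelet()
  blip = wavelet.CreateBlip()
  blip.GetDocument().SetText(EXPERIMENT_TEMPLATE)

if __name__ == '__main__':
  # Name and URLs are placeholders for an App Engine deployment.
  instrument_robot = robot.Robot(
      'instrument-bot',
      image_url='http://instrument-bot.appspot.com/icon.png',
      version='1',
      profile_url='http://instrument-bot.appspot.com/')
  instrument_robot.RegisterHandler(events.WAVELET_SELF_ADDED, OnSelfAdded)
  instrument_robot.Run()
```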

The purification of the protein has already been described in another wave. As part of this process a wavelet was created that represents the specific sample we are going to use. This sample can be directly referenced via a URL that points at the wavelet itself, making the sample a full member of the semantic web of objects. While the free text of the purification was being typed, another Robot, this one representing a web service interface to appropriate ontologies, automatically suggested specific terms, adding links back to the ontology where suggestions were accepted, and creating the wavelets that describe specific samples.
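A minimal sketch of that kind of term-linking Robot might watch for submitted blips and annotate any recognised terms with a link back to the ontology. The term dictionary below is a stand-in for a real ontology lookup service, the URIs are placeholders, and I am assuming the annotation and event names from the early sandbox documentation, so treat this as a sketch rather than tested code:

```python
from waveapi import document
from waveapi import events
from waveapi import robot

# Stand-in for a real ontology web service: term -> ontology URI.
ONTOLOGY_TERMS = {
    'lysozyme': 'http://example.org/ontology/lysozyme',
    'gel filtration': 'http://example.org/ontology/gel_filtration',
}

def OnBlipSubmitted(properties, context):
  """Scan a submitted blip for known terms and annotate the first
  occurrence of each with a link back to the ontology."""
  blip = context.GetBlipById(properties['blipId'])
  doc = blip.GetDocument()
  text = doc.GetText().lower()
  for term, uri in ONTOLOGY_TERMS.items():
    start = text.find(term)
    if start >= 0:
      r = document.Range(start, start + len(term))
      doc.SetAnnotation(r, 'link/manual', uri)

if __name__ == '__main__':
  ontology_robot = robot.Robot(
      'ontology-bot',
      image_url='http://ontology-bot.appspot.com/icon.png',
      version='1',
      profile_url='http://ontology-bot.appspot.com/')
  ontology_robot.RegisterHandler(events.BLIP_SUBMITTED, OnBlipSubmitted)
  ontology_robot.Run()
```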

The wavelet that defines the sample is dragged and dropped into the table for the experiment. This copying process is captured by the internal versioning system and creates, in effect, an embedded link back to the purification wave, linking the sample to the process it is being used in. It is rather too much at this stage to expect the instrument control to be driven from the Wave itself, but the Robot will sit and wait for the appropriate dataset to be generated and check with the user that it has the right one.

Once everyone is happy, the Robot will populate the table with additional metadata captured as part of the instrumental process, create a new wavelet (creating a new addressable object), and drop in the data in the default format. The robot naturally also writes a description of the relationships between all the objects in an appropriate machine-readable form (RDFa, XML, or all of the above) in a part of the Wave that the user doesn’t necessarily need to see. It may also populate any other databases or repositories as appropriate. Because the Robot knows who the user is, it can also automatically link the experimental record back to the proposal for the experiment: valuable information for the facility, but not of sufficient interest to the user for them to be bothered keeping a good record of it.
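To give a flavour of what that hidden machine-readable description might contain, here is a sketch using the Python rdflib library. The vocabulary and all the URIs are made up for illustration; in practice you would use whatever terms the community settles on:

```python
from rdflib import Graph, Literal, Namespace, URIRef

# An entirely hypothetical vocabulary, for illustration only.
LAB = Namespace('http://example.org/labrecord#')

g = Graph()
g.bind('lab', LAB)

# URLs that point at the wavelets themselves, making each object addressable.
sample = URIRef('http://wave.example.org/w+purification/sample1')
run = URIRef('http://wave.example.org/w+experiment/run1')
dataset = URIRef('http://wave.example.org/w+experiment/dataset1')
proposal = URIRef('http://facility.example.org/proposals/2009-042')

g.add((run, LAB.usedSample, sample))        # link the run back to the sample
g.add((run, LAB.producedDataset, dataset))  # and forward to the raw data
g.add((run, LAB.partOfProposal, proposal))  # and back to the facility proposal
g.add((dataset, LAB.format, Literal('instrument-default')))

print(g.serialize(format='xml'))  # RDF/XML the user never needs to see
```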

The raw data is not yet in a useful form though; we need to process it, remove backgrounds, that kind of thing. To do this we add the Reduction Robot as a participant. This Robot looks within the wave for a wavelet containing raw data, asks the user for any necessary information (where is the background data to be subtracted?), and then runs a web service that does the subtraction. It then writes out two new wavelets: one describing what it has done (with all appropriate links to the appropriate controlled vocabulary, obviously), and a second with the processed data in it.
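The web service behind such a Reduction Robot could be very simple. Here is a schematic of the subtraction step itself, assuming the data arrive as intensity-versus-q columns on a shared grid; the column layout and the scale factor for transmission normalisation are my assumptions:

```python
import numpy as np

def subtract_background(raw, background, scale=1.0):
    """Subtract a background scattering curve from a raw curve.

    Both inputs are (N, 2) arrays of (q, intensity) measured on the
    same q grid; `scale` allows for transmission normalisation.
    Returns a new (N, 2) array of (q, corrected intensity).
    """
    q_raw, i_raw = raw[:, 0], raw[:, 1]
    q_bg, i_bg = background[:, 0], background[:, 1]
    if not np.allclose(q_raw, q_bg):
        raise ValueError('raw and background must share the same q grid')
    return np.column_stack((q_raw, i_raw - scale * i_bg))
```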

I need to do some more analysis on this data, perhaps fit a model to start with, so again I add another Robot that looks for a wavelet with the correct data type, does the line fit, once again writes out a wavelet that describes what it has done, and a wavelet with the result in it. I might do this several times, using a range of different analysis approaches, perhaps doing some structural modelling and deriving some parameter from the structure which I can compare to my analytical model fit. Creating a wavelet with a spreadsheet embedded, I drag and drop the parameter from the model fit and from the structure and format the cells so that they show green if the two values are within 5% of each other.
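For concreteness, the check behind that green cell is just a relative-difference test. Measuring the five per cent against the larger of the two values is one reasonable choice among several:

```python
def within_tolerance(a, b, tol=0.05):
    """True if a and b agree to within tol, relative to the larger value."""
    return abs(a - b) <= tol * max(abs(a), abs(b))

# e.g. radius of gyration from the model fit vs. from the structure
print(within_tolerance(28.3, 29.1))  # True: the difference is about 2.7%
```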

Ok, so far so cool. Lots of drag and drop and use of groovy web services, but nothing that couldn’t be done with a bit of work with a workflow engine like Taverna and a properly set up spreadsheet. I now make a copy of the wave (properly versioned, its history clear as a branch off the original Wave) and I delete the sample from the top of the table. The Robots re-process, realize there is not sufficient data to do any processing, and so all the data wavelets and any graphs and visualizations, including my colour-coded spreadsheet, go blank. What have I done here? What I have just created is a versioned, provenanced, and shareable workflow. I can pass the Wave to a new student or collaborator simply by adding them as a participant. I can then work with them, watching as they add data, pointing out any mistakes they might make and discussing the results with them, even if they are on the opposite side of the globe. Most importantly, I can be reasonably confident that it will work for them: they have no need to download software or configure anything. All that really remains to make this truly powerful is to wrap this workflow up in a new Robot so that we can pass multiple datasets to it for processing.

When we’ve finished the experiment we can aggregate the data by dragging and dropping the final results into a new wave to create a summary we can share with a different group of people. We can tweak the figure that shows the data until we are happy and then drop it into the paper I talked about in the previous post. I’ve spent a lot of time over the past 18 months thinking and talking about how we capture what is going on and at the same time create granular web-native objects, and then link those objects together to describe the relationships between them. Wave does all of that natively, and it can do it just by capturing what the user does. The real power will lie in the web services behind the robots, but the beauty of these is that the robots will make using those existing web services much easier for the average user. The robots will observe and annotate what the user is doing, helping them to format and link up their data and samples.

Wave brings three key things: proper collaborative documents, which will encourage referring rather than cutting and pasting; proper version control for documents; and document automation through easy access to web services. Commenting, version control and provenance, and making a cut-and-paste operation actually a fully functional and intelligent embed are key to building a framework for a web-native lab notebook. Wave delivers on these. The real power comes with the functionality added by Robots and Gadgets that can be relatively easily configured to do analysis. The ideas above are just what I have thought of in the last week or so. Imagination really is the limit, I suspect.


Comments

  • Phil Wolff said:

    A primary virtue of your use case is that systems are triggering conversation. The systems bring results to users for comment in a conversational context.

You’ll also need to create filters so system alerts don’t overwhelm you (and aren’t perceived as spam) and so you only see the items you find interesting or relevant.

    Great imagining, Cameron.

  • Cameron Neylon said:

Phil, I like your concept of a workflow as a conversation between automated participants that can call out to the human user as appropriate. It seems to me to fit better with people’s natural workflow.

    Filtering will be a big issue. Managing the inbox will become a serious skill as will filtering out nonsense (spam) and things that don’t need urgent attention.

    Cheers

    Cameron

  • John Erickson said:

    Cameron, I was very interested and excited to read your thoughts.

    For further inspiration, check out the following demo videos of a conceptual prototype we did for Fractal, a research project on content-centered collaboration. The prototype was centered around the collaborative and interactive production of a monthly report:

    Fractal Conceptual Prototype: Content Spaces
    http://www.hpl.hp.com/techreports/2009/HPL-2009-64.html

    Fractal Conceptual Prototype: Extensions Marketplace:
    http://www.hpl.hp.com/techreports/2009/HPL-2009-65.html

    Fractal Conceptual Prototype: Active Behaviors:
    http://www.hpl.hp.com/techreports/2009/HPL-2009-66.html

    Sadly, the Fractal Project was recently canceled as part of a major reduction of HPLabs in the UK and a few other places… :(

    John

  • Raik said:

    Hi Cameron,
very convincing read! Thanks! My only minor objection is that you probably shouldn’t assume that the whole process of discovery and data processing will be covered by wave robots, at least not for a while. Non-wave tools will remain important. Perhaps you would still rather use your proven and customized local structure calculation tool than a remote robot…
    Anyway, how do we get this infrastructure started?
    /Raik

  • Cameron Neylon said:

Raik, definitely, adoption would be patchy and incomplete. This was more about what is possible than what we will build tomorrow. Nonetheless, building gateways from local tools to Wave should be pretty easy from what I can see. At its simplest, a watched directory could pass files into a wave via a web service, acting as a notification tool.

  • Peter Nollert said:

Great idea, and technically feasible. Adoption is really a key issue. If Waves get the same kind of adoption as MS Word, Excel and PowerPoint, Waves as electronic notebooks may seem like a no-brainer. Communication platforms come and go; the ways scientific experiments are carried out change on a slower time scale.

    Also important: free, open source & mobile.

  • Cameron Neylon said:

Peter, agreed, adoption is key, although an interesting point is that for an internally accessible system you would only need your own server, with the relevant people having accounts. I think one of the interesting things about Wave is precisely this federated nature, which means it will be able to exist in isolated pockets doing different things, very similar in fact to the way that email spread, so I am pretty confident about penetration in the long term.

    And yes, free and open source need to be a given for developments. Otherwise we’re not going to be able to build anything useful with the resources that are likely to be available.

  • Larry Weiss MD said:

    Hi Cameron,
Very intriguing ideas – how would you like to do a live demonstration at Sci Foo Camp next month? I will be there and was going to host a session on environmental microbiology. There will be lots of samples, results and, hopefully, an engaging discussion to follow. Perhaps using the Wave as the “lab notebook” would be an interesting focal point? If you are interested, please contact me.

    Larry Weiss, MD

  • Cameron Neylon said:

Larry, sorry for the delay in replying. I am very hopeful we’ll be able to do something with Wave at Scifoo – I have proposed a session but haven’t as yet managed to score a sandbox account. There was a suggestion that all campers might be able to get an account on the development server, but I haven’t as yet heard more on that. It definitely sounds like an interesting thing to try out, though. Will stay in touch as any information comes to light.

  • Bobby James said:

    Has anyone yet tried to actually take a paper from first sentence through data analysis, figure production, and at least to submission? How has it worked? Where are the rough spots? I work as an epidemiologist/statistician and am wondering about Wave for all sorts of collaborative tasks, but I have yet to assemble a set of colleagues with accounts to actually skin our knees on the task.

  • Cameron Neylon said:

Robert, I’ve played around a bit with an existing paper. This was mostly an exercise to see what kind of tools we would need to aid the process. There are still some pretty big gaps, particularly in the data handling, as well as in the logistics of moving things around (i.e. blips that you might want to re-order in the drafting process, which is currently not possible). So no, we are working towards it, but I’m not aware of it being taken the whole way yet.

I am hoping to put together a paper about building the tools to write a paper in Wave, in Wave, if that makes sense, as a way of exploring the issues.

  • Lasse Buck said:

    Hi Cameron – Great work!

When we use a data-modifying robot like the Reduction Robot, only the raw data and the resulting data are captured in the waves. The nature of the actual processing remains hidden.

In your case, the goal is to create a versioned, provenanced, and shareable workflow. In many cases, versioned also means that everything should be trackable, i.e. repeatable.
In order to achieve that, proper effort must be made to:
1. Ensure the credentials of the robots used.
2. Record the versions of the robots used, or make sure that they cannot be altered.

    I expect my considerations are also relevant in your case?
