The triumph of document layout and the demise of Google Wave

I am frequently overly enamoured of the idea of where we might get to, forgetting that there are a lot of people still getting used to where we’ve been. I was forcibly reminded of this by Carole Goble on the weekend when I expressed a dislike of the Utopia PDF viewer, which enables active figures and semantic markup of the PDFs of scientific papers. “Why can’t we just do this on the web?” I asked, and Carole pointed out the obvious: most people don’t read papers on the web. We know it’s a functionally better and simpler way to do it, but that improvement in functionality and simplicity is not immediately clear to, or in many cases even usable by, someone who is more comfortable with the printed page.

In my defence I never got to make the second part of the argument, which is that with the new generation of tablet devices, led by the iPad, there is tremendous potential to build active, dynamic, and (under the hood, hidden from the user) semantically backed representations of papers that are both beautiful and functional. The technical means and the design basis to draw people into web-based representations of research are falling into place, and this is tremendously exciting.

However, while the triumph of the iPad in the medium term may seem assured, my record on predicting the impact of technical innovations is not so good, given Google’s decision to pull out of further development of Wave, primarily due to lack of uptake. Given that I was amongst the most bullish and positive of Wave advocates, and yet hadn’t managed to get onto the main site for perhaps a month or so, this cannot be terribly surprising, but it is disappointing.

The reasons for the lack of adoption have been well rehearsed in many places (see the Wikipedia page or Google News for criticisms): the interface was confusing, it was never clear what Wave was for, and building something useful simply demanded too much contribution from users. Nonetheless Wave remains for me an extremely exciting view of the possibilities. Above all, it was the ability for users or communities to build dynamic functionality into documents, and to make this part of the fabric of the web, that was important to me. Indeed, one of the most important criticisms for me was PT Sefton’s complaint that Wave didn’t leverage HTML formatting, that it was in a sense not a proper part of the document web ecosystem.

The key for me about the promise of Wave was its ability to interact with web-based functionality, to be dynamic; fundamentally, to treat a growing document as data and present that data in new and interesting ways. In the end this was probably too abstruse a concept for users to grab hold of. While single demonstrations were easy to put together (building graphs, showing chemistry, marking up text), the bigger picture, that this was generally possible, never made it through.

I think this is part of a bigger problem, similar to the one we experience in trying to break people out of the PDF habit: we are conceptually stuck in a world of communicating through static documents. There is an almost obsessive need to control the layout and look of documents. This can become hilarious when you see TeX users complaining about having to use Word and Word users complaining about having to use TeX for fundamentally the same reason, that they feel a loss of control over the layout of their document. Documents that move, resize, or respond really seem to put people off. I notice this myself with badly laid out pages with dynamic sidebars that shift around, inducing a strange form of motion sickness.

There seems to be a higher aesthetic bar that needs to be reached for dynamic content, something that has rarely been achieved on the web until recently, and virtually never in the presentation of scientific papers. While I philosophically disagree with Apple’s iron grip over their presentation ecosystem, I have to admit that it has made it easier, if not quite yet automatic, to build beautiful, functional, and dynamic interfaces.

The rapid development of tablets that we can expect, as rough and ready but more flexible and open platforms do battle with the closed but elegant and safe environment provided by the iPad, offers a real possibility that we can overcome this psychological hurdle. Does this mean that we might finally see the end of the hegemony of the static document, that we can finally consign the PDF to the dustbin of temporary fixes where it belongs? I’m not sure I want to stick my neck out quite so far again, quite so soon, and say that this will happen, or offer a timeline. But I hope it does, and I hope it does soon.


Writing a Wave Robot – Some thoughts on good practice for Research Robots

ChemSpidey lives! Even in the face of Karen James’ heavy irony I am still amazed that someone like me, with very little programming experience, was able to pull together something that actually worked effectively in a live demo. As long as you’re not actively scared of trying to put things together, it is becoming relatively straightforward to build tools that do useful things. Building ChemSpidey relied heavily on existing services and other people’s code, but pulling that together was a relatively straightforward process. The biggest problems were working around the strange, and in most cases undocumented, behaviour of some of the pieces I used. So what is ChemSpidey?

ChemSpidey is a Wave robot that can be found at chemspidey@appspot.com. The code repository is available at Github and you should feel free to re-use it in any way you see fit, although I wouldn’t really recommend that at the moment; it isn’t exactly the highest quality code. One of the first applications I see for Wave is to make it easy to author (semi-)semantic documents which link objects within the document to records on the web. In chemistry it would be helpful to link the names of compounds through to records about those compounds in the relevant databases.

If ChemSpidey is added to a wave it watches for text of the form “chem[ChemicalName{;weight {m}g}]” where the curly bracketed parts are optional. When a blip is submitted by hitting the “done” button, ChemSpidey searches through the blip looking for this text and, if it finds it, strips out the name and sends it to the ChemSpider SimpleSearch service. ChemSpider returns a list of database ids and the robot currently just pulls the top one off the list and adds the text ChemicalName (csid:####) to the wave, where the id is linked back to ChemSpider. If a weight is present, it asks the ChemSpider MassSpec API for the nominal molecular weight, calculates the number of moles, and inserts that. You can see video of it working here (look along the timeline for the ChemSpidey tag).
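For the curious, here is a minimal sketch of the parsing and mole-calculation steps described above. To be clear, this is an illustration, not ChemSpidey’s actual code: the regular expression and function names are mine, and where the real robot fetches the molecular weight from ChemSpider, this sketch just takes it as an argument.

```python
import re

# Hypothetical sketch of the "chem[Name; weight X mg]" parsing described
# above; the pattern and names are illustrative, not from the repository.
CHEM_TAG = re.compile(r"chem\[([^;\]]+)(?:;\s*weight\s+([\d.]+)\s*m?g)?\]")

def parse_chem_tags(blip_text):
    """Return (name, weight_mg) pairs for each chem[...] tag in the text."""
    results = []
    for match in CHEM_TAG.finditer(blip_text):
        name = match.group(1).strip()
        weight_mg = float(match.group(2)) if match.group(2) else None
        results.append((name, weight_mg))
    return results

def millimoles(weight_mg, molecular_weight):
    """Convert a weight in milligrams to millimoles."""
    return weight_mg / molecular_weight

print(parse_chem_tags("Dissolve chem[benzene; weight 78 mg] in toluene."))
# -> [('benzene', 78.0)]
print(round(millimoles(78.0, 78.11), 3))  # -> 0.999 (i.e. about 1 mmol)
```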


What have I learned? Well, some stuff that is probably obvious to anyone who is a proper developer. Use the current version of the API. Google AppEngine pushes strings around as unicode, which broke my tests because I had developed things using standard Python strings. But I think it might be useful to start drawing some more general lessons about how best to design robots for research, so to kick off the discussion here are my thoughts, many of which came out of discussions with Ian Mulvany as we prepared for last week’s demo.

  1. Always add a Welcome Blip when the Robot is added to a wave. This makes the user confident that something has happened, lets you notify users if a new version has been released, which might change the way the robot works, and lets you provide some short instructions. It’s good to include a version number here as well.
  2. Have some help available. Ian’s Janey robot responds to the request (janey help) in a blip with an extended help blip explaining context. Blips are easily deleted later if the user wants to get rid of them, and putting the help in separate blips keeps it out of the main document.
  3. Where you modify text, leave an annotation. I’ve only just started to play with annotations but it seems immensely useful to at least attempt to leave a trace of what you’ve done, making it easy for your own Robot, other Robots, or just human users to see who did what. I would suggest leaving annotations that identify the robot, include any text that was parsed, and ideally provide some domain information. We need to discuss how to set up some namespaces for this.
  4. Try to isolate the “science handling” from the “wave handling”. ChemSpidey mixes up a lot of things into one Python script. Looking back at it now, it makes much more sense to isolate the interaction with the wave from the routines that parse text or do mole calculations. This means both that the different levels of code become easier for others to re-use, and that if Wave doesn’t turn out to be the one system to rule them all, we can still re-use the code (see the sketch below). I am no architecture expert and it would be good to get some clues from good ones about how best to separate things out.
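To make point 4 concrete, here is a minimal sketch of that separation. The layering and all the names are mine, invented for illustration, and the Wave layer is mocked with a stand-in class rather than the real robots API.

```python
# "Science handling": pure logic with no Wave dependencies, so it can
# be unit tested and re-used even if Wave goes away.
def millimoles(weight_mg, molecular_weight):
    return weight_mg / molecular_weight

# "Wave handling": a thin adapter between Wave events and the science
# code. Blip here is a stand-in for the real Wave API object.
class Blip(object):
    def __init__(self, text):
        self.text = text

def on_blip_submitted(blip, weight_mg, molecular_weight):
    # The Wave layer only shuttles text around; all calculation is
    # delegated to the science layer above.
    blip.text += " (%.3f mmol)" % millimoles(weight_mg, molecular_weight)

blip = Blip("Dissolve 10 mg of benzene")
on_blip_submitted(blip, weight_mg=10.0, molecular_weight=78.11)
print(blip.text)  # -> Dissolve 10 mg of benzene (0.128 mmol)
```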

These are just some initial thoughts from a very novice Python programmer. My code satisfies essentially none of these suggestions but I will make a concerted attempt to improve on that. What I really want to do is kick the conversation on from where we are at the moment, which is basically playing around, into how we design an architecture that allows rapid development of useful and powerful functionality.

Reflecting on a Wave: The Demo at Science Online London 2009

Yesterday, along with Chris Thorpe and Ian Mulvany, I was involved in what I imagine might be the first of a series of demos of Wave as it could apply to scientists, and researchers more generally. You can see the backup video I made, in case we had no network, on Viddler. I’ve not really done a demo like that live before, so it was a bit difficult to tell how it was going from the inside, but although much of the tweetage was apparently underwhelmed, the direct feedback afterwards was very positive and perceptive.

I think we struggled to get across an idea of what Wave is, which confused a significant proportion of the audience, particularly those who weren’t already aware of it or who didn’t have a preconceived idea of what it might do for them. My impression was that those in the audience who were technically oriented were excited by what they saw. If I were to do a demo again I would focus more on telling a story about writing a paper, really giving people a context for what is going on. One problem with Wave is that it is easy to end up with a document littered with chat blips, and I think this confused an audience more used to thinking about documents.

The other problem is perhaps that a bunch of things “just working” is underwhelming when people are used to the idea of powerful applications that they do their work in. Developers get that all of this is happening and working in a generic environment, not a special purpose-built one, and that is exciting. Users just expect things to work, or they’re not interested. Especially scientists. And it would be fair to say that the robots we demonstrated, mostly the work of a few hours or a few days, aren’t incredibly impressive on the surface. In addition, when it is working at its best, the success of Wave is that it can make things look easy, if not yet polished. Because it looks easy, people assume it is, and so not worth getting excited about. The point is not that it is possible to automatically mark up text, pull in data, and then process it. It is that you can do this effectively in your email inbox with unrelated tools that are pretty easy to build, or at least adapt. But we also clearly need some flashier demos for scientists.

Ian pulled off a great coup in my view by linking up the output of one Robot to a visualization provided by another. Ian has written a robot called Janey which talks to the Journal/Author Name Estimator service. It can either suggest what journal to send a paper to based on the abstract or suggest other articles of interest. Ian had configured the Robot the night before so it could also get the co-authorship graph for a set of papers and put that into a new blip in the form of a list of edges (or co-authorships).

The clever bit was that Ian had found another Robot, written by someone entirely different, that visualizes connection graphs. Ian set the blip that Janey wrote to as one that the Graph robot was watching, and the automatically pulled data was automatically visualized [see a screencast here]. Two Robots written by different people for different purposes can easily be hooked up together and just work. I’m not even sure whether Ian had had a chance to test it prior to the demo… but it looked easy, so why wouldn’t people expect two data processing tools to work seamlessly together? I mean, it should just work.
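The pattern is worth spelling out. Below is a toy simulation of it: one robot writes graph edges into a blip, and a second robot watching that blip renders whatever appears there. Everything here, the Blip class included, is a stand-in for the real Wave events API; it only shows the shape of the interaction.

```python
# Toy simulation of robot chaining: the output blip of one robot is
# the watched input of another. Blip stands in for the Wave machinery.
class Blip(object):
    def __init__(self):
        self.text = ""
        self.watchers = []

    def write(self, text):
        self.text = text
        for robot in self.watchers:  # notify watching robots, as Wave would
            robot(self)

def janey_robot(blip, edges):
    """Write co-authorship edges into the blip, one 'A -- B' per line."""
    blip.write("\n".join("%s -- %s" % pair for pair in edges))

def graph_robot(blip):
    """Visualize (here, just print) whatever edge list appears."""
    print("Graph robot saw %d edges:" % len(blip.text.splitlines()))
    print(blip.text)

blip = Blip()
blip.watchers.append(graph_robot)
janey_robot(blip, [("Smith", "Jones"), ("Jones", "Mulvany")])
```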

The idea of a Wave as a data processing workflow was implicit in what I have written previously, but Ian’s demo, and a later conversation with Alan Cann, really sharpened that up in my mind. Alan was asking about different visual representations of a wave. The current client essentially uses the visual metaphor of an email system. One of the points that came out of the demo for me is that it will probably be necessary to write specific clients that make sense for specific tasks. Alan asked about the idea of a Yahoo Pipes type of interface. This suggests a different way of thinking about Wave: instead of a set of text or media elements, it becomes a way to wire up Robots, automated connections to web services. Essentially, with a set of Robots and an appropriate visual client, you could build a visual programming engine, a web service interface, or indeed a visual workflow editing environment.

The Wave client has to walk a very fine line between presenting a view of the Wave that the target user can understand and work with, and constraining the user’s thinking about what can be done. The amazing thing about Wave as a framework is that these things are not only doable but often very easy. The challenge is actually thinking laterally enough to even ask the question in the first place. The great thing about a public demo is that the challenges you get from the audience make you look at things in different ways.

Allyson Lister blogged the session, there was a FriendFeed discussion, and there should be video available at some point.

Sci – Bar – Foo etc. Part III – Google Wave Session at SciFoo

Google Wave has got an awful lot of people quite excited. And others are more sceptical. A lot of SciFoo attendees were therefore very excited to be able to get an account on the developer sandbox as part of the weekend. At the opening plenary Stephanie Hannon gave a demo of Wave and, although there were numerous things that didn’t work live, that was enough to get more people interested. On the Saturday morning I organized a session to discuss what we might do and also to provide an opportunity for people to talk about technical issues. Two members of the wave team came along and kindly offered their expertise, receiving a somewhat intense grilling as thanks for their efforts.

I think it is now reasonably clear that there are two short to medium term applications for Wave in the research process. The first is the collaborative authoring of documents and the conversations around them. The second is the use of Wave as a recording and analysis platform. Both types of functionality were discussed, with many ideas for each. Martin Fenner has also written up some initial impressions.

Naturally we recorded the session in Wave and, even as I type, over a week later, there is a conversation going on in real time about the details of taking things forward. There are many things to get used to, not least when it is polite to delete other people’s comments and clean them up, but the potential (and the weaknesses and areas for development) are becoming clear.

I’ve pasted our functionality brainstorm at the bottom to give people an idea of what we talked about, but the discussion was very wide ranging. Functionality divided into a few categories. Firstly, Robots for bringing scientific objects, chemical structures, DNA sequences, biomolecular structures, videos, and images, into the wave in a functional form with links back to a canonical URI for the object. In its simplest form this might just provide a link back to a database. So typing “chem:benzene” or “pdb:1ecr” would trigger a robot to insert a link back to the database entry. More complex robots could insert an image of the chemical (or protein structure), or perhaps RDF or microformats that provide a more detailed description of the molecule.
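A sketch of the simplest version of this idea follows. The token pattern and the URL templates are assumptions for illustration, not necessarily the canonical entry points for either database.

```python
import re

# Replace "chem:benzene" / "pdb:1ecr" style tokens with links back to
# the relevant database. The URL templates are illustrative assumptions.
LINK_TEMPLATES = {
    "chem": "http://www.chemspider.com/Search.aspx?q=%s",
    "pdb": "http://www.rcsb.org/structure/%s",
}
TOKEN = re.compile(r"\b(chem|pdb):([A-Za-z0-9]+)")

def linkify(text):
    def replace(match):
        prefix, identifier = match.groups()
        return "%s (%s)" % (identifier, LINK_TEMPLATES[prefix] % identifier)
    return TOKEN.sub(replace, text)

print(linkify("We looked at chem:benzene bound in pdb:1ecr."))
# -> We looked at benzene (http://www.chemspider.com/Search.aspx?q=benzene)
#    bound in 1ecr (http://www.rcsb.org/structure/1ecr).
```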

Taking this one step further, we also explored the idea of pulling data or status information from laboratory instruments to create a “laboratory dashboard”, and perhaps controlling them. This discussion was helpful in getting a feel for what Wave can and can’t do, as well as how different functionalities are best implemented. A robot can be built to populate a wave with information or data from laboratory instruments, and such a robot could in principle also pass information from the wave back to the instrument. However, both of these will still require some form of client running on the instrument side that is capable of talking to the robot web service, so the actual problem of interfacing with the instrument will remain. We can hope that instrument manufacturers might think of writing out nice simple XML log files at some point, but in the meantime this is likely to involve hacking things together. If you can manage this, then a Gadget will provide a nice way of providing a visual dashboard-type interface to keep you updated as to what is happening.
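As a sketch of what that instrument-side client might do, assuming the manufacturer did write out a nice simple XML log (the schema below is invented for illustration), the parsing half is almost trivial; the resulting dictionary is what the client would post to the robot’s web service.

```python
import xml.etree.ElementTree as ET

# Parse an invented instrument log format into a status payload that an
# instrument-side client could POST to the dashboard robot's endpoint.
LOG = """
<run instrument="SAXS-1" sample="lysozyme-03">
  <status>collecting</status>
  <frames>128</frames>
</run>
"""

def parse_instrument_log(xml_text):
    run = ET.fromstring(xml_text)
    return {
        "instrument": run.get("instrument"),
        "sample": run.get("sample"),
        "status": run.findtext("status"),
        "frames": int(run.findtext("frames")),
    }

print(parse_instrument_log(LOG))  # this dict would go to the robot
```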

Sharing data analysis is of significant interest to me, and the fact that there is already a robot (called Monty) that will interpret Python is a very interesting starting point for exploring this. There is also some basic graphing functionality (Graphy, naturally). For me this is where some of the most exciting potential lies: not just sharing printouts or the results of data analysis procedures, but the details of the data and a live representation of the process that led to the results. Expect much more from me on this in the future as we start to take it forward.

The final area of discussion, and the one we probably spent the most time on, was looking at Wave in the authoring and publishing process. Formatting of papers, sharing of live diagrams and charts, automated reference searching and formatting, submission processes, both to journals and to other repositories, and even the running of the peer review process were all discussed. This is the area where the most obvious and rapid gains can be made. In a very real sense Wave was designed to remove the classic problem of sending manuscript versions with multiple figure and data files around by email, so you would expect it to solve a number of the obvious problems. The interesting thing in my view will be to try it out in anger.

Which was where we finished the session. I proposed the idea of writing a paper, in Wave, about the development and application of the tools needed to author papers in Wave. As well as the technical side, such a paper would discuss the user experience and any social issues that arise out of such a live collaborative authoring experience. If it were possible to run an actual peer review process in Wave that would also be very cool; however, it might not be feasible given existing journal systems. If not, we will run a “mock” peer review process and look at how that works. If you are interested in being involved, drop a note in the comments, or join the Google Group that has been set up for discussions (or, if you have a developer sandbox account and want access to the Wave, drop me a line).

There will be lots of details to work through, but the overall feel of the session for me was very exciting and very positive. There will clearly be technical and logistical barriers to be overcome, not least that a significant quantity of legacy tooling may not be a good fit for Wave. Some architectural thinking on how to most effectively re-use existing code may be required. But overall the problem seems to be where to start on the large set of interesting possibilities. And that seems a good place to be with any new technology.


Google Wave in Research – Part II – The Lab Record

In the previous post I discussed a workflow using Wave to author and publish a paper. In this post I want to look at the possibility of using it as a laboratory record, or more specifically as a human interface to the laboratory record. There has been much work in recent years on research portals and Virtual Research Environments. While this work will remain useful in defining use patterns and interface design, my feeling is that Wave will become the environment of choice: a framework for a virtual research environment that rewrites the rules, not so much of what is possible, but of what is easy.

Again I will work through a use case, but I want to skip over a lot of what is by now, I think, becoming fairly obvious. Wave provides an excellent collaborative authoring environment, and we can explicitly state and register licenses using a robot. The authoring environment has all the functionality of a wiki already built in, so we can take that as given, and granular access control means that different elements of a record can be made accessible to different groups of people. We can easily generate feeds from a single wave and aggregate content in from other feeds. The modular nature of the Wave, made up of Wavelets, themselves made up of Blips, may well make it easier to generate comprehensible RSS feeds from a wiki-like environment, something which has until now proven challenging. I will also assume that, as seems likely, both spreadsheet and graphing capabilities will soon be available as embedded objects within a Wave.

Let us imagine an experiment of the type that I do reasonably regularly, where we use a large facility instrument to study the structure of a protein in solution. We set up the experiment by including the instrument as a participant in the wave. This participant is a Robot which fronts a web service that can speak to the data repository for the instrument. It drops into the Wave a formatted table which provides options and spaces for inputs, based on a previously defined structured description of the experiment. In this case it calls for a role for this particular run (is it a background or an experimental sample?) and asks where the description of the sample is.

The purification of the protein has already been described in another wave. As part of this process a wavelet was created that represents the specific sample we are going to use. This sample can be directly referenced via a URL that points at the wavelet itself, making the sample a full member of the semantic web of objects. While the free text of the purification was being typed, another Robot, this one representing a web service interface to appropriate ontologies, automatically suggested specific terms, adding links back to the ontology where suggestions were accepted and creating the wavelets that describe specific samples.

The wavelet that defines the sample is dragged and dropped into the table for the experiment. This copying process is captured by the internal versioning system and in effect creates an embedded link back to the purification wave, linking the sample to the process in which it is being used. It is rather too much at this stage to expect the instrument control to be driven from the Wave itself, but the Robot will sit and wait for the appropriate dataset to be generated and check with the user that it has got the right one.

Once everyone is happy, the Robot will populate the table with additional metadata captured as part of the instrumental process, create a new wavelet (creating a new addressable object), and drop in the data in the default format. The robot naturally also writes a description of the relationships between all the objects in an appropriate machine-readable form (RDFa, XML, or all of the above) in a part of the Wave that the user doesn’t necessarily need to see. It may also populate any other databases or repositories as appropriate. Because the Robot knows who the user is, it can also automatically link the experimental record back to the proposal for the experiment: valuable information for the facility, but not of sufficient interest to the user for them to be bothered keeping a good record of it.

The raw data is not yet in a useful form though; we need to process it, remove backgrounds, that kind of thing. To do this we add the Reduction Robot as a participant. This Robot looks within the wave for a wavelet containing raw data, asks the user for any necessary information (where is the background data to be subtracted?), and then runs a web service that does the subtraction. It then writes out two new wavelets: one describing what it has done (with links to the appropriate controlled vocab, obviously), and a second with the processed data in it.
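A toy sketch of the Reduction Robot’s core step might look like the following, under the simplification that datasets are just lists of intensities and wavelet identifiers are plain strings; the real robot would front a web service that understands the instrument’s formats.

```python
# Background subtraction plus a machine-readable provenance record,
# mirroring the two wavelets the Reduction Robot writes out.
def subtract_background(raw, background):
    processed = [r - b for r, b in zip(raw, background)]
    provenance = {
        "operation": "background subtraction",
        "inputs": ["wavelet:raw-data", "wavelet:background"],  # illustrative ids
        "output": "wavelet:processed-data",
    }
    return processed, provenance

raw = [10.0, 12.0, 14.0]
background = [1.0, 2.0, 3.0]
processed, provenance = subtract_background(raw, background)
print(processed)   # -> [9.0, 10.0, 11.0]
print(provenance)  # the "what I did" wavelet, in dictionary form
```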

I need to do some more analysis on this data, perhaps fit a model to start with, so again I add another Robot that looks for a wavelet with the correct data type, does the line fit, and once again writes out a wavelet that describes what it has done and a wavelet with the result in it. I might do this several times, using a range of different analysis approaches, perhaps doing some structural modelling and deriving some parameter from the structure which I can compare to my analytical model fit. Creating a wavelet with a spreadsheet embedded, I drag and drop the parameter from the model fit and from the structure, and format the cells so that they show green if the two are within 5% of each other.
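The fit and the 5% agreement check are both small pieces of logic. A self-contained sketch, using a plain least-squares line fit so as not to assume any particular library, might be:

```python
# Least-squares line fit y = m*x + c, plus the "green cell" tolerance
# check that compares the fitted parameter against the structural one.
def fit_line(xs, ys):
    n = float(len(xs))
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    m = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    c = (sy - m * sx) / n
    return m, c

def within_tolerance(a, b, tolerance=0.05):
    """True if a and b agree to within the given fractional tolerance."""
    return abs(a - b) <= tolerance * max(abs(a), abs(b))

m, c = fit_line([0, 1, 2, 3], [0.1, 2.0, 4.1, 5.9])
print(round(m, 2), round(c, 2))   # -> 1.95 0.1
print(within_tolerance(m, 2.04))  # -> True: the cell goes green
```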

OK, so far so cool. Lots of drag and drop and use of groovy web services, but nothing that couldn’t be done with a bit of work with a workflow engine like Taverna and a properly set up spreadsheet. I now make a copy of the wave (properly versioned, its history clear as a branch off the original Wave) and I delete the sample from the top of the table. The Robots re-process, realize there is not sufficient data to do any processing, and so all the data wavelets and any graphs and visualizations, including my colour-coded spreadsheet, go blank. What have I done here? What I have just created is a versioned, provenanced, and shareable workflow. I can pass the Wave to a new student or collaborator simply by adding them as a participant. I can then work with them, watching as they add data, pointing out any mistakes they might make, and discussing the results with them, even if they are on the opposite side of the globe. Most importantly, I can be reasonably confident that it will work for them: they have no need to download software or configure anything. All that really remains to make this truly powerful is to wrap this workflow into a new Robot so that we can pass multiple datasets to it for processing.

When we’ve finished the experiment we can aggregate the data by dragging and dropping the final results into a new wave, creating a summary we can share with a different group of people. We can tweak the figure that shows the data until we are happy and then drop it into the paper I talked about in the previous post. I’ve spent a lot of time over the past 18 months thinking and talking about how we capture what is going on, create granular web-native objects at the same time, and then link those objects together to describe the relationships between them. Wave does all of that natively, and it can do it just by capturing what the user does. The real power will lie in the web services behind the robots, but the beauty of the robots is that they will make using those existing web services much easier for the average user. The robots will observe and annotate what the user is doing, helping them to format and link up their data and samples.

Wave brings three key things: proper collaborative documents, which will encourage referring rather than cutting and pasting; proper version control for documents; and document automation through easy access to web services. Commenting, version control and provenance, and making a cut and paste operation actually a fully functional and intelligent embed are key to building a framework for a web-native lab notebook, and Wave delivers on them. The real power comes with the functionality added by Robots and Gadgets that can be relatively easily configured to do analysis. The ideas above are just what I have thought of in the last week or so. Imagination really is the limit, I suspect.

Google Wave in Research – the slightly more sober view – Part I – Papers

I, and many others, have spent the last week thinking about Wave, and I have to say that I am getting more, rather than less, excited about the possibilities it represents. All of the below will have to remain speculation for the moment, but I want to walk through two use cases and identify how the concept of a collaborative automated document will have an impact. In this post I will start with the drafting and publication of a paper, because it is an easier step to think about. In the next post I will move on to the use of Wave as a laboratory recording tool.

Drafting and publishing a paper via Wave

I start drafting the text of a new paper. As I do this I add the Creative Commons robot as a participant. The robot asks what license I wish to use and then provides a stamp, linked back to the license terms. When a new participant adds text or material to the document, they will be asked whether they are happy with the license, and their agreement will be registered within a private blip within the Wave, controlled by the Robot (probably called CC-bly, pronounced see-see-bly). The robot may also register the document with a central repository of open content. A second robot could notify the authors’ respective institutional repositories, creating a negative-click repository in, well, one click. More seriously, this would allow the IR to track, and if appropriate modify, the document, as well as harvest its content and metadata automatically.

I invite a series of authors to contribute to the paper and we start to write. Naturally the inline commenting and collaborative authoring tools get a good workout, and it is possible to watch the evolution of specific sections with the playback tool. The authors are geographically distributed, but we can organize scheduled hacking sessions with inline chat to work on sections of the paper. As we start to add references, the Reference Formatter gets added (not sure whether this is a Robot or a Gadget, but it is almost certainly called “Reffy”). The formatter automatically recognizes text of the form (Smythe and Hoofback 1876), searches the Citeulike libraries of the authors for the appropriate reference, adds an inline citation, and places a formatted reference in a separate Wavelet to keep it protected from random edits. Chunks of text can be collected from reports or theses in other Waves, and the tracking system notes where they have come from, maintaining the history of the whole document and its sources and checking licenses for compatibility. Terminology checkers can be run over the document, based on the existing Spelly extension (although at the moment this works on the internal, not the external, API – Google say they are working to fix that), to check for incorrect or ambiguous use of terms, or to identify gene names, structures, etc. and automatically format them and link them to the reference database.
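The recognition step Reffy would need is straightforward. Here is a minimal sketch, with a deliberately simplified pattern (real citation styles are far messier, and the library lookup is left out):

```python
import re

# Find author-year citations of the form (Smythe and Hoofback 1876) so
# they can be matched against the authors' reference libraries.
CITATION = re.compile(
    r"\(([A-Z][A-Za-z]+(?:\s+and\s+[A-Z][A-Za-z]+)?)\s+(\d{4})\)"
)

def find_citations(text):
    """Return (authors, year) pairs for each inline citation found."""
    return [(m.group(1), int(m.group(2))) for m in CITATION.finditer(text)]

text = "As shown previously (Smythe and Hoofback 1876), the effect is robust."
print(find_citations(text))  # -> [('Smythe and Hoofback', 1876)]
```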

It is time to add some data and charts to the paper. The actual source data are held in an online spreadsheet. A chart/graphing widget is added to the document and formats the data into a default graph, which the user can then modify as they wish. The link back to the live data is of course maintained. Ideally this will trigger the CC-bly robot to ask the user whether they wish to dedicate the data to the Public Domain (thereby satisfying both the Science Commons Data protocol and the Open Knowledge Definition – see how smoothly I got that in?). When the user says yes (being a right-thinking person), the data is marked with the chosen waiver/dedication, and CKAN is notified and a record created of the new dataset.

The paper is cleaned up, and informal comments can be easily obtained by adding colleagues to the Wave. Submission is as simple as adding a new participant, the journal robot (PLoSsy, obviously), to the Wave. The journal is running its own Wave server, so referees can be given anonymous accounts on that system if they choose. Review can happen directly within the document, as a conversation between authors, reviewers, and editors. You don’t need to wait for some system to aggregate a set of comments and send them in one hit; you can deal with issues directly, in conversation with the people who raise them. In addition, the contribution of editors and referees to the final document is explicitly tracked. Because the journal runs its own server, not only can the referees and editors have private conversations that the authors don’t see, those conversations need never leave the journal server and are as secure as they can reasonably be expected to be.

Once accepted, the paper is published simply by adding a new participant. What would traditionally happen at this point is that a completely new typeset version would be created, breaking the link with everything that has gone before. This could be done by creating a new Wave with just the finalized version visible and all comments stripped out. What would be far more exciting would be for a formatted version to be created which retained the entire history. A major objection to publishing referees’ comments is that they refer to the unpublished version; here the reader can see the comments in context and come to their own conclusions. Before publishing, any inline data will need to be harvested and placed in a reliable repository, along with any other additional information. Supplementary information can simply be hidden under “folds” within the document rather than buried in separate documents.

The published document is then a living thing. The canonical “as published” version is clearly marked, but the functionality for comments or updates or complete revisions is built in. The modular XML nature of the Wave provides a natural means of citing a specific portion of the document. In the future, citations to a specific point in a paper could be marked, again via a widget or robot, to provide a back link to the citing source. Harvesters can traverse this graph of links in both directions, easily wiring up the published data graph.

Based on the currently published information, none of the above is even particularly difficult to implement. Much of it will require some careful study of how the workflows operate in practice, and there will likely be issues of collisions and complications, but most of the above is simply based on the functionality demonstrated at the Wave launch. The real challenge will lie in integration with existing publishing and document management systems, and in the subtle social implications of changing the way that authors, referees, editors, and readers interact with the document. Should readers be allowed to comment directly in the Wave, or should that be in a separate Wavelet? Will referees want to be anonymous, and will authors be happy to see the history made public?

Much will depend on how reliable and how responsive the technology really is, as well as how easy it is to build the functionality described above. But the bottom line is that this is the result of about four days’ occasional idle thinking about what can be done. When we really start building and realizing what we can do, that is when the revolution will start.

Part II is here.

OMG! This changes EVERYTHING! – or – Yet Another Wave of Adulation

Yes, I’m afraid it’s yet another over-the-top response to yesterday’s big announcement of Google Wave, the latest paradigm-shifting, gob-smackingly brilliant piece of technology (or PR, depending on your viewpoint) out of Google. My interest, however, is pretty specific: how can we leverage it to help us capture, communicate, and publish research? And my opinion is that this is absolutely game changing. It makes a whole series of problems simply go away, and potentially provides a route to solving many of the problems that I was struggling to see how to manage.

Firstly, let’s look at the grab bag of generic issues that I’ve been thinking about. Most recently I wrote about how I thought the big deal was not “real time” but giving users back control over the timeframe in which streams come to them. I had some vague ideas about how this might look, but Wave has working code. When the people you are in conversation with are online and looking at the same wave, they will see modifications in real time. If they are not in the same document they will see the comments or changes later, but they can also “re-play” changes. A lot of thought has clearly gone into the default views based on when and how a person first comes into contact with a document.

Another issue that has frustrated me is the divide between wikis and blogs. Wikis generally have better editing functionality, but blogs have workable RSS feeds; wikis have more plugins, but blogs map better onto the diary style of a lab notebook. None of these were ever fundamental philosophical differences, just historical differences of implementation and developer priorities. Wave makes most of these differences irrelevant by creating a collaborative document framework that easily incorporates much of the best of all of these tools within a high-quality rich text and media authoring platform. Bringing in content looks relatively easy, and pushing content out in different forms also seems to be pretty straightforward. Streams, feeds, and other outputs, if not native, look to be easily generated either directly or by passing material to other services. The Waves themselves are XML, which should enable straightforward parsing and tweaking with existing tools as well.
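That last point deserves a tiny illustration. The schema below is invented (the real Wave XML differs), but it shows the kind of processing that ordinary tooling makes trivial once the document is structured data:

```python
import xml.etree.ElementTree as ET

# Walk a (hypothetical) wavelet document and pull out each blip's
# author and text, the sort of tweaking standard XML tools make easy.
WAVE = """
<wavelet id="example+conv">
  <blip author="cameron">Draft of the introduction.</blip>
  <blip author="ian">Added the co-authorship graph.</blip>
</wavelet>
"""

root = ET.fromstring(WAVE)
for blip in root.findall("blip"):
    print("%s: %s" % (blip.get("author"), blip.text))
```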

One thing I haven’t written much about, but have been thinking about, is the process of converting lab records into reports and on into papers. While there wasn’t much on display about complex documents, a lot of just plain nice functionality (drag and drop links, options for incorporating and embedding content) was at least touched on. Looking a little closer at the documentation, there seems to be quite a strong provenance model, built on a code-repository-style framework for handling document versioning and forking. All good steps in the right direction, and with open APIs and multitouch as standard on the horizon there will no doubt be excellent visual organization and authoring tools along very soon now. For those worried about security and control, a 30-second mention in the keynote basically made it clear that they have it sorted. Private messages (documents? mecuments?) need never leave your local server.

Finally, the big issue for me has for some time been bridging the gap between the unstructured capture of streams of events and making it easy to convert those into structured descriptions of the interpretation of experiments. The audience was clearly wowed by the demonstration of inline real-time contextual spell checking and translation. My first thought was: I want to see that real-time engine attached to an ontology browser or DBpedia, automatically generating links back to the URIs for concepts and objects. What really struck me most was the use of Waves, with a few additional tools, to provide authoring tools that help us to build the semantic web, the web of data, and the web of things.

For me, the central challenges for a laboratory recording system are capturing objects, whether digital or physical, as they are created, and then serving those objects back to the user as they need them to describe the connections between them. As we connect up these objects we will create the semantic web. As we build structured knowledge against those records we will build a machine-parseable record of what happened that will help us to plan for the future. As I understand it, each wave, and indeed each part of a wave, can be a URL endpoint: an object on the semantic web. If they aren’t already, it will be easy to make them that. As much as anything, it is the web-native collaborative authoring tool, making embedding and pass-by-reference the default approach rather than cut and paste, that will make the difference. Google don’t necessarily do semantic web, but they do links and they do embedding, and they’ve provided a framework that should make it easy to add meaning to the links. Google just blew the door off the ELN market, and they probably didn’t even notice.

Those of us interested in web-based and electronic recording and communication of science have spent a lot of the last few years trying to describe how we need to glue the existing tools together: mailing lists, wikis, blogs, documents, databases, papers. The framework was never right, so a lot of attention was focused on moving things backwards and forwards, on how to connect one thing to another. That problem, as far as I can see, has now ceased to exist. The challenge now is in building the right plugins and making sure the architecture is compatible with existing tools. But fundamentally the framework seems to be there. It seems like it’s time to build.

A more sober reflection will probably follow in a few days ;-)