Google Wave in Research – the slightly more sober view – Part I – Papers

8 June 2009 20 Comments

I, and many others, have spent the last week thinking about Wave, and I have to say that I am getting more, rather than less, excited about the possibilities that it represents. All of the below will have to remain speculation for the moment, but I wanted to walk through two use cases and identify how the concept of a collaborative automated document will have an impact. In this post I will start with the drafting and publication of a paper because it is an easier step to think about. In the next post I will move on to the use of Wave as a laboratory recording tool.

Drafting and publishing a paper via Wave

I start drafting the text of a new paper. As I do this I add the Creative Commons robot as a participant. The robot will ask what license I wish to use and then provide a stamp, linked back to the license terms. When a new participant adds text or material to the document, they will be asked whether they are happy with the license, and their agreement will be registered in a private blip within the Wave, controlled by the Robot (probably called CC-bly, pronounced see-see-bly). The robot may also register the document with a central repository of open content. A second robot could notify the authors’ respective institutional repositories, creating a negative click repository in, well, one click. More seriously, this would allow the IR to track, and if appropriate modify, the document as well as harvest its content and metadata automatically.
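The agreement tracking described above can be sketched in a few lines. This is only an illustration of the bookkeeping, not anything from the Wave API: the `LicenseLedger` class, its method names, and the participant names are all hypothetical, and a real robot would store this state in its private blip rather than in memory.

```python
from dataclasses import dataclass, field


@dataclass
class LicenseLedger:
    """Hypothetical record of who has agreed to the document's license."""

    license_id: str                      # e.g. "CC-BY"; chosen by the first author
    agreed: set = field(default_factory=set)

    def record_agreement(self, participant):
        """Register that a participant has accepted the license."""
        self.agreed.add(participant)

    def may_contribute(self, participant):
        """A contribution is accepted only after the participant has agreed."""
        return participant in self.agreed

    def pending(self, participants):
        """Participants who still need to be asked, in a stable order."""
        return sorted(set(participants) - self.agreed)


ledger = LicenseLedger("CC-BY")
ledger.record_agreement("alice")
print(ledger.pending(["alice", "bob", "carol"]))
```

The point of keeping the ledger separate from the text is that the record of who agreed, and to what, survives later edits to the document itself.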

I invite a series of authors to contribute to the paper and we start to write. Naturally the inline commenting and collaborative authoring tools get a good workout, and it is possible to watch the evolution of specific sections with the playback tool. The authors are geographically distributed but we can organize scheduled hacking sessions with inline chat to work on sections of the paper. As we start to add references the Reference Formatter gets added (not sure whether this is a Robot or a Gadget, but it is almost certainly called “Reffy”). The formatter automatically recognizes text of the form (Smythe and Hoofback 1876), searches the Citeulike libraries of the authors for the appropriate reference, adds an inline citation, and places a formatted reference in a separate Wavelet to keep it protected from random edits. Chunks of text can be collected from reports or theses in other Waves and the tracking system notes where they have come from, maintaining the history of the whole document and its sources and checking licenses for compatibility. Terminology checkers, based on the existing Spelly extension (although at the moment this works on the internal not the external API – Google say they are working to fix that), can be run over the document to check for incorrect or ambiguous use of terms, or to identify gene names, structures etc., automatically format them, and link them to the reference database.
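To make the Reffy idea a little more concrete, here is a minimal sketch of the recognition step only. The regular expression is an illustrative assumption rather than a complete citation grammar, and the `library` dictionary merely stands in for an author's reference library; no real Citeulike or Wave call is used.

```python
import re

# Matches author-year citations such as "(Smythe and Hoofback 1876)" or
# "(Jones 2001)". Illustrative only: it ignores "et al.", page numbers, etc.
CITATION_RE = re.compile(
    r"\((?P<authors>[A-Z][\w'-]+(?:(?:,| and) [A-Z][\w'-]+)*) (?P<year>\d{4})\)"
)


def find_citations(text):
    """Return a list of (author_names, year) for each citation found in text."""
    results = []
    for match in CITATION_RE.finditer(text):
        authors = re.split(r", | and ", match.group("authors"))
        results.append((authors, int(match.group("year"))))
    return results


def resolve(citation, library):
    """Look a citation up in a hypothetical library keyed by (first author, year).

    Returns None when there is no match, so the caller can ask the user.
    """
    authors, year = citation
    return library.get((authors[0], year))


# Hypothetical library entry, invented for the example.
library = {("Smythe", 1876): "Smythe and Hoofback (1876), placeholder entry"}

text = "This was first observed long ago (Smythe and Hoofback 1876)."
for citation in find_citations(text):
    print(citation, "->", resolve(citation, library))
```

Returning `None` for an unmatched citation, rather than guessing, leaves room for the robot to ask the authors which reference they meant.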

It is time to add some data and charts to the paper. The actual source data are held in an online spreadsheet. A chart/graphing widget is added to the document and formats the data into a default graph which the user can then modify as they wish. The link back to the live data is of course maintained. Ideally this will trigger the CC-bly robot to query the user as to whether they wish to dedicate the data to the Public Domain (therefore satisfying both the Science Commons Data protocol and the Open Knowledge Definition – see how smoothly I got that in?). When the user says yes (being a right thinking person) the data are marked with the chosen waiver/dedication, CKAN is notified, and a record of the new dataset is created.

The paper is cleaned up – informal comments can be easily obtained by adding colleagues to the Wave. Submission is as simple as adding a new participant, the journal robot (PLoSsy obviously) to the Wave. The journal is running its own Wave server so referees can be given anonymous accounts on that system if they choose. Review can happen directly within the document with a conversation between authors, reviewers, and editors. You don’t need to wait for some system to aggregate a set of comments and send them in one hit and you can deal with issues directly in conversation with the people who raise them. In addition the contribution of editors and referees to the final document is explicitly tracked. Because the journal runs its own server, not only can the referees and editors have private conversations that the authors don’t see, those conversations need never leave the journal server and are as secure as they can reasonably be expected to be.

Once accepted, the paper is published simply by adding a new participant. What would traditionally happen at this point is that a completely new typeset version would be created, breaking the link with everything that has gone before. This could be done by creating a new Wave with just the finalized version visible and all comments stripped out. What would be far more exciting would be for a formatted version to be created which retained the entire history. A major objection to publishing referees’ comments is that they refer to the unpublished version. Here the reader can see the comments in context and come to their own conclusions. Before publishing, any inline data will need to be harvested and placed in a reliable repository along with any other additional information. Supplementary information can simply be hidden under “folds” within the document rather than buried in separate documents.

The published document is then a living thing. The canonical “as published” version is clearly marked but the functionality for comments or updates or complete revisions is built in. The modular XML nature of the Wave means that there is a natural means of citing a specific portion of the document. In the future, citations to a specific point in a paper could be marked, again via a widget or robot, to provide a back link to the citing source. Harvesters can then traverse this graph of links in both directions, easily wiring up the published data graph.
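As a rough sketch of what traversing the link graph in both directions means, the structure below keeps a forward index (what a fragment cites) and a back-link index (what cites it). The class and the fragment identifiers such as `paperA#results` are hypothetical; how a portion of a Wave would actually be addressed is not specified here.

```python
from collections import defaultdict, deque


class CitationGraph:
    """Hypothetical index of citations between document fragments."""

    def __init__(self):
        self.cites = defaultdict(set)     # source -> fragments it cites
        self.cited_by = defaultdict(set)  # target -> fragments citing it (back links)

    def add_citation(self, source, target):
        """Record the citation and its back link in one step."""
        self.cites[source].add(target)
        self.cited_by[target].add(source)

    def reachable(self, start, direction="cites"):
        """Breadth-first walk along citations ("cites") or back links ("cited_by")."""
        edges = self.cites if direction == "cites" else self.cited_by
        seen = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in edges.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        seen.discard(start)
        return seen


graph = CitationGraph()
graph.add_citation("paperB#intro", "paperA#results")
graph.add_citation("paperC#methods", "paperB#intro")
print(sorted(graph.reachable("paperA#results", direction="cited_by")))
```

Writing both indexes at the moment a citation is added is what makes the backward walk as cheap as the forward one.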

Based on the currently published information, none of the above is even particularly difficult to implement. Much of it will require some careful study of how the workflows operate in practice, and there will likely be issues of collisions and complications, but most of the above is simply based on the functionality demonstrated at the Wave launch. The real challenge will lie in integration with existing publishing and document management systems and with the subtle social implications of changing the way that authors, referees, editors, and readers interact with the document. Should readers be allowed to comment directly in the Wave or should that be in a separate Wavelet? Will referees want to be anonymous and will authors be happy to see the history made public?

Much will depend on how reliable and how responsive the technology really is, as well as how easy it is to build the functionality described above. But the bottom line is that this is the result of about four days’ occasional idle thinking about what can be done. When we really start building and realizing what we can do, that is when the revolution will start.

Part II is here.


  • Interesting how you bring the wave concepts to the whole chain of creating the new article. In an earlier blogpost, I thought about similar ideas, but only concerning the part of submission/archiving the publication afterwards.

    One of the key points you make is that the publication is indeed a living thing, and that there’s a huge opportunity for wave because current workflows and systems are too rigid.

    Looking forward to reading more from you!

  • Anna

    \o/

    Excellent case – I’ve been busily expounding the transformation that Wave could potentially produce this weekend at a conference – some people can see the possibilities immediately. I just wish we could get started using it immediately. Should I post (you) my outline for the teaching module (which I’ll be running in October if Wave is available by then) along a similar vein?

  • Bruce D’Arcus

    Yup.

    On the citation stuff, I posted something on the API group actually, and on the zotero dev list. Suffice it to say that I really hope we can avoid the continued fragmentation in these efforts, and figure out a solution that is, like Wave, based on a distributed model from the beginning.

  • Bruce, that would be very cool. There has been a bit of interest expressed, at least on the part of Mendeley, on Friendfeed. If you are interested I could put you in touch with them. As you say, it would be great if efforts weren’t fragmented; a Robot with a single configuration to say where your library is would be brilliant.

  • Bruce D’Arcus

    I’m in touch with the Mendeley guys. They use the CSL language I and the Zotero team have designed and developed (and they borrow some of the Zotero code for that matter for their word-processor integration). My basic idea with the robot is that it would a) use CSL for style config, and b) that it could be configured to pull in metadata from multiple places.

  • That’s great. As you say a distributed but integrative approach seems appealing. There are unlikely to be the resources to allow much more in the way of fragmentation I suspect. Or at least if there is much more fragmentation then the limited resources won’t allow very much more to be achieved.

  • I’m looking forward to wave, but I’m afraid that entire communities (e.g. physics and math) will not make use of it for research unless someone develops a way to integrate LaTeX, which is used for essentially all papers in these fields.

  • John – thanks for the comment. There is in fact already a LaTeX robot that has been written for Wave, and there is some work going on with LaTeX to MathML conversions that would enable the maths not only to be displayed correctly but also to be machine readable, which opens up some very interesting possibilities.

  • Saw a reference to this post in the Google blog announcement. Been thinking about similar things. Not surprised that Bruce D’Arcus would end up here… ;-)
    The OA angle is an important one. Not only because ease of putting things in a repository is important in getting people to do the right thing, but because it shows what “publishing” really means. In this case, as in several others, GWave puts some core issues at the forefront.
    Can’t wait to start using Wave.

  • Douglas

    I suppose you would have seen Igor by now. Wave auto-inline-citations robot…
    http://bit.ly/AWEJE
