Increasing the persistence of online Open Notebooks

15 November 2007

Weird. I came across WebCite this morning while having a quick scan through the Eysenbach paper in PLoS Biology on Open Access increasing the number of citations. At the bottom was a comment that all the web pages have been archived on WebCite. Going across to WebCite, I find the following:

What is WebCite®?
WebCite® is an archiving system for webreferences (cited webpages and websites), which can be used by authors, editors, and publishers of scholarly papers and books, to ensure that cited webmaterial will remain available to readers in the future. If cited webreferences in journal articles, books etc. are not archived, future readers may encounter a “404 File Not Found” error when clicking on a cited URL.

A WebCite® reference is an archived webcitation, and rather than linking to the live website (which can and probably will disappear in the future), authors of scholarly works will link to the archived WebCite® copy on webcitation.org.

Now this is interesting in its own right, but my thoughts went back to a question I was asked several times during the talks I’ve given recently: ‘How do you know it will still be there in X years’ time?’ So might this be a good secondary archive for pages of electronic lab books, particularly when we wish to refer to specific pages in a peer-reviewed paper?

Part of the answer to this may be to have many independent copies in different forms. However another answer is explicit archiving, whether that be in institutional repositories, general online repositories, embedded in supplementary information, or just trusting in Google. WebCite or something similar could have a useful role to play here. It could also provide a method of getting third party datestamps on the notebook.

That’s not the weird bit:

The weird thing is that Deepak Singh blogged about WebCite just a few hours ago, having picked it up from Jon Udell’s blog. Perhaps it is just morphic resonance :)

Some of the comments on Jon’s blog mirror my own concerns. The business model for this doesn’t seem to be well thought out. I can’t see many publishers buying into a system which is essentially trying to bridge the gap between their collapsing business model and a future of online publishing. In this context it is interesting that the list of members does not, for instance, include Nature Publishing Group or any PLoS journals, which you would think would be at the forefront of this kind of initiative. I would support the comment by Jon Galloway that this would be a good case for contextual advertising.

So I can see that this could well be useful as an archive for cited web pages, and I will use it when citing material from the web in future in my own papers. In terms of using it more generally as an archive for the notebooks there are two major issues. Firstly, will there be objections to us dumping large quantities of material into the system? The FAQ explicitly excludes charging authors but currently only asks people to be considerate in terms of what they archive. Secondly I would want to see a more clearly sustainable business model.
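
As a minimal sketch of what a datestamped snapshot of a notebook page could record, the Python below stores the page URL, a SHA-256 digest of the content, and the time the copy was taken. This is not how WebCite works internally; the function names, the example URL, and the page content are all hypothetical. Note also that a timestamp you generate yourself is only a self-asserted one: an independent datestamp would still need a third party such as an archive service to hold the record.

```python
import hashlib
from datetime import datetime, timezone


def snapshot_record(url: str, content: bytes, taken_at: datetime) -> dict:
    """Describe one archived copy of a notebook page.

    The SHA-256 digest lets anyone later check that a copy matches
    what was archived; the timestamp records when the copy was taken.
    """
    return {
        "url": url,
        "sha256": hashlib.sha256(content).hexdigest(),
        "taken_at": taken_at.astimezone(timezone.utc).isoformat(),
    }


def is_unchanged(record: dict, content: bytes) -> bool:
    """Return True if `content` matches the digest stored in `record`."""
    return hashlib.sha256(content).hexdigest() == record["sha256"]


# Hypothetical notebook page and URL, for illustration only.
page = b"Experiment 42: raw data and observations"
record = snapshot_record(
    "http://example.org/labbook/experiment-42",
    page,
    datetime(2007, 11, 15, 9, 0, tzinfo=timezone.utc),
)
```

Keeping the digest alongside the date means that several independent copies of the same page can be checked against one another, which fits the point about redundancy.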


  • Speaking of co-incidences I was just asking you about citing your lab notebook on the PLoS ONE discussion for your recent article.

    The key question, I think, is at what intervals and at what level of interpretation you want to do this. It would not make sense to do it every time you update your system, but periodic third-party time stamps on milestones make a lot of sense. I think of Nature Precedings very much in this way, and that allows affixing official authors along with the DOI and Nature’s editorial stamp of approval.

    But the more redundancy the better – I look forward to seeing how you end up using it.

    This also brings up the question of closure. I don’t consider any of the experiments we do on our wiki to be finished – ever. There is always the possibility of discovering and fixing an error down the road. The advantage of a hosted wiki is that we can link to each version with a third-party time stamp. Or at the very least if we link to the most recent page, anyone can investigate if any changes were made since the publication of the document citing it.

    Concerning guarantees by publishers, read the fine print. The ScienceDirect Preprint archive was supposed to be permanently archived but when they stopped accepting new submissions they disabled direct linking and forced people to register to view. Again, redundancy is so important.

  • I don’t seem to be able to get onto the PLoS ONE site at the moment. Perhaps a denial of service attack by Prism? (THAT’S A JOKE for any lawyers out there!).

    We haven’t cited the lab book in that paper because this is the student who hasn’t gone onto an ONS footing (which in retrospect I regret). But we can have that discussion once PONE comes back up. What I will try to do is put up the raw data on my lab book and put links back to it in the annotations. Actually that’s a really good reason for publishing more stuff in PLoS ONE: the annotations feature could provide specific links back to the lab book!

    Yes, the issue of intervals and what the purpose of any archive is comes back up again. I was thinking of using WebCite in the specific case where a paper refers to a specific experiment, say for example in the paper we are currently writing where we did expose a result before the scooping paper came out (post coming when I have a moment to write it!).

    I am still thinking that in future, when we have properly ONS papers, we will extract a version of our lab book to put in the supplementary information. I think that is as close as anything ever gets to being ‘finished’ and it will be good to put that in _and_ link back to the lab book to generate more interest. It still could all change of course so, as you say, it’s never finished; only a version is date stamped and archived.

    And yes, the more redundancy the better. The discussion at Jon Udell’s post has moved on now to other aspects of this problem so there is more to see over there now.

  • I still think self or journal archiving is the way to go. Wikis provide a timestamping solution out of the box, so why build some big, expensive, hard-to-use repository mirror site?

    If they were just proposing a DOI-style index of persistent URLs, that would be one thing, but this inscrutable URL mirror repository is just trying to make a solution with yesterday’s technology that totally won’t scale.

    I don’t mean to be a big downer, and any effort is better than none, but I think this idea needs a fair amount of work on the basic idea.

  • Now I see that CrossRef mentioned a month ago that they were considering doing something that pretty much makes WebCite irrelevant.

  • It is not correct “that CrossRef mentioned a month ago that they were considering doing something that pretty much makes WebCite irrelevant.” Rather, WebCite has been in talks with CrossRef for a while, trying to convince them that it would be a good thing for publishers (specifically those who are members of CrossRef) to be the custodians of the WebCite service. CrossRef now seems willing to seriously consider the idea.
    As to “this idea needs a fair amount of work on the basic idea.” – any constructive comments on how to improve the WebCite service, or code contributions, are certainly most welcome (this is an open source project).
    There are plenty of alternative business models out there if the CrossRef/publisher-funding model doesn’t fly, so I am not concerned about sustainability (advertising has been mentioned, but there are other potential revenue streams). Even for the publishers this could be a great thing (somebody cites a webpage, the system looks up CrossRef/PMC or other databases and suggests similar or related papers which could also be cited, etc.).
    Self-archiving (for preprints, blogs, etc) might be a solution making WebCite archiving unnecessary, but authors are usually not good at that (the NIH self-archiving request, adopted in 2004, it has failed, miserably, with deposit rate

  • Hi Gunther

    Thanks for the comment. I agree that the CrossRef link up sounds like a good thing. I also take your point that it is best not to rely on authors to archive. Self archiving is not working at the moment.

    My concern stems from the fact that for _me_ to rely on WebCite to provide a ‘permanent’ pointer, for example as the main pointer for blog posts cited in a paper, I need to be convinced that the business model is sustainable. That said, I intend to use it for the specific cases I mentioned at least, and will be interested to see how it goes forward.

    Do you have a current perspective on the other issue of using WebCite as a secondary repository for a whole lab book?

    I will try to pick your comment up in a new blog post rather than lose a potential conversation in the comments.
