
What are the services we really want for recording science online?

8 February 2009 8 Comments

There is an interesting meta-discussion going on in a variety of places at the moment which touches very strongly on my post and talk (slides, screencast) from last week about “web native” lab notebooks. Over at Depth First, Rich Apodaca has a post with the following little gem of a soundbite:

Could it be that both Open Access and Electronic Laboratory Notebooks are examples of telephone-like capabilities being used to make a better telegraph?

Web-Centric Science offers a set of features orthogonal to those of paper-centric science. Creating the new system in the image of the old one misses the point, and the opportunity, entirely.

Meanwhile, a discussion on The Realm of Organic Synthesis blog was sparked off by a post about the need for a Wikipedia-inspired chemistry resource (thanks again to Rich and Tony Williams for pointing out the post and discussion respectively). The initial idea here was something along the lines of:

I envision a hybrid of Doug Taber’s Organic Chemistry Portal, Wikipedia and a condensed version of SciFinder.  I’ll gladly contribute!  How do we get the ball rolling?

This in turn has led into a discussion of whether Chemspider and ChemMantis partly fill this role already. The key point being made here is the problem of actually finding and aggregating the relevant information. Tony Williams makes the point in the comments that Chemspider is not about being a central repository in the way that J proposes in the original TROS blog post, but that if there are resources out there they can be aggregated into Chemspider. There are two problems here: capturing the knowledge in one “place”, and then aggregating it.

Finally, there is an ongoing discussion in the margins of a post at RealClimate. The argument here is over the level of “working” that should be shown when doing analyses of data, something very close to my heart. In this case both the data and the MatLab routines used to process the data have been made available. What I believe is missing is the detailed record, or log, of how those routines were used to process the data. The argument rages over the value of providing this, the amount of work involved, and whether it could actually have a chilling effect on people doing independent validation of the results. In this case there is also the political issue of providing more material for antagonistic climate change skeptics to pore over and look for minor mistakes that they will then trumpet as holes. It will come as no surprise that I think the benefits of making the log available outweigh the problems, but that we need the tools to do it. This is beautifully summed up in one comment by Tobias Martin at number 64:

So there are really two main questions: if this [making the full record available – CN] is hard work for the scientist, for heaven’s sake, why is it hard work? (And the corollary: how are you confident that your results are correct?)

And the answer – it is hard work because we still think in terms of a paper notebook paradigm, which isn’t well matched to the data analysis being done within MatLab. When people actually do data analysis using computational systems they very rarely keep a systematic log of the process. It is actually a rather difficult thing to do – even though in principle the system could (and in some cases does) keep that entire record for you.
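What “the system keeping that entire record for you” could look like is easy to sketch. The snippet below is a minimal illustration, not any real analysis pipeline: the `detrend` routine and the log format are invented for the example. Each processing call is appended to a log that could later be published alongside the data and scripts.

```python
import functools
import json
import time

ANALYSIS_LOG = []  # the running record of every processing step


def logged(func):
    """Wrap a processing routine so each call is recorded automatically."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        ANALYSIS_LOG.append({
            "step": func.__name__,
            "args": [repr(a) for a in args],
            "kwargs": {k: repr(v) for k, v in kwargs.items()},
            "time": time.strftime("%Y-%m-%dT%H:%M:%S"),
        })
        return func(*args, **kwargs)
    return wrapper


@logged
def detrend(series):
    """Toy processing step: subtract the series mean from every point."""
    mean = sum(series) / len(series)
    return [x - mean for x in series]


result = detrend([1.0, 2.0, 4.0, 8.0])
print(json.dumps(ANALYSIS_LOG, indent=2))  # the shareable "log of working"
```

The point of the sketch is that the researcher changes nothing about how they work; the wrapper does the bookkeeping, so producing the log stops being extra effort.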

My point is that if we completely re-imagine the shape and functionality of the laboratory record, in the way Rich and I, and others, have suggested; if the tools are built to capture what happens and then provide that to the outside world in a useful form (when the researcher chooses), then not only will this record exist but it will provide the detailed “clickstream” records that Richard Akerman refers to in answer to a Twitter proposal from Michael Barton:

Michael Barton: Website idea: Rank scientific articles by relevance to your research; get updates on them via citations and pubmed “related articles”

Richard Akerman: This is a problem of data, not of technology. Amazon has millions of people with a clear clickstream through a website. We’ve got people with PDFs on their desktops.

Exchange “data files” and “Matlab scripts” for PDFs and you have a statement of the same problem that the guys at RealClimate face. Yes, it is there somewhere, but it is a pain in the backside to get it out and give it to someone.

If that “clickstream”, “file use stream”, and “relationship stream” were automatically captured, then we would get closer to the thing that I think many of us are yearning for (and have been for some time): the Amazon recommendation tool for science.
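The kind of recommendation such a stream would enable can be sketched very simply. Assuming some hypothetical usage streams (the researcher names and paper identifiers below are invented), item-to-item co-occurrence, which is roughly the idea behind Amazon-style recommendation, needs nothing more than a pair count:

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical usage streams: which papers each researcher opened.
streams = {
    "alice": ["paperA", "paperB", "paperC"],
    "bob":   ["paperA", "paperB"],
    "carol": ["paperB", "paperC"],
}

# Count how often two papers appear in the same researcher's stream.
cooccur = defaultdict(int)
for papers in streams.values():
    for a, b in combinations(sorted(set(papers)), 2):
        cooccur[(a, b)] += 1


def recommend(paper):
    """Rank other papers by how often they co-occur with the given one."""
    scores = {}
    for (a, b), n in cooccur.items():
        if a == paper:
            scores[b] = scores.get(b, 0) + n
        elif b == paper:
            scores[a] = scores.get(a, 0) + n
    return sorted(scores, key=scores.get, reverse=True)


print(recommend("paperA"))  # → ['paperB', 'paperC']
```

The algorithm is trivial; the hard part, as Richard Akerman says, is that nobody is capturing the streams in the first place.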


  • I am aggregating chemistry oriented web databases where chemists can host their data (ChemSpider is not the only option):

    http://chem-bla-ics.blogspot.com/2009/02/where-can-i-host-my-experimental-data.html

    Additions most welcome; I’ll keep the index updated. The database does not have to be Open Access to show up in the list; submission must be free though. Archival is the key here, though obviously, Open Access or Open Data would be my personal preference.

  • Hi Cameron.

    For using software in science, unit testing can also be used to test the validity of the results. Andrew Clegg writes something interesting about this.

    http://biotext.org.uk/on-the-importance-of-testing-in-research-software

There’s a saying in the Rails/Ruby community: if something is hard work, you’re not using enough shortcuts. Creating a dynamic interactive website with a database backend used to be hard work before Rails. Now Rails/Merb/Sinatra makes it easy.

    I sound like a broken record, but a programming framework for science could provide easy shortcuts to scientific tasks, or make it easy for people to write a plugin to do so.

  • Cameron, thanks for pulling together the different threads. The link to the comments in the RealClimate post was especially interesting.

    This is why I see discussions focussed on just information consumption (creating/using databases, open access) or just information production (electronic laboratory notebooks) as ultimately not going anywhere very interesting.

If the Web’s success stories have anything in common, it’s that the line between information creator and information consumer is pretty thin. This artificial distinction results from previous experiences with one-way media (print, TV, radio, video). The Web is all about taking that artificial distinction, warping it, and playing with it in unconventional ways, to the horror of those who control the existing one-way media.

    I believe the new breed of scientific information systems that end up really working will be those that solve both the information creation and information consumption problems simultaneously and definitively.

    Mike, you’ve hit the nail on the head wrt Web frameworks like Rails. The barrier to creating useful, two-way communications channels for science has gotten very low indeed. The possibilities are very much within reach to those who take the challenge.

  • Cameron, there was an interesting post to bring The Realm of Organic Synthesis discussion to an “end”. Mitch commented that the chemical informatics guys don’t get it and that there is something big to come. Let’s see what they will unveil. He commented:

    “Reading through all the comments I can’t help but feel the chemical informatics guys don’t get it. The critical error seems to derive from making tools that can do cool things but not having a critical mass of users to make it relevant. I’ve always approached problems from the critical mass side, and let others worry about indexing, tagging, and developing tools in the future.

    What J and I are planning is truly awesome, but it is not going to fall within the realm of the type of collaborative work that the chemical informatics people know so well. J will announce the project in the next couple of days. I think you will find it completely awesome, or simply not get it. It could go either way from reading through the comments.”

    I’ll expand on your comments separately in a post I am working on, specifically in regards to “but that if there are resources out there they can be aggregated into Chemspider. There are two problems here, capturing the knowledge into one “place” and then aggregating.” I believe that all original web-based knowledge can remain at its original site but can be replicated or linked, if permission allows, from our site. We’re not trying to replace everyone’s efforts; we are trying to help people find those resources. The approach may not be the most elegant informatics solution, but I don’t think most informatics people would consider that Wikipedia is pushing the edge either, yet its value is obvious.

    Our focus is on how to help people get to information of value as fast as possible and to do it, for now, from a structure-centric point of view. We are focused on structures since this remains a largely unfulfilled need outside of commercial systems. We keep hearing about “an alternative to SciFinder” and seeing the exchanges about how people can’t afford to access that tool. It is possible to build an alternative, but it will take agreements between publishers and authors, and the participation of the community.

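
The unit-testing suggestion raised in the comments above is worth making concrete. The sketch below is purely illustrative (the `moving_average` routine is invented, not taken from any of the posts discussed): a hand-computed case pins the scientific calculation down, which is exactly the kind of confidence Tobias Martin’s “corollary” question asks for.

```python
def moving_average(values, window):
    """Simple moving average: the kind of routine worth pinning down with tests."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def test_known_result():
    # Validity check: a case computed by hand that the code must reproduce.
    assert moving_average([1, 2, 3, 4], 2) == [1.5, 2.5, 3.5]


def test_bad_window_rejected():
    # Bad inputs should fail loudly, not silently produce nonsense.
    try:
        moving_average([1, 2, 3], 0)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for window=0")


test_known_result()
test_bad_window_rejected()
print("all checks passed")
```

Tests like these are part of the “log of working”: they document, in executable form, what the analysis code is supposed to do.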