The Southampton Open Science Workshop – a brief report

1 October 2008 8 Comments

On Monday 1 September we held a one-day workshop in Southampton to discuss the issues that surround ‘Open Science’. It was free-form and informal, and I had the explicit aim of getting people with different perspectives into the room to discuss a wide range of issues: tool development, social and career structure issues, ideas about standards and, finally, what concrete actions could actually be taken. You can find live blogging and other commentary in the associated Friendfeed room, and information on who attended, as well as links to many of the presentations, on the conference wiki.

Broadly speaking, the day was divided into three chunks. The first focussed on tools and services and included presentations on MyExperiment, Mendeley, Chemtools, and Inkspot Science. Branwen Hide of Research Information Network has written more on this part. Given that the room contained more than the usual suspects, the conversation focussed on usability and interfaces rather than technical aspects, although there was a fair bit of that as well.

The second portion of the day revolved more around social challenges and issues. Richard Grant presented his experience of blogging on an official, university-sanctioned site and the value of that for both outreach and education. One point he made was that the ‘lack of adoption problem’ seen in science just doesn’t seem to exist in the humanities. Perhaps this is because scientists don’t generally see ‘writing’ as a valuable thing in its own right. Certainly there is a preponderance of scientists who happen also to see themselves as writers on Nature Network.

Jennifer Rohn followed on from Richard, and objected to my characterising her presentation as “the skeptic’s view”. A more accurate characterisation would have been “I’d love to be open but at the moment I can’t: This is what has to change to make it work”. She presented a great summary of the problem, particularly from the biological scientist’s point of view, as well as potential solutions. Essentially the problem is that of the ‘Minimum Publishable Unit’, or research quantum, as well as what ‘counts’ as publication. Her main point was that for people to be prepared to publish material that falls short of a full paper, they need to get some proportional credit for it. This folds closely into the discussion of what can be cited and what should be cited in particular contexts. I have used the phrase ‘data sized peg into a paper shaped hole’ to describe this in the past.

After lunch Liz Lyon from UKOLN talked about curation and long-term archival storage, which led into an interesting discussion about the archiving of blogs and other material. Is it worth keeping? One answer to this was to look at the real interest today in diaries from the Second World War and earlier written by ‘normal people’. You don’t necessarily need to be a great scientist, or even a great blogger, for the material to be of potential interest to historians in 50-100 years’ time. But doing this properly is hard, in the same way that maintaining and indexing data is hard. There are disparate sites, file formats and places of storage to deal with and, in the end, whose blog is it actually? That question is particularly pointed if you are blogging for, or recording work done at, a research institution.

The final session was about standards or ‘brands’. Yaroslav Nikolaev talked about semantic representations of experiments. Important as this is, it was probably a shame that we left it until the end of the day, because it would have been helpful to get more of the non-techie people into that discussion, both to iron out the communication issues around the semantic web and to describe the real potential benefits. This remains a serious gap: the experimental scientists who could really use semantic tools don’t really get the point, and the people developing the tools don’t communicate the benefits well or, in some cases (not all, I hasten to add!), don’t actually build the tools the experimentalists want.

I talked about the possibility of a ‘certificate’ or standard for Open Science, and the idea of an organisation to police this. It would be safe to say that, while people agreed that clear definitions would be helpful, the enthusiasm level for a standards organisation was pretty much zero. There are more fundamental issues of actually building up enough examples of good practice, and working towards identifying best practice in open science, that need to be dealt with before we can really talk about standards.

On the other hand, the idea of the ‘fully supported paper’ got immediate and enthusiastic support. The idea here is deceptively simple, and has been discussed elsewhere: all the relevant supporting information for a paper (data, detailed methodology, software tools, parameters, database versions etc., as well as access to required materials at reasonable cost) should be available for any published paper. The challenge lies in actually recording experiments in such a way that this information can be provided. But if all of the record is available in this form, then it can be made available whenever the researcher chooses. Thus by providing the tools that enable the fully supported paper you are also providing tools that enable open science.

Finally we discussed what we could actually do. Jean-Claude Bradley discussed the idea of an Open Notebook Science challenge to raise the profile of ONS (this is now set up; more on this to follow). Essentially this is a competition-type approach in which individuals or groups contribute to a larger scientific problem by collecting data, and the teams are judged on how well they describe what they have done and how quickly they make it available.

The most specific action proposed was to draft a ‘Letter to Nature’ proposing the idea of the fully supported paper as a submission standard. The idea would be to get a large number of high-profile signatories on a document that describes a concrete step-by-step plan to work towards the final goal, and to send that as correspondence to a high-profile journal. I have been having some discussions about how to frame such a document and hope to get a draft up for discussion reasonably soon.

Overall there was much enthusiasm for things Open and a sense that many elements of the puzzle are falling into place. What is missing is effective coordinated action, communication across the whole community of interested and sympathetic scientists, and critically the high-profile success stories that will start to shift opinion. These ought to, in my opinion, be the targets for the next 6-12 months.


  • I’m interested by the idea of a “Semantic representation of an experiment”.
    I know of some attempts to provide a structured description of an experiment (eg MAGE-TAB for microarray experiments) but they are all domain-specific and not flexible enough for the average scientist.
    At the other end of the spectrum is the text description of an experiment (eg wiki or blog), although some structure may be achieved through conventional headings (eg Objective, Procedure, Results etc. in UsefulChem).
    Are there any attempts to design a general format for an experiment that is somewhere between these examples? I’m thinking of a semantic experiment description which might start off as plain text, but allow users to mark up text as a “Parameter” or a “Reagent” etc.
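
One way to picture the "plain text first, mark up later" idea in the comment above is the following minimal Python sketch. The double-bracket syntax `[[Kind: value]]` is an invented convention used purely for illustration, not an existing standard, and the function names are hypothetical.

```python
import re

# Hypothetical inline mark-up: [[Kind: value]], e.g. [[Reagent: NaCl]].
# The syntax is an assumption made for this sketch only.
TAG = re.compile(r"\[\[(\w+):\s*([^\]]+?)\s*\]\]")


def extract_annotations(text):
    """Return (kind, value) pairs for every marked-up span, in order."""
    return [(m.group(1), m.group(2)) for m in TAG.finditer(text)]


def plain_text(text):
    """Strip the mark-up, leaving the description readable as ordinary prose."""
    return TAG.sub(lambda m: m.group(2), text)


note = "Dissolve [[Reagent: NaCl]] in water and heat to [[Parameter: 60 C]]."
annotations = extract_annotations(note)
readable = plain_text(note)
```

The point of the sketch is that an unannotated description remains valid input, so a user could add `Parameter` or `Reagent` tags gradually rather than filling in a rigid template up front.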

  • Will, you and me both! I think the answer may lie in a combination of the very general and the domain (and indeed protocol) specific. So, for example, if you are doing a 2D gel, you should use GelML to describe the procedure (see Frank, I do pay attention!). But if you’re doing something that doesn’t fit an existing agreed template you need something very general.

    Others in the ontology/CV community have suggested that FuGE or similar things may serve this purpose. My personal view is that we need a very simple resource and link description format. So there is a URL for each input material and a URL for the procedure and the links are labelled as ‘is reagent in’ or ‘is product of’ or ‘is data file generated by’.

    I think you’d need a very small vocabulary with not much more than ‘material’, ‘reagent’, ‘procedure’, ‘data’, and maybe a few others. As I have said of our LaBLog system, our data model is:

    We have stuff. We do stuff to the stuff. We get other stuff.

    This seems sufficiently general so far…
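
As a rough illustration of the link format described in the comment above, here is a minimal Python sketch. It assumes that each material, data file and procedure has its own URL and that every link is a (source, label, target) triple using the three labels quoted in the comment. The `Link` class, the helper functions and all `example.org` URLs are hypothetical and are not taken from LaBLog.

```python
from dataclasses import dataclass

# Link labels quoted in the comment; everything else here is an assumption.
IS_REAGENT_IN = "is reagent in"
IS_PRODUCT_OF = "is product of"
IS_DATA_FILE_GENERATED_BY = "is data file generated by"


@dataclass(frozen=True)
class Link:
    source: str  # URL of a material or data file
    label: str   # one of the labels above
    target: str  # URL of the procedure it relates to


def inputs_of(links, procedure):
    """'We have stuff': everything labelled as a reagent in the procedure."""
    return [l.source for l in links
            if l.target == procedure and l.label == IS_REAGENT_IN]


def outputs_of(links, procedure):
    """'We get other stuff': products and data files of the procedure."""
    wanted = (IS_PRODUCT_OF, IS_DATA_FILE_GENERATED_BY)
    return [l.source for l in links
            if l.target == procedure and l.label in wanted]


# Made-up URLs for a single gel run ('we do stuff to the stuff').
GEL = "http://example.org/procedure/gel-42"
links = [
    Link("http://example.org/material/buffer-1", IS_REAGENT_IN, GEL),
    Link("http://example.org/material/sample-7", IS_REAGENT_IN, GEL),
    Link("http://example.org/material/gel-42-stained", IS_PRODUCT_OF, GEL),
    Link("http://example.org/data/gel-42.tiff", IS_DATA_FILE_GENERATED_BY, GEL),
]

gel_inputs = inputs_of(links, GEL)
gel_outputs = outputs_of(links, GEL)
```

Because a product of one procedure can appear as a reagent in the next, a flat list of such links would be enough to chain experiments together without any domain-specific schema, which seems to be the appeal of keeping the vocabulary this small.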

  • Very interesting! I like the idea of the “fully supported paper”. It actually sounds very similar to what we are trying to do in computational sciences under the term “reproducible research”. Funny how those ideas pop up in various domains, each time under different names.
    We have actually set up a server for such reproducible papers from our lab using EPrints: http://rr.epfl.ch. It also allows for other people to post reviews and comments, evaluating how reproducible the paper is. I think this can really help a lot.

    Anyway, great work, and great blog!

  • Hi Patrick, yes, what I am talking about here is applying what you guys are talking about all the way through experimental science. I’ve avoided ‘reproducible’ because, as was pointed out to me by Jeremy Frey, some experimental data collections cannot be reproduced (e.g. environmental data collection). But that’s just a semantic point. I think that RR is a great effort, and it reminds me I should refer to it in the paper that I have just serialised here on the blog (starting here).

    Thanks for dropping by!
