Reflections from a parallel universe
On Wednesday and Thursday this week I was lucky to be able to attend a conference on Electronic Laboratory Notebooks run by an organization called SMI. Lucky because the registration fee was £1500 and I got a free ticket. Clearly this was not a conference aimed at academics. This was a discussion of the capabilities and implications of Electronic Laboratory Notebooks used in industry, and primarily in big pharma.
For me it was very interesting to see these commercial packages. I am often asked how what we do compares to these packages and I have always had to answer that I simply don’t know; I’ve never had the chance to look at one because they are way too expensive. Having now seen them I can say that they have very impressive user interfaces with lots of integrated tools and widgets. They are fundamentally built around specific disciplines and this allows them to be reasonably structured in their presentation and organisation. I think we would break them in our academic research setting but it might take a while. More importantly we wouldn’t be able to afford the customisation that it looks as though you need to get a product that does just what you want it to. Deployment costs of around £10,000 per person were being bandied around, with total contract costs clearly in the millions of dollars.
Coming out of various recent discussions I would say that I think the overall software design of these products is flawed going forward. The vendors are being paid a lot by companies who want things integrated into their systems, so there is no motivation for them to develop open platforms with data portability and easy integration of web services etc. All of these systems run on thick clients against a central database. Going forward these have to go into web portals as a first step before working towards a fully customisable interface with easily collectable widgets to enable end-user configured integration.
But these were far from the most interesting things at the meeting. We commonly assume that keeping, preserving, and indexing data is a clear good. And indeed many of the attendees were assuming the same thing. Then we got a talk on ‘Compliance and ELNs’ by Simon Coles of Amphora Research Systems. The talk can be found here. In this was an example of just how bizarre the legal process for patent protection can make industrial process. In preparing for a patent suit you will need to pay your lawyers to go through all the relevant data and paperwork. Indeed if you lose you will probably pay for the opposition’s lawyers to go through all the relevant paperwork. These are not just lawyers, they are expensive lawyers. If you have a whole pile of raw data floating around this is not just going to give the lawyers a field day finding something to pin you to the wall on, it is going to burn through money like nobody’s business. The simple conclusion: it is far cheaper to re-do the experiment than it is to risk the need for lawyers to go through raw data. Throw the raw data away as soon as you can afford to! Like I said, a parallel universe where you think things are normal until they suddenly go sideways on you.
On a more positive note there were some interesting talks on big companies deploying ELNs. Now we can look at this at some level as a model of a community adopting open notebooks. At least within the company (in most cases) everyone can see everyone else’s notebook. A number of speakers mentioned that this had caused problems and a couple said that it had been necessary to develop and promulgate standards of behaviour. This is interesting in the light of the recent controversy over the naming of a new dinosaur (see commentary at Blog around the Clock) and Shirley Wu’s post on One Big Lab. It reinforces the need for generally accepted standards of behaviour and the growing importance of these as data becomes more open.
The rules? The first two came from the talk, the rest are my suggestions. Basically they boil down to ‘Be Polite’.
- Always ask before using someone else’s data or results
- User beware: if you rely on someone else’s results it’s your problem if it blows up in your face (especially if you didn’t ask them about it)
- If someone asks if they can use your data or results you say yes. If you don’t want them to, give them a clear timeline on which they can or specific reasons why you can’t release the data. Give clear warnings about any caveats or concerns
- If someone asks you not to use their results (whether or not they are helpful or reasonable about it) think very carefully about whether you should ignore their request. If having done this you still feel you are being reasonable in using them, then think again.
- Any data that has not been submitted for peer review after 18 months is fair game
- If you incorporate someone else’s data within a paper, discuss your results with them. Then include them as an author.
- Always, without fail and under any circumstances, acknowledge any source of information and do so generously and without conditions.