
Re-inventing the wheel (again) – what the open science movement can learn from the history of the PDB

17 August 2008 18 Comments

One of the many great pleasures of SciFoo was meeting people who had a different, and in many cases much more comprehensive, view of managing data and making it available. One of the long-term champions of data availability is Professor Helen Berman, the head of the Protein Data Bank (the international repository for biomacromolecular structures), and I had the opportunity to speak with her for some time on the Friday afternoon before SciFoo kicked off in earnest. (In fact this was one of many somewhat embarrassing situations where I would carefully explain my background in my very best ‘speaking to non-experts’ voice only to find they knew far more about it than I did. Jim Hardy of Gahaga Biosciences still takes the gold medal for this event, for turning to the guy called Larry next to him while having dinner at Google Headquarters and asking what line of work he was in.)

I have written before about how the world might look if the PDB and other biological databases had never existed, but as I said then I didn’t know much of the real history. One of the things I hadn’t realised was how long it took after the PDB was founded before deposition became expected for all atomic resolution biomacromolecular structures. The road from a repository of seven structures with a handful of new submissions a year to the standards that mean today that any structure published in a reputable journal must be deposited was a long and rocky one. The requirement to deposit structures on publication only became general in the early 1990s, nearly twenty years after the PDB was founded, and there was a long process in which the case for making the data available was only gradually accepted by the community.

Helen made the point strongly that it had taken 37 years to get the PDB to where it is today: a gold standard, international, and publicly available repository of a specific form of research data, supported by a strong set of community-accepted, and enforced, rules and conventions. We don’t want to take another 37 years to achieve the widespread adoption of high standards in data availability and open practice in research more generally. So it is imperative that we learn the lessons and benefit from the experience of those who built up the existing repositories. We need to understand where things went wrong and attempt to avoid repeating mistakes. We need to understand what worked well and use this to our advantage. We also need to recognise where the technological, social, and political environment that we find ourselves in today means that things have changed, and perhaps to recognise that in many ways, particularly in the way people behave, things haven’t changed at all.

I’ve written this in a hurry and therefore not searched as thoroughly as I might, but I was unable to find any obvious ‘history of the PDB’ online. I imagine there must be some out there – but they are not immediately accessible. The Open Science movement could benefit from such documents being made available – indeed we could benefit from making them required reading. While at SciFoo Michael Nielsen suggested the idea of a panel of the great and the good – those who would back the principles of data availability, open access publication, and the free transfer of materials. Such a panel would be great from the perspective of publicity, but as an advisory group it could be even more valuable, giving us the chance to draw on the experience many of these people have in actually doing what we talk about.


18 Comments »

  • Jim H said:

    Cameron,

    I am just glad that the next day, upon seeing a lovely young blond woman come into one of the sessions, that I didn’t hit on her. She was very pretty, but turns out she’s that guy Larry’s wife…..

  • Jean-Claude Bradley said:

    As we discussed, it seems that ChemSpider is in a good position to do the same for organic chemistry. It would be extremely valuable if everyone had to deposit the JCAMP files of their IR, NMR and MS along with their molecules in the database upon publication.

  • Cameron Neylon said:

    Jean-Claude – yes would be good, the message here is hopefully we can get to something like this in less than 15 years! Actually in many ways the situation is quite analogous.

  • Anders said:

    Probably Genbank has the same kind of valuable history?

    http://www.ncbi.nlm.nih.gov/Genbank/

  • Cameron Neylon said:

    Anders, yes absolutely, Genbank (and its various partners) is the other great example of a publicly available database and the same argument certainly applies. It was just that I wasn’t sitting with the head of Genbank :) Is there a social history of Genbank around somewhere?

  • christine said:

    Hi,

    One article describing the history of the PDB is available here.

  • Anders said:

    I just googled around quickly and didn’t see any. Sorry. But at ESOF2008 in Barcelona Richard J. Roberts mentioned his involvement and alluded to the long, winding road to success for Genbank

    http://en.wikipedia.org/wiki/Richard_J._Roberts

  • Cameron Neylon said:

    Christine, thanks for that – very much what I was looking for. The only ones I found were much more technical than this (discussing either the history of visualisation or of data formats/computational analysis). Has anyone actually done a full social history that you know of? I would imagine it would be a great history/sociology PhD project.

  • Michael Nielsen said:

    In this general vein, I find Peter Suber’s “Open Access News” and all the materials around it an amazing resource: http://www.earlham.edu/~peters/fos/fosblog.html

    In particular, Peter’s timeline is outstanding:
    http://www.earlham.edu/~peters/fos/timeline.htm
