Re-inventing the wheel (again) – what the open science movement can learn from the history of the PDB
One of the many great pleasures of SciFoo was to meet with people who had a different, and in many cases much more comprehensive, view of managing data and making it available. One of the long term champions of data availability is Professor Helen Berman, the head of the Protein Data Bank (the international repository for biomacromolecular structures), and I had the opportunity to speak with her for some time on the Friday afternoon before Scifoo kicked off in earnest (in fact this was one of many somewhat embarrasing situations where I would carefully explain my background in my very best ‘speaking to non-experts’ voice only to find they knew far more about it than I did – however Jim Hardy of Gahaga Biosciences takes the gold medal for this event for turning to the guy called Larry next to him while having dinner at Google Headquarters and asking what line of work he was in).
I have written before about how the world might look if the PDB and other biological databases had never existed, but as I said then I didn’t know much of the real history. One of the things I hadn’t realised was how long it was after the PDB was founded before deposition of structures became expected for all atomic resolution biomacromolecular structures. The road from a repository of seven structures with a handful of new submissions a year to the standards that mean today that any structure published in a reputable journal must be deposited was a long and rocky one. The requirement to deposit structures on publication only became general in the early 1990s, nearly twenty years after it was founded and there was a very long and extended process where the case for making the data available was only gradually accepted by the community.
Helen made the point strongly that it had taken 37 years to get the PDB to where it is today; a gold standard international and publically available repository of a specific form of research data supported by a strong set of community accepted, and enforced, rules and conventions. We don’t want to take another 37 years to achieve the widespread adoption of high standards in data availability and open practice in research more generally. So it is imperative that we learn the lessons and benefit from the experience of those who built up the existing repositories. We need to understand where things went wrong and attempt to avoid repeating mistakes. We need to understand what worked well and use this to our advantage. We also need to recognise where the technological, social, and political environment that we find ourselves in today means that things have changed, and perhaps to recognise that in many ways, particularly in the way people behave, things haven’t changed at all.
I’ve written this in a hurry and therefore not searched as thoroughly as I might but I was unable to find any obvious ‘history of the PDB’ online. I imagine there must be some out there – but they are not immediately accessible. The Open Science movement could benefit from such documents being made available – indeed we could benefit from making them required reading. While at Scifoo Michael Nielsen suggested the idea of a panel of the great and the good – those who would back the principles of data availability, open access publication, and the free transfer of materials. Such a panel would be great from the perspective of publicity but as an advisory group it could have an even greater benefit by providing the opportunity to benefit from the experience many of these people have in actually doing what we talk about.