Reflections on Science 2.0 from a distance – Part I

Some months ago now I gave a talk at a very exciting symposium organized by Greg Wilson as a closer for the Software Carpentry course he was running at the University of Toronto. It was exciting because of the lineup, but also because it represented a real coming together of views on how developments in computer science and infrastructure, as well as new social capabilities brought about by computer networks, are changing scientific research.

I talked, as I have several times recently, about the idea of a web-native laboratory record, thinking about what the paper notebook would look like if it were re-invented with today’s technology. Jon Udell gave a two-tweet summary of my talk which I think captured the two key aspects of my viewpoint perfectly. In this post I want to explore the first of these.

@cameronneylon: “The minimal publishable unit of science — the paper — is too big, too monolithic. The useful unit: a blog post.” #osci20

The key to the semantic web, linked open data, and indeed the web and the internet in general, is the ability to address objects. URLs in and of themselves provide an amazing resource, making it possible to identify and relate digital objects and resources. The “web of things” expands this idea to include addresses that identify physical objects. In science we aim to connect physical objects in the real world (samples, instruments) to data (digital objects) via concepts and models. All of these can be made addressable at any level of granularity we choose. But the level of detail is important. From a practical perspective, too much detail means that the researcher won’t, or even can’t, record it properly. Too little detail and the objects aren’t flexible enough to allow re-wiring when we discover we’ve got something wrong.

A single sample deserves an identity. A single data file requires an identity, although it may be wrapped up within a larger object. The challenge comes when we look at process, descriptions of methodology, and claims. A traditionally published paper is too big an object, something that is shown clearly by how ambiguous citations to papers are. A paper will generally contain multiple claims and multiple processes, and a citation could refer to any of these. At the other end, I have argued that a tweet, at 140 characters, is too small, because while you can make a statement it is difficult to provide context in the space available. To be a unit of science a tweet really needs to contain a statement and two references or citations, providing the relationship between two objects. It can be done, but it’s just a bit too tight in my view.
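To make the tweet argument concrete, here is a minimal sketch of a claim as a statement plus two references, with a check on whether it squeezes into 140 characters. The `Claim` class, its field names, and the example.org URLs are all hypothetical, invented purely for illustration.

```python
from typing import NamedTuple

TWEET_LIMIT = 140  # the character limit mentioned above


class Claim(NamedTuple):
    """A statement relating two addressable objects (hypothetical model)."""
    subject: str    # URL of the first object, e.g. a sample
    predicate: str  # the statement linking the two objects
    obj: str        # URL of the second object, e.g. a data file

    def as_text(self) -> str:
        # Render the claim the way it might be typed into a tweet.
        return f"{self.subject} {self.predicate} {self.obj}"

    def fits_in_tweet(self) -> bool:
        return len(self.as_text()) <= TWEET_LIMIT


# Made-up URLs: a sample, and the data file recorded from it.
claim = Claim(
    subject="http://example.org/sample/42",
    predicate="was measured to produce",
    obj="http://example.org/data/run3.csv",
)

print(claim.as_text())
print("fits in a tweet:", claim.fits_in_tweet())
```

With short URLs and a terse statement the claim fits, but there is very little room left over for any context, which is the point being made above.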

So I proposed that the natural unit of scientific research is the blog post. There are many reasons for this. First, the length is elastic, accommodating anything from something (nearly) as short as a tweet to thousands of lines of data, code, or script. But equally there is a broad convention of approximate length, ranging from a few hundred to a few thousand words: about the length, in fact, of a single lab notebook page, and about the length of a simple procedure. Second, a blog post natively comes with a unique URL. The blog post is a first-class object on the web, something that can be pointed at, scraped, and indexed. And crucially, the blog post comes with a feed, and a feed that can contain rich and flexible metadata, again in agreed and accessible formats.
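As a rough illustration of why the feed matters, the sketch below reads a small Atom feed using only the Python standard library and pulls out each post’s URL, timestamp, and tags. The feed content, the example.org URLs, and the `read_entries` helper are assumptions made up for this example; a real notebook feed would carry whatever metadata its author chose to expose.

```python
import xml.etree.ElementTree as ET

# Namespace prefix used by Atom (RFC 4287) elements.
ATOM = "{http://www.w3.org/2005/Atom}"

# A made-up feed with a single notebook entry.
FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Hypothetical lab notebook</title>
  <entry>
    <title>PCR of sample 42</title>
    <id>http://example.org/notebook/pcr-sample-42</id>
    <link rel="alternate" href="http://example.org/notebook/pcr-sample-42"/>
    <updated>2009-07-29T10:00:00Z</updated>
    <category term="procedure"/>
    <category term="pcr"/>
  </entry>
</feed>
"""


def read_entries(feed_xml):
    """Return one dict per feed entry: title, URL, timestamp, and tags."""
    root = ET.fromstring(feed_xml)
    entries = []
    for entry in root.findall(f"{ATOM}entry"):
        link = entry.find(f"{ATOM}link")
        entries.append({
            "title": entry.findtext(f"{ATOM}title"),
            "url": link.get("href") if link is not None else None,
            "updated": entry.findtext(f"{ATOM}updated"),
            "tags": [c.get("term") for c in entry.findall(f"{ATOM}category")],
        })
    return entries


for post in read_entries(FEED):
    print(post["url"], post["updated"], post["tags"])
```

Nothing here is specific to science: any generic feed reader could do the same, which is exactly the kind of existing tooling a blog-based record gets to reuse.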

If we are to embrace the power of the web to transform the laboratory and scientific record then we need to think carefully about what the atomic components of that record are. Get this wrong and we make a record which is inaccessible, and which doesn’t take advantage of the advanced tooling that the consumer web now provides. Get it right and the ability to Google for scientific facts will come for free. And that would just be the beginning.

If you would like to read more about these ideas, I have a paper just out in the BMC journal Automated Experimentation.