Some months ago now I gave a talk at a very exciting symposium organized by Greg Wilson as a closer for the Software Carpentry course he was running at the University of Toronto. It was exciting because of the lineup, but also because it represented a real coming together of views on how developments in computer science and infrastructure, as well as new social capabilities brought about by computer networks, are changing scientific research.
I talked, as I have several times recently, about the idea of a web-native laboratory record, thinking about what the paper notebook would look like if it were re-invented with today’s technology. Jon Udell gave a two-tweet summary of my talk which I think captured the two key aspects of my viewpoint perfectly. In this post I want to explore the first of these.
@cameronneylon: “The minimal publishable unit of science — the paper — is too big, too monolithic. The useful unit: a blog post.” #osci20
The key to the semantic web, linked open data, and indeed the web and the internet in general, is the ability to address objects. URLs in and of themselves are an amazing resource, making it possible to identify and relate digital objects and resources. The “web of things” expands this idea to include addresses that identify physical objects. In science we aim to connect physical objects in the real world (samples, instruments) to data (digital objects) via concepts and models. All of these can be made addressable at any level of granularity we choose. But the level of detail is important. From a practical perspective, too much detail means that the researcher won’t, or even can’t, record it properly. Too little detail and the objects aren’t flexible enough to allow re-wiring when we discover we’ve got something wrong.
A single sample deserves an identity. A single data file requires an identity, although it may be wrapped up within a larger object. The challenge comes when we look at processes, descriptions of methodology, and claims. A traditionally published paper is too big an object, something shown clearly by how ambiguous citations to papers are. A paper will generally contain multiple claims and multiple processes, and a citation could refer to any of these. At the other end, I have argued that a tweet, at 140 characters, is too small: while you can make a statement, it is difficult to provide context in the space available. To be a unit of science a tweet really needs to contain a statement and two references or citations, providing the relationship between two objects. It can be done, but it’s just a bit too tight in my view.
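To make the "statement plus two references" idea concrete, here is a minimal sketch in Python. The `Claim` class, its field names, and the `example.org` URLs are all hypothetical, invented purely for illustration; the only figure taken from the post is the 140-character limit.

```python
from dataclasses import dataclass

TWEET_LIMIT = 140  # character limit mentioned in the post


@dataclass(frozen=True)
class Claim:
    """A minimal statement relating two addressable objects (hypothetical model)."""

    subject_url: str  # address of the first object, e.g. a sample
    relation: str     # short human-readable statement linking the two
    object_url: str   # address of the second object, e.g. a data file

    def as_text(self) -> str:
        # Render the claim as one line: first reference, statement, second reference.
        return f"{self.subject_url} {self.relation} {self.object_url}"

    def fits_in_tweet(self) -> bool:
        return len(self.as_text()) <= TWEET_LIMIT

    def room_for_context(self) -> int:
        # Characters left over for any explanation; negative if already too long.
        return TWEET_LIMIT - len(self.as_text())


# Illustrative values only; these URLs do not point at real resources.
claim = Claim(
    subject_url="http://example.org/sample/42",
    relation="was measured to give",
    object_url="http://example.org/data/42.csv",
)

print(claim.as_text())
print("fits:", claim.fits_in_tweet(), "| characters left:", claim.room_for_context())
```

Even with short made-up addresses, the two references take up a large share of the available characters, which is the sense in which a tweet is "just a bit too tight".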
So I proposed that the natural unit of scientific research is the blog post. There are many reasons for this. Firstly, the length is elastic, accommodating anything from something (nearly) as short as a tweet to thousands of lines of data, code, or script. But equally there is a broad convention of approximate length, ranging from a few hundred to a few thousand words, which is about the length of a single lab notebook page and about the length of a simple procedure. The second key aspect of a blog post is that it natively comes with a unique URL. The blog post is a first-class object on the web, something that can be pointed at, scraped, and indexed. And crucially the blog post comes with a feed, one that can contain rich and flexible metadata, again in agreed and accessible formats.
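As a rough illustration of a post as an addressable object carried in a feed, the sketch below builds a single Atom entry with Python's standard library. Atom is just one feed format chosen for the example; the post URL, the related sample and data URLs, and the category terms are hypothetical, and a complete feed would also need feed-level elements such as an author.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM_NS)  # serialize Atom as the default namespace


def atom(tag: str) -> str:
    """Return the namespace-qualified name of an Atom element."""
    return f"{{{ATOM_NS}}}{tag}"


def build_entry(post_url, title, updated, related, categories):
    """Build one Atom entry for a notebook post (illustrative fragment only)."""
    entry = ET.Element(atom("entry"))
    ET.SubElement(entry, atom("id")).text = post_url  # the post's unique address
    ET.SubElement(entry, atom("title")).text = title
    ET.SubElement(entry, atom("updated")).text = updated
    ET.SubElement(entry, atom("link"), {"rel": "alternate", "href": post_url})
    for url in related:  # samples, data files, or other posts this one refers to
        ET.SubElement(entry, atom("link"), {"rel": "related", "href": url})
    for term in categories:  # free-form metadata carried in the feed
        ET.SubElement(entry, atom("category"), {"term": term})
    return entry


# All values below are made up for the example.
entry = build_entry(
    post_url="http://example.org/notebook/2009/10/pcr-run-3",
    title="PCR run 3",
    updated="2009-10-01T12:00:00Z",
    related=[
        "http://example.org/sample/42",
        "http://example.org/data/42.csv",
    ],
    categories=["procedure", "pcr"],
)

print(ET.tostring(entry, encoding="unicode"))
```

Anything that can read a feed can then discover the post, its address, and the objects it links to, without any tooling specific to science.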
If we are to embrace the power of the web to transform the laboratory and scientific record then we need to think carefully about what the atomic components of that record are. Get this wrong and we make a record which is inaccessible, and which doesn’t take advantage of the advanced tooling that the consumer web now provides. Get it right and the ability to Google for scientific facts will come for free. And that would just be the beginning.
If you would like to read more about these ideas, I have a paper just out in the BMC journal Automated Experimentation.
Although a blog post is a highly amenable tool, smaller chunks of data are also highly useful.
A researcher could use Twitter to rapidly post data, a blog to organize related tweets into a unified theme, and some form of wiki to organize the metadata so that knowledge can be more easily distributed.
Twitter: @h2oindio
Rick, I don’t disagree. I’m more focussed on the volume of the minimal unit and the way that blogs seem to map nicely onto the human part of record keeping. I agree that single measurements can easily be tweeted, but single measurements have a fixed context, so it’s not just the tweet.
For example, we could regularly tweet the temperature of some probe, but you have to know where that probe is, what it is measuring, and how time and temperature are measured for it to be useful – which is more than just the single tweet. I am very much in favour of using the right tool for the right job.
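One way to picture the probe example is to keep the fixed context as its own addressable record and have each short reading point at it. This is only a sketch: the class names, fields, and the `example.org` address are invented for illustration and do not describe any existing system.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ProbeContext:
    """Fixed context shared by every reading from one probe (hypothetical)."""

    location: str
    quantity: str
    unit: str
    time_standard: str


@dataclass(frozen=True)
class Reading:
    """A single measurement, small enough to tweet."""

    context_url: str  # address of the context record
    timestamp: datetime
    value: float

    def as_tweet(self) -> str:
        return f"{self.value} at {self.timestamp.isoformat()} see {self.context_url}"


def describe(reading, contexts):
    """Combine a reading with its context so the number can be interpreted."""
    ctx = contexts[reading.context_url]
    return (
        f"{ctx.quantity} at {ctx.location}: {reading.value} {ctx.unit} "
        f"({reading.timestamp.isoformat()}, {ctx.time_standard})"
    )


# Illustrative data only.
contexts = {
    "http://example.org/probe/7": ProbeContext(
        location="cold room, shelf 2",
        quantity="air temperature",
        unit="degC",
        time_standard="UTC",
    )
}
reading = Reading(
    context_url="http://example.org/probe/7",
    timestamp=datetime(2009, 10, 1, 12, 0, tzinfo=timezone.utc),
    value=4.2,
)

print(reading.as_tweet())
print(describe(reading, contexts))
```

The reading on its own is tweet-sized, but it only becomes useful once it is joined with the context record it points to.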
“The second key aspect of a blog post is that it natively comes with a unique URL”
One problem that comes to mind is the lack of permanence of web content. If a researcher decides to take down his or her blog, or even simply move it, then the URLs are no longer valid. Perhaps something between a blog and an online open-access journal, with Wikipedia-like history tracking, would fix the problem?
Well, the blog post might be adequate for certain audiences, but assuming it is the ideal for all cases seems restrictive. Where I am going with this is that the article should be modular and dynamically presented to address different audiences. So, in the case of a biomedical research paper, a peer would like to get down to the level of lab notes, whereas a patient or a policy maker would like to know strictly what is in it for them.
My two cents:
– Blog posts are crap if you want to start some kind of formal knowledge structure
– Blogs are crappy, but these days they are a really affordable way to write informal things, like a magazine
– My feeling leans most towards Atom publishing repositories: Atom entries associated with resources’ URIs, but I haven’t had the time to explore it
– On top of a webarch (http://www.w3.org/TR/webarch/) information schema you’d be able to develop, e.g., formal reviewing processes, etc.