Blog – Science in the Open

December 22, 2014 / December 23, 2014

Loss, time and money

May – Oct 2006 Calendar (Photo credit: Wikipedia)

For my holiday project I’m reading through my old blog posts and trying to track the conversations that they were part of. What is shocking, but not surprising with a little thought, is how many of my current ideas seem to spring into being almost whole in single posts. And just how old some of those posts are. At the same time there is plenty of misunderstanding and rank naivety in there as well.

The period from 2007-10 was clearly productive and febrile. The links out from my posts point to a distributed conversation that is, to be honest, still a lot more sophisticated than much current online discussion on scholarly communications. Yet at the same time that fabric is wearing thin. Broken links abound, both internal, from when I moved my own hosting, and external. Neil Saunders’ posts are all still accessible, but Deepak Singh’s seem to require a trip to the Internet Archive. The biggest single loss, though, came with the adoption of Friendfeed in mid-2008 by our small community. Some links to discussions resolve, some discussions of discussions survive as posts, but whole chunks of the record of those conversations – about researcher IDs, peer review, and incentives and credit – appear to have disappeared.

As I dig deeper through those conversations it looks like much of it can be extracted from the Internet Archive, but it takes time. Time is a theme that runs through the posts, starting in 2009 as the “real time web” started becoming a mainstream thing, resurfacing in 2011, and continuing to bother me. Time also surfaces as a cycle. Comments on peer review from 2011 still seem apposite and themes of feeds, aggregations and social data continue to emerge over time. On the other hand, while much of my recounting of conversations about Researcher IDs in 2009 will look familiar to those who struggled with getting ORCID up and running, a lot of the technology ideas were… well, probably best left in the same place as my enthusiasm for Google Wave. And my concerns about the involvement of Crossref in Researcher IDs are ironic given I now sit on their board as second representing PLOS.

The theme that travels throughout the whole seven-ish years is that of incentives. Technical incentives, the idea that recording research should be a byproduct of what the researcher is doing anyway, and ease of use (often as rants about institutional repositories) appear often. But the core is the question of incentives for researchers to adopt open practice: issues of “credit” and how it might be given, as well as the challenges that involves, but also of exchange systems that might turn “credit” into something real and meaningful. Whether that was to be real money wasn’t clear at the time. The concerns with real money come later, as this open letter to David Willetts, written a year before the Finch review, suggests. Posts from 2010 on frequently mention the UK’s research funding crisis and in retrospect that crisis is the crucible that formed my views on impact and re-use as well as how new metrics might support incentives that encourage re-use.

The themes are the same, the needs have not changed so much and many of the possibilities remain unproven and unrealised. At the same time the technology has marched on, making much of what was hard easy, or even trivial. What remains true is that the real value was created in conversations, arguments and disagreements, reconciliations and consensus. The value remains where it has always been – in a well-crafted network of constructive critics and in a commitment to engage in the construction and care of those networks.

November 5, 2009 / January 3, 2010

Reflections on Science 2.0 from a distance – Part I

Some months ago now I gave a talk at a very exciting symposium organized by Greg Wilson as a closer for the Software Carpentry course he was running at Toronto University. It was exciting because of the lineup but also because it represented a real coming together of views on how developments in computer science and infrastructure as well as new social capabilities brought about by computer networks are changing scientific research. I talked, as I have several times recently, about the idea of a web-native laboratory record, thinking about what the paper notebook would look like if it were re-invented with today’s technology. Jon Udell gave a two tweet summary of my talk which I think captured the two key aspects of my viewpoint perfectly. In this post I want to explore the first of these.

@cameronneylon: “The minimal publishable unit of science – the paper – is too big, too monolithic. The useful unit: a blog post.” #osci20

The key to the semantic web, linked open data, and indeed the web and the internet in general, is the ability to address objects. URLs in and of themselves provide an amazing resource making it possible to identify and relate digital objects and resources. The “web of things” expands this idea to include addresses that identify physical objects. In science we aim to connect physical objects in the real world (samples, instruments) to data (digital objects) via concepts and models. All of these can be made addressable at any level of granularity we choose. But the level of detail is important. From a practical perspective too much detail means that the researcher won’t, or even can’t, record it properly. Too little detail and the objects aren’t flexible enough to allow re-wiring when we discover we’ve got something wrong.
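
As a minimal sketch of what addressable objects at a chosen granularity might look like, the Python below gives two samples and a data file each their own URL and links them. The `Record` class, the `rewire` helper and every example.org URL are hypothetical, invented purely for illustration. Because each object is addressed separately, a wrong link can be re-wired without rewriting anything else.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """One addressable object: a sample, a data file, or a procedure."""
    url: str    # hypothetical address; any stable URL would do
    kind: str   # e.g. "sample" or "data"
    links: dict = field(default_factory=dict)  # relation name -> target URL


# Hypothetical example objects, each with its own address.
records = {
    r.url: r
    for r in [
        Record("https://example.org/sample/42", "sample"),
        Record("https://example.org/sample/43", "sample"),
        Record("https://example.org/data/run-7.csv", "data",
               {"measured_from": "https://example.org/sample/42"}),
    ]
}


def rewire(records, source_url, relation, new_target_url):
    """Point an existing link at a different object, leaving the rest untouched."""
    if new_target_url not in records:
        raise KeyError(f"unknown target: {new_target_url}")
    records[source_url].links[relation] = new_target_url


# Suppose we discover the data actually came from sample 43: only one link changes.
rewire(records, "https://example.org/data/run-7.csv",
       "measured_from", "https://example.org/sample/43")
print(records["https://example.org/data/run-7.csv"].links["measured_from"])
```

If the data file and its sample had been recorded as one undivided object, the same correction would mean rewriting the whole record.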

A single sample deserves an identity. A single data file requires an identity, although it may be wrapped up within a larger object. The challenge comes when we look at process, descriptions of methodology and claims. A traditionally published paper is too big an object, something that is shown clearly by the ambiguity of citations to papers. A paper will generally contain multiple claims, and multiple processes. A citation could refer to any of these. At the other end I have argued that a tweet, 140 characters, is too small, because while you can make a statement it is difficult to provide context in the space available. To be a unit of science a tweet really needs to contain a statement and two references or citations, providing the relationship between two objects. It can be done but it’s just a bit too tight in my view.
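
To make the tweet argument concrete, here is a rough sketch that renders a claim as two references joined by a statement and checks it against the 140-character limit. The function name, the URLs and the example statements are all assumptions made up for illustration.

```python
TWEET_LIMIT = 140  # the character limit discussed above


def claim_as_tweet(subject_url, statement, object_url):
    """Render a claim as 'subject statement object' and report whether it fits."""
    text = f"{subject_url} {statement} {object_url}"
    return text, len(text) <= TWEET_LIMIT


# Hypothetical short URLs: two references plus a brief statement do fit.
short_text, fits_short = claim_as_tweet(
    "https://example.org/d/7", "was measured from", "https://example.org/s/43")

# Adding even a little context to the same relationship exceeds the limit.
long_text, fits_long = claim_as_tweet(
    "https://example.org/data/run-7.csv",
    "was measured from (after overnight incubation at 37 C, second replicate, "
    "instrument recalibrated that morning)",
    "https://example.org/sample/43")

print(len(short_text), fits_short)
print(len(long_text), fits_long)
```

The bare relationship fits, but only while the URLs stay short and the statement carries no context, which is the tightness described above.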

So I proposed that the natural unit of science research is the blog post. There are many reasons for this. Firstly the length is elastic, accommodating something (nearly) as short as a tweet, to thousands of lines of data, code, or script. But equally there is a broad convention of approximate length, ranging from a few hundred to a few thousand words, about the length in fact of a single lab notebook page, and about the length of a simple procedure. The second key aspect of a blog post is that it natively comes with a unique URL. The blog post is a first class object on the web, something that can be pointed at, scraped, and indexed. And crucially the blog post comes with a feed, and a feed that can contain rich and flexible metadata, again in agreed and accessible formats.
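
As an illustration of the feed point, the sketch below reads one entry from an Atom feed using only the Python standard library and pulls out the post’s URL and metadata. Atom is just one possible agreed format; the feed content, URLs, title and category terms are invented for the example and are not taken from any real notebook.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"  # the Atom XML namespace

# A hypothetical one-entry feed; the URLs, title and categories are invented.
FEED = """<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Lab notebook</title>
  <entry>
    <id>https://example.org/notebook/2009/11/05/pcr-run-7</id>
    <title>PCR run 7</title>
    <link rel="alternate" href="https://example.org/notebook/2009/11/05/pcr-run-7"/>
    <updated>2009-11-05T10:00:00Z</updated>
    <category term="procedure"/>
    <category term="sample-43"/>
  </entry>
</feed>"""


def entries(feed_xml):
    """Yield a small dict of address and metadata for each entry in the feed."""
    root = ET.fromstring(feed_xml.encode("utf-8"))
    for entry in root.iter(f"{ATOM}entry"):
        yield {
            "url": entry.find(f"{ATOM}link").get("href"),
            "title": entry.findtext(f"{ATOM}title"),
            "updated": entry.findtext(f"{ATOM}updated"),
            "tags": [c.get("term") for c in entry.findall(f"{ATOM}category")],
        }


posts = list(entries(FEED))
print(posts[0]["url"], posts[0]["tags"])
```

Nothing here is specific to science: the same few lines would work on any blog’s Atom feed, which is the sense in which the tooling comes with the format.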

If we are to embrace the power of the web to transform the laboratory and scientific record then we need to think carefully about what the atomic components of that record are. Get this wrong and we make a record which is inaccessible, and which doesn’t take advantage of the advanced tooling that the consumer web now provides. Get it right and the ability to Google for scientific facts will come for free. And that would just be the beginning.

If you would like to read more about these ideas I have a paper just out in the BMC journal Automated Experimentation.