Use Cases for Provenance – eScience Institute – 20 April

On Monday I am speaking as part of a meeting on Use Cases for Provenance (Programme), which has a lot of interesting talks scheduled. I appear to be last. I am not sure whether that means I am the comedy closer or the pre-dinner entertainment. This may, however, be as a result of the title I chose:

In your worst nightmares: How experimental scientists are doing provenance for themselves

On the whole, experimental scientists, particularly those working in traditional, small research groups, have little knowledge of, or interest in, the issues surrounding provenance and data curation. There is, however, an emerging and evolving community of practice developing the use of the tools and social conventions related to the broad set of web-based resources that can be characterised as “Web 2.0”. This approach emphasises social, rather than technical, means of enforcing citation and attribution practice, as well as maintaining provenance. I will give examples of how this approach has been applied, and discuss the emerging social conventions of this community from the perspective of an insider.

The meeting will be webcast (link should be available from here) and my slides will with any luck be up at least a few minutes before my talk in the usual place.

Call for submissions for a project on The Use and Relevance of Web 2.0 Tools for Researchers

The Research Information Network has put out a call for expressions of interest in running a research project on how Web 2.0 tools are changing scientific practice. The project will be funded up to £90,000. Expressions of interest are due on Monday 3 November (yes, next week) and the project is due to start in January. You can see the call in full here, but in outline RIN is seeking evidence on whether Web 2.0 tools are:

• making data easier to share, verify and re-use, or otherwise facilitating more open scientific practices;

• changing discovery techniques or enhancing the accessibility of research information;

• changing researchers’ publication and dissemination behaviour (for example, due to the ease of publishing work-in-progress and grey literature);

• changing practices around communicating research findings (for example through opportunities for iterative processes of feedback, pre-publishing, or post-publication peer review).

Now we as a community know that there are cases where all of these are occurring, and we have fairly extensively documented examples. The question is obviously one of the degree of penetration. Again, we know this is small – though I’m not exactly sure how you would quantify it.

My challenge to you is whether it would be possible to use the tools and community we already have in place to carry out the project? In the past we’ve talked a lot about aggregating project teams and distributed work but the problem has always been that people don’t have the time to spare. We would need to get some help from social scientists on process and design of the investigation but with £90,000 there is easily enough money to pay people properly for their time. Indeed I know there are some people out there freelancing already who are in many ways already working on these issues anyway. So my question is: Are people interested in pursuing this? And if so, what do you think your hourly rate is?

Convergent evolution of scientist behaviour on Web 2.0 sites?

A thought sparked off by a comment from Maxine Clarke at Nature Network, where she posted a link to a post by David Crotty. The thing that got me thinking was Maxine’s statement:

I would add that in my opinion Cameron’s points about FriendFeed apply also to Nature Network. I’ve seen lots of examples of highly specific questions being answered on NN in the way Cameron describes for FF…But NN and FF aren’t the same: they both have the same nice feature of discussion of a particular question or “article at a URL somewhere”, but they differ in other ways,…[CN – my emphasis]

Alright, in isolation this doesn’t look like much – read through both David’s post and the comments, and then come back to Maxine’s – but what struck me was that on many of these sites many different communities seem to be using very different functionality to do very similar things. In Maxine’s words, ‘…discussion of a…particular URL somewhere…’ And that leads me to wonder to what extent all of these sites are failing to do what it is that we actually want them to do. And the obvious follow-on question: what is it we want them to do?

There seem to be two parts to this. One, as I wrote in my response to David, is that a lot of this is about the coffee room conversation, a process of building and maintaining a social network. It happens that this network is online, which makes it tough to drop into each other’s offices, but these conversational tools are the next best thing. In fact they can be better, because they let you choose when someone can drop into your office, a choice you often don’t have in the physical world. Many services – FriendFeed, Twitter, Nature Network, Facebook, or a combination of them – can do this quite well. Indeed the conversation spreads across many services, helping the social network (which, bear in mind, probably has fewer than 500 total members) to grow, form, and strengthen the connections between people.

Great. So the social bit, the bit we have in common with the general populace, is sorted. What about the science?

I think what we want as scientists is two things. Firstly, we want the right URL delivered at the right time to our inbox (I am assuming anything important is a resource on the web – this may not be true now, but give it 18 months and it will be). Secondly, we want a rapid and accurate assessment of this item – its validity, its relevance, and its importance to us – judged by people we trust and respect. Traditionally this was managed by going to the library and reading the journals, and then going to the appropriate conference and talking to people. We know that the volume of material, and the speed at which we need to deal with it, are now way too great for that. Nothing new there.

My current thinking is that we are failing to build the right tools because we keep thinking of these two steps as separate, when actually combining them into one integrated process would provide efficiency gains for both phases. I need to sleep on this to get it straight in my head; there are issues of resource discovery, timeframes, and social network maintenance that are not falling into place for me at the moment, so that will be the subject of another post.

However, whether I am right or wrong in that particular line of thought, if it is true that we are reasonably consistent in what we want, then it is not surprising that we try to bend the full range of available services towards achieving those goals. The interesting question is whether we can discern what the killer app would be by looking at the details of what people do with different services and where those services are failing. In a sense, if there is a single killer app for science, then it should be discernible what it would do based on what scientists try to do with different services…