The integrated lab record – or the web native lab notebook

At Science Online 09 and at the SMi Electronic Laboratory Notebook meeting in London later in January I talked about how laboratory notebooks might evolve. At Science Online 09 the session was about Open Notebook Science, and there I wanted to take the idea of what a “web native” lab record could look like and show that if you go down this road you will get the most out of it if you are open. At the ELN meeting, which was aimed mainly at traditional database-backed ELN systems for industry, I wanted to show the potential of a web native way of looking at the laboratory record, and in passing to show that these approaches work best when they are open, before beating a retreat back to the position of “but if you’re really paranoid you can implement all of this behind your firewall” so as not to scare them off too much. The talks are quite similar in outline and content and I wanted to work through some of the ideas here.

The central premise is one that is similar to that of many web-service start-ups: “The traditional paper notebook is to the fully integrated web based lab record as a card index is to Google”. Or to put it another way, if you think in a “web-native” way then you can look to leverage the power of interlinked networks, tagging, social network effects, and other things that don’t exist on a piece of paper, or indeed in most databases. This means stripping back the lab record to basics and re-imagining it as though it were built around web based functionality.

So what is a lab notebook? At core it is a journal of events, a record of what has happened. Very similar to a blog in many ways: an episodic record containing dates, times, and bits and pieces of often disparate material, cut and pasted into a paper notebook. It is interesting that in fact most people who use online notebooks based on existing services use wikis rather than blogs. This is for a number of reasons: better user interfaces, sometimes better services and functionality, proper versioning, or just personal preference. But there is one thing that wikis tend to do very badly that I feel is crucial to thinking about the lab record in a web native way; they generate at best very ropey RSS feeds. Wikis are well suited to report writing and to formalising and sharing procedures, but they don’t make very good diaries. At the end of the day it ought to be possible to do clever things with a single back end database being presented as both blog and wiki, but I’ve yet to see anything really impressive in this space, so for the moment I am going to stick with the idea of blog as lab notebook because I want to focus on feeds.

So we have the idea of a blog as the record – a minute to minute and day to day record. We will assume we have a wonderful backend and API and a wide range of clients that suit different approaches to writing things down and different situations where this is being done. Witness, for instance, the plethora of clients for Twittering in every circumstance and mode of interaction. We’ll assume tagging functionality as well as key-value pairs that are exposed as microformats and RDF as appropriate, widgets for ontology lookup and autocompletion if they are desired, and the ability to automatically generate input forms from any formal description of what an experiment should look like. But above all, this will be exposed in a rich machine readable format in an RSS/Atom feed.

What we don’t need is the ability to upload data. Why not? Because we’re thinking web native. On a blog you don’t generally upload images and video directly; you host them on an appropriate service and embed them on the blog page. All of the issues are handled for you and a nice viewer is put in place. The hosting service is optimised for handling the kind of content you need: Flickr for photos, YouTube (Viddler, Bioscreencast) for video, Slideshare for presentations, and so on. In a properly built ecosystem there would be a data hosting service, ideally one optimised for your type of data, that would provide cut and paste embed codes for the appropriate visualisations. The lab notebook only needs to point at the data; it doesn’t need to know anything much about that data beyond the fact that it is related to the stuff going on around it and that it comes with some HTML code to embed a visualisation of some sort.
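To make that concrete, here is a minimal sketch (in Python, using only the standard library) of what a single notebook entry might look like when exposed as an Atom entry that simply points at externally hosted data and carries the host’s embed fragment. The URLs and element values are invented for illustration; this is not a real LaBLog export format.

```python
# A minimal sketch (not a real LaBLog export format) of a lab notebook
# entry exposed as an Atom entry. The data itself lives on an external
# hosting service; the entry only carries a link to it and an embed
# fragment supplied by that service. All URLs are placeholders.
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM)

entry = ET.Element(f"{{{ATOM}}}entry")
ET.SubElement(entry, f"{{{ATOM}}}title").text = "UV-vis spectrum of sample S-042"
ET.SubElement(entry, f"{{{ATOM}}}updated").text = "2009-01-28T14:05:00Z"
ET.SubElement(entry, f"{{{ATOM}}}id").text = "http://labblog.example.org/post/1234"

# Point at the data where it is hosted, rather than uploading it here.
ET.SubElement(entry, f"{{{ATOM}}}link", {
    "rel": "related",
    "href": "http://datahost.example.org/datasets/5678",
    "title": "raw spectrum data",
})

# The hosting service's cut-and-paste embed code travels in the content.
content = ET.SubElement(entry, f"{{{ATOM}}}content", {"type": "html"})
content.text = '<iframe src="http://datahost.example.org/embed/5678"></iframe>'

print(ET.tostring(entry, encoding="unicode"))
```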

That pointing is the next thing we need to think about. On the Chemtools LaBLog I use a one post, one item system. This means that every object gets its own post: each sample and each bottle of material has its own post and its own identity. This creates a network of posts, something I have written about before. It also means that it is possible to apply PageRank-style algorithms, and link analysis more generally, when looking at large quantities of posts. Most importantly it encodes the relationships between objects, samples, procedures, data, and analysis in the way the web is tooled up to understand: the relationships are encoded in links. This is a lightweight way of starting to build up a web of data – it doesn’t matter so much to start with whether this is in hardcore RDF, as long as there is enough contextual data to make it useful; some tagging or key-value pairs would be a good start. Most importantly it means that it doesn’t matter at all where our data files are, as long as we can point at them with sufficient precision.
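As a rough illustration of the kind of link analysis this makes possible, the sketch below builds a small directed graph of invented post-to-post links and runs PageRank over it using the networkx library. In a real setting the edges would be harvested from the notebook feed rather than typed in by hand.

```python
# A sketch of the link analysis a one-post-per-item record makes possible.
# The post identifiers and links are invented for illustration.
import networkx as nx

links = [
    ("post/materials/buffer-A", "post/procedures/gel-run-17"),      # used in
    ("post/samples/S-042", "post/procedures/gel-run-17"),           # used in
    ("post/procedures/gel-run-17", "post/data/gel-image-17"),       # produced
    ("post/data/gel-image-17", "post/analysis/band-quant-17"),      # analysed in
]

graph = nx.DiGraph()
graph.add_edges_from(links)

# PageRank-style scores pick out the posts that many other posts depend on.
for post, score in sorted(nx.pagerank(graph).items(), key=lambda kv: -kv[1]):
    print(f"{score:.3f}  {post}")
```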

But if we’re moving the data files off the main record then what about the information about samples? Wouldn’t it be better to use an existing Laboratory Information Management System, sample management system, or database? Again, as long as you can point at each sample independently with the precision you need, it doesn’t matter. You can use a Google Spreadsheet if you want to – you can give a URL for each cell, and there is a powerful API that would let you build services to make putting the links in easy. We use the LaBLog to keep information on our samples because we have such a wide variety of different materials put to different uses that the flexibility of using that system, rather than a database with a defined schema, is important for our way of working. But for other people this may not be the case. It might even be better to use multiple different systems: a database for oligonucleotides, a spreadsheet for environmental samples, and a full blown LIMS for barcoding and following the samples through preparation for sequencing. As long as it can be pointed at, it can be used. As with the data, it is best to use the system that is best suited to the specific samples. These systems are better developed for samples than they are for data – but many of the existing ones don’t provide a good way of pointing at specific samples from an external document, and very few make it possible to do this via a simple HTTP-compliant URL.
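The point can be reduced to something almost embarrassingly simple: all that is required is a mapping from a sample identifier to a URL in whichever system happens to hold it. The sketch below is purely illustrative – the identifiers and URLs are made up – but it is essentially all the “integration” that the notebook itself needs.

```python
# A sketch of the "as long as it can be pointed at, it can be used" idea:
# a trivial registry that maps sample identifiers to URLs in whichever
# system happens to hold them. All identifiers and URLs are invented.
SAMPLE_POINTERS = {
    "oligo-0231": "http://oligo-db.example.org/records/231",       # a database
    "env-sample-77": "https://spreadsheet.example.com/doc#cell=B78",  # a spreadsheet
    "lib-prep-12": "http://lims.example.org/samples/12",            # a LIMS
}

def point_at(sample_id: str) -> str:
    """Return the URL that identifies this sample, wherever it lives."""
    return SAMPLE_POINTERS[sample_id]

print(point_at("env-sample-77"))
```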

So we’ve passed off the data, and we’ve passed off the sample management. What we’re left with is the procedures, which after all are the core of the record, right? Well, no. Procedures are also just documents. Maybe they are text documents, but perhaps they are better expressed as spreadsheets or workflows (or rather the record of running a workflow). Again these may well be better handled by external services, be they word processors, spreadsheets, or specialist services. They just need to be somewhere where we can point at them.

What we are left with is the links themselves, arranged along a timeline. The laboratory record is reduced to a feed which describes the relationships between samples, procedures, and data. This could be a simple feed containing links, or a sophisticated and rich XML feed which points out in turn to one or more formal vocabularies describing the semantic relationships between items. It can all be wired together, some parts less tightly coupled than others, but in principle it can at least be connected. And that takes us one significant step towards wiring up the data web that many of us dream of.

The beauty of this approach is that it doesn’t require users to shift from the applications and services that they already use, like, and understand. What it does require is intelligent and specific repositories for the objects they generate, repositories that know enough about each object type to provide useful information and context. It also requires good plugins, applications, and services to help people generate the lab record feed, and a minimal, arbitrarily extensible way of describing the relationships. This could be as simple as HTML links with tagging of the objects (once you know an object is a sample and it is linked to a procedure, you know a lot about what is going on), but there is a logic in having a minimal vocabulary that describes relationships (what you don’t know explicitly in the tagging version is whether the sample is an input or an output), as sketched below. And it can also be fully semantic if that is what people want. While the loosely tagged material won’t be easily and tightly coupled to the fully semantic material, the connections will at least be there. A combination of both is not perfect, but it’s a step on the way towards the global data graph.
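A minimal sketch of what such a vocabulary might look like for a single feed item is shown below. The terms (“input”, “output”) and URLs are illustrative rather than a proposed standard; the point is only that a handful of typed links captures what plain tagging cannot.

```python
# A sketch of a minimal relationship vocabulary for one feed item.
# "input"/"output" is the one distinction plain tagging cannot express;
# everything else is ordinary links and tags. The terms and URLs are
# illustrative, not a proposed standard.
feed_item = {
    "id": "http://labblog.example.org/post/1234",
    "type": "procedure",
    "published": "2009-01-28T14:05:00Z",
    "links": [
        {"rel": "input", "href": "http://labblog.example.org/post/1201"},   # a sample
        {"rel": "input", "href": "http://labblog.example.org/post/1187"},   # a reagent
        {"rel": "output", "href": "http://datahost.example.org/datasets/5678"},  # the data
    ],
    "tags": ["PCR", "cloning"],
}
```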

A funny thing happened on the (way to the) forum

I love Stephen Sondheim musicals. In particular I love the way he can build an ensemble piece in which there can be 10-20 people onstage, apparently singing, shouting, and speaking completely disconnected lines, which nonetheless build into a coherent whole. Into the Woods (1987) contains many brilliant examples of the thoughts, fears, and hopes of a whole group of people building into a coherent view and message (see the opening for a taste and links to other clips). Those who believe in the wisdom of crowds in its widest sense see a similar possibility in aggregating the chatter found on the web into coherent and accurate assessments of problems. Those who despair of the ignorance of the lowest common denominator see most Web 2.0 projects as a waste of time. I sit somewhere in the middle – believing that, with the right tools, a community of people who care about a problem and have some form of agreed standards of behaviour and disputation can rapidly aggregate a well informed and considered view of a problem and what its solution might be.

Yesterday and today, I saw one of the most compelling examples of this that I have yet seen. Yesterday I posted a brain dump of what I had been thinking about, following discussions in Hawaii and in North Carolina, on the possibilities of using OpenID to build a system for unique researcher IDs. The discussion on Friendfeed almost immediately aggregated a whole set of material, some of which I had not previously seen, and proceeded through a coherent discussion of many points, with a wide range of disparate views, towards some emerging conclusions. I’m not going to pre-judge those conclusions except to note that some positions are clearly developing that are contrary to my own view (e.g. on CrossRef being the preferred organisation to run such a service). This to me suggests the power of this approach for consensus building, even when that consensus is opposite to the position of the person kicking off the discussion.

What struck me with this was the powerful way in which Friendfeed rapidly enabled the conversation – and also the potential negative effect it had on widening the conversation beyond that community. Friendfeed is a very powerful tool for very rapidly widening the reach of a discussion like this one. It would be interesting to know how many people saw the item in their feeds. I could calculate it I suppose, but for now I will just guess it was probably in the low to mid thousands. Many, many more than subscribe to the blog anyway. What will be interesting to see is whether the slower process of blogospheric diffusion is informed by the Friendfeed discussion or runs completely independent of it (incidentally a Friendfeed widget will hopefully be coming soon on the blog as well, to try and tie things together). Andy Powell of the Eduserv Foundation comments in his post of today that:

There’s a good deal of discussion about the post in Cameron’s FriendFeed. (It’s slightly annoying that the discussion is somewhat divorced from the original blog post but I guess that is one of the, err…, features of using FriendFeed?) [Andy also goes on to make some good points about delegation – CN]

The speed with which Friendfeed works, and the way in which it helps you build an interested community, and separated communities where appropriate, is indeed a feature of Friendfeed. Equally, that speed, and the fact that you need an account to comment, if not to watch, can be exclusionary. It is also somewhat closed off from the rest of the world. While I am greatly excited by what happened yesterday and today, indeed possibly just as excited as I am about yesterday’s other important news, it is important to make sure that the watering and care of the community doesn’t turn into the building of a walled garden.

A specialist OpenID service to provide unique researcher IDs?

Following on from Science Online 09, and particularly the discussions on impact factors and researcher incentives (also on Friendfeed, with some video available at Mogulus via video on demand), as well as the article in PLoS Computational Biology by Phil Bourne and Lynn Fink, the issue of unique researcher identifiers has really emerged as absolutely central: central to making traditional publication work better, to effectively building a real data web that works, and to making it possible to automatically aggregate the full list of ways in which people contribute to the community.

Good citation practice lies at the core of good science. The value of research data is not so much in the data itself but in its context, its connection with other data and ideas. How then is it that we have no way of citing a person? We need a single, unique way of identifying researchers. This will help traditional publishers and the existing ecosystem of services by making it possible to uniquely identify authors and referees. It will make it easier for researchers to be clear about who they are and what they have done. And finally it is a critical step in making it possible to automatically track all the contributions that people make. We’ve all seen CVs where people say they have refereed for Nature or the NIH or served on this or that panel. We can talk about micro-credits, but until there are validated ways of pulling that information and linking it to an identity that follows the person, not who they work for, we won’t make much progress.

On the other hand most of us do not want to be locked into one system, particularly if it is controlled by one commercial organisation. Thomson ISI’s ResearcherID is positioned as a solution to this problem, but I for one am not happy with being tied into using one particular service, regardless of who runs it.

In the PLoS Comp Biol article Bourne and Fink argue that one solution to this is OpenID. OpenID isn’t a service, it is a standard. This means that an identity can be hosted by a range of services and people can choose between them based on the service provided, personal philosophy, or any other reason. The central idea is that you have a single identity which you can use to sign on to a wide range of sites. In principle you sign into your OpenID and then you never see another login screen. In practice you often end up typing in your ID but at least it reduces the pain in setting up new accounts. It also provides in most cases a “home page”. If you go to http://cameron.neylon.myopenid.com you will see a (pretty limited) page with some basic information.
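For the technically curious, the sketch below shows the first step a relying party takes with an identifier like that: fetch the URL and look for the link elements that the OpenID specifications use to advertise the identity provider. It is deliberately stripped down – no error handling and none of the subsequent redirect and verification steps.

```python
# A sketch of the HTML-based discovery step an OpenID relying party
# performs: fetch the claimed identifier (a URL) and look for the <link>
# elements that name the identity provider. Redirects, XRDS discovery,
# and the later verification steps are all omitted.
from html.parser import HTMLParser
from urllib.request import urlopen

class OpenIDLinkFinder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.providers = {}

    def handle_starttag(self, tag, attrs):
        if tag == "link":
            attrs = dict(attrs)
            rel = attrs.get("rel", "") or ""
            if rel in ("openid.server", "openid2.provider"):
                self.providers[rel] = attrs.get("href")

claimed_id = "http://cameron.neylon.myopenid.com"  # the user's identifier URL
finder = OpenIDLinkFinder()
finder.feed(urlopen(claimed_id).read().decode("utf-8", errors="replace"))
print(finder.providers)  # e.g. {'openid.server': '...provider endpoint URL...'}
```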

OpenID is becoming more popular, with a wide range of web services providing it as a login option, including Dopplr, Blogger, and research sites such as MyExperiment. Enabling OpenID is also on the list for a wide range of other services, although not always high up the priority list. As a starting point it could be very easy for researchers with an OpenID simply to add it to their address when publishing papers, thus providing a unique and easily trackable identifier that is carried through the journals, abstracting services, and the whole ecosystem of services built around them.

There are two major problems with OpenID. The first is that it is poorly supported by the big players such as Google and Yahoo: they will let you use your account with them as an OpenID, but they don’t accept other OpenID providers. More importantly, people just don’t seem to get OpenID. It seems unnatural for some reason for a person’s identity marker to be a URL rather than a number, a name, or an email address. Compounded with the limited options provided by OpenID service providers, this makes the practical use of such identifiers for researchers very much a minority activity.

So what about building an OpenID service specifically for researchers? Imagine a setup screen that asks sensible questions about where you work and what field you are in. Imagine that on the second screen, having done a search through literature databases, it presents you with a list of publications to check through, letting you remove any mistakes and add any that have been missed. And then imagine that the default homepage format is similar to an academic CV.

Problem 1: People already have multiple IDs and sometimes multiple OpenIDs. So we make at least part of the back-end file format, and much of what is exposed on the homepage, FOAF, making it possible to at least assert that you are the same person as, say, cameronneylon@yahoo.com.
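A sketch of the kind of FOAF the profile might expose is below, built with the rdflib library. The profile URI and the extra identity being linked are invented; the FOAF and OWL terms themselves are standard.

```python
# A sketch of the FOAF a researcher profile page might expose, tying the
# OpenID to other identities the person already holds. The profile URI
# and the "other profile" URI are invented for illustration.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF

FOAF = Namespace("http://xmlns.com/foaf/0.1/")
OWL = Namespace("http://www.w3.org/2002/07/owl#")

me = URIRef("http://cameron.neylon.myopenid.com/#me")

g = Graph()
g.bind("foaf", FOAF)
g.bind("owl", OWL)

g.add((me, RDF.type, FOAF.Person))
g.add((me, FOAF.name, Literal("Cameron Neylon")))
g.add((me, FOAF.openid, URIRef("http://cameron.neylon.myopenid.com")))
g.add((me, FOAF.mbox, URIRef("mailto:cameronneylon@yahoo.com")))
# Assert that another identifier refers to the same person.
g.add((me, OWL.sameAs, URIRef("http://example.org/other-profile#me")))

print(g.serialize(format="turtle"))
```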

Problem 2: Aren’t we just locking people into a specific service again? Well no, if people don’t want to use it they can use any OpenID provider, even set one up themselves. It is an open standard.

Problem 3: What is there to make people sign up? This is the tough one really. It falls into two parts. Firstly, for those of us who already have OpenIDs or other accounts on other systems, isn’t this just (yet) another “me too” service? So, in accordance with the five rules I have proposed for successful researcher web services, there has to be a compelling case for using it.

For me the answer to this comes in part from the question. One of the things that comes up again and again as a complaint from researchers is the need to re-format their CV (see Schleyer et al., 2008 for a study of this). Remember that the aim here is to automatically aggregate most of the information you would put in a CV. Papers should be (relatively) easy; grants might be possible. Because we are doing this for researchers we know what the main categories are and what they look like. That is, we have semantically structured data.

OK, so great: I can re-format my CV more easily and I don’t need to worry about whether it is up to date with all my papers. But what about all these other sites where I need to put the same information? For this we need to provide functionality that lets all of it be carried easily to other services: simple embed functionality like that you see on YouTube and most other good file hosting services, which generates a little fragment of code that can easily be put in place on other services (obviously this requires those other services to allow it – which could be a problem in some cases). But imagine the relief if all the poor people who try to manage university department websites could just throw in some embed codes to automatically keep their staff pages up to date. Anyone seeing a business model here yet?
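A sketch of what that embed functionality might look like from the profile service’s side is below. The service URL and parameters are entirely hypothetical – the point is just that the output is a fragment anyone can paste into a departmental page.

```python
# A sketch of the YouTube-style embed idea for CV fragments: the profile
# service hands out a snippet that a departmental web page can paste in.
# The service URL and its parameters are entirely hypothetical.
def embed_code(researcher_id: str, section: str = "publications", height: int = 400) -> str:
    src = f"https://research-profiles.example.org/embed/{researcher_id}?section={section}"
    return (f'<iframe src="{src}" width="100%" height="{height}" '
            f'frameborder="0" title="{section} for {researcher_id}"></iframe>')

print(embed_code("cameron-neylon", section="publications"))
```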

But for this to work, the real problem to be solved is the vast majority of researchers for whom this concept is totally alien. How do we get them to be bothered to sign up for something which apparently solves a problem they don’t have? The best approach would be if journals and grant awarding bodies used OpenIDs as identifiers. This would be a dream result but doesn’t seem likely: it would require significant work on changing many existing systems, and frankly, what is in it for them? Well, one answer is that it would provide a mechanism for journals and grant bodies to publicly acknowledge the people who referee for them. An authenticated RSS feed from each journal or funder could be parsed and displayed on each researcher’s home page. The feed would expose a record of how many grants or papers each person has reviewed (probably with some delay to prevent people linking that to the publication of specific papers). Of course such a feed could be used for a lot of other interesting things as well, but none of them will work without a unique person identifier.
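To show how little machinery this needs on the researcher’s side, here is a sketch that aggregates review acknowledgements from a couple of hypothetical journal and funder feeds using the feedparser library. The feed URLs and their structure are invented, and authentication is glossed over.

```python
# A sketch of how a researcher profile could aggregate refereeing activity,
# assuming each journal or funder publishes an (authenticated) feed of
# acknowledgements keyed by the reviewer's OpenID. The feed URLs and the
# one-entry-per-review convention are invented for illustration.
from collections import Counter

import feedparser  # widely used feed-parsing library

REVIEW_FEEDS = [
    "https://journal.example.org/reviews.atom?reviewer=http://cameron.neylon.myopenid.com",
    "https://funder.example.org/panels.atom?member=http://cameron.neylon.myopenid.com",
]

counts = Counter()
for url in REVIEW_FEEDS:
    feed = feedparser.parse(url)
    source = feed.feed.get("title", url)
    counts[source] += len(feed.entries)  # one entry per acknowledged review

for source, n in counts.items():
    print(f"{source}: {n} reviews acknowledged")
```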

I don’t think this is compelling enough in itself, for the moment, but a simpler answer is what was proposed above – just encouraging people to include an OpenID as part of their address. Researchers will bend over backwards to make people happy if they believe those people have an impact on their chances of being published or getting a grant. A little thing could provide a lot of impetus, and that might bring into play the kind of effects that could result from acknowledgement, ultimately making the case that shifting to OpenID as the login system is worth the effort. This would be particularly the case for funders, who really want to be able to aggregate information about the people they fund effectively.

There are many details to think about here. Can I use my own domain name? (Yes, re-directs should be possible.) Will people who use another service be at a disadvantage? (Probably, otherwise any business model won’t really work.) Is there a business model that holds water? (I think there is, but the devil is in the details.) Should it be non-profit, for profit, or run by a respected body? (I would argue that for-profit is possible and should be pursued to make sure the service keeps improving – but then we’re back with a commercial provider.)

There are many good questions that need to be thought through but I think the principle of this could work, and if such an approach is to be successful it needs to get off the ground soon and fast.

Note: I am aware that a number of people are working behind the scenes on components of this and on similar ideas. Some of what is written above is derived from private conversations with these people and as soon as I know that their work has gone public I will add references and citations as appropriate at the bottom of this post. 

Very final countdown to Science Online 09

I should be putting something together for the actual sessions I am notionally involved in helping to run, but this being a very interactive meeting, perhaps it is better to leave things to the very last minute. Currently I am at a hotel at LAX awaiting an early flight tomorrow morning. Daily temperatures in the LA area have been running around 25-30 °C for the past few days, but we’ve been threatened with the potential for well below zero in Chapel Hill. Nonetheless the programme and the people will more than make up for it, I have no doubt. I got to participate in a bit of the meeting last year via streaming video and that was pretty good, but a little limited – not least because I couldn’t really afford to stay up all night, unlike some people who were far more dedicated.

This year I am involved in three sessions (one on Blog Networks, one on Open Notebook Science, and one on Social Networks for Scientists – yes, those three are back to back…) and we will be aiming to be videocasting, live blogging, posting slides, images, and comments; the whole deal. If you’ve got opinions then leave them on the various wiki pages (via the programme) or bring them along to the sessions. We are definitely looking for lively discussion. Two of these are being organised with the inimitable Deepak Singh, who I am very much looking forward to finally meeting in person – along with many others I feel I know quite well but have never met – and others I have met and look forward to catching up with, including Jean-Claude, who has instigated the Open Notebook session.

With luck I will get to the dinner tomorrow night so hope to see some people there. Otherwise I hope to see many in person or online over the weekend. Thanks to Bora and Anton and David for superb organisation (and not a little pestering to make sure I decided to come!)

Reflections on the Open Science workshop at PSB09

In a few hours I will be giving a short presentation to the whole of the PSB conference on the workshop that we ran on Monday. We are still thinking through the details of what has come out of it, and hopefully the discussion will continue in any case, so this is a personal view. The slides for the presentation are available at Slideshare.

To me there were a few key points that came out. Many of these are not surprising but bear repeating:

  • Citation, and improving and expanding the way it is used, lies at the core of making sure that people get credit for the work they do and for making the widest range of useful contributions to the research community
  • Persistence of identity, and persistence of objects (in general the persistence of resources) is absolutely critical to making a wider citation culture work. We must know who generated something and be able to point to it in the long term to deliver on the potential of credits.
  • “If you build it they won’t come” – building a service, whether a technical or a social one, depends on a community that uses and adds value to those services. Build the service for the community and build the community for the service. Don’t solve the problems that you think people have – solve the ones that they tell you they have

The main point for me grew out of the panel session and was perhaps articulated best by Drew Endy: identify specific problems (not ideological issues) and make the process more efficient. Ideology may help to guide us, but it can also blind us to specific issues and hide the underlying reasons for specific successes and failures from view. We have a desperate need for both qualitative data, stories about successes and failures, and quantitative data, hard numbers on uptake and the consequences of uptake of specific practices.

Taking inspiration from Drew’s keynote: we have an evolved system for doing research that is not designed to be easily understood or modified. We need to take an experimental approach to identifying and solving specific problems that would let us increase the efficiency of the research process. Drew’s point was that this should be a proper research discipline in its own right, with the funding and respect that go with it. For the presentation I summarised this as follows:

Improving the research process is an area for (experimental) research that requires the same rigour, standards (and funding) as anything else that we do

Brief running report on the Open Science Workshop at PSB09

Just a very brief rundown of what happened at the workshop this morning and some central themes that came out of it. The slides from the talks are available on Slideshare and recorded video from most of the talks (unfortunately not Dave de Roure’s or Phil Bourne’s at the moment) is available on my Mogulus channel (http://www.mogulus.com/cameron_neylon – click on Video on Demand and select the PSB folder). The commentary from the conference is available in the PSB 2009 Friendfeed room.

For me there were three main themes that came through from the talks and the panel session. The first was one that has come up in many contexts but most recently in Phil Bourne and Lynn Fink’s perspectives article in PLoS Computational Biology: the need for persistent identity tokens to track people’s contributions, and a need to re-think how citation works and what citations are used for.

The second theme was a need for more focus on specific issues, including domain-specific problems or barriers, where “greasing the wheels” could make a direct difference to people’s ability to do their research – solving specific problems that are not necessarily directly associated with “openness” as an ideological movement. Similar ideas were raised in the discussion of tool and service development: the need to build the user into the process of service design and to solve the problems users have, rather than those the developer may think they ought to be worrying about.

But probably the main theme that came through for me was the need to identify and measure real outcomes from adopting more open practice. This was the central theme of Heather’s talk but also came up strongly in the panel session. We have little if any quantitative information on the benefits of open practice, and there are still relatively few complete examples of open research projects. More research and more aggregation of examples will help here, but there is a desperate need for numbers and details to help funders, policy makers, and researchers themselves make informed choices about which approaches are worth adopting, and indeed which are not.

The session was a good conversation, with some great talks, and lots of people involved throughout. Even with a three hour slot we ran 30 minutes over and could have kept talking for quite a bit longer. We will keep posting material over the next few days so please continue the discussion over at Friendfeed and on the workshop website.

Final countdown to Open Science@PSB

As I noted in the last post, we are rapidly counting down towards the final few days before the Open Science Workshop at the Pacific Symposium on Biocomputing. I am flying out from Sydney to Hawaii this afternoon and may or may not have network connectivity in the days leading up to the meeting. So just some quick notes here on where you can find any final information if you are coming or if you want to follow online.

The workshop website is available at psb09openscience.wordpress.com and this is where information will be posted in the leadup to the workshop and links to presentations and any other information posted afterwards.

If you want to follow in closer to real time then there is a Friendfeed room available at friendfeed.com/rooms/psb-2009 which will have breaking information and live blogging during the workshop and throughout the conference. I will be aiming to broadcast video of the workshop at www.mogulus.com/cameron_neylon but this will depend on how well the wireless is working on the day. This will not be the highest priority. Updates on whether it is functioning or not will be in the friendfeed room and I will not be monitoring the chat room on the mogulus feed. If there are technical issues please leave a message in the friendfeed room and I will try to fix the problem or at least say if I can’t.

Otherwise I hope to see many of you at the workshop either in person or online!

New Year’s Resolutions 2009

[Image: Sydney Harbour Bridge NYE fireworks]

All good traditions require someone to make an arbitrary decision to do something again. Last year I threw up a few New Year’s resolutions in the hours before NYE in the UK. Last night I was out on the shore of Sydney Harbour. I had the laptop – I thought about writing something – and then I thought – nah, I can just lie here and look at the pretty lights. However I did want to follow up the successes and failures of last year’s resolutions and maybe make a few more for this year.

So last year’s resolutions were, roughly speaking, 1) to adopt the principles of the NIH Open Access mandate when choosing journals for publications, 2) to get more of the existing data within my group online and available, 3) to take the whole research group fully open notebook, 4) to mention Open Notebooks in every talk I gave, and 5) to attempt to get explicit funding for developing open notebook approaches.

So, successes: the research group at RAL is now (technically) working on an Open Notebook basis. This has taken a lot longer than we expected and the guys are still really getting a feel for what that means, both in terms of how they record things and how they feel about it. I think it will improve over time, and it just reinforces the message that none of this is easy. I also made a point of talking about the Open Notebook approach in every talk I gave – mostly this was well received; often there was some scepticism, but the message is getting out there.

However we didn’t do so well on picking journals – most of the papers I was on this year were driven by other people, or were directed requests for special issues, or both. The papers that I had in mind I still haven’t got written; some drafts exist, but they’re definitely not finished. I also haven’t done any real work on getting older data online – it has been enough work just trying to manage the stuff we already have.

Funding is a mixed bag – the network proposal that went in last New Year was rejected. A few proposals have gone in – more haven’t gone in but exist in draft form – and a group of us came close to getting a tender to do some research into the uptake of Web 2.0 tools in science (more on that later, but Gavin Baker has written about it and our tender document itself is available). The success of the year was the funding that Jean-Claude Bradley obtained from Submeta (as well as support from Aldrich Chemicals and Nature Publishing Group) to support the Open Notebook Science Challenge. I can’t take any credit for this but I think it is a good sign that we may have more luck this coming year.

So for this year – there are some follow ons – and some new ones:

  1. I will re-write the network application (and will be asking for help) and re-submit it to a UK funder
  2. I will clean up the “Personal View of Open Science” series of blog posts and see if I can get it published as a perspectives article in a high ranking journal
  3. I will get some of those damn papers finished – and decide which ones are never going to be written and give up on them. Papers I have full control over will go by first preference to Gold OA journals.
  4. I will pull together the pieces needed to take action on the ideas that came out of the Southampton Open Science workshop, specifically the idea of a letter signed by a wide range of scientists and interested people to a high ranking journal stating the importance of working towards published papers being fully supported by data and methodological detail that is fully available
  5. I will focus on doing fewer things and doing them better – or at least making sure the resources are available to do more of the things I take on…

I think five is enough things to be going on with. Hope you all have a happy new year, whenever it may start, and that it takes you further in the direction you want to go (whether you know what that is now or not) than you thought was possible.

p.s. I noticed in the comments to last year’s post a comment from one Shirley Wu suggesting the idea of running a session at the 2009 Pacific Symposium on Biocomputing – a proposal that resulted in the session we are holding in a few days (again more later on – we hope – streaming video, micro blogging etc). Just thinking about how much has changed in the way such an idea would be raised and explored in the last twelve months is food for thought.

The failure of online communication tools

Coming from me that may sound a strange title, but while I am very positive about the potential for online tools to improve the way we communicate science, I sometimes despair about the irritating little barriers that constantly prevent us from starting to achieve what we might. Today I had a good example of that.

Currently I am in Sydney, a city where many old, and some not so old, friends live. I am a bit rushed for time so decided the best way to catch up was to propose a date, send out a broadcast message to all the relevant people, and then sort out the minor details of where and exactly when to meet up. Easy, right? After all, tools like Friendfeed and Facebook provide good broadcast functionality. Except of course, as many of these are old friends, they are not on Friendfeed. But that’s ok, because many of them are on Facebook. Except some of them are not old friends, or are not people I have yet found on Facebook; but that’s ok, they’re on Friendfeed, so I just need to send two messages. Oh, except there are some people who aren’t on Facebook or Friendfeed, so I need to email them – but they don’t all know each other so I shouldn’t send their email addresses in the clear. That’s ok, that’s what bcc is for. Oh, but this email address is about five years old… is it still correct?

So – I end up sending messages via three independent channels: one via Friendfeed, three via Facebook (one status message, one direct message, and another direct message to the person I had found but hadn’t yet friended), and one via email (some unfortunate people got all three – and it turns out they have to do their laundry anyway). It almost came down to trying some old mobile numbers to send out texts. Twitter (which I don’t use very much) wouldn’t have helped either. But that’s not so bad – it only took me ten minutes to cut and paste and get them all sent. They seem to be getting through to people as well, which is good.

Except now I am getting back responses via email, via Facebook, and at some point via Friendfeed as well, no doubt. All of which are inaccessible to me when I am out and about anyway, because I’m not prepared to pay the swingeing rates for roaming data.

What should happen is this: I have a collection of people; I choose to send them a message, whether private or broadcast; and they choose how to receive that message and how to prioritise it. They then reply to me, and I see all their responses nicely aggregated, because they are all related to my one query. As this query was time dependent I would have prioritised responses, so perhaps I would receive them by text or direct to my mobile in some other form. The point is that each person controls the way they receive information from different streams and is in control of the way they deal with it.
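As a sketch of what that routing layer might look like, the snippet below lets each (invented) recipient declare how they want to receive time-critical versus routine messages, and the sender simply addresses people rather than services. It is a toy, of course – the hard part is the channel APIs it waves its hands at.

```python
# A sketch of the routing layer described above: the sender addresses
# people, and each recipient's own preferences decide which channel a
# message actually arrives on. Names, channels, and preferences invented.
PREFERENCES = {
    "alice": {"time_critical": "sms", "default": "email"},
    "bob": {"time_critical": "email", "default": "friendfeed"},
}

def route(recipients, message, time_critical=False):
    kind = "time_critical" if time_critical else "default"
    for person in recipients:
        channel = PREFERENCES.get(person, {}).get(kind, "email")
        # In a real service this would hand off to the channel's API.
        print(f"-> {person} via {channel}: {message}")

route(["alice", "bob"], "Drinks on Thursday at 7?", time_critical=True)
```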

It’s not just filter failure which is creating the impression of information overload. The tools we are using, their incompatibility, and the cost of transferring items from one stream to another are also contributing to the problem. The web is designed to be sticky because the web is designed to sell advertising. Every me-too site wants to hold onto its users and communities, and my community – the specific community that I want to meet up with for a drink – is split across multiple services. I don’t have a solution to the business model problem; I just want services with proper APIs that let other people build services that get all of my streams into one place. I hope someone comes up with a business model – but I also have to accept that maybe I just need to pay for it.

The people you meet on the train…

Yesterday on the train I had a most remarkable experience of synchronicity. I had been at the RIN workshop on the costs of scholarly publishing (more on that later) in London and was heading off to Oxford for a group dinner. On the train I was looking for a seat with a desk and took one up opposite a guy with a slightly battered looking Mac laptop. As I pulled out my new MacBook (13”, 2.4 GHz, 4 GB memory, since you ask) he leaned across to have a good look, as you do, and we struck up a conversation. He asked what I did and I talked a little about being a scientist and my role at work. He was a consultant who worked on systems integration.

At some stage he made a throwaway comment about the fact that he had been going back to learn or re-learn some fairly advanced statistics and that he had had a lot of trouble getting access to some academic papers; certainly he didn’t want to pay for them, but he had managed to find free versions of what he wanted online. I managed to keep my mouth somewhat shut at this point, except to say I had been at a workshop looking at these issues. However it gets better, much better. He was looking into quantitative risk issues, and this led into a discussion about the problems of how science, and particularly medicine, reporting in the media doesn’t provide links back to the original research (which is generally not accessible anyway) and that, what is worse, the original data is usually not available (and this was all unprompted by me, honestly!). To paraphrase his comment: “the trouble with science is that I can’t get at the numbers behind the headlines; what is the sample size, how was the trial run…” Well, at this point all thought of getting any work done went out the window and we had a great discussion about data availability and the challenges of recording it in the right form (his systems integration work includes efforts to deal with mining of large, badly organised data sets), drifted into identity management and trust networks, and had a great deal of fun.

What do I take from this? That there is a demand for this kind of information and data from an educated and knowledgeable public. One of the questions he asked was whether, as a scientist, I ever see much in the way of demand from the public. My response was that, aside from pushing taxpayer access to taxpayer funded research myself, I hadn’t seen much evidence of real demand. His argument was that there is a huge nascent demand there from people who haven’t thought about their need to get into the detail of news stories that affect them. People want the detail; they just have no idea of how to go about getting it. Spread the idea that access to that detail is a right and we will see the demand for access to the outputs of research grow rapidly. The idea that “no-one out there is interested or competent to understand the details” is simply not true. The more respect we have for the people who fund our research, the better frankly.