Tweeting the lab

I've been interested for some time in capturing information, and the context in which that information is created, in the lab. The question of how to build an efficient and useable laboratory recording system is fundamentally one of how much information is necessary to record, and how much of that can be recorded while bothering the researcher as little as possible.

The Beyond the PDF mailing list has, since the meeting a few weeks ago, been partly focused on attempts to analyse human-written text and to annotate it with structured assertions, or nanopublications. This is also the approach that many Electronic Lab Notebook systems take, capturing an electronic version of the paper notebook and in some cases trying to capture all the information in it in a structured form. I can't help but feel that, while this is important, it's almost precisely backwards. By definition any summary of a written text will throw away information; the only question is how much. Rather than trying to capture arbitrary and complex assertions in written text, it seems better to me to ask what simple vocabulary can be provided that expresses enough of what people want to say to be useful.

In classic 80/20 style we ask: what is useful enough to interest researchers, how much would we lose, and what would that loss be? This neatly sidesteps the questions of truth (though not of likelihood) and context that are the real challenge of structuring human-authored text via annotation, because the limited vocabulary and the collection of structured statements made provide an explicit context.

This kind of approach turns out to work quite well in the lab. In our blog-based notebook we use a one item-one post approach where every research artifact gets its own URL. The verbs (the procedures) and the nouns (the data and materials) all have unique identifiers, and the relationships between verbs and nouns are provided by simple links. Thus the structured vocabulary of the lab notebook is [Material] was input to [Process] which generated [Data] (where Material and Data can be interchanged depending on the process). This is not so much 80/20 as 30/70, but even in this very basic form it can be quite useful. Along with records of who did something and when, and some basic tagging, this makes quite an effective lab notebook system.
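
To make that structure concrete, here's a minimal sketch in Python (the URLs are made up; this is an illustration of the model, not our actual notebook code):

```python
# One item-one post: every material, process, and data file gets its own URL.
posts = {
    "http://example.org/notebook/material/42": {"type": "material", "title": "Buffer A"},
    "http://example.org/notebook/process/7":   {"type": "process",  "title": "Gel filtration run"},
    "http://example.org/notebook/data/13":     {"type": "data",     "title": "Chromatogram"},
}

# The links carry the whole vocabulary:
# [Material] was input to [Process] which generated [Data].
links = [
    ("http://example.org/notebook/material/42", "was_input_to", "http://example.org/notebook/process/7"),
    ("http://example.org/notebook/process/7",   "generated",    "http://example.org/notebook/data/13"),
]

for subject, predicate, obj in links:
    print(posts[subject]["title"], predicate, posts[obj]["title"])
```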

The question is, how can we move beyond this to create a record which is rich enough to provide a real step up, but doesn't bother the user any more than is necessary and justified by the extra functionality they're getting? In fact, ideally we'd capture a richer and more useful record while bothering the user less. Part of the solution lies in the work that Jeremy Frey's group have done with blogging instruments. By having an instrument create a record of its state, inputs, and outputs, the user is freed to focus on what they're doing, and only needs to link into that record when they start to do their analysis.

Another route is the approach that Peter Murray-Rust's group are exploring with interactive lab equipment, particularly a fume cupboard that can record spoken instructions and comments and track where objects are, monitoring an entire process in detail. The challenge in this approach lies in translating that information into something that is easy to use downstream. Audio and video remain difficult to search and work with, and speech recognition doesn't yet deliver well-formatted, clearly presented records.

In the spirit of a limited vocabulary, another approach is to use a lightweight infrastructure to record short comments, either structured or free text. A bakery in London has a switch on its wall which can be turned to one of a small number of baked goods as a batch goes into the oven. This is connected to a very basic twitter client that then tells the world that there are fresh-baked baguettes coming in about twenty minutes. Because this output data is structured it would in principle be possible to track the different baking times, and the preference for muffins versus doughnuts, over the day and over the year.

The lab is slightly more complex than a bakery. Different processes take different inputs. Our hypothetical structured vocabulary would need to enable the construction of sentences with subjects, predicates, and objects, but as we've learnt with the lab notebook, even the simple predicates "is input to" and "is output of" can be very useful. "I am doing X", where X is one of a relatively small set of options, provides real-time bounds on when important events happened. A little more sophistication could go a long way. A very simple twitter client that provided a relatively small range of structured statements could be very useful, and these statements could be processed downstream into a more directly useable record.
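
As a rough sketch of what such a client might do (the vocabulary, the sample naming scheme, and the function are all made up for illustration, and the actual posting to Twitter is left out):

```python
from datetime import datetime, timezone

# The controlled vocabulary: a handful of verbs is enough to put time
# bounds on the important events.
VERBS = {"is_input_to", "is_output_of", "started", "finished"}

def lab_tweet(subject, verb, obj=None):
    """Compose a structured, time-stamped lab statement, well under 140 characters."""
    if verb not in VERBS:
        raise ValueError(f"unknown verb: {verb}")
    stamp = datetime.now(timezone.utc).strftime("%H:%M")
    parts = ["#tweetthelab", stamp, f"[{subject}]", verb]
    if obj:
        parts.append(f"[{obj}]")
    return " ".join(parts)

print(lab_tweet("sample:S23", "is_input_to", "process:PCR-04"))
# e.g. "#tweetthelab 14:02 [sample:S23] is_input_to [process:PCR-04]"
```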

Last week I recorded the steps that I carried out in the lab via the hashtag #tweetthelab. These free-text tweets make a serviceable, if not perfect, record of the day's work. What is missing is a URI for each sample and output data file, and links between the inputs, the processes, and the outputs. But this wouldn't be too hard to generate, particularly if the instruments themselves were actually blogging or tweeting their outputs. A simple client on a tablet, phone, or locally placed computer would make it easy both to capture and to structure the lab record. There is still a need for free-text comments, and any structured description will not be able to capture everything, but the potential for capturing a lot of the detail of what is happening in a lab, as it happens, is significant. And it's the detail that often isn't recorded terribly well: the little bits and pieces of exactly when something was done, what the balance really read, which particular bottle of chemical was picked up.
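
And the downstream processing needn't be complicated either. A matching sketch, assuming the same made-up statement format as above:

```python
import re

# Pull subject, predicate, and object back out of a structured #tweetthelab
# statement so the day's tweets can be assembled into a linked record.
PATTERN = re.compile(r"\[([^\]]+)\]\s+(\S+)(?:\s+\[([^\]]+)\])?")

def parse_lab_tweet(text):
    match = PATTERN.search(text)
    if not match:
        return None  # a free-text comment: keep it, but it carries no link
    subject, verb, obj = match.groups()
    return (subject, verb, obj)

print(parse_lab_tweet("#tweetthelab 14:02 [sample:S23] is_input_to [process:PCR-04]"))
# -> ('sample:S23', 'is_input_to', 'process:PCR-04')
```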

Twitter is often derided as trivial, as lowering the barrier to shouting banal fragments to the world, but in the lab we need tools that will help us collect, aggregate and structure exactly those banal pieces so that we have them when we need them. Add a little bit of structure to that, but not too much, and we could have a winner. Starting from human discourse always seemed too hard for me, but starting with identifying the simplest things we can say that are also useful to the scientist on the ground seems like a viable route forward.

A little bit of federated Open Notebook Science

Jean-Claude Bradley is the master when it comes to organising collaborations around diverse sets of online tools. The UsefulChem and Open Notebook Science Challenge projects both revolved around the use of wikis, blogs, GoogleDocs, video, ChemSpider and whatever tools are appropriate for the job at hand. This is something that has grown up over time but is at least partially formally organised. At some level the tools that get used are the ones Jean-Claude decides will be used and it is in part his uncompromising attitude to how the project works (if you want to be involved you interact on the project’s terms) that makes this work effectively.

At the other end of the spectrum is the small scale, perhaps random collaboration that springs up online, generates some data and continues (or not) towards something a little more organised. By definition such “projectlets” will be distributed across multiple services, perhaps uncoordinated, and certainly opportunistic. Just such a project has popped up over the past week or so and I wanted to document it here.

I have for some time been very interested in the potential of visualising my online lab notebook as a graph. The way I organise the notebook is such that it, at least in a sense, automatically generates linked data, and for me this is an important part of its potential power as an approach. I often use a very old graph visualisation in talks I give about the notebook as a way of indicating that potential, which I wrote about previously, but we've not really taken it any further than that.

A week or so ago, Tony Hirst (@psychemedia) left a comment on a blog post which sparked a conversation about feeds and their use for generating useful information. I pointed Tony at the feeds from my lab notebook but didn’t take it any further than that. Following this he posted a series of graph visualisations of the connections between people tweeting at a set of conferences and then the penny dropped for me…sparking this conversation on twitter.

@psychemedia You asked about data to visualise. I should have thought about our lab notebook internal links! What formats are useful? [link]

@CameronNeylon if the links are easily scrapeable, it’s easy enough to plot the graph eg http://blog.ouseful.info/2010/08/30/the-structure-of-ouseful-info/ [link]

@psychemedia Wouldn’t be too hard to scrape (http://biolab.isis.rl.ac.uk/camerons_labblog) but could possibly get as rdf or xml if it helps? [link]

@CameronNeylon structured format would be helpful… [link]

At this point the only part of the whole process that isn't publicly available takes place, as I send an email to find out how to get an XML download of my blog and then report back via Twitter.

@psychemedia Ok. XML dump at http://biolab.isis.rl.ac.uk/camerons_labblog/index.xml but I will try to hack some Python together to pull the right links out [link]

Tony suggests I pull out the date and I respond that I will try to get the relevant information into some sort of JSON format, and that I'll try to do that over the weekend. Friday afternoons being what they are, and Python being what it is, I actually manage to do this much quicker than I expected, and so I tweet that I've made the formatted data, raw data, and script publicly available via DropBox. Of course this is only possible because Tony tweeted the link above to his own blog describing how to pull out and format data for Gephi, and it was easy for me to adapt his code to my own needs: an open source win if ever there was one.
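
For illustration, here is a simplified sketch of the kind of script involved (this is not Tony's gist or my exact code, and it assumes the dump is RSS-like, with one <item> per post) that writes an edge list Gephi can import:

```python
import csv
import re

# A naive pass over the notebook's XML dump: every occurrence of an internal
# notebook URL inside a post is treated as an edge from that post.
BASE = "http://biolab.isis.rl.ac.uk/camerons_labblog"
INTERNAL = re.compile(re.escape(BASE) + r'[^\s"<&]*')

with open("index.xml", encoding="utf-8") as f:
    dump = f.read()

# Assumes an RSS-like dump with one <item> per post, each carrying a <link>.
posts = dump.split("<item>")[1:]

with open("edges.csv", "w", newline="", encoding="utf-8") as out:
    writer = csv.writer(out)
    writer.writerow(["Source", "Target"])  # the header Gephi's importer expects
    for post in posts:
        link = re.search(r"<link>([^<]+)</link>", post)
        if not link:
            continue
        source = link.group(1)
        for target in set(INTERNAL.findall(post)):
            if target != source:
                writer.writerow([source, target])
```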

Despite the fact that Tony took time out to put the kettle on and have dinner, and I went to a rehearsal, by the time I went to bed on Friday night Tony had improved the script and made it available (with revisions) via a Gist, identified some problems with the data, and posted an initial visualisation. On Saturday morning I transfer Tony's alterations into my own code, set up a local Git repository, push to a new Github repository, and run the script over the XML dump as is (results pushed to Github). I then "fix" the raw data by manually removing the result of a SQL injection attack – note that because I commit and push to the remote repository I get data versioning for free, so this "fixing" is transparent and recorded. Then I re-run the script, pushing again to Github. I've just now updated the script and committed once more following further suggestions from Tony.

So over a couple of days we used Twitter for communication; DropBox, GitHub, Gists, and Flickr for sharing data and code; and the whole process was carried out publicly. I wouldn't have even thought to ask Tony about this if he hadn't been publicly posting his visualisations (indeed I remember, but can't find, an ironic tweet from Tony a few weeks back about how it would clearly be much better to publish in a journal in 18 months' time when no-one could even remember what the conference he was analysing was about…).

So another win for open approaches. Again, something small, something relatively simple, but something that came together because people were easily connected in a public space and were routinely sharing research outputs, something that by default spread into the way we conducted the project. It never occurred to me at the time, I was just reaching for the easiest tool at each stage, but at every stage every aspect of this was carried out in the open. It was just the easiest and most effective way to do it.

Why the web of data needs to be social

If you’ve been around either myself or Deepak Singh you will almost certainly have heard the Jeff Jonas/Jon Udell soundbite: ‘Data finds data. Then people find people’. Jonas is referring to data management frameworks and knowledge discovery and Udell is referring to the power of integrated data to bring people together.

At some level Jonas’ vision (see his chapter[pdf] in Beautiful Data) is what the semantic web ought to enable, the automated discovery of data or objects based on common patterns or characteristics. Thus far in practical terms we have signally failed to make this a reality, particularly for research data and objects.

Udell's angle (or rather, my interpretation of his overall stance) is more linked to the social web – the discovery of common contexts through shared data frameworks. These contexts might be social groups, as in conventional social networks, a particular interest or passion, or – in the case of Jon's championing of the iCalendar standard – a date and place, as demonstrated by the elmcity project supporting calendar curation and aggregation. Shared context enables the making of new connections, the creation of new links. But still mainly links between people.

It’s not the scientists who are social; it’s the data – Neil Saunders

The naïve analysis of the success of consumer social networks and the weaknesses of science communication has led to efforts that almost precisely invert the Jonas/Udell concept. In the case of most of these "Facebooks for Scientists" the idea is that people find people, and then they connect with data through those people.

My belief is that it is this approach that has led to the almost complete failure of these networks to gain traction. Services that place the research object at the centre – the reference management and bookmarking services, and to some extent Twitter and Friendfeed – appear to gain much more real scientific use because they mediate the interactions that researchers are interested in, those between themselves and research objects. Friendfeed in particular seems to support this discovery pattern. Objects of interest are brought into your stream, which then leads to discovery of the person behind them. I often use Citeulike in this mode. I find a paper of interest, then identify the tags other people have used for it and the papers that share those tags. If these seem promising, I might then look at the library of the person, but I get to that person through the shared context of the research object: the paper, and the tags around that object.

Data, data everywhere, but not a lot of links – Simon Coles

A common complaint made of research data is that people don't make it available. This is part of the problem, but increasingly it is a smaller part. It is easy enough to put data up that many researchers are already doing so: in the supplementary data of journal articles, on personal websites, or on community or consumer sites. From a linked data perspective we ought to be having a field day with this, even if it represents only a small proportion of the total. However, little of this data is easily discoverable and most of it is certainly not linked in any meaningful way.

A fundamental problem that I feel like I've been banging on about for years now is the dearth of well-built tools for creating these links. Finally these tools are starting to appear, with Freebase Gridworks being an early example. There is a good chance that it will become easier over time for people to create links as part of the process of making their own record. But the fundamental problems we always face – that this is hard work, and often unrewarded work – are limiting progress.

Data friends data…then knowledge becomes discoverable

Human interaction is unlikely to work at scale. We are going to need automated systems to wire the web of data together. The human process simply cannot keep up with the ongoing annotation and connection of data at the volumes that are being generated today. And we can’t afford not to if we want to optimize the opportunities of research to deliver useful outcomes.

When we think about social networks we always place people at their centre. But there is nothing to stop us replacing people with data or other research objects. Software that wants to find data, data that wants to find complementary or supportive data, or wants to find the right software to convert or analyze it. Instead of Farmville or Mafia Wars imagine useful tools that make these connections, negotiate content, and identify common context. As pointed out to me by Paul Walk this is very similar to what was envisioned in the 90s as the role of software agents. In this view the human research users are the poorly connected users on the outskirts of the web.

The point is that the hard part of creating linked data is making the links, not publishing the data. The semantic web has always suffered from the chicken and egg problem of a lack of user-friendly ways to generate RDF and few tools that could really use that RDF in exciting ways even if it did exist. I still can’t do a useful search on which restaurants in Bath will be open next Sunday. The reality is that the innards of this should be hidden from the user, the making of connections needs to be automated as far as possible, and as natural as possible when the user has to be involved. As easy as hitting that “like” button, or right clicking and adding a citation.
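
To make that concrete, here's a minimal sketch using rdflib, with entirely hypothetical identifiers and dcterms:references standing in as one plausible predicate, of the single triple a "cite this data" button would need to create behind the scenes:

```python
from rdflib import Graph, URIRef
from rdflib.namespace import DCTERMS

g = Graph()

# Entirely hypothetical identifiers: a deposited dataset and the paper it supports.
dataset = URIRef("http://example.org/data/gel-images-2010-04")
paper = URIRef("http://dx.doi.org/10.1000/example")

# The link is the valuable part. A tool should add this triple as a side
# effect of something simple, like hitting a "cite this data" button.
g.add((dataset, DCTERMS.references, paper))

print(g.serialize(format="turtle"))
```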

We have learnt a lot about the principles of when and how social networks work. If we can apply those lessons to the construction of open data management and discovery frameworks then we may stand some chance of actually making some of the original vision of the web work.

The personal and the institutional

A number of things recently have led me to reflect on the nature of interactions between social media, research organisations, and the wider community. There has been an awful lot written about the effective use of social media by organisations and the risks involved in trusting staff and members of an organisation to engage productively and positively with a wider audience. Above all there seems a real focus on the potential for people to embarrass the organisation. Relatively little focus is applied to the ability of the organisation to embarrass its staff, but that is perhaps a subject for another post.

In the area of academic research this takes on a whole new hue due to the presence of a strong principle and community expectation of free speech: the principle of "academic freedom". No-one really knows what academic freedom is. It's one of those things that people can't define but will be very clear about when it has been taken away. In general terms it is the expectation that a tenured academic has earnt the right to speak their opinion, regardless of how controversial. We can accept there are some bounds on this, of ethics, taste, and legality – racism would generally be regarded as unacceptable – while noting that the boundary between what is socially unacceptable and what is a validly held and supported academic opinion is both elastic and almost impossible to define. Try expressing the opinion, for example, that there might be a biological basis to the difference between men's and women's average scores on a specific maths test. These grey areas, looking at how the academy (or academies) censors itself, are interesting but aren't directly relevant to this post. Here I am more interested in how institutions censor their staff.

Organisations always seek to control the messages they release to the wider community. The first priority of any organisation or institution is its own survival. This is not necessarily a bad thing – presumably the institution exists because it is (or at least was) the most effective way of delivering a specific mission, and if it ceases to exist, that mission can't be delivered. Controlling the message is a means of controlling others' reactions and hence the future. Research institutions have always struggled with this: the corporate centre sending one message of clear vision, high standards, and continuous positive development, while the academics mutter in the privacy of their own coffee rooms about creeping bureaucracy, lack of resources, and falling standards.

There is fault on both sides here. Research administration and support only very rarely puts the needs and resources of academics at its centre. Time and time again the layers of bureaucracy mean that what may or may not have been a good idea gets buried in a new set of unconnected paperwork, that more administration is required, taking resources away from frontline activities, and that target setting results in target meeting, but at the cost of what was important in the first place. There is usually a fundamental lack of understanding of what researchers do and what motivates them.

On the other side, academics are arrogant and self-absorbed, rarely interested in contributing to the solution of larger problems. They fail to understand, or take any interest in, the corporate obligations of the organisations that support them, and will only rarely cooperate and compromise to find solutions to problems. Worse than this, academics build social and reward structures that encourage this kind of behaviour: promoting individual achievement rather than that of teams, penalising people for accepting compromises, and rarely rewarding the key positive contribution of effective communication and problem solving between the academic side and administration.

What the first decade of the social web has taught us is that organisations that effectively harness the goodwill of their staff or members using social media tools do well. Organisations that effectively use Twitter or Facebook enable and encourage their staff to take the shared organisational values out to the wider public. Enable your staff to take responsibility and respond rapidly to issues, make it easy to identify the right person to engage with a specific issue, and admit (and fix) mistakes early and often: that is the advice you will get from any social media consultant. Bring the right expert attention to bear on a problem and solve it collaboratively, whether it's internal or with a customer. This is simply another variation on Michael Nielsen's writing on markets in expert attention – the organisations that build effective internal markets and apply the added value to improving their offering will win.

This approach is antithetical to traditional command and control management structures. It implies a fluidity and a lack of direct control over people's time. It also requires that there be slack in the system, something that doesn't sit well with efficiency drives. In its extreme form it removes the need for the organisation to formally exist, allowing free agents to interact fluidly in a market for their time. What it does do, though, is map very well onto a rather traditional view of how the academy is "managed". Academics provide a limited resource, their time, and apply it to a large extent in a way determined by what they think is important. Management structures are in practice fairly flat (and used to be much more so) and interactions are driven more by interests and personal whim than by widely accepted corporate objectives. Research organisations, and perhaps by extension those commercial interests that interact most directly with them, should be ideally suited to harness the power of the social web, first to solve their internal problems and second to interact more effectively with their customers and stakeholders.

Why doesn't this happen? A variety of reasons, some of them the usual suspects: a lack of adoption of new tools by academics, appalling IT procurement procedures and poor standards of software development, a simple lack of time to develop new approaches, and a real lack of appreciation of the value that diversity of contributions can bring to a successful department and organisation. The biggest one, though, I suspect is a lack of goodwill between administrations and academics. Academics will not adopt any tool en masse across a department, let alone an organisation, because they are naturally suspicious of the agenda and competence of those choosing the tools. And the diversity of tools they choose on their own means that none has critical mass within the organisation – few academic institutions had a useful global calendar system until very recently. Administrations don't trust the herd of cats that make up their academic staff to engage productively with the problems they have, and see the need for a technical solution with a critical mass of users, which therefore involves a central decision.

The problems of both diversity and lack of critical mass are a solid indication that the social web has some way to mature – these conversations should occur effectively across different tools and frameworks – and uptake at research institutions should (although it may seem paradoxical) be expected to be much slower than in more top-down, managed organisations, or at least organisations with a shared focus. But it strikes me that the institutions that get this right, and they won't be the traditional top institutions, will very rapidly accrue a serious advantage, both in terms of freeing up staff time to focus on core activities and in releasing real monetary resource to support those activities. If the social side works, then the resource will also go to the right place. Watch for academic institutions trying to bring strong social media experience into senior management. It will be a very interesting story to follow.

Friendfeed for Research? First impressions of ScienceFeed

I have been saying for quite some time that I think Friendfeed offers a unique combination of functionality that works well for scientists, researchers, and the people they want to (or should want to) have conversations with. For me the core of this functionality lies in two places. First, the system explicitly supports conversations that centre around objects. This is different to Twitter, which supports conversations but doesn't centre them around the object – it is actually not trivial to find all the tweets about a given paper, for instance. Facebook now has similar functionality, but it is much more often used for pure conversation. Facebook is a tool mainly used for person-to-person interactions; it is user- or person-centric. Friendfeed, at least as it is used in my space, is object-centric, and this is the key aspect in which "social networks for science" need to differ from the consumer offerings, in my opinion. This idea can trace a fairly direct lineage via Deepak Singh to the Jeff Jonas/Jon Udell concatenation of soundbites:

“Data finds data…then people find people”

The second key aspect of Friendfeed is that it gives the user a great deal of control over what they present to represent themselves. If we accept the idea that researchers want to interact with other researchers around research objects, then it follows that the objects you choose to represent yourself with are crucial to creating your online persona. I choose not to push Twitter into Friendfeed mainly because my tweets are directed at a somewhat different audience. I do choose to bring in video, slides, blog posts, papers, and other aspects of my work life. Others might choose to include Flickr but not YouTube. Flexibility is key because you are building an online presence. Most of the frustration I see with online social tools and their use by researchers centres on a lack of control over which content goes where and when.

So as an advocate of Friendfeed as a template for tools for scientists it is very interesting to see how that template might be applied to tools built with researchers in mind. ScienceFeed was launched yesterday by Ijad Madisch, the person behind ResearchGate. The first thing to say is that this is an out-and-out clone of Friendfeed, from the position of the buttons to the overall layout. It seems not to be built on the Tornado server that was open-sourced by the Friendfeed team, so questions may hang over scalability and architecture, but that remains to be tested. The main UI difference from Friendfeed is that the influence of another 18 months of development of social infrastructure is evident in the use of OAuth to rapidly leverage existing networks and information on Friendfeed, Twitter, and Facebook. Although it still requires some profile setup, this is good to see. It falls short of the kind of true federation which we might hope to see in the future, but then so does everything else.

In terms of specific functionality for scientists the main addition is a specialised tool for adding content via a search of literature databases. This seems to be adapted from the ResearchGate tool for populating a profile's publication list. A welcome addition, and certainly real tools for researchers must treat publications as first-class objects. But not groundbreaking.

The real limitation of ScienceFeed is that it seems to miss the point of what Friendfeed is about. There is currently no mechanism for bringing in and aggregating diverse streams of content automatically. It is nice to be able to manually share items in my citeulike library, but this needs to happen automatically. My blog posts need to come in, as do my slideshows on slideshare and my preprints on Nature Precedings or Arxiv. Most of this information is accessible via RSS feeds, so import via RSS/Atom (and in the future real-time protocols like XMPP) is an absolute requirement. Without this functionality, ScienceFeed is just a souped-up microblogging service. And as was pointed out yesterday in one friendfeed thread, we have a twitter-like service for scientists. It's called Twitter. With the functionality of automatic feed aggregation Friendfeed can become a presentation of yourself as a researcher on the web: an automated publication list that is always up to date and always contains your latest (public) thoughts, ideas, and content. In short, your web-native business card and CV all rolled into one.
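
For illustration, a minimal sketch of that aggregation step using the feedparser library (the feed URLs are made up), merging several feeds into one newest-first stream:

```python
import time
import feedparser

# Hypothetical feeds a researcher might register once; the service would
# then poll them and merge everything into a single stream.
feeds = [
    "http://example.org/myblog/feed",
    "http://example.org/citeulike-library.rss",
    "http://example.org/slideshare.rss",
]

stream = []
for url in feeds:
    for entry in feedparser.parse(url).entries:
        stream.append({
            "title": entry.get("title", ""),
            "link": entry.get("link", ""),
            "published": entry.get("published_parsed") or time.gmtime(0),
        })

# Newest first, which is how a lifestream presents itself.
stream.sort(key=lambda item: item["published"], reverse=True)
for item in stream[:20]:
    print(time.strftime("%Y-%m-%d", item["published"]), item["title"], item["link"])
```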

Finally there is the problem of the name. I was very careful at the top of this post to be inclusive in the scope of people who I think can benefit from Friendfeed. One of the great strengths of Friendfeed is that it has promoted conversations across boundaries that are traditionally very hard to bridge. The ongoing collision between the library and scientific communities on Friendfeed may rank one day as its most important achievement, at least in the research space. I wonder whether the conversations that have sparked there would have happened at all without the open scope that allowed communities to form without prejudice as to where they came from and then to find each other and mingle. There is nothing in ScienceFeed that precludes anyone from joining as far as I can see, but the name is potentially exclusionary, and I think unfortunate.

Overall I think ScienceFeed is a good discussion point, a foil to critical thinking, and potentially a valuable fall back position if Friendfeed does go under. It is a place where the wider research community could have a stronger voice about development direction and an opportunity to argue more effectively for business models that can provide confidence in a long term future. I think it currently falls far short of being a useful tool but there is the potential to use it as a spur to build something better. That might be ScienceFeed v2 or it might be an entirely different service. In a follow-up post I will make some suggestions about what such a service might look like but for now I’d be interested in what other people think.

Other Friendfeed threads are here and here and Techcrunch has also written up the launch.

The trouble with business models (Facebook buys Friendfeed)

…is that someone needs to make money out of them. It was inevitable at some point that Friendfeed would take a route that led it towards mass adoption and away from the needs of the (rather small) community of researchers who have found a niche that works well for them. I had thought it more likely that Friendfeed would gradually move away from the aspects that researchers found attractive rather than being absorbed wholesale by a bigger player, but then I don't know much about how Silicon Valley really works. It appears that Friendfeed will continue in its current form as the two companies work out how they might integrate the functionality into Facebook, but in the long term it seems unlikely that the current service will survive. In a sense the sudden break may be a good thing, because it forces some of the issues about providing this kind of research infrastructure out into the open in a way a gradual shift probably wouldn't.

What is it about Friendfeed that makes it particularly attractive to researchers? I think there are a few things, based more on hunches than hard data, but in comparison with services like Twitter and Facebook the following stand out.

  1. Conversations are about objects. At the core of the way Friendfeed works are digital objects – images, blog posts, quotes, thoughts – being pushed into a shared space. Most other services focus on the people and the connections between them. Friendfeed (at least the way I use it) is about the objects and the conversations around them.
  2. Conversation is threaded and aggregated. This is where Twitter loses out. It is almost impossible to track a specific conversation via Twitter unless you do so in real time. The threaded nature of FF makes it possible to track conversations days or months after they happen (as long as you can actually get into them)
  3. Excellent “person discovery” mechanisms. The core functionality of Friendfeed means that you discover people who “like” and comment on things that either you, or your friends like and comment on. Friendfeed remains one of the most successful services I know of at exploiting this “friend of a friend” effect in a useful way.
  4. The community. There is a specific community, with a strong information technology, information management, and bioinformatics/structural biology emphasis, that grew up and aggregated on Friendfeed. That community has immense value and it would be sad to lose it in any transition.

So what can be done? One option is to sit back and wait to be absorbed into Facebook. This seems unlikely to be either feasible or popular. Many people in the FF research community don't want this, for reasons ranging from concerns about privacy, through the fundamentals of how Facebook works, to just not wanting to mix work and leisure contacts. All reasonable, and all things I agree with.

We could build our own. Technically feasible, but probably not financially. Let's assume a core group of say 1,000 people (probably overoptimistic), each prepared to pay maybe $25 a year subscription as well as do some maintenance or coding work. That's still only $25k, not enough to pay a single person to keep a service running, let alone actually build something from scratch. Might the FF team make some of the codebase open source? Obviously not what they're taking to Facebook, but maybe an earlier version? That would help, but there would still need to be either a higher subscription or many more subscribers to keep it running, I suspect. Chalk one up for the importance of open source services though.

Reaggregating around other services and distributing the functionality would perhaps be feasible: a combination of Google Reader and Twitter with services like Tumblr, Posterous, and StoryTlr? The community would be likely to diffuse, but such a distributed approach could be more stable and less susceptible to exactly this kind of buyout. Nonetheless these are all commercial services that can easily disappear. Google Wave has been suggested as a solution, but I think it has fundamental differences in design that make it at best a partial replacement. And it would still require a lot of work.

There is a huge opportunity for existing players in the Research web space to make a play here. NPG, Research Gate, and Seed, as well as other publishers or research funders and infrastructure providers (you know who you are) could fill this gap if they had the resource to build something. Friendfeed is far from perfect, the barrier to entry is quite high for most people, the different effective usage patterns are unclear for new users. Building something that really works for researchers is a big opportunity but it would still need a business model.

What is clear is that there is a significant community of researchers now looking for somewhere to go. People with a real critical eye for the best services and functionality, and people who may even be prepared to pay something towards it. And who will actively contribute to help guide design decisions and make it work. Build it right and we may just come.

“Real Time”: The next big thing or a pointer to a much more interesting problem?

There has been a lot written and said recently about the "real time" web, most recently in an interview with Paul Buchheit on ReadWriteWeb. The premise is that if items and conversations are carried on in "real time" then they are more efficient and more engaging. The counter-argument has been that they become more trivial: that by dropping the barrier to involvement to near zero, the internal editorial process that forces each user to think a little about what they are saying is lost, generating a stream of drivel. I have to admit upfront that I really don't get the excitement. It isn't clear to me that the difference between a five or ten second refresh rate and a 30 second one is significant.

In one sense I am all for getting a more complete record onto the web, at least if there is some probability of it being archived. After all, this is what we are trying to do with the laboratory recording effort: create as complete a record on the web as possible. But at some point there is always going to be an editorial process. In a blog it takes some effort to write a post and publish it, creating a barrier which imposes some editorial filter. Even on Twitter the 140-character limit forces people to be succinct, and often means a pithy statement gets refined before hitting return. In an IM or chat window you will think before hitting return (hopefully!). Would true "real time" mean watching as someone typed, or would it have to be a full brain dump as it happened? I'm not sure I want either of these; if I want real-time conversation I will pick up the phone.

But while everyone is focussed on "real time" I think it is starting to reveal a more interesting problem, one I've been thinking about for quite a while but have been unable to get a grip on. All of these services have different intrinsic timeframes. One of the things I dislike about the new FriendFeed interface is its "real time" nature. What I liked previously was that it had a slower intrinsic time than, say, Twitter or instant messaging, but a faster intrinsic timescale than a blog or email. On Twitter/IM conversations are fast: seconds to minutes, occasionally hours. On FriendFeed they tend to run from minutes to hours, with some continuing on for days, all threaded and all kept together. Conversations in blog comments run over hours to days, email over days, newspapers over weeks, and the academic literature over months and years.

Different people are comfortable with interacting with streams running at these different rates. Twitter is too much for some, as is FriendFeed, or online content at all. Many don’t have time to check blog comments, but perhaps are happy to read the posts once a day. But these people probably appreciate that the higher rate data is there. Maybe they come across an interesting blog post referring to a comment and want to check the comment, maybe the comment refers to a conversation on Twitter and they can search to find that. Maybe they find a newspaper article that leads to a wiki page and on to a pithy quote from an IM service. This type of digging is enabled by good linking practice. And it is enabled by a type of social filtering where the user views the stream at a speed which is compatible with their own needs.

The tools and social structures are well developed now for this kind of social filtering where a user outsources that function to other people, whether they are on FriendFeed, or are bloggers or traditional dead-tree journalist. What I am less sure about is the tooling for controlling the rate of the stream that I am taking in. Deepak wrote an interesting post recently on social network filtering, with the premise that you needed to build a network that you trusted to bring important material to your attention. My response to this is that there is a fundamental problem that, at the moment, you can’t independently control both the spread of the net you set, and the speed at which information comes in. If you want to cover a lot of areas you need to follow a lot of people and this means the stream is faster.

Fundamentally, as the conversation has got faster and faster, no-one seems to be developing tools that enable us to slow it down. Filtering tools such as those built into Twitter clients help, as does one of the things I do like about the new Friendfeed interface: the search facility that allows you to set filters that display only those items with a certain number of "likes" or comments. But what I haven't seen are tools that are really focussed on controlling the rate of a stream, that work to help you optimise your network to provide both spread and rate. And I haven't seen much thought go into tools or social practices that enable you to bump an item from one stream to a slower stream to come back to later. Delicious is the obvious tool here, bookmarking objects for later attention, but how many people actually go back over their bookmarks on a regular basis?

David Allen probably best described the concept of a "Tickler File": a file where you place items into a date-marked slot based on when you think you need to be reminded about them. The way some people regularly review their recent bookmarks and then blog the most interesting ones is an example of a process that achieves the same thing. I think this is probably a good model to think about: a tool, or set of practices, that parks items for a specified, and item- or class-specific, period of time and then pulls them back up and puts them in front of you. Or perhaps does it in a context-dependent fashion, or both, picking the right moment in a specific time period to have an item pop up. Ideally it will also put items, or allow you to put them, back in front of your network for further consideration. We still want just the one inbox for everything. It is a question of having control over the intrinsic timeframes of the different streams coming into it, including streams that we set up for ourselves.
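
As a sketch of what I mean (a toy, with made-up items and delays), a tickler that parks each item for an item-specific period and only resurfaces it when due:

```python
import heapq
import time

class Tickler:
    """Park items now; have them resurface after an item-specific delay,
    rather than at the speed of the incoming stream."""

    def __init__(self):
        self._queue = []  # (resurface_time, item) pairs in a min-heap

    def park(self, item, delay_seconds):
        heapq.heappush(self._queue, (time.time() + delay_seconds, item))

    def due(self):
        """Pop everything whose resurface time has arrived."""
        now = time.time()
        items = []
        while self._queue and self._queue[0][0] <= now:
            items.append(heapq.heappop(self._queue)[1])
        return items

tickler = Tickler()
tickler.park("Re-read that FriendFeed thread on data formats", delay_seconds=0)
tickler.park("Check Tony's visualisation post", delay_seconds=7 * 24 * 3600)
print(tickler.due())  # only the first item is due immediately
```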

As I said, I really haven’t got a good grip on this, but my main point is that I think Real Time is just a single instance of giving users access to one specific intrinsic timeframe. The much more interesting problem, and what I think will be one of the next big things is the general issue of giving users temporal control within a service, particularly for enterprise applications.

Friendfeed for scientists: What, why, and how?

There has been lots of interest amongst some parts of the community about what has been happening on FriendFeed. A growing number of people are signed up and lots of interesting conversations are happening. However, it was suggested that as these groups grow they become harder to manage and the perceived barriers to entry get higher. So this is an attempt to provide a brief intro to FriendFeed for the scientist who may be interested in using it: what it is, why it is useful, and some suggestions on how to get involved without getting overwhelmed. These are entirely my views and your mileage may obviously vary.

What is FriendFeed?

FriendFeed is a 'lifestreaming' service, or more simply a personal aggregator. It takes data streams that you generate and brings them all together into one place where people can see them. You choose to subscribe to any of the feeds you already generate (Flickr stream, blog posts, favorited YouTube videos, and lots of other integrated services). In addition you can post links to specific web pages, or just comments, into your stream. A number of these types of services have popped up in recent months, including Profilactic and Social Thing, but FriendFeed has two key aspects that have brought it to the fore. Firstly, the commenting facilities enable rapid and effective conversations, and secondly, rapid adoption by a group of life scientists has created a community. Some of the other services have their own advantages and probably their own communities, but for science, and in particular the life sciences, FriendFeed is where it is at.

My FriendFeed

As well as allowing other people to look at what you have been doing, FriendFeed allows you to subscribe to other people and see what they have been doing. You have the option of 'liking' particular items and commenting on them. In addition to seeing the items of your friends – people you are subscribed to – you also see items that they have liked or commented on. This helps you to find new people you may be interested in following, and it also helps people to find you. Items with comments or likes get popped back up to the top of the feed, so items that are generating a conversation keep coming back to your attention.

These conversations can happen very fast. Some conversations balloon within minutes; most take place at a more sedate pace over a couple of hours or days. But it is important to be aware that many people are live most of the time.

Why is FriendFeed useful?

So how is FriendFeed useful to a scientist? First and foremost it is a great way of getting rapid notification of interesting content from people you trust. Obviously this depends on there being people who are interested in the same kinds of things that you are, but this is something that will grow as the community grows. A number of FriendFeed users stream their del.icio.us bookmarks as well as papers or web articles they have put into citeulike or connotea, or simply shared via Google Reader. You can also pick up information that people have shared on opportunities, meetings, or just interesting material on the web. Think of it as an informal but continually running journal club – always searching for the next thing you will need to know about.

Notifications of interesting material on friendfeed

But FriendFeed is about much more than finding things on the web. One of its most powerful features is the conversations that can take place. Queries can be answered very rapidly, going some way towards making possible the rapid formation of collaborative networks that come together to solve a specific problem. It's not there yet, but there are a growing number of examples where specific ideas were encouraged or developed, or problems solved quickly, by bringing the right expertise to bear.

One example is shown in the following figure, where I was looking for some help in building a particular protein model for a proposal. I didn't really know how to go about this and didn't have the appropriate software to hand. Pawel Szczesny offered to help and was able to quickly come up with what I wanted. In the future we hope to generate data which Pawel may be able to help us analyse. You can see the whole story and how it unfolded at http://friendfeed.com/search?q=mthk

MthK model, via FriendFeed

We are still a long way from the dream of just putting out a request and getting an answer, but it is worth pointing out that the whole exchange here lasted about four hours. Other collaborative efforts have also sprung up, most recently leading to the formation of BioGang, a collaborative area for people to work up and comment on possible projects.

So how do I use it? Will I be able to cope?

FriendFeed can be as high-volume as you want it to be, but if it's going to be useful to you it has to be manageable. If you're the kind of person who already manages 300 RSS feeds, your twitter account, Facebook, and everything else, then you'll be fine. In fact you're probably already there. For those of you who are looking for something a little less high-intensity, the following advice may be helpful.

  1. Pick a small number of your existing feeds as a starting point to see what you feel comfortable with sharing. Be aware that if you share e.g. Flickr or YouTube feeds they will also include your favourites, including old ones. Do share something – even if only some links – otherwise people won't know that you're there.
  2. Subscribe to someone you know and trust and stick with just one or two people for a while as you get to understand how things work. As you see extra stuff coming in from other people (friends of your friends) start to subscribe to one or two of them that you think look interesting. Do not subscribe to Robert Scoble if you don’t want to get swamped.
  3. Use the hide button. You probably don't need to know about everyone's favourite heavy metal bands (or perhaps you do). The hide button can get rid of a specific service from a specific person, but you can set it so that you still see those items if other people like them.
  4. Don’t worry if you can’t keep up. Using the Best of the Day/Week/Month button will let you catch up on what people thought was important.
  5. Find a schedule that suits you and stick to it. While the current users are dominated by the 'always on' brigade, that doesn't mean you need to do it the same way. But also don't feel that because you came in late you can't comment. It may just be that you are the one needed to kick that conversation back onto some people's front pages.
  6. Join the Life Scientists room and share interesting stuff. This provides a place to put particularly interesting links and is followed by a fair number of people, probably more than follow you. If it is worthy of comment then put it in front of people. If you aren't sure whether it's relevant, ask; you can always start a new room if need be.
  7. Enjoy, comment and participate in a way you feel comfortable with. This is a (potential) work tool. If it works for you, great! If not well so be it – there’ll be another one along in a minute.

Twittering labs? That is just so last year…

mars phoenix twitter stream

The Mars Phoenix landing has got a lot of coverage around the web, particularly from some misty-eyed old blokes who remember watching landings via the Mosaic browser in an earlier, simpler age. The landing is cool, but one thing I thought was particularly clever was the use of Twitter by JPL to publicise the landing and what is happening on a minute-to-minute basis. Now my suspicion is that they haven't actually installed Twhirl on the Phoenix Lander and that there is a person at JPL writing the tweets. But that isn't the point. The point is that the idea of an instrument (or in this case a spacecraft) outputting a stream of data is completely natural to people. The idea of the overexcited lander saying 'come on rocketsssssss!!!!!!!!' is very appealing (you can tell it's a young spaceship, it hasn't learnt not to shout yet; although if your backside was at 2,000 °C you might have something to say about it as well).

I've pointed out some cool examples of this in the past, including London Bridge. Steve Wilson, in Jeremy Frey's group at Southampton, has been doing some very fun stuff, both logging what happens in a laboratory and blogging that out to the web, using the tools developed by the Simile team at MIT. The notion of the instrument generating a data stream, and using that stream as an input to an authoring tool like a laboratory notebook or into other automated processes, is a natural one that fits well both with the way we work in the laboratory (even when your laboratory is the solar system) and with our tendency to anthropomorphise our kit. However, the day the FPLC tells me it had a hard night and doesn't feel like working this morning is the day it gets tossed out. And the fact that it was me that fed it the 20% ethanol is neither here nor there.
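
To illustrate (with a made-up instrument and readings), here is a sketch of the kind of structured status stream an instrument might emit for a notebook tool to consume:

```python
import json
from datetime import datetime, timezone

# A hypothetical instrument emitting structured status messages; a notebook
# client could subscribe to this stream and attach entries to the record.
def instrument_event(instrument_id, state, **readings):
    return json.dumps({
        "instrument": instrument_id,
        "state": state,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "readings": readings,
    })

print(instrument_event("fplc-01", "running", pressure_mpa=0.21, uv_280nm=0.034))
print(instrument_event("fplc-01", "finished"))
```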

Now the question is; can I persuade JPL to include actual telemetry, command, and acknowledgement data in the twitter stream? That would be very cool.