Interesting conversation yesterday on Twitter with Evgeniy Meyke of EarthCape prompted in part by my last post. We started talking about what a Friendfeed replacement might look like and how it might integrate more directly into scientific data. Is it possible to build something general or will it always need to be domain specific. Might this in fact be an advantage? Evgeniy asked:
@CameronNeylon do you think that “something new” could be more vertically oriented rather then for “research community” in general?
His thinking being, as I understand it that to get at domain specific underlying data is always likely to take local knowledge. As he said in his next tweet:
@CameronNeylon It might be that the broader the coverage the shallower is integration with underlining research data, unless api is good
This lead me to thinking about integration layers between data and people and recalled something that I said in jest to someone some time ago;
“If you’re using a human as your API then you need to work on your user interface.”
Thinking about the way Friendfeed works there is a real sense in which the system talks to a wide range of automated APIs but at the core there is a human layer that firstly selects feeds of interest and then when presented with other feeds selects from them specific items. What Friendfeed does very well in some senses is provide a flexible API between feeds and the human brain. But Evegeniy made the point that this “works only 4 ‘discussion based’ collaboration (as in FF), not 4 e.g. collab. taxonomic research that needs specific data inegration with taxonomic databases”.
Following from this was an interesting conversation [Webcite Archived Version] about how we might best integrate the “human API” for some imaginary “Science Stream” with domain specific machine APIs that work at the data level. In a sense this is the core problem of scientific informatics. How do you optimise the ability of machines to abstract and use data and meaning while at the same time fully exploiting the ability of the human scientist to contribute their own unique skills, pattern recognition, insight, lateral thinking. And how do you keep these in step with each other so both are optimally utilised? Thinking in computational terms about the human as a layer in the system with its own APIs could be a useful way to design systems.
Friendfeed in this view is a peer to peer system for pushing curated and annotated data streams. It mediates interactions with the underlying stream but also with other known and unknown users. Friendfeed seems to get three things very right: 1) Optimising the interaction with the incoming data stream; 2) Facilitating the curation and republication of data into a new stream for consumption by others, creating a virtuous feedback look in fact; and 3) Facilitating discovery of new peers. Friendfeed is actually a bittorrent for sharing conversational objects.
This conversational layer, a research discourse layer if you like, is at the very top of the stack, keeping the humans to a high level abstracted level of conversation, where we are probably still at our best. And my guess is that something rather like Friendfeed is pretty good at being the next layer down, the API to feeds of interesting items. But Evgeniy’s question was more about the bottom of the stack, where the data is being generated and needs to be turned into a useful and meaningful feed, ready to be consumed. The devil is always in the details and vertical integration is likely to help her. So what do these vertical segments look like?
In some domains these might be lab notebooks, in some they might be specific databases, or they might be a mixture of both and of other things. At the coal face it is likely to be difficult to find a way of describing the detail in a way that is both generic enough to be comprehensible and detailed enough to be useful. The needs of the data generator are likely to be very different to those of a generic data consumer. But if there is a curation layer, perhaps human or machine mediated, that partly abstracts this then we may be on the way to generating the generic feeds that will be finally consumed at the top layer. This curation layer would enable semantic markup, ideally automatically, would require domain specific tooling to translate from the specific to the generic, and provide a publishing mechanism. In short it sounds (again) quite a bit like Wave. Actually it might just as easily be Chem4Word or any other domain specific semantic authoring tool, or just a translation engine that takes in detailed domain specific info and correlates it with a wider vocabulary.
One of the things that appeals to me about Wave, and Chem4Word, is that they can (or at least have the potential to) hide the complexities of the semantics within a straightforward and comprehensible authoring environment. Wave can be integrated into domain specific systems via purpose built Robots making it highly extensible. Both are capable of “speaking web” and generating feeds that can be consumed and processed in other places and by other services. At the bottom layer we can chew the problem off one piece at a tim, including human processing where it is appropriate and avoiding it where we can.
The middleware is of coures, as always, the problem. The middleware is agreed and standardised vocabularies and data formats. While in the past I have thought this near intractable actually it seems as though many of the pieces are actually falling into place. There is still a great need for standardisation and perhaps a need for more meta-standards but it seems like a lot of this is in fact on the way. I’m still not convinced that we have a useful vocabulary for actually describing experiments but enough smart people disagree with me that I’m going to shut up on that one until I’ve found the time to have a closer look at the various things out there in more detail.
These are half baked thoughts – but I think the idea of where we optimally place the human in the system is a useful question. It also hasn’t escaped my notice that I’m talking about something very similar to the architecture that Simon Coles of Amphora Research Systems always puts up in his presentations on Electronic Lab Notebooks. Fundamentally because the same core drivers are there.