Where is the best place in the Research Stack for the human API?

13 August 2009 2 Comments

I had an interesting conversation yesterday on Twitter with Evgeniy Meyke of EarthCape, prompted in part by my last post. We started talking about what a Friendfeed replacement might look like and how it might integrate more directly with scientific data. Is it possible to build something general, or will it always need to be domain specific? Might this in fact be an advantage? Evgeniy asked:

@CameronNeylon do you think that “something new” could be more vertically oriented rather then for “research community” in general?

His thinking, as I understand it, is that getting at domain specific underlying data is always likely to take local knowledge. As he said in his next tweet:

@CameronNeylon It might be that the broader the coverage the shallower is integration with underlining research data, unless api is good

This led me to thinking about integration layers between data and people, and I recalled something that I said in jest to someone some time ago:

“If you’re using a human as your API then you need to work on your user interface.”

Thinking about the way Friendfeed works, there is a real sense in which the system talks to a wide range of automated APIs, but at the core there is a human layer that first selects feeds of interest and then, when presented with other feeds, selects specific items from them. What Friendfeed does very well in some senses is provide a flexible API between feeds and the human brain. But Evgeniy made the point that this “works only 4 ‘discussion based’ collaboration (as in FF), not 4 e.g. collab. taxonomic research that needs specific data inegration with taxonomic databases”.

Following from this was an interesting conversation about how we might best integrate the “human API” for some imaginary “Science Stream” with domain specific machine APIs that work at the data level. In a sense this is the core problem of scientific informatics. How do you optimise the ability of machines to abstract and use data and meaning while at the same time fully exploiting the ability of the human scientist to contribute their own unique skills: pattern recognition, insight, lateral thinking? And how do you keep these in step with each other so both are optimally utilised? Thinking in computational terms about the human as a layer in the system with its own APIs could be a useful way to design systems.

Friendfeed in this view is a peer to peer system for pushing curated and annotated data streams. It mediates interactions with the underlying stream but also with other known and unknown users. Friendfeed seems to get three things very right: 1) Optimising the interaction with the incoming data stream; 2) Facilitating the curation and republication of data into a new stream for consumption by others, creating a virtuous feedback loop in fact; and 3) Facilitating discovery of new peers. Friendfeed is actually a BitTorrent for sharing conversational objects.
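To make the three roles concrete, here is a minimal sketch of the human layer treated as an API. It is only an illustration of the idea, not how Friendfeed is implemented; the names `Item`, `Curator`, and `discover_peers` are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Item:
    source: str      # feed (or person) the item came from
    title: str
    note: str = ""   # annotation added by a human curator, empty if raw


@dataclass
class Curator:
    name: str
    subscriptions: set = field(default_factory=set)
    outbox: list = field(default_factory=list)

    def incoming(self, items):
        """Role 1: keep only items from feeds this person chose to follow."""
        return [i for i in items if i.source in self.subscriptions]

    def republish(self, item, note):
        """Role 2: annotate a selected item and push it into a new stream."""
        curated = Item(source=self.name, title=item.title, note=note)
        self.outbox.append(curated)
        return curated


def discover_peers(curator, items):
    """Role 3: find people whose curated items the curator does not yet follow."""
    found = {i.source for i in items
             if i.note and i.source not in curator.subscriptions}
    return found - {curator.name}


raw = [Item("lab-notebook", "Run 12 finished"), Item("journal-toc", "New issue")]
alice = Curator("alice", {"lab-notebook"})
selected = alice.incoming(raw)
alice.republish(selected[0], "worth a look")

bob = Curator("bob", {"journal-toc"})
new_peers = discover_peers(bob, raw + alice.outbox)
```

The point of the sketch is that the human does two selections (feeds, then items) and that the republished stream is itself a feed that others can discover.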

This conversational layer, a research discourse layer if you like, is at the very top of the stack, keeping the humans at a high, abstracted level of conversation, where we are probably still at our best. And my guess is that something rather like Friendfeed is pretty good at being the next layer down, the API to feeds of interesting items. But Evgeniy’s question was more about the bottom of the stack, where the data is being generated and needs to be turned into a useful and meaningful feed, ready to be consumed. The devil is always in the details and vertical integration is likely to help here. So what do these vertical segments look like?

In some domains these might be lab notebooks, in some they might be specific databases, or they might be a mixture of both and of other things. At the coal face it is likely to be difficult to find a way of describing the detail that is both generic enough to be comprehensible and detailed enough to be useful. The needs of the data generator are likely to be very different to those of a generic data consumer. But if there is a curation layer, perhaps human or machine mediated, that partly abstracts this, then we may be on the way to generating the generic feeds that will finally be consumed at the top layer. This curation layer would enable semantic markup, ideally automatically, would require domain specific tooling to translate from the specific to the generic, and would provide a publishing mechanism. In short it sounds (again) quite a bit like Wave. Actually it might just as easily be Chem4Word or any other domain specific semantic authoring tool, or just a translation engine that takes in detailed domain specific info and correlates it with a wider vocabulary.
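As a rough sketch of what such a translation engine might do, assume a hand-written mapping from domain specific field names to a wider vocabulary. Everything here is hypothetical: the field names, the `TAXONOMY_TO_GENERIC` mapping, and the `curate` and `needs_human` functions are illustrations, not part of Wave, Chem4Word, or any existing tool.

```python
# Hypothetical mapping from taxonomy-specific fields to a shared vocabulary.
TAXONOMY_TO_GENERIC = {
    "taxon_name": "subject",
    "collector": "creator",
    "locality": "location",
}


def curate(record, vocabulary):
    """Translate a domain specific record into a generic feed entry.

    Fields with a known mapping are renamed. Anything else is kept under
    'unmapped' so that a human curator can decide what to do with it.
    """
    entry = {"unmapped": {}}
    for key, value in record.items():
        if key in vocabulary:
            entry[vocabulary[key]] = value
        else:
            entry["unmapped"][key] = value
    return entry


def needs_human(entry):
    """True when the machine could not translate every field on its own."""
    return bool(entry["unmapped"])


record = {"taxon_name": "Parus major", "collector": "A. Example", "trap_id": "T-7"}
entry = curate(record, TAXONOMY_TO_GENERIC)
```

Here the machine handles the fields it understands and flags the rest, which is one way of bringing in human processing only where it is needed.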

One of the things that appeals to me about Wave, and Chem4Word, is that they can (or at least have the potential to) hide the complexities of the semantics within a straightforward and comprehensible authoring environment. Wave can be integrated into domain specific systems via purpose built Robots, making it highly extensible. Both are capable of “speaking web” and generating feeds that can be consumed and processed in other places and by other services. At the bottom layer we can chew the problem off one piece at a time, including human processing where it is appropriate and avoiding it where we can.

The middleware is, of course, as always, the problem. The middleware is agreed and standardised vocabularies and data formats. While in the past I have thought this near intractable, it seems as though many of the pieces are actually falling into place. There is still a great need for standardisation and perhaps a need for more meta-standards, but it seems like a lot of this is in fact on the way. I’m still not convinced that we have a useful vocabulary for actually describing experiments, but enough smart people disagree with me that I’m going to shut up on that one until I’ve found the time to have a closer look at the various things out there in more detail.

These are half baked thoughts – but I think the idea of where we optimally place the human in the system is a useful question. It also hasn’t escaped my notice that I’m talking about something very similar to the architecture that Simon Coles of Amphora Research Systems always puts up in his presentations on Electronic Lab Notebooks. Fundamentally because the same core drivers are there.


2 Comments »

  • Rory Macneil said:

    This picks up on your previous post about friendfeed and Deepak’s post. Deepak makes the point that “The Life Scientists are not scientists at all, but include librarians, techies with interests in science, etc.” That covers the ‘who’. As to the ‘what’ that was being discussed on friendfeed, it was primarily ‘developments’ – technical, research techniques, etc., and general news, events and queries. It was rarely about someone’s research, and then not about the substance of the research but a query about a technique or relevant bit of information.

    In this post you have made the transition to talking about communication not about these general topics that were the lifeblood of friendfeed but rather about scientific data or actual research, i.e. the one thing that people did not talk about on friendfeed.

    To my mind a question arises — why did people not talk about their data/research on friendfeed? Is it because friendfeed did not include a convenient platform for integrating scientific data? You seem to be implying or perhaps hoping that if the platform existed — and you speculate about what the platform might look like — people would begin to talk about their data/research on a ‘friendfeed version 2’.

    Another possibility is that the mix of Life Scientists who used friendfeed were not interested in using a forum like friendfeed — which consisted of many non-specialists in their area of research — for discussing their data/research, i.e. the audience was not right.

    If these are the two main factors behind the lack of discussion about data/research on friendfeed, then there should be hope that domain specific groups would adopt an appropriate platform (friendfeed version 2) as a medium for discussing their data/research.

    But there is another possible factor behind the lack of discussion about data/research on friendfeed — that some or many people do not want to talk publicly or widely about their data/research until it is more developed/in a more presentable or defensible form. If this is the dominant factor then even platforms of the kind that you speculate about would not be sufficient to stimulate widespread friendfeed style discussions about data/research.

    So, it would actually be quite interesting to find out more about the attitudes of life scientists who did not find friendfeed useful (i.e. the vast majority of them) — why not, was it the audience, the nature of the discussions that were taking place, the limitations of the platform or a basic aversion to discussing their research in a public forum? Knowing more about the answers to those questions would go some way to providing a starting point for thinking about what a friendfeed version 2 with broad appeal might look like.

    In your discussion you were primarily, it seems to me, coming at the issue from a technical perspective. The point I am trying to make is that it is equally important to look at the issue from a market perspective – i.e. what is it that people want or need? This is extremely difficult to do by posing questions about theoretical platforms that do not yet exist because most people will find the question too vague and/or not be interested enough to give a meaningful answer. But we might be able to get some insights into what people want/need by understanding their reaction to something which does exist — i.e. friendfeed.
