How to make Connotea a killer app for scientists

So Ian Mulvaney asked, and as my solution did not fit into the margin I thought I would post it here. Following on from the two rants of a few weeks back, and many discussions at Scifoo, I have been thinking about how scientists might be persuaded to make more use of social web-based tools. What does it take to get enough people involved so that the network effects become apparent? At Scifoo I had a discussion with Jamie Heywood of Patients Like Me, because I wanted to know why people with chronic diseases were willing to share detailed and very personal information in a forum that is essentially public. His response was that these people had an ongoing and extremely pressing need to optimise their treatment regime and lifestyle as far as possible, and that by correlating their experiences with others they got to the answers they needed more quickly. Essentially, managing their lives successfully required rapid access to high-quality information, sliced and diced in a way that made sense to them and presented as efficiently and promptly as possible. Which obviously left me none the wiser as to why scientists don’t get it…

Nonetheless, some clear themes emerge from that conversation, and from others looking at the uptake and use of web-based tools. So here are my 5 thoughts. They are framed around reference management, but I think the principles are general enough to apply to most web services.

  1. Any tool must fit within my existing workflows. Once I have adopted it I may be persuaded to modify or improve my workflow, but to be adopted in the first place it has to fit. For citation management this means one-click filing (ideally from anywhere I might find an interesting paper), but the tool should also monitor the other ways I mark papers, e.g. shared items in Google Reader, ‘liked’ items on FriendFeed, or tags scraped from del.icio.us.
  2. Any new tool must clearly outperform all the existing tools it will replace in the relevant workflows, without relying on network or social effects. It’s got to be absolutely clear on first use that I am going to want to use this instead of, e.g., EndNote. That means I have to be able to format and manage references in a word processor or publication document. Technically a nightmare, I am sure (you’ve got to worry about integration with Word, OpenOffice, Google Docs, TeX), but an absolute necessity for widespread uptake. And this has to be clear the first time I use the system, before I have built any local social network and before you have a user base large enough for those effects to kick in.
  3. It must be near 100% reliable, with near 100% uptime. Web services have a bad reputation for going down. People don’t trust their network connections and are still much happier with local applications. Don’t give them an excuse to go back to a local app because the service goes down. Addendum: make sure people can easily back up and download their stuff in a form that will still be useful if your service disappears. Obviously they’ll never need to, but it will make them feel better (and don’t scrimp on this, because they will check that it works).
  4. Provide at least one (but not too many) really exciting new feature that makes people’s lives better. This is related to #2 but takes it a step further. Beyond doing what I already do, better, I need a quick fix of something new and exciting. My wishlist for Connotea is below.
  5. Prepopulate. Build in publicly available information before the users arrive. For a publications database this is easy, and it is something that BioMedExperts got right. You have a pre-existing social network and pre-existing library information. Populate ‘ghost’ accounts with libraries that include people’s papers (it doesn’t matter if it’s not 100% accurate) and with connections based on co-authorship. This will give people an idea of what the social aspect can bring and encourage them to bring more people on board.
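To make the prepopulation idea a little more concrete, here is a minimal Python sketch of how ‘ghost’ accounts might be derived from public author lists: each author gets a library of their own papers plus a connection to everyone they have co-authored with. The record format, the `GhostAccount` class and `build_ghost_accounts` are hypothetical illustrations, not anything from Connotea or BioMedExperts, and author names are matched naively as exact strings.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class GhostAccount:
    """Hypothetical placeholder profile built only from public publication data."""
    name: str
    library: set[str] = field(default_factory=set)    # ids of papers this author wrote
    coauthors: set[str] = field(default_factory=set)  # names linked by co-authorship


def build_ghost_accounts(papers: list[dict]) -> dict[str, GhostAccount]:
    """Create one ghost account per author name found in the paper records.

    Each record is assumed to look like {"id": ..., "authors": [...]}.
    Names are matched as exact strings, so "J. Smith" and "John Smith"
    stay separate: approximately right, not 100% accurate.
    """
    accounts: dict[str, GhostAccount] = {}

    def get(name: str) -> GhostAccount:
        if name not in accounts:
            accounts[name] = GhostAccount(name)
        return accounts[name]

    for paper in papers:
        authors = list(dict.fromkeys(paper["authors"]))  # drop duplicate names, keep order
        for name in authors:
            get(name).library.add(paper["id"])
        # Every pair of co-authors on a paper gets a symmetric connection.
        for a, b in combinations(authors, 2):
            get(a).coauthors.add(b)
            get(b).coauthors.add(a)
    return accounts


if __name__ == "__main__":
    papers = [
        {"id": "paper-1", "authors": ["A. Author", "B. Author"]},
        {"id": "paper-2", "authors": ["B. Author", "C. Author"]},
    ]
    ghosts = build_ghost_accounts(papers)
    print(sorted(ghosts["B. Author"].library))    # ['paper-1', 'paper-2']
    print(sorted(ghosts["B. Author"].coauthors))  # ['A. Author', 'C. Author']
```

Author disambiguation is the obvious weak point of a sketch like this, but for seeding a network it arguably doesn’t need to be perfect.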

So that is so much motherhood and apple pie, and nothing that Ian didn’t already know (unlike some other developers whom I shan’t mention). But what about those cool features? Again, I would take a back-to-basics approach. What do I actually want?

Well, what I want is a service that does three quite different things. First, I want it to hold a library of relevant references that I can search and use, and I want to use that library to format and reference documents when I write them. Second, I want it to help me manage the day-to-day process of dealing with the flood of literature coming in (real-time search). Third, I want it to help me be more effective when I am researching a new area or trying to get to grips with something (offline search). Real-time search, I think, is a big problem that isn’t going to be solved soon. The library and document-writing aspects are a given and need to be the first priority. The third problem is the one I think is amenable to some new thinking.

What I would really like to see here is a way of pivoting my view of the literature around a specific item. This might be a paper, a dataset, or a blog post. I want to be able to click once and see everything that item cites, and click again and see everything that cites it. Then pivot away to look at what GoPubmed thinks the paper is about and what related material it has, and pivot back to see how much those two sets have in common. What are the papers in this area that this review isn’t citing? Is there a set of authors this paper isn’t citing? Have they looked at all the datasets they should have? Are there general news media items in this area, books on Amazon, books in my nearest library, books on my bookshelf? Are they any good? Have any of my trusted friends published or bookmarked items in this area? Do they use the same tags or different ones for this subject? What exactly is Neil Saunders doing looking at that gene? Can I map all of my friends’ tags onto a controlled vocabulary?
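The first few of those pivots are just set operations once citations are indexed in both directions. Here is a minimal Python sketch, assuming an in-memory map from each item to the items it cites. The `CitationIndex` class and its method names are hypothetical, and the `related` set stands in for whatever a topic service such as GoPubmed might return; no real GoPubmed API is called here.

```python
from collections import defaultdict


class CitationIndex:
    """Hypothetical in-memory citation graph supporting simple 'pivot' queries.

    `cites` maps an item id (paper, dataset, blog post...) to the ids it cites.
    The reverse index is built once, so 'cited by' is a dictionary lookup.
    """

    def __init__(self, cites: dict[str, set[str]]):
        self.cites = {item: set(targets) for item, targets in cites.items()}
        self.cited_by: dict[str, set[str]] = defaultdict(set)
        for source, targets in self.cites.items():
            for target in targets:
                self.cited_by[target].add(source)

    def references(self, item: str) -> set[str]:
        """Everything `item` cites (one click)."""
        return set(self.cites.get(item, ()))

    def citations(self, item: str) -> set[str]:
        """Everything that cites `item` (click again)."""
        return set(self.cited_by.get(item, ()))

    def overlap(self, item: str, related: set[str]) -> set[str]:
        """Items in an externally supplied 'related' set that `item` also cites."""
        return self.references(item) & related

    def missing(self, item: str, related: set[str]) -> set[str]:
        """Related items that `item` does not cite: what isn't this review citing?"""
        return related - self.references(item) - {item}


if __name__ == "__main__":
    index = CitationIndex({
        "review": {"p1", "p2"},
        "p3": {"review", "p1"},
    })
    related = {"p1", "p2", "p4"}  # hypothetical result from a topic service
    print(sorted(index.references("review")))        # ['p1', 'p2']
    print(sorted(index.citations("review")))         # ['p3']
    print(sorted(index.missing("review", related)))  # ['p4']
```

Building the reverse index once up front means "everything that cites it" is a lookup rather than a scan over every item in the library.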

Essentially, what I am asking for is the ability to traverse the graph of how all these things are interconnected. Most of these connections are already explicit somewhere, but nowhere are they all brought together in a way that lets users slice and dice them as they want. My belief is that if you can start to understand how people use that graph effectively to find what they want, then you can start to automate the process, and that this will be the route towards real-time search that actually works.
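At its simplest, traversing that graph is a breadth-first search over typed links between papers, datasets, people and tags. The Python sketch below is illustrative only: the `(source, relation, target)` edge format, the relation names and the `neighbourhood` function are all hypothetical. Restricting which relations are followed is one crude way of letting a user slice and dice the graph.

```python
from collections import defaultdict, deque
from typing import Optional

# Hypothetical typed edge: (source, relation, target).
Edge = tuple[str, str, str]


def neighbourhood(edges: list[Edge], start: str, max_hops: int,
                  relations: Optional[set[str]] = None) -> dict[str, int]:
    """Breadth-first search from `start`, following links in both directions.

    Only relations listed in `relations` are followed (all of them if None).
    Returns every reachable node within `max_hops`, mapped to its hop distance.
    """
    adjacency: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for source, relation, target in edges:
        if relations is None or relation in relations:
            adjacency[source].append((relation, target))
            adjacency[target].append((relation, source))  # traverse links both ways

    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distances[node] == max_hops:
            continue  # don't expand beyond the hop limit
        for _relation, neighbour in adjacency.get(node, ()):
            if neighbour not in distances:
                distances[neighbour] = distances[node] + 1
                queue.append(neighbour)
    return distances


if __name__ == "__main__":
    edges = [
        ("paper:A", "cites", "paper:B"),
        ("paper:B", "uses", "dataset:D"),
        ("person:friend", "bookmarked", "paper:B"),
        ("person:friend", "tagged", "tag:gene-x"),
    ]
    print(neighbourhood(edges, "paper:A", max_hops=2))
    # {'paper:A': 0, 'paper:B': 1, 'dataset:D': 2, 'person:friend': 2}
    print(neighbourhood(edges, "paper:A", 2, relations={"cites", "bookmarked"}))
    # {'paper:A': 0, 'paper:B': 1, 'person:friend': 2}
```

Recording which relations users actually choose to follow would be one possible starting point for automating the process.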

…but you’ll struggle with uptake…