How I got into open science – a tale of opportunism and serendipity

So Michael Nielsen, one morning at breakfast at Scifoo, asked one of those questions which never has a short answer: ‘So how did you get into this open science thing?’ I realised that although I have told the story to many people I have never written it down. Perhaps this is a meme worth exploring more generally, but I thought others might be interested in my story, partly because it illustrates how funding drives scientists, and partly because it shows how opportunism and serendipity can make successful bedfellows.

In late 2004 I was spending a lot of my time on the management of a large collaborative research project and had had a run of my own grant proposals rejected. I had a student interested in doing a PhD but no direct access to funds to support the consumables cost of the proposed project. Jeremy Frey had been on at me for a while to look at implementing the electronic lab notebook system that he had led the development of, and at the critical moment he pointed out to me a special call from the BBSRC for small projects to prototype, develop, or implement e-science technologies in the biological sciences. It was a light-touch review process and a relatively short application. More to the point, it was a way of funding some consumables.

So the grant was written. I wrote the majority of it, which makes for somewhat interesting reading in retrospect. I didn’t really know what I was talking about at the time (which seems to be a theme with my successful grants). The original plan was to use the existing, fully semantic, RDF-backed electronic lab notebook and develop models for its use in a standard biochemistry lab. We would then develop systems to extract a relational database from the RDF representation and present this on the web.
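The core of that second step is simple to sketch: an RDF store is a bag of subject–predicate–object triples, and ‘extracting a relational database’ amounts to grouping triples by subject into flat rows. A minimal illustration, using plain Python in place of a real triple store, with entirely invented notebook entries:

```python
from collections import defaultdict

# Hypothetical lab-notebook triples (subject, predicate, object) --
# in the real system these would live in an RDF store, not a list.
triples = [
    ("sample42", "hasType", "protein_prep"),
    ("sample42", "preparedBy", "student"),
    ("sample42", "dateCreated", "2005-03-01"),
    ("sample43", "hasType", "gel_run"),
    ("sample43", "preparedBy", "student"),
]

def triples_to_rows(triples):
    """Group triples by subject into flat, relational-style rows."""
    rows = defaultdict(dict)
    for subject, predicate, obj in triples:
        rows[subject][predicate] = obj
    return dict(rows)

rows = triples_to_rows(triples)
print(rows["sample42"]["hasType"])  # protein_prep
```

In practice each distinct `hasType` would map to its own table with predicates as columns, but the grouping step is the heart of it.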

The grant was successful but the start was delayed due to shenanigans over the studentship that was going to support the grant and the movement of some of the large project to another institution with one of the investigators. Partly due to the resulting mess I applied for the job I ultimately accepted at RAL and after some negotiation organised an 80:20 split between RAL and Southampton.

By the time we had a student in place and had got the grant started it was clear that the existing semantic ELN was not in a state that would enable us to implement new models for our experiments. However, at this stage there was a blog system that had been developed in Jeremy’s group, and it was thought it would be an interesting experiment to use this as a notebook. This would be almost the precise opposite of the RDF-backed ELN. Looking back at it now I would describe it as taking the opportunity to look at a Web 2.0 approach to the notebook as compared to a Web 3.0 approach, but bear in mind that at the time I had little or no idea of what these terms meant, let alone the care with which they need to be used.

The blog-based system was great for me as it meant I could follow the student’s work online, and doing this I gradually became aware of blogs in general and the use of feed readers. The RSS feed of the LaBLog was a great help as it made following the details of experiments remotely straightforward. This was important as by now I was spending three or four days a week at RAL while the student was based in Southampton. As we started to use the blog, at first in a very naïve way, we found problems and issues which ultimately led to us thinking about and designing the organisational approach I have written about elsewhere [1, 2]. By this stage I had started to look at other services online and was playing around with OpenWetWare and a few other services, becoming vaguely aware of Creative Commons licenses and getting a grip on the Web 2.0 versus Web 3.0 debate.
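The feed-reading workflow above is worth making concrete for anyone who hasn’t met RSS: a feed is just XML listing recent posts, and scanning it for new entries is a few lines of standard-library code. A sketch with an invented feed (the real LaBLog feed had more structure than this):

```python
import xml.etree.ElementTree as ET

# A minimal, made-up RSS 2.0 feed of the sort the LaBLog produced --
# titles and links here are invented for illustration.
feed_xml = """<rss version="2.0"><channel>
  <title>LaBLog</title>
  <item><title>Gel run, lane assignments</title><link>http://example.org/1</link></item>
  <item><title>Protein prep, batch 3</title><link>http://example.org/2</link></item>
</channel></rss>"""

def latest_titles(xml_text):
    """Pull post titles out of an RSS feed so new entries can be scanned quickly."""
    root = ET.fromstring(xml_text)
    return [item.findtext("title") for item in root.iter("item")]

print(latest_titles(feed_xml))  # ['Gel run, lane assignments', 'Protein prep, batch 3']
```

A feed reader does essentially this on a timer, remembering which items it has already shown you.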

To implement our newly designed approach to organising the LaBLog we decided the student would start afresh with a clean slate in a new blog. By this stage I was playing with using the blog for other things and had started to discover that there were issues that meant the ID authentication we were using didn’t always work through the RAL firewall. I ended up with complicated VPN setups, particularly working from home, where I couldn’t log on to the blog and have my email online at the same time. This, obviously, was a pain, and as we were moving to a new blog which could have new security settings I said, ‘stuff it, let’s just make it completely visible and be done with it’.

So there you go. The critical decision to move to an Open Notebook status was taken as the result of a firewall. So serendipity, or at least the effect of outside pressures, was what made it happen. I would like to say it was a carefully thought out philosophical decision but, although my awareness of the open access movement, Creative Commons, OpenWetWare, and others no doubt prepared the background that led me to think down that route, it was essentially the result of frustration.

So, so far, opportunism and serendipity, which brings us back to opportunism again, or at least seizing an opportunity. Having made the decision to ‘go open’ two things clicked in my mind. Firstly, that this was rather radical. Secondly, that all of these Web 2.0 tools combined with an open approach could lead to a marked improvement in the efficiency of collaborative science, a kind of ‘Science 2.0’ [yes, I know, don’t laugh, this would have been around March 2007]. Here was an opportunity to get my name on a really novel and revolutionary concept! A quick Google search revealed that, funnily enough, I wasn’t the first person to think of this (yes! I’d been scooped!), but more importantly it led me to what I think ought to be three of the Standard Works of Open Science: Bill Hooker’s three-part series on Open Science at 3 Quarks Daily [1, 2, 3], Jean-Claude Bradley’s presentation on Open Notebook Science at Nature Precedings (and the associated original blog post coining the term), and Deepak Singh’s talk on Open Science at Ignite Seattle. From there I was inspired to seize the opportunity, get a blog of my own, and get involved. The rest of my story, so far, is more or less available online here and via the usual sources.

Which leads me to ask: what got you involved in the ‘open’ movement? What, for you, were the ‘primary texts’ of open science and open research? There is a value in recording this, or at least our memories of it, for ourselves, to learn from our mistakes and perhaps discern the direction going forward. Perhaps it isn’t even too self-serving to think of it as history in the making. Or perhaps, more in line with our own aims as ‘open scientists’, we would be doing a poor job if we didn’t record what brought us to where we are and what is influencing our thinking going forward. I think the blogosphere does a pretty good job of the latter, but perhaps a little more recording of the former would be helpful.

How to make Connotea a killer app for scientists

So Ian Mulvaney asked, and as my solution did not fit into the margin I thought I would post here. Following on from the two rants of a few weeks back and many discussions at Scifoo, I have been thinking about how scientists might be persuaded to make more use of social web-based tools. What does it take to get enough people involved so that the network effects become apparent? I had a discussion with Jamie Heywood of Patients Like Me at Scifoo because I was interested as to why people with chronic diseases were willing to share detailed and very personal information in a forum that is essentially public. His response was that these people had an ongoing and extremely pressing need to optimise their treatment regime and lifestyle as far as possible, and that by correlating their experiences with others they got to the required answers quicker. Essentially, successful management of their life required rapid access to high quality information, sliced and diced in a way that made sense to them and presented in as efficient and timely a manner as possible. Which obviously left me none the wiser as to why scientists don’t get it….

Nonetheless there are some clear themes that emerge from that conversation and others looking at uptake and use of web-based tools. So here are my five thoughts. These are framed around the idea of reference management, but the principles, I think, are sufficiently general to apply to most web services.

  1. Any tool must fit within my existing workflows. Once adopted I may be persuaded to modify or improve my workflow but to be adopted it has to fit to start with. For citation management this means that it must have one-click filing (ideally from any place I might find an interesting paper) but will also monitor other means of marking papers, e.g. shared items from Google Reader, ‘liked’ items on FriendFeed, or scraped tags from del.icio.us.
  2. Any new tool must clearly outperform all the existing tools that it will replace in the relevant workflows without the requirement for network or social effects. It’s got to be absolutely clear on first use that I am going to want to use this instead of e.g. Endnote. That means I absolutely have to be able to format and manage references in a word processor or publication document. Technically a nightmare I am sure (you’ve got to worry about integration with Word, Open Office, GoogleDocs, Tex) but an absolute necessity to get widespread uptake. And this has to be absolutely clear the first time I use the system, before I have created any local social network and before you have a large enough user base for these to be effective.
  3. It must be near 100% reliable with near 100% uptime. Web services have a bad reputation for going down. People don’t trust their network connection and are much happier with local applications still. Don’t give them an excuse to go back to a local app because the service goes down. Addendum – make sure people can easily back up and download their stuff in a form that will be useful even if your service disappears. Obviously they’ll never need to, but it will make them feel better (and don’t scrimp on this, because they will check if it works).
  4. Provide at least one (but not too many) really exciting new feature that makes people’s lives better. This is related to #2 but is taking it a step further. Beyond just doing what I already do better, I need a quick fix of something new and exciting. My wishlist for Connotea is below.
  5. Prepopulate. Build in publicly available information before the users arrive. For a publications database this is easy, and this is something that BioMedExperts got right. You have a pre-existing social network and pre-existing library information. Populate ‘ghost’ accounts with a library that includes people’s papers (doesn’t matter if it’s not 100% accurate) and connections based on co-authorships. This will give people an idea of what the social aspect can bring and encourage them to bring more people on board.
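Point 5 is easy to make concrete, because both ingredients are already public: a paper’s author list gives you each person’s library, and every pair of co-authors gives you an edge in the social network. A toy sketch with invented author and paper names (a real service would pull these from a publications database):

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical publication records of the kind already public in the literature.
papers = [
    {"title": "Paper A", "authors": ["Smith", "Jones"]},
    {"title": "Paper B", "authors": ["Smith", "Patel"]},
    {"title": "Paper C", "authors": ["Jones", "Patel", "Smith"]},
]

def ghost_accounts(papers):
    """Seed a 'ghost' library and co-authorship network for each author
    before they ever sign up."""
    library = defaultdict(list)   # author -> their papers
    coauthors = defaultdict(set)  # author -> people they've written with
    for paper in papers:
        for author in paper["authors"]:
            library[author].append(paper["title"])
        for a, b in combinations(paper["authors"], 2):
            coauthors[a].add(b)
            coauthors[b].add(a)
    return library, coauthors

library, coauthors = ghost_accounts(papers)
print(sorted(coauthors["Smith"]))  # ['Jones', 'Patel']
```

When ‘Smith’ eventually signs up, their account already has a library and a network waiting, which is exactly the head start the network effect needs.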

So that is so much motherhood and apple pie. And nothing that Ian didn’t already know (unlike some other developers who I shan’t mention). But what about those cool features? Again I would take a back-to-basics approach. What do I actually want?

Well, what I want is a service that will do three quite different things. I want it to hold a library of relevant references in a way I can search and use, and I want to use this to format and reference documents when I write them. I want it to help me manage the day-to-day process of dealing with the flood of literature that is coming in (real-time search). And I want it to help me be more effective when I am researching a new area or trying to get to grips with something (offline search). Real-time search I think is a big problem that isn’t going to be solved soon. The library and document writing aspects I think are a given and need to be the first priority. The third problem is the one that I think is amenable to some new thinking.

What I would really like to see here is a way of pivoting my view of the literature around a specific item. This might be a paper, a dataset, or a blog post. I want to be able to click once and see everything that item cites, click again and see everything that cites it. Pivot away from that to look at what GoPubmed thinks the paper is about and see what it has which is related and then pivot back and see how many of those two sets are common. What are the papers in this area that this review isn’t citing? Is there a set of authors this paper isn’t citing? Have they looked at all the datasets that they should have? Are there general news media items in this area, books on Amazon, books in my nearest library, books on my bookshelf? Are they any good? Have any of my trusted friends published or bookmarked items in this area? Do they use the same tags or different ones for this subject? What exactly is Neil Saunders doing looking at that gene? Can I map all of my friends tags onto a controlled vocabulary?

Essentially what I am asking for is to be able to traverse the graph of how all these things are interconnected. Most of these connections are already explicit somewhere, but nowhere are they all brought together in a way that the user can slice and dice them the way they want. My belief is that if you can start to understand how people use that graph effectively to find what they want then you can start to automate the process, and that that will be the route towards real-time search that actually works.
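The pivoting described above is just graph traversal: each ‘click’ follows edges forwards (what this item cites) or backwards (what cites it), and the interesting questions are set operations over the results. A toy sketch with an invented citation graph (real identifiers would be DOIs or database keys):

```python
# A toy citation graph: item -> the items it cites. Identifiers are invented.
cites = {
    "review_2008": {"paper_x", "paper_y"},
    "paper_x": {"paper_z"},
    "paper_y": {"paper_z", "paper_w"},
    "paper_z": set(),
    "paper_w": set(),
}

def cited_by(graph, target):
    """Pivot backwards: everything that cites the target item."""
    return {item for item, refs in graph.items() if target in refs}

def common_references(graph, a, b):
    """How much two items' reference lists overlap."""
    return graph[a] & graph[b]

def missing_from(graph, review, field_papers):
    """Papers in an area that a review is *not* citing."""
    return set(field_papers) - graph[review]

print(cited_by(cites, "paper_z"))                       # who builds on paper_z
print(common_references(cites, "paper_x", "paper_y"))   # shared foundations
print(missing_from(cites, "review_2008", {"paper_x", "paper_w"}))
```

Layering in tags, datasets, and friends’ libraries just means more edge types on the same graph; the traversal logic stays the same.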

…but you’ll struggle with uptake…

The problem of academic credit and the value of diversity in the research community

This is the second in a series of posts (first one here) in which I am trying to process and collect ideas that came out of Scifoo. This post arises out of a discussion I had with Michael Eisen (UC Berkeley) and Sean Eddy (HHMI Janelia Farm) at lunch on the Saturday. We had drifted from a discussion of the problem of attribution stacking and citing datasets (and datasets made up of datasets) into the problem of academic credit. I had trotted out the usual spiel about the need for giving credit for data sets and for tool development.

Michael made two interesting points. The first was that he felt people got too much credit for datasets already and that making them more widely citeable would actually devalue the contribution. The example he cited was genome sequences. This is a case where, for historical reasons, the publication of a dataset as a paper in a high ranking journal is considered appropriate.

In a sense I agree in this case. The problem here is that for this specific case it is allowable to push a dataset-sized peg into a paper-sized hole. This has arguably led to an overvaluing of the sequence data itself and an undervaluing of the science it enables. Small molecule crystallography is similar in some regards, with the publication of crystal structures in paper form bulking out the publication lists of many scientists. There is a real sense in which having a publication stream for data, making the data itself directly citeable, would lead to a devaluation of these contributions. On the other hand it would lead to a situation where you would cite what you used, rather than the paper in which it was perhaps peripherally described. I think more broadly that the publication of data will lead to greater efficiency in research generally and more diversity in the streams to which people can contribute.

Michael’s comment on tool development was more telling though. As people at the bottom of the research tree (and I count myself amongst this group) it is easy to say ‘if only I got credit for developing this tool’, or ‘I ought to get more credit for writing my blog’, or any one of a thousand other things we feel ‘ought to count’. The problem is that there is no such thing as ‘credit’. Hiring decisions and promotion decisions are made on the basis of perceived need. And the primary needs of any academic department are income and prestige. If we believe that people who develop tools should be more highly valued then there is little point in giving them ‘credit’ unless that ‘credit’ will be taken seriously in hiring decisions. We have this almost precisely backwards. If a department wanted tool developers then it would say so, and would look at CVs for evidence of this kind of work. If we believe that tool developers should get more support then we should be saying that at a higher, strategic level, not just trying to get it added as a standard section in academic CVs.

More widely there is a question as to why we might think that blogs, or public lectures, or code development, or more open sharing of protocols are something for which people should be given credit. There is often a case to be made for the contribution of a specific person in a non-traditional medium, but that doesn’t mean that every blog written by a scientist is a valuable contribution. In my view it isn’t the medium that is important, but the diversity of media and the concomitant diversity of contributions that they enable. In arguing for these contributions being significant, what we are actually arguing for is diversity in the academic community.

So is diversity a good thing? The tightening and concentration of funding has, in my view, led to a decrease in diversity, both geographical and social, in the academy. In particular there is a tendency towards large groups clustered together in major institutions, generally led by very smart people. There is a strong argument that these groups can be more productive, more effective, and crucially offer better value for money. Scifoo is a place where those of us who are less successful come face to face with the fact that there are many people a lot smarter than us, and that these people are probably more successful for a reason. And you have to question whether your own small contribution with a small research group is worth the taxpayer’s money. In my view this is something you should question anyway as an academic researcher – there is far too much comfortable complacency and sense of entitlement – but that’s a story for another post.

So the question is: do I make a valid contribution? And does that provide value for money? And again for me Scifoo provides something of an answer. I don’t think I spoke to any person over the weekend without at least giving them something new to think about, a slightly different view on a situation, or just an introduction to something they hadn’t heard of before. These contributions were in very narrow areas, ones small enough for me to be expert in, but my background and experience provided a different view. What does this mean for me? Probably that I should focus more on what makes my background and experience unique – that I should build out from that in the directions most likely to provide a complementary view.

But what does it mean more generally? I think that it means that a diverse set of experiences, contributions, and abilities will improve the quality of the research effort. At one session of Scifoo, on how to support ground-breaking science, I made the tongue-in-cheek comment that I thought we needed more incremental science, more filling in of tables, of laying the foundations properly. The more I think about this the more I think it is important. If we don’t have proper foundations, filled out with good data and thought through in detail, then there are real risks in building new skyscrapers. Diversity adds reinforcement by providing better tools, better datasets, and different views from which to examine the current state of opinion and knowledge. There is an obvious tension between delivering radical new technologies and knowledge and the incremental process of filling in, backing up, and checking over the details. But too often the discussion is purely about how to achieve the first, with no attention given to the importance of the second. This is about balance, not absolutes.

So to come back around to the original point, the value of different forms of contribution is not due to the fact that they are non-traditional or because of the medium per se; it is because they are different. If we value diversity at hiring committees, and I think we should, then by looking at a diverse set of contributions, and at the contribution a given person is likely to make in the future based on their CV, we can assess more effectively how they will differ from the people we already have. The tendency of ‘the academy’ to hire people in its own image is well established. No monoculture can ever be healthy; certainly not in a rapidly changing environment. So diversity is something we should value for its own sake, something we should try to encourage, and something that we should search CVs for evidence of. Then the credit for these activities will flow of its own accord.

Re-inventing the wheel (again) – what the open science movement can learn from the history of the PDB

One of the many great pleasures of SciFoo was to meet with people who had a different, and in many cases much more comprehensive, view of managing data and making it available. One of the long-term champions of data availability is Professor Helen Berman, the head of the Protein Data Bank (the international repository for biomacromolecular structures), and I had the opportunity to speak with her for some time on the Friday afternoon before Scifoo kicked off in earnest (in fact this was one of many somewhat embarrassing situations where I would carefully explain my background in my very best ‘speaking to non-experts’ voice only to find they knew far more about it than I did – however Jim Hardy of Gahaga Biosciences takes the gold medal for this event, for turning to the guy called Larry next to him while having dinner at Google headquarters and asking what line of work he was in).

I have written before about how the world might look if the PDB and other biological databases had never existed, but as I said then I didn’t know much of the real history. One of the things I hadn’t realised was how long it was after the PDB was founded before deposition of structures became expected for all atomic resolution biomacromolecular structures. The road from a repository of seven structures with a handful of new submissions a year to the standards that mean today that any structure published in a reputable journal must be deposited was a long and rocky one. The requirement to deposit structures on publication only became general in the early 1990s, nearly twenty years after the PDB was founded, and there was a very long and extended process during which the case for making the data available was only gradually accepted by the community.

Helen made the point strongly that it had taken 37 years to get the PDB to where it is today: a gold standard, international and publicly available repository of a specific form of research data, supported by a strong set of community-accepted, and enforced, rules and conventions. We don’t want to take another 37 years to achieve the widespread adoption of high standards in data availability and open practice in research more generally. So it is imperative that we learn the lessons and benefit from the experience of those who built up the existing repositories. We need to understand where things went wrong and attempt to avoid repeating mistakes. We need to understand what worked well and use this to our advantage. We also need to recognise where the technological, social, and political environment that we find ourselves in today means that things have changed, and perhaps to recognise that in many ways, particularly in the way people behave, things haven’t changed at all.

I’ve written this in a hurry and therefore not searched as thoroughly as I might but I was unable to find any obvious ‘history of the PDB’ online. I imagine there must be some out there – but they are not immediately accessible. The Open Science movement could benefit from such documents being made available – indeed we could benefit from making them required reading. While at Scifoo Michael Nielsen suggested the idea of a panel of the great and the good – those who would back the principles of data availability, open access publication, and the free transfer of materials. Such a panel would be great from the perspective of publicity but as an advisory group it could have an even greater benefit by providing the opportunity to benefit from the experience many of these people have in actually doing what we talk about.

Southampton Open Science Workshop 31 August and 1 September

An update on the Workshop that I announced previously. We have a number of people confirmed to come down and I need to start firming up numbers. I will be emailing a few people over the weekend so sorry if you get this via more than one route. The plan of attack remains as follows:

Meet on evening of Sunday 31 August in Southampton, most likely at a bar/restaurant near the University to coordinate/organise the details of sessions.

Commence on Monday at ~9:30 and finish around 4:30pm (with the option of discussion going into the evening) with three or four sessions over the course of the day, broadly divided into the areas of tools, social issues, and policy. We have people interested and expert in all of these areas coming so we should be able to have a good discussion. The object is to keep it very informal but to keep the discussion productive. Numbers are likely to be around 15-20 people. For those not lucky enough to be in the area we will aim to record and stream the sessions, probably using a combination of dimdim, mogulus, and slideshare. Some of these may require you to be signed into our session, so if you are interested drop me a line at the account below.

To register for the meeting please send me an email to my gmail account (cameronneylon). To avoid any potential confusion, even if you have emailed me in the past week or so about this please email again so that I have a comprehensive list in one place. I will get back to you with a request via PayPal for £15 to cover coffees and lunch for the day (so if you have a PayPal account you want to use please send the email from that address). If there is a problem with the cost please say so in your email and we will see what we can do. We can suggest options for accommodation but will ask you to sort it out for yourself.

I have set up a wiki to discuss the workshop which is currently completely open access. If I see spam or hacking problems I will close it down to members only (so it would be helpful if you could create an account) but hopefully it might last a few weeks in the open form. Please add your name and any relevant details you are happy to give out to the Attendees page and add any presentations or demos you would be interested in giving, or would be interested in hearing about, on the Programme suggestion page.

Notes from Scifoo

I am too tired to write anything even vaguely coherent. As will have been obvious there was little opportunity for microblogging, I managed to take no video at all, and not even any pictures. It was non-stop, at a level of intensity that I have very rarely encountered anywhere before. The combination of breadth and sharpness that many of the participants brought was, to be frank, pretty intimidating but their willingness to engage and discuss and my realisation that, at least in very specific areas, I can hold my own made the whole process very exciting. I have many new ideas, have been challenged to my core about what I do, and how; and in many ways I am emboldened about what we can achieve in the area of open data and open notebooks. Here are just some thoughts that I will try to collect some posts around in the next few days.

  • We need to stop fretting about what should be counted as ‘academic credit’. In another two years there will be another medium, another means of communication, and by then I will probably be conservative enough to dismiss it. Instead of just thinking that diversifying the sources of credit is a good thing we should ask what we want to achieve. If we believe that we need a more diverse group of people in academia then that is what we should articulate – Courtesy of a discussion with Michael Eisen and Sean Eddy.
  • ‘Open Science’ is a term so vague as to be actively dangerous (we already knew that). We need a clear articulation of principles or a charter. A set of standards that are clear, and practical in the current climate. As these will be lowest common denominator standards at the beginning we need a mechanism that enables or encourages a process of incrementally raising those standards. The electronic Geophysical Year Declaration is a good working model for this – Courtesy of session led by Peter Fox.
  • The social and personal barriers to sharing data can be codified and made sense of (and this has been done). We can use this understanding to frame structures that will make more data available – session led by Christine Borgman
  • The Open Science movement needs to harness the experience of developing the open data repositories that we now take for granted. The PDB took decades of continuous work to bring to its current state and much of it was a hard slog. We don’t want to take that much time this time round – Courtesy of discussion led by Helen Berman.
  • Data integration is tough, but it is not helped by the fact that bench biologists don’t get ontologies, and that ontologists and their proponents don’t really get what the biologists are asking. I know I have an agenda on this but social tagging can be mapped after the fact onto structured data (as demonstrated to me by Ben Good). If we get the keys right then much else will follow.
  • Don’t schedule a session at the same time as Martin Rees does one of his (aside from anything else you miss what was apparently a fabulous presentation).
  • Prosthetic limbs haven’t changed in 100 years and they suck. Might an open source approach to building a platform be the answer – discussion with Jon Kuniholm, founder of the Open Prosthetics Project.
  • The platform for Open Science is very close and some of the key elements are falling into place. In many ways this is no longer a technical problem.
  • The financial system backing academic research is broken when the cost of reproducing or refuting specific claims rises to 10- to 20-fold that of the original work. Open Notebook Science is a route to reducing this cost – discussion with Jamie Heywood.
  • Chris Anderson isn’t entirely wrong – but he likes being provocative in his articles.
  • Google run a fantastically slick operation. Down to the fact that the chocolate-coated oatmeal biscuit ice cream sandwiches are specially ordered in, made with proper sugar instead of high fructose corn syrup.

Enough. Time to sleep.

BioBarCamp – Meeting friends old and new and virtual

So BioBarCamp started yesterday with a bang and a great kick off. Not only did we somehow manage to start early, we were consistently running ahead of schedule. With several hours initially scheduled for introductions this actually went pretty quickly, although it was quite comprehensive. During the introductions many people expressed an interest in ‘Open Science’, ‘Open Data’, or some other open stuff, yet it was already pretty clear that many people meant many different things by this. It was suggested that with the time available we have a discussion session on what ‘Open Science’ might mean. Pedro and myself live blogged this at FriendFeed and the discussion will continue this morning.

I think for me the most striking outcome of that session was that not only is this a radically new concept for many people, but that many people don’t have any background understanding of open source software either, which can make the discussion totally impenetrable to them. This, in my view, strengthens the need for having some clear brands, or standards, that are easy to point to and easy to sign up to (or not). I pitched the idea, basically adapting from John Wilbanks’ pitch at the meeting in Barcelona, that our first target should be that all data and analysis associated with a published paper should be available. This seems an unarguable basic standard, but is one that we currently fall far short of. I will pitch this again in the session I have proposed on ‘Building a data commons’.

The schedule for today is up as a GoogleDoc spreadsheet with many difficult decisions to make. My current thinking is:

  1. Kaitlin Thaney – Open Science Session
  2. Ricardo Vidal and Vivek Murthy (OpenWetWare and Epernicus).  Using online communities to share resources efficiently.
  3. Jeremy England & Mark Kaganovich – Labmeeting, Keeping Stalin Out of Science (though I would also love to do John Cumbers on synthetic biology for space colonization, that is just so cool)
  4. Pedro Beltrao & Peter Binfield – Dealing with Noise in Science / How should scientific articles be measured.
  5. Hard choice: Andrew Hessel – building an open source biotech company or Nikesh Kotecha + Shirley Wu – Motivating annotation
  6. Another doozy: John Cumbers – Science Worship / Science Marketing or Hilary Spencer & Mathias Crawford – Interests in Scientific IP – Who Owns/Controls Scientific Communication and Data?  The Major Players.
  7. Better turn up to mine I guess :)
  8.  Joseph Perla – Cloud computing, Robotics and the future of Science and  Joel Dudley & Charles Parrot – Open Access Scientific Computing Grids & OpenMac Grid

I am beginning to think I should have brought two laptops and two webcams. Then I could have recorded one and gone to the other. Whatever happens I will try to cover as much as I can in the BioBarCamp room at FriendFeed, and where possible and appropriate I will broadcast and record via Mogulus. The wireless was a bit tenuous yesterday so I am not absolutely sure how well this will work.

Finally, this has been a great opportunity to meet up with people I know and have met before, those who I feel I know well but have never met face to face, and indeed those whose names I vaguely know (or should know) but have never connected with before. I’m not going to say who is in which list because I will forget someone! But if I haven’t said hello yet, do come up and harass me, because I probably just haven’t connected your online persona with the person in front of me!

An open letter to the developers of Social Network and ‘Web 2.0’ tools for scientists

My aim is to email this to all the email addresses that I can find on the relevant sites over the next week or so, but feel free to diffuse more widely if you feel it is appropriate.

Dear Developer(s)

I am writing to ask your support in undertaking a critical analysis of the growing number of tools being developed that broadly fall into the category of social networking or collaborative tools for scientists. There has been a rapid proliferation of such tools and significant investment of time and effort in their development. My concern, which I wrote about in a recent blog post (here), is that because these tools split up the potential user community, none of them may succeed.

One route forward is to simply wait for the inevitable consolidation phase where some projects move forward and others fail. I feel that this would be missing an opportunity to critically analyse the strengths and weaknesses of these various tools, and to identify the desirable characteristics of a next generation product. To this end I propose to write a critical analysis of the various tools, looking at architecture, stability, usability, long term funding, and features. I have proposed some criteria and received some comments and criticisms of these. I would appreciate your views on what the appropriate criteria are and would welcome your involvement in the process of writing this analysis. This is not meant as an attack on any given service or tool, but as a way of getting the best out of the development work that has already taken place, and taking the opportunity to reflect on what has worked and what has not in a collaborative and supportive fashion.

I will also be up front and say that I have an agenda on this. I would like to see a portable and agreed data model that would enable people to utilise the best features of all these services without having to rebuild their network within each site. This approach is very much part of the data portability agenda and would probably have profound implications for the design architecture of your site. My feeling, however, is that this would be the most productive architectural approach. It does not mean that I am right of course and I am prepared to be convinced otherwise if the arguments are strong.
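To make the idea of a portable data model concrete, here is a minimal sketch in Python: a researcher's profile and network are serialised to a plain JSON interchange document that any site could read back without loss. Every name, field, and identifier here is a hypothetical illustration for the sake of argument, not any existing site's actual schema.

```python
import json
from dataclasses import dataclass, field, asdict

# Hypothetical interchange schema -- illustrative only, not a real site's format.
@dataclass
class Profile:
    name: str
    affiliation: str
    interests: list = field(default_factory=list)
    contacts: list = field(default_factory=list)  # identifiers of linked researchers

def export_profile(profile: Profile) -> str:
    """Serialise a profile to a plain JSON document another site could import."""
    return json.dumps(asdict(profile), indent=2)

def import_profile(doc: str) -> Profile:
    """Rebuild a profile object from the interchange document."""
    return Profile(**json.loads(doc))

me = Profile(name="A. Researcher", affiliation="Example University",
             interests=["open science"],
             contacts=["http://example.org/people/42"])

# The round trip is lossless, which is the property that would let a user
# carry their network from one service to another.
assert import_profile(export_profile(me)) == me
```

The point is not this particular schema but the round-trip property: if every service could emit and consume a common document like this, the network would belong to the user rather than to any one site.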

I hope you will feel free to take part in this exercise and contribute. I do believe that if we take a collaborative approach then it will be possible to identify the features and range of services that the community needs and wants. Please comment at the blog post or request access to the GoogleDoc where we propose to write up this analysis.

Yours sincerely,

Cameron Neylon

First neutrons from ISIS TS-2!

In a break from your regularly scheduled programme on Open Science we bring you news from deepest darkest Oxfordshire [image: TS-2 first neutrons]. I am based at ISIS, the UK’s neutron source, where my job is to bring in and support more biological science that uses neutrons. Neutron scattering, while it has made a number of crucial contributions to the biological sciences, has always been a bit player in comparison to x-ray crystallography and NMR. My job is to try and build and strengthen this activity and to see the potential of neutron scattering in structural biology realised.

The Second Target Station project at ISIS is a huge part of this, and the reason I have a job here. TS-2 is designed specifically to provide a high flux of low energy neutrons, which are ideally suited to looking at large scale structures and biological molecules. The energy characteristics of the neutrons mean they have wavelengths ranging from angstroms up to around 2 nm, so they will be well suited to looking at the overall shape and size of biomolecules and their complexes. The increase in flux, probably about 10-20 fold over the existing target station, means that experiments can be faster, or smaller, or more dilute: all things that make the bioscientist’s job easier. Over £140M has been spent on building the target and the instruments that will make use of these neutrons.
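The link between neutron energy and wavelength here is just the de Broglie relation, λ = h/√(2mE). A short Python sketch makes the numbers easy to check; the constants are standard CODATA values, while the example energies are my own illustrative choices (a thermal neutron at roughly 25 meV versus the colder neutrons TS-2 favours).

```python
import math

# De Broglie relation for a non-relativistic neutron: lambda = h / sqrt(2 * m * E).
PLANCK_H = 6.62607015e-34         # Planck constant, J s
NEUTRON_MASS = 1.67492749804e-27  # neutron mass, kg
MEV_TO_J = 1.602176634e-22        # 1 meV expressed in joules

def neutron_wavelength_nm(energy_mev: float) -> float:
    """Wavelength in nanometres of a neutron with the given kinetic energy in meV."""
    return PLANCK_H / math.sqrt(2 * NEUTRON_MASS * energy_mev * MEV_TO_J) * 1e9

# A thermal neutron (~25 meV) comes out at roughly 1.8 angstroms,
# while a cold neutron around 0.2 meV reaches about 2 nm.
print(round(neutron_wavelength_nm(25.0), 3))  # ~0.181 nm
print(round(neutron_wavelength_nm(0.2), 2))   # ~2.02 nm
```

This is why a target station optimised for low energy neutrons is the one that opens up large-scale biological structures: halving the energy lengthens the wavelength by a factor of √2.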

At 1308 yesterday the first neutrons were detected on the INTER beamline with a spectrum and flux pretty much dead on what was expected. In fact the first shot flipped out the detector, it was so strong. This has been a massive project that, despite news coverage to the contrary, has been delivered essentially on time and on budget. Congratulations are due to all those involved in pulling this off. As the instruments themselves start to come fully online we are going to get the chance to do many things that were either difficult or impossible before. In particular I am excited about what we will be able to do with the new small angle instrument SANS2d and INTER, the reflectometer, particularly in the area of membrane biology.

Facebooks for scientists – they’re breeding like rabbits!

I promised some of you I would do this a while ago and I simply haven’t got to it. But enough of the excuses. There have been a huge number of launches in the past few months of sites and services intended to act as social network sites for scientists. These join a number of older services including Nature Network, OpenWetWare, and others. My concern is that with so many sites in the same space there is a risk none of them will succeed because the user community will be too diluted. I am currently averaging around three emails a week, all from different sites, suggesting I should persuade more people to sign up.

What I would like to do is attempt a critical and comprehensive analysis of the sites and services available, as part of an exercise in thinking about how we might rationally consolidate this area, and how we might enable the work that has gone into building these services to be used effectively to build the ‘next generation’ of sites. All of these sites have good features and it would be a shame to see them lost. I also don’t want to see people discouraged from building new and useful tools. I just want to see this work.

My dream would be to see an open source framework with an open data model that allows people to move their data from one place to another depending on what features they want. Then the personal networks can spread through the communities of all of these sites rather than being restricted to one, and the community can help build features that they want. As someone else said ‘Damnit, we’re scientists, we hold the stuff of the universe in our hands’ – can’t we have a think about what the best way to do this is?

What I want to do with this post is try to put together a comprehensive list of sites and services, including ones that get heavy scientific use but are not necessarily designed for scientists. I will miss many, so please comment to point this out and I will add them. Then I want to try and put together a list of criteria by which we might compare and contrast them. Again, please leave comments and feel free to argue. I don’t expect this to necessarily be an easy or straightforward process, and I don’t expect to get complete agreement. But I am worried that if things are just left to run, none of these sites will get the amount of support that is needed to make them viable.

So here goes.

Sites

Blog collections: Nature Network, ScienceBlogs, Scientific Blogging, WordPress, Blogspot, (OpenWetWare),

Social Networks: Laboratree, Ologeez, Research Gate, Epernicus, LabMeeting, Graduate Junction, (Nature Network), oh and Facebook, and Linkedin

Protocol sharing: Scivee, Bioscreencast, OpenWetWare, YouTube, lots of older ones I can’t remember at the moment

Others: Friendfeed, Twitter, GoogleDocs, GoogleGroups, Upcoming, Seesmic,

Critical criteria

Stability: funding, infrastructure, uptime, scalability, slashdot resistance, long term personnel commitment

Architecture: open data model? ability to export data? compatibility with other sites? plugins? rss?

Design: user interface, ‘look’, responsiveness

Features: what features do you think are important? I don’t even want to start putting my own prejudices here.
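As an illustration of what the ‘ability to export data’ and ‘rss’ criteria could mean in practice, here is a small Python sketch that renders a user’s activity stream as a minimal RSS 2.0 feed. The site name, links, and items are all hypothetical; the point is only that a feed like this gives other services a standard way to pull a user’s activity out.

```python
import xml.etree.ElementTree as ET

def activity_to_rss(site_name: str, items: list) -> str:
    """Render an activity stream as a minimal RSS 2.0 feed.

    `items` is a list of (title, link) pairs. Everything here is an
    illustrative sketch, not any particular site's export format.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = site_name
    ET.SubElement(channel, "link").text = "http://example.org/"
    ET.SubElement(channel, "description").text = "Exported activity stream"
    for title, link in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "link").text = link
    return ET.tostring(rss, encoding="unicode")

feed = activity_to_rss("Example Network",
                       [("New protocol posted", "http://example.org/p/1")])
```

A site that can emit something this simple has already met the lowest bar for openness: my activity is readable by feed readers, aggregators like FriendFeed, and any future replacement service.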

How to take this forward?

Comment here or at FriendFeed, or anywhere else, but if you can please tag the page with Fb4Sci. I have put up a GoogleDoc which is visible at http://docs.google.com/Doc?id=dhs5x5kr_572hccgvcct (currently it just contains this post). If you want access drop me an email at cam eron ney lon (no spaces) at googlemail (not gmail) and I will give editing rights to anyone who requests them. Comments and contributions from the development teams are welcome but I expect everyone to make a conflict of interest declaration. Mine is:

I blog at OpenWetWare and use the wiki extensively. I have been known to ring into steering committee meetings and have discussed specific features with the development team. I am an irregular user of Nature Network and a regular user of Friendfeed and Twitter. I have a strong bias towards open data models and architectures.

[ducks]