There are crowds, and then there are crowds…
I am constantly intrigued by the way that some sort of collective web-based intelligence seems to cause specific issues to pop up in multiple places at the same time. The probable reason is that my primitive brain, while trying to create a narrative out of all the things that I am attempting to absorb, creates these specific reference points to make it easier for me to understand what is going on. One of the things I am interested in is how we take the behavioral tendencies that developed in small groups of social primates and work with those to get the kind of results we need or want. In my case I find a lot of these web discussions work quite effectively as a way of either tipping off my own subconscious or perhaps acting as an extended subconscious that helps me to surface ideas and helps me to write about them.
Coincidentally, in the past week, two things that got my attention were about crowd-sourcing in science. The first was a quote from a talk that John Wilbanks gave at ETech 2009 on Open Data and building a data commons. The quote that got the publicity was “there is no crowd”, by which I think John meant that scientific communities are not large and homogeneous enough for the kind of crowd sourcing and opinion markets that are beloved of some Web2 commentators to work. In John’s own terms the knowledge required to make a meaningful contribution is uncommon, making both the logistics and economics of what is in effect a transfer of labour very different than for problems or questions which have direct meaning for the wider population.
The second piece that got me thinking was Tim Gowers’s discussion of the experience of the Polymath project (via Michael Nielsen). For those who missed this, the project aimed to solve a defined mathematical problem through a series of small contributions from a group of people; essentially building a mathematical proof, or the outlines of one, via crowd-sourcing. Tim’s blogpost and the accompanying comments are a gold mine for anyone who wants to understand how this type of project does and does not work. The main success of the project was in solving the problem, or in fact a more general form of the originally stated problem. The main “failure” was that the team of actively involved people was rather small and made up of people who might be seen as “the usual suspects” in this context, a group of the best mathematicians who are active on the web.
Many of the reasons for the relatively small core group of people involved should be very familiar: lack of time to invest, the pace of conversation being too fast for people to dip in and out of, fear of asking “stupid” questions, and finally a lack of awareness that the project was even occurring. Let me be clear that from what I can see this project was an immense success, but that it is also a very clear and time-bounded example from which we can draw lessons. Any social scientists out there interested in how this type of collaboration is evolving should be piling in now and grabbing as much data as they can on the process!
Some of these problems are logistical and design based and some of them are technical and interface based. There is a clear need for well designed interfaces that let people rapidly triage a barrage of threads to determine what and where they can contribute. Equally there is a need for an engineering approach to dividing up problems into modular sections that can be attacked independently. The theme of mismatched intrinsic time frames for people and conversations also keeps coming up, and is something I have been struggling to get my head around for some time, with little progress. Equally the social problems (am I going to ask my silly question in the presence of several of the world’s best mathematicians? what can I possibly contribute? will I get anything for my small contribution?) are familiar and require a mixture of technical, interface, and social solutions. In aggregate though it would appear that the experience of the Polymath project supports John’s soundbite, “There is no crowd”, because only a small number of people can usefully contribute.
Now I disagree with John rarely and I think very hard before doing so. In this case he is actually trying to make a different point entirely, to do with the reasons why sharing is less immediately popular in science, but I think the isolated soundbite is misleading in terms of what crowdsourcing can do for science. One point is that there are at least two distinctly different types of crowd sourcing. The “wisdom of crowds” or opinion markets style would not at first glance appear to be applicable to science, but I will come back to how I think it can be. The second type of crowd sourcing doesn’t seem to have a good name as far as I know, so I will characterise it as “broadcast request – expert response”. This is when you make a broadcast request to a community or network in the hope that someone will have the knowledge you require. A similar thing is seen when a request to work with a specific problem is broadcast, and a small group of interested people contribute. The expectation is not that thousands will appear to solve your problem in small fragments but that a small group of interested, dedicated, and appropriately skilled people will put in the time because they think it is worth their while. In the extreme this is one person who just happens to have the correct answer to your question.
Where scientists tend to go wrong is in mixing these two approaches up. There is an expectation that by broadcasting a request you will gather a huge number of people to contribute a small amount of time to a project. Both the recent project on putting together a dictionary of named chemical reactions and the Polymath project started with the expectation of large numbers of contributors but ended up with a smaller number of interested people. On a smaller scale the broadcast request has delivered on many occasions for many people, again generally when the request involves a small amount of work, and again where there is an existing community that tends to contain the expertise that is required.
Clearly these two approaches are not orthogonal and there is a massive grey area between them. You can use Twitter both to ask “what do you think of Obama’s performance so far?” and “does anyone know the density of D2O to four decimal places?”. What is interesting is that you would probably build your network on Twitter differently depending on the type of questions you want to ask. Building these networks online, as well as in real life, is going to be an increasingly important part of being a successful person. It is still the case that who you know matters more than what you know. In fact if your community becomes your knowledge base it is going to become much more the case. You just have more options and ability to ensure that you know the right people.
You would also probably ask different questions in different ways. And this last point to me is the most important. There are successful examples of scientific crowd sourcing in the “mass market” sense. Projects like SETI@home provided a very low barrier to entry for anyone with a computer. Two very good examples that get people directly involved are Galaxy Zoo and FoldIt. What unites these is the effort put into framing the scientific question in a way that enables a large number of people to contribute. Usually in tackling a scientific question an experimental scientist asks, “how many postdocs and what new piece of kit do I need to answer this question?” In today’s world of rapidly dwindling government income, poor public perception of science, and a stated need for more science graduates and teachers, perhaps it is more appropriate to be asking “how can I modularise this research to make specific parts of it accessible to students/teachers/the general public?” And by accessible I don’t mean that they can understand it, but that they can do it. It is not an accident that this notion of dividing the problem up keeps appearing.
Jean-Claude Bradley’s UsefulChem project and some of our efforts to take parts of it into classrooms have become a great example of this approach. The project itself has been divided up into separate modules: computational screening, compound synthesis, in vitro testing of compounds, and the measurement and modelling of compound solubility, which was developed into a separate strand for the Open Notebook Science Challenge. I’m not sure to what extent Jean-Claude’s original intentions involved taking the project into undergraduate or high school teaching settings, but the explicit aim was always to structure the project so that anyone could become involved. And as some of us think about how we can push that towards schools and public workshops we need to think hard about how we can make the science, its motivation, and the experiments themselves accessible (which includes safe!) for people with no chemistry experience to carry out. It’s a challenge but I think it is worth it.
So in the end, my belief is that there is a crowd, or a set of crowds, all of which may be useful for your science. To pursue crowd sourcing in its broadest sense requires a lot of work and thought, but the rewards in terms of public engagement, involvement, and the volume of science you can do are enormous. To pursue “agile collaboration” and “broadcast request – expert answer” styles of crowd sourcing, work is also required to build and care for a community and network that will provide you with access to the right people at the right time. Different approaches are better for different problems and for different timescales. Both require work and investment of time. But with that initial input it is still possible for your crowd(s) to pay you back a big dividend.