
There are crowds, and then there are crowds…

15 March 2009 24 Comments

I am constantly intrigued by the way that some sort of collective web-based intelligence seems to cause specific issues to pop up in multiple places at the same time. The probable reason is that my primitive brain, while trying to create a narrative out of all the things that I am attempting to absorb, creates these specific reference points to make it easier for me to understand what is going on. One of the things I am interested in is how we take the behavioral tendencies that developed in small groups of social primates and work with those to get the kind of results we need or want. In my case I find a lot of these web discussions work quite effectively as a way of either tipping off my own subconscious or perhaps acting as an extended subconscious that helps me to surface ideas and helps me to write about them.

Coincidentally, two things that got my attention in the past week were about crowd-sourcing in science. The first was a quote from a talk that John Wilbanks gave at ETech 2009 on Open Data and building a data commons. The quote that got the publicity was “there is no crowd”, by which I think John meant that scientific communities are not large and homogeneous enough for the kind of crowd sourcing and opinion markets that are beloved of some Web2 commentators to work. In John’s own terms, the knowledge required to make a meaningful contribution is uncommon, making both the logistics and economics of what is in effect a transfer of labour very different from those for problems or questions which have direct meaning for the wider population.

The second piece that got me thinking was Tim Gowers’s discussion of the experience of the Polymath project (via Michael Nielsen). For those who missed this, the project aimed to solve a defined mathematical problem through a series of small contributions from a group of people; essentially building a mathematical proof, or the outlines of one, via crowd-sourcing. Tim’s blogpost and the accompanying comments are a gold mine for anyone who wants to understand how this type of project does and does not work. The main success of the project was in solving the problem, or in fact a more general form of the originally stated problem. The main “failure” was that the team of actively involved people was rather small and made up of people who might be seen as “the usual suspects” in this context, a group of the best mathematicians who are active on the web.

Many of the reasons for the relatively small core group of people involved should be very familiar: lack of time to invest, the pace of conversation being too fast for people to dip in and out of, fear of asking “stupid” questions, and finally a lack of awareness that the project was even occurring. Let me be clear that from what I can see this project was an immense success, but that it is also a very clear and time bounded example from which we can draw lessons. Any social scientists out there interested in how this type of collaboration is evolving should be piling in now and grabbing as much data as they can on the process!

Some of these problems are logistical and design based and some of them are technical and interface based. There is a clear need for well designed interfaces that let people rapidly triage a barrage of threads to determine what and where they can contribute. Equally there is a need for an engineering approach to dividing up problems into modular sections that can be attacked independently. The theme of mismatched intrinsic time frames for people and conversations also keeps coming up, and is something I have been struggling to get my head around for some time, with little progress. The social problems (am I going to ask my silly question in the presence of several of the world’s best mathematicians? what can I possibly contribute? will I get anything for my small contribution?) are equally familiar and require a mixture of technical, interface, and social solutions. In aggregate, though, it would appear that the experience of the Polymath project supports John’s soundbite, “There is no crowd”, because only a small number of people can usefully contribute.

Now I disagree with John rarely and I think very hard before doing so. In this case he is actually trying to make a different point entirely, to do with the reasons why sharing is less immediately popular in science, but I think the isolated soundbite is misleading in terms of what crowdsourcing can do for science. One point is that there are at least two distinctly different types of crowd sourcing. The “wisdom of crowds” or opinion markets style would not at first glance appear to be applicable to science, but I will come back to how I think it can be. The second type of crowd sourcing doesn’t seem to have a good name as far as I know, so I will characterise it as “broadcast request – expert response”. This is when you make a broadcast request to a community or network in the hope that someone will have the knowledge you require. A similar thing is seen when a request to work on a specific problem is broadcast, and a small group of interested people contribute. The expectation is not that thousands will appear to solve your problem in small fragments but that a small group of interested, dedicated, and appropriately skilled people will put in the time because they think it is worth their while. In the extreme this is one person who just happens to have the correct answer to your question.

Where scientists tend to go wrong is in mixing these two approaches up. There is an expectation that by broadcasting a request you will gather a huge number of people to contribute a small amount of time to a project. Both the recent project on putting together a dictionary of named chemical reactions and the Polymath project started with the expectation of large numbers of contributors but ended up with a smaller number of interested people. On a smaller scale the broadcast request has delivered on many occasions for many people, again generally when the request involves a small amount of work, and again where there is an existing community that tends to contain the expertise that is required.

Clearly these two approaches are not orthogonal and there is a massive grey area between them. You can use Twitter both to ask “what do you think of Obama’s performance so far?” and “does anyone know the density of D2O to four decimal places?”. What is interesting is that you would probably build your network on Twitter differently depending on the type of questions you want to ask. Building these networks online, as well as in real life, is going to be an increasingly important part of being a successful person. It is still the case that who you know matters more than what you know. In fact, if your community becomes your knowledge base, it is going to become much more the case. You just have more options and ability to ensure that you know the right people.

You would also probably ask different questions in different ways. And this last point to me is the most important. There are successful examples of scientific crowd sourcing in the “mass market” sense. Projects like SETI@home provided a very low barrier to entry for anyone with a computer. Two very good examples that get people directly involved are Galaxy Zoo and FoldIt. What unites these is the effort put into framing the scientific question in a way that enables a large number of people to contribute. Usually in tackling a scientific question an experimental scientist asks, “how many postdocs and what new piece of kit do I need to answer this question?” In today’s world of rapidly dwindling government income, poor public perception of science, and a stated need for more science graduates and teachers, perhaps it is more appropriate to be asking “how can I modularise this research to make specific parts of it accessible to students/teachers/the general public?” And by accessible I don’t mean that they can understand it, but that they can do it. It is not an accident that this notion of dividing the problem up keeps appearing.

Jean-Claude Bradley’s UsefulChem project, and some of our efforts to take parts of it into classrooms, have become a great example of this approach. The project itself has been divided up into separate modules: computational screening, compound synthesis, in vitro testing of compounds, and the measurement and modelling of compound solubility, which was developed into a separate strand for the Open Notebook Science Challenge. I’m not sure to what extent Jean-Claude’s original intentions involved taking the project into undergraduate or high school teaching settings, but the explicit aim was always to structure the project so that anyone could become involved. And as some of us think about how we can push that towards schools and public workshops we need to think hard about how we can make the science, its motivation, and the experiments themselves accessible (which includes safe!) for people with no chemistry experience to carry out. It’s a challenge but I think it is worth it.

So in the end, my belief is that there is a crowd, or a set of crowds, all of which may be useful for your science. To pursue crowd sourcing in its broadest sense requires a lot of work and thought, but the rewards in terms of public engagement, involvement, and the volume of science you can do are enormous. To pursue “agile collaboration” and “broadcast request – expert answer” styles of crowd sourcing, work is also required to build and care for a community and network that will provide you with access to the right people at the right time. Different approaches are better for different problems and for different timescales. Both require work and investment of time. But with that initial input it is still possible for your crowd(s) to pay you back a big dividend.


24 Comments »

  • Seb Paquet said:

    Excellent post. What do you mean by “mismatched intrinsic time frames for people and conversations” ?

  • Jean-Claude Bradley said:

    Again it comes down to the definition of success. Small groups of motivated people working in the open can get a lot done. To me what gets done is more important than how many people it takes to make up a crowd.

    As for the involvement of undergraduates, certainly the ONSsolubility project was designed to include them. But it was Brent Friesen who volunteered to extend the reach into teaching laboratories. We have barely explored the potential of this. It does take work but we only need a few people like Brent to really get things moving.

  • Cameron Neylon said:

    Seb, I’m not exactly sure, which is why it always comes out wrong. Essentially the problem is that person A is monitoring and responding to a thread, say, once an hour on the west coast of the US while person B is responding in the morning and the evening in central Europe. For a short part of the day they can have a synchronous and meaningful conversation, but after person B retires for the day person A keeps going at a rapid rate, meaning that person B, on coming back the following day, has more to catch up with than they have time to catch up in.

    The “always on” person always comes to dominate the intrinsic timeframe of any service, which makes it harder for the person dipping in to contribute meaningfully. If you combine this with a mixture of Twitter, blogs, and email communication, which all go at different rates, you end up with an unholy mess of asynchronous conversations that are difficult to manage and aggregate. I think it could be done but I’m still trying to get my head around how you even talk about the problem.

  • Frank Norman said:

    The “broadcast request – expert answer” situation reminds me of the process of going out to tender. You put your ad in the OJEU and get responses (sometimes a small number) from companies who can meet your need. How does “tendering” strike you?

  • Cameron Neylon said:

    Frank, yes, that sounds about right. “Open tendering”? “Crowd tendering”? It also captures the notion of the specific exchange of resources.

  • Eva said:

    “What is interesting is that you would probably build your network on Twitter differently depending on the type of questions you want to ask.”

    Yes. That’s exactly my problem. At one point I had two Twitter accounts to try and control this. I had one personal account, and one for my blog, but I couldn’t really separate all my thoughts into two separate things. My head just doesn’t work that way. And even if I could, I could not control the content of the streams of the people I was following in either account. I’m sure I’m a huge disappointment to the people who follow me because they think I will talk about science and have to deal with my babbling about the weather and my cat. But I am in turn following people just because I know them, and have to deal with *their* yapping about their passions, which may not be mine. Or I follow someone I don’t know because half of their tweets are really interesting to me, and I still sit through the other half of their tweets which are about food or their kids or something. And they’re basically strangers, so that’s sometimes weird.

  • MIckey Schafer said:

    “And by accessible I don’t mean that they can understand it, but that they can do it” – so, I’ll be one of those going out on the proverbial limb by adding comments despite a relative lack of expertise vis-a-vis the conversation participants…but I think this excerpt captures quite well an approach to science that can be started young, at the K-12 level, and is akin to “community-based science” projects. There are all kinds of easy, non-dangerous measurements a classroom of kids can take on their environment, from rainfall and temperature to quarterly measurements of height and foot size across grades, the data of which can be dumped nicely in an Excel file and analyzed periodically. Hypotheses can be generated, complete with discussions of sources – write ups can be posted, even in a journal style, teaching reading, writing, and peer review. The whole shebang could be made public and a group of schools around the country/planet could make some very nice and fairly benign comparisons/collaborations. In the US, this could be started by the 2nd grade, if not sooner, but usually by the 2nd because most kids are good enough readers to begin to follow lines of argumentation. Measurement is a hands-on activity, and watching my daughter’s 2nd grade class learning units for estimation followed by verification with a ruler revealed a lot of excitement about something adults find fairly mundane. From a sociological perspective, there’s pretty good empirical evidence (well, okay, for those who will buy into the value of qualitative findings alongside quantitative) that children often practice the form of something before they understand its content – this includes social behaviors from playing games and telling jokes to early experimentation in romance, where the behavioral norms of “going with” someone are more important than any profound romantic feeling. Really, science could be the same thing.

    It would require something of a sea change on the part of scientists themselves, who are often proud of the relative inaccessibility of their fields. True, there’s a ton of stuff out there that takes years to understand. But there is also a ton of easy science activity that can be done to create a generation of people who experience the world in science-friendly terms.

    Hmm. I’ve had a lot of caffeine this morning! Off to class now.

  • Cameron Neylon said:

    Mickey, it would be great if we could do this. It is obviously harder in chemistry, at least in traditional chemistry with solvents, but I think if scientists take the time to think hard about what parts of their projects could be done in different ways, and in different contexts, then we could have a lot of projects like this. My dream is that we could have a body that would coordinate and package these projects up and make them available for different groups at the appropriate levels.

  • MIckey Schafer said:

    Hi, Cameron. Most of the inexpensive (read “free”) projects my kids have done have come from teachers who use the internet to find such stuff. What has most impressed me is that at a very young age (6 and 8), my children have a sense that they can “do” science; not capitalizing on that seems a terrible waste of energy. The key, as you’ve pointed out in the post, is that whatever the science is, it has to be things that people can DO – watching the chemist make explosions is fun, but not as long lasting as creating rock candy from a saturated sugar solution. My hesitation with a “body” that packages projects up is the cost of buying the packages themselves…which I realize is probably unavoidable given some sciences. But I think there’s an opportunity here culturally to put the open science concept to use by employing the community to watch/measure/record things (such as the Audubon backyard bird count [http://www.audubon.org/gbbc/index.shtml], also called “citizen science”), organized, perhaps, within school systems, that could become as natural as learning basic math. It’s a utopian vision, to be sure, but one that somehow seems less distant than it did a few years ago. Could be the ambient community contact with the science 2.0 set has influenced my thinking some…:).

  • Cameron Neylon said:

    Mickey, definitely agree with the cost – and the opportunities. I possibly should have been a little clearer. This foundation or whatever it is could have a whole mix of funding streams: direct allocation from research funders to support specific projects or more generic engagement programmes, educational and research charities, sales to schools, and direct sales to the general public. The balance would have to be got right – but if these were real research projects then at least some of those could be funded. And a “buy one give one away” type approach might work for sales through toy shops or to well-funded schools.

    We’re currently working on a couple of ideas that should be near zero cost with the infrastructure funded externally so hopefully we can roll something interesting out to test the waters over the next 12 months or so.

  • Anna said:

    “at least in traditional chemistry with solvents” – which shouldn’t hold us back as there is lots of cool chemistry that could be done in water (and _nominally_ environmentally friendly – but then that’s extra things for discussion, right?)

  • Cameron Neylon said:

    Absolutely – and the nice thing is that by being pushed into making the reactions “safe” you are also at the same time making them green. By making them “easy” to work up you restrict yourself to reactions that should be relatively straightforward to scale up to production levels. That’s what I love about the whole thing: the way all of these “problems” actually force you to do what would need to be done anyway, just earlier along the path, if you actually want to make something practical out of your work.
