Recording the fiddly bits of experimental and data analysis work

We are in the slow process of gearing up within my group at RAL to adopt the Chemtools LaBLog system and, in the process, moving properly to Open Notebook status. This has taken much longer than I had hoped, but there have been some interesting lessons along the way. Here I want to think a bit about a problem that has been troubling me for a while.

I haven’t done a very good job of recording what I’ve been doing in the times that I have been in a lab over the past couple of months. Anyone who has been following along will have seen small bursts of apparently unrelated activity where nothing much ever seems to come to a conclusion. This has been divided up mainly into a) a SANS experiment we did in early November which has now moved into a data analysis phase, b) some preliminary, and thus far fairly unconvincing, experiments attempting to use a very new laser tweezers setup at the Central Laser Facility to measure protein-DNA interactions at the single molecule level, and c) other random odds and sods that have come by. None of these have been very well recorded, for a variety of reasons.

Data analysis, particularly when it uses a variety of specialist software tools, is something I find very challenging to record. A common approach is to take some relatively raw data, run it through some software, and repeat, while fiddling with parameters to get a feel for what is going on. Eventually the analysis is run “for real” and the finalised (at least for the moment) structure/number/graph is generated. The temptation is obviously just to formally record that last step. While this might be OK as a minimum standard when only one person is involved, when more people are working through data sets it makes sense to try and keep track of exactly what has been done and which data has been partially processed in which ways. This helps us to track quickly where we are in the process, and it reduces the risk of duplicating effort.

The laser tweezers experiment involves a lot of optimising of buffer conditions, bead loading levels, instrumental parameters and whatnot. Essentially a lot of fiddling, rapid shifts from one thing to another, and not always being too sure exactly what is going on. We are still at the stage of getting a feel for things rather than stepping through a well ordered experiment. Again the recording tends to be haphazard as you try one thing and then another. We’re not even absolutely sure what we should be recording for each “run”, or indeed really what a “run” is yet.

The common theme here is “fiddling” and the difficulty of recording it efficiently, accurately, and usefully. What I would prefer to be doing is somehow capturing the important aspects of what we’re doing as we do it. What is less clear is the best way to do that. In the case of data analysis we have a good model for how to do this well. Good use of repositories, and of versioned scripts for handling data conversions in the way that Michael Barton in particular has talked about, provides an example of good practice. Unfortunately it is good practice that is almost totally alien to experimental biochemists, and it is also not easily compatible with a lot of the software we use.
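
To make that concrete, here is a minimal sketch of the versioned-script idea. Everything in it is hypothetical – the file names, the “background subtraction” step, and the log format are mine for illustration, not from any real pipeline – but the pattern is the useful bit: the script itself lives under version control, and every run appends a provenance record of what went in, what came out, and with which parameters.

    # convert.py - hypothetical data-conversion step, for illustration only.
    # Keep this file under version control; each run logs its own provenance.
    import csv
    import hashlib
    import json
    import sys
    import time

    def sha1(path):
        """Checksum a file so the log records exactly which data was used."""
        with open(path, "rb") as f:
            return hashlib.sha1(f.read()).hexdigest()

    def subtract_background(infile, outfile, background=0.05):
        """Toy analysis step: subtract a constant background from column 2.
        Assumes two columns per row: q, intensity."""
        with open(infile, newline="") as fin, open(outfile, "w", newline="") as fout:
            writer = csv.writer(fout)
            for q, intensity in csv.reader(fin):
                writer.writerow([q, float(intensity) - background])

    if __name__ == "__main__":
        infile, outfile = sys.argv[1], sys.argv[2]
        subtract_background(infile, outfile)
        # Append a one-line provenance record for this run.
        record = {
            "time": time.strftime("%Y-%m-%dT%H:%M:%S"),
            "script": "convert.py",
            "parameters": {"background": 0.05},
            "input": {"file": infile, "sha1": sha1(infile)},
            "output": {"file": outfile, "sha1": sha1(outfile)},
        }
        with open("analysis_log.json", "a") as log:
            log.write(json.dumps(record) + "\n")

Run repeatedly while fiddling, a log like this answers “which files have been processed, and how?” without any extra record keeping, and the version history of the script records how the conversion itself evolved.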

The ideal would be a workbench using a graphical representation of data analysis tools and data repositories that would automatically generate scripts and deposit these, along with versioned data files, into an appropriate repository. This would enable the “docking” of arbitrary web services, software packages and whatever, as well as connection to shared data stores. The purpose of the workbench would be to record what is done, although it might also provide some automation tools. In many ways this is what I think of when I look at workflow engines like Taverna and platforms for sharing workflows like MyExperiment.

It’s harder in the real world. Here the workbench is, well, the workbench, but the idea of recording everything along with contextual metadata is pretty similar. The challenge lies in recording enough different aspects of what is going on to capture the important stuff without generating a huge quantity of data that can never be searched effectively. It is possible to record multiple video streams, audio, and screencasts of any control computers, but it will be almost impossible to find anything in these data streams.

A challenge that emerges over and over again in laboratory recording is that you always seem not to be recording the thing that you now really need to have. Yet if you record everything you still won’t have it, because you won’t be able to find it. Video, image, and audio search will one day make a huge difference to this, but in the meantime I think we’re just going to keep muddling on.

Quick update from International Digital Curation Conference

Just a quick note from the IDCC, given I was introduced as “one of those people who are probably blogging the conference”. I spoke this morning, giving a talk on Radical Sharing – Transforming Science? A version of the slides is available at slideshare. It seemed to go reasonably well and I got some positive comments. The highlight for me today was John Wilbanks speaking this evening – John always gives a great talk (slides will also be on his slideshare at some point) and I invariably learn something. Today that was the importance of distinguishing between citation (which is a term from the scholarly community) and attribution (which is a term with specific legal meaning in copyright law). I had used the two interchangeably in my talk (no recording unfortunately); John made the point that it is important to distinguish the two practices, particularly the reasons that motivate them and the different enforcement frameworks.

Interesting talks this afternoon on costing for digital curation – not something I have spent a lot of time thinking about, but clearly something that is rather important. Also this morning, talks on CARMEN and iPLANT, projects that are delivering on infrastructure for sharing and re-using data. Tonight we are off to Edinburgh Castle for the dinner, which should be fun, and tomorrow I make an early getaway to get to more meetings.

Links as the source code of our thinking – Tim O’Reilly

I just wanted to point to a post that Tim O’Reilly wrote just before the US election a few weeks back. There was an interesting discussion about the rights and wrongs of him posting on his political views and the rights and wrongs of that being linked to from the O’Reilly Media front page. In amongst the abuse that you have come to expect in public political discussions there is some thought provoking stuff. But what I wanted to point out and hopefully revive a discussion of is a point he makes right near the bottom.

[I have conflated two levels of comments here (Tim is quoting his own comment) – see the original post for the context]

“Thanks to everyone for wading in, especially those of you who are marshalling reasoned arguments and sharing actual sources and references, showing you’ve done your homework, and helping other people to see the data that helped to shape your point of view. We need a LOT more of that in this discussion, rather than slinging unsupported allegations back and forth.

Bringing this back to tech – showing the data behind your argument is a lot like open source. It’s a way of verifying the “code” that’s inside your head. If you can’t show us your code, it’s a lot harder to trust your results!”

Links as source code for your thinking: that’s a meme that should survive the particulars of this particular debate!

In a sense Tim is advocating the wholesale adoption of the very strong attribution culture we (like to think we) have in academic research. The importance of acknowledging your sources is clear, but it also has much more value than that. By tracing back the influences that have brought someone to a specific conclusion or belief, it is possible for other people to gain a much deeper insight into how those ideas evolved. Being able to parse the dependencies between ideas, data, samples, papers, and knowledge in an automatic, machine-readable way is the promise of the semantic web, but in the meantime just helping the poor old humans to trace back and understand where someone is coming from is very helpful.

More on “theft” and the problem of identity

Following my hopefully-getting-towards-three-quarters-baked post there have been more helpful comments and discussion, both here and on Friendfeed. I wanted to pick out a specific issue that has come up in both places. On Friendfeed the discussion ran into the question of plagiarism more generally and why it is bad. Anders Norgaard made the point that plagiarism is bad regardless of whether it breaks rules or not, and a discussion of why that is so followed. I think the conclusion we came to is that plagiarism reduces value by making it more difficult to find the right person with the right expertise when you need something done. It reduces the value of the body of work in helping you find the person who can do the job that you need doing.

David Crotty, in a comment on the blog post, probes the same issues:

Do you mind if I start a blog called “Science in the open” and pretend that my name is “Cameron Neylon” and then fill that blog with dreadful, hateful nonsense? After all, your name and your blog’s name aren’t limited physical resources, right? Does ownership extend to your online identity? Isn’t using someone else’s logo a misrepresentation of identity?

Now this is important for two reasons: firstly because it probes the extreme end of my argument that “objects that can be infinitely copied should not be treated as property”, and secondly because it revolves around the issue of identity. Reliable identity lies at the core of building the trust networks that make social web tools work. Does that mean it is one area where the full weight of property-based law should be brought to bear? I think this is worth unpicking in detail.

So, let’s start with the honest answer. If this happened I would be angry and upset. I would be likely to storm around the office/house a bit and possibly rant at people and objects unfortunate enough to cross my path. But after, hopefully, calming down a bit, I would follow something like the course below.

  1. Write the person a polite note explaining that they seem to have both the same name and the same blog name, and that this is probably bad for both of us as there is the potential for confusion. Ambiguity is bad because it reduces trust in attribution. As I used these names first, I would ask them to consider changing. I would assume it was a simple coincidence, a mistake made in good faith.
  2. If they did not, I would dissociate myself publicly from their work, making a clear statement about where my work could be found. I would consider changing the name of my blog (after all, it is the feed that people follow – does anyone care that much what it is called?), but not my name.
  3. If it was clear that this was a case of deliberate misrepresentation, I would present the evidence that this was the case and request the help of the community in making that very publicly clear.

My case is that allowing the free re-use of my name and my blog name ought, on average, to add value. Indeed my experience thus far is that allowing people to use these names, to point to me and the work I have generated, has been net positive. I’ve never objected to people quoting me, using my name, reproducing blog posts, or whatever. Whether it’s “fair use”, a copyright violation, or appropriately licensed re-use is irrelevant. It’s all good because it brings more interested people to my blog and to me. One negative experience would probably not actually tip that balance. Several nasty ones might.

The key here is that the real resource is me. I am not infinitely replicable, no-one else can write my posts. The name is just a pointer. An important pointer and one which I will defend, in as much as I will try to make clear what I think and why I think it, as well as to be clear about who I am and why I say what I do. Someone who plagiarizes my work or reproduces it without attribution or someone who deliberately misrepresents what I write reduces the value of my work because they reduce the ability of people who are looking for someone with my expertise to use that work to find me.

But it is not the reproduction of work that is the problem here; it is the misrepresentation of its origin, either by an author falsely claiming it as theirs, or by someone mis-attributing someone else’s work or views to me. The problem is not the act of copying but the act of lying. The problem with lying is that it reduces trust, and the problem with reducing trust is that it reduces the value of the networks we use to find things that are useful and the people who have the expertise to make them. Identity is crucial to trust, and trust is what adds value to networks. Very few things reduce the value of web-based networks more effectively than lying about identity.

We will never build a perfect system that solves this problem. My belief though is that it will be more effective to build strong social and technical systems rather than to apply the rules of “ownership” to my name. Do I own my name? No idea. Will I defend my name and ask others to help me do that if someone attacks it? Yes. Will I use the best technical systems to try to be clear about who I am in all the places where I act? Well I could do better on this, but then a lot of us could really. I will work to build trust in my name, in my brand if you like, and if that trust is attacked I will defend it.

So where does this leave the story of Ricardo’s logo? Well, the first point was the plagiarism of the image. This breaks the link between the image and the author, which reduces its use to Ricardo. The lack of attribution means that people who think “what a cool logo” will not be able to find Ricardo to do them a cool logo of their own. But it is not the copying per se which does the damage; it is the plagiarism, the lack of attribution. Arguably, as the community leaps to Ricardo’s defence (and points out what a cool logo it is), he actually benefits from a raised profile across a wider community. I had seen a few examples of his work before but hadn’t realised how many he had done and how good they are. Ricardo pointed out in the original Friendfeed thread that the reason the image was copyrighted was that he was making a living at the time from design. It is not inconceivable that he may be better placed to do that now than he was before the logo was misappropriated. That is for Ricardo to decide though, not me.

Does the use of the logo by a company selling hokum misrepresent Ricardo? Well, given that they didn’t attribute it to him, not directly. But let’s imagine that the image was CC-BY and that the company did attribute it. Arguably Ricardo would not want to be associated with that, and that would be fair enough, but there wouldn’t be anything he could do about it from a legal perspective. Because the image is actually copyright, all rights reserved, he can prevent these kinds of re-use. Or at least he can in principle. He retains control in a way that CC-BY licenses do not allow. My argument is that to legally defend this position would take much more money and energy than clearly and publicly distancing yourself from the re-use of the work, and probably wouldn’t be much more effective. Furthermore, my argument is that the good that comes from allowing re-use outweighs the bad. The re-use of your work actually gives you a platform from which to distance yourself from that re-use if you so choose. Once that is made clear, it is just more good publicity for you.

More importantly, if you believe, as I do, in the value of allowing re-use, then you cannot reasonably pick and choose which re-uses are appropriate. Consistency requires that you allow re-use both that you agree with and that you don’t. I may not approve of a re-use, and it is perfectly reasonable to say so, but that gives me no right to object. To mis-quote Hall channelling Voltaire: “I disagree with the way you have re-used my work, but I will defend your right to do so and the value you add by doing it” – and no, I will not defend it to the death. I don’t take it that seriously…

My Bad…or how far should the open mindset go?

So while on the train yesterday in somewhat pre-caffeinated state I stuck my foot in it somewhat. Several others have written (Nils Reinton, Bill Hooker, Jon Eisen, Hsien-Hsien Lei, Shirley Wu) on the unattributed use of an image that was put together by Ricardo Vidal for the DNA Network of blogs. The company that did this are selling hokum. No question of that. Now the logo is in fact clearly marked as copyright on Flickr but even if it were marked as CC-BY then the company would be in violation of the license for not attributing. But, despite the fact that it is clearly technically wrong, I felt that the outrage being expressed was inconsistent with the general attitude that materials should be shared, re-useable, and available for re-purposing.

So in the related Friendfeed thread I romped in, offended several people (particularly by using the word hypocritical, which I should not have done – like I said, pre-caffeine) and had to back up and re-think what it was I was trying to say. Actually this is a good thing about Friendfeed: the rapid-fire discussion can encourage semi-baked comments and ideas, which are then leapt on and need to be more carefully thought through and refined. In science, criticism is always valuable; agreement is often a waste of time.

So at core my concern is largely about the apparent message that can be sent by a group of “open” activists objecting to the violation of the copyright of a member of their community. As I wrote further down in the comments:

“…There is a danger that this kind of thing comes across as ‘everything should be pd [public domain] but when my mate copyrights something and you violate it I will jump down your throat’. The subtext being it is ok to violate copyright for ‘good’ reasons but not for ‘bad’ reasons…”

It is crucially important to me that when you argue that an area of law is poorly constructed, ineffective, or has unexpected consequences, you scrupulously operate within that law yourself, while not criticising those who cut corners. At the same time, if I argue that the risks of having people ‘steal’ my work are outweighed by the benefits of sharing, then I should roll with the punches when bad stuff does happen. There is the specific issue that what was done is a breach of copyright, and then the general issue that, if people were more able to do this kind of thing, it would be good. The fact that it was used for a nasty service preying on people’s fears is at one level neither here nor there (or rather, the moral rights issue is, I think, a separate and rather complicated one that will not fit in this particular margin: does the use of the logo misrepresent Ricardo? Does it misrepresent the DNA Network – who, remember, don’t own it?).

More broadly, I think there is a mindset that goes with the way the web works, and the way that sharing works, that means we need to get away from the idea of the object or the work as property. The value of objects lies only in their scarcity, or their lack of presence. With the advent of the world’s greatest copying machine, no digital object need be scarce. It is not the object that has value, because it can be infinitely copied for near zero cost; it is the skill and expertise in putting the object together that has value. The argument of the “commonists” is that you will spend more on the licences and secrecy needed to protect objects than you could make by finding the people who need your skills to make just the thing that they need, right now. If this is true it presumably holds for data, for scientific papers, for photos, for video, for software, for books, and for logos.

The argument that I try to promote (and many others do much better) is that we need to get away from the concepts and language of ownership of these digital objects. Even thinking in terms of something being “mine” is counterproductive and actually reduces value. It may be that there are limits to where these arguments hold, and if there are, they probably have something to do with the intrinsic timeframe of the production cycle for a class of objects, but that is a thought for another time. What worried me was that people seemed to be using language driven by thinking about property and scarcity: “theft”, “stealing”. In my view we should be talking about “service quality”, “delivery time”, and “availability”. This is where value lies on the net, not in control, and not in ownership of objects.

None of which is to say that people should not be completely free to license work which they produce in any way that they choose, and I will defend their right to do this. But at the same time I will work to persuade these same people that some types of license are counterproductive, particularly those that attempt to control content. If you believe that science is better for the things that make it up being shared and re-used, and that the value of a person’s work is increased by others re-using it, why shouldn’t that apply to other types of work? The key thing is a consistent and clear message.

I try to be consistent, and I am by no means always successful, but it’s a work in progress. Anyone is free to re-use and re-purpose anything I generate in whatever way they choose. If I disagree with the use I will say so. If it is unattributed I might comment, and I might name names, but I won’t call in the lawyers. If I am inconsistent I invite, and indeed expect, people to say so. I would hope that criticism would come from the friendly faces before it comes from people with another agenda. That, at the end of the day, is the main benefit of being open. It’s all just error checking in the end.

It’s a little embarrassing…

…but being straightforward is always the best approach. Since we published our paper in PLoS ONE a few months back I haven’t been as happy as I was about the activity of our Sortase. What this means is that we are now using a higher concentration of the enzyme to do our ligation reactions. They seem to be working well and with high yields, but we need to put in more enzyme. If you don’t understand that don’t worry – just imagine you posted a carefully thought out recipe and then discovered you couldn’t get that same taste again unless you added ten times as much saffron.

None of this prevents the method being useful, and it doesn’t change the fundamental point of our paper, but if people are following our methods, particularly if they only go to the paper and don’t get in contact, they may run into trouble. Traditionally this would be a problem, and would probably lead to our results being regarded as unreliable. However, in our case we can do a simple fix. Because the paper is in PLoS ONE, which has some good commenting features, I can add a note to the paper itself, right where we give the concentration of enzyme (scroll down to note 3 in the results). I can also add a note to direct people to where we have put more of our methodology online, at OpenWetWare. As we get more of this work into our online lab notebooks we will also be able to point directly back to example experiments to show how the reaction rate varies, and hopefully in the longer term sort it out. All easily done on the web, but impossible on paper, and in an awful lot (but not all!) of the other journals around.

Or we could just let people find out for themselves…

Note to the PLoS team: even better would be if I could have a link that went to a page where the comment was displayed in the context of the paper (i.e. what you get when you click on the marker when reading the paper) :-)

Connecting the dots – the well posed question and code as a liability

Just a brief thought prompted by two, partly related, things streaming past my nose. Firstly, Michael Nielsen discussed the views of Aristotle and Sunstein on collective intelligence. The thing that caught my attention was the idea that deliberation can make group functioning worse, leading to a collective decision that is muddled rather than actually identifying the best answer presented by members of the community. The exception to this is well posed questions, where deliberation can help. In science we are familiar with the idea that getting the question right (correct design of experiment, well organised theory) can be more important than the answer.

The second item was a blog post entitled “Data is good, code is a liability” from Greg Linden that was shared by Deepak Singh. Greg discussed a talk given by Peter Norvig which focusses on the idea that it is better to get a good sized dataset and use very sparing code to get at an answer rather than attempt to get at the answer de novo via complex code. Quoting from the post:

In one of several examples, Peter put up a slide showing an excerpt for a rule-based spelling corrector. The snippet of code, that was just part of a much larger program, contained a nearly impossible to understand let alone verify set of case and if statements that represented rules for spelling correction in English. He then put up a slide containing a few line Python program for a statistical spelling correction program that, given a large data file of documents, learns the likelihood of seeing words and corrects misspellings to their most likely alternative. This version, he said, not only has the benefit of being simple, but also easily can be used in different languages.
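
The statistical version Greg describes is easy to reconstruct. The sketch below is in the spirit of the example rather than Norvig’s exact code – the corpus file name is an assumption, and the published version also considers candidates two edits away – but it shows how little code is needed once the data is doing the work.

    # A condensed statistical spelling corrector: learn word frequencies
    # from a large text corpus, then pick the most likely known candidate.
    import re
    from collections import Counter

    def words(text):
        return re.findall(r"[a-z]+", text.lower())

    # "big.txt" stands in for any large collection of documents.
    COUNTS = Counter(words(open("big.txt").read()))

    def edits1(word):
        """All strings one insertion, deletion, swap or replacement away."""
        letters = "abcdefghijklmnopqrstuvwxyz"
        splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
        deletes = [a + b[1:] for a, b in splits if b]
        swaps = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
        replaces = [a + c + b[1:] for a, b in splits if b for c in letters]
        inserts = [a + c + b for a, b in splits for c in letters]
        return set(deletes + swaps + replaces + inserts)

    def correct(word):
        """Return the most frequent known candidate for word."""
        candidates = (({word} if word in COUNTS else set())
                      or {e for e in edits1(word) if e in COUNTS}
                      or {word})
        return max(candidates, key=lambda w: COUNTS[w])

The rules of English spelling never appear anywhere; they are implicit in the corpus, which is exactly the point about the data carrying the intelligence.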

What struck me was the connection between being able to write a short, readable snippet of code, and the “well posed question”. The dataset provides the collective intelligence. So is it possible to propose the following?

“A well posed question is one which, given an appropriate dataset, can be answered by easily prepared and comprehensible code”

This could also possibly be turned on its head as “a good programming environment is one in which well posed questions can be readily converted to programs”. But it also raises an important point about how the structure of datasets relates to the questions you want to ask. The challenge in recording data is to structure it in such a way that the widest possible set of questions can be asked of that data. Data models all pre-suppose the kind of questions that will be asked. And any sufficiently general data model will be inefficient for most specific types of query.

Rajarshi Guha and Pierre Lindenbaum have been busy preparing different datastores for the solubility data being generated as part of the Open Notebook Science Challenge announced by Jean-Claude Bradley (more on this later). Rajarshi’s form based input has an SQL backend while Pierre has been working to extract the information as RDF. The point is not that one approach is better than the other, but that we need both, and possibly many more formats – and ideally we need to interconvert between them on the fly. A well posed question can easily founder on an inappropriately structured dataset (this is actually just a rephrasing of the Saunders Principle). It will be by enabling easy conversion between different formats that we might approach a situation where the aphorism I have suggested could become true.
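
As a toy illustration of what interconverting on the fly might look like, here is a minimal sketch that walks rows out of a relational store and emits RDF triples. The table schema and the vocabulary URIs are entirely hypothetical – they are not the actual structures of Rajarshi’s or Pierre’s datastores – but the shape of the problem is the real one: the same measurements, re-expressed for a different style of query.

    # Hypothetical row-to-triple conversion, for illustration only.
    import sqlite3

    ONS = "http://example.org/ons/"  # assumed vocabulary namespace

    def rows_to_ntriples(db_path):
        """Yield N-Triples lines for each solubility measurement row."""
        con = sqlite3.connect(db_path)
        rows = con.execute(
            "SELECT id, solute, solvent, solubility FROM measurements")
        for mid, solute, solvent, value in rows:
            subject = f"<{ONS}measurement/{mid}>"
            yield f'{subject} <{ONS}solute> "{solute}" .'
            yield f'{subject} <{ONS}solvent> "{solvent}" .'
            yield f'{subject} <{ONS}solubility> "{value}" .'
        con.close()

    if __name__ == "__main__":
        for triple in rows_to_ntriples("solubility.db"):
            print(triple)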

What Russell Brand and Jonathan Ross can teach us about the value of community norms

For anyone in the UK who lives under a stone, or those people elsewhere in the world who don’t follow British news, this week there has been at least some news beyond the ongoing economic crisis and a U.S. election. Two media ‘personalities’ have been excoriated for leaving what can only be described as crass and offensive messages on an elderly actor’s answer phone, while on air. What made the affair worse was that the radio programme was in fact recorded and someone, somewhere, made a decision to broadcast it in full. Even worse was the fact that the broadcaster was that bastion of British values, the BBC.

If you want to get more of the details of what exactly happened then do a search on their names, but what I wanted to focus on here was some of the public and institutional reactions and their relation to the presumed need within the science community for ‘rules’, ‘licences’, and ‘copyright’ over works and data. Consistently we try to explain why this is not a good approach and why developing strong community norms is better [1, 2]. I think this affair gives an example of why.

Much of the media and public outcry has been of the type ‘there must be some law, or if not some BBC rule that must have been broken, bang them up!’ There is a sense that there can only be recourse if someone has broken a rule. This is quite similar to the sense amongst many researchers, that they will only be able to ‘protect’ the results they make public by making them available under an explicit licence. That the only way they can have any recourse against someone ‘misusing’ ‘their’ results is if they are able to show that they have broken the terms of a licence.

The problem with this, as we know, is two-fold. First, if someone does break the terms of the licence, then frankly your chance of actually doing anything about it is pretty minimal. Secondly, and more importantly from the perspective of those of us interested in re-use and re-purposing, we know that pretty much any licensing system will create incompatibilities that prevent combining datasets, or using them in new ways, even when that wasn’t the intention of the original licensor.

There is an interesting parallel here with the Brand/Ross affair. It is entirely possible that no laws, or even BBC rules, have been broken. Does this mean they get off scot-free? No: Brand has resigned, and Ross has been suspended, with his popular Friday night TV show apparently not to be recorded this week. The most interesting thing about the whole affair is that the central failure at the BBC was an editorial one. Some, so far unnamed, senior editor signed off and allowed the programme to be broadcast. What should have happened is that this editor blocked the programme or removed the offending passages. Not because a rule was broken, but because it was not appropriate for the BBC’s editorial standards. Because it violated the community norms of what is acceptable for the BBC to broadcast. Whether or not they broke any rules, what was done was crass and offensive. Whether or not someone is technically in violation of a data re-use licence, failing to provide adequate attribution to the generators of that dataset is equally crass and unacceptable behaviour.

What the BBC discovered was that when it doesn’t live up to the standards that the wider community expects of it, it receives withering censure. Indeed, much of the most serious criticism came from some of its own programmes. It was the voice of the wider community (as mediated through the mass media, admittedly) which has led to the resignation and suspension. If it were just a question of ‘rules’ it is entirely possible that nothing could have been done. And if rules were put in place that would have prevented it, then the unintended consequence would almost certainly have been to block programmes that had valid dramatic or narrative reasons for carrying such a passage. Again, community censure was much more powerful than any tribunal arbitrating some set of rules.

Yes this is nuanced, yes it is difficult to get right, and yes there is the potential for mob rule. That is why there is a team of senior professional editors at the BBC charged with policing and protecting the ‘community norms’ of what is acceptable for the BBC brand. That is why the damage done to the BBC’s brand will be severe. Standards, where it is explicit that the spirit is applied rather than the letter, where there are grey areas, can be much more effective than legalistic rules. When someone or some group clearly steps outside of the bounds then widespread censure is appropriate. It is then for individuals and organisations to decide how to apply that censure. And in turn to expect to be held to the same standards.

The cheats will always break the rules. If you use legalistic rules, then you invite legalistic approaches to getting around them. Those that try to apply the rules properly will then be hamstrung in their attempts to do anything useful while staying within the letter of the law. Community norms and standards of behaviour, appropriate citation, respect for people’s work and views, can be much more effective.

  1. Wilbanks, John. The Control Fallacy: Why OA Out-Innovates the Alternative. Available from Nature Precedings <http://hdl.handle.net/10101/npre.2008.1808.1> (2008)
  2. Wilbanks, John. Chemspider: Good intentions and the fog of licensing. http://network.nature.com/people/wilbanks/blog/2008/05/10/chemspider-good-intentions-and-the-fog-of-licensing (2008)

Call for submissions for a project on The Use and Relevance of Web 2.0 Tools for Researchers

The Research Information Network has put out a call for expressions of interest in running a research project on how Web 2.0 tools are changing scientific practice. The project will be funded up to £90,000. Expressions of interest are due on Monday 3 November (yes, next week) and the projects are due to start in January. You can see the call in full here, but in outline RIN is seeking evidence on whether web 2.0 tools are:

• making data easier to share, verify and re-use, or otherwise facilitating more open scientific practices;

• changing discovery techniques or enhancing the accessibility of research information;

• changing researchers’ publication and dissemination behaviour (for example, due to the ease of publishing work-in-progress and grey literature);

• changing practices around communicating research findings (for example through opportunities for iterative processes of feedback, pre-publishing, or post-publication peer review).

Now we as a community know that there are cases where all of these are occurring and have fairly extensively documented examples. The question is obviously one of the degree of penetration. Again we know this is small – I’m not exactly sure how you would quantify it.

My challenge to you is whether it would be possible to use the tools and community we already have in place to carry out the project? In the past we’ve talked a lot about aggregating project teams and distributed work but the problem has always been that people don’t have the time to spare. We would need to get some help from social scientists on process and design of the investigation but with £90,000 there is easily enough money to pay people properly for their time. Indeed I know there are some people out there freelancing already who are in many ways already working on these issues anyway. So my question is: Are people interested in pursuing this? And if so, what do you think your hourly rate is?