Researcher as Teenager: Parsing Danah Boyd’s It’s Complicated

I have a distinct tendency to see everything through the lens of what it means for research communities. I have just finally read Danah Boyd’s It’s Complicated, a book that focuses on how and why U.S. teenagers interact with and through social media. The book is well worth reading for the study itself, but I would argue it is more worth reading for the way it challenges many of the assumptions we make about how social interactions work online and how they are mediated by technology.

The main thrust of Boyd’s argument is that the teenagers she studied are engaged in a process of figuring out what their place is amongst various publics and communities. Alongside this she diagnoses a long-standing trend of reducing the availability of the unstructured social interactions through which teens explore and find their place.

A consistent theme is that teens go online not to escape the real world, nor because of some attraction to the technology, but because it is the place where they can interact with their communities, test boundaries and act out in spaces where they feel in control of the process. She makes the point that through these interactions teens are learning how to be public and also how to be in public.

So the interactions and the needs they surface are not new, but the fact that they occur in online spaces where those interactions are more persistent, visible, spreadable and searchable changes the way in which adults view and interact with them. The activities going on are the same as in the past: negotiating social status, sharing resources, seeking to understand which kinds of sharing grant status, pushing the boundaries, claiming precedence and seeking control of their situation.

Boyd is talking about U.S. teenagers but I was consistently struck by the parallels with the research community and its online and offline behavior. The wide prevalence of imposter syndrome amongst researchers is becoming better known – showing how strongly the navigation and understanding of your place in the research community affects even senior researchers. Prestige in the research community arises from two places: existing connections (where you came from, who you know) and the sharing of resources (primarily research papers). Negotiating status, whether offline or on, remains at the core of researcher behavior throughout careers. In a very real sense we never grow up.

People generally believe that social media tools are designed to connect people in new ways. In practice, Boyd points out, mainstream tools effectively strengthen existing connections. My view has been that “Facebooks for Science” fail because researchers have no desire to be social as researchers in the same way they do as people – but that they socialize through research objects. What Boyd’s book leads me to wonder is whether in fact the issue is more that the existing tools do little to help researchers negotiate the “networked publics” of research.

Teens are learning and navigating forms of power, prestige and control that are highly visible. They often do this through sharing objects that are easily interpretable, text and images (although see the chapter on privacy for how this can be manipulated). The research community buries those issues because we would like to think we are a transparent meritocracy.

Where systems have attempted to surface prestige or reputation in a research context through point systems, they have never really succeeded. Partly this is because those points are not fungible – they don’t apply in the “real” world (StackExchange wins in part precisely because those points did cross over rapidly into real-world prestige). Is it perhaps precisely our pretence that this sense-making and assignment of power and prestige is supposed to be hidden that makes it difficult to build social technologies for research that actually work?

An Aside: I got a PDF copy of the book from Danah Boyd’s website because a) I don’t need a paper copy and b) I didn’t want to buy the ebook from Amazon. What I’d really like to do is buy a copy from an independent bookstore and have it sent somewhere where it will be read, a public or school library perhaps. Is there an easy way to do that?

Binary decisions are a real problem in a grey-scale world

[Image: Peer Review Monster, by Gideon Burton via Flickr]

I recently made the most difficult decision I’ve had to take thus far as a journal editor. That decision was ultimately to accept the paper; that probably doesn’t sound like a difficult decision until I explain that I made it despite a referee saying I should reject the paper, with no opportunity for resubmission, not once but twice.

One of the real problems I have with traditional pre-publication peer review is the way it takes a very nuanced problem around a work which has many different parts and demands that you take a hard yes/no decision. I could point to many papers that will probably remain unpublished where the methodology or the data might have been useful but there was disagreement about the interpretation. Or where there was no argument except that perhaps this was the wrong journal (with no suggestion of what the right one might be). Recently we had a paper rejected because we didn’t try to make up some spurious story about the biological reason for an interesting physical effect. Of course, we wanted to publish in a biologically slanted journal because that’s where it might come to the attention of people with ideas about what the biological relevance was.

So the problem is two-fold. Firstly, the paper is set up in a way that requires it to go forward or to fail as a single piece, despite the fact that one part might remain useful while another part is clearly wrong. The second is that this decision is binary: there is no way to “publish with reservations about X”, and in most cases no way to even mark which parts of the paper were controversial within the review process.

Thus when faced with this paper where, in my opinion, the data reported were fundamentally sound and well expressed but the interpretation perhaps more speculative than the data warranted, I was torn. The guidelines of PLoS ONE are clear: conclusions must be supported by valid evidence. Yet the data, even if the conclusions are proven wrong, are valuable in their own right. The referee objected fundamentally to the strength of the conclusion as well as having some doubts about the way those conclusions were drawn.

So we went through a process of couching the conclusions in much more careful terms, with a greater discussion of the caveats and alternative interpretations. Did this fundamentally change the paper? Not really. Did it take a lot of time? Yes, months in the end. But in the end it felt like a choice between making the paper fit the guidelines, or blocking the publication of useful data. I hope the disagreement over the interpretation of the results and even the validity of the approach will play out in the comments for the paper or in the wider literature.

Is there a solution? Well I would argue that if we published first and then reviewed later this would solve many problems. Continual review and markup as well as modification would match what we actually do as our ideas change and the data catches up and propels us onwards. But making it actually happen? Still very hard work and a long way off.

In any case, you can always comment on the paper if you disagree with me. I just have.


A collaborative proposal on research metrics

[Image: Measuring time, by aussiegall via Flickr]

tldr: Proposed project to connect metrics builders with those who can most effectively use them to change practice. Interested? Get involved! Proposal doc is here and free to edit.

When we talk about open research practice, more efficient research communication, or a wider diversity of publication, we always come up against the same problem. What’s in it for the jobbing scientist? This is so prevalent that it has been reformulated as “Singh’s Law” (by analogy with Godwin’s law): any discussion of research practice will inevitably end when someone brings up career advancement or tenure. The question is what do we actually do about this?

The obvious answer is to make these things matter. Research funders have the most power here in that they have the power to influence behaviour through how they distribute resources. If the funder says something is important then the research community will jump to it. The problem of course is that in practice funders have to take their community with them. Radical and rapid change is not usually possible. A step in the right direction would be to provide funders and researchers with effective means of measuring and comparing themselves and their outputs, in particular means of measuring performance in previously funded activities.

There are many current policy initiatives on trying to make these kinds of judgements. There are many technical groups building and discussing different types of metrics. Recently there have also been calls to ensure that the data that underlies these metrics is made available. But there is relatively little connection between these activities. There is an opportunity to connect technical expertise and data with the needs of funders, researchers, and perhaps even the mainstream media and government.
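
To make the shape of such a connection concrete, here is a minimal Python sketch of the sort of thing a small technical team might prototype: aggregating per-output usage data into summaries a funder could compare across previously funded grants. Everything here – grant codes, outputs, and numbers – is invented for illustration, and any real system would draw on the metrics and data sources discussed above.

```python
# A minimal, illustrative aggregation of hypothetical per-output metrics.
# All grant identifiers and figures below are made up.
from statistics import median

outputs = [
    {"grant": "G-001", "type": "paper",   "citations": 12, "downloads": 340},
    {"grant": "G-001", "type": "dataset", "citations": 3,  "downloads": 1200},
    {"grant": "G-002", "type": "paper",   "citations": 45, "downloads": 210},
]

def summarise_by_grant(records):
    """Group outputs by grant and compute simple comparative indicators."""
    grouped = {}
    for rec in records:
        grouped.setdefault(rec["grant"], []).append(rec)
    return {
        grant: {
            "n_outputs": len(recs),
            "total_citations": sum(r["citations"] for r in recs),
            "median_downloads": median(r["downloads"] for r in recs),
        }
        for grant, recs in grouped.items()
    }

print(summarise_by_grant(outputs))
```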

An opportunity has arisen for some funding to support a project here. My proposal is to bring a relevant group of stakeholders together – funders, technologists, scientists, administrators, media, publishers, and aggregators – to identify needs and then to actually build some things. Essentially the idea is a BarCamp-style day-and-a-bit meeting followed by a two-day hackfest. Following on from this the project would fund some full-time effort to take the most promising ideas forward.

I’m looking for interested parties. This will be somewhat UK-centric just because of logistics and funding but the suggestion has already been made that following up with a similar North American or European project could be interesting. The proposal is available to view and edit as a GoogleDoc. Feel free to add your name, contact me directly, or suggest the names of others (probably better to me directly). I have a long list of people to contact directly as well but feel free to save me the effort.

Ed. Note: This proposal started as a question on Friendfeed where I’ve already got a lot of help and ideas. Hopefully soon I will write another post about collaborative and crowdsourced grant writing and how it has changed since the last time I tried this some years back.


It’s not information overload, nor is it filter failure: It’s a discovery deficit

[Image: Clay Shirky, via Wikipedia]

Clay Shirky’s famous soundbite – “it’s not information overload, it’s filter failure” – has helped to focus minds on the way information on the web needs to be tackled, and on a move towards managing the process of selecting and prioritising information. But in the research space I’m getting a sense that it is fuelling a focus on preventing publication, in a way that is analogous to the conventional filtering process involved in peer-reviewed publication.

Most recently this surfaced in a piece at the Chronicle of Higher Education, which drew many responses, Derek Lowe’s being one of the most carefully thought out. But this is not isolated.

@JISC_RSC_YH: How can we provide access to online resources and maintain quality of content? #rscrc10 [twitter via @branwenhide]

Me: @branwenhide @JISC_RSC_YH isn’t the point of the web that we can decouple the issues of access and quality from each other? [twitter]

There is a widely held assumption that putting more research onto the web makes it harder to find the research you are looking for. In fact the opposite is true: publishing more makes discovery easier.

The great strength of the web is that you can allow publication of anything at very low marginal cost without limiting the ability of people to find what they are interested in, at least in principle. Discovery mechanisms are good enough, while being a long way from perfect, to make it possible to mostly find what you’re looking for while avoiding what you’re not looking for.  Search acts as a remarkable filter over the whole web through making discovery possible for large classes of problem. And high quality search algorithms depend on having a lot of data.

It is very easy to say there is too much academic literature – and I do. But the solution which seems to be becoming popular is to argue for an expansion of the traditional peer review process – to prevent stuff getting onto the web in the first place. This is misguided for two important reasons. Firstly it takes the highly inefficient and expensive process of manual curation and attempts to apply it to every piece of research output created. This doesn’t work today and won’t scale as the diversity and sheer number of research outputs increases tomorrow. Secondly it doesn’t take advantage of the nature of the web. The way to do this efficiently is to publish everything at the lowest cost possible, and then enhance the discoverability of work that you think is important. We don’t need publication filters, we need enhanced discovery engines. Publishing is cheap, curation is expensive whether it is applied to filtering or to markup and search enhancement.
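
As a toy illustration of the difference between a publication filter and a discovery engine, here is a short Python sketch in which nothing is blocked from the corpus and a (deliberately crude) relevance score decides what surfaces for a given reader. The titles and the scoring function are invented; a real discovery engine would use proper search and ranking infrastructure.

```python
# Everything is "published"; discovery is a ranking problem, not a gatekeeping one.
corpus = [
    "Crystal structure of a bacterial membrane transporter",
    "Workflow for automated image analysis of zebrafish embryos",
    "Thermodynamics of protein-ligand binding by calorimetry",
]

def relevance(query, document):
    """Crude stand-in for a ranking function: count shared terms."""
    return len(set(query.lower().split()) & set(document.lower().split()))

query = "protein binding calorimetry"
for doc in sorted(corpus, key=lambda d: relevance(query, d), reverse=True):
    print(relevance(query, doc), doc)
```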

Filtering before publication worked, and was probably the most efficient place to apply the curation effort, when the major bottleneck was publication. Value was extracted from the curation process of peer review by using it to reduce the costs of layout, editing, and printing, through simply printing less. But it created new costs, and invisible opportunity costs where a key piece of information was not made available. Today the major bottleneck is discovery. Of the 500 papers a week I could read, which ones should I read, and which ones just contain a single nugget of information which is all I need? In the Research Information Network study of the costs of scholarly communication, the largest component of the publication creation and use cycle was peer review, followed by the cost of finding the articles to read, which represented some 30% of total costs. On the web, the place to put in the curation effort is in enhancing discoverability, in providing me the tools that will identify what I need to read in detail, what I just need to scrape for data, and what I need to bookmark for my methods folder.

The problem we have in scholarly publishing is an insistence on applying this print paradigm publication filtering to the web alongside an unhealthy obsession with a publication form, the paper, which is almost designed to make discovery difficult. If I want to understand the whole argument of a paper I need to read it. But if I just want one figure, one number, the details of the methodology then I don’t need to read it, but I still need to be able to find it, and to do so efficiently, and at the right time.

Currently scholarly publishers vie for the position of biggest barrier to communication. The stronger the filter the higher the notional quality. But being a pure filter play doesn’t add value because the costs of publication are now low. The value lies in presenting, enhancing, curating the material that is published. If publishers instead vied to identify, mark up, and make it easy for the right people to find the right information they would be working with the natural flow of the web. Make it easy for me to find the piece of information, feature work that is particularly interesting or important, re-interpret it so I can understand it coming from a different field, preserve it so that when a technique becomes useful in 20 years the right people can find it. The brand differentiator then becomes which articles you choose to enhance, what kind of markup you do, and how well you do it.

All of these are things that publishers already do. And they are services that authors and readers will be willing to pay for. But at the moment the whole business and marketing model is built around filtering, and selling that filter. By impressing people with how much you are throwing away. Trying to stop stuff getting onto the web is futile, inefficient, and expensive. Saving people time and money by helping them find stuff on the web is an established and successful business model both at scale, and in niche areas. Providing credible and respected quality measures is a viable business model.

We don’t need more filters or better filters in scholarly communications – we don’t need to block publication at all. Ever. What we need are tools for curation and annotation and re-integration of what is published. And a framework that enables discovery of the right thing at the right time. And the data that will help us to build these. The more data, the more research published, the better. Which is actually what Shirky was saying all along…


Contributor IDs – an attempt to aggregate and integrate

Following on from my post last month about using OpenID as a way of identifying individual researchers, Chris Rusbridge made the sensible request that when conversations go spreading themselves around the web it would be good if they could be summarised and aggregated back together. Here I am going to make an attempt to do that – but I won’t claim that this is a completely unbiased account. I will try to point to as much of the conversation as possible but if I miss things out or misrepresent something please correct me in the comments or the usual places.

The majority of the conversation around my post occurred on friendfeed, at the item here, but also see commentary around Jan Aert’s post (and friendfeed item) and Bjoern Bremb’s summary post. Other commentary included posts from Andy Powell (Eduserv), Chris Leonard (PhysMathCentral), Euan, Amanda Hill of the Names project, and Paul Walk (UKOLN). There was also a related article in Times Higher Education discussing the article (Bourne and Fink) in PLoS Comp Biol that kicked a lot of this off [Ed – Duncan Hull also pointed out there is a parallel discussion about the ethics of IDs that I haven’t kept up with – see the commentary at the PLoS Comp Biol paper for examples]. David Bradley also pointed out to me a post he wrote some time ago which touches on some of the same issues although from a different angle. Pierre set up a page on OpenWetWare to aggregate material to, and Martin Fenner has a collected set of bookmarks with the tag authorid at Connotea.

The first point which seems to be one of broad agreement is that there is a clear need for some form of unique identifier for researchers. This is not necessarily as obvious as it might seem. With many of these proposals there is significant push back from communities who don’t see any point in the effort involved. I haven’t seen any evidence of that with this discussion which leads me to believe that there is broad support for the idea from researchers, informaticians, publishers, funders, and research managers. There is also strong agreement that any system that works will have to be credible and trustworthy to researchers as well as other users, and have a solid and sustainable business model. Many technically minded people pointed out that building something was easy – getting people to sign up to use it was the hard bit.

Equally, and here I am reading between the lines somewhat, any workable system would have to be well designed and easy to use for researchers. There was much backwards and forwards about how “RDF is too hard”, “you can’t expect people to generate FOAF” and “OpenID has too many technical problems for widespread uptake”. Equally, people thinking about what the back end would have to look like to even stand a chance of providing an integrated system that would work felt that FOAF, RDF, OAuth, and OpenID would have to provide a big part of the gubbins. The message for me was that the way the user interface(s) are presented has to be got right. There are small models of aspects of this that show that easy interfaces can be built to capture sophisticated data, but getting it right at scale will be a big challenge.

Where there is less agreement is on the details, both technical and organisational, of how best to go about creating a useful set of unique identifiers. There was some to-and-fro as to whether CrossRef was the right organisation to manage such a system. Partly this turned on concern over centralised versus distributed systems and partly over issues of scope and trust. Nonetheless the majority view appeared to be that CrossRef would be the right place to start, and CrossRef do seem to have plans in this direction (from Geoffry Bilder – see this Friendfeed item).

There was also a lot of discussion around identity tokens versus authorisation. Overall it seemed that the view was that these can be productively kept separate. One of the things that appealed to me in the first instance was that OpenIDs could be used either as tokens (just a unique code that is used as an identifier) or as a login mechanism. The machinery is already in place to make that work. Nonetheless it was generally accepted, I think, that the first important step is an identifier. Login mechanisms are not necessarily required, or even wanted, at the moment.

The discussion as to whether OpenID is a good mechanism seemed in the end to go around in circles. Many people brought up technical problems they had with getting OpenIDs to work, and there are ongoing problems both with the underlying services that support and build on the standard as well as with the quality of some of the services that provide OpenIDs. This was at the core of my original proposal to build a specialist provider that had an interface and functionality that worked for researchers. As Bjoern pointed out, I should of course be applying my own five criteria for successful web services (go to the last slide) to this proposal. Key questions: 1) Can it offer something compelling? Well no, not unless someone, somewhere requires you to have this thing. 2) Can you pre-populate? Well yes, and maybe that is the key… (see later). In the end, as with the concern over other “informatics-jock” terms and approaches, the important thing is that all of the technical side is invisible to end users.

Another important discussion that, again, didn’t really come to a conclusion was who would pass out these identifiers, and when? Here there seemed to be two different perspectives: those who wanted the identifiers to be completely separated from institutional associations, at least at first order, and others who seemed concerned that access to identifiers be controlled via institutions. I definitely belong in the first camp. I would argue that you just give them to everyone who requests them. The problem then comes with duplication: what if someone accidentally (or deliberately) ends up with two or more identities? At one level I don’t see that it matters to anyone except to the person concerned (I’d certainly be trying to avoid having my publication record cut in half). But at the very least you would need to have a good interface for merging records when it was required. My personal belief is that it is more important to allow people to contribute than to protect the ground. I know others disagree and that somewhere we will need to find a middle path.

One thing that was helpful was the fact that we seemed to do a pretty good job of getting various projects in this space aggregated together (and possibly more aware of each other). Among these are ResearcherID, a commercial offering that has been running for a while now; the Names project, a collaboration of Mimas and the British Library funded by JISC; ClaimID, an OpenID provider that some people use and that provides some of the flexible “home page” functionality (see Maxine Clark’s for instance) that drove my original ideas; and PublicationsList.org, which provides an online homepage but does what ClaimID doesn’t, providing a PubMed search that makes it easier (as long as your papers are in PubMed) to populate that home page with your papers (but not easier to include datasets, blogs, or wikis – see here for my attempts to include a blog post on my page). There are probably a number of others, feel free to point out what I’ve missed!
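
As a rough sketch of the PublicationsList.org-style approach – pre-populating a publication list from a PubMed author search – the following Python snippet queries NCBI’s public E-utilities. It assumes the requests library is available; the author query is only a placeholder example and error handling is omitted.

```python
# Sketch: pull PubMed IDs and titles for an author via NCBI E-utilities.
import requests

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

def pubmed_ids_for_author(author, max_results=20):
    """Return PubMed IDs matching an author search."""
    resp = requests.get(f"{EUTILS}/esearch.fcgi", params={
        "db": "pubmed", "term": f"{author}[Author]",
        "retmax": max_results, "retmode": "json",
    })
    resp.raise_for_status()
    return resp.json()["esearchresult"]["idlist"]

def titles_for_ids(pmids):
    """Fetch article titles for a list of PubMed IDs."""
    resp = requests.get(f"{EUTILS}/esummary.fcgi", params={
        "db": "pubmed", "id": ",".join(pmids), "retmode": "json",
    })
    resp.raise_for_status()
    summaries = resp.json()["result"]
    return [summaries[pmid]["title"] for pmid in pmids]

if __name__ == "__main__":
    for title in titles_for_ids(pubmed_ids_for_author("Smith J")):
        print(title)
```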

So finally where does this leave us? With a clear need for something to be done, with a few organisations identified as the best ones to take it forward, and with a lot of discussion required about the background technicalities required. If you’re still reading this far down the page then you’re obviously someone who cares about this. So I’ll give my thoughts, feel free to disagree!

  1. We need an identity token, not an authorisation mechanism. Authorisation can get easily broken and is technically hard to implement across a wide range of legacy platforms. If it is possible to build in the option for authorisation in the future then that is great but it is not the current priority.
  2. The backend gubbins will probably be distributed RDF. There is identity information all over the place which needs to be aggregated together. This isn’t likely to change so a centralised database, to my mind, will not be able to cope. RDF is built to deal with these kinds of problems and also allows multiple potential identity tokens to be pulled together to say they represent one person (see the sketch after this list).
  3. This means that user interfaces will be crucial. The simpler the better, but the backend, with words like FOAF and RDF, needs to be effectively invisible to the user. Very simple interfaces asking “are you the person who wrote this paper” are going to win; complex signup procedures are not.
  4. Publishers and funders will have to lead. The end view of what is being discussed here is very like a personal home page for researchers. But instead of being a home page on a server it is a dynamic document pulled together from stuff all over the web. But researchers are not going to be interested for the most part in having another home page that they have to look after. Publishers in particular understand the value of unique identifiers (and will get the most value out of them in the short term), so with the most to gain and the most direct interest they are best placed to lead, probably through organisations like CrossRef that aggregate things of interest across the industry. Funders will come along as they see the benefits of monitoring research outputs; forward-looking ones will probably come along straight away, others will lag behind. The main point is that pre-populating and then letting researchers come along and prune and correct is going to be more productive than waiting for ten million researchers to sign up to a new service.
  5. The really big question is whether there is value in doing this specially for researchers. This is not a problem unique to research and one in which a variety of messy and disparate solutions are starting to arise. Maybe the best option is to sit back and wait to see what happens. I often say that in most cases generic services are a better bet than specially built ones for researchers because the community size isn’t there and there simply isn’t a sufficient need for added functionality. My feeling is that for identity there is a special need, and that if we capture the whole research community it will be big enough to support a viable service. There is a specific need for following and aggregating the work of people that I don’t think is general, and is different to the authentication issues involved in finance. So I think in this case it is worth building specialist services.
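
As a minimal sketch of point 2 above, the snippet below uses the Python rdflib library to tie two invented identity tokens together with owl:sameAs and attach an (equally invented) paper to one of them, so that claims scattered across sources can be aggregated back to a single person. All URIs are hypothetical.

```python
# Sketch: multiple identity tokens asserted to denote one person, then aggregated.
from rdflib import Graph, URIRef, Literal
from rdflib.namespace import FOAF, OWL, DC

g = Graph()

openid    = URIRef("https://openid.example.org/jane-researcher")   # hypothetical token
publisher = URIRef("https://ids.example.org/contributor/0000123")  # hypothetical token
paper     = URIRef("https://doi.example.org/10.9999/demo")         # hypothetical paper

g.add((openid, OWL.sameAs, publisher))           # both tokens denote the same person
g.add((openid, FOAF.name, Literal("Jane Researcher")))
g.add((paper, DC.creator, publisher))            # the paper credits the publisher's token

# Following owl:sameAs links lets a service pull distributed claims back together.
aliases = {openid} | set(g.objects(openid, OWL.sameAs))
for alias in aliases:
    for work in g.subjects(DC.creator, alias):
        print(f"{alias} is credited on {work}")
```

The point is only that the linking machinery can stay distributed and invisible; what the researcher would ever see is the simple question described in point 3.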

The best hope I think lies in individual publishers starting to disambiguate authors across their existing corpus. Many have already put a lot of effort into this. In turn, perhaps through CrossRef, it should be possible to agree an arbitrary identifier for each individual author. If this is exposed as a service it is then possible to start linking the information up. People can and will do this, and services will start to grow around that. Once this exists then some of the ideas around recognising referees and other efforts will start to flow.
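
To make that first step concrete, here is a deliberately naive Python sketch of clustering author strings within a publisher’s own corpus and handing each cluster an arbitrary contributor identifier that could later be exposed as a service. Real disambiguation needs far more evidence (affiliations, co-authors, subject areas), and every record below is invented.

```python
# Naive author disambiguation: cluster by normalised name, assign arbitrary IDs.
import itertools
import re

records = [
    {"doi": "10.9999/a1", "author": "J. Bloggs"},
    {"doi": "10.9999/a2", "author": "Joanna Bloggs"},
    {"doi": "10.9999/a3", "author": "A. N. Other"},
]

def name_key(name):
    """Crude normalisation: surname plus first initial."""
    parts = [p for p in re.split(r"[\s.]+", name.strip()) if p]
    return (parts[-1].lower(), parts[0][0].lower())

counter = itertools.count(1)
contributor_ids = {}
for rec in records:
    key = name_key(rec["author"])
    if key not in contributor_ids:
        contributor_ids[key] = f"contrib:{next(counter):06d}"
    rec["contributor_id"] = contributor_ids[key]

for rec in records:
    print(rec["doi"], rec["author"], "->", rec["contributor_id"])
```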

Call for submissions for a project on The Use and Relevance of Web 2.0 Tools for Researchers

The Research Information Network has put out a call for expressions of interest in running a research project on how Web 2.0 tools are changing scientific practice. The project will be funded up to £90,000. Expressions of interest are due on Monday 3 November (yes, next week) and the projects are due to start in January. You can see the call in full here, but in outline RIN is seeking evidence on whether web 2.0 tools are:

• making data easier to share, verify and re-use, or otherwise facilitating more open scientific practices;

• changing discovery techniques or enhancing the accessibility of research information;

• changing researchers’ publication and dissemination behaviour (for example, due to the ease of publishing work-in-progress and grey literature);

• changing practices around communicating research findings (for example through opportunities for iterative processes of feedback, pre-publishing, or post-publication peer review).

Now we as a community know that there are cases where all of these are occurring and have fairly extensively documented examples. The question is obviously one of the degree of penetration. Again we know this is small – I’m not exactly sure how you would quantify it.

My challenge to you is whether it would be possible to use the tools and community we already have in place to carry out the project. In the past we’ve talked a lot about aggregating project teams and distributed work but the problem has always been that people don’t have the time to spare. We would need to get some help from social scientists on the process and design of the investigation, but with £90,000 there is easily enough money to pay people properly for their time. Indeed I know there are some people out there freelancing who are in many ways working on these issues already. So my question is: Are people interested in pursuing this? And if so, what do you think your hourly rate is?

More on the science exchange – or building and capitalising a data commons

[Image: Banknotes from all around the world donated by visitors to the British Museum, London. From Wikipedia via Zemanta]

Following on from the discussion a few weeks back, kicked off by Shirley at One Big Lab and continued here, I’ve been thinking about how to actually turn what was a throwaway comment into reality:

What is being generated here is new science, and science isn’t paid for per se. The resources that generate science are supported by governments, charities, and industry but the actual production of science is not supported. The truly radical approach to this would be to turn the system on its head. Don’t fund the universities to do science, fund the journals to buy science; then the system would reward increased efficiency.

There is a problem at the core of this. For someone to pay for access to the results, there has to be a monetary benefit to them. This may be through increased efficiency of their research funding but that’s a rather vague benefit. For a serious charitable or commercial funder there has to be the potential to either make money, or at least see that the enterprise could become self-sufficient. But surely this means monetizing the data somehow? Which would require restrictive licences, which is not, in the end, what we’re about.

The other story of the week has been the, in the end very useful, kerfuffle caused by ChemSpider moving to a CC-BY-SA licence, and the confusion that has been revealed regarding data, licensing, and the public domain. John Wilbanks, whose comments on the ChemSpider licence sparked the discussion, has written two posts [1, 2] which I found illuminating and have made things much clearer for me. His point is that data naturally belongs in the public domain and that the public domain and the freedom of the data itself needs to be protected from erosion, both legal and conceptual, that could be caused by our obsession with licences. What does this mean for making an effective data commons, and the Science Exchange that could arise from it, financially viable?