Researcher as Teenager: Parsing Danah Boyd’s It’s Complicated

I have a distinct tendency to see everything through the lens of what it means for research communities. I have just finally read Danah Boyd’s It’s Complicated, a book that focuses on how and why U.S. teenagers interact with and through social media. The book is well worth reading for the study itself, but I would argue it is more worth reading for the way it challenges many of the assumptions we make about how social interactions work online and how they are mediated by technology.

The main thrust of Boyd’s argument is that the teenagers she studied are engaged in a process of figuring out their place amongst various publics and communities. Alongside this she diagnoses a long-standing trend of reducing the availability of the unstructured social interactions through which teens explore and find that place.

A consistent theme is that teens go online not to escape the real world, or because of some attraction to the technology, but because it is the place where they can interact with their communities, test boundaries, and act out in spaces where they feel in control of the process. She makes the point that through these interactions teens are learning how to be public and also how to be in public.

So the interactions and the needs they surface are not new, but the fact that they occur in online spaces where those interactions are more persistent, visible, spreadable, and searchable changes the way in which adults view and interact with them. The activities going on are the same as in the past: negotiating social status, sharing resources, working out what kinds of sharing grant status, pushing the boundaries, claiming precedence, and seeking control of their situation.

Boyd is talking about U.S. teenagers, but I was consistently struck by the parallels with the research community and its online and offline behavior. The wide prevalence of imposter syndrome amongst researchers is becoming better known – showing how strongly the navigation and understanding of your place in the research community affects even senior researchers. Prestige in the research community arises from two places: existing connections (where you came from, who you know) and the sharing of resources (primarily research papers). Negotiating status, whether offline or on, remains at the core of researcher behavior throughout careers. In a very real sense we never grow up.

People generally believe that social media tools are designed to connect people in new ways. In practice, Boyd points out, mainstream tools effectively strengthen existing connections. My view has been that “Facebooks for Science” fail because researchers have no desire to be social as researchers in the same way they do as people – rather, they socialize through research objects. What Boyd’s book leads me to wonder is whether in fact the issue is more that the existing tools do little to help researchers negotiate the “networked publics” of research.

Teens are learning and navigating forms of power, prestige, and control that are highly visible. They often do this through sharing objects that are easily interpretable, text and images (although see the chapter on privacy for how this can be manipulated). The research community buries those issues because we would like to think we are a transparent meritocracy.

Where systems have attempted to surface prestige or reputation in a research context through point systems they have never really succeeded. Partly this is because those points are not fungible – they don’t apply in the “real” world (StackExchange wins in part precisely because those points did cross over rapidly into real world prestige). Is it perhaps precisely our pretence that this sense-making and assignment of power and prestige is supposed to be hidden that makes it difficult to build social technologies for research that actually work?

An Aside: I got a PDF copy of the book from Danah Boyd’s website because a) I don’t need a paper copy and b) I didn’t want to buy the ebook from Amazon. What I’d really like to do is buy a copy from an independent bookstore and have it sent somewhere where it will be read, a public or school library perhaps. Is there an easy way to do that?

Github for science? Shouldn’t we perhaps build TCP/IP first?

Mind map of TCP/IP
Image via Wikipedia

It’s one of those throwaway lines, “Before we can talk about a github for science we really need to sort out a TCP/IP for science”, that’s geeky, sharp, a bit needling, and goes down a treat on Twitter. But there is a serious point behind it. And it’s not intended to be dismissive of the ideas that are swirling around about scholarly communication at the moment either. So it seems worth exploring in a bit more detail.

The line is stolen almost wholesale from John Wilbanks, who used it (I think) in the talk he gave at a Science Commons meetup in Redmond a few years back. At the time we were awash in “Facebooks for Science”, so that was the target, but the sentiment holds. As was once the case with Facebook, and is now for Github, or Wikipedia, or StackOverflow, the possibilities opened up by these new services and technologies to support a much more efficient and effective research process look amazing. And they are. But you’ve got to be a little careful about taking the analogy too far.

If you look at what these services provide, particularly those focused on coding, they deliver commentary and documentation, nearly always in the form of text about code – which is itself basically text. The web is very good at transferring text, and code, and data. The stack that delivers this is built on a set of standards, with each layer building on the layer beneath it. StackOverflow and Github are built on a set of services that in turn sit on top of the web standard of HTTP, which in turn is built on network standards like TCP/IP that control the actual transfer of bits and bytes.

The fundamental stuff of these coding sites and Wikipedia is text, and text is really well supported by the stack of web technologies. Open Source approaches to software development didn’t just develop because of the web, they developed the web, so it’s not surprising that they fit well together. They grew up together and nurtured each other. But the bottom line is that the stack is optimized to transfer the grains of material, text and code, that make up the core of these services.

When we look at research and dig down to the granular level, it isn’t just made up of text. Sure, most research could be represented as text, but we don’t have the standardized forms to do this. We don’t have standard granules of research that we can transfer from place to place, because it’s complicated to transfer the stuff of research. I picked on TCP/IP specifically because it is the transfer protocol that supports moving bits and bytes from one place to another. What we need are protocols that support moving the substance of a piece of my research from one place to another.

Work on Research Objects [see also this paper], intended to be self-contained but usable pieces of research, is a step in this direction, as is the developing set of workflow tools that will ultimately allow us to describe and share the process by which we transform at least some parts of the research process into others. Laboratory recording systems will help us to capture and workflow-ify records of the physical parts of the research process. But until we can agree how to transfer these in a standardized fashion, I think it is premature to talk about Githubs for research.

Now there is a flip side to this: where services do already support the transfer of pieces of the research process, we absolutely should be experimenting with them. But in most cases the type-case itself will do the job. Github is great for sharing research code, and some people are doing terrific things with data there as well. And if it does the job for those kinds of things, why do we need a separate one for researchers? The scale that the consumer web brings, and the exposure to a much bigger community, is a powerful counter-argument to building things ‘just for researchers’. To justify a service focused on a small community you need very strong engagement or very specific needs. By the time a mainstream service has mindshare and researchers are using it, your chances of pulling them away to a new service just for them are very small.

So yes, we should be inspired by the possibilities that these new services open up, and we should absolutely build and experiment. But while we are at it, can we also focus on the lower levels of the stack? They aren’t as sexy and they probably won’t make anyone rich, but we’ve got to get serious about the underlying mechanisms that will transfer our research in comprehensible packages from one place to another.

We have to think carefully about capturing the context of research and presenting that to the next user. Github works in large part because the people using it know how to use code, can recognize specific languages, and know how to drive it. It’s actually pretty poor for the user who just wants to get something done – we’ve had to build up another set of services at different levels, the Python Package Index, tools for making and distributing executables, that help provide the context required for different types of user. This is going to be much, much harder for all the different uses we might want to put research to.

But if we can get this right – if we can standardize transfer protocols and build the context of the research into those ‘packets’ so that people can use it – then what we have seen on the wider web will happen naturally. As we build up the stack, the services that seem so hard to build at the moment will become as easy as throwing up a blog, downloading a rubygem, or firing up a machine instance is today. If we can achieve that then we’ll have much more than a github for research, we’ll have a whole web for research.

There’s nothing new here that wasn’t written some time ago by John Wilbanks and others but it seemed worth repeating. In particular I recommend these posts [1, 2] from John.

A little bit of federated Open Notebook Science

Girl Reading a Letter at an Open Window
Image via Wikipedia

Jean-Claude Bradley is the master when it comes to organising collaborations around diverse sets of online tools. The UsefulChem and Open Notebook Science Challenge projects both revolve around the use of wikis, blogs, GoogleDocs, video, ChemSpider, and whatever tools are appropriate for the job at hand. This is something that has grown up over time but is at least partially formally organised. At some level the tools that get used are the ones Jean-Claude decides will be used, and it is in part his uncompromising attitude to how the project works (if you want to be involved you interact on the project’s terms) that makes this work effectively.

At the other end of the spectrum is the small scale, perhaps random collaboration that springs up online, generates some data and continues (or not) towards something a little more organised. By definition such “projectlets” will be distributed across multiple services, perhaps uncoordinated, and certainly opportunistic. Just such a project has popped up over the past week or so and I wanted to document it here.

I have for some time been very interested in the potential of visualising my online lab notebook as a graph. The way I organise the notebook means that, at least in a sense, it automatically generates linked data, and for me this is an important part of its potential power as an approach. I often use a very old graph visualisation in talks I give about the notebook as a way of trying to indicate that potential, which I wrote about previously, but we’ve not really taken it any further than that.

A week or so ago, Tony Hirst (@psychemedia) left a comment on a blog post which sparked a conversation about feeds and their use for generating useful information. I pointed Tony at the feeds from my lab notebook but didn’t take it any further than that. Following this he posted a series of graph visualisations of the connections between people tweeting at a set of conferences, and then the penny dropped for me… sparking this conversation on Twitter.

@psychemedia You asked about data to visualise. I should have thought about our lab notebook internal links! What formats are useful? [link]

@CameronNeylon if the links are easily scrapeable, it’s easy enough to plot the graph eg http://blog.ouseful.info/2010/08/30/the-structure-of-ouseful-info/ [link]

@psychemedia Wouldn’t be too hard to scrape (http://biolab.isis.rl.ac.uk/camerons_labblog) but could possibly get as rdf or xml if it helps? [link]

@CameronNeylon structured format would be helpful… [link]

At this point the only part of the whole process that isn’t publicly available takes place as I send an email to find out how to get an XML download of my blog and then report back via Twitter.

@psychemedia Ok. XML dump at http://biolab.isis.rl.ac.uk/camerons_labblog/index.xml but I will try to hack some Python together to pull the right links out [link]

Tony suggests I pull out the date and I respond that I will try to get the relevant information into some sort of JSON format, and that I’ll try to do that over the weekend. Friday afternoons being what they are and Python being what it is, I actually manage to do this much quicker than I expected, and so I tweet that I’ve made the formatted data, raw data, and script publicly available via DropBox. Of course this is only possible because Tony tweeted the link above to his own blog describing how to pull out and format data for Gephi, and it was easy for me to adapt his code to my own needs – an open source win if there ever was one.

Despite the fact that Tony took time out to put the kettle on and have dinner, and I went to a rehearsal, by the time I went to bed on Friday night Tony had improved the script and made it available (with revisions) via a Gist, identified some problems with the data, and posted an initial visualisation. On Saturday morning I transfer Tony’s alterations into my own code, set up a local Git repository, push to a new Github repository, and run the script over the XML dump as is (results pushed to Github). I then “fix” the raw data by manually removing the result of a SQL injection attack – note that because I commit and push to the remote repository I get data versioning for free, so this “fixing” is transparent and recorded. Then I re-run the script, pushing again to Github. I’ve just now updated the script and committed once more following further suggestions from Tony.
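For the curious, here is a minimal sketch of what that link-extraction step can look like. This is a reconstruction rather than the actual script: the XML element names (“item”, “link”, “content”) and the dump filename are assumptions about the blog export format, and the output is a plain edge list in the CSV form that Gephi imports.

    import csv
    import re
    import xml.etree.ElementTree as ET

    NOTEBOOK_HOST = "biolab.isis.rl.ac.uk"  # internal links point back here

    def extract_edges(xml_path):
        """Yield (source post, linked post) pairs for notebook-internal links."""
        tree = ET.parse(xml_path)
        for item in tree.iter("item"):              # assumed element name
            source = item.findtext("link", "")
            content = item.findtext("content", "")  # assumed element name
            for url in re.findall(r'href="([^"]+)"', content):
                if NOTEBOOK_HOST in url:            # keep only internal links
                    yield source, url

    with open("notebook_edges.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["Source", "Target"])       # Gephi's edge list headers
        writer.writerows(extract_edges("camerons_labblog.xml"))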

So over a couple of days we used Twitter for communication; DropBox, GitHub, Gists, and Flickr for sharing data and code; and the whole process was carried out publicly. I wouldn’t have even thought to ask Tony about this if he hadn’t been publicly posting his visualisations (indeed I remember, but can’t find, an ironic tweet from Tony a few weeks back about how it would clearly be much better to publish in a journal in 18 months’ time, when no-one could even remember what the conference he was analysing was about…).

So another win for open approaches. Again, something small, something relatively simple, but something that came together because people were easily connected in a public space and were routinely sharing research outputs – a habit that by default spread into the way we conducted the project. It never occurred to me at the time; I was just reaching for the easiest tool at each stage. But at every stage, every aspect of this was carried out in the open. It was just the easiest and most effective way to do it.


Why the web of data needs to be social


If you’ve been around either myself or Deepak Singh you will almost certainly have heard the Jeff Jonas/Jon Udell soundbite: ‘Data finds data. Then people find people’. Jonas is referring to data management frameworks and knowledge discovery and Udell is referring to the power of integrated data to bring people together.

At some level Jonas’ vision (see his chapter[pdf] in Beautiful Data) is what the semantic web ought to enable, the automated discovery of data or objects based on common patterns or characteristics. Thus far in practical terms we have signally failed to make this a reality, particularly for research data and objects.

Udell’s angle (or rather, my interpretation of his overall stance) is more linked to the social web – the discovery of common contexts through shared data frameworks. These contexts might be social groups, as in conventional social networks, a particular interest or passion, or – in the case of Jon’s championing of the iCalendar standard – a date and place, as demonstrated by the elmcity project supporting calendar curation and aggregation. Shared context enables the making of new connections, the creation of new links. But still mainly links between people.

It’s not the scientists who are social; it’s the data – Neil Saunders

The naïve analysis of the success of consumer social networks and the weaknesses of science communication has led to efforts that almost precisely invert the Jonas/Udell concept. In the case of most of these “Facebooks for Scientists” the idea is that people find people, and then they connect with data through those people.

My belief is that it is this approach that has led to the almost complete failure of these networks to gain traction. Services that place the research object at the centre – the reference management and bookmarking services and, to some extent, Twitter and Friendfeed – appear to gain much more real scientific use because they mediate the interactions that researchers are interested in: those between themselves and research objects. Friendfeed in particular seems to support this discovery pattern. Objects of interest are brought into your stream, which then leads to discovery of the person behind them. I often use Citeulike in this mode. I find a paper of interest, identify the tags other people have used for it and the papers that share those tags. If these seem promising, I then might look at the library of the person, but I get to that person through the shared context of the research object, the paper, and the tags around that object.
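To make that discovery pattern concrete, here is a toy sketch of object-centric discovery. The data structures are invented stand-ins for Citeulike’s tags and user libraries, not its actual API: you start from a paper, fan out through shared tags, and only reach the person at the end.

    from collections import Counter

    # Invented stand-ins for Citeulike data: tags per paper, papers per user.
    paper_tags = {
        "paperA": {"metagenomics", "assembly"},
        "paperB": {"metagenomics", "binning"},
        "paperC": {"crystallography"},
    }
    user_libraries = {
        "alice": {"paperA", "paperB"},
        "bob": {"paperB", "paperC"},
    }

    def related_papers(paper):
        """Papers sharing at least one tag with the starting paper."""
        tags = paper_tags[paper]
        return {p for p, t in paper_tags.items() if p != paper and t & tags}

    def people_via_objects(paper):
        """Rank users by how much of the shared-tag neighbourhood they hold."""
        neighbourhood = related_papers(paper) | {paper}
        counts = Counter()
        for user, library in user_libraries.items():
            counts[user] = len(library & neighbourhood)
        return counts.most_common()

    print(people_via_objects("paperA"))  # alice ranks first, found via the paper and its tags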

Data, data everywhere, but not a lot of links – Simon Coles

A common complaint made of research data is that people don’t make it available. This is part of the problem, but increasingly it is a smaller part. It is easy enough to put data up that many researchers are doing so: in the supplementary data of journal articles, on personal websites, or on community or consumer sites. From a linked data perspective we ought to be having a field day with this, even if it represents only a small proportion of the total. However, little of this data is easily discoverable and most of it is certainly not linked in any meaningful way.

A fundamental problem that I feel I’ve been banging on about for years now is the dearth of well-built tools for creating these links. Finally these tools are starting to appear, with Freebase Gridworks being an early example. There is a good chance that it will become easier over time for people to create links as part of the process of making their own record. But the fundamental problems we always face – that this is hard work, and often unrewarded work – are limiting progress.

Data friends data…then knowledge becomes discoverable

Human interaction is unlikely to work at scale. We are going to need automated systems to wire the web of data together. The human process simply cannot keep up with the ongoing annotation and connection of data at the volumes being generated today. And we can’t afford not to, if we want to optimize the opportunities for research to deliver useful outcomes.

When we think about social networks we always place people at their centre. But there is nothing to stop us replacing people with data or other research objects. Software that wants to find data, data that wants to find complementary or supportive data, or wants to find the right software to convert or analyze it. Instead of Farmville or Mafia Wars imagine useful tools that make these connections, negotiate content, and identify common context. As pointed out to me by Paul Walk this is very similar to what was envisioned in the 90s as the role of software agents. In this view the human research users are the poorly connected users on the outskirts of the web.

The point is that the hard part of creating linked data is making the links, not publishing the data. The semantic web has always suffered from the chicken and egg problem of a lack of user-friendly ways to generate RDF and few tools that could really use that RDF in exciting ways even if it did exist. I still can’t do a useful search on which restaurants in Bath will be open next Sunday. The reality is that the innards of this should be hidden from the user, the making of connections needs to be automated as far as possible, and as natural as possible when the user has to be involved. As easy as hitting that “like” button, or right clicking and adding a citation.

We have learnt a lot about the principles of when and how social networks work. If we can apply those lessons to the construction of open data management and discovery frameworks then we may stand some chance of actually making some of the original vision of the web work.


The personal and the institutional

Twittering and microblogging not permitted
Image by cameronneylon via Flickr

A number of things recently have led me to reflect on the nature of interactions between social media, research organisations, and the wider community. An awful lot has been written about the effective use of social media by organisations and the risks involved in trusting staff and members of an organisation to engage productively and positively with a wider audience. Above all there seems to be a real focus on the potential for people to embarrass the organisation. Relatively little attention is paid to the ability of the organisation to embarrass its staff, but that is perhaps a subject for another post.

In the area of academic research this takes on a whole new hue due to the presence of a strong principle and community expectation of free speech, the principle of “academic freedom”. No-one really knows what academic freedom is. It’s one of those things that people can’t define but will be very clear about when it has been taken away. In general terms it is the expectation that a tenured academic has earnt the right to speak their opinion, regardless of how controversial. We can accept there are some bounds on this – of ethics, taste, and legality; racism would generally be regarded as unacceptable – while noting that the boundary between what is socially unacceptable and what is a validly held and supported academic opinion is both elastic and almost impossible to define. Try expressing the opinion, for example, that there might be a biological basis to the difference between men’s and women’s average scores on a specific maths test. These grey areas, looking at how the academy (or academies) censors itself, are interesting but aren’t directly relevant to this post. Here I am more interested in how institutions censor their staff.

Organisations always seek to control the messages they release to the wider community. The first priority of any organisation or institution is its own survival. This is not necessarily a bad thing – presumably the institution exists because it is (or at least was) the most effective way of delivering a specific mission, and if it ceases to exist, that mission can’t be delivered. Controlling the message is a means of controlling others’ reactions and hence the future. Research institutions have always struggled with this – the corporate centre sending one message of clear vision, high standards, and continuous positive development, while the academics mutter in the privacy of their own coffee room about creeping bureaucracy, lack of resources, and falling standards.

There is fault on both sides here. Research administration and support only very rarely puts the needs and resources of academics at its centre. Time and time again the layers of bureaucracy mean that what may or may not have been a good idea gets buried in a new set of unconnected paperwork, that more administration is required, taking resources away from frontline activities, and that target setting results in target meeting, but at the cost of what was important in the first place. There is usually a fundamental lack of understanding of what researchers do and what motivates them.

On the other side, academics are arrogant and self-absorbed, rarely interested in contributing to the solution of larger problems. They fail to understand, or take any interest in, the corporate obligations of the organisations that support them and will only rarely cooperate and compromise to find solutions to problems. Worse than this, academics build social and reward structures that encourage this kind of behaviour: promoting individual achievement rather than that of teams, penalising people for accepting compromises, and rarely rewarding the key positive contribution of effective communication and problem solving between the academic side and administration.

What the first decade of the social web has taught us is that organisations that effectively harness the goodwill of their staff or members using social media tools do well. Organisations that use Twitter or Facebook effectively enable and encourage their staff to take the shared organisational values out to the wider public. Enabling staff to take responsibility and respond rapidly to issues, making it easy to identify the right person to engage with a specific issue, and admitting (and fixing) mistakes early and often is the advice you will get from any social media consultant. Bring the right expert attention to bear on a problem and solve it collaboratively, whether it’s internal or with a customer. This is simply another variation on Michael Nielsen’s writing on markets in expert attention – the organisations that build effective internal markets and apply the added value to improving their offering will win.

This approach is antithetical to traditional command and control management structures. It implies a fluidity and a lack of direct control over people’s time. It also requires that there be slack in the system, something that doesn’t sit well with efficiency drives. In its extreme form it removes the need for the organisation to formally exist at all, allowing free agents to interact fluidly in a market for their time. What it does do, though, is map very well onto a rather traditional view of how the academy is “managed”. Academics provide a limited resource, their time, and apply it to a large extent in a way determined by what they think is important. Management structures are in practice fairly flat (and used to be much more so) and interactions are driven more by interests and personal whim than by widely accepted corporate objectives. Research organisations, and perhaps by extension those commercial interests that interact most directly with them, should be ideally suited to harnessing the power of the social web, first to solve their internal problems and second to interact more effectively with their customers and stakeholders.

Why doesn’t this happen? For a variety of reasons, some of them the usual suspects: a lack of adoption of new tools by academics, appalling IT procurement procedures and poor standards of software development, a simple lack of time to develop new approaches, and a real lack of appreciation of the value that diversity of contributions can bring to a successful department and organisation. The biggest one, though, I suspect is a lack of goodwill between administrations and academics. Academics will not adopt any tool en masse across a department, let alone an organisation, because they are naturally suspicious of the agenda and competence of those choosing the tools. And the diversity of tools they choose on their own means that none has critical mass within the organisation – few academic institutions had a useful global calendar system until very recently. Administrators don’t trust the herd of cats that make up their academic staff to engage productively with the problems they have, and see the need for a technical solution with a critical mass of users, which therefore involves a central decision.

The problems of both diversity and lack of critical mass are a solid indication that the social web has some way to mature – these conversations should occur effectively across different tools and frameworks – and uptake at research institutions should (although it may seem paradoxical) be expected to be much slower than in more top-down, managed organisations, or at least organisations with a shared focus. But it strikes me that the institutions that get this right, and they won’t be the traditional top institutions, will very rapidly accrue a serious advantage, both in terms of freeing up staff time to focus on core activities and in releasing real monetary resource to support those activities. If the social side works, then the resource will also go to the right place. Watch for academic institutions trying to bring strong social media experience into senior management. It will be a very interesting story to follow.


“Friendfeeds for Science” pt II – Design ideas for a research-focussed aggregator

Who likes me on friendfeed?
Image by cameronneylon via Flickr

This post, while only 48 hours old, is somewhat outdated by these two Friendfeed discussions. It was written independently of those discussions, so it seemed worth putting out in its original form rather than spending too much time rewriting.

I wrote recently about ScienceFeed, a Friendfeed-like system aimed at scientists, and was fairly critical. I also promised to write about what I thought a “Friendfeed for Researchers” should look like. To do that we need to think about what Friendfeed, and other services including Twitter, Facebook, and Posterous, are used for and what else they could do.

Friendfeed is an aggregator that enables, as I have written before, an “object-centric” means of interacting around those objects. As Alan Cann has pointed out, this is not the only thing it does; it also enables the person-centric interactions that I see as more typical of Facebook and Twitter. Enabling both is important, as is the realization that all of these systems need to interoperate effectively with each other, something which is still evolving. But core to the development of something that works for researchers is that standard research objects, and particularly papers, need to be first-class objects. Author lists, one click to full text, one click to bookmark to my library.

Functionality addition 1: Treat research objects as first-class citizens with special attention; start with journal papers and support for Citeulike/Zotero/Mendeley etc.

On top of this Friendfeed is a community, or rather several interlinked communities that have their own traditions, standards, and expectations, that are supported to a greater or lesser extent by the functionality of rooms, search, hiding, and administration found within Friendfeed. Any new service needs to understand and support these expectations.

Friendfeed also doesn’t do some things. It is not terribly effective as a bookmarking tool, nor very good as a tool for identifying and mining for objects or information that is more than a few days old, although paradoxically it has served quite well as a means of archiving tweets and exposing them to search engines. The idea of a tool that surfaces objects to Google is an interesting one, and one we could take advantage of. Granularity of sharing is also limited: what if I want slidesets to be public but tweets to be a private feed? Or to collect different feeds under different headings for different communities – public, domain-specific, and only for the interested specialist?

Finally, Friendfeed doesn’t have a very sophisticated karma system. While likes and comments will keep bringing specific objects (and by extension the people who have brought them in) into your attention stream, there is none of the filtering power enabled by tools like StackOverflow. Whether or not such a thing is something we would want is an interesting question, but it has the potential to enable much more sophisticated filtering and curation of content. StackOverflow itself has an interesting limitation as well: there is only one rank order of answers. I can’t choose to privilege the upmods of one specific curator over another. I certainly can’t choose to order my stream based on a person’s upmods but not their downmods.
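To illustrate the kind of per-curator filtering that neither service offers, here is a toy sketch; the curators, weights, and stream items are all invented. The point is simply that each curator gets their own weights, and that up- and downmods can be trusted independently.

    # Toy model: order a stream by per-curator weights, counting only the
    # vote directions you choose to trust from each curator.
    my_weights = {
        "curator_a": {"up": 2.0, "down": 1.0},  # trust both directions
        "curator_b": {"up": 1.0, "down": 0.0},  # count upmods, ignore downmods
    }

    stream = [
        {"title": "Preprint on enzyme kinetics",
         "votes": [("curator_a", "up"), ("curator_b", "down")]},
        {"title": "Dataset: buffer conditions",
         "votes": [("curator_b", "up")]},
    ]

    def score(item):
        total = 0.0
        for curator, direction in item["votes"]:
            weight = my_weights.get(curator, {}).get(direction, 0.0)
            total += weight if direction == "up" else -weight
        return total

    # Highest personal score first; curator_b's downmod is ignored entirely.
    for item in sorted(stream, key=score, reverse=True):
        print(f"{score(item):+.1f}  {item['title']}")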

A user on Friendfeed plays three distinct roles: content author, content curator, and content consumer. Different people will emphasise different roles, from the pure broadcaster to the pure reader who doesn’t ever interact. The real added value comes from the curation role, and in particular from enabling granular filtering based on your choice of curators. Curation comes in the form of choosing to push content to Friendfeed from outside services, from “likes”, and from commenting. Commenting is both curation and authoring, providing context as well as new information or opinion. But supporting and validating this activity will be important. Whatever choice is made around “liking” or StackOverflow-style up- and down-modding needs to apply to comments as well as objects.

Functionality addition 2: Enable rating of comments and, by extension, the people making them

If reputation gathering is to be useful in driving filtering functionality, as I have suggested, we will need good ways of separating content authoring from curation. One thing that really annoys me is seeing an interesting title and a friendly avatar on Friendfeed and clicking through to find something written by someone else. Not because I don’t want to read something written by someone else, but because my decision to click through was based on assumptions about who the author was. We need to support a strong culture of citation and attribution in research. A Friendfeed for research will need to clearly mark the distinction between who brought an object into the service, who has curated it, and who authored it. All of these roles should be valued, but they should be measured separately.

Functionality addition 3: Clearly designate authors and curators of objects brought into the stream. Possibly enable these activities to be rated separately?

If we recognize a role of author, outside the user’s curation activity, we can also enable the rating of people and objects that don’t belong to users. This would allow researchers who are not users to build up reputation within the system, which has the potential to solve the “ghost town” phenomenon that plagues most science social networking sites. A new user could claim the author role for objects that were originally brought in by someone else. This would immediately connect them with other people who have commented on their work, and provide them with a reputation that can be further built upon by taking on curation activities.

This is a sensitive area – holding information on people without their knowledge – but it is something already done across indexing services, aggregation services, and chat rooms. The use of karma in this context would need to be very carefully thought out, and whether it would be made available either within or outside the system would be an important question to tackle.

Functionality addition 4: Collect reputation and comment information for authors who are not users to enable them to rapidly connect with relevant content if they choose to join.

Finally there is the question of interacting with this content and filtering it through the rating systems that have been created. The UI issues here are formidable, but there is a need to enable different views: a streaming view, more static views of content a user has collected over long periods, and search. There is probably enough for another whole post in those issues.

Summary: Overall, for me the key to building a service that takes inspiration from Friendfeed but delivers more functionality for researchers, while not alienating a wider potential user base, is to build a tool that enables and supports curation, rating, and granular filtering of content. Authorship is key, as are quantitative measures of value and personal relevance that will enable users to build their own view of the content they are interested in, to collect it for themselves and to continue to curate it, either on their own or in collaboration with others.


Friendfeed for Research? First impressions of ScienceFeed

Image representing FriendFeed as depicted in CrunchBase
Image via CrunchBase

I have been saying for quite some time that Friendfeed offers a unique combination of functionality that works well for scientists, researchers, and the people they want to (or should want to) have conversations with. For me the core of this functionality lies in two places. First, the system explicitly supports conversations that centre around objects. This is different to Twitter, which supports conversations but doesn’t centre them around the object – it is actually not trivial to find all the tweets about a given paper, for instance. Facebook now has similar functionality but it is much more often used for pure conversation. Facebook is a tool mainly used for person-to-person interactions; it is user- or person-centric. Friendfeed, at least as it is used in my space, is object-centric, and this is the key aspect in which “social networks for science” need to differ from the consumer offerings in my opinion. This idea can trace a fairly direct lineage via Deepak Singh to the Jeff Jonas/Jon Udell concatenation of soundbites:

“Data finds data…then people find people”

The second key aspect of Friendfeed is that it gives users a great deal of control over what they present to represent themselves. If we accept the idea that researchers want to interact with other researchers around research objects, then it follows that the choice of objects you use to represent yourself is crucial to creating your online persona. I choose not to push Twitter into Friendfeed mainly because my tweets are directed at a somewhat different audience. I do choose to bring in video, slides, blog posts, papers, and other aspects of my work life. Others might choose to include Flickr but not YouTube. Flexibility is key because you are building an online presence. Most of the frustration I see with online social tools and their use by researchers centres on a lack of control over which content goes where and when.

So as an advocate of Friendfeed as a template for tools for scientists, it is very interesting to see how that template might be applied to tools built with researchers in mind. ScienceFeed was launched yesterday by Ijad Madisch, the person behind ResearchGate. The first thing to say is that this is an out-and-out clone of Friendfeed, from the position of the buttons to the overall layout. It seems not to be built on the Tornado server that was open sourced by the Friendfeed team, so questions may hang over scalability and architecture, but that remains to be tested. The main UI difference from Friendfeed is that the influence of another 18 months of development of social infrastructure is evident in the use of OAuth to rapidly leverage existing networks and information on Friendfeed, Twitter, and Facebook. Although it still requires some profile setup, this is good to see. It falls short of the kind of true federation which we might hope to see in the future, but then so does everything else.

In terms of specific functionality for scientists, the main addition is a specialised tool for adding content via a search of literature databases. This seems to be adapted from the ResearchGate tool for populating a profile’s publication list. A welcome addition – certainly real tools for researchers must treat publications as first-class objects – but not groundbreaking.

The real limitation of ScienceFeed is that it seems to miss the point of what Friendfeed is about. There is currently no mechanism for bringing in and aggregating diverse streams of content automatically. It is nice to be able to manually share items from my Citeulike library, but this needs to happen automatically. My blog posts need to come in, as do my slideshows on Slideshare and my preprints on Nature Precedings or Arxiv. Most of this information is accessible via RSS feeds, so import via RSS/Atom (and in the future real-time protocols like XMPP) is an absolute requirement. Without this functionality ScienceFeed is just a souped-up microblogging service. And as was pointed out yesterday in one Friendfeed thread, we have a Twitter-like service for scientists. It’s called Twitter. With automatic feed aggregation, Friendfeed can become a presentation of yourself as a researcher on the web: an automated publication list that is always up to date and always contains your latest (public) thoughts, ideas, and content. In short, your web-native business card and CV all rolled into one.
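As a rough sketch of the missing piece, this is roughly what automatic aggregation looks like using the Python feedparser library. The feed URLs are placeholders for a researcher’s actual outputs, and a real service would poll continuously per user rather than running a one-off script.

    import time
    import feedparser  # pip install feedparser

    # Placeholder URLs standing in for a researcher's scattered output feeds.
    feeds = [
        "https://example.org/blog/feed.rss",     # blog posts
        "https://example.org/slides/feed.rss",   # slide decks
        "https://example.org/library/feed.rss",  # bookmarked papers
    ]

    entries = []
    for url in feeds:
        parsed = feedparser.parse(url)
        for entry in parsed.entries:
            entries.append({
                "source": parsed.feed.get("title", url),
                "title": entry.get("title", ""),
                "link": entry.get("link", ""),
                "published": entry.get("published_parsed") or time.gmtime(0),
            })

    # One reverse-chronological stream: the always-up-to-date public record.
    for item in sorted(entries, key=lambda e: e["published"], reverse=True):
        print(time.strftime("%Y-%m-%d", item["published"]),
              item["source"], "-", item["title"])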

Finally there is the problem of the name. I was very careful at the top of this post to be inclusive about the scope of people who I think can benefit from Friendfeed. One of the great strengths of Friendfeed is that it has promoted conversations across boundaries that are traditionally very hard to bridge. The ongoing collision between the library and scientific communities on Friendfeed may one day rank as its most important achievement, at least in the research space. I wonder whether the conversations that have sparked there would have happened at all without the open scope that allowed communities to form without prejudice as to where they came from and then to find each other and mingle. There is nothing in ScienceFeed that precludes anyone from joining as far as I can see, but the name is potentially exclusionary, and I think unfortunate.

Overall I think ScienceFeed is a good discussion point, a foil for critical thinking, and potentially a valuable fallback if Friendfeed does go under. It is a place where the wider research community could have a stronger voice about development direction, and an opportunity to argue more effectively for business models that can provide confidence in a long-term future. I think it currently falls far short of being a useful tool, but there is the potential to use it as a spur to build something better. That might be ScienceFeed v2 or it might be an entirely different service. In a follow-up post I will make some suggestions about what such a service might look like, but for now I’d be interested in what other people think.

Other Friendfeed threads are here and here and Techcrunch has also written up the launch.


What should social software for science look like?

Nat Torkington, picking up on my post over the weekend about the CRU emails, takes a slant which has helped me figure out how to write this post, which I was struggling with. He says:

[from my post...my concern is that in a kneejerk response to suddenly make things available no-one will think to put in place the social and technical infrastructure that we need to support positive engagement, and to protect active researchers, both professional and amateur from time-wasters.] Sounds like an open science call for social software, though I’m not convinced it’s that easy. Humans can’t distinguish revolutionaries from terrorists, it’s unclear why we think computers should be able to.

As I responded over at Radar, yes, I am absolutely calling for social software for scientists, but I didn’t mean to say that we could expect it to help us find the visionaries amongst the simply wrong. But this raises a very helpful question. What is it that we would hope Social Software for Science would do? And is that realistic?

Over the past twelve months I seem to have got something of a reputation for being a grumpy old man about these things, because I am deeply sceptical of most of the offerings out there – partly because most of these services don’t actually know what it is they are trying to do, or how it maps onto the success stories of the social web. So, prompted by Nat, I would like to propose a list of what effective Social Software for Science (SS4S) will do and what it can’t.

  1. SS4S will promote engagement with online scientific objects and through this encourage, and provide paths for, those with enthusiasm but insufficient expertise to gain the expertise they need to contribute effectively (see e.g. Galaxy Zoo). This includes, but is certainly not limited to, collaborations between professional scientists; these are merely a special case of the general.
  2. SS4S will measure and reward positive contributions, including constructive criticism and disagreement (StackOverflow vs YouTube comments). Ideally such measures will value quality of contribution rather than opinion, allowing disagreement to be both supported when required and resolved when appropriate.
  3. SS4S will provide single-click access to available online scientific objects and make it easy to bring references to those objects into the user’s personal space or stream (see e.g. Friendfeed “Like” button)
  4. SS4S should provide zero effort upload paths to make scientific objects available online while simultaneously assuring users that this upload and the objects are always under their control. This will mean in many cases that what is being pushed to the SS4S system is a reference not the object itself, but will sometimes be the object to provide ease of use. The distinction will ideally be invisible to the user in practice barring some initial setup (see e.g. use of Posterous as a marshalling yard).
  5. SS4S will make it easy for users to connect with other users and build networks based on a shared interest in specific research objects (Friendfeed again).
  6. SS4S will help the user exploit that network to collaboratively filter objects of interest to them and of importance to their work. These objects might be results, datasets, ideas, or people.
  7. SS4S will integrate with the user’s existing tools and workflow and enable them to gradually adopt more effective or efficient tools without requiring any severe breaks (see Mendeley/Citeulike/Zotero/Papers and DropBox)
  8. SS4S will work reliably and stably with high performance and low latency.
  9. SS4S will come to where the researcher is working both with respect to new software and also unusual locations and situations requiring mobile, location sensitive, and overlay technologies (Layar, Greasemonkey, voice/gesture recognition – the latter largely prompted by a conversation I had with Peter Murray-Rust some months ago).
  10. SS4S will be trusted and reliable with a strong community belief in its long term stability. No single organization holds or probably even can hold this trust so solutions will almost certainly need to be federated, open source, and supported by an active development community.

What SS4S won’t do is recognize geniuses when they are out in the wilderness amongst a population of the just plain wrong. It won’t solve the cost problems of scientific publication and it won’t turn researchers into agreeable, supportive, and collaborative human beings. Some things are beyond even the power of Web 2.0.

I was originally intending to write this post from a largely negative perspective, ranting as I have in the past about how current services won’t work. I think now there is a much more positive approach. Let’s go out there and look at what has been done, what is being done, and how well it is working in this space. I’ve set up a project on my new wiki (don’t look too closely, I haven’t finished the decorating) and if you are interested in helping out with a survey of what’s out there I would appreciate the help. You should be able to log in with an OpenID as long as you provide an email address. Check out this Friendfeed thread for some context.

My belief is that we are near to a position where we could build a useful requirements document for such a beast, with references to what has worked and what hasn’t. We may not have the resources to build it, and maybe the NIH projects currently funded will head in that direction. But what is valuable is to pull the knowledge together to figure out the most effective path forward.

It wasn’t supposed to be this way…

I’ve avoided writing about the Climate Research Unit email leak for a number of reasons. Firstly, it is clearly a sensitive issue with personal ramifications for some, and for many others just a very highly charged issue. Probably more importantly, I simply haven’t had the time or energy to look into the documents myself. I haven’t, as it were, examined the raw data for myself, only other people’s interpretations. So I’ll try to stick to a very general issue here.

There appear to be broadly two responses from the research community to this saga. One is to close ranks and, to a certain extent, say “nothing was done wrong here”. This is, at some level, the tack taken by the Nature editorial of 3 December, which was headed up with “Stolen e-mails have revealed no scientific conspiracy…”. The other response is that the scandal has exposed the shambolic way we deal with collecting, archiving, and making available both data and analysis in science, as well as the endemic issues around the hoarding of data by those who have collected it.

At one level I belong strongly in the latter camp, but I also appreciate the dismay that must be felt by those who have looked at, and understand, what the emails actually contain, and their complete inability to communicate this into the howling winds of what seems to a large extent a media beat-up. I have long felt that the research community would one day be shocked by the public response when, for whatever reason, the media decided to make a story of the appalling data sharing practices of publicly funded academic researchers like myself. If I’d thought about it more deeply I should have realised that this would most likely be around climate data.

Today the Times reports on its front page that the UK Meteorological Office is to review 160 years of climate data and has asked a range of contributing organisations to allow it to make the data public. The details of this are hazy, but if the UK Met Office really is going to make the data public this is a massive shift. I might be expected to be happy about this, but I’m actually profoundly depressed. While it might in the longer term lead to more strongly worded and enforced policies, it will also lead to data sharing being forever associated with “making the public happy”. My hope has always been that the sharing of the research record would come about because people started to see the benefits, because they could see the possibilities in partnership with the wider community, and because it made their research more effective. Not because the tabloids told us we should.

Collecting the best climate data and doing the best possible analysis on it is not optional. If we get this wrong and don’t act effectively then, with some probability that is significantly above zero, our world ends. The opportunity is there to make this the biggest, most important, and most effective research project ever undertaken. To actively involve the wider community in measurement. To get an army of open source coders to re-write, audit, and re-factor the analysis software. Even to involve the (positively engaged) sceptics, using their interest and ability to look for holes and issues. Whether politicians will act on the data is not an issue that the research community can or should address; what we need to be clear on is that we provide the best data, the best analysis, and an honest view of the uncertainties. Along with the ability of anyone to critically analyse the basis for those conclusions.

There is a clear and obvious problem with this path. One of the very few credible objections to open research that I have come across is that by making material available you open your inbox to a vast community of people who will just waste your time: the people who can’t be bothered to read the background literature or learn to use the tools, the ones who just want the right answer. This is nowhere more the case than with climate research, and it forms the basis of the most reasonable explanation of why the CRU (and every other repository of climate data, as far as I am aware) has not made more data or analysis software directly available.

There are no simple answers here, and my concern is that in a kneejerk response to suddenly make things available no-one will think to put in place the social and technical infrastructure that we need to support positive engagement, and to protect active researchers, both professional and amateur, from time-wasters. Interestingly, I think this infrastructure might look very similar to that which we need to build to effectively share the research we do, and to effectively discover the relevant work of others. Infrastructure is never sexy, particularly in the middle of a crisis. But there is one thing in the practice of research that we forget at our peril: any given researcher needs to earn the right to be taken seriously, but no-one ever earns the right to shut people up. Picking out the objection that happens to be important is something we have to at least attempt to build into our systems.

The trouble with business models (Facebook buys Friendfeed)

…is that someone needs to make money out of them. It was inevitable at some point that Friendfeed would take a route that led it towards mass adoption and away from the needs of the (rather small) community of researchers who have found a niche that works well for them. I had thought it more likely that Friendfeed would gradually move away from the aspects that researchers found attractive rather than being absorbed wholesale by a bigger player, but then I don’t know much about how Silicon Valley really works. It appears that Friendfeed will continue in its current form as the two companies work out how to integrate the functionality into Facebook, but in the long term it seems unlikely that the current service will survive. In a sense the sudden break may be a good thing, because it forces some of the issues about providing this kind of research infrastructure out into the open in a way a gradual shift probably wouldn’t.

What is it about Friendfeed that makes it particularly attractive to researchers? Based more on hunches than hard data, a few things stand out in comparison with services like Twitter and Facebook.

  1. Conversations are about objects. At the core of the way Friendfeed works are digital objects, images, blog posts, quotes, thoughts, being pushed into a shared space. Most other services focus on the people and the connections between them. Friendfeed (at least the way I use it) is about the objects and the conversations around them.
  2. Conversation is threaded and aggregated. This is where Twitter loses out. It is almost impossible to track a specific conversation via Twitter unless you do so in real time. The threaded nature of FF makes it possible to track conversations days or months after they happen (as long as you can actually get into them)
  3. Excellent “person discovery” mechanisms. The core functionality of Friendfeed means that you discover people who “like” and comment on things that either you or your friends like and comment on. Friendfeed remains one of the most successful services I know of at exploiting this “friend of a friend” effect in a useful way (a toy sketch of the idea follows this list).
  4. The community. There is a specific community, with a strong information technology, information management, and bioinformatics/structural biology emphasis, that grew up and aggregated on Friendfeed. That community has immense value and it would be sad to lose it in any transition.
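Here is the toy sketch promised in point 3. All of the data is invented; the mechanism is just that, given a record of who liked which objects, the people worth discovering fall out of the overlap with what your friends like.

    from collections import Counter

    # Invented data: which objects each person has liked.
    likes = {
        "me":       {"post1", "paper2"},
        "friend1":  {"post1", "paper3"},
        "friend2":  {"paper2", "paper3", "slides4"},
        "stranger": {"paper3", "slides4"},
    }
    friends = {"friend1", "friend2"}

    def suggestions(me):
        """Rank non-friends by how many objects they share with my friends."""
        friend_objects = set().union(*(likes[f] for f in friends))
        counts = Counter()
        for person, objects in likes.items():
            if person != me and person not in friends:
                counts[person] = len(objects & friend_objects)
        return counts.most_common()

    print(suggestions("me"))  # [('stranger', 2)] - surfaced via shared objects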

So what can be done? One option is to sit back and wait to be absorbed into Facebook. This seems unlikely to be either feasible or popular. Many people in the FF research community don’t want this, for reasons ranging from concerns about privacy, through the fundamentals of how Facebook works, to just not wanting to mix work and leisure contacts. All reasonable, and all things I agree with.

We could build our own. Technically feasible, but probably not financially. Let’s assume a core group of say 1000 people (probably overoptimistic), each prepared to pay maybe $25 a year in subscription as well as do some maintenance or coding work. That’s still only $25k a year, not enough to pay a single person to keep a service running, let alone actually build something from scratch. Might the FF team make some of the codebase open source? Obviously not what they’re taking to Facebook, but maybe an earlier version? That would help, but there would still need to be either a higher subscription or many more subscribers to keep it running, I suspect. Chalk one up for the importance of open source services though.

Reaggregating around other services and distributing the functionality would perhaps be feasible: a combination of Google Reader and Twitter with services like Tumblr, Posterous, and StoryTlr? The community would be likely to diffuse, but such a distributed approach could be more stable and less susceptible to exactly this kind of buyout. Nonetheless these are all commercial services that can easily disappear. Google Wave has been suggested as a solution, but I think it has fundamental differences in design that make it at best a partial replacement. And it would still require a lot of work.

There is a huge opportunity for existing players in the research web space to make a play here. NPG, Research Gate, and Seed, as well as other publishers, research funders, and infrastructure providers (you know who you are), could fill this gap if they had the resources to build something. Friendfeed is far from perfect: the barrier to entry is quite high for most people, and the effective usage patterns are unclear to new users. Building something that really works for researchers is a big opportunity, but it would still need a business model.

What is clear is that there is a significant community of researchers now looking for somewhere to go – people with a real critical eye for the best services and functionality, people who may even be prepared to pay something towards it, and who will actively contribute to help guide design decisions and make it work. Build it right and we may just come.