Contributor IDs – an attempt to aggregate and integrate

15 February 2009 26 Comments

Following on from my post last month about using OpenID as a way of identifying individual researchers, Chris Rusbridge made the sensible request that when conversations spread themselves around the web it would be good if they could be summarised and aggregated back together. Here I am going to make an attempt to do that – but I won’t claim that this is a completely unbiased account. I will try to point to as much of the conversation as possible, but if I miss things out or misrepresent something please correct me in the comments or the usual places.

The majority of the conversation around my post occurred on FriendFeed, at the item here, but also see the commentary around Jan Aerts’ post (and FriendFeed item) and Bjoern Brembs’ summary post. Other commentary included posts from Andy Powell (Eduserv), Chris Leonard (PhysMathCentral), Euan, Amanda Hill of the Names project, and Paul Walk (UKOLN). There was also a related article in Times Higher Education discussing the paper by Bourne and Fink in PLoS Comp Biol that kicked a lot of this off [Ed – Duncan Hull also pointed out that there is a parallel discussion about the ethics of IDs that I haven’t kept up with – see the commentary at the PLoS Comp Biol paper for examples]. David Bradley also pointed me to a post he wrote some time ago which touches on some of the same issues, although from a different angle. Pierre set up a page on OpenWetWare to aggregate material, and Martin Fenner has a collected set of bookmarks with the tag authorid at Connotea.

The first point, and one on which there seems to be broad agreement, is that there is a clear need for some form of unique identifier for researchers. This is not necessarily as obvious as it might seem: with many proposals of this kind there is significant pushback from communities who don’t see any point in the effort involved. I haven’t seen any evidence of that in this discussion, which leads me to believe that there is broad support for the idea from researchers, informaticians, publishers, funders, and research managers. There is also strong agreement that any system that works will have to be credible and trustworthy to researchers as well as other users, and have a solid and sustainable business model. Many technically minded people pointed out that building something was easy – getting people to sign up to use it was the hard bit.

Equally, and here I am reading between the lines somewhat, any workable system would have to be well designed and easy for researchers to use. There was much back and forth about how “RDF is too hard”, “you can’t expect people to generate FOAF” and “OpenID has too many technical problems for widespread uptake”. At the same time, people thinking about what the back end would need to look like to stand any chance of providing an integrated, working system felt that FOAF, RDF, OAuth, and OpenID would have to provide a big part of the gubbins. The message for me was that the way the user interface(s) are presented has to be got right. There are small-scale examples of aspects of this that show that easy interfaces can be built to capture sophisticated data, but getting it right at scale will be a big challenge.
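To make the idea of “invisible RDF” a little more concrete, here is a minimal sketch, assuming a hypothetical sign-up form with four plain fields, of how a service could turn that input into a FOAF description the researcher never has to look at. The FOAF terms (`foaf:Person`, `foaf:name`, `foaf:homepage`, `foaf:openid`, `foaf:publications`) come from the FOAF vocabulary; the function names and example URLs are made up for illustration, and a real service would more likely use an RDF library than string formatting.

```python
FOAF_PREFIX = "@prefix foaf: <http://xmlns.com/foaf/0.1/> ."


def turtle_literal(text: str) -> str:
    """Quote a plain string as a Turtle literal, escaping awkward characters."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )
    return f'"{escaped}"'


def turtle_iri(url: str) -> str:
    """Wrap a URL as a Turtle IRI, refusing characters that would break the syntax."""
    if any(ch in url for ch in '<>"{}|^` \\'):
        raise ValueError(f"not a usable IRI: {url!r}")
    return f"<{url}>"


def form_to_foaf(name: str, homepage: str, openid: str, publications: str) -> str:
    """Build a minimal FOAF description of one person from simple form fields."""
    return "\n".join([
        FOAF_PREFIX,
        "",
        "[] a foaf:Person ;",
        f"    foaf:name {turtle_literal(name)} ;",
        f"    foaf:homepage {turtle_iri(homepage)} ;",
        f"    foaf:openid {turtle_iri(openid)} ;",
        f"    foaf:publications {turtle_iri(publications)} .",
    ])


if __name__ == "__main__":
    print(form_to_foaf(
        name="Jane Example",
        homepage="http://example.org/~jane/",
        openid="http://jane.openid.example.org/",
        publications="http://example.org/~jane/papers",
    ))
```

The researcher only ever sees four text boxes; the vocabulary choices live entirely on the server side, which is the point the paragraph above makes about the interface.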

Where there is less agreement is on the details, both technical and organisational, of how best to go about creating a useful set of unique identifiers. There was some to-and-fro as to whether CrossRef was the right organisation to manage such a system. Partly this turned on concerns over centralised versus distributed systems and partly on issues of scope and trust. Nonetheless the majority view appeared to be that CrossRef would be the right place to start, and CrossRef do seem to have plans in this direction (from Geoffrey Bilder, see this FriendFeed item).

There was also a lot of discussion around identity tokens versus authorisation. Overall the view seemed to be that these can be productively kept separate. One of the things that appealed to me in the first instance was that OpenIDs could be used either as tokens (just a unique code that is used as an identifier) or as a login mechanism. The machinery is already in place to make that work. Nonetheless it was generally accepted, I think, that the important first step is an identifier. Login mechanisms are not necessarily required, or even wanted, at the moment.
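The point that an OpenID can act purely as an identifier, with no login involved, can be illustrated with a small sketch: if the identifier is just a URL, two systems only need to agree on a canonical form in order to compare it. The normalisation below is loosely modelled on the OpenID 2.0 rules (add `http://` when no scheme is given, drop any fragment) plus basic URL case normalisation. It deliberately skips parts of the real procedure, such as following redirects and XRI handling, and `normalise_openid` and `same_identifier` are hypothetical helpers, not library functions.

```python
from urllib.parse import urlsplit, urlunsplit


def normalise_openid(user_input: str) -> str:
    """Reduce an OpenID URL typed by a user to a canonical string for comparison.

    Simplified: no redirects are followed, no XRI support, no network access.
    """
    text = user_input.strip()
    # Treat bare input such as "jane.example.org" as an http URL.
    if not text.lower().startswith(("http://", "https://")):
        text = "http://" + text
    parts = urlsplit(text)
    # Scheme and host are case-insensitive; an empty path is treated as "/".
    return urlunsplit((
        parts.scheme.lower(),
        parts.netloc.lower(),
        parts.path or "/",
        parts.query,
        "",  # fragments are dropped
    ))


def same_identifier(a: str, b: str) -> bool:
    """Identity as a token: a string comparison, with no authentication step."""
    return normalise_openid(a) == normalise_openid(b)


if __name__ == "__main__":
    print(normalise_openid("Jane.Example.org"))
    print(same_identifier("jane.example.org", "HTTP://JANE.EXAMPLE.ORG/#me"))
```

Nothing here asks the researcher to log in; the login side of OpenID could be layered on later without changing the identifiers themselves.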

The discussion as to whether OpenID is a good mechanism seemed in the end to go around in circles. Many people brought up technical problems they had with getting OpenIDs to work, and there are ongoing problems both with the underlying services that support and build on the standard and with the quality of some of the services that provide OpenIDs. This was at the core of my original proposal to build a specialist provider with an interface and functionality that worked for researchers. As Bjoern pointed out, I should of course be applying my own five criteria for successful web services (go to the last slide) to this proposal. Key questions: 1) Can it offer something compelling? Well no, not unless someone, somewhere requires you to have this thing. 2) Can you pre-populate? Well yes, and maybe that is the key… (see later). In the end, as with the concern over other “informatics-jock” terms and approaches, the important thing is that all of the technical side is invisible to end users.

Another important discussion, which again didn’t really come to a conclusion, was who would hand out these identifiers, and when. Here there seemed to be two different perspectives. Some wanted the identifiers to be completely separated from institutional associations, at least to first order. Others seemed concerned that access to identifiers be controlled via institutions. I definitely belong in the first camp. I would argue that you just give them to everyone who requests them. The problem then comes with duplication: what if someone accidentally (or deliberately) ends up with two or more identities? At one level I don’t see that it matters to anyone except the person concerned (I’d certainly be trying to avoid having my publication record cut in half). But at the very least you would need a good interface for merging records when required. My personal belief is that it is more important to allow people to contribute than to protect the ground. I know others disagree and that somewhere we will need to find a middle path.
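What a merge would need to preserve can be sketched in a few lines. This is a minimal sketch assuming a hypothetical record layout (an identifier, a display name, a set of work identifiers such as DOIs, and a set of retired aliases). The main design choice is that the retired identifier is kept as an alias, so anything that already links to it still resolves to the merged record instead of breaking.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ResearcherRecord:
    identifier: str
    name: str
    works: set[str] = field(default_factory=set)    # e.g. DOIs of papers, datasets
    aliases: set[str] = field(default_factory=set)  # identifiers retired by merges


def merge_records(keep: ResearcherRecord, retire: ResearcherRecord) -> ResearcherRecord:
    """Fold `retire` into `keep`; the retired identifier survives as an alias."""
    return ResearcherRecord(
        identifier=keep.identifier,
        name=keep.name,
        works=keep.works | retire.works,
        aliases=keep.aliases | retire.aliases | {retire.identifier},
    )


def resolve(identifier: str, records: list[ResearcherRecord]) -> ResearcherRecord | None:
    """Find the record an identifier points to, following aliases from past merges."""
    for record in records:
        if identifier == record.identifier or identifier in record.aliases:
            return record
    return None


if __name__ == "__main__":
    first = ResearcherRecord("id-0001", "J. Smith", {"work-a", "work-b"})
    second = ResearcherRecord("id-0002", "Jane Smith", {"work-c"})
    merged = merge_records(first, second)
    print(sorted(merged.works))
    print(resolve("id-0002", [merged]).identifier)
```

Under this approach a duplicate identity costs nothing permanent: the two halves of a publication record are joined, and neither identifier stops working.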

One thing that was helpful was that we seemed to do a pretty good job of aggregating the various projects in this space (and possibly making them more aware of each other). Among these are ResearcherID, a commercial offering that has been running for a while now; the Names project, a collaboration between Mimas and the British Library funded by JISC; ClaimID, an OpenID provider that some people use, which offers some of the flexible “home page” functionality (see Maxine Clark’s for instance) that drove my original ideas; and PublicationsList.org, which provides an online home page and does what ClaimID doesn’t: it offers a PubMed search that makes it easier (as long as your papers are in PubMed) to populate that home page with your papers (but not easier to include datasets, blogs, or wikis – see here for my attempts to include a blog post on my page). There are probably a number of others, so feel free to point out what I’ve missed!

So finally, where does this leave us? With a clear need for something to be done, with a few organisations identified as the best ones to take it forward, and with a lot of discussion still needed about the background technicalities. If you’re still reading this far down the page then you’re obviously someone who cares about this, so I’ll give my thoughts. Feel free to disagree!

  1. We need an identity token, not an authorisation mechanism. Authorisation can easily be broken and is technically hard to implement across a wide range of legacy platforms. If it is possible to build in the option for authorisation in the future then that is great, but it is not the current priority.
  2. The backend gubbins will probably be distributed RDF. There is identity information all over the place which needs to be aggregated together. This isn’t likely to change, so a centralised database, to my mind, will not be able to cope. RDF is built to deal with these kinds of problems and also allows multiple potential identity tokens to be pulled together to say that they represent one person.
  3. This means that user interfaces will be crucial. The simpler the better, but the backend, with words like FOAF and RDF, needs to be effectively invisible to the user. Very simple interfaces asking “are you the person who wrote this paper?” are going to win; complex signup procedures are not.
  4. Publishers and funders will have to lead. The end view of what is being discussed here is very like a personal home page for researchers, but instead of being a home page on a server it is a dynamic document pulled together from stuff all over the web. Researchers, for the most part, are not going to be interested in having another home page that they have to look after. Publishers in particular understand the value of unique identifiers (and will get the most value out of them in the short term), so with the most to gain and the most direct interest they are best placed to lead, probably through organisations like CrossRef that aggregate things of interest across the industry. Funders will come along as they see the benefits of monitoring research outputs; forward-looking ones will probably come along straight away, while others will lag behind. The main point is that pre-populating and then letting researchers come along and prune and correct is going to be more productive than waiting for ten million researchers to sign up to a new service.
  5. The really big question is whether there is value in doing this specially for researchers. This is not a problem unique to research, and it is one for which a variety of messy and disparate solutions are starting to emerge. Maybe the best option is to sit back and wait to see what happens. I often say that in most cases generic services are a better bet than ones specially built for researchers, because the community size isn’t there and there simply isn’t a sufficient need for added functionality. My feeling is that for identity there is a special need, and that if we capture the whole research community it will be big enough to support a viable service. There is a specific need for following and aggregating the work of people that I don’t think is general, and that is different from the authentication issues involved in finance. So I think in this case it is worth building specialist services.
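Point 2 relies on RDF’s ability to state that several identity tokens refer to one person, typically with `owl:sameAs`. Because `owl:sameAs` is symmetric and transitive, pulling such statements together from many sources amounts to finding connected groups of identifiers, which a union-find structure handles efficiently. The sketch below uses plain Python tuples for triples so that it has no dependencies (a real aggregator would more likely use an RDF library and a triple store), and all of the example URIs are invented.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


class UnionFind:
    """Disjoint-set structure over arbitrary hashable identifiers."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def cluster_identities(triples):
    """Group identifiers linked (directly or indirectly) by owl:sameAs."""
    uf = UnionFind()
    for subject, predicate, obj in triples:
        if predicate == OWL_SAME_AS:  # other statements are ignored here
            uf.union(subject, obj)
    groups = {}
    for node in list(uf.parent):
        groups.setdefault(uf.find(node), set()).add(node)
    return list(groups.values())


if __name__ == "__main__":
    # Statements gathered from different, independent sources.
    triples = [
        ("https://jane.openid.example.org/", OWL_SAME_AS,
         "https://publisher-a.example.com/authors/4821"),
        ("https://publisher-a.example.com/authors/4821", OWL_SAME_AS,
         "https://publisher-b.example.net/people/j-smith"),
        ("https://publisher-b.example.net/people/j-smith",
         "http://xmlns.com/foaf/0.1/name", "Jane Smith"),
        ("https://bob.openid.example.org/", OWL_SAME_AS,
         "https://publisher-a.example.com/authors/77"),
    ]
    for cluster in cluster_identities(triples):
        print(sorted(cluster))
```

No single source has to know every identifier a person uses; each contributes the links it knows about, and the groups fall out of the aggregate.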

The best hope, I think, lies in individual publishers starting to disambiguate authors across their existing corpora. Many have already put a lot of effort into this. In turn, perhaps through CrossRef, it should be possible to agree an arbitrary identifier for each individual author. If this is exposed as a service it then becomes possible to start linking the information up. People can and will do this, and services will start to grow around it. Once this exists, some of the ideas around recognising referees and other efforts will start to flow.
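The publisher-led route described above (disambiguate, assign an arbitrary identifier, pre-populate, then let researchers prune and correct) might look roughly like this. It is a sketch under stated assumptions: `AuthorProfile` and `mint_profiles` are hypothetical names, random UUIDs stand in for whatever identifier scheme CrossRef or anyone else might actually agree on, and the disambiguated author clusters are taken as given. The identifier is deliberately meaningless, so it never has to change when a name, affiliation or attribution is corrected.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass, field


@dataclass
class AuthorProfile:
    identifier: str                                      # opaque; encodes nothing
    candidates: set[str] = field(default_factory=set)    # pre-populated, unreviewed
    confirmed: set[str] = field(default_factory=set)
    rejected: set[str] = field(default_factory=set)

    def answer(self, work: str, is_mine: bool) -> None:
        """Record the reply to 'Are you the person who wrote this paper?'."""
        if work not in self.candidates:
            raise KeyError(f"{work} is not awaiting review")
        self.candidates.remove(work)
        if is_mine:
            self.confirmed.add(work)
        else:
            self.rejected.add(work)


def mint_profiles(clusters: dict[str, set[str]]) -> dict[str, AuthorProfile]:
    """Give each disambiguated author cluster a new arbitrary identifier.

    `clusters` maps a publisher's internal author key to the works attributed to it.
    """
    return {
        key: AuthorProfile(identifier=str(uuid.uuid4()), candidates=set(works))
        for key, works in clusters.items()
    }


if __name__ == "__main__":
    profiles = mint_profiles({"smith-j-0042": {"work-a", "work-b"}})
    profile = profiles["smith-j-0042"]
    profile.answer("work-a", is_mine=True)
    profile.answer("work-b", is_mine=False)
    print(profile.confirmed, profile.rejected, profile.candidates)
```

The researcher’s only job is a series of yes/no answers on a record that already exists, rather than building a profile from scratch.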


26 Comments »

  • Duncan Hull said:

    Hi Cameron, nice summary. I think lots of people will back some kind of identity token as you call it, there are potentially big benefits for science – for example following what experts in your field publish – something that can be difficult to do right now.

    There is a parallel discussion here, which you haven’t touched on, concerning the ethics of identity. See for example Alon Korngreen’s comment on the “Bourne Identity” paper. Some see identity on the web as an Orwellian nightmare, like the debate about identity cards (but worse), à la NO2ID.

    Perhaps we should be more careful about what we wish for on the Web?

  • Cameron Neylon said:

    I take the point, but is it better to maintain the illusion that the partial and incomplete information already on the web doesn’t provide all the risks and none of the benefits? But this is the reason why I have concerns about a centralised system and about controls over who is “allowed” to have such an ID.

    I don’t particularly want to support bean counting, but I’d rather we at least had more sophisticated means of counting things beyond just the beans, because the counters aren’t going to go away anytime soon. I really don’t get how a “tag” can make us less creative though. I do disagree with the metrics that were suggested in the PLoS Comp Biol paper, because I think any metric is going to break – but you need data to make decisions, don’t you? Better data makes better decisions, surely?

    I’ve added the link to the link section at the top of the post.

  • Amanda Hill said:

    Thanks for this summary Cameron – very helpful. Just one small correction – the Names Project is a collaboration between Mimas (a national data centre at The University of Manchester) and the British Library.

  • Cameron Neylon said:

    Ooops! Sorry Amanda, problem of writing in a hurry. Now corrected in the original text.

  • Cameron Neylon said:

    I’ve also set up a group on LinkedIn to try and connect up all the people interested in this discussion. If you would like to join, drop me a line either here, at FriendFeed, or on LinkedIn.

  • Mr. Gunn said:

    I’d like to join the group, as well.

  • Candy Schwartz said:

    I would like to join the group, and am on LinkedIn. Thanks

  • Sue Cook said:

    I would like to join your LinkedIn group please. I am http://www.linkedin.com/pub/4/232/17b on LinkedIn.

  • Peter Murray said:

    It is a fine line between having publishers take the lead and having openly reusable identifiers. I found that ResearcherID, for example, has some limiting conditions in its terms of service that make it unattractive as an identifier.

  • rpg said:

    I’ve sent you a linkedin invite.

  • Gudmundur Thorisson said:

    My opinion is that yes, it would be a Bad Thing if we could be tracked everywhere we go online, as Duncan says (side note: aren’t we already!!), and it would surely be a step in the wrong direction if everybody was unilaterally assigned an ID that we *had* to use for all sorts of services. Such IDs (read: national ID cards) will be an inevitable part of certain government services, I suppose, but that’s another story.

    That said, a lot of the things we do want recognition for (be it papers, blog posts, database submissions, etc.) require identification, simple as that (the proverbial cake + eating/having it). To me, OpenID and related tech look very attractive for many such applications, as their user-centricity allows us to control where and how our online identity gets used. My two cents’ worth.

    By the way, this stuff is further explored on a recently-launched website, if you care to have a look: http://www.gen2phen.org/researcher-identification/

  • Stefano Bocconi said:

    Following Cameron’s invitation to suggest missing projects in the identifiers arena, I wanted to add the OKKAM project to the list. OKKAM is an EU project about assigning unique IDs to entities (www.okkam.org). One of its goals is to provide an infrastructure (called the Entity Name Server, by analogy with DNS) that can be queried by describing an entity and that returns the entity’s unique ID. This infrastructure will be maintained beyond the duration of the project, probably by an ad hoc foundation. Authors are one of the entity types we are focusing on, and we would like to provide the identification token described in this article OR collaborate with parties that are capable of providing it. Elsevier is one of the project partners and could contribute Scopus data to the OKKAM repository.

  • Cameron Neylon said:

    Stefano, thanks for the pointer. It would probably be good to bring this to Gudmundur’s attention as well; the website he mentions above is a good place to start aggregating information on these efforts. There is also a LinkedIn forum on unique identifiers for researchers, which you might want to join to flag this effort.

    Cheers

    Cameron
