NSF – Science in the Open


On April 26 I am attending a joint meeting of the NSF and EuroHORCS (European Heads of Research Councils) on “Changing the Conduct of Science in the Information Age”. I have been asked to submit a one page white paper in advance of the meeting and have been struggling a bit with this. This is stage one, a draft document relating to researcher identifiers. I’m not happy with it but reckon that other people out there may well be able to help where I am struggling. I may write a second one on metrics or at least a brief literature collection. Any and all comments welcome.

Summary

Citation lies at the core of research practice, both recognizing the contributions that others have made to the development of a specific piece of work and linking related knowledge together. The technology of the web makes it feasible to radically improve the precision, accuracy, and completeness of these links. Such improvements are crucial to the successful implementation of any system that purports to measure research outputs or quality.

Additionally, the web offers the promise of extended and distributed projects involving diverse parties, many of whom may not be professional researchers. Such contributions nonetheless deserve the same level of recognition as comparable contributions from professional researchers. Providing an open system in which people can contribute to research efforts and receive credit raises significant social issues of control and validation of identities.

Such open and federated systems are exposed to potential failure through lack of technical expertise on the part of users, particularly where a person loses an identity or creates a new one. In many ways this is already the case where we use institutional email addresses as proxies for researcher identity. Is D.C.Neylon@####.ac.uk (an email address that no longer functions) the same person as Cameron.Neylon@####.ac.uk? It is technically feasible to consider systems in which the actual identity token used is widely available and compatible with the wider consumer web, while centralised and trusted authorities provide validation services to confirm specific claims around identity. Such a broker or clearing house would play a similar role for identities to the one CrossRef plays for scholarly articles via the DOI.
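The email question above can be sketched as a simple lookup. This is a minimal illustration only, assuming a hypothetical broker that records which identity tokens have been confirmed as belonging to one researcher identifier. The `IdentityBroker` class, the identifier URL, and the example addresses are all invented for illustration and do not describe any existing service.

```python
class IdentityBroker:
    """Hypothetical clearing house mapping identity tokens to one researcher identifier."""

    def __init__(self):
        # token (lower-cased) -> researcher identifier URL
        self._canonical = {}

    def confirm(self, token, researcher_id):
        """Record that a token (for example an email address) belongs to researcher_id."""
        self._canonical[token.lower()] = researcher_id

    def resolve(self, token):
        """Return the researcher identifier for a token, or None if it is unconfirmed."""
        return self._canonical.get(token.lower())

    def same_person(self, token_a, token_b):
        """True only if both tokens are confirmed and point at the same identifier."""
        a, b = self.resolve(token_a), self.resolve(token_b)
        return a is not None and a == b


# Placeholder identifier and addresses, not real ones.
RESEARCHER = "https://id.example.org/researcher/0001"

broker = IdentityBroker()
broker.confirm("a.researcher@old.example", RESEARCHER)    # defunct address
broker.confirm("alex.researcher@new.example", RESEARCHER)  # current address

print(broker.same_person("a.researcher@old.example", "alex.researcher@new.example"))
```

The point of the sketch is that the tokens themselves can come from anywhere on the consumer web; only the confirmation step relies on a trusted party.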

General points

By adding the concept of a semantic-web ready researcher identifier, i.e. an identifier that provides a URL endpoint that uniquely represents a specific researcher, the existing technical capacity of the semantic web stack can be used to provide a linked data representation of contributions to existing published research objects that are accessible at URL endpoints. Such a representation could be readily expanded beyond authorship to funder contributions.
In this view, crediting a researcher as a contributor to a specific object published on the web is a specific form of citation or linking.
The authoring tools to support the linking and publishing of research objects in this form do not currently exist in a widely useable form.
Semantic web technology provides an extensible means of adding and recognising diverse forms of contribution.
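The points above can be illustrated with a minimal sketch that treats each credit as a subject, predicate, object statement between URLs. It uses plain Python rather than a real semantic web library, and every URL and predicate name below is a placeholder assumed for illustration, not part of any existing vocabulary.

```python
from collections import namedtuple

# One statement of the form "subject --predicate--> object", all given as URLs.
Triple = namedtuple("Triple", ["subject", "predicate", "obj"])

# Placeholder identifiers for people, funders, and research objects.
PERSON_X = "https://id.example.org/person/x"
FUNDER_Z = "https://id.example.org/funder/z"
PAPER_Y = "https://objects.example.org/paper-y"
DATASET_W = "https://objects.example.org/dataset-w"

# Placeholder predicates; new forms of contribution are added by adding new URLs.
AUTHORED = "https://vocab.example.org/authored"
CURATED = "https://vocab.example.org/curated"
FUNDED = "https://vocab.example.org/funded"

graph = [
    Triple(PERSON_X, AUTHORED, PAPER_Y),
    Triple(PERSON_X, CURATED, DATASET_W),
    Triple(FUNDER_Z, FUNDED, PAPER_Y),
]


def contributions_of(graph, contributor):
    """List (predicate, object) pairs credited to one contributor."""
    return [(t.predicate, t.obj) for t in graph if t.subject == contributor]


def contributors_to(graph, research_object):
    """List (subject, predicate) pairs for everyone linked to one research object."""
    return [(t.subject, t.predicate) for t in graph if t.obj == research_object]


print(contributions_of(graph, PERSON_X))
print(contributors_to(graph, PAPER_Y))
```

Because authorship, curation, and funding are all just predicates, extending the representation to a new kind of contribution requires no change to the structure itself.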

Authorisation, validation, and control

Access to such identifiers must be under the control of the researcher and not limited to those with institutional affiliations. Any person must be able to obtain and control a unique researcher identifier that refers to them.
Authorisation and validation of claimed rights of access or connections to specific institutions can be technically handled separately from the provision of identifiers.
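The separation described above can be sketched as two independent steps: anyone obtains an identifier, and a separate party attests to specific claims about it. This is an illustrative sketch under stated assumptions. The `new_identifier` function, the `Claim` and `Validator` types, and the URLs are all hypothetical names introduced here.

```python
from dataclasses import dataclass, field


def new_identifier(handle):
    """Issue an identifier URL; note that no institutional affiliation is required."""
    return f"https://id.example.org/researcher/{handle}"


@dataclass(frozen=True)
class Claim:
    researcher_id: str  # identifier the claim is about
    relation: str       # kind of claim, for example "member_of"
    target: str         # what the claim points at, for example an institution URL


@dataclass
class Validator:
    """Hypothetical trusted party that confirms claims but does not issue identifiers."""
    name: str
    attested: set = field(default_factory=set)

    def attest(self, claim):
        """Record that this validator has confirmed the claim."""
        self.attested.add(claim)

    def is_validated(self, claim):
        """True if this validator has confirmed exactly this claim."""
        return claim in self.attested


rid = new_identifier("citizen-scientist-1")
claim = Claim(rid, "member_of", "https://id.example.org/institution/u")

validator = Validator("hypothetical-broker")
before = validator.is_validated(claim)  # identifier exists, claim not yet confirmed
validator.attest(claim)
after = validator.is_validated(claim)
print(before, after)
```

The identifier remains valid and under the researcher's control whether or not any claim about it has been validated.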

Technical and legal issues

OpenID and OAuth are developing internet standards that provide the technical means to make identifiers available in a distributed way and to separate issues of authentication from those of identification. They are among the current leaders for federated identity and authorisation solutions on the consumer web.
OpenID and OAuth do not currently provide the levels of security required in several jurisdictions for personal or sensitive information (e.g. the UK Data Protection Act). Such federated systems may fall foul of jurisdictions with strong generic privacy requirements, e.g. Canada.
To interoperate with the wider web and enable a wider notion of citation as a declaration of a piece of knowledge, “Person X authored Paper Y”, identities must resolve on the web, in the sense of being a clickable hyperlink that takes a human or machine reader to a page containing information representing that person.

Social issues

There are profound social issues of trust in the maintenance of such identifiers, especially for non-professional researchers in the longer term.
A centralised trusted authority (or authorities) that validates specific claims about identity (a “CrossRef for people”) might provide a trusted broker for identity transactions in the research space that solves many of these trust problems.
Issues around trust and jurisdiction as well as scope and control are likely to limit and fragment any effort to coordinate, federate, or integrate differing identity solutions in the research space. Therefore interoperability of any developed system with the wider web must be a prime consideration.

Conclusions

Identity, unique identifiers, authorisation of access and validation of claims are issues that need to be solved before any transparent and believable metric systems can be reliably implemented. In the current world ill-considered, non-transparent, and irreproducible metric systems will almost inevitably lead to legal claims. At the same time there is a massive opportunity for wider involvement in research for which a much more diverse range of people’s contributions will need recognition.

A system in which recognition and citation take the form of a link to a specified address on the web that represents a person has the potential both to make it much easier to confirm unambiguously who is being credited and to leverage an existing stack of tools and services to aggregate and organize information relating to identity. This is in fact a specific example of a wider view of addressable research objects on the web that can be part of a web of linked data. In this view a person is simply another object that can have specified relationships (links) to other objects.

Partial technical solutions exist in the form of OAuth and OpenID, which solve some subset of these problems. These systems are not currently secure to a level compatible with handling the transfer of sensitive data, although they can interoperate with more secure transfer systems. They provide a federated and open system that enables any person to obtain and assert an identity and to control the appearance of that identity. Severe social issues around trust and persistence exist for this kind of system. These issues may be addressed through trusted centralized repositories that can act as a reliable broker.

Given the expected issues with uptake of any one system, interoperability with competing or complementary offerings is crucial.


Draft White Paper – Researcher identifiers