provenance – Science in the Open

April 16, 2009 (updated December 30, 2009)

Use Cases for Provenance – eScience Institute – 20 April

On Monday I am speaking at a meeting on Use Cases for Provenance (Programme), which has a lot of interesting talks scheduled. I appear to be last. I am not sure whether that means I am the comedy closer or the pre-dinner entertainment. This may, however, be a result of the title I chose:

In your worst nightmares: How experimental scientists are doing provenance for themselves

On the whole, experimental scientists, particularly those working in traditional small research groups, have little knowledge of, or interest in, the issues surrounding provenance and data curation. There is, however, an emerging and evolving community of practice that is developing the use of tools and social conventions related to the broad set of web-based resources that can be characterised as “Web 2.0”. This approach emphasises social, rather than technical, means of enforcing citation and attribution practice and of maintaining provenance. I will give examples of how this approach has been applied, and discuss the emerging social conventions of this community from the perspective of an insider.

The meeting will be webcast (a link should be available from here) and my slides will, with any luck, be up in the usual place at least a few minutes before my talk.

April 9, 2008 (updated December 30, 2009)

Provenance, identity, and Google App Engine

Image by davemc500hats via Flickr

One thing that has become increasingly clear to me is the need for an agreed central authority for identity. This is one thing, possibly the one thing, that needs to be absolutely secure and inviolable for Open Science to work. Trust relies on provenance. Attribution, which is at the heart of open practice, relies on provenance. And pulling all of my data together from multiple streams in an automatic way relies on a record of its provenance.

There has been a lot of discussion about Google App Engine, but two posts in particular collided for me. The first was the first blog post I saw, straight off the rank, about an app that uses a Google account to access an OpenID. Useful and cool. The second was David Recordon's post on Google Apps as a potential Facebook killer, which emphasised that access to Google accounts is a key part of the offering.

Then I realised: Google probably already has the highest penetration of any validator of identities, but those identities only really provide access to Google services. OpenID is great in principle but is perhaps not getting the traction it needs to go global. All of that now just goes away. If you can write an app that authenticates someone via Google and then links that to an OpenID, you can do it for anything. Google have just positioned themselves to be the de facto provider of identities. And they may have solved the provenance problem into the bargain.
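To make the link between identity and provenance a little more concrete, here is a minimal sketch, assuming that accounts on several services can each be tied to one canonical identifier (standing in for an OpenID URL) once a provider such as Google has authenticated the user. `IdentityRegistry`, `aggregate`, and all the account names are hypothetical; the code calls no real Google or OpenID API, and the authentication step itself is assumed rather than implemented.

```python
from collections import defaultdict
from typing import Optional


class IdentityRegistry:
    """Hypothetical mapping from (provider, account) pairs to one canonical identity URL."""

    def __init__(self) -> None:
        self._links: dict[tuple[str, str], str] = {}

    def link(self, provider: str, account: str, identity_url: str) -> None:
        # Assumes the provider has already authenticated the user; that
        # step (e.g. a Google login) is deliberately not modelled here.
        self._links[(provider, account)] = identity_url

    def resolve(self, provider: str, account: str) -> Optional[str]:
        # Returns None when the account has never been linked to an identity.
        return self._links.get((provider, account))


def aggregate(registry: IdentityRegistry,
              records: list[tuple[str, str, str]]) -> dict[Optional[str], list[str]]:
    """Group (provider, account, item) records under their canonical identity.

    Items from accounts with no linked identity are collected under None,
    i.e. they cannot be attributed to anyone.
    """
    grouped = defaultdict(list)
    for provider, account, item in records:
        grouped[registry.resolve(provider, account)].append(item)
    return dict(grouped)


# Usage with made-up accounts and an example.org identity URL.
registry = IdentityRegistry()
me = "https://example.org/openid/someone"
registry.link("google", "someone@gmail.com", me)
registry.link("flickr", "someone_photos", me)
records = [
    ("google", "someone@gmail.com", "lab notebook entry"),
    ("flickr", "someone_photos", "gel image"),
    ("blog", "anon123", "comment"),
]
print(aggregate(registry, records))
```

The notebook entry and the gel image end up under the same identity, while the comment from the unlinked account lands in the `None` bucket: that bucket is exactly the attribution gap a trusted, central identity provider would close.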


March 25, 2008 (updated December 30, 2009)

Semantics in the real world? Part I – Why the triple needs to be a quint (or a sext, or…)

I’ve been mulling over this for a while, and seeing as I am home sick (can’t you tell from the rush of posts?) I’m going to give it a go. This definitely comes with a health warning, as it goes well beyond anything I know about at a technical level. This is therefore handwaving of the highest order. But I haven’t come across anyone else floating the same ideas, so I will have a shot at explaining my thoughts.

The Semantic Web, RDF, and XML are all the products of computer scientists thinking about computers and information. You can tell this because they deal with straightforward declarations that are absolute: X has property Y. Putting aside the availability of tools and applications, the fact that triple stores don’t scale well, and all the other technical problems, a central issue with applying these types of strategy to the real world is that absolutes don’t exist. I may assert that X has property Y, but what happens when I change my mind, or when I realise I made a mistake, or when I find out that the underlying data wasn’t taken properly? How do we get this to work in the real world?
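The extra fields aren’t spelled out here, so what follows is only a minimal sketch, assuming the two additions that turn a triple into a “quint” are who made the statement and when. `Quint` and `ProvenanceStore` are hypothetical names, not part of any RDF library. The store is an append-only log: retracting a claim adds a new event rather than deleting anything, so changing your mind leaves a trace and you can still ask what was believed at an earlier time.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Quint:
    """A triple plus two assumed provenance fields: who made the statement and when."""
    subject: str
    predicate: str
    obj: str
    asserted_by: str
    asserted_at: datetime


class ProvenanceStore:
    """Append-only log of statements; a retraction is a new event, never a deletion."""

    def __init__(self) -> None:
        self._log: list[tuple[Quint, bool]] = []  # (statement, is_retraction)

    def assert_(self, q: Quint) -> None:
        # Trailing underscore because `assert` is a Python keyword.
        self._log.append((q, False))

    def retract(self, q: Quint) -> None:
        # q.asserted_by / q.asserted_at record who withdrew the claim, and when.
        self._log.append((q, True))

    def current(self, as_of: Optional[datetime] = None) -> set[tuple[str, str, str]]:
        """Triples that at least one person still stands behind at time `as_of`."""
        latest: dict[tuple[str, str, str, str], tuple[datetime, bool]] = {}
        for q, retracted in self._log:
            if as_of is not None and q.asserted_at > as_of:
                continue  # ignore events after the requested point in time
            key = (q.subject, q.predicate, q.obj, q.asserted_by)
            prev = latest.get(key)
            if prev is None or q.asserted_at >= prev[0]:
                latest[key] = (q.asserted_at, retracted)  # most recent event wins
        return {key[:3] for key, (_, retracted) in latest.items() if not retracted}


# Usage: a (hypothetical) measurement is asserted, then withdrawn.
store = ProvenanceStore()
store.assert_(Quint("sample42", "hasMass", "10 mg", "alice", datetime(2008, 3, 1)))
store.retract(Quint("sample42", "hasMass", "10 mg", "alice", datetime(2008, 3, 25)))
print(store.current(as_of=datetime(2008, 3, 10)))  # {('sample42', 'hasMass', '10 mg')}
print(store.current())                             # set()
```

Because beliefs are tracked per asserter, one person retracting a claim does not erase someone else’s independent assertion of it. For comparison, standard RDF 1.1 only goes as far as a quad, adding a fourth “graph name” term (as serialised by N-Quads), which can be used to group statements by source but does not by itself record retraction.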