Why the web of data needs to be social

24 May 2010

If you’ve been around either me or Deepak Singh you will almost certainly have heard the Jeff Jonas/Jon Udell soundbite: ‘Data finds data. Then people find people’. Jonas is referring to data management frameworks and knowledge discovery, while Udell is referring to the power of integrated data to bring people together.

At some level Jonas’ vision (see his chapter [pdf] in Beautiful Data) is what the semantic web ought to enable: the automated discovery of data or objects based on common patterns or characteristics. Thus far, in practical terms, we have signally failed to make this a reality, particularly for research data and objects.

Udell’s angle (or rather, my interpretation of his overall stance) is more linked to the social web – the discovery of common contexts through shared data frameworks. These contexts might be social groups, as in conventional social networks, a particular interest or passion, or – in the case of Jon’s championing of the iCalendar standard – a date and place, as demonstrated by the elmcity project supporting calendar curation and aggregation. Shared context enables the making of new connections, the creation of new links. But these are still mainly links between people.

It’s not the scientists who are social; it’s the data – Neil Saunders

The naïve analysis of the success of consumer social networks and the weaknesses of science communication has led to efforts that almost precisely invert the Jonas/Udell concept. In the case of most of these “Facebooks for Scientists” the idea is that people find people, and then they connect with data through those people.

My belief is that it is this approach that has led to the almost complete failure of these networks to gain traction. Services that place the object of research at the centre, such as the reference management and bookmarking services and, to some extent, Twitter and Friendfeed, appear to gain much more real scientific use because they mediate the interactions that researchers are interested in: those between themselves and research objects. Friendfeed in particular seems to support this discovery pattern. Objects of interest are brought into your stream, which then leads to discovery of the person behind them. I often use Citeulike in this mode. I find a paper of interest, then identify the tags other people have used for it and the papers that share those tags. If these seem promising, I then might look at the library of the person, but I get to that person through the shared context of the research object, the paper, and the tags around that object.
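To make that discovery pattern concrete, here is a minimal sketch of the walk from a paper, through its tags, to related papers and only then to people. It is an illustration rather than a description of how Citeulike works: the bookmark records, user names and the `discover` function are all invented for the example.

```python
from collections import defaultdict

# Hypothetical bookmark records: (user, paper, tag). All values are invented.
BOOKMARKS = [
    ("alice", "paper-A", "open-data"),
    ("alice", "paper-B", "open-data"),
    ("bob", "paper-A", "linked-data"),
    ("bob", "paper-C", "linked-data"),
    ("carol", "paper-D", "proteomics"),
]


def discover(start_paper, bookmarks):
    """Walk paper -> tags -> papers sharing those tags -> people holding them."""
    tags_by_paper = defaultdict(set)
    papers_by_tag = defaultdict(set)
    users_by_paper = defaultdict(set)
    for user, paper, tag in bookmarks:
        tags_by_paper[paper].add(tag)
        papers_by_tag[tag].add(paper)
        users_by_paper[paper].add(user)

    # Step 1: the object comes first. Collect papers that share any tag.
    related = set()
    for tag in tags_by_paper[start_paper]:
        related |= papers_by_tag[tag]
    related.discard(start_paper)

    # Step 2: only now reach the people, via the objects they bookmarked.
    people = set()
    for paper in related:
        people |= users_by_paper[paper]
    return sorted(related), sorted(people)


related_papers, people = discover("paper-A", BOOKMARKS)
print(related_papers, people)
```

The point of the ordering is that people are never queried directly; they only surface as a by-product of following links between research objects.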

Data, data everywhere, but not a lot of links – Simon Coles

A common complaint made of research data is that people don’t make it available. This is part of the problem, but increasingly it is a smaller part. It is easy enough to put data up that many researchers are doing so: in supplementary data of journal articles, on personal websites, or on community or consumer sites. From a linked data perspective we ought to be having a field day with this, even if it represents only a small proportion of the total. However, little of this data is easily discoverable and most of it is certainly not linked in any meaningful way.

A fundamental problem that I feel like I’ve been banging on about for years now is the dearth of well-built tools for creating these links. Finally these tools are starting to appear, with Freebase Gridworks being an early example. There is a good chance that it will become easier over time for people to create links as part of the process of making their own record. But the fundamental problems we always face, that this is hard work, and often unrewarded work, are limiting progress.

Data friends data…then knowledge becomes discoverable

Human interaction is unlikely to work at scale. We are going to need automated systems to wire the web of data together. The human process simply cannot keep up with the ongoing annotation and connection of data at the volumes that are being generated today. And we can’t afford not to build those systems if we want to optimize the opportunities of research to deliver useful outcomes.

When we think about social networks we always place people at their centre. But there is nothing to stop us replacing people with data or other research objects. Imagine software that wants to find data, or data that wants to find complementary or supportive data, or the right software to convert or analyze it. Instead of Farmville or Mafia Wars, imagine useful tools that make these connections, negotiate content, and identify common context. As pointed out to me by Paul Walk, this is very similar to what was envisioned in the 90s as the role of software agents. In this view the human research users are the poorly connected users on the outskirts of the web.

The point is that the hard part of creating linked data is making the links, not publishing the data. The semantic web has always suffered from a chicken-and-egg problem: a lack of user-friendly ways to generate RDF, and few tools that could really use that RDF in exciting ways even if it did exist. I still can’t do a useful search on which restaurants in Bath will be open next Sunday. The reality is that the innards of this should be hidden from the user; the making of connections needs to be automated as far as possible, and made as natural as possible when the user has to be involved. As easy as hitting that “like” button, or right clicking and adding a citation.
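As a rough sketch of what "automated as far as possible, natural when the user is involved" might look like, the example below proposes links between records that share a metadata value and leaves the user with a single accept-or-reject decision. Everything here is an assumption made for illustration: the record identifiers, the field names, the `shares:` predicate and both functions are invented, and plain tuples stand in for real RDF triples.

```python
from itertools import combinations

# Hypothetical records; identifiers, fields and values are invented.
RECORDS = {
    "dataset:1": {"doi": "10.1000/example.1", "instrument": "NMR"},
    "dataset:2": {"doi": "10.1000/example.1", "instrument": "MS"},
    "dataset:3": {"doi": "10.1000/example.2", "instrument": "NMR"},
}


def propose_links(records, keys=("doi",)):
    """Propose (subject, predicate, object) tuples for records sharing a value."""
    proposals = []
    for (id_a, rec_a), (id_b, rec_b) in combinations(sorted(records.items()), 2):
        for key in keys:
            if key in rec_a and rec_a[key] == rec_b.get(key):
                proposals.append((id_a, f"shares:{key}", id_b))
    return proposals


def confirm(proposals, accept):
    """Keep the proposals a user accepts; `accept` stands in for a one-click UI."""
    return [triple for triple in proposals if accept(triple)]


proposals = propose_links(RECORDS)
links = confirm(proposals, accept=lambda triple: True)
print(links)
```

The machine does the tedious part, comparing every pair of records, while the human contribution shrinks to something about as costly as a “like”.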

We have learnt a lot about the principles of when and how social networks work. If we can apply those lessons to the construction of open data management and discovery frameworks then we may stand some chance of actually making some of the original vision of the web work.

  • Bill Hooker said:

    "Like" button insufficient. Need "this post makes me want to have Cameron’s babies" button.

    This comment was originally posted on FriendFeed

  • Egon Willighagen said:

    @Bill, hey, I’m trying to work here… :)

    This comment was originally posted on FriendFeed

  • Graham Steel said:

    @Bill LOL

    This comment was originally posted on FriendFeed

  • Martin Fenner said:

    Focussing on data instead of people will also help with privacy (something I thought about this weekend). I have far fewer problems sharing my data and connecting them to other data than with personal connections.

    This comment was originally posted on FriendFeed

  • neilfws said:

    Liked “Why the web of data needs to be social” http://ff.im/-kQR6F

    This comment was originally posted on Twitter

  • Cameron Neylon said:

    That’s an interesting illustration of my point. Can you expand on why you are more comfortable with one than the other?

    This comment was originally posted on FriendFeed

  • Martin Fenner said:

    There are (at least) two aspects to privacy: a) personal information about yourself (including but not limited to date of birth, contact information and personal interests) and b) information from a research project that you don’t want to be public (because the data are unpublished, you want to patent this, etc.). I personally place a higher value on the privacy I described in a).

    This comment was originally posted on FriendFeed

  • Tweets that mention Science in the Open » Blog Archive » Why the web of data needs to be social -- Topsy.com said:

    [...] This post was mentioned on Twitter by AJCann, Neil Saunders. Neil Saunders said: Liked "Why the web of data needs to be social" http://ff.im/-kQR6F [...]

  • Greg Tyrelle said:

    You have clearly articulated how current social networking for scientists is a failure and identified the way to make it work: social data. Excellent.

    This comment was originally posted on FriendFeed

  • gtyrelle said:

    http://is.gd/colEl Why the web of data needs to be social

    This comment was originally posted on Twitter

  • Berci said:

    Science in the Open » Blog Archive » Why the web of data needs to be social http://ff.im/-kSGXU

    This comment was originally posted on Twitter

  • Cameron Neylon said:

    Martin, so it’s more about the types of information? I was wondering whether there might be an underlying sense of not wanting to be bothered, or of feeling uncomfortable when the transaction is about you rather than something that just happens to the data anyway, rather than it being particularly about the DOB or phone number. I certainly feel somewhat more inclined to "pure" data interactions rather than personal ones. That of course may just be a personality defect on my part :-)

    This comment was originally posted on FriendFeed

  • TwistedBacteria said:

    Liked “Why the web of data needs to be social” http://ff.im/-kQR6F

    This comment was originally posted on Twitter

  • D0r0th34 said:

    One’s social graph is also emerging as privacy-damaging information; there have been cases of individuals’ sexual preferences being outed (or at least outable), professional connections being made public that the individual would have preferred to keep private, etc. As for the basic proposition here, somebody (boyd or Stutzman?) has already posited two basic types of social networks: those that emerge around objects and their enthusiasts (e.g. Flickr), and those that emerge around social connections (FF, Facebook). The former appear to be easier to build than the latter.

    This comment was originally posted on FriendFeed

  • D0r0th34 said:

    One aspect of Cameron’s proposal that appeals to me personally is that it may help cut through unconscious prejudice, orchestra-audition style. If the data one finds are useful, one may cavil less at the gender or ethnic identity of the data’s maker(s).

    This comment was originally posted on FriendFeed

  • blJOg said:

    Liked: Why the web of data needs to be social http://bit.ly/cPnKmk

    This comment was originally posted on Twitter

  • Cameron Neylon said:

    D, can you point me in the direction of refs for those categorisations. I haven’t really seen much of this. Would be interesting if my thoughts lined up with those categories, and indeed interesting if not.

    This comment was originally posted on FriendFeed

  • D0r0th34 said:

    I was afraid you’d ask that, Cameron. :) Let me look through my rats-nests at del.icio.us and Zotero.

    This comment was originally posted on FriendFeed

  • Christina Pikas said:

    the ’round people vs ’round things wasn’t boyd but it was blogged by boyd.. I know I read that, too. The outing thing – i don’t think i read the paper, but maybe a NY times article about it? hmmm.

    This comment was originally posted on FriendFeed

  • Mickey Schafer said:

    Great article, Dorothea — the concept of object-oriented social linking does seem the strategy behind google’s "social" search option — user enters term of interest, and can choose to re-cast search in terms of social contacts (or timeline or wonder wheel which both generate objects that lead to new results) — or is that too narrow a view of "object"? Also, do we need a new scholarly genre term, something like "new scholarship" or "alternative scholarship" that better describes the function of the object (informative, scholarly, etc), leaving descriptors such as "blog" or "post" as references to publishing platforms?

    This comment was originally posted on FriendFeed

  • Christina Pikas said:

    an article about the social graph and de-anonymizing (more tech/process than social) is: "De-anonymizing social networks" on arxiv (pdf at: http://arxiv.org/PS_cache/arxiv/pdf/0903/0903.3276v1.pdf ) from the 30th IEEE Symposium on Security and Privacy, 2009

    This comment was originally posted on FriendFeed

  • Mr. Gunn said:

    Here’s the story on "Project Gaydar" aka determining sexual orientation via Facebook contacts: http://www.boston.com/bostonglobe/ideas/articles/2009/09/20/project_gaydar_an_mit_experiment_raises_new_questions_about_online_privacy/ It was a group at MIT that did it.

    This comment was originally posted on FriendFeed

  • joergkurtwegner said:

    OMG, @Bill, *snief*, that was funny, and @Cameron, what can I say, you are a genius … and I sooo agree!

    This comment was originally posted on FriendFeed

  • Cameron Neylon said:

    I should probably point out that I don’t much like children so someone promising _not_ to have them is much more appealing personally :-)

    This comment was originally posted on FriendFeed

  • D0r0th34 said:

    *snerk* I promised that years ago, Cameron — and backed it up with surgery. ;)

    This comment was originally posted on FriendFeed

  • Martin Fenner said:

    Next Monday there will be an interesting workshop related to the topic: Second Workshop on Trust and Privacy on the Social and Semantic Web, Heraklion, Greece http://spot.semanticweb.org/2010/

    This comment was originally posted on FriendFeed

  • Scrazzl said:

    Some nice points by Cameron http://cameronneylon.net/blog/why-the-web-of-data-needs-to-be-social/

    This comment was originally posted on Twitter

  • mrgunn said:

    @vahidm No worries, mate. For more info on the topic, I recommend Cameron Neylon: http://bit.ly/dlq8SP

    This comment was originally posted on Twitter

  • paoloman said:

    “Why the web of data needs to be social” http://is.gd/cBhQs Great post by @cameronneylon Data “friends” data, then connects people.

    This comment was originally posted on Twitter

  • [citation needed]» Blog Archive » elsewhere on the net said:

    [...] Cameron Neylon makes a nice case for the development of social webs for data mining. [...]

  • Science in the Open » Blog Archive » Capturing and connecting research objects: A pitch for @sciencehackday said:

    [...] Why the web of data needs to be social (cameronneylon.net) [...]
