Harvesting context #1: Flickr comments

Instead of idly waiting for visitors to stumble over their holdings on some lonely information byway, archives are starting to push their content out into the bustling metropolis of the social web. They are going where the people are. Photographic collections, in particular, are gaining new lives and new audiences thanks to Flickr.

But that’s only part of the story. Released into the wild, these photos are slowly picking up the habits of the locals. They are making friends, building connections, even speaking with new accents and dialects. Commented, tagged, organised, linked – they are building new contexts for themselves outside of the cloying control of archival descriptive systems.

Unfortunately it seems there is often a chasm between the old lives of the photos, documented in databases and finding aids, and their new post-institutional careers. This is a pity because the new contexts they are gathering can help us both understand and find them. What can we do to overcome this divide? How could finding aids harvest and display the user-generated content that aggregates around collection items living in the outside world?

The good news is that the tools to start doing this already exist – Flickr has a powerful API that makes it easy to extract photo metadata. Time for a bit of experimenting…

The first result is a userscript that displays Flickr comments in a number of collection databases. Just install it and then try it out:

  • National Archives of Australia Photosearch – try it!
  • State Records NSW Photo Investigator – try it!
  • National Archives and Records Administration ARC – try it!
Flickr comments in PhotoSearch

Gory details follow…

So to begin with I thought I’d just harvest comments from Flickr and display them within existing collection interfaces. As before (here and here), Greasemonkey was my tool of choice for hacking finding aids. The plan was to trigger a Greasemonkey script when you arrive at a photo in a collection database. The script would then:

  • extract a unique identifier for the photo that could be used to find it in Flickr
  • send off a request through the Flickr API to see if the photo was there
  • if so, then fire off another request to retrieve any comments
  • format the comments and insert them at a suitable point in the DOM of the database page
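
The middle two steps can be sketched roughly as below. This is a minimal illustration rather than the published script: `flickrUrl`, `harvestComments`, `FLICKR_REST` and `API_KEY` are hypothetical names, the API key is a placeholder, and matching the identifier through the `text` parameter of `flickr.photos.search` is an assumption about how the lookup might be done. The identifier is assumed to have been extracted already, and the final insertion into the page is left to the `done` callback.

```javascript
// Sketch only: these names are illustrative, not from the published userscript.
var FLICKR_REST = 'https://api.flickr.com/services/rest/';
var API_KEY = 'YOUR_API_KEY'; // placeholder; a real Flickr API key is needed

// Build a REST URL for a Flickr API method that returns plain JSON.
function flickrUrl(method, params) {
  var query = [
    'method=' + encodeURIComponent(method),
    'api_key=' + encodeURIComponent(API_KEY),
    'format=json',
    'nojsoncallback=1'
  ];
  Object.keys(params).forEach(function (key) {
    query.push(encodeURIComponent(key) + '=' + encodeURIComponent(params[key]));
  });
  return FLICKR_REST + '?' + query.join('&');
}

// fetchJson(url, callback) is supplied by the caller and must pass the parsed
// JSON response to callback. done(photo, comments) receives the results;
// photo is null when nothing on Flickr matched the identifier.
function harvestComments(site, fetchJson, done) {
  // Look for the identifier in the institution's own photostream
  // (assumption: a free-text search is enough to find it).
  var searchUrl = flickrUrl('flickr.photos.search', {
    user_id: site.flickrId,
    text: site.identifier
  });
  fetchJson(searchUrl, function (found) {
    var photos = (found.photos && found.photos.photo) || [];
    if (photos.length === 0) {
      done(null, []);
      return;
    }
    var photo = photos[0];
    // The photo exists, so ask for its comments.
    var commentsUrl = flickrUrl('flickr.photos.comments.getList', {
      photo_id: photo.id
    });
    fetchJson(commentsUrl, function (result) {
      var comments = (result.comments && result.comments.comment) || [];
      done(photo, comments);
    });
  });
}
```

Passing the fetch function in as an argument keeps the flow independent of Greasemonkey, so the same logic could be exercised outside the browser or reused on the server side.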

Easy! Obviously for the script to work there needed to be a way of connecting entries in the database with photos on Flickr. In practice this means that the photos need to be described at item level, and that a unique identifier needs to be used somewhere in the description of the photo both on Flickr and in the collection database.

Any archive that meets these criteria is a candidate for inclusion. Only three pieces of information are necessary:

  • the institution’s Flickr id
  • an expression to extract the identifier from the database page
  • an expression to identify the point on the database page at which the comments should be inserted

The expressions could use XPath or regular expressions – whatever it takes to find the desired elements. I’m using jQuery, so that makes selecting elements a lot easier. For example, NARA ARC includes the item identifier in a div with the class ‘arcID’, so I just select that element using jQuery and then use regex matching to pull out the number:

this.identifier = $('.arcID').text().match(/ARC Identifier (\d+)/i)[1];
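
One caveat with a one-liner like this: `match()` returns `null` when the pattern isn’t found, so indexing `[1]` would throw on a page that has no identifier. A guarded version might look something like the sketch below. `extractIdentifier` is a hypothetical helper name, not part of the published script, and the sample text is made up.

```javascript
// Hypothetical helper: return the first capture group, or null if no match.
function extractIdentifier(text, pattern) {
  var match = String(text).match(pattern);
  return match ? match[1] : null;
}

// In the userscript the text would come from jQuery, e.g. $('.arcID').text().
var sample = 'ARC Identifier 123456'; // made-up sample text
var id = extractIdentifier(sample, /ARC Identifier (\d+)/i);
console.log(id); // logs 123456
```

A `null` result can then be treated as “not a photo page” and the script can simply stop.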

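To start with I’ve included the databases of three institutions: the National Archives of Australia (PhotoSearch), State Records NSW (Photo Investigator), and the National Archives and Records Administration (ARC).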

This is the code to save the settings for each institution:

if (document.location.href.match(/naa.gov.au\/scripts\/PhotoSearchItemDetail.asp/i)) {
    this.name = 'NAA';
    this.identifier = document.location.href.match(/M=0&B=(\d+)/)[1];
    this.flickrId = '24849862@N08';
    this.position = 'table:last';
} else if (document.location.href.match(/records.nsw.gov.au\/asp\/photosearch\/photo\.asp\?/i)) {
    this.name = 'StateRecordsNSW';
    this.identifier = document.location.href.match(/photo\.asp\?([\d\w_]+)/i)[1];
    this.flickrId = '27331537@N06';
    this.position = 'table:first';
} else if (document.location.href.match(/arcweb.archives.gov\/arc\/action\/ShowFullRecord|arcweb.archives.gov\/arc\/action\/ExternalIdSearch/i)) {
    this.name = 'NARA';
    this.identifier = $('.arcID').text().match(/ARC Identifier (\d+)/i)[1];
    this.flickrId = '35740357@N03';
    this.position = '.genPad:first';
}
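
As more institutions are added, the same settings could be kept in a table rather than a growing if/else chain. The following is only a sketch of that alternative, not how the published script is organised: `SITES`, `firstGroup`, `findSite` and `readText` are hypothetical names, and the dots in the URL patterns have been escaped, which is slightly stricter than the original expressions. The Flickr ids and selectors are copied from the settings above.

```javascript
// Return the first capture group of pattern in text, or null if no match.
function firstGroup(text, pattern) {
  var match = String(text).match(pattern);
  return match ? match[1] : null;
}

// One entry per institution; values copied from the settings above.
var SITES = [
  {
    name: 'NAA',
    url: /naa\.gov\.au\/scripts\/PhotoSearchItemDetail\.asp/i,
    flickrId: '24849862@N08',
    position: 'table:last',
    identifier: function (href) { return firstGroup(href, /M=0&B=(\d+)/); }
  },
  {
    name: 'StateRecordsNSW',
    url: /records\.nsw\.gov\.au\/asp\/photosearch\/photo\.asp\?/i,
    flickrId: '27331537@N06',
    position: 'table:first',
    identifier: function (href) { return firstGroup(href, /photo\.asp\?([\d\w_]+)/i); }
  },
  {
    name: 'NARA',
    url: /arcweb\.archives\.gov\/arc\/action\/(ShowFullRecord|ExternalIdSearch)/i,
    flickrId: '35740357@N03',
    position: '.genPad:first',
    // readText(selector) stands in for $(selector).text() in the browser.
    identifier: function (href, readText) {
      return firstGroup(readText('.arcID'), /ARC Identifier (\d+)/i);
    }
  }
];

// Return the settings for the first site whose URL pattern matches href,
// or null when the page does not belong to a known database.
function findSite(href, readText) {
  for (var i = 0; i < SITES.length; i++) {
    var site = SITES[i];
    if (site.url.test(href)) {
      return {
        name: site.name,
        flickrId: site.flickrId,
        position: site.position,
        identifier: site.identifier(href, readText)
      };
    }
  }
  return null;
}
```

In the userscript this might be called as `findSite(document.location.href, function (sel) { return $(sel).text(); })`, and adding a new database would mean adding one more entry to the table.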

From there it’s just a matter of building the calls to the API using Greasemonkey’s built-in GM_xmlhttpRequest method. Once the comments are retrieved, they’re given some basic formatting and inserted at the point in the DOM identified by the siteDetails.position property. Once again, jQuery greatly simplifies all the DOM manipulation. If there are no comments then a suitable message is inserted together with a link to the photo in Flickr. Finally some CSS is added to prettify it all a little bit.
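
To give a rough idea of what that involves, here is a minimal sketch of the request wrapper and the formatting step. It is not the code from the published script: `fetchJson`, `formatComments` and `escapeHtml` are hypothetical names, the markup and class name are invented, and the comment objects are assumed to carry the `authorname` and `_content` fields returned by `flickr.photos.comments.getList`. Comment text from Flickr may itself contain markup; this sketch escapes everything to be safe, which means any such markup would be shown literally.

```javascript
// Escape text so it can be placed safely inside HTML.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Turn an array of Flickr comment objects into an HTML fragment.
// With no comments, return a short message plus a link to the photo.
function formatComments(comments, photoUrl) {
  var link = '<a href="' + escapeHtml(photoUrl) + '">View this photo on Flickr</a>';
  if (comments.length === 0) {
    return '<div class="flickr-comments"><p>No comments yet. ' + link + '</p></div>';
  }
  var items = comments.map(function (c) {
    return '<li><strong>' + escapeHtml(c.authorname) + '</strong>: ' +
      escapeHtml(c._content) + '</li>';
  });
  return '<div class="flickr-comments"><ul>' + items.join('') + '</ul><p>' +
    link + '</p></div>';
}

// Fetch a URL with Greasemonkey's cross-domain request function and hand the
// parsed JSON to callback. Only usable inside a Greasemonkey script.
function fetchJson(url, callback) {
  GM_xmlhttpRequest({
    method: 'GET',
    url: url,
    onload: function (response) {
      callback(JSON.parse(response.responseText));
    }
  });
}

// Inside the userscript the fragment could then be inserted with jQuery,
// for example (whether to use after(), before() or append() is a guess):
//   $(siteDetails.position).after(formatComments(comments, photoUrl));
```

Keeping the formatting in a plain function that returns a string means it can be checked without a browser, and the same function could be reused if the harvesting moved to the server.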

You can view the full code on the Userscripts site.

Of course, it would be good to have this sort of stuff happening on the server side. In fact, with a few small modifications, this script could just be dropped into the code of any of the collection databases I’ve used. But in the meantime, Greasemonkey gives us a chance to play around with some of the possibilities – to start thinking about what finding aids might be like.

So what’s next? I’d like to do some playing around with tags and locations, perhaps using them to suggest related photos. I’ve also just realised that Flickr machine tags allow semantic markup… hmmm…

If you have any suggestions for databases to add to this script – let me know!

Comments 6

  1. Andrew G wrote:

    Hello, this sounds great and I would like to try it, however the technical details are a bit beyond me. Can you give me a hint?

    1. installed Greasemonkey on fFox
    2. chose notepad as text editor
    3. copied full code into the text program that comes up
    4. navigated to the web page, added to Greasemonkey as included page.
    Not working. What do I do now?
    Thanks.

    Posted 26 Aug 2009 at 10:40 am
  2. tim wrote:

    Andrew, if you have Greasemonkey installed, then all you have to do is go to the userscript page – http://userscripts.org/scripts/show/56135 – and click on ‘Install’.

    Posted 26 Aug 2009 at 10:49 am
  3. Mark A. Matienzo wrote:

    Hi Tim, I’m inclined to try hammering on your Greasemonkey script to pull comments from NYPL’s Digital Gallery images that also happen to be Flickr. I’ll let you know if I get anywhere; if you don’t hear from me, feel free to get a jump on it and let me know! :)

    Posted 27 Aug 2009 at 3:11 pm
  4. tim wrote:

    Too easy! How about:

    } else if (document.location.href.match(/digitalgallery.nypl.org\/nypldigital\/dgkeysearchdetail\.cfm/i)) {
        this.name = 'NYPL';
        this.identifier = document.location.href.match(/imageID=(\d+)/i)[1];
        this.flickrId = '32951986@N05';
        this.position = '#metadata';
    }

    Posted 27 Aug 2009 at 10:20 pm
  5. Larry Cebula wrote:

    This is a neat trick, the only to its usefulness is the quality of the Flickr comments–which is, shall we say, uneven. For example your first “Try It!” link goes to an Australian National Archives photo titled “Natural Disasters- Dust Storm at Broken Hill.” By installing Greasemonkey and your script the user is now able to read the comment from SandyEm: “Whaohhh massive! Hate to clean up after that! Great snap for the time.”

    On the other hand the second example has produced a modern Google Street view of the historic building in the photo, and a link to a story about an alleged haunting in that building, so there is some useful metadata there.

    There is much giddiness about the Flickr partnership with various archives but until the Flickr software allows comments and annotations to be rated for usefulness, there is always going to be more chaff than wheat in the comments.

    I posted about the problem on my blog, see: “Lick This”: LOC, Flickr, and the Limits of Crowd Sourcing: http://northwesthistory.blogspot.com/2009/06/lick-this-loc-flickr-and-limits-of.html

    Posted 29 Sep 2009 at 2:55 am
  6. Larry Cebula wrote:

    Edit: Ooops, make that “the only LIMIT to its usefulness…”

    Posted 29 Sep 2009 at 2:56 am
