<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>discontents &#187; API</title>
	<atom:link href="http://discontents.com.au/tag/api/feed" rel="self" type="application/rss+xml" />
	<link>http://discontents.com.au</link>
	<description>working for the triumph of content over form, ideas over control, people over systems</description>
	<lastBuildDate>Wed, 16 May 2012 14:11:02 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.2</generator>
		<item>
		<title>Headline roulette</title>
		<link>http://discontents.com.au/shed/experiments/headline-roulette</link>
		<comments>http://discontents.com.au/shed/experiments/headline-roulette#comments</comments>
		<pubDate>Tue, 23 Mar 2010 12:26:29 +0000</pubDate>
		<dc:creator>tim</dc:creator>
				<category><![CDATA[experiments]]></category>
		<category><![CDATA[API]]></category>
		<category><![CDATA[Django]]></category>
		<category><![CDATA[games]]></category>
		<category><![CDATA[newspapers]]></category>
		<category><![CDATA[NLA]]></category>
		<category><![CDATA[Piston]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[screen scraping]]></category>

		<guid isPermaLink="false">http://discontents.com.au/?p=834</guid>
		<description><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Headline+roulette&amp;rft.aulast=Sherratt&amp;rft.aufirst=Tim&amp;rft.subject=experiments&amp;rft.source=discontents&amp;rft.date=2010-03-23&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://discontents.com.au/shed/experiments/headline-roulette&amp;rft.language=English"></span>
I&#8217;ve been doing a fair bit of coding in recent weeks and I thought I&#8217;d better write a few details down before I forget about them. As previously noted, I&#8217;ve been gathering together various historical data sets for a project at the National Museum of Australia. One resource that I was keen on including was [...]]]></description>
			<content:encoded><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Headline+roulette&amp;rft.aulast=Sherratt&amp;rft.aufirst=Tim&amp;rft.subject=experiments&amp;rft.source=discontents&amp;rft.date=2010-03-23&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://discontents.com.au/shed/experiments/headline-roulette&amp;rft.language=English"></span>
<abbr class="unapi-id" title="http://discontents.com.au/?p=834"><!-- &nbsp; --></abbr>
<p>I&#8217;ve been doing a fair bit of coding in recent weeks and I thought I&#8217;d better write a few details down before I forget about them.</p>
<p>As previously noted, I&#8217;ve been gathering together various historical data sets for a project at the National Museum of Australia. One resource that I was keen on including was the fantastic <a href="http://newspapers.nla.gov.au/ndp/del/home">Australian Newspapers</a> project at the National Library of Australia. What I had in mind was being able to give a sense of context to any historical event by calling up the headlines for that particular time.</p>
<p>Unfortunately there&#8217;s no API for the newspapers project (or Trove in general), though apparently it&#8217;s in the works. So I had to reverse-engineer the advanced search page to work out the various query options, and then build a screen scraper to harvest the results. I played around with the search options a bit to fine-tune the results, finally deciding to limit them to &#8216;news&#8217; articles with more than 1000 words. Annoyingly, only 10 results are returned at a time.</p>
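<p>Because each request only returns 10 results, harvesting a full result set means stepping through the pages one request at a time. The sketch below shows that loop in JavaScript rather than the Python of the original module. <code>fetchPage(offset)</code> is a hypothetical stand-in for the actual scraping call, and how the real search page is paged is an assumption.</p>

```javascript
// Minimal sketch: harvest a search that only returns 10 results per request.
// fetchPage(offset) is hypothetical: it should return an array holding the
// results that start at `offset` (at most PAGE_SIZE of them).
const PAGE_SIZE = 10;

function harvestAll(fetchPage, maxResults) {
  const results = [];
  let offset = 0;
  while (results.length < maxResults) {
    const page = fetchPage(offset);
    results.push(...page);
    // A short (or empty) page means the result set is exhausted.
    if (page.length < PAGE_SIZE) break;
    offset += PAGE_SIZE;
  }
  // The last full page may overshoot maxResults, so trim the excess.
  return results.slice(0, maxResults);
}

// Usage with a fake source of 25 results: three requests, the last one short.
const fakeSource = (offset) =>
  Array.from({ length: Math.max(0, Math.min(PAGE_SIZE, 25 - offset)) }, (_, i) => offset + i);
console.log(harvestAll(fakeSource, 100).length); // 25
```

<p>Stopping on a short page avoids needing to know the total number of hits in advance, at the cost of one extra (empty) request when the total is an exact multiple of 10.</p>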
<p>I had hoped to parse the results as XML, but a rogue, unclosed &lt;br&gt; tag broke the XHTML, so I fell back on <a href="http://www.crummy.com/software/BeautifulSoup/">Beautiful Soup</a> – a Python module that makes screen scraping considerably easier by tidying up HTML structures. After that it was pretty straightforward. Soon I had <a href="http://bitbucket.org/wragge/nla-newspapers/">my own Python module</a> to query the newspapers database and process the results.</p>
<p>The next step was to use the module to build a simple API that would let us quickly grab a set of headlines for a particular date and place. <a href="http://www.djangoproject.com/">Django</a> and <a href="http://bitbucket.org/jespern/django-piston/wiki/Home">Piston</a> made this easy. To see headlines from New South Wales on 1 January 1901, for example:</p>
<p><a href="http://wraggelabs.com/api/newspapers/1901-01-01/nsw/">http://wraggelabs.com/api/newspapers/1901-01-01/nsw/</a></p>
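<p>The URL pattern is easy to generate from a date and a state. A minimal sketch, assuming the <code>/api/newspapers/YYYY-MM-DD/state/</code> layout inferred from the single example above; state codes other than <code>nsw</code> (such as <code>vic</code>) are assumptions.</p>

```javascript
// Minimal sketch: build a headlines API URL from a date and a state code.
// Assumption: the /api/newspapers/YYYY-MM-DD/<state>/ layout is inferred from
// one example URL; state codes other than 'nsw' are guesses.
const API_BASE = 'http://wraggelabs.com/api/newspapers';

// Zero-pad a month or day number to two digits.
function pad2(n) {
  return String(n).padStart(2, '0');
}

// month and day are 1-based, exactly as they appear in the ISO date.
function headlinesUrl(year, month, day, state) {
  return `${API_BASE}/${year}-${pad2(month)}-${pad2(day)}/${state.toLowerCase()}/`;
}

console.log(headlinesUrl(1901, 1, 1, 'NSW'));
// http://wraggelabs.com/api/newspapers/1901-01-01/nsw/
```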
<p>That was pretty cool and it started me thinking about what else I might do with the data. At first I was planning some sort of browser, like my <a href="http://wraggelabs.com/abs/">Population Browser</a>, but that seemed a bit boring. So I decided to create a simple game that grabbed a random headline and asked you to try and guess the date. After further refinement I decided to impose a limit of 10 guesses, with &#8216;higher&#8217; or &#8216;lower&#8217; prompts to get you moving in the right direction. Yes, basically it was a rip-off of The Price is Right – but an interesting, ironic and historically engaged rip-off&#8230;</p>
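<p>The game rules boil down to a small state machine. Here is a minimal sketch in JavaScript; the function and field names are hypothetical, not taken from the Headline Roulette source. Dates are compared as ISO strings (YYYY-MM-DD), which sort correctly as plain strings for four-digit years.</p>

```javascript
// Sketch of the rules described above: a limited number of guesses, each
// answered with 'higher', 'lower' or 'correct'. Names are hypothetical.
function createGame(answer, maxGuesses = 10) {
  let guessesLeft = maxGuesses;
  let finished = false;

  return {
    guess(date) {
      if (finished) throw new Error('Game over');
      guessesLeft -= 1;
      let hint;
      if (date === answer) hint = 'correct';
      else if (date < answer) hint = 'higher'; // the real date is later
      else hint = 'lower'; // the real date is earlier
      finished = hint === 'correct' || guessesLeft === 0;
      return { hint, guessesLeft, finished };
    },
  };
}

const game = createGame('1901-01-01');
console.log(game.guess('1915-04-25').hint); // lower
console.log(game.guess('1901-01-01').hint); // correct
```

<p>Passing bare years such as <code>'1901'</code> works the same way, if guessing the exact day proves too hard.</p>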
<p>This required me to make a change to the API and Python module so that I could retrieve a random headline. Basically it just meant generating a query based on random values for the day, month, year and state. For the interface I once again delved into jQuery&#8217;s box of tricks. With all the kerfuffle about ChatRoulette in the media, the name seemed obvious – <a href="http://wraggelabs.com/newsroulette/">Wragge&#8217;s Headline Roulette</a> was born.</p>
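<p>Picking the day, month and year independently can produce impossible dates such as 31 February, so it helps to choose the day only after the year and month are known. A minimal sketch; the year range (1803 to 1954) and the list of state codes are assumptions, not values from the original code.</p>

```javascript
// Sketch: pick a random, valid date and state for a "random headline" query.
// Assumptions: the 1803-1954 year range and the STATES codes below.
// rng defaults to Math.random but can be replaced, e.g. for testing.
const STATES = ['nsw', 'vic', 'qld', 'sa', 'wa', 'tas', 'act', 'nt'];

// Random integer between min and max, inclusive on both ends.
function randomInt(min, max, rng) {
  return min + Math.floor(rng() * (max - min + 1));
}

// month is 1-based; day 0 of the following month is the last day of this one.
function daysInMonth(year, month) {
  return new Date(Date.UTC(year, month, 0)).getUTCDate();
}

function randomQuery(rng = Math.random, minYear = 1803, maxYear = 1954) {
  const year = randomInt(minYear, maxYear, rng);
  const month = randomInt(1, 12, rng);
  // Choosing the day last keeps it within the length of the chosen month.
  const day = randomInt(1, daysInMonth(year, month), rng);
  const state = STATES[randomInt(0, STATES.length - 1, rng)];
  return { year, month, day, state };
}

console.log(JSON.stringify(randomQuery(() => 0)));
// {"year":1803,"month":1,"day":1,"state":"nsw"}
```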
<div id="attachment_839" class="wp-caption aligncenter" style="width: 310px"><a href="http://wraggelabs.com/newsroulette/"><img class="size-medium wp-image-839" title="headline-roulette" src="http://discontents.com.au/wp-content/uploads/2010/03/headline-roulette-300x151.jpg" alt="Headline roulette screen capture" width="300" height="151" /></a><p class="wp-caption-text">Test your historical nous with Headline Roulette!</p></div>
<p>It&#8217;s a very simple little app, but a number of people have said how much fun it is. The bad news is that imminent changes to the NLA newspapers site are probably going to break it (at least in its current form). So enjoy it while you can. When the NLA makes an API available I might work on something a little more sophisticated.</p>
<p>Of course, the broader point is that there is a whole range of cultural materials out there waiting to be remixed and re-used in various forms. Get hacking&#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://discontents.com.au/shed/experiments/headline-roulette/feed</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Harvesting context #1: Flickr comments</title>
		<link>http://discontents.com.au/shoebox/archives-shoebox/harvesting-context-1</link>
		<comments>http://discontents.com.au/shoebox/archives-shoebox/harvesting-context-1#comments</comments>
		<pubDate>Sun, 23 Aug 2009 23:57:52 +0000</pubDate>
		<dc:creator>tim</dc:creator>
				<category><![CDATA[archives]]></category>
		<category><![CDATA[experiments]]></category>
		<category><![CDATA[API]]></category>
		<category><![CDATA[finding aids]]></category>
		<category><![CDATA[Flickr]]></category>
		<category><![CDATA[greasemonkey]]></category>
		<category><![CDATA[JQuery]]></category>
		<category><![CDATA[metadata]]></category>
		<category><![CDATA[photos]]></category>
		<category><![CDATA[userscript]]></category>

		<guid isPermaLink="false">http://discontents.com.au/?p=670</guid>
		<description><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Harvesting+context+%231%3A+Flickr+comments&amp;rft.aulast=Sherratt&amp;rft.aufirst=Tim&amp;rft.subject=archives&amp;rft.subject=experiments&amp;rft.source=discontents&amp;rft.date=2009-08-24&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://discontents.com.au/shoebox/archives-shoebox/harvesting-context-1&amp;rft.language=English"></span>
Instead of idly waiting for visitors to stumble over their holdings on some lonely information by-way,  archives are starting to push their content out into the bustling metropolis of the social web. They are going where the people are. Photographic collections, in particular, are gaining new lives and new audiences thanks to Flickr. But that&#8217;s [...]]]></description>
			<content:encoded><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Harvesting+context+%231%3A+Flickr+comments&amp;rft.aulast=Sherratt&amp;rft.aufirst=Tim&amp;rft.subject=archives&amp;rft.subject=experiments&amp;rft.source=discontents&amp;rft.date=2009-08-24&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://discontents.com.au/shoebox/archives-shoebox/harvesting-context-1&amp;rft.language=English"></span>
<abbr class="unapi-id" title="http://discontents.com.au/?p=670"><!-- &nbsp; --></abbr>
<p>Instead of idly waiting for visitors to stumble over their holdings on some lonely information by-way,  archives are starting to push their content out into the bustling metropolis of the social web. They are going where the people are. Photographic collections, in particular, are gaining new lives and new audiences thanks to Flickr.</p>
<p>But that&#8217;s only part of the story. Released into the wild, these photos are slowly picking up the habits of the locals. They are making friends, building connections, even speaking with new accents and dialects. Commented, tagged, organised, linked – they are building new contexts for themselves outside of the cloying control of archival descriptive systems.</p>
<p>Unfortunately it seems there is often a chasm between the old lives of the photos, documented in databases and finding aids, and their new post-institutional careers. This is a pity because the new contexts they are gathering can help us both understand and find them. What can we do to overcome this divide? How could finding aids harvest and display the user-generated content that aggregates around collection items living in the outside world?</p>
<p>The good news is that the tools to start doing this already exist – Flickr has a <a href="http://www.flickr.com/services/api/">powerful API</a> that makes it easy to extract photo metadata. Time for a bit of experimenting&#8230;<span id="more-670"></span></p>
<p>The first result is a <a href="http://userscripts.org/scripts/show/56135">userscript that displays Flickr comments</a> in a number of collection databases. Just <a href="http://userscripts.org/about/installing">install it</a> and then try it out:</p>
<ul>
<li>National Archives of Australia Photosearch &#8211; <a href="http://naa12.naa.gov.au/scripts/SearchOld.asp?O=PSI&amp;Number=7802286">try it!</a></li>
<li>State Records NSW Photo Investigator &#8211; <a href="http://investigator.records.nsw.gov.au/asp/photosearch/photo.asp?4481_a026_000090">try it!</a></li>
<li>National Archives and Records Administration ARC &#8211; <a href="http://arcweb.archives.gov/arc/action/ExternalIdSearch?id=522882">try it!</a></li>
</ul>
<div id="attachment_697" class="wp-caption aligncenter" style="width: 310px"><a href="http://discontents.com.au/wp-content/uploads/2009/08/photosearch.png"><img class="size-medium wp-image-697" title="Flickr comments in PhotoSearch" src="http://discontents.com.au/wp-content/uploads/2009/08/photosearch-300x199.png" alt="Flickr comments in PhotoSearch" width="300" height="199" /></a><p class="wp-caption-text">Flickr comments in PhotoSearch</p></div>
<p>Gory details follow&#8230;</p>
<p>So to begin with I thought I&#8217;d just harvest comments from Flickr and display them within existing collection interfaces. As before (<a href="http://discontents.com.au/shoebox/archives-shoebox/archives-in-3d">here</a> and <a href="http://discontents.com.au/shoebox/archives-shoebox/moa-buttons-galore">here</a>), <a href="https://addons.mozilla.org/firefox/addon/748">Greasemonkey</a> was my tool of choice for hacking finding aids. The plan was to trigger a Greasemonkey script when you arrive at a photo in a collection database. The script would then:</p>
<ul>
<li>extract a unique identifier for the photo that could be used to find it in Flickr</li>
<li>send off a request through the Flickr API to see if the photo was there</li>
<li>if so, then fire off another request to retrieve any comments</li>
<li>format the comments and insert them at a suitable point in the DOM of the database page</li>
</ul>
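<p>Steps two and three map onto two Flickr API methods, <code>flickr.photos.search</code> and <code>flickr.photos.comments.getList</code>. The sketch below only builds the request URLs and reads the search response, so it can run outside the browser; in the userscript each URL would be handed to <code>GM_xmlhttpRequest</code>. The API key is a placeholder, and searching the photo&#8217;s free text for the identifier is an assumption about how each institution records it.</p>

```javascript
// Sketch of steps 2 and 3 as Flickr REST calls. The API key is a placeholder;
// using a free-text search on the identifier is an assumption.
const FLICKR_REST = 'https://api.flickr.com/services/rest/';

function flickrUrl(method, params, apiKey) {
  const query = new URLSearchParams({
    method,
    api_key: apiKey,
    format: 'json',
    nojsoncallback: '1', // plain JSON rather than a JSONP callback
    ...params,
  });
  return `${FLICKR_REST}?${query.toString()}`;
}

// Step 2: look for the photo in the institution's photostream.
function searchUrl(flickrId, identifier, apiKey) {
  return flickrUrl('flickr.photos.search', { user_id: flickrId, text: identifier }, apiKey);
}

// Step 3: fetch the comments for a matching photo.
function commentsUrl(photoId, apiKey) {
  return flickrUrl('flickr.photos.comments.getList', { photo_id: photoId }, apiKey);
}

// Return the first photo from a flickr.photos.search response, or null.
function firstPhoto(searchResponse) {
  const photos = searchResponse?.photos?.photo ?? [];
  return photos.length > 0 ? photos[0] : null;
}

const url = searchUrl('35740357@N03', '522882', 'YOUR_API_KEY');
console.log(new URL(url).searchParams.get('method')); // flickr.photos.search
```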
<p>Easy! Obviously for the script to work there needed to be a way of connecting entries in the database with photos on Flickr. In practice this means that the photos need to be described at item level, and that a unique identifier needs to be used somewhere in the description of the photo both on Flickr and in the collection database.</p>
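<p>Free-text search results are not guaranteed to be the right item, so it may be worth checking a hit before showing its comments. A minimal sketch, assuming the photo object carries <code>title</code> and, when requested through the search <code>extras</code> parameter, <code>description._content</code>; the helper names are hypothetical.</p>

```javascript
// Sketch: confirm that a Flickr search hit mentions the identifier as a whole
// token in its title or description. The photo shape is an assumption.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

function mentionsIdentifier(photo, identifier) {
  const haystack = `${photo.title ?? ''} ${photo.description?._content ?? ''}`;
  // Require a non-word character (or the string edge) on both sides,
  // so that '1234' does not match inside '12345'.
  const pattern = new RegExp(`(^|[^\\w])${escapeRegExp(identifier)}([^\\w]|$)`);
  return pattern.test(haystack);
}

console.log(mentionsIdentifier({ title: 'Barcode 1234, Sydney' }, '1234')); // true
console.log(mentionsIdentifier({ title: 'Barcode 12345' }, '1234')); // false
```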
<p>Any archive that meets these criteria is a candidate for inclusion. Only three pieces of information are necessary:</p>
<ul>
<li>the institution&#8217;s Flickr id</li>
<li>an expression to extract the identifier from the database page</li>
<li>an expression to identify the point on the database page at which the comments should be inserted</li>
</ul>
<p>The expressions could use XPath or regular expressions – whatever it takes to find the desired elements. I&#8217;m using <a href="http://jquery.com/">jQuery</a>, so that makes selecting elements a lot easier. For example, NARA ARC includes the item identifier in a div with the class &#8216;arcID&#8217;, so I just select that element using jQuery and then use regex matching to pull out the number:</p>
<pre class="brush: javascript">this.identifier = $(&#039;.arcID&#039;).text().match(/ARC Identifier (\d+)/i)[1];</pre>
<p>To start with I&#8217;ve included the databases of three institutions:</p>
<ul>
<li>the National Archives of Australia&#8217;s <a href="http://naa.gov.au/collection/photosearch/index.aspx">PhotoSearch</a> database</li>
<li>State Records of NSW&#8217;s <a href="http://investigator.records.nsw.gov.au/asp/photosearch/introduction.htm">Photo Investigator</a></li>
<li>the US National Archives and Records Administration&#8217;s <a href="http://www.archives.gov/research/arc/">Archival Research Catalog</a></li>
</ul>
<p>This is the code to save the settings for each institution:</p>
<pre class="brush: javascript">
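// Work out which collection database this page belongs to, then save its settings:
// name (a label), identifier (the item id to look for on Flickr), flickrId (the
// Flickr account of the institution) and position (a jQuery selector for where
// the comments should be inserted).
if (document.location.href.match(/naa\.gov\.au\/scripts\/PhotoSearchItemDetail\.asp/i)) {
// NAA PhotoSearch: the item id is the B parameter in the page URL
this.name = &#039;NAA&#039;;
this.identifier = document.location.href.match(/M=0&amp;B=(\d+)/)[1];
this.flickrId = &#039;24849862@N08&#039;;
this.position = &#039;table:last&#039;;
} else if (document.location.href.match(/records\.nsw\.gov\.au\/asp\/photosearch\/photo\.asp\?/i)) {
// State Records NSW Photo Investigator: the item id is the query string itself
this.name = &#039;StateRecordsNSW&#039;;
this.identifier = document.location.href.match(/photo\.asp\?([\d\w_]+)/i)[1];
this.flickrId = &#039;27331537@N06&#039;;
this.position = &#039;table:first&#039;;
} else if (document.location.href.match(/arcweb\.archives\.gov\/arc\/action\/ShowFullRecord|arcweb\.archives\.gov\/arc\/action\/ExternalIdSearch/i)) {
// NARA ARC: the item id is shown in the page text, inside the .arcID element
this.name = &#039;NARA&#039;;
this.identifier = $(&#039;.arcID&#039;).text().match(/ARC Identifier (\d+)/i)[1];
this.flickrId = &#039;35740357@N03&#039;;
this.position = &#039;.genPad:first&#039;;
}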
</pre>
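<p>As more databases are added, the same settings could be kept in a lookup table instead of an if/else chain, so that each new institution is a single entry. A sketch reusing the values from the code above; here <code>getIdentifier</code> receives the page text as a plain string (standing in for <code>$('.arcID').text()</code>), and the hypothetical <code>detectSite</code> returns <code>null</code> rather than throwing when a page does not match.</p>

```javascript
// Sketch: per-institution settings as data. Patterns, Flickr ids and positions
// are copied from the settings code above; the field names are hypothetical.
const SITES = [
  {
    name: 'NAA',
    urlPattern: /naa\.gov\.au\/scripts\/PhotoSearchItemDetail\.asp/i,
    flickrId: '24849862@N08',
    position: 'table:last',
    getIdentifier: (href) => href.match(/M=0&B=(\d+)/)?.[1],
  },
  {
    name: 'StateRecordsNSW',
    urlPattern: /records\.nsw\.gov\.au\/asp\/photosearch\/photo\.asp\?/i,
    flickrId: '27331537@N06',
    position: 'table:first',
    getIdentifier: (href) => href.match(/photo\.asp\?(\w+)/i)?.[1],
  },
  {
    name: 'NARA',
    urlPattern: /arcweb\.archives\.gov\/arc\/action\/(ShowFullRecord|ExternalIdSearch)/i,
    flickrId: '35740357@N03',
    position: '.genPad:first',
    // On NARA pages the identifier is read from the page text, not the URL.
    getIdentifier: (href, pageText) => pageText.match(/ARC Identifier (\d+)/i)?.[1],
  },
];

// Return the settings for a page, or null if the page is not a supported
// item page or no identifier can be found on it.
function detectSite(href, pageText = '') {
  const site = SITES.find((s) => s.urlPattern.test(href));
  if (!site) return null;
  const identifier = site.getIdentifier(href, pageText);
  if (!identifier) return null;
  return { name: site.name, flickrId: site.flickrId, position: site.position, identifier };
}

const nara = detectSite('http://arcweb.archives.gov/arc/action/ExternalIdSearch?id=522882', 'ARC Identifier 522882');
console.log(nara.identifier); // 522882
```

<p>In the userscript itself the call would be something like <code>detectSite(document.location.href, $('.arcID').text())</code>.</p>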
<p>From there it&#8217;s just a matter of building the calls to the API using Greasemonkey&#8217;s built-in GM_xmlhttpRequest method. Once the comments are retrieved, they&#8217;re given some basic formatting and inserted at the point in the DOM identified by the siteDetails.position property. Once again, jQuery greatly simplifies all the DOM manipulation. If there are no comments, a suitable message is inserted together with a link to the photo on Flickr. Finally, some CSS is added to prettify it all a little.</p>
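<p>The formatting step can be kept separate from the DOM work. A minimal sketch that turns a <code>flickr.photos.comments.getList</code> response into display-ready entries plus a link to the photo for the &#8220;no comments&#8221; case; the function name, output shape and message wording are hypothetical, and inserting the result at <code>siteDetails.position</code> with jQuery is left out.</p>

```javascript
// Sketch: prepare Flickr comments for display. Output shape and wording are
// hypothetical; DOM insertion (e.g. with jQuery) is not shown.
function photoPageUrl(photo) {
  // Flickr photo pages follow the pattern /photos/<owner>/<photo id>.
  return `https://www.flickr.com/photos/${photo.owner}/${photo.id}`;
}

function summariseComments(response, photo) {
  // A photo without comments may have no `comment` array, so default to [].
  const comments = response?.comments?.comment ?? [];
  const url = photoPageUrl(photo);
  if (comments.length === 0) {
    return { url, entries: [], message: 'No comments yet. Add one on Flickr.' };
  }
  const entries = comments.map((c) => ({
    author: c.authorname,
    // datecreate is a Unix timestamp in seconds, sent as a string.
    date: new Date(Number(c.datecreate) * 1000).toISOString().slice(0, 10),
    text: c._content,
  }));
  return { url, entries, message: `${entries.length} comment(s) on Flickr` };
}

const summary = summariseComments(
  { comments: { photo_id: '42', comment: [{ authorname: 'alice', datecreate: '0', _content: 'Hello' }] }, stat: 'ok' },
  { owner: '35740357@N03', id: '42' },
);
console.log(summary.message); // 1 comment(s) on Flickr
```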
<p>You can <a href="http://userscripts.org/scripts/review/56135">view the full code</a> on the Userscripts site.</p>
<p>Of course, it would be good to have this sort of stuff happening on the server side. In fact, with a few small modifications, this script could just be dropped into the code of any of the collection databases I&#8217;ve used. But in the meantime, Greasemonkey gives us a chance to play around with some of the possibilities – to start thinking about what finding aids might be like.</p>
<p>So what&#8217;s next? I&#8217;d like to do some playing around with tags and locations, perhaps using them to suggest related photos. I&#8217;ve also just realised that Flickr machine tags allow semantic markup&#8230; hmmm&#8230;</p>
<p>If you have any suggestions for databases to add to this script – let me know!</p>
]]></content:encoded>
			<wfw:commentRss>http://discontents.com.au/shoebox/archives-shoebox/harvesting-context-1/feed</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
	</channel>
</rss>

