<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>discontents &#187; php</title>
	<atom:link href="http://discontents.com.au/tag/php/feed" rel="self" type="application/rss+xml" />
	<link>http://discontents.com.au</link>
	<description>working for the triumph of content over form, ideas over control, people over systems</description>
	<lastBuildDate>Wed, 21 Jul 2010 23:24:54 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Some archives hacking</title>
		<link>http://discontents.com.au/shoebox/archives-shoebox/some-archives-hacking</link>
		<comments>http://discontents.com.au/shoebox/archives-shoebox/some-archives-hacking#comments</comments>
		<pubDate>Thu, 05 Nov 2009 00:31:07 +0000</pubDate>
		<dc:creator>tim</dc:creator>
				<category><![CDATA[archives]]></category>
		<category><![CDATA[hacks]]></category>
		<category><![CDATA[govhack]]></category>
		<category><![CDATA[mashup]]></category>
		<category><![CDATA[metadata]]></category>
		<category><![CDATA[php]]></category>
		<category><![CDATA[recordsearch]]></category>

		<guid isPermaLink="false">http://discontents.com.au/?p=727</guid>
		<description><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Some+archives+hacking&amp;rft.aulast=Sherratt&amp;rft.aufirst=Tim&amp;rft.subject=archives&amp;rft.subject=hacks&amp;rft.source=discontents&amp;rft.date=2009-11-05&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://discontents.com.au/shoebox/archives-shoebox/some-archives-hacking&amp;rft.language=English"></span>

It&#8217;s great to see that the National Archives of Australia has released a large swag of data through the new data.australia.gov.au site. In the Commonwealth Agencies zip file you can find xml dumps of all the publicly accessible agency and series data in RecordSearch, as well as item data for series A1. This is the [...]]]></description>
			<content:encoded><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Some+archives+hacking&amp;rft.aulast=Sherratt&amp;rft.aufirst=Tim&amp;rft.subject=archives&amp;rft.subject=hacks&amp;rft.source=discontents&amp;rft.date=2009-11-05&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://discontents.com.au/shoebox/archives-shoebox/some-archives-hacking&amp;rft.language=English"></span>
<abbr class="unapi-id" title="http://discontents.com.au/?p=727"><!-- &nbsp; --></abbr>
<p>It&#8217;s great to see that the National Archives of Australia has released a large swag of data through the new <a href="http://data.australia.gov.au/">data.australia.gov.au</a> site. In the <a href="http://data.australia.gov.au/84">Commonwealth Agencies</a> zip file you can find XML dumps of all the publicly accessible agency and series data in RecordSearch, as well as item data for series A1. This is the same data that Mitchell Whitelaw visualised so brilliantly in his <a href="http://visiblearchive.blogspot.com/">Visible Archive</a> project. There&#8217;s also item data and images from series A3560 – the <a href="http://data.australia.gov.au/77">Mildenhall photographs of early Canberra</a>.</p>
<p>What&#8217;s even more exciting is that people are already using this data. At the recent GovHack event in Canberra the <a href="http://catherinestyles.com/2009/11/02/wtfgd-first-steps/">What The Federal Government Does</a> team worked on visualising the activities of government by using functions data pulled from the agencies file. Another group has generated a really nice <a href="http://mildenhall.creativepossums.net/">tag cloud and photo gallery</a> from the Mildenhall data. With further GovHack sessions to follow and the <a href="http://mashupaustralia.org/">MashupAustralia</a> contest open until 13 November, let&#8217;s hope for some more inspired archives hacking.</p>
<p>Seeing RecordSearch data out in the world like this reminded me of a little project I started a while back and then set aside. It was a simple PHP script that scraped data from RecordSearch and spat it out as either XML or JSON. Mitchell used a version of this script in his <a href="http://visiblearchive.blogspot.com/2009/08/exploring-a1-items-to-documents.html">A1 Explorer</a> to find out the number of pages in each digitised file.</p>
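<p>The original PHP script isn&#8217;t reproduced here, but the basic idea of rendering one scraped record as either XML or JSON can be sketched in Python using only the standard library. This is a minimal illustration: the <code>render</code> function, the <code>item</code> root tag and the field names are assumptions, not the script&#8217;s actual output format.</p>

```python
import json
import xml.etree.ElementTree as ET


def render(record, fmt, root_tag="item"):
    """Serialise a flat dict of scraped fields as a JSON or XML string.

    The root tag and field names are illustrative assumptions, not the
    original script's schema.
    """
    if fmt == "json":
        return json.dumps(record, sort_keys=True)
    if fmt == "xml":
        root = ET.Element(root_tag)
        for key, value in record.items():
            # Each field becomes a child element holding its value as text.
            ET.SubElement(root, key).text = str(value)
        return ET.tostring(root, encoding="unicode")
    raise ValueError("unsupported format: " + fmt)


# Values taken from the example item B2455 mentioned in this post.
record = {"barcode": "3445411", "series": "B2455", "title": "WRAGGE C L E"}
print(render(record, "json"))
print(render(record, "xml"))
```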
<p>I&#8217;ve now expanded and improved the script so that it provides data on items, series, agencies and persons. The output includes all the basic fields as well as links between entities, such as related series and controlling agencies. As an added bonus you also get some useful totals (where they&#8217;re available): items include the number of pages, series include the number of items described on RecordSearch, and agencies include the number of series recorded. I&#8217;ve also fiddled with mod_rewrite to provide a more RESTful interface.</p>
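<p>One way to picture the output described above is a single record type shared by all four entity types, with a dictionary of links to other entities and an optional total whose meaning depends on the type. The Python sketch below is hypothetical: the <code>Entity</code> class, its field names and any relation names you pass to <code>add_link</code> are assumptions for illustration, not the wrapper&#8217;s actual schema.</p>

```python
from dataclasses import asdict, dataclass, field
from typing import Dict, List, Optional

# What the optional total counts for each entity type, following the post:
# pages for items, described items for series, recorded series for agencies.
TOTAL_MEANING = {"item": "pages", "series": "items", "agency": "series"}


@dataclass
class Entity:
    """Hypothetical record shape; field names are assumptions, not the wrapper's."""

    kind: str        # "item", "series", "agency" or "person"
    identifier: str  # barcode, series number, CA number or CP number
    title: str
    # Links to other entities, keyed by a relation name.
    links: Dict[str, List[str]] = field(default_factory=dict)
    total: Optional[int] = None  # left as None where no count is available

    def add_link(self, relation, other_id):
        """Record a link such as a related series or a controlling agency."""
        self.links.setdefault(relation, []).append(other_id)

    def total_label(self):
        """Describe the total in words, or return None if there is none."""
        meaning = TOTAL_MEANING.get(self.kind)
        if meaning is None or self.total is None:
            return None
        return f"{self.total} {meaning}"

    def to_dict(self):
        """Plain dict form, ready for JSON or XML serialisation."""
        return asdict(self)
```

<p>Persons have no entry in <code>TOTAL_MEANING</code>, so <code>total_label</code> always returns <code>None</code> for them, matching the list of totals above.</p>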
<p>For XML output, use the URL <strong>http://discontents.com.au/shed/rs/xml/</strong> followed by the appropriate identifier: a barcode for an item, a CA number for an agency, a CP number for a person or a series number for a series.</p>
<p>Some examples:</p>
<ul>
<li> Series A1 – <a href="http://discontents.com.au/shed/rs/xml/a1">http://discontents.com.au/shed/rs/xml/a1</a></li>
<li>Item B2455, WRAGGE C L E – <a href="http://discontents.com.au/shed/rs/xml/3445411">http://discontents.com.au/shed/rs/xml/3445411</a></li>
<li>CSIR Head Office – <a href="http://discontents.com.au/shed/rs/xml/CA+486">http://discontents.com.au/shed/rs/xml/CA+486</a></li>
<li>Alfred Deakin – <a href="http://discontents.com.au/shed/rs/xml/CP+9">http://discontents.com.au/shed/rs/xml/CP+9</a></li>
</ul>
<p>As you might have guessed, to get JSON output you just substitute &#8216;json&#8217; for &#8216;xml&#8217; in the URL.</p>
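<p>The URL scheme is simple enough to build programmatically. The Python sketch below assumes, as the examples suggest, that barcodes are purely numeric, that agency and person identifiers take the form CA n and CP n, and that anything else is a series number. <code>identifier_kind</code> and <code>wrapper_url</code> are hypothetical helper names, and the type guessing is not necessarily how the wrapper itself decides.</p>

```python
import re
from urllib.parse import quote_plus

# Base address of the wrapper, as given in the post.
BASE_URL = "http://discontents.com.au/shed/rs/"


def identifier_kind(identifier):
    """Guess the entity type from the shape of an identifier (an assumption)."""
    ident = identifier.strip().upper()
    if ident.isdigit():
        return "item"  # assumes barcodes are always purely numeric
    if re.fullmatch(r"CA\s*\d+", ident):
        return "agency"
    if re.fullmatch(r"CP\s*\d+", ident):
        return "person"
    return "series"  # anything else is treated as a series number


def wrapper_url(identifier, fmt="xml"):
    """Build a wrapper URL; fmt is 'xml' or 'json'."""
    if fmt not in ("xml", "json"):
        raise ValueError("fmt must be 'xml' or 'json'")
    # quote_plus encodes the space in "CA 486" as "+", as in the example URLs.
    return BASE_URL + fmt + "/" + quote_plus(identifier.strip())


print(identifier_kind("CA 486"), wrapper_url("CA 486"))
# agency http://discontents.com.au/shed/rs/xml/CA+486
print(identifier_kind("CP 9"), wrapper_url("CP 9", "json"))
# person http://discontents.com.au/shed/rs/json/CP+9
```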
<p>Because it depends on screen scraping, the script is inherently a bit fragile, but I&#8217;m hoping it might be of some use. My intention was to use it to start exploring some new ways of using and interacting with the data. The code itself is <a href="http://bitbucket.org/wragge/rswrapper/">available at BitBucket</a>. It&#8217;s not very elegant, but I don&#8217;t want to spend much time cleaning it up at the moment. If it seems like it might be useful, I&#8217;ll probably rewrite the whole thing in Python and publish it through Google App Engine.</p>
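<p>To illustrate why scraping is fragile, here is a minimal Python sketch that pulls label/value pairs out of HTML table rows with the standard library&#8217;s <code>html.parser</code>. RecordSearch&#8217;s actual page markup is not described here, so the two-column table layout and the <code>RowCollector</code> and <code>scrape_fields</code> names are assumptions. Missing labels come back as <code>None</code> rather than raising an error, so a change to the page layout shows up as empty fields instead of a crash, though someone still has to notice.</p>

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional


class RowCollector(HTMLParser):
    """Collect the stripped text of each cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows: List[List[str]] = []
        self._cell: Optional[List[str]] = None  # text fragments of the open cell

    def _flush(self):
        # Close the current cell, even if the markup omitted its end tag.
        if self._cell is not None:
            self.rows[-1].append("".join(self._cell).strip())
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._flush()
            self.rows.append([])
        elif tag in ("td", "th"):
            self._flush()
            if self.rows:  # ignore cells that sit outside any row
                self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th", "tr"):
            self._flush()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def close(self):
        super().close()
        self._flush()


def scrape_fields(page, wanted):
    """Map each wanted label to the text of the cell after it, or None if absent."""
    parser = RowCollector()
    parser.feed(page)
    parser.close()
    found: Dict[str, str] = {}
    for row in parser.rows:
        if len(row) >= 2:
            found.setdefault(row[0], row[1])  # keep the first match only
    return {name: found.get(name) for name in wanted}
```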
]]></content:encoded>
			<wfw:commentRss>http://discontents.com.au/shoebox/archives-shoebox/some-archives-hacking/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
