<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>discontents &#187; shoebox</title>
	<atom:link href="http://discontents.com.au/sections/shoebox/feed" rel="self" type="application/rss+xml" />
	<link>http://discontents.com.au</link>
	<description>working for the triumph of content over form, ideas over control, people over systems</description>
	<lastBuildDate>Wed, 16 May 2012 14:11:02 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.2</generator>
		<item>
		<title>Topic modelling in the archives</title>
		<link>http://discontents.com.au/shed/experiments/topic-modelling-in-the-archives</link>
		<comments>http://discontents.com.au/shed/experiments/topic-modelling-in-the-archives#comments</comments>
		<pubDate>Wed, 16 May 2012 14:11:02 +0000</pubDate>
		<dc:creator>tim</dc:creator>
				<category><![CDATA[digital humanities]]></category>
		<category><![CDATA[experiments]]></category>
		<category><![CDATA[archives]]></category>
		<category><![CDATA[invisibleaustralians]]></category>
		<category><![CDATA[topic modelling]]></category>
		<category><![CDATA[White Australia]]></category>

		<guid isPermaLink="false">http://discontents.com.au/?p=1709</guid>
		<description><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Topic+modelling+in+the+archives&amp;rft.aulast=Sherratt&amp;rft.aufirst=Tim&amp;rft.subject=digital+humanities&amp;rft.subject=experiments&amp;rft.source=discontents&amp;rft.date=2012-05-17&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://discontents.com.au/shed/experiments/topic-modelling-in-the-archives&amp;rft.language=English"></span>
There seems to be a lot of topic modelling going on at the moment. And why not? Projects like Mining the Dispatch are demonstrating the possibilities. Tools like Mallet are making it easy. And generous DHers like Ted Underwood and Scott Weingart are doing a great job explaining what it is and how it works. [...]]]></description>
			<content:encoded><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Topic+modelling+in+the+archives&amp;rft.aulast=Sherratt&amp;rft.aufirst=Tim&amp;rft.subject=digital+humanities&amp;rft.subject=experiments&amp;rft.source=discontents&amp;rft.date=2012-05-17&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://discontents.com.au/shed/experiments/topic-modelling-in-the-archives&amp;rft.language=English"></span>
<abbr class="unapi-id" title="http://discontents.com.au/?p=1709"><!-- &nbsp; --></abbr>
<p>There seems to be a lot of topic modelling going on at the moment. And why not? Projects like <a href="http://dsl.richmond.edu/dispatch/">Mining the Dispatch</a> are demonstrating the possibilities. Tools like <a href="http://mallet.cs.umass.edu/index.php">Mallet</a> are making it easy. And generous DHers like <a href="http://tedunderwood.wordpress.com/2012/04/07/topic-modeling-made-just-simple-enough/">Ted Underwood</a> and <a href="http://www.scottbot.net/HIAL/?p=221">Scott Weingart</a> are doing a great job explaining what it is and how it works.</p>
<p>I&#8217;ve talked briefly about using topic modelling to <a title="Mining the treasures of Trove" href="http://discontents.com.au/words/conference-papers/mining-the-treasures-of-trove">explore digitised newspapers</a>, something that the <a href="http://mappingtexts.org/">Mapping Texts</a> project has also been investigating. But I&#8217;ve also been following with interest Chad Black&#8217;s <a href="http://parezcoydigo.wordpress.com/2011/09/23/an-algorithmic-approach-to-legal-culture-in-the-early-modern-spanish-empire/">use of algorithmic techniques</a>, including topic modelling, to look for local variations amidst the legal system of the early modern Spanish empire.</p>
<p>As part of the <a href="http://invisibleaustralians.org/">Invisible Australians</a> project, Kate and I are <a href="http://invisibleaustralians.org/blog/2011/12/inside-the-bureaucracy-of-white-australia/">exploring the bureaucracy</a> of the White Australia Policy. In particular, we&#8217;re interested in the interaction between policy and practice, between the highly-centralised bureaucracy and the activities of individual port officials. Like Chad, we&#8217;re interested in mapping local variations &#8212; to try and understand the bureaucracy from the point of view of an individual forced to live within its restrictions.</p>
<p>I recently gave a presentation about the project at Digital Humanities Australasia (post coming soon!), and in preparation I decided to try a few topic modelling experiments. They were very simple, but I was impressed by the possibilities for exploring archival systems.</p>
<p>The problem I started with was this. The workings of the White Australia Policy are well documented by records held by the <a href="http://naa.gov.au">National Archives of Australia</a>. Some series within the archives are specifically related to the operations of the policy &#8212; such as those containing <a href="http://invisibleaustralians.org/blog/2010/08/collecting-cedt-applications-and-certificates/">many thousands of CEDTs</a>. But there are also general correspondence series created by the customs offices in each state, as well as the Commonwealth Department of External Affairs which administered the Immigration Restriction Act (responsibility was later taken by the Department of Home and Territories and its successors). These general correspondence series are important, because they often include details of difficult or controversial cases &#8212; those that required a policy judgment, or prompted a change in existing practices. But how do you find relevant files within series that can contain large numbers of items?</p>
<p><a href="http://www.naa.gov.au/cgi-bin/Search?Number=A1">Series A1</a>, for example, is a correspondence series created by the Department of External Affairs. It contains more than 60,000 items. Past research tells us that amongst these 60,000 files are records of important policy discussions relating to White Australia. But these files tend to be labelled with the names of the people involved, so unless you know the names in advance they can be difficult to find.</p>
<p>Mitchell Whitelaw&#8217;s <a href="http://visiblearchive.blogspot.com.au/2009/08/exploring-a1-items-to-documents.html">A1 Explorer</a>, part of the <a href="http://visiblearchive.blogspot.com.au/">Visible Archive project</a>, lets you explore the contents of Series A1 in an easy and engaging way. But while the A1 Explorer provides new opportunities for discovery, it doesn&#8217;t offer the fine-grained analysis we need to sift out the files we&#8217;re after. And so&#8230; topic modelling.</p>
<p>The process was pretty simple. While I can dip into my bag of screen-scrapers to harvest series directly from the NAA&#8217;s <a href="http://www.naa.gov.au/collection/using/search/">RecordSearch</a> database, there was already an <a href="http://data.gov.au/dataset/commonwealth-agencies/">XML dump of A1</a> available from data.gov.au. So I extracted the basic file metadata from the XML and wrote the identifiers and titles out to a text file, one item per line. Following <a href="http://mallet.cs.umass.edu/import.php">the instructions on the website</a> I then loaded this file into Mallet:</p>
<pre class="brush: bash; gutter: true; first-line: 1; highlight: []; html-script: false">/Applications/Mallet/bin/mallet import-file --input ./A1.txt --output A1.mallet --keep-sequence --remove-stopwords</pre>
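<p>As a rough illustration only (not the script used for this experiment), the step of turning the XML into a one-item-per-line text file could be sketched in Python like this. The element names <code>item</code>, <code>identifier</code> and <code>title</code> are placeholders, since the real structure of the data.gov.au dump isn&#8217;t described here, and the function name is made up. Separating the identifier and title with a tab is also just an assumption.</p>
<pre class="brush: python; gutter: true; first-line: 1; highlight: []; html-script: false">import xml.etree.ElementTree as ET


def write_mallet_input(xml_path, out_path, item_tag="item",
                       id_tag="identifier", title_tag="title"):
    """Write one 'identifier TAB title' line per item; return the number written.

    The tag names are hypothetical defaults, to be adjusted to the real XML.
    """
    count = 0
    with open(out_path, "w", encoding="utf-8") as out:
        # iterparse streams the file, so a large dump need not fit in memory
        for _event, elem in ET.iterparse(xml_path, events=("end",)):
            if elem.tag != item_tag:
                continue
            identifier = (elem.findtext(id_tag) or "").strip()
            # collapse line breaks and repeated spaces inside the title
            title = " ".join((elem.findtext(title_tag) or "").split())
            if identifier and title:
                out.write(identifier + "\t" + title + "\n")
                count += 1
            elem.clear()  # release the item once it has been written
    return count</pre>
<p>Something like <code>write_mallet_input('A1.xml', 'A1.txt')</code> would then produce the kind of file that gets passed to <code>import-file</code> above.</p>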
<p>Then it was just a matter of firing up the topic modeller:</p>
<pre class="brush: bash; gutter: true; first-line: 1; highlight: []; html-script: false">/Applications/Mallet/bin/mallet train-topics --input ./A1.mallet --output-state ./A1.gz --output-doc-topics ./A1-topics.txt --output-topic-keys ./A1-keys.txt --num-topics 40</pre>
<p>Again, I just <a href="http://mallet.cs.umass.edu/topics.php">followed the examples</a> on the Mallet site.</p>
<p>Once it was finished I opened up <a href="http://discontents.com.au/wp-content/uploads/2012/05/A1-keys.txt">A1-keys.txt</a> to browse the &#8216;topics&#8217; Mallet had found. The results were intriguing. There are a large number of applications for naturalisation in A1, so it&#8217;s no surprise that &#8216;naturalisation&#8217; figures prominently in a number of the topics. What was more interesting was the way Mallet had grouped the naturalisation files. For example:</p>
<p><code>naturalization christian hans hansen jensen petersen andersen nielsen larsen christensen johannes jens niels pedersen andreas johansen martin jorgensen</code></p>
<p>and</p>
<p><code>naturalisation certificate giuseppe salvatore frank la leo samios spina sorbello leonardo fisher natale patane torrisi barbagallo luka rossi ross</code></p>
<p>Based on the co-occurrence of names within the file titles, Mallet had created groupings that roughly reflected the ethnic origins of applicants. It makes sense when you think about what Mallet is doing, but I still found it pretty amazing.</p>
<p>Mallet also found clusters around the major activities of the department, such as the administration of the territories. But of most interest to us was:</p>
<p><code>1	0.55539	passport ah student exemption students lee wong chinese young deserter education sing wing chong readmission son hing chin wife</code></p>
<p>The Chinese names alongside words such as &#8216;readmission&#8217; and &#8216;wife&#8217; suggested that this topic revolved around the administration of the White Australia Policy. This was easy to test. In A1-topics.txt was a list of every file in the series and their weightings in relation to each of the topics. I wasn&#8217;t sure what was a reasonable cut-off value to use in assessing the weightings, but after a bit of trial and error I fixed on a value of 0.7. I then just extracted the identifiers of every file that had a weighting greater than 0.7 for this topic. I used the identifiers to build <a href="http://wraggelabs.com/shed/naa/">a simple web page</a> that Kate and I could browse. I also included links back to RecordSearch so we could explore further.</p>
<div id="attachment_1768" class="wp-caption aligncenter" style="width: 530px"><a href="http://wraggelabs.com/shed/naa/"><img src="http://discontents.com.au/wp-content/uploads/2012/05/Screen-Shot-2012-05-16-at-11.23.10-PM-520x224.png" alt="" title="Screen Shot 2012-05-16 at 11.23.10 PM" width="520" height="224" class="size-large wp-image-1768" /></a><p class="wp-caption-text">Browse the full list</p></div>
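<p>The filtering step described above can be sketched in a few lines of Python. This is only a sketch under some assumptions: that the doc-topics file starts with a header line beginning with <code>#</code>, and that each following line holds a document number, the document name, and then alternating topic number and weight values. The layout differs between Mallet versions, so it&#8217;s worth checking your own output first. The function name is made up for illustration.</p>
<pre class="brush: python; gutter: true; first-line: 1; highlight: []; html-script: false">def files_for_topic(doc_topics_path, topic, cutoff=0.7):
    """Return (name, weight) pairs for documents weighted above cutoff on topic."""
    matches = []
    with open(doc_topics_path, encoding="utf-8") as f:
        for line in f:
            # skip the header line and any blank lines
            if line.startswith("#") or not line.strip():
                continue
            fields = line.split()
            name = fields[1]
            pairs = fields[2:]
            # assumed layout: topic number, weight, topic number, weight, ...
            for topic_id, weight in zip(pairs[0::2], pairs[1::2]):
                if int(topic_id) == topic and float(weight) > cutoff:
                    matches.append((name, float(weight)))
    # strongest matches first
    return sorted(matches, key=lambda match: match[1], reverse=True)</pre>
<p>Calling <code>files_for_topic('A1-topics.txt', 1)</code> would return the names of the files weighted above 0.7 for topic 1, ready to be turned into a list of links.</p>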
<p>It&#8217;s a pretty impressive result. Instead of fumbling with the uncertainties of keyword searches, we now have a list of more than 1,300 files that are clearly of relevance to <a href="http://invisibleaustralians.org">Invisible Australians</a>. There are a few false positives and there are likely to be other files that we&#8217;ll have missed altogether, but now we have a much clearer picture of the types of files that are included and how they are described.</p>
<p>And that was at my first attempt, simply using the default settings. I&#8217;m now starting to play around with some of Mallet&#8217;s configuration options to see what sort of difference they make. I&#8217;m also keen to try out <a href="http://radimrehurek.com/gensim/">GenSim</a>, a topic modelling package for Python.</p>
<p>I&#8217;m really excited about the possibilities of these sorts of tools for analysing the contents of archival descriptive systems, something I mentioned in my Digital Humanities Australasia paper. Much more to come on this I suspect&#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://discontents.com.au/shed/experiments/topic-modelling-in-the-archives/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Local heroes</title>
		<link>http://discontents.com.au/words/articles/local-heroes</link>
		<comments>http://discontents.com.au/words/articles/local-heroes#comments</comments>
		<pubDate>Sat, 12 May 2012 12:32:55 +0000</pubDate>
		<dc:creator>tim</dc:creator>
				<category><![CDATA[archives]]></category>
		<category><![CDATA[articles and book chapters]]></category>
		<category><![CDATA[crowdsourcing]]></category>
		<category><![CDATA[Mapping our Anzacs]]></category>
		<category><![CDATA[mashup]]></category>
		<category><![CDATA[World War I]]></category>

		<guid isPermaLink="false">http://discontents.com.au/?p=1726</guid>
		<description><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Local+heroes&amp;rft.aulast=Sherratt&amp;rft.aufirst=Tim&amp;rft.subject=archives&amp;rft.subject=articles+and+book+chapters&amp;rft.source=discontents&amp;rft.date=2012-05-12&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://discontents.com.au/words/articles/local-heroes&amp;rft.language=English"></span>
Earlier this week it was announced that the Mosman Library had been awarded a Library Development Grant for an innovative project that aims to document stories and artefacts relating to the First World War. I&#8217;m very excited to be part of it. As well as working with the local community in the creation of a new resource, [...]]]></description>
			<content:encoded><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Local+heroes&amp;rft.aulast=Sherratt&amp;rft.aufirst=Tim&amp;rft.subject=archives&amp;rft.subject=articles+and+book+chapters&amp;rft.source=discontents&amp;rft.date=2012-05-12&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://discontents.com.au/words/articles/local-heroes&amp;rft.language=English"></span>
<abbr class="unapi-id" title="http://discontents.com.au/?p=1726"><!-- &nbsp; --></abbr>
<p>Earlier this week it was announced that the Mosman Library had been awarded a Library Development Grant for an <a href="http://www.mosman.nsw.gov.au/news/2012/05/09/grant-for-great-war-project">innovative project</a> that aims to document stories and artefacts relating to the First World War. I&#8217;m very excited to be part of it. As well as working with the local community in the creation of a new resource, the project offers an interesting opportunity to explore how we can link in with the ever-increasing volume of WWI material <a href="http://lod-lam.net/summit/">being published as linked data</a> around the world.</p>
<p>But thinking about this new project has also made me reflect again on the creation of <a href="http://mappingouranzacs.naa.gov.au/">Mapping Our Anzacs</a> &#8212; a project that still fills me with great pride and immense frustration. I thought I might as well finally post a couple of things I wrote about the project back in 2009. They&#8217;re a bit out of date, but I think there are still a few useful lessons to be gleaned.</p>
<p>The first is a case-study that focuses on the crowdsourcing aspects of <em>Mapping Our Anzacs</em>. The second looks at the project as an example of a mashup. Thanks to <a href="http://archivesnext.com">Kate Theimer</a> for initiating and publishing both pieces.</p>
<p>&nbsp;</p>
<hr />
<h2>Bringing life to records</h2>
<p>2009 preprint version of case-study originally published in Kate Theimer (ed), <em>A Different Kind of Web: New Connections between Archives and Our Users</em>, Society of American Archivists, Chicago, 2011. [<a href="http://saa.archivists.org/4DCGI/store/item.html?Action=StoreItem&amp;Item=2218&amp;LoginPref=1">Order here</a>]</p>
<h5>Overview of repository</h5>
<p>The <a href="http://www.naa.gov.au/">National Archives of Australia</a> is responsible for preserving and making accessible the records of the Commonwealth of Australia. It employs more than 400 staff, with offices in Canberra and every state capital. Its holdings include more than 360 shelf km of records – around 69 million items. Through its digitisation program more than 1.6 million items have been fully digitised, making nearly 20 million digital images available online. The National Archives’ website now provides the main point of access for researchers, with more than 2 million images viewed through the online database <a href="http://www.naa.gov.au/collection/using/search/">RecordSearch</a> in the year 2007–8.</p>
<h5>Business drivers</h5>
<p>Most people now experience the collections of the National Archives of Australia online. With an obligation to provide ‘an accessible, and interpreted, national archival collection’ the Archives is looking to new technologies to enhance access and improve efficiency.</p>
<p>The idea for <em>Mapping our Anzacs</em> arose during planning for a <a href="http://www.naa.gov.au/visit-us/exhibitions/shell-shocked.aspx">travelling exhibition</a> on the impact of World War I, timed to coincide with the 90<sup>th</sup> anniversary of the war’s end. Public interest in commemorating Australia’s war effort was as strong as ever, so a website that encouraged local participation seemed a useful way of extending the exhibition and its accompanying education program.</p>
<p>The major focus of both the exhibition and the website was to be the <a href="http://www.naa.gov.au/collection/explore/defence/service-records/army-wwi.aspx">376,000 service records</a> documenting the experiences of Australian men and women during World War I held by the National Archives. These records had been fully digitised and described as part of a major project entitled ‘A Gift to the Nation’, but were still somewhat buried within our collection database.</p>
<p><em>Mapping our Anzacs</em> was intended to highlight these records and open them up to local communities. First a map interface would allow service records to be discovered by place of birth or enlistment. Secondly, users would be able to add tributes – online versions of the war memorials that remain a feature of just about every town, large or small.</p>
<p>While the exhibition and the records themselves provided the main drivers for the project, there was also a growing desire within the institution to explore some of the possibilities of Web 2.0 technologies. This desire was tempered somewhat by a range of familiar concerns centred on issues of authority and control. Would user contributions detract from the reliability of the records? Who would take responsibility for any errors in user-created content? Would the potential for abuse demand vigilant moderation? <em>Mapping our Anzacs</em> gave us a chance to start working through such issues.</p>
<h5>Setting the stage</h5>
<p>We had an idea, a budget and a launch date; what we needed was a plan. While in theory we had around six months to play with, the project had to be fitted in around the ongoing work of our small web team. On the content side we had one person cleaning up the data. At the technical end we had someone connecting up the various components and making it all work within the Archives’ web environment. In the middle there were two of us trying to marry content and technology and create a usable resource. While we had a range of useful skills, none of us had tackled a project quite like this. We all had to learn on the job.</p>
<p>With few models or examples to work from, we began to experiment – researching available technologies, throwing around possibilities. Our first efforts were largely focused on the map interface and before long we had a working prototype using javascript and Google Maps. But what we also needed was a better understanding of how users might interact with the site.</p>
<div id="attachment_1729" class="wp-caption alignright" style="width: 260px"><a href="http://discontents.com.au/wp-content/uploads/2012/05/chiltern-museum.jpg"><img class="size-medium wp-image-1729" title="chiltern-museum" src="http://discontents.com.au/wp-content/uploads/2012/05/chiltern-museum-250x333.jpg" alt="" width="250" height="333" /></a><p class="wp-caption-text">World War I Honour Roll at the Chiltern Atheneum Museum</p></div>
<p>We started from the idea of the online memorial – a list of names compiled by users that would be linked through to service records. Our example was a local historical society creating a site to commemorate their community’s war effort. But what if they had more information – photographs or family histories – how could this sort of material be incorporated? Further inspiration came from a visit to the local historical museum in the small Victorian town of Chiltern. On one wall was a typical roll of honour, listing the names of those who had served in the war. But underneath were framed portraits of many of those listed. They were people, not just records. Could we create something like this online?</p>
<p>There were some exciting possibilities emerging, but concerns remained. Would anybody actually want to contribute? Strong interest in family history and a growing community desire to commemorate the experience of World War I offered anecdotal support. We just had to ensure that this interest could be translated into engagement – that the barriers to participation were low enough to encourage visitors to become collaborators.</p>
<p>But what of concerns that such material might detract from the authority of the records, or open our institution up to liability? We needed to make it clear where public contributions began and archival data ended.</p>
<p>Welcoming, but separate; open, but managed – a tricky balancing act was required. The answer, we decided, was to create a separate ‘scrapbook’ using the blogging service Tumblr. The ‘scrapbook’ label was intended to be encouraging – this was not a database, or formal register, it was a place to leave your thoughts, comments, information or memorabilia. This was reinforced by our terms of service which simply required contributions to be relevant and respectful.</p>
<div id="attachment_1733" class="wp-caption alignright" style="width: 260px"><a href="http://discontents.com.au/wp-content/uploads/2012/05/moa_scrapbook_post.jpg"><img class="size-medium wp-image-1733" title="moa_scrapbook_post" src="http://discontents.com.au/wp-content/uploads/2012/05/moa_scrapbook_post-250x258.jpg" alt="" width="250" height="258" /></a><p class="wp-caption-text">Scrapbook post</p></div>
<p>A ‘scrapbook’ was also something quite different to a finding aid. The informality helped to make the boundary clear between record and response. The separation was physical as well as intellectual. While the scrapbook shared many of the design elements of the main site, it was hosted by Tumblr not the National Archives. By using the Tumblr API, however, it was easy to pass information between the two sites. We could also use the API to provide a basic moderation facility.</p>
<p>But this meant that an important part of the site’s functionality would be dependent on an outside service. To make sure we considered fully all the implications of this, we developed a risk analysis and contacted Tumblr staff to inform them of our plans. Our major concern was simply the continuity of the service. While there could be no guarantees, we judged that this risk was manageable. Tumblr staff were interested in the project and offered their assistance if necessary.</p>
<h5>Results</h5>
<div id="attachment_1730" class="wp-caption alignright" style="width: 260px"><a href="http://discontents.com.au/wp-content/uploads/2012/05/moa-cooliris.jpg"><img class="size-medium wp-image-1730" title="moa-cooliris" src="http://discontents.com.au/wp-content/uploads/2012/05/moa-cooliris-250x156.jpg" alt="" width="250" height="156" /></a><p class="wp-caption-text">Images from scrapbook posts viewed via a media RSS feed in CoolIris.</p></div>
<p>On 25 April each year, Anzac Day, Australians remember the sacrifices made in war. Over the Anzac Day weekend in 2009, we were astonished to receive more than 200 <a href="http://our-anzacs.tumblr.com/">scrapbook</a> posts. Of course we had expected an increase in use, particularly after the site was featured on the Australian version of the Today Show, but this remarkable response certainly confirmed the site’s success. In the six months since its launch there had been almost 94,000 visitors to <em>Mapping our Anzacs</em>. More than 1,000 scrapbook posts had been contributed and 280 tributes created.</p>
<p>But the greatest success was in <a href="http://our-anzacs.tumblr.com/archive">the type of posts</a> being contributed rather than their sheer volume. Our ‘scrapbook’ had proved to be just that – as well as photographs of service people and their families, there were pictures of medals, headstones, letters, newspaper clippings, pay books, identity disks, diaries, postcards and certificates. Some people simply commented ‘my grandfather’, while others wrote detailed accounts of family history. Perhaps most moving were those who took the opportunity to leave a message for their loved one: ‘You were the best dad’.</p>
<div id="attachment_1736" class="wp-caption alignright" style="width: 260px"><a href="http://discontents.com.au/wp-content/uploads/2012/05/moa_scrapbook_post51.jpg"><img class="size-medium wp-image-1736" title="moa_scrapbook_post5" src="http://discontents.com.au/wp-content/uploads/2012/05/moa_scrapbook_post51-250x236.jpg" alt="" width="250" height="236" /></a><p class="wp-caption-text">Scrapbook post</p></div>
<p>Some have taken a systematic approach. Our most frequent contributor is gradually attaching photographs of headstones and memorial plaques that she has gathered from local cemeteries. Others are posting their own contact details in the hope of linking up with family. Perhaps most interesting are the notes that provide links to other people or documents – to family members, for example, or to a later service record. These are helping build a rich web of contextual data. Equally valuable are the corrections and additions that are being offered by eagle-eyed users, pointing out transcription errors or helping us track down elusive locations.</p>
<p>The success of the scrapbook has somewhat overshadowed the tributes, or online memorials, which really provided our starting point. Many tributes have been created and, as we had hoped, schools and other groups are using them to document the impact of war on their local communities. However, some compromises at the implementation stage have meant that it is not as easy to build them as we had hoped. There has also been some confusion by users between the tributes and the scrapbook. This is one area of the site we certainly hope to improve.</p>
<p>Even though the digitised service records had been available online for some time through our collection database, it’s clear that many people are discovering them for the first time through <em>Mapping our Anzacs</em>. It was ‘a stunning find for me and my siblings’ wrote one grateful user. The scrapbook has aided discovery, providing another way into the records. Indeed, with the addition of a MediaRSS feed for CoolIris, the scrapbook provides two new entry points – one of them a 3D wall of faces and families. By embedding the records in these new contexts and making them easier to find, <em>Mapping our Anzacs</em> has successfully garnered extra value from an existing asset.</p>
<p>The site has also been recognised by others for its successful use of Web 2.0 technologies. We were pleased to be joint winners of the <a href="http://www.archivesnext.com/?p=270">Best Archives on the Web Award</a>, and surprised to be cited by the Federal Minister for Finance in a <a href="http://gov2.net.au/blog/2009/06/22/speech-launch-of-the-government-2-0-taskforce/index.html">speech launching a Gov 2.0 taskforce</a>. Recognition such as this has helped strengthen the case for future innovation in the Web 2.0 sphere.</p>
<h5>Challenges</h5>
<p>Success brings its own problems. One of the main challenges has been simply managing the sheer volume of posts and feedback. This was particularly acute of course after the Anzac Day deluge. As a result we have had to consider ways of streamlining our processes.</p>
<p>The Tumblr API allows us to set the status of a new post as ‘private’. We can then examine the post using the Tumblr dashboard before making it public. This works well enough as a basic form of moderation; however, the dashboard is not really designed for this purpose and it takes several clicks to release each post to the world.</p>
<p>But while moderation takes considerable time, it requires little intellectual effort. Despite concerns about abuse, our contributors have caused us few dilemmas. The only significant questions that have arisen concern the re-use of materials from other sources. This has made us consider whether pre-emptive moderation is necessary or appropriate.</p>
<p>While the site includes detailed help information, it’s clear from the feedback that there are certain aspects that continue to cause difficulty. This provides useful data on how the site might be improved, but it has also made us think about how we communicate with our users. At the moment the content we provide is fairly static – there is no way of informing visitors of recent updates, or developing quick guides to common problems. If we took a more active approach to communication we might be able to decrease the number of help requests, while building a greater sense of community.</p>
<p>Similarly, while we have been excited by the number of corrections submitted by users, we can now see ways in which we might have structured the feedback process to capture their corrections more easily and efficiently. For example, a ‘submit a correction’ link on each individual’s page could automatically capture the person’s details, saving both us and our contributors from potential confusion.</p>
<p>We have suffered through the expected number of software glitches, and have a growing list of things we’d like to improve or develop, but overall the experience has been much more rewarding than painful.</p>
<h5>Lessons Learned</h5>
<p>Perhaps the most valuable lessons revolve around trust. Having entered into the project uncertain of what to expect from public participation, we have found ourselves in an evolving, creative partnership. Our users have defined what the scrapbook is and have taken an active role in improving and developing the resource. Our trust has been repaid many times over, helping us build something that in many ways has exceeded our expectations.</p>
<p>Trust is also necessary in the support of new ideas. <em>Mapping our Anzacs</em> was a very different type of project for the National Archives, challenging ideas both of access and user engagement. By taking the risk we have not only gained valuable publicity and user support, we have opened up the realm of possibilities for future development.</p>
<p>In terms of technology, the project demonstrated the power of the mashup and the efficiencies that can be gained by using existing web services. Tumblr, Google Maps and their associated APIs gave us a kickstart that enabled us to do a lot with a little.</p>
<h5>Next Steps</h5>
<p>There are so many exciting possibilities! Obviously our first priority is to improve those areas of the site that continue to cause our users grief. There are a number of navigation and usability tweaks that should improve the overall experience. Similarly, we can now see ways in which we might streamline moderation and management processes.</p>
<p>We hope to build on the success of the scrapbook and tributes by enhancing and extending their functionality. Improved editing and creation tools could assist contributors while also enriching the web of connections they build. We might, for example, provide widgets that make it easier to link the records of family members or friends. Over time this could develop into a complex network of relationships, providing new means of finding and visualising the records. Similarly, there are ways in which we might reuse the existing content of the scrapbook posts to develop new modes of discovery.</p>
<p>We could also do more to feature the labours and passions of our contributors. We could give them the option of exposing a public profile that lists all of their scrapbook posts. This would help foster a sense of community while providing yet another means of exploring connections between records.</p>
<p>Recent developments in geospatial technology and mobile devices perhaps offer the most exciting possibilities. Our original aim was to give the World War I service records back to local communities, to imbue the records with a greater sense of context, locality and belonging. Perhaps we will have succeeded when a tourist exploring a small country town can press a button on their mobile phone to retrieve a list of service people born near their current location.</p>
<p>Perhaps they will take a photo of a name on the local war memorial and use it to automatically retrieve that person’s service record or create an online tribute.</p>
<p>Perhaps they will come across a headstone in the local cemetery and immediately upload a geocoded photograph to the <em>Mapping our Anzacs</em> scrapbook.</p>
<p>Instead of merely being markers on a map, the records will start to overlay and inform the very spaces in which we move. The stories they contain will become part of our journeys, the people they document will have found their way home.</p>
<hr />
<h2>Creating a mashup</h2>
<p>2009 preprint version of interview originally published in Kate Theimer, <em>Web 2.0 Tools and Strategies for Archives and Local History Collections</em>, Neal-Schuman, New York, 2010. [<a href="http://www.neal-schuman.com/w2tsa">Order here</a>]</p>
<h5>What made you interested in creating a mashup?</h5>
<p>It really started with the records. We hold the records of more than 375,000 World War I service people, identified by their places of birth and enlistment. With war memorials in just about every town across Australia, the connection between local communities and the memory of war remains strong. So we wondered how we could give the service people in our records back to their communities. Having played around with the Google Maps API the answer seemed obvious &#8212; find the places, put them on a map and let people explore the connections for themselves.</p>
<h5>What information, tools and processes did you need to begin?</h5>
<p>The main thing we needed was the confidence to experiment. The process seemed straightforward in principle: first we had to extract the data we needed from the file titles in our collection database, then we had to find the latitude and longitude of each of the place names we extracted, and finally we had to plot these coordinates on a map with links back to details of the service records themselves. Web services, such as those provided by Google Maps, had the potential to do much of the work for us, and we scoured online documentation, user forums and blog posts for hints. But there were many things we could not know until we actually started. How consistent was our data? How many of the places would we be able to find? How would we be able to display thousands of places at once?</p>
<p>Moving from file titles through to coordinates obviously required a lot of data manipulation and we used Perl for much of the grunt work. Because our data set was large and variations in spelling and formatting were often unpredictable (including 13 different spellings of &#8216;lieutenant&#8217;!), we often had to work by trial and error &#8212; seeing what results we obtained and then adjusting our processes accordingly.</p>
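<p>To give a flavour of this kind of cleanup, here is a minimal sketch in Python rather than the Perl we used. The variant spellings and the <code>normalise_rank</code> helper are hypothetical, made up for illustration; the real list of variants only emerged through trial and error.</p>

```python
import re

# Hypothetical variant spellings, invented for this example.
RANK_VARIANTS = {
    "lieutenant": ["lieutenant", "leiutenant", "lieutenent", "lieut", "lt"],
}

# Build a lookup from each variant to its canonical spelling.
CANONICAL = {
    variant: canonical
    for canonical, variants in RANK_VARIANTS.items()
    for variant in variants
}

def normalise_rank(text):
    """Replace known variant spellings with their canonical form."""
    def swap(match):
        word = match.group(0)
        # Compare case-insensitively and ignore a trailing full stop.
        key = word.lower().rstrip(".")
        return CANONICAL.get(key, word)
    # Match whole words, optionally followed by a full stop (e.g. 'Lieut.').
    return re.sub(r"[A-Za-z]+\.?", swap, text)
```

<p>Calling <code>normalise_rank("Lieut. Jones")</code> rewrites the abbreviation and leaves the surname alone. Anything not in the lookup passes through unchanged, so new variants can be added as they turn up.</p>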
<p>Once we had a list of place names in a consistent format we could begin to find latitudes and longitudes through a process known as geocoding. Google&#8217;s geocoding service was an easy option: it was well documented, reasonably comprehensive and it worked! We fed it our place names through a Perl script and soon we had a list of coordinates. Of course, many places were not found or returned multiple results, but the basic principle was sound. Our places were no longer just names, but points in space &#8212; we could begin making maps.</p>
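<p>The sorting of results into found, ambiguous and missing places can be sketched roughly as follows. This is an illustrative Python example, not our Perl script: the <code>GAZETTEER</code> dictionary stands in for a real geocoding service, and the names, coordinates and <code>triage_places</code> function are all assumptions.</p>

```python
# A stand-in gazetteer; a real run would query a geocoding service instead.
# All names and coordinates here are illustrative assumptions.
GAZETTEER = {
    "Hobart, Tasmania": [(-42.88, 147.33)],
    "Richmond": [(-37.82, 145.00), (-33.60, 150.75), (-42.73, 147.44)],
}

def triage_places(place_names, lookup=GAZETTEER.get):
    """Sort place names by whether geocoding gave one, many or no results."""
    found, ambiguous, missing = {}, {}, []
    for name in place_names:
        matches = lookup(name) or []
        if len(matches) == 1:
            # Exactly one result: accept the coordinates.
            found[name] = matches[0]
        elif matches:
            # Several candidates: keep them all for manual checking.
            ambiguous[name] = matches
        else:
            missing.append(name)
    return found, ambiguous, missing
```

<p>Only the <code>found</code> places can go straight onto a map; the other two lists are the ones that need a person to look at them.</p>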
<h5>How did you determine what to include?</h5>
<p>What we were creating was an archival finding aid, but one which placed the people, their homes and their communities up front rather than the systems that control their records. By browsing from a map a user would be able to find the details of a loved one, read a digitised copy of their service record and then follow a link through to our collection database. These links provide crucial context about the records, but we realised that this project also gave us an opportunity to capture other contexts and meanings. Who were these people? What did they look like? What happened to them after the war? By adding an online &#8216;scrapbook&#8217; we gave users the chance to enrich the resource by adding notes or photographs about individuals.</p>
<p>This meant we had to deal with three sets of interlinked data: geocoded places, details from our records, and scrapbook posts provided by the public. To bring these all together with limited resources we had to make clever use of what was already out there. Why create our own maps when all you needed to do was write a bit of Javascript to embed a Google Map? Why build our own scrapbook application when the blogging service Tumblr provides free accounts and a simple API to manipulate posts? While a substantial amount of custom scripting was required to glue everything together, much of the core functionality was provided by free web services, available to anyone.</p>
<h5>What challenges did you face?</h5>
<p>Perhaps the first challenge to overcome was that of imagination. It was difficult for people to understand what the project was until we had a prototype to show them.</p>
<p>The process of handling and cleaning the data at times threatened to overwhelm us. While the geocoding service got us to the point where we could make maps, it also left us with many place names that needed to be manually checked. Often this was the result of misspellings in the original data, or because places either no longer existed or had changed their names. This data cleanup consumed much effort and continues still, though now with the help of our users who regularly point out errors and inconsistencies.</p>
<p>Once we had our coordinates we had to display them on a map without killing anyone&#8217;s computer. Showing thousands of markers on a Google Map is a challenge to slower web browsers and can end up hindering navigation. By dividing up our maps, clustering markers and changing the way they were rendered, we managed to greatly improve performance while maintaining the browsing experience. Once again it was trial and error coupled with the advice of the online community that guided us through the roadblocks.</p>
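<p>One common way to cluster markers is to snap them to a grid and draw a single marker per cell. The Python sketch below shows that general idea only; it is not necessarily how our maps do it, and the <code>cluster_markers</code> function and its <code>cell_size</code> parameter are hypothetical.</p>

```python
from collections import defaultdict

def cluster_markers(points, cell_size=1.0):
    """Group (lat, lon) points into grid cells and return one marker per cell.

    Each returned marker is a tuple of (mean_lat, mean_lon, count).
    """
    cells = defaultdict(list)
    for lat, lon in points:
        # Floor division maps each coordinate to the grid cell containing it.
        key = (lat // cell_size, lon // cell_size)
        cells[key].append((lat, lon))
    clusters = []
    for members in cells.values():
        mean_lat = sum(lat for lat, _ in members) / len(members)
        mean_lon = sum(lon for _, lon in members) / len(members)
        clusters.append((mean_lat, mean_lon, len(members)))
    return clusters
```

<p>A larger <code>cell_size</code> means fewer markers on screen, which suits zoomed-out views; shrinking it as the user zooms in brings back the detail.</p>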
<h5>What kinds of positive results have you had? (And any negative ones?)</h5>
<p>From the messages we receive it&#8217;s clear that Mapping our Anzacs allows people to find records they didn&#8217;t know existed. Some have met a great-great-uncle for the first time. Others have learned about the war experience of a much-loved grandparent. Local communities have embraced the project and the scrapbook has developed into a rich and often moving resource. We wanted to give users a new way to explore and interact with our collection, and it seems we have succeeded.</p>
<p>Our users have also become our collaborators, providing corrections and comments that help us improve our data. They have extended the idea of the scrapbook, using it, for example, as a noticeboard for family history research, or as a way of creating crosslinks between related resources.</p>
<p>Success brings problems of its own and the work of moderating the scrapbook and responding to feedback has proved considerable. Issues with performance remain for people on slow connections, and while many are familiar with the Google Maps interface, some find it difficult to navigate. We are planning a number of enhancements based on this feedback, and hope to take advantage of the technology as it evolves to improve and extend the interface.</p>
<h5>About how much time did it take?</h5>
<p>While the project as a whole stretched over about eight months, much of this time was taken up cleaning and processing the data. The development of the interface was completed in under two months.</p>
<h5>What advice would you give an organization wanting to use something similar?</h5>
<p>Start experimenting. The technology is developing so rapidly that if you spend 12 months planning a project it&#8217;s likely to be out-of-date even before you start. New web services and data sources are becoming available every day. Perhaps you could use Open Calais to extract people&#8217;s names from a collection description, or MetaCarta to find the places. You might use the Google Books API to harvest the details of publications that cite your records. Even if you&#8217;re not a coder you can use tools like Yahoo Pipes to see what happens when you start to link data and services. Experimentation brings new ideas and possibilities. It&#8217;s all about making connections.</p>
]]></content:encoded>
			<wfw:commentRss>http://discontents.com.au/words/articles/local-heroes/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Mining for meanings</title>
		<link>http://discontents.com.au/words/speeches/mining-for-meanings</link>
		<comments>http://discontents.com.au/words/speeches/mining-for-meanings#comments</comments>
		<pubDate>Wed, 09 May 2012 12:12:07 +0000</pubDate>
		<dc:creator>tim</dc:creator>
				<category><![CDATA[digital humanities]]></category>
		<category><![CDATA[speeches]]></category>

		<guid isPermaLink="false">http://discontents.com.au/?p=1716</guid>
		<description><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Mining+for+meanings&amp;rft.aulast=Sherratt&amp;rft.aufirst=Tim&amp;rft.subject=digital+humanities&amp;rft.subject=speeches&amp;rft.source=discontents&amp;rft.date=2012-05-09&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://discontents.com.au/words/speeches/mining-for-meanings&amp;rft.language=English"></span>
Yes, I have a suit. On 8 May at the National Library of Australia I gave my suit an outing as I delivered my Harold White Fellowship presentation. Thanks to everyone who came along. If you missed it or want to relive the fun, the NLA has made a podcast available. My slides are also [...]]]></description>
			<content:encoded><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Mining+for+meanings&amp;rft.aulast=Sherratt&amp;rft.aufirst=Tim&amp;rft.subject=digital+humanities&amp;rft.subject=speeches&amp;rft.source=discontents&amp;rft.date=2012-05-09&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://discontents.com.au/words/speeches/mining-for-meanings&amp;rft.language=English"></span>
<abbr class="unapi-id" title="http://discontents.com.au/?p=1716"><!-- &nbsp; --></abbr>
<p><a href="http://discontents.com.au/wp-content/uploads/2012/05/harold_white.jpg"><img class="aligncenter size-large wp-image-1717" title="harold_white" src="http://discontents.com.au/wp-content/uploads/2012/05/harold_white-520x385.jpg" alt="" width="520" height="385" /></a></p>
<p>Yes, I have a suit. On 8 May at the National Library of Australia I gave my suit an outing as I delivered my <a href="http://www.nla.gov.au/awards-and-grants/harold-white-fellowships">Harold White Fellowship</a> presentation. Thanks to everyone who came along.</p>
<p>If you missed it or want to relive the fun, the NLA has made <a href="http://www.nla.gov.au/podcasts/collections.html">a podcast available</a>. My slides <a href="http://wraggelabs.com/shed/presentations/nla/">are also online</a>, so you can follow along for the full audio-visual-not-quite-3D experience.</p>
<p>Use your arrow keys to navigate through the slides, and yes the first page is intentionally left blank. If you linger for a bit on slide two or three, you&#8217;ll see the <a href="http://trove.nla.gov.au/general/api">Trove API</a> in action. The presentation itself was constructed using <a href="http://imakewebthings.com/deck.js/">deck.js</a>.</p>
<p>The slides also include links to lots of different examples and demos, and introduce my <a href="http://newspapers.wraggelabs.com/difference/">new favourite plaything</a>. I don&#8217;t really know what to call it yet, or what it&#8217;s actually for, but it makes me happy, and it makes me think. TF-IDF FTW. I&#8217;ll write up some more details shortly.</p>
]]></content:encoded>
			<wfw:commentRss>http://discontents.com.au/words/speeches/mining-for-meanings/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>The new QueryPic (or what a difference an API makes)</title>
		<link>http://discontents.com.au/shed/experiments/the-new-querypic-or-what-a-difference-an-api-makes</link>
		<comments>http://discontents.com.au/shed/experiments/the-new-querypic-or-what-a-difference-an-api-makes#comments</comments>
		<pubDate>Tue, 17 Apr 2012 13:06:55 +0000</pubDate>
		<dc:creator>tim</dc:creator>
				<category><![CDATA[digital humanities]]></category>
		<category><![CDATA[experiments]]></category>
		<category><![CDATA[newspapers]]></category>
		<category><![CDATA[Papers Past]]></category>
		<category><![CDATA[QueryPic]]></category>
		<category><![CDATA[Trove]]></category>
		<category><![CDATA[visualisation]]></category>

		<guid isPermaLink="false">http://discontents.com.au/?p=1655</guid>
		<description><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=The+new+QueryPic+%28or+what+a+difference+an+API+makes%29&amp;rft.aulast=Sherratt&amp;rft.aufirst=Tim&amp;rft.subject=digital+humanities&amp;rft.subject=experiments&amp;rft.source=discontents&amp;rft.date=2012-04-17&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://discontents.com.au/shed/experiments/the-new-querypic-or-what-a-difference-an-api-makes&amp;rft.language=English"></span>
It seems a bit late to be introducing the newest version of QueryPic. Folks are already using it to explore the contents of digitised newspapers made available through Trove and Papers Past. Some, like the National Library of New Zealand, Andrew S. Bowman and the Carnamah Historical Society are already blogging about it. But I suppose [...]]]></description>
			<content:encoded><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=The+new+QueryPic+%28or+what+a+difference+an+API+makes%29&amp;rft.aulast=Sherratt&amp;rft.aufirst=Tim&amp;rft.subject=digital+humanities&amp;rft.subject=experiments&amp;rft.source=discontents&amp;rft.date=2012-04-17&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://discontents.com.au/shed/experiments/the-new-querypic-or-what-a-difference-an-api-makes&amp;rft.language=English"></span>
<abbr class="unapi-id" title="http://discontents.com.au/?p=1655"><!-- &nbsp; --></abbr>
<p>It seems a bit late to be introducing the newest version of <a href="http://wraggelabs.com/shed/querypic/">QueryPic</a>. Folks are already using it to explore the contents of digitised newspapers made available through <a href="http://trove.nla.gov.au/newspaper/">Trove</a> and <a href="http://paperspast.natlib.govt.nz/cgi-bin/paperspast">Papers Past</a>. Some, like the <a href="http://beta.natlib.govt.nz/blog/a-tale-of-two-islands">National Library of New Zealand</a>, <a href="http://andrew-s-bowman.blogspot.com.au/2012/04/querypic-new-tool-for-historical.html">Andrew S. Bowman</a> and the <a href="http://carnamah.blogspot.com.au/2012/04/mentions-of-carnamah-in-australian.html">Carnamah Historical Society</a> are already blogging about it. But I suppose I&#8217;d better document a few things&#8230;</p>
<p>As I noted in my <a title="QueryPicNZ" href="http://discontents.com.au/shed/experiments/querypicnz">post about QueryPicNZ</a> (yes I now have a rather confusing proliferation of QueryPics), I was waiting for the Trove API to become public. Last week I noticed a little &#8216;API&#8217; link pop up in the Trove footer and so I set to work&#8230;</p>
<div id="attachment_1662" class="wp-caption aligncenter" style="width: 530px"><a href="http://wraggelabs.com/shed/querypic/?q=%22the%20past%22|aus&amp;q=%22the%20future%22|aus"><img class="size-large wp-image-1662" title="new_querypic" src="http://discontents.com.au/wp-content/uploads/2012/04/new_querypic-520x477.png" alt="" width="520" height="477" /></a><p class="wp-caption-text">&quot;The past&quot; versus &quot;the future&quot; in the new QueryPic</p></div>
<p>My <a title="QueryPic" href="http://discontents.com.au/shed/hacks/querypic">original version of QueryPic</a> (<a href="http://journalofdigitalhumanities.org/1-1/reviews/querypic/">recently reviewed</a> in the <em>Journal of Digital Humanities</em>) used a series of Python scripts to harvest and scrape content from the Trove web pages. This meant that you had to download the scripts and be code-confident enough to run them in a terminal. It&#8217;s still a useful tool and I&#8217;ll be updating it as well, but I wanted to create something quicker and simpler that encouraged people to explore and play.</p>
<p>The latest version of <a href="http://wraggelabs.com/shed/querypic/">QueryPic</a> (QueryPic+, QueryPic Web, <del>QueryPic 2.0</del>?) simply runs in your browser. It uses jQuery to grab data on the fly from the <a href="http://trove.nla.gov.au/general/api">Trove</a> and <a href="http://digitalnz.org.nz/">DigitalNZ</a> APIs. Like previous versions, it uses the <a href="http://www.highcharts.com/">Highcharts</a> library to turn the data into pretty graphs.</p>
<p>What does it do? It&#8217;s really pretty basic. QueryPic just displays the number of articles matching your search query over time. By default, these are displayed as a proportion of the total articles available for that year, but a dropdown field lets you switch to view the raw numbers. It&#8217;s simple, but it&#8217;s also remarkably evocative, suggestive and fun. <strong><a href="http://wraggelabs.com/shed/querypic/">Just try it!</a></strong></p>
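<p>The calculation behind the default view is simple enough to sketch. This is an illustrative Python version, not the code QueryPic actually runs in the browser; the <code>proportions</code> function and the sample counts are made up for the example.</p>

```python
def proportions(matches_by_year, totals_by_year):
    """Return the share of each year's articles that match a query."""
    result = {}
    for year, matches in sorted(matches_by_year.items()):
        total = totals_by_year.get(year, 0)
        # Skip years with no digitised articles to avoid dividing by zero.
        if total:
            result[year] = matches / total
    return result

# Invented counts: matching articles and total articles per year.
shares = proportions({1913: 50, 1914: 120}, {1913: 1000, 1914: 1200})
```

<p>Dividing by the yearly total matters because the number of digitised articles varies a lot from year to year, so raw counts can rise simply because there is more to search.</p>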
<p>Why stop at just one query? To compare frequency patterns you can add as many as you like. Just keep entering new words or phrases.</p>
<p>If you notice an interesting peak or trough you can just click on it and another API request will be fired off to retrieve the first 20 matching articles. So it&#8217;s also a new way of exploring the newspaper databases themselves.</p>
<p>There are plenty of limitations &#8212; not all newspapers are digitised, for example, and the quality of the OCR is patchy. The <a href="http://beta.natlib.govt.nz/blog/a-tale-of-two-islands">National Library of New Zealand&#8217;s post</a> does a great job summing up a number of issues relating to Papers Past. It&#8217;s not magic, it&#8217;s not perfect, but is it useful? I think so.</p>
<p>Tasks for the future:</p>
<ul>
<li>Create some sort of backend that makes it easy to save, share and cite your query data. The &#8216;share&#8217; link just regenerates the graph which, of course, might change as new articles are added to the databases.</li>
<li>Make it possible to add more complex queries &#8212; I want to keep the interface simple, so I&#8217;ll probably create a bookmarklet to take any Trove or Papers Past query and display it using QueryPic.</li>
<li>As I mentioned over at the <a href="http://wraggelabs.com/emporium/2012/04/the-new-api-powered-future/">WraggeLabs Emporium</a>, I intend to rewrite my various Trove tools to work with the new API. This will include the classic Python version of QueryPic. I still think it&#8217;s useful for harvesting your own data.</li>
</ul>
<div>The <a href="https://github.com/wragge/QueryPic">code</a> is on my GitHub site and you can also follow updates at the <a href="http://wraggelabs.com/emporium/trove-tools/newspaper-search-summariser/">QueryPic page</a> in the WraggeLabs Emporium.</div>
]]></content:encoded>
			<wfw:commentRss>http://discontents.com.au/shed/experiments/the-new-querypic-or-what-a-difference-an-api-makes/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>QueryPicNZ</title>
		<link>http://discontents.com.au/shed/experiments/querypicnz</link>
		<comments>http://discontents.com.au/shed/experiments/querypicnz#comments</comments>
		<pubDate>Sun, 01 Apr 2012 03:17:02 +0000</pubDate>
		<dc:creator>tim</dc:creator>
				<category><![CDATA[digital humanities]]></category>
		<category><![CDATA[experiments]]></category>
		<category><![CDATA[DigitalNZ]]></category>
		<category><![CDATA[New Zealand]]></category>
		<category><![CDATA[newspapers]]></category>
		<category><![CDATA[Papers Past]]></category>
		<category><![CDATA[querypicnz]]></category>
		<category><![CDATA[textmining]]></category>

		<guid isPermaLink="false">http://discontents.com.au/?p=1621</guid>
		<description><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=QueryPicNZ&amp;rft.aulast=Sherratt&amp;rft.aufirst=Tim&amp;rft.subject=digital+humanities&amp;rft.subject=experiments&amp;rft.source=discontents&amp;rft.date=2012-04-01&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://discontents.com.au/shed/experiments/querypicnz&amp;rft.language=English"></span>
You may have noticed I have a bit of an interest in exploring ways of using digitised historical newspapers. In the last year or so I&#8217;ve spent a lot of time scraping, mining, processing and visualising content from the Trove collection of digitised Australian newspapers. But what about other countries? Recently I was invited to a digital [...]]]></description>
			<content:encoded><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=QueryPicNZ&amp;rft.aulast=Sherratt&amp;rft.aufirst=Tim&amp;rft.subject=digital+humanities&amp;rft.subject=experiments&amp;rft.source=discontents&amp;rft.date=2012-04-01&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://discontents.com.au/shed/experiments/querypicnz&amp;rft.language=English"></span>
<abbr class="unapi-id" title="http://discontents.com.au/?p=1621"><!-- &nbsp; --></abbr>
<p>You may have noticed I have a bit of an interest in exploring ways of using digitised historical newspapers. In the last year or so I&#8217;ve spent a lot of time <a href="http://discontents.com.au/tag/trove">scraping, mining, processing and visualising</a> content from the Trove collection of digitised Australian newspapers. But what about other countries?</p>
<p>Recently I was invited to a <a href="http://victoria.ac.nz/wtapress/research/digital-history-workshop-2012">digital history workshop</a> organised by Sydney Shep (<a href="https://twitter.com/#!/nzsydney">@nzsydney</a>) at the Victoria University of Wellington. In between sessions I started to play with the <a href="http://www.digitalnz.org/developers">DigitalNZ API</a> guided by Chris McDowall (<a href="https://twitter.com/#!/fogonwater">@fogonwater</a>). In anticipation of the forthcoming Trove API I&#8217;d already done a bit of work converting <a title="QueryPic" href="http://discontents.com.au/shed/hacks/querypic">QueryPic</a> to run in the browser. It didn&#8217;t take long to adapt this to work with New Zealand newspapers available through <a href="http://paperspast.natlib.govt.nz/cgi-bin/paperspast">Papers Past</a>.</p>
<p><strong>So presenting for your enjoyment and education&#8230; <a href="http://wraggelabs.com/shed/querypicnz">QueryPicNZ</a>.</strong></p>
<div id="attachment_1641" class="wp-caption aligncenter" style="width: 530px"><a href="http://wraggelabs.com/shed/querypicnz/?q=wind&amp;q=rain&amp;q=snow"><img class="size-large wp-image-1641" title="Screen Shot 2012-04-01 at 1.07.28 PM" src="http://discontents.com.au/wp-content/uploads/2012/04/Screen-Shot-2012-04-01-at-1.07.28-PM-520x367.png" alt="" width="520" height="367" /></a><p class="wp-caption-text">Wind, rain and snow in QueryPicNZ</p></div>
<p>Like <a title="QueryPic" href="http://discontents.com.au/shed/hacks/querypic">QueryPic</a>, the New Zealand version graphs newspaper search results over time. But thanks to the DigitalNZ API it has a number of advantages:</p>
<ul>
<li>it runs in your browser &#8212; no need to download or run any scripts</li>
<li>results appear almost instantly</li>
<li>easy to combine queries &#8212; just search on a new word or phrase</li>
<li>easy to remove queries &#8212; just use the &#8216;Clear last&#8217; button</li>
<li>easy to share &#8212; just copy the provided link or use the Tweet button</li>
</ul>
<p>It&#8217;s limited to simple word or phrase searches at the moment, but eventually I&#8217;ll add the ability to process more sophisticated queries. I also want to add a way of saving, sharing and citing graphs. For now the &#8216;share&#8217; link simply regenerates the graph, so if the content has changed the result could well be different.</p>
<p>The code is <a href="https://github.com/wragge/QueryPicNZ">available on GitHub</a>.</p>
<p>Ultimately, I want to combine Trove and Papers Past so that you can query and combine content from either Australia or New Zealand&#8230; perhaps even other countries?</p>
]]></content:encoded>
			<wfw:commentRss>http://discontents.com.au/shed/experiments/querypicnz/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Mining the treasures of Trove</title>
		<link>http://discontents.com.au/words/conference-papers/mining-the-treasures-of-trove</link>
		<comments>http://discontents.com.au/words/conference-papers/mining-the-treasures-of-trove#comments</comments>
		<pubDate>Sun, 01 Apr 2012 01:59:15 +0000</pubDate>
		<dc:creator>tim</dc:creator>
				<category><![CDATA[conference presentations]]></category>
		<category><![CDATA[digital humanities]]></category>
		<category><![CDATA[digital history]]></category>
		<category><![CDATA[newspapers]]></category>
		<category><![CDATA[QueryPic]]></category>
		<category><![CDATA[screenscraping]]></category>
		<category><![CDATA[textmining]]></category>
		<category><![CDATA[Trove]]></category>

		<guid isPermaLink="false">http://discontents.com.au/?p=1623</guid>
		<description><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Mining+the+treasures+of+Trove&amp;rft.aulast=Sherratt&amp;rft.aufirst=Tim&amp;rft.subject=conference+presentations&amp;rft.subject=digital+humanities&amp;rft.source=discontents&amp;rft.date=2012-04-01&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://discontents.com.au/words/conference-papers/mining-the-treasures-of-trove&amp;rft.language=English"></span>
In February I made a quick dash to Melbourne to talk at VALA2012. The paper I originally submitted, &#8216;Mining the treasures of Trove: New approaches and new tools&#8217;, provided a general introduction to the use of digitised historical newspapers and the possibilities of digital history. You can download the pdf from the VALA2012 proceedings, or view [...]]]></description>
			<content:encoded><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Mining+the+treasures+of+Trove&amp;rft.aulast=Sherratt&amp;rft.aufirst=Tim&amp;rft.subject=conference+presentations&amp;rft.subject=digital+humanities&amp;rft.source=discontents&amp;rft.date=2012-04-01&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://discontents.com.au/words/conference-papers/mining-the-treasures-of-trove&amp;rft.language=English"></span>
<abbr class="unapi-id" title="http://discontents.com.au/?p=1623"><!-- &nbsp; --></abbr>
<p>In February I made a quick dash to Melbourne to <a href="http://www.vala.org.au/vala2012-proceedings/vala2012-session-2-sherratt">talk at VALA2012</a>.</p>
<p style="text-align: center;"><a href="http://www.vala.org.au/images/phocagallery/thumbs/phoca_thumb_l_258-session-speakers.jpg"> <img class="aligncenter" title="Presenting at VALA2012" src="http://www.vala.org.au/images/phocagallery/thumbs/phoca_thumb_l_258-session-speakers.jpg" alt="" width="512" height="258" /></a></p>
<p>The paper I originally submitted, &#8216;Mining the treasures of Trove: New approaches and new tools&#8217;, provided a general introduction to the use of digitised historical newspapers and the possibilities of digital history. You can <a href="http://www.vala.org.au/docman/vala2012-proceedings/vala2012-session-2-sherratt-paper/download">download the pdf</a> from the VALA2012 proceedings, or <a href="http://www.scribd.com/doc/84088064/Mining-the-treasures-of-Trove-new-approaches-and-new-tools#fullscreen">view online</a> at Scribd.</p>
<p>I ended up presenting something a little different, focusing on my recent work around 1913 and <a href="http://discontents.com.au/tag/1913editorials">extracting editorials</a> from the Trove newspaper database. You can <a href="http://www.slideshare.net/wragge/mining-the-treasures-of-trove">view the slides</a> on Slideshare or <a href="http://webcast.gigtv.com.au/Mediasite/Play/d309aed4ca484bcbb4af03b213b1bb101d">watch a video of the whole presentation</a> on the VALA2012 site.</p>
]]></content:encoded>
			<wfw:commentRss>http://discontents.com.au/words/conference-papers/mining-the-treasures-of-trove/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Extracting editorials #3</title>
		<link>http://discontents.com.au/shed/experiments/extracting-editorials-3</link>
		<comments>http://discontents.com.au/shed/experiments/extracting-editorials-3#comments</comments>
		<pubDate>Sun, 19 Feb 2012 23:11:54 +0000</pubDate>
		<dc:creator>tim</dc:creator>
				<category><![CDATA[digital humanities]]></category>
		<category><![CDATA[experiments]]></category>
		<category><![CDATA[1913editorials]]></category>
		<category><![CDATA[text mining]]></category>
		<category><![CDATA[Trove]]></category>

		<guid isPermaLink="false">http://discontents.com.au/?p=1601</guid>
		<description><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Extracting+editorials+%233&amp;rft.aulast=Sherratt&amp;rft.aufirst=Tim&amp;rft.subject=digital+humanities&amp;rft.subject=experiments&amp;rft.source=discontents&amp;rft.date=2012-02-20&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://discontents.com.au/shed/experiments/extracting-editorials-3&amp;rft.language=English"></span>
By my own criteria I&#8217;ve already failed&#8230; I started this series of posts with the intention of documenting the process of finding and extracting editorials as I was actually doing the work. But here I am about to describe some work I finished a few weeks back. Oh well&#8230; In my previous instalments (here and [...]]]></description>
			<content:encoded><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Extracting+editorials+%233&amp;rft.aulast=Sherratt&amp;rft.aufirst=Tim&amp;rft.subject=digital+humanities&amp;rft.subject=experiments&amp;rft.source=discontents&amp;rft.date=2012-02-20&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://discontents.com.au/shed/experiments/extracting-editorials-3&amp;rft.language=English"></span>
<abbr class="unapi-id" title="http://discontents.com.au/?p=1601"><!-- &nbsp; --></abbr>
<p>By my own criteria I&#8217;ve already failed&#8230; I started this series of posts with the intention of documenting the process of finding and extracting editorials as I was actually doing the work. But here I am about to describe some work I finished a few weeks back. Oh well&#8230;</p>
<p>In my previous instalments (<a title="Extracting editorials #1" href="http://discontents.com.au/shoebox/digital-humanities/extracting-editorials-1">here</a> and <a title="Extracting editorials #2" href="http://discontents.com.au/shed/hacks/extracting-editorials-2">here</a>), I focused on the <em>Sydney Morning Herald.</em> Having continued the hunt for missing editorials I started in the last post, I&#8217;ve now got a CSV file with the urls of the first editorial published in every edition of the <em>SMH</em> from 1913. Good-o, I thought, I can now start harvesting and analysing some content.</p>
<p>But then ensued a crisis of faith. The whole point of this exercise was to be able to build up some comparisons  &#8211; between newspapers, between states, between the city and the bush. But the process of actually finding the editorials seemed beset with difficulties. Could the rules I developed for the <em>SMH</em> be applied elsewhere? Could I ever assemble a useful set of editorials without large amounts of human intervention? I decided to try a few quick experiments to see whether the whole project was worth pursuing.</p>
<p>I started with a few assumptions:</p>
<ol>
<li>The first (and only the first) editorial in any issue is headed with the name of the newspaper.</li>
<li>Editorials are published on even numbered pages.</li>
<li>Editorials vary in length between about 100 and 1500 words.</li>
</ol>
<p>These assumptions were based on my own experience as a long-time newspaper researcher and on some preliminary poking around. For example, when I looked at <em>The Argus</em> I noticed that editorials were typically followed by news summaries. Unfortunately, these are treated as a single article in Trove, resulting in large blocks of text that are only part editorial. By specifying an upper word limit I hoped to filter these sorts of articles out. Similarly, there are sometimes brief announcements or publication details headed with the name of the newspaper. The lower word limit was intended to exclude these.</p>
<p>The next step was to harvest every article from 1913 that was headed with the name of its publication. I created a script to generate a list of all the newspapers that published issues in 1913. Then I called my existing harvester to download all the matching articles and save the details to a series of CSV files &#8212; one CSV file per newspaper.</p>
<p>In the previous instalment of this series I created a script to check the CSV output of my harvester for missing or duplicate dates. I extended this to perform a series of tests on each article based on the assumptions above. First, I filtered out articles on odd-numbered pages, then articles that were too short or too long. Finally I checked the remainder for missing or duplicate issue dates.</p>
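<p>The checks described above could be sketched in Python roughly as follows. This is an illustration only, not the original script: the dictionary keys (<code>page</code>, <code>words</code>, <code>date</code>), the function names and the idea of passing in a list of expected issue dates are all assumptions, and page numbers and word counts are assumed to have been converted to integers already.</p>
<pre class="brush: python; gutter: false">from collections import Counter


def categorise(articles, min_words=100, max_words=1500):
    """Sort harvested articles into buckets, applying the page test first."""
    buckets = {'odd_page': [], 'too_short': [], 'too_long': [], 'candidates': []}
    for article in articles:
        if article['page'] % 2 == 1:
            buckets['odd_page'].append(article)
        elif article['words'] in range(min_words):
            # Fewer words than the lower limit
            buckets['too_short'].append(article)
        elif article['words'] in range(min_words, max_words + 1):
            buckets['candidates'].append(article)
        else:
            buckets['too_long'].append(article)
    return buckets


def check_dates(candidates, expected_dates):
    """Return (missing, duplicated) issue dates among the candidates."""
    counts = Counter(article['date'] for article in candidates)
    missing = sorted(set(expected_dates) - set(counts))
    duplicated = sorted(day for day, n in counts.items() if n != 1)
    return missing, duplicated</pre>
<p>Keeping the rejected articles in their own buckets, rather than discarding them, makes it easy to inspect what each rule is filtering out.</p>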
<p>The details of the articles in each category were written out to JSON files. Using these files and a bit of jQuery magic I could quickly build a <a href="http://wraggelabs.com/shed/trove/editorials/">simple web interface</a> that allowed me to explore the results.</p>
<div id="attachment_1613" class="wp-caption aligncenter" style="width: 486px"><a href="http://wraggelabs.com/shed/trove/editorials/"><img class="size-full wp-image-1613" title="editorials-list-cropped" src="http://discontents.com.au/wp-content/uploads/2012/02/editorials-list-cropped.jpg" alt="" width="476" height="536" /></a><p class="wp-caption-text">Summary details of each newspaper</p></div>
<p>You can browse the summary results for the full list of newspapers, or you can drill down to view the actual articles assigned to each category.</p>
<div id="attachment_1616" class="wp-caption aligncenter" style="width: 530px"><a href="http://discontents.com.au/wp-content/uploads/2012/02/editorials-details-cropped.jpg"><img class="size-large wp-image-1616" title="editorials-details-cropped" src="http://discontents.com.au/wp-content/uploads/2012/02/editorials-details-cropped-520x372.jpg" alt="" width="520" height="372" /></a><p class="wp-caption-text">Full details</p></div>
<p>I&#8217;ll save the full analysis for the next post, but if you play around with the results you quickly notice a few things. First, letters to the editor often include the name of the newspaper! If you look at <em>The Mercury</em>, for example, you&#8217;ll notice I&#8217;ve identified 1057 potential editorials &#8212; most of which are letters. Fortunately they should be fairly easy to filter out. In most cases the &#8216;even numbers only&#8217; assumption worked pretty well, and the word length filters did remove quite a lot of false positives. There are still plenty of problems, but I&#8217;m encouraged enough to continue. Yes, there will be a Part #4!</p>
]]></content:encoded>
			<wfw:commentRss>http://discontents.com.au/shed/experiments/extracting-editorials-3/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>2011 &#8212; the year of little sleep</title>
		<link>http://discontents.com.au/words/conference-papers/2011-the-year-of-little-sleep</link>
		<comments>http://discontents.com.au/words/conference-papers/2011-the-year-of-little-sleep#comments</comments>
		<pubDate>Tue, 24 Jan 2012 12:56:50 +0000</pubDate>
		<dc:creator>tim</dc:creator>
				<category><![CDATA[conference presentations]]></category>
		<category><![CDATA[digital humanities]]></category>

		<guid isPermaLink="false">http://discontents.com.au/?p=1580</guid>
		<description><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=2011+%26%238212%3B+the+year+of+little+sleep&amp;rft.aulast=Sherratt&amp;rft.aufirst=Tim&amp;rft.subject=conference+presentations&amp;rft.subject=digital+humanities&amp;rft.source=discontents&amp;rft.date=2012-01-24&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://discontents.com.au/words/conference-papers/2011-the-year-of-little-sleep&amp;rft.language=English"></span>
2011 was a busy year. It&#8217;s hard to believe that it was only February when I first posted about my experiments mining the contents of the Trove newspaper database. Since then I&#8217;ve developed a set of digital tools, organised THATCamp Canberra, given a series of presentations on the possibilities of digital history, pushed ahead with [...]]]></description>
			<content:encoded><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=2011+%26%238212%3B+the+year+of+little+sleep&amp;rft.aulast=Sherratt&amp;rft.aufirst=Tim&amp;rft.subject=conference+presentations&amp;rft.subject=digital+humanities&amp;rft.source=discontents&amp;rft.date=2012-01-24&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://discontents.com.au/words/conference-papers/2011-the-year-of-little-sleep&amp;rft.language=English"></span>
<abbr class="unapi-id" title="http://discontents.com.au/?p=1580"><!-- &nbsp; --></abbr>
<p>2011 was a busy year. It&#8217;s hard to believe that it was only February when I first posted about my experiments <a title="Mining the treasures of Trove (part 1)" href="http://discontents.com.au/shed/mining-the-treasures-of-trove-part-1">mining the contents</a> of the Trove newspaper database. Since then I&#8217;ve developed a set of <a href="http://wraggelabs.com/emporium/">digital tools</a>, organised <a href="http://thatcampcanberra.org">THATCamp Canberra</a>, given a series of presentations on the possibilities of digital history, pushed ahead with <a href="http://invisibleaustralians.org">Invisible Australians</a>, and tried to develop my own digital research program. Oh yes, and endeavoured to earn enough money to feed the kids and pay the mortgage&#8230;</p>
<p>It looks like 2012 could be even busier, so before I lose track completely, I thought I&#8217;d pull together some of the past year&#8217;s exploits for handy reference. So here are (most of) my presentations for 2011&#8230;</p>
<p><strong>8 June 2011 &#8212; &#8216;Confessions of an impatient historian&#8217;<br />
</strong><a href="http://www.scholarslab.org/">Scholars&#8217; Lab</a>, University of Virginia</p>
<ul>
<li><a href="http://www.slideshare.net/wragge/confessionspdf">slides</a></li>
<li><a href="http://www.scholarslab.org/podcasts/tim-sherratt-confessions-of-an-impatient-historian/">podcast</a></li>
</ul>
<p><strong>18 August &#8212; &#8216;Digital history: new tools and techniques&#8217;<br />
</strong>National Museum of Australia</p>
<ul>
<li><a href="https://www.zotero.org/groups/digital_history_at_nma_august_2011/items">links in Zotero</a></li>
</ul>
<p><strong>24 August &#8212; &#8216;Hacking the archives&#8217;<br />
</strong><a href="http://recordkeepingroundtable.org/2011/07/21/archival-description-in-an-online-world/">Archival description in an online world</a>, Recordkeeping Roundtable, Sydney</p>
<ul>
<li><a href="http://recordkeepingroundtable.org/2011/09/02/report-on-hacking-the-archives-archival-description-in-an-online-world/">report</a></li>
</ul>
<p><strong>5 September 2011 &#8212; Digital research methods</strong><br />
Cultural heritage students, University of Canberra</p>
<ul>
<li><a href="https://www.zotero.org/groups/university_of_canberra_-_cultural_heritage_-_digital_research_methods/items">links in Zotero</a></li>
</ul>
<p><strong>14 September 2011 &#8212; &#8216;Every story has a beginning&#8217;<br />
</strong>Keynote presentation at the <a href="http://www.anzsi.org/site/2011confprog.asp">Indexing See Change</a> Conference (Australian and New Zealand Society of Indexers)</p>
<ul>
<li><a href="http://discontents.com.au/shoebox/every-story-has-a-beginning">full text</a></li>
<li><a href="http://wraggelabs.com/shed/presentations/anzsi/">presentation</a></li>
</ul>
<p><strong>13 November 2011 &#8212; &#8216;Digital history: new tools and techniques&#8217;<br />
</strong><a href="http://dragontails.com.au/">Dragontails 2011</a>: 2nd Australasian conference on overseas Chinese history &amp; heritage, Museum of Chinese Australian History, Melbourne</p>
<ul>
<li><a href="http://www.slideshare.net/wragge/digital-history-new-tools-and-techniques">slides</a></li>
</ul>
<p><strong>30 November 2011 &#8212; &#8216;It&#8217;s all about the stuff&#8217;<br />
</strong><a href="http://ndf.natlib.govt.nz/about/2011-conference.htm">National Digital Forum</a>, Wellington, New Zealand</p>
<ul>
<li><a href="http://discontents.com.au/words/conference-papers/it%e2%80%99s-all-about-the-stuff-collections-interfaces-power-and-people">full text</a></li>
<li><a href="http://discontents.com.au/words/conference-papers/all-about-the-stuff-the-movie">video</a></li>
</ul>
<p><strong>7 December 2011 &#8212; &#8216;An introduction to digital history&#8217;</strong><br />
<a href="http://www.sl.nsw.gov.au/services/public_libraries/professional_development_events/events/digital_december.html">Digital December</a>, State Library of NSW</p>
<ul>
<li><a href="https://docs.google.com/document/d/1wR9-S8QLEUxnnWYC71O7PT_UspGnEWqTCRd17WtHJ1E/edit">links in Google Docs</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://discontents.com.au/words/conference-papers/2011-the-year-of-little-sleep/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>It&#8217;s all about the stuff &#8212; the movie</title>
		<link>http://discontents.com.au/words/conference-papers/all-about-the-stuff-the-movie</link>
		<comments>http://discontents.com.au/words/conference-papers/all-about-the-stuff-the-movie#comments</comments>
		<pubDate>Mon, 23 Jan 2012 11:42:07 +0000</pubDate>
		<dc:creator>tim</dc:creator>
				<category><![CDATA[conference presentations]]></category>
		<category><![CDATA[digital humanities]]></category>
		<category><![CDATA[hacking]]></category>
		<category><![CDATA[invisibleaustralians]]></category>
		<category><![CDATA[Trove]]></category>

		<guid isPermaLink="false">http://discontents.com.au/?p=1572</guid>
		<description><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=It%26%238217%3Bs+all+about+the+stuff+%26%238212%3B+the+movie&amp;rft.aulast=Sherratt&amp;rft.aufirst=Tim&amp;rft.subject=conference+presentations&amp;rft.subject=digital+humanities&amp;rft.source=discontents&amp;rft.date=2012-01-23&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://discontents.com.au/words/conference-papers/all-about-the-stuff-the-movie&amp;rft.language=English"></span>
Videos from NDF2011 are now available online. Here&#8217;s the movie version of my talk It&#8217;s all about the stuff. I seem to spend a lot of time in the shadows&#8230;]]></description>
			<content:encoded><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=It%26%238217%3Bs+all+about+the+stuff+%26%238212%3B+the+movie&amp;rft.aulast=Sherratt&amp;rft.aufirst=Tim&amp;rft.subject=conference+presentations&amp;rft.subject=digital+humanities&amp;rft.source=discontents&amp;rft.date=2012-01-23&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://discontents.com.au/words/conference-papers/all-about-the-stuff-the-movie&amp;rft.language=English"></span>
<abbr class="unapi-id" title="http://discontents.com.au/?p=1572"><!-- &nbsp; --></abbr>
<p>Videos from NDF2011 are now <a href="http://www.r2.co.nz/20111129/">available online</a>. Here&#8217;s the movie version of my talk <a href="http://discontents.com.au/words/conference-papers/it%e2%80%99s-all-about-the-stuff-collections-interfaces-power-and-people" title="It’s all about the stuff: collections, interfaces, power and people">It&#8217;s all about the stuff</a>. I seem to spend a lot of time in the shadows&#8230;</p>
<p><embed src='http://www.r2.co.nz/20111129/player.swf' height='300' width='533' allowscriptaccess='always' allowfullscreen='true' flashvars="&#038;controlbar=over&#038;file=http%3A%2F%2F2009.r2.co.nz%2F20111129%2Ftim-s.mp4&#038;image=http%3A%2F%2Fwww.r2.co.nz%2F20111129%2Fpreview.jpg&#038;plugins=viral-2d"/></p>
]]></content:encoded>
			<wfw:commentRss>http://discontents.com.au/words/conference-papers/all-about-the-stuff-the-movie/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>QueryPic</title>
		<link>http://discontents.com.au/shed/hacks/querypic</link>
		<comments>http://discontents.com.au/shed/hacks/querypic#comments</comments>
		<pubDate>Sat, 31 Dec 2011 15:08:12 +0000</pubDate>
		<dc:creator>tim</dc:creator>
				<category><![CDATA[digital humanities]]></category>
		<category><![CDATA[hacks]]></category>
		<category><![CDATA[text mining]]></category>
		<category><![CDATA[Trove]]></category>
		<category><![CDATA[visualisation]]></category>

		<guid isPermaLink="false">http://discontents.com.au/?p=1546</guid>
		<description><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=QueryPic&amp;rft.aulast=Sherratt&amp;rft.aufirst=Tim&amp;rft.subject=digital+humanities&amp;rft.subject=hacks&amp;rft.source=discontents&amp;rft.date=2012-01-01&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://discontents.com.au/shed/hacks/querypic&amp;rft.language=English"></span>
Back when I was looking at &#8216;When did the Great War become the First World War?&#8217; I promised a detailed post on how I constructed the graphs. But of course I got distracted. Then I started adding new features to the script and redesigning the graphs, so&#8230; Anyway, the result is a rather neat little [...]]]></description>
			<content:encoded><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=QueryPic&amp;rft.aulast=Sherratt&amp;rft.aufirst=Tim&amp;rft.subject=digital+humanities&amp;rft.subject=hacks&amp;rft.source=discontents&amp;rft.date=2012-01-01&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://discontents.com.au/shed/hacks/querypic&amp;rft.language=English"></span>
<abbr class="unapi-id" title="http://discontents.com.au/?p=1546"><!-- &nbsp; --></abbr>
<p>Back when I was looking at &#8216;<a title="When did the ‘Great War’ become the ‘First World War’?" href="http://discontents.com.au/shed/experiments/when-did-the-great-war-become-the-first-world-war">When did the Great War become the First World War?</a>&#8217; I promised a detailed post on how I constructed the graphs. But of course I got distracted. Then I started adding new features to the script and redesigning the graphs, so&#8230;</p>
<p>Anyway, the result is a rather neat little gizmo henceforth named <a href="http://wraggelabs.com/emporium/trove-tools/newspaper-search-summariser/">QueryPic</a> (I got a bit sick of &#8216;search summariser&#8217; and &#8216;graph-maker thing&#8217;). <a title="Mining the treasures of Trove (part 2)" href="http://discontents.com.au/shed/experiments/mining-the-treasures-of-trove-part-2">The first version</a> just harvested data and left all the graph-making to you. But QueryPic does it all! It harvests the data <em>and</em> makes the graph. Woohoo.</p>
<p>Here&#8217;s an example showing &#8216;drought&#8217; versus &#8216;flood&#8217;:</p>
<p><a href="http://wraggelabs.com/shed/trove/newgraphs/flood_drought.html"><img class="aligncenter size-medium wp-image-1551" title="Screen Shot 2012-01-01 at 1.53.28 AM" src="http://discontents.com.au/wp-content/uploads/2012/01/Screen-Shot-2012-01-01-at-1.53.28-AM-250x166.png" alt="" width="250" height="166" /></a></p>
<h4>QueryPic features</h4>
<ul>
<li>Explore your Trove newspaper query over time in the form of a simple line graph.</li>
<li>Interactive &#8212; click on a point to retrieve sample articles from that date.</li>
<li>Combine data sources to compare queries.</li>
<li>Choose your interval &#8212; plot by year or month.</li>
<li>Switch views between total results and the proportion of all articles.</li>
</ul>
<h4>Running QueryPic</h4>
<p>Yes, it&#8217;s a Python script and yes, it runs on the command line. Let&#8217;s get that out of the way now. I don&#8217;t think I have the time and energy to develop cross-platform GUI versions of all my tools. I&#8217;d rather spend the time adding new features or exploring new possibilities. Sorry, but until I have a wealthy benefactor or a technical support team, I think that&#8217;s the way it has to be. In any case, <a href="https://github.com/wragge/Trove-newspapers">the code is all there</a> &#8211; so build your own GUI!</p>
<p>Actually, if I did have the time and energy I don&#8217;t think I&#8217;d build a standalone GUI anyway. What would be much cooler would be a web service, where people could run, share and combine their queries. Social graph-making! A celebration of serendipity! A historical playground! Hmmm&#8230;</p>
<p>But for now there&#8217;s this Python script. It&#8217;s dead easy to use. Starting from the beginning&#8230;</p>
<ol>
<li>Do you have Python installed? If you have a Mac or Linux the answer is yes. Fire up a terminal and type &#8216;python -V&#8217; &#8212; see, I told you. If you have Windows you can get a <a href="http://www.python.org/getit/windows/">handy installer</a>. Do it.</li>
<li>Get the source code. Just <a href="https://github.com/wragge/Trove-newspapers/zipball/master">download this zip file</a> and unzip it into a new folder.</li>
<li>Open a terminal and cd into the new folder.</li>
<li>Run &#8216;python do_totals.py [your Trove query]&#8217;.</li>
<li>Watch in excitement as the script chugs away retrieving data from Trove.</li>
<li>Once the script is finished, go to the &#8216;graphs&#8217; directory, where you&#8217;ll find your newly-created html page complete with fancy interactive graph.</li>
<li>Open the html page in the web browser of your choice.</li>
<li>Enjoy! Celebrate! Drink a toast in my honour!</li>
</ol>
<h4>Customising QueryPic</h4>
<p>There are a number of optional arguments that you can add to the command line to customise your results:</p>
<p><strong>-n (or --name) [a query name]<br />
</strong>Give a name to your query. The name is used to create filenames for the html and data files, and it also appears in the legend of the graph. The default is to use the search keywords as the name.</p>
<p><strong>-d (or --directory) [a directory path]</strong><br />
The full pathname of the directory/folder for your results. The default is a &#8216;graphs&#8217; sub-directory in the current directory.</p>
<p><strong>-g (or --graph) [a graph name]</strong><br />
Specify the name of the html file that&#8217;s created. This is useful for displaying multiple queries on a single graph. Just run QueryPic for each query, using the same graph name each time. The default is either the value specified by the -n parameter or a name derived from the search keywords.</p>
<p><strong>-m (or --monthly)</strong><br />
Plot the query at monthly intervals. The default interval is a year.</p>
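<p>For anyone who finds options easier to read as code, here is a hypothetical sketch of the same command-line interface using Python&#8217;s standard <code>argparse</code> module. It is not taken from do_totals.py: the long option names (--name, --directory, --graph, --monthly) are assumed to be the double-hyphen forms of the options listed above, and the help text is invented for illustration.</p>
<pre class="brush: python; gutter: false">import argparse


def build_parser():
    """Build a parser that mirrors the options described in this post."""
    parser = argparse.ArgumentParser(
        description='Graph a Trove newspaper query over time.')
    parser.add_argument('query', help='url of a Trove newspaper search')
    parser.add_argument('-n', '--name',
                        help='label for the query (default: the search keywords)')
    parser.add_argument('-d', '--directory', default='graphs',
                        help='folder for the results')
    parser.add_argument('-g', '--graph',
                        help='name of the html file to create or add to')
    parser.add_argument('-m', '--monthly', action='store_true',
                        help='plot by month rather than by year')
    return parser


# Parse an explicit argument list instead of the real command line
args = build_parser().parse_args(
    ['http://trove.nla.gov.au/newspaper/result?q=cat', '-g', 'animals', '-m'])
print(args.graph, args.monthly)</pre>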
<h4>What QueryPic actually does</h4>
<p>QueryPic builds a simple visualisation of your search query in the Trove newspaper database. A list of search results is difficult to interpret and offers little context. QueryPic shows you the number of articles matching your query over time, enabling you to reframe your questions, pursue hunches, or simply play around.</p>
<p>QueryPic takes your Trove newspaper query and looks for a date range. If it doesn&#8217;t find one, it assumes you want your graph to go from 1803 to 1954 (the complete contents of the newspaper database &#8212; except for the Women&#8217;s Weekly). QueryPic then strips out any date parameters from the query, so it can fire off the query within the start and end dates, at the specified date interval.</p>
<p>Date interval? In the previous version of this script you could only plot points at yearly intervals, so it was impossible to zoom in and see what might be happening over the span of a single year or two. But amazing advances in QueryPic technology mean you can now plot changes <em>by month</em>. Here, for example, is a new version of my Great War/First World War graph, focused on 1938&#8211;1946 and plotted at monthly intervals.</p>
<p><a href="http://wraggelabs.com/shed/trove/newgraphs/great_war_1938_46.html"><img class="aligncenter size-medium wp-image-1552" title="Screen Shot 2012-01-01 at 1.55.22 AM" src="http://discontents.com.au/wp-content/uploads/2012/01/Screen-Shot-2012-01-01-at-1.55.22-AM-250x166.png" alt="" width="250" height="166" /></a></p>
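<p>The query handling described above (find a date range, fall back to 1803 to 1954, strip the date parameters, then step through the range) could be sketched like this. It is a minimal Python 3 illustration rather than QueryPic&#8217;s actual code: the function names are made up, and it assumes that <code>fromyyyy</code> and <code>toyyyy</code>, as seen in the example urls in this post, are the only date parameters that need handling.</p>
<pre class="brush: python; gutter: false">from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

DEFAULT_START, DEFAULT_END = 1803, 1954
DATE_PARAMS = ('fromyyyy', 'toyyyy')  # assumed to be the only date parameters


def split_query(url):
    """Return (url without date parameters, start year, end year)."""
    parts = urlsplit(url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    values = dict(params)
    start = int(values.get('fromyyyy') or DEFAULT_START)
    end = int(values.get('toyyyy') or DEFAULT_END)
    kept = [(key, value) for key, value in params if key not in DATE_PARAMS]
    clean = urlunsplit(parts._replace(query=urlencode(kept)))
    return clean, start, end


def intervals(start, end, monthly=False):
    """Yield (year, month) pairs; month is None when plotting by year."""
    for year in range(start, end + 1):
        if monthly:
            for month in range(1, 13):
                yield year, month
        else:
            yield year, None


url = 'http://trove.nla.gov.au/newspaper/result?' + urlencode(
    {'q': 'cat', 'fromyyyy': 1920, 'toyyyy': 1921})
print(split_query(url))
print(len(list(intervals(1920, 1921, monthly=True))))</pre>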
<p>So for each interval within the date range QueryPic fires off a request to Trove. From the response it scrapes out the total number of results for that date. If the total is greater than zero, it then fires off a second request to find the total number of newspaper articles for that year. Your query results divided by the total number of articles gives the proportion of articles for that date matching your search query.</p>
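<p>The arithmetic is simple enough to sketch in a few lines of Python 3. This is an illustration only: <code>count_matches</code> and <code>count_all</code> are hypothetical stand-ins for the two Trove requests, supplied by the caller, and the dictionary keys are invented.</p>
<pre class="brush: python; gutter: false">def summarise(intervals, count_matches, count_all):
    """Return one dict per interval with the match count and its proportion."""
    results = []
    for interval in intervals:
        matches = count_matches(interval)
        proportion = 0.0
        if matches:
            # Only make the second request when something matched
            total = count_all(interval)
            if total:
                proportion = matches / total
        results.append(
            {'interval': interval, 'total': matches, 'proportion': proportion})
    return results


# Made-up numbers standing in for responses from Trove
matches = {1914: 120, 1915: 0}
everything = {1914: 60000, 1915: 65000}
print(summarise([1914, 1915], matches.get, everything.get))</pre>
<p>Passing the two lookups in as functions keeps the calculation separate from the scraping, so it can be tried out with invented numbers like those above.</p>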
<p>The number of results and the proportion are written to a javascript file, together with some other important information including the original query and the date the harvest was performed. Remember, the Trove newspapers database is always changing! QueryPic then grabs a copy of its own special html template and inserts a reference to this javascript file. For good measure, it also inserts a link to your original query. The file is saved under a new name, ready for you to open and explore.</p>
<p>The html file contains everything necessary to take your data and turn it into a graph. It does this using the HighCharts javascript library. Please note that while licence conditions allow HighCharts to be redistributed as part of a non-commercial package, it is not free for commercial use. Check the <a href="http://www.highcharts.com/">HighCharts website</a> for details.</p>
<h4>Some examples</h4>
<p>Plot &#8216;cat&#8217; against &#8216;dog&#8217; in a graph called &#8216;animals&#8217;:</p>
<pre class="brush: bash; gutter: false">python do_totals.py &quot;http://trove.nla.gov.au/newspaper/result?q=cat&quot; -g &quot;animals&quot;
python do_totals.py &quot;http://trove.nla.gov.au/newspaper/result?q=dog&quot; -g &quot;animals&quot;</pre>
<p>Specify a directory for your results:</p>
<pre class="brush: bash; gutter: false">python do_totals.py &quot;http://trove.nla.gov.au/newspaper/result?q=cat&quot; -d &quot;/Users/bill/Documents/graphs&quot;</pre>
<p>Plot results at monthly intervals:</p>
<pre class="brush: bash; gutter: false">python do_totals.py &quot;http://trove.nla.gov.au/newspaper/result?q=cat&amp;fromyyyy=1920&amp;toyyyy=1921&quot; -m</pre>
<p>Specify a name:</p>
<pre class="brush: bash; gutter: false">python do_totals.py &quot;http://trove.nla.gov.au/newspaper/result?q=cat&quot; -n &quot;Felines&quot;</pre>
]]></content:encoded>
			<wfw:commentRss>http://discontents.com.au/shed/hacks/querypic/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>

