<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>discontents &#187; shoebox</title>
	<atom:link href="http://discontents.com.au/sections/shoebox/feed" rel="self" type="application/rss+xml" />
	<link>http://discontents.com.au</link>
	<description>working for the triumph of content over form, ideas over control, people over systems</description>
	<lastBuildDate>Tue, 24 Jan 2012 20:57:11 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3</generator>
		<item>
		<title>2011 &#8212; the year of little sleep</title>
		<link>http://discontents.com.au/words/conference-papers/2011-the-year-of-little-sleep</link>
		<comments>http://discontents.com.au/words/conference-papers/2011-the-year-of-little-sleep#comments</comments>
		<pubDate>Tue, 24 Jan 2012 12:56:50 +0000</pubDate>
		<dc:creator>tim</dc:creator>
				<category><![CDATA[conference presentations]]></category>
		<category><![CDATA[digital humanities]]></category>

		<guid isPermaLink="false">http://discontents.com.au/?p=1580</guid>
		<description><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=2011+%26%238212%3B+the+year+of+little+sleep&amp;rft.aulast=Sherratt&amp;rft.aufirst=Tim&amp;rft.subject=conference+presentations&amp;rft.subject=digital+humanities&amp;rft.source=discontents&amp;rft.date=2012-01-24&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://discontents.com.au/words/conference-papers/2011-the-year-of-little-sleep&amp;rft.language=English"></span>
2011 was a busy year. It&#8217;s hard to believe that it was only February when I first posted about my experiments mining the contents of the Trove newspaper database. Since then I&#8217;ve developed a set of digital tools, organised THATCamp Canberra, given a series of presentations on the possibilities of digital history, pushed ahead with [...]]]></description>
			<content:encoded><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=2011+%26%238212%3B+the+year+of+little+sleep&amp;rft.aulast=Sherratt&amp;rft.aufirst=Tim&amp;rft.subject=conference+presentations&amp;rft.subject=digital+humanities&amp;rft.source=discontents&amp;rft.date=2012-01-24&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://discontents.com.au/words/conference-papers/2011-the-year-of-little-sleep&amp;rft.language=English"></span>
<abbr class="unapi-id" title="http://discontents.com.au/?p=1580"><!-- &nbsp; --></abbr>
<p>2011 was a busy year. It&#8217;s hard to believe that it was only February when I first posted about my experiments <a title="Mining the treasures of Trove (part 1)" href="http://discontents.com.au/shed/mining-the-treasures-of-trove-part-1">mining the contents</a> of the Trove newspaper database. Since then I&#8217;ve developed a set of <a href="http://wraggelabs.com/emporium/">digital tools</a>, organised <a href="http://thatcampcanberra.org">THATCamp Canberra</a>, given a series of presentations on the possibilities of digital history, pushed ahead with <a href="http://invisibleaustralians.org">Invisible Australians</a>, and tried to develop my own digital research program. Oh yes, and endeavoured to earn enough money to feed the kids and pay the mortgage&#8230;</p>
<p>It looks like 2012 could be even busier, so before I lose track completely, I thought I&#8217;d pull together some of the past year&#8217;s exploits for handy reference. So here are (most of) my presentations for 2011&#8230;</p>
<p><strong>8 June 2011 &#8212; &#8216;Confessions of an impatient historian&#8217;<br />
</strong><a href="http://www.scholarslab.org/">Scholars&#8217; Lab</a>, University of Virginia</p>
<ul>
<li><a href="http://www.slideshare.net/wragge/confessionspdf">slides</a></li>
<li><a href="http://www.scholarslab.org/podcasts/tim-sherratt-confessions-of-an-impatient-historian/">podcast</a></li>
</ul>
<p><strong>18 August 2011 &#8212; &#8216;Digital history: new tools and techniques&#8217;<br />
</strong>National Museum of Australia</p>
<ul>
<li><a href="https://www.zotero.org/groups/digital_history_at_nma_august_2011/items">links in Zotero</a></li>
</ul>
<p><strong>24 August 2011 &#8212; &#8216;Hacking the archives&#8217;<br />
</strong><a href="http://recordkeepingroundtable.org/2011/07/21/archival-description-in-an-online-world/">Archival description in an online world</a>, Recordkeeping Roundtable, Sydney</p>
<ul>
<li><a href="http://recordkeepingroundtable.org/2011/09/02/report-on-hacking-the-archives-archival-description-in-an-online-world/">report</a></li>
</ul>
<p><strong>5 September 2011 &#8212; Digital research methods</strong><br />
Cultural heritage students, University of Canberra</p>
<ul>
<li><a href="https://www.zotero.org/groups/university_of_canberra_-_cultural_heritage_-_digital_research_methods/items">links in Zotero</a></li>
</ul>
<p><strong>14 September 2011 &#8212; &#8216;Every story has a beginning&#8217;<br />
</strong>Keynote presentation at the <a href="http://www.anzsi.org/site/2011confprog.asp">Indexing See Change</a> Conference (Australian and New Zealand Society of Indexers)</p>
<ul>
<li><a href="http://discontents.com.au/shoebox/every-story-has-a-beginning">full text</a></li>
<li><a href="http://wraggelabs.com/shed/presentations/anzsi/">presentation</a></li>
</ul>
<p><strong>13 November 2011 &#8212; &#8216;Digital history: new tools and techniques&#8217;<br />
</strong><a href="http://dragontails.com.au/">Dragontails 2011</a>: 2nd Australasian conference on overseas Chinese history &amp; heritage, Museum of Chinese Australian History, Melbourne</p>
<ul>
<li><a href="http://www.slideshare.net/wragge/digital-history-new-tools-and-techniques">slides</a></li>
</ul>
<p><strong>30 November 2011 &#8212; &#8216;It&#8217;s all about the stuff&#8217;<br />
</strong><a href="http://ndf.natlib.govt.nz/about/2011-conference.htm">National Digital Forum</a>, Wellington, New Zealand</p>
<ul>
<li><a href="http://discontents.com.au/words/conference-papers/it%e2%80%99s-all-about-the-stuff-collections-interfaces-power-and-people">full text</a></li>
<li><a href="http://discontents.com.au/words/conference-papers/all-about-the-stuff-the-movie">video</a></li>
</ul>
<p><strong>7 December 2011 &#8212; &#8216;An introduction to digital history&#8217;</strong><br />
<a href="http://www.sl.nsw.gov.au/services/public_libraries/professional_development_events/events/digital_december.html">Digital December</a>, State Library of NSW</p>
<ul>
<li><a href="https://docs.google.com/document/d/1wR9-S8QLEUxnnWYC71O7PT_UspGnEWqTCRd17WtHJ1E/edit">links in Google Docs</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://discontents.com.au/words/conference-papers/2011-the-year-of-little-sleep/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>It&#8217;s all about the stuff &#8212; the movie</title>
		<link>http://discontents.com.au/words/conference-papers/all-about-the-stuff-the-movie</link>
		<comments>http://discontents.com.au/words/conference-papers/all-about-the-stuff-the-movie#comments</comments>
		<pubDate>Mon, 23 Jan 2012 11:42:07 +0000</pubDate>
		<dc:creator>tim</dc:creator>
				<category><![CDATA[conference presentations]]></category>
		<category><![CDATA[digital humanities]]></category>
		<category><![CDATA[hacking]]></category>
		<category><![CDATA[invisibleaustralians]]></category>
		<category><![CDATA[Trove]]></category>

		<guid isPermaLink="false">http://discontents.com.au/?p=1572</guid>
		<description><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=It%26%238217%3Bs+all+about+the+stuff+%26%238212%3B+the+movie&amp;rft.aulast=Sherratt&amp;rft.aufirst=Tim&amp;rft.subject=conference+presentations&amp;rft.subject=digital+humanities&amp;rft.source=discontents&amp;rft.date=2012-01-23&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://discontents.com.au/words/conference-papers/all-about-the-stuff-the-movie&amp;rft.language=English"></span>
Videos from NDF2011 are now available online. Here&#8217;s the movie version of my talk It&#8217;s all about the stuff. I seem to spend a lot of time in the shadows&#8230;]]></description>
			<content:encoded><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=It%26%238217%3Bs+all+about+the+stuff+%26%238212%3B+the+movie&amp;rft.aulast=Sherratt&amp;rft.aufirst=Tim&amp;rft.subject=conference+presentations&amp;rft.subject=digital+humanities&amp;rft.source=discontents&amp;rft.date=2012-01-23&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://discontents.com.au/words/conference-papers/all-about-the-stuff-the-movie&amp;rft.language=English"></span>
<abbr class="unapi-id" title="http://discontents.com.au/?p=1572"><!-- &nbsp; --></abbr>
<p>Videos from NDF2011 are now <a href="http://www.r2.co.nz/20111129/">available online</a>. Here&#8217;s the movie version of my talk <a href="http://discontents.com.au/words/conference-papers/it%e2%80%99s-all-about-the-stuff-collections-interfaces-power-and-people" title="It’s all about the stuff: collections, interfaces, power and people">It&#8217;s all about the stuff</a>. I seem to spend a lot of time in the shadows&#8230;</p>
<p><embed src='http://www.r2.co.nz/20111129/player.swf' height='300' width='533' allowscriptaccess='always' allowfullscreen='true' flashvars="&#038;controlbar=over&#038;file=http%3A%2F%2F2009.r2.co.nz%2F20111129%2Ftim-s.mp4&#038;image=http%3A%2F%2Fwww.r2.co.nz%2F20111129%2Fpreview.jpg&#038;plugins=viral-2d"/></p>
]]></content:encoded>
			<wfw:commentRss>http://discontents.com.au/words/conference-papers/all-about-the-stuff-the-movie/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>QueryPic</title>
		<link>http://discontents.com.au/shed/hacks/querypic</link>
		<comments>http://discontents.com.au/shed/hacks/querypic#comments</comments>
		<pubDate>Sat, 31 Dec 2011 15:08:12 +0000</pubDate>
		<dc:creator>tim</dc:creator>
				<category><![CDATA[digital humanities]]></category>
		<category><![CDATA[hacks]]></category>
		<category><![CDATA[text mining]]></category>
		<category><![CDATA[Trove]]></category>
		<category><![CDATA[visualisation]]></category>

		<guid isPermaLink="false">http://discontents.com.au/?p=1546</guid>
		<description><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=QueryPic&amp;rft.aulast=Sherratt&amp;rft.aufirst=Tim&amp;rft.subject=digital+humanities&amp;rft.subject=hacks&amp;rft.source=discontents&amp;rft.date=2012-01-01&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://discontents.com.au/shed/hacks/querypic&amp;rft.language=English"></span>
Back when I was looking at &#8216;When did the Great War become the First World War?&#8217; I promised a detailed post on how I constructed the graphs. But of course I got distracted. Then I started adding new features to the script and redesigning the graphs, so&#8230; Anyway, the result is a rather neat little [...]]]></description>
			<content:encoded><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=QueryPic&amp;rft.aulast=Sherratt&amp;rft.aufirst=Tim&amp;rft.subject=digital+humanities&amp;rft.subject=hacks&amp;rft.source=discontents&amp;rft.date=2012-01-01&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://discontents.com.au/shed/hacks/querypic&amp;rft.language=English"></span>
<abbr class="unapi-id" title="http://discontents.com.au/?p=1546"><!-- &nbsp; --></abbr>
<p>Back when I was looking at &#8216;<a title="When did the ‘Great War’ become the ‘First World War’?" href="http://discontents.com.au/shed/experiments/when-did-the-great-war-become-the-first-world-war">When did the Great War become the First World War?</a>&#8217; I promised a detailed post on how I constructed the graphs. But of course I got distracted. Then I started adding new features to the script and redesigning the graphs, so&#8230;</p>
<p>Anyway, the result is a rather neat little gizmo henceforth named <a href="http://wraggelabs.com/emporium/trove-tools/newspaper-search-summariser/">QueryPic</a> (I got a bit sick of &#8216;search summariser&#8217; and &#8216;graph-maker thing&#8217;). <a title="Mining the treasures of Trove (part 2)" href="http://discontents.com.au/shed/experiments/mining-the-treasures-of-trove-part-2">The first version</a> just harvested data and left all the graph-making to you. But QueryPic does it all! It harvests the data <em>and</em> makes the graph. Woohoo.</p>
<p>Here&#8217;s an example showing &#8216;drought&#8217; versus &#8216;flood&#8217;:</p>
<p><a href="http://wraggelabs.com/shed/trove/newgraphs/flood_drought.html"><img class="aligncenter size-medium wp-image-1551" title="Screen Shot 2012-01-01 at 1.53.28 AM" src="http://discontents.com.au/wp-content/uploads/2012/01/Screen-Shot-2012-01-01-at-1.53.28-AM-250x166.png" alt="" width="250" height="166" /></a></p>
<h4>QueryPic features</h4>
<ul>
<li>Explore your Trove newspaper query over time in the form of a simple line graph.</li>
<li>Interactive &#8212; click on a point to retrieve sample articles from that date.</li>
<li>Combine data sources to compare queries.</li>
<li>Choose your interval &#8212; plot by year or month.</li>
<li>Switch views between total results and the proportion of all articles.</li>
</ul>
<h4>Running QueryPic</h4>
<p>Yes, it&#8217;s a Python script and yes it runs on the command line. Let&#8217;s get that out of the way now. I don&#8217;t think I have the time and energy to develop cross-platform gui versions of all my tools. I&#8217;d rather spend the time adding new features or exploring new possibilities. Sorry, but until I have a wealthy benefactor or a technical support team, I think that&#8217;s the way it has to be. In any case, <a href="https://github.com/wragge/Trove-newspapers">the code is all there </a>&#8211; so build your own gui!</p>
<p>Actually, if I did have the time and energy I don&#8217;t think I&#8217;d build a standalone gui anyway. What would be much cooler would be a web service, where people could run, share and combine their queries. Social graph-making! A celebration of serendipity! A historical playground! Hmmm&#8230;</p>
<p>But for now there&#8217;s this python script. It&#8217;s dead easy to use. Starting from the beginning&#8230;</p>
<ol>
<li>Do you have Python installed? If you have a Mac or Linux the answer is yes. Fire up a terminal and type &#8216;python -V&#8217; &#8212; see, I told you. If you have Windows you can get a <a href="http://www.python.org/getit/windows/">handy installer</a>. Do it.</li>
<li>Get the source code. Just <a href="https://github.com/wragge/Trove-newspapers/zipball/master">download this zip file</a> and unzip it into a new folder.</li>
<li>Open a terminal and cd into the new folder.</li>
<li>Run &#8216;python do_totals.py [your Trove query]&#8217;.</li>
<li>Watch in excitement as the script chugs away retrieving data from Trove.</li>
<li>Once the script is finished, go to the &#8216;graphs&#8217; directory, where you&#8217;ll find your newly-created html page complete with fancy interactive graph.</li>
<li>Open the html page in the web browser of your choice.</li>
<li>Enjoy! Celebrate! Drink a toast in my honour!</li>
</ol>
<h4>Customising QueryPic</h4>
<p>There are a number of optional arguments that you add to the command line to customise your results:</p>
<p><strong>-n (or --name) [a query name]<br />
</strong>Give a name to your query. The name is used to create filenames for the html and data files; it is also used in the legend of the graph. The default is to use the search keywords as the name.</p>
<p><strong>-d (or --directory) [a directory path]</strong><br />
The full pathname of the directory/folder for your results. The default is a &#8216;graphs&#8217; sub-directory in the current directory.</p>
<p><strong>-g (or --graph) [a graph name]</strong><br />
Specify the name of the html file that&#8217;s created. This is useful for displaying multiple queries on a single graph. Just run QueryPic for each query, using the same graph name each time. The default is either the value specified by the -n parameter or a name derived from the search keywords.</p>
<p><strong>-m (or --monthly)</strong><br />
Plot the query at monthly intervals. The default interval is a year.</p>
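<p>For anyone building their own version, here is a minimal sketch of how options like these could be parsed with Python&#8217;s argparse module. It is an illustration only, not the code in do_totals.py: the function name build_parser and the help text are made up, and only the option names and defaults come from the descriptions above.</p>
<pre class="brush: python">
import argparse

def build_parser():
    # Illustrative parser mirroring the options described above
    parser = argparse.ArgumentParser(description='Graph a Trove newspaper query over time.')
    parser.add_argument('query', help='the url of a Trove newspaper search')
    parser.add_argument('-n', '--name', help='name for the query (defaults to the search keywords)')
    parser.add_argument('-d', '--directory', default='graphs', help='directory for the results')
    parser.add_argument('-g', '--graph', help='name of the html file to create or add to')
    parser.add_argument('-m', '--monthly', action='store_true', help='plot at monthly intervals')
    return parser
</pre>
<p>Calling build_parser().parse_args() then gives an object with query, name, directory, graph and monthly attributes.</p>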
<h4>What QueryPic actually does</h4>
<p>QueryPic builds a simple visualisation of your search query in the Trove newspaper database. A list of search results is difficult to interpret and offers little context. QueryPic shows you the number of articles matching your query over time, enabling you to reframe your questions, pursue hunches, or simply play around.</p>
<p>QueryPic takes your Trove newspaper query and looks for a date range. If it doesn&#8217;t find one, it assumes you want your graph to go from 1803 to 1954 (the complete contents of the newspaper database &#8212; except for the Women&#8217;s Weekly). QueryPic then strips out any date parameters from the query, so it can fire off the query within the start and end dates, at the specified date interval.</p>
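<p>The date handling described above could be sketched roughly like this in Python 3. This is a guess at the logic rather than the actual QueryPic code: the function name split_date_range is invented, and it assumes the only date parameters are the fromyyyy and toyyyy ones that appear in the example urls.</p>
<pre class="brush: python">
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

DEFAULT_START = 1803
DEFAULT_END = 1954
# Assumption: these are the only date parameters that need stripping
DATE_PARAMS = ('fromyyyy', 'toyyyy')

def split_date_range(query_url):
    parts = urlsplit(query_url)
    params = parse_qsl(parts.query)
    found = dict((key, value) for key, value in params if key in DATE_PARAMS)
    # Fall back to the full span of the newspaper database
    start = int(found.get('fromyyyy', DEFAULT_START))
    end = int(found.get('toyyyy', DEFAULT_END))
    # Rebuild the url without the date parameters
    remaining = [(key, value) for key, value in params if key not in DATE_PARAMS]
    stripped = urlunsplit(parts._replace(query=urlencode(remaining)))
    return stripped, start, end
</pre>
<p>The function returns the stripped url together with the start and end years, ready to be requested once per interval.</p>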
<p>Date interval? In the previous version of this script you could only plot points at yearly intervals, so it was impossible to zoom in and see what might be happening over the span of a single year or two. But amazing advances in QueryPic technology mean you can now plot changes <em>by month</em>. Here for example is a new version of my Great War/First World War graph, focused on 1938&#8211;1946 and plotted at monthly intervals.</p>
<p><a href="http://wraggelabs.com/shed/trove/newgraphs/great_war_1938_46.html"><img class="aligncenter size-medium wp-image-1552" title="Screen Shot 2012-01-01 at 1.55.22 AM" src="http://discontents.com.au/wp-content/uploads/2012/01/Screen-Shot-2012-01-01-at-1.55.22-AM-250x166.png" alt="" width="250" height="166" /></a></p>
<p>So for each interval within the date range QueryPic fires off a request to Trove. From the response it scrapes out the total number of results for that date. If the total is greater than zero, it then fires off a second request to find the total number of newspaper articles for that year. Your query results divided by the total number of articles gives the proportion of articles for that date matching your search query.</p>
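<p>Leaving the requests to Trove aside, the arithmetic of that loop might look something like the following sketch. The function names intervals and proportion are made up for illustration, and the real script may organise things differently.</p>
<pre class="brush: python">
def intervals(start_year, end_year, monthly=False):
    # Yield (year, month) pairs; month is None when plotting by year
    for year in range(start_year, end_year + 1):
        if monthly:
            for month in range(1, 13):
                yield (year, month)
        else:
            yield (year, None)

def proportion(matches, total_articles):
    # No matches (or no articles at all) means there is nothing to divide
    if matches == 0 or total_articles == 0:
        return 0.0
    return matches / float(total_articles)
</pre>
<p>So 5 matching articles out of 200 published in an interval gives a proportion of 0.025.</p>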
<p>The number of results and the proportion are written to a javascript file, together with some other important information including the original query and the date the harvest was performed. Remember, the Trove newspapers database is always changing! QueryPic then grabs a copy of its own special html template and inserts a reference to this javascript file. For good measure, it also inserts a link to your original query. The file is saved under a new name, ready for you to open and explore.</p>
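<p>One simple way to produce a javascript data file like this from Python is to dump the values as json and assign them to a variable. The sketch below shows the general idea only; the variable name querypic_data, the function name and the layout of the rows are all invented here and will not match the files QueryPic actually writes.</p>
<pre class="brush: python">
import datetime
import json

def build_data_script(name, query, points, harvested=None):
    # points is a list of [date label, total results, proportion] rows (a made-up layout)
    harvested = harvested or datetime.date.today()
    payload = {
        'name': name,
        'query': query,
        'harvested': harvested.isoformat(),
        'data': points,
    }
    # Assigning to a variable lets an html page load the data with a script tag
    return 'var querypic_data = ' + json.dumps(payload, sort_keys=True) + ';'
</pre>
<p>Recording the harvest date alongside the data matters because the same query run a month later may return different totals.</p>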
<p>The html file contains everything necessary to take your data and turn it into a graph. It does this using the HighCharts javascript library. Please note that while licence conditions allow HighCharts to be redistributed as part of a non-commercial package, it is not free for commercial use. Check the <a href="http://www.highcharts.com/">HighCharts website</a> for details.</p>
<h4>Some examples</h4>
<p>Plot &#8216;cat&#8217; against &#8216;dog&#8217; in a graph called &#8216;animals&#8217;:</p>
<pre class="brush: bash; gutter: false">python do_totals.py &quot;http://trove.nla.gov.au/newspaper/result?q=cat&quot; -g &quot;animals&quot;
python do_totals.py &quot;http://trove.nla.gov.au/newspaper/result?q=dog&quot; -g &quot;animals&quot;</pre>
<p>Specify a directory for your results:</p>
<pre class="brush: bash; gutter: false">python do_totals.py &quot;http://trove.nla.gov.au/newspaper/result?q=cat&quot; -d &quot;/Users/bill/Documents/graphs&quot;</pre>
<p>Plot results at monthly intervals:</p>
<pre class="brush: bash; gutter: false">python do_totals.py &quot;http://trove.nla.gov.au/newspaper/result?q=cat&amp;fromyyyy=1920&amp;toyyyy=1921&quot; -m</pre>
<p>Specify a name:</p>
<pre class="brush: bash; gutter: false">python do_totals.py &quot;http://trove.nla.gov.au/newspaper/result?q=cat&quot; -n &quot;Felines&quot;</pre>
]]></content:encoded>
			<wfw:commentRss>http://discontents.com.au/shed/hacks/querypic/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Extracting editorials #2</title>
		<link>http://discontents.com.au/shed/hacks/extracting-editorials-2</link>
		<comments>http://discontents.com.au/shed/hacks/extracting-editorials-2#comments</comments>
		<pubDate>Mon, 19 Dec 2011 13:18:49 +0000</pubDate>
		<dc:creator>tim</dc:creator>
				<category><![CDATA[digital humanities]]></category>
		<category><![CDATA[experiments]]></category>
		<category><![CDATA[hacks]]></category>
		<category><![CDATA[1913editorials]]></category>
		<category><![CDATA[text mining]]></category>
		<category><![CDATA[Trove]]></category>

		<guid isPermaLink="false">http://discontents.com.au/?p=1515</guid>
		<description><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Extracting+editorials+%232&amp;rft.aulast=Sherratt&amp;rft.aufirst=Tim&amp;rft.subject=digital+humanities&amp;rft.subject=experiments&amp;rft.subject=hacks&amp;rft.source=discontents&amp;rft.date=2011-12-19&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://discontents.com.au/shed/hacks/extracting-editorials-2&amp;rft.language=English"></span>
As I explained in the first of this series, I&#8217;m documenting my efforts to extract every editorial published in the Sydney Morning Herald in 1913 from the Trove newspaper database. It&#8217;s an experiment both in text mining and historical writing &#8212; an attempt to put the method up front. While I didn&#8217;t think there was anything [...]]]></description>
			<content:encoded><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Extracting+editorials+%232&amp;rft.aulast=Sherratt&amp;rft.aufirst=Tim&amp;rft.subject=digital+humanities&amp;rft.subject=experiments&amp;rft.subject=hacks&amp;rft.source=discontents&amp;rft.date=2011-12-19&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://discontents.com.au/shed/hacks/extracting-editorials-2&amp;rft.language=English"></span>
<abbr class="unapi-id" title="http://discontents.com.au/?p=1515"><!-- &nbsp; --></abbr>
<p>As I explained in <a title="Extracting editorials #1" href="http://discontents.com.au/shoebox/digital-humanities/extracting-editorials-1">the first of this series</a>, I&#8217;m documenting my efforts to extract every editorial published in the <em>Sydney Morning Herald</em> in 1913 from the Trove newspaper database. It&#8217;s an experiment both in text mining and historical writing &#8212; an attempt to put the method up front.</p>
<p>While I didn&#8217;t think there was anything very thrilling in the first instalment, recording my thoughts and assumptions in this way has already proved useful. In a comment, <a href="http://discontents.com.au/shoebox/digital-humanities/extracting-editorials-1#comment-2371">Owen Stephens noted</a> that his attempt to reproduce my search query produced fewer results. After a little bit of poking around I realised that the fulltext modifier, which I often use to switch off fuzzy matching, counteracts the &#8216;search headings only&#8217; flag. So my query was returning results that had the string &#8216;The Sydney Morning Herald&#8217; anywhere in the article.</p>
<p>Try it for yourself.</p>
<p><a href="http://trove.nla.gov.au/newspaper/result?l-textSearchScope=headings+only%7Cscope%3Aheadings&amp;l-title=The+Sydney+Morning+Herald...%7Ctitleid%3A35&amp;l-word=*ignore*%7C*ignore*&amp;fromyyyy=1913&amp;toyyyy=1913&amp;sortby=dateAsc&amp;q=fulltext%3A%22The+Sydney+Morning+Herald%22&amp;l-category=Article%7Ccategory%3AArticle&amp;s=0">Here&#8217;s my original query</a> &#8212; searching for fulltext:&#8221;The Sydney Morning Herald&#8221; in headings only (supposedly). You&#8217;ll notice that it returns 335 results and it&#8217;s clear from a quick scan that a number are false positives (they don&#8217;t follow the pattern for editorials).</p>
<p><a href="http://trove.nla.gov.au/newspaper/result?l-textSearchScope=headings+only%7Cscope%3Aheadings&amp;l-title=The+Sydney+Morning+Herald...%7Ctitleid%3A35&amp;l-word=*ignore*%7C*ignore*&amp;fromyyyy=1913&amp;toyyyy=1913&amp;sortby=dateAsc&amp;l-category=Article%7Ccategory%3AArticle&amp;q=%22The+Sydney+Morning+Herald%22">Here&#8217;s Owen&#8217;s query</a> &#8212; searching for &#8220;The Sydney Morning Herald&#8221; in headings only. It returns 294 results, without any obvious false positives.</p>
<p>So my attempt to disable fuzzy matching actually produced a less accurate result! Weird.</p>
<p>Actually, I think one important benefit of this sort of text mining is that it helps you understand how the search engines you&#8217;re using actually work. Once you start poking and prodding, the idiosyncrasies start to emerge.</p>
<p>Anyway, I harvested Owen&#8217;s cleaner result set and opened up the resulting csv file. As it seemed in Trove, there were very few false positives. Indeed there were only two articles that didn&#8217;t seem to follow the standard editorial format, and these were notes added to the editorial page. On the other hand, there were obviously about 20 editorials missing. I could have manually worked through the csv file to identify the missing dates, but I thought I&#8217;d try to create some tools that would do the work for me.</p>
<p>What I wanted was the details of the first editorial in every edition of the newspaper in 1913 &#8212; so there should be one, and only one, article for each day on which the newspaper was published. I needed a tool that would analyse the csv file and do two things:</p>
<ul>
<li>identify dates that occur multiple times (false positive alert!)</li>
<li>identify dates that are absent from the result set (missing in action!)</li>
</ul>
<p>The resulting code is <a href="https://github.com/wragge/Trove-newspapers">all on GitHub</a> if you want to follow along. I wrote a Python script that opens up the csv file, extracts all the date strings, converts them to datetime objects and then saves them to a list. Once that&#8217;s done it&#8217;s pretty easy to loop through and find duplicates:</p>
<pre class="brush: python">
def find_duplicates(items):
    &#039;&#039;&#039;
    Check a list for duplicate values.
    Returns a list of the duplicates.
    &#039;&#039;&#039;
    seen = set()
    duplicates = []
    for item in items:
        if item in seen:
            # Already encountered, so record it as a duplicate
            duplicates.append(item)
        seen.add(item)
    return duplicates
</pre>
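<p>The csv-reading step mentioned above is not shown, but in Python 3 it could be sketched along these lines. The column name &#8216;date&#8217; and the date format are guesses, so adjust them to match your own harvested file. The sketch keeps plain dates rather than full datetimes so that they compare cleanly with datetime.date values.</p>
<pre class="brush: python">
import csv
import datetime

def read_article_dates(csv_path, column='date', date_format='%Y-%m-%d'):
    # column and date_format are assumptions about the layout of the csv file
    dates = []
    with open(csv_path, newline='') as csv_file:
        for row in csv.DictReader(csv_file):
            parsed = datetime.datetime.strptime(row[column], date_format)
            dates.append(parsed.date())
    # Sorting means the first and last items are the earliest and latest dates
    return sorted(dates)
</pre>
<p>The list that comes back can be passed straight to find_duplicates.</p>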
<p>Finding missing dates was a little more complicated, but Google came to the rescue with some handy code samples. All I had to do was set a start and end date (in this case 1 January 1913 and 31 December 1913) and create a timedelta object equal to a day. Then it&#8217;s just a matter of adding the timedelta to the start date, comparing the new date to the dates extracted from the csv file, and continuing on until you hit the end. If the new date isn&#8217;t in the csv file, then it gets added to the missing list.</p>
<pre class="brush: python">
import datetime

# Assumes year, article_dates (in date order), exclude and missing_dates
# have been defined earlier in the script.
if year:
    start_date = datetime.date(year, 1, 1)
    end_date = datetime.date(year, 12, 31)
else:
    start_date = article_dates[0]
    end_date = article_dates[-1]
one_day = datetime.timedelta(days=1)
this_day = start_date
# Loop through each day in specified period to see if there&#039;s an article
# If not, add to the missing_dates list.
while this_day &lt;= end_date:
    if this_day.weekday() not in exclude: # skip excluded weekdays, e.g. Sunday
        if this_day not in article_dates:
            missing_dates.append(this_day)
    this_day += one_day
<p>I&#8217;ve tried to make the code as reusable as possible, so you can either supply a year, or the script will read start and end dates from the csv file itself.</p>
<p>All that left me with two more lists of dates: &#8216;duplicates&#8217; and &#8216;missing&#8217;. At first I just wrote these out to a text file, but then I decided it would be useful to write the results to an html page. That way I could add links that would take me to the actual issue within Trove, helping me to quickly find the missing editorial.</p>
<p>Unfortunately there&#8217;s no direct way to go from a date to an issue &#8212; you first need to find the issue identifier. How do you do this? If you dig around in the code beneath <a href="http://trove.nla.gov.au/ndp/del/title/35">the page for each newspaper title</a>, you&#8217;ll find that the ajax interface pulls in a json file with issue information. You can access this through a url like: http://trove.nla.gov.au/ndp/del/titlesOverDates/[year]/[month]. Here&#8217;s an example for <a href="http://trove.nla.gov.au/ndp/del/titlesOverDates/1913/01">January 1913</a>.</p>
<p>The json includes all issues for all titles in the specified month. So you then have to loop through to find a specific title and day. Once you have the issue identifier you can just attach it to a url:</p>
<pre class="brush: python">
import json
import urllib2  # Python 2; on Python 3 use urllib.request.urlopen instead

def get_issue_url(date, title_id):
    &#039;&#039;&#039;
    Gets the issue url given a title and date.
    Returns None if no matching issue is found.
    &#039;&#039;&#039;
    year, month, day = date.timetuple()[:3]
    url = &#039;http://trove.nla.gov.au/ndp/del/titlesOverDates/%s/%02d&#039; % (year, month)
    issues = json.load(urllib2.urlopen(url))
    # The json lists every issue of every title for the month
    for issue in issues:
        if issue[&#039;t&#039;] == title_id and int(issue[&#039;p&#039;]) == day:
            return &#039;http://trove.nla.gov.au/ndp/del/issue/%s&#039; % issue[&#039;iss&#039;]
    return None
<div id="attachment_1533" class="wp-caption alignright" style="width: 260px"><a href="http://discontents.com.au/wp-content/uploads/2011/12/Screen-Shot-2011-12-19-at-4.43.15-PM1.png"><img src="http://discontents.com.au/wp-content/uploads/2011/12/Screen-Shot-2011-12-19-at-4.43.15-PM1-250x469.png" alt="" title="Screen Shot 2011-12-19 at 4.43.15 PM" width="250" height="469" class="size-medium wp-image-1533" /></a><p class="wp-caption-text">My results file with links to Trove</p></div>
<p>Finally, to save myself having to cut and paste the missing dates back into the csv file, I added a few lines to write them in automatically.</p>
<p>So now I have a handy little html page, complete with dates and links, that I&#8217;m working through to find all the missing editorials. All I need for the next stage are the urls for the editorial and the page on which it&#8217;s published. I&#8217;m just cutting and pasting these from the citation box in Trove into the csv file. Once this is done I can start trying to find <strong>all</strong> the editorials.</p>
<p>PS: I noted in my first post that one benefit in finding the editorials was that the main news articles usually appeared on the page after the editorials. I&#8217;ve been thinking some more about ways to identify &#8216;major&#8217; news stories. Word length perhaps? But not always. Hmmm, but major stories do seem to be published at the top of the page. After a bit more poking around in the code I found that there&#8217;s a &#8216;y value&#8217; assigned to each article that indicates its position on the page. So if I harvest all the articles on the page after the editorials and then rank them by their y values? Interesting&#8230;</p>
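<p>If the harvested articles end up as a list of dictionaries, ranking them by position is a one-liner. This is purely a sketch of the idea: the &#8216;y&#8217; key is a hypothetical name for wherever the y value gets stored, and it assumes that smaller y values mean higher on the page.</p>
<pre class="brush: python">
def rank_by_position(articles):
    # Each article is a dict with a numeric 'y' value; the key name is hypothetical
    return sorted(articles, key=lambda article: article['y'])
</pre>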
]]></content:encoded>
			<wfw:commentRss>http://discontents.com.au/shed/hacks/extracting-editorials-2/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>It’s all about the stuff: collections, interfaces, power and people</title>
		<link>http://discontents.com.au/words/conference-papers/it%e2%80%99s-all-about-the-stuff-collections-interfaces-power-and-people</link>
		<comments>http://discontents.com.au/words/conference-papers/it%e2%80%99s-all-about-the-stuff-collections-interfaces-power-and-people#comments</comments>
		<pubDate>Thu, 01 Dec 2011 09:52:03 +0000</pubDate>
		<dc:creator>tim</dc:creator>
				<category><![CDATA[archives]]></category>
		<category><![CDATA[conference presentations]]></category>
		<category><![CDATA[digital humanities]]></category>
		<category><![CDATA[APIs]]></category>
		<category><![CDATA[hacking]]></category>
		<category><![CDATA[interfaces]]></category>
		<category><![CDATA[invisibleaustralians]]></category>
		<category><![CDATA[linked data]]></category>
		<category><![CDATA[White Australia]]></category>

		<guid isPermaLink="false">http://discontents.com.au/?p=1475</guid>
		<description><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=It%E2%80%99s+all+about+the+stuff%3A+collections%2C+interfaces%2C+power+and+people&amp;rft.aulast=Sherratt&amp;rft.aufirst=Tim&amp;rft.subject=archives&amp;rft.subject=conference+presentations&amp;rft.subject=digital+humanities&amp;rft.source=discontents&amp;rft.date=2011-12-01&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://discontents.com.au/words/conference-papers/it%e2%80%99s-all-about-the-stuff-collections-interfaces-power-and-people&amp;rft.language=English"></span>
This is the full version of a paper I presented at the National Digital Forum, 30 November 2011. In 1901, one of the first acts of the Commonwealth of Australia was to create a system of exclusion and control designed to keep the newly-formed nation ‘white’. But White Australia was always a myth. As well [...]]]></description>
			<content:encoded><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=It%E2%80%99s+all+about+the+stuff%3A+collections%2C+interfaces%2C+power+and+people&amp;rft.aulast=Sherratt&amp;rft.aufirst=Tim&amp;rft.subject=archives&amp;rft.subject=conference+presentations&amp;rft.subject=digital+humanities&amp;rft.source=discontents&amp;rft.date=2011-12-01&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://discontents.com.au/words/conference-papers/it%e2%80%99s-all-about-the-stuff-collections-interfaces-power-and-people&amp;rft.language=English"></span>
<abbr class="unapi-id" title="http://discontents.com.au/?p=1475"><!-- &nbsp; --></abbr>
<p><em>This is the full version of a paper I presented at the <a href="http://ndf.natlib.govt.nz/about/2011-conference.htm">National Digital Forum</a>, 30 November 2011.</em></p>
<p>In 1901, one of the first acts of the Commonwealth of Australia was to create a system of exclusion and control designed to keep the newly-formed nation ‘white’. But White Australia was always a myth. As well as the Indigenous population, there were already many thousands of people classified as ‘non-white’ living in Australia &#8212; most were Chinese, but there were also Japanese, Indians, Syrians and Indonesians.</p>
<p>Here are some of them&#8230;</p>
<div id="attachment_1481" class="wp-caption aligncenter" style="width: 260px"><a href="http://invisibleaustralians.org/faces/"><img class="size-medium wp-image-1481" title="the stuff.002" src="http://discontents.com.au/wp-content/uploads/2011/12/the-stuff.002-250x187.jpg" alt="" width="250" height="187" /></a><p class="wp-caption-text">The real face of White Australia</p></div>
<p>The administration of what became known as the White Australia Policy created a huge volume of records, much of which is still preserved within the <a href="http://naa.gov.au">National Archives of Australia</a>. These photographs are attached to certificates that non-white residents needed to get back into the country if they decided to travel overseas. There are thousands upon thousands of these certificates in the Archives. Thousands of certificates representing thousands of lives &#8212; all monitored and controlled.</p>
<p>But it is too easy to see these people as the powerless victims of a repressive system. There were many acts of resistance. Some argued against the need to be identified ‘just like a criminal’. Others exercised control over their representation, submitting formal studio portraits instead of mug shots.</p>
<p><a href="http://discontents.com.au/wp-content/uploads/2011/12/the-stuff.003.jpg"><img class="aligncenter size-medium wp-image-1484" title="the stuff.003" src="http://discontents.com.au/wp-content/uploads/2011/12/the-stuff.003-250x187.jpg" alt="" width="250" height="187" /></a></p>
<p>Most commonly and most powerfully, people resisted the policy simply by going ahead and living rich and productive lives.</p>
<p>My partner, <a href="http://chineseaustralia.org/">Kate Bagnall</a>, is helping to rewrite Australian-Chinese history by overthrowing the stereotype of the culturally isolated Chinese man living a lonely, meagre existence surrounded by gambling and opium dens. By mining the available records, by reading against the grain of contemporary reports and by working with family historians, Kate is documenting their intimate lives &#8212; their wives, their lovers, their families and descendants &#8212; the sorts of relationships that sent a shudder through the edifice of White Australia. Power can be reclaimed in many subtle and subversive ways.</p>
<p>‘The real face of White Australia’ <a title="the real face of white australia" href="http://discontents.com.au/shoebox/archives-shoebox/the-real-face-of-white-australia">is an experiment</a>. It uses facial detection technology to find and extract the photographs from digital copies of the original certificates made available through the National Archives of Australia’s collection database. The photographs you see here come from just one series, ST84/1. There’s no API to the collection so I reverse-engineered the web interface to create a script that would harvest the item metadata and download copies of all the digitised images. There are 2,756 files in this series. On the day I harvested the metadata, 347 of those files had been digitised, comprising 12,502 images. It took a few hours, but I just ran my script and soon I had a copy of all of this in my local database.</p>
<p>Then came the exciting part. Using a facial detection script I found through Google and an open source computer vision library, I started experimenting with ways of extracting the photos. After a few tweaks I had something that worked pretty well, so I pointed my aging laptop at the 12,502 images and watched anxiously as the CPU temperature rose and rose. It took a few emergency cooling measures, but the laptop survived and I had a folder containing 11,170 cropped images. About a third of these weren’t actually faces, but it was easy to manually remove the false positives, leaving 7,247 photos.</p>
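<p>For anyone wanting to try something similar, here is a minimal sketch of the detect-and-crop step. It is not the script used for this project: it assumes the current <code>opencv-python</code> package and its bundled frontal-face Haar cascade, and the function names, padding value and output file naming are made up for illustration.</p>

```python
def padded_box(x, y, w, h, img_w, img_h, pad=0.25):
    """Grow a detected face box by `pad` on every side, clipped to the image."""
    dx, dy = int(w * pad), int(h * pad)
    left, top = max(x - dx, 0), max(y - dy, 0)
    right, bottom = min(x + w + dx, img_w), min(y + h + dy, img_h)
    return left, top, right, bottom


def extract_faces(image_path, out_prefix):
    """Detect faces in one image, save each crop, and return how many were found."""
    import cv2  # assumption: the opencv-python package is installed

    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')
    image = cv2.imread(image_path)
    if image is None:  # unreadable or missing file
        return 0
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    img_h, img_w = gray.shape
    for i, (x, y, w, h) in enumerate(faces):
        left, top, right, bottom = padded_box(x, y, w, h, img_w, img_h)
        cv2.imwrite('%s-%d.jpg' % (out_prefix, i), image[top:bottom, left:right])
    return len(faces)
```

<p>A detector like this will also return boxes that are not faces, so some manual weeding of false positives, as described above, is still to be expected.</p>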
<p><a href="http://discontents.com.au/wp-content/uploads/2011/12/the-stuff.002.jpg"><img class="aligncenter size-medium wp-image-1481" title="the stuff.002" src="http://discontents.com.au/wp-content/uploads/2011/12/the-stuff.002-250x187.jpg" alt="" width="250" height="187" /></a></p>
<p>These photos. These people.</p>
<p>With my database fully primed and loaded it was just a matter of creating a simple web interface using Django for the backend and Isotope (a jQuery plugin) at the front. Both are open source projects. All together, from idea to interface, it took a bit more than a weekend to create, and most of that was waiting for the harvesting and facial detection scripts to complete. It would be silly to say it was easy, but I would say that <em>it wasn’t hard</em>.</p>
<p>What we ended up with was a new way of seeing and understanding the records &#8212; not as the remnants of bureaucratic processes, but as windows onto the lives of people. All the faces are linked to copies of the original certificates and back to the collection database of the National Archives. So this is also a finding aid. A finding aid that brings the people to the front.</p>
<p>According to Margaret Hedstrom the archival interface ‘is a site where power is negotiated and exercised’. Whether in a reading room or online, finding aids or collection databases are ‘neither neutral nor transparent’, but the product of ‘conscious design decisions’. We would like to think that this interface gives some power back to the people within the records. Their photographs challenge us to do something, to think something, to feel something. We cannot escape their discomfiting gaze.</p>
<p>But this interface represents another subtle shift in power. We could create it without any explicit assistance or involvement by the National Archives itself. Simply by putting part of the collection online, they provided us with the opportunity to develop a resource that both extends and critiques the existing collection database. Interfaces to cultural heritage collections are no longer controlled solely by cultural heritage institutions.</p>
<p>It’s these two aspects of the power of interfaces that I want to focus on today.</p>
<p>There are a growing number of examples where the records created by repressive or discriminatory regimes have, in Eric Ketelaar’s words, ‘become instruments of empowerment and liberation, salvation and freedom’. Nazi records of assets confiscated during the Holocaust have been used to inform processes of restitution and reparation. Government records have helped members of Australia’s Stolen Generations trace family members. Descendants of inmates incarcerated by American colonial authorities in what was the world’s largest leprosy colony in the Philippines have embraced the administrative record as an affirmation of their own heritage and survival. Records can find new meanings. Power can be reclaimed.</p>
<p>Technology can help. <a href="http://historyonics.blogspot.com/">Tim Hitchcock</a> has described how something as simple as keyword searching can turn archives on their heads. Recordkeeping systems tend to reflect the structures and power relations of the organisations that create them. The ‘hierarchical and institutional nature of most archives’, Hitchcock argues, ‘contains an ideological component which is sucked in with every dust-filled breath’. But digitisation and keyword searching free us from having to follow the well-worn paths of institutional power. We can find people and follow their lives against the flow of bureaucratic convenience. We can gain a wholly new perspective on the workings of society. ‘What changes’, Hitchcock asks, ‘when we examine the world through the collected fragments of knowledge that we can recover about a single person, reorganised as a biographical narrative, rather than as part of an archival system?’</p>
<p>Projects such as <a href="http://unknownnolonger.vahistorical.org/">Unknown no longer</a> may help us answer that question.</p>
<div id="attachment_1488" class="wp-caption aligncenter" style="width: 260px"><a href="http://unknownnolonger.vahistorical.org/"><img class="size-medium wp-image-1488" title="the stuff.006" src="http://discontents.com.au/wp-content/uploads/2011/12/the-stuff.006-250x187.jpg" alt="" width="250" height="187" /></a><p class="wp-caption-text">Unknown no longer</p></div>
<p>It’s aiming to extract the names and biographical details of slaves from the 8 million manuscript documents held by the Virginia Historical Society. The documents include court records, receipts, wills and inventories. Here is a page from the ‘Inventory of Negroes at Berry Plain Plantation, King George County, Virginia’ for 1855, listing names, occupations and <em>valuations</em>.</p>
<p><a href="http://discontents.com.au/wp-content/uploads/2011/12/the-stuff.007.jpg"><img class="aligncenter size-medium wp-image-1489" title="the stuff.007" src="http://discontents.com.au/wp-content/uploads/2011/12/the-stuff.007-250x187.jpg" alt="" width="250" height="187" /></a></p>
<p>Tim Hitchcock is one of the directors of <a href="http://www.londonlives.org/">London Lives</a>, a project that similarly seeks to find the people in 240,000 manuscript pages documenting the lives of plebeian Londoners in the 18th century.</p>
<div id="attachment_1491" class="wp-caption aligncenter" style="width: 260px"><a href="http://www.londonlives.org/"><img class="size-medium wp-image-1491" title="the stuff.008" src="http://discontents.com.au/wp-content/uploads/2011/12/the-stuff.008-250x187.jpg" alt="" width="250" height="187" /></a><p class="wp-caption-text">London Lives</p></div>
<p>More than three million names have already been extracted from the records of courts, workhouses, hospitals and other institutions. Work is continuing to link these names together, to merge these various shards of identity and piece together the experiences of London’s poorest inhabitants.</p>
<p><a href="http://rememberme.ushmm.org/">Remember me</a> from the US Holocaust Memorial Museum is working with photographs taken by relief agencies in the aftermath of World War Two. The photographs are of displaced children who survived the Holocaust but were separated from families. What happened to them? The project is seeking public help to identify and trace the children.</p>
<div id="attachment_1492" class="wp-caption aligncenter" style="width: 260px"><a href="http://rememberme.ushmm.org/"><img class="size-medium wp-image-1492" title="the stuff.009" src="http://discontents.com.au/wp-content/uploads/2011/12/the-stuff.009-250x187.jpg" alt="" width="250" height="187" /></a><p class="wp-caption-text">Remember me</p></div>
<p>These are all projects about finding people. Finding the oppressed, the vulnerable, the displaced, the marginalized and the poor and giving them their place in history. This is what Kate and I hope to do with <a href="http://invisibleaustralians.org/">Invisible Australians</a>, the broader project of which our faces experiment is part.</p>
<div id="attachment_1493" class="wp-caption aligncenter" style="width: 260px"><a href="http://invisibleaustralians.org"><img class="size-medium wp-image-1493" title="the stuff.010" src="http://discontents.com.au/wp-content/uploads/2011/12/the-stuff.010-250x187.jpg" alt="" width="250" height="187" /></a><p class="wp-caption-text">Invisible Australians</p></div>
<p>&#8216;Invisible Australians&#8217; aims to extract more than just photographs. We want to record and aggregate the biographical data contained within the records of the White Australia Policy &#8212; to extract the data and rebuild identities.</p>
<p><a href="http://discontents.com.au/wp-content/uploads/2011/12/the-stuff.011.jpg"><img class="aligncenter size-medium wp-image-1494" title="the stuff.011" src="http://discontents.com.au/wp-content/uploads/2011/12/the-stuff.011-250x187.jpg" alt="" width="250" height="187" /></a></p>
<p>But <a title="Liberating lives: invisible Australians and biographical networks" href="http://discontents.com.au/shoebox/archives-shoebox/liberating-lives">we want to do more</a>. We want to link these identities up with other records, with the research of family and local historians, with cemetery registers and family trees, with newspaper articles and databases we don&#8217;t even know about yet. We want to find people, families and communities.</p>
<p>It&#8217;s ridiculously ambitious and totally unfunded. But it is possible.</p>
<p>The most exciting part of online technology is the power it gives to people to pursue their passions. As with the faces, we don&#8217;t need the help of the National Archives. We need the records to be digitized, but that&#8217;s happening anyway and we can afford to be patient. Most of the tools we need already exist, and are free. In the past 12 months, for example, there have been a number of open source tools released for crowd-sourced transcription of manuscript records.</p>
<p>People with passions, people with dreams, people who are just annoyed and impatient, don&#8217;t have to wait for cultural institutions to create exactly what they need. They can take what&#8217;s on offer and change it.</p>
<p>Interfaces can be modified. It is amazingly easy to write a script that will change the way a web page looks and behaves in your browser. I was frustrated by the standard interface to digitized files in the National Archives of Australia&#8217;s Recordsearch database &#8212; so I changed it.</p>
<div id="attachment_1495" class="wp-caption aligncenter" style="width: 260px"><a href="http://discontents.com.au/wp-content/uploads/2011/12/the-stuff.012.jpg"><img class="size-medium wp-image-1495" title="the stuff.012" src="http://discontents.com.au/wp-content/uploads/2011/12/the-stuff.012-250x187.jpg" alt="" width="250" height="187" /></a><p class="wp-caption-text">Before and after</p></div>
<p>Not only did I make it look a bit nicer, I added new functions. My script lets you print a whole file or a range of pages and display the entire contents of the file on a pretty cool 3d wall.</p>
<p><a href="http://discontents.com.au/wp-content/uploads/2011/12/the-stuff.013.jpg"><img class="aligncenter size-medium wp-image-1496" title="the stuff.013" src="http://discontents.com.au/wp-content/uploads/2011/12/the-stuff.013-250x187.jpg" alt="" width="250" height="187" /></a></p>
<p>I&#8217;ve shared this script, and <a href="http://wraggelabs.com/emporium/">a few other Recordsearch enhancements</a>. Anyone can install them with a click and use them.</p>
<div id="attachment_1497" class="wp-caption aligncenter" style="width: 260px"><a href="http://wraggelabs.com/emporium/"><img class="size-medium wp-image-1497" title="the stuff.014" src="http://discontents.com.au/wp-content/uploads/2011/12/the-stuff.014-250x187.jpg" alt="" width="250" height="187" /></a><p class="wp-caption-text">Wragge Labs Emporium</p></div>
<p>Interfaces are sites of power and we can claim some of that power for ourselves. Online technologies not only free us from having to brave the physical intimidation of the reading room, they free us up to engage with the records in new ways. The archivist-on-duty would probably not be pleased if I pulled out some scissors and started snipping photos out of certificates. Or if I pulled a file apart and pasted its contents on the wall. But online we are free to experiment.</p>
<p>The power of cultural heritage organisations is perhaps expressed most forcefully in their ability to control the arrangement and description of their collections. ‘Every representation, every model of description, is biased’, note Verne Harris and Wendy Duff, ‘because it reflects a particular world-view and is constructed to meet specific purposes’. Archives, libraries and museums are already starting to share this power, by allowing tagging, or seeking public assistance with description through crowd sourcing projects. But most of these activities still happen within spaces created and curated by the institutions themselves. Our cathedrals of culture might be opening their doors and inviting the public to participate in their ceremonies, but that doesn&#8217;t make them bazaars. The architecture still speaks of authority.</p>
<p>In any case, people already have a space where they can explore and enrich collections &#8212; it’s called the internet.</p>
<p>It would be great to see cultural institutions doing more to watch, understand and support what people are doing with collections in their own spaces &#8212; following them as they pursue their passions, rather than thinking of ways to motivate them.</p>
<p>A quick example&#8230; You might have heard of <a href="http://zotero.org/">Zotero</a>, it&#8217;s an open source project that lets you capture, annotate and organize your research materials.</p>
<div id="attachment_1505" class="wp-caption aligncenter" style="width: 260px"><a href="http://zotero.org"><img class="size-medium wp-image-1505" title="the stuff.015" src="http://discontents.com.au/wp-content/uploads/2011/12/the-stuff.015-250x187.jpg" alt="" width="250" height="187" /></a><p class="wp-caption-text">Zotero</p></div>
<p>One cool thing about Zotero is that you can build and contribute little screen scrapers, called translators, that let Zotero extract structured data from any old collection database. You might not be surprised to learn that I&#8217;ve created a translator for Recordsearch. Another cool thing about Zotero is that you can share the stuff that you collect in public groups.</p>
<div id="attachment_1499" class="wp-caption aligncenter" style="width: 260px"><a href="https://www.zotero.org/groups/invisible_australians"><img class="size-medium wp-image-1499" title="the stuff.016" src="http://discontents.com.au/wp-content/uploads/2011/12/the-stuff.0161-250x187.jpg" alt="" width="250" height="187" /></a><p class="wp-caption-text">Invisible Australians Zotero group</p></div>
<p>Put those two cool things together and what do you have? Well to me they spell out user generated finding aids &#8212; parallel collection databases created by researchers simply pursuing their own passions.</p>
<p>Linked Open Data greatly increases opportunities for collection description to leak into the wider web. If objects and documents are identified with a unique URL, then anyone can make and publish statements about them in machine-readable form. These statements can then be aggregated and explored. Initiatives such as the <a href="http://www.openannotation.org/">Open Annotation Collaboration</a> will hasten the development of these shared descriptive and interpretative layers around our cultural collections.</p>
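<p>To make the idea of aggregating statements a little more concrete, here is a rough sketch in plain Python. Statements are modelled as simple (subject, predicate, object) triples. The item URL, the person URL and the statements themselves are invented for illustration; only the Dublin Core terms namespace is real.</p>

```python
from collections import defaultdict

DCT = 'http://purl.org/dc/terms/'  # Dublin Core terms namespace


def aggregate(*sources):
    """Merge triples from several publishers, grouped by the URL they describe."""
    by_subject = defaultdict(set)
    for triples in sources:
        for subject, predicate, obj in triples:
            by_subject[subject].add((predicate, obj))
    return by_subject


# Hypothetical item URL and statements, for illustration only.
item = 'http://example.org/collection/item/123'
archive_says = [
    (item, DCT + 'title', 'Certificate exempting from dictation test'),
]
researcher_says = [
    (item, DCT + 'subject', 'family history'),
    (item, DCT + 'relation', 'http://example.org/people/456'),
]

merged = aggregate(archive_says, researcher_says)
for predicate, obj in sorted(merged[item]):
    print(predicate, '->', obj)
```

<p>Because both publishers use the same URL for the item, their statements end up side by side without either of them needing to know about the other.</p>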
<p>And of course all this descriptive and interpretative work can be harvested back to enhance existing collection databases. We could start doing it now &#8212; though I will spare you today my rant about the possibilities of mining footnotes.</p>
<p>As well as exploring the possibilities of user-generated content, cultural institutions are starting to open up their collection data for re-use. APIs are great (though Linked Open Data is better), and New Zealand is lucky to have an organisation like <a href="http://www.digitalnz.org/">DigitalNZ</a> which just <em>gets it</em>. People can and will make cool things with your stuff.</p>
<p>But again, we don’t have to wait for everything to be delivered in a convenient, machine-readable form. If it’s on the web anybody can scrape, harvest and experiment.</p>
<p>You probably all know about the <a href="http://trove.nla.gov.au/newspaper">National Library of Australia&#8217;s newspaper digitisation project</a> &#8212; it&#8217;s building a magnificent resource. But I wanted to do more than just find articles. I wanted to explore and analyze their content on a large scale. So I built a screen scraper to extract structured data from search results, and then used the scraper to  power a series of tools. I have a <a href="http://wraggelabs.com/emporium/trove-tools/harvester/">harvester</a> that lets you download an entire results set &#8212; hundreds or thousands of articles &#8212; with metadata neatly packaged for further analysis.</p>
<div id="attachment_1500" class="wp-caption aligncenter" style="width: 260px"><a href="http://wraggelabs.com/emporium/trove-tools/harvester/"><img class="size-medium wp-image-1500" title="the stuff.017" src="http://discontents.com.au/wp-content/uploads/2011/12/the-stuff.017-250x187.jpg" alt="" width="250" height="187" /></a><p class="wp-caption-text">Harvester</p></div>
<p>Or what about a script that graphs the occurrence of search terms over time, and allows you to ask questions like <a href="http://discontents.com.au/shed/experiments/when-did-the-great-war-become-the-first-world-war">When did the Great War become the First World War?</a></p>
<div id="attachment_1501" class="wp-caption aligncenter" style="width: 260px"><a href="http://discontents.com.au/shed/experiments/when-did-the-great-war-become-the-first-world-war"><img class="size-medium wp-image-1501" title="the stuff.018" src="http://discontents.com.au/wp-content/uploads/2011/12/the-stuff.018-250x187.jpg" alt="" width="250" height="187" /></a><p class="wp-caption-text">When did the Great War become the First World War?</p></div>
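<p>The counting behind a graph like this can be sketched in a few lines. This is a simplified illustration rather than the script itself: it assumes each harvested article carries an ISO-style date string, and the dates below are made up purely to exercise the code.</p>

```python
from collections import Counter


def hits_per_year(dates):
    """Count articles per year from date strings such as '1939-09-05'."""
    return Counter(int(d[:4]) for d in dates)


def first_year_ahead(newer, older):
    """Return the first year in which `newer` has more hits than `older`, or None."""
    for year in sorted(set(newer) | set(older)):
        if newer[year] > older[year]:
            return year
    return None


# Made-up article dates standing in for two harvested result sets.
great_war = hits_per_year(
    ['1938-11-11', '1938-04-25', '1939-09-05', '1940-04-25'])
first_world_war = hits_per_year(
    ['1939-09-05', '1940-04-25', '1940-06-01'])

print(first_year_ahead(first_world_war, great_war))
```

<p>Raw counts like these are affected by how many newspapers have been digitised for each year, so for real comparisons it would be worth dividing by the total number of articles per year.</p>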
<p>In the end I got a bit carried away and built my own <a href="http://wraggelabs.appspot.com/api/newspapers/">public API</a> to the Trove newspaper database.</p>
<div id="attachment_1502" class="wp-caption aligncenter" style="width: 260px"><a href="http://wraggelabs.appspot.com/api/newspapers/"><img class="size-medium wp-image-1502" title="the stuff.019" src="http://discontents.com.au/wp-content/uploads/2011/12/the-stuff.019-250x187.jpg" alt="" width="250" height="187" /></a><p class="wp-caption-text">Unofficial Trove newspapers API</p></div>
<p>I think it’s important to note that the tools I developed were guided by the types of questions I wanted to ask. While we should welcome APIs and celebrate their possibilities, we should also remain critical. APIs are interfaces, they too embed power relations. Every API has an argument. What questions do they let us ask? What questions do they prevent us from asking?</p>
<p>Even as we move from the age of lumbering, slow-witted data silos into the rapidly-evolving realms of Linked Open Data, we have to constantly question the models we make of the world. Ontologies and vocabularies are culturally determined and historically specific. Yes, they too are interfaces, complete with their own distributions of power and authority. But we can revisit and change them. And we can relate our new models to our old models, capturing complex, long-term shifts in the way we think about the world. That’s incredibly exciting.</p>
<p>All of this hacking, harvesting, questioning, enriching and meaning-making makes me think about the possibilities of grassroots leadership. Online technologies enable people to take cultural institutions into unexpected realms. They can build their own interfaces, ask their own questions, determine their own needs &#8212; they can point the way instead of simply waiting to be served.</p>
<p>You might wonder what the National Library of Australia thinks of my various scrapers and harvesters. I can’t speak for them, but I can say that they’ve <a href="http://www.nla.gov.au/harold-white-fellowships/list-of-harold-white-fellows">awarded me a fellowship</a> to explore further the possibilities of text-mining in their newspaper database.</p>
<p>The idea of grassroots leadership brings me back to the title of this talk &#8212; ‘It’s all about the stuff’. It seems to me that we tend to model the interactions between cultural institutions and the public as transactions. The public are ‘clients’, ‘patrons’, ‘users’ or ‘visitors’. But the sorts of things I’ve been talking about today give us a chance to put the collections themselves squarely at the centre of our thoughts and actions. Instead of concentrating on the relationship between the institution and the public, we can focus on the relationship we both have with the collections.</p>
<p>It’s all about the stuff.</p>
<p>It’s all about the respect and responsibility we both have for our collections.</p>
<p><a href="http://invisibleaustralians.org/faces/"><img class="aligncenter size-medium wp-image-1481" title="the stuff.002" src="http://discontents.com.au/wp-content/uploads/2011/12/the-stuff.002-250x187.jpg" alt="" width="250" height="187" /></a></p>
<p>It’s all about the respect and responsibility we both have for people like this.</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
]]></content:encoded>
			<wfw:commentRss>http://discontents.com.au/words/conference-papers/it%e2%80%99s-all-about-the-stuff-collections-interfaces-power-and-people/feed</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Extracting editorials #1</title>
		<link>http://discontents.com.au/shoebox/digital-humanities/extracting-editorials-1</link>
		<comments>http://discontents.com.au/shoebox/digital-humanities/extracting-editorials-1#comments</comments>
		<pubDate>Mon, 21 Nov 2011 13:24:29 +0000</pubDate>
		<dc:creator>tim</dc:creator>
				<category><![CDATA[digital humanities]]></category>
		<category><![CDATA[1913editorials]]></category>
		<category><![CDATA[newspapers]]></category>
		<category><![CDATA[text mining]]></category>
		<category><![CDATA[Trove]]></category>

		<guid isPermaLink="false">http://discontents.com.au/?p=1462</guid>
		<description><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Extracting+editorials+%231&amp;rft.aulast=Sherratt&amp;rft.aufirst=Tim&amp;rft.subject=digital+humanities&amp;rft.source=discontents&amp;rft.date=2011-11-21&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://discontents.com.au/shoebox/digital-humanities/extracting-editorials-1&amp;rft.language=English"></span>
In their chapter in Writing History in the Digital Age, Trevor Owens and Fred Gibbs encourage historians to write about the ways they work with data &#8212; to document their methods, their working assumptions, their dead ends and their discoveries. It&#8217;s an important argument and one that makes me wonder again about forms of publication [...]]]></description>
			<content:encoded><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Extracting+editorials+%231&amp;rft.aulast=Sherratt&amp;rft.aufirst=Tim&amp;rft.subject=digital+humanities&amp;rft.source=discontents&amp;rft.date=2011-11-21&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://discontents.com.au/shoebox/digital-humanities/extracting-editorials-1&amp;rft.language=English"></span>
<abbr class="unapi-id" title="http://discontents.com.au/?p=1462"><!-- &nbsp; --></abbr>
<p><a href="http://nla.gov.au/nla.news-article15387373"><img class="alignright size-full wp-image-1468" title="smh_editorial" src="http://discontents.com.au/wp-content/uploads/2011/11/smh_editorial.png" alt="" width="249" height="328" /></a></p>
<p>In <a href="http://writinghistory.trincoll.edu/data/hermeneutics-of-data-and-historical-writing-gibbs-owens/">their chapter</a> in <em>Writing History in the Digital Age</em>, Trevor Owens and Fred Gibbs encourage historians to write about the ways they work with data &#8212; to document their methods, their working assumptions, their dead ends and their discoveries. It&#8217;s an important argument and one that makes me wonder again <a title="Every story has a beginning" href="http://discontents.com.au/shoebox/every-story-has-a-beginning">about forms of publication</a> that might integrate narrative, methods and sources.</p>
<p>In the meantime though we have blogs. My problem is that I&#8217;m easily bored so by the time I get to the end of a project or experiment I&#8217;m already thinking about the next one. Going back and trying to write things up seems a bit of a chore (which is why I&#8217;m always way behind in my blog writing). Also leaving the writing to the end means that I tend to take shortcuts &#8212; leaving out some of the &#8216;boring&#8217; procedural stuff or the &#8216;stupid&#8217; ideas that just didn&#8217;t work.</p>
<p>But Trevor and Fred&#8217;s chapter has made me think I should be a bit more diligent, so as I start a new series of text-mining experiments I&#8217;ve decided to write things up as I&#8217;m doing them. So be warned, this could get messy&#8230;</p>
<p>So what do I want to do? You might not be surprised to learn that it&#8217;s another Trove newspaper database experiment. I want to see if I can harvest newspaper editorials over a certain period and then analyse these to build up a picture of what issues, events or ideas were perceived as important. As I&#8217;m currently looking at ways of harvesting digital sources relating to 1913 for an exhibition being developed by the <a href="http://nma.gov.au">National Museum of Australia</a>, I&#8217;m going to start by focusing on 1913.</p>
<p>But editorials are opinion pieces, wouldn&#8217;t it be better to harvest &#8216;news&#8217; articles?</p>
<p>First of all, I&#8217;m thinking that editorials will be fairly easy to identify and extract &#8212; there&#8217;s no real way in Trove to separate out current news from other sorts of articles. Secondly, I&#8217;m assuming that the issues that make it into editorials have some importance attached to them. Attached by whom, you may well ask &#8212; whose voice is being represented in the editorial? This is an important question and I&#8217;m thinking that it could be explored in interesting ways by harvesting editorials from a range of papers and regions. Thirdly, finding the editorials might actually help me find the major news articles, simply because in this period the main news stories were often on the page after the editorials.</p>
<p>So how do I find them? Looking at the <em>Sydney Morning Herald</em> for 1913, you can see that the editorials follow a regular pattern:</p>
<ul>
<li>the first editorial is always headed with the name of the paper and the date, followed by the title</li>
<li>subsequent editorials that follow have a title but no subtitle (most other types of articles have a subtitle)</li>
<li>editorials are published on an even-numbered page, usually about half way through the newspaper</li>
</ul>
<p>To check this I conducted a search for <a href="http://trove.nla.gov.au/newspaper/result?l-textSearchScope=headings+only%7Cscope%3Aheadings&amp;l-title=The+Sydney+Morning+Herald...%7Ctitleid%3A35&amp;l-word=*ignore*%7C*ignore*&amp;fromyyyy=1913&amp;toyyyy=1913&amp;sortby=dateAsc&amp;q=fulltext%3A%22The+Sydney+Morning+Herald%22&amp;l-category=Article%7Ccategory%3AArticle&amp;s=0">articles including &#8216;The Sydney Morning Herald&#8217; in their title</a>. The search returns 335 results. Of course we&#8217;d expect there to be about 312 (6 x 52), but it looks like there are quite a few false positives and some days missing altogether (presumably due to OCR errors). You can see there&#8217;s a fair bit of consistency in the pages that editorials appear on, but it doesn&#8217;t quite seem consistent enough to rely on. So I&#8217;ve decided that as a first step I&#8217;ll <a title="Mining the treasures of Trove (part 1)" href="http://discontents.com.au/shed/mining-the-treasures-of-trove-part-1">harvest</a> all the articles from this query. I&#8217;ll then do some manual cleaning to remove the articles that aren&#8217;t editorials and try to identify and retrieve the missing days.</p>
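<p>Finding the missing days is the sort of check that could be partly automated. The following is only a rough sketch, not part of the harvester: it assumes the paper appeared every day except Sunday (public holidays are ignored, so the expected list is approximate), and the function names <code>expected_issue_dates</code> and <code>missing_days</code> are made up for illustration.</p>

```python
from datetime import date, timedelta


def expected_issue_dates(year):
    """List every date in the year that is not a Sunday.

    Assumption: the paper was published Monday to Saturday, every week.
    """
    day = date(year, 1, 1)
    dates = []
    while day.year == year:
        if day.weekday() != 6:  # weekday() returns 6 for Sunday
            dates.append(day)
        day += timedelta(days=1)
    return dates


def missing_days(harvested_dates, year=1913):
    """Return the expected issue dates that have no harvested article."""
    seen = set(harvested_dates)
    return [d for d in expected_issue_dates(year) if d not in seen]
```

<p>Feeding it the dates from the cleaned harvest would give a list of days to go looking for by hand.</p>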
<p>Remember, this won&#8217;t give me all the editorials, only the first editorial from each day. To get all the editorials, I&#8217;ll have to write a new script that will take this first result set, retrieve all the articles from the editorial page and then try to work out which of the articles are editorials &#8212; they should be the ones that come after the first editorial and have no subtitle. Or that&#8217;s the theory.</p>
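<p>The theory in the paragraph above can be sketched in a few lines. This is an illustration only: it assumes each article has already been retrieved as a dictionary with hypothetical <code>id</code> and <code>subtitle</code> keys, and that the articles are listed in the order they appear on the page.</p>

```python
def find_editorials(page_articles, first_editorial_id):
    """Pick out the first editorial and the subtitle-free articles after it.

    page_articles: list of dicts in page order, each with an 'id' key and
    an optional 'subtitle' key (both names are assumptions).
    """
    editorials = []
    found_first = False
    for article in page_articles:
        if article["id"] == first_editorial_id:
            found_first = True
            editorials.append(article)
        elif found_first and not article.get("subtitle"):
            # Comes after the first editorial and has no subtitle.
            editorials.append(article)
    return editorials
```

<p>Anything this rule lets through by mistake would still need to be weeded out by hand.</p>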
<p>I&#8217;ve harvested the query. You can <a href="https://docs.google.com/spreadsheet/ccc?key=0AoLhQYoG1_hmdDE3MU1PNkc1YU9FOGNHajJrWjNwYWc">view the spreadsheet</a> on Google Docs if you feel so moved.</p>
<p>[After I wrote the sentence above I checked the CSV file properly and realised I'd stuffed up. There's a bit of a bug in my harvester that means if the query string you use includes a start value, the harvester will retrieve the same page of results over and over again... I really need to fix that. <img src='http://discontents.com.au/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' />  I'm now running it again. You wanted warts and all, right?]</p>
<p>[After I wrote the paragraph above I checked my new harvest and realised I'd stuffed up again. There were only half as many results as there should have been! So I poked around and realised some recent changes I'd made to the harvest script meant I was only getting odd-numbered results (I was incrementing the row value twice). A lesson in what happens when you do this stuff late at night... Trying again. ]</p>
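<p>For what it&#8217;s worth, here is a minimal sketch of a paging loop that avoids both of those mistakes. It is not the actual harvester. It assumes the start value is the <code>s</code> parameter seen in the search URL above, and <code>fetch_page</code> is a hypothetical function, supplied by the caller, that takes a URL and returns the list of results on that page.</p>

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def page_url(query_url, start):
    """Return query_url with any existing 's' value replaced by start."""
    parts = urlsplit(query_url)
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "s"  # drop a start value left over in the pasted query
    ]
    params.append(("s", str(start)))
    return urlunsplit(parts._replace(query=urlencode(params)))


def harvest(query_url, fetch_page):
    """Collect results page by page until an empty page comes back."""
    results = []
    start = 0
    while True:
        rows = fetch_page(page_url(query_url, start))
        if not rows:
            break
        results.extend(rows)
        start += len(rows)  # advance the start value once per page, no more
    return results
```

<p>Overwriting the start value on every request means a stray one in the original query can&#8217;t pin the loop to a single page, and advancing by the number of rows actually received keeps the counter in step with the results.</p>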
<div>I&#8217;m not sure when I&#8217;ll have time to do the cleaning. But hey folks this is what research is like for people like me who have to try and fit it in around the edges of their lives. You can expect posts to come in sudden bursts and then dry up altogether for long periods as other priorities intrude.</div>
]]></content:encoded>
			<wfw:commentRss>http://discontents.com.au/shoebox/digital-humanities/extracting-editorials-1/feed</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>An infrastructure wishlist</title>
		<link>http://discontents.com.au/shoebox/digital-humanities/an-infrastructure-wishlist</link>
		<comments>http://discontents.com.au/shoebox/digital-humanities/an-infrastructure-wishlist#comments</comments>
		<pubDate>Tue, 08 Nov 2011 11:46:06 +0000</pubDate>
		<dc:creator>tim</dc:creator>
				<category><![CDATA[digital humanities]]></category>

		<guid isPermaLink="false">http://discontents.com.au/?p=1454</guid>
		<description><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=An+infrastructure+wishlist&amp;rft.aulast=Sherratt&amp;rft.aufirst=Tim&amp;rft.subject=digital+humanities&amp;rft.source=discontents&amp;rft.date=2011-11-08&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://discontents.com.au/shoebox/digital-humanities/an-infrastructure-wishlist&amp;rft.language=English"></span>
I have problems with the idea of infrastructure, particularly that of the e-research variety. It seems like we always end up talking about huge amounts of money and multi-institutional partnerships. It just doesn&#8217;t seem like a great model for innovation. As I&#8217;ve previously argued, I&#8217;d like to see something more like the funding schemes offered [...]]]></description>
			<content:encoded><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=An+infrastructure+wishlist&amp;rft.aulast=Sherratt&amp;rft.aufirst=Tim&amp;rft.subject=digital+humanities&amp;rft.source=discontents&amp;rft.date=2011-11-08&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://discontents.com.au/shoebox/digital-humanities/an-infrastructure-wishlist&amp;rft.language=English"></span>
<abbr class="unapi-id" title="http://discontents.com.au/?p=1454"><!-- &nbsp; --></abbr>
<p>I have problems with the idea of infrastructure, particularly that of the e-research variety. It seems like we always end up talking about huge amounts of money and multi-institutional partnerships. It just doesn&#8217;t seem like a great model for innovation. As <a title="Hacking a research project" href="http://discontents.com.au/shed/experiments/hacking-a-research-project">I&#8217;ve previously argued</a>, I&#8217;d like to see something more like the <a href="http://www.neh.gov/ODH/GrantOpportunities/tabid/57/Default.aspx">funding schemes</a> offered by the NEH Office for Digital Humanities. Encourage people with ideas, don&#8217;t just reward the good networkers. Build tools and APIs, not portals and platforms.</p>
<p>Of course I&#8217;d still like to see the digital humanities well represented in the list of Virtual Laboratories and eResearch Tools currently under consideration by <a href="http://nectar.org.au/">NeCTAR</a>. It&#8217;s time the digital research needs of the humanities were properly recognised. There are lots of possibilities, most of which we can&#8217;t yet envisage, but as I was asked what I would like to see as part of a Virtual Laboratory I had a go at setting down a few brief ideas. For what it&#8217;s worth, here&#8217;s my e-research infrastructure wishlist&#8230;</p>
<h3>Grappling with abundance</h3>
<p>Traditional historical research is often based on a presumed scarcity of resources &#8212; the skill is in tracking down the sources. But large digital collections, like the Trove newspapers database, change this &#8212; you now have to make sense of the sheer volume of material. Digital history, through techniques such as text-mining and visualisation, offers a way of using these new riches effectively. We need to ensure that investments in digitisation are accompanied by evolutions in scholarly practice.</p>
<h3>Understanding what&#8217;s <em>not</em> online</h3>
<p>At the same time, it must be recognised that large quantities of our cultural heritage are not available in digital form. For example, only about 10% of the holdings of the National Archives of Australia are described in their collection database, and only a small proportion of these are digitised. Easy online access could foster a certain circularity in historical research where only &#8216;known&#8217; resources are consulted. We need to develop tools and visualisations that reveal the valleys as well as the mountaintops &#8212; identifying the holes in our research fabric.</p>
<h3>Critical engagement</h3>
<p>More generally, we need to foster critical engagement with the tools and assumptions of digital research. Federated searching sounds great, but as scholars we need to expose the assumptions implicit in any such tool. What is being federated, from where, how is relevance being determined etc? Humanities e-research infrastructure should have built-in levels of reflexivity that enable scholars to understand the limits and assumptions of their digital research. Every algorithm contains an argument.</p>
<h3>Documenting change</h3>
<p>The resources we build our arguments with are subject to change. The Trove newspapers database, for example, is constantly adding new titles and articles, while users are improving the text transcriptions. Any analysis based on the holdings of this database needs to explicitly recognise this. At the very least the tools we have need to be able to generate time-stamped citations. It would be even better if we could capture a snapshot of the data to accompany our analyses. Perhaps there are possibilities for using something like the Memento project to ensure that the temporal context of humanities research is adequately documented.</p>
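<p>A time-stamped citation doesn&#8217;t need much machinery. As a minimal sketch (the function name and the citation layout are made up for illustration, not an established format), it could be as simple as recording when a resource was retrieved alongside its title and URL:</p>

```python
from datetime import datetime, timezone


def timestamped_citation(title, url, retrieved=None):
    """Build a citation string that records the retrieval time in UTC.

    If no retrieval time is given, the current time is used. A time
    without a timezone is treated as local time when converted to UTC.
    """
    if retrieved is None:
        retrieved = datetime.now(timezone.utc)
    stamp = retrieved.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return "{}, {} (retrieved {})".format(title, url, stamp)
```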
<h3>Show your working out</h3>
<p>Scholarly publication in history, and the humanities generally, tends to present a finished product. But as we delve further into digital research the research processes themselves will be equally important, both for fostering critical engagement with tools and methods and for enabling others to reproduce or extend the research. We need easy ways for researchers to expose their working out (subject to whatever access controls they think appropriate). It should be possible to save a series of steps (search, analysis, visualisation and so on) as modules for sharing and re-use.</p>
<h3>Follow your nose</h3>
<p>Search needs to be complemented by rich, exploratory environments that encourage browsing, enable you to follow relationships, and foster serendipitous discovery. The problem with many collections is knowing enough about what&#8217;s in them to frame a useful search. Browsing, through a variety of interfaces &#8212; people, maps, events, record types, physical proximity &#8212; overcomes this problem. As more cultural institutions make use of Linked Open Data and shared identifiers &#8212; such as People Australia, Geonames or the Powerhouse Object Thesaurus &#8212; the possibilities for navigating this rich contextual space will increase.</p>
<h3>Citation</h3>
<p>We need to develop better models for embedding rich citations within scholarly research &#8212; citations that describe not only the resource in structured, machine-readable forms, but also relevant relationships. This will link research directly to resources, making scholarly outputs a means of resource discovery, and enabling resource databases to re-use the scholarly research to enhance their own descriptions and finding aids.</p>
<h3>Constructing narratives</h3>
<p>Moving beyond simple citation, we need better ways of exposing the structures of people, events, places and things that are referenced in our narratives. Linked Open Data provides a model, but we need tools to make it simple and examples to make it obvious.</p>
]]></content:encoded>
			<wfw:commentRss>http://discontents.com.au/shoebox/digital-humanities/an-infrastructure-wishlist/feed</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Every story has a beginning</title>
		<link>http://discontents.com.au/shoebox/every-story-has-a-beginning</link>
		<comments>http://discontents.com.au/shoebox/every-story-has-a-beginning#comments</comments>
		<pubDate>Tue, 04 Oct 2011 02:02:03 +0000</pubDate>
		<dc:creator>tim</dc:creator>
				<category><![CDATA[shoebox]]></category>
		<category><![CDATA[web]]></category>
		<category><![CDATA[archives]]></category>
		<category><![CDATA[history]]></category>
		<category><![CDATA[history wall]]></category>
		<category><![CDATA[invisibleaustralians]]></category>
		<category><![CDATA[linked data]]></category>
		<category><![CDATA[Mapping our Anzacs]]></category>

		<guid isPermaLink="false">http://discontents.com.au/?p=1342</guid>
		<description><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Every+story+has+a+beginning&amp;rft.aulast=Sherratt&amp;rft.aufirst=Tim&amp;rft.subject=shoebox&amp;rft.subject=web&amp;rft.source=discontents&amp;rft.date=2011-10-04&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://discontents.com.au/shoebox/every-story-has-a-beginning&amp;rft.language=English"></span>
Entering the web of data [view the presentation...] [view the triples...] Keynote delivered at the annual conference of the Australia and New Zealand Society of Indexers, 14 September 2011. This is me. Today, Wednesday, 14 September 2011, I&#8217;m honoured to be able to join you here in the luxurious surrounds of the Brighton Savoy Hotel [...]]]></description>
			<content:encoded><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Every+story+has+a+beginning&amp;rft.aulast=Sherratt&amp;rft.aufirst=Tim&amp;rft.subject=shoebox&amp;rft.subject=web&amp;rft.source=discontents&amp;rft.date=2011-10-04&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://discontents.com.au/shoebox/every-story-has-a-beginning&amp;rft.language=English"></span>
<abbr class="unapi-id" title="http://discontents.com.au/?p=1342"><!-- &nbsp; --></abbr>
<h3>Entering the web of data</h3>
<p><a href="http://wraggelabs.com/shed/presentations/anzsi/">[view the presentation...]</a> <a href="http://wraggelabs.com/shed/presentations/anzsi/rdfa_triples.txt">[view the triples...]</a></p>
<p><em>Keynote delivered at the annual conference of the Australia and New Zealand Society of Indexers, 14 September 2011.</em></p>
<hr />
<p>This is <a href="http://discontents.com.au/about-me" title="about me">me</a>.</p>
<p>Today, Wednesday, 14 September 2011, I&#8217;m honoured to be able to join you here in the luxurious surrounds of the <a href="http://www.brightonsavoy.com.au/">Brighton Savoy Hotel</a> for the &#8216;<a href="http://www.anzsi.org/site/2011Conference.asp">Indexing See Change</a>&#8217; conference. This is an event, a moment in history; we can pinpoint ourselves, this gathering, both in time and in space.</p>
<p>If we do that, if we move outside the moment and position ourselves on a timeline or <a href="http://maps.google.com.au/maps/ms?msid=214642381989548709162.0004ac3b87c9fa486df4a&#038;msa=0&#038;ll=-37.905877,144.995928&#038;spn=0.041717,0.090895">a map</a>, interesting things start to happen. Connections emerge.</p>
<p>Here we are at number 150, The Esplanade, in Brighton. A <a href="http://maps.google.com.au/maps/ms?msid=214642381989548709162.0004ac3b87c9fa486df4a&#038;msa=0&#038;ll=-37.905877,144.995928&#038;spn=0.041717,0.090895">bit over a kilometre away</a> is the stately villa, Kamesburgh. For many years Kamesburgh was also known as the Anzac Hostel &#8212; a refuge for permanently-incapacitated World War One veterans.</p>
<p>The Anzac Hostel opened on 5 July 1919. Here it is <a href="http://cas.awm.gov.au/item/P00158.039">draped in its patriotic finery</a>, from the collections of the Australian War Memorial. According to the caption, the Anzac Hostel was &#8216;a home, not an institute&#8217;.</p>
<p>Also amongst the War Memorial&#8217;s holdings is a <a href="http://cas.awm.gov.au/item/REL27665">wheeled bed</a> that was used at the hostel. This particular bed was apparently occupied by one man, Albert Ward, for forty-three years.</p>
<div id="attachment_1367" class="wp-caption alignright" style="width: 210px"><a href="http://nla.gov.au/nla.news-article11808280"><img src="http://discontents.com.au/wp-content/uploads/2011/10/kelley_death_200.jpeg" alt="" title="kelley_death_200" width="200" height="189" class="size-full wp-image-1367" /></a><p class="wp-caption-text">Death notice for Alexander Kelley. Argus, 29 January 1944.</p></div>
<p>It was probably in a bed just like this that Alexander Dewar Kelley passed away on 27 January 1944. Alexander Kelley was cremated, and his remains interred amongst the roses at what is now called the Springvale Botanical Cemetery. Not far from my own grandparents.</p>
<p>Alexander Kelley spent close to half his life in the Anzac Hostel. Like many young men, he bravely answered his nation&#8217;s call to arms, but returned from war much changed. We can follow Alex&#8217;s war through his service record, <a href="http://mappingouranzacs.naa.gov.au/details-permalink.aspx?barcode_no=7336927">easily accessible</a> through the website &#8216;<a href="http://mappingouranzacs.naa.gov.au">Mapping Our Anzacs</a>&#8217;.</p>
<p>Alex was a coach painter who enlisted in the AIF in January 1916. Within a year he was in France. In May 1917 he suffered a gunshot wound to the head, but was able to rejoin his unit in August. Less than a month later though, he was wounded again, this time more severely. For Alex the war was over, and he was shipped back to Australia in May 1918.</p>
<p>&#8216;Mapping Our Anzacs&#8217; includes a scrapbook feature through which visitors to the site can attach notes or photographs to a service record. Amongst the many thousands of postings is <a href="http://our-anzacs.tumblr.com/post/64197860/a-diary-insert-found-inside-alexs-mother-annie">a fragment from a diary</a>, found tucked inside the bible of Alexander Kelley&#8217;s mother. The diary entry reads simply: &#8216;Alex arrived from Front. Wet day. Saw him at &#8220;Caulfield&#8221;.&#8217;</p>
<p>Alex had survived and had returned to his family. This was a day to remember. But there was sadness too, for Alex was not the same young man who had left for the battlefields of Europe. In the diary fragment, &#8216;Caulfield&#8217; is enclosed in inverted commas, indicating perhaps that the reunion took place, not in the suburb, but in the Caulfield rehabilitation hospital. Alexander Kelley was wounded in the face, hands and legs. He was left blind in both eyes and his right leg was amputated. He would live the remainder of his life a little over a kilometre away from here at the Anzac Hostel.</p>
<p>This is just one story. There are over 375,000 World War One service records held by the <a href="http://naa.gov.au">National Archives of Australia</a>. How can we hope to understand a number like that? How can we hope to imagine the war&#8217;s impact on families, on communities?</p>
<p>&#8216;Mapping Our Anzacs&#8217; uses familiar Google maps to display the places of birth and enlistment recorded in many of those service records. But technical limitations make it impossible to display all the places at once. You can, however, take the same data and open it in Google Earth. If you then zoom in on Victoria, you see something like this.</p>
<div id="attachment_1372" class="wp-caption aligncenter" style="width: 260px"><a href="http://discontents.com.au/wp-content/uploads/2011/10/moa_earth.jpeg"><img src="http://discontents.com.au/wp-content/uploads/2011/10/moa_earth-250x189.jpg" alt="" title="moa_earth" width="250" height="189" class="size-medium wp-image-1372" /></a><p class="wp-caption-text">Mapping Our Anzacs data viewed in Google Earth.</p></div>
<p>Each marker represents a place where a service person was born or enlisted. It&#8217;s impossible to read, of course, but that&#8217;s the point. There is so little blank space. As you zoom further, more markers appear, more place names resolve. It&#8217;s simple, but it&#8217;s powerful. They came from everywhere. From the smallest village to the biggest city; nowhere was untouched.</p>
<p>The &#8216;Mapping Our Anzacs&#8217; scrapbook offers another perspective. It&#8217;s possible to extract the images posted to the scrapbook and present them on a 3D wall. Amidst an assortment of memorabilia, there are faces. Not places, or records &#8212; this is a wall of people.</p>
<div id="attachment_1377" class="wp-caption aligncenter" style="width: 260px"><a href="http://discontents.com.au/wp-content/uploads/2011/10/moa_cooliris_wall.jpeg"><img src="http://discontents.com.au/wp-content/uploads/2011/10/moa_cooliris_wall-250x156.jpg" alt="" title="moa_cooliris_wall" width="250" height="156" class="size-medium wp-image-1377" /></a><p class="wp-caption-text">Mapping Our Anzacs Scrapbook photos viewed through CoolIris</p></div>
<p>It&#8217;s worth noting too that like the markers on the maps, these faces link back to the actual service records. So they&#8217;re not just a new way of seeing the collection, they&#8217;re a new way of exploring it.</p>
<p>But the records don&#8217;t stand in isolation, they themselves have a context. A couple of years ago, Mitchell Whitelaw, from the University of Canberra, undertook a project called &#8216;<a href="http://visiblearchive.blogspot.com/">The Visible Archive</a>&#8217; to investigate ways of visualising the holdings of the National Archives of Australia. Have you ever wondered what 360km worth of records looks like?</p>
<div id="attachment_1378" class="wp-caption aligncenter" style="width: 260px"><a href="http://discontents.com.au/wp-content/uploads/2011/10/series_browser.jpeg"><img src="http://discontents.com.au/wp-content/uploads/2011/10/series_browser-250x195.jpg" alt="" title="series_browser" width="250" height="195" class="size-medium wp-image-1378" /></a><p class="wp-caption-text">The collections of the NAA visualised by Mitchell&#039;s Series Browser.</p></div>
<p>This represents the holdings of the National Archives. Files within the archives are organised into series, and each square in this image represents a single series &#8212; there are about 60,000 of them. Naturally the size of the square gives an indication of the size of the series itself. It&#8217;s a fascinating and strangely beautiful picture.</p>
<p>It&#8217;s easy enough to pick out the World War One service records &#8212; Series B2455. In the interactive version of Mitchell&#8217;s series browser you can click on a box and display links between series, as well as other series created by the same government agency. Again, it&#8217;s not just a way of seeing the collection, but a means of exploring and interpreting it. As Mitchell says:</p>
<blockquote><p>Visualisation enables us to literally show everything, to display large volumes of data in a way that reveals patterns and communicates context, but also provides access to the fine grain of individual elements. </p></blockquote>
<p>But we can also employ such techniques to ask new kinds of questions. Can you imagine how Alexander Kelley and the other inhabitants of the Anzac Hostel must have felt in 1939? They had lost so much in the Great War, the &#8216;war to end all wars&#8217;, and yet within their own lifetime it was all happening again. More young men were answering the call, more lives were going to be destroyed.</p>
<p>There must have been a dreadful, disheartening moment when Australians realised that the Great War was not an end, but a beginning &#8212; the first in a series of devastating global conflicts. At some point the &#8216;Great War&#8217; became the &#8216;First World War&#8217;, but when?</p>
<p><div id="attachment_1381" class="wp-caption alignright" style="width: 209px"><a href="http://wraggelabs.com/shed/time/the_great_war-2011-08-16.html"><img src="http://discontents.com.au/wp-content/uploads/2011/10/ww1_graph.png" alt="" title="ww1_graph" width="199" height="379" class="size-full wp-image-1381" /></a><p class="wp-caption-text">When did the &#039;Great War&#039; become the &#039;First World War&#039;?</p></div><br />
This is one possible answer. This graph draws its data from the 50 million or so digitised newspaper articles in Trove, the National Library of Australia&#8217;s discovery service. It shows the proportion of newspaper articles that included the phrase &#8216;the great war&#8217; compared to the proportion containing &#8216;the first world war&#8217; (and variations thereof). The lines cross late in 1941. With German victories in Europe and Africa, the opening of the Eastern Front and the Japanese attack on Pearl Harbour, 1941 makes sense.</p>
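<p>The arithmetic behind a graph like this is simple enough to sketch. The following is an illustration under assumptions, not the code that produced the graph: it supposes you already have, for each period, a count of articles containing each phrase and a count of all articles, stored in dictionaries keyed by period. The function names are hypothetical.</p>

```python
def proportions(phrase_counts, totals):
    """Share of all articles in each period that contain the phrase."""
    return {
        period: phrase_counts.get(period, 0) / total
        for period, total in totals.items()
        if total  # skip periods with no articles at all
    }


def first_crossover(series_a, series_b):
    """Earliest period in which series_b is greater than series_a.

    Returns None if series_b never overtakes series_a.
    """
    for period in sorted(set(series_a) | set(series_b)):
        if series_b.get(period, 0) > series_a.get(period, 0):
            return period
    return None
```

<p>Working with proportions rather than raw counts matters here, because the number of digitised articles varies so much from one period to the next.</p>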
<p>What is perhaps more intriguing is the dramatic peak in the occurrence of &#8216;the great war&#8217; in 1939. It&#8217;s no surprise that the looming threat of a new conflict would provoke comment and comparisons, but it does make you wonder about the context of those discussions and how they might have changed as the reality of war edged closer.</p>
<p>To start exploring this I&#8217;ve harvested the content of the 6,600 articles from 1939 that included the phrase &#8216;the great war&#8217;. Using an online text analysis service called <a href="http://voyeurtools.org">VoyeurTools</a> I can quickly <a href="http://voyeurtools.org/tool/Cirrus/?corpus=1313568295441.2143&#038;query=&#038;stopList=stop.en.taporware.txt">generate a picture</a> of their contents.</p>
<p>This simple visualisation shows us the relative frequencies of words within the articles. It doesn&#8217;t reveal any great mysteries, but it does suggest some possibilities for further prodding. The prevalence of &#8216;time&#8217; and &#8216;new&#8217;, for example &#8212; might these help us understand the shift in perspective from one war to the next? We can follow this up by <a href="http://voyeurtools.org/tool/DocumentTypeKwicsGrid/?corpus=1313568295441.2143&#038;context=10&#038;type=time&#038;docIdType=d1312914324077.c620677b-dba5-9642-fff2-04759b7e4a97%3Atime">browsing the different contexts</a> in which the words were used.</p>
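<p>A word cloud is really just a frequency count with a stop list applied. As a rough sketch of that underlying step (this is not how VoyeurTools works internally, and the tiny stop list is only a placeholder), you could count words across a set of harvested texts like this:</p>

```python
import re
from collections import Counter

# A deliberately tiny, illustrative stop list; a real one would be much longer.
STOP_WORDS = {"the", "of", "and", "to", "a", "in", "that", "is", "was", "it", "for"}


def word_frequencies(texts, stop_words=STOP_WORDS, top=10):
    """Return the most common words across texts, ignoring stop words."""
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(word for word in words if word not in stop_words)
    return counts.most_common(top)
```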
<p>But what is it that we&#8217;re actually searching? We know that Trove includes newspapers from 1803 to 1954, but if we&#8217;re really going to analyse shifting words and ideas it&#8217;s important to have a clear picture of the sources of those words.</p>
<p><a href="http://wraggelabs.com/shed/trove/graphs/summary_states_stacked.html">Something like this</a> perhaps. This graph shows the holdings of the Trove newspaper database on 4 August 2011, organised by state. You can see, for example, that if you&#8217;re searching on a topic between the 1920s and 1940s you&#8217;re likely to get more results from Queensland than anywhere else.</p>
<p>So starting from our location here, today, we can make connections across time and space. We can pull back and look at the big picture, or dive in and examine the fabric of a single life. Through the web we can build and explore a rich and complex contextual network.</p>
<hr />
<p>It&#8217;s an exciting time to be a cultural data hacker. We now have a growing range of tools and technologies available for extracting interesting data from a wide variety of sources, both structured and unstructured.</p>
<p>The &#8216;Visible Archive&#8217; project started with well-structured data, courtesy of Peter Scott, the developer of the Series System &#8212; the descriptive framework used by many Australian archives. But we&#8217;re rarely so lucky.</p>
<p>Even when the data starts off in nicely-organised fields in a database there&#8217;s no guarantee that that&#8217;s how it&#8217;s going to be delivered to our web browser. In order to extract the data for my <a href="http://wraggelabs.com/shed/trove/graphs/index.html">Trove graphs</a>, for example, I had to write a little program called a &#8216;<a href="http://wraggelabs.com/emporium/trove-tools/newspaper-search-summariser/">screen scraper</a>&#8217; to identify and save the important metadata elements from the raw web page itself.</p>
<p>Where there are no subject keywords we can infer them using techniques such as topic modelling. Where there are no access points we can identify people, organisations, places and events using special tools developed for named entity extraction. Where there are no common identifiers across datasets we can employ record linkage technologies to find possible connections.</p>
<p>We can count words, we can identify parts of speech, we can formulate a measure of the similarity of any two pieces of text. Once we have some useful data we can manipulate and enrich it. Place names can be geolocated &#8212; you simply send your place name off to a web service and get back its latitude and longitude.</p>
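<p>To give a sense of how little code some of this takes, here is one common way of measuring the similarity of two pieces of text: cosine similarity over simple word counts. It&#8217;s a minimal sketch of one technique among many, and the function name is just for illustration.</p>

```python
import math
import re
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Score two texts from 0.0 (no shared words) to 1.0 (same word mix)."""
    counts_a = Counter(re.findall(r"[a-z']+", text_a.lower()))
    counts_b = Counter(re.findall(r"[a-z']+", text_b.lower()))
    # Multiply the counts of every word the two texts have in common.
    dot = sum(counts_a[word] * counts_b[word] for word in counts_a if word in counts_b)
    norm_a = math.sqrt(sum(n * n for n in counts_a.values()))
    norm_b = math.sqrt(sum(n * n for n in counts_b.values()))
    if not norm_a or not norm_b:
        return 0.0  # at least one text has no words
    return dot / (norm_a * norm_b)
```

<p>Word order is ignored, so this only tells you whether two texts draw on the same vocabulary, not whether they say the same thing.</p>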
<p>Increasingly these sorts of tools are becoming accessible to anyone. For historians they offer a means of wrestling with the rapidly-growing bulk of source material that is becoming available in digital form. How do you make use of 5 million digitised books, 50 million newspaper articles or the complete archive of every public message ever sent on Twitter?</p>
<p>The digital historian Dan Cohen <a href="http://www.dlib.org/dlib/march06/cohen/03cohen.html">has noted</a>:</p>
<blockquote><p>These computational methods which allow us to find patterns, determine relationships, categorize documents, and extract information from massive corpuses, will form the basis for new tools for research in the humanities and other disciplines in the coming decade.
</p></blockquote>
<p>Dan is involved in a number of interesting projects investigating the possibilities of these techniques &#8212; often grouped together under the heading &#8216;text mining&#8217;. One of these projects, &#8216;<a href="http://criminalintent.org">With Criminal Intent</a>&#8217;, is looking to see what patterns can be drawn out of the digitised proceedings of criminal trials held at the Old Bailey from 1674 to 1913. That&#8217;s 197,745 trials, in case you were wondering.</p>
<p>Here&#8217;s one of their visualisations showing how the length of trials varies over time. Much to the surprise of the research team, this graph suggests a dramatic shift in legal practice around 1825 &#8212; defendants started pleading guilty!</p>
<div id="attachment_1408" class="wp-caption aligncenter" style="width: 260px"><a href="http://discontents.com.au/wp-content/uploads/2011/10/criminal_intent.png"><img src="http://discontents.com.au/wp-content/uploads/2011/10/criminal_intent-250x171.png" alt="" title="criminal_intent" width="250" height="171" class="size-medium wp-image-1408" /></a><p class="wp-caption-text">A visualisation by the With Criminal Intent  project showing changing trial lengths.</p></div>
<p>Rather than falter under the growing weight of digital sources, these technologies can actually thrive. The more raw material available, the more chance there is to observe and track new patterns. As digitisation continues apace, will we ever reach the point when history can simply be read from a graph?</p>
<p>There are some researchers at Harvard who seem to think that&#8217;s where we&#8217;re heading. Borrowing liberally from the store of scientific metaphors they have staked out the new field of &#8216;<a href="http://www.culturomics.org/">culturomics</a>&#8217;. By mining massive digital resources, like <a href="http://ngrams.googlelabs.com/graph?content=the+Great+War%2Cthe+First+World+War&#038;year_start=1900&#038;year_end=1954&#038;corpus=0&#038;smoothing=3">Google&#8217;s scanned books</a>, they hope to map the &#8216;cultural genome&#8217; that would enable us to follow the evolution of language and culture.</p>
<p>But there&#8217;s something quite barren in this ambition. I prefer the vision of digital humanist Stephen Ramsay, who <a href="http://lenz.unl.edu/papers/2011/06/10/prison-art.html">commented</a> in regard to the &#8216;With Criminal Intent&#8217; project:</p>
<blockquote><p>The Old Bailey, like the Naked City, has eight million stories. Accessing those stories involves understanding trial length, numbers of instances of poisoning, and rates of bigamy. But being stories, they find their more salient expression in the weightier motifs of the human condition: justice, revenge, dishonor, loss, trial. This is what the humanities are about. This is the only reason for an historian to fire up Mathematica or for a student trained in French literature to get into Java.</p></blockquote>
<p>Ultimately it&#8217;s the stories that nourish, anger, inspire and depress us. The closely-packed map of places recorded in World War I service records is so powerful because we know that under each marker are men, women, families, communities &#8212; each with their own story. These new technologies offer new perspectives, they raise new questions, and they challenge us with new contexts to explore and understand. But there is still space for stories and perhaps we can use them to give our stories new life and depth.</p>
<hr />
<p>This is <a href="http://mappingouranzacs.naa.gov.au/details-permalink.aspx?barcode_no=3029140">another World War One service record</a>. It belongs to Charlie Allen. Charlie enlisted three times in the AIF and was discharged on medical grounds each time. It seems he had a problem with his ankle.</p>
<p>Charlie&#8217;s service record notes a tattoo, proclaiming his love for &#8216;Maud Gordon&#8217;. He married Maud in Sydney in 1917 and had two daughters soon after.</p>
<p>Charlie survived the war without further injury, but was not so lucky in peace. On 11 March 1938, Charlie was <a href="http://nla.gov.au/nla.news-article17447524">crushed to death</a> between two railway cars. The accident happened at the Bunnerong Power Station, only a short distance from his home in Matraville. He was <a href="http://maps.google.com/maps/ms?msid=214642381989548709162.0004ac3b87c9fa486df4a&#038;msa=0&#038;ll=-33.969773,151.228008&#038;spn=0.02196,0.045447">buried nearby</a> in the Botany Cemetery.</p>
<p>We also know quite a bit about Charlie&#8217;s early life. Why? Because Charlie&#8217;s father was Chinese, Charlie was categorised as a &#8216;half-caste&#8217;, as someone who was not white, and so fell under the restrictions imposed by the White Australia Policy.</p>
<p>Charlie was born in Sydney in 1896. His mother was Frances Allen (sometime sweet shop owner and brothel keeper), his father Charlie Gum (a buyer for Wing On company). Charlie was raised by his mother, but in 1909, at the age of 13, he was taken to China by his father.</p>
<div id="attachment_1412" class="wp-caption aligncenter" style="width: 260px"><a href="http://www.aa.gov.au/cgi-bin/Search?O=I&amp;Number=7461068"><img src="http://discontents.com.au/wp-content/uploads/2011/10/charles_allen_cedt_1909_front-250x389.jpg" alt="" title="charles_allen_cedt_1909_front" width="250" height="389" class="size-medium wp-image-1412" /></a><p class="wp-caption-text">NAA: ST84/1, 1909/22/41-50</p></div>
<p>This certificate granted Charlie an exemption to the Dictation Test. Without it, he may not have been allowed back into the country.</p>
<p>Every time one of many thousands of non-Europeans resident in Australia sought to travel overseas and return home again they needed one of these certificates.</p>
<p>Charlie&#8217;s father returned to Sydney, leaving him in China. He lived with relatives in the town of Shekki (inland from Hong Kong). Charlie was naturally homesick, but had no means of getting back to Australia. He wrote to his mother in 1910:</p>
<blockquote><p>Do try and bring me home every minute I think of you and long for a piece of bread and butter this tucker is not doing me well.</p></blockquote>
<p>His mother wrote to the Prime Minister Billy Hughes in an attempt to enlist government help, but to no avail. Charlie finally returned to Australia in 1915.</p>
<p>Despite this experience, Charlie visited China again in 1922 for seven months, once again carrying papers to grant him re-entry to the country of his birth.</p>
<p>These fragments of Charlie&#8217;s life have been assembled by my partner, <a href="http://chineseaustralia.org">Kate Bagnall</a>, a historian of Chinese-Australia. They are remarkable, and yet not so, because there are many thousands of stories like Charlie&#8217;s contained within the voluminous records generated by the administration of the White Australia Policy.</p>
<p>We&#8217;re all of course familiar with the general outlines of the White Australia Policy, and the way it underpinned conceptions of Australia as a nation in the first half of the 20th century.</p>
<p>But what we sometimes forget is that it was also a massive bureaucratic exercise.</p>
<p>Forms and certificates were printed, issued, used and filed. Regulations were modified, guidelines were distributed and administering officers were managed and advised. Individual cases were reviewed, policy was changed and new forms and certificates were printed, issued, used and filed&#8230;</p>
<p>Much of this system is now preserved in the National Archives.</p>
<p>You can get an idea of the range of material available from <a href="http://www.naa.gov.au/collection/publications/papers-and-podcasts/immigration/white-australia.aspx">a case study</a> Kate has prepared focusing on the efforts of Poon Gooey, a successful businessman in Horsham, to keep his wife and family in Australia.</p>
<p>If we look again at Charlie&#8217;s certificate from 1909 we can see that it contains a lot of interesting structured data:</p>
<ul>
<li>name</li>
<li>place of birth</li>
<li>age</li>
<li>height</li>
<li>destination</li>
<li>date of departure</li>
<li>name of ship</li>
</ul>
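<p>One way to picture the fields listed above is as a simple record type. The sketch below is only an assumption about how such a record might look: the class and field names are hypothetical, and only values stated in this text are filled in, with everything else left unknown.</p>

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class ExemptionCertificate:
    """One certificate, using the fields listed above (names are hypothetical)."""
    name: str
    place_of_birth: Optional[str] = None
    age: Optional[int] = None
    height: Optional[str] = None
    destination: Optional[str] = None
    date_of_departure: Optional[str] = None
    name_of_ship: Optional[str] = None


# Only values mentioned in the text are recorded; the rest stay as None
# rather than being guessed.
charlie_1909 = ExemptionCertificate(
    name="Charles Allen",
    place_of_birth="Sydney",
    age=13,
    destination="China",
)

record = asdict(charlie_1909)
missing = [field for field, value in record.items() if value is None]
print(record)
print("Still to transcribe:", missing)
```

<p>Keeping untranscribed fields explicitly empty makes it easy to see how much of each form has yet to be captured.</p>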
<p>We estimate that there are probably about 50,000 of these forms remaining in the Archives, and then there are case files and a variety of other government documents.</p>
<p>Wouldn&#8217;t it be great if we could extract this structured data? If we could piece together the slivers of identity that remain within the Archives and give people back their lives?</p>
<p>This is the dream of <a href="http://invisibleaustralians.org">Invisible Australians</a>, a project Kate and I are trying to turn into a reality. Our aim is to build systems that will enable this data to be extracted, aggregated, shared and connected &#8212; whether to a family tree, a cemetery record, or another document in another archive.</p>
<p>Imagine being able to navigate the network of lives, families and relationships. To follow their journeys, to share their tragedies, to celebrate their small victories against a repressive system.</p>
<p>Imagine being able to watch them age.</p>
<hr />
<p>We tend to assume that new technologies require us to change, to adapt. But sometimes they can take advantage of our strengths. Mitchell Whitelaw is interested in finding out what happens when you take large cultural datasets and try to &#8216;show everything&#8217;. Such an approach, he suggests, takes advantage of the raw processing power of computers, while giving us space to do what we&#8217;re good at &#8212; finding patterns, making connections, crafting meanings.</p>
<p>The <a href="http://historywall.nma.gov.au/">History Wall</a> tries to create a similar sort of space. The History Wall brings together material from a range of different sources &#8212; newspaper articles from Trove, biographies from the Australian Dictionary of Biography, records from a database of NSW convicts, population statistics, collection items from the National Museum of Australia &#8212; you can pretty much plug anything in as long as it has a date attached to it.</p>
<div id="attachment_1415" class="wp-caption aligncenter" style="width: 260px"><a href="http://historywall.nma.gov.au/"><img src="http://discontents.com.au/wp-content/uploads/2011/10/history_wall-250x210.jpg" alt="" title="history_wall" width="250" height="210" class="size-medium wp-image-1415" /></a><p class="wp-caption-text">Irish History Wall</p></div>
<p>For a particular year, the Wall retrieves a random sample from the available sources, jumbles everything up and then throws it onto the screen. As a result, no two views of the Wall are ever quite the same. This is not a traditional exhibition. There is no curator controlling the content or designing the structure. It&#8217;s ephemeral, it&#8217;s serendipitous &#8212; instead of relying on an authorial voice to smooth over the gaps and transitions, it leaves open the cracks and allows new contexts to seep in and around each item.</p>
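<p>The selection step described above can be sketched in a few lines. This is not the History Wall&#8217;s actual code, just a minimal illustration under stated assumptions: the function name, the dictionary layout and the placeholder items are all hypothetical.</p>

```python
import random

# Placeholder items: the sources are those named above, but the titles and
# years are invented for illustration and do not describe real records.
items = [
    {"source": "Trove", "year": 1910, "title": "placeholder newspaper article"},
    {"source": "Australian Dictionary of Biography", "year": 1910, "title": "placeholder biography"},
    {"source": "National Museum of Australia", "year": 1910, "title": "placeholder collection item"},
    {"source": "Trove", "year": 1911, "title": "another placeholder article"},
]


def build_wall(items, year, sample_size, rng=random):
    """Return a random, jumbled selection of the items dated to one year."""
    candidates = [item for item in items if item["year"] == year]
    # Never ask for more items than exist, or sample() raises ValueError.
    chosen = rng.sample(candidates, min(sample_size, len(candidates)))
    # sample() already returns items in random order; the explicit shuffle
    # simply mirrors the "jumbles everything up" step described above.
    rng.shuffle(chosen)
    return chosen


for item in build_wall(items, 1910, 2):
    print(item["source"], "-", item["title"])
```

<p>Because the only requirement is a date, any new source can be added to the pool without changing the selection logic.</p>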
<p>As the pioneering digital historian Edward Ayers <a href="http://www.vcdh.virginia.edu/Ayers.OAH.html">noted</a>:</p>
<blockquote><p>even isolated and inert pieces of evidence &#8212; a list, a letter, a map, a picture &#8212; can assume new and unimagined meanings when placed in juxtaposition with other fragments. </p></blockquote>
<p>This is not an absence of narrative, but an opportunity for narration. Edward Ayers suggests that we&#8217;re actually quite comfortable filling in blanks and untwisting timelines:</p>
<blockquote><p>Humans, presented with pieces of information about people, put things into the form of a story. They need not be simple stories, for we know how to deal with unexplained lapses of time, flashbacks, and overlapping narratives. We know how to imagine, infer, things happening at the same time in different places. Film and television train all of us at early ages to weave strands of narrative out of intentional (if carefully constructed) confusion and to take pleasure in that weaving. </p></blockquote>
<p>And so I can show you a death notice, or a certificate, and you will take those fragments, those isolated data points, and you will construct a story &#8212; you will see the person behind them, you will imagine their life. It&#8217;s what we do. We&#8217;re good at it.</p>
<p>Computers, on the other hand, will just see data.</p>
<p>In her ode in praise of humanities data, digital humanist Amanda French <a href="http://www.scribd.com/doc/50066437/In-Praise-of-Humanities-Data">wonders</a> whether we always need to crunch our data into abstract, pliable forms:</p>
<blockquote><p>What I wonder is whether instead we can begin with the data, or with a datum, and simply watch for what it may tell us, even if what it tells us is simply a story. </p></blockquote>
<p>Yes we can. And we should teach computers how to do it as well. Not because we want them to take over. Not because they can necessarily do it faster or better. But because they can help us share, preserve and connect those stories.</p>
<p>Let&#8217;s think again about the array of documents that Kate has assembled to piece together the story of Charles Allen. How can you share this sort of material? Typically you&#8217;d &#8216;write it up&#8217;. You&#8217;d capture the story behind the data and commit it to words. The documents would then become evidence &#8212; points of connection between your text and the historical record.</p>
<p>So in order to share the meanings of these documents we remove them from the context of the person&#8217;s life and marshal them as allies to proclaim the authenticity of our rendering. Wouldn&#8217;t it be better if we could tell the story, but maintain within our texts the direct connections between sources and subject?</p>
<p>What we need is a data framework that sits beneath the text, identifying people, dates and places, and defining relationships between them and our documentary sources. A framework that computers could understand and interpret, so that if they saw something they knew was a placename they could head off and look for other people associated with that place. Instead of just presenting our research we&#8217;d be creating a whole series of points of connection, discovery and aggregation.</p>
<p>Sounds a bit far-fetched? Well it&#8217;s not. We have it already &#8212; it&#8217;s called the Semantic Web.</p>
<p>The Semantic Web exposes the structures that are implicit in our web pages and our texts in ways that computers can understand. The Linked Data movement takes the basic ideas of the Semantic Web and turns them into a collaborative activity. You share vocabularies, so that other people (and computers) know when you&#8217;re talking about the same sorts of things. You share identifiers, so that other people (and computers) know that you&#8217;re talking about a specific person, place, object or whatever.</p>
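<p>To make the idea of shared identifiers a little more concrete, here is a small sketch that treats Linked Data as plain subject, predicate, object statements. It uses no RDF library, and the example.org identifiers are hypothetical stand-ins. The predicates and values mirror the metadata embedded at the end of this page. Two separately published sets of statements are merged, and because they use the same identifier for a place, a program can follow the links from the place back to a person.</p>

```python
from collections import defaultdict

# Hypothetical identifiers, for illustration only.
KELLEY = "http://example.org/people/kelley"
KELLEY_DEATH = "http://example.org/events/kelley_death"
KAMESBURGH = "http://example.org/places/kamesburgh"

# Statements one source might publish about a person...
people_triples = [
    (KELLEY, "foaf:name", "Alexander Dewar Kelley"),
    (KELLEY, "bio:death", KELLEY_DEATH),
    (KELLEY_DEATH, "dc:date", "1944-01-27"),
    (KELLEY_DEATH, "bio:place", KAMESBURGH),
]

# ...and statements another source might publish about a place.
place_triples = [
    (KAMESBURGH, "gn:name", "Kamesburgh"),
    (KAMESBURGH, "gn:alternateName", "Anzac Hostel"),
]


def index_by_subject(*sources):
    """Merge any number of triple lists into one subject-keyed index."""
    graph = defaultdict(list)
    for triples in sources:
        for subject, predicate, obj in triples:
            graph[subject].append((predicate, obj))
    return graph


def people_linked_to(place, graph):
    """Follow person -> bio:death -> bio:place links back to the people."""
    events = {s for s, pairs in graph.items() if ("bio:place", place) in pairs}
    return sorted(
        s for s, pairs in graph.items()
        if any(p == "bio:death" and o in events for p, o in pairs)
    )


graph = index_by_subject(people_triples, place_triples)
print(graph[KAMESBURGH])
print(people_linked_to(KAMESBURGH, graph))
```

<p>The merge only works because both sources agree on the identifier for Kamesburgh, which is the point of sharing identifiers in the first place.</p>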
<p>Linked Data is Storytelling 101 for computers. It doesn&#8217;t have the full richness, complexity and nuance that we invest in our narratives, but it does at least help computers to fit all the bits together in meaningful ways. And if we talk nice to them, then they can apply their newly-acquired interpretative skills to the things that they&#8217;re already good at &#8212; like searching, aggregating, or generating the sorts of big pictures that enable us to explore the contexts of our stories.</p>
<p>This is why we&#8217;ve always imagined Invisible Australians to be something more than an online database. We want to provide points of connection that other people can build into their own stories. But to do that we have to pay attention to things like vocabulary management and authority control, we have to construct web addresses that are not going to break every time we upgrade our software. We have to think about the sorts of things we&#8217;re talking about &#8212; not just people, but government agencies, legislation, certificates, and correspondence. How do we describe these entities and what sorts of relationships do they have?</p>
<p>And of course we need to expose all these structures so that we can say, these things are people, these are events, these are places and these are documents.</p>
<p>Or perhaps, to introduce Alexander Kelley.</p>
<p>Or remember Charles Allen.</p>
<hr />
<p>You might be wondering why we don&#8217;t just leave it all to the computers themselves. Didn&#8217;t I just talk about all the exciting new tools and techniques that enable us to analyse the structures of texts? Perhaps we should just wait for the Culturomics guys to solve all the problems.</p>
<p>But who defines the problems?</p>
<p>Our postmodern sensibilities encourage a suspicion of neutrality. Labels like &#8216;the new museology&#8217; or Archives 2.0 reflect an awareness that the way we describe and arrange our collections is itself culturally-determined. It&#8217;s not just a matter of what our descriptive systems show, but what they hide.</p>
<p>Tim Hitchcock, another member of the &#8216;With Criminal Intent&#8217; team, has described how online technologies can change the way we access archives. Instead of being forced to navigate the hierarchical structures that archives impose on records, which in turn tend to reflect the workings of the institutions that created the records, we can directly find the people whose lives were regulated, influenced, shaped or controlled by the policies of those institutions.</p>
<p>Instead of merely hearing &#8216;the institutional voice&#8230; in all its stentorian splendour&#8217;, he says, we can listen in to &#8216;the quieter tones uttered by the individual&#8217;.</p>
<p>This reminds us that search boxes, along with other digital tools, themselves embody arguments. There are assumptions built into their code about what is relevant, what is significant, what is necessary.</p>
<p>We can build our own tools of course, and we can critique other people&#8217;s algorithms. But what if we just want to collect and share stories?</p>
<p>Linked Data gives us a way to present an alternative to Google&#8217;s version of the world. We can argue back against the search engines, defining our own criteria for relevance, and building our own discovery networks.</p>
<p>Changing the way we access resources changes the sorts of stories we can tell. Tim Hitchcock asks:</p>
<blockquote><p>What happens when institutions and archives are &#8216;decentred&#8217; in favour of the individual? What changes when we examine the world through the collected fragments of knowledge that we can recover about a single person, reorganised as a biographical narrative, rather than as part of an archival system? </p></blockquote>
<p>Perhaps the invisible become visible.<br />

<div xmlns:vivo="http://vivoweb.org/ontology/core#"
     	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:dcterms="http://purl.org/dc/terms/"
	xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
	xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
	xmlns:bibo="http://purl.org/ontology/bibo/" 
	xmlns:foaf="http://xmlns.com/foaf/0.1/"
	xmlns:bio="http://purl.org/vocab/bio/0.1/"
	xmlns:dbp="http://dbpedia.org/property/"
	xmlns:vcard="http://www.w3.org/2006/vcard/ns#"
        xmlns:gr="http://purl.org/goodrelations/v1#"
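        xmlns:gn="http://www.geonames.org/ontology#"
        xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#"
        xmlns:owl="http://www.w3.org/2002/07/owl#"
        xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
        xmlns:event="http://purl.org/NET/c4dm/event.owl#"
        xmlns:locah="http://data.archiveshub.ac.uk/def/">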
    <div about="http://discontents.com.au/about-me#me" typeof="foaf:Person">
        <div property="foaf:givenName" content="Tim"></div>
        <div property="foaf:familyName" content="Sherratt"></div>
        <div rel="foaf:publications" resource="http://discontents.com.au/shoebox/every-story-has-a-beginning"></div>
        <div rel="foaf:publications" resource="#great_war_article"></div>
        <div rel="foaf:currentProject" resource="#invisible_australians"></div>
    </div>
    <div about="http://discontents.com.au/shoebox/every-story-has-a-beginning" typeof="vivo:Presentation">
        <div property="dc:title" content="Every story has a beginning: Entering the web of data"></div>
        <div rel="dc:creator" resource="http://discontents.com.au/about-me#me"></div>
        <div property="dc:date" content="2011-09-14"></div>
        <div rel="bibo:presentedAt" resource="#conference"></div>
        <div rel="dcterms:hasFormat" resource="http://wraggelabs.com/shed/anzsi"></div>
        <div rel="dcterms:references" resource="#kamesburgh_photo"></div>
        <div rel="dcterms:references" resource="#amanda_quote"></div>
        <div rel="dcterms:references" resource="#dan_quote"></div>
        <div rel="dcterms:references" resource="#edward_quote_1"></div>
        <div rel="dcterms:references" resource="#edward_quote_2"></div>
        <div rel="dcterms:references" resource="#history_wall"></div>
        <div rel="dcterms:references" resource="#kate_presentation"></div>
        <div rel="dcterms:references" resource="#ngrams"></div>
        <div rel="dcterms:references" resource="#steve_quote"></div>
        <div rel="dcterms:references" resource="#tim_quote"></div>
        <div rel="foaf:topic" resource="#invisible_australians"></div>
        <div rel="dcterms:references" resource="#kelley_moa"></div>
        <div rel="dcterms:references" resource="#kelley_ww1_record"></div>
        <div rel="dcterms:references" resource="#trove_11808280"></div>
        <div rel="dcterms:references" resource="#trove_17447524"></div>
        <div rel="dcterms:references" resource="#trove_by_state"></div>
        <div rel="foaf:topic" resource="#visible_archive"></div>
        <div rel="dcterms:references" resource="#allen_cedt"></div>
        <div rel="dcterms:references" resource="#allen_file"></div>
        <div rel="foaf:topic" resource="#criminal_intent"></div>
        <div rel="dcterms:references" resource="#great_war_article"></div>
        <div rel="foaf:topic" resource="#kamesburgh"></div>
        <div rel="dcterms:references" resource="#moa"></div>
        <div rel="dcterms:references" resource="#trove"></div>
        <div rel="dcterms:references" resource="#allen_ww1_record"></div>
        <div rel="foaf:topic" resource="#allen"></div>
        <div rel="foaf:topic" resource="#kelley"></div>
    </div>
    <div about="#conference" typeof="bibo:Conference">
        <div property="dc:title" content="Indexing See Change"></div>
        <div rel="bibo:organizer">
            <div typeof="foaf:Organization" about="#ANZSI">
                <div property="foaf:name" content="Australian and New Zealand Society of Indexers Inc."></div>
                <div property="foaf:name" content="ANZSI"></div>
                <div rel="foaf:homepage" resource="http://www.anzsi.org/site/default.asp"></div>
            </div>
        </div>
        <div rel="foaf:homepage" resource="http://www.anzsi.org/site/2011Conference.asp"></div>
        <div rel="bibo:place" resource="#savoy"></div>
        <div rel="bibo:place" resource="http://sws.geonames.org/2174039/"></div>
        <div rel="event:time">
            <div typeof="time:Interval" about="#conference_dates">
                <div property="time:beginsAt" content="2011-09-12"></div>
                <div property="time:endsAt" content="2011-09-14"></div>
            </div>
        </div>
    </div>
    <div about="#savoy" typeof="gr:BusinessEntity">
        <div property="gr:name" content="Brighton Savoy Hotel"></div>
        <div rel="vcard:adr">
            <div typeof="vcard:Address" about="#savoy_address">
                <div property="vcard:street-address" content="150 The Esplanade"></div>
                <div rel="vcard:locality" resource="http://sws.geonames.org/2174039/"></div>
            </div>
        </div>
        <div rel="vcard:geo">
            <div>
                <div property="vcard:latitude" content="" datatype="xsd:float"></div>
                <div property="vcard:longitude" content="" datatype="xsd:float"></div>
            </div>
        </div>
        <div rel="foaf:homepage" resource="http://www.brightonsavoy.com.au/"></div>
    </div>
    <div about="#kamesburgh" typeof="gn:Feature">
        <div rel="gn:featureCode" resource="http://www.geonames.org/ontology#S.BLDG"></div>
        <div property="gn:name" content="Kamesburgh"></div>
        <div property="gn:alternateName" content="Anzac Hostel"></div>
        <div property="geo:lat" content="-37.899307"></div>
        <div property="geo:long" content="144.997971"></div>
        <div rel="gn:parentFeature" resource="http://sws.geonames.org/2174039/"></div>
        <div rel="vcard:adr">
            <div typeof="vcard:Address" about="#kamesburgh_address">
                <div property="vcard:street-address" content="102 North Road"></div>
                <div rel="vcard:locality" resource="http://sws.geonames.org/2174039/"></div>
            </div>
        </div>
        <div rel="foaf:depiction">
            <div about="#kamesburgh_photo" typeof="bibo:Image">
                <div rel="dcterms:publisher" resource="http://dbpedia.org/resource/Australian_War_Memorial"></div>
                <div property="bibo:uri" content="http://cas.awm.gov.au/item/P00158.039"></div>
            </div>
        </div>
        <div rel="foaf:isPrimaryTopicOf" resource="http://www.nattrust.com.au/trust_register/search_the_register/kamesburgh"></div>
        <div rel="dcterms:relation" resource="#bed"></div>
        <div rel="foaf:page" resource="http://discontents.com.au/shoebox/every-story-has-a-beginning"></div>
    </div>
    <div typeof="owl:Thing" about="#bed">
        <div rel="dcterms:type" resource="http://purl.org/dc/dcmitype/PhysicalObject"></div>
        <div property="dc:title" content="Coach wheel bed: ANZAC Hostel Brighton"></div>
        <div rel="foaf:isPrimaryTopicOf" resource="http://cas.awm.gov.au/item/REL27665"></div>
        <div rel="dcterms:relation" resource="#kamesburgh"></div>
        <div rel="dcterms:relation" resource="#ward"></div>
    </div>
    <div about="#ward" typeof="foaf:Person">
        <div property="foaf:givenName" content="Albert"></div>
        <div property="foaf:familyName" content="Ward"></div>
        <div rel="dcterms:relation" resource="#bed"></div>
    </div>
    <div about="#kelley" typeof="foaf:Person">
        <div property="foaf:name" content="Alexander Dewar Kelley"></div>
        <div property="foaf:givenName" content="Alexander"></div>
        <div property="foaf:familyName" content="Kelley"></div>
        <div rel="bio:death">
            <div about="#kelley_death" typeof="bio:Death">
                <div property="dc:date" content="1944-01-27"></div>
                <div rel="bio:place" resource="#kamesburgh"></div>
                <div rel="foaf:page">
                    <div about="#trove_11808280" typeof="bibo:Article">
                        <div property="dc:title" content="Family notices"></div>
                        <div property="dc:date" content="1944-01-29"></div>
                        <div rel="dcterms:isPartOf">
                            <div about="#argus" typeof="bibo:Newspaper">
                                <div property="dc:title" content="The Argus"></div>
                                <div rel="foaf:basedNear" resource="http://sws.geonames.org/2158177/"></div>
                                <div rel="foaf:isPrimaryTopicOf" resource="http://nla.gov.au/nla.news-title13"></div>
                            </div>
                        </div>
                    </div>
                </div>
            </div>
        </div>
        <div rel="bio:event">
            <div typeof="bio:Cremation" about="#kelley_cremation">
                <div rel="bio:place" resource="http://dbpedia.org/resource/Springvale_Botanical_Cemetery"></div>
                <div rel="foaf:page" resource="http://our-anzacs.tumblr.com/post/64198489/alexs-plaque-at-the-springvale-crematorium"></div>
            </div>
        </div>
        <div rel="bio:event">
            <div typeof="bio:Event" about="#kelley_enlistment">
                <div property="rdfs:label" content="Enlistment in the Australian Imperial Force for service in the First World War."></div>
                <div property="dc:date" content="1916-01-22"></div>
            </div>
        </div>
        <div rel="bio:event">
            <div typeof="bio:Event" about="#kelley_wounded_1">
                <div property="rdfs:label" content="Wounded in battle."></div>
                <div property="dc:date" content="1917-05-12"></div>
                <div property="dc:description" content="Gunshot wound to head."></div>
            </div>
        </div>
        <div rel="bio:event">
            <div typeof="bio:Event" about="#kelley_wounded_2">
                <div property="rdfs:label" content="Wounded in battle."></div>
                <div property="dc:date" content="1917-09-25"></div>
                <div property="dc:description" content="Severe injuries to face, hands and legs."></div>
            </div>
        </div>
        <div rel="bio:event">
            <div typeof="bio:Event" about="#kelley_discharge">
                <div property="rdfs:label" content="Discharged from the Australian Imperial Force."></div>
                <div property="dc:date" content="1918-11-22"></div>
            </div>
        </div>
        <div rel="bio:event">
            <div typeof="bio:Event" about="#kelley_reunion">
                <div property="rdfs:label" content="Reunion with family."></div>
                <div property="dc:date" content="1918-05-22"></div>
                <div rel="foaf:page" resource="http://our-anzacs.tumblr.com/post/64197860/a-diary-insert-found-inside-alexs-mother-annie"></div>
            </div>
        </div>
        <div rel="foaf:isPrimaryTopicOf">
            <div about="#kelley_moa" typeof="bibo:Webpage">
                <div property="bibo:uri" content="http://mappingouranzacs.naa.gov.au/details-permalink.aspx?barcode_no=7336927"></div>
                <div rel="dcterms:isPartOf" resource="#moa"></div>
            </div>
        </div>
        <div rel="foaf:page" resource="http://discontents.com.au/shoebox/every-story-has-a-beginning"></div>
    </div>
    <div about="#kelley_ww1_service" typeof="bio:Interval">
        <div rel="bio:initiatingEvent" resource="#kelley_enlistment"></div>
        <div rel="bio:concludingEvent" resource="#kelley_discharge"></div>
        <div rel="foaf:isPrimaryTopicOf">
            <div about="#kelley_ww1_record" typeof="locah:ArchivalResource">
                <div property="dc:identifier" content="B2455, KELLEY ALEXANDER DEWAR"></div>
                <div property="bibo:uri" content="http://www.aa.gov.au/cgi-bin/Search?O=I&amp;Number=7336927"></div>
                <div rel="locah:accessProvidedBy" resource="http://dbpedia.org/resource/National_Archives_of_Australia"></div>
            </div>
        </div>
    </div>
    <div about="#moa" typeof="bibo:Website">
        <div property="dc:title" content="Mapping Our Anzacs"></div>
        <div rel="dcterms:publisher" resource="http://dbpedia.org/resource/National_Archives_of_Australia"></div>
        <div rev="foaf:pastProject" resource="http://discontents.com.au/about-me#me"></div>
        <div rel="dcterms:isReferencedBy" resource="http://discontents.com.au/shoebox/every-story-has-a-beginning"></div>
    </div>
    <div about="#mitchell" typeof="foaf:Person">
        <div property="foaf:givenName" content="Mitchell"></div>
        <div property="foaf:familyName" content="Whitelaw"></div>
        <div rel="foaf:homepage" resource="http://creative.canberra.edu.au/mitchell/"></div>
        <div rel="foaf:pastProject" resource="#visible_archive"></div>
        <div rev="foaf:knows" resource="http://discontents.com.au/about-me#me"></div>
    </div>
    <div about="#visible_archive" typeof="vivo:Project">
        <div property="dc:title" content="The Visible Archive"></div>
        <div rel="foaf:homepage" resource="http://visiblearchive.blogspot.com/"></div>
        <div rel="vivo:FundingOrganization" resource="http://dbpedia.org/resource/National_Archives_of_Australia"></div>
        <div rel="foaf:page" resource="http://discontents.com.au/shoebox/every-story-has-a-beginning"></div>
    </div>
    <div about="#great_war_article" typeof="bibo:Article">
        <div property="dc:title" content="When did 'the Great War' become the 'First World War'"></div>
        <div rel="dcterms:creator" resource="http://discontents.com.au/about-me#me"></div>
        <div rel="dcterms:references" resource="#trove"></div>
        <div rel="dcterms:isPartOf" resource="#discontents"></div>
        <div property="bibo:uri" content="http://wraggelabs.com/shed/time/the_great_war-2011-08-16.html"></div>
        <div rel="dcterms:isReferencedBy" resource="http://discontents.com.au/shoebox/every-story-has-a-beginning"></div>
    </div>
    <div about="#trove" typeof="bibo:Website">
        <div property="dc:title" content="Trove"></div>
        <div property="bibo:uri" content="http://trove.nla.gov.au"></div>
        <div rel="dcterms:publisher" resource="http://dbpedia.org/resource/National_Library_of_Australia"></div>
        <div rel="dcterms:isReferencedBy" resource="#great_war_article"></div>
        <div rel="dcterms:isReferencedBy" resource="#trove_by_state"></div>
        <div rel="dcterms:isReferencedBy" resource="http://discontents.com.au/shoebox/every-story-has-a-beginning"></div>
    </div>
    <div about="http://dbpedia.org/resource/National_Library_of_Australia" typeof="gr:PublicInstitution">
        <div property="gr:name" content="National Library of Australia"></div>
        <div rel="foaf:homepage" resource="http://nla.gov.au"></div>
    </div>
    <div about="#trove_by_state" typeof="bibo:Image">
        <div property="dc:title" content="Trove newspapers profile – Totals by state"></div>
        <div property="dc:date" content="2011-08-04"></div>
        <div rel="dcterms:references" resource="#trove"></div>
        <div rel="dcterms:isReferencedBy" resource="http://discontents.com.au/shoebox/every-story-has-a-beginning"></div>
    </div>
    <div about="#dan" typeof="foaf:Person">
        <div property="foaf:givenName" content="Dan"></div>
        <div property="foaf:familyName" content="Cohen"></div>
        <div rel="foaf:homepage" resource="http://www.dancohen.org/"></div>
        <div rel="foaf:currentProject" resource="#criminal_intent"></div>
        <div rev="foaf:knows" resource="http://discontents.com.au/about-me#me"></div>
    </div>
    <div about="#dan_quote" typeof="bibo:Quote">
        <div rel="dcterms:isPartOf" resource="#dan_article"></div>
        <div property="bibo:content" content="These computational methods, which allow us to find patterns, determine relationships, categorize documents, and extract information from massive corpuses, will form the basis for new tools for research in the humanities and other disciplines in the coming decade."></div>
        <div rel="dc:creator" resource="#dan"></div>
        <div rel="dcterms:isReferencedBy" resource="http://discontents.com.au/shoebox/every-story-has-a-beginning"></div>
    </div>
    <div about="#dan_article" typeof="bibo:Article">
        <div property="dc:title" content="From Babel to Knowledge: Data Mining Large Digital Collections"></div>
        <div rel="dcterms:isPartOf">
            <div about="#dlib" typeof="bibo:Journal">
                <div property="dcterms:title" content="D-Lib Magazine"></div>
                <div rel="foaf:homepage" resource="http://www.dlib.org/"></div>
            </div>
        </div>
        <div rel="dcterms:creator" resource="#dan"></div> 
        <div property="dcterms:date" content="2006-03"></div>
        <div property="bibo:volume" content="12"></div>
        <div property="bibo:issue" content="3"></div>
        <div property="bibo:uri" content="http://www.dlib.org/dlib/march06/cohen/03cohen.html"></div>
    </div>
    <div about="#criminal_intent" typeof="vivo:Project">
        <div property="dc:title" content="With Criminal Intent"></div>
        <div rel="foaf:homepage" resource="http://criminalintent.org/"></div>
        <div rel="vivo:FundingOrganization" resource="http://dbpedia.org/resource/National_Endowment_for_the_Humanities"></div>
        <div rel="foaf:page" resource="http://discontents.com.au/shoebox/every-story-has-a-beginning"></div>
    </div>
    <div about="#ngrams" typeof="bibo:Website">
        <div property="dc:title" content="Google Books Ngram Viewer"></div>
        <div property="bibo:uri" content="http://ngrams.googlelabs.com/"></div>
        <div rel="dcterms:isReferencedBy" resource="#great_war_article"></div>
        <div rel="dcterms:isReferencedBy" resource="http://discontents.com.au/shoebox/every-story-has-a-beginning"></div>
    </div>
    <div about="#steve" typeof="foaf:Person">
        <div property="foaf:givenName" content="Stephen"></div>
        <div property="foaf:familyName" content="Ramsay"></div>
        <div rel="foaf:homepage" resource="http://lenz.unl.edu/"></div>
    </div>
    <div about="#steve_quote" typeof="bibo:Quote">
        <div rel="dcterms:isPartOf" resource="#steve_article"></div>
        <div rel="dc:creator" resource="#steve"></div>
        <div rel="dcterms:isReferencedBy" resource="http://discontents.com.au/shoebox/every-story-has-a-beginning"></div>
        <div property="bibo:content" content="The Old Bailey, like the Naked City, has eight million stories. Accessing those stories involves understanding trial length, numbers of instances of poisoning, and rates of bigamy. But being stories, they find their more salient expression in the weightier motifs of the human condition: justice, revenge, dishonor, loss, trial. This is what the humanities are about. This is the only reason for an historian to fire up Mathematica or for a student trained in French literature to get into Java."></div>
    </div>
    <div about="#steve_article" typeof="vivo:Presentation">
        <div property="dc:title" content="Prison Art"></div>
        <div rel="dcterms:creator" resource="#steve"></div>
        <div property="bibo:uri" content="http://lenz.unl.edu/papers/2011/06/10/prison-art.html"></div>
        <div rel="foaf:topic" resource="#criminal_intent"></div>
        <div rel="bibo:presentedAt">
            <div about="#did_conference" typeof="bibo:Conference">
                <div property="dcterms:title" content="Digging into Data Challenge Conference"></div>
                <div rel="event:time">
                    <div typeof="time:Interval" about="#conference_dates">
                        <div property="time:beginsAt" content="2011-06-09"></div>
                        <div property="time:endsAt" content="2011-06-10"></div>
                    </div>
                </div>
                <div rel="bibo:place" resource="http://dbpedia.org/resource/Old_Post_Office_Pavilion"></div>
                <div rel="bibo:organizer" resource="http://dbpedia.org/resource/National_Endowment_for_the_Humanities"></div>
            </div>
        </div>
    </div>
    <div about="#allen" typeof="foaf:Person">
        <div property="foaf:givenName" content="Charles"></div>
        <div property="foaf:familyName" content="Allen"></div>
        <div property="foaf:familyName" content="Gum"></div>
        <div rel="bio:birth">
            <div about="#allen_birth" typeof="bio:Birth">
                <div property="bio:date" content="1896-11-09"></div>
                <div rel="bio:mother">
                    <div about="#allen_mother" typeof="foaf:Person">
                        <div property="foaf:name" content="Frances Allen"></div>
                    </div>
                </div>
                <div rel="bio:father">
                    <div about="#allen_father" typeof="foaf:Person">
                        <div property="foaf:name" content="Charlie Gum"></div>
                    </div>
                </div>
                <div rel="bio:place" resource="http://sws.geonames.org/2147714/"></div>
            </div>
        </div>
        <div rel="bio:event">
            <div about="#allen_marriage" typeof="bio:Marriage">
                <div rel="bio:partner" resource="#allen"></div>
                <div rel="bio:partner">
                    <div about="#allen_wife" typeof="foaf:Person">
                        <div property="foaf:name" content="Maud Gordon"></div>
                    </div>
                </div>
                <div property="bio:date" content="1917-03-13"></div>
                <div rel="bio:place" resource="http://sws.geonames.org/2147714/"></div>
            </div>
        </div>
        <div rel="bio:death">
            <div about="#allen_death" typeof="bio:Death">
                <div property="dc:date" content="1938-03-10"></div>
                <div rel="bio:place" resource="http://dbpedia.org/resource/Bunnerong_Power_Station"></div>
                <div rel="foaf:page">
                    <div about="#trove_17447524" typeof="bibo:Article">
                        <div property="dcterms:title" content="MAN KILLED - Crushed Between Trucks"></div>
                        <div property="dcterms:date" content="1938-03-11"></div>
                        <div property="bibo:uri" content="http://nla.gov.au/nla.news-article17447524"></div>
                        <div rel="dcterms:isPartOf">
                            <div about="#smh" typeof="bibo:Newspaper">
                                <div property="dc:title" content="The Sydney Morning Herald"></div>
                                <div rel="foaf:basedNear" resource="http://sws.geonames.org/2147714/"></div>
                                <div rel="foaf:isPrimaryTopicOf" resource="http://nla.gov.au/nla.news-title35"></div>
                            </div>
                        </div>
                    </div>
                </div>
            </div>
        </div>
        <div rel="bio:event">
            <div about="#allen_leaves" typeof="bio:Event">
                <div property="rdfs:label" content="Charles Allen leaves Australia with his father to visit China."></div>
                <div property="bio:date" content="1909-06"></div>
                <div rel="foaf:page" resource="#allen_cedt"></div>
            </div>
        </div>
        <div rel="bio:event">
            <div about="#allen_returns" typeof="bio:Event">
                <div property="rdfs:label" content="Charles Allen returns to Australia from China."></div>
                <div property="bio:date" content="1915-06-05"></div>
                <div rel="foaf:page" resource="#allen_cedt"></div>
            </div>
        </div>
        <div rel="bio:event">
            <div about="#allen_enlistment_1" typeof="bio:Event">
                <div property="rdfs:label" content="Enlistment in the Australian Imperial Force."></div>
                <div property="bio:date" content="1916-09-11"></div>
            </div>
        </div>
        <div rel="bio:event">
            <div about="#allen_discharge_1" typeof="bio:Event">
                <div property="rdfs:label" content="Discharged from the Australian Imperial Force for service in the First World War."></div>
                <div property="bio:date" content="1916-11-07"></div>
            </div>
        </div>
        <div rel="bio:event">
            <div about="#allen_enlistment_2" typeof="bio:Event">
                <div property="rdfs:label" content="Enlistment in the Australian Imperial Force for service in the First World War."></div>
                <div property="bio:date" content="1917-10-15"></div>
            </div>
        </div>
        <div rel="bio:event">
            <div about="#allen_discharge_2" typeof="bio:Event">
                <div property="rdfs:label" content="Discharged from the Australian Imperial Force for service in the First World War."></div>
                <div property="bio:date" content="1917-12-04"></div>
            </div>
        </div>
        <div rel="bio:event">
            <div about="#allen_enlistment_3" typeof="bio:Event">
                <div property="rdfs:label" content="Enlistment in the Australian Imperial Force for service in the First World War."></div>
                <div property="bio:date" content="1918-01-14"></div>
            </div>
        </div>
        <div rel="bio:event">
            <div about="#allen_discharge_3" typeof="bio:Event">
                <div property="rdfs:label" content="Discharged from the Australian Imperial Force for service in the First World War."></div>
                <div property="bio:date" content="1918-10-29"></div>
            </div>
        </div>
        <div rel="foaf:page" resource="http://discontents.com.au/shoebox/every-story-has-a-beginning"></div>
    </div>
    <div about="#allen_ww1_service_1" typeof="bio:Interval">
        <div rel="bio:initiatingEvent" resource="#allen_enlistment_1"></div>
        <div rel="bio:concludingEvent" resource="#allen_discharge_1"></div>
        <div rel="foaf:page" resource="#allen_ww1_record"></div>
    </div>
    <div about="#allen_ww1_service_2" typeof="bio:Interval">
        <div rel="bio:initiatingEvent" resource="#allen_enlistment_2"></div>
        <div rel="bio:concludingEvent" resource="#allen_discharge_2"></div>
        <div rel="foaf:page" resource="#allen_ww1_record"></div>
    </div>
    <div about="#allen_ww1_service_3" typeof="bio:Interval">
        <div rel="bio:initiatingEvent" resource="#allen_enlistment_3"></div>
        <div rel="bio:concludingEvent" resource="#allen_discharge_3"></div>
        <div rel="foaf:page" resource="#allen_ww1_record"></div>
    </div>
    <div about="#allen_ww1_record" typeof="locah:ArchivalResource">
        <div property="dc:identifier" content="B2455, ALLEN C A"></div>
        <div property="bibo:uri" content="http://www.aa.gov.au/cgi-bin/Search?O=I&Number=3029140"></div>
        <div rel="locah:accessProvidedBy" resource="http://dbpedia.org/resource/National_Archives_of_Australia"></div>
    </div>
    <div about="#allen_in_china" typeof="bio:Interval">
        <div rel="bio:initiatingEvent" resource="#allen_leaves"></div>
        <div rel="bio:concludingEvent" resource="#allen_returns"></div>
        <div rel="foaf:isPrimaryTopicOf">
            <div about="#allen_file" typeof="locah:ArchivalResource">
                <div property="dc:identifier" content="SP42/1, C1922/4449"></div>
                <div property="bibo:uri" content="http://www.aa.gov.au/cgi-bin/Search?O=I&Number=30173278"></div>
                <div rel="locah:accessProvidedBy" resource="http://dbpedia.org/resource/National_Archives_of_Australia"></div>
            </div>
        </div>
    </div>
    <div about="#allen_cedt" typeof="bibo:Document">
        <div rel="dcterms:isPartOf">
            <div about="#allen_cedt_file" typeof="locah:ArchivalResource">
                <div property="dc:identifier" content="ST84/1, 1909/22/41-50"></div>
                <div property="bibo:uri" content="http://www.aa.gov.au/cgi-bin/Search?O=I&Number=7461068"></div>
                <div rel="locah:accessProvidedBy" resource="http://dbpedia.org/resource/National_Archives_of_Australia"></div>
            </div>
        </div>
        <div property="bibo:pageStart" content="24"></div>
        <div property="bibo:pageEnd" content="25"></div>
        <div rel="dcterms:isReferencedBy" resource="http://discontents.com.au/shoebox/every-story-has-a-beginning"></div>
    </div>
    <div about="#allen_letter" typeof="bibo:Letter">
        <div rel="dcterms:creator" resource="#allen"></div>
        <div rel="bibo:recipient" resource="#allen_mother"></div>
        <div rel="dcterms:isPartOf" resource="#allen_file"></div>
        <div property="bibo:pageStart" content="12"></div>
        <div property="bibo:pageEnd" content="13"></div>
        <div rel="dcterms:isReferencedBy" resource="http://discontents.com.au/shoebox/every-story-has-a-beginning"></div>
    </div>
    <div about="#kate" typeof="foaf:Person">
        <div property="foaf:givenName" content="Kate"></div>
        <div property="foaf:familyName" content="Bagnall"></div>
        <div rel="foaf:homepage" resource="http://chineseaustralia.org/"></div>
        <div rel="foaf:currentProject" resource="#invisible_australians"></div>
        <div rev="foaf:knows" resource="http://discontents.com.au/about-me#me"></div>
    </div>
    <div about="#kate_presentation" typeof="vivo:Presentation">
        <div property="dcterms:title" content="A legacy of White Australia: Records about Chinese Australians in the National Archives"></div>
        <div rel="dcterms:creator" resource="#kate"></div>
        <div property="bibo:uri" content="http://www.naa.gov.au/collection/publications/papers-and-podcasts/immigration/white-australia.aspx"></div>
        <div rel="bibo:presentedAt">
            <div about="#kate_conference" typeof="bibo:Conference">
                <div property="dcterms:title" content="Fourth International Conference of Institutes and Libraries for Chinese Overseas Studies"></div>
                <div rel="bibo:place" resource="http://dbpedia.org/resource/Jinan_University"></div>
                <div property="dcterms:date" content="2009-05"></div>            
            </div>
        </div>
        <div rel="dcterms:isReferencedBy" resource="http://discontents.com.au/shoebox/every-story-has-a-beginning"></div>
    </div>
    <div about="#invisible_australians" typeof="vivo:Project">
        <div property="dcterms:title" content="Invisible Australians: The real face of White Australia"></div>
        <div rel="foaf:homepage" resource="http://invisibleaustralians.org"></div>
        <div rel="foaf:page" resource="http://discontents.com.au/shoebox/every-story-has-a-beginning"></div>
    </div>
    <div about="#history_wall" typeof="bibo:Website">
        <div property="dcterms:title" content="The History Wall"></div>
        <div property="bibo:uri" content="http://historywall.nma.gov.au/"></div>
        <div rel="dcterms:creator" resource="http://discontents.com.au/about-me#me"></div>
        <div rel="dcterms:isReferencedBy" resource="http://discontents.com.au/shoebox/every-story-has-a-beginning"></div>
        <div rel="dcterms:publisher" resource="http://dbpedia.org/resource/National_Museum_of_Australia"></div>
    </div>
    <div about="#edward" typeof="foaf:Person">
        <div property="foaf:givenName" content="Edward"></div>
        <div property="foaf:familyName" content="Ayers"></div>
        <div rel="foaf:homepage" resource="http://president.richmond.edu/ayers/index.html"></div>
        <div rel="owl:sameAs" resource="http://dbpedia.org/resource/Edward_L._Ayers"></div>
    </div>
    <div about="#edward_quote_1" typeof="bibo:Quote">
        <div rel="dcterms:isPartOf" resource="#edward_article"></div>
        <div rel="dc:creator" resource="#edward"></div>
        <div rel="dcterms:isReferencedBy" resource="http://discontents.com.au/shoebox/every-story-has-a-beginning"></div>
        <div property="bibo:content" content="even isolated and inert pieces of evidence – a list, a letter, a map, a picture – can assume new and unimagined meanings when placed in juxtaposition with other fragments."></div>
    </div>
    <div about="#edward_quote_2" typeof="bibo:Quote">
        <div rel="dcterms:isPartOf" resource="#edward_article"></div>
        <div rel="dc:creator" resource="#edward"></div>
        <div rel="dcterms:isReferencedBy" resource="http://discontents.com.au/shoebox/every-story-has-a-beginning"></div>
        <div property="bibo:content" content="Humans, presented with pieces of information about people, put things into the form of a story. They need not be simple stories, for we know how to deal with unexplained lapses of time, flashbacks, and overlapping narratives. We know how to imagine, infer, things happening at the same time in different places. Film and television train all of us at early ages to weave strands of narrative out of intentional (if carefully constructed) confusion and to take pleasure in that weaving."></div>
    </div>
    <div about="#edward_article" typeof="bibo:Article">
        <div property="dcterms:title" content="History in Hypertext"></div>
        <div property="dcterms:date" content="1999"></div>
        <div property="bibo:uri" content="http://www.vcdh.virginia.edu/Ayers.OAH.html"></div>
    </div>
    <div about="#amanda" typeof="foaf:Person">
        <div property="foaf:givenName" content="Amanda"></div>
        <div property="foaf:familyName" content="French"></div>
        <div rel="foaf:homepage" resource="http://amandafrench.net/"></div>
        <div rev="foaf:knows" resource="http://discontents.com.au/about-me#me"></div>
    </div>
    <div about="#amanda_quote" typeof="bibo:Quote">
        <div rel="dcterms:isPartOf" resource="#amanda_presentation"></div>
        <div rel="dc:creator" resource="#amanda"></div>
        <div rel="dcterms:isReferencedBy" resource="http://discontents.com.au/shoebox/every-story-has-a-beginning"></div>
        <div property="bibo:content" content="What I wonder is whether instead we can begin with the data, or with a datum, and simply watch for what it may tell us, even if what it tells us is simply a story."></div>
    </div>
    <div about="#amanda_presentation" typeof="vivo:Presentation">
        <div property="dcterms:title" content="In Praise of Humanities Data"></div>
        <div property="dcterms:date" content="2010-11"></div>
        <div property="bibo:uri" content="http://www.scribd.com/doc/50066437/In-Praise-of-Humanities-Data"></div>
    </div>
    <div about="#tim" typeof="foaf:Person">
        <div property="foaf:givenName" content="Tim"></div>
        <div property="foaf:familyName" content="Hitchcock"></div>
        <div rel="foaf:homepage" resource="http://historyonics.blogspot.com/"></div>
        <div rel="foaf:currentProject" resource="#criminal_intent"></div>
    </div>
    <div about="#tim_quote" typeof="bibo:Quote">
        <div rel="dcterms:isPartOf" resource="#tim_article"></div>
        <div rel="dc:creator" resource="#tim"></div>
        <div rel="dcterms:isReferencedBy" resource="http://discontents.com.au/shoebox/every-story-has-a-beginning"></div>
        <div property="bibo:content" content="What happens when institutions and archives are 'decentred' in favour of the individual? What changes when we examine the world through the collected fragments of knowledge that we can recover about a single person, reorganised as a biographical narrative, rather than as part of an archival system?"></div>
    </div>
    <div about="#tim_article" typeof="bibo:Chapter">
        <div rel="dc:creator" resource="#tim"></div>
        <div property="dc:title" content="Digital searching and the re-formulation of historical knowledge"></div>
        <div rel="dcterms:isPartOf">
            <div about="#virtual_representation" typeof="bibo:EditedBook">
                <div property="dc:title" content="The Virtual Representation of the Past"></div>
                <div rel="bibo:editorList">
                    <div about="#editor1" typeof="foaf:Person">
                        <div property="foaf:name" content="Mark Greengrass"></div>
                    </div>
                    <div about="#editor2" typeof="foaf:Person">
                        <div property="foaf:name" content="Lorna Hughes"></div>
                    </div>
                </div>
                <div rel="dcterms:publisher" resource="http://dbpedia.org/resource/Ashgate_Publishing"> </div>
                <div property="dcterms:date" content="2008"></div>
            </div>
        </div>
        <div property="bibo:pageStart" content="81"></div>
        <div property="bibo:pageEnd" content="90"></div>
    </div>
</div>
</p>
]]></content:encoded>
			<wfw:commentRss>http://discontents.com.au/shoebox/every-story-has-a-beginning/feed</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>the real face of white australia</title>
		<link>http://discontents.com.au/shoebox/archives-shoebox/the-real-face-of-white-australia</link>
		<comments>http://discontents.com.au/shoebox/archives-shoebox/the-real-face-of-white-australia#comments</comments>
		<pubDate>Tue, 20 Sep 2011 14:42:16 +0000</pubDate>
		<dc:creator>tim</dc:creator>
				<category><![CDATA[archives]]></category>
		<category><![CDATA[experiments]]></category>
		<category><![CDATA[facial detection]]></category>
		<category><![CDATA[invisibleaustralians]]></category>
		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://discontents.com.au/?p=1323</guid>
		<description><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=the+real+face+of+white+australia&amp;rft.aulast=Sherratt&amp;rft.aufirst=Tim&amp;rft.subject=archives&amp;rft.subject=experiments&amp;rft.source=discontents&amp;rft.date=2011-09-21&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://discontents.com.au/shoebox/archives-shoebox/the-real-face-of-white-australia&amp;rft.language=English"></span>
In many of the presentations I&#8217;ve given in recent times I&#8217;ve managed to include a question raised by Tim Hitchcock in his chapter in The Virtual Representation of the Past. Tim asks: What changes when we examine the world through the collected fragments of knowledge that we can recover about a single person, reorganised as [...]]]></description>
			<content:encoded><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=the+real+face+of+white+australia&amp;rft.aulast=Sherratt&amp;rft.aufirst=Tim&amp;rft.subject=archives&amp;rft.subject=experiments&amp;rft.source=discontents&amp;rft.date=2011-09-21&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://discontents.com.au/shoebox/archives-shoebox/the-real-face-of-white-australia&amp;rft.language=English"></span>
<abbr class="unapi-id" title="http://discontents.com.au/?p=1323"><!-- &nbsp; --></abbr>
<p>In many of the presentations I&#8217;ve given in recent times I&#8217;ve managed to include a question raised by Tim Hitchcock in his chapter in <em>The Virtual Representation of the Past</em>. Tim asks:</p>
<blockquote><p>What changes when we examine the world through the collected fragments of knowledge that we can recover about a single person, reorganised as a biographical narrative, rather than as part of an archival system?</p></blockquote>
<p>The idea of turning archival systems on their head to expose the people rather than the bureaucracy is what motivates Kate Bagnall and me in our attempts to make the <a href="http://invisibleaustralians.org">Invisible Australians</a> project into a reality.</p>
<p><em>Invisible Australians</em> aims to liberate the lives of those who suffered under the restrictions of the White Australia Policy from the rich archival holdings of the National Archives of Australia and elsewhere.</p>
<p>We always knew that the portrait photographs, included on a range of government documents, would provide a compelling perspective on these lives, but we weren&#8217;t quite sure how we were going to extract them. Up until last weekend, I&#8217;d assumed that we&#8217;d develop a crowdsourcing tool that contributors would use to mark up the photos.</p>
<p>Now I&#8217;m not so sure.</p>
<p>In the space of a couple of days I&#8217;ve extracted over 7,000 photographs and built an application to browse them &#8212; here is <a href="http://invisibleaustralians.org/faces/">the real face of White Australia</a>&#8230;</p>
<p><a href="http://invisibleaustralians.org/faces/"><img src="http://discontents.com.au/wp-content/uploads/2011/09/real_face-250x182.jpg" alt="" title="real_face" width="250" height="182" class="aligncenter size-medium wp-image-1325" /></a></p>
<p>How did I do it? Paul Hagon, at the National Library of Australia, <a href="http://www.paulhagon.com/blog/2010/03/11/everything-i-know-about-cataloguing-i-learned-from-watching-james-bond/">gave a presentation</a> last year in which he explored the possibilities of facial detection in developing access to photographic collections. The idea lodged in my brain somewhere and a few days ago I started to poke around looking to see how practical it might be for <em>Invisible Australians</em>.</p>
<p>It didn&#8217;t take long to find <a href="http://creatingwithcode.com/howto/face-detection-in-static-images-with-python/">a Python script</a> that used the <a href="http://sourceforge.net/projects/opencvlibrary/">OpenCV library</a> to detect faces in photographs. I tried the script on a few of the NAA documents and was impressed &#8212; there were a few false positives, but the faces were being found!</p>
<p>So then the excitement kicked in. I modified the script so that instead of just finding the coordinates of faces it would enlarge the selected area by 50px on each side and then crop the image. This did a great job of extracting the portraits. I tweaked a few of the settings as well to try to reduce the number of false positives. Eventually, I developed a two-pass system that repeated the detection process after the image had been cropped and its contrast adjusted. This seemed to weed out a few more errors. You can <a href="https://github.com/wragge/Facial-detection">find the code</a> on GitHub.</p>
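<p>The enlargement step lends itself to a short illustration. What follows is a minimal sketch rather than the code in the repository: the function name <code>expand_box</code>, its parameters and the sample coordinates are all assumptions made for the example, and it assumes the detector reports each face as a top-left corner plus a width and height. It pads the box by 50px on each side and, as a further assumption, clamps the result to the page so that a face near an edge doesn&#8217;t produce an invalid crop.</p>

```python
def expand_box(x, y, w, h, image_width, image_height, margin=50):
    """Grow a detected face box by `margin` pixels on each side.

    Hypothetical helper: (x, y) is the top-left corner of the detection,
    w and h are its width and height. The result is clamped to the image.
    """
    left = max(x - margin, 0)
    top = max(y - margin, 0)
    right = min(x + w + margin, image_width)
    bottom = min(y + h + margin, image_height)
    return left, top, right, bottom


# A face found at (100, 80), 60px wide and 70px tall, on a 1000x800 page.
print(expand_box(100, 80, 60, 70, 1000, 800))  # (50, 30, 210, 200)

# A face close to the top-left corner: the box is clipped at the page edge.
print(expand_box(20, 10, 60, 70, 1000, 800))   # (0, 0, 130, 130)
```

<p>The four values are the left, top, right and bottom edges of the crop, which is the order most imaging libraries expect for a crop rectangle. The second detection pass and the contrast adjustment aren&#8217;t covered here.</p>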
<p>Once the script was working I had to assemble the documents. I already had a basic harvester that would retrieve both the file metadata and digitised images for any series in the NAA database. Acting on Kate&#8217;s advice, I pointed it at series <a href="http://www.naa.gov.au/cgi-bin/Search?Number=ST84/1">ST84/1</a> and downloaded 12,502 page images.</p>
<p>All I then had to do was loop the facial detection script over the images. Simple! The only problem was that my 3-year-old laptop wasn&#8217;t quite up to the task. As its CPU temperature rose and rose, I was forced to employ a special high-tech cooling system.</p>
<div id="attachment_1329" class="wp-caption aligncenter" style="width: 260px"><a href="http://discontents.com.au/wp-content/uploads/2011/09/cooling.jpg"><img src="http://discontents.com.au/wp-content/uploads/2011/09/cooling-250x186.jpg" alt="" title="cooling" width="250" height="186" class="size-medium wp-image-1329" /></a><p class="wp-caption-text">Keeping my laptop alive...</p></div>
<p>But after running for several hours, my faithful old laptop finally worked its way through all the documents. The result was a directory full of 11,170 cropped images.</p>
<div id="attachment_1332" class="wp-caption aligncenter" style="width: 260px"><a href="http://discontents.com.au/wp-content/uploads/2011/09/faces_dir.jpg"><img src="http://discontents.com.au/wp-content/uploads/2011/09/faces_dir-250x147.jpg" alt="" title="faces_dir" width="250" height="147" class="size-medium wp-image-1332" /></a><p class="wp-caption-text">The results</p></div>
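<p>The batch run itself can be sketched in a few lines. This is only an outline under stated assumptions: <code>process_directory</code> is a hypothetical name, the <code>extract_faces</code> argument stands in for whatever function does the detection and cropping and is assumed to return the number of crops it saved, and the <code>*.jpg</code> pattern is a guess at how the page images are named.</p>

```python
import tempfile
from pathlib import Path


def process_directory(source_dir, extract_faces, pattern="*.jpg"):
    """Run `extract_faces` over every matching image and tally the results.

    `extract_faces` is a stand-in for the real detection step; it takes a
    path and is assumed to return how many crops it produced for that page.
    """
    totals = {"pages": 0, "crops": 0}
    for image_path in sorted(Path(source_dir).glob(pattern)):
        totals["pages"] += 1
        totals["crops"] += extract_faces(image_path)
    return totals


# Demonstration with empty placeholder files and a dummy extractor that
# pretends every page holds exactly one face.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("page-1.jpg", "page-2.jpg", "notes.txt"):
        (Path(tmp) / name).touch()
    print(process_directory(tmp, lambda path: 1))  # {'pages': 2, 'crops': 2}
```

<p>Passing the detection step in as an argument keeps the loop independent of OpenCV, so the same outline would work with a different classifier.</p>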
<p>There were still quite a lot of false positives and so I simply worked my way through the files, manually deleting the errors. I ended up with 7,247 photos of people. That&#8217;s a strike rate of nearly 65% which seems pretty good. The classifier, which does the actual facial detection, was probably trained on conventional photographs rather than on the mixed-format documents I was feeding it.</p>
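<p>For anyone wanting to check the arithmetic, the figures quoted above work out as follows. The variable names are just for illustration.</p>

```python
crops_found = 11170   # cropped images produced by the script
photos_kept = 7247    # crops left after manually deleting the errors

false_positives = crops_found - photos_kept
strike_rate = photos_kept / crops_found

print(false_positives)                # 3923
print(round(strike_rate * 100, 1))    # 64.9
```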
<p>Then it was just a matter of building a web app to display the portraits. I used Django for the backend work of managing the metadata and delivering the content, while the interface was built using a combination of <a href="http://isotope.metafizzy.co/index.html">Isotope</a>, <a href="http://www.infinite-scroll.com/">Infinite Scroll</a> and <a href="http://fancybox.net/">FancyBox</a>.</p>
<p>It&#8217;s important to note that the portraits provide a way of exploring the records themselves. If you click on a face you see a copy of the document from which the photo was extracted. A link is provided to examine the full context of the image in RecordSearch. This is not just an exhibition, it&#8217;s a finding aid.</p>
<p>What next? There are many more of these documents to be harvested and processed (and many more still yet to be digitised). I will be adding more series as I can (though I might have to wait until I can afford a new computer!). I&#8217;d also like to explore the possibilities of facial or object detection a bit more. Could I train my own classifier? Could I detect handprints, or even classify the type of form?</p>
<p>In the meantime, I think our experimental browser helps us to understand why the <em>Invisible Australians</em> project is so important &#8212; you look at their faces and you simply want to know more. Who are they? What were their lives like?</p>
<p>UPDATE: For more on the photos and the issues they raise, see <a href="http://chineseaustralia.org/?cat=62">Kate Bagnall&#8217;s posts</a> over at the <a href="http://chineseaustralia.org/">Tiger&#8217;s Mouth</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://discontents.com.au/shoebox/archives-shoebox/the-real-face-of-white-australia/feed</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>When did the &#8216;Great War&#8217; become the &#8216;First World War&#8217;?</title>
		<link>http://discontents.com.au/shed/experiments/when-did-the-great-war-become-the-first-world-war</link>
		<comments>http://discontents.com.au/shed/experiments/when-did-the-great-war-become-the-first-world-war#comments</comments>
		<pubDate>Mon, 29 Aug 2011 13:38:38 +0000</pubDate>
		<dc:creator>tim</dc:creator>
				<category><![CDATA[digital humanities]]></category>
		<category><![CDATA[experiments]]></category>
		<category><![CDATA[text mining]]></category>
		<category><![CDATA[Trove]]></category>

		<guid isPermaLink="false">http://discontents.com.au/?p=1259</guid>
		<description><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=When+did+the+%26%238216%3BGreat+War%26%238217%3B+become+the+%26%238216%3BFirst+World+War%26%238217%3B%3F&amp;rft.aulast=Sherratt&amp;rft.aufirst=Tim&amp;rft.subject=digital+humanities&amp;rft.subject=experiments&amp;rft.source=discontents&amp;rft.date=2011-08-29&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://discontents.com.au/shed/experiments/when-did-the-great-war-become-the-first-world-war&amp;rft.language=English"></span>
I&#8217;m interested in time &#8212; in the way we imagine, manipulate, experience and describe time, particularly in the service of ideas such as &#8216;progress&#8217;. This was one of the themes of Atomic Wonderland, but beyond constructing a few case studies it&#8217;s not all that easy to study. Or at least it wasn&#8217;t. Now projects such [...]]]></description>
			<content:encoded><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=When+did+the+%26%238216%3BGreat+War%26%238217%3B+become+the+%26%238216%3BFirst+World+War%26%238217%3B%3F&amp;rft.aulast=Sherratt&amp;rft.aufirst=Tim&amp;rft.subject=digital+humanities&amp;rft.subject=experiments&amp;rft.source=discontents&amp;rft.date=2011-08-29&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://discontents.com.au/shed/experiments/when-did-the-great-war-become-the-first-world-war&amp;rft.language=English"></span>
<abbr class="unapi-id" title="http://discontents.com.au/?p=1259"><!-- &nbsp; --></abbr>
<div id="attachment_1293" class="wp-caption alignright" style="width: 260px"><a href="http://nla.gov.au/nla.news-article62826197"><img src="http://discontents.com.au/wp-content/uploads/2011/08/townsville-daily-bulletin-9-Dec-1939-250x322.png" alt="" title="townsville-daily-bulletin-9-Dec-1939" width="250" height="322" class="size-medium wp-image-1293" /></a><p class="wp-caption-text">Townsville Daily Bulletin, 9 December 1939</p></div>
<p>I&#8217;m interested in time &#8212; in the way we imagine, manipulate, experience and describe time, particularly in the service of ideas such as &#8216;progress&#8217;.</p>
<p>This was one of the themes of <a title="Atomic wonderland" href="http://discontents.com.au/shoebox/history-of-australian-science/atomic-wonderland">Atomic Wonderland</a>, but beyond constructing a few case studies it&#8217;s not all that easy to study. Or at least it wasn&#8217;t. Now projects such as <a href="http://victorianbooks.org/">Victorian Books</a> are showing how we can explore the changing weights of ideas across times and cultures by analysing the contents of large textual collections.</p>
<p>Returning visitors will probably be aware of <a href="http://discontents.com.au/tag/trove">my own experiments</a> mining the contents of the National Library of Australia&#8217;s digitised newspapers database, available through <a href="http://trove.nla.gov.au/newspaper">Trove</a>. So far I&#8217;ve focused on the development of generic tools and techniques, but I thought it would be interesting to apply these to my study of &#8216;progress&#8217;. Happily the NLA agreed and have awarded me a <a href="http://www.nla.gov.au/harold-white-fellowships/2012-national-library-of-australia-fellowships-announced">Harold White Fellowship for 2012</a> to do just that. Yippee!</p>
<p>I&#8217;ll be taking up the fellowship in February, but in preparation I&#8217;ve started to develop a few little sketches that prod at our fondness for periodisation. Labels such as &#8216;the Roaring Twenties&#8217;, &#8216;the Great Depression&#8217; or even &#8216;the First World War&#8217; are so familiar that we sometimes forget that they themselves have a history.</p>
<p>To begin with I decided to examine the question of when &#8216;the Great War&#8217; became &#8216;the First World War&#8217;. At some point we realised that the Great War was not the final act in a centuries-long drama of European jealousy and jostling, but the first in a series of global conflicts. Can newspapers tell us when?</p>
<p>I <a href="http://discontents.com.au/shed/experiments/mining-the-treasures-of-trove-part-2">already had a script</a> that would generate a basic time series from a Trove query string. It simply takes the query, fires off a separate search for each year and grabs the number of matching articles. If the number of matches is more than zero, it also retrieves the total number of articles for that year and calculates the proportion matching the query. The results are saved in a JSON file which can be easily visualised using something like <a href="http://www.highcharts.com/">HighCharts</a>. The original script needed a few tweaks to streamline the process, but I&#8217;ll describe these in detail in my next post.</p>
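<p>For anyone who wants to see the shape of that loop, here is a rough sketch in Python. It is not the original script: the function names are invented, and count_articles is a hypothetical stand-in for whatever actually fetches a result count from Trove (passing None as the query is this sketch&#8217;s way of asking for the total number of articles in a year).</p>

```python
import json


def build_time_series(query, years, count_articles):
    """Return one record per year holding the match count and proportion.

    count_articles(query, year) is a hypothetical callable standing in for
    a Trove search; count_articles(None, year) returns the year's total.
    """
    series = []
    for year in years:
        matches = count_articles(query, year)
        record = {"year": year, "matches": matches, "proportion": 0.0}
        if matches > 0:
            # Only look up the yearly total when there is something to compare.
            total = count_articles(None, year)
            record["total"] = total
            record["proportion"] = matches / total if total else 0.0
        series.append(record)
    return series


def save_series(series, path):
    """Write the series to a JSON file ready for a charting library."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(series, handle, indent=2)
```

<p>Passing the counting function in as an argument keeps the loop independent of how Trove is queried, which also makes it easy to try out with a dummy function before pointing it at the real thing.</p>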
<p>For this experiment I constructed two queries. The first simply searched for the phrase &#8216;<a href="http://trove.nla.gov.au/newspaper/result?q=&#038;exactPhrase=the+great+war&#038;l-category=Article|category%3AArticle">the great war</a>&#8217; between 1900 and 1954. The second was a bit more complicated &#8212; it searched for <a href="http://trove.nla.gov.au/newspaper/result?l-category=Article|category%3AArticle&#038;sortby=dateAsc&#038;q=%22the+first+world+war%22+OR+%22world+war+one%22+OR+%22world+war+i%22+OR+%22world+war+1%22">any of the phrases</a> &#8216;first world war&#8217;, &#8216;world war one&#8217;, &#8216;world war 1&#8217; or &#8216;world war i&#8217; across the same period. I fed the queries to my script and after a bit of ker-chugging, whirring and clunking I ended up with a graph.</p>
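<p>The two query strings can be assembled along these lines. This is only a sketch: the parameter names (q, exactPhrase, l-category, sortby) are copied from the links above, the helper functions are invented, and the date limits are left out because they don&#8217;t appear in those links. Note that urlencode escapes the | character as %7C, so the output differs cosmetically from the hand-written URLs.</p>

```python
from urllib.parse import urlencode

SEARCH_URL = "http://trove.nla.gov.au/newspaper/result"


def any_phrase_query(phrases):
    """Join phrases into a single query of quoted terms separated by OR."""
    return " OR ".join(f'"{phrase}"' for phrase in phrases)


def search_url(params):
    """Build a search URL from a dict of query parameters."""
    return f"{SEARCH_URL}?{urlencode(params)}"


# Query 1: the exact phrase 'the great war', newspaper articles only.
great_war_url = search_url({
    "q": "",
    "exactPhrase": "the great war",
    "l-category": "Article|category:Article",
})

# Query 2: any of the four phrases, again limited to articles.
first_world_war_url = search_url({
    "l-category": "Article|category:Article",
    "sortby": "dateAsc",
    "q": any_phrase_query([
        "the first world war",
        "world war one",
        "world war i",
        "world war 1",
    ]),
})

print(great_war_url)
print(first_world_war_url)
```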
<div id="attachment_1278" class="wp-caption alignright" style="width: 260px"><a href="http://wraggelabs.com/shed/time/the_great_war-2011-08-16.html"><img src="http://discontents.com.au/wp-content/uploads/2011/08/great_war_graph-252x300.jpg" alt="" title="When did the Great War become the First World War?" width="250" height="297" class="size-medium wp-image-1278" /></a><p class="wp-caption-text">Click to view the full interactive graph.</p></div>
<p>The result is not really surprising. As you can see <a href="http://wraggelabs.com/shed/time/the_great_war-2011-08-16.html">on the full graph</a>, the two lines cross late in 1941. With German victories across Europe and North Africa, the opening of the Eastern Front and, finally, the Japanese attack on Pearl Harbour, 1941 seems to make sense. But it&#8217;s interesting to see this reflected so clearly in such a rough and ready analysis.</p>
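<p>Reading the crossover off the graph is easy enough, but it can also be pulled straight from the data. The sketch below assumes each series has been loaded as a dictionary mapping year to the proportion of matching articles; that layout and the function name are assumptions for illustration, not the format my script produces.</p>

```python
def first_crossover(series_a, series_b):
    """Return the first year in which series_b overtakes series_a.

    Both arguments map year -> proportion of matching articles. Only years
    present in both series are compared, and series_b must have been level
    with or behind series_a in an earlier year for a crossing to count.
    Returns None if no such crossing is found.
    """
    was_behind = False
    for year in sorted(set(series_a).intersection(series_b)):
        if series_b[year] > series_a[year]:
            if was_behind:
                return year
        else:
            was_behind = True
    return None
```

<p>With yearly counts the answer can only ever be a year, so anything finer than that has to come from looking at the lines on the graph or from running the searches month by month.</p>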
<p>What is perhaps more intriguing is the huge spike in 1939. Of course it makes sense that people would be referring back to the Great War as the prospect of a new conflict loomed, but it does make you wonder about the context of these discussions and how they might have developed as war edged closer.</p>
<p>Notable too are the earlier blips in the First World War count &#8212; the first centred on 1916 and the second on 1935. The peak in 1916 is actually due to the tags and comments added by Trove users. The standard &#8216;search everything&#8217; option in Trove includes these as well as the text of the articles themselves. By using other search options you can choose to exclude the tags that match your query, but that seems rather messy. It would be nicer if Trove gave you the option of ignoring these matches from the start.</p>
<div id="attachment_1286" class="wp-caption alignright" style="width: 260px"><a href="http://nla.gov.au/nla.news-article32886350"><img src="http://discontents.com.au/wp-content/uploads/2011/08/first_world_war-300x298.jpg" alt="" title="first_world_war" width="250" height="248" class="size-medium wp-image-1286" /></a><p class="wp-caption-text">The West Australian, 24 May 1935</p></div>
<p>The second blip is a bit more interesting. By clicking on the graph and exploring the results from Trove, you can see that it&#8217;s due to the screening of a documentary film called &#8216;<a href="http://www.imdb.com/title/tt0976117/">The First World War</a>&#8217;. The film used archival footage drawn from a number of nations and was based on Laurence Stallings&#8217;s book <em>The First World War: A Photographic History</em>. As one newspaper article noted: &#8216;this picture presents war, stripped of its gaudy trappings, and fearful in its grim reality&#8217;.</p>
<p>By way of comparison I <a href="http://ngrams.googlelabs.com/graph?content=the+Great+War%2Cthe+First+World+War&#038;year_start=1900&#038;year_end=1954&#038;corpus=0&#038;smoothing=0">tried a similar query</a> using the Google Books Ngram viewer. The crossover point seems a little later, but of course books take longer to publish than newspapers. There is, however, no peak in 1939 for &#8216;the Great War&#8217; &#8212; at least not if you use the combined &#8216;English&#8217; corpus. If you examine the British-English and American-English corpora separately it&#8217;s a rather different story. Querying the British-English corpus produces <a href="http://ngrams.googlelabs.com/graph?content=the+Great+War%2Cthe+First+World+War&#038;year_start=1900&#038;year_end=1954&#038;corpus=6&#038;smoothing=0">something much closer</a> to our Trove graph, complete with a spike around 1939. Again, this is only as we&#8217;d expect given the lesser significance of the First World War in American history. </p>
<p>This is, of course, only a sketch &#8212; something to prompt new questions or suggest avenues for attack. It&#8217;s made me want to find out a bit more about the nature of discussions in 1939, so I&#8217;ve fired up my <a href="http://wraggelabs.com/emporium/trove-tools/harvester/">Trove Newspaper Harvester</a> and downloaded the text of all 6,582 articles from 1939 that include the phrase &#8216;the Great War&#8217;. More about that soon&#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://discontents.com.au/shed/experiments/when-did-the-great-war-become-the-first-world-war/feed</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
	</channel>
</rss>

