<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>discontents &#187; Piston</title>
	<atom:link href="http://discontents.com.au/tag/piston/feed" rel="self" type="application/rss+xml" />
	<link>http://discontents.com.au</link>
	<description>working for the triumph of content over form, ideas over control, people over systems</description>
	<lastBuildDate>Tue, 24 Jan 2012 20:57:11 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3</generator>
		<item>
		<title>Headline roulette</title>
		<link>http://discontents.com.au/shed/experiments/headline-roulette</link>
		<comments>http://discontents.com.au/shed/experiments/headline-roulette#comments</comments>
		<pubDate>Tue, 23 Mar 2010 12:26:29 +0000</pubDate>
		<dc:creator>tim</dc:creator>
				<category><![CDATA[experiments]]></category>
		<category><![CDATA[API]]></category>
		<category><![CDATA[Django]]></category>
		<category><![CDATA[games]]></category>
		<category><![CDATA[newspapers]]></category>
		<category><![CDATA[NLA]]></category>
		<category><![CDATA[Piston]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[screen scraping]]></category>

		<guid isPermaLink="false">http://discontents.com.au/?p=834</guid>
		<description><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Headline+roulette&amp;rft.aulast=Sherratt&amp;rft.aufirst=Tim&amp;rft.subject=experiments&amp;rft.source=discontents&amp;rft.date=2010-03-23&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://discontents.com.au/shed/experiments/headline-roulette&amp;rft.language=English"></span>
I&#8217;ve been doing a fair bit of coding in recent weeks and I thought I&#8217;d better write a few details down before I forget about them. As previously noted, I&#8217;ve been gathering together various historical data sets for a project at the National Museum of Australia. One resource that I was keen on including was [...]]]></description>
			<content:encoded><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Headline+roulette&amp;rft.aulast=Sherratt&amp;rft.aufirst=Tim&amp;rft.subject=experiments&amp;rft.source=discontents&amp;rft.date=2010-03-23&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://discontents.com.au/shed/experiments/headline-roulette&amp;rft.language=English"></span>
<abbr class="unapi-id" title="http://discontents.com.au/?p=834"><!-- &nbsp; --></abbr>
<p>I&#8217;ve been doing a fair bit of coding in recent weeks and I thought I&#8217;d better write a few details down before I forget about them.</p>
<p>As previously noted, I&#8217;ve been gathering together various historical data sets for a project at the National Museum of Australia. One resource that I was keen on including was the fantastic <a href="http://newspapers.nla.gov.au/ndp/del/home">Australian Newspapers</a> project at the National Library of Australia. What I had in mind was being able to give a sense of context to any historical event by calling up the headlines for that particular time.</p>
<p>Unfortunately there&#8217;s no API for the newspapers project (or Trove in general), though apparently it&#8217;s in the works. So I had to reverse-engineer the advanced search page to work out the various query options, and then build a screen scraper to harvest the results. I played around with the search options a bit to fine-tune the results, finally deciding to limit them to &#8216;news&#8217; articles with more than 1000 words. Annoyingly, only 10 results are returned at a time.</p>
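<p>Because only 10 results come back at a time, harvesting means stepping through the results page by page. Here is a minimal sketch of that paging loop. It assumes the search takes its options as query-string parameters. The base URL and every parameter name below (date, state, category, minWords, start) are hypothetical placeholders, not the real NLA search interface. Only the ten-results-per-page limit and the &#8216;news&#8217; and word-count filters come from the description above.</p>

```python
from urllib.parse import urlencode

# Hypothetical base URL and parameter names: placeholders, NOT the real
# NLA advanced-search interface, which had to be reverse-engineered.
SEARCH_URL = "http://example.org/newspapers/search"
PAGE_SIZE = 10  # the search only returns 10 results at a time


def build_search_url(date, state, start=0):
    """Build one results-page URL for 'news' articles over 1000 words."""
    params = {
        "date": date,          # e.g. "1901-01-01"
        "state": state,        # e.g. "nsw"
        "category": "news",    # restrict to news articles
        "minWords": 1001,      # "more than 1000 words", assuming an inclusive minimum
        "start": start,        # offset of the first result on this page
    }
    return SEARCH_URL + "?" + urlencode(params)


def page_urls(date, state, total_results):
    """Yield one URL per page needed to cover total_results hits."""
    for start in range(0, total_results, PAGE_SIZE):
        yield build_search_url(date, state, start)
```

<p>For 25 hits, list(page_urls("1901-01-01", "nsw", 25)) gives three URLs, with offsets 0, 10 and 20.</p>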
<p>I had hoped to parse the results as XML, but a rogue &lt;br&gt; tag broke the XHTML, so I fell back on <a href="http://www.crummy.com/software/BeautifulSoup/">Beautiful Soup</a> – a Python module that makes screen scraping considerably easier by tidying up HTML structures. After that it was pretty straightforward. Soon I had <a href="http://bitbucket.org/wragge/nla-newspapers/">my own Python module</a> to query the newspapers database and process the results.</p>
<p>The next step was to use the module to build a simple API that would let us quickly grab a set of headlines for a particular date and place. <a href="http://www.djangoproject.com/">Django</a> and <a href="http://bitbucket.org/jespern/django-piston/wiki/Home">Piston</a> made this easy. To see headlines from New South Wales on 1 January 1901, for example:</p>
<p><a href="http://wraggelabs.com/api/newspapers/1901-01-01/nsw/">http://wraggelabs.com/api/newspapers/1901-01-01/nsw/</a></p>
<p>That was pretty cool and it started me thinking about what else I might do with the data. At first I was planning some sort of browser, like my <a href="http://wraggelabs.com/abs/">Population Browser</a>, but that seemed a bit boring. So I decided to create a simple game that grabbed a random headline and asked you to try and guess the date. After further refinement I decided to impose a limit of 10 guesses, with &#8216;higher&#8217; or &#8216;lower&#8217; prompts to get you moving in the right direction. Yes, basically it was a rip-off of The Price is Right – but an interesting, ironic and historically engaged rip-off&#8230;</p>
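<p>The game logic itself is small enough to sketch. This is a hypothetical version, not the app&#8217;s actual code. It assumes the player guesses the year of publication (the post just says &#8216;the date&#8217;) and that a wrong tenth guess ends the round. The class name and prompt strings are made up for illustration.</p>

```python
MAX_GUESSES = 10  # the limit described above


class HeadlineGame:
    """Hypothetical sketch: track one round's hidden year and guesses used."""

    def __init__(self, answer_year):
        self.answer_year = answer_year
        self.guesses_used = 0
        self.finished = False

    def guess(self, year):
        """Return 'correct', 'higher', 'lower' or 'game over'."""
        if self.finished:
            return "game over"
        self.guesses_used += 1
        if year == self.answer_year:
            self.finished = True
            return "correct"
        if self.guesses_used >= MAX_GUESSES:
            # A wrong guess that uses up the last attempt ends the round
            self.finished = True
            return "game over"
        # Point the player in the right direction
        return "higher" if year < self.answer_year else "lower"
```

<p>For a headline from 1915, guessing 1900 returns &#8216;higher&#8217;, then 1950 returns &#8216;lower&#8217;, and 1915 ends the round with &#8216;correct&#8217;.</p>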
<p>This required me to make a change to the API and Python module so that I could retrieve a random headline. Basically it just meant generating a query based on random values for the day, month, year and state. For the interface I once again delved into jQuery&#8217;s box of tricks. With all the kerfuffle about ChatRoulette in the media, the name seemed obvious – <a href="http://wraggelabs.com/newsroulette/">Wragge&#8217;s Headline Roulette</a> was born.</p>
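<p>Picking the day, month and year independently can produce impossible dates such as 31 February. Here is a minimal sketch that avoids this by capping the day with the standard library&#8217;s calendar.monthrange. The year range and state codes are placeholders, not values taken from the app.</p>

```python
import calendar
import random

# Placeholder values: the year range and state codes are assumptions,
# not taken from the real app.
STATES = ["nsw", "vic", "qld", "sa", "wa", "tas"]
FIRST_YEAR, LAST_YEAR = 1803, 1954


def random_query(rng=random):
    """Pick a random (year, month, day, state) that is a real calendar date."""
    year = rng.randint(FIRST_YEAR, LAST_YEAR)
    month = rng.randint(1, 12)
    # monthrange returns (weekday of the 1st, days in month), so the day
    # never overflows the month, and 29 February only appears in leap years
    day = rng.randint(1, calendar.monthrange(year, month)[1])
    state = rng.choice(STATES)
    return year, month, day, state
```

<p>Passing a seeded random.Random instance as rng makes the picks repeatable, which helps when testing.</p>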
<div id="attachment_839" class="wp-caption aligncenter" style="width: 310px"><a href="http://wraggelabs.com/newsroulette/"><img class="size-medium wp-image-839" title="headline-roulette" src="http://discontents.com.au/wp-content/uploads/2010/03/headline-roulette-300x151.jpg" alt="Headline roulette screen capture" width="300" height="151" /></a><p class="wp-caption-text">Test your historical nous with Headline Roulette!</p></div>
<p>It&#8217;s a very simple little app, but a number of people have said how much fun it is. The bad news is that imminent changes to the NLA newspapers site are probably going to break it (at least in its current form). So enjoy it while you can. When the NLA makes an API available I might work on something a little more sophisticated.</p>
<p>Of course, the broader point is that there is a whole range of cultural materials out there waiting to be remixed and re-used in various forms. Get hacking&#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://discontents.com.au/shed/experiments/headline-roulette/feed</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Out of the cube</title>
		<link>http://discontents.com.au/shed/experiments/out-of-the-cube</link>
		<comments>http://discontents.com.au/shed/experiments/out-of-the-cube#comments</comments>
		<pubDate>Fri, 26 Feb 2010 05:57:44 +0000</pubDate>
		<dc:creator>tim</dc:creator>
				<category><![CDATA[experiments]]></category>
		<category><![CDATA[APIs]]></category>
		<category><![CDATA[datacubes]]></category>
		<category><![CDATA[Django]]></category>
		<category><![CDATA[Piston]]></category>
		<category><![CDATA[population]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[spreadsheets]]></category>
		<category><![CDATA[statistics]]></category>

		<guid isPermaLink="false">http://discontents.com.au/?p=823</guid>
		<description><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Out+of+the+cube&amp;rft.aulast=Sherratt&amp;rft.aufirst=Tim&amp;rft.subject=experiments&amp;rft.source=discontents&amp;rft.date=2010-02-26&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://discontents.com.au/shed/experiments/out-of-the-cube&amp;rft.language=English"></span>
For a project that I&#8217;m working on at the National Museum of Australia, I&#8217;ve started collecting various sources of date-identified data. Most recently I had a go at extracting historical population data from the Australian Bureau of Statistics. The data can all be downloaded as .xls files, but they&#8217;re not simple, flat spreadsheets – they&#8217;re [...]]]></description>
			<content:encoded><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Out+of+the+cube&amp;rft.aulast=Sherratt&amp;rft.aufirst=Tim&amp;rft.subject=experiments&amp;rft.source=discontents&amp;rft.date=2010-02-26&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://discontents.com.au/shed/experiments/out-of-the-cube&amp;rft.language=English"></span>
<abbr class="unapi-id" title="http://discontents.com.au/?p=823"><!-- &nbsp; --></abbr>
<p>For a project that I&#8217;m working on at the National Museum of Australia, I&#8217;ve started collecting various sources of date-identified data. Most recently I had a go at extracting <a href="http://www.abs.gov.au/AUSSTATS/abs@.nsf/mf/3105.0.65.001">historical population data</a> from the Australian Bureau of Statistics.</p>
<p>The data can all be downloaded as .xls files, but they&#8217;re not simple, flat spreadsheets – they&#8217;re data cubes. As the name suggests, data cubes are organised along a number of dimensions. In the case of the population data it&#8217;s year, state and gender.</p>
<p>This means that you can&#8217;t just export the data to CSV and suck it into your database – first you&#8217;ve got to flatten the cube. No doubt there are other ways to do this, but I just wrote a simple Python script. It uses <a href="http://pypi.python.org/pypi/xlrd">xlrd</a> to read from the spreadsheet, does a bit of reorganisation, then writes the output to a CSV file. The code, for what it&#8217;s worth, is <a href="http://bitbucket.org/wragge/abs-data-cube-processor/">available at Bitbucket</a>.</p>
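<p>What the flattening involves depends on the cube&#8217;s layout. Here is a minimal sketch that assumes a simplified sheet: two header rows (states, then genders) above rows of year-by-year values. Real ABS cubes are messier. To keep it self-contained, it works on plain lists of cell values (the kind of list xlrd&#8217;s sheet.row_values(i) returns for each row) rather than opening a workbook. The function names are hypothetical, and this is not the Bitbucket script.</p>

```python
import csv
import io

# Assumed layout (a simplification of a real ABS cube): row 0 holds a
# state for each data column, row 1 the gender, and every later row is a
# year followed by one value per (state, gender) column. With xlrd, the
# rows would come from sheet.row_values(i) for i in range(sheet.nrows).


def flatten_cube(rows):
    """Turn a year x (state, gender) grid into (year, state, gender, value) tuples."""
    states, genders = rows[0], rows[1]
    flat = []
    for row in rows[2:]:
        year = int(row[0])  # xlrd returns numeric cells as floats
        for col in range(1, len(row)):
            if row[col] == "":
                continue  # skip empty cells rather than writing blanks
            flat.append((year, states[col], genders[col], int(row[col])))
    return flat


def write_csv(flat_rows, out):
    """Write the flattened rows, with a header, to a file-like object."""
    writer = csv.writer(out)
    writer.writerow(["year", "state", "gender", "value"])
    writer.writerows(flat_rows)
```

<p>Each output row then carries its own year, state and gender, which is the flat shape a database import expects.</p>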
<p>Once I had the CSV file I just imported it into MySQL and used Django and <a href="http://bitbucket.org/jespern/django-piston/wiki/Home">Piston</a> to build a basic API. So if you want to know the population of NSW in 1856, you just go to:</p>
<p><a href="http://wraggelabs.com/api/json/population/nsw/1856/">http://wraggelabs.com/api/json/population/nsw/1856/</a></p>
<p>The number of infant deaths in Tasmania in 1932:</p>
<p><a href="http://wraggelabs.com/api/json/infantdeaths/tas/1932/">http://wraggelabs.com/api/json/infantdeaths/tas/1932/</a></p>
<p>The number of female births in Australia in 1959:</p>
<p><a href="http://wraggelabs.com/api/json/births/australia/females/1959/">http://wraggelabs.com/api/json/births/australia/females/1959/</a></p>
<p>I&#8217;m sure you get the picture. You can change the &#8216;json&#8217; to &#8216;xml&#8217; if you&#8217;d like another flavour of data.</p>
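<p>Because the URLs follow a regular pattern, a client can build them. The sketch below is inferred from the examples above. The function name is made up, and any measures, regions or genders other than those shown are assumptions.</p>

```python
BASE = "http://wraggelabs.com/api"


def api_url(measure, region, year, gender=None, fmt="json"):
    """Build a URL following the pattern of the example URLs.

    fmt is "json" or "xml"; gender is optional and, as in the births
    example, sits between the region and the year.
    """
    if fmt not in ("json", "xml"):
        raise ValueError("fmt must be 'json' or 'xml'")
    parts = [BASE, fmt, measure, region]
    if gender is not None:
        parts.append(gender)
    parts.append(str(year))
    return "/".join(parts) + "/"
```

<p>For example, api_url("births", "australia", 1959, gender="females") reproduces the female-births URL, and adding fmt="xml" gives the XML flavour.</p>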
<div id="attachment_830" class="wp-caption aligncenter" style="width: 310px"><a href="http://wraggelabs.com/abs/"><img class="size-medium wp-image-830" title="pop_browser" src="http://discontents.com.au/wp-content/uploads/2010/02/pop_browser-300x140.png" alt="Screenshot of population browser" width="300" height="140" /></a><p class="wp-caption-text">The API in action - a simple population browser</p></div>
<p>With an API delivering JSON you can start playing around with all sorts of fun AJAX-y stuff. To demonstrate, I built a <a href="http://wraggelabs.com/abs/">simple population browser</a> using jQuery. Just drag the slider!</p>
]]></content:encoded>
			<wfw:commentRss>http://discontents.com.au/shed/experiments/out-of-the-cube/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

