For a project that I’m working on at the National Museum of Australia, I’ve started collecting various sources of date-identified data. Most recently I had a go at extracting historical population data from the Australian Bureau of Statistics.
The data can all be downloaded as .xls files, but they’re not simple, flat spreadsheets – they’re data cubes. As the name suggests, data cubes are organised along a number of dimensions. In the case of the population data it’s year, state and gender.
This means that you can’t just export the data to CSV and suck it into your database – first you’ve got to flatten the cube. No doubt there are other ways to do this, but I just wrote a simple python script. It uses xlrd to read from the spreadsheet, does a bit or reorganisation, then writes the output to a CSV file. The code, for what it’s worth, is available at Bitbucket.
Once I had the CSV file I just imported it into MySQL and used Django and Piston to build a basic API. So if you want to know the population of NSW in 1856, you just go to:
The number of infant deaths in Tasmania in 1932:
The number of female births in Australia in 1959:
I’m sure you get the picture. You can change the ‘json’ to ‘xml’ if you’d like another flavour of data.
With an API delivering JSON you can start playing around with all sorts of fun AJAX-y stuff. To demonstrate I built a simple population browser using JQuery. Just drag the slider!
This work is licensed under a Creative Commons Attribution 4.0 International License.