2016 — the making and the talking

The image above is from Geoff Hinchcliffe’s awesome visualisation of more than 12,000 #fundTrove tweets.

This year I sadly left the wonderful team at Trove and took up a full-time academic post at the University of Canberra. But it was Trove that dominated the early part of the year, as the impact of continual funding cuts on the National Library of Australia became clear. Users of Trove shared their feelings on Twitter and Facebook, organisations posted statements of support, and numerous articles appeared in the media. In the lead-up to the federal election, both the Greens and the ALP made commitments to support Trove and our national cultural institutions.

In the last few days, we’ve learnt that the Government will provide $16.4 million over four years to the NLA ‘for digitisation of material and upgrade of critical infrastructure for its Trove digital information resource and to upgrade other critical infrastructure’. While we wait to hear exactly what this means for the future of Trove, it’s important to remember that it comes after many cuts and job losses across the cultural sector. The lesson of #fundTrove is that we cannot take the future of our collecting organisations for granted. We need to show why they matter and fight for the resources they need.

Access is important — both its politics and its practicalities. This year I’ve tried to be a bit more rigorous in the way I share information and document my projects. I created a Digital Heritage Handbook where I publish workshops, activities, and other bits and pieces. Much of it is in draft form, but I decided it was better just to push everything out in the hope that it might be useful. Similarly, I created an Open Research Notebook to share work in progress. The Handbook also includes details of the two undergraduate units I taught in second semester — Working with collections, and Exploring digital heritage. I think they went pretty well, but I’ve got a few improvements planned for 2017.

This year I accidentally built my own version of Historic Hansard, created an interface to National Archives files we’re not allowed to see, and mined ASIO surveillance files for redactions. As well as these major projects, there were lots of little hacks and harvests aimed at exploring the idea of ‘access’. You can follow my main research obsessions in my notebook.

Talking and making details follow…

2016 — the making:

  • Locating Trove newspapers
    Updated code, data, and interface to geolocate and display Trove newspaper titles. Now with maps!
  • Headline Roulette
    Much-needed update for my old game. Now on its own domain and with better handling of Trove API errors.
  • DFAT Documents
    Demonstration code to harvest the Department of Foreign Affairs and Trade’s collection of historical documents and extract some metadata. The harvested documents are available in Markdown format and can be explored through a simple website.
  • People of Australia
    @people_aus is a Twitter bot sharing random names drawn from late 19th and early 20th century naturalisation records held by the National Archives of Australia. Many names. Many cultures. These are the people of Australia.
  • RecordSearch Series Harvests
    Code to harvest the metadata and digitised images of all items in a series from the National Archives of Australia. Data from an assortment of harvested series are available as CSV files.
  • SRNSW indexes
    Code for harvesting indexes from the State Records of NSW website. Data from 59 harvested indexes are available as CSV files.
  • Facial detection demo
    Code and website to demonstrate the principles of facial detection using OpenCV.
  • Show Redactions userscript
    Code for inserting details of redacted files into RecordSearch results.
  • ASIO Experiments
    Code used for the extraction of redactions and other experiments with digitised ASIO files.
  • Redactions dataset
    Redactions extracted from ASIO surveillance records in National Archives of Australia Series A6119, <https://dx.doi.org/10.6084/m9.figshare.4101765.v1>
  • Non redactions dataset
    False positives (non-redactions) extracted from ASIO surveillance records in National Archives of Australia Series A6119, <https://dx.doi.org/10.6084/m9.figshare.4104651.v1>
  • Redacted
    Web interface for exploring redactions extracted from digitised ASIO files. Includes a collection of redaction art.
  • Open with Exception browser
    Code and website providing an experimental browser for digitised ASIO files from the National Archives of Australia.
  • Invisible Australians browser
    Updated code and website providing an experimental browser for digitised records from the National Archives of Australia relating to the administration of the White Australia Policy. Now includes a landscape view for exploring records by their orientation.
  • Closed Access harvester
    Updated code for harvesting and analysing records from the National Archives of Australia with the access status of ‘closed’.
  • Closed Access dataset
    Complete dataset of records held by the National Archives of Australia that had the access status of ‘closed’ (withheld from public access) on 1 January 2016.
  • Closed Access website
    Public web interface for the exploration, analysis, and visualisation of ‘closed’ records in the National Archives of Australia.
  • RecordSearch Functions
    Code and documentation for analysing the performance of functions by Commonwealth government agencies over time, using data from the National Archives of Australia.
  • Commonwealth Hansard XML repository
    A repository of the (almost) complete proceedings of the Commonwealth House of Representatives and Senate from 1901–1980. This comprises several gigabytes of XML-formatted files harvested from the ParlInfo database.
  • Historic Hansard
    A public website that presents the proceedings of the Commonwealth House of Representatives and Senate from 1901–1980 in a form that is optimised for browsing and reading. It includes additional features such as indexes to people and legislation, and the integration of tools for text analysis and annotation. Documentation is also provided.
  • Trove Harvester
    Code and documentation to support the creation of large datasets for research and analysis from Trove’s digitised newspapers.
  • Gadfly front pages
    Code and documentation to demonstrate how to harvest page images from Trove’s digitised newspapers.
  • Trove Proxy
    Code and active proxy service that generates links to download PDFs from Trove’s digitised newspapers, and provides an HTTPS wrapper around the Trove API.
  • DIY Headline Roulette
    Code and documentation that makes it easy for anyone to create their own simple game using Trove’s digitised newspapers.
  • Radio National program data
    Updated dataset of programs broadcast on Radio National from 2000–2016 harvested from Trove.
  • PMs Transcripts repository
    Repository of more than 20,000 XML transcripts of speeches by Australian Prime Ministers harvested from the PMs Transcripts site.
  • UMA Ellis Photos
    Repository of data and images from a collection of political photos by John Ellis held by the University of Melbourne Archives. Harvested using the Trove API.

2016 — the talking

This work is licensed under a Creative Commons Attribution 4.0 International License.

Written by: Tim Sherratt

I'm a historian and hacker who researches the possibilities and politics of digital cultural collections.
