  • Asking better questions: History, Trove and the risks that count

    This is the published version of my chapter ‘Asking better questions: History, Trove and the risks that count’ in the book CopyFight, edited by Phillipa McGuiness. It’s reproduced here with the permission of the publishers. You can buy a copy of the book from NewSouth Press or just about any bookstore. You can also download […]

  • My two lives

    My blog hasn’t quite caught up with the fact that I now have two jobs to go along with my two lives. Monday to Wednesday I’m still part of the Trove management team at the National Library of Australia, but on Thursdays and Fridays I’m Associate Professor of Digital Heritage in the Faculty of Arts and […]

  • Stories for machines, data for humans

    Presented at the New Factual Storytelling symposium, 10 April 2015, University of Canberra. I feel like the nerdy kid at the cool kids’ party. There are lots of interesting and creative projects on show today and I… well I want to talk to you about metadata. Data. According to some pundits it’s the new oil, […]

  • Myths, mega-projects and making

    Keynote presented by video at EuropeanaTech 2015, 13 February 2015. The video of this presentation is available on Vimeo and the slides are on SlideShare. In the aftermath of World War II, Australian hopes for a new era of national progress were expressed in a massive engineering project called the Snowy Mountains Scheme. The project promised […]


#Borderfarce, building, and a hacker’s reward

Presented at the National Scholarly Communications Forum, Canberra, 7 September 2015.

About 10 days ago I was part of what The Australian described as a ‘frenzy of hyperbole, hysteria and straightforward misinformation’ as Twitter responded to Border Force’s plans to patrol the streets of Melbourne in a crackdown on visa fraud.

Like many others I responded to the announcement with anger, sadness, and disbelief. Was this what we had become?

But the agency’s actions also generated well-deserved ridicule as the hashtag #borderfarce quickly gained ground.

Some months ago, in the wake of ‘Operation Sovereign Borders’, I created a Twitter bot and web app that generate random operation names. Their professed aim is to ‘protect Australia from meaning’. So of course, Operation Random Words quickly leapt into the fray in support of Border Force.


More pointedly, I shared a slide from my recent keynote at the international digital humanities conference where I drew a connection between the Immigration Department’s use of the term ‘illegal maritime arrivals’ and the category of ‘prohibited immigrants’ employed under the White Australia Policy. Then and now.

It’s probably worth pointing out that this was easy because I’d posted the full text of my talk online within an hour of delivering it in July.

Of course I wasn’t the only one pointing out historical connections. Evan Smith, from Flinders University, issued a steady stream of tweets that linked to literature on immigration and the control of borders.


Kate Bagnall, a historian of Chinese Australia, noted further parallels with the actions of Customs officers enforcing the White Australia Policy. I shared Kate’s article about one legitimate resident’s arrest by Customs officers suspicious that he seemed a little too Chinese.

For those venturing onto the streets of Melbourne, I helpfully offered a source of suitable identity papers – linking to a test interface Kate and I are building to help navigate through many thousands of documents relating to the White Australia Policy that we’ve harvested from the National Archives of Australia’s online database.


Ok, so none of these activities had any significant impact on the unfolding of events – although one of Evan’s tweets did make it into a summary of the social media frenzy published by the Sydney Morning Herald. The historical and political dimensions of the debacle were clearly laid out by Mark Finnane in an article for The Conversation a few days later.

But the sense of immediacy was compelling. Something was happening, and we were part of it. The digital tools at our disposal give us the opportunity to mobilise scholarly research and resources and make them available within existing social spaces, in the context of daily events. Communication can come in the form of rapid, pointed, small-scale, ephemeral interventions.

Just the day before Border Farce I realised that an article had been published in Australian Historical Studies that makes heavy use of an online tool I created. QueryPic visualises search queries within the massive collections of digitised newspapers available through Trove and Papers Past. It’s simple, but surprisingly useful.
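This is not QueryPic’s code, but the basic idea of charting a search across time can be sketched in a few lines of Python. The sketch assumes you already have, for each year, the number of articles matching a query and the total number of articles digitised for that year (the figures below are invented). Dividing one by the other keeps years with more digitised newspapers from dominating the chart. The function name `proportions_by_year` is hypothetical.

```python
def proportions_by_year(hits, totals):
    """Return {year: share of that year's articles that match the query}.

    `hits` and `totals` are dicts keyed by year. Years with no recorded
    total (or a total of zero) are skipped to avoid dividing by zero.
    """
    result = {}
    for year, count in sorted(hits.items()):
        total = totals.get(year, 0)
        if total:
            result[year] = count / total
    return result


# Invented numbers, for illustration only.
hits = {1910: 120, 1911: 150, 1912: 90}
totals = {1910: 60000, 1911: 50000, 1912: 45000}

for year, share in proportions_by_year(hits, totals).items():
    print(f"{year}: {share:.4%}")
```

Raw counts answer “how much was written?”; proportions get closer to “how much did it matter relative to everything else in the papers?”, which is usually the more interesting historical question.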

I created the first version of QueryPic in 2011 and it’s been through several incarnations since. It’s been developed without any sort of institutional support or funding. I built it because I was interested in exploring and encouraging new ways of working with large digitised collections – and I thought others might find it useful.

QueryPic has been cited in other publications, but I was still pretty excited to see it pop up in Australian Historical Studies. What was disappointing, however, was that while all manner of other scholarly works were fully cited, QueryPic was without an author.


I don’t know if this was a matter of style or merely an oversight, but it does reflect the way that many digital projects are treated. Anyone who’s created a complex historical database, or designed a tool for historical analysis, knows that there are many judgements to be made, models to be constructed, concepts to be defined and tested. And yet this work is often thought to support, rather than constitute, research.

My own research practice is framed around building. By creating tools, resources, interfaces, visualisations, artworks and games, I pursue questions about our relationship with the past. As in any research, I’m frequently surprised by the results. I do this in public because communication and use are central to my practice, even though it doesn’t necessarily result in conventional research outputs.

Of course I’m not alone in this. Just last week the American Historical Association appointed a Digital History Working Group to advise institutions on how they should recognise and reward digital work. This follows the release of a set of guidelines for the evaluation of digital scholarship in history. The guidelines assert:

Work done by historians using digital methodologies or media for research, pedagogy, or communication should be evaluated for hiring, promotion, and tenure on its scholarly merit and the contribution that work makes to the discipline through research, teaching, or service.

I suppose it should be obvious, but recognition and reward are still often linked to specific forms of scholarly production. Why shouldn’t the design, management and completion of a large complex digital project be seen as the equivalent of a book? As the guidelines note, use of digital technologies can challenge the nature of what we do. It’s not just a matter of fiddling with the boundaries, but also reconsidering the shape and structure of academic disciplines and the values that maintain them.

The digital humanities community has strong links with the Open Access and Open Source software movements, demonstrated through its commitment to sharing. From code to courses, publications to project management, follow a bunch of digital humanists on Twitter and you’ll see a constant exchange of ideas and resources. It’s a commitment to openness, but it’s also an expression of scholarly solidarity. It’s something that guides my own engagement with the academic community and beyond.

Just this week I shared some code and data relating to publicly available ASIO files held by the National Archives of Australia. I’ve harvested the metadata of nearly 12,000 files and downloaded almost 300,000 page images. BorderFarce highlighted historical connections across immigration surveillance and control, and I’m keen to see what structures and patterns emerge from large-scale analysis of both the ASIO and White Australia files.

But why keep it all to myself? The metadata is all available from my GitHub site and if you have 70GB free I can share the images as well. I’d like to try and build an informal research network, not just mine the data for publications.

My bio used to say that I was a ‘digital historian’. I’ve changed that recently to just ‘historian and hacker’ because it’s the approach rather than any specific technology that’s most important to me. The hacker ethos is collaborative, critical and constructive. It’s about diving in and trying to make things better. I want to develop a framework that supports other hackers – to help them build careers, pursue their ideas, manage their risks, and be recognised as scholars.

Those of us privileged to have some measure of security and recognition should help create safe spaces for experimentation in content, methodology and disciplinary formation. As the AHA guidelines note, it’s a shared responsibility – one that is expressed through many routine acts of review, assessment, or advice.

I want a scholarly practice that has room for the angry and the weird alongside the rigorous and detached. That sees in digital technologies not just the chance to crunch huge quantities of data, but the opportunity to tinker with our preconceptions, to be playful and political, to explore emotions as well as evidence, to create bots as well as books.

It might only seem like a tweet, a zip file or a blog post, but in a world that offers us #borderfarce, each small intervention carries with it an opportunity to pursue scholarly practice beyond traditional boundaries, and create something different. I think that’s an opportunity we should nurture.

Coming soon — our first LOD-Book!

I spoke recently at the LODLAM Summit and the New Factual Storytelling Symposium about my latest experiments in developing online historical narratives that embed rich structured data about people, places, events and resources. I also mentioned that there were plans afoot to publish a couple of substantial works using the framework. One of these will be by my partner in crime, Kate Bagnall, and I thought you might like a preview. Coming soon(ish) to a website near you…

James Minahan’s Homecoming

By Kate Bagnall

In 1908 James Minahan arrived in Australia after 26 years in China. Born in Melbourne, as a small boy he was taken by his Chinese father to live in his ancestral village in rural Guangdong to be educated in Chinese. Growing up in China, James remembered little of the country of his birth or of his Irish-Australian mother, but Australia was always part of his plans for the future. His father had kept shares in the store he had run on the Indigo goldfields in the 1870s and, as an adult, James planned to return there.

A scene in Shek Quey Lee village, where James Minahan lived from 1882 to 1908. This abandoned house was built with Australian remittances.

What he found on his arrival in the newly federated Australia, however, was that being born in Australia did not guarantee a right to live there if you were deemed to be Chinese. No longer able to speak the English of his Victorian childhood, James was made to sit the infamous Dictation Test, which he failed. After he was arrested as a prohibited immigrant, the Chinese community rallied around him and began legal proceedings to have him released. James Minahan’s case made it all the way to the High Court of Australia, which ruled in his favour. He was allowed to remain in Australia.

A century later, the High Court’s decision in Potter v. Minahan continues to inform the legal interpretation of what it means to be an Australian. Who is an immigrant? Who is a member of the Australian community? Who belongs? Who doesn’t? In James Minahan’s Homecoming, historian Kate Bagnall tells the story behind the Potter v. Minahan High Court case for the first time.

Unremembering the forgotten

Keynote presented at DH2015, 3 July 2015. Full slides available on SlideShare.


This, you might be surprised to learn, is not the first time that Australia has welcomed some of the world’s leading thinkers to its shores. Just over a hundred years ago, the British Association for the Advancement of Science held its annual meeting in Australia. In earlier years the Association had journeyed to Canada and South Africa, but this was its first tour of Australia. One senior Australian scientist heralded the Association’s arrival as ‘a great event in the history of Imperial unity’.


More than 300 scientists made the trip, including such notables as Ernest Rutherford and William Bateson. I’m a little embarrassed to admit that their travel was heavily subsidised by the Federal government. But then, it did take them more than a month to get here. Think about that on your flight home.

The eminent Australian geologist Edgeworth David described the Association’s visit as ‘an epoch making event’. He expected Australian researchers to be ‘strengthened and confirmed’ in their work, reaffirmed through the ‘inspiration which comes alone from personal contact with master minds’.

It was also an occasion to celebrate the ideals of science. War had been declared while the scientists were at sea, but events proceeded nonetheless with delegates barnstorming across the country from Adelaide to Melbourne, Sydney and Brisbane. The spirit of proceedings was summed up in Melbourne where the presentation of an honorary degree to the German geologist Johannes Walther was greeted with a ‘perfect storm of applause’. ‘Truly science knows not distinction between belligerent and belligerent’, noted one newspaper. Australia’s Governor General, Sir Ronald Munro Ferguson, welcomed the scientists with the observation that the looming dangers of war had at least ‘enabled them to realise that all men of science were brothers’.

And of course, they were mostly men.

If you’d like a bit of data around that, you can grab a digitised copy of the report of the meeting from the Internet Archive and run a script over the list of members, grouping them by title – Miss, Mrs and Lady. Here’s what you get.


You can do the same for the people who joined the Association at one of the Australian venues.


Ok, so this 10-minute analysis might not show anything unexpected, but I love the fact that with a digitised text and a few lines of Python I can ask a question and get an almost instant answer.
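To give a sense of what those few lines might look like, here is a minimal sketch (not the original script). It assumes the membership list has already been pulled out of the digitised report as plain text with one member per line, and that women can be picked out by the titles Miss, Mrs and Lady. The sample lines are invented; a real OCRed list would need some cleaning first.

```python
import re
from collections import Counter

# Titles used to pick out women in the membership list. Lines with none of
# these titles are counted as "Other".
TITLE_PATTERN = re.compile(r"\b(Miss|Mrs|Lady)\b")


def count_titles(lines):
    """Count members by title, treating each non-blank line as one member."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        match = TITLE_PATTERN.search(line)
        counts[match.group(1) if match else "Other"] += 1
    return counts


# Invented sample in roughly the shape of a membership list.
sample = """
Smith, Professor A.
Jones, Miss B.
Brown, Mrs. C.
Taylor, Lady
Wilson, Dr. D.
"""

counts = count_titles(sample.splitlines())
total = sum(counts.values())
for title in ("Miss", "Mrs", "Lady", "Other"):
    print(f"{title}: {counts[title]} ({counts[title] / total:.0%})")
```

Grouping by title is a blunt instrument: a woman listed only as “Dr” or “Professor” ends up in “Other”, so the counts are a floor rather than an exact figure.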

What the official report doesn’t say is that despite these proclamations of scientific brotherhood, not all German scientists were welcome in wartime Australia. Those who extended their stay beyond the meeting dates fell under suspicion.

Two of them, Fritz Graebner and Peter Pringsheim, were interned as suspected spies and imprisoned for the remainder of the war. The press which had fawned over the travelling savants now railed against these ‘scientists in disguise’ whose ‘supreme act of treachery’ was undoubtedly part of a German plot to capture Australia. The Minister of Defence noted that the case emphasised the ‘real and pressing nature’ of the wartime emergency. Honorary degrees awarded to two German scientists by the University of Adelaide were expunged from the record.


At this point I feel I should warn all our international visitors that legislation introduced in recent years to combat the so-called ‘war on terror’ has added new limits to our freedom of speech and movement. We are all under suspicion.

The German scientists were interned alongside many thousands of others. Most had had no charges brought against them. Many were naturalised British subjects, or Australian-born of German descent. Australia was their home. That didn’t stop the government repatriating many of them to Germany at war’s end.

To orient them on their antipodean adventure, visitors for the British Association meeting were supplied with specially-prepared handbooks that described conditions in Australia. At a time when violence against Indigenous people was still common along the frontiers of settlement, the Commonwealth Handbook informed visitors that Australian Aboriginals ‘represent the most backward race extant’.

Australia was big, but its population was small. The Commonwealth Handbook noted the challenges of maintaining ‘control of so large a territory by a mere handful of people’, pointing to the significance of the ‘White Australia’ policy in avoiding the ‘difficulties’ of ‘heterogenous’ populations. Chris Watson, who served as Australia’s first Labor prime minister a decade earlier, expanded on this theme in the NSW Handbook. Concerns about the financial impact of ‘coloured’ labour, he explained, had been fused with an ‘abhorrence of racial admixture’ to create ‘practically a unanimous demand for a “White Australia”’. ‘White Australia’ was both an ideal and an obligation, an opportunity and a threat. Watson observed:

The aboriginal natives are numerically a negligible quantity, so there is every opportunity for the building up of a great white democracy if the community can maintain possession against the natural desire of the brown and yellow races to participate in the good things to be found in the Commonwealth. That the Asiatic will for ever tamely submit to be excluded from a country which, while presenting golden opportunities, is yet comparatively unpeopled, can hardly be expected. Therefore Australians are realising that to maintain their ideals they must fill their waste spaces and prepare for effective defence.

Welcome to Australia a hundred years later where we remember 1914 not for its institutionalised racism, but because it marked the beginning of a war that has come to be strongly associated with ideas of Australian nationhood.

You have arrived here amidst the ‘Anzac Centenary’ which, the official website notes, ‘is a milestone of special significance to all Australians’. It must be true because, according to the Honest History site, we’re spending more than half a billion dollars on commemorative activities. That’s a lot of remembering.

Amidst the travelling roadshows, the memorials, the exhibitions, and the rolling anniversaries, are of course many worthy digital projects. Some of these will provide new access to war-related collections, or gather community content and memories. They will result in important new historical resources. But who are we remembering and why? As a historian and hacker, as a maker of tools and a scraper of sites, I want today to poke around for a while amidst the complexities of memory.


It’s not all about the war. Recent decades have brought attempts to remember more difficult histories. Peter Read coined the phrase ‘stolen generations’ to draw attention to the devastating effects of official policies that resulted in the forced removal of Indigenous children from their families, through until the 1970s. The damaging experiences of children in institutional care, the ‘forgotten Australians’, have also been opened to scrutiny. Both of these have brought official apologies from the Commonwealth government. Even now, almost every day brings more horrifying testimony as the Royal Commission into Institutional Responses to Child Sexual Abuse continues its hearings.

In each case we have learnt to our shame of continuing failures to protect the most vulnerable in Australian society – children.

Often these investigations are cast as attempts to bring to the surface forgotten aspects of our history. But to those who suffered through these events, who have continued to live with the consequences, they have never been far from memory.

Nor have they been entirely lost to the historical record. One of the responses to these inquiries has been to discover, marshal and deploy existing archival resources. The National Archives of Australia created an exhibition based on the experiences of some of the Stolen Generation. They also developed a new name index to their collections to help Indigenous people reconnect with their families through official records.

The eScholarship Research Centre at the University of Melbourne drew on its experience in documenting a wide variety of archival collections to create Find & Connect – a web resource that assembles information about institutional care in Australia and assists care leavers in recovering their own stories. Official records have been supplemented by oral history programs and other collecting initiatives to ensure that these memories are secure.

Such histories are ‘forgotten’ not because they are unremembered or undocumented, but because they sit uncomfortably alongside more widely promulgated visions of Australia’s past. As researchers on the Find & Connect project noted, the stories of care leavers ‘did not “fit in” with the narratives in the public domain. Their memories were “outside discourse”’.1 Remembering the forgotten is not just a matter of recall or rediscovery, but a battle over the boundaries of what matters.

Libraries, archives and museums are often referred to as memory institutions. Rhetorically it can be a useful way of positioning cultural institutions in relation to structures of governance and assessments of public value. The idea of losing our memory, whether as a society or an individual, is frightening.

But there are contradictions here. We frequently talk about memory in terms of storage – the ability of our technologies to tuck away useful pieces of information for retrieval later. There’s the ‘M’ in RAM and ROM, the fields in our database, our backups in the cloud. Memory is an accumulation of key/value pairs. Each time we query a particular key, we expect to get the same value back.

Memory, as we experience it, is something quite different. It’s fragmentary, uncertain, and shaped by context. The process of recall is unpredictable and sometimes disturbing – memories are often triggered involuntarily. Within a society memories are contested and contradictory. Who controls the keys?

Cultural institutions are trying to respond to this complexity. On the one hand they offer the security of authority – sources to be trusted in a world overflowing with information. But they are also looking for ways of capturing and representing alternative voices.

I think we can help with that.


Both in my work at Trove and my own noodling about I use the word ‘access’ a lot. But the more I use it the more I suspect it really doesn’t mean very much. What does it say that we now distinguish between ‘open’ and ‘closed’ access?

We tend to think of ‘access’ as the way we get to stuff. It’s the pathway along which we can explore our cultural collections. But as Mitchell Whitelaw argues, one of our primary means of access, the common or garden variety search box, constrains our view of the resources beyond. Search provides not an open door, but a grumpy ‘Yes, what?’

I’d suggest that these sorts of constraints don’t stand in the way of access, they construct it. Through legislation, technology, and professional practice, through the metadata we create and the interfaces we build, limits are created around what we can see and what we can do. Access is a process of control rather than liberation.


In 1952, in another notable act of ‘imperial unity’, Britain exploded an atomic bomb off the coast of Western Australia. A further 11 atomic tests were carried out here, most at a mainland testing site called Maralinga in South Australia. In 1984, when I was a young research student, the British atomic tests introduced me both to the gloriously rich collections of the National Archives of Australia, and to the contradictions of access.

Under the Archives Act, most government records are opened to the public after 20 years (this was reduced from 30 years in 2010). However, before they are released they undergo examination to see whether they contain material that is exempted from public access – for example any secret squirrel business that could endanger our national security. The access process can therefore result in records that are ‘closed’ or ‘open with exception’.

What does ‘closed’ access look like? A few weeks ago I harvested details of all the files in the National Archives’ online database that have the access status ‘closed’. The records include the reasons why the files remain restricted. If you group them by reason, you can see that the most common ground for restriction is Section 33(1)(g) of the Archives Act, which seeks to prevent the ‘unreasonable disclosure of information relating to the personal affairs of any person’. Fair enough. Coming second is the rather less obvious category of ‘withheld pending advice’. These are files that have gone back to the government agencies that created or controlled them to check that they really can be released. So they’re actually part way through the process.
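Grouping harvested records by reason is a simple tally. A minimal sketch, assuming each harvested file has been saved as a dict with a hypothetical `reasons` list; the records below are invented and are not real RecordSearch data.

```python
from collections import Counter


def count_reasons(records):
    """Count how many closed files cite each access-exemption reason.

    A file that cites two different reasons is counted once under each.
    """
    counts = Counter()
    for record in records:
        # set() so a reason repeated within one record is only counted once
        counts.update(set(record.get("reasons", [])))
    return counts


# Invented records, for illustration only.
records = [
    {"barcode": "1", "reasons": ["33(1)(g)"]},
    {"barcode": "2", "reasons": ["Withheld pending advice"]},
    {"barcode": "3", "reasons": ["33(1)(g)", "33(1)(a)"]},
    {"barcode": "4", "reasons": ["33(1)(a)"]},
]

for reason, count in count_reasons(records).most_common():
    print(reason, count)
```

Because one file can be closed on several grounds, the totals per reason can add up to more than the number of files.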


Using the contents dates of the files we can see how old they are. Section 33(1)(a) of the Archives Act exempts records from public scrutiny if they might ‘cause damage to the security, defence or international relations of the Commonwealth’. Most of the records closed on these grounds are over 50 years old, with a peak in 1956.
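Dating the files works the same way. This sketch assumes contents dates were harvested as strings like `1956 - 1957` under a hypothetical `contents_dates` key; it takes the first year in each string and counts files per year for a single exemption reason. The records are invented.

```python
import re
from collections import Counter

# Matches a four-digit year from the 1800s to the 2000s.
YEAR = re.compile(r"\b(1[89]\d{2}|20\d{2})\b")


def start_year(contents_dates):
    """Return the first four-digit year in a contents-date string, or None."""
    match = YEAR.search(contents_dates or "")
    return int(match.group(1)) if match else None


def years_for_reason(records, reason):
    """Count files closed for `reason`, grouped by the year their contents start."""
    counts = Counter()
    for record in records:
        if reason in record.get("reasons", []):
            year = start_year(record.get("contents_dates"))
            if year is not None:
                counts[year] += 1
    return counts


# Invented records, for illustration only.
records = [
    {"reasons": ["33(1)(a)"], "contents_dates": "1956 - 1957"},
    {"reasons": ["33(1)(a)"], "contents_dates": "1956 - 1956"},
    {"reasons": ["33(1)(a)", "33(1)(g)"], "contents_dates": "1961 - 1963"},
    {"reasons": ["33(1)(g)"], "contents_dates": "1956 - 1960"},
]

by_year = years_for_reason(records, "33(1)(a)")
peak_year, peak_count = by_year.most_common(1)[0]
print(f"Peak year: {peak_year} ({peak_count} files)")
```

Using the start year is one choice among several; a file spanning 1956 to 1970 could just as reasonably be dated by its end year, which would shift the peak.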


And here’s a word cloud of the closed file titles from 1956. I’m sure that we all feel a lot safer knowing all those Cold War secrets are still being protected.
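Behind a word cloud is just a table of word frequencies. A rough sketch, using invented titles and a deliberately short, illustrative stopword list:

```python
import re
from collections import Counter

# A small, illustrative stopword list; a real one would be much longer.
STOPWORDS = {"and", "of", "the", "to", "in", "for", "on", "a"}


def title_word_counts(titles):
    """Count lower-cased words across titles, skipping stopwords and very short words."""
    counts = Counter()
    for title in titles:
        words = re.findall(r"[a-z]+", title.lower())
        counts.update(w for w in words if w not in STOPWORDS and len(w) > 2)
    return counts


# Invented titles, for illustration only.
titles = [
    "Defence co-operation - policy",
    "Visit of the defence delegation",
    "Atomic weapons trials - security arrangements",
]

print(title_word_counts(titles).most_common(5))
```

A word cloud tool then simply scales each word by its count.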


Back in 1984 I asked for some of those secret files to be opened so I could write my Honours thesis on the role of Australian scientists in the British atomic tests. A number of the files I was interested in went off to agencies for advice, and some even made their way to the British High Commission. Being young, optimistic, and on a deadline, I wrote to the British High Commissioner asking if anything could be done to speed the process up.


I received a very polite reply explaining that they were obligated under the Nuclear Non-Proliferation Treaty to make sure that they didn’t unleash any atomic bomb secrets upon the world. This was hilariously and tragically ironic, as the argument of my thesis was that the British government withheld information from their Australian hosts to curry favour with the USA. There was no way that atomic bomb plans would be in Australian government files. Yeah – hilarious.

Access is political. Cassie Findlay has contrasted the Australian government’s processes for the release of records with the creation and use of the WikiLeaks Cablegate archive.2 Cassie argues that the ‘hyper-dissemination’ model of WikiLeaks, through which large volumes of material are shared across multiple platforms, creates a ‘pluralised archive’ that ‘exists beyond spatial and temporal boundaries, transcends state and economic controls and encourages and incorporates people’s participation and comment’. Instead of gatekeepers and reading rooms there are hackers and torrents.

Traditional forms of access are often celebrated as if they are a gift to a grateful nation. As Cassie notes, the release of Cabinet documents by the National Archives is a yearly ritual where stories of 30-year-old political manoeuvring are mixed with the comforts of nostalgia. But with each release more files are closed, withheld from public access. The workings of a bureaucratic process developed to control the release of information are recast as an opportunity to party. Invested with the cultural power of the secret and the political weight of national security, access itself becomes mysterious and magical.

We are left to ponder such wonders as: ‘Named country [imposed title, original title wholly exempt]’.

At the same time, governments are pumping out ‘open data’, bringing the promise of greater transparency, and new fuel for the engines of innovation. But for all its benefits, open data isn’t. It only exists because decisions have been made about what is valuable to record and to keep – structures have been defined, categories have been closed. As Geoffrey Bowker and Susan Leigh Star remind us, the definition, elaboration and enforcement of categories lies at the heart of bureaucracy and the infrastructure of the state.3 Data is not just a product of government, it is implicated in the workings of power.

Chris Watson’s vision of a White Australia was, by 1914, well established as a system of bureaucratic surveillance and control. The Commonwealth Handbook benignly noted that ‘an immigrant may be required to pass a dictation test before being admitted into the Commonwealth’. It added that ‘in general practice this test is not imposed upon persons of European race’. The dictation test was a mechanism of exclusion. Any intending immigrant deemed not to be ‘white’ would be subjected to the dictation test, and would fail. But there were already many people born or resident in Australia of Asian descent. If they wanted to travel overseas they were forced to carry official documents to protect them from the application of the dictation test – otherwise they might not be allowed to return home. Many thousands of these documents are now preserved in the National Archives of Australia. With portrait photographs and inky-black handprints, they are visually compelling, and disturbing, documents. They need to be seen.


A few years ago, my partner Kate Bagnall and I harvested thousands of these documents from the National Archives website, ran them through a facial detection script and created ‘The Real Face of White Australia’.
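The facial detection script itself isn’t described here. As a sketch of the general technique only, OpenCV’s bundled Haar cascade frontal-face detector can find faces in each document image and save them as separate crops. This assumes the `opencv-python` package is installed; the function names, file names, margin and detector parameters are illustrative guesses, not the values used for the project.

```python
from pathlib import Path


def expand_box(x, y, w, h, margin, img_w, img_h):
    """Grow a detected face box by `margin` (a fraction of its size), clamped to the image."""
    dx, dy = int(w * margin), int(h * margin)
    left, top = max(x - dx, 0), max(y - dy, 0)
    right, bottom = min(x + w + dx, img_w), min(y + h + dy, img_h)
    return left, top, right, bottom


def crop_faces(image_path, out_dir, margin=0.2):
    """Detect frontal faces in one image, save each as a JPEG, return how many were found."""
    import cv2  # imported here so expand_box works without OpenCV installed

    image = cv2.imread(str(image_path))
    if image is None:  # unreadable or missing file
        return 0
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
    )
    faces = detector.detectMultiScale(
        gray, scaleFactor=1.1, minNeighbors=5, minSize=(50, 50)
    )
    img_h, img_w = gray.shape
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for i, (x, y, w, h) in enumerate(faces):
        left, top, right, bottom = expand_box(x, y, w, h, margin, img_w, img_h)
        cv2.imwrite(str(out_dir / f"{Path(image_path).stem}-{i}.jpg"),
                    image[top:bottom, left:right])
    return len(faces)
```

For example, `crop_faces("certificate.jpg", "faces")` would write any detected faces to a `faces` folder (both paths hypothetical). Haar cascades are quick but can miss faces that are turned, small or poorly printed, and can mistake other marks for faces, so the results are worth checking by eye.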


You might have seen it before. It’s been widely cited, and it’s probably one of the main reasons I’m standing here today. For Kate and me this was part of our ongoing attempts to use the bureaucratic remnants of the White Australia policy to reconstruct the lives of those who lived within its grasp. But it’s also an example of the complications of access.

In the past I’ve tended to gloss over the hardest part of this project – just harvesting those 12,000 images. It was only possible because I’d spent a lot of time, over a number of years, wrestling with RecordSearch, the National Archives’ online database. I think it was back in 2008 that I wrote my first Zotero translator to extract structured data from RecordSearch. It was one of those Eureka moments. Although I’d been developing web applications for a long time, I hadn’t really thought of the web as a source to be mined, manipulated and transformed. I could take what was delivered in my browser and change it.

Thanks to Bill Turkel and The Programming Historian, I taught myself enough Python to be dangerous and was soon creating screen scrapers for a variety of sites – taking their HTML and turning it into data. I was no longer bound to a particular interface. The meaning of access had changed.
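At its simplest, screen scraping means reading a page’s HTML and pulling out the bits you want as structured data. A minimal sketch using only Python’s standard-library `html.parser`, assuming a page that lays out its metadata as label/value pairs in `<th>`/`<td>` cells; the markup is invented and is not RecordSearch’s actual HTML, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class LabelValueParser(HTMLParser):
    """Collect <th>label</th><td>value</td> pairs from table rows into a dict."""

    def __init__(self):
        super().__init__()
        self.data = {}
        self._current = None  # cell type we are inside: "th", "td" or None
        self._label = None    # most recent <th> text, waiting for its <td>
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in ("th", "td"):
            self._current = tag
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == self._current:
            text = " ".join("".join(self._buffer).split())  # normalise whitespace
            if tag == "th":
                self._label = text
            elif self._label is not None:
                self.data[self._label] = text
                self._label = None
            self._current = None

    def handle_data(self, data):
        if self._current:
            self._buffer.append(data)


def scrape(html):
    parser = LabelValueParser()
    parser.feed(html)
    parser.close()
    return parser.data


# Invented markup for illustration.
page = """
<table>
  <tr><th>Title</th><td>Certificate of Domicile - <b>J. Smith</b></td></tr>
  <tr><th>Contents dates</th><td>1908 - 1908</td></tr>
  <tr><th>Access status</th><td>Open</td></tr>
</table>
"""

print(scrape(page))
```

The parser only works as long as the page keeps this layout; change the markup and the scraper quietly returns nothing.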

But screen scrapers are a pain. Sites change and scrapers break. I don’t know how many hours I’ve spent inspecting RecordSearch response headers, trying to figure out where my requests were going. I’ve given up several times, but always gone back, because there’s always more to do.

Amongst the enthusiasm for open data there’s perhaps a tendency to overlook the opening of data – the way that hackers, tinkerers, journalists, activists and others have been stretching the limits of access.

The various projects of the Open Australia Foundation are a great example of this – they’ve even established their own public scraping framework called Morph, to share both the code and the data that’s been liberated from websites and pdfs.

The Australian Parliament recently passed changes to the Copyright Act that will enable copyright holders to apply to the Federal Court to block piracy-related websites. Of all the changes needed to copyright, this is the one that went to Parliament.

But what I love is that even before the legislation had passed, even before the first application had been made or site blocked, there was a website and Twitter account ready to document and publicise any site-blocking orders – created not by government, but by an ABC journalist.

Archivists Wendy Duff and Verne Harris have talked about records ‘as always in the process of being made’, not locked in the past but ‘opening out of the future’.4 Cassie Findlay similarly notes that the Cablegate archive ‘is still forming’. She argues for models of participation and access around archives that open ‘more directly from the affairs that they document’.

The act of opening – records, archives, sources – is contingent and contextual. It creates a connection between inside and outside, past and present, us and them. What we do with that connection is up to us.


What would have happened if instead of hearing about ‘prohibited immigrants’, instead of seeing ‘wanted’ posters of escaped Chinese seamen, Australians in 1914 had seen something like our wall of faces?

What would happen now if instead of hearing about ‘illegal maritime arrivals’ (IMAs) we were exposed to the stories of those who arrive in Australia in search of asylum?


Access will never be open. Every CSV is an expression of power, every API is an argument. While I would gladly take back the time I’ve spent wresting data from HTML, I recognise the value of the struggle. The bureaucratic structures of the White Australia policy live on in the descriptive hierarchies of the National Archives. To build our wall of faces we had to dismantle these structures – to drill down through series, items, documents and images until we found the people inside. I feel differently about the records because of that. Access can never simply be given; at some level it has to be taken.



In 1987 I ended up outside the gates of Pine Gap, a US intelligence facility near Alice Springs, dressed as a kangaroo. Having finished my honours thesis on the British atomic tests, I couldn’t ignore the parallels between the bombs and the bases. I even organised a conference entitled, ‘From Maralinga to Pine Gap: The historical fallout’. I remember pulling over on the road to Alice Springs because there was one point where you could just glimpse the top of one of the white domes that protected Pine Gap’s receivers. It was a pretty thrilling moment.

Now you can just type ‘Pine Gap’ into Google Maps and there it is.


It’s still secret, it’s still gathering unknown quantities of electronic intelligence, but last time I checked it also had 21 reviews and an average rating of 3.6 stars. Keep it in mind for your next Aussie holiday!


Digital tools enable us to see things differently – to demystify the secret, to expose patterns and trends locked up in tables, statistics, or cultural collections.

Mapping Police Violence, for example, displays your chances of being killed by police in the US based on your location. It also presents the photos and details of more than 100 unarmed black people killed by police in 2014.

@CongressEdits is a Twitter bot, created by Ed Summers, that tweets anonymous edits to Wikipedia made within the US Congress. A similar bot exists for Australian state and federal governments.

I love the way that Twitter bots, in particular, can play around with our ideas of context and significance. I’ve created a few myself that automatically tweet content from Trove, and I’m interested in what happens when we mobilise cultural collections and let them loose in the places where people already congregate. Steve Lubar argues that ‘the randomness of the museumbot calls attention to the choices that we take for granted’. Twitter bots can challenge the sense of control and authority that adhere to our collection databases.

But bots can be more. Mark Sample’s important essay on ‘bots of conviction’ explores the possibilities for protest and intervention. He describes protest bots as ‘tactical media’ creating ‘messy moments that destabilise narratives, perspectives, and events’. Wendy Duff and Verne Harris warn archivists of the dangers of the story in disguising the exercise of power, in stealing from individuals what they need to construct their own narratives – ‘space, confusion, [and] a sense of meaninglessness’. Against the brutal logic of the state, a bot’s algorithmic nonsense can help us to see differently, to feel differently.

Caleb McDaniel’s bot @Every3Minutes is an example of how powerful these interventions can be. Working from estimates of the volume of the slave trade in the American South, it tweets a reminder every three minutes – a person was just traded, a child was just bought – often with links to historical sources. Mark Sample notes that ‘it is in the aggregate that a protest bot’s tweets attain power’ and it is through simple, unyielding repetition that @Every3Minutes reaches us. As Alex Madrigal noted: ‘To follow this bot is to agree to reweave the horrors of slavery into the fabric of your life’.

My own protest bot is trivial compared with Mark’s or Caleb’s work. @OperationBot merely assembles random words to create new names for national security operations. It’s a bot born of frustration and fury as the Australian government responded to the plight of asylum seekers by launching ‘Operation Sovereign Borders’. As @OperationBot proudly proclaims, its aim is to ‘protect Australia from meaning’.
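The code behind @OperationBot isn’t described here, so this is only a hypothetical sketch of the basic trick: pick a couple of distinct words at random and prefix them with ‘Operation’. The word list is made up for illustration, and the step of actually posting the result to Twitter is left out.

```python
import random

# Hypothetical vocabulary; the real bot's word list isn't given in the text.
WORDS = ["Granite", "Lantern", "Meridian", "Harbour", "Silent",
         "Copper", "Vigilant", "Thistle", "Outpost", "Ember"]


def operation_name(rng=random, n_words=2):
    """Return a name such as 'Operation Copper Thistle' built from distinct random words."""
    words = rng.sample(WORDS, n_words)  # sample() never repeats a word
    return "Operation " + " ".join(words)


if __name__ == "__main__":
    # Passing a seeded random.Random makes the output repeatable for testing.
    print(operation_name())
```

The point is how little machinery is needed: a list and a random choice are enough to produce names that sound as official, and mean as little, as the real thing.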

Perhaps more significant than @OperationBot’s supposed subversions is the fact that I could create it in a couple of hours sitting in front of the TV. Digital skills and tools allow us to try things, to create and experiment, without any expectation of significance or impact.

One of the more controversial sessions at the British Association meeting in Australia in 1914 was devoted to the structure of the atom. Ernest Rutherford reported on experiments that pointed to the now familiar model where the atom’s mass is concentrated in a tiny, central nucleus. Firing charged particles at a thin sheet of gold foil, Rutherford, Geiger and Marsden had expected the particles to pass through largely undeflected. But some bounced back. As Rutherford later noted: ‘It was almost as incredible as if you fired a 15-inch shell at a piece of tissue paper and it came back and hit you’.

I wonder if that’s what we’re doing – firing off experiments into the net, waiting for one to hit something solid and bounce back. PING!

@Every3Minutes – PING!

The Real Face of White Australia – PING!

In 2012 Kate and I received an email from Mayu Kanamori, an artist researching the life of an early Japanese Australian photographer. She described her reaction to the Real Face of White Australia:

When I scrolled down the Faces section of your website, browsing through the faces, tears welled up, and I couldn’t stop crying as if some sort of flood gates had been removed.

We knew that the documents and the images were powerful, but displaying the faces on that seemingly endless scrolling wall did something more than we were expecting.

Jenny Edkins has been exploring the politics of faces, and she suggests that alongside our attempts to ‘read’ portrait photographs we also respond in a more visceral fashion, provoking responses such as ‘guilt, obligation, and reciprocity’.5 Like the ‘messy moments’ of protest bots, she argues that the connections we make through photos of faces can disrupt the ‘linear narrative temporality’ on which sovereign power depends. We are connected through time, not with history, not with the past, but with people. And that has implications.


Last year I tried extracting faces, and eyes within those faces, from photos I’d harvested via Trove’s digitised newspapers. The result was Eyes on the Past. It presents a random selection of eyes, slowly blinking on and off. Clicking on an eye reveals the full face and the source of the image. Where the Real Face of White Australia overwhelms with scale and meaning, Eyes on the Past is minimal and mysterious. Eyes on the Past emphasises absence, and the fragility of our connection with the past, even while it provides a new way of exploring the digitised newspapers. Perhaps the best thing about it is the range of responses it has provoked – from those who found it beautiful, to those who thought it was just creepy.


More recently I’ve been playing around with the possibility of connection, and creepiness, through The Vintage Face Depot. Tweet a photo of yourself to @FaceDepot and a bot will select a face at random from my collection of newspaper images and superimpose that face over yours – tweeting you back the result. It sounds stupid, and it probably is. I’m still waiting for it to go viral like Microsoft’s age detection thing. But sometimes… PING!

One night I started fiddling with the transparency of the superimposed images. All of a sudden I could see the colour of my face showing through. I could see my glasses on this face from the past.
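The transparency effect is ordinary alpha blending: each pixel of the result is a weighted mix of the two source pixels, `out = base * (1 - alpha) + overlay * alpha`. Pillow’s `Image.blend` uses the same formula. This is a minimal pure-Python sketch, not the @FaceDepot code, and it assumes both images are already aligned, the same size, and represented as lists of RGB tuples. Lower values of `alpha` let more of the original photo show through.

```python
def blend_pixel(base, overlay, alpha):
    """Mix two RGB pixels: alpha=0 keeps the base, alpha=1 shows only the overlay."""
    return tuple(round(b * (1 - alpha) + o * alpha) for b, o in zip(base, overlay))


def blend(base_pixels, overlay_pixels, alpha):
    """Blend two equal-length lists of RGB tuples pixel by pixel."""
    if len(base_pixels) != len(overlay_pixels):
        raise ValueError("images must be the same size")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return [blend_pixel(b, o, alpha) for b, o in zip(base_pixels, overlay_pixels)]


if __name__ == "__main__":
    # Hypothetical pixels: a skin tone and a blue eye from a modern photo...
    modern_face = [(210, 160, 140), (60, 90, 180)]
    # ...and the greyscale pixels of a face from an old newspaper.
    old_face = [(120, 120, 120), (40, 40, 40)]
    # With alpha below 1, some of the modern colour survives in the result.
    print(blend(modern_face, old_face, 0.7))
```

Because the old newspaper faces are greyscale, any colour left in the blended image has to come from the modern photo, which is why the blue of an eye can show through.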



Experimenting on Kate, I saw the blue of her eyes peering through the eyes of another person. Again, the potential is there to mess around with the barriers that put some people on the other side of this wall we call the past – to explore what Devon Elliot suggested on Twitter was an ‘uncanny temporal valley’.


The Australian historian Greg Dening has argued:

Nothing can be returned to the past. Not life to its dead. Not justice to its victimised. But we take something from the past with our hindsighted clarity. That which we take we can return. We disempower the people of the past when we rob them of their present moments.

There is no open access to the past. There is no key we can enter to recall a life. I do this sort of stuff not because I want to contribute to some form of national memory, but because I want to unsettle what it means to remember – to go beyond the listing of names and the cataloguing of files to develop modes of access that are confusing, challenging, inspiring, uncomfortable and sometimes creepy.

Perhaps my favourite experiments are a couple of simple userscripts. They sit in your browser and change the behaviour of Trove and RecordSearch.


Instead of pulling faces out of documents, they put them back in. Instead of seeing lists of search results, you see the people inside. Like the faces on our wall the people bubble up through the interfaces. They are present.


Despite the apparent enthusiasm for the visit of the British Association in 1914, there was in Australia a lingering suspicion of scientists as ‘impractical dreamers’, as mere theorists unwilling to address the nation’s most urgent needs. In debates over the application of knowledge to Australian development, the scientist commonly battled it out against the supposed virtues of the ‘practical man’.

I imagine my grandfather, Henry Sherratt, was a practical man. He was a brass moulder with a workshop in Brunswick, a suburb of Melbourne. His father and brother, both brass workers, lived and laboured nearby. I have a small brass ashtray that Henry made.


Henry’s name isn’t amongst those who joined the British Association in Melbourne, though perhaps he attended one of the ‘Public or Citizens Lectures’ which, until the 1911 meeting, had been known as ‘Lectures for the Operative Classes’. Neither is Henry’s name amongst those who journeyed to the battlefields of Europe and the Middle East. He is not one of those honoured by the Anzac Centenary for having ‘served our country and worn our nation’s uniform’. And yet he went to war.

Henry Sherratt was amongst a select group of tradesmen who travelled to Britain in 1916 to help meet the desperate need for skilled workers in munitions factories. He worked as a foreman brass moulder in Scotland, before an accident in which he ‘strained his heart’ carrying a ladle of molten iron. He never really recovered and as his income suffered, so did his family at home. Henry finally returned in 1919 and was offered £50 compensation with no admission of liability. He died in 1955. I never knew him.

Who do we remember and why?


The Commonwealth Bureau of Census and Statistics reported that 159 people died as a result of industrial accidents in 1914. But these were only the accidents that had been reported under the provisions of state legislation. There must have been more. Where is their memorial? What about mothers who died in childbirth, or victims of domestic violence? How do we remember them?

At a recent workshop organised by Europeana, Lucy Delap described how her project to historicise child sex abuse in 20th century Britain was making use of digitised newspapers. As well as documenting individual cases, the researchers hope to create ‘a map of change over time in the reporting of child sexual abuse’ that would enable them to test theories about how different organisations respond to abuse.

In the week that the British Association met in Melbourne, newspapers tell us that David Phillips, an engine driver, was fatally injured at Flinders Street Station. I’m thinking about how we might use Trove’s digitised newspapers to collect the stories of those who went off to work, but never returned. What might we learn about economic history, unionism, industrial legislation – about the value we place on an individual life?

As I’ve often said in regard to our work on the White Australia records – it just seems too important not to try.

As I was writing this talk I was also keeping an eye on my harvesting scripts, which were chugging away, pulling down more images from the National Archives. For the original wall of faces I downloaded about 12,000 images from one series; I’ve now got more than 150,000 from about 10 series. You’ll see more of that soon.

As I was writing this talk I stopped at various times to play around with code – to look at the gender balance at the British Association, to investigate ‘closed’ files in the National Archives, to create a public Face API for anyone to use. The code and apps are all out there now for you to play with or improve.

Writing, making, thinking, playing, sharing. It all happens together. I’m a maker like my grandfather. While he poured metal I cut code. I do it because I want to find ways to connect with people like him, ordinary people living their lives. Those connections will always be fleeting and fragile, lacking the certainty of commemoration, but hopefully bearing some of the meaning and complexity of memory.

It’s a task that needs to be both playful and political. It’s not about making things, but trying to make a difference.










  1. Shurlee Swain, Leonie Sheedy, and Cate O’Neill, ‘Responding to “Forgotten Australians”: historians and the legacy of out-of-home “care”’, Journal of Australian Studies, vol. 36, no. 1, 1 March 2012, pp. 17–28. <doi:10.1080/14443058.2011.646283>
  2. Cassie Findlay, ‘People, records and power: what archives can learn from WikiLeaks’, Archives and Manuscripts, vol. 41, no. 1, 2013, pp. 7–22. <http://www.tandfonline.com/doi/abs/10.1080/01576895.2013.779926>
  3. Geoffrey C. Bowker and Susan Leigh Star, Sorting Things Out: Classification and Its Consequences, MIT Press, 2000.
  4. Wendy M Duff and Verne Harris, ‘Stories and names: Archival description as narrating records and constructing meanings’, Archival Science, vol. 2, 2002, pp. 263–85.
  5. Jenny Edkins, Face Politics, Routledge, 2015.

Asking better questions: History, Trove and the risks that count


Buy the book from NewSouth Press!


This is the published version of my chapter ‘Asking better questions: History, Trove and the risks that count’ in the book CopyFight, edited by Phillipa McGuiness. It’s reproduced here with the permission of the publishers.

You can buy a copy of the book from NewSouth Press or just about any bookstore. You can also download a pdf version of this chapter.



A few years ago historian Kate Bagnall and I created ‘The real face of White Australia’. In 1901 the Immigration Restriction Act gave legislative force to a system of racial exclusion and control that came to be known as the White Australia Policy. The bureaucratic remnants of this system survive today in the National Archives of Australia. But how can we find them? How can we see them? Our online experiment brings some of these lives, previously deemed out of place in a ‘white’ Australia, to the surface. Instead of documents, files or search results, all you see are faces – a continuous, scrolling wall displaying thousands of faces. It’s compelling, challenging and discomfiting. Some viewers were brought to tears.

The faces come from portrait photographs that were attached to official certificates. If non-white residents wanted to travel overseas, they needed special identity documents. Without them they could be refused entry on their return. They would not be allowed to come home. On the front of each certificate are photographs and basic biographical details, on the back is a palm print. Many thousands of these documents are preserved in the archives.

To extract the faces I reverse-engineered the National Archives’ online database to automatically harvest images of the documents. From just one series or group of records, I downloaded more than 12,000 images. Then I tinkered with some facial detection code until I was able to find and crop out the portraits. I ended up with 7000 faces – just a sample of the archives’ holdings, but enough.

The whole thing was a quick experiment, mostly completed over the space of a weekend, but its influence has been widely felt. The project has been cited in discussions around race, archives, visualisation and serendipitous discovery. It has been assigned for student reflection in university courses around the world and is regularly held up as an example of what the digital humanities has to offer. But whenever I give a talk about it in Australia, one question seems inevitably to arise – what is the copyright status of the images?

In a keynote address to New Zealand’s National Digital Forum in 2011, library technologist Michael Lascarides challenged those of us who work with digital cultural collections to ‘ask better questions’. When confronted with the remains of our racist history, when looking into the eyes of people whose lives were monitored and controlled by the government because of the colour of their skin, why should we feel compelled to consider the technicalities of copyright?

Surely there are better questions to ask?


I describe myself as a cultural data-hacker, but my business hours are currently spent on the other side of the fence, as the manager of Trove at the National Library of Australia. Trove is a discovery service that makes Australian resources easier to find and use. It’s a collection of collections, bringing together the holdings of many libraries, archives, museums, universities, government agencies and more. The most heavily used part of Trove is an ever-growing collection of digitised newspapers – the full text of more than 130 million articles documenting Australian history from 1803 onwards.

It’s hard to measure the cultural impact of Trove’s digitised newspapers. Technologies like optical character recognition (OCR) and keyword searching are now commonplace, but apply them to 150 years of Australian history and something transformative happens. Easy access to historical newspapers is changing our relationship with the past.

It’s not just about convenience – the ability to do your research at home in your pyjamas – although the significance of opening access to rural and remote communities across a large country like Australia shouldn’t be underestimated. It’s also about using the granularity of newspapers to expose the local, the particular, the personal and the ephemeral – glimpses of ordinary lives otherwise unrecorded.

Kate is a historian of Chinese Australia, interested in intimate relationships between Chinese men and white women. The people she studies often lived at the fringes of society and their lives can be difficult to recover. But searching across digitised newspapers she can find shards and fragments, stories of love and loss, full of the vivid, turbulent detail of everyday life. Each shard helps to build a bigger picture, a different view of Australian society and history.

Other researchers have mined the newspapers in pursuit of topics as diverse as invasive species, climate change, poetry and legal history. These uses will multiply as the corpus grows and our tools develop. Like a big telescope or a particle accelerator, digitised newspapers support large-scale fundamental research across a range of disciplines. Old papers have become a site for the creation of new knowledge.

But digitised newspapers are not solely the province of professional researchers. The size and diversity of their content support almost any interest, feed almost any passion. I recently harvested a sample of articles from the web that include links to Trove’s newspapers – 3116 webpages containing 13,389 links. We know that family and local historians make heavy use of the newspapers and their efforts are well represented in my sample. But there was more – sport, war, science, politics, architecture, music, art… from popular entertainment to academic treatises, from hateful diatribes to thoughtful reflections, they were all there.

More surprising than just the range of topics, styles and prejudices is the different ways the newspapers are used online. In 2013, for example, the local media in Western Australia reported on Cockburn City Council’s plan to erect a shark barrier at Coogee Beach. When one councillor expressed doubts, noting the lack of ‘serious or fatal shark attacks at Coogee Beach since records commenced in the 1800s’, a reader could quickly challenge her comments by citing two Trove newspaper articles that documented local attacks. References to the digitised newspapers are embedded online not just within narratives or compilations, but as commentary and debate. Trove provides a ready source of evidence to test historical claims without lengthy research or the mediation of experts.

Easy accessibility is helping to break down the otherness of the past, allowing it to be mobilised in contemporary discussions. New conversations between past and present are emerging around the digitised newspapers. Trove has launched us upon a massive ongoing experiment in collaborative meaning-making.
And it might never have happened.


Years before I was given the job as Trove manager, I was poking and prodding at the interface, trying to extract useful data to use in new applications, and generally making a nuisance of myself. Among the tools I created was QueryPic, a simple way of visualising newspaper searches. It’s been through several versions but the principle remains constant – just feed in your search query and QueryPic will create a line chart showing you the number of newspaper articles per year that match your query.
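How QueryPic actually obtains its numbers from Trove isn’t described here, so this sketch leaves the Trove API out entirely. As an assumption for illustration, it starts from a list of publication dates (ISO `YYYY-MM-DD` strings) for the articles matching a query, counts them per year, and fills empty years with zero so a line chart doesn’t silently skip them.

```python
from collections import Counter


def articles_per_year(dates):
    """Count articles per year from 'YYYY-MM-DD' strings, including zero-count years.

    Returns a list of (year, count) pairs in year order, ready to plot as a line.
    """
    counts = Counter(int(d[:4]) for d in dates)
    if not counts:
        return []
    first, last = min(counts), max(counts)
    # Counter returns 0 for missing keys, so gaps become explicit zeros.
    return [(year, counts[year]) for year in range(first, last + 1)]


if __name__ == "__main__":
    # Hypothetical dates for articles matching some search query.
    sample = ["1914-08-05", "1914-09-01", "1916-01-01"]
    for year, count in articles_per_year(sample):
        print(year, count)
```

The resulting pairs could be handed to any charting library. Raw counts also rise and fall with the number of digitised articles in each year, so a real tool might want to show proportions rather than totals.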

QueryPic is well used and has even been cited in scholarly articles, but I think its greatest value is as an example of what becomes possible when you make large quantities of cultural content available in digital form. Instead of the normal list of search results, QueryPic shows you trends and patterns. You can observe changes in language, the rise and fall of our cultural obsessions or the impact of major events. When did the ‘Great War’ become the ‘First World War’? Is that jolly Christmas visitor called ‘Father Christmas’ or ‘Santa Claus’? QueryPic helps you see things differently.

One of the most dramatic and unexpected patterns revealed by QueryPic is that Australian history ends in 1954. Who would have guessed? With very few exceptions, Trove’s collection of digitised newspapers comes to an abrupt halt in 1954, when the possibilities of the digital age meet the realities of copyright. You want to trace cultural patterns beyond 31 December 1954? Sorry, you’re out of luck.

Why 1954? We’re currently about halfway through the great AUSFTA culture drought. On 1 January 2005, the Australia–US Free Trade Agreement (AUSFTA) extended the standard period of copyright protection from fifty to seventy years and changed the way photographs are treated. We might have to wait until 2025 before Trove’s newspapers can start edging forward, year by year, beyond 1954.

But it’s even more complicated than that, as there’s no certainty that newspaper articles published before 1955 are out of copyright. To be sure, the National Library would have to investigate any named authors to confirm they all died before 1955. That’s simply impossible in a mass digitisation project. Instead the library weighed the copyright risks against the cultural benefit and decided to proceed. If the library had been more cautious, if the risks or uncertainties had seemed too great, Trove would have no digitised newspapers. And we would all be poorer.

These types of judgements are made all the time by cultural organisations wanting to open online access to their collections. Libraries, archives and museums are full of so-called ‘orphan’ works, whose creators cannot be identified or located. There’s no risk-free way of making this content available online.

But these risks are not only assessed and managed, they’re passed on to users, who must themselves try and navigate the thicket of copyright law. As use of online collections moves beyond traditional forms of citation into new types of digital aggregation, analysis and annotation, the doubts and complexities accumulate. The price of innovation is increased risk. The only safe course is to do nothing.


Why do we put cultural heritage collections online? Is it for the sake of efficiency, preservation, marketing or perhaps an informed citizenry? Usually we fall back on fuzzy notions of ‘engagement’ or ‘access’. More access is good, particularly if we can measure it easily through web stats.

Recently we surveyed Trove users to gain a broad picture of satisfaction and use. One finding in particular keeps me coming into work each day – 90 per cent of our general users agreed with the statement ‘Trove has made me interested in learning and discovering more’. Access can’t simply be measured in collection images or web-page hits. What we’re creating is an enlarged space for reflection, research, learning, creativity and critique. We’re enabling people to do more, with more.

Trove is not alone. Around the world, projects such as Europeana, the Digital Public Library of America (DPLA) and Digital New Zealand all work to open our collected cultural heritage to new forms of use. Europeana, in particular, has drawn on its research into the value and impact of online collections to proclaim a wonderfully ambitious agenda. Their aim is to ‘transform lives’ – to unlock Europe’s cultural heritage, enabling it to act as a ‘catalyst for social and economic change’.

Resisting Europeana’s efforts to ‘transform the world with culture’ are an array of different copyright regimes across Europe. Advocacy on behalf of the very idea of ‘openness’ is crucial to the success of their mission. But it’s never simply a matter of law.

Our cultural collections contain many resources that are already free of copyright restrictions, but it’s not always easy to find them. A lack of clear identification can stymie reuse as effectively as copyright restrictions – it’s not enough to share the resources, organisations also need to share licensing information so that open content can be easily discovered across collections. Sometimes institutional requirements for permission are imposed on public domain resources, fostering doubt in place of certainty. Wherever copyright lingers, rights statements bloom in astonishing diversity. The DPLA estimates that there are more than 26,000 different rights statements attached to items in their aggregated collection. How are users expected to know what they’re allowed to do?

The complexity of copyright fosters confusion and uncertainty beyond the reach of mere law. It’s not just legislation that has to change.

Recently Dan Cohen, the Executive Director of the DPLA, argued that the licensing of cultural data should address more than just legal and technical issues. Instead of seeking to enforce acknowledgement of the source of the data through licence conditions, the DPLA wants to push discussion of attribution and reuse ‘into the social or ethical realm’ by ‘pairing a permissive license with a strong moral entreaty’. Instead of a statement about legal constraints, we have the opportunity for a conversation about what matters and why.

It might not figure in our web stats, but the National Library has plenty of anecdotal evidence that Trove changes lives – a grandfather’s face glimpsed for the first time, or perhaps a family mystery solved. One man, who grew up in care, found through Trove the only known photograph of himself as a child. How do we weigh such opportunities against questions of ownership and control?

We need to shift the discussion away from the nature of property to the value of use. What do we want to do? Online culture is read–write. We do not simply consume – we share, we remix, we curate and create. Increased participation brings new opportunities for understanding. Do we give them flight or lock them down?


One of the main sources of traffic to Trove, up there with Facebook and Wikipedia, is the knitting site Ravelry. Why? Because Ravelry users have found and shared hundreds of craft patterns from Trove’s digitised newspapers. And not just shared, but made. The most popular pattern, ‘Elegant elephant’, discovered in a 1959 edition of the Australian Women’s Weekly, has been made more than forty times, often with individual embellishments.

This to me seems a great example of the wonderful complexities that the digitisation of cultural heritage collections is introducing into our relationship with the past. A digital version of a fifty-year-old pattern is shared online and spawns a herd of cuddly elephants. From past to present, from digital to physical, the transformations pile one on top of another.

It’s also unexpected. It’s not a use that was designed into the system, it’s a new set of experiences brought to life through the passions and ingenuity of Trove users.

The application of digital tools to large cultural collections also promises to surprise. Techniques drawn from computational linguistics, for example, are being used to analyse the spread of ideas through nineteenth-century newspapers. Another project is using computer vision to identify poems in newspapers through their distinctive shapes. New structures can be found and visualised within digitised sources. New questions can be asked.

But these techniques also carry new risks. Researchers in the digital humanities, exploring the application of technology to fields such as literature and history, have been prominent in discussions around the copyright implications of mass digitisation projects and the legal status of text-mining. Increased clarity around concepts such as ‘transformative use’ is necessary to ensure that researchers have access to data and the confidence to explore.

And yet the ultimate goal isn’t certainty, it’s a greater awareness of the constraints around our engagement with the past. Within both government and the cultural sector the value of ‘open’ data is rightly proclaimed, but open data is always, to some extent, closed. Categories have been assigned, formats have been cleaned, decisions made about what belongs and what doesn’t – every spreadsheet contains an argument.

Each elephant is different. The tensions imposed by our overly complex system of copyright do at least remind us that the past can never be ‘open’, its limits cannot be legislated, its boundaries cannot be fixed. Beneath questions of access and use are better questions about our responsibilities to the past.


I’ll admit that part of my discomfort in being questioned about the copyright status of ‘The real face of White Australia’ stemmed from my ignorance. I just didn’t know the answer.

I’m still not sure.

I do know now that photographs taken before 1955 are okay. But these were attached to official forms, so perhaps the government owns the copyright. Are they published? I suspect the only way to be certain would be to seek permission from the current government department with responsibility for … what exactly? The White Australia Policy lives on, its workings preserved within the archives.

I’ll admit too that I always thought it would be interesting if some part of the government challenged our use of the documents. What exactly would they be claiming ownership of?

‘The real face of White Australia’ was motivated by a strong sense of responsibility towards those people whose lives are glimpsed through the records. To me the question of responsibility still seems more important than the intricacies of ownership. Our debts are to the people who confront us with their gaze, who defy the legislation that told them they did not belong.

Copyright law will never be able to make these judgements for us. No system can predict the individual ethical calculations that shape our engagement with the past. We may always be confronted with risks.

The word ‘access’ itself is full of politics. To what? By whom? When it comes to our cultural heritage we should never be satisfied. We must ask about the silences and the gaps. We must challenge the definitions. Access can never simply be given; to some extent it has to be taken. In the struggle we will find meaning.

There must always be risks. The point is to make the risks count.

My two lives

My blog hasn’t quite caught up with the fact that I now have two jobs to go along with my two lives. Monday to Wednesday I’m still part of the Trove management team at the National Library of Australia, but on Thursdays and Fridays I’m Associate Professor of Digital Heritage in the Faculty of Arts and Design at the University of Canberra.

I love working with the Trove team, but I also want to keep contributing to the development of the digital humanities in Australia through my own teaching and research. Hopefully now I can do both.

At the University of Canberra I’ll be helping to develop new digital heritage offerings in undergraduate, postgraduate and professional development courses. Exciting times ahead!

On the research front I’m hoping to reinvigorate a few stalled projects and poke around some more amidst the possibilities and politics of digital cultural collections.

These are the themes I’m thinking about at the moment.

Digital diversity

Using digital tools to expose alternative voices and experiences from within the cultural record.

  • Invisible Australians — yes it’s time to give this important project a home and kick it into gear
  • Every life counts — this is the working title for a new project around workplace deaths
  • I also want to expand on some of the questions in Seams and edges

Access, impact and understanding

How digitisation projects change our relationship with the past.

Data-enriched narratives

Developing new forms of online publication that use Linked Open Data to integrate historical writing and cultural collections.

All that in two days a week — wish me luck!

And if you’re interested in collaborating, please get in touch.

Stories for machines, data for humans

Presented at the New Factual Storytelling symposium, 10 April 2015, University of Canberra

I feel like the nerdy kid at the cool kids’ party.

There are lots of interesting and creative projects on show today and I… well I want to talk to you about metadata.

Data. According to some pundits it’s the new oil, or the new electricity. Fuel for economic development — a raw material ready to be ‘mined’ for insights, innovation and our purchasing preferences.

In the cultural heritage sector the data metaphors are more likely to be framed around liberation than exploitation. Our data wants to be ‘open’. But there’s still a tendency to think on an industrial scale — it’s about pumping out large datasets for potential re-use.

What can be lost in metaphors of extraction and scale is an appreciation of the human origins of data. We are not buoys bobbing in the ocean reporting on the heights of passing waves. Big data is made up of many small acts of living.

So today I want to talk about small-scale, free-range, artisanal data. I want to talk about data, alongside storytelling, as the product of creativity, imagination, frustration and fury.

Let’s think for a moment about the work of a historian — identifying actors, defining relationships, documenting the complex networks that bring together people, places and events over time. It’s painstaking, exhilarating and potentially soul-destroying work. It’s also an exercise in data modelling. Whether the results are preserved in a triplestore, a spreadsheet, or in a drawer full of index cards — it’s nodes and edges, it’s entities and relationships, it’s data.

And that’s ok. Making data doesn’t condemn you to a rigidly empirical, deterministic framework. There’s always room for nuance, interpretation and doubt. There’s always room for stories.

But what happens when historians undertake the oddly-named process of ‘writing up’? The complex data models are flattened down to a series of sentences neatly arranged in linear sequence — our things become strings. The data is squeezed out and discarded, glimpsed only as fragile echoes hiding in footnotes.

This is of course part of the skill of historical writing — the ability to represent complex relationships through narrative. But why can’t we have our stories and data too?

This is a question I’ve returned to a number of times over the last few years.

It’s come up because I get excited about Linked Open Data’s potential to deliver structured, machine-readable information via the web. But then I wonder: whose stories will we be telling to the machines? How can we explore the expressive possibilities of Linked Open Data and not be constrained by instrumentalist assumptions about the models we make?

It’s come up because I get excited about embedding cultural heritage collections within the passion and practice of everyday life. Why squeeze out the data from historical publications when every article could be an online exhibition, every book could be a digital portal, every footnote could be a link for exploration and aggregation?

I don’t really have answers, but I do make stuff. I’ve had a few goes at trying to create narratives that embed Linked Open Data.

They’re not very exciting from a design point of view, but I keep coming back to them because there still seems to be a lack of alternatives. There’s lots of talk about publishing Linked Open Data, but much less about how the use and consumption of Linked Open Data can be built into creative practice.

So here I am again.

This exercise has a number of constraints built in. The main one is NO PLATFORMS — a historian using a series of simple tools should be able to create and publish a data-driven web page without any dependencies. It should be as simple as uploading an HTML page to a server.

In my idealised workflow, the historian would manage their data about people, places, events and resources in a simple database capable of exporting a flavour of Linked Open Data known as JSON-LD.
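As a rough illustration of what such an export might contain, the sketch below builds a small JSON-LD document describing people. The post doesn't say which vocabulary or database is involved, so the use of schema.org, the `example.org` identifiers and the helper names `make_person` and `export_jsonld` are all assumptions for illustration.

```python
import json

def make_person(person_id, name, same_as=None):
    """Build a schema.org Person node as a plain dict (hypothetical helper)."""
    node = {"@id": person_id, "@type": "Person", "name": name}
    if same_as:
        # Links to external identifiers (e.g. a Trove or Wikipedia page) go in sameAs.
        node["sameAs"] = same_as
    return node

def export_jsonld(nodes):
    """Wrap a list of nodes in a JSON-LD document with a schema.org context."""
    doc = {"@context": "https://schema.org/", "@graph": nodes}
    return json.dumps(doc, indent=2)

# Placeholder people; the identifiers are illustrative, not real records.
people = [
    make_person("https://example.org/people/1", "A. Person"),
    make_person("https://example.org/people/2", "B. Person",
                same_as=["https://example.org/external/abc"]),
]
print(export_jsonld(people))
```

Keeping each entity as a node with its own `@id` is what lets the narrative point at it later.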



Then, having created their narrative, they’d mark it up in the tool of their choice to relate specific names or phrases in the text to the entities in their database.
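One hedged way to picture this markup step is to wrap each known name in an element that carries the identifier of its entity. In the sketch below, the `data-entity` attribute, the `mark_up` function and the example names are hypothetical conventions, not the ones used by the actual tools.

```python
import html
import re

def mark_up(text, entities):
    """Wrap each known name in a span that points at its entity identifier.

    `entities` maps a literal phrase to an identifier. The `data-entity`
    attribute is a hypothetical convention, not part of any standard.
    """
    escaped = html.escape(text)
    if not entities:
        return escaped
    # Longer phrases go first in the alternation so "Jane Smith" wins over "Smith".
    phrases = sorted(entities, key=len, reverse=True)
    lookup = {html.escape(p): entities[p] for p in phrases}
    pattern = re.compile(r"\b(" + "|".join(re.escape(p) for p in lookup) + r")\b")

    def wrap(match):
        phrase = match.group(1)
        ident = html.escape(lookup[phrase], quote=True)
        return f'<span data-entity="{ident}">{phrase}</span>'

    return pattern.sub(wrap, escaped)

print(mark_up("Jane Smith met Smith.", {"Jane Smith": "p1", "Smith": "p2"}))
```

In practice a historian would more likely tag names by hand in an editor; automatic matching like this is only a starting point.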


Then they’d just drop the text and the data into an HTML page and whack it on the web. With a bit of JavaScript magic to activate the data, you’d have something like this.
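A minimal sketch of that assembly step is shown below. It assumes the data is embedded as JSON-LD in a `<script type="application/ld+json">` element, which is a standard way of carrying JSON-LD inside HTML. It also assumes that a separate, hypothetical `viewer.js` supplies the JavaScript that activates the data. The `build_page` function is illustrative only.

```python
import html
import json

def build_page(title, body_html, data):
    """Combine marked-up narrative and its JSON-LD into one static HTML page."""
    # Escaping "</" stops a value in the data from closing the script element early.
    payload = json.dumps(data).replace("</", "<\\/")
    return (
        "<!DOCTYPE html>\n"
        "<html>\n<head>\n"
        f'<meta charset="utf-8">\n<title>{html.escape(title)}</title>\n'
        f'<script type="application/ld+json">{payload}</script>\n'
        '<script src="viewer.js" defer></script>\n'  # hypothetical viewer script
        "</head>\n<body>\n"
        f"{body_html}\n"
        "</body>\n</html>\n"
    )

page = build_page("Demo", "<p>Hello</p>", {"@context": "https://schema.org/"})
print(page)
```

The result is a single file with no server-side dependencies, in keeping with the "no platforms" constraint.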


The demo is live (though still under construction), so have a play.

  • Scroll the text to see those carefully inserted identifiers create pop-ups in the sidebar.
  • The text has itself become data; each paragraph is an object — try filtering the text or linking to an individual paragraph.
  • Browse all the people, or resources. Explore all the relationships for Inigo Jones.
  • Mapping to existing identifiers from sources like Trove and Wikipedia helps put the ‘linked’ into Linked Open Data.
  • There’s a rather boring map, a timeline and a wall view. New views could easily be added by dropping in some extra JavaScript.
  • It’s all data, so other visualisations and analyses might be created on the fly.
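Treating paragraphs as objects is what makes filtering and paragraph-level links possible. The sketch below shows one way this could work. The `Paragraph` class, its fields and the fragment-identifier link scheme are hypothetical and are not taken from the demo's code.

```python
from dataclasses import dataclass, field

@dataclass
class Paragraph:
    """One paragraph of the narrative, treated as a data object (hypothetical model)."""
    para_id: str                                # doubles as an HTML fragment identifier
    text: str
    entities: set = field(default_factory=set)  # identifiers of people, places, etc.

def filter_paragraphs(paragraphs, entity_id):
    """Keep only the paragraphs that mention a given entity."""
    return [p for p in paragraphs if entity_id in p.entities]

def paragraph_link(page_url, paragraph):
    """Build a link that jumps straight to one paragraph."""
    return f"{page_url}#{paragraph.para_id}"
```

Because each paragraph records the entities it mentions, the same structure can also drive the sidebar pop-ups and the alternative views.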

That’s what humans see, but what about machines? All the carefully curated data is exposed in a machine-readable form. Lots of triples…
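To show how the embedded data becomes triples, here is a deliberately simplified flattening of a JSON-LD `@graph` into (subject, predicate, object) tuples. It assumes every node has an `@id` and that all terms belong to a single vocabulary (schema.org is used here as an example). A real JSON-LD processor handles much more, including contexts, nested nodes, language tags and datatypes. The `to_triples` function is hypothetical.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

def to_triples(doc, vocab="https://schema.org/"):
    """Flatten a simple JSON-LD @graph into (subject, predicate, object) triples."""
    triples = []
    for node in doc.get("@graph", []):
        subject = node["@id"]
        for key, value in node.items():
            if key in ("@id", "@context"):
                continue
            values = value if isinstance(value, list) else [value]
            for v in values:
                if key == "@type":
                    triples.append((subject, RDF_TYPE, vocab + v))
                else:
                    # Nested nodes are reduced to their @id; literals pass through.
                    obj = v["@id"] if isinstance(v, dict) else v
                    triples.append((subject, vocab + key, obj))
    return triples
```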


The code for the viewer and maker is all available if anyone wants to play with it, and I’m intending this year to develop two substantial monographs using these tools. Both have many links into cultural collections.

My aim here is not to develop a fully-operational publishing system. I just want to get a better idea of what’s useful, what’s interesting, what’s possible. To think beyond the current limits of scholarly publishing into a world where data and narrative can live together, where interpretative work is represented in all its data-inflected glory.

Myths, mega-projects and making

Keynote presented by video at EuropeanaTech 2015, 13 February 2015.

The video of this presentation is available on Vimeo and the slides are on SlideShare.


In the aftermath of World War II, Australian hopes for a new era of national progress were expressed in a massive engineering project called the Snowy Mountains Scheme. The project promised new reserves of water and electricity to power development of Australia’s inland.

‘The Snowy Mountains Scheme’, Canberra Times, 26 December 1975, p. 5. <http://nla.gov.au/nla.news-article102193928>

Rivers were diverted, towns were relocated, and new reservoirs were created. Over 145km of tunnels were carved through the granite peaks of Australia’s Great Dividing Range. Finally completed in the 1970s, the Snowy Mountains Scheme was an engineering marvel.

http://www.zenlan.com/collage/trove/#snowy scheme

But this symbol of national pride would not have been possible without the labours of thousands of ‘New Australians’ drawn from across Europe. Some were recruited because of their skills, others were plucked from displaced persons camps and offered the chance of a new life — as long as they were prepared to work where the Australian government wanted them to.

‘AT WORK ON SNOWY MOUNTAINS SCHEME’, Sydney Morning Herald, 8 January 1954, p. 5. 

The human and environmental costs of the project are still debated, but the Snowy Scheme is regularly invoked as the country’s prime nation-building project — an example of what can be achieved together through vision, leadership, and toil.

Why am I talking about this today?

Well, I suppose it’s a great chance to say ‘Hey Europe, thanks for all the people!’.

But it’s also because I wanted to highlight the mythic qualities of the mega-project — the cultural power that resides in the ‘big idea’ that promises to set us upon a path towards the future.

We are here today because we are embarked upon ambitious undertakings.

Our projects aim to reshape the cultural landscape. We are building pipelines and reservoirs — moving massive amounts of data across countries, across the world.

But as we’ve tried to show, these large scale efforts are only possible because of many smaller, local collaborations.

The Snowy Scheme was built by individuals fleeing the disruptions of war. They took a risk in the hope of something better. It’s important for us to reflect on the contributions and motivations of our communities, our partners, and our users. A big idea isn’t enough.

You probably all know that as well as metadata from libraries, museums, archives and universities, Trove provides access to almost 150 million full-text digitised newspaper articles. The OCR’d text of the articles is fully searchable, but suffers from the usual errors and inaccuracies.

Fortunately Trove users have been eager to help. Anyone can jump in and correct the OCR output, and they do. More than 150 million lines of text have been corrected so far. Our top corrector (yes we have a scoreboard) has corrected more than 3 million lines of text.

Recently I’ve been thinking about this work and the limitations of language around online engagement. Our correctors are more than ‘users’ — ‘contributors’ perhaps, or ‘volunteers’?

But all of these words seem to place correctors on the other side of the interface — as clients rather than builders.

Each correction is a tweak of our search index. It changes the way the backend functions, increasing the efficiency of the system by getting people to the things they’re interested in more quickly.

Perhaps we should call our correctors ‘discovery engineers’?

The mythic mega-project maintains a sense of otherness — it is exceptional, an achievement above and beyond the realms of ordinary experience. But this obscures the many small acts of commitment and cooperation that make it possible. These are the expressions of ordinary lives, the routine and repetitive alongside moments of passion and meaning.

The success of our projects will ultimately depend not on the speed of our servers or the cleanliness of our code, but on the interactions that emerge as our aggregations become part of the simple business of living.

People correct our newspapers for many reasons, but few of these motivations are likely to align with our own strategic objectives.

It’s not just corrections either. More than 80,000 comments and 3 million tags have been added to resources in Trove. These are just plain text tags, we make no effort to control their content. This creates some interesting possibilities.



I wonder if you can guess the meaning of our most heavily-used tag? It’s ‘LRRSA’ and it’s attached to more than 16,000 items.

Any ideas? It’s an acronym that stands for the Light Rail Research Society of Australia. Members of the society use the tag to share material of interest — it’s become a means of collaboration.

Another popular tag is ‘TBD’ or ‘To be done’. This one’s used by text correctors to manage their own workflows.

The numerous guises of a simple tag illustrate the value of ‘underspecified tools’ — of leaving functionality open to ad hoc elaboration. The boundaries between systems and their use are fluid. Tagging behaviour can extend system functionality.

From machine to human and back again, the limits of what is possible are open to negotiation and change.

My favourite example of this is the work of one man who has been identifying out-of-copyright sheet music in Trove. He’s not a musician but he uses his computer to create performances of the pieces. He then uploads the performances to YouTube or SoundCloud and adds a link to them in a comment on Trove. People who find these works on Trove can now just click to hear them. The functionality of the system has been extended without a single line of code being written.

But the permeability of these boundaries means we can’t take the roles of people and machines as given. Five years ago, crowdsourced text-correction was a cost-effective solution to the vagaries of OCR, but as the technology improves do we continue to ask humans to undertake tasks that a machine might do more easily? Do we continue to ask our volunteers to change every instance of ‘tbe’ to ‘the’?
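Some OCR errors are predictable enough that a machine could fix them without human help. The sketch below shows the idea. The list of confusions is hypothetical; a real list would need to be derived from existing corrections and checked for false positives. This simple version only matches lower-case whole words.

```python
import re

# Hypothetical list of frequent OCR confusions (e.g. "h" misread as "b" or "li").
COMMON_FIXES = {"tbe": "the", "tlie": "the"}

def auto_correct(line, fixes=COMMON_FIXES):
    """Replace whole-word matches of known OCR errors; return text and count.

    Only exact lower-case words are matched, so "Tbe" or "tbeatre" are left alone.
    """
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, fixes)) + r")\b")
    return pattern.subn(lambda m: fixes[m.group(1)], line)

text, changes = auto_correct("tbe cat sat on tlie mat")
print(text, changes)
```

Corrections made this way would still need to be distinguished from human ones, so that the volunteers' work remains visible.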

While an astonishing 150 million lines of text have been corrected, more than 96% of articles have no corrections at all. More articles are being added all the time and it seems the rate of corrections might be flattening out. The task seems beyond humans alone.

We’re currently redeveloping our newspapers interface, making it more responsive, adding shiny new browse features, and improving the overall performance. We’ll also be introducing some tools for ‘advanced’ text correction, allowing our users to modify not only the text, but some of the structural elements of the OCR — inserting new lines for example.

As we investigate opportunities for enrichment of our metadata, I think we’ll also need to think about the work we offer our discovery engineers. Correction could extend to geocoded placenames; named entity extraction could be integrated with user-defined relationships.

This technosocial shift is also evident at the other end of our pipelines, when our aggregated data is consumed and transformed.

Except APIs are not really pipelines are they? You don’t just turn on a tap, you have to ask the API a question. Our questions interact with the content of the reservoir to shape and colour the flow of data.
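Asking the API a question usually means composing a request URL. The post doesn't show any API calls, so the endpoint and parameter names below are assumptions loosely modelled on version 2 of the Trove API. Check the current documentation before relying on them. The function only builds the URL and doesn't send the request.

```python
from urllib.parse import urlencode

# Assumed endpoint and parameter names; verify against the current Trove API docs.
API_URL = "https://api.trove.nla.gov.au/v2/result"

def build_query(api_key, query, zone="newspaper", **extra):
    """Compose a search request URL; `extra` adds any further (assumed) parameters."""
    params = {"key": api_key, "zone": zone, "q": query, "encoding": "json"}
    params.update(extra)
    return f"{API_URL}?{urlencode(params)}"

print(build_query("YOUR_API_KEY", '"snowy mountains scheme"'))
```

Changing the query, the zone or the extra parameters changes the shape of the data that comes back, which is the sense in which the question colours the flow.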

An API is a tool for transformation.

New tools and interfaces explicitly change the nature of our aggregations by carrying their use into different realms, by shifting contexts, by asking new questions. Each new use changes how we see the whole.

This is not reuse or recycling — this is remaking. We can dig the tunnels and fill the reservoirs, but it’s up to you — the coders, the builders, the developers and the makers — to show us what we’ve created.

The big challenge is to open up this transformative power to those who have no idea what an API is — people who have important and powerful questions to ask our APIs, but don’t know the language.

We need to make sure that the myth of the mega-project doesn’t blind us to the human dimensions of our undertaking. Let’s foster interventions as well as innovations, activists as well as evangelists. Let’s make sure our big ideas make space for other ideas to erupt and grow.

Seams and edges: Dreams of aggregation, access & discovery in a broken world

Presented at ALIA Online 2015, 3 February 2015 in Sydney. A longer version with bonus references will be made available on the ALIA Online site. Slides are on Slideshare.


In March 1930 the Sydney Electrical and Radio Exhibition opened in a blaze of excitement. Aboard his yacht in Genoa, inventor Guglielmo Marconi triggered a radio signal that reached across the world and switched on more than 2800 electric lights at the Sydney Town Hall. ‘All in less than a second!’, exclaimed the Sydney Mail, ‘Here was magic!’.

‘When Marconi Switched on the Lights The Sydney Electrical and Radio Exhibition’, Sydney Mail (NSW), 2 April 1930, p. 20. <http://nla.gov.au/nla.news-article160633081>

According to the Sydney Morning Herald, radio had ‘eliminated time and distance’.

About a month later the British and Australian Prime Ministers spoke for the first time via wireless telephone. ‘These were days for the annihilation of time and space’, the British PM proclaimed.

Sounds familiar?

From railways to the telegraph, radio, and the internet, the progress of technology has often been imagined as a battle against time and space. Progress has been measured in the seconds we save, in the distances we conquer, in the barriers of terrain and politics we bridge.

Remember when we used to talk about the ‘Information superhighway’?

In the realm of information this march of conquest is accompanied by discussions of speed and scale, by adjectives such as ‘instantaneous’ and ‘seamless’.

And you don’t have to look too hard to find software and service vendors touting the promise of ‘seamless discovery’. Indeed, it turns out that ‘Seamless Discovery’ itself is the registered trademark of a video discovery platform used by Foxtel and others.

Technology promises instant access to information — a future beyond silos.

In the library world, seamless discovery is commonly associated with what are variously called ‘next-generation catalogues’, ‘web-scale discovery services’ or ‘discovery layers’.1

The idea is familiar and seductive. Instead of forcing searchers to construct multiple queries across a variety of databases, systems and interfaces, these services aggregate metadata from different sources and offer access through a single search portal.
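The aggregated model can be sketched in a few lines. Records from several sources are merged into one inverted index, so a single query covers them all. The source names and record fields are hypothetical, and real discovery services add relevance ranking, normalisation and much more.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lower-case a string and split it into word tokens."""
    return re.findall(r"\w+", text.lower())

def build_index(sources):
    """Merge records from several sources into one inverted index.

    `sources` maps a (hypothetical) source name to a list of dicts with
    'id' and 'title' fields.
    """
    index = defaultdict(set)
    for source, records in sources.items():
        for record in records:
            key = (source, record["id"])
            for token in tokenize(record["title"]):
                index[token].add(key)
    return index

def search(index, query):
    """Return the records that contain every query term (a simple AND search)."""
    terms = tokenize(query)
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results
```

The single search box hides the seams: nothing in the result set shows how each source's records were selected or transformed on the way in.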

A seam-free service is one that maximises ease-of-use.

We all know what such services look like, even if we’ve never used one. Search is no longer just a task to be accomplished in pursuit of a particular goal — to find a desired resource or piece of information.

Google has played a central role in re-engineering our understanding and expectations of online experience. Ours is increasingly a ‘culture of search’ where the technologies of discovery have become part of everyday life.2

It’s natural then that users of other discovery services will approach them with a set of expectations shaped by the Googlisation of modern culture.

It’s not just the simplicity of that single search box, it’s our faith that search will just work.

Every time Google responds to our query about some obscure piece of television trivia with 152 million results, we cannot fail to be impressed by the power at our fingertips. Every time Google predicts our query or customises our results we are beset with awe.

Here is magic.

Google’s dominance gives it immense power in presenting to us an image of the world constructed to its own secret formula. This power bears ontological weight — if we can’t find something on Google, does it exist?

Of course we all want to make life as easy as possible for the people who use our services. The question is how the pursuit of a Google-like experience constrains our options and assumptions.

Metaphors matter. Pursuing ‘seamless discovery’ in the wake of Google means engaging with questions of politics and power.

Seams are not simply obstacles to a smooth user experience, they’re reminders that our online services are themselves constructed. There’s nothing natural or inevitable about a list of search results.

Mark Weiser, one of the pioneers of ubiquitous computing, argued against seamlessness because it made everything seem the same. Instead he imagined systems with ‘beautiful seams’ — that empowered users to manipulate their contexts and connections.3

As Mitchell Whitelaw notes ‘seamfulness is also an ethical and political stance’ — it’s a commitment to exposing the interpretative distance between our collection data and its online representation.4 There are opportunities here not only for transparency, but to explore alternatives to Google’s template for discovery.

Trove Mosaic by Mitchell Whitelaw. <http://mtchl.net/trovemosaic/>

Research into the visualisation of large cultural heritage collections has emphasised that search is only one way of representing a collection.

By focusing on the stylish minimalism of the search box, we discard opportunities for traversing relationships, for fostering serendipity, for seeing the big picture.

By creating experimental interfaces, by playing around with our expectations, we can start to think differently — to develop new metaphors for our online experience that are not framed around technological conquest.

Eyes on the past. <http://eyespast.herokuapp.com/>

My own Eyes on the past, which allows you to find your way into Trove’s digitised newspapers through machine recognised faces and eyes, is far from a practical discovery tool. But building on my earlier work using facial detection technology as a means of archival intervention, it opens up questions about the lives embedded within our collections — we see them differently, we feel differently.

A Google-like search experience offers utility at the expense of critique. Its technologies are black boxed, its assumptions obscured.

How can those of us in the discovery business create a buffer for critical reflection while still meeting user expectations? What can we do in a service such as Trove that supports many thousands of enquiries a day?

I’d suggest we start with an acknowledgement of our limits, an attempt to trace the edges and the fractures that are too often glossed over in our pursuit of seamlessness. Let’s start by admitting what Trove is not:

  1. Trove is not perfect
  2. Trove is not everything
  3. Trove is not a machine

Trove is not perfect

Trove is an aggregator. It pulls together metadata from a variety of different sources, applies some normalisation across the required fields, and sends the results off to be indexed.
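The sketch below illustrates that normalisation step. Each contributor's field names are mapped onto a shared schema, dates are reduced to a year, and records missing required fields are flagged. The contributors, field mappings and required fields are all hypothetical; the sketch does not describe Trove's actual pipeline.

```python
import re

# Hypothetical mappings from each contributor's field names to a shared schema.
FIELD_MAPS = {
    "contributor_a": {"dc:title": "title", "dc:date": "date", "dc:creator": "creator"},
    "contributor_b": {"Title": "title", "Year": "date", "Author": "creator"},
}
REQUIRED = ("title", "date")

def normalise(contributor, raw):
    """Map a raw record onto the shared fields and report missing required ones."""
    mapping = FIELD_MAPS[contributor]
    record = {common: raw[local] for local, common in mapping.items() if local in raw}
    for name in ("title", "creator"):
        if name in record:
            # Collapse runs of whitespace left over from different source systems.
            record[name] = " ".join(str(record[name]).split())
    if "date" in record:
        # Reduce whatever date format arrives to a four-digit year.
        match = re.search(r"\b(\d{4})\b", str(record["date"]))
        if match:
            record["date"] = match.group(1)
        else:
            del record["date"]  # unparseable dates are treated as missing
    missing = [name for name in REQUIRED if not record.get(name)]
    return record, missing
```

Every rule like this is a small decision about what counts as the same thing across contributors, and each one is a place where errors and oddities can creep in.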

With close to 400 million resources harvested from hundreds of contributors through an assortment of different pipelines, it’s inevitable that there will be errors and oddities.

If you want to see errors, of course, you can head along to the Trove newspapers zone, where the limitations of Optical Character Recognition are on display for all to see. Unlike some full-text databases, Trove exposes the raw output of its OCR processing.

Trove’s transcriptions are improving all the time thanks to the efforts of thousands of online volunteers who correct the raw OCR output. Astonishingly, more than 130 million lines of text have been corrected by Trove users, in what is rightly touted as a highly successful crowdsourcing initiative.

But it’s also important to put this effort in perspective. Enter ‘has:corrections’ into the Trove search box to retrieve all the newspaper articles that have at least one crowdsourced correction. At the time I wrote this, the figure was 5,273,600 or just 3.6% of the total number of newspaper articles in Trove. Despite their important efforts, Trove’s volunteers will never be able to produce a perfect rendering of the newspaper content.
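The 3.6% figure can be turned around as a quick sanity check. Dividing the number of corrected articles by that share gives the implied size of the newspaper corpus at the time of writing. Because the percentage is rounded, the result is a range rather than an exact count.

```python
corrected_articles = 5_273_600   # articles matching has:corrections, from the text
reported_share = 0.036           # "just 3.6%", rounded to one decimal place

# Dividing by the share gives the implied size of the newspaper corpus.
implied_total = corrected_articles / reported_share

# 3.6% is rounded, so the true share lies somewhere between 3.55% and 3.65%.
low_estimate = corrected_articles / 0.0365
high_estimate = corrected_articles / 0.0355

print(f"Implied total: about {implied_total:,.0f} articles "
      f"(somewhere between {low_estimate:,.0f} and {high_estimate:,.0f})")
```

That works out at roughly 144 to 149 million articles. Put the other way, about 96.4% of articles had no corrections at all.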

But what is ‘perfection’ anyway? OCR accuracy is important only in so far as it supports the interests and activities of users. For the purposes of discovery the accuracy of common search terms such as names, places or events is likely to be most important. But if you’re undertaking an analysis of changes in language across time, a much broader range of words would be significant.
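One way to make this task-dependence concrete is to measure OCR accuracy only over the words that matter for a given activity. The sketch below aligns OCR output with corrected text using Python's `difflib.SequenceMatcher`. It reports the share of corrected words that the OCR got exactly right, either overall or for a chosen set of terms. Word-level alignment of this kind is a simplification chosen for illustration. It is not how Trove measures accuracy, and the sample sentence is made up.

```python
from difflib import SequenceMatcher

def word_accuracy(ocr_text, corrected_text, terms=None):
    """Share of words in the corrected text that the OCR got exactly right.

    If `terms` is given, only those (lower-case) words are counted, e.g. the
    names and places a researcher actually searches for. Returns None if no
    relevant words are present.
    """
    ocr = ocr_text.lower().split()
    truth = corrected_text.lower().split()
    matched = set()
    # Each matching block says truth[b:b+size] was reproduced exactly by the OCR.
    for block in SequenceMatcher(None, ocr, truth, autojunk=False).get_matching_blocks():
        matched.update(range(block.b, block.b + block.size))
    positions = [i for i, w in enumerate(truth) if terms is None or w in terms]
    if not positions:
        return None
    return sum(i in matched for i in positions) / len(positions)

ocr = "tbe ship arrived at Sydney from Genoa"
truth = "the ship arrived at Sydney from Genoa"
print(word_accuracy(ocr, truth), word_accuracy(ocr, truth, terms={"sydney", "genoa"}))
```

Here one wrong word in seven lowers the overall score, but the place names a searcher would care about are untouched.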

Accuracy is something that needs to be assessed and understood within the context of a specific activity.

Services like Trove have to be prepared to expose configurations, assumptions and limitations so that users can understand the impact of these on their own research.

If we are developing resources to support the creation of new knowledge we cannot simply black box our tech and trade on trust.

That’s Google’s game.

QueryPic is a simple tool that visualises search results in the Trove newspapers zone. QueryPic lets you see patterns and trends across the whole database.

When did the ‘Great War’ become the ‘First World War’? QueryPic can be used to explore this shift in terminology, but if you examine the results closely you’ll notice a small bump in the graph indicating that the term ‘World War I’ was being used during World War I. Huh?

When did the ‘Great War’ become the ‘First World War’? <http://dhistory.org/querypic/43/>

If you drill down through the results you’ll find that this is because Trove users have been busily adding the tag ‘World War I’ to selected articles, and by default Trove searches user tags and comments as well as article text. The bump is an artefact of Trove’s search configuration.
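The toy sketch below shows how such an artefact can arise. It counts matching articles per year, either with or without user tags, and then divides by the total number of articles in each year. Proportions make years with very different amounts of digitised content easier to compare. The article records and the functions `yearly_counts` and `relative_frequency` are hypothetical; this is not QueryPic's code.

```python
import re
from collections import Counter

def yearly_counts(articles, phrase, include_tags=True):
    """Count articles per year whose text (and optionally tags) contain `phrase`.

    include_tags=True mimics a search that also looks at user tags, which is
    how a tag like 'World War I' can surface in articles from 1914-18.
    """
    pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
    counts = Counter()
    for article in articles:
        fields = [article["text"]]
        if include_tags:
            fields.extend(article.get("tags", []))
        if any(pattern.search(f) for f in fields):
            counts[article["year"]] += 1
    return counts

def relative_frequency(matches, totals):
    """Turn raw counts into a share of all articles published in each year."""
    return {year: matches.get(year, 0) / totals[year]
            for year in sorted(totals) if totals[year]}
```

Running the same phrase with `include_tags` switched on and off is a quick way to see how much of a trend comes from the search configuration rather than the newspapers themselves.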

Trove’s primary function is discovery — to make it as easy as possible for people to find things they’re interested in. But the sort of fuzziness that supports discovery works against other forms of analysis. We should make these sorts of assumptions more obvious.

By showing our seams, exposing our imperfections, we have the opportunity to educate. As well as helping people use Trove, we can open up bigger questions about the way search works on the web.

Trove is not everything

There’s nothing natural about our cultural collections or their digital representations — they have been created by many acts of selection, neglect, vision, accident and planning.

If you graph the number of newspaper articles in Trove by state and year you’ll notice a rather dramatic spike around 1914.

Newspaper articles in Trove by state and year. <https://plot.ly/~wragge/22/trove-newspaper-articles-by-state/>

Why? Were more newspapers printed during the war era? The answer is simply funding. As part of the Australian Newspaper Digitisation Program, the NSW and Victorian State Libraries have chosen to invest in the digitisation of newspapers from the World War I period.
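The table behind a graph like this is a simple cross-tabulation of article counts by state and year. The sketch below assumes the counts have already been gathered as hypothetical (state, year, count) rows, for example from faceted API results. The row format and function names are illustrative only.

```python
from collections import defaultdict

def articles_by_state_and_year(rows):
    """Build a {state: {year: count}} table from (state, year, count) rows."""
    table = defaultdict(dict)
    for state, year, count in rows:
        # Rows for the same state and year are added together.
        table[state][year] = table[state].get(year, 0) + count
    return {state: dict(sorted(years.items())) for state, years in table.items()}

def peak_year(table, state):
    """Return the year with the most articles for a state."""
    years = table[state]
    return max(years, key=years.get)
```

A spike in a table like this reflects what was digitised, which is not necessarily what was published.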

The contents of Trove’s newspaper zone, like any online collection, are constructed — shaped by many competing priorities. The consequences of this process are not always obvious.

In a competition for resources what gets digitised and why? There’s a danger that the sheer scale of aggregation services like Trove will reinforce existing prejudices. People already struggling for visibility and recognition within our cultural record might be lost amidst the overwhelming numbers of the safe and the sanctioned.

If we are concerned with absence as well as inclusion, with addressing the silences within our cultural record, we need to be wary of sharing in Google’s aura of completeness. The ontological weight of search can too easily equate absence with non-existence.

But aggregation also offers new opportunities for analysis. Questions of representation and diversity can be explored through the metadata itself.

By way of a quick example, I used the Trove API to easily compare the languages spoken at home in Australia, according to the 2011 Census, with the languages of resources in Trove’s book zone.

Languages of Trove books compared to languages spoken at home in Australia (from 2011 Census).
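Most of the comparison comes down to converting both sets of counts to shares and looking at the gaps. The sketch below assumes the counts have already been gathered as hypothetical `{language: count}` dicts; the post doesn't show how the Census or Trove figures were retrieved. The function names are illustrative.

```python
def shares(counts):
    """Convert raw counts into percentage shares of the total."""
    total = sum(counts.values())
    return {key: 100 * value / total for key, value in counts.items()}

def compare(collection_counts, population_counts):
    """Percentage-point gap between a collection and the population, per language.

    Positive values mean a language is over-represented in the collection;
    languages missing from one side count as a share of zero there.
    """
    coll, pop = shares(collection_counts), shares(population_counts)
    languages = set(coll) | set(pop)
    return {lang: coll.get(lang, 0.0) - pop.get(lang, 0.0) for lang in sorted(languages)}
```

Large negative gaps point to communities whose languages are under-represented in the collection relative to the population.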

It’s fascinating to consider how we might use socio-economic data to slice our cultural collections across the grain to reveal different patterns of access and exclusion.

By admitting the constructed nature of our collections, the gaps and the silences as well as their strengths, perhaps aggregations like Trove can become sites of both analysis and activism.

Trove is not a machine

Trove is not a single application, it’s a complex system with multiple components. This size and complexity focuses our attention on the technology — on the lines of code and racks of servers. But the system only exists to support human creativity and cooperation. Is it a machine, a community, or something else?

I often talk about Trove as a platform — it can be built upon in many ways, both through code and collaborations. In particular, by providing an open API, Trove invites the public to create new tools, analyses and interfaces.

But there are metaphorical dangers lurking here as well. Social media services such as Facebook and YouTube also describe themselves as platforms.5

If we are to embrace the ‘platform’ metaphor we must also be ready to unpack its implications. If we want progressive platforms we need to honestly address issues of openness, participation, and accessibility. Every API is an argument and no data is ever truly ‘open’.

For me the term ‘platform’ speaks of something unfinished — an invitation and an opportunity. Trove is permanently under construction, constantly improved through the labours of its developers and community.

This is most evident in the work of Trove’s text correctors, whose many small acts of repair help the technology to function more efficiently. But each tag or comment also changes Trove — aiding discovery, adding context, or creating new connections.

Other Trove-building activity is less visible, and the responsibilities more distributed. For example, Trove is currently working with Victorian Collections to bring many small, local collections from across Victoria into Trove.

But this collaboration is itself built on the labours of many people over many years — from the Museums Australia staff who train community groups, to the local volunteers who painstakingly digitise and describe their collections. Trove helps bring these efforts to the attention of the web, and is itself enriched.

For all the new terms we have for systems and devices, we have thus far failed to find a language to describe online collaboration and social engagement. Instead we fall back on the awful term ‘user’.6

By drawing attention away from ‘the machine’ to the many small acts that sustain and enlarge a service such as Trove, we create a space where language might evolve.

Broken worlds

Most technological futures are ultimately alienating and disempowering — people are cast as the passive consumers of the latest wonders and gadgets.

Instead of ‘progress’, Steven J. Jackson presents a vision of a fundamentally broken technosocial world, barely held together by numerous acts of concern, appropriation and repair.7 This focus on ‘repair’ helps us see the human agency at work, the possibilities for change.

What might happen if instead of seeing the seams and edges of our information landscape as speed bumps in the onward march of progress we recognised their fragility, and celebrated them as sites of collaboration, negotiation and repair?

What might we discover then?


  1. Joshua Barton and Lucas Mak, ‘Old Hopes, New Possibilities: Next-Generation Catalogues and the Centralization of Access’, Library Trends, vol. 61, no. 1, 2012, pp. 83–106. <http://muse.jhu.edu/journals/library_trends/v061/61.1.barton.html>
  2. Ken Hillis, Michael Petit, and Kylie Jarrett, Google and the Culture of Search, Routledge, 2013.
  3. Quoted in Matthew Chalmers and Ian MacColl, ‘Seamful and seamless design in ubiquitous computing’, in Workshop At the Crossroads: The Interaction of HCI and Systems Issues in UbiComp, 2003.
  4. Mitchell Whitelaw, ‘Representing Digital Collections’, in Performing Digital: Multiple Perspectives on a Living Archive, ed. David Carlin and Laurene Vaughan, Ashgate Publishing, Farnham, UK, 2014.
  5. Tarleton L. Gillespie, ‘The Politics of “Platforms”’, New Media & Society, vol. 12, no. 3, 1 May 2010. <http://papers.ssrn.com/abstract=1601487>
  6. Peter Lyman, ‘Information Superhighways, Virtual Communities and Digital Libraries: Information society metaphors as political rhetoric’, in Technological Visions: The Hopes and Fears that Shape New Technologies, ed. Marita Sturken, Douglas Thomas, and Sandra J Ball Rokeach, Temple University Press, Philadelphia, 2004, pp. 201–218.
  7. Steven J. Jackson, ‘Rethinking Repair’, in Media Technologies: Essays on Communication, Materiality, and Society, ed. Tarleton Gillespie, Pablo J. Boczkowski, and Kirsten A. Foot, MIT Press, 2014.

2014 — the making and the talking

This is my now traditional ‘what I done this year’ post, which, if nothing else, makes me check that my various experiments are still alive. It’s been a challenging year trying to balance my work as Trove Manager with my broader passions and responsibilities as a member of the digital humanities community. So yeah. My personal highlights included heading to Japan to give a keynote at the annual conference of the Japanese Association of Digital Humanities, building Eyes on the past, and resurrecting THATCamp Canberra.

2015 is shaping up as both exciting and scary. On the scary front there’s the whole giving a keynote to hundreds of the world’s leading digital humanities scholars at DH2015 thing (cue imposter syndrome). There’ll also be the launch of Copyfight from NewSouth Publishing, which includes contributions from me and some really-real writers. I’m looking forward to squeezing in some more work on Invisible Australians and a few other research projects. Stay tuned.

The making

Inserting usual disclaimer here that this is not what I get paid to do as Trove Manager. These are projects and experiments I undertake in my own time, for my own reasons, at the cost of my own sanity. So all the problems and mistakes are also mine.

The talking

Sketching with Python and Plotly

I’m currently trying to make some progress with my ‘seams and edges’ paper for ALIAOnline 2015 and naturally ended up writing some code (what, me procrastinate?). I was wondering about ways of exploring the ‘representativeness’ of an aggregation like Trove (what’s there and what’s not), so I started noodling around with the Trove API.

The first result was a graph representing the numbers of Trove contributors and resources by state, compared to the population of that state. All values are displayed as percentages of the total.
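Expressing each series as a percentage of its own total is what puts contributors, resources and population on a single comparable axis. A minimal sketch of that normalisation follows. The function names are hypothetical, and keys missing from one series are treated as zero, which would also suit the languages comparison, where some languages spoken at home may have no books in Trove at all.

```python
def as_percentages(counts):
    """Convert raw counts into percentages of their combined total."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("counts sum to zero; percentages are undefined")
    return {key: 100 * value / total for key, value in counts.items()}


def compare_shares(**series):
    """Align several count series on the same keys and convert each to percentages.

    A key absent from one series (e.g. a state with no contributors) counts as 0.
    Returns the sorted list of keys and a dict of percentage series by name.
    """
    keys = sorted(set().union(*(s.keys() for s in series.values())))
    shares = {
        name: as_percentages({key: counts.get(key, 0) for key in keys})
        for name, counts in series.items()
    }
    return keys, shares
```

The two returned values map directly onto the categories and bar heights of a grouped bar chart.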

The ACT is over-represented, of course, because of the holdings of the National Library itself. The under-representation of Queensland looks interesting — something to explore in the future.

My next graph used data on languages spoken at home in Australia from the 2011 census. It compared the population speaking those languages with the number of books in that language included in Trove, again as percentages of the total. It doesn’t embed very well, so view the full-size version on Plotly.

As I was playing around I noticed a tweet from Bridget Griffen-Foley:

Being in a quick-coding sort of mood I had to see how long it would take me to create a graph showing the numbers of daily newspapers in Trove (where daily is defined as more than 300 issues in a year). The answer was about fifteen minutes.
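The core of that quick graph is counting, for each year, the titles that clear the 300-issue threshold. Here is a sketch of just that step. It assumes the per-title issue counts have already been collected into a dict of `{title: {year: issues}}`, a hypothetical structure rather than the shape of any particular API response.

```python
from collections import Counter

# "Daily" means more than 300 issues in a year, as defined in the post.
DAILY_THRESHOLD = 300


def dailies_per_year(issue_counts, threshold=DAILY_THRESHOLD):
    """Count the titles with more than `threshold` issues in each year.

    `issue_counts` maps a title to {year: number_of_issues} (hypothetical shape).
    Years in which no title clears the threshold are omitted from the result.
    """
    per_year = Counter()
    for years in issue_counts.values():
        for year, issues in years.items():
            if issues > threshold:
                per_year[year] += 1
    return dict(sorted(per_year.items()))
```

Because the comparison is strictly greater than, a title with exactly 300 issues in a year is not counted as a daily for that year.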

All of the graphs are created using the web service Plotly. Plotly has an easy-to-use Python API which means all you need to do to create a graph is to add a few lines of code. There are other Python visualisation libraries, but I like Plotly because it creates something instantly shareable — perfectly suited to this sort of quick and dirty experimentation.
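Plotly figures can be described as plain dictionaries of `data` (the traces) and `layout`, which keeps the data-wrangling code independent of the plotting library. The sketch below builds a grouped bar chart that way. As an assumption about the reader's setup, it then renders the chart with the current `plotly` package to a standalone HTML file. In recent versions of the library the online upload functions live in the separate `chart_studio` package, so this is not the exact hosted workflow described above.

```python
def grouped_bar_figure(categories, series, title=""):
    """Describe a grouped bar chart as a Plotly-style figure dict.

    `series` maps a trace name (e.g. "Population") to {category: value};
    categories missing from a series are plotted as 0.
    """
    return {
        "data": [
            {
                "type": "bar",
                "name": name,
                "x": list(categories),
                "y": [values.get(category, 0) for category in categories],
            }
            for name, values in series.items()
        ],
        "layout": {
            "title": {"text": title},
            "barmode": "group",
            "yaxis": {"title": {"text": "% of total"}},
        },
    }


def save_figure(figure, filename):
    """Render the figure dict to an HTML file (requires `pip install plotly`)."""
    import plotly.graph_objects as go  # imported here so the rest works without plotly

    go.Figure(figure).write_html(filename)
```

Any `{category: percentage}` series, such as the state or language shares discussed above, can be passed in unchanged.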

I don’t think any of these graphs are particularly revealing, and I’ve made some assumptions about the data that probably wouldn’t hold up under scrutiny. But what this fiddling around emphasised was how an API and some simple tools make it possible to ask quick questions of the data.

All the code is in my Trove-Sketches repository on GitHub.