8 months on

This has been a rather lean year on the blogging front. So as 2013 nears its end, I thought I should at least try to list a few recent talks and experiments.

Things changed a bit this year. No more am I the freelance troublemaker, coding in lonely seclusion, contemplating the mysteries of cashflow. Reader, I got a job.

And not just any old job. In May I started work at the National Library of Australia as the Manager of Trove.

Trove, of course, has featured prominently here. I’ve screen-scraped, harvested, graphed and analysed it — I even built an ‘unofficial’ API. Last year the NLA rewarded my tinkering with a Harold White Fellowship. This year they gave me the keys and let me sit behind the wheel. Now Trove is not only my obsession, it’s my responsibility.

Trove is a team effort, and soon you’ll be meeting more of the people that keep it running through our new blog. I manage the awesome Trove Support Team. We’re the frontline troops — working with users, content partners and developers, and generally keeping an eye on things.

And so my working hours are consumed by matters managerial — attending meetings, writing reports, planning plans and answering emails. But, when exhaustion allows, I return to the old WraggeLabs shed on weekends and evenings and the tinkering continues…


TroveNewsBot is a Twitter bot whose birth is chronicled in TroveNewsBot: The story so far. Several times a day he posts a recently-updated newspaper article from Trove. But he also responds to your queries — just tweet some keywords at him and he’ll reply with the closest match. You can read the docs for hints on modifying your query.

TroveNewsBot also offers comment on web pages. Tweet him a url and he’ll analyse its content and search for something relevant amidst his database of more than 100 million newspaper articles. Every few hours he automatically checks the ABC News Just In page for the latest headlines and offers a historical counterpoint.

In Conversations between collections you can read the disturbing story of how TroveNewsBot began to converse with his fellow collection bots, DPLABot and now DigitalNZBot. The rise of the bots has begun…

I should say something more serious here about the importance of mobilising our collections — of taking them into the spaces where people already are. But I think that might have to wait for another day.

Build-a-bot workshop

You can never have too many bots. Trove includes the collections of many individual libraries, archives and museums — conveniently aggregated for your searching pleasure. So why shouldn’t each of these collections have its own bot?

It didn’t take much work to clean up TroveNewsBot’s code and package it up as the Build-a-bot workshop. There any Trove contributor can find instructions for creating their own code-creature, tweeting their resources to the world.

So far Kasparbot (National Museum of Australia) and CurtinLibBot (Curtin University Library) have joined the march of the bots. Hopefully more will follow!

TroveNewsBot Selects

Inspired by the British Library’s Mechanical Curator, TroveNewsBot decided to widen his field of operations to include Tumblr. There at TroveNewsBot Selects he posts a new random newspaper illustration every few hours.



Unfortunately being newly-employed meant that I had to give up my place at One Week | One Tool. The team created Serendip-o-matic, a web tool for serendipitous searching that used the DPLA, Europeana and Flickr APIs. But while I missed all the fun, I could at least jump in with a little code. Within a day of its launch, Serendip-o-matic was also searching Trove.

Research Trends

This was a quick hack for my presentation at eResearch2013 — I basically just took the QueryPic code and rewired it to search across Australian theses in Trove. What I ended up with was a simple way of exploring research trends in Australia from the 1950s.

‘history AND identity’ vs ‘history AND class’

Some of the thesis metadata is a bit dodgy (we’re looking into it!) so I wouldn’t want to draw any serious conclusions, but I think it does suggest some interesting possibilities.

Trove API Console

As a Trove API user I’ve always been a bit frustrated about the inability to share live examples because of the need for a unique, private key. Europeana has a great API Console that lets you explore the output of API requests, so I thought I’d create something similar.

My Trove API Console is very simple at the moment. You just feed it API requests (no key required) and it will display nicely-formatted responses. You can also pass the API request as a query parameter to the console, which means you can create easily shareable examples. Here’s a request for wragge AND weather in the newspapers zone.
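Building one of those shareable examples is just a matter of URL-encoding the API request into the console’s address. A minimal sketch (the console address and the query parameter name here are illustrative, so check them against the console itself):

```python
import urllib.parse

CONSOLE_URL = "http://troveconsole.herokuapp.com/"  # illustrative address

def shareable_example(api_request):
    """Wrap a raw Trove API request in a console URL you can pass around."""
    return CONSOLE_URL + "?" + urllib.parse.urlencode({"api": api_request})
```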

This is also my first app hosted on Heroku. Building and deploying with Flask and Heroku was intoxicatingly easy.

Trove Zone Explorer

Yep, I finally got around to playing with d3. Nothing fancy, but once I’d figured out how to transform the faceted format data from Trove into the structure used by many of the d3 examples, I could easily create a basic treemap and sunburst.
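The transformation itself is mostly a matter of renaming keys and recursing through nested facet terms. A rough Python sketch, assuming facet terms carry ‘display’, ‘count’ and a nested ‘term’ list (check the field names against the API’s actual JSON):

```python
def facet_to_node(term):
    """Convert one Trove facet term (and any nested sub-terms) into the
    {"name": ..., "children": [...]} / {"name": ..., "size": ...} shape
    that d3's treemap and sunburst layouts expect."""
    node = {"name": term["display"]}
    nested = term.get("term")
    if nested:
        # Formats like 'Book' nest their sub-formats in a 'term' list.
        node["children"] = [facet_to_node(t) for t in nested]
    else:
        # Leaf nodes carry the result count as their size.
        node["size"] = int(term["count"])
    return node

def zone_to_hierarchy(zone_name, facet_terms):
    """Wrap a whole zone's facet terms as the root of the d3 hierarchy."""
    return {"name": zone_name, "children": [facet_to_node(t) for t in facet_terms]}
```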


The sunburst visualisation was pretty nice and I thought it might make a useful tool for exploring the contents of Trove’s various zones. After a bit more fiddling I created a zoomable version that automatically loads a random sample of resources whenever you click on one of the outer leaves — the Trove Zone Explorer was born.

Trove Collection Profiler

As mentioned above, Trove is made up of collections from many different contributors. For my talk at the Libraries Australia Forum I thought I’d make a tool that let you explore these collections as they appear within Trove.

The Trove Collection Profiler does that, and a bit more. Using filters you define a collection by specifying contributors, keywords, or a date range. You can then explore how that collection is distributed across the Trove zones — viewing the results over time as a graph, or drilling down through format types using another zoomable sunburst visualisation. As a bonus you get shareable urls to pass around your profiles.

The latest sunburst-enabled version is fresh out of the shed and badly in need of documentation. I’m thinking of creating embeddable versions, so that institutions can create visualisations of their own collections and include them in their sites.


Somewhere in amongst the managering and the tinkering I gave a few presentations:

Conversations with collections

Notes from a talk I gave at the Digital Treasures Symposium, 21 June 2013, University of Canberra.

Over the last couple of weekends I’ve been building a bot. Let me introduce you to the TroveNewsBot.


TroveNewsBot is just a simple script that periodically checks for messages from Twitter, uses those messages to create queries in Trove’s newspaper database, and tweets back the results.
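In outline, the reply loop is easy to sketch. A minimal, hypothetical version (the function names and parameters are mine, not the bot’s actual code, though the API endpoint and the nla.news-article URL pattern are real):

```python
import urllib.parse

API_URL = "http://api.trove.nla.gov.au/result"

def build_query(tweet_text, api_key="YOUR_API_KEY"):
    """Turn the text of an incoming tweet into a Trove newspaper search."""
    # Drop @mentions; everything else is treated as keywords.
    keywords = " ".join(w for w in tweet_text.split() if not w.startswith("@"))
    params = {
        "zone": "newspaper",   # search the digitised newspapers only
        "q": keywords,
        "n": 1,                # we only want the closest match
        "encoding": "json",
        "key": api_key,
    }
    return API_URL + "?" + urllib.parse.urlencode(params)

def format_reply(user, article):
    """Compose the reply tweet, linking back to the article on Trove."""
    heading = article["heading"][:90]  # leave room for the link
    link = "http://nla.gov.au/nla.news-article{}".format(article["id"])
    return "@{} {} {}".format(user, heading, link)
```

The real bot wraps this in a loop that polls Twitter for new mentions, fetches the top result from the API, and posts the formatted reply.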

TroveNewsBot’s birth was, however, not without some pain. I ran into difficulty with Twitter’s automated spam police. At one stage, every time my bot tweeted, its Twitter account was suspended.

Twitter’s bots didn’t like my bot. :(

The problem has since been resolved — I think I must have done something when I was testing that upset the spam bots — but it did lead me to read in detail Twitter’s policies on spam and automation. This sentence in particular caused me to reflect:

The @reply and Mention functions are intended to make communication between users easier, and automating these processes in order to reach many users is considered an abuse of the feature.

So what is a user and what is communication? I read this sentence as suggesting that communications between individual human users were somehow more real, more authentic than automatically generated replies. But is a script tweeting someone a link to a newspaper article that they might be interested in really less authentic than a lot of the human-generated traffic on the net?

Amongst the messages I received when I revealed TroveNewsBot to the world earlier this week was this:

And later from the same person:


Even as we live an increasing amount of our lives ‘connected’, still there remains a tendency to assume that experiences mediated through online technologies are somehow less authentic than those that take place in this space that we often refer to as ‘the real world’.

In the realm of cultural heritage, digitisation is frequently assumed to be a process of loss. We create surrogates, or derivatives — useful, but somehow inferior representations of ‘the real thing’.

Now let’s just all admit that, yes, we like the smell of old books, and that we can’t read in the bath with our iPad, and move beyond the sort of fetishism that often accompanies these sorts of discussions.

Yes, of course, digital and physical manifestations are different, the point is whether we get anywhere by arguing that one is necessarily inferior to the other.

A recent article in the Times Literary Supplement expressed concern at the money being spent on the manuscript digitisation programs that the author argued were ‘proceeding unchecked and unfocused, deflecting students into a virtual world and leaving them unequipped to deal responsibly with real rare materials’. Yes, there may be aspects of a physical page that a digital copy cannot represent, but as Alistair Dunning pointed out in response to the article, there’s no simple binary opposition:

The digital does not replace the analogue, but augments it, sometimes in dramatic and sometimes in subtle ways.

In his keynote address to the ‘Digital Transformers’ conference, Jim Mussell similarly argued against a simplistic understanding of digital deficiencies.

The key is to reconceive loss as difference and use the way the transformed object differs to reimagine what it actually was. Critical encounters with digitized objects make us rethink what we thought we knew.

I’m very pleased to be an adjunct here at the University of Canberra, but I’ve always felt a bit of a fraud around people like Mitchell when it comes to talking about visualisation. I’m actually much more comfortable with words than pictures. So why am I here talking to you today?

I think it’s because what we’re discussing today, what the Digital Treasures Program is about, is not just visualisation. It’s about transformation. It’s about taking cultural heritage collections and changing them. Changing what we can do with them. Changing how we see them. Changing how we think about them.

It’s about creating spaces within which we can have ‘critical encounters with digitized objects’ that ‘make us rethink what we thought we knew’.

And that to me is very exciting.

What might these transformations look like? Who knows? This is research, it should take us places we don’t expect and can’t predict.

However, for the sake of convenience today I’ve tried to define a few possible categories — most, admittedly, based on my own work. But I do so in the hope that the achievements of the Digital Treasures program will soon make my categories look ridiculously inadequate.


When we have stuff in digital form — and by stuff I mean both collection metadata and digital objects — we can isolate particular characteristics and add them up, compare them, graph them. We can start to see patterns that we couldn’t see before.


Putting a lot of similar things together in a way that enables us to see them differently.


Putting different things together in a way that enables us to find connections or similarities.


Displaying something unexpected or random.


Putting things in new contexts, new conceptual spaces, new physical spaces, new geospatial spaces. Creating interventions and explorations.

This is a very limited catalogue of possibilities, but meagre as it is I think it’s enough to demonstrate that the overwhelming feature of digital cultural collections is not loss or deficiency, but opportunity and inspiration.

In fact I’m less worried about the deficiencies of digital representation than I am about the possibility that we might end up doing too much — that we might become so skilled in design and transformation that we end up overdetermining the experience of our users, that we end up doing too much of the thinking for them.

It seems to me that when it comes to digital cultural collections an important part of the transformation process is knowing where to leave the gaps and spaces that invite feeling, reflection and critique. We have to find ways of representing what is missing, of acknowledging absence and exclusion. We have to be able to expose our arguments and assumptions, to be honest about our failures and limitations. We have to be prepared to leave a few raw edges, some loose threads that encourage users to unravel our carefully-woven tapestries.

As I was developing the TroveNewsBot I realised I needed some sort of avatar. So of course I started searching in the Trove newspaper database for robots — there I found Mr George Robot.

The Courier-Mail, 7 November 1935, page 21

George Robot, ‘described as the greatest electro-mechanical achievement of the age’ toured Australia in 1935 and 1936. As one newspaper described, he:

rises and sits as requested, talks, sings, delivers an address on the most abstruse topics, gnashes his electric teeth in rage or derision, while he accentuates his remarks by the most natural movements of arms and hands

But it wasn’t just George’s technical sophistication that inspired comment. Articles also appeared that described George’s love for Mae West and his admiration for Hitler.

Robots provide us with an opportunity not just to marvel at their technological wizardry, but also to think about what it really is to be human.

In the same way, as we start to have new types of conversations with online collections, to explore their many-faceted personalities, we will of course be exploring ourselves.

The digital transformation of cultural collections is not about showcasing technology but about creating new online spaces in which we can simply be human.

‘A map and some pins': open data and unlimited horizons


This is the text of my keynote address to the Digisam conference on Open Heritage Data in the Nordic Region held in Malmö on 25 April 2013. You can also view the video and slides of my talk, or enjoy the full interactive experience by playing around with my text/presentation in all its LOD-powered glory. (Use a decent browser.)

The Australian poet and writer Edwin James Brady and his family lived for many years in the isolation of far eastern Victoria — in a little town called Mallacoota.

Edwin James Brady (NLA: nla.pic-vn3704359)

Here, from time to time, Brady amused himself by taking a map of Australia down from the wall and sticking pins in it. The pins, Brady explained in 1938, included labels such as ‘Hydro-electric supply base’, ‘Irrigation area’, and ‘Area for tropical settlement’. The map and its pins were one expression of Brady’s life-long obsession with Australia’s potential for development — for progress.

Maps and pins are probably more familiar now than they were in Brady’s time. We use them routinely for sharing our location, for plotting our travel, for finding the nearest restaurant. Maps and pins are one way that we document, understand and express our relationship to space.

Brady, however, was interested in using his pins to highlight possibilities. In the late nineteenth and early twentieth centuries size mattered. With the nations of Europe jostling for land and colonial possessions, space became an index of power. When the Australian colonies came together in 1901 to form a nation, maps and spatial calculations abounded. Australia was big and so its future was filled with promise.

Australia was big with promise.

In his travels around Australia, EJ Brady started to catalogue ways in which its vast, ‘empty’ spaces might be turned to productive use. A hardy yeomanry armed with the latest science could transform these so-called ‘wastes’, and Brady was determined to bring these opportunities to the attention of the world.

This evangelical crusade reached its high point in 1918, with the publication of his huge compendium, Australia Unlimited — 1139 pages of ‘Romance History Facts & Figures’.

National Archives of Australia: A659, 1943/1/3907, page 135

Space may no longer be invested with the same sense of swelling power, but our maps and pins still figure in calculations of progress. Now it is the data itself that awaits exploitation. Our carefully plotted movements, preferences and passions may well have value to governments, planners or advertisers. Data, according to some financial pundits, ‘is the new oil’.

Whereas Brady travelled the land documenting its untapped riches, we can take his work and life and mine it for data — looking for new patterns and possibilities for analysis.

Brady wasn’t the first to use the phrase ‘Australia Unlimited’, though he did much to make it familiar. By exploring the huge collection of digitised newspapers available through the National Library of Australia’s discovery service, Trove, we can track the phrase through time.


Brady was a skilled self-publicist and his research trips in 1912 were eagerly reported by the local press such as the Barrier Miner in Broken Hill and the Cairns Post.

In 1918 the book was published, receiving a generally positive reception — as an advertising leaflet showed, even King George V thought the book was ‘of special interest’. In 1926, a copy of the book was presented to a visiting delegation from Japan.

NAA: A659, 1943/1/3907, page 55

Over the years Brady sought to build on his modest successes, planning a variety of new editions and even a movie. But while his hopes were thwarted, the phrase itself lived on.

In 1938 and 1952, there are references to a radio program called ‘Australia Unlimited’, apparently featuring young musical artists. Also in 1952 came the news that the author of Australia Unlimited, EJ Brady, had died in Mallacoota at the age of 83.

Sydney Morning Herald, 23 July 1952

Unfortunately copyright restrictions bring this analysis to a rather unnatural end in 1954. If we were able to follow it through until the present, we could see that from 1957 the phrase was used by a leading daily newspaper as the title of an annual advertising supplement detailing Australia’s possibilities for development. In 1958, it was adopted as a campaign slogan by Australia’s main Conservative party. In 1999 it was resurrected again by a major media company for a political and business symposium. Even now it provides the focus for a ‘branding’ initiative supported by the federal government.

Graphs like this are pretty easy to interpret, but of course we should always ask how they were made. In this case I simply harvested the number of matching search results from the Trove newspaper database for each year. The tool I built to do this has been through several different versions and is now a freely-accessible web application called QueryPic. Anyone can quickly and easily create this sort of analysis just by entering a few keywords.
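Under the hood that just means one date-limited API request per year, asking for zero records so only the total comes back. A simplified sketch, not QueryPic’s actual code (the l-decade/l-year facet parameters follow the Trove API’s conventions, but treat the details as illustrative):

```python
import urllib.parse

API_URL = "http://api.trove.nla.gov.au/result"

def year_query(query, year, api_key="YOUR_API_KEY"):
    """Build a request for the number of newspaper articles matching
    `query` in a single year (n=0: counts only, no records)."""
    params = {
        "zone": "newspaper",
        "q": query,
        "l-decade": str(year)[:3],  # Trove facets dates by decade...
        "l-year": str(year),        # ...then by year within the decade
        "n": 0,
        "encoding": "json",
        "key": api_key,
    }
    return API_URL + "?" + urllib.parse.urlencode(params)

# To harvest a whole series you'd fetch year_query(q, y) for each year
# in your range and read the total from the JSON response.
```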

QueryPic is one product of my experiments with the newspaper database over the last couple of years. I’ve also looked at changes to content of front pages, I’ve created a combination discovery interface and fridge poetry generator, and I’ve even built a simple game called Headline Roulette — which I’m told is strangely addictive.

All of these tools and experiments take advantage of Trove’s API. I think it’s important to note that the delivery of cultural heritage resources in a machine-readable form, whether through a custom API or as Linked Open Data, provides more than just improved access or possibilities for aggregation. It opens those resources to transformation. It empowers us to move beyond ‘discovery’ as a mode of interaction to analyse, extract, visualise and play.

Using freely available tools we can extract named entities from a text, we can look for topic clusters across a collection of documents, we can find places and pin them to a map. With a little bit of code I can take the newspaper reports of Brady’s travels in 1912 and map them. With a bit more time I could take another of Brady’s travel books, River Rovers, available in digitised form through the Internet Archive, and plot his journey along Australia’s longest river, the Murray.

Such transformations help us see resources in different ways — we can find new patterns, new problems, new questions. But transformation is a lossy business. I can put a pin in a map to show that Brady stopped off in Mildura on his voyage along the Murray. What is much harder to represent are the emotions that surrounded that visit. While he was there Brady received news of a friend’s death. ‘Bad news makes hateful the most pleasant place of abiding’, he wrote mournfully, ‘I strained to open the gate to go forth again into a wilderness of salt bush and sere sand’. Travel can be a form of escape.

Digital humanist and designer Johanna Drucker has written about the problems of representing the human experience of time and space using existing digital tools. ‘If I am anxious’, she notes, ‘spatial and temporal dimensions are distinctly different than when I am not’. We do not experience our journeys in a straightforward linear fashion — as the accumulation of metres and minutes. We invest each footstep with associations and meanings, with hopes for the future and memories of the past. Drucker calls on humanities scholars to articulate these complexities and work towards the development of new techniques for modelling and visualising our data.1

‘If human beings matter, in their individual and collective existence, not as data points in the management of statistical information, but as persons living actual lives, then finding ways to represent them within the digital environment is important.’2

In a similar way, Australia Unlimited is not just a catalogue of potentialities, or a passionate plea for national progress. It’s also the story of a struggling poet trying to find some way of supporting his family. Proceeds from the book enabled Brady to buy a plot of land in Mallacoota and build a modest home — I suspect that it wasn’t the same home that was painted by his wife in the 1950s.


But even this small success was undermined. Distribution of the book was beset with difficulties and disappointments and, despite all his plans, financial security remained elusive.

Brady’s youngest daughter, Edna June, was born to his third wife Florence in 1946. What could he leave her? A ‘modern edition’ of Australia Unlimited lay completed but unpublished. ‘If it fails to find a publisher’, he remarked wistfully, ‘the MSS will be a liberal education for her after she has outgrown her father’s nonsense rhymes’. It was, he pondered, ‘a sort of heritage’.

One of the things I love about being a historian is that the more we focus in on the past the more complicated it gets. People don’t always do what we expect them to, and that’s both infuriating and wonderful.

Likewise, while we often have to clean up or ‘normalise’ cultural heritage data in order to do things with it, we should value its intrinsic messiness as a reminder that it is shot through with history. Invested with the complexities of human experience it resists our attempts at reduction, and that too is both infuriating and wonderful.

The glories of messiness challenge the extractive metaphors that often characterise our use of digital data. We’re not merely digging or mining or drilling for oil, because each journey into the data offers new possibilities — our horizons are opened, because our categories refuse to be closed. These are journeys of enrichment, interpretation and creation, not extraction.

We’re putting stuff back, not taking it out.

Cultural institutions have an exciting opportunity to help us work with this messiness. The challenge is not just to pump out data, anyone can do that. The challenge is to enrich the contexts within which we meet this data — to help us embrace nuance and uncertainty; to prevent us from taking the categories we use for granted.

For all its exuberant optimism, a current of fear ran through Australia Unlimited. The publisher’s prospectus boldly announced that it was a ‘Book with a Mission’. ‘A mere handful of White People’, perched uncomfortably near Asia’s ‘teeming centres of population’, could not expect to maintain unchallenged ownership of the continent and its potential riches. Australia’s survival as a white nation depended upon ‘Effective Occupation’, secured by a dramatic increase in population and the development of its vast, empty lands — ‘The Hour of Action is Now!’.

National Archives of Australia: A659, 1943/1/3907, page 208

In 1901, one of the first acts of the newly-established nation of Australia was to introduce legislation designed to keep the country ‘white’. Restrictions on immigration, administered through a complex bureaucratic system, formed the basis of what became known as the White Australia Policy.

While the legislation was designed to keep non-white people out, an increase of the white population was seen as essential to strengthen security and legitimise Australia’s claim to the continent. Australia Unlimited was an exercise in national advertising aimed at filling the unsettling emptiness with sturdy, white settlers.

But White Australia was always a myth. As well as the indigenous population there were, in 1901, many thousands of people classified as non-white living in Australia. They came from China, India, Indonesia, Turkey and elsewhere. A growing number had been born in Australia. They were building lives, making families and contributing to the community.

Here are some of them…

The real face of White Australia

The real face of White Australia

I built this wall of faces using records held by the National Archives of Australia. If a non-white person resident in Australia wanted to travel overseas they needed to carry special documents. Without them they could be prevented from re-entering the country — from coming home. Many, many thousands of these documents are preserved within the National Archives.

Kate Bagnall, a historian of Chinese-Australia, and I are exploring ways of exposing these records through an online project called Invisible Australians.

To build the wall I downloaded about 12,000 images from the Archives’ website — representing just a fraction of one series relating to the administration of the White Australia Policy. Unfortunately there’s no machine-readable access to this data, so I had to resort to cruder means — reverse-engineering interfaces and screen-scraping.

Once I had the images I ran them through a facial detection script to find and crop out the portraits. What we ended up with was a different way of accessing those records — an interface that brings the people to the front; an interface which is compelling, discomfiting, and often moving.
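The detection itself came from a standard facial-detection library (OpenCV’s Haar cascade detector is the usual choice). The only step worth sketching here is the crop around each detected rectangle, which needs to be expanded a little so the portrait keeps some hat and collar (the margin value is illustrative, not what I actually used):

```python
def crop_box(x, y, w, h, img_w, img_h, margin=0.4):
    """Expand a detected face rectangle (x, y, w, h) by a margin on each
    side, clamped to the image bounds, so the crop isn't too tight."""
    dx, dy = int(w * margin), int(h * margin)
    left, top = max(0, x - dx), max(0, y - dy)
    right, bottom = min(img_w, x + w + dx), min(img_h, y + h + dy)
    return left, top, right, bottom
```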

The wall of faces also raises interesting questions about context. Some people might be concerned by the loss of context when images are presented in this way, although each portrait is linked back to the document it was derived from, and to the Archive’s own collection database. What is more important, I think, are the contexts that are gained.

If you’re viewing digitised files on the National Archives’ own website, you can only do so one page at a time. Each document is separate and isolated. What changes when you can see the whole of the file at once? I’ve built another tool that lets you do just that with any digitised file in the Archives’ collection. You see the whole as well as the parts. You have a sense of the physical and conceptual shape of the file.

National Archives of Australia: ST84/1, 1908/471-480

In the case of the wall of faces, bringing the images together, from across files, helps us understand the scale of the White Australia Policy and how it impacted on the lives of individuals and communities. These were not isolated cases, these were thousands of ordinary people caught up in the workings of a vast bureaucratic system. The shift of context wrought by these digital manipulations allows us to see, and to feel, something quite different.

And we can go the other way. In another experiment I created a userscript to insert faces back into the Archives’ website. A userscript is just a little bit of code that rewrites web pages as they load in your browser. In this case the script grabs images relating to the files that you’re looking at from Invisible Australians.

So instead of this typical view of search results.



You see something quite different.



Instead of just the record metadata for an individual item, you see that there are people inside.

We also have to remember that the original context of these records was the administration of a system of racial surveillance and exclusion. The Archives preserves not only the records, but the recordkeeping systems that were used to monitor people’s movements. The remnants of that power adhere to the descriptive framework. There is power in the definition of categories and the elaboration of significance.

Thinking about this I came across Wendy Duff and Verne Harris’s call to develop ‘liberatory standards’ for archival description. Standards, like categories, are useful. They enable us to share information and integrate systems. But standards also embody power. How can we take advantage of the cooperative utility of standards while remaining attuned to the workings of power?

A liberatory descriptive standard, Duff and Harris argue: ‘would posit the record as always in the process of being made, the record opening out of the future. Such a standard would not seek to affirm the keeping of something already made. It would seek to affirm a process of open-ended making and re-making’.3

‘Holes would be created to allow the power to pour out.’

‘Making and re-making’ — sounds a lot like the open data credo of ‘re-use and re-mix’ doesn’t it? I think it’s important to carry these sorts of discussions about power over into the broader realm of open data. After all, open data must always, to some extent, be closed. Categories have been determined, data has been normalised, decisions made about what is significant and why. There is power embedded in every CSV file, arguments in every API.

This is inevitable. There is no neutral position. All we can do is encourage re-use of the data, recognising that every such use represents an opening out into new contexts and meanings. Beyond questions of access or format, data starts to become open through its use. In Duff and Harris’s words, we should see open data ‘as always in the process of being made’.

What this means for cultural institutions is that the sharing of open data is not just about letting people create new apps or interfaces. It’s about letting people create new meanings. We should be encouraging them to use our APIs and LOD to poke holes in our assumptions to let the power pour out.

There’s no magic formula for this beyond, perhaps, building confidence and creating opportunities. But I do think that Linked Open Data offers interesting possibilities as a framework for collaboration and contestation — for making and challenging meanings.

We tend to think about Linked Open Data as a way of publishing — of pushing our data out. But in fact the production and consumption of Linked Open Data are closely entwined. The links in our data come from re-using identifiers and vocabularies that others have developed. The linked data cloud grows through a process of give and take, by many small acts of creation and consumption.

There’s no reason why that process should be confined to cultural institutions, government departments, business, or research organisations. Linked Open Data enables any individual to talk about what’s important to them, while embedding their thoughts, collections, passions or obsessions within a global conversation. By sharing identifiers and vocabularies we create a platform for communication. Anyone can join in.

So, if we want people to engage with our data, perhaps we need to encourage them to create their own.

I’ve just been working on a project with the Mosman public library in Sydney aimed at gathering information about the experiences of local servicepeople during World War One. There are many such projects happening around the world at the moment, but I think ours is interesting in a couple of ways. The first is a strong emphasis on linking things up.

There are records relating to Australian service people available through the National Archives, the Australian War Memorial, and the Commonwealth War Graves Commission, but there are currently no links between these databases. I’ve created a series of screen scrapers that allow structured data to be easily extracted from these sources. That means that people can, for the first time, search across all these databases in one hit. It’s a very simple tool that I started coding to ease the boredom of a long bus trip — but it has proved remarkably popular with family historians.

Once you’ve found entries in these databases, you can just cut and paste the URL into a form on the Mosman website and a script will retrieve the relevant data and attach it to the record of the person you’re interested in. Linking to a service record in the National Archives, for example, will automatically create entries for the person’s birthplace and next-of-kin.
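The first step in that process, pulling an identifier out of a pasted URL, is straightforward. A minimal sketch, assuming a hypothetical RecordSearch-style URL that identifies the item with a 'Barcode' query parameter:

```python
from urllib.parse import urlparse, parse_qs

def extract_barcode(url):
    """Pull a record identifier out of a pasted catalogue URL.

    Assumes a hypothetical RecordSearch-style scheme where the item
    is identified by a 'Barcode' query parameter.
    """
    params = parse_qs(urlparse(url).query)
    values = params.get('Barcode')
    return values[0] if values else None
```

With an identifier in hand, the script can go off and fetch the file metadata, check related databases, and build the linked entries.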

The process of linking builds structures, and these structures will themselves all be available as Linked Open Data. Even more exciting is that the links will not only be between the holdings of cultural institutions. The stories, memories, photographs and documents that people contribute will also be connected, providing personal annotations on the official record.

None of this is particularly hard, it’s just about getting the basics right. Remembering that structure matters and that links can have meaning. It’s also about recognising that ‘crowd sourcing’ or user-generated content can be made anywhere. Using Linked Open Data people can attach meanings to your stuff without visiting your website. Through the process of give and take, creation and consumption, we can build layers of description, elaboration, and significance across the web.

What excites me most about open cultural data is not the possibility of shiny new apps or collection visualisations, but the possibility of doing old things better. The possibility of reimagining the humble footnote, for example, as a re-usable container of structured contextual data — as a form of distributed collection description. The possibility of developing new forms of publication that immerse text and narrative within a rich contextual soup brewed from the holdings of cultural institutions.

I want every online book or article to be a portal. I want every blog or social media site to be a collection interface.

What might this look like? Perhaps something like this presentation.

My slides today are embedded within an HTML document that incorporates all sorts of extra goodies. The full text of my talk is here, and as you scroll through it you’ll see more information about the people, places and resources I mention pop up in the sidebar. Alternatively you can explore just the people or the resources, looking at the connections between them and the contexts in which they’re mentioned within my text.

This is part of an ongoing interest in exploring different ways of presenting historical narrative — ways that build a relationship between the text and the structured data that underlies the story.

All of the structured data is available in machine-readable form as RDF — it is, in itself, a source of LOD. In fact the interface is built using an assortment of JavaScript libraries that read the RDF into a little temporary triplestore, and then query it to create the different views. So the whole thing is really powered by LOD.
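The browser version relies on JavaScript libraries, but the underlying idea, loading triples into a small store and querying them by pattern to build each view, can be sketched in Python (a toy store for illustration only, with made-up example triples):

```python
class TinyTripleStore:
    """A toy in-memory triplestore: hold (subject, predicate, object)
    tuples and query them by pattern, with None as a wildcard."""

    def __init__(self):
        self.triples = []

    def add(self, s, p, o):
        self.triples.append((s, p, o))

    def query(self, s=None, p=None, o=None):
        return [t for t in self.triples
                if (s is None or t[0] == s)
                and (p is None or t[1] == p)
                and (o is None or t[2] == o)]

store = TinyTripleStore()
store.add('ex:brady', 'rdf:type', 'foaf:Person')
store.add('ex:brady', 'foaf:name', 'E.J. Brady')
store.add('ex:mallacoota', 'rdf:type', 'geo:Place')

# Build a 'people' view by querying for everything typed as a Person.
people = store.query(p='rdf:type', o='foaf:Person')
```

Each view — people, resources, timeline, map — is just a different query against the same little store.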

It’s still very much an experiment, but I think it raises some interesting possibilities for thinking about how we might consume and create LOD simply by doing what we’ve always done — telling stories.

EJ Brady’s dreams were never realised. Australia’s vast spaces remained largely empty, and the poet continued to wrestle with personal and financial disappointment. ‘After nearly eight decades association with the man’, Brady wrote of himself in 1949, ‘I have come to look upon him as the most successful failure in literary history’. This energetic booster of Australia’s potentialities was well aware of his own life’s mocking irony. ‘He has not… made the wages of a wharf laborer out of book writing yet he persists in asserting Australia is the best country in the world!’.

But still Brady continued to add pins to his map. ‘For half a century I’ve been heaping up notes, reports, clippings, pamphlets, etc. on… all phases of the country’s life and development’. ‘What in hell I accumulate such stuff for I don’t know’, he complained in 1947. As the elderly man surveyed the ‘bomb blasted pile of rubbish’ strewn about his writing tent, he admitted that ‘this collecting is a sort of mania’.

Brady’s map and pins told a complex story of hope and disappointment, of confidence and fear. A story that combined national progress with an individual’s attempts merely to live.

There are stories in our data too — complex and contradictory stories full of emotion and drama, disappointment and achievement, violence and love. Let’s find ways to tell those stories.

  1. Johanna Drucker, ‘Humanistic Theory and Digital Scholarship’, in Matthew K. Gold (ed.), Debates in the Digital Humanities, University of Minnesota Press, 2012.
  2. Johanna Drucker, ‘Representation and the digital environment: Essential challenges for humanists’, University of Minnesota Press Blog, http://www.uminnpressblog.com/2012/05/representation-and-digital-environment.html
  3. Wendy M. Duff and Verne Harris, ‘Stories and Names: Archival Description as Narrating Records and Constructing Meanings’, Archival Science, vol. 2, 2002, pp. 263–285.

Exposing the archives of White Australia

I recently gave a presentation in the Institute of Historical Research’s Digital History Seminar series. The time difference between London and Canberra was a bit of a challenge, so I pre-recorded the presentation and then sat in my own Twitter backchannel while it played. For the full podcast information go to HistorySPOT. You can also play with my slides or peruse the #dhist Twitter archive.

Exposing the Archives of White Australia from History SPOT on Vimeo.

Bus trips and building

Last week I took my daughter to Sydney so she could attend a girls-only Minecraft workshop at the Powerhouse Museum (they created some wonderful things). It was a 3½ hour bus journey each way, so to keep myself occupied I set myself the challenge of trying to build something en route. I made a fair bit of progress, but ultimately failed. I had to steal a few extra hours this week to get it to the point where people might find it useful.

The Australian WWI Records Finder

So here it is — a (sort of) aggregated search interface to records about Australian First World War service personnel. Give it a name and it will search the National Archives’ RecordSearch, the Australian War Memorial’s databases, and the Commonwealth War Graves Commission.

It’s ‘sort-of’ aggregated because it’s really just a series of separate searches presented on the one page. But even this should make it easier for people to match up records across the different data sets.


Type in a family name and, optionally, a given name or a service number. Hit search. Wait. Wait a bit more. The National Archives’ RecordSearch database can often be pretty slow. Eventually though, each of the databases will be queried in turn and the results added to the page.

Once the results have loaded, click on a title and the little spinny thing will start up again as more details are retrieved from the database. In this ‘detail’ view, all the other results from the database are hidden. This makes it a bit easier to compare records across databases. Just click on the title again to go back to the ‘list’ view.

If your search returns lots of results, you can use the ‘next’ and ‘previous’ links to explore the complete set. They’ll all load in the current page via the magic of AJAX.

It’s not obvious from the interface, but you can feed query parameters directly via the url. For example try http://wraggelabs.com/ww1-records/?family_name=wragge. Why is this useful? Perhaps you’ve got your own database of names on the web. Using this you could easily create links from each name that looked for relevant records in the Finder.
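Building such links is just a matter of URL encoding. A sketch (only family_name appears in the example above; the other parameter names are guesses):

```python
from urllib.parse import urlencode

BASE_URL = 'http://wraggelabs.com/ww1-records/'

def finder_url(family_name, other_names=None, service_number=None):
    """Build a link that opens the Finder with a search pre-filled."""
    params = {'family_name': family_name}
    if other_names:
        params['other_names'] = other_names
    if service_number:
        params['service_number'] = service_number
    return '{}?{}'.format(BASE_URL, urlencode(params))
```

A database of names could generate one of these links per person, so each entry points straight at its possible service records.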

That’s about it. It’s just a quick, bus-trip-inspired experiment, so there are many limitations and future possibilities.



I’m just using the standard search interfaces of the various databases and screen-scraping the results. Unfortunately they all work slightly differently. For example, the AWM databases don’t distinguish between family names and given names, so if you search for the family name ‘Smith’ you’ll also get results like ‘Jones, Bruce Smith’. The CWGC database, on the other hand, will only match an ‘other name’ if it comes first, while RecordSearch (or more strictly NameSearch) will also match the names of next-of-kin. Fun fun fun.
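Where the matching is looser than you’d like, results can always be tightened up after the fact. A minimal sketch, assuming each result is a simple dict with the name in ‘Family, Given’ order (an illustrative shape, not the scrapers’ actual output):

```python
def strict_family_matches(results, family_name):
    """Filter out results where the family name only appears among
    the given names (e.g. 'Jones, Bruce Smith' for a 'Smith' search)."""
    matches = []
    for result in results:
        family = result['name'].split(',')[0].strip().lower()
        if family == family_name.lower():
            matches.append(result)
    return matches

results = [{'name': 'Smith, John'}, {'name': 'Jones, Bruce Smith'}]
```

It’s a blunt instrument, but it shows how a thin layer of post-processing can smooth over the quirks of the underlying search engines.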

I figure anything is better than nothing, but if you’re not getting the results you expect head off to the original interfaces and try your luck there. I’m making no promises.

You’ll also notice that the maximum number of results for each data source varies. The CWGC returns 15 results, while the AWM hands over a whopping 50. These are just the default settings for the original search engines. I could’ve fiddled with the settings, but it didn’t really seem worth it.

And oh yeah… screen scraping… inherently fragile… might fall over and die at any minute.


As you may have guessed from previous posts, I rather like making connections. This experiment grew out of the work I’m doing on the ‘Doing Our Bit’ project with the Mosman Library. I’ve been building a series of forms that will make it easy for contributors to link people in the Mosman project to any of these databases. Just paste in a url from RecordSearch and the system will automagically retrieve all the file metadata and also check for an entry in Mapping our Anzacs. It’s pretty nifty. But of course it made me think about having a way to search across all these different databases.

And then what?

Having found a series of records for an individual it would be good if they could then be permanently linked. If I had the time and money to do more work on this, I’d want to allow people to save the connections they find. And of course then expose these connections as Linked Open Data. It wouldn’t be difficult.

There’s probably also a lot more that could be done with machine matching of records. Perhaps someone’s already working on this for the centenary — it seems like an obvious point of attack. It would be good if the forthcoming centenary commemorations resulted in something that brought all these datasets together and exposed identifiers that could be easily used by community projects like ‘Doing Our Bit’.


Yes, I cheated. I had already done a lot of work on the screen-scrapery bits of this before the bus trip. I’ve been working on a RecordSearch client on and off for a while to use with projects like Invisible Australians, and I wrote the AWM and CWGC scrapers for ‘Doing Our Bit’. Feel free to grab the code and play.

The actual application was built using the Python micro-framework Flask. I’m a big fan of Django, but there’s a lot of overhead involved if you just want to throw together a simple app. I’ve been wanting to try Flask for a while and was pleased to find just how quick and fun it was to get something up and running.

To make the whole thing as responsive as possible, the search results are retrieved using AJAX calls to simple APIs I built in Flask on top of my screen scraper code. There’s actually very little code in the Flask app itself. The downside of this is that the Javascript is a bit of a mess. Ah well.
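The division of labour, with Flask serving a small JSON endpoint per data source for the page to query via AJAX, might be sketched like this (the route and the scraper function are placeholders, not the app’s actual code):

```python
from flask import Flask, jsonify

app = Flask(__name__)

def search_naa(family_name):
    # Placeholder standing in for the real screen-scraper call.
    return [{'name': 'Wragge, Clement', 'source': 'RecordSearch'}]

@app.route('/api/naa/<family_name>')
def naa_results(family_name):
    # Each data source gets its own endpoint, so the page can fire off
    # AJAX requests in parallel and fill in results as they arrive.
    return jsonify(results=search_naa(family_name))
```

Because each source has its own endpoint, one slow database (hello, RecordSearch) doesn’t hold up the others.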


I don’t know whether I can put any more time into this at the moment — too many other projects competing for my time and no more bus trips coming up. But if you think it’s useful or worthwhile please let me know and I’ll see what I can do.

At the very least it shows how with just a little impatience and ingenuity we can find fairly simple ways to integrate records from a variety of sources. We don’t have to wait for some centralised solution.

2012 — The Making

I obviously did a lot of talking in 2012, but I also made a few things…

The evolution of QueryPic

Try QueryPic

At the start of 2012 QueryPic was a fairly messy Python script that scraped data from the Trove newspaper database and generated a local html file. It worked well enough and was generously reviewed in the Journal of Digital Humanities. But QueryPic’s ability to generate a quick visualisation of a newspaper search was undermined by the work necessary to get the script running in the first place. I wanted it to be easy and accessible for everyone.

Fortunately the folks at the National Library of Australia had already started work on an API. Once it became available for beta testing, I started rebuilding QueryPic — replacing the Python and screen-scraping with Javascript and JSON.

In the meantime, I headed over to New Zealand for a digital history workshop and began to wonder about building a NZ version of QueryPic based on the content of Papers Past, available through the DigitalNZ API. The work I’d already done with the Trove API made this remarkably easy and QueryPic NZ was born.

Once the Trove API was publicly released I finished off the new version of QueryPic. Instead of a Python script that had to be downloaded and run from the command line, QueryPic was now a simple web form that generated visualisations on demand.

The new version also included a ‘shareable’ link, but all this really did was regenerate the query. There was no way of citing a visualisation as it existed at a certain point in time. If QueryPic was going to be of scholarly use, it needed to be properly citable. I also wanted to make it possible to visualise more complex queries.

And so the next step in QueryPic’s evolution was to hook the web form to a backend database that would store queries and make them available through persistent urls. With the addition of various other bells and whistles, QueryPic became a fully-fledged web application — a place for people to play, to share and to explore.
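That backend pattern, storing the query and handing back a short stable identifier to build a permanent URL from, can be sketched in a few lines (the permalink format here is invented for the example, and a dict stands in for the real database):

```python
import hashlib
import json

queries = {}  # stands in for the backend database

def save_query(query):
    """Store a query and return a short, stable id for its permalink."""
    serialised = json.dumps(query, sort_keys=True).encode('utf-8')
    key = hashlib.sha1(serialised).hexdigest()[:8]
    queries[key] = query
    return key

def load_query(key):
    return queries.get(key)

key = save_query({'q': 'drought', 'start': 1880, 'end': 1940})
permalink = 'http://example.org/querypic/{}/'.format(key)
```

Hashing the serialised query means the same search always gets the same id, so a citation keeps pointing at the visualisation as it was defined.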

Headlines and history

Explore The Front Page

Back in 2011 I started examining ways of finding and extracting editorials from digitised newspapers.  Because the location of editorials is often tied up with the main news stories, this started me thinking about when the news moved to the front page. And of course this meant that I ended up downloading the metadata for four million newspaper articles and building a public web application — The Front Page — to explore the results. ;-)

The Front Page was also the first resource published on my new dhistory site (since joined by the Archives Viewer and QueryPic). dhistory — ‘your digital history workbench’ — is where I hope to collect tools and resources that have graduated from WraggeLabs.

Viewing archives

Try Archives Viewer

In 2012 I also revisited some older projects. After much hair-pulling and head-scratching, I finally managed to get the Zotero translator for the National Archives of Australia’s RecordSearch database working nicely again. I also updated it to work with the latest versions of Zotero, including the new bookmarklet.

My various userscripts for RecordSearch also needed some maintenance. This prompted me to reconsider my hacked together alternative interface for viewing digitised files in RecordSearch. While the userscript worked pretty well, there were limits to what I could do. The alternative was to build a separate web interface… and so the Archives Viewer was born.

Stories and data

Expect bugs ye who enter here…


In the ‘work-in-progress’ category is the demo I put together for my NDF2012 talk, Small stories in a big data world. Expect to see more of this…

My favourite things

Two things I made in 2012 are rather special (to me at least). Instead of responding to particular needs or frustrations, these projects emerged from late night flashes of inspiration — ‘what if…?’ moments. They’re not particularly useful, but both have encouraged me to think about what I do in different ways.



The Future of the Past is a way of exploring a set of newspaper articles from Trove. I’ve told the story of its creation elsewhere — I simply fell in love with the evocative combinations of words that were being generated by text analysis and wanted to share them. It’s playful, surprising and frustrating. And you can make your own tweetable fridge poetry!

The People Inside

One night I was thinking about The Real Face of White Australia and the work I’d done extracting photos of people from the records of the National Archives of Australia’s database. I wondered what would happen if we went the other way — if we put the people back into RecordSearch. The result was The People Inside — an experiment in rethinking archival interfaces.


2012 — the talking

In an attempt to figure out where this year went I’ve pulled together a list of my talks, presentations and workshops for 2012…

7 January 2012 — ‘Invisible Australians: Living under the White Australia Policy’, contribution to the Crowdsourcing History: Collaborative Online Transcription and Archives panel, American Historical Association annual conference, Chicago. [slides]

8 January 2012 — ‘Making friends with text mining’, contribution to the A Conversation about Text Mining as a Research Method panel, American Historical Association annual conference, Chicago.

10 January 2012 — ‘Collections, interfaces, power and people’, McGill University.

12 January 2012 — ‘Collections, interfaces, power and people’, University of Western Ontario.

7 February 2012 — Mining the treasures of Trove: new approaches and new tools, VALA2012.

23 March 2012 — ‘Mining Trove’, Digital History Workshop, Victoria University of Wellington.

29 March 2012 — ‘Inside the bureaucracy of White Australia’, Digital Humanities 2012, Canberra. [slides]

8 May 2012 — Mining for meanings, Harold White Fellowship Lecture, National Library of Australia, Canberra.

27 June 2012 — ‘Beyond the front page’, combined meeting of the Canberra Society of Editors and the Australian and New Zealand Society of Indexers, Canberra. [slides]

19 July 2012 — ‘The responsibilities of data’, Framing Lives: The 8th Biennial Conference of the International Auto/Biography Association, Canberra. [slides]

11 August 2012 — Doing Our Bit Build-a-thon, Mosman Library.

12 October 2012 — Digital disruptions: Finding new ways to break things, Faculty of Arts eResearch Forum, University of Melbourne.

19 October 2012 — Too important not to try, Dipping a toe into Digital Humanities, Deakin University.

25 October 2012 — Digital disruptions: Finding new ways to break things, Australian National University.

1 November 2012 — Digital disruptions: Finding new ways to break things, Digital Humanities Symposium, University of Queensland.

13-15 November 2012 — Digital dimensions: A hands-on workshop for the DH curious, University of Queensland.

20 November 2012 — Small stories in a big data world, National Digital Forum, New Zealand.

22 November 2012 — Learning how to break things, workshop at THATCamp Wellington. [outline]

29 November 2012 — Archives of emotion, Reinventing Archival Methods workshop, Sydney.

12 December 2012 — ‘Introducing Digital Humanities’, State Library of New South Wales.

Archives of emotion

Presented at the Reinventing Archival Methods workshop, 29 November 2012, in Sydney.

One weekend, a bit over a year ago, I built this — a wall of faces of people forced to live within the restrictions of the White Australia Policy, drawn from records held by the National Archives of Australia. It created a lot of interest, both here and overseas, particularly after I talked about it at the 2011 National Digital Forum in New Zealand.

My original post was republished in South Africa, and my NDF talk made it into the inaugural edition of the Journal of Digital Humanities. The wall is being studied as part of a digital history course in the US, and was cited by two papers at the Museums and the Web conference this year. It’s also been referenced in discussions on visualisation, serendipity and race.

But perhaps most important was the email we received in which the sender described scrolling through the wall with tears rolling down their face.

It’s also important to note that the project of which the wall forms part — Invisible Australians — is completely unfunded and has no institutional home. It’s a project driven by passion. It’s a project born out of the sense of obligation and responsibility that my partner, Kate Bagnall, and I feel towards the people whose lives are documented in the archives.

Last week I was at NDF 2012, where Courtney Johnston called on us to consider the emotional landscapes in and around our collections. It started me wondering: what is the role of emotion in the archives?

There is clearly no neutral position. In Archival Methods David Bearman rightly criticises the idea that the value of archivists lies in their political disengagement — as faithful guardians of the accumulated past. And of course archival writers like Verne Harris and Terry Cook have developed this critique in some detail.

Bearman suggests that archives can instead be seen as ‘marshaling centers’, that enable people not to observe some distant past, but to mobilise the past within their own lives — to find connections and meanings.

Recently I was talking to an academic researching the role of historical thinking in education. He argued that an emotional connection had to come first. Only then could rational arguments take root — only then could opinions, ideas and lives be changed.

And yet emotion still seems like something best avoided in public. We try not to ‘inflame’ it, we rarely seek to nurture it. Exposing the rawness of emotion is often seen as cheap or manipulative. And yet it happens, always, in and around our cultural collections.

What user or worker in archives has not been moved? By the voices and stories contained within the records, by the sheer excitement of discovery, or perhaps by the overwhelming burden of responsibility. If, as Bearman argues, ‘the pasts we construct are all discussions with the present’, then these discussions are infused with joy and anger, with fear and longing, with sadness and gratitude.

Why are we so reluctant to acknowledge that archives are repositories of feeling? Is emotion meaningless because it can’t be quantified, dangerous because it can’t be controlled, or does it simply not fit with the professional discourse of evidence, authority and reliability?

As our experience of archives moves further into the online realm, so the possibilities for making emotional connections increase — simply because it’s so much easier to share. From the like button or the retweet, through to a lovingly-tended personal collection in something like Pinterest — we have new opportunities to explore what’s important to us and why.

This is happening now. Voices from the past are finding their way into online conversations. But what voices and whose conversations? Even as we welcome this sort of engagement, we have to remember what is not online, what is not accessible, and all the social, technical and political barriers that can prevent someone from joining the discussion.1

It worries me too that our emotional connections may be too small, too fragile to survive in the world of big data. We live in an age where our online preferences are monitored, our sentiments analysed — our feelings are harvested and tallied in order to sell us more stuff. The line between expression and consumption is increasingly blurred.

Back in the pre-web era, Bearman imagined access to archives through ‘intelligent artifices’ that would bridge databases and connect vocabularies — responding to, and learning from the activities of users. Twenty-five years later we’re exploring these possibilities at a global scale, through Linked Open Data.

While Linked Open Data is often described like a giant plumbing project, it’s really about making a whole lot of very small connections. To me it offers an opportunity to fight back against the homogenisation of data. We can use it to express complex relationships with the past. But we need to know how, and we need to find the points at which we can plug ourselves in.

Perhaps these are Bearman’s ‘marshaling centers’, short-circuiting our online connections to jack us into the past. Not a fixed or nostalgic past, but a challenging and contested past, both real and yet unknowable. As feeling becomes commodified and neutered through a variety of online filters, perhaps archives can hack us directly into powerful conduits of meaning and emotion.

How might this happen? There’s the technical stuff — persistent identifiers, blah, blah, blah — vitally important of course. But then there’s the relationship stuff. We have to stop talking about users and start talking about collaborators. We need to stop building services to be consumed, and start opening opportunities to create, to play, to break and to hack. We are all making connections.

Most importantly we need to find and support the people, both inside and outside our organisations, who are driven by passion. The people who care. The people who simply give a shit.2

  1. See, for example, Tim Hitchcock’s 5 minute rant
  2. ‘Give a shit’ from Alexis Madrigal via Courtney Johnston’s opening remarks for NDF 2011

Small stories in a big data world

Presented at the National Digital Forum, Wellington, 20 November 2012. You can also watch the video.

Previously at NDF:

As we return to the action, Tim is wondering what happens when we bring stories and data together…

As historians, as cultural heritage professionals, as people — we make connections, we make meanings. That’s just what we do.

What really excites me about Linked Open Data is not the promise of smarter searches, but the possibilities for making connections and meanings in ways that are easier to traverse — to explore, to wander, to linger, or even to stumble.

What really frustrates me about Linked Open Data is that we still tend to talk about it as if it’s all engineering — an international plumbing project to pump data around the globe. Linked Open Data doesn’t have to be an industrial undertaking, it can be a craft, a mode of expression. It can be created with love or in anger.

And anyone can do it.

I’m currently working on a project with the Mosman Library in Sydney to collect information about the World War I experiences of local service people. The web resource we’re building will provide Linked Data all the way down. Every time someone adds a story about a person, uploads a photograph, identifies a place, or includes a link to another resource, they will be minting identifiers, creating relationships, documenting properties — sharing their knowledge as Linked Open Data.

It seems to me that Linked Open Data will be a success not when we’ve standardised on a few vocabularies, or linked everything we possibly can to DBpedia, but when we have thriving online communities creating and sharing structured data about the things that are important to them. Not just the known and notable, but the local, the contested, the endangered, the ephemeral and the oppressed.

Many of us live within a Western tradition which equates knowledge with accumulation. Linked Open Data promises new means of aggregation, new powers of discovery — lots and lots more stuff! But it would be a tragedy if all we ended up with was a bigger database or a better search engine. I want more. I want new ways of using that data, of playing with structures and scales. I want to build rich contexts around my stories.

Last year I talked about this in a keynote I gave to the Australian and New Zealand Society of Indexers. To try and demonstrate some of the possibilities, I created a fancy presentation and added a whole lot of linked data to the text of my talk. But it was a bit of a cheat. The text, the triples and the presentation were still pretty much separate. What I really wanted to do was use the linked data to generate alternative views of the text, to take my story and look at it through a variety of linked data powered filters.

So for NDF this year I thought I’d have another go. I set myself a few ground rules:

  • Simple tools — should be possible for anyone with a text editor.
  • No platforms — no sneaky server-side stuff, it all had to happen in the browser, on the fly.
  • No markup madness — I wanted there to be a close relationship between the text and the data, but I wanted the markup process to be practical — something like creating a footnote.

So I hacked together a whole lot of existing Javascript libraries. I used them to extract all the triples from my text and follow external identifiers to get extra information. Then I queried the little databank I’d made to generate four different views of my talk…
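My demo does this in the browser with JavaScript, but the core step, pulling triples out of marked-up text, can be illustrated in Python using the standard library’s HTML parser (this handles only a simplified, RDFa-flavoured subset; real RDFa is much richer):

```python
from html.parser import HTMLParser

class TripleExtractor(HTMLParser):
    """Extract (subject, property, object) triples from a simplified
    RDFa-style markup where elements carry 'about' and 'property'
    attributes. A toy illustration of the principle, nothing more."""

    def __init__(self):
        super().__init__()
        self.triples = []
        self._current = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if 'about' in attrs and 'property' in attrs:
            self._current = (attrs['about'], attrs['property'])

    def handle_data(self, data):
        # The element's text content becomes the object of the triple.
        if self._current:
            subject, prop = self._current
            self.triples.append((subject, prop, data.strip()))
            self._current = None

parser = TripleExtractor()
parser.feed('<p>The poet <span about="#brady" property="foaf:name">'
            'E.J. Brady</span> camped at Mallacoota.</p>')
```

Once the triples are in a little databank like this, generating a people view or a timeline is just a matter of querying for the right properties.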

WARNING WARNING! Very early demo! Expect bugs and general stupidity!

Now, none of this looks terribly exciting. Visually the various components look pretty familiar — and that’s part of the point: I’m showing how you can re-use existing tools and code libraries.

What’s interesting, I think, is the dialogue that’s evolving between text and data — a dialogue that’s taking place within one, just one, html document.

Expect bugs ye who enter here…

So here’s the text of my talk to the indexers last year. As you scroll through the document, each paragraph on the screen is examined and information about related entities — people, places, events, objects — is displayed in a sidebar. The text and the sidebar are linked, so if you click on a link in the text more information about the related entity opens in the sidebar.

If you want to look at the resources separately you can. You can re-order, and filter by type.

Then there’s the fairly traditional timeline and map views.

Most of the data that’s being displayed is coming from RDFa within the document, but not all. There are links to GeoNames and DBpedia that are drawing in data on the fly. As more Linked Open Data becomes available these links can become deeper and richer.

It’s a very rough demo and I have a long to-do list — for example better links between the data views and the text, showing their context within the narrative. But hopefully you can get an idea of how it might be possible to build data-rich stories — with layers and views that enrich, inform and engage with the narrative.

And all just with one html page, a bit of RDFa and a few Javascript libraries.

There’s no magic.

You might be wondering about my ground rules — why did I constrain myself? Well, it has to do with this thing we call ‘access’. Oftentimes when we talk about access we mean the power to consume — the power for people to take what they’re given.

But to really have access, for something to be truly open, people also have to have the power to create. To take what they’re given and build something new — to challenge, to criticise, to offer alternatives.

That means allowing people the space to have ideas, giving them the confidence to experiment, providing useful tools and the knowledge to use them. That’s not a job for any particular institution, or sector, it’s a challenge for all of us who build things to strip away the magic and invite others to join in.

And I think it’s pretty important. I don’t really want to live in a world where data is just something that other people collect about and for us. I want slow data, as Chris described last year. I want us to enjoy the textures and tastes and not get addicted to the processed product. I want to create, enrich, wield and wonder.

So my vision of the future of Linked Open Data, is not of the Giant Global Graph linking all knowledge. But a revolutionary army of data-artisans, hand-crafting their richly contextualised stories into a glorious, messy, confusing, infuriating, WONDERFUL tapestry.

Now I know you’re all just waiting for me to press the BOOM! button.

So let’s blow some shit up!