Exposing the archives of White Australia

I recently gave a presentation in the Institute of Historical Research’s Digital History Seminar series. The time difference between London and Canberra was a bit of a challenge, so I pre-recorded the presentation and then sat in my own Twitter backchannel while it played. For the full podcast information go to HistorySPOT. You can also play with my slides or peruse the #dhist Twitter archive.

Bus trips and building

Last week I took my daughter to Sydney so she could attend a girls-only Minecraft workshop at the Powerhouse Museum (they created some wonderful things). It was a 3½-hour bus journey each way, so to keep myself occupied I set myself the challenge of trying to build something en route. I made a fair bit of progress, but ultimately failed. I had to steal a few extra hours this week to get it to the point where people might find it useful.

The Australian WWI Records Finder

So here it is — a (sort of) aggregated search interface to records about Australian First World War service personnel. Give it a name and it will search:

It’s ‘sort-of’ aggregated because it’s really just a series of separate searches presented on the one page. But even this should make it easier for people to match up records across the different data sets.

Using

Type in a family name and, optionally, a given name or a service number. Hit search. Wait. Wait a bit more. The National Archives’ RecordSearch database can often be pretty slow. Eventually though, each of the databases will be queried in turn and the results added to the page.

Once the results have loaded, click on a title and the little spinny thing will start up again as more details are retrieved from the database. In this ‘detail’ view, all the other results from the database are hidden. This makes it a bit easier to compare records across databases. Just click on the title again to go back to the ‘list’ view.

If your search returns lots of results, you can use the ‘next’ and ‘previous’ links to explore the complete set. They’ll all load in the current page via the magic of AJAX.

It’s not obvious from the interface, but you can feed query parameters directly via the url. For example try http://wraggelabs.com/ww1-records/?family_name=wragge. Why is this useful? Perhaps you’ve got your own database of names on the web. Using this you could easily create links from each name that looked for relevant records in the Finder.
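As a rough illustration of that idea, here is a minimal Python sketch that turns names into Finder search links. Only the `family_name` parameter comes from the example URL above. The `given_name` and `service_number` parameter names are guesses based on the search form's optional fields and may not match what the Finder actually accepts, and the `finder_link` helper is hypothetical.

```python
from urllib.parse import urlencode

# Finder address, taken from the example URL in the post.
FINDER_URL = "http://wraggelabs.com/ww1-records/"


def finder_link(family_name, given_name=None, service_number=None):
    """Build a Finder search URL for one person.

    `family_name` is the only parameter shown in the post; `given_name`
    and `service_number` are ASSUMED names for the optional form fields.
    """
    params = {"family_name": family_name}
    if given_name:
        params["given_name"] = given_name
    if service_number:
        params["service_number"] = service_number
    # urlencode escapes spaces, apostrophes and other unsafe characters.
    return FINDER_URL + "?" + urlencode(params)


if __name__ == "__main__":
    for name in ["wragge", "O'Brien"]:
        print(finder_link(name))
    # → http://wraggelabs.com/ww1-records/?family_name=wragge
    # → http://wraggelabs.com/ww1-records/?family_name=O%27Brien
```

Letting `urlencode` do the escaping matters for names with apostrophes or spaces, which are common in nominal rolls.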

That’s about it. It’s just a quick, bus-trip-inspired experiment, so there are many limitations and future possibilities.

Limitations

<!--INSERT USUAL WARNING ABOUT THE FRUSTRATIONS OF SCREEN SCRAPING-->

I’m just using the standard search interfaces of the various databases and screen-scraping the results. Unfortunately they all work slightly differently. For example, the AWM databases don’t distinguish between family names and given names, so if you search for the family name ‘Smith’ you’ll also get results like ‘Jones, Bruce Smith’. The CWGC database, on the other hand, will only match an other name if it comes first, while RecordSearch (or more strictly NameSearch) will also match the names of next-of-kin. Fun fun fun.
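If you only want true family-name matches from a source that doesn't distinguish family names from given names, one workaround is to filter the scraped results afterwards. The sketch below assumes each result's name arrives as a single 'Family, Given names' string, as in the 'Jones, Bruce Smith' example. The function names are hypothetical, and this is not a description of how the Finder itself handles it.

```python
def family_name_matches(display_name, family_name):
    """True if the text before the first comma equals `family_name`.

    ASSUMES names look like 'Family, Given names'; the comparison ignores
    case and surrounding whitespace.
    """
    family_part = display_name.split(",", 1)[0]
    return family_part.strip().casefold() == family_name.strip().casefold()


def filter_by_family_name(names, family_name):
    """Keep only results whose family name (not a given name) matches."""
    return [name for name in names if family_name_matches(name, family_name)]


if __name__ == "__main__":
    results = ["Smith, John", "Jones, Bruce Smith", "SMITH, Alfred James"]
    print(filter_by_family_name(results, "Smith"))
    # → ['Smith, John', 'SMITH, Alfred James']
```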

I figure anything is better than nothing, but if you’re not getting the results you expect head off to the original interfaces and try your luck there. I’m making no promises.

You’ll also notice that the maximum number of results for each data source varies. The CWGC returns 15 results, while the AWM hands over a whopping 50. These are just the default settings for the original search engines. I could’ve fiddled with the settings, but it didn’t really seem worth it.
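Differing page sizes also mean that the 'next' and 'previous' links have to be worked out separately for each source. Here is a minimal sketch, assuming each source is paged with a zero-based offset and reports a total count (neither detail is described here). Only the 15 and 50 figures come from the text above; the source keys and the `paging` function are hypothetical.

```python
import json

# Default page sizes mentioned in the post; other sources would need entries.
PAGE_SIZES = {"cwgc": 15, "awm": 50}


def paging(source, start, total):
    """Work out 'previous' and 'next' offsets for one source's results.

    `start` is the zero-based offset of the current page and `total` the
    number of matches the source reported. None means "no link".
    """
    size = PAGE_SIZES[source]
    previous = max(start - size, 0) if start > 0 else None
    following = start + size if start + size < total else None
    return {"source": source, "start": start, "page_size": size,
            "previous": previous, "next": following}


if __name__ == "__main__":
    print(json.dumps(paging("awm", 50, 120)))
    # → {"source": "awm", "start": 50, "page_size": 50, "previous": 0, "next": 100}
```

Returning a plain dictionary keeps it easy to serialise as JSON for an AJAX call.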

And oh yeah… screen scraping… inherently fragile… might fall over and die at any minute.

Possibilities

As you may have guessed from previous posts, I rather like making connections. This experiment grew out of the work I’m doing on the ‘Doing Our Bit’ project with the Mosman Library. I’ve been building a series of forms that will make it easy for contributors to link people in the Mosman project to any of these databases. Just paste in a url from RecordSearch and the system will automagically retrieve all the file metadata and also check for an entry in Mapping our Anzacs. It’s pretty nifty. But of course it made me think about having a way to search across all these different databases.

And then what?

Having found a series of records for an individual it would be good if they could then be permanently linked. If I had the time and money to do more work on this, I’d want to allow people to save the connections they find. And of course then expose these connections as Linked Open Data. It wouldn’t be difficult.
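To make that a little more concrete, here is one possible sketch of serialising a saved set of connections as N-Triples. It assumes each person has been given their own URI and links each record to it with `rdfs:seeAlso`, a deliberately generic property; a real project might choose something more specific. All the example.org URIs and the function names are hypothetical.

```python
# Characters that are not allowed inside an N-Triples IRI reference.
FORBIDDEN_IRI_CHARS = set('<>"{}|^`\\')

RDFS_SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"


def iri(value):
    """Wrap an IRI in angle brackets, rejecting characters N-Triples forbids."""
    if any(ch in FORBIDDEN_IRI_CHARS or ord(ch) <= 0x20 for ch in value):
        raise ValueError(f"not usable as an N-Triples IRI: {value!r}")
    return f"<{value}>"


def connections_to_ntriples(person_uri, record_urls):
    """Serialise one person's saved record links, one triple per line."""
    return "\n".join(
        f"{iri(person_uri)} {iri(RDFS_SEE_ALSO)} {iri(url)} ."
        for url in record_urls
    )


if __name__ == "__main__":
    # Hypothetical identifiers for one person and two of their records.
    person = "http://example.org/people/42"
    records = ["http://example.org/naa/item/1001",
               "http://example.org/awm/roll/2002"]
    print(connections_to_ntriples(person, records))
```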

There’s probably also a lot more that could be done with machine matching of records. Perhaps someone’s already working on this for the centenary — it seems like an obvious point of attack. It would be good if the forthcoming centenary commemorations resulted in something that brought all these datasets together and exposed identifiers that could be easily used by community projects like ‘Doing Our Bit’.

Details

Yes, I cheated. I had already done a lot of work on the screen-scrapery bits of this pre-bus-trip. I’ve been working on a RecordSearch client on and off for a while to use with projects like Invisible Australians. The AWM and CWGC scrapers I wrote for ‘Doing Our Bit’. Feel free to grab the code and play.

The actual application was built using the Python micro-framework Flask. I’m a big fan of Django, but there’s a lot of overhead involved if you just want to throw together a simple app. I’ve been wanting to try Flask for a while and was pleased to find just how quick and fun it was to get something up and running.

To make the whole thing as responsive as possible, the search results are retrieved using AJAX calls to simple APIs I built in Flask on top of my screen scraper code. There’s actually very little code in the Flask app itself. The downside of this is that the Javascript is a bit of a mess. Ah well.

Next

I don’t know whether I can put any more time into this at the moment — too many other projects competing for my time and no more bus trips coming up. But if you think it’s useful or worthwhile please let me know and I’ll see what I can do.

At the very least it shows how with just a little impatience and ingenuity we can find fairly simple ways to integrate records from a variety of sources. We don’t have to wait for some centralised solution.

2012 — The Making

I obviously did a lot of talking in 2012, but I also made a few things…

The evolution of QueryPic

Try QueryPic

At the start of 2012 QueryPic was a fairly messy Python script that scraped data from the Trove newspaper database and generated a local html file. It worked well enough and was generously reviewed in the Journal of Digital Humanities. But QueryPic’s ability to generate a quick visualisation of a newspaper search was undermined by the work necessary to get the script running in the first place. I wanted it to be easy and accessible for everyone.

Fortunately the folks at the National Library of Australia had already started work on an API. Once it became available for beta testing, I started rebuilding QueryPic — replacing the Python and screen-scraping with Javascript and JSON.

In the meantime, I headed over to New Zealand for a digital history workshop and began to wonder about building a NZ version of QueryPic based on the content of Papers Past, available through the DigitalNZ API. The work I’d already done with the Trove API made this remarkably easy and QueryPic NZ was born.

Once the Trove API was publicly released I finished off the new version of QueryPic. Instead of a Python script that had to be downloaded and run from the command line, QueryPic was now a simple web form that generated visualisations on demand.

The new version also included a ‘shareable’ link, but all this really did was regenerate the query. There was no way of citing a visualisation as it existed at a certain point in time. If QueryPic was going to be of scholarly use, it needed to be properly citable. I also wanted to make it possible to visualise more complex queries.

And so the next step in QueryPic’s evolution was to hook the web form to a backend database that would store queries and make them available through persistent urls. With the addition of various other bells and whistles, QueryPic became a fully-fledged web application — a place for people to play, to share and to explore.
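Here is a minimal sketch of that idea, assuming SQLite and a hypothetical `BASE_URL`. The point is that the stored row keeps a snapshot of the results alongside the query, so a persistent URL can show the visualisation as it was when it was saved rather than silently re-running the search. The table layout, the `q` parameter and the year/count pairs are illustrative assumptions, not QueryPic's actual schema.

```python
import json
import sqlite3
from datetime import datetime, timezone

# Hypothetical base for persistent links; not QueryPic's real URL scheme.
BASE_URL = "https://example.org/querypic/"


def init_db(conn):
    conn.execute(
        """CREATE TABLE IF NOT EXISTS queries (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               params TEXT NOT NULL,   -- the search as submitted
               results TEXT NOT NULL,  -- snapshot of the counts at save time
               created TEXT NOT NULL   -- UTC timestamp for citation
           )"""
    )


def save_query(conn, params, results):
    """Store a query plus its results and return a persistent URL."""
    created = datetime.now(timezone.utc).isoformat(timespec="seconds")
    cur = conn.execute(
        "INSERT INTO queries (params, results, created) VALUES (?, ?, ?)",
        (json.dumps(params, sort_keys=True), json.dumps(results), created),
    )
    conn.commit()
    return f"{BASE_URL}{cur.lastrowid}"


def load_query(conn, query_id):
    """Return (params, results, created) exactly as they were saved."""
    row = conn.execute(
        "SELECT params, results, created FROM queries WHERE id = ?", (query_id,)
    ).fetchone()
    if row is None:
        raise KeyError(query_id)
    return json.loads(row[0]), json.loads(row[1]), row[2]
```

Storing the results as a list of `[year, count]` pairs avoids JSON turning integer year keys into strings on the way back out.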

Headlines and history

Explore The Front Page

Back in 2011 I started examining ways of finding and extracting editorials from digitised newspapers.  Because the location of editorials is often tied up with the main news stories, this started me thinking about when the news moved to the front page. And of course this meant that I ended up downloading the metadata for four million newspaper articles and building a public web application — The Front Page — to explore the results. ;-)

The Front Page was also the first resource published on my new dhistory site (since joined by the Archives Viewer and QueryPic). dhistory — ‘your digital history workbench’ — is where I hope to collect tools and resources that have graduated from WraggeLabs.

Viewing archives

Try Archives Viewer

In 2012 I also revisited some older projects. After much hair-pulling and head-scratching, I finally managed to get the Zotero translator for the National Archives of Australia’s RecordSearch database working nicely again. I also updated it to work with the latest versions of Zotero, including the new bookmarklet.

My various userscripts for RecordSearch also needed some maintenance. This prompted me to reconsider my hacked-together alternative interface for viewing digitised files in RecordSearch. While the userscript worked pretty well, there were limits to what I could do. The alternative was to build a separate web interface… and so the Archives Viewer was born.

Stories and data

Expect bugs ye who enter here...

In the ‘work-in-progress’ category is the demo I put together for my NDF2012 talk, Small stories in a big data world. Expect to see more of this…

My favourite things

Two things I made in 2012 are rather special (to me at least). Instead of responding to particular needs or frustrations, these projects emerged from late night flashes of inspiration — ‘what if…?’ moments. They’re not particularly useful, but both have encouraged me to think about what I do in different ways.

Play!

The Future of the Past is a way of exploring a set of newspaper articles from Trove. I’ve told the story of its creation elsewhere — I simply fell in love with the evocative combinations of words that were being generated by text analysis and wanted to share them. It’s playful, surprising and frustrating. And you can make your own tweetable fridge poetry!

The People Inside

One night I was thinking about The Real Face of White Australia and the work I’d done extracting photos of people from records in the National Archives of Australia’s RecordSearch database. I wondered what would happen if we went the other way — if we put the people back into RecordSearch. The result was The People Inside – an experiment in rethinking archival interfaces.

2012 — the talking

In an attempt to figure out where this year went, I’ve pulled together a list of my talks, presentations and workshops for 2012…

7 January 2012 — ‘Invisible Australians: Living under the White Australia Policy’, contribution to the Crowdsourcing History: Collaborative Online Transcription and Archives panel, American Historical Association annual conference, Chicago. [slides]

8 January 2012 — ‘Making friends with text mining’, contribution to the A Conversation about Text Mining as a Research Method panel, American Historical Association annual conference, Chicago.

10 January 2012 — ‘Collections, interfaces, power and people’, McGill University.

12 January 2012 — ‘Collections, interfaces, power and people’, University of Western Ontario.

7 February 2012, ‘Mining the treasures of Trove: new approaches and new tools’, VALA2012.

23 March 2012 — ‘Mining Trove’, Digital History Workshop, Victoria University of Wellington.

29 March 2012 — ‘Inside the bureaucracy of White Australia’, Digital Humanities 2012, Canberra. [slides]

8 May 2012, ‘Mining for meanings’, Harold White Fellowship Lecture, National Library of Australia, Canberra.

27 June 2012 — ‘Beyond the front page’, combined meeting of the Canberra Society of Editors and the Australian and New Zealand Society of Indexers, Canberra. [slides]

19 July 2012 — ‘The responsibilities of data’, Framing Lives: The 8th Biennial Conference of the International Auto/Biography Association, Canberra. [slides]

11 August 2012, Doing Our Bit Build-a-thon, Mosman Library.

12 October 2012, ‘Digital disruptions: Finding new ways to break things’, Faculty of Arts eResearch Forum, University of Melbourne.

19 October 2012, ‘Too important not to try’, Dipping a toe into Digital Humanities, Deakin University.

25 October 2012 — Digital disruptions: Finding new ways to break things, Australian National University.

1 November 2012 — Digital disruptions: Finding new ways to break things, Digital Humanities Symposium, University of Queensland.

13–15 November 2012, ‘Digital dimensions: A hands-on workshop for the DH curious’, University of Queensland.

20 November 2012, ‘Small stories in a big data world’, National Digital Forum, New Zealand.

22 November 2012, ‘Learning how to break things’, workshop at THATCamp Wellington. [outline]

29 November 2012, ‘Archives of emotion’, Rethinking Archival Methods workshop, Sydney.

12 December 2012 — ‘Introducing Digital Humanities’, State Library of New South Wales.

Archives of emotion

Presented at the Reinventing Archival Methods workshop, 29 November 2012, in Sydney.

One weekend, a bit over a year ago, I built this — a wall of faces of people forced to live within the restrictions of the White Australia Policy, drawn from records held by the National Archives of Australia. It created a lot of interest, both here and overseas, particularly after I talked about it at the 2011 National Digital Forum in New Zealand.

My original post was republished in South Africa, and my NDF talk made it into the inaugural edition of the Journal of Digital Humanities. The wall is being studied as part of a digital history course in the US, and was cited by two papers at the Museums and the Web conference this year. It’s also been referenced in discussions on visualisation, serendipity and race.

But perhaps most important was the email we received in which the sender described scrolling through the wall with tears rolling down their face.

It’s also important to note that the project of which the wall forms part — Invisible Australians — is completely unfunded and has no institutional home. It’s a project driven by passion. It’s a project born out of the sense of obligation and responsibility that my partner, Kate Bagnall, and I feel towards the people whose lives are documented in the archives.

Last week I was at NDF 2012, where Courtney Johnston called on us to consider the emotional landscapes in and around our collections. So it started me wondering, what is the role of emotion in the archives?

There is clearly no neutral position. In Archival Methods, David Bearman rightly criticises the idea that the value of archivists lies in their political disengagement — as faithful guardians of the accumulated past. And of course archival writers like Verne Harris and Terry Cook have developed this critique in some detail.

Bearman suggests that archives can instead be seen as ‘marshaling centers’ that enable people not to observe some distant past, but to mobilise the past within their own lives — to find connections and meanings.

Recently I was talking to an academic researching the role of historical thinking in education. He argued that an emotional connection had to come first. Only then could rational arguments take root — only then could opinions, ideas and lives be changed.

And yet emotion still seems like something best avoided in public. We try not to ‘inflame’ it, we rarely seek to nurture it. Exposing the rawness of emotion is often seen as cheap or manipulative. And yet it happens, always, in and around our cultural collections.

What user or worker in archives has not been moved? By the voices and stories contained within the records, by the sheer excitement of discovery, or perhaps by the overwhelming burden of responsibility. If, as Bearman argues, ‘the pasts we construct are all discussions with the present’, then these discussions are infused with joy and anger, with fear and longing, with sadness and gratitude.

Why are we so reluctant to acknowledge that archives are repositories of feeling? Is emotion meaningless because it can’t be quantified, dangerous because it can’t be controlled, or does it simply not fit with the professional discourse of evidence, authority and reliability?

As our experience of archives moves further into the online realm, so the possibilities for making emotional connections increase — simply because it’s so much easier to share. From the like button or the retweet, through to a lovingly tended personal collection in something like Pinterest — we have new opportunities to explore what’s important to us and why.

This is happening now. Voices from the past are finding their way into online conversations. But what voices and whose conversations? Even as we welcome this sort of engagement, we have to remember what is not online, what is not accessible, and all the social, technical and political barriers that can prevent someone from joining the discussion.1

It worries me too that our emotional connections may be too small, too fragile to survive in the world of big data. We live in an age where our online preferences are monitored, our sentiments analysed — our feelings are harvested and tallied in order to sell us more stuff. The line between expression and consumption is increasingly blurred.

Back in the pre-web era, Bearman imagined access to archives through ‘intelligent artifices’ that would bridge databases and connect vocabularies — responding to, and learning from the activities of users. Twenty-five years later we’re exploring these possibilities at a global scale, through Linked Open Data.

While Linked Open Data is often described like a giant plumbing project, it’s really about making a whole lot of very small connections. To me it offers an opportunity to fight back against the homogenisation of data. We can use it to express complex relationships with the past. But we need to know how, and we need to find the points at which we can plug ourselves in.

Perhaps these are Bearman’s ‘marshaling centers’, short-circuiting our online connections to jack us into the past. Not a fixed or nostalgic past, but a challenging and contested past, both real and yet unknowable. As feeling becomes commodified and neutered through a variety of online filters, perhaps archives can hack us directly into powerful conduits of meaning and emotion.

How might this happen? There’s the technical stuff — persistent identifiers, blah, blah, blah — vitally important of course. But then there’s the relationship stuff. We have to stop talking about users and start talking about collaborators. We need to stop building services to be consumed, and start opening opportunities to create, to play, to break and to hack. We are all making connections.

Most importantly we need to find and support the people, both inside and outside our organisations, who are driven by passion. The people who care. The people who simply give a shit.2

  1. See, for example, Tim Hitchcock’s 5-minute rant.
  2. ‘Give a shit’ from Alex Madrigal via Courtney Johnston’s opening remarks for NDF 2011.

Small stories in a big data world

Presented at the National Digital Forum, Wellington, 20 November 2012. You can also watch the video.

Previously at NDF:

As we return to the action, Tim is wondering what happens when we bring stories and data together…

As historians, as cultural heritage professionals, as people — we make connections, we make meanings. That’s just what we do.

What really excites me about Linked Open Data is not the promise of smarter searches, but the possibilities for making connections and meanings in ways that are easier to traverse — to explore, to wander, to linger, or even to stumble.

What really frustrates me about Linked Open Data is that we still tend to talk about it as if it’s all engineering — an international plumbing project to pump data around the globe. Linked Open Data doesn’t have to be an industrial undertaking; it can be a craft, a mode of expression. It can be created with love or in anger.

And anyone can do it.

I’m currently working on a project with the Mosman Library in Sydney to collect information about the World War I experiences of local service people. The web resource we’re building will provide Linked Data all the way down. Every time someone adds a story about a person, uploads a photograph, identifies a place, or includes a link to another resource, they will be minting identifiers, creating relationships, documenting properties — sharing their knowledge as Linked Open Data.

It seems to me that Linked Open Data will be a success not when we’ve standardised on a few vocabularies, or linked everything we possibly can to DBpedia, but when we have thriving online communities creating and sharing structured data about the things that are important to them. Not just the known and notable, but the local, the contested, the endangered, the ephemeral and the oppressed.

Many of us live within a Western tradition which equates knowledge with accumulation. Linked Open Data promises new means of aggregation, new powers of discovery — lots and lots more stuff! But it would be a tragedy if all we ended up with was a bigger database or a better search engine. I want more. I want new ways of using that data, of playing with structures and scales. I want to build rich contexts around my stories.

Last year I talked about this in a keynote I gave to the Australian and New Zealand Society of Indexers. To try and demonstrate some of the possibilities, I created a fancy presentation and added a whole lot of linked data to the text of my talk. But it was a bit of a cheat. The text, the triples and the presentation were still pretty much separate. What I really wanted to do was use the linked data to generate alternative views of the text, to take my story and look at it through a variety of linked data powered filters.

So for NDF this year I thought I’d have another go. I set myself a few ground rules:

  • Simple tools — should be possible for anyone with a text editor.
  • No platforms — no sneaky server-side stuff, it all had to happen in the browser, on the fly.
  • No markup madness — I wanted there to be a close relationship between the text and the data, but I wanted the markup process to be practical — something like creating a footnote.

So I hacked together a whole lot of existing Javascript libraries. I used them to extract all the triples from my text and follow external identifiers to get extra information. Then I queried the little databank I’d made to generate four different views of my talk…
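The demo did this in the browser with existing Javascript libraries, which aren't named here. Purely to illustrate what ‘extracting triples’ from marked-up text involves, here is a much-simplified Python sketch that pulls triples from a small RDFa subset: `about` sets the subject, and `property` takes its value from `content`, `resource`, `href` or the element's text. It ignores prefixes, `typeof`, `rel`/`rev` and much else in the RDFa specification, so treat it as a toy rather than a conformant parser. The class and function names are hypothetical.

```python
from html.parser import HTMLParser

# HTML elements that never have a closing tag.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class MiniRDFaParser(HTMLParser):
    """Collect (subject, predicate, object) triples from a tiny RDFa subset.

    Supported: 'about' sets the subject for an element and its descendants;
    'property' takes its object from 'content', then 'resource', then
    'href', falling back to the element's text. Prefixes/CURIEs, 'typeof',
    'rel'/'rev' and chaining are NOT handled, and the markup is assumed to
    close every non-void element it opens.
    """

    def __init__(self, base_subject):
        super().__init__()
        self.subjects = [base_subject]  # one subject per open element
        self.pending = []               # (subject, predicate, text parts) or None
        self.triples = []

    def _open(self, attrs, push):
        a = dict(attrs)
        subject = a.get("about") or self.subjects[-1]
        predicate = a.get("property")
        item = None
        if predicate:
            obj = a.get("content") or a.get("resource") or a.get("href")
            if obj is not None:
                self.triples.append((subject, predicate, obj))
            elif push:
                item = (subject, predicate, [])  # wait for the element's text
            else:
                self.triples.append((subject, predicate, ""))
        if push:
            self.subjects.append(subject)
            self.pending.append(item)

    def handle_starttag(self, tag, attrs):
        self._open(attrs, push=tag not in VOID_TAGS)

    def handle_startendtag(self, tag, attrs):
        self._open(attrs, push=False)  # <x/> has no content

    def handle_data(self, data):
        # Text counts towards every enclosing element still awaiting a literal.
        for item in self.pending:
            if item is not None:
                item[2].append(data)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or len(self.subjects) == 1:
            return
        self.subjects.pop()
        item = self.pending.pop()
        if item is not None:
            subject, predicate, parts = item
            self.triples.append((subject, predicate, "".join(parts).strip()))


def extract_triples(html, base_subject):
    parser = MiniRDFaParser(base_subject)
    parser.feed(html)
    parser.close()
    return parser.triples
```

Literal values are only emitted when their element closes, so triples taken from element text can appear after triples taken from attributes further down the page.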

WARNING WARNING! Very early demo! Expect bugs and general stupidity!

Now, none of this looks terribly exciting. Visually the various components look pretty familiar — and that’s part of the point, I’m showing how you can re-use existing tools and code libraries.

What’s interesting, I think, is the dialogue that’s evolving between text and data — a dialogue that’s taking place within one, just one, html document.

Expect bugs ye who enter here…

So here’s the text of my talk to the indexers last year. As you scroll through the document, each paragraph on the screen is examined and information about related entities — people, places, events, objects — is displayed in a sidebar. The text and the sidebar are linked, so if you click on a link in the text, more information about the related entity opens in the sidebar.

If you want to look at the resources separately you can. You can re-order, and filter by type.

Then there’s the fairly traditional timeline and map views.

Most of the data that’s being displayed is coming from RDFa within the document, but not all. There are links to GeoNames and DBpedia that are drawing in data on the fly. As more Linked Open Data becomes available these links can become deeper and richer.

It’s a very rough demo and I have a long to-do list — for example better links between the data views and the text, showing their context within the narrative. But hopefully you can get an idea of how it might be possible to build data-rich stories — with layers and views that enrich, inform and engage with the narrative.

And all just with one html page, a bit of RDFa and a few Javascript libraries.

There’s no magic.

You might be wondering about my ground rules — why did I constrain myself? Well, it has to do with this thing we call ‘access’. Oftentimes when we talk about access we mean the power to consume — the power for people to take what they’re given.

But to really have access, for something to be truly open, people also have to have the power to create. To take what they’re given and build something new — to challenge, to criticise, to offer alternatives.

That means allowing people the space to have ideas, giving them the confidence to experiment, providing useful tools and the knowledge to use them. That’s not a job for any particular institution, or sector, it’s a challenge for all of us who build things to strip away the magic and invite others to join in.

And I think it’s pretty important. I don’t really want to live in a world where data is just something that other people collect about and for us. I want slow data, as Chris described last year. I want us to enjoy the textures and tastes and not get addicted to the processed product. I want to create, enrich, wield and wonder.

So my vision of the future of Linked Open Data is not of the Giant Global Graph linking all knowledge, but of a revolutionary army of data-artisans, hand-crafting their richly contextualised stories into a glorious, messy, confusing, infuriating, WONDERFUL tapestry.

Now I know you’re all just waiting for me to press the BOOM! button.

So let’s blow some shit up!

Teaching by example?

There’s been plenty of discussion within the digital humanities community about the difficulty of getting academic recognition for digital projects. But what about being recognised for alternative forms of teaching? I don’t mean online courses, I mean the sort of peer-to-peer teaching that takes place through blogs, or Twitter, or the comments in our code. We all learn from each other.

I’ve been thinking about this while working on a few job applications recently. My opportunities for formal teaching or supervision have been limited, but over the last few years I’ve worked hard to introduce the digital humanities to a broad range of audiences. I’ve given talks to all sorts of professional and community groups, including librarians, museum curators, archivists and family historians. I’ve organised a couple of THATCamps. I’ve given papers at disciplinary conferences. I’ve blogged about my experiments and my frustrations. I’ve created a series of digital tools and made them available for all to use. Most recently I’ve been visiting universities giving talks and workshops to help staff and students make use of digital tools and resources in their own research. But I don’t ‘teach’ — or do I?

Most of this work is unpaid of course. I do it because I love it, and because I think it’s important. I do it because I want DH to live up to its promise of being open and engaging — I want others to share the excitement, the possibilities and the power. Sometimes it’s hard to know if it really makes any difference — usually I only hear anecdotally about the way my tools are used. But when I do receive feedback from people it’s often to say how I’ve ‘inspired’ them.

It seems to me that the ability to teach by example, to broaden horizons, and offer inspiration, is something that should find a place in a job application, but where? As I was pondering this the other night I fired off an idle tweet that brought a couple of encouraging responses:

So I’ve adopted @ProfessMoravec’s suggestion and created a Testimonials page. If I’ve managed to inspire or assist you in some way, feel free to leave a comment. Maybe next time I put together a job application I’ll have something to point to to demonstrate my ‘teaching’ credentials.

Too important not to try

On Friday 19 October I joined an enthusiastic group of digital humanities explorers at a Deakin University event entitled Dipping a Toe into the Digital Humanities and Creative Arts. @catspyjamasnz has assembled an excellent summary of the day in Storify.

In the morning I told the story of Invisible Australians. You can view the slides of ‘Too important not to try’ and listen to my dodgy audio recording via SoundCloud.

In the afternoon I gave a whirlwind workshop which included a headline roulette smackdown and an introduction to the wonders of Zotero.

Digital disruptions: Finding new ways to break things

Recently I gave a presentation at the University of Melbourne’s Faculty of Arts eResearch Forum. The slides for my talk, ‘Digital Disruptions: Finding New Ways to Break Things’, are available online (thanks to reveal.js). I also managed to make a fairly basic recording — I’m intending to create a transcript, but for now you’re welcome to download it or listen via SoundCloud.

Basically I was arguing that as well as making stuff, digital humanities can involve a lot of stretching, twisting, pushing and breaking stuff. The web is not fixed or static; there are many points at which we can intervene and change the way information is presented. What we need is the confidence to pull things apart, and the ability to critically examine why things work the way they do (or don’t). And imagine alternatives.

After my talk there were a number of interesting reports from people around the university. Brett Holman has provided a great summary on his Airminded blog, as well as doing his best to find me a job!