The Real Face of White Australia

In many of the presentations I’ve given in recent times I’ve managed to include a question raised by Tim Hitchcock in his chapter in The Virtual Representation of the Past. Tim asks:

What changes when we examine the world through the collected fragments of knowledge that we can recover about a single person, reorganised as a biographical narrative, rather than as part of an archival system?

The idea of turning archival systems on their head to expose the people rather than the bureaucracy is what motivates Kate Bagnall and me in our attempts to make the Invisible Australians project into a reality.

Invisible Australians aims to liberate the lives of those who suffered under the restrictions of the White Australia Policy from the rich archival holdings of the National Archives of Australia and elsewhere.

We always knew that the portrait photographs, included on a range of government documents, would provide a compelling perspective on these lives, but we weren’t quite sure how we were going to extract them. Up until last weekend, I’d assumed that we’d develop a crowdsourcing tool that contributors would use to mark up the photos.

Now I’m not so sure.

In the space of a couple of days I’ve extracted over 7,000 photographs and built an application to browse them — here is the real face of White Australia.

How did I do it? Paul Hagon, at the National Library of Australia, gave a presentation last year in which he explored the possibilities of facial detection in developing access to photographic collections. The idea lodged in my brain somewhere and a few days ago I started to poke around looking to see how practical it might be for Invisible Australians.

It didn’t take long to find a python script that used the OpenCV library to detect faces in photographs. I tried the script on a few of the NAA documents and was impressed — there were a few false positives, but the faces were being found!

So then the excitement kicked in. I modified the script so that instead of just finding the coordinates of faces it would enlarge the selected area by 50px on each side and then crop the image. This did a great job of extracting the portraits. I also tweaked a few of the settings to try to reduce the number of false positives. Eventually, I developed a two-pass system that repeated the detection process after the image had been cropped and its contrast adjusted. This seemed to weed out a few more errors. You can find the code on GitHub.
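The detect–enlarge–crop step can be sketched roughly as follows. This is a minimal reconstruction, not the project’s actual script (which is on GitHub): the cascade filename, the detection parameters and the `crop_faces` helper name are all illustrative assumptions.

```python
def expand_box(x, y, w, h, img_w, img_h, margin=50):
    """Enlarge a detected face rectangle by `margin` px on each side,
    clamped to the image boundaries."""
    x1 = max(x - margin, 0)
    y1 = max(y - margin, 0)
    x2 = min(x + w + margin, img_w)
    y2 = min(y + h + margin, img_h)
    return x1, y1, x2, y2


def crop_faces(image_path, cascade_path="haarcascade_frontalface_default.xml"):
    """Detect faces in a document image and return the cropped portraits."""
    import cv2  # imported here so the geometry helper above needs no OpenCV

    img = cv2.imread(image_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    cascade = cv2.CascadeClassifier(cascade_path)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    crops = []
    for (x, y, w, h) in faces:
        # img.shape is (height, width, channels)
        x1, y1, x2, y2 = expand_box(x, y, w, h, img.shape[1], img.shape[0])
        crops.append(img[y1:y2, x1:x2])
    return crops
```

Raising `minNeighbors` is one way to trade missed faces for fewer false positives; the second contrast-adjusted pass over each crop would then re-run `detectMultiScale` to confirm the detection.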

Once the script was working I had to assemble the documents. I already had a basic harvester that would retrieve both the file metadata and digitised images for any series in the NAA database. Acting on Kate’s advice, I pointed it at series ST84/1 and downloaded 12,502 page images.

All I then had to do was loop the facial detection script over the images. Simple! The only problem was that my 3-year-old laptop wasn’t quite up to the task. As its CPU temperature rose and rose, I was forced to employ a special high-tech cooling system.
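That batch loop amounts to something like the sketch below, assuming a `detector` callable that wraps the detection-and-crop script (the function names and file extensions here are illustrative, not the project’s actual code):

```python
from pathlib import Path


def image_files(directory, extensions=(".jpg", ".jpeg", ".png", ".tif")):
    """Return the digitised page images in a harvest directory,
    sorted so a re-run processes files in a stable order."""
    return sorted(
        p for p in Path(directory).iterdir()
        if p.suffix.lower() in extensions
    )


def process_directory(src, detector):
    """Run a face detector over every page image, pairing each page
    with the portrait crops found on it."""
    results = {}
    for page in image_files(src):
        crops = detector(str(page))
        if crops:
            results[page.name] = crops
    return results
```

In practice `detector` would be a function wrapping the OpenCV detection and cropping, and each crop would be written out to disk (e.g. with `cv2.imwrite`) ready for manual review.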

Keeping my laptop alive...

But after running for several hours, my faithful old laptop finally worked its way through all the documents. The result was a directory full of 11,170 cropped images.

The results

There were still quite a lot of false positives, so I simply worked my way through the files, manually deleting the errors. I ended up with 7,247 photos of people. That’s a strike rate of nearly 65%, which seems pretty good. The classifier, which does the actual facial detection, was probably trained on conventional photographs rather than on the mixed-format documents I was feeding it.

Then it was just a matter of building a web app to display the portraits. I used Django for the backend work of managing the metadata and delivering the content, while the interface was built using a combination of Isotope, Infinite Scroll and FancyBox.

It’s important to note that the portraits provide a way of exploring the records themselves. If you click on a face you see a copy of the document from which the photo was extracted. A link is provided to examine the full context of the image in RecordSearch. This is not just an exhibition, it’s a finding aid.

What next? There are many more of these documents to be harvested and processed (and many more still yet to be digitised). I will be adding more series as I can (though I might have to wait until I can afford a new computer!). I’d also like to explore the possibilities of facial or object detection a bit more. Could I train my own classifier? Could I detect handprints, or even classify the type of form?

In the meantime, I think our experimental browser helps us to understand why the Invisible Australians project is so important — you look at their faces and you simply want to know more. Who are they? What were their lives like?

UPDATE: For more on the photos and the issues they raise, see Kate Bagnall’s posts over at the Tiger’s Mouth.

This work is licensed under a Creative Commons Attribution 4.0 International License.

Tim Sherratt Written by:

I'm a historian and hacker who researches the possibilities and politics of digital cultural collections.

18 Comments

  1. September 22, 2011

    I like the idea of making visible those who white Australia tried to diminish – the presence of people who were not of European ancestry. You say in this post: “This is not just an exhibition, it’s a finding aid.” I have limited imagination and can’t see how it can be used other than by randomly clicking on faces and seeing the underlying document.

  2. September 22, 2011

    But why does your clicking have to be random? Kate, for example, has already been using it to find women, children and people of mixed heritage as part of her research into Chinese-Australian families. There’s no way to do that sort of browsing within the normal RecordSearch interface.

    Even if you don’t have a specific research question in mind, you might just see someone who looks interesting and you want to know more. Clicking on the image shows you the document, which has more information about the person. Clicking on the citation below the document will take you to the file containing the document in RecordSearch. There might be related documents in that file. Having found the person’s name on the original document you could then search RecordSearch for any additional references.

    The aim of Invisible Australians is, of course, to extract the data from documents such as these to build a rich contextual web around these people, but in the meantime I think even something simple like this provides a totally new means of seeing, understanding and exploring the records.

  3. Noela Bajjali
    September 22, 2011

    I really like this idea Tim.
    Extracting the images from items within the series throws light on the rich content of the records.
    I noted that when I link to the item details the items are described as a group with limited detail about each document that the image has been drawn from.
    I wondered if there is an opportunity to supplement the structured information recorded about the items (sub-items) by allowing viewers to populate a data collection screen (maybe that pops up next to the image?) adding data they can view in the full record (name, age, nationality).
    I haven’t done this in an electronic forum but I’ve had good success in asking people who are viewing images in our collection to add details to our data collection sheets as they identify them.
    Cheers,
    Noela

    • September 22, 2011

Thanks Noela – yep, that’s very much our plan for Invisible Australians, indeed that’s where the whole thing started — looking at the forms and thinking how great it would be to extract, expose and link all that rich, structured data. We plan to build a crowdsourcing app (or modify an existing app) to enable people to transcribe the documents, the only thing holding us back is a complete lack of time and money. But we’ll get there eventually. You can read more about our plans at discontents, or on Kate’s blog the tiger’s mouth.

  4. Rebecca
    October 5, 2011

    Hi Tim,

    Would it be possible for us to republish this on our website, ArchivalPlatform.org? We would, of course, link back! Many thanks

  5. Lai Lam
    October 20, 2011

    Hi Tim,
It’s fascinating going through the steps you wrote about the Invisible Australians project again. I was at your presentation at the Melbourne Indexing conference last month which I greatly enjoyed. Back in 2008 I was involved in a Poll Tax funded project to index the New Zealand Chinese Heritage Journal database for the public library here in Auckland. I’m thinking to myself about the possibilities of extending that project using your ideas. I’ll forward this to some of my old colleagues. Many thanks for the inspiration! Very impressive indeed.
    Lai

  6. […] in the “Real Face of White Australia” project, which I read about a couple of weeks ago on Tim Sherratt’s blog (note to future Access organizers: get this guy!) It starts from scans of immigration documents for […]

  7. […] “The Real Face of White Australians,” takes a look at the struggles of non-Europeans who dealt with harsh racism in Australia. Today, I would like to think we live in a place that’s a bit more accepting, but this article got me thinking about the college admissions process and its association with race. Thinking back to filling out college apps around this time two years ago, certain schools only allowed one box to be checked in the ‘ethnicity’ part of their application. Coming from a French dad and a Filipino mom, I found myself a bit confused on how to pick which race to claim an association with. I would do some quick research on the school demographic to see where I could possibly fit into a minority, and ‘Pacific Islander’ emerged as my go-to option. I felt that these online checkboxes were too binding, and portrayed a stigma of racial categorization. As we’ve discussed in class, computers are not fully able to grasp ‘human’ concepts such as race, shown by the inadequacies of the facial detection script from the article. All structures have their imperfections though, and it would be an intriguing argument to see how human intervention would fair in this system. (link 1 + link 2) […]
