Hacking a research project

Amongst the holdings of the National Archives of Australia are some of the most visually arresting documents you’ll see — thousands and thousands of forms from the early decades of the twentieth century, each with a portrait photograph and palm print, each documenting the movements of a non-white resident. Along with many other certificates, regulations, correspondence and case files, these forms are part of the massive bureaucratic legacy of the White Australia Policy.1

These certificates allowed non-white Australians travelling overseas to re-enter the country. NAA: ST84/1, 1906/21-30

But these are more than just interesting-looking pieces of paper; they are snapshots of people’s lives. The forms capture data about an individual’s place of birth, physical characteristics and more. Over time a person might have submitted several of these forms, so by bringing them together we could trace their history, map their journeys and even watch them age.2

The system which sought to render non-whites invisible has captured and preserved the outlines of their lives. By extracting and linking this data we could build a picture of another Australia, an Australia in which non-white residents lived, loved, struggled and succeeded, despite the impositions of a repressive regime.

I talked about these records at the AAHC conference last year, inspired in part by Tim Hitchcock’s chapter in The Virtual Representation of the Past. Hitchcock argues that technology can allow us to restructure archives, looking beyond institutional hierarchies to the lives of individuals contained within:

What changes when we examine the world through the collected fragments of knowledge that we can recover about a single person, reorganised as a biographical narrative, rather than as part of an archival system?3

I don’t know, but I’d like to find out.

During my AAHC talk, Dave Lester suggested that the extraction of data from these forms might make a good crowdsourcing project. It’s a great idea. As you can see, the data is generally well-structured and legible, so it should be possible to construct a simple series of forms that would allow volunteers to transcribe the data. The next stage would be to try to match identities across forms. That’s more complicated, but projects such as Tim Hitchcock’s London Lives show how users can construct identities by connecting a range of historical documents.
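
As a rough illustration of that matching step, here is a minimal Python sketch. It assumes the transcribed forms end up as simple records; the `FormRecord` class, its field names and the 0.85 threshold are all hypothetical, chosen only for illustration, and the sample values in the usage note are made up rather than taken from any real form.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class FormRecord:
    """One transcribed form. The field names are hypothetical."""
    form_id: str
    name: str
    birthplace: str
    year: int


def normalise(text: str) -> str:
    """Lower-case and collapse whitespace so trivial differences don't block a match."""
    return " ".join(text.lower().split())


def similarity(a: str, b: str) -> float:
    """Score two strings between 0.0 and 1.0 after normalising them."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def candidate_matches(records, threshold=0.85):
    """Return pairs of form ids that might describe the same person.

    Two forms are paired when their birthplaces agree after normalising
    and their names are at least `threshold` similar. The threshold is an
    arbitrary starting point, not a tested value.
    """
    pairs = []
    for i, first in enumerate(records):
        for second in records[i + 1:]:
            same_place = normalise(first.birthplace) == normalise(second.birthplace)
            if same_place and similarity(first.name, second.name) >= threshold:
                pairs.append((first.form_id, second.form_id))
    return pairs
```

Calling `candidate_matches` on a list of `FormRecord` objects returns pairs of form ids worth a second look. The pairs are only candidates: a sketch like this would hand them to volunteers to confirm or reject rather than merging identities automatically.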

Then there are connections to resources outside of the archives — photographs, local histories, newspapers, genealogies, cemetery registers and more. By keeping our system open and extensible, and by working with others to help them expose their information in standard ways, it should be possible to develop the framework for an evolving mesh of biographical data.

So, how do we get started? This is the point when you usually have to start thinking about money — how can I fund this? In Australia that generally means a journey into the arcane world of the Australian Research Council. The ARC suffers from all the problems of a peer-reviewed system, but added to this is a rather antiquated notion of what research is.

In the rules covering each of the main schemes it’s clearly stated that the ‘compilation of data’ and the ‘development of research aids or tools’ are not supported. I spend part of my life working for the Australian National Data Service, an organisation that seeks to highlight how the sharing and reuse of data can open up new research possibilities. The ARC, however, seems to think that data has little value beyond its original research context.

Of course you can still mount a case for such activities. Applicants for a ‘Discovery’ grant can argue that data creation is integral to their project and provide details of the ‘specific research questions to be addressed’. But what if you don’t yet know what the questions are? Part of the point of a project such as this is to try and find out what questions we are able to ask. Until we start to compile, link and explore the data, the ‘specific research questions’ will be little more than convenient fictions, dreamt up to satisfy the prodding of peer reviewers.

Tom Scheinfeldt wrote a fantastic blog post recently, responding to concerns about the failure of many digital humanities projects to make arguments or answer questions. Drawing examples from the history of science, Tom argues:

we need to make room for both kinds of digital humanities, the kind that seeks to make arguments and answer questions now and the kind that builds tools and resources with questions in mind, but only in the back of its mind and only for later. We need time to experiment and even… time to play.

The ARC does not fund play.

You might imagine that the ARC’s infrastructure funding scheme would offer more hope for a project such as this. And yes, there are many worthy projects involving databases and online tools that have been supported in this way (and I have benefited from some of them!). But it seems that in the minds of research funders infrastructure is always BIG. Grants start at $150,000, and applications are expected to involve multiple institutional partners. Projects have to be scaled up to fit the ARC’s definition of infrastructure, often resulting in complex, lumbering, long-term projects whose products are out of date by the time of their release.

There is no room in our current infrastructure models for agile, innovative, user-focused digital toolmakers seeking small amounts to experiment with apps, prototypes, datasets or visualisations. I often look with envy upon the US National Endowment for the Humanities Digital Humanities Start-Up Grants.

In any case, neither I nor my partner in this endeavour, Kate Bagnall (@baibi), is currently in an academic position, so our chances of gaining any sort of research funding are next to none. We have the expertise (Kate has spent many years researching Australian-Chinese families and knows the records back-to-front, while I just can’t help playing with biographical data), but is that enough? How can you mount an ongoing research project without institutional support, research funding and the various badges and signifiers of academic authority?

I don’t know that either, but I have some ideas.

Mrs Ah Yin Pak Chong. NAA: ST84/1, 1907/321-330

I didn’t manage to get a contribution together for Dan Cohen and Tom Scheinfeldt’s crowdsourced-in-a-week book, Hacking the Academy, but watching the process from afar I did begin to wonder about how we might hack the way we build and run major research projects. This is what I have in mind:

  • To strip down the large, lumbering beasts and design projects that are modular and opportunistic — able to grow quickly when resources allow, to bolt on related projects, to absorb existing tools.
  • To follow the data freely across technological and institutional boundaries, developing open networks that invite participation and use.
  • To develop a floating pool of collaborators, both inside and outside of academia, who are able to come and go, contributing whatever and whenever they can.
  • To make everything public, accessible and standards-compliant, so that even if the project stalls it could be picked up and developed by someone else.

Most of all I just want to be able to do it. I don’t want to second-guess the ARC. I don’t want to spend months negotiating with potential partners or begging for an institutional home. I want to build, experiment and play. I want to make a start.

So that’s what we’re going to do.

We have a topic, plenty of raw materials, some basic principles and the beginnings of a plan. We even have a name — Invisible Australians: Living under the White Australia Policy.

As the project develops, I’ll be blogging here about some of the technical stuff, while Kate will be exploring the content over at the tiger’s mouth. I hope to have a prototype of the transcription tool ready to demo at THATCamp Canberra, while Kate is already at work putting together guides on using the records and developing an Omeka site that follows a number of Chinese-Australian families through the archives.

Can we hack together a major research project? Let’s find out.

  1. For examples of the types of documents and what they can tell us, see Kate Bagnall, A legacy of White Australia: Records about Chinese Australians in the National Archives, paper presented at the Fourth International Conference of Institutes and Libraries for Chinese Overseas Studies, Jinan University, Guangzhou, China, 10 May 2009.
  2. See for example my mini-exhibition, Who was Shadee Kahn?, part of the Muslim Journeys site.
  3. Tim Hitchcock, ‘Digital searching and the re-formulation of historical knowledge’, in The Virtual Representation of the Past, edited by Mark Greengrass and Lorna Hughes, Farnham, UK, Ashgate, 2008, p. 90.

This work is licensed under a Creative Commons Attribution 4.0 International License.

Written by: Tim Sherratt

I'm a historian and hacker who researches the possibilities and politics of digital cultural collections.


  1. Chris Rusbridge
    June 18, 2010

    I think it’s a great idea to crowd-source as you suggest, if possible. Would there be issues either with some individuals still being alive, or some cultural sensitivities? I think you’d have to be very careful in your ethical clearance for such a project!

  2. Chris Rusbridge
    June 18, 2010

    BTW I’m also the person who suggested to JISC (UK) the “Rapid Innovation” project idea. These are relatively small amounts of money, <£30K or so, relatively short, potentially throw away projects. Some of these have had great results; I think the SWORD repository submission protocol was one. There have now been several rounds of "RI" projects funded by JISC.

    It may be that ARC and others might welcome such a suggestion!

    • June 20, 2010

      Chris — thanks for the info about the JISC ‘Rapid Innovation’ program, I’ve just been checking out some details — sounds great and certainly the sort of thing we’re lacking here in Australia.

      Chris and Shane, on privacy: the records we’re talking about are all pre-1945, so the chances of any of the subjects still being alive are slim. They’re Commonwealth records, and before such records are made publicly available under the Australian Archives Act, issues of privacy are considered. Of course, as Shane points out, the questions become harder once you start linking up multiple sources of information. I might get Kate to comment in more detail, as she’s done a lot of work within the Chinese-Australian community and, indeed, has often provided descendants with previously unknown details of their own family history. I suppose I’m hoping that we can minimise ethical problems by building strong connections such as these with the communities most affected by the White Australia Policy.

      Penny — wow, there’s some great detective work there. It looks like a great candidate for mapping/visualising relationships. Have you had much feedback from family historians?

      Tim — You’ve given me much to think about. I have to admit, I hadn’t really considered the project in international terms, although we already had a few people in mind as potential collaborators/advisers. Thanks for the suggestions — I might have to seek your advice further down the track!

      Jon — yes it seems there are many of us thinking the same way, and a few useful models are emerging. Your project sounds great — I certainly hope that we too can help push the Linked Data message and do a bit of capacity-building.

  3. June 18, 2010

    I’ve been revisiting the appendix of my 1996 dissertation, a list of 500 women who attended the same Southern female academy between 1809 and 1818, in blog format (link above), with some of the same goals. I’m reaching into the huge family history world online, and I want them to reach back. A blog seemed like an easy way to tackle this in small pieces while inviting comments and linking to other sites.

  4. A wonderful piece, and a wonderful source that should have a huge audience and impact – and thanks for the mention.

    Overall, I believe it is entirely possible to create a crowd-sourced and folksonomic approach that would get this stuff out there (and which could leverage a great deal of expertise from the family history communities); but that it is dependent on ensuring that there are at least one or two people actually employed to make it happen. This brings us right back to your funding issues. I wonder if one way forward is to create a network that includes a few people with institutions, as well as a wider community, and to then use this community to apply for grants. It might tend to de-centre your own work (because academics are all pathetic careerists), but could be used to organise applications for funding in the UK to the AHRC, JISC, etc; and to NEH, SSHRC in N. America and the Australian equivalent (the name of which I should know).

    The Leverhulme has a network grants programme that might work for this. And it is a straightforward enough thing to put together a list of ten to fifteen people worldwide, who should be interested!

    I share Chris Rusbridge’s concern with the sensitivity of such materials, even as I love the idea of this project. It’s very similar in spirit to my ideas about some crowdsourced and linked-data projects in US women’s history, like mapping rural midwives.

    I wrote a while back about the professional-ethics dimensions of this technical problem, particularly as it concerns collections containing intimate stories. The era of searchable and linked data requires new questions about our historical practice and about the boundaries between “public” and “private”, particularly when we build crowdsourcing systems, where the very publicness of the original sources is what makes them transcribable, searchable, and useful for scholars and the general public.

    This is an area of professional historical practice where I think learned societies could be really useful, but I’m not sure that enough historians really understand the full dimensions of the problem to be able to articulate preferred ethical guidelines. The archives-and-preservation professional organizations might well also keep an eye on these topics, as a way to offer relevant guidance to people planning DIY-style public archives projects.

  6. June 19, 2010

    This is a great idea in and of itself, Tim, but what is also captivating here is the idea of coming up with better ways to support small research projects by independent developers and researchers. I love Chris Rusbridge’s idea of the JISC Rapid Innovation grants. We’ve been working on proposals for a similar project that would provide small grants and technical support for projects seeking to utilize Linked Data in Libraries, Archives and Museums. Hopefully this kind of thing will be available soon to support your work and other ideas like it!

    Rock on. Jon

  7. Andy McGregor
    June 21, 2010

    If you want to see the results of the latest round of JISC rapid innovation funding, we published a newspaper rounding up some of the most successful projects. A digital version is available at http://ie-repository.jisc.ac.uk/450/ and the paper version contains a map of all the projects we funded, which is hyperlinked here: http://bit.ly/bfWyML

  8. June 21, 2010

    In the dissertation I mapped the alumnae and made family charts for them, but that was before any online options–no search engines, no genealogy forums. So the blogged reboot of the project will have much richer data for mapping and family relationships.

    I haven’t taken the project “out” into the family history online world yet–I will soon, when I have about 50 names done–but I’ve had some inquiries and tips already. A small house museum in the South asked if the family who owned the house sent any daughters to the school, and someone tipped me to a tombstone in Gibraltar related to one of the students’ stories.

  9. June 21, 2010

    Thanks to Chris and Shane for their comments about the privacy issue. Here are some thoughts connected to my own research into the history of Anglo-Chinese Australian families.

    I’ve been working with the White Australia Policy/Immigration Restriction Act records for more than ten years and it has grown to be something of an obsession. My passion for them, and for what they can tell us, comes precisely because they contain such rich information about real people.

    Much of the history of the Chinese in Australia has been written from sources such as government reports, parliamentary debates and major urban newspapers, and unfortunately many of the biases of these sources have been carried through into later descriptions of the life of 19th and early 20th century Chinese Australians. My particular interest is in women, sex and the family (especially interracial families), an area in which there are stereotypes aplenty.

    What I sought to do in my doctoral research was to get past the stereotypes, and the way I saw to do this was to find out all I could about real people. I manually did the kind of data matching that could now potentially be done in the online world, working from sources that gave me a bit of information here and a bit of information there, to build up something of a picture of their lives. The Immigration Restriction Act records played a big part in this, particularly in putting faces to the names, and in giving a voice to my subjects themselves.

    The other thing I chose to do was to seek contact with the descendants of the individuals and families that I encountered in the archives. I put ads in newspapers, wrote to family history societies, posted to online genealogy forums and co-founded a Chinese family history group. When my thesis was completed, I made it accessible online through the Australian Digital Theses program. I did not disguise the identities of the people I wrote about either, which means that I continue to receive email from people researching their families who’ve come across my work through a simple Google search.

    Because of the racism encountered by many Chinese and mixed race families, the Chinese part of family stories has often been hidden, and most of those people who contact me are just embarking on a process of discovery. Finding that the National Archives holds these records, with photographs, physical descriptions, personal and family details, the names of ancestral homes in China, signatures and sometimes letters written by family members, is a revelation. In all my contact with descendants over more than a decade, I can think of no more than two instances where the families had concerns about the records being available publicly.

    Making these connections between the archive, the historian and the family/community can have benefits for all of us. Every time I make contact with another family I am reminded how the archive, however rich, really only tells one part of the story of someone’s life. Getting to hear about people from the family’s perspective is a real privilege.

    I tend to think therefore that there is a greater good being served by making these records, and the people in them, more visible – through better archival description (to enable name-based keyword searching, for instance), through digitisation and through projects like the one Tim and I are embarking on. It helps people to reconnect with their own pasts, and it also gives the rest of us a better understanding of how the White Australia Policy (which was ostensibly designed to limit non-white migration) affected the lives of people who were in fact Australians too.

  10. June 21, 2010

    This is such an interesting topic. But I do think the issue of privacy – or the right to say how and when and why your image/information is used – is absolutely central to this issue. Even if the subject has passed away, these issues still apply. There are ongoing discussions on this topic in regards to the use of materials referencing indigenous Australians, and bodies like AIATSIS can offer useful advice.
    There’s a conference coming up very soon which will have attendees who can advise: http://www.aiatsis.gov.au/research/symposia/Digi10.html

    Sometimes it’s more important to be able to restrict the use of your family’s image/information than to communicate it. This has come up in regards to Yuendumu in quite interesting ways: http://www.warlpiri.com.au/visitors.htm has a fascinating document about how access to Yuendumu as a physical place is related to land rights, to civil liberty and to the documentation of the lives of disempowered or marginalised people.

  11. asa letourneau
    June 23, 2010

    Would love to lend my support to this and would be especially interested in discussing opportunities for collaboration with PROV (though I have no idea how this would play out). From working with PROV over the last 5 years I am aware of some of the cultural sensitivity issues surrounding projects such as the KIN database, and an exhibition dealing with Chinese prisoners in Victoria in the 19th century. Consultation, as far as is reasonably practicable, with the relevant communities would be an obvious source of collaborative power. To re-interpret and re-possess the past in order to accurately re-present what was actually being done to these communities is crucial to a shared and honest understanding. As for the funding? That never stood in the way of a great idea…let’s make it happen!

  12. […] of you may have noticed that my Hacking a research project post featured a file from the National Archives of Australia embedded as a Cooliris widget. Huh? To […]

  13. September 20, 2010

    Great project. Our project tracing Tasmanian convicts from their convictions in Britain and Ireland to their descendants’ experience in the First Australian Imperial Force has faced many of the technical and resourcing issues you describe. We have had some success obtaining ARC funding for this and an earlier project to reconstruct the Victorian Koori population, and have created rich prosopographical databases which are potentially permanent research resources. But almost by definition the ARC does not provide funds for sustainable databases. We are already facing the need to migrate the Koori data to a more up-to-date repository. As others have commented, there are several issues here: the need to recognise the creation and maintenance of sustainable databases as a central part of the research process (which is implicit in ANDS); the need for some sort of technical infrastructure to support it; and (especially) the need to harness the creativity of the open source community in its development. The challenge is to find a way of bringing together the necessarily bureaucratic infrastructure component with the creative hacker component.

  14. […] multi-institutional partnerships. It just doesn’t seem like a great model for innovation. As I’ve previously argued, I’d like to see something more like the funding schemes offered by the NEH Office for […]
