Unremembering the forgotten

Keynote presented at DH2015, 3 July 2015. Full slides available on SlideShare.


 

This, you might be surprised to learn, is not the first time that Australia has welcomed some of the world’s leading thinkers to its shores. Just over a hundred years ago, the British Association for the Advancement of Science held its annual meeting in Australia. In earlier years the Association had journeyed to Canada and South Africa, but this was its first tour of Australia. One senior Australian scientist heralded the Association’s arrival as ‘a great event in the history of Imperial unity’.

unremembering_dh2015.002

More than 300 scientists made the trip, including such notables as Ernest Rutherford and William Bateson. I’m a little embarrassed to admit that their travel was heavily subsidised by the Federal government. But then, it did take them more than a month to get here. Think about that on your flight home.

The eminent Australian geologist Edgeworth David described the Association’s visit as ‘an epoch making event’. He expected Australian researchers to be ‘strengthened and confirmed’ in their work, reaffirmed through the ‘inspiration which comes alone from personal contact with master minds’.

It was also an occasion to celebrate the ideals of science. War had been declared while the scientists were at sea, but events proceeded nonetheless with delegates barnstorming across the country from Adelaide to Melbourne, Sydney and Brisbane. The spirit of proceedings was summed up in Melbourne where the presentation of an honorary degree to the German geologist Johannes Walter was greeted with a ‘perfect storm of applause’. ‘Truly science knows not distinction between belligerent and belligerent’, noted one newspaper. Australia’s Governor General, Sir Ronald Munro Ferguson, welcomed the scientists with the observation that the looming dangers of war had at least ‘enabled them to realise that all men of science were brothers’.

And of course, they were mostly men.

If you’d like a bit of data around that, you can grab a digitised copy of the report of the meeting from the Internet Archive and run a script over the list of members, grouping them by title – Miss, Mrs and Lady. Here’s what you get.

unremembering_dh2015.004

You can do the same for the people who joined the Association at one of the Australian venues.

unremembering_dh2015.005

OK, so this 10-minute analysis might not show anything unexpected, but I love the fact that with a digitised text and a few lines of Python I can ask a question and get an almost instant answer.
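A minimal sketch of what such a script might look like, not the one used for the talk. It assumes the member list has been saved as plain text with one member per line, and that women can be picked out by the titles 'Miss', 'Mrs' or 'Lady'. The function name and the sample lines are invented for illustration.

```python
import re
from collections import Counter

# Titles used to pick out women in the member list (an assumption about
# how the report formats names, e.g. "Abbott, Miss E.").
TITLES = ("Miss", "Mrs", "Lady")
TITLE_RE = re.compile(r"\b(" + "|".join(TITLES) + r")\b")


def count_titles(lines):
    """Count members by title; anything without a listed title is 'Other'."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines left over from OCR
        match = TITLE_RE.search(line)
        counts[match.group(1) if match else "Other"] += 1
    return counts


# Made-up sample lines standing in for the digitised member list.
sample = [
    "Abbott, Miss E.",
    "Adams, Professor J.",
    "Allen, Mrs. H.",
    "Armstrong, Lady",
    "Baker, W. H.",
]
print(count_titles(sample))
```

Real OCR text would need more cleaning than this (split lines, misread characters), so the counts should be treated as rough.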

What the official report doesn’t say is that despite these proclamations of scientific brotherhood, not all German scientists were welcome in wartime Australia. Those who extended their stay beyond the meeting dates fell under suspicion.

Two of them, Fritz Graebner and Peter Pringsheim, were interned as suspected spies and imprisoned for the remainder of the war. The press which had fawned over the travelling savants now railed against these ‘scientists in disguise’ whose ‘supreme act of treachery’ was undoubtedly part of a German plot to capture Australia. The Minister of Defence noted that the case emphasised the ‘real and pressing nature’ of the wartime emergency. Honorary degrees awarded to two German scientists by the University of Adelaide were expunged from the record.

unremembering_dh2015.006

At this point I feel I should warn all our international visitors that legislation introduced in recent years as part of the so-called ‘war on terror’ has added new limits to our freedom of speech and movement. We are all under suspicion.

The German scientists were interned alongside many thousands of others. Most had had no charges brought against them. Many were naturalised British subjects, or Australian-born of German descent. Australia was their home. That didn’t stop the government repatriating many of them to Germany at war’s end.

To orient them on their antipodean adventure, visitors for the British Association meeting were supplied with specially-prepared handbooks that described conditions in Australia. At a time when violence against Indigenous people was still common along the frontiers of settlement, the Commonwealth Handbook informed visitors that Australian Aboriginals ‘represent the most backward race extant’.

Australia was big, but its population was small. The Commonwealth Handbook noted the challenges of maintaining ‘control of so large a territory by a mere handful of people’, pointing to the significance of the ‘White Australia’ policy in avoiding the ‘difficulties’ of ‘heterogenous’ populations. Chris Watson, who served as Australia’s first Labor prime minister a decade earlier, expanded on this theme in the NSW Handbook. Concerns about the financial impact of ‘coloured’ labour, he explained, had been fused with an ‘abhorrence of racial admixture’ to create ‘practically a unanimous demand for a “White Australia”’. ‘White Australia’ was both an ideal and an obligation, an opportunity and a threat. Watson observed:

The aboriginal natives are numerically a negligible quantity, so there is every opportunity for the building up of a great white democracy if the community can maintain possession against the natural desire of the brown and yellow races to participate in the good things to be found in the Commonwealth. That the Asiatic will for ever tamely submit to be excluded from a country which, while presenting golden opportunities, is yet comparatively unpeopled, can hardly be expected. Therefore Australians are realising that to maintain their ideals they must fill their waste spaces and prepare for effective defence.

Welcome to Australia a hundred years later where we remember 1914 not for its institutionalised racism, but because it marked the beginning of a war that has come to be strongly associated with ideas of Australian nationhood.

You have arrived here amidst the ‘Anzac Centenary’ which, the official website notes, ‘is a milestone of special significance to all Australians’. It must be true because, according to the Honest History site, we’re spending more than half a billion dollars on commemorative activities. That’s a lot of remembering.

Amidst the travelling roadshows, the memorials, the exhibitions, and the rolling anniversaries, are of course many worthy digital projects. Some of these will provide new access to war-related collections, or gather community content and memories. They will result in important new historical resources. But who are we remembering and why? As a historian and hacker, as a maker of tools and a scraper of sites, I want today to poke around for a while amidst the complexities of memory.


 

It’s not all about the war. Recent decades have brought attempts to remember more difficult histories. Peter Read coined the phrase ‘stolen generations’ to draw attention to the devastating effects of official policies that resulted in the forced removal of Indigenous children from their families, through until the 1970s. The damaging experiences of children in institutional care, the ‘forgotten Australians’, have also been opened to scrutiny. Both of these have brought official apologies from the Commonwealth government. Even now, almost every day brings more horrifying testimony as the Royal Commission into Institutional Responses to Child Sexual Abuse continues its hearings.

In each case we have learnt to our shame of continuing failures to protect the most vulnerable in Australian society – children.

Often these investigations are cast as attempts to bring to the surface forgotten aspects of our history. But to those who suffered through these events, who have continued to live with the consequences, they have never been far from memory.

Nor have they been entirely lost to the historical record. One of the responses to these inquiries has been to discover, marshal and deploy existing archival resources. The National Archives of Australia created an exhibition based on the experiences of some of the Stolen Generation. They also developed a new name index to their collections to help Indigenous people reconnect with their families through official records.

The eScholarship Research Centre at the University of Melbourne drew on its experience in documenting a wide variety of archival collections to create Find & Connect – a web resource that assembles information about institutional care in Australia and assists care leavers in recovering their own stories. Official records have been supplemented by oral history programs and other collecting initiatives to ensure that these memories are secure.

Such histories are ‘forgotten’ not because they are unremembered or undocumented, but because they sit uncomfortably alongside more widely promulgated visions of Australia’s past. As researchers on the Find & Connect project noted, the stories of care leavers ‘did not “fit in” with the narratives in the public domain. Their memories were “outside discourse”’.[1] Remembering the forgotten is not just a matter of recall or rediscovery, but a battle over the boundaries of what matters.

Libraries, archives and museums are often referred to as memory institutions. Rhetorically it can be a useful way of positioning cultural institutions in respect to structures of governance and assessments of public value. The idea of losing our memory, whether as a society or an individual, is frightening.

But there are contradictions here. We frequently talk about memory in terms of storage – the ability of our technologies to tuck away useful pieces of information for retrieval later. There’s the ‘M’ in RAM and ROM, the fields in our database, our backups in the cloud. Memory is an accumulation of key/value pairs. Each time we query a particular key, we expect to get the same value back.

Memory, as we experience it, is something quite different. It’s fragmentary, uncertain, and shaped by context. The process of recall is unpredictable and sometimes disturbing – memories are often triggered involuntarily. Within a society memories are contested and contradictory. Who controls the keys?

Cultural institutions are trying to respond to this complexity. On the one hand they offer the security of authority – sources to be trusted in a world overflowing with information. But they are also looking for ways of capturing and representing alternative voices.

I think we can help with that.


 

Both in my work at Trove and my own noodling about I use the word ‘access’ a lot. But the more I use it the more I suspect it really doesn’t mean very much. What does it say that we now distinguish between ‘open’ and ‘closed’ access?

We tend to think of ‘access’ as the way we get to stuff. It’s the pathway along which we can explore our cultural collections. But as Mitchell Whitelaw argues, one of our primary means of access, the common or garden variety search box, constrains our view of the resources beyond. Search provides not an open door, but a grumpy ‘Yes, what?’

I’d suggest that these sorts of constraints don’t stand in the way of access, they construct it. Through legislation, technology, and professional practice, through the metadata we create and the interfaces we build, limits are created around what we can see and what we can do. Access is a process of control rather than liberation.

unremembering_dh2015.011

In 1952, in another notable act of ‘imperial unity’, Britain exploded an atomic bomb off the coast of Western Australia. A further 11 atomic tests were carried out here, most at a mainland testing site called Maralinga in South Australia. When I was a young research student in 1984, the British atomic tests introduced me both to the gloriously rich collections of the National Archives of Australia, and to the contradictions of access.

Under the Archives Act, most government records are opened to the public after 20 years (this was reduced from 30 years in 2010). However, before they are released they undergo examination to see whether they contain material that is exempted from public access – for example any secret squirrel business that could endanger our national security. The access process can therefore result in records that are ‘closed’ or ‘open with exception’.

What does ‘closed’ access look like? A few weeks ago I harvested details of all the files in the National Archives’ online database that have the access status ‘closed’. The records include the reasons why the files remain restricted. If you group them by reason, you can see that the most common ground for restriction is Section 33(1)(g) of the Archives Act which seeks to prevent the ‘unreasonable disclosure of information relating to the personal affairs of any person’. Fair enough. Coming second is the rather less obvious category of ‘withheld pending advice’. These are files that have gone back to the government agencies that created or controlled them to check that they really can be released. So they’re actually part way through the process.
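The grouping step can be sketched in a few lines. This is an illustration only: it assumes each harvested file is a dictionary with a hypothetical `reasons` list, which is not necessarily how the actual harvest was stored, and the sample records are invented.

```python
from collections import Counter


def count_reasons(records):
    """Tally how often each access reason appears across closed files."""
    counts = Counter()
    for record in records:
        # A file can be closed on more than one ground, so count each one.
        counts.update(record.get("reasons", []))
    return counts.most_common()


# Invented records; a real harvest would contain many thousands of files.
closed_files = [
    {"title": "File A", "reasons": ["33(1)(g)"]},
    {"title": "File B", "reasons": ["33(1)(g)", "33(1)(a)"]},
    {"title": "File C", "reasons": ["Withheld pending advice"]},
    {"title": "File D", "reasons": ["33(1)(g)"]},
]
for reason, total in count_reasons(closed_files):
    print(reason, total)
```

Counting every reason on a file, rather than only the first, means the totals can add up to more than the number of files.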

unremembering_dh2015.013

Using the contents dates of the files we can see how old they are. Section 33(1)(a) of the Archives Act exempts records from public scrutiny if they might ‘cause damage to the security, defence or international relations of the Commonwealth’. Most of the records closed on these grounds are over 50 years old, with a peak in 1956.
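A rough sketch of how the ages might be worked out, assuming each record carries a hypothetical `contents_date` string such as "1956 - 1957" alongside its `reasons` list. The field names and sample data are invented, and the start of the date range is used as the file's year.

```python
import re
from collections import Counter

# Matches a four-digit year between 1800 and 2099.
YEAR_RE = re.compile(r"\b(1[89]\d{2}|20\d{2})\b")


def start_year(contents_date):
    """Return the first four-digit year in a contents-date string, or None."""
    match = YEAR_RE.search(contents_date or "")
    return int(match.group(1)) if match else None


def years_for_reason(records, reason):
    """Count files closed on a given ground by the year their contents begin."""
    years = Counter()
    for record in records:
        year = start_year(record.get("contents_date"))
        if reason in record.get("reasons", []) and year is not None:
            years[year] += 1
    return years


# Invented records for illustration.
records = [
    {"contents_date": "1956 - 1957", "reasons": ["33(1)(a)"]},
    {"contents_date": "1956", "reasons": ["33(1)(a)", "33(1)(b)"]},
    {"contents_date": "1962 - 1963", "reasons": ["33(1)(a)"]},
    {"contents_date": "1971", "reasons": ["33(1)(g)"]},
]
by_year = years_for_reason(records, "33(1)(a)")
for year in sorted(by_year):
    print(year, by_year[year])
```

Calling `by_year.most_common(1)` would then give the peak year for that ground of exemption.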

unremembering_dh2015.014

And here’s a word cloud of the closed file titles from 1956. I’m sure that we all feel a lot safer knowing all those Cold War secrets are still being protected.

unremembering_dh2015.015

Back in 1984 I asked for some of those secret files to be opened so I could write my Honours thesis on the role of Australian scientists in the British atomic tests. A number of the files I was interested in went off to agencies for advice, and some even made their way to the British High Commission. Being young, optimistic, and on a deadline, I wrote to the British High Commissioner asking if anything could be done to speed the process up.

unremembering_dh2015.016

I received a very polite reply explaining that they were obligated under the Nuclear Non-Proliferation Treaty to make sure that they didn’t unleash any atomic bomb secrets upon the world. This was hilariously and tragically ironic, as the argument of my thesis was that the British government withheld information from their Australian hosts to curry favour with the USA. There was no way that atomic bomb plans would be in Australian government files. Yeah – hilarious.

Access is political. Cassie Findlay has contrasted the Australian government’s processes for the release of records with the creation and use of the WikiLeaks Cablegate archive.[2] Cassie argues that the ‘hyper-dissemination’ model of WikiLeaks, through which large volumes of material are shared across multiple platforms, creates a ‘pluralised archive’ that ‘exists beyond spatial and temporal boundaries, transcends state and economic controls and encourages and incorporates people’s participation and comment’. Instead of gatekeepers and reading rooms there are hackers and torrents.

Traditional forms of access are often celebrated as if they are a gift to a grateful nation. As Cassie notes, the release of Cabinet documents by the National Archives is a yearly ritual where stories of 30-year-old political manoeuvring are mixed with the comforts of nostalgia. But with each release more files are closed, withheld from public access. The workings of a bureaucratic process developed to control the release of information are recast as an opportunity to party. Invested with the cultural power of the secret and the political weight of national security, access itself becomes mysterious and magical.

We are left to ponder such wonders as: ‘Named country [imposed title, original title wholly exempt]’.

At the same time, governments are pumping out ‘open data’, bringing the promise of greater transparency, and new fuel for the engines of innovation. But for all its benefits, open data isn’t. It only exists because decisions have been made about what is valuable to record and to keep – structures have been defined, categories have been closed. As Geoffrey Bowker and Susan Leigh Star remind us, the definition, elaboration and enforcement of categories lies at the heart of bureaucracy and the infrastructure of the state.[3] Data is not just a product of government, it is implicated in the workings of power.

Chris Watson’s vision of a White Australia was, by 1914, well established as a system of bureaucratic surveillance and control. The Commonwealth Handbook benignly noted that ‘an immigrant may be required to pass a dictation test before being admitted into the Commonwealth’. It added that ‘in general practice this test is not imposed upon persons of European race’. The dictation test was a mechanism of exclusion. Any intending immigrant deemed not to be ‘white’ would be subjected to the dictation test and they would fail. But there were already many people born or resident in Australia of Asian descent. If they wanted to travel overseas they were forced to carry official documents to protect them from the application of the dictation test – otherwise they might not be allowed to return home. Many thousands of these documents are now preserved in the National Archives of Australia. With portrait photographs and inky-black handprints, they are visually compelling, and disturbing, documents. They need to be seen.

unremembering_dh2015.018

A few years ago, my partner Kate Bagnall and I harvested thousands of these documents from the National Archives website, ran them through a facial detection script and created ‘The Real Face of White Australia’.

unremembering_dh2015.019

You might have seen it before. It’s been widely cited, and it’s probably one of the main reasons I’m standing here today. For Kate and me this was part of our ongoing attempts to use the bureaucratic remnants of the White Australia policy to reconstruct the lives of those who lived within its grasp. But it’s also an example of the complications of access.

In the past I’ve tended to gloss over the hardest part of this project – just harvesting those 12,000 images. It was only possible because I’d spent a lot of time, over a number of years, wrestling with RecordSearch, the National Archives’ online database. I think it was back in 2008 that I wrote my first Zotero translator to extract structured data from RecordSearch. It was one of those Eureka moments. Although I’d been developing web applications for a long time, I hadn’t really thought of the web as a source to be mined, manipulated and transformed. I could take what was delivered in my browser and change it.

Thanks to Bill Turkel and the Programming Historian, I taught myself enough Python to be dangerous and was soon creating screen scrapers for a variety of sites – taking their HTML and turning it into data. I was no longer bound to a particular interface. The meaning of access had changed.
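To give a sense of what 'taking their HTML and turning it into data' can mean in practice, here is a minimal sketch using only Python's standard library. The markup is invented and only loosely resembles an item details page. It is not RecordSearch's actual HTML, and the class and function names are made up for the example.

```python
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collect the text of each table row as a list of cell strings."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def scrape_details(html):
    """Turn two-column label/value rows into a dictionary."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    return {row[0]: row[1] for row in parser.rows if len(row) == 2}


# Invented markup standing in for a page delivered to the browser.
page = """
<table>
  <tr><th>Title</th><td>Example  file
      title</td></tr>
  <tr><th>Access status</th><td>Open</td></tr>
</table>
"""
print(scrape_details(page))
```

A scraper like this depends entirely on the shape of the page, which is why it breaks whenever the site changes.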

But screen scrapers are a pain. Sites change and scrapers break. I don’t know how many hours I’ve spent inspecting RecordSearch response headers, trying to figure out where my requests were going. I’ve given up several times, but always gone back, because there’s always more to do.

Amongst the enthusiasm for open data there’s perhaps a tendency to overlook the opening of data – the way that hackers, tinkerers, journalists, activists and others have been stretching the limits of access.

The various projects of the OpenAustralia Foundation are a great example of this – they’ve even established their own public scraping framework called Morph, to share both the code and the data that’s been liberated from websites and PDFs.

The Australian Parliament recently passed changes to the Copyright Act that will enable copyright holders to apply to the Federal Court to block piracy-related websites. Of all the changes needed to copyright, this is the one that went to Parliament.

But what I love is that even before the legislation had passed, even before the first application had been made or site blocked, there was a website and Twitter account ready to document and publicise any site-blocking orders – created not by government, but by an ABC journalist.

Archivists Wendy Duff and Verne Harris have talked about records ‘as always in the process of being made’, not locked in the past but ‘opening out of the future’.[4] Cassie Findlay similarly notes that the Cablegate archive ‘is still forming’. She argues for models of participation and access around archives that open ‘more directly from the affairs that they document’.

The act of opening – records, archives, sources – is contingent and contextual. It creates a connection between inside and outside, past and present, us and them. What we do with that connection is up to us.

unremembering_dh2015.023

What would have happened if instead of hearing about ‘prohibited immigrants’, instead of seeing ‘wanted’ posters of escaped Chinese seamen, Australians in 1914 had seen something like our wall of faces?

What would happen now if instead of hearing about ‘illegal maritime arrivals’ (IMAs) we were exposed to the stories of those who arrive in Australia in search of asylum?

unremembering_dh2015.024

Access will never be open. Every CSV is an expression of power, every API is an argument. While I would gladly take back the time I’ve spent wresting data from HTML, I recognise the value of the struggle. The bureaucratic structures of the White Australia policy live on in the descriptive hierarchies of the National Archives. To build our wall of faces we had to dismantle these structures – to drill down through series, items, documents and images until we found the people inside. I feel differently about the records because of that. Access can never simply be given; at some level it has to be taken.


 

unremembering_dh2015.025

In 1987 I ended up outside the gates of Pine Gap, a US intelligence facility near Alice Springs, dressed as a kangaroo. Having finished my honours thesis on the British atomic tests, I couldn’t ignore the parallels between the bombs and the bases. I even organised a conference entitled, ‘From Maralinga to Pine Gap: The historical fallout’. I remember pulling over on the road to Alice Springs because there was one point where you could just glimpse the top of one of the white domes that protected Pine Gap’s receivers. It was a pretty thrilling moment.

Now you can just type ‘Pine Gap’ into Google Maps and there it is.

unremembering_dh2015.026

It’s still secret, it’s still gathering unknown quantities of electronic intelligence, but last time I checked it also had 21 reviews and an average rating of 3.6 stars. Keep it in mind for your next Aussie holiday!

unremembering_dh2015.027

Digital tools enable us to see things differently – to demystify the secret, to expose patterns and trends locked up in tables, statistics, or cultural collections.

Mapping Police Violence, for example, displays your chances of being killed by police in the US based on your location. It also presents the photos and details of more than 100 unarmed black people killed by police in 2014.

@CongressEdits is a Twitter bot, created by Ed Summers, that tweets anonymous edits to Wikipedia made within the US Congress. A similar bot exists for Australian state and federal governments.

I love the way that Twitter bots, in particular, can play around with our ideas of context and significance. I’ve created a few myself that automatically tweet content from Trove, and I’m interested in what happens when we mobilise cultural collections and let them loose in the places where people already congregate. Steve Lubar argues that ‘the randomness of the museumbot calls attention to the choices that we take for granted’. Twitter bots can challenge the sense of control and authority that adhere to our collection databases.

But bots can be more. Mark Sample’s important essay on ‘bots of conviction’ explores the possibilities for protest and intervention. He describes protest bots as ‘tactical media’ creating ‘messy moments that destabilise narratives, perspectives, and events’. Wendy Duff and Verne Harris warn archivists of the dangers of the story in disguising the exercise of power, in stealing from individuals what they need to construct their own narratives – ‘space, confusion, [and] a sense of meaninglessness’. Against the brutal logic of the state, a bot’s algorithmic nonsense can help us to see differently, to feel differently.

Caleb McDaniel’s bot @Every3Minutes is an example of how powerful these interventions can be. Working from estimates of the volume of the slave trade in the American South, it tweets a reminder every three minutes – a person was just traded, a child was just bought – often with links to historical sources. Mark Sample notes that ‘it is in the aggregate that a protest bot’s tweets attain power’ and it is through simple, unyielding repetition that @Every3Minutes reaches us. As Alexis Madrigal noted: ‘To follow this bot is to agree to reweave the horrors of slavery into the fabric of your life’.

My own protest bot is trivial by comparison with Mark’s or Caleb’s work. @OperationBot merely assembles random words to create new names for national security operations. It’s a bot born of frustration and fury as the Australian government responded to the plight of asylum seekers by launching ‘Operation Sovereign Borders’. As @OperationBot proudly proclaims, its aim is to ‘protect Australia from meaning’.
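The core of a bot like this can be very small. The following is a sketch of the idea only, not @OperationBot's actual code: the word lists are invented, and the step that posts the result to Twitter is left out.

```python
import random

# Invented word lists; the bot's real vocabulary is not given here.
ADJECTIVES = ["Sovereign", "Resolute", "Enduring", "Vigilant"]
NOUNS = ["Borders", "Shield", "Horizon", "Anvil"]


def operation_name(rng=random):
    """Assemble a random two-word name for a national security operation."""
    return f"Operation {rng.choice(ADJECTIVES)} {rng.choice(NOUNS)}"


print(operation_name())
```

Passing in a seeded `random.Random` instance makes the output repeatable, which is handy for testing before the bot is let loose.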

Perhaps more significant than @OperationBot’s supposed subversions is the fact that I could create it in a couple of hours sitting in front of the TV. Digital skills and tools allow us to try things, to create and experiment, without any expectation of significance or impact.

One of the more controversial sessions at the British Association meeting in Australia in 1914 was devoted to the structure of the atom. Ernest Rutherford reported on experiments that pointed to the now familiar model where the atom’s mass is concentrated in a tiny, central nucleus. Firing charged particles at a thin sheet of gold foil, Rutherford, Geiger and Marsden had expected the particles to pass through largely undeflected. But some bounced back. As Rutherford later noted: ‘It was almost as incredible as if you fired a 15-inch shell at a piece of tissue paper and it came back and hit you’.

I wonder if that’s what we’re doing – firing off experiments into the net, waiting for one to hit something solid and bounce back. PING!

@Every3Minutes – PING!

The Real Face of White Australia – PING!

In 2012 Kate and I received an email from Mayu Kanamori, an artist researching the life of an early Japanese Australian photographer. She described her reaction to the Real Face of White Australia:

When I scrolled down the Faces section of your website, browsing through the faces, tears welled up, and I couldn’t stop crying as if some sort of flood gates had been removed.

We knew that the documents and the images were powerful, but displaying the faces on that seemingly endless scrolling wall did something more than we were expecting.

Jenny Edkins has been exploring the politics of faces, and she suggests that alongside our attempts to ‘read’ portrait photographs we also respond in a more visceral fashion, with reactions such as ‘guilt, obligation, and reciprocity’.[5] She argues that, like the ‘messy moments’ of protest bots, the connections we make through photos of faces can disrupt the ‘linear narrative temporality’ on which sovereign power depends. We are connected through time, not with history, not with the past, but with people. And that has implications.

unremembering_dh2015.036

Last year I tried extracting faces, and eyes within those faces, from photos I’d harvested via Trove’s digitised newspapers. The result was Eyes on the Past. It presents a random selection of eyes, slowly blinking on and off. Clicking on an eye reveals the full face and the source of the image. Where the Real Face of White Australia overwhelms with scale and meaning, Eyes on the Past is minimal and mysterious. Eyes on the Past emphasises absence, and the fragility of our connection with the past, even while it provides a new way of exploring the digitised newspapers. Perhaps the best thing about it is the range of responses it has provoked – from those who found it beautiful, to those who thought it was just creepy.

unremembering_dh2015.038

More recently I’ve been playing around with the possibility of connection, and creepiness, through The Vintage Face Depot. Tweet a photo of yourself to @FaceDepot and a bot will select a face at random from my collection of newspaper images and superimpose that face over yours – tweeting you back the result. It sounds stupid, and it probably is. I’m still waiting for it to go viral like Microsoft’s age detection thing. But sometimes… PING!

One night I started fiddling with the transparency of the superimposed images. All of a sudden I could see the colour of my face showing through. I could see my glasses on this face from the past.
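The effect of adjusting transparency can be illustrated with a simple alpha blend. This is a sketch of the general technique, not the code behind The Vintage Face Depot: images are represented here as plain lists of RGB tuples, and the function names are made up.

```python
def blend_pixel(old, new, alpha):
    """Mix two RGB pixels; alpha is the opacity of the superimposed (old) face."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return tuple(round(alpha * o + (1 - alpha) * n) for o, n in zip(old, new))


def blend_image(old_face, new_face, alpha=0.5):
    """Blend two same-sized images given as rows of RGB tuples."""
    return [
        [blend_pixel(o, n, alpha) for o, n in zip(old_row, new_row)]
        for old_row, new_row in zip(old_face, new_face)
    ]


# A grey newsprint pixel laid over a blue one at 75% opacity.
print(blend_pixel((200, 200, 200), (0, 0, 240), 0.75))  # → (150, 150, 210)
```

At an alpha of 1 only the face from the past is visible. Lowering it lets the colour of the photo underneath show through.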

unremembering_dh2015.039

unremembering_dh2015.040

Experimenting on Kate, I saw the blue of her eyes peering through the eyes of another person. Again, the potential is there to mess around with the barriers that put some people on the other side of this wall we call the past – to explore what Devon Elliot suggested on Twitter was an ‘uncanny temporal valley’.


The Australian historian Greg Dening has argued:

Nothing can be returned to the past. Not life to its dead. Not justice to its victimised. But we take something from the past with our hindsighted clarity. That which we take we can return. We disempower the people of the past when we rob them of their present moments.

There is no open access to the past. There is no key we can enter to recall a life. I do this sort of stuff not because I want to contribute to some form of national memory, but because I want to unsettle what it means to remember – to go beyond the listing of names and the cataloguing of files to develop modes of access that are confusing, challenging, inspiring, uncomfortable and sometimes creepy.

Perhaps my favourite experiments are a couple of simple userscripts. They sit in your browser and change the behaviour of Trove and RecordSearch.


Instead of pulling faces out of documents, they put them back in. Instead of seeing lists of search results, you see the people inside. Like the faces on our wall the people bubble up through the interfaces. They are present.


 

Despite the apparent enthusiasm for the visit of the British Association in 1914, there was in Australia a lingering suspicion of scientists as ‘impractical dreamers’, as mere theorists unwilling to address the nation’s most urgent needs. In debates over the application of knowledge to Australian development, the scientist commonly battled it out against the supposed virtues of the ‘practical man’.

I imagine my grandfather, Henry Sherratt, was a practical man. He was a brass moulder with a workshop in Brunswick, a suburb of Melbourne. His father and brother, both brass workers, lived and laboured nearby. I have a small brass ashtray that Henry made.


Henry’s name isn’t amongst those who joined the British Association in Melbourne, though perhaps he attended one of the ‘Public or Citizens Lectures’ which, until the 1911 meeting, had been known as ‘Lectures for the Operative Classes’. Neither is Henry’s name amongst those who journeyed to the battlefields of Europe and the Middle East. He is not one of those honoured by the Anzac Centenary for having ‘served our country and worn our nation’s uniform’. And yet he went to war.

Henry Sherratt was amongst a select group of tradesmen who travelled to Britain in 1916 to help meet the desperate need for skilled workers in munitions factories. He worked as a foreman brass moulder in Scotland, before an accident in which he ‘strained his heart’ carrying a ladle of molten iron. He never really recovered and as his income suffered, so did his family at home. Henry finally returned in 1919 and was offered £50 compensation with no admission of liability. He died in 1955. I never knew him.

Who do we remember and why?


The Commonwealth Bureau of Census and Statistics reported that 159 people died as a result of industrial accidents in 1914. But these were only the accidents that had been reported under the provisions of state legislation. There must have been more. Where is their memorial? What about mothers who died in childbirth, or victims of domestic violence? How do we remember them?

At a recent workshop organised by Europeana, Lucy Delap described how her project to historicise child sex abuse in 20th century Britain was making use of digitised newspapers. As well as documenting individual cases, the researchers hope to create ‘a map of change over time in the reporting of child sexual abuse’ that would enable them to test theories about how different organisations respond to abuse.

In the week that the British Association met in Melbourne, newspapers tell us that David Phillips, an engine driver, was fatally injured at Flinders Street Station. I’m thinking about how we might use Trove’s digitised newspapers to collect the stories of those who went off to work, but never returned. What might we learn about economic history, unionism, industrial legislation – about the value we place on an individual life?

As I’ve often said in regard to our work on the White Australia records – it just seems too important not to try.

As I was writing this talk I was also keeping an eye on my harvesting scripts which were chugging away, pulling down more images from the National Archives. For the original wall of faces I downloaded about 12,000 images from one series; I’ve now got more than 150,000 from about 10 series. You’ll see more of that soon.

As I was writing this talk I stopped at various times to play around with code – to look at the gender balance at the British Association, to investigate ‘closed’ files in the National Archives, to create a public Face API for anyone to use. The code and apps are all out there now for you to play with or improve.

Writing, making, thinking, playing, sharing. It all happens together. I’m a maker like my grandfather. While he poured metal I cut code. I do it because I want to find ways to connect with people like him, ordinary people living their lives. Those connections will always be fleeting and fragile, lacking the certainty of commemoration, but hopefully bearing some of the meaning and complexity of memory.

It’s a task that needs to be both playful and political. It’s not about making things, but trying to make a difference.

  1. Shurlee Swain, Leonie Sheedy, and Cate O’Neill, ‘Responding to “Forgotten Australians”: historians and the legacy of out-of-home “care”’, Journal of Australian Studies, vol. 36, no. 1, 1 March 2012, pp. 17–28. <doi:10.1080/14443058.2011.646283>
  2. Cassie Findlay, ‘People, records and power: what archives can learn from WikiLeaks’, Archives and Manuscripts, vol. 41, no. 1, 2013, pp. 7–22. <http://www.tandfonline.com/doi/abs/10.1080/01576895.2013.779926>
  3. Geoffrey C. Bowker and Susan Leigh Star, Sorting Things Out: Classification and Its Consequences, MIT Press, 2000.
  4. Wendy M. Duff and Verne Harris, ‘Stories and names: Archival description as narrating records and constructing meanings’, Archival Science, vol. 2, 2002, pp. 263–85.
  5. Jenny Edkins, Face Politics, Routledge, 2015.

Asking better questions: History, Trove and the risks that count

 

Buy the book from NewSouth Press!


This is the published version of my chapter ‘Asking better questions: History, Trove and the risks that count’ in the book CopyFight, edited by Phillipa McGuiness. It’s reproduced here with the permission of the publishers.

You can buy a copy of the book from NewSouth Press or just about any bookstore. You can also download a pdf version of this chapter.

 


 

A few years ago historian Kate Bagnall and I created ‘The real face of White Australia’. In 1901 the Immigration Restriction Act gave legislative force to a system of racial exclusion and control that came to be known as the White Australia Policy. The bureaucratic remnants of this system survive today in the National Archives of Australia. But how can we find them? How can we see them? Our online experiment brings some of these lives, previously deemed out of place in a ‘white’ Australia, to the surface. Instead of documents, files or search results, all you see are faces – a continuous, scrolling wall displaying thousands of faces. It’s compelling, challenging and discomfiting. Some viewers were brought to tears.

The faces come from portrait photographs that were attached to official certificates. If non-white residents wanted to travel overseas, they needed special identity documents. Without them they could be refused entry on their return. They would not be allowed to come home. On the front of each certificate are photographs and basic biographical details, on the back is a palm print. Many thousands of these documents are preserved in the archives.

To extract the faces I reverse-engineered the National Archives’ online database to automatically harvest images of the documents. From just one series or group of records, I downloaded more than 12,000 images. Then I tinkered with some facial detection code until I was able to find and crop out the portraits. I ended up with 7000 faces – just a sample of the archives’ holdings, but enough.

The whole thing was a quick experiment, mostly completed over the space of a weekend, but its influence has been widely felt. The project has been cited in discussions around race, archives, visualisation and serendipitous discovery. It has been assigned for student reflection in university courses around the world and is regularly held up as an example of what the digital humanities has to offer. But whenever I give a talk about it in Australia, one question seems inevitably to arise – what is the copyright status of the images?

In a keynote address to New Zealand’s National Digital Forum in 2011, library technologist Michael Lascarides challenged those of us who work with digital cultural collections to ‘ask better questions’. When confronted with the remains of our racist history, when looking into the eyes of people whose lives were monitored and controlled by the government because of the colour of their skin, why should we feel compelled to consider the technicalities of copyright?

Surely there are better questions to ask?

©©©©©

I describe myself as a cultural data-hacker, but my business hours are currently spent on the other side of the fence, as the manager of Trove at the National Library of Australia. Trove is a discovery service that makes Australian resources easier to find and use. It’s a collection of collections, bringing together the holdings of many libraries, archives, museums, universities, government agencies and more. The most heavily used part of Trove is an ever-growing collection of digitised newspapers – the full text of more than 130 million articles documenting Australian history from 1803 onwards.

It’s hard to measure the cultural impact of Trove’s digitised newspapers. Technologies like optical character recognition (OCR) and keyword searching are now commonplace, but apply them to 150 years of Australian history and something transformative happens. Easy access to historical newspapers is changing our relationship with the past.

It’s not just about convenience – the ability to do your research at home in your pyjamas – although the significance of opening access to rural and remote communities across a large country like Australia shouldn’t be underestimated. It’s also about using the granularity of newspapers to expose the local, the particular, the personal and the ephemeral – glimpses of ordinary lives otherwise unrecorded.

Kate is a historian of Chinese Australia, interested in intimate relationships between Chinese men and white women. The people she studies often lived at the fringes of society and their lives can be difficult to recover. But searching across digitised newspapers she can find shards and fragments, stories of love and loss, full of the vivid, turbulent detail of everyday life. Each shard helps to build a bigger picture, a different view of Australian society and history.

Other researchers have mined the newspapers in pursuit of topics as diverse as invasive species, climate change, poetry and legal history. These uses will multiply as the corpus grows and our tools develop. Like a big telescope or a particle accelerator, digitised newspapers support large-scale fundamental research across a range of disciplines. Old papers have become a site for the creation of new knowledge.

But digitised newspapers are not solely the province of professional researchers. The size and diversity of their content support almost any interest, feed almost any passion. I recently harvested a sample of articles from the web that include links to Trove’s newspapers – 3116 webpages containing 13,389 links. We know that family and local historians make heavy use of the newspapers and their efforts are well represented in my sample. But there was more – sport, war, science, politics, architecture, music, art… from popular entertainment to academic treatises, from hateful diatribes to thoughtful reflections, they were all there.

More surprising than just the range of topics, styles and prejudices is the different ways the newspapers are used online. In 2013, for example, the local media in Western Australia reported on Cockburn City Council’s plan to erect a shark barrier at Coogee Beach. When one councillor expressed doubts, noting the lack of ‘serious or fatal shark attacks at Coogee Beach since records commenced in the 1800s’, a reader could quickly challenge her comments by citing two Trove newspaper articles that documented local attacks. References to the digitised newspapers are embedded online not just within narratives or compilations, but as commentary and debate. Trove provides a ready source of evidence to test historical claims without lengthy research or the mediation of experts.

Easy accessibility is helping to break down the otherness of the past, allowing it to be mobilised in contemporary discussions. New conversations between past and present are emerging around the digitised newspapers. Trove has launched us upon a massive ongoing experiment in collaborative meaning-making.

And it might never have happened.

©©©©©

Years before I was given the job as Trove manager, I was poking and prodding at the interface, trying to extract useful data to use in new applications, and generally making a nuisance of myself. Among the tools I created was QueryPic, a simple way of visualising newspaper searches. It’s been through several versions but the principle remains constant – just feed in your search query and QueryPic will create a line chart showing you the number of newspaper articles per year that match your query.
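The principle can be sketched in a few lines of Python. This is only an illustration of the counting step, not QueryPic's actual code: the `articles` list is hypothetical sample data standing in for harvested search results, and the function name `articles_per_year` is invented for this example. A real tool would more likely ask the search service for yearly totals than scan article text itself.

```python
from collections import Counter


def articles_per_year(articles, query):
    """Count articles per year whose text contains the query (case-insensitive)."""
    query = query.lower()
    counts = Counter(
        article["year"]
        for article in articles
        if query in article["text"].lower()
    )
    # Sort by year so the result can be fed straight to a line chart.
    return sorted(counts.items())


# Hypothetical sample data standing in for harvested search results.
articles = [
    {"year": 1915, "text": "News of the Great War reached Melbourne"},
    {"year": 1915, "text": "The Great War and the wheat harvest"},
    {"year": 1940, "text": "Veterans of the First World War march"},
    {"year": 1941, "text": "Lessons of the Great War recalled"},
]

great_war = articles_per_year(articles, "great war")
```

Each `(year, count)` pair becomes one point on the line; running the function for two phrases gives two lines to compare.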

QueryPic is well used and has even been cited in scholarly articles, but I think its greatest value is as an example of what becomes possible when you make large quantities of cultural content available in digital form. Instead of the normal list of search results, QueryPic shows you trends and patterns. You can observe changes in language, the rise and fall of our cultural obsessions or the impact of major events. When did the ‘Great War’ become the ‘First World War’? Is that jolly Christmas visitor called ‘Father Christmas’ or ‘Santa Claus’? QueryPic helps you see things differently.

One of the most dramatic and unexpected patterns revealed by QueryPic is that Australian history ends in 1954. Who would have guessed? With very few exceptions, Trove’s collection of digitised newspapers comes to an abrupt halt in 1954, when the possibilities of the digital age meet the realities of copyright. You want to trace cultural patterns beyond 31 December 1954? Sorry, you’re out of luck.

Why 1954? We’re currently about halfway through the great AUSFTA culture drought. On 1 January 2005, the Australia–US Free Trade Agreement (AUSFTA) extended the standard period of copyright protection from fifty to seventy years and changed the way photographs are treated. We might have to wait until 2025 before Trove’s newspapers can start edging forward, year by year, beyond 1954.

But it’s even more complicated than that, as there’s no certainty that newspaper articles published before 1955 are out of copyright. To be sure, the National Library would have to investigate any named authors to confirm they all died before 1955. That’s simply impossible in a mass digitisation project. Instead the library weighed the copyright risks against the cultural benefit and decided to proceed. If the library had been more cautious, if the risks or uncertainties had seemed too great, Trove would have no digitised newspapers. And we would all be poorer.

These types of judgements are made all the time by cultural organisations wanting to open online access to their collections. Libraries, archives and museums are full of so-called ‘orphan’ works, whose creators cannot be identified or located. There’s no risk-free way of making this content available online.

But these risks are not only assessed and managed, they’re passed on to users, who must themselves try and navigate the thicket of copyright law. As use of online collections moves beyond traditional forms of citation into new types of digital aggregation, analysis and annotation, the doubts and complexities accumulate. The price of innovation is increased risk. The only safe course is to do nothing.

©©©©©

Why do we put cultural heritage collections online? Is it for the sake of efficiency, preservation, marketing or perhaps an informed citizenry? Usually we fall back on fuzzy notions of ‘engagement’ or ‘access’. More access is good, particularly if we can measure it easily through web stats.

Recently we surveyed Trove users to gain a broad picture of satisfaction and use. One finding in particular keeps me coming into work each day – 90 per cent of our general users agreed with the statement ‘Trove has made me interested in learning and discovering more’. Access can’t simply be measured in collection images or web-page hits. What we’re creating is an enlarged space for reflection, research, learning, creativity and critique. We’re enabling people to do more, with more.

Trove is not alone. Around the world, projects such as Europeana, the Digital Public Library of America (DPLA) and Digital New Zealand all work to open our collected cultural heritage to new forms of use. Europeana, in particular, has drawn on its research into the value and impact of online collections to proclaim a wonderfully ambitious agenda. Their aim is to ‘transform lives’ – to unlock Europe’s cultural heritage, enabling it to act as a ‘catalyst for social and economic change’.

Resisting Europeana’s efforts to ‘transform the world with culture’ are an array of different copyright regimes across Europe. Advocacy on behalf of the very idea of ‘openness’ is crucial to the success of their mission. But it’s never simply a matter of law.

Our cultural collections contain many resources that are already free of copyright restrictions, but it’s not always easy to find them. A lack of clear identification can stymie reuse as effectively as copyright restrictions – it’s not enough to share the resources, organisations also need to share licensing information so that open content can be easily discovered across collections. Sometimes institutional requirements for permission are weighed upon public domain resources, fostering doubt in place of certainty. Wherever copyright lingers, rights statements bloom in astonishing diversity. The DPLA estimates that there are more than 26,000 different rights statements attached to items in their aggregated collection. How are users expected to know what they’re allowed to do?

The complexity of copyright fosters confusion and uncertainty beyond the reach of mere law. It’s not just legislation that has to change.

Recently Dan Cohen, the Executive Director of the DPLA, argued that the licensing of cultural data should address more than just legal and technical issues. Instead of seeking to enforce acknowledgement of the source of the data through licence conditions, the DPLA wants to push discussion of attribution and reuse ‘into the social or ethical realm’ by ‘pairing a permissive license with a strong moral entreaty’. Instead of a statement about legal constraints, we have the opportunity for a conversation about what matters and why.

It might not figure in our web stats, but the National Library has plenty of anecdotal evidence that Trove changes lives – a grandfather’s face glimpsed for the first time, or perhaps a family mystery solved. One man, who grew up in care, found through Trove the only known photograph of himself as a child. How do we weigh such opportunities against questions of ownership and control?

We need to shift the discussion away from the nature of property to the value of use. What do we want to do? Online culture is read–write. We do not simply consume – we share, we remix, we curate and create. Increased participation brings new opportunities for understanding. Do we give them flight or lock them down?

©©©©©

One of the main sources of traffic to Trove, up there with Facebook and Wikipedia, is the knitting site Ravelry. Why? Because Ravelry users have found and shared hundreds of craft patterns from Trove’s digitised newspapers. And not just shared, but made. The most popular pattern, ‘Elegant elephant’, discovered in a 1959 edition of the Australian Women’s Weekly, has been made more than forty times, often with individual embellishments.

This to me seems a great example of the wonderful complexities that the digitisation of cultural heritage collections is introducing into our relationship with the past. A digital version of a fifty-year-old pattern is shared online and spawns a herd of cuddly elephants. From past to present, from digital to physical, the transformations pile one on top of another.

It’s also unexpected. It’s not a use that was designed into the system, it’s a new set of experiences brought to life through the passions and ingenuity of Trove users.

The application of digital tools to large cultural collections also promises to surprise. Techniques drawn from computational linguistics, for example, are being used to analyse the spread of ideas through nineteenth-century newspapers. Another project is using computer vision to identify poems in newspapers through their distinctive shapes. New structures can be found and visualised within digitised sources. New questions can be asked.

But these techniques also carry new risks. Researchers in the digital humanities, exploring the application of technology to fields such as literature and history, have been prominent in discussions around the copyright implications of mass digitisation projects and the legal status of text-mining. Increased clarity around concepts such as ‘transformative use’ is necessary to ensure that researchers have access to data and the confidence to explore.

And yet the ultimate goal isn’t certainty, it’s a greater awareness of the constraints around our engagement with the past. Within both government and the cultural sector the value of ‘open’ data is rightly proclaimed, but open data is always, to some extent, closed. Categories have been assigned, formats have been cleaned, decisions made about what belongs and what doesn’t – every spreadsheet contains an argument.

Each elephant is different. The tensions imposed by our overly complex system of copyright do at least remind us that the past can never be ‘open’, its limits cannot be legislated, its boundaries cannot be fixed. Beneath questions of access and use are better questions about our responsibilities to the past.

©©©©©

I’ll admit that part of my discomfort in being questioned about the copyright status of ‘The real face of White Australia’ stemmed from my ignorance. I just didn’t know the answer.

I’m still not sure.

I do know now that photographs taken before 1955 are okay. But these were attached to official forms, so perhaps the government owns the copyright. Are they published? I suspect the only way to be certain would be to seek permission from the current government department with responsibility for … what exactly? The White Australia Policy lives on, its workings preserved within the archives.

I’ll admit too that I always thought it would be interesting if some part of the government challenged our use of the documents. What exactly would they be claiming ownership of?

‘The real face of White Australia’ was motivated by a strong sense of responsibility towards those people whose lives are glimpsed through the records. To me the question of responsibility still seems more important than the intricacies of ownership. Our debts are to the people who confront us with their gaze, who defy the legislation that told them they did not belong.

Copyright law will never be able to make these judgements for us. No system can predict the individual ethical calculations that shape our engagement with the past. We may always be confronted with risks.

The word ‘access’ itself is full of politics. To what? By whom? When it comes to our cultural heritage we should never be satisfied. We must ask about the silences and the gaps. We must challenge the definitions. Access can never simply be given, to some extent it has to be taken. In the struggle we will find meaning.

There must always be risks. The point is to make the risks count.

My two lives

My blog hasn’t quite caught up with the fact that I now have two jobs to go along with my two lives. Monday to Wednesday I’m still part of the Trove management team at the National Library of Australia, but on Thursdays and Fridays I’m Associate Professor of Digital Heritage in the Faculty of Arts and Design at the University of Canberra.


I love working with the Trove team, but I also want to keep contributing to the development of the digital humanities in Australia through my own teaching and research. Hopefully now I can do both.

At the University of Canberra I’ll be helping to develop new digital heritage offerings in undergraduate, postgraduate and professional development courses. Exciting times ahead!

On the research front I’m hoping to reinvigorate a few stalled projects and poke around some more amidst the possibilities and politics of digital cultural collections.

These are the themes I’m thinking about at the moment.

Digital diversity

Using digital tools to expose alternative voices and experiences from within the cultural record.

  • Invisible Australians — yes it’s time to give this important project a home and kick it into gear
  • Every life counts — this is the working title for a new project around workplace deaths
  • I also want to expand on some of the questions in Seams and edges

Access, impact and understanding

How digitisation projects change our relationship with the past.

Data-enriched narratives

Developing new forms of online publication that use Linked Open Data to integrate historical writing and cultural collections.

All that in two days a week — wish me luck!

And if you’re interested in collaborating, please get in touch.

Stories for machines, data for humans

Presented at the New Factual Storytelling symposium, 10 April 2015, University of Canberra

I feel like the nerdy kid at the cool kids’ party.

There are lots of interesting and creative projects on show today and I… well I want to talk to you about metadata.

Data. According to some pundits it’s the new oil, or the new electricity. Fuel for economic development — a raw material ready to be ‘mined’ for insights, innovation and our purchasing preferences.

In the cultural heritage sector the data metaphors are more likely to be framed around liberation than exploitation. Our data wants to be ‘open’. But there’s still a tendency to think on an industrial scale — it’s about pumping out large datasets for potential re-use.

What can be lost in metaphors of extraction and scale is an appreciation of the human origins of data. We are not buoys bobbing in the ocean reporting on the heights of passing waves. Big data is made up of many small acts of living.

So today I want to talk about small-scale, free-range, artisanal data. I want to talk about data, alongside storytelling, as the product of creativity, imagination, frustration and fury.

Let’s think for a moment about the work of a historian – identifying actors, defining relationships, documenting the complex networks that bring together people, places and events over time. It’s painstaking, exhilarating and potentially soul-destroying work. It’s also an exercise in data modelling. Whether the results are preserved in a triplestore, a spreadsheet, or in a drawer full of index cards – it’s nodes and edges, it’s entities and relationships, it’s data.

And that’s ok. Making data doesn’t condemn you to a rigidly empirical, deterministic framework. There’s always room for nuance, interpretation and doubt. There’s always room for stories.

But what happens when historians undertake the oddly-named process of ‘writing up’? The complex data models are flattened down to a series of sentences neatly arranged in linear sequence – our things become strings. The data is squeezed out and discarded, glimpsed only as fragile echoes hiding in footnotes.

This is of course part of the skill of historical writing — the ability to represent complex relationships through narrative. But why can’t we have our stories and data too?

This is a question I’ve returned to a number of times over the last few years.

It’s come up because I get excited about Linked Open Data’s potential to deliver structured, machine-readable information via the web. But then I wonder, whose stories will we be telling to the machines? How can we explore the expressive possibilities of Linked Open Data and not be constrained by instrumentalist assumptions about the models we make?

It’s come up because I get excited about embedding cultural heritage collections within the passion and practice of everyday life. Why squeeze out the data from historical publications when every article could be an online exhibition, every book could be a digital portal, every footnote could be a link for exploration and aggregation?

I don’t really have answers, but I do make stuff. I’ve had a few goes at trying to create narratives that embed Linked Open Data.

They’re not very exciting from a design point of view, but I keep coming back to them because there still seems to be a lack of alternatives. There’s lots of talk about publishing Linked Open Data, but much less about how the use and consumption of Linked Open Data can be built into creative practice.

So here I am again.

This exercise has a number of constraints built in. The main one is NO PLATFORMS — a historian using a series of simple tools should be able to create and publish a data-driven web page without any dependencies. It should be as simple as uploading an html page to a server.

In my idealised workflow, the historian would manage their data about people, places, events and resources in a simple database capable of exporting a flavour of Linked Open Data known as JSON-LD.
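To give a sense of what such an export might look like, here is a minimal sketch in Python. The vocabulary (schema.org), the identifiers under example.org and the choice of fields are all assumptions made for illustration; they are not taken from the demo's actual data or database.

```python
import json


def person_to_jsonld(person_id, name, same_as=None):
    """Build a JSON-LD description of a person as a plain dictionary."""
    doc = {
        "@context": "https://schema.org",  # assumed vocabulary
        "@id": person_id,
        "@type": "Person",
        "name": name,
    }
    if same_as:
        # Links to the same person in other sources.
        doc["sameAs"] = list(same_as)
    return doc


person = person_to_jsonld(
    "https://example.org/people/inigo-jones",  # hypothetical identifier
    "Inigo Jones",
    same_as=["https://example.org/external/inigo-jones"],  # placeholder URL
)
exported = json.dumps(person, indent=2)
```

The point of the format is that `exported` is ordinary JSON, so it can be pasted into a web page as it is, while the `@id` and `@context` keys let a machine read it as linked data.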


Then, having created their narrative, they’d mark it up in the tool of their choice to relate specific names or phrases in the text to the entities in their database.
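The markup step could be as simple as wrapping each known name in an element that carries its identifier. The sketch below is one assumed way of doing that, not the method used in the demo: the `data-entity` attribute, the `mark_up` function and the example.org identifier are all hypothetical, and it matches exact names automatically where a historian might prefer to mark up each reference by hand.

```python
import html
import re


def mark_up(text, entities):
    """Wrap each known name in a span that carries its entity identifier."""
    if not entities:
        return html.escape(text)
    # Longest names first, so a full name wins over a shorter overlapping one.
    names = sorted(entities, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(name) for name in names))
    parts = []
    position = 0
    for match in pattern.finditer(text):
        # Escape the plain text between matches so the output is safe HTML.
        parts.append(html.escape(text[position:match.start()]))
        name = match.group(0)
        parts.append(
            '<span data-entity="{}">{}</span>'.format(
                html.escape(entities[name]), html.escape(name)
            )
        )
        position = match.end()
    parts.append(html.escape(text[position:]))
    return "".join(parts)


entities = {"Inigo Jones": "https://example.org/people/inigo-jones"}
marked = mark_up("Explore the relationships for Inigo Jones.", entities)
```

Because the identifier travels with the text, a script in the page can later look up the matching record whenever that span scrolls into view.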


Then they’d just drop the text and the data into a html page and whack it on the web. With a bit of javascript magic to activate the data, you’d have something like this.


The demo is live (though still under construction), so have a play.

  • Scroll the text to see those carefully inserted identifiers create pop ups in the sidebar.
  • The text has itself become data, each paragraph is an object — try filtering the text or linking to an individual paragraph.
  • Browse all the people, or resources. Explore all the relationships for Inigo Jones.
  • Mapping to existing identifiers from sources like Trove and Wikipedia helps put the ‘linked’ into Linked Open Data.
  • There’s a rather boring map, a timeline and a wall view. New views could be easily added by dropping in some extra javascript.
  • It’s all data, so other visualisations and analyses might be created on the fly.

That’s what humans see, but what about machines? All the carefully curated data is exposed in a machine-readable form. Lots of triples…
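For readers who haven't met triples before, the idea is that every statement is reduced to a subject, a predicate and an object. The sketch below flattens one JSON-LD-style record into that shape. It is a deliberate simplification and not a conforming JSON-LD processor: it ignores the context, doesn't expand property names into full URIs, and assumes values are plain strings or lists of strings. The record itself is hypothetical.

```python
def to_triples(node):
    """Flatten one JSON-LD-style node into (subject, predicate, object) triples."""
    subject = node["@id"]
    triples = []
    for key, value in node.items():
        if key in ("@id", "@context"):
            continue
        predicate = "rdf:type" if key == "@type" else key
        # Treat single values and lists alike: one triple per value.
        values = value if isinstance(value, list) else [value]
        for item in values:
            triples.append((subject, predicate, item))
    return triples


# Hypothetical record; identifiers and links are placeholders.
node = {
    "@context": "https://schema.org",
    "@id": "https://example.org/people/inigo-jones",
    "@type": "Person",
    "name": "Inigo Jones",
    "sameAs": [
        "https://example.org/external/a",
        "https://example.org/external/b",
    ],
}
triples = to_triples(node)
```

One small record about one person yields four statements here, which is why even a modest narrative produces lots of triples.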


The code for the viewer and maker is all available if anyone wants to play with it, and I’m intending this year to develop two substantial monographs using these tools. Both have many links into cultural collections.

My aim here is not to develop a fully-operational publishing system. I just want to get a better idea of what’s useful, what’s interesting, what’s possible. To think beyond the current limits of scholarly publishing into a world where data and narrative can live together, where interpretative work is represented in all its data-inflected glory.

Myths, mega-projects and making

Keynote presented by video at EuropeanaTech 2015, 13 February 2015.

The video of this presentation is available on Vimeo and the slides are on SlideShare.


 

In the aftermath of World War II, Australian hopes for a new era of national progress were expressed in a massive engineering project called the Snowy Mountains Scheme. The project promised new reserves of water and electricity to power development of Australia’s inland.

‘The Snowy Mountains Scheme’, Canberra Times, 26 December 1975, p. 5. <http://nla.gov.au/nla.news-article102193928>

Rivers were diverted, towns were relocated, and new reservoirs were created. Over 145km of tunnels were carved through the granite peaks of Australia’s Great Dividing Range. Finally completed in the 1970s, the Snowy Mountains Scheme was an engineering marvel.

http://www.zenlan.com/collage/trove/#snowy scheme

But this symbol of national pride would not have been possible without the labours of thousands of ‘New Australians’ drawn from across Europe. Some were recruited because of their skills, others were plucked from displaced persons camps and offered the chance of a new life — as long as they were prepared to work where the Australian government wanted them to.

‘AT WORK ON SNOWY MOUNTAINS SCHEME’, Sydney Morning Herald, 8 January 1954, p. 5. <http://nla.gov.au/nla.news-article18403620>

The human and environmental costs of the project are still debated, but the Snowy Scheme is regularly invoked as the country’s prime nation-building project — an example of what can be achieved together through vision, leadership, and toil.

Why am I talking about this today?

Well, I suppose it’s a great chance to say ‘Hey Europe, thanks for all the people!’.

But it’s also because I wanted to highlight the mythic qualities of the mega-project — the cultural power that resides in the ‘big idea’ that promises to set us upon a path towards the future.

We are here today because we are embarked upon ambitious undertakings.

Our projects aim to reshape the cultural landscape. We are building pipelines and reservoirs — moving massive amounts of data across countries, across the world.

But as we’ve tried to show, these large scale efforts are only possible because of many smaller, local collaborations.

The Snowy Scheme was built by individuals fleeing the disruptions of war. They took a risk in the hope of something better. It’s important for us to reflect on the contributions and motivations of our communities, our partners, and our users. A big idea isn’t enough.

You probably all know that as well as metadata from libraries, museums, archives and universities, Trove provides access to almost 150 million full-text digitised newspaper articles. The OCR’d text of the articles is fully searchable, but suffers from the usual errors and inaccuracies.

Fortunately Trove users have been eager to help. Anyone can jump in and correct the OCR output, and they do. More than 150 million lines of text have been corrected so far. Our top corrector (yes we have a scoreboard) has corrected more than 3 million lines of text.

Recently I’ve been thinking about this work and the limitations of language around online engagement. Our correctors are more than ‘users’ — ‘contributors’ perhaps, or ‘volunteers’?

But all of these words seem to place correctors on the other side of the interface — as clients rather than builders.

Each correction is a tweak of our search index. It changes the way the backend functions, increasing the efficiency of the system by getting people to the things they’re interested in more quickly.

Perhaps we should call our correctors ‘discovery engineers’?

The mythic mega-project maintains a sense of otherness — it is exceptional, an achievement above and beyond the realms of ordinary experience. But this obscures the many small acts of commitment and cooperation that make it possible. These are the expressions of ordinary lives, the routine and repetitive alongside moments of passion and meaning.

The success of our projects will ultimately depend not on the speed of our servers or the cleanliness of our code, but on the interactions that emerge as our aggregations become part of the simple business of living.

People correct our newspapers for many reasons, but few of these motivations are likely to align with our own strategic objectives.

It’s not just corrections either. More than 80,000 comments and 3 million tags have been added to resources in Trove. These are just plain text tags; we make no effort to control their content. This creates some interesting possibilities.

http://trove.nla.gov.au/tag

I wonder if you can guess the meaning of our most heavily-used tag? It’s ‘LRRSA’ and it’s attached to more than 16,000 items.

Any ideas? It’s an acronym that stands for the Light Railway Research Society of Australia. Members of the society use the tag to share material of interest — it’s become a means of collaboration.

Another popular tag is ‘TBD’ or ‘To be done’. This one’s used by text correctors to manage their own workflows.

The numerous guises of a simple tag illustrate the value of ‘underspecified tools’ — of leaving functionality open to ad hoc elaboration. The boundaries between systems and their use are fluid. Tagging behaviour can extend system functionality.

From machine to human and back again, the limits of what is possible are open to negotiation and change.

My favourite example of this is the work of one man who has been identifying out-of-copyright sheet music in Trove. He’s not a musician but he uses his computer to create performances of the pieces. He then uploads the performances to YouTube or SoundCloud and adds a link to them in a comment on Trove. People who find these works on Trove can now just click to hear them. The functionality of the system has been extended without a single line of code being written.

But the permeability of these boundaries means we can’t take the roles of people and machines as given. Five years ago, crowdsourced text-correction was a cost-effective solution to the vagaries of OCR, but as the technology improves do we continue to ask humans to undertake tasks that a machine might do more easily? Do we continue to ask our volunteers to change every instance of ‘tbe’ to ‘the’?
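
The ‘tbe’ to ‘the’ case is the sort of thing a few lines of code can handle. The sketch below is a hypothetical rule-based pass, not anything Trove actually runs: the table of misreadings is an assumption, and a blind substitution like this knows nothing about context, which is exactly where human correctors still matter.

```python
import re

# Hypothetical table of common OCR misreadings and their corrections.
OCR_FIXES = {"tbe": "the", "aud": "and"}

# Match any listed misreading, but only as a whole word.
PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, OCR_FIXES)) + r")\b")


def correct_line(line):
    """Replace whole-word OCR misreadings using the lookup table."""
    return PATTERN.sub(lambda match: OCR_FIXES[match.group(1)], line)


print(correct_line("tbe cat sat on tbe mat aud slept"))
```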

While an astonishing 150 million lines of text have been corrected, more than 96% of articles have no corrections at all. More articles are being added all the time and it seems the rate of corrections might be flattening out. The task seems beyond humans alone.

We’re currently redeveloping our newspapers interface, making it more responsive, adding shiny new browse features, and improving the overall performance. We’ll also be introducing some tools for ‘advanced’ text correction, allowing our users to modify not only the text, but some of the structural elements of the OCR — inserting new lines for example.

As we investigate opportunities for enrichment of our metadata, I think we’ll also need to think about the work we offer our discovery engineers. Correction could extend to geocoded placenames; named entity extraction could be integrated with user-defined relationships.

This technosocial shift is also evident at the other end of our pipelines, when our aggregated data is consumed and transformed.

Except APIs are not really pipelines, are they? You don’t just turn on a tap, you have to ask the API a question. Our questions interact with the content of the reservoir to shape and colour the flow of data.

An API is a tool for transformation.
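
To make ‘asking the API a question’ concrete, here is a minimal sketch that assembles a search request as a URL without sending it. The endpoint and the parameter names (`key`, `zone`, `q`, `encoding`) are assumptions modelled on version 2 of the Trove API and may not match the current service, so check the documentation before relying on them; `build_query` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Assumed endpoint, modelled on version 2 of the Trove API.
API_URL = "https://api.trove.nla.gov.au/v2/result"


def build_query(question, zone="newspaper", api_key="YOUR_API_KEY"):
    """Assemble a search URL; every parameter shapes what flows back."""
    params = {
        "key": api_key,      # placeholder, not a real key
        "zone": zone,        # which part of the reservoir to draw from
        "q": question,       # the question itself
        "encoding": "json",  # the form we want the answer in
    }
    return API_URL + "?" + urlencode(params)


url = build_query('"snowy mountains scheme"')
print(url)
```

Change the zone or the question and a different slice of the aggregation comes back, which is the sense in which the question shapes the flow.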

New tools and interfaces explicitly change the nature of our aggregations by carrying their use into different realms, by shifting contexts, by asking new questions. Each new use changes how we see the whole.

This is not reuse or recycling — this is remaking. We can dig the tunnels and fill the reservoirs, but it’s up to you — the coders, the builders, the developers and the makers — to show us what we’ve created.

The big challenge is to open up this transformative power to those who have no idea what an API is — people who have important and powerful questions to ask our APIs, but don’t know the language.

We need to make sure that the myth of the mega-project doesn’t blind us to the human dimensions of our undertaking. Let’s foster interventions as well as innovations, activists as well as evangelists. Let’s make sure our big ideas make space for other ideas to erupt and grow.

Seams and edges: Dreams of aggregation, access & discovery in a broken world

Presented at ALIA Online 2015, 3 February 2015 in Sydney. A longer version with bonus references will be made available on the ALIA Online site. Slides are on SlideShare.


 

In March 1930 the Sydney Electrical and Radio Exhibition opened in a blaze of excitement. Aboard his yacht in Genoa, inventor Guglielmo Marconi triggered a radio signal that reached across the world and switched on more than 2800 electric lights at the Sydney Town Hall. ‘All in less than a second!’, exclaimed the Sydney Mail, ‘Here was magic!’.

‘When Marconi Switched on the Lights The Sydney Electrical and Radio Exhibition’, Sydney Mail (NSW), 2 April 1930, p. 20. <http://nla.gov.au/nla.news-article160633081>

According to the Sydney Morning Herald, radio had ‘eliminated time and distance’.

About a month later the British and Australian Prime Ministers spoke for the first time via wireless telephone. ‘These were days for the annihilation of time and space’, the British PM proclaimed.

Sounds familiar?

From railways to the telegraph, radio, and the internet, the progress of technology has often been imagined as a battle against time and space. Progress has been measured in the seconds we save, in the distances we conquer, in the barriers of terrain and politics we bridge.

Remember when we used to talk about the ‘Information superhighway’?

In the realm of information this march of conquest is accompanied by discussions of speed and scale, by adjectives such as ‘instantaneous’ and ‘seamless’.

And you don’t have to look too hard to find software and service vendors touting the promise of ‘seamless discovery’. Indeed, it turns out that ‘Seamless Discovery’ itself is the registered trademark of a video discovery platform used by Foxtel and others.

Technology promises instant access to information — a future beyond silos.

In the library world, seamless discovery is commonly associated with what are variously called ‘next-generation catalogues’, ‘web-scale discovery services’ or ‘discovery layers’.[1]

The idea is familiar and seductive. Instead of forcing searchers to construct multiple queries across a variety of databases, systems and interfaces, these services aggregate metadata from different sources and offer access through a single search portal.

A seam-free service is one that maximises ease-of-use.

We all know what such services look like, even if we’ve never used one. Search is no longer just a task to be accomplished in pursuit of a particular goal — to find a desired resource or piece of information.

Google has played a central role in re-engineering our understanding and expectations of online experience. Ours is increasingly a ‘culture of search’ where the technologies of discovery have become part of everyday life.[2]

It’s natural then that users of other discovery services will approach them with a set of expectations shaped by the Googlisation of modern culture.

It’s not just the simplicity of that single search box, it’s our faith that search will just work.

Every time Google responds to our query about some obscure piece of television trivia with 152 million results, we cannot fail to be impressed by the power at our fingertips. Every time Google predicts our query or customises our results we are beset with awe.

Here is magic.

Google’s dominance gives it immense power in presenting to us an image of the world constructed to its own secret formula. This power bears ontological weight — if we can’t find something on Google does it exist?

Of course we all want to make life as easy as possible for the people who use our services. The question is how the pursuit of a Google-like experience constrains our options and assumptions.

Metaphors matter. Pursuing ‘seamless discovery’ in the wake of Google means engaging with questions of politics and power.

Seams are not simply obstacles to a smooth user experience, they’re reminders that our online services are themselves constructed. There’s nothing natural or inevitable about a list of search results.

Mark Weiser, one of the pioneers of ubiquitous computing, argued against seamlessness because it made everything seem the same. Instead he imagined systems with ‘beautiful seams’ — that empowered users to manipulate their contexts and connections.[3]

As Mitchell Whitelaw notes, ‘seamfulness is also an ethical and political stance’ — it’s a commitment to exposing the interpretative distance between our collection data and its online representation.[4] There are opportunities here not only for transparency, but to explore alternatives to Google’s template for discovery.

Trove Mosaic by Mitchell Whitelaw. <http://mtchl.net/trovemosaic/>

Research into the visualisation of large cultural heritage collections has emphasised that search is only one way of representing a collection.

By focusing on the stylish minimalism of the search box, we discard opportunities for traversing relationships, for fostering serendipity, for seeing the big picture.

By creating experimental interfaces, by playing around with our expectations, we can start to think differently — to develop new metaphors for our online experience that are not framed around technological conquest.

Eyes on the past. <http://eyespast.herokuapp.com/>

My own Eyes on the past, which allows you to find your way into Trove’s digitised newspapers through machine recognised faces and eyes, is far from a practical discovery tool. But building on my earlier work using facial detection technology as a means of archival intervention, it opens up questions about the lives embedded within our collections — we see them differently, we feel differently.

A Google-like search experience offers utility at the expense of critique. Its technologies are black boxed, its assumptions obscured.

How can those of us in the discovery business create a buffer for critical reflection while still meeting user expectations? What can we do in a service such as Trove that supports many thousands of enquiries a day?

I’d suggest we start with an acknowledgement of our limits, an attempt to trace the edges and the fractures that are too often glossed over in our pursuit of seamlessness. Let’s start by admitting what Trove is not:

  1. Trove is not perfect
  2. Trove is not everything
  3. Trove is not a machine

Trove is not perfect

Trove is an aggregator. It pulls together metadata from a variety of different sources, applies some normalisation across the required fields, and sends the results off to be indexed.

With close to 400 million resources harvested from hundreds of contributors through an assortment of different pipelines, it’s inevitable that there will be errors and oddities.

If you want to see errors, of course, you can head along to Trove’s newspapers zone where the limitations of Optical Character Recognition are on display for all to see. Unlike some full-text databases, Trove exposes the raw output of its OCR processing.

Trove’s transcriptions are improving all the time thanks to the efforts of thousands of online volunteers who correct the raw OCR output. Astonishingly, more than 130 million lines of text have been corrected by Trove users, in what is rightly touted as a highly successful crowdsourcing initiative.

But it’s also important to put this effort in perspective. Enter ‘has:corrections’ into the Trove search box to retrieve all the newspaper articles that have at least one crowdsourced correction. At the time I wrote this, the figure was 5,273,600 or just 3.6% of the total number of newspaper articles in Trove. Despite their important efforts, Trove’s volunteers will never be able to produce a perfect rendering of the newspaper content.
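
The two figures quoted above are enough for a quick back-of-the-envelope check of the scale involved. This is just arithmetic on the numbers in the text; because the 3.6% is rounded, the implied total is approximate.

```python
corrected_articles = 5_273_600  # articles with at least one correction (from the text)
share_corrected = 0.036         # 3.6% of all newspaper articles (from the text)

# Work backwards from the share to the size of the whole collection.
total_articles = corrected_articles / share_corrected
uncorrected_articles = total_articles - corrected_articles

print(f"Implied total: about {total_articles / 1e6:.0f} million articles")
print(f"Never corrected: about {uncorrected_articles / 1e6:.0f} million articles")
```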

But what is ‘perfection’ anyway? OCR accuracy is important only in so far as it supports the interests and activities of users. For the purposes of discovery the accuracy of common search terms such as names, places or events is likely to be most important. But if you’re undertaking an analysis of changes in language across time, a much broader range of words would be significant.

Accuracy is something that needs to be assessed and understood within the context of a specific activity.

Services like Trove have to be prepared to expose configurations, assumptions and limitations so that users can understand the impact of these on their own research.

If we are developing resources to support the creation of new knowledge we cannot simply black box our tech and trade on trust.

That’s Google’s game.

QueryPic is a simple tool that visualises search results in the Trove newspapers zone. QueryPic lets you see patterns and trends across the whole database.

When did the ‘Great War’ become the ‘First World War’? QueryPic can be used to explore this shift in terminology, but if you examine the results closely you’ll notice a small bump in the graph indicating that the term ‘World War I’ was being used during World War I. Huh?

When did the ‘Great War’ become the ‘First World War’? <http://dhistory.org/querypic/43/>

If you drill down through the results you’ll find that this is because Trove users have been busily adding the tag ‘World War I’ to selected articles, and by default Trove searches user tags and comments as well as article text. The bump is an artefact of Trove’s search configuration.
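
A toy model shows how that artefact arises. The records below are invented, not Trove data, and `hits_by_year` is a hypothetical stand-in for a search index. It simply counts matching articles per year, with and without user tags included in the search.

```python
from collections import Counter

# Invented records standing in for newspaper articles; not real Trove data.
ARTICLES = [
    {"year": 1915, "text": "news from the great war", "tags": ["World War I"]},
    {"year": 1915, "text": "the great war continues", "tags": []},
    {"year": 1940, "text": "veterans of world war i gather", "tags": []},
]


def hits_by_year(articles, phrase, include_tags=True):
    """Count articles per year whose text (and optionally tags) contain the phrase."""
    phrase = phrase.lower()
    counts = Counter()
    for article in articles:
        fields = [article["text"]]
        if include_tags:
            fields.extend(article["tags"])
        if any(phrase in field.lower() for field in fields):
            counts[article["year"]] += 1
    return dict(counts)


print(hits_by_year(ARTICLES, "world war i"))
print(hits_by_year(ARTICLES, "world war i", include_tags=False))
```

With tags included, 1915 picks up a hit even though no article from that year uses the phrase in its text; with tags excluded, the bump disappears. The naive substring match would also catch ‘World War II’, another reminder that search settings shape the graph.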

Trove’s primary function is discovery — to make it as easy as possible for people to find things they’re interested in. But the sort of fuzziness that supports discovery works against other forms of analysis. We should make these sorts of assumptions more obvious.

By showing our seams, exposing our imperfections, we have the opportunity to educate. As well as helping people use Trove, we can open up bigger questions about the way search works on the web.

Trove is not everything

There’s nothing natural about our cultural collections or their digital representations — they have been created by many acts of selection, neglect, vision, accident and planning.

If you graph the number of newspaper articles in Trove by state and year you’ll notice a rather dramatic spike around 1914.

Newspaper articles in Trove by state and year. <https://plot.ly/~wragge/22/trove-newspaper-articles-by-state/>

Why? Were more newspapers printed during the war era? The answer is simply funding. As part of the Australian Newspaper Digitisation Program, the NSW and Victorian State Libraries have chosen to invest in the digitisation of newspapers from the World War I period.

The contents of Trove’s newspaper zone, like any online collection, are constructed — shaped by many competing priorities. The consequences of this process are not always obvious.

In a competition for resources what gets digitised and why? There’s a danger that the sheer scale of aggregation services like Trove will reinforce existing prejudices. People already struggling for visibility and recognition within our cultural record might be lost amidst the overwhelming numbers of the safe and the sanctioned.

If we are concerned with absence as well as inclusion, with addressing the silences within our cultural record, we need to be wary of sharing in Google’s aura of completeness. The ontological weight of search can too easily equate absence with non-existence.

But aggregation also offers new opportunities for analysis. Questions of representation and diversity can be explored through the metadata itself.

By way of a quick example, I used the Trove API to easily compare the languages spoken at home in Australia, according to the 2011 Census, with the languages of resources in Trove’s book zone.

Languages of Trove books compared to languages spoken at home in Australia (from 2011 Census).

It’s fascinating to consider how we might use socio-economic data to slice our cultural collections across the grain to reveal different patterns of access and exclusion.

By admitting the constructed nature of our collections, the gaps and the silences as well as their strengths, perhaps aggregations like Trove can become sites of both analysis and activism.

Trove is not a machine

Trove is not a single application, it’s a complex system with multiple components. This size and complexity focuses our attention on the technology — on the lines of code and racks of servers. But the system only exists to support human creativity and cooperation. Is it a machine, a community, or something else?

I often talk about Trove as a platform — it can be built upon in many ways, both through code and collaborations. In particular, by providing an open API, Trove invites the public to create new tools, analyses and interfaces.

But there are metaphorical dangers lurking here as well. Social media services such as Facebook and YouTube also describe themselves as platforms.[5]

If we are to embrace the ‘platform’ metaphor we must also be ready to unpack its implications. If we want progressive platforms we need to honestly address issues of openness, participation, and accessibility. Every API is an argument and no data is ever truly ‘open’.

For me the term ‘platform’ speaks of something unfinished — an invitation and an opportunity. Trove is permanently under construction, constantly improved through the labours of its developers and community.

This is most evident in the work of Trove’s text correctors, whose many small acts of repair help the technology to function more efficiently. But each tag or comment also changes Trove — aiding discovery, adding context, or creating new connections.

Other Trove-building activity is less visible, and the responsibilities more distributed. For example, Trove is currently working with Victorian Collections to bring many small, local collections from across Victoria into Trove.

But this collaboration is itself built on the labours of many people over many years — from the Museums Australia staff who train community groups, to the local volunteers who painstakingly digitise and describe their collections. Trove helps bring these efforts to the attention of the web, and is itself enriched.

For all the new terms we have for systems and devices we have thus far failed to find a language to describe online collaboration and social engagement. Instead we fall back on the awful term ‘user’.[6]

By drawing attention away from ‘the machine’ to the many small acts that sustain and enlarge a service such as Trove, we create a space where language might evolve.

Broken worlds

Most technological futures are ultimately alienating and disempowering — people are cast as the passive consumers of the latest wonders and gadgets.

Instead of ‘progress’, Steven J. Jackson presents a vision of a fundamentally broken technosocial world, barely held together by numerous acts of concern, appropriation and repair.[7] This focus on ‘repair’ helps us see the human agency at work, the possibilities for change.

What might happen if instead of seeing the seams and edges of our information landscape as speed bumps in the onward march of progress we recognised their fragility, and celebrated them as sites of collaboration, negotiation and repair?

What might we discover then?


 

  1. Joshua Barton and Lucas Mak, ‘Old Hopes, New Possibilities: Next-Generation Catalogues and the Centralization of Access’, Library Trends, vol. 61, no. 1, 2012, pp. 83–106. <http://muse.jhu.edu/journals/library_trends/v061/61.1.barton.html>
  2. Ken Hillis, Michael Petit, and Kylie Jarrett, Google and the Culture of Search, Routledge, 2013.
  3. Quoted in Matthew Chalmers and Ian MacColl, ‘Seamful and seamless design in ubiquitous computing’, in Workshop At the Crossroads: The Interaction of HCI and Systems Issues in UbiComp, 2003.
  4. Mitchell Whitelaw, ‘Representing Digital Collections’, in Performing Digital: Multiple Perspectives on a Living Archive, ed. David Carlin and Laurene Vaughan, Ashgate Publishing, Farnham, UK, 2014.
  5. Tarleton L. Gillespie, ‘The Politics of “Platforms”’, New Media & Society, vol. 12, no. 3, 1 May 2010. <http://papers.ssrn.com/abstract=1601487>
  6. Peter Lyman, ‘Information Superhighways, Virtual Communities and Digital Libraries: Information society metaphors as political rhetoric’, in Technological Visions: The Hopes and Fears that Shape New Technologies, ed. Marita Sturken, Douglas Thomas, and Sandra J Ball Rokeach, Temple University Press, Philadelphia, 2004, pp. 201–218.
  7. Steven J. Jackson, ‘Rethinking repair’, Media meets technology, MIT Press, 2013.

2014 — the making and the talking

This is my now traditional ‘what I done this year’ post, which, if nothing else, makes me check that my various experiments are still alive. It’s been a challenging year trying to balance my work as Trove Manager with my broader passions and responsibilities as a member of the digital humanities community. So yeah. My personal highlights included heading to Japan to give a keynote at the annual conference of the Japanese Association for Digital Humanities, building Eyes on the past, and resurrecting THATCamp Canberra.

2015 is shaping up as both exciting and scary. On the scary front there’s the whole giving a keynote to hundreds of the world’s leading digital humanities scholars at DH2015 thing (cue imposter syndrome). There’ll also be the launch of Copyfight from NewSouth Publishing, which includes contributions from me and some really-real writers. I’m looking forward to squeezing in some more work on Invisible Australians and a few other research projects. Stay tuned.

The making

Inserting usual disclaimer here that this is not what I get paid to do as Trove Manager. These are projects and experiments I undertake in my own time, for my own reasons, at the cost of my own sanity. So all the problems and mistakes are also mine.

The talking

Sketching with Python and Plotly

I’m currently trying to make some progress with my ‘seams and edges’ paper for ALIA Online 2015 and naturally ended up writing some code (what, me procrastinate?). I was wondering about ways of exploring the ‘representativeness’ of an aggregation like Trove — what’s there and what’s not — so started noodling around with the Trove API.

The first result was a graph representing the numbers of Trove contributors and resources by state, compared to the population of that state. All values are displayed as percentages of the total.

The ACT is over-represented, of course, because of the holdings of the National Library itself. The under-representation of Queensland looks interesting — something to explore in the future.

My next graph used data on languages spoken at home in Australia from the 2011 census. It compared the population speaking those languages with the number of books in that language included in Trove, again as percentages of the total. It doesn’t embed very well, so view the full-size version on Plotly.

As I was playing around I noticed a tweet from Bridget Griffen-Foley:

Being in a quick-coding sort of mood I had to see how long it would take me to create a graph showing the numbers of daily newspapers in Trove (where daily is defined as more than 300 issues in a year). The answer was about fifteen minutes.
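
The counting step behind a graph like that is short. The sketch below is not the code from the repository: it assumes the issue counts have already been fetched and arranged as (title, year, issues) rows, and the sample rows are invented purely to show the shape of the data.

```python
from collections import defaultdict

DAILY_THRESHOLD = 300  # 'daily' means more than 300 issues in a year (from the text)


def dailies_per_year(issue_counts):
    """Count titles per year that published more than DAILY_THRESHOLD issues.

    issue_counts: iterable of (title, year, number_of_issues) tuples.
    """
    totals = defaultdict(int)
    for _title, year, issues in issue_counts:
        if issues > DAILY_THRESHOLD:
            totals[year] += 1
    return dict(sorted(totals.items()))


# Invented sample rows, not real Trove figures.
sample = [
    ("Title A", 1900, 312),
    ("Title B", 1900, 52),
    ("Title C", 1900, 305),
    ("Title A", 1901, 300),
]

print(dailies_per_year(sample))
```

Note that a title with exactly 300 issues is not counted, since the definition is ‘more than 300’.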

All of the graphs are created using the web service Plotly. Plotly has an easy-to-use Python API which means all you need to do to create a graph is to add a few lines of code. There are other Python visualisation libraries, but I like Plotly because it creates something instantly shareable — perfectly suited to this sort of quick and dirty experimentation.
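
To show how little is needed, here is a sketch that turns raw counts into percentages of the total and describes a grouped bar chart as a plain dictionary. A Plotly figure is essentially a structure with a `data` list and a `layout` object, so a dictionary like this could be handed to Plotly's Python library to draw; that step is left out so the example runs without Plotly installed. The state names and numbers are placeholders, not real figures, and `as_percentages` is a hypothetical helper.

```python
def as_percentages(values):
    """Express each value as a percentage of the total."""
    total = sum(values.values())
    return {key: 100 * value / total for key, value in values.items()}


# Placeholder numbers, purely illustrative.
resources = {"State A": 600, "State B": 300, "State C": 100}
population = {"State A": 500, "State B": 400, "State C": 100}


def bar_trace(name, values):
    """Describe one bar series as a Plotly-style trace dictionary."""
    shares = as_percentages(values)
    return {"type": "bar", "name": name, "x": list(shares), "y": list(shares.values())}


figure = {
    "data": [
        bar_trace("Resources (% of total)", resources),
        bar_trace("Population (% of total)", population),
    ],
    "layout": {"barmode": "group", "title": {"text": "Resources compared to population"}},
}

print(figure["data"][0]["y"])
```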

I don’t think any of these graphs are particularly revealing, and I’ve made some assumptions about the data that probably wouldn’t hold up under scrutiny. But what this fiddling around emphasised was how an API and some simple tools make it possible to ask quick questions of the data.

All the code is in my Trove-Sketches repository on GitHub.

Life on the outside: Collections, contexts, and the wild, wild web

Keynote presented at the Annual Conference of the Japanese Association for the Digital Humanities, 20 September 2014, Tsukuba.

The full set of slides is available on SlideShare.

Cross-published on Medium.


 

This is Tatsuzo Nakata. In 1913 he was living on Thursday Island in the Torres Strait, just off the northern tip of Australia.

life on the outside.002

From the late 19th century there was a substantial Japanese population on Thursday Island, mostly associated with the development of the pearling industry.

I’ll admit that I know very little about Tatsuzo, and I’ve selected him more or less at random from a large body of records held by the National Archives of Australia.

I present him here out of context and in too little detail, simply as an example. Working backwards from this photograph I want to restore some layers of context and reveal to you a complex and shameful history.

This photograph was attached to an official government form called a ‘Certificate Exempting From Dictation Test’.

From the form we learn that the 32-year-old Tatsuzo was born in Wakayama. He had a scar over his right eye.

life on the outside.004

Tatsuzo carried a copy of this form with him when he departed for Japan aboard the Yawata Maru in May 1913. When he returned the following year the form was collected and compared with a duplicate held by port officials. The forms matched, and Tatsuzo was allowed to disembark.

To help confirm his identity, the form carried on its reverse side an impression of Tatsuzo’s hand.

life on the outside.005

You might think that this was a travel document — an early form of visa perhaps. But at the top of the form you’ll notice a reference to the Immigration Restriction Act, a piece of legislation introduced by the newly-federated Australian nation in 1901. The Immigration Restriction Act and the complex bureaucratic procedures that supported its administration came to be known more generally as the White Australia Policy.

If Tatsuzo had tried to return to Australia without one of these forms, he would have been subjected to the Dictation Test, and he would have failed. Despite its benign-sounding name, the Dictation Test was a form of racial exclusion aimed at anyone deemed non-white. No-one was meant to pass. If he hadn’t carried this form exempting him from the Dictation Test, Tatsuzo would most likely have been denied re-entry.

This certificate is drawn from one of more than 14,000 files in Series J2483 in the National Archives of Australia. This series is solely concerned with the administration of the White Australia Policy. There are many other series from other ports and other time periods full of documents like this. The National Archives holds many, many thousands of these certificates documenting the lives and movements of people considered out of place in a White Australia.

Photographs, forms, files, series, legislation — this small shard of Tatsuzo’s life is preserved as part of a racist system of exclusion and control. But what happens when we extract the photos from their context within the recordkeeping system and simply present them as people?

I’ve created a site where you can explore some of the records relating to Japanese people held in Series J2483. Instead of navigating lists of files, you can start with faces — with the people, not the system.

I’m starting today with Tatsuzo and this wall of faces because what I want to explore are some of the complexities of context.

Shark Attack!

After a series of fatal shark attacks in Australian waters, the community of Port Hacking, in southern Sydney, began to wonder if they too were at risk.

In January 2014 the local newspaper published an article under the heading ‘Shark “cover up” in Port Hacking’ alleging that research into the dangers had been suppressed.

Ten days later the newspaper followed up with details of the area’s only recorded fatal shark attack in 1927. A local government member, it reported, had ‘unearthed the article on Trove’.

‘It’s long been a story that a boy was killed by a shark at Grays Point many years ago’, he said, ‘I knew about it 30 to 40 years ago but if you talk to people around here, nobody knows about it’.

‘A lot of people say there are no sharks in Port Hacking but this is rubbish’, he added.

Let me reassure anyone thinking about coming to DH2015 in Sydney next year that shark attacks are extremely rare.

What interested me about these articles was not the risk of gruesome death, but the relationship between past and present. The question of whether shark attacks were possible could be answered — simply by searching Trove.

Trove

For those who don’t know, Trove is a discovery service developed and maintained by the National Library of Australia. Like Europeana, the Digital Public Library of America, and DigitalNZ, it aggregates resources from the cultural heritage sector, and beyond.

It also provides access to more than 130 million newspaper articles from 1803 onwards. The articles are drawn from over 600 different titles (large and small, rural and metropolitan), with more being added all the time.

Search for just about anything and you’re likely to find a match of some sort amongst the digitised newspapers. So of course I searched for Tsukuba.

Trove is also a community. Users correct the OCR’d text of newspaper articles. They also add thousands of tags and comments to resources across Trove.

  • 138,000 users
  • 3,000,000 tags
  • 80,000 comments
  • 139,000,000 corrections
  • 58,000 lists

Perhaps my favourite examples of user-generated content on Trove are the Lists. Lists are pretty much what they sound like: collections of resources. They make it easy for you to save and share your research. But more than tags or comments they expose people’s interests and passions. They give some insight into the many acts of meaning-making that occur in and around Trove.

Lists are also exposed through Trove’s Application Programming Interface (API) in a form fit for machine consumption. So with just a dash of code I can harvest the titles of all public lists and do some very basic word frequency analysis courtesy of Voyant Tools.
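To give a feel for how little code that kind of analysis needs, here is a minimal sketch of the word-counting step. It is a stand-in using Python's standard library rather than Voyant Tools, and it assumes the list titles have already been harvested. The stopword list and the sample titles are made up for illustration.

```python
import re
from collections import Counter

# Hypothetical stopword list for illustration; a real analysis would use a fuller one.
STOPWORDS = {"the", "of", "and", "a", "in", "to", "for", "on", "at", "my"}


def word_frequencies(titles, stopwords=STOPWORDS):
    """Count how often each word appears across a collection of list titles."""
    counts = Counter()
    for title in titles:
        # Lowercase the title and keep runs of letters (with internal apostrophes).
        words = re.findall(r"[a-z]+(?:'[a-z]+)*", title.lower())
        counts.update(w for w in words if w not in stopwords)
    return counts


# Made-up titles standing in for harvested list titles.
sample_titles = [
    "Smith family history",
    "History of the Smith family in Ballarat",
    "Shark attacks",
]

freq = word_frequencies(sample_titles)
print(freq.most_common(3))
```

Even on a toy sample like this, the family history words rise to the top while the one-off interests sit in the tail.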

There’s nothing too surprising here — we know that family historians are our largest user group. But we can also see the long tail in action — the way that huge collections like Trove can support very focused, specific interests.

Which leads me back to shark attacks.

Old Speak

The Port Hacking article made me wonder how many other web pages there might be out on the wider web that cited Trove newspapers in a discussion of shark attacks. The answer was many. But what was most interesting wasn’t the volume of references, it was the variety of contexts — in blog posts, on Facebook, in fishing forums.

‘Ahh, old time newspapers are fascinating things aren’t they?’, notes one post in a weather forum, citing details of a shark attack in Sydney from 1952.

On a fishing site, a thread on bull shark attacks in Western Australia’s Swan River begins: ‘I found a great website to view really old newspapers in perth. Just found a few swan river shark storys [sic]…’.

The author follows up with a direct link to the Trove search page, prompting the exchange:

Redfin 4 Life: ‘Haha you would never know there had been that many incedents in the swan without seeing these…’

Goodz: ‘Oh how newspapers have changed the way the write… love the old speak!’

Alan James: ‘That’s right Goodz, and more often than not I’m sure they actually reported the truth.’

So a discussion of shark attacks turns to a consideration of the changing style of newspaper reporting.

Perhaps even more interesting is the way that digitised newspapers are used to test a hypothesis, challenge an interpretation, or argue a case. As in the Port Hacking case, questions about the history of shark attacks can be explored without needing to turn to experts, history books, or official statistics.

So when a local politician is quoted as saying ‘there have not been any serious or fatal shark attacks at Coogee Beach since records commenced in the 1800s’, a reader can respond with two Trove newspaper citations and the comment: ‘No previous shark attacks? Or are they only searching for fatalities?’

When a media outlet asks its Facebook followers whether the export of live sheep from Western Australia might be increasing the number of shark attacks off the coast, one follower can simply share a Trove link to a newspaper article from 1950 and ask ‘Did they have live sheep export in 1950?’

I don’t want to argue that these interactions are particularly profound or remarkable. In fact I’d suggest that they’re interesting because they’re not remarkable. 130 million digitised newspaper articles chronicling 150 years of Australian history are just another resource woven into the fabric of online experience. The past can be mobilised, shared and embedded in our daily interactions as easily as pictures of cats.

Traces

And it’s not just shark attacks. To explore the variety of contexts in which Trove newspaper articles are used and shared, I started mining backlinks.

Backlinks, as the name suggests, are just links out there on the wild, wild web that point back to your site. You can find them in your referrer logs, in Google’s webmaster tools, or simply by searching. I started with a ‘try before you buy’ sample of backlinks from an SEO service.

From there I wrote a script to harvest the linking pages, remove duplicates, extract the newspaper references, retrieve the article details from the Trove API, and save everything to a database for easy exploration. You can play with the results online.
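The 'extract the newspaper references' step can be sketched roughly as follows. This is not the harvesting script itself, just an illustration using Python's standard library. The two URL patterns are assumptions about how links to Trove newspaper articles are typically written, and the article numbers in the sample page are invented.

```python
import re
from html.parser import HTMLParser

# Assumed URL shapes for Trove newspaper articles; check them against real links.
ARTICLE_PATTERNS = [
    re.compile(r"trove\.nla\.gov\.au/ndp/del/article/(\d+)"),
    re.compile(r"nla\.gov\.au/nla\.news-article(\d+)"),
]


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_article_ids(html):
    """Return the unique article ids linked from a page, in first-seen order."""
    collector = LinkCollector()
    collector.feed(html)
    seen = []
    for href in collector.hrefs:
        for pattern in ARTICLE_PATTERNS:
            match = pattern.search(href)
            if match and match.group(1) not in seen:
                seen.append(match.group(1))
    return seen


# Invented sample page: two links to the same article, one to another, one unrelated.
sample_page = """
<p><a href="http://trove.nla.gov.au/ndp/del/article/12345">A shark story</a>
<a href="http://nla.gov.au/nla.news-article12345">Same article, short link</a>
<a href="http://nla.gov.au/nla.news-article67890">Another article</a>
<a href="http://example.com/">Unrelated</a></p>
"""

print(extract_article_ids(sample_page))
```

Reducing every link to an article id is what makes the de-duplication possible: different URL forms that point to the same article collapse into a single reference, which can then be looked up through the API.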

I ended up harvesting 3,116 pages from 1,780 domains containing 13,389 links to 11,242 articles in Trove. Remember that’s just a sample of all the links to Trove newspapers out there on the web.

What was more surprising than the raw numbers was the diversity of content across those pages. I knew that family and local historians were busily blogging about their Trove discoveries, but I didn’t know that Trove newspapers were being cited in discussions about politics, science, war, sport, music — just about any topic you could imagine.

Nor are these discussions just about Australia. A little quick and dirty analysis suggests that more than 30 languages are represented across those 3000 pages.

This is a work in progress. I hope to expand my hunt for traces — crawling sites for additional references, mining referrals, and inviting the public to nominate pages for inclusion. By adding a simple API I could make it possible for Trove to include links back to relevant pages, like trackbacks on a blog. I also want to understand more about the scope of the content and the motivations of its authors. What is going on here?

Undoubtedly some of these pages constitute link spam or attempts to game search engines, but most do not. Browsing the database you find many examples of interpretation, persistence, and passion. People around the world have something they want to say, something they want to share, and Trove’s millions of newspaper articles provide them with a readily-accessible source of inspiration and evidence.

It’s clear that those many small acts of meaning-making we can observe in Trove’s activity statistics extend beyond a single site, to a much, much wider (and wilder) world.

Scale

One day earlier this year, Trove received more than three times its usual number of visitors.

The culprit was the WTF subreddit — a popular place for sharing the weirdities of the web. Someone posted a link to a Trove newspaper article describing the unfortunate demise of a poodle called Cachi, whose fall from a thirteenth-story balcony in Buenos Aires resulted in the deaths of three passers-by.

As well as causing a dramatic spike in Trove’s visitor stats, the post received more than 3000 votes and attracted 677 comments on reddit. Cachi was a hit.

Trove articles pop up regularly on reddit. The traffic spikes they bring are reminders that however proud we might be of our stats, we are but a tiny corner of the web. There’s something much bigger out there.

Michael Peter Edson has long sought to alert cultural heritage organisations to the challenges of scale. In a recent essay he described the web’s ‘dark matter’:

There’s just an enormous, humongous, gigantic audience out there connected to the Internet that is starving for authenticity, ideas, and meaning. We’re so accustomed to the scale of attention that we get from visitation to bricks-and-mortar buildings that it’s difficult to understand how big the Internet is—and how much attention, curiosity, and creativity a couple of billion people can have.

Libraries, archives and museums, he argues, need to meet the public where they are, to recognise that vigorous sites of meaning-making are scattered across the vast terrain of the web. Trove newspaper traces and reddit spikes are mere glimpses of the ‘dark matter’ of cultural activity that lurks beneath the apps, the stats, and the corporate hype.

People are already using our digital stuff in ways we don’t expect. The question is whether libraries, archives and museums see this hunger for connection as an invitation or a threat. Do we join the party, or call the police to complain about the noise?

Sharing

There’s something fundamentally human about sharing. Yes, it’s easy to mock the shallowness of a Facebook ‘Like’; to see our obsession with followers, friends and retweets as evidence of our dwindling capacity for attention, reducing engagement and understanding to a single click. But haven’t we always shared, through stories, gossip, jokes, performances, and rituals? Rather than being measured against a threshold of meaning, surely each act of sharing exists on a continuum from the flippant to the philosophical. Just because the act of sharing has been commodified by large social media services seeking to mine our preferences for profit, doesn’t mean it lacks deeper human significance.

A retweet can represent a fleeting interest, a brief moment of distraction. But it can also mark the start of a journey.

Cultural heritage institutions around the world have begun to recognise that sharing is not just a marketing strategy, it’s a mission. As Merete Sanderhoff notes in her foreword to the anthology Sharing is Caring:

When cultural heritage is digital, open and shareable, it becomes common property, something that is right at hand every day. It becomes a part of us.

Aggregation services, like Trove, the Digital Public Library of America, Europeana, and DigitalNZ, bring resources together to share them more easily with the world. Aggregation is only worthwhile if it serves discovery and reuse — it’s a process of mobilisation, rather than collection. As Europeana argues in their 2020 strategy:

We believe culture is a catalyst for social and economic change. But that’s only possible if it’s readily usable and easily accessible for people to build with, build on and share.

Of course the hard part is understanding what makes something ‘readily usable and easily accessible’. What balance do we need between push and pull? Between ease-of-use and technical power? Between licensing and liberty? Between context and creativity?

Busy Bots

The Mechanical Curator was born in the British Library Labs as part of their innovative digital scholarship program. In September 2013, she started posting to Tumblr random images automatically extracted from a collection of 65,000 digitised 19th century books.

It was, Ben O’Steen explained, an experiment in ‘providing undirected engagement with the British Library’s digital content’. The book illustrations moved from inside to outside, opening opportunities for discovery beyond the covers.

But that was just the beginning. A few months later the Mechanical Curator dramatically expanded its labours, uploading more than a million public domain images to Flickr.

What followed was something of a cultural feeding frenzy as people from all over the world started sharing, tagging, collecting, and creating with this rich assortment of 19th century illustrations. Since then the images have been mashed up into new works, added and organised in the Wikimedia Commons, and featured in an installation at the Burning Man festival in Nevada.

Having been locked away within books for more than a hundred years, the illustrations were given new life online as works in their own right. Opportunities for innovation and expression were created by a rupture in context.

Meanwhile on Twitter, a growing army of bots was liberating items from cultural collections around the world. Inspired by the bot-making genius of Mark Sample, I created @TroveNewsBot in June 2013 to tweet newspaper articles from Trove.

He was joined by @DPLABot, @EuropeanaBot, @Kasparbot, @CurtinLibBot, @DigitalNZ.bot, @museumbot, @cooperhewittbot, @bklynmuseumbot, and no doubt others — all sharing random collection items. Of course @MechCuratorBot soon joined the fray from the British Library, and I eventually added @Trovebot to tweet material from all the non-newspapery sections of Trove.

The possibilities of serendipitous discovery are receiving increasing attention within the digital humanities. At DH2014, Kim Martin and Anabel Quan-Haase critically examined four DH tools, including @TroveNewsBot, in the light of existing models of serendipity. Their discussion noted that randomness is not the same as serendipity, and outlined how serendipity could be understood as a type of encounter with information. I do wonder though if what makes the bots interesting is not randomness as such, but the way randomness can play around with our assumptions about context.

Steve Lubar observes that the random offerings of collection bots can also expose the choices that are made in the creation and display of cultural collections. Randomness can challenge our expectations. Describing the genesis of the Mechanical Curator, James Baker notes:

And so as what at first seemed simple descends into complexity the Mechanical Curator achieves her peculiar aim: giving knowledge with one hand, carpet bombing the foundations of that knowledge with the other.

The Trove bots I created do more than tweet random offerings, they also allow you to interact with Trove without ever leaving Twitter. Send a few keywords their way and they’ll do your searching for you, tweeting back the most relevant result. You can modify their default behaviour by adding a series of hashtags — #luckydip, for example, will spice your result with a touch of randomness.
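As a rough illustration of that interaction, here is a minimal sketch of how a bot might split an incoming tweet into search keywords and option hashtags. It is not the bots' actual code. The function names are hypothetical, and treating #luckydip as 'pick a random result instead of the top one' is an assumption about what a touch of randomness could mean.

```python
import random


def parse_request(tweet_text):
    """Split a tweet into search keywords and option hashtags, dropping @mentions."""
    keywords, options = [], set()
    for token in tweet_text.split():
        if token.startswith("@"):
            continue  # ignore the bot's own handle and any other mentions
        if token.startswith("#"):
            options.add(token[1:].lower())
        else:
            keywords.append(token)
    return " ".join(keywords), options


def choose_result(results, options, rng=random):
    """Pick the top-ranked result, or a random one when 'luckydip' is requested."""
    if not results:
        return None
    if "luckydip" in options:
        return rng.choice(results)  # assumed meaning of #luckydip
    return results[0]


query, options = parse_request("@TroveNewsBot shark attack #luckydip")
print(query, options)
```

The search itself is left out here: `results` stands for whatever ranked list of articles a search returns for `query`.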

More interestingly, perhaps, you can tweet a url at them and they’ll extract keywords from the web page and use them to construct the search. This means that @TroveNewsBot can offer commentary on current events.

Several times a day he retrieves the latest headlines from a news site and searches for something similar amidst Trove’s 130 million historical newspaper articles. What emerges is a strange conversation between past and present.

These bots do not simply present collection items outside of the familiar context of discovery interfaces or online exhibitions, they move the encounter itself into a wholly new space. Just as the Mechanical Curator liberates illustrations from the printed page, the Twitter bots loosen the institutional context of collections to allow them to participate in a space where people already congregate. They send collection items out into the wilds of the web, to find new meanings, new connections and perhaps even new love.

Broken & Repaired

But letting go can be scary. A 2008 survey of libraries, archives and museums revealed that one of the main factors inhibiting the opening up of online collections was the desire to avoid misrepresentation, mislabeling or misuse of cultural objects. Easy sharing brings the risk that our carefully curated content will be shorn of context and bounced around the web — adrift and abused.

Earlier this year Sarah Werner took aim at Twitter feeds that pump out streams of ‘historical’ photos — unattributed and often wrongly captioned. But it wasn’t simply the lack of attribution that angered her:

These accounts capitalize on a notion that history is nothing more than superficial glimpses of some vaguely defined time before ours, one that exists for us to look at and exclaim over and move on from without worrying about what it means and whether it happened.

I have to admit that the excitement of seeing Trove’s visitor numbers suddenly soar thanks to reddit is frequently tempered by the realisation that what is being shared is yet another story of gruesome death, violence, or misfortune. 150 years of Australian history is reduced to clickbait by our tabloid sensibilities. Most of those who arrive from reddit read the article and click away: the bounce rate is around 97%. This is not ‘engagement’.

And yet, I can’t help but wonder about the 3% who don’t immediately leave, who pause and look around. Three percent of a lot is still a lot — a lot of people who might have been exposed to Trove and Australian history for the very first time. Similarly while the viral pics industry is frustrating and exploitative, it might yet offer opportunities to learn.

One of my favourite Twitter accounts is @PicsPedant. It monitors many of the viral pics feeds, researches the images, and tweets the results — providing a steady stream of attributions, corrections, critiques, and context. Not only do you find out about the images, you pick up research tips, and learn about the cannibalistic tendencies of the pic bots themselves — constantly recycling content from their kin.

@AhistoricalPics offers a different form of education, satirising the whole viral pics genre with its fabricated captions, and pricking at our own inclination to believe.

Freeing collections opens them to misuse, but it also exposes that misuse to analysis and critique. Contexts can be rediscovered as well as lost, restored as well as broken.

Generous signposts

It’s wonderful to see many Trove newspaper articles shared on Twitter. Unfortunately a significant proportion of these come from climate change deniers, who mine the newspapers for freak weather events and past climatic theories, imagining that such reports undermine current research. This is bad science and bad history. Their efforts are also well-represented in my database of web page citations, along with expressions of hatred and prejudice that I’d prefer to stay submerged. It’s depressing, but it seems inevitable that people will do bad things with your stuff.

In a recent post about the DPLA’s metadata licensing arrangements, Dan Cohen suggested we should look beyond technical and legal controls around online use towards social and ethical guidelines:

The cynics, of course, will say that bad actors will do bad things with all that open data. But here’s the thing about the open web: bad actors will do bad things, regardless… The flip side of worries about bad actors is that we underestimate the number of good actors doing the right thing.

Bad people will do bad things, but by asserting a social and ethical framework for the use of digital cultural collections we strengthen the resolve and commitment of those who want to do right.

Already there are examples in the work of the Local Contexts project which is developing a series of licenses and labels to guide use of traditional knowledge and cultural materials. Similarly, Creative Commons Aotearoa New Zealand have been developing an Indigenous Knowledge Notice to educate the public about what constitutes appropriate use.

We should remember too that footnotes have always been at the heart of an ethical pact. The Australian historian Tom Griffiths has described footnotes as ‘honest expressions of vulnerability’ — ‘generous signposts to anyone who wants to retrace the path and test the insights’. This ‘professional paraphernalia’ has, he argues, grown out of a series of ethical questions:

To whom are we responsible – to the people in our stories, to our sources, to our informants, to our readers and audiences, to the integrity of the past itself? How do we pay our respects, allow for dissent, accommodate complexity, distinguish between our voice and those of our characters? [1]

Such questions remain crucial as we consider the relationship between cultural collections and their online users. If we expect people to erect ‘generous signposts’ we have to make our stuff easy to find and share. If we want them to consider their responsibility to the past we should focus on providing trust, confidence, and support, not permission.

Responsibilities

If my wall of faces seems familiar, it might be because a few years ago I created something similar called The Real Face of White Australia.

The two walls use different sets of records, but they were constructed in much the same way: I reverse-engineered the National Archives’ online database, downloaded images of digitised files, and used a facial detection script to identify and extract faces.

The Real Face of White Australia was an experiment, built over the course of a weekend. But its discomfiting power was immediately evident. Where there had been records, there were people — looking at us, challenging us.

My partner Kate Bagnall is a historian of Chinese-Australia and we were working together on a project called Invisible Australians, aimed at liberating the lives of these people from the bureaucracy of the White Australia Policy.

The project was motivated by a strong sense of responsibility — not to the National Archives, not to the records, but to the people themselves.

We often talk about preserving context as if it’s an end in itself; as if context is just a set of attributes to be catalogued and controlled. The exciting, terrifying, wonderful thing about the wild, wild web is how it upsets our notions of relevance and meaning. Historic newspapers can find their way into contemporary debates. Century-old illustrations can be remade as art. Twitter bots can inspire conversations with collections. The people buried inside a recordkeeping system can be brought at last to the surface. Contexts are unstable, shifting. And through that instability we can glimpse other worlds, we can imagine alternatives, we can build something new.

What’s important is not training users to understand the context of our collections, but helping them explore and understand their responsibilities to the pasts those collections represent.

Let’s remove technical barriers, minimise legal restrictions, and trust in the good will of our audiences. Instead of building shrines to our descriptive methodologies, let’s create systems that provide stable shareable anchors, that connect, but don’t constrain.

Contexts will flow and mingle, some will fade and some will burn. Contexts will survive not because we demand it in our terms of service, or embed them in our interfaces, but because they capture something that matters.

The ways we find and use cultural collections will continue to change, but questions about responsibility, value, and meaning will remain.

 

  1. Tom Griffiths, ‘History and the creative imagination’, History Australia, Vol. 6, No. 3, 2009.

On seams and edges

Recently I submitted the abstract below for ALIA Information Online 2015. I haven’t heard yet whether it’s been accepted, but I thought I’d post it here anyway because, even if I don’t get to talk about it at the conference, I want to think about the topic some more. If nothing else, this is an extended NTS…

Many thanks to @edsu and @nowviskie for pointing me towards ideas of ‘repair’ and ‘broken world thinking’, which I reckon will help me develop the arguments I was gesturing towards earlier this year in a talk on The Future of Trove. In that talk I drew on some of my old research on the nature of progress to describe a future for Trove that avoided visions of technological power and sophistication:

The future of Trove shouldn’t be envisaged in terms of slick interfaces and fast search (though I’d like some more of that).

The future of Trove will be messy, it will be complicated, because life is just like that, and while Trove is built of metadata, it’s powered by the people that contribute, use, share and annotate that metadata.

Life can also be disappointing, painful and disturbing, and all of that too must figure in the future of Trove.

It’s important to try and see Trove as a series of accommodations, agreements, and annotations, rather than as a big aggregation machine. There’s a fragility in the connections that we make that needs to be understood. There’s no inevitability here, but many acts of goodwill, generosity, and repair.

More to come on this, I hope… (I’m also collecting some relevant bits and pieces in Zotero.)

On seams and edges — dreams of aggregation, access & discovery in a broken world

Visions of technological utopia often portray an increasingly ‘seamless’ world, where technology integrates experience across space and time. Edges are blurred as we move easily between devices and contexts, between the digital and the physical.

But Mark Weiser, one of the pioneers of ubiquitous computing, questioned the idea of seamlessness, arguing instead for ‘beautiful seams’ — exposed edges that encouraged questions and the exploration of connections and meanings.

With discovery services and software vendors still promoting ‘seamless discovery’ as one of their major selling points, it seems the value of seams and edges requires further discussion. As we imagine the future of a service such as Trove, how do we balance the benefits of consistency, coordination and centralisation against the reality of a fragmented, unequal, and fundamentally broken world?

This paper will examine the rhetoric of ‘seamlessness’ in the world of discovery services, focusing in particular on the possibilities and problems facing Trove. By analysing both the literature around discovery, and the data about user behaviours currently available through Trove, I intend to expose the edges of meaning-making and explore the role of technology in both inhibiting and enriching experience.

How does our dream of comprehensiveness mask the biases in our collections? How do new tools for visualisation reinforce the invisibility of the missing and excluded? How do the assumptions of ‘access’ direct attention away from practical barriers to participation?

How does the very idea of systems and services, of complex and powerful ‘machines’ ready to do our bidding, discourage us from seeing the many, fragile acts of collaboration, connection, interpretation, and repair that hold these systems together?

Trove is an aggregator and a community; a collection of metadata and a platform for engagement. But as we imagine its future, how do we avoid the rhetoric of technological power, and expose its seams and edges to scrutiny?