• My two lives

    My blog hasn’t quite caught up with the fact that I now have two jobs to go along with my two lives. Monday to Wednesday I’m still part of the Trove management team at the National Library of Australia, but on Thursdays and Fridays I’m Associate Professor of Digital Heritage in the Faculty of Arts and […]

  • Stories for machines, data for humans

    Presented at the New Factual Storytelling symposium, 10 April 2015, University of Canberra I feel like the nerdy kid at the cool kids’ party. There are lots of interesting and creative projects on show today and I… well I want to talk to you about metadata. Data. According to some pundits it’s the new oil, […]

  • Myths, mega-projects and making

    Keynote presented by video at EuropeanaTech 2015, 13 February 2015. The video of this presentation is available on Vimeo and the slides are on SlideShare.   In the aftermath of World War II, Australian hopes for a new era of national progress were expressed in a massive engineering project called the Snowy Mountains Scheme. The project promised […]

  • Seams and edges: Dreams of aggregation, access & discovery in a broken world

    Presented at ALIA Online 2015, 3 February 2015 in Sydney. A longer version with bonus references will be made available on the ALIA Online site. Slides are on Slideshare.   In March 1930 the Sydney Electrical and Radio Exhibition opened in a blaze of excitement. Aboard his yacht in Genoa, inventor Guglielmo Marconi triggered a […]

  • 2014 — the making and the talking

    This is my now traditional ‘what I done this year’ post, which, if nothing else, makes me check that my various experiments are still alive. It’s been a challenging year trying to balance my work as Trove Manager with my broader passions and responsibilities as a member of the digital humanities community. So yeah. My personal […]

  • Sketching with Python and Plotly

    I’m currently trying to make some progress with my ‘seams and edges’ paper for ALIAOnline 2015 and naturally ended up writing some code (what me procrastinate?). I was wondering about ways of exploring the ‘representativeness’ of an aggregation like Trove — what’s there and what’s not — so started noodling around with the Trove API. The […]

  • Life on the outside: Collections, contexts, and the wild, wild web

    Keynote presented at the Annual Conference of the Japanese Association for the Digital Humanities, 20 September 2014, Tsukuba. The full set of slides is available on SlideShare. Cross-published on Medium.   This is Tatsuzo Nakata. In 1913 he was living on Thursday Island in the Torres Strait, just off the northern tip of Australia. From […]

My two lives

My blog hasn’t quite caught up with the fact that I now have two jobs to go along with my two lives. Monday to Wednesday I’m still part of the Trove management team at the National Library of Australia, but on Thursdays and Fridays I’m Associate Professor of Digital Heritage in the Faculty of Arts and Design at the University of Canberra.

I love working with the Trove team, but I also want to keep contributing to the development of the digital humanities in Australia through my own teaching and research. Hopefully now I can do both.

At the University of Canberra I’ll be helping to develop new digital heritage offerings in undergraduate, postgraduate and professional development courses. Exciting times ahead!

On the research front I’m hoping to reinvigorate a few stalled projects and poke around some more amidst the possibilities and politics of digital cultural collections.

These are the themes I’m thinking about at the moment.

Digital diversity

Using digital tools to expose alternative voices and experiences from within the cultural record.

  • Invisible Australians — yes it’s time to give this important project a home and kick it into gear
  • Every life counts — this is the working title for a new project around workplace deaths
  • I also want to expand on some of the questions in Seams and edges

Access, impact and understanding

How digitisation projects change our relationship with the past.

Data-enriched narratives

Developing new forms of online publication that use Linked Open Data to integrate historical writing and cultural collections.

All that in two days a week — wish me luck!

And if you’re interested in collaborating, please get in touch.

Stories for machines, data for humans

Presented at the New Factual Storytelling symposium, 10 April 2015, University of Canberra

I feel like the nerdy kid at the cool kids’ party.

There are lots of interesting and creative projects on show today and I… well I want to talk to you about metadata.

Data. According to some pundits it’s the new oil, or the new electricity. Fuel for economic development — a raw material ready to be ‘mined’ for insights, innovation and our purchasing preferences.

In the cultural heritage sector the data metaphors are more likely to be framed around liberation than exploitation. Our data wants to be ‘open’. But there’s still a tendency to think on an industrial scale — it’s about pumping out large datasets for potential re-use.

What can be lost in metaphors of extraction and scale is an appreciation of the human origins of data. We are not buoys bobbing in the ocean reporting on the heights of passing waves. Big data is made up of many small acts of living.

So today I want to talk about small-scale, free-range, artisanal data. I want to talk about data, alongside storytelling, as the product of creativity, imagination, frustration and fury.

Let’s think for a moment about the work of a historian — identifying actors, defining relationships, documenting the complex networks that bring together people, places and events over time. It’s painstaking, exhilarating and potentially soul-destroying work. It’s also an exercise in data modelling. Whether the results are preserved in a triplestore, a spreadsheet, or in a drawer full of index cards — it’s nodes and edges, it’s entities and relationships, it’s data.

And that’s ok. Making data doesn’t condemn you to a rigidly empirical, deterministic framework. There’s always room for nuance, interpretation and doubt. There’s always room for stories.

But what happens when historians undertake the oddly-named process of ‘writing up’? The complex data models are flattened down to a series of sentences neatly arranged in linear sequence — our things become strings. The data is squeezed out and discarded, glimpsed only as fragile echoes hiding in footnotes.

This is of course part of the skill of historical writing — the ability to represent complex relationships through narrative. But why can’t we have our stories and data too?

This is a question I’ve returned to a number of times over the last few years.

It’s come up because I get excited about Linked Open Data’s potential to deliver structured, machine-readable information via the web. But then I wonder, whose stories will we be telling to the machines? How can we explore the expressive possibilities of Linked Open Data and not be constrained by instrumentalist assumptions about the models we make?

It’s come up because I get excited about embedding cultural heritage collections within the passion and practice of everyday life. Why squeeze out the data from historical publications when every article could be an online exhibition, every book could be a digital portal, every footnote could be a link for exploration and aggregation?

I don’t really have answers, but I do make stuff. I’ve had a few goes at trying to create narratives that embed Linked Open Data.

They’re not very exciting from a design point of view, but I keep coming back to them because there still seems to be a lack of alternatives. There’s lots of talk about publishing Linked Open Data, but much less about how the use and consumption of Linked Open Data can be built into creative practice.

So here I am again.

This exercise has a number of constraints built in. The main one is NO PLATFORMS — a historian using a series of simple tools should be able to create and publish a data-driven web page without any dependencies. It should be as simple as uploading an html page to a server.

In my idealised workflow, the historian would manage their data about people, places, events and resources in a simple database capable of exporting a flavour of Linked Open Data known as JSON-LD.
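
As a rough illustration of what such an export might contain, here is a minimal sketch in Python of a single person record serialised as JSON-LD. The identifiers, the choice of the schema.org vocabulary, and the `to_jsonld` helper are all assumptions made for the example; the actual data model is not described here.

```python
import json

# Hypothetical record: the identifiers and the schema.org vocabulary are
# illustrative assumptions, not the data model used in the demo.
person = {
    "@context": "https://schema.org/",
    "@id": "https://example.org/people/inigo-jones",
    "@type": "Person",
    "name": "Inigo Jones",
    # Pointing at an existing external identifier is what makes the data "linked".
    "sameAs": "https://example.org/external/inigo-jones",
}


def to_jsonld(record):
    """Serialise one record as a JSON-LD string."""
    return json.dumps(record, indent=2, ensure_ascii=False)


print(to_jsonld(person))
```

The point of the format is that the same file is ordinary JSON to a web page and structured, linkable data to anything that understands the `@context`.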



Then, having created their narrative, they’d mark it up in the tool of their choice to relate specific names or phrases in the text to the entities in their database.
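
One way to picture this markup step is as wrapping each known name in an element that carries the entity's identifier. The sketch below does that with plain string matching. The `data-entity` attribute, the `ENTITIES` lookup and the `annotate` function are hypothetical; a real workflow might use RDFa or a dedicated editor instead.

```python
import re
from html import escape

# Hypothetical lookup from a name as it appears in the text to an entity identifier.
ENTITIES = {
    "Inigo Jones": "https://example.org/people/inigo-jones",
}


def annotate(text, entities):
    """Return HTML in which each known name is wrapped in a span
    carrying its entity identifier."""
    lookup = {escape(name): ident for name, ident in entities.items()}
    if not lookup:
        return escape(text)
    # Longest names first, so a short name never pre-empts a longer one.
    names = sorted(lookup, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(name) for name in names))

    def wrap(match):
        name = match.group(0)
        return '<span data-entity="{}">{}</span>'.format(escape(lookup[name]), name)

    return pattern.sub(wrap, escape(text))


print(annotate("Explore all the relationships for Inigo Jones.", ENTITIES))
```

Matching on bare strings is crude (it cannot tell two people with the same name apart), which is one reason the markup is described above as something the historian does deliberately.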


Then they’d just drop the text and the data into an HTML page and whack it on the web. With a bit of javascript magic to activate the data, you’d have something like this:


The demo is live (though still under construction), so have a play.

  • Scroll the text to see those carefully inserted identifiers create pop ups in the sidebar.
  • The text has itself become data, each paragraph is an object — try filtering the text or linking to an individual paragraph.
  • Browse all the people, or resources. Explore all the relationships for Inigo Jones.
  • Mapping to existing identifiers from sources like Trove and Wikipedia helps put the ‘linked’ into Linked Open Data.
  • There’s a rather boring map, a timeline and a wall view. New views could be easily added by dropping in some extra javascript.
  • It’s all data, so other visualisations and analyses might be created on the fly.

That’s what humans see, but what about machines? All the carefully curated data is exposed in a machine-readable form. Lots of triples…
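
For readers who have not met triples before, the following sketch shows the basic idea: every statement about an entity becomes a (subject, predicate, object) tuple. It is a deliberately naive flattening of one flat record, not a conforming JSON-LD processor (it ignores the context, nested nodes and typed values), and the record itself is a hypothetical example.

```python
def to_triples(node):
    """Yield (subject, predicate, object) tuples from one flat JSON-LD-style node.

    Naive by design: the @context is skipped rather than expanded,
    and nested nodes are not followed.
    """
    subject = node["@id"]
    for key, value in node.items():
        if key in ("@id", "@context"):
            continue
        predicate = "rdf:type" if key == "@type" else key
        values = value if isinstance(value, list) else [value]
        for item in values:
            yield (subject, predicate, item)


# Hypothetical record used only to show the shape of the output.
node = {
    "@id": "https://example.org/people/inigo-jones",
    "@type": "Person",
    "name": "Inigo Jones",
}

for triple in to_triples(node):
    print(triple)
```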


The code for the viewer and maker is all available if anyone wants to play with it, and I’m intending this year to develop two substantial monographs using these tools. Both have many links into cultural collections.

My aim here is not to develop a fully-operational publishing system. I just want to get a better idea of what’s useful, what’s interesting, what’s possible. To think beyond the current limits of scholarly publishing into a world where data and narrative can live together, where interpretative work is represented in all its data-inflected glory.

Myths, mega-projects and making

Keynote presented by video at EuropeanaTech 2015, 13 February 2015.

The video of this presentation is available on Vimeo and the slides are on SlideShare.


In the aftermath of World War II, Australian hopes for a new era of national progress were expressed in a massive engineering project called the Snowy Mountains Scheme. The project promised new reserves of water and electricity to power development of Australia’s inland.

‘The Snowy Mountains Scheme’, Canberra Times, 26 December 1975, p. 5. <http://nla.gov.au/nla.news-article102193928>

Rivers were diverted, towns were relocated, and new reservoirs were created. Over 145km of tunnels were carved through the granite peaks of Australia’s Great Dividing Range. Finally completed in the 1970s, the Snowy Mountains Scheme was an engineering marvel.

http://www.zenlan.com/collage/trove/#snowy scheme

But this symbol of national pride would not have been possible without the labours of thousands of ‘New Australians’ drawn from across Europe. Some were recruited because of their skills, others were plucked from displaced persons camps and offered the chance of a new life — as long as they were prepared to work where the Australian government wanted them to.

‘AT WORK ON SNOWY MOUNTAINS SCHEME’, Sydney Morning Herald, 8 January 1954, p. 5.

The human and environmental costs of the project are still debated, but the Snowy Scheme is regularly invoked as the country’s prime nation-building project — an example of what can be achieved together through vision, leadership, and toil.

Why am I talking about this today?

Well, I suppose it’s a great chance to say ‘Hey Europe, thanks for all the people!’.

But it’s also because I wanted to highlight the mythic qualities of the mega-project — the cultural power that resides in the ‘big idea’ that promises to set us upon a path towards the future.

We are here today because we are embarked upon ambitious undertakings.

Our projects aim to reshape the cultural landscape. We are building pipelines and reservoirs — moving massive amounts of data across countries, across the world.

But as we’ve tried to show, these large scale efforts are only possible because of many smaller, local collaborations.

The Snowy Scheme was built by individuals fleeing the disruptions of war. They took a risk in the hope of something better. It’s important for us to reflect on the contributions and motivations of our communities, our partners, and our users. A big idea isn’t enough.

You probably all know that as well as metadata from libraries, museums, archives and universities, Trove provides access to almost 150 million full-text digitised newspaper articles. The OCR’d text of the articles is fully searchable, but suffers from the usual errors and inaccuracies.

Fortunately Trove users have been eager to help. Anyone can jump in and correct the OCR output, and they do. More than 150 million lines of text have been corrected so far. Our top corrector (yes we have a scoreboard) has corrected more than 3 million lines of text.

Recently I’ve been thinking about this work and the limitations of language around online engagement. Our correctors are more than ‘users’ — ‘contributors’ perhaps, or ‘volunteers’?

But all of these words seem to place correctors on the other side of the interface — as clients rather than builders.

Each correction is a tweak of our search index. It changes the way the backend functions, increasing the efficiency of the system by getting people to the things they’re interested in more quickly.

Perhaps we should call our correctors ‘discovery engineers’?

The mythic mega-project maintains a sense of otherness — it is exceptional, an achievement above and beyond the realms of ordinary experience. But this obscures the many small acts of commitment and cooperation that make it possible. These are the expressions of ordinary lives, the routine and repetitive alongside moments of passion and meaning.

The success of our projects will ultimately depend not on the speed of our servers or the cleanliness of our code, but on the interactions that emerge as our aggregations become part of the simple business of living.

People correct our newspapers for many reasons, but few of these motivations are likely to align with our own strategic objectives.

It’s not just corrections either. More than 80,000 comments and 3 million tags have been added to resources in Trove. These are just plain text tags, we make no effort to control their content. This creates some interesting possibilities.



I wonder if you can guess the meaning of our most heavily-used tag? It’s ‘LRRSA’ and it’s attached to more than 16,000 items.

Any ideas? It’s an acronym that stands for the Light Rail Research Society of Australia. Members of the society use the tag to share material of interest — it’s become a means of collaboration.

Another popular tag is ‘TBD’ or ‘To be done’. This one’s used by text correctors to manage their own workflows.

The numerous guises of a simple tag illustrate the value of ‘underspecified tools’ — of leaving functionality open to ad hoc elaboration. The boundaries between systems and their use are fluid. Tagging behaviour can extend system functionality.

From machine to human and back again, the limits of what is possible are open to negotiation and change.

My favourite example of this is the work of one man who has been identifying out-of-copyright sheet music in Trove. He’s not a musician but he uses his computer to create performances of the pieces. He then uploads the performances to YouTube or SoundCloud and adds a link to them in a comment on Trove. People who find these works on Trove can now just click to hear them. The functionality of the system has been extended without a single line of code being written.

But the permeability of these boundaries means we can’t take the roles of people and machines as given. Five years ago, crowdsourced text-correction was a cost-effective solution to the vagaries of OCR, but as the technology improves do we continue to ask humans to undertake tasks that a machine might do more easily? Do we continue to ask our volunteers to change every instance of ‘tbe’ to ‘the’?
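
To make the ‘tbe’ example concrete, here is a minimal sketch of the kind of rule a machine could apply without human help: a whole-word replacement that preserves an initial capital. The `OCR_FIXES` table and `fix_ocr` function are hypothetical, and nothing here describes how Trove actually processes its text.

```python
import re

# Hypothetical table of known OCR confusions; 'tbe' is the one named in the text.
OCR_FIXES = {"tbe": "the"}


def fix_ocr(text, fixes=OCR_FIXES):
    """Replace whole-word OCR errors, keeping an initial capital if there was one."""
    if not fixes:
        return text
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(word) for word in fixes) + r")\b",
        re.IGNORECASE,
    )

    def repl(match):
        word = match.group(0)
        fixed = fixes[word.lower()]
        return fixed.capitalize() if word[0].isupper() else fixed

    return pattern.sub(repl, text)


print(fix_ocr("Tbe yacht left tbe harbour."))
```

Even this tiny rule has to make a judgement: the word boundaries stop it from touching a longer string that merely contains ‘tbe’, but a blanket rule can never be sure the original page did not really say something odd. That judgement is where human correctors still matter.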

While an astonishing 150 million lines of text have been corrected, more than 96% of articles have no corrections at all. More articles are being added all the time and it seems the rate of corrections might be flattening out. The task seems beyond humans alone.

We’re currently redeveloping our newspapers interface, making it more responsive, adding shiny new browse features, and improving the overall performance. We’ll also be introducing some tools for ‘advanced’ text correction, allowing our users to modify not only the text, but some of the structural elements of the OCR — inserting new lines for example.

As we investigate opportunities for enrichment of our metadata, I think we’ll also need to think about the work we offer our discovery engineers. Correction could extend to geocoded placenames; named entity extraction could be integrated with user-defined relationships.

This technosocial shift is also evident at the other end of our pipelines, when our aggregated data is consumed and transformed.

Except APIs are not really pipelines are they? You don’t just turn on a tap, you have to ask the API a question. Our questions interact with the content of the reservoir to shape and colour the flow of data.
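
A small sketch may help show what ‘asking a question’ looks like in practice: the question is encoded as parameters in a URL, and changing any parameter changes what flows back. The endpoint and parameter names below are assumptions modelled on version 2 of the Trove API and should be checked against the current documentation; `build_query` is a hypothetical helper, and no request is actually sent.

```python
from urllib.parse import urlencode

# Assumed endpoint, modelled on version 2 of the Trove API.
API_URL = "https://api.trove.nla.gov.au/v2/result"


def build_query(api_key, question, zone="newspaper", facet=None):
    """Turn a question into a request URL (nothing is fetched here)."""
    params = {
        "key": api_key,       # assumed parameter name for the API key
        "zone": zone,         # which part of the aggregation to ask
        "q": question,        # the question itself
        "encoding": "json",
        "n": 0,               # ask for counts only, no individual records
    }
    if facet:
        params["facet"] = facet  # e.g. break the counts down by year
    return API_URL + "?" + urlencode(params)


url = build_query("YOUR_API_KEY", '"snowy mountains scheme"', facet="year")
print(url)
```

The same reservoir yields a different flow for each combination of zone, query and facet, which is the sense in which an API is less a tap than a conversation.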

An API is a tool for transformation.

New tools and interfaces explicitly change the nature of our aggregations by carrying their use into different realms, by shifting contexts, by asking new questions. Each new use changes how we see the whole.

This is not reuse or recycling — this is remaking. We can dig the tunnels and fill the reservoirs, but it’s up to you — the coders, the builders, the developers and the makers — to show us what we’ve created.

The big challenge is to open up this transformative power to those who have no idea what an API is — people who have important and powerful questions to ask our APIs, but don’t know the language.

We need to make sure that the myth of the mega-project doesn’t blind us to the human dimensions of our undertaking. Let’s foster interventions as well as innovations, activists as well as evangelists. Let’s make sure our big ideas make space for other ideas to erupt and grow.

Seams and edges: Dreams of aggregation, access & discovery in a broken world

Presented at ALIA Online 2015, 3 February 2015 in Sydney. A longer version with bonus references will be made available on the ALIA Online site. Slides are on Slideshare.


In March 1930 the Sydney Electrical and Radio Exhibition opened in a blaze of excitement. Aboard his yacht in Genoa, inventor Guglielmo Marconi triggered a radio signal that reached across the world and switched on more than 2800 electric lights at the Sydney Town Hall. ‘All in less than a second!’, exclaimed the Sydney Mail, ‘Here was magic!’.

‘When Marconi Switched on the Lights The Sydney Electrical and Radio Exhibition’, Sydney Mail (NSW), 2 April 1930, p. 20. <http://nla.gov.au/nla.news-article160633081>

According to the Sydney Morning Herald, radio had ‘eliminated time and distance’.

About a month later the British and Australian Prime Ministers spoke for the first time via wireless telephone. ‘These were days for the annihilation of time and space’, the British PM proclaimed.

Sounds familiar?

From railways to the telegraph, radio, and the internet, the progress of technology has often been imagined as a battle against time and space. Progress has been measured in the seconds we save, in the distances we conquer, in the barriers of terrain and politics we bridge.

Remember when we used to talk about the ‘Information superhighway’?

In the realm of information this march of conquest is accompanied by discussions of speed and scale, by adjectives such as ‘instantaneous’ and ‘seamless’.

And you don’t have to look too hard to find software and service vendors touting the promise of ‘seamless discovery’. Indeed, it turns out that ‘Seamless Discovery’ itself is the registered trademark of a video discovery platform used by Foxtel and others.

Technology promises instant access to information — a future beyond silos.

In the library world, seamless discovery is commonly associated with what are variously called ‘next-generation catalogues’, ‘web-scale discovery services’ or ‘discovery layers’.1

The idea is familiar and seductive. Instead of forcing searchers to construct multiple queries across a variety of databases, systems and interfaces, these services aggregate metadata from different sources and offer access through a single search portal.

A seam-free service is one that maximises ease-of-use.

We all know what such services look like, even if we’ve never used one. Search is no longer just a task to be accomplished in pursuit of a particular goal — to find a desired resource or piece of information.

Google has played a central role in re-engineering our understanding and expectations of online experience. Ours is increasingly a ‘culture of search’ where the technologies of discovery have become part of everyday life.2

It’s natural then that users of other discovery services will approach them with a set of expectations shaped by the Googlisation of modern culture.

It’s not just the simplicity of that single search box, it’s our faith that search will just work.

Every time Google responds to our query about some obscure piece of television trivia with 152 million results, we cannot fail to be impressed by the power at our fingertips. Every time Google predicts our query or customises our results we are beset with awe.

Here is magic.

Google’s dominance gives it immense power in presenting to us an image of the world constructed to its own secret formula. This power bears ontological weight — if we can’t find something on Google does it exist?

Of course we all want to make life as easy as possible for the people who use our services. The question is how the pursuit of a Google-like experience constrains our options and assumptions.

Metaphors matter. Pursuing ‘seamless discovery’ in the wake of Google means engaging with questions of politics and power.

Seams are not simply obstacles to a smooth user experience, they’re reminders that our online services are themselves constructed. There’s nothing natural or inevitable about a list of search results.

Mark Weiser, one of the pioneers of ubiquitous computing, argued against seamlessness because it made everything seem the same. Instead he imagined systems with ‘beautiful seams’ — that empowered users to manipulate their contexts and connections.3

As Mitchell Whitelaw notes ‘seamfulness is also an ethical and political stance’ — it’s a commitment to exposing the interpretative distance between our collection data and its online representation.4 There are opportunities here not only for transparency, but to explore alternatives to Google’s template for discovery.

Trove Mosaic by Mitchell Whitelaw. <http://mtchl.net/trovemosaic/>

Research into the visualisation of large cultural heritage collections has emphasised that search is only one way of representing a collection.

By focusing on the stylish minimalism of the search box, we discard opportunities for traversing relationships, for fostering serendipity, for seeing the big picture.

By creating experimental interfaces, by playing around with our expectations, we can start to think differently — to develop new metaphors for our online experience that are not framed around technological conquest.

Eyes on the past. <http://eyespast.herokuapp.com/>

My own Eyes on the past, which allows you to find your way into Trove’s digitised newspapers through machine recognised faces and eyes, is far from a practical discovery tool. But building on my earlier work using facial detection technology as a means of archival intervention, it opens up questions about the lives embedded within our collections — we see them differently, we feel differently.

A Google-like search experience offers utility at the expense of critique. Its technologies are black boxed, its assumptions obscured.

How can those of us in the discovery business create a buffer for critical reflection while still meeting user expectations? What can we do in a service such as Trove that supports many thousands of enquiries a day?

I’d suggest we start with an acknowledgement of our limits, an attempt to trace the edges and the fractures that are too often glossed over in our pursuit of seamlessness. Let’s start by admitting what Trove is not:

  1. Trove is not perfect
  2. Trove is not everything
  3. Trove is not a machine

Trove is not perfect

Trove is an aggregator. It pulls together metadata from a variety of different sources, applies some normalisation across the required fields, and sends the results off to be indexed.

With close to 400 million resources harvested from hundreds of contributors through an assortment of different pipelines, it’s inevitable that there will be errors and oddities.

If you want to see errors, of course, you can head along to Trove’s newspapers zone where the limitations of Optical Character Recognition are on display for all to see. Unlike some full-text databases, Trove exposes the raw output of its OCR processing.

Trove’s transcriptions are improving all the time thanks to the efforts of thousands of online volunteers who correct the raw OCR output. Astonishingly, more than 130 million lines of text have been corrected by Trove users, in what is rightly touted as a highly successful crowdsourcing initiative.

But it’s also important to put this effort in perspective. Enter ‘has:corrections’ into the Trove search box to retrieve all the newspaper articles that have at least one crowdsourced correction. At the time I wrote this, the figure was 5,273,600 or just 3.6% of the total number of newspaper articles in Trove. Despite their important efforts, Trove’s volunteers will never be able to produce a perfect rendering of the newspaper content.
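
The two figures quoted above also imply the size of the whole newspaper collection at the time. A quick back-of-the-envelope check, using only the numbers in the text (the 3.6% is rounded, so the total is approximate):

```python
corrected_articles = 5_273_600  # articles with at least one correction (from the text)
share = 0.036                   # 3.6%, as quoted in the text

implied_total = corrected_articles / share
uncorrected_share = 1 - share

print(f"Implied total: about {implied_total / 1e6:.1f} million articles")
print(f"Articles with no corrections: {uncorrected_share:.1%}")
# Implied total: about 146.5 million articles
# Articles with no corrections: 96.4%
```

In other words, roughly 141 million articles had never been touched by a corrector.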

But what is ‘perfection’ anyway? OCR accuracy is important only in so far as it supports the interests and activities of users. For the purposes of discovery the accuracy of common search terms such as names, places or events is likely to be most important. But if you’re undertaking an analysis of changes in language across time, a much broader range of words would be significant.

Accuracy is something that needs to be assessed and understood within the context of a specific activity.

Services like Trove have to be prepared to expose configurations, assumptions and limitations so that users can understand the impact of these on their own research.

If we are developing resources to support the creation of new knowledge we cannot simply black box our tech and trade on trust.

That’s Google’s game.

QueryPic is a simple tool that visualises search results in the Trove newspapers zone. QueryPic lets you see patterns and trends across the whole database.

When did the ‘Great War’ become the ‘First World War’? QueryPic can be used to explore this shift in terminology, but if you examine the results closely you’ll notice a small bump in the graph indicating that the term ‘World War I’ was being used during World War I. Huh?

When did the ‘Great War’ become the ‘First World War’? <http://dhistory.org/querypic/43/>

If you drill down through the results you’ll find that this is because Trove users have been busily adding the tag ‘World War I’ to selected articles, and by default Trove searches user tags and comments as well as article text. The bump is an artefact of Trove’s search configuration.
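
The effect is easy to reproduce in miniature. The toy model below searches two placeholder records, first across text and tags together and then across text alone. The records, field names and `search` function are all invented for illustration and say nothing about how Trove’s index is actually built.

```python
# Toy records: the text and tags are placeholders, not real Trove data.
ARTICLES = [
    {"year": 1915, "text": "News of the Great War from the front.", "tags": ["World War I"]},
    {"year": 1950, "text": "Veterans of World War I gathered today.", "tags": []},
]


def search(articles, phrase, include_tags=True):
    """Return the years of articles matching the phrase.

    With include_tags=True, user-added tags are searched along with the text.
    """
    phrase = phrase.lower()
    hits = []
    for article in articles:
        fields = [article["text"]]
        if include_tags:
            fields.extend(article["tags"])
        if any(phrase in field.lower() for field in fields):
            hits.append(article["year"])
    return hits


print(search(ARTICLES, "World War I"))                      # [1915, 1950]
print(search(ARTICLES, "World War I", include_tags=False))  # [1950]
```

The 1915 hit comes entirely from a tag added long after the article was printed, so a graph of the first result would show the phrase in use before anyone had coined it.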

Trove’s primary function is discovery — to make it as easy as possible for people to find things they’re interested in. But the sort of fuzziness that supports discovery works against other forms of analysis. We should make these sorts of assumptions more obvious.

By showing our seams, exposing our imperfections, we have the opportunity to educate. As well as helping people use Trove, we can open up bigger questions about the way search works on the web.

Trove is not everything

There’s nothing natural about our cultural collections or their digital representations — they have been created by many acts of selection, neglect, vision, accident and planning.

If you graph the number of newspaper articles in Trove by state and year you’ll notice a rather dramatic spike around 1914.

Newspaper articles in Trove by state and year. <https://plot.ly/~wragge/22/trove-newspaper-articles-by-state/>

Why? Were more newspapers printed during the war era? The answer is simply funding. As part of the Australian Newspaper Digitisation Program, the NSW and Victorian State Libraries have chosen to invest in the digitisation of newspapers from the World War I period.

The content of Trove’s newspaper zone, like any online collection, is constructed — shaped by many competing priorities. The consequences of this process are not always obvious.

In a competition for resources what gets digitised and why? There’s a danger that the sheer scale of aggregation services like Trove will reinforce existing prejudices. People already struggling for visibility and recognition within our cultural record might be lost amidst the overwhelming numbers of the safe and the sanctioned.

If we are concerned with absence as well as inclusion, with addressing the silences within our cultural record, we need to be wary of sharing in Google’s aura of completeness. The ontological weight of search can too easily equate absence with non-existence.

But aggregation also offers new opportunities for analysis. Questions of representation and diversity can be explored through the metadata itself.

By way of a quick example, I used the Trove API to easily compare the languages spoken at home in Australia, according to the 2011 Census, with the languages of resources in Trove’s book zone.

Languages of Trove books compared to languages spoken at home in Australia (from 2011 Census).

It’s fascinating to consider how we might use socio-economic data to slice our cultural collections across the grain to reveal different patterns of access and exclusion.

By admitting the constructed nature of our collections, the gaps and the silences as well as their strengths, perhaps aggregations like Trove can become sites of both analysis and activism.

Trove is not a machine

Trove is not a single application, it’s a complex system with multiple components. This size and complexity focuses our attention on the technology — on the lines of code and racks of servers. But the system only exists to support human creativity and cooperation. Is it a machine, a community, or something else?

I often talk about Trove as a platform — it can be built upon in many ways, both through code and collaborations. In particular, by providing an open API, Trove invites the public to create new tools, analyses and interfaces.

But there are metaphorical dangers lurking here as well. Social media services such as Facebook and YouTube also describe themselves as platforms.5

If we are to embrace the ‘platform’ metaphor we must also be ready to unpack its implications. If we want progressive platforms we need to honestly address issues of openness, participation, and accessibility. Every API is an argument and no data is ever truly ‘open’.

For me the term ‘platform’ speaks of something unfinished — an invitation and an opportunity. Trove is permanently under construction, constantly improved through the labours of its developers and community.

This is most evident in the work of Trove’s text correctors, whose many small acts of repair help the technology to function more efficiently. But each tag or comment also changes Trove — aiding discovery, adding context, or creating new connections.

Other Trove-building activity is less visible, and the responsibilities more distributed. For example, Trove is currently working with Victorian Collections to bring many small, local collections from across Victoria into Trove.

But this collaboration is itself built on the labours of many people over many years — from the Museums Australia staff who train community groups, to the local volunteers who painstakingly digitise and describe their collections. Trove helps bring these efforts to the attention of the web, and is itself enriched.

For all the new terms we have for systems and devices we have thus far failed to find a language to describe online collaboration and social engagement. Instead we fall back on the awful term ‘user’.6

By drawing attention away from ‘the machine’ to the many small acts that sustain and enlarge a service such as Trove, we create a space where language might evolve.

Broken worlds

Most technological futures are ultimately alienating and disempowering — people are cast as the passive consumers of the latest wonders and gadgets.

Instead of ‘progress’, Steven J. Jackson presents a vision of a fundamentally broken technosocial world, barely held together by numerous acts of concern, appropriation and repair.7 This focus on ‘repair’ helps us see the human agency at work, the possibilities for change.

What might happen if instead of seeing the seams and edges of our information landscape as speed bumps in the onward march of progress we recognised their fragility, and celebrated them as sites of collaboration, negotiation and repair?

What might we discover then?


  1. Joshua Barton and Lucas Mak, ‘Old Hopes, New Possibilities: Next-Generation Catalogues and the Centralization of Access’, Library Trends, vol. 61, no. 1, 2012, pp. 83–106. <http://muse.jhu.edu/journals/library_trends/v061/61.1.barton.html>
  2. Ken Hillis, Michael Petit, and Kylie Jarrett, Google and the Culture of Search, Routledge, 2013.
  3. Quoted in Matthew Chalmers and Ian MacColl, ‘Seamful and seamless design in ubiquitous computing’, in Workshop At the Crossroads: The Interaction of HCI and Systems Issues in UbiComp, 2003.
  4. Mitchell Whitelaw, ‘Representing Digital Collections’, in Performing Digital: Multiple Perspectives on a Living Archive, ed. David Carlin and Laurene Vaughan, Ashgate Publishing, Farnham, UK, 2014.
  5. Tarleton L. Gillespie, ‘The Politics of “Platforms”’, New Media & Society, vol. 12, no. 3, 1 May 2010. <http://papers.ssrn.com/abstract=1601487>
  6. Peter Lyman, ‘Information Superhighways, Virtual Communities and Digital Libraries: Information society metaphors as political rhetoric’, in Technological Visions: The Hopes and Fears that Shape New Technologies, ed. Marita Sturken, Douglas Thomas, and Sandra J Ball Rokeach, Temple University Press, Philadelphia, 2004, pp. 201–218.
  7. Steven J. Jackson, ‘Rethinking repair’, Media meets technology, MIT Press, 2013.

2014 — the making and the talking


This is my now traditional ‘what I done this year’ post, which, if nothing else, makes me check that my various experiments are still alive. It’s been a challenging year trying to balance my work as Trove Manager with my broader passions and responsibilities as a member of the digital humanities community. So yeah. My personal highlights included heading to Japan to give a keynote at the annual conference of the Japanese Association of Digital Humanities, building Eyes on the past, and resurrecting THATCamp Canberra.

2015 is shaping up as both exciting and scary. On the scary front there’s the whole giving a keynote to hundreds of the world’s leading digital humanities scholars at DH2015 thing (cue imposter syndrome). There’ll also be the launch of Copyfight from NewSouth Publishing, which includes contributions from me and some really-real writers. I’m looking forward to squeezing in some more work on Invisible Australians and a few other research projects. Stay tuned.

The making

Inserting usual disclaimer here that this is not what I get paid to do as Trove Manager. These are projects and experiments I undertake in my own time, for my own reasons, at the cost of my own sanity. So all the problems and mistakes are also mine.

The talking

Sketching with Python and Plotly


I’m currently trying to make some progress with my ‘seams and edges’ paper for ALIAOnline 2015 and naturally ended up writing some code (what me procrastinate?). I was wondering about ways of exploring the ‘representativeness’ of an aggregation like Trove — what’s there and what’s not — so started noodling around with the Trove API.

The first result was a graph representing the numbers of Trove contributors and resources by state, compared to the population of that state. All values are displayed as percentages of the total.

The ACT is over-represented, of course, because of the holdings of the National Library itself. The under-representation of Queensland looks interesting — something to explore in the future.

My next graph used data on languages spoken at home in Australia from the 2011 census. It compared the population speaking those languages with the number of books in that language included in Trove, again as percentages of the total. It doesn’t embed very well, so view the full-size version on Plotly.
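The comparison behind both graphs comes down to turning two sets of raw counts into percentages of their own totals and putting them side by side. A minimal sketch of that step, assuming the counts have already been gathered; the function names and the figures are made up for illustration and are not real Census or Trove numbers:

```python
def to_percentages(counts):
    """Convert a dict of raw counts into percentages of the overall total."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("counts must sum to more than zero")
    return {key: 100.0 * value / total for key, value in counts.items()}


def compare_shares(population, holdings):
    """Pair each category's share of the population with its share of holdings.

    Returns {category: (population %, holdings %, difference)}; a negative
    difference means the category is under-represented in the holdings.
    """
    pop_pct = to_percentages(population)
    held_pct = to_percentages(holdings)
    comparison = {}
    for key in sorted(set(pop_pct) | set(held_pct)):
        pop = pop_pct.get(key, 0.0)
        held = held_pct.get(key, 0.0)
        comparison[key] = (pop, held, held - pop)
    return comparison


# Illustrative placeholder counts only (hypothetical, not real data).
speakers = {"Language A": 800, "Language B": 150, "Language C": 50}
books = {"Language A": 950, "Language B": 40, "Language C": 10}

for language, (pop, held, diff) in compare_shares(speakers, books).items():
    print(f"{language}: {pop:.1f}% of speakers, {held:.1f}% of books ({diff:+.1f})")
```

Working in percentages rather than raw counts is what makes a population of millions comparable with a collection of thousands.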

As I was playing around I noticed a tweet from Bridget Griffen-Foley:

Being in a quick-coding sort of mood I had to see how long it would take me to create a graph showing the numbers of daily newspapers in Trove (where daily is defined as more than 300 issues in a year). The answer was about fifteen minutes.
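The ‘daily’ test itself is simple enough to sketch. This assumes the issue counts per title and year have already been collected into a list of tuples (how they come out of the Trove API is not shown here); the titles and numbers are invented:

```python
from collections import Counter

DAILY_THRESHOLD = 300  # 'daily' means more than 300 issues in a year


def count_dailies(issue_counts, threshold=DAILY_THRESHOLD):
    """Count, for each year, the titles with more than `threshold` issues.

    `issue_counts` is an iterable of (title, year, number_of_issues) tuples,
    with each (title, year) pair assumed to appear only once.
    """
    dailies = Counter()
    for title, year, issues in issue_counts:
        if issues > threshold:
            dailies[year] += 1
    return dict(sorted(dailies.items()))


# Invented sample data, for illustration only.
sample = [
    ("Title A", 1900, 312),
    ("Title B", 1900, 52),
    ("Title C", 1900, 305),
    ("Title A", 1901, 300),  # exactly 300 does not count as daily
    ("Title C", 1901, 310),
]

print(count_dailies(sample))
```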

All of the graphs are created using the web service Plotly. Plotly has an easy-to-use Python API which means all you need to do to create a graph is to add a few lines of code. There are other Python visualisation libraries, but I like Plotly because it creates something instantly shareable — perfectly suited to this sort of quick and dirty experimentation.

I don’t think any of these graphs are particularly revealing, and I’ve made some assumptions about the data that probably wouldn’t hold up under scrutiny. But what this fiddling around emphasised was how an API and some simple tools make it possible to ask quick questions of the data.

All the code is in my Trove-Sketches repository on GitHub.

Life on the outside: Collections, contexts, and the wild, wild web

Keynote presented at the Annual Conference of the Japanese Association for the Digital Humanities, 20 September 2014, Tsukuba.

The full set of slides is available on SlideShare.

Cross-published on Medium.


This is Tatsuzo Nakata. In 1913 he was living on Thursday Island in the Torres Strait, just off the northern tip of Australia.


From the late 19th century there was a substantial Japanese population on Thursday Island, mostly associated with the development of the pearling industry.

I’ll admit that I know very little about Tatsuzo, and I’ve selected him more or less at random from a large body of records held by the National Archives of Australia.

I present him here out of context and in too little detail, simply as an example. Working backwards from this photograph I want to restore some layers of context and reveal to you a complex and shameful history.

This photograph was attached to an official government form called a ‘Certificate Exempting From Dictation Test’.

From the form we learn that the 32 year-old Tatsuzo was born in Wakayama. He had a scar over his right eye.


Tatsuzo carried a copy of this form with him when he departed for Japan aboard the Yawata Maru in May 1913. When he returned the following year the form was collected and compared with a duplicate held by port officials. The forms matched, and Tatsuzo was allowed to disembark.

To help confirm his identity, the form carried on its reverse side an impression of Tatsuzo’s hand.


You might think that this was a travel document — an early form of visa perhaps. But at the top of the form you’ll notice a reference to the Immigration Restriction Act, a piece of legislation introduced by the newly-federated Australian nation in 1901. The Immigration Restriction Act and the complex bureaucratic procedures that supported its administration came to be known more generally as the White Australia Policy.

If Tatsuzo had tried to return to Australia without one of these forms, he would have been subjected to the Dictation Test, and he would have failed. Despite its benign-sounding name, the Dictation Test was a form of racial exclusion aimed at anyone deemed non-white. No-one was meant to pass. If he hadn’t carried this form exempting him from the Dictation Test, Tatsuzo would most likely have been denied re-entry.

This certificate is drawn from one of more than 14,000 files in Series J2483 in the National Archives of Australia. This series is solely concerned with the administration of the White Australia Policy. There are many other series from other ports and other time periods full of documents like this. The National Archives holds many, many thousands of these certificates documenting the lives and movements of people considered out of place in a White Australia.

Photographs, forms, files, series, legislation — this small shard of Tatsuzo’s life is preserved as part of a racist system of exclusion and control. But what happens when we extract the photos from their context within the recordkeeping system and simply present them as people?

I’ve created a site where you can explore some of the records relating to Japanese people held in Series J2483. Instead of navigating lists of files, you can start with faces — with the people, not the system.


I’m starting today with Tatsuzo and this wall of faces because what I want to explore are some of the complexities of context.

Shark Attack!

After a series of fatal shark attacks in Australian waters, the community of Port Hacking, in southern Sydney, began to wonder if they too were at risk.

In January 2014 the local newspaper published an article under the heading ‘Shark “cover up” in Port Hacking’ alleging that research into the dangers had been suppressed.

Ten days later the newspaper followed up with details of the area’s only recorded fatal shark attack in 1927. A local government member, it reported, had ‘unearthed the article on Trove’.

‘It’s long been a story that a boy was killed by a shark at Grays Point many years ago’, he said, ‘I knew about it 30 to 40 years ago but if you talk to people around here, nobody knows about it’.

‘A lot of people say there are no sharks in Port Hacking but this is rubbish’, he added.

Let me reassure anyone thinking about coming to DH2015 in Sydney next year that shark attacks are extremely rare.

What interested me about these articles was not the risk of gruesome death, but the relationship between past and present. The question of whether shark attacks were possible could be answered — simply by searching Trove.


For those who don’t know, Trove is a discovery service developed and maintained by the National Library of Australia. Like Europeana, the Digital Public Library of America, and DigitalNZ, it aggregates resources from the cultural heritage sector, and beyond.

It also provides access to more than 130 million newspaper articles from 1803 onwards. The articles are drawn from over 600 different titles — large and small, rural and metropolitan — with more being added all the time.

Search for just about anything and you’re likely to find a match of some sort amongst the digitised newspapers. So of course I searched for Tsukuba.


Trove is also a community. Users correct the OCR’d text of newspaper articles. They also add thousands of tags and comments to resources across Trove.

  • 138,000 users
  • 3,000,000 tags
  • 139,000,000 corrections
  • 58,000 lists

Perhaps my favourite examples of user-generated content on Trove are the Lists. Lists are pretty much what they sound like — collections of resources. They make it easy for you to save and share your research. But more than tags or comments they expose people’s interests and passions. They give some insight into the many acts of meaning-making that occur in and around Trove.

Lists are also exposed through Trove’s Application Programming Interface (API) in a form fit for machine consumption. So with just a dash of code I can harvest the titles of all public lists and do some very basic word frequency analysis courtesy of Voyant Tools.
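The same sort of basic word frequency count can be sketched in a few lines of Python. This is not the Voyant Tools analysis, just a stand-in using the standard library; the titles and the stopword list are invented, and harvesting the titles from the API is not shown:

```python
import re
from collections import Counter

# A tiny, hypothetical stopword list; a real analysis would use a fuller one.
STOPWORDS = {"the", "of", "and", "a", "in", "for", "to"}


def word_frequencies(titles, stopwords=STOPWORDS):
    """Count how often each word appears across a collection of titles."""
    counts = Counter()
    for title in titles:
        # Lowercase and keep runs of letters, allowing internal apostrophes.
        words = re.findall(r"[a-z]+(?:'[a-z]+)*", title.lower())
        counts.update(word for word in words if word not in stopwords)
    return counts


# Invented list titles, for illustration only.
titles = [
    "Smith family history",
    "History of the Smith family in Ballarat",
    "Ballarat shipwrecks",
]

for word, count in word_frequencies(titles).most_common(3):
    print(word, count)
```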


There’s nothing too surprising here — we know that family historians are our largest user group. But we can also see the long tail in action — the way that huge collections like Trove can support very focused, specific interests.

Which leads me back to shark attacks.

Old Speak

The Port Hacking article made me wonder how many other web pages there might be out on the wider web that cited Trove newspapers in a discussion of shark attacks. The answer was many. But what was most interesting wasn’t the volume of references, it was the variety of contexts — in blog posts, on Facebook, in fishing forums.

‘Ahh, old time newspapers are fascinating things aren’t they?’, notes one post in a weather forum, citing details of a shark attack in Sydney from 1952.

On a fishing site, a thread on bull shark attacks in Western Australia’s Swan River begins: ‘I found a great website to view really old newspapers in perth. Just found a few swan river shark storys [sic]…’.

The author follows up with a direct link to the Trove search page, prompting the exchange:

Redfin 4 Life: ‘Haha you would never know there had been that many incedents in the swan without seeing these…’

Goodz: ‘Oh how newspapers have changed the way the write… love the old speak!’

Alan James: ‘That’s right Goodz, and more often than not I’m sure they actually reported the truth.’

So a discussion of shark attacks turns to a consideration of the changing style of newspaper reporting.

Perhaps even more interesting is the way that digitised newspapers are used to test a hypothesis, challenge an interpretation, or argue a case. As in the Port Hacking case, questions about the history of shark attacks can be explored without needing to turn to experts, history books, or official statistics.

So when a local politician is quoted as saying ‘there have not been any serious or fatal shark attacks at Coogee Beach since records commenced in the 1800s’, a reader can respond with two Trove newspaper citations and the comment: ‘No previous shark attacks? Or are they only searching for fatalities?’

When a media outlet asks its Facebook followers whether the export of live sheep from Western Australia might be increasing the number of shark attacks off the coast, one follower can simply share a Trove link to a newspaper article from 1950 and ask ‘Did they have live sheep export in 1950?’

I don’t want to argue that these interactions are particularly profound or remarkable. In fact I’d suggest that they’re interesting because they’re not remarkable. 130 million digitised newspaper articles chronicling 150 years of Australian history are just another resource woven into the fabric of online experience. The past can be mobilised, shared and embedded in our daily interactions as easily as pictures of cats.


And it’s not just shark attacks. To explore the variety of contexts in which Trove newspaper articles are used and shared, I started mining backlinks.

Backlinks, as the name suggests, are just links out there on the wild, wild web that point back to your site. You can find them in your referrer logs, in Google’s webmaster tools, or simply by searching. I started with a ‘try before you buy’ sample of backlinks from an SEO service.

From there I wrote a script to harvest the linking pages, remove duplicates, extract the newspaper references, retrieve the article details from the Trove API, and save everything to a database for easy exploration. You can play with the results online.
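The ‘extract the newspaper references’ step can be sketched with a regular expression. This is an illustration rather than the harvesting script itself: it assumes article links take one of two forms, `trove.nla.gov.au/ndp/del/article/<id>` or `nla.gov.au/nla.news-article<id>`, and it leaves out fetching the pages, querying the Trove API, and the database:

```python
import re

# Assumed link patterns for Trove newspaper articles; the numeric id is captured.
ARTICLE_LINK = re.compile(
    r"https?://(?:trove\.nla\.gov\.au/ndp/del/article/|nla\.gov\.au/nla\.news-article)(\d+)"
)


def extract_article_ids(html):
    """Return the unique article ids linked from a page, in order of appearance."""
    seen = set()
    ids = []
    for match in ARTICLE_LINK.finditer(html):
        article_id = match.group(1)
        if article_id not in seen:
            seen.add(article_id)
            ids.append(article_id)
    return ids


# A made-up page fragment in which one article is linked twice.
page = """
<a href="http://trove.nla.gov.au/ndp/del/article/1234567">A shark story</a>
<a href="http://nla.gov.au/nla.news-article7654321">Another one</a>
<a href="http://nla.gov.au/nla.news-article1234567">The first again</a>
"""

print(extract_article_ids(page))
```

Reducing both link forms to a bare article id is what lets the two links to the same article count once.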


I ended up harvesting 3116 pages from 1780 domains containing 13,389 links to 11,242 articles in Trove. Remember that’s just a sample of all the links to Trove newspapers out there on the web.

What was more surprising than the raw numbers was the diversity of content across those pages. I knew that family and local historians were busily blogging about their Trove discoveries, but I didn’t know that Trove newspapers were being cited in discussions about politics, science, war, sport, music — just about any topic you could imagine.

Nor are these discussions just about Australia. A little quick and dirty analysis suggests that more than 30 languages are represented across those 3000 pages.


This is a work in progress. I hope to expand my hunt for traces — crawling sites for additional references, mining referrals, and inviting the public to nominate pages for inclusion. By adding a simple API I could make it possible for Trove to include links back to relevant pages, like trackbacks on a blog. I also want to understand more about the scope of the content and the motivations of its authors. What is going on here?

Undoubtedly some of these pages constitute link spam or attempts to game search engines, but most do not. Browsing the database you find many examples of interpretation, persistence, and passion. People around the world have something they want to say, something they want to share, and Trove’s millions of newspaper articles provide them with a readily-accessible source of inspiration and evidence.

It’s clear that those many small acts of meaning-making we can observe in Trove’s activity statistics extend beyond a single site — to a much much wider (and wilder) world.


One day earlier this year, Trove received more than three times its usual number of visitors.


The culprit was the WTF subreddit — a popular place for sharing the weirdities of the web. Someone posted a link to a Trove newspaper article describing the unfortunate demise of a poodle called Cachi, whose fall from a thirteenth-story balcony in Buenos Aires resulted in the deaths of three passers-by.

As well as causing a dramatic spike in Trove’s visitor stats, the post received more than 3000 votes and attracted 677 comments on reddit. Cachi was a hit.

Trove articles pop up regularly on reddit. The traffic spikes they bring are reminders that however proud we might be of our stats, we are but a tiny corner of the web. There’s something much bigger out there.

Michael Peter Edson has long sought to alert cultural heritage organisations to the challenges of scale. In a recent essay he described the web’s ‘dark matter’:

There’s just an enormous, humongous, gigantic audience out there connected to the Internet that is starving for authenticity, ideas, and meaning. We’re so accustomed to the scale of attention that we get from visitation to bricks-and-mortar buildings that it’s difficult to understand how big the Internet is—and how much attention, curiosity, and creativity a couple of billion people can have.

Libraries, archives and museums, he argues, need to meet the public where they are, to recognise that vigorous sites of meaning-making are scattered across the vast terrain of the web. Trove newspaper traces and reddit spikes are mere glimpses of the ‘dark matter’ of cultural activity that lurks beneath the apps, the stats, and the corporate hype.

People are already using our digital stuff in ways we don’t expect. The question is whether libraries, archives and museums see this hunger for connection as an invitation or a threat. Do we join the party, or call the police to complain about the noise?


There’s something fundamentally human about sharing. Yes, it’s easy to mock the shallowness of a Facebook ‘Like’; to see our obsession with followers, friends and retweets as evidence of our dwindling capacity for attention — reducing engagement and understanding to a single click. But haven’t we always shared — through stories, gossip, jokes, performances, and rituals? Rather than being measured against a threshold of meaning, surely each act of sharing exists on a continuum from the flippant to the philosophical. Just because the act of sharing has been commodified by large social media services seeking to mine our preferences for profit, doesn’t mean it lacks deeper human significance.

A retweet can represent a fleeting interest, a brief moment of distraction. But it can also mark the start of a journey.

Cultural heritage institutions around the world have begun to recognise that sharing is not just a marketing strategy, it’s a mission. As Merete Sanderhoff notes in her foreword to the anthology Sharing is Caring:

When cultural heritage is digital, open and shareable, it becomes common property, something that is right at hand every day. It becomes a part of us.

Aggregation services, like Trove, the Digital Public Library of America, Europeana, and DigitalNZ, bring resources together to share them more easily with the world. Aggregation is only worthwhile if it serves discovery and reuse — it’s a process of mobilisation, rather than collection. As Europeana argues in their 2020 strategy:

We believe culture is a catalyst for social and economic change. But that’s only possible if it’s readily usable and easily accessible for people to build with, build on and share.

Of course the hard part is understanding what makes something ‘readily usable and easily accessible’. What balance do we need between push and pull? Between ease-of-use and technical power? Between licensing and liberty? Between context and creativity?

Busy Bots

The Mechanical Curator was born in the British Library Labs as part of their innovative digital scholarship program. In September 2013, she started posting to Tumblr random images automatically extracted from a collection of 65,000 digitised 19th century books.

It was, Ben O’Steen explained, an experiment in ‘providing undirected engagement with the British Library’s digital content’. The book illustrations moved from inside to outside, opening opportunities for discovery beyond the covers.

But that was just the beginning. A few months later the Mechanical Curator dramatically expanded its labours, uploading more than a million public domain images to Flickr.

What followed was something of a cultural feeding frenzy as people from all over the world started sharing, tagging, collecting, and creating with this rich assortment of 19th century illustrations. Since then the images have been mashed up into new works, added and organised in the Wikimedia Commons, and featured in an installation at the Burning Man festival in Nevada.


Having been locked away within books for more than a hundred years, the illustrations were given new life online as works in their own right. Opportunities for innovation and expression were created by a rupture in context.

Meanwhile on Twitter, a growing army of bots was liberating items from cultural collections around the world. Inspired by the bot-making genius of Mark Sample, I created @TroveNewsBot in June 2013 to tweet newspaper articles from Trove.

He was joined by @DPLABot, @EuropeanaBot, @Kasparbot, @CurtinLibBot, @DigitalNZ.bot, @museumbot, @cooperhewittbot, @bklynmuseumbot, and no doubt others — all sharing random collection items. Of course @MechCuratorBot soon joined the fray from the British Library, and I eventually added @Trovebot to tweet material from all the non-newspapery sections of Trove.

The possibilities of serendipitous discovery are receiving increasing attention within the digital humanities. At DH2014, Kim Martin and Anabel Quan-Haase critically examined four DH tools — including @TroveNewsBot — in the light of existing models of serendipity. Their discussion noted that randomness is not the same as serendipity, and outlined how serendipity could be understood as a type of encounter with information. I do wonder though if what makes the bots interesting is not randomness as such, but the way randomness can play around with our assumptions about context.

Steve Lubar observes that the random offerings of collection bots can also expose the choices that are made in the creation and display of cultural collections. Randomness can challenge our expectations. Describing the genesis of the Mechanical Curator, James Baker notes:

And so as what at first seemed simple descends into complexity the Mechanical Curator achieves her peculiar aim: giving knowledge with one hand, carpet bombing the foundations of that knowledge with the other.

The Trove bots I created do more than tweet random offerings, they also allow you to interact with Trove without ever leaving Twitter. Send a few keywords their way and they’ll do your searching for you, tweeting back the most relevant result. You can modify their default behaviour by adding a series of hashtags — #luckydip, for example, will spice your result with a touch of randomness.

More interestingly, perhaps, you can tweet a url at them and they’ll extract keywords from the web page and use them to construct the search. This means that @TroveNewsBot can offer commentary on current events.
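How a bot might pull apart an incoming tweet can be sketched as follows. This is a guess at the general shape, not the bots’ actual code: the function name and the returned structure are hypothetical, and only the `#luckydip` hashtag comes from the description above. Extracting keywords from a linked web page is not shown:

```python
import re

MENTION = re.compile(r"@\w+")
HASHTAG = re.compile(r"#(\w+)")
URL = re.compile(r"https?://\S+")


def parse_tweet(text):
    """Split a tweet into a search query, option hashtags, and any urls."""
    urls = URL.findall(text)
    # Remove urls first so a '#fragment' in a link is not read as a hashtag.
    without_urls = URL.sub(" ", text)
    hashtags = [tag.lower() for tag in HASHTAG.findall(without_urls)]
    # Whatever is left after removing mentions and hashtags becomes the query.
    remainder = HASHTAG.sub(" ", MENTION.sub(" ", without_urls))
    return {
        "query": " ".join(remainder.split()),
        "hashtags": hashtags,
        "urls": urls,
        "random": "luckydip" in hashtags,
    }


print(parse_tweet("@TroveNewsBot shark attack #luckydip"))
```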

Several times a day he retrieves the latest headlines from a news site and searches for something similar amidst Trove’s 130 million historical newspaper articles. What emerges is a strange conversation between past and present.


These bots do not simply present collection items outside of the familiar context of discovery interfaces or online exhibitions, they move the encounter itself into a wholly new space. Just as the Mechanical Curator liberates illustrations from the printed page, the Twitter bots loosen the institutional context of collections to allow them to participate in a space where people already congregate. They send collection items out into the wilds of the web, to find new meanings, new connections and perhaps even new love.

Broken & Repaired

But letting go can be scary. A 2008 survey of libraries, archives and museums revealed that one of the main factors inhibiting the opening up of online collections was the desire to avoid misrepresentation, mislabeling or misuse of cultural objects. Easy sharing brings the risk that our carefully curated content will be shorn of context and bounced around the web — adrift and abused.

Earlier this year Sarah Werner took aim at Twitter feeds that pump out streams of ‘historical’ photos — unattributed and often wrongly captioned. But it wasn’t simply the lack of attribution that angered her:

These accounts capitalize on a notion that history is nothing more than superficial glimpses of some vaguely defined time before ours, one that exists for us to look at and exclaim over and move on from without worrying about what it means and whether it happened.

I have to admit that the excitement of seeing Trove’s visitor numbers suddenly soar thanks to reddit is frequently tempered by the realisation that what is being shared is yet another story of gruesome death, violence, or misfortune. 150 years of Australian history is reduced to clickbait by our tabloid sensibilities. Most of those who arrive from reddit read the article and click away — the bounce rate is around 97%. This is not ‘engagement’?

And yet, I can’t help but wonder about the 3% who don’t immediately leave, who pause and look around. Three percent of a lot is still a lot — a lot of people who might have been exposed to Trove and Australian history for the very first time. Similarly while the viral pics industry is frustrating and exploitative, it might yet offer opportunities to learn.

One of my favourite Twitter accounts is @PicsPedant. It monitors many of the viral pics feeds, researches the images, and tweets the results — providing a steady stream of attributions, corrections, critiques, and context. Not only do you find out about the images, you pick up research tips, and learn about the cannibalistic tendencies of the pic bots themselves — constantly recycling content from their kin.

@AhistoricalPics offers a different form of education, satirising the whole viral pics genre with its fabricated captions, and pricking at our own inclination to believe.


Freeing collections opens them to misuse, but it also exposes that misuse to analysis and critique. Contexts can be rediscovered as well as lost, restored as well as broken.

Generous signposts

It’s wonderful to see many Trove newspaper articles shared on Twitter. Unfortunately a significant proportion of these come from climate change deniers, who mine the newspapers for freak weather events and past climatic theories, imagining that such reports undermine current research. This is bad science and bad history. Their efforts are also well-represented in my database of web page citations, along with expressions of hatred and prejudice that I’d prefer to stay submerged. It’s depressing, but it seems inevitable that people will do bad things with your stuff.

In a recent post about the DPLA’s metadata licensing arrangements, Dan Cohen suggested we should look beyond technical and legal controls around online use towards social and ethical guidelines:

The cynics, of course, will say that bad actors will do bad things with all that open data. But here’s the thing about the open web: bad actors will do bad things, regardless… The flip side of worries about bad actors is that we underestimate the number of good actors doing the right thing.

Bad people will do bad things, but by asserting a social and ethical framework for the use of digital cultural collections we strengthen the resolve and commitment of those who want to do right.

Already there are examples in the work of the Local Contexts project which is developing a series of licenses and labels to guide use of traditional knowledge and cultural materials. Similarly, Creative Commons Aotearoa New Zealand have been developing an Indigenous Knowledge Notice to educate the public about what constitutes appropriate use.

We should remember too that footnotes have always been at the heart of an ethical pact. The Australian historian Tom Griffiths has described footnotes as ‘honest expressions of vulnerability’ — ‘generous signposts to anyone who wants to retrace the path and test the insights’. This ‘professional paraphernalia’ has, he argues, grown out of a series of ethical questions:

To whom are we responsible – to the people in our stories, to our sources, to our informants, to our readers and audiences, to the integrity of the past itself? How do we pay our respects, allow for dissent, accommodate complexity, distinguish between our voice and those of our characters?1

Such questions remain crucial as we consider the relationship between cultural collections and their online users. If we expect people to erect ‘generous signposts’ we have to make our stuff easy to find and share. If we want them to consider their responsibility to the past we should focus on providing trust, confidence, and support, not permission.


If my wall of faces seems familiar, it might be because a few years ago I created something similar called The Real Face of White Australia.

The two walls use different sets of records, but they were constructed in much the same way: I reverse-engineered the National Archives’ online database, downloaded images of digitised files, and used a facial detection script to identify and extract faces.

The Real Face of White Australia was an experiment, built over the course of a weekend. But its discomfiting power was immediately evident. Where there had been records, there were people — looking at us, challenging us.

My partner Kate Bagnall is a historian of Chinese-Australia and we were working together on a project called Invisible Australians, aimed at liberating the lives of these people from the bureaucracy of the White Australia Policy.

The project was motivated by a strong sense of responsibility — not to the National Archives, not to the records, but to the people themselves.

We often talk about preserving context as if it’s an end in itself; as if context is just a set of attributes to be catalogued and controlled. The exciting, terrifying, wonderful thing about the wild, wild web is how it upsets our notions of relevance and meaning. Historic newspapers can find their way into contemporary debates. Century-old illustrations can be remade as art. Twitter bots can inspire conversations with collections. The people buried inside a recordkeeping system can be brought at last to the surface. Contexts are unstable, shifting. And through that instability we can glimpse other worlds, we can imagine alternatives, we can build something new.

What’s important is not training users to understand the context of our collections, but helping them explore and understand their responsibilities to the pasts those collections represent.

Let’s remove technical barriers, minimise legal restrictions, and trust in the good will of our audiences. Instead of building shrines to our descriptive methodologies, let’s create systems that provide stable shareable anchors, that connect, but don’t constrain.

Contexts will flow and mingle, some will fade and some will burn. Contexts will survive not because we demand it in our terms of service, or embed them in our interfaces, but because they capture something that matters.

The ways we find and use cultural collections will continue to change, but questions about responsibility, value, and meaning will remain.


  1. Tom Griffiths, ‘History and the creative imagination’, History Australia, Vol. 6, No. 3, 2009.

On seams and edges

Recently I submitted the abstract below for ALIA Information Online 2015. I haven’t heard yet whether it’s been accepted, but I thought I’d post it here anyway because, even if I don’t get to talk about it at the conference, I want to think about the topic some more. If nothing else, this is an extended NTS…

Many thanks to @edsu and @nowviskie for pointing me towards ideas of ‘repair’ and ‘broken world thinking’, which I reckon will help me develop the arguments I was gesturing towards earlier this year in a talk on The Future of Trove. In that talk I drew on some of my old research on the nature of progress to describe a future for Trove that avoided visions of technological power and sophistication:

The future of Trove shouldn’t be envisaged in terms of slick interfaces and fast search (though I’d like some more of that).

The future of Trove will be messy, and it will be complicated, because life is just like that, and while Trove is built of metadata, it’s powered by the people that contribute, use, share and annotate that metadata.

Life can also be disappointing, painful and disturbing, and all of that too must figure in the future of Trove.

It’s important to try and see Trove as a series of accommodations, agreements, and annotations, rather than as a big aggregation machine. There’s a fragility in the connections that we make that needs to be understood. There’s no inevitability here, but many acts of goodwill, generosity, and repair.

More to come on this, I hope… (I’m also collecting some relevant bits and pieces in Zotero.)

On seams and edges — dreams of aggregation, access & discovery in a broken world

Visions of technological utopia often portray an increasingly ‘seamless’ world, where technology integrates experience across space and time. Edges are blurred as we move easily between devices and contexts, between the digital and the physical.

But Mark Weiser, one of the pioneers of ubiquitous computing, questioned the idea of seamlessness, arguing instead for ‘beautiful seams’ — exposed edges that encouraged questions and the exploration of connections and meanings.

With discovery services and software vendors still promoting ‘seamless discovery’ as one of their major selling points, it seems the value of seams and edges requires further discussion. As we imagine the future of a service such as Trove, how do we balance the benefits of consistency, coordination and centralisation against the reality of a fragmented, unequal, and fundamentally broken world?

This paper will examine the rhetoric of ‘seamlessness’ in the world of discovery services, focusing in particular on the possibilities and problems facing Trove. By analysing both the literature around discovery, and the data about user behaviours currently available through Trove, I intend to expose the edges of meaning-making and explore the role of technology in both inhibiting and enriching experience.

How does our dream of comprehensiveness mask the biases in our collections? How do new tools for visualisation reinforce the invisibility of the missing and excluded? How do the assumptions of ‘access’ direct attention away from practical barriers to participation?

How does the very idea of systems and services, of complex and powerful ‘machines’ ready to do our bidding, discourage us from seeing the many, fragile acts of collaboration, connection, interpretation, and repair that hold these systems together?

Trove is an aggregator and a community; a collection of metadata and a platform for engagement. But as we imagine its future, how do we avoid the rhetoric of technological power, and expose its seams and edges to scrutiny?


Eyes on the past

Faces offer an instant connection to history, reminding us that the past is full of people. People like us, but different. People with their own lives and stories. People we might only know through a picture, a few documentary fragments, or a newspaper article.

Eyes on the Past is an experimental interface, built in a weekend. I’m exploring whether faces can provide a way to explore more than 120 million newspaper articles available on Trove.

This collection of tweets tells the story of its development.


There are some details about the software used on the site’s about page. You can view the harvest/detection code and the website code on GitHub.