On seams and edges

Recently I submitted the abstract below for ALIA Information Online 2015. I haven’t heard yet whether it’s been accepted, but I thought I’d post it here anyway because, even if I don’t get to talk about it at the conference, I want to think about the topic some more. If nothing else, this is an extended NTS…

Many thanks to @edsu and @nowviskie for pointing me towards ideas of ‘repair’ and ‘broken world thinking’, which I reckon will help me develop the arguments I was gesturing towards earlier this year in a talk on The Future of Trove. In that talk I drew on some of my old research on the nature of progress to describe a future for Trove that avoided visions of technological power and sophistication:

The future of Trove shouldn’t be envisaged in terms of slick interfaces and fast search (though I’d like some more of that).

The future of Trove will be messy and it will be complicated, because life is just like that, and while Trove is built of metadata, it’s powered by the people who contribute, use, share and annotate that metadata.

Life can also be disappointing, painful and disturbing, and all of that too must figure in the future of Trove.

It’s important to try and see Trove as a series of accommodations, agreements, and annotations, rather than as a big aggregation machine. There’s a fragility in the connections that we make that needs to be understood. There’s no inevitability here, but many acts of goodwill, generosity, and repair.

More to come on this, I hope… (I’m also collecting some relevant bits and pieces in Zotero.)

On seams and edges — dreams of aggregation, access & discovery in a broken world

Visions of technological utopia often portray an increasingly ‘seamless’ world, where technology integrates experience across space and time. Edges are blurred as we move easily between devices and contexts, between the digital and the physical.

But Mark Weiser, one of the pioneers of ubiquitous computing, questioned the idea of seamlessness, arguing instead for ‘beautiful seams’ — exposed edges that encouraged questions and the exploration of connections and meanings.

With discovery services and software vendors still promoting ‘seamless discovery’ as one of their major selling points, it seems the value of seams and edges requires further discussion. As we imagine the future of a service such as Trove, how do we balance the benefits of consistency, coordination and centralisation against the reality of a fragmented, unequal, and fundamentally broken world?

This paper will examine the rhetoric of ‘seamlessness’ in the world of discovery services, focusing in particular on the possibilities and problems facing Trove. By analysing both the literature around discovery, and the data about user behaviours currently available through Trove, I intend to expose the edges of meaning-making and explore the role of technology in both inhibiting and enriching experience.

How does our dream of comprehensiveness mask the biases in our collections? How do new tools for visualisation reinforce the invisibility of the missing and excluded? How do the assumptions of ‘access’ direct attention away from practical barriers to participation?

How does the very idea of systems and services, of complex and powerful ‘machines’ ready to do our bidding, discourage us from seeing the many, fragile acts of collaboration, connection, interpretation, and repair that hold these systems together?

Trove is an aggregator and a community; a collection of metadata and a platform for engagement. But as we imagine its future, how do we avoid the rhetoric of technological power, and expose its seams and edges to scrutiny?

Eyes on the past

Faces offer an instant connection to history, reminding us that the past is full of people. People like us, but different. People with their own lives and stories. People we might only know through a picture, a few documentary fragments, or a newspaper article.

Eyes on the Past is an experimental interface, built in a weekend. I’m exploring whether faces can provide a way to explore more than 120 million newspaper articles available on Trove.

This collection of tweets tells the story of its development.

There are some details about the software used on the site’s about page. You can view the harvest/detection and the website code on GitHub.

Easter eggsperiments

No, nothing to do with Easter or eggs, but it’s Easter Sunday and who can resist a good opportunity for a bad pun?

This is another catch-up post, pulling together some recent experiments. If nothing else, it’ll help me keep track of things I’m otherwise likely to forget.

WWI Faces

In our last instalment I was playing around with some WWI data from the State Library of South Australia. I’m really pleased to report that SLSA staff have used my experiments to help them add Trove links to more than 6,000 of their Heroes of the Great War records. Here’s an example – note the ‘article’ link which goes straight to a digitised newspaper article in Trove. With some good data and a bit of API wrangling we’ve now established rich linkages between an important WWI resource and Trove. Win!

I’ve also continued my fiddling with articles from the Adelaide Chronicle as I start to think about how Trove’s newspapers might be used in a WWI exhibition being developed by the National Library. At the end of my last post I’d created a list of articles from the Chronicle that were likely to include biographical details of WWI personnel. I knew that many of these included portrait photos, so I filtered them on Trove’s built-in ‘illustrated’ facet and saved the page images for the remaining articles. You can browse the resulting collection of pages on Dropbox. As you can see there are indeed many portraits of service people.

So the next step was to try and extract the portraits from the pages. This was rather familiar territory, as I’d already used a facial detection script to create The Real Face of White Australia. But I wasn’t sure how the pattern recognition software would cope with the lower quality newspaper images. After getting all the necessary libraries installed (the hardest bit of the whole process), I pointed the script at the page images and… it worked!
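For the curious, the core of the detection step looks something like this — a minimal sketch assuming OpenCV and its stock frontal-face Haar cascade, with illustrative paths and parameters rather than the exact values from my script:

import os

import cv2  # OpenCV

# Illustrative paths -- point these at your own images.
CASCADE_FILE = 'haarcascade_frontalface_default.xml'
PAGES_DIR = 'pages'
FACES_DIR = 'faces'

detector = cv2.CascadeClassifier(CASCADE_FILE)

for filename in os.listdir(PAGES_DIR):
    if not filename.endswith('.jpg'):
        continue
    image = cv2.imread(os.path.join(PAGES_DIR, filename))
    grey = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    grey = cv2.equalizeHist(grey)  # boost the contrast of murky newsprint
    # These parameters are guesses -- low-quality scans need tuning.
    faces = detector.detectMultiScale(grey, scaleFactor=1.1, minNeighbors=5, minSize=(40, 40))
    for index, (x, y, w, h) in enumerate(faces):
        face = image[y:y + h, x:x + w]  # crop out the detected face
        outname = '{}-{}.jpg'.format(os.path.splitext(filename)[0], index)
        cv2.imwrite(os.path.join(FACES_DIR, outname), face)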

A small sample of the faces extracted from the Chronicle.

From 141 pages I extracted 1,738 images, and most of them were faces. You can browse all 1,738, but be warned, I’ve just dumped them onto a single page and added a bit of Isotope magic — so they’ll take a fair while to load and your browser might object. You’ll also notice that I haven’t tried to filter out photos of non-service people, I just wanted to see if it worked. And it does. Even in this rough form you can sense some of the emotive power. What’s really amazing is the way that even small images of faces in group photographs were identified. All I was aiming for at this stage was a proof of concept — yes, I can extract photos of WWI service people from newspapers. Hmmm…

Trove in space

All the faces above were from one South Australian newspaper. Several years ago I worked on a project to map places of birth and enlistment of WWI service people, and while I have no interest in the national mythologies surrounding WWI, I do still wonder about the local impact of war — all those small communities sending off their sons and daughters…

So I’m wondering whether we might be able to use the digitised newspapers in Trove to navigate from place to face. To choose a town anywhere in Australia, and present photographs of service personnel published in nearby newspapers.

I now know I can extract the photos, but how can we navigate Trove newspapers by location? Time for a new experiment…

The Trove API provides a complete list of digitised newspaper titles. You’ll notice that some of the titles include a place name as part of the summary information in brackets, while many others will include place names in their titles, for example:

  • Illawarra Daily Mercury (Wollongong, NSW : 1950 – 1954)
  • Hawkesbury Herald (Windsor, NSW : 1902 – 1945)
  • Kiama Examiner (NSW : 1858 – 1859)
  • Narromine News and Trangie Advocate (NSW : 1898 – 1955)

I haven’t had much luck getting automated named entity extraction tools to work on short text strings like this, so I decided to roll my own using Geoscience Australia’s Gazetteer of Australia 2012. I opened up the GML file containing all Australian places and saved the populated locations to my own Mongo database. This gave me a handy place name database, complete with geo-locations.
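In outline, the loading step was something like the sketch below — the GML tag and field names here are my assumptions, so check them against the real file before relying on this:

from lxml import etree
from pymongo import MongoClient

db = MongoClient().placenames

# The tag and field names below are assumptions about the Gazetteer GML --
# adjust them to match the actual schema.
for event, elem in etree.iterparse('Gazetteer2012.gml', tag='{*}PlaceName'):
    record = {child.tag.split('}')[-1].lower(): child.text for child in elem}
    if record.get('feature_code') == 'POPL':  # populated places only
        db.places.insert_one({
            'name': record['name'].lower(),
            'state': record.get('state_id'),
            'location': {
                'type': 'Point',
                'coordinates': [float(record['longitude']), float(record['latitude'])],
            },
        })
    elem.clear()  # keep memory use down on a big file

db.places.create_index('name')
db.places.create_index([('location', '2dsphere')])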

Next I went to work on the newspaper titles. Extracting the places from the summary information was easy because they followed a regular pattern, but finding them in the body of the title was trickier. First I had to exclude those words that were obviously not place names. Aside from the usual stopwords (‘and’ and ‘the’), there are many words that commonly occur in newspaper titles — ‘Herald’, ‘Star’, ‘Chronicle’ etc. To find these words I pulled apart all the titles and calculated the frequency of every word. You can explore the raw results – ‘Advertiser’ (116) wins by a large margin, with ‘Times’ (67) in second place. From these results I could create a list of words that I knew were not places and could safely be ignored.

Then it was just a matter of tokenising the titles (breaking them up into individual words), removing all the stopwords (the standard list and my special list), and then looking up the words in my place name database. I did this in two passes, first as bigrams (pairs of words), and then as single words — this allowed me to find compound place names like ‘North Melbourne’. The Trove API gives you a ‘state’ value for each title, so I could use this in the query to increase accuracy.
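The matching function boiled down to something like this sketch (it assumes the place name collection created above, and that the gazetteer and Trove use compatible state values):

import re

# The usual stopwords plus words I knew weren't places --
# truncated here for the example.
STOPWORDS = set(['and', 'the', 'advertiser', 'times', 'herald', 'star', 'chronicle'])

def find_place(title, state, db):
    """Look for a place name in a newspaper title, trying bigrams first."""
    words = [w for w in re.findall(r'[a-z]+', title.lower()) if w not in STOPWORDS]
    # Pairs of words first, to catch compounds like 'north melbourne'...
    bigrams = [' '.join(pair) for pair in zip(words, words[1:])]
    # ...then single words.
    for candidate in bigrams + words:
        place = db.places.find_one({'name': candidate, 'state': state})
        if place:
            return place
    return None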

If I found a place name, I added the place details, including the latitude and longitude, to the title record from the API and included it in my own newspaper title database.

So I ended up with two databases — one with geolocated places, and another with geolocated newspapers. That meant I could build a simple interface to find newspaper titles by place. It’s nothing fancy — just another proof of concept — but it works pretty well. Just type in a place name and select a state and a query is run against the place name database. If the place is found then the latitude and longitude are fed to the titles database to find the closest newspapers. After removing some duplicates, the 10 nearest newspapers are displayed.
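Behind the form there’s just a geospatial query — roughly the sketch below, assuming both collections live in MongoDB with a 2dsphere index on the titles’ locations:

def nearest_titles(place_name, state, db, limit=10):
    """Find the newspaper titles published closest to a place."""
    place = db.places.find_one({'name': place_name.lower(), 'state': state})
    if not place:
        return []
    # '$near' sorts results by distance; needs a 2dsphere index on 'location'.
    return list(db.titles.find({
        'location': {'$near': {'$geometry': place['location']}},
    }).limit(limit))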

Find Trove newspapers by place

Building some sort of map interface on top of this is pretty trivial. What’s more important is to do some analysis of my place matching to see what I might have missed. But so far so good!

Trove is…

Trove is more than newspapers. This is a message the Trove team tries to emphasise at every opportunity. The digitised newspapers are an incredible resource of course, but there’s so much other interesting stuff to explore.

To try and give a quick and easy introduction to this richness, I created a simple dashboard-type view of Trove, imaginatively titled Trove is…

What is Trove?

Trove is… gives a basic status report on each of the 10 Trove zones, with statistics updated daily (except for the archived websites, as there’s no API access at the moment). The BIG NUMBERS are counter-balanced by a single randomly-selected example from each zone. It’s a summary, an overview, a portal and a snapshot. Reload the page and the zones will be reordered and the examples will change.

It’s pretty simple, but I think it works quite well, and thanks to Twitter Bootstrap it looks really nice on my phone! But while the idea was simple, the implementation was pretty tricky — particularly the balance between randomness and performance. If all the examples were truly random, drawn from the complete holdings of Trove on every page reload, you’d spend a lot of time watching spinning arrows waiting for content to appear. I tried a number of different approaches and finally settled on a system where random selections of 100 resources per zone are made every hour by background processes and cached. When you load the page, this cache is queried and an item selected. So if you keep hitting reload you’ll probably notice that some examples reappear. It’s random, but at any moment the pool of possibilities is quite limited. Come back later in the day and everything will be different.
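The caching job itself might look something like this sketch. There’s no random sort in the Trove API, so the randomness has to be faked — here by jumping to an arbitrary offset in a big result set:

import json
import random
import requests

API_URL = 'http://api.trove.nla.gov.au/result'

def cache_zone_sample(zone, api_key, size=100):
    """Cache a rough random sample of records from one zone."""
    params = {
        'q': ' ',  # a match-everything query -- adjust as needed
        'zone': zone,
        'key': api_key,
        'encoding': 'json',
        'n': size,
        's': random.randint(0, 10000),  # jump to a random point in the results
    }
    records = requests.get(API_URL, params=params).json()['response']['zone'][0]['records']
    with open('cache-{}.json'.format(zone), 'w') as cache:
        json.dump(records, cache)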

Anyway, if anyone asks you what Trove is, you now know where to point them…

Who listens to the radio?

After a lot of hard work, the Trove team was excited to announce recently that more than 200,000 records from 54 ABC Radio National programs were available through Trove.

To make it a bit easier to explore this wonderful new content, I created a simple search interface. All it really does is help you build a query using the RN program titles, and then sends the query off to Trove. Not fancy, but useful (my family motto).

Of course, I couldn’t leave my Twitter bot family out of the action. @TroveBot has been Radio National enabled. Just tweet the tag #abcrn at him to receive a randomly-selected Radio National story. To search for something amidst the RN records, just tweet a keyword or two and add the #abcrn tag to limit the results. Consult the TroveBot manual for complete operating instructions.

In a word…

But the Radio National content is not just findable through the Trove web interface — all that lovely data is freely accessible through the Trove API. That includes just about every segment of every edition of the ABC’s flagship current affairs programs, AM, PM, and The World Today from 1999 onwards. What sort of questions could you ask of this data?

I’ll be writing something soon on the Trove blog about accessing these riches, but I couldn’t resist having a play. So I harvested all the RN data via the API and built a new thing…

What’s in a word?

It’s called In a word: Currents in Australian affairs, 2003–2013, and for once it’s quite well documented, so I won’t go into details here. I’ll just say that it’s one of my favourite creations, and I hope you find it interesting.

Addendum (21 April) — The Tung Wah Newspaper Index

See, I told you I forget things…

I recently finished resurrecting the Tung Wah Newspaper Index. Kate has described the original project on her blog, and there’s a fair bit of contextual information on the site, so I won’t go into details here. Suffice it to say it’s an important resource for Chinese Australian history that had succumbed to technological decay.

The original FileMaker database has been MySqld, Solrised, and Bootstrapped to get it all working nicely. I also took the opportunity to introduce a bit of LOD love, with plenty of machine-readable data built-in.

The whole site follows an interface-as-API pattern. So if you want a resource as JSON-LD, you just change the file extension to .json. To help you out, there are links at the bottom of each page to the various serialisations, and of course you can also use content negotiation to get what you’re after. There are some examples of all this in the GitHub repository, as well as a CSV dump of the whole database.
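In practice that means you can get at the data in a couple of ways — something like this, with a made-up url for illustration:

import requests

record_url = 'http://example.org/tungwah/articles/123'  # an illustrative url

# Ask for JSON-LD by changing the file extension...
as_json = requests.get(record_url + '.json').json()

# ...or get the same thing via content negotiation.
negotiated = requests.get(record_url, headers={'Accept': 'application/ld+json'})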


Enriching WWI data with the Trove API

I can’t resist a challenge, particularly when it involves lots of new historical data and an excuse to muck around with the Trove API. So when Katie Hannan from the State Library of South Australia asked me about putting the API to work to enrich one of their World War I datasets, I had to dive in and have a play.

The dataset consists of references to South Australian WWI service personnel published in the Adelaide Chronicle between 1914 and 1919. In a massive effort starting back in 2000, SLSA staff manually extracted more than 13,000 references and grouped them under 9709 headings, mostly names. You can explore the data in the SLSA catalogue as part of the Heroes of the Great War collection.

It’s great data, but it would be even better if there was a direct link to each article in Trove — hence Katie’s interest in the possibilities of the API!

Chronicle (Adelaide, SA : 1895 – 1954), 14 Sep 1918, p. 24, http://nla.gov.au/nla.news-page8611372

Katie sent me a spreadsheet containing the data. Each row corresponds to an individual entry and includes an identifier, a name, a year, and a list of references separated by semicolons. My plan was simple: for each row I’d construct a search based on the name, then loop through the search results to try to find an article that matched the date and page number of each reference. This might seem a bit cumbersome, but currently there’s no way of searching Trove for newspaper articles published on a particular day.

You’ll find all of the code on GitHub. I’ve tried to include plenty of comments to make it easy to follow along.

Let’s look at the entry for Lieutenant Frank Rosevear. It includes the following references: ‘Chronicle, 7 September 1918, p. 27, col. c;Chronicle, 14 September 1918, p. 24, col. a p.38, col. d’. If you look closely, you’ll see that there are two page numbers for 14 September 1918, so there are actually three references included in this string. The first thing I had to do was to pull out all the references and format them in a standard way.
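That pulling out and formatting came down to a bit of regular expression work — something like this sketch, though the real data has more variations than it covers:

import re

def parse_references(ref_string):
    """Split a references string into standardised date/page references."""
    references = []
    date = None
    for chunk in ref_string.split(';'):
        found = re.search(r'(\d{1,2}) (\w+) (\d{4})', chunk)
        if found:
            date = found.groups()  # (day, month, year)
        # One date can be followed by several page references.
        for page in re.findall(r'p\.\s*(\d+)', chunk):
            references.append({'date': date, 'page': int(page)})
    return references

parse_references('Chronicle, 7 September 1918, p. 27, col. c;Chronicle, 14 September 1918, p. 24, col. a p.38, col. d')
# -> three references: 7 Sep p. 27, 14 Sep p. 24 and 14 Sep p. 38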

Assuming that the last name was the surname, I then constructed a query that searched for an exact match of the surname together with at least one of the other names. In Lieutenant Rosevear’s case the query would’ve been ‘fulltext:”Rosevear” AND (“Lieutenant” OR “Frank”)’. Note the use of the ‘fulltext’ modifier to indicate an exact match. To this query I added a date filter to limit the search to the specified year and an ‘l-title’ value to search only the Adelaide Chronicle.
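Putting that together, the query construction looks roughly like the sketch below — the Chronicle’s title id is a placeholder:

def build_query(name, year):
    """Build Trove API parameters from a heading like 'Lieutenant Frank Rosevear'."""
    parts = name.split()
    query = 'fulltext:"{}"'.format(parts[-1])  # assume the last word is the surname
    if len(parts) > 1:
        query += ' AND ({})'.format(' OR '.join('"{}"'.format(p) for p in parts[:-1]))
    return {
        'q': query,
        'zone': 'newspaper',
        'l-decade': str(year)[:3],  # the API wants the decade as well as the year
        'l-year': year,
        'l-title': 'CHRONICLE_ID',  # placeholder for the Adelaide Chronicle's title id
    }

build_query('Lieutenant Frank Rosevear', 1918)
# q -> 'fulltext:"Rosevear" AND ("Lieutenant" OR "Frank")'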

You can see the results for this query in my Trove API console. Try modifying the query string to see what difference it makes.

Once the results came back from the API I compared them to the references, looking for matches on both the date and page number. You might notice that the second result from the API query, dated 7 September 1918, is a match for one of our references. Yay! This gets saved to a list of strong matches. But what about the other references?

Just in case there’s been a transcription error, or the page numbering differed across editions, I relax the rules a bit in a second pass and accept matches on the date, but not the page. These are saved to a list of close matches.

This second pass doesn’t help much with Lieutenant Rosevear’s missing references, so we have to broaden our search query a bit. This time we search on the surname only. Bingo! The first result is a match and points us to one of the ‘Heroes of the Great War’ series.

‘HEROES OF THE GREAT WAR: THEY GAVE THEIR LIVES FOR KING AND COUNTRY.’, Chronicle (Adelaide, SA : 1895 – 1954), 14 Sep 1918, p. 24, http://nla.gov.au/nla.news-article87553854

It took a while, but my script eventually worked its way through all 9709 entries like this, writing the results out to CSV files containing the strong and close matches. It also created a summary for each entry, listing the original number of references alongside the number of strong and close matches.

Ever since I read Trevor Munoz’s post on using Pandas with data from the NYPL’s What’s on the Menu? project, I’ve wanted to have a play with it. So I decided to use Pandas to produce some quick stats from my results file.

>>> import pandas as pd
>>> df = pd.read_csv('data/slsa_results.csv')
# How many entries?
>>> len(df)
9709
# How many references?
>>> df['references'].sum()
13504
# How many entries had strong matches?
>>> len(df[df.strong > 0])
6440
# As a percentage thank you...
>>> 100 * len(df[df.strong > 0]) / len(df)
66
# In how many entries did the number of refs = the number of strong matches
>>> len(df[df.references == df.strong])
4399
# As a percentage thank you...
>>> 100 * len(df[df.references == df.strong]) / len(df)
45
# How many entries had at least one strong or close match?
>>> len(df[df.total > 0])
8989
# As a percentage thank you...
>>> 100 * len(df[df.total > 0]) / len(df)
92

Not bad. The number of strong matches equalled the number of references in 45% of cases, and overall 66% of entries had at least one strong match. I might be able to get those numbers up by tweaking the search query a bit, but of course the main limiting factor is the quality of the OCR. If the article text isn’t good enough we’re never going to find the names we’re after.

Katie tells me that the State Library intends to point volunteer text correctors towards identified articles. As the correctors start to clean things up, we should be able to find more matches simply by re-running this script at regular intervals.

But what articles should they point the volunteers to? Many of them included the title ‘Heroes of the Great War’, so they’re easy to find, but there were others as well. By analysing the matches we’ve already found we can pull out the most frequent titles and build a list of likely candidates. Something like this:

title_phrases = [
'heroes of the great war they gave their lives for king and country',
'australian soldiers died for their country',
'casualty list south australia killed in action',
'on active service',
'honoring soldiers',
'military honors australians honored',
'casualty lists south australian losses list killed in action',
'australian soldiers died for his country',
'died for their country',
'australian soldiers died for the country',
'australian soldiers died for their country photographs of soldiers',
'quality lists south australian losses list killed in action',
'list killed in action',
'answered the call enlistments',
'gallant south australians how they won their honors',
'casualty list south australia died of wounds',
]

Now we can feed these phrases into a series of API queries and automatically generate a list of articles that are likely to contain details of WWI service people. This list should provide a useful starting point for keen text correctors.
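In code that’s just a loop over the phrases, deduplicating the results — a sketch without paging or extra filters:

import requests

API_URL = 'http://api.trove.nla.gov.au/result'

def candidate_articles(phrases, api_key):
    """Find articles matching any of the likely title phrases."""
    articles = {}
    for phrase in phrases:
        params = {
            'q': 'fulltext:"{}"'.format(phrase),
            'zone': 'newspaper',
            'key': api_key,
            'encoding': 'json',
            'n': 100,  # no paging in this sketch
        }
        records = requests.get(API_URL, params=params).json()['response']['zone'][0]['records']
        for article in records.get('article', []):
            articles[article['id']] = article  # dedupe across phrases
    return list(articles.values())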

I might not have completely solved Katie’s problem, but I think I’ve shown that the Trove API can be usefully called into action for these sorts of projects. Taking this approach should certainly save a lot of manual searching, clicking, cutting and pasting. And while I’ve focused on the South Australian data, there’s no reason why similar approaches couldn’t be applied to other WWI projects.

An addition to the family

What’s the collective noun for a group of Twitter bots? Inspiration is failing me at the moment, so let’s just say that the Trove bot family recently welcomed a new member — @TroveBot.

Proof cover for Rogue Robot, Thrills Incorporated pulp series, 1951. By Belli Luigi.

@TroveBot is a sibling of @TroveNewsBot, who’s been tweeting away since June last year. But while @TroveNewsBot draws his inspiration from 120+ million historical newspaper articles, @TroveBot digs away in the millions of books, theses, pictures, articles, maps and archives that make up the rest of Trove. Trove, as we always like to remind people, is not just newspapers.

Like @TroveNewsBot, the newcomer tweets random finds at regular intervals during the day. But both bots also respond to queries. Just tweet a few keywords at them and they’ll have a poke around Trove and reply with something that seems relevant. There are various ways of modifying this basic search, as explained on the GitHub pages of TroveNewsBot and TroveBot.

@TroveBot’s behaviour is a little more complex because of Trove’s zone structure. The zones bring together resources with a similar format. If you want to get a more detailed idea of what’s in them, you can play around with my Zone Explorer (an experiment for every occasion!). If you just tweet some keywords at @TroveBot he’ll search for them across all the zones, then choose one of the matching zones at random. If you want to limit your search to a particular zone or format, just add one of the format tags listed on the GitHub site.

Let’s say you want a book about robots — just tweet: ‘robots #book’. Or a photo of pelicans — try ‘pelican #photo’. It’s really that easy.

But you don’t just have to use keywords, you can also feed @TroveBot a url. Perhaps you want to find a thesis in Trove that is related to a Wikipedia page — just tweet the url together with the tag ‘#thesis’. Yes, really.

Behind the scenes @TroveBot makes use of AlchemyAPI to extract keywords from the url you supply. These keywords are then bundled up and shipped off to Trove for a response.

You probably know that @TroveNewsBot is similarly url-enabled. This allows him to do things like respond to the tweets of his friends @DigitalNZBot and @DPLABot, and offer regular commentary on the ABC News headlines.

So what would happen if @TroveBot and @TroveNewsBot started exchanging links? Would they ever stop? Would they break the internet?

With the Australian Tennis Open coming to a climax over the last few days I thought it was a suitable time to set up a game of bot tennis. To avoid the possibility of internet implosion, I decided to act as intermediary, forwarding the urls from one to the other. The rules were simple — the point was lost if a bot failed to respond or repeated a link from earlier in the match.

As a sporting spectacle the inaugural game wasn’t exactly scintillating. In fact, the first game is still locked at 40-40. But it’s been interesting to see the connections that were made. It’s also been a useful opportunity for me to find bugs and tweak their behaviours. You can catch up with the full, gripping coverage via Storify.

After a few more experiments I’m thinking I might try and set up a permanent, endless conversation between them. It would be fascinating to see where they’d end up over time — the links they’d find, the leaps they’d make.

Hmm, collective nouns… What about a serendipity of bots?

Have collection, will travel

A few years ago it seemed fashionable for cultural institutions to create a ‘My [insert institution name]’ space on their website where visitors could create their own online exhibits from collection materials. It bothered me at the time because it seemed to be a case of creating silos within silos. What could people do with their collections once they’d assembled them?

I was reminded of this recently as I undertook my Christmas-break mini project to think about Trove integration with Zotero. Some years ago I created a Zotero translator for the Trove newspaper zone, but I’d much rather we just exposed metadata within Trove pages that Zotero (and other tools like Hypothes.is and Mendeley) could pick up without any special code. More about that soon…

However, embedded metadata only addresses part of the problem — there are also questions around tools and workflows. Trove includes a number of simple tools that enable users to annotate and organise resources — tags, comments, and lists. Tags and comments… well, you know what they are. Lists are just collections of resources, and like tags, they can be public or private.

They may not be terribly exciting tools, but they are very heavily used. More than 2 million tags have been added by Trove users, but it’s lists that have shown the most growth in recent times. There are currently more than 47,000 lists and 30,000 of those are public. That’s a pretty impressive exercise in collection building. What are the lists about? A few months ago I harvested the titles of all public lists and threw them into Voyant for a quick word frequency check.

Word frequencies in the titles of Trove lists

Given what we know about Trove users, it wasn’t surprising to see that many of the lists related to family history, but there are a few wonderful oddities buried in there as well. I love to be surprised by the passions of Trove users.

I suspect these tools are popular because they’re simple and open-ended. There are few constraints on what you can use for a tag or add to a list. Following some threads about game design recently I came upon a discussion of ‘underspecified’ tools — ‘the use of which you can never fully predict’. By underspecifying we leave open possibilities for innovation, experimentation and play. It seems like a pretty good design approach for the cultural heritage sector.

But wait a minute, you might be wondering, what sort of Trove Manager magic did I have to weave in order to extract those thousands of list titles? None, none at all. You could do exactly the same thing.

I’ve been talking a lot in recent months about Trove as a platform rather than a website — something to build on. One of our main construction tools is, of course, the Trove API. I suppose a good cultural heritage API is also underspecified — focused enough to be useful, but fuzzy enough to encourage a bit of screwing around. What you may not know about the Trove API is that as well as giving you access to around 300 million resources, it lets you access user comments, tags and lists.

I’m looking forward to researchers using the API to explore the various modes of meaning-making that occur around resources in Trove. But right now one thing it offers is portability — the collections people make can be moved. And that brings us back to Zotero.

Why should a Trove user have to decide up front whether they want to use Zotero or create a Trove list? In pursuing Europeana’s exciting vision of putting collections in our workflows, we need to recognise that workflows change, projects grow, and new uses emerge. We should support and encourage this by making it as easy as possible for people to move their stuff around.

So of course I had to build something.

My Christmas project has resulted in some Python code that lets you export a Trove list or tag to a Zotero collection — API to API. Again it’s a simple idea, but I think it opens up some interesting possibilities for things like collaborative tagging projects — with a few lines of code hundreds of tagged items could be saved to Zotero for further organisation or annotation.

Along the way I ended up starting a more general Trove-Python library — it’s very incomplete, but it might be useful to someone. It’s all on GitHub — shared, as usual, not because I think the code is very good, but because I think it’s really important to share examples of what’s possible. Hopefully someone will find a slender spark of inspiration in my crappy code and build something better. Needless to say, this isn’t an official Trove product.

So what do you do if you want to export a list or tag?

First of all get yourself Python 2.7 and set up a virtualenv where you can play without messing anything up. Then install my library…

git clone https://github.com/Trove-Toolshed/trove-python.git
cd trove-python
python setup.py install

You’ll also need to install PyZotero. Once that’s done you can fire up Python and export a list from the command line like this…

from pyzotero import zotero
from trove_python.trove_core import trove
from trove_python.trove_zotero import export

# Connect to the Zotero and Trove APIs...
zotero_api = zotero.Zotero('[Your Zotero user id]', 'user', '[Your Zotero API key]')
trove_api = trove.Trove('[Your Trove API key]')

# ...then copy the items in a Trove list to a new Zotero collection.
export.export_list(list_id='[Your Trovelist id]', zotero_api=zotero_api, trove_api=trove_api)

Obviously you’ll also need to get yourself a Trove API key, and a Zotero key for your user account.

Exporting items with a particular tag is just as easy…

from pyzotero import zotero
from trove_python.trove_core import trove
from trove_python.trove_zotero import export

zotero_api = zotero.Zotero('[Your Zotero user id]', 'user', '[Your Zotero API key]')
trove_api = trove.Trove('[Your Trove API key]')

# Create a tag exporter and give it the tag you want to export.
exporter = export.TagExporter(trove_api, zotero_api)

exporter.export('[Your tag]')

What do you end up with? Here’s my test list on Trove and the resulting Zotero collection. Here’s a set of resources tagged with ‘inigo’, and here’s the collection I created from them. You’ll notice that I added a few little extras, like attaching pdf copies where they’re available.

Sorry, no GUIs, no support, and not much documentation. Just a bit of rough code and some ideas to play with.

8 months on

This has been a rather lean year on the blogging front. So as 2013 nears its end, I thought I should at least try to list a few recent talks and experiments.

Things changed a bit this year. No more am I the freelance troublemaker, coding in lonely seclusion, contemplating the mysteries of cashflow. Reader, I got a job.

And not just any old job. In May I started work at the National Library of Australia as the Manager of Trove.

Trove, of course, has featured prominently here. I’ve screen-scraped, harvested, graphed and analysed it — I even built an ‘unofficial’ API. Last year the NLA rewarded my tinkering with a Harold White Fellowship. This year they gave me the keys and let me sit behind the wheel. Now Trove is not only my obsession, it’s my responsibility.

Trove is a team effort, and soon you’ll be meeting more of the people that keep it running through our new blog. I manage the awesome Trove Support Team. We’re the frontline troops — working with users, content partners and developers, and generally keeping an eye on things.

And so my working hours are consumed by matters managerial — attending meetings, writing reports, planning plans and answering emails. But, when exhaustion allows, I return to the old WraggeLabs shed on weekends and evenings and the tinkering continues…

TroveNewsBot

TroveNewsBot is a Twitter bot whose birth is chronicled in TroveNewsBot: The story so far. Several times a day he posts a recently-updated newspaper article from Trove. But he also responds to your queries — just tweet some keywords at him and he’ll reply with the closest match. You can read the docs for hints on modifying your query.

TroveNewsBot also offers comment on web pages. Tweet him a url and he’ll analyse its content and search for something relevant amidst his database of more than 100 million newspaper articles. Every few hours he automatically checks the ABC News Just In page for the latest headlines and offers a historical counterpoint.

In Conversations between collections you can read the disturbing story of how TroveNewsBot began to converse with his fellow collection bots, DPLABot and now DigitalNZBot. The rise of the bots has begun…

I should say something more serious here about the importance of mobilising our collections — of taking them into the spaces where people already are. But I think that might have to wait for another day.

Build-a-bot workshop

You can never have too many bots. Trove includes the collections of many individual libraries, archives and museums — conveniently aggregated for your searching pleasure. So why shouldn’t each of these collections have its own bot?

It didn’t take much work to clean up TroveNewsBot’s code and package it up as the Build-a-bot workshop. There any Trove contributor can find instructions for creating their own code-creature, tweeting their resources to the world.

So far Kasparbot (National Museum of Australia) and CurtinLibBot (Curtin University Library) have joined the march of the bots. Hopefully more will follow!

TroveNewsBot Selects

Inspired by the British Library’s Mechanical Curator, TroveNewsBot decided to widen his field of operations to include Tumblr. There at TroveNewsBot Selects he posts a new random newspaper illustration every few hours.


Serendip-o-matic

Unfortunately being newly-employed meant that I had to give up my place at One Week | One Tool. The team created Serendip-o-matic, a web tool for serendipitous searching that used the DPLA, Europeana and Flickr APIs. But while I missed all the fun, I could at least jump in with a little code. Within a day of its launch, Serendip-o-matic was also searching Trove.

Research Trends

This was a quick hack for my presentation at eResearch2013 — I basically just took the QueryPic code and rewired it to search across Australian theses in Trove. What I ended up with was a simple way of exploring research trends in Australia from the 1950s.

‘history AND identity’ vs ‘history AND class’

Some of the thesis metadata is a bit dodgy (we’re looking into it!) so I wouldn’t want to draw any serious conclusions, but I think it does suggest some interesting possibilities.

Trove API Console

As a Trove API user I’ve always been a bit frustrated about the inability to share live examples because of the need for a unique, private key. Europeana has a great API Console that lets you explore the output of API requests, so I thought I’d create something similar.

My Trove API Console is very simple at the moment. You just feed it API requests (no key required) and it will display nicely-formatted responses. You can also pass the API request as a query parameter to the console, which means you can create easily shareable examples. Here’s a request for wragge AND weather in the newspapers zone.

This is also my first app hosted on Heroku. Building and deploying with Flask and Heroku was intoxicatingly easy.

Trove Zone Explorer

Yep, I finally got around to playing with d3. Nothing fancy, but once I’d figured out how to transform the faceted format data from Trove into the structure used by many of the d3 examples I could easily create a basic treemap and sunburst.


The sunburst visualisation was pretty nice and I thought it might make a useful tool for exploring the contents of Trove’s various zones. After a bit more fiddling I created a zoomable version that automatically loads a random sample of resources whenever you click on one of the outer leaves — the Trove Zone Explorer was born.

Trove Collection Profiler

As mentioned above, Trove is made up of collections from many different contributors. For my talk at the Libraries Australia Forum I thought I’d make a tool that let you explore these collections as they appear within Trove.

The Trove Collection Profiler does that, and a bit more. Using filters you define a collection by specifying contributors, keywords, or a date range. You can then explore how that collection is distributed across the Trove zones — viewing the results over time as a graph, or drilling down through format types using another zoomable sunburst visualisation. As a bonus you get shareable urls to pass around your profiles.

The latest sunburst-enabled version is fresh out of the shed and badly in need of documentation. I’m thinking of creating embeddable versions, so that institutions can create visualisations of their own collections and include them in their sites.

Presentations

Somewhere in amongst the managering and the tinkering I gave a few presentations:

Conversations with collections

Notes from a talk I gave at the Digital Treasures Symposium, 21 June 2013, University of Canberra.

Over the last couple of weekends I’ve been building a bot. Let me introduce you to the TroveNewsBot.


TroveNewsBot is just a simple script that periodically checks for messages from Twitter, uses those messages to create queries in Trove’s newspaper database, and tweets back the results.
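Stripped right down, the loop looks something like the sketch below — placeholder credentials and all, this is an illustration rather than the bot’s actual code (which is on GitHub):

import requests
import tweepy

# Placeholder credentials -- not the real bot's.
auth = tweepy.OAuthHandler('CONSUMER_KEY', 'CONSUMER_SECRET')
auth.set_access_token('ACCESS_TOKEN', 'ACCESS_SECRET')
twitter = tweepy.API(auth)

def answer_mentions(api_key, since_id=None):
    """Reply to new mentions with the best-matching Trove newspaper article."""
    for mention in twitter.mentions_timeline(since_id=since_id):
        keywords = mention.text.replace('@TroveNewsBot', '').strip()
        params = {'q': keywords, 'zone': 'newspaper', 'key': api_key,
                  'encoding': 'json', 'sortby': 'relevance', 'n': 1}
        records = requests.get('http://api.trove.nla.gov.au/result',
                               params=params).json()['response']['zone'][0]['records']
        if records.get('article'):
            article = records['article'][0]
            reply = '@{} {}: {}'.format(mention.user.screen_name,
                                        article['heading'][:70], article['troveUrl'])
            twitter.update_status(reply, in_reply_to_status_id=mention.id)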

TroveNewsBot’s birth was, however, not without some pain. I ran into difficulty with Twitter’s automated spam police. At one stage everytime my bot tweeted, its Twitter account was suspended.

Twitter’s bots didn’t like my bot. :(

The problem has since been resolved — I think I must have done something when I was testing that upset the spam bots — but it did lead me to read in detail Twitter’s policies on spam and automation. This sentence in particular caused me to reflect:

The @reply and Mention functions are intended to make communication between users easier, and automating these processes in order to reach many users is considered an abuse of the feature.

So what is a user and what is communication? I read this sentence as suggesting that communications between individual human users were somehow more real, more authentic than automatically generated replies. But is a script tweeting someone a link to a newspaper article that they might be interested in really less authentic than a lot of the human-generated traffic on the net?

Amongst the messages I received when I revealed TroveNewsBot to the world earlier this week was this:

And later from the same person:


Even as we live an increasing amount of our lives ‘connected’, still there remains a tendency to assume that experiences mediated through online technologies are somehow less authentic than those that take place in this space that we often refer to as ‘the real world’.

In the realm of cultural heritage, digitisation is frequently assumed to be a process of loss. We create surrogates, or derivatives — useful, but somehow inferior representations of ‘the real thing’.

Now let’s just all admit that, yes, we like the smell of old books, and that we can’t read in the bath with our iPad, and move beyond the sort of fetishism that often accompanies these sorts of discussions.

Yes, of course, digital and physical manifestations are different, the point is whether we get anywhere by arguing that one is necessarily inferior to the other.

A recent article in the Times Literary Supplement expressed concern at the money being spent on the manuscript digitisation programs that the author argued were ‘proceeding unchecked and unfocused, deflecting students into a virtual world and leaving them unequipped to deal responsibly with real rare materials’. Yes, there may be aspects of a physical page that a digital copy cannot represent, but as Alistair Dunning pointed out in response to the article, there’s no simple binary opposition:

The digital does not replace the analogue, but augments it, sometimes in dramatic and sometimes in subtle ways.

In his keynote address to the ‘Digital Transformers’ conference, Jim Mussell similarly argued against a simplistic understanding of digital deficiencies.

The key is to reconceive loss as difference and use the way the transformed object differs to reimagine what it actually was. Critical encounters with digitized objects make us rethink what we thought we knew.

I’m very pleased to be an adjunct here at the University of Canberra, but I’ve always felt a bit of a fraud around people like Mitchell when it comes to talking about visualisation. I’m actually much more comfortable with words than pictures. So why am I here talking to you today?

I think it’s because what we’re discussing today, what the Digital Treasures Program is about, is not just visualisation. It’s about transformation. It’s about taking cultural heritage collections and changing them. Changing what we can do with them. Changing how we see them. Changing how we think about them.

It’s about creating spaces within which we can have ‘critical encounters with digitized objects’ that ‘make us rethink what we thought we knew’.

And that to me is very exciting.

What might these transformations look like? Who knows? This is research, it should take us places we don’t expect and can’t predict.

However, for the sake of convenience today I’ve tried to define a few possible categories — most, admittedly, based on my own work. But I do so in the hope that the achievements of the Digital Treasures program will soon make my categories look ridiculously inadequate.

Analysis

When we have stuff in digital form — and by stuff I mean both collection metadata and digital objects — we can isolate particular characteristics and add them up, compare them, graph them. We can start to see patterns that we couldn’t see before.

Assembly

Putting a lot of similar things together in a way that enables us to see them differently.

Juxtaposition

Putting different things together in a way that enables us to find connections or similarities.

Serendipity

Displaying something unexpected or random.

Mobilisation

Putting things in new contexts, new conceptual spaces, new physical spaces, new geospatial spaces. Creating interventions and explorations.

This is a very limited catalogue of possibilities, but meagre as it is I think it’s enough to demonstrate that the overwhelming feature of digital cultural collections is not loss or deficiency, but opportunity and inspiration.

In fact I’m less worried about the deficiencies of digital representation than I am about the possibility that we might end up doing too much — that we might become so skilled in design and transformation that we end up overdetermining the experience of our users, that we end up doing too much of the thinking for them.

It seems to me that when it comes to digital cultural collections an important part of the transformation process is knowing where to leave the gaps and spaces that invite feeling, reflection and critique. We have to find ways of representing what is missing, of acknowledging absence and exclusion. We have to be able to expose our arguments and assumptions, to be honest about our failures and limitations. We have to be prepared to leave a few raw edges, some loose threads that encourage users to unravel our carefully-woven tapestries.

As I was developing the TroveNewsBot I realised I needed some sort of avatar. So of course I started searching in the Trove newspaper database for robots — there I found Mr George Robot.

The Courier-Mail, 7 November 1935, page 21

George Robot, ‘described as the greatest electro-mechanical achievement of the age’ toured Australia in 1935 and 1936. As one newspaper described, he:

rises and sits as requested, talks, sings, delivers an address on the most abstruse topics, gnashes his electric teeth in rage or derision, while he accentuates his remarks by the most natural movements of arms and hands

But it wasn’t just George’s technical sophistication that inspired comment. Articles also appeared that described George’s love for Mae West and his admiration for Hitler.

Robots provide us with an opportunity not just to marvel at their technological wizardry, but also to think about what it really is to be human.

In the same way, as we start to have new types of conversations with online collections, to explore their many-faceted personalities, we will of course be exploring ourselves.

The digital transformation of cultural collections is not about showcasing technology but about creating new online spaces in which we can simply be human.

‘A map and some pins’: open data and unlimited horizons


This is the text of my keynote address to the Digisam conference on Open Heritage Data in the Nordic Region held in Malmö on 25 April 2013. You can also view the video and slides of my talk, or get the full interactive experience by playing around with my text/presentation in all its LOD-powered glory. (Use a decent browser.)


The Australian poet and writer Edwin James Brady and his family lived for many years in the isolation of far eastern Victoria — in a little town called Mallacoota.

Edwin James Brady (NLA: nla.pic-vn3704359)

Here, from time to time, Brady amused himself by taking a map of Australia down from the wall and sticking pins in it. The pins, Brady explained in 1938, included labels such as ‘Hydro-electric supply base’, ‘Irrigation area’, and ‘Area for tropical settlement’. The map and its pins were one expression of Brady’s life-long obsession with Australia’s potential for development — for progress.

Maps and pins are probably more familiar now than they were in Brady’s time. We use them routinely for sharing our location, for plotting our travel, for finding the nearest restaurant. Maps and pins are one way that we document, understand and express our relationship to space.

Brady, however, was interested in using his pins to highlight possibilities. In the late nineteenth and early twentieth centuries size mattered. With the nations of Europe jostling for land and colonial possessions, space became an index of power. When the Australian colonies came together in 1901 to form a nation, maps and spatial calculations abounded. Australia was big, and so its future was filled with promise.

Australia was big with promise.

In his travels around Australia, EJ Brady started to catalogue ways in which its vast, ‘empty’ spaces might be turned to productive use. A hardy yeomanry armed with the latest science could transform these so-called ‘wastes’, and Brady was determined to bring these opportunities to the attention of the world.

This evangelical crusade reached its high point in 1918, with the publication of his huge compendium, Australia Unlimited — 1139 pages of ‘Romance History Facts & Figures’.

National Archives of Australia: A659, 1943/1/3907, page 135

Space may no longer be invested with the same sense of swelling power, but our maps and pins still figure in calculations of progress. Now it is the data itself that awaits exploitation. Our carefully plotted movements, preferences and passions may well have value to governments, planners or advertisers. Data, according to some financial pundits, ‘is the new oil’.

Whereas Brady travelled the land documenting its untapped riches, we can take his work and life and mine it for data — looking for new patterns and possibilities for analysis.

Brady wasn’t the first to use the phrase ‘Australia Unlimited’, though he did much to make it familiar. By exploring the huge collection of digitised newspapers available through the National Library of Australia’s discovery service, Trove, we can track the phrase through time.


Brady was a skilled self-publicist and his research trips in 1912 were eagerly reported by the local press such as the Barrier Miner in Broken Hill and the Cairns Post.

In 1918 the book was published, receiving a generally positive reception — as an advertising leaflet showed, even King George V thought the book was ‘of special interest’. In 1926, a copy of the book was presented to a visiting delegation from Japan.

NAA: A659, 1943/1/3907, page 55

Over the years Brady sought to build on his modest successes, planning a variety of new editions and even a movie. But while his hopes were thwarted, the phrase itself lived on.

In 1938 and 1952, there are references to a radio program called ‘Australia Unlimited’, apparently featuring young musical artists. Also in 1952 came the news that the author of Australia Unlimited, EJ Brady, had died in Mallacoota at the age of 83.

Sydney Morning Herald, 23 July 1952

Unfortunately copyright restrictions bring this analysis to a rather unnatural end in 1954. If we were able to follow it through until the present, we could see that from 1957 the phrase was used by a leading daily newspaper as the title of an annual advertising supplement detailing Australia’s possibilities for development. In 1958, it was adopted as a campaign slogan by Australia’s main Conservative party. In 1999 it was resurrected again by a major media company for a political and business symposium. Even now it provides the focus for a ‘branding’ initiative supported by the federal government.

Graphs like this are pretty easy to interpret, but of course we should always ask how they were made. In this case I simply harvested the number of matching search results from the Trove newspaper database for each year. The tool I built to do this has been through several different versions and is now a freely-accessible web application called QueryPic. Anyone can quickly and easily create this sort of analysis just by entering a few keywords.
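The harvesting itself is just a loop over the years, keeping only the total for each — something like this sketch:

import requests

API_URL = 'http://api.trove.nla.gov.au/result'

def results_by_year(phrase, api_key, start=1840, end=1954):
    """Count matching newspaper articles year by year."""
    counts = {}
    for year in range(start, end + 1):
        params = {
            'q': '"{}"'.format(phrase),
            'zone': 'newspaper',
            'key': api_key,
            'encoding': 'json',
            'n': 0,  # we only want the total, not the articles themselves
            'l-decade': str(year)[:3],
            'l-year': year,
        }
        data = requests.get(API_URL, params=params).json()
        counts[year] = int(data['response']['zone'][0]['records']['total'])
    return counts

# counts = results_by_year('australia unlimited', 'YOUR_API_KEY')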

QueryPic is one product of my experiments with the newspaper database over the last couple of years. I’ve also looked at changes to content of front pages, I’ve created a combination discovery interface and fridge poetry generator, and I’ve even built a simple game called Headline Roulette — which I’m told is strangely addictive.

All of these tools and experiments take advantage of Trove’s API. I think it’s important to note that the delivery of cultural heritage resources in a machine-readable form, whether through a custom API or as Linked Open Data, provides more than just improved access or possibilities for aggregation. It opens those resources to transformation. It empowers us to move beyond ‘discovery’ as a mode of interaction to analyse, extract, visualise and play.

Using freely available tools we can extract named entities from a text, we can look for topic clusters across a collection of documents, we can find places and pin them to a map. With a little bit of code I can take the newspaper reports of Brady’s travels in 1912 and map them. With a bit more time I could take another of Brady’s travel books, River Rovers, available in digitised form through the Internet Archive, and plot his journey along Australia’s longest river, the Murray.

Such transformations help us see resources in different ways — we can find new patterns, new problems, new questions. But transformation is a lossy business. I can put a pin in a map to show that Brady stopped off in Mildura on his voyage along the Murray. What is much harder to represent are the emotions that surrounded that visit. While he was there Brady received news of a friend’s death. ‘Bad news makes hateful the most pleasant place of abiding’, he wrote mournfully, ‘I strained to open the gate to go forth again into a wilderness of salt bush and sere sand’. Travel can be a form of escape.

Digital humanist and designer Johanna Drucker has written about the problems of representing the human experience of time and space using existing digital tools. ‘If I am anxious’, she notes, ‘spatial and temporal dimensions are distinctly different than when I am not’. We do not experience our journeys in a straightforward linear fashion — as the accumulation of metres and minutes. We invest each footstep with associations and meanings, with hopes for the future and memories of the past. Drucker calls on humanities scholars to articulate these complexities and work towards the development of new techniques for modelling and visualising our data.

‘If human beings matter, in their individual and collective existence, not as data points in the management of statistical information, but as persons living actual lives, then finding ways to represent them within the digital environment is important.’2

In a similar way, Australia Unlimited is not just a catalogue of potentialities, or a passionate plea for national progress. It’s also the story of a struggling poet trying to find some way of supporting his family. Proceeds from the book enabled Brady to buy a plot of land in Mallacoota and build a modest home — I suspect that it wasn’t the same home that was painted by his wife in the 1950s.

National Library of Australia: nla.pic-an2287718-v

But even this small success was undermined. Distribution of the book was beset with difficulties and disappointments and, despite all his plans, financial security remained elusive.

Brady’s youngest daughter, Edna June, was born to his third wife Florence in 1946. What could he leave her? A ‘modern edition’ of Australia Unlimited lay completed but unpublished. ‘If it fails to find a publisher’, he remarked wistfully, ‘the MSS will be a liberal education for her after she has outgrown her father’s nonsense rhymes’. It was, he pondered, ‘a sort of heritage’.

One of the things I love about being a historian is that the more we focus in on the past the more complicated it gets. People don’t always do what we expect them to, and that’s both infuriating and wonderful.

Likewise, while we often have to clean up or ‘normalise’ cultural heritage data in order to do things with it, we should value its intrinsic messiness as a reminder that it is shot through with history. Invested with the complexities of human experience it resists our attempts at reduction, and that too is both infuriating and wonderful.

The glories of messiness challenge the extractive metaphors that often characterise our use of digital data. We’re not merely digging or mining or drilling for oil, because each journey into the data offers new possibilities — our horizons are opened, because our categories refuse to be closed. These are journeys of enrichment, interpretation and creation, not extraction.

We’re putting stuff back, not taking it out.

Cultural institutions have an exciting opportunity to help us work with this messiness. The challenge is not just to pump out data; anyone can do that. The challenge is to enrich the contexts within which we meet this data — to help us embrace nuance and uncertainty; to prevent us from taking the categories we use for granted.

For all its exuberant optimism, a current of fear ran through Australia Unlimited. The publisher’s prospectus boldly announced that it was a ‘Book with a Mission’. ‘A mere handful of White People’, perched uncomfortably near Asia’s ‘teeming centres of population’, could not expect to maintain unchallenged ownership of the continent and its potential riches. Australia’s survival as a white nation depended upon ‘Effective Occupation’, secured by a dramatic increase in population and the development of its vast, empty lands — ‘The Hour of Action is Now!’.

National Archives of Australia: A659, 1943/1/3907, page 208

In 1901, one of the first acts of the newly-established nation of Australia was to introduce legislation designed to keep the country ‘white’. Restrictions on immigration, administered through a complex bureaucratic system, formed the basis of what became known as the White Australia Policy.

While the legislation was designed to keep non-white people out, an increase of the white population was seen as essential to strengthen security and legitimise Australia’s claim to the continent. Australia Unlimited was an exercise in national advertising aimed at filling the unsettling emptiness with sturdy, white settlers.

But White Australia was always a myth. As well as the indigenous population there were, in 1901, many thousands of people classified as non-white living in Australia. They came from China, India, Indonesia, Turkey and elsewhere. A growing number had been born in Australia. They were building lives, making families and contributing to the community.

Here are some of them…

The real face of White Australia

I built this wall of faces using records held by the National Archives of Australia. If a non-white person resident in Australia wanted to travel overseas they needed to carry special documents. Without them they could be prevented from re-entering the country — from coming home. Many, many thousands of these documents are preserved within the National Archives.

Kate Bagnall, a historian of Chinese Australia, and I are exploring ways of exposing these records through an online project called Invisible Australians.

To build the wall I downloaded about 12,000 images from the Archives’ website — representing just a fraction of one series relating to the administration of the White Australia Policy. Unfortunately there’s no machine-readable access to this data, so I had to resort to cruder means — reverse-engineering interfaces and screen-scraping.
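Stripped of the reverse-engineering, the harvesting step boils down to something like the sketch below. The URL pattern is a made-up stand-in for whatever the scraper actually worked out; RecordSearch’s real endpoints aren’t documented:

```python
import requests

# A made-up stand-in for the page-image URL the scraper reverse-engineered;
# RecordSearch's real endpoints aren't documented anywhere.
PAGE_URL = 'https://example.org/recordsearch/image?barcode={barcode}&page={page}'

def harvest_item(barcode, num_pages):
    """Download every digitised page of one item as a JPEG."""
    with requests.Session() as session:
        for page in range(1, num_pages + 1):
            response = session.get(PAGE_URL.format(barcode=barcode, page=page))
            response.raise_for_status()
            with open('{}-p{}.jpg'.format(barcode, page), 'wb') as image_file:
                image_file.write(response.content)
```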

Once I had the images I ran them through a facial detection script to find and crop out the portraits. What we ended up with was a different way of accessing those records — an interface that brings the people to the front; an interface which is compelling, discomfiting, and often moving.
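Facial detection is now a standard computer-vision trick. Something like the following OpenCV sketch captures the general approach (the classifier is one OpenCV ships with; the parameters and padding here are illustrative):

```python
import cv2

# OpenCV ships with pre-trained Haar cascade classifiers for faces
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')

def crop_faces(image_path):
    """Find faces in an image and save each one as a separate portrait."""
    image = cv2.imread(image_path)
    grey = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(grey, scaleFactor=1.1, minNeighbors=5)
    for index, (x, y, w, h) in enumerate(faces):
        pad = w // 5  # a little breathing space around each face
        face = image[max(y - pad, 0):y + h + pad, max(x - pad, 0):x + w + pad]
        cv2.imwrite('{}-face-{}.jpg'.format(image_path, index), face)
```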

The wall of faces also raises interesting questions about context. Some people might be concerned by the loss of context when images are presented in this way, although each portrait is linked back to the document it was derived from, and to the Archive’s own collection database. What is more important, I think, are the contexts that are gained.

If you’re viewing digitised files on the National Archives’ own website, you can only do so one page at a time. Each document is separate and isolated. What changes when you can see the whole of the file at once? I’ve built another tool that lets you do just that with any digitised file in the Archives’ collection. You see the whole as well as the parts. You have a sense of the physical and conceptual shape of the file.

National Archives of Australia: ST84/1, 1908/471-480

In the case of the wall of faces, bringing the images together, from across files, helps us understand the scale of the White Australia Policy and how it impacted on the lives of individuals and communities. These were not isolated cases, these were thousands of ordinary people caught up in the workings of a vast bureaucratic system. The shift of context wrought by these digital manipulations allows us to see, and to feel, something quite different.

And we can go the other way. In another experiment I created a userscript to insert faces back into the Archives’ website. A userscript is just a little bit of code that rewrites web pages as they load in your browser. In this case the script grabs images relating to the files you’re looking at from Invisible Australians.

So instead of this typical view of search results.

Before

You see something quite different.

After

Instead of just the record metadata for an individual item, you see that there are people inside.

We also have to remember that the original context of these records was the administration of a system of racial surveillance and exclusion. The Archives preserves not only the records, but the recordkeeping systems that were used to monitor people’s movements. The remnants of that power adhere to the descriptive framework. There is power in the definition of categories and the elaboration of significance.

Thinking about this I came across Wendy Duff and Verne Harris’s call to develop ‘liberatory standards’ for archival description. Standards, like categories, are useful. They enable us to share information and integrate systems. But standards also embody power. How can we take advantage of the cooperative utility of standards while remaining attuned to the workings of power?

A liberatory descriptive standard, Duff and Harris argue: ‘would posit the record as always in the process of being made, the record opening out of the future. Such a standard would not seek to affirm the keeping of something already made. It would seek to affirm a process of open-ended making and re-making’.3

‘Holes would be created to allow the power to pour out.’

‘Making and re-making’ — sounds a lot like the open data credo of ‘re-use and re-mix’ doesn’t it? I think it’s important to carry these sorts of discussions about power over into the broader realm of open data. After all, open data must always, to some extent, be closed. Categories have been determined, data has been normalised, decisions made about what is significant and why. There is power embedded in every CSV file, arguments in every API.

This is inevitable. There is no neutral position. All we can do is encourage re-use of the data, recognising that every such use represents an opening out into new contexts and meanings. Beyond questions of access or format, data starts to become open through its use. In Duff and Harris’s words, we should see open data ‘as always in the process of being made’.

What this means for cultural institutions is that the sharing of open data is not just about letting people create new apps or interfaces. It’s about letting people create new meanings. We should be encouraging them to use our APIs and LOD to poke holes in our assumptions to let the power pour out.

There’s no magic formula for this beyond, perhaps, building confidence and creating opportunities. But I do think that Linked Open Data offers interesting possibilities as a framework for collaboration and contestation — for making and challenging meanings.

We tend to think about Linked Open Data as a way of publishing — of pushing our data out. But in fact the production and consumption of Linked Open Data are closely entwined. The links in our data come from re-using identifiers and vocabularies that others have developed. The linked data cloud grows through a process of give and take, by many small acts of creation and consumption.

There’s no reason why that process should be confined to cultural institutions, government departments, business, or research organisations. Linked Open Data enables any individual to talk about what’s important to them, while embedding their thoughts, collections, passions or obsessions within a global conversation. By sharing identifiers and vocabularies we create a platform for communication. Anyone can join in.
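To make that concrete, here’s a toy example using Python’s rdflib. I mint an identifier for something of my own, but describe it with a vocabulary others maintain (FOAF) and link it to a shared identifier; all the URIs here are placeholders:

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import FOAF, OWL, RDF

# My own namespace for the things I want to talk about...
MY = Namespace('http://example.org/my-collection/')

g = Graph()
brady = MY['people/ej-brady']

# ...described using a vocabulary (FOAF) that everyone else understands
g.add((brady, RDF.type, FOAF.Person))
g.add((brady, FOAF.name, Literal('E.J. Brady')))

# Linking my local identifier to a shared one (a placeholder here) is
# what stitches these statements into the wider cloud
g.add((brady, OWL.sameAs, URIRef('http://example.org/shared-authority/brady')))

print(g.serialize(format='turtle'))
```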

So, if we want people to engage with our data, perhaps we need to encourage them to create their own.

I’ve just been working on a project with the Mosman public library in Sydney aimed at gathering information about the experiences of local servicepeople during World War One. There are many such projects happening around the world at the moment, but I think ours is interesting in a couple of ways. The first is a strong emphasis on linking things up.

There are records relating to Australian service people available through the National Archives, the Australian War Memorial, and the Commonwealth War Graves Commission, but there are currently no links between these databases. I’ve created a series of screen scrapers that allow structured data to be easily extracted from these sources. That means that people can, for the first time, search across all these databases in one hit. It’s a very simple tool that I started coding to ease the boredom of a long bus trip — but it has proved remarkably popular with family historians.
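Under the hood the tool is about as simple as it sounds: the same query goes to a scraper for each source and the results come back together. In outline (with the scrapers stubbed out; the real ones pull structured data from each site’s HTML):

```python
def search_naa(family_name):
    return []  # stub -- the real scraper parses RecordSearch's HTML

def search_awm(family_name):
    return []  # stub -- Australian War Memorial rolls

def search_cwgc(family_name):
    return []  # stub -- Commonwealth War Graves Commission records

def search_everywhere(family_name):
    """Run one query against every source and merge the results."""
    sources = {
        'National Archives of Australia': search_naa,
        'Australian War Memorial': search_awm,
        'Commonwealth War Graves Commission': search_cwgc,
    }
    return {name: search(family_name) for name, search in sources.items()}

print(search_everywhere('Brady'))
```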

Once you’ve found entries in these databases, you can just cut and paste the URL into a form on the Mosman website and a script will retrieve the relevant data and attach it to the record of the person you’re interested in. Linking to a service record in the National Archives, for example, will automatically create entries for the person’s birthplace and next-of-kin.
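The linking step works the same way in miniature: match the pasted URL against the databases we know about and hand it to the right scraper. A schematic sketch, with the scrapers again stubbed out:

```python
from urllib.parse import urlparse

def scrape_naa_record(url):
    return {'source': 'NAA', 'url': url}  # stub -- fetch and parse the page

def scrape_awm_record(url):
    return {'source': 'AWM', 'url': url}  # stub

# Map each database's hostname to the scraper that understands it
SCRAPERS = {
    'recordsearch.naa.gov.au': scrape_naa_record,
    'www.awm.gov.au': scrape_awm_record,
}

def attach_record(pasted_url):
    """Work out which database a pasted URL points to and scrape it."""
    host = urlparse(pasted_url).netloc
    if host not in SCRAPERS:
        raise ValueError('No scraper knows about {}'.format(host))
    return SCRAPERS[host](pasted_url)
```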

The process of linking builds structures, and these structures will themselves all be available as Linked Open Data. Even more exciting is that the links will not only be between the holdings of cultural institutions. The stories, memories, photographs and documents that people contribute will also be connected, providing personal annotations on the official record.

None of this is particularly hard; it’s just about getting the basics right. Remembering that structure matters and that links can have meaning. It’s also about recognising that ‘crowd sourcing’ or user-generated content can be made anywhere. Using Linked Open Data people can attach meanings to your stuff without visiting your website. Through the process of give and take, creation and consumption, we can build layers of description, elaboration, and significance across the web.

What excites me most about open cultural data is not the possibility of shiny new apps or collection visualisations, but the possibility of doing old things better. The possibility of reimagining the humble footnote, for example, as a re-usable container of structured contextual data — as a form of distributed collection description. The possibility of developing new forms of publication that immerse text and narrative within a rich contextual soup brewed from the holdings of cultural institutions.

I want every online book or article to be a portal. I want every blog or social media site to be a collection interface.

What might this look like? Perhaps something like this presentation.

My slides today are embedded within an HTML document that incorporates all sorts of extra goodies. The full text of my talk is here, and as you scroll through it you’ll see more information about the people, places and resources I mention pop up in the sidebar. Alternatively you can explore just the people or the resources, looking at the connections between them and the contexts in which they’re mentioned within my text.

This is part of an ongoing interest in exploring different ways of presenting historical narratives that build a relationship between the text and the structured data that underlies the story.

All of the structured data is available in machine-readable form as RDF — it is, in itself, a source of LOD. In fact the interface is built using an assortment of JavaScript libraries that read the RDF into a little temporary triplestore, and then query it to create the different views. So the whole thing is really powered by LOD.
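The presentation does this client-side in JavaScript, but the idea translates directly; here’s the equivalent sketched in Python with rdflib, assuming the talk’s RDF lives in a local file. Load it into an in-memory graph, then run SPARQL queries to build each view:

```python
from rdflib import Graph

g = Graph()
g.parse('talk.rdf')  # assumed: the RDF published alongside the presentation

# One possible 'view': every person mentioned in the talk
people = g.query("""
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    SELECT ?name
    WHERE { ?person a foaf:Person ; foaf:name ?name . }
""")
for row in people:
    print(row.name)
```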

It’s still very much an experiment, but I think it raises some interesting possibilities for thinking about how we might consume and create LOD simply by doing what we’ve always done — telling stories.

EJ Brady’s dreams were never realised. Australia’s vast spaces remained largely empty, and the poet continued to wrestle with personal and financial disappointment. ‘After nearly eight decades association with the man’, Brady wrote of himself in 1949, ‘I have come to look upon him as the most successful failure in literary history’. This energetic booster of Australia’s potentialities was well aware of his own life’s mocking irony. ‘He has not… made the wages of a wharf laborer out of book writing yet he persists in asserting Australia is the best country in the world!’.

But still Brady continued to add pins to his map. ‘For half a century I’ve been heaping up notes, reports, clippings, pamphlets, etc. on… all phases of the country’s life and development’. ‘What in hell I accumulate such stuff for I don’t know’, he complained in 1947. As the elderly man surveyed the ‘bomb blasted pile of rubbish’ strewn about his writing tent, he admitted that ‘this collecting is a sort of mania’.

Brady’s map and pins told a complex story of hope and disappointment, of confidence and fear. A story that combined national progress with an individual’s attempts merely to live.

There are stories in our data too — complex and contradictory stories full of emotion and drama, disappointment and achievement, violence and love. Let’s find ways to tell those stories.

  1. Johanna Drucker, ‘Humanistic Theory and Digital Scholarship’, in Matthew K. Gold (ed.), Debates in the Digital Humanities, University of Minnesota Press, 2012.
  2. Johanna Drucker, ‘Representation and the digital environment: Essential challenges for humanists’, University of Minnesota Press Blog, http://www.uminnpressblog.com/2012/05/representation-and-digital-environment.html
  3. Wendy M. Duff and Verne Harris, ‘Stories and Names: Archival Description as Narrating Records and Constructing Meanings’, Archival Science, vol. 2, 2002, pp. 263–285.