I’m currently trying to make some progress with my ‘seams and edges’ paper for ALIAOnline 2015 and naturally ended up writing some code (what, me procrastinate?). I was wondering about ways of exploring the ‘representativeness’ of an aggregation like Trove (what’s there and what’s not), so I started noodling around with the Trove API.
The first result was a graph representing the numbers of Trove contributors and resources by state, compared to the population of that state. All values are displayed as percentages of the total.
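The 'percentage of the total' comparison can be sketched roughly as follows. This is a minimal illustration, not the code from the repository: the function name `as_percentages` is hypothetical, and the numbers are made-up placeholders rather than real population or Trove figures.

```python
def as_percentages(counts):
    """Convert raw counts per category into percentages of the overall total."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("counts sum to zero; percentages are undefined")
    return {key: 100.0 * value / total for key, value in counts.items()}


# Placeholder values only (NOT real population or Trove data).
population = {"State A": 600, "State B": 400}
contributors = {"State A": 30, "State B": 10}

pop_share = as_percentages(population)
contrib_share = as_percentages(contributors)

# A positive difference suggests over-representation relative to population.
for state in population:
    diff = contrib_share[state] - pop_share[state]
    print(
        f"{state}: population {pop_share[state]:.1f}%, "
        f"contributors {contrib_share[state]:.1f}%, difference {diff:+.1f}"
    )
```

Putting everything on the same percentage scale is what makes the two series comparable, even though one counts people and the other counts contributors or resources.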
The ACT is over-represented, of course, because of the holdings of the National Library itself. The under-representation of Queensland looks interesting — something to explore in the future.
My next graph used data on languages spoken at home in Australia from the 2011 census. It compared the population speaking each language with the number of books in that language included in Trove, again as percentages of the total. It doesn’t embed very well, so view the full-size version on Plotly.
As I was playing around I noticed a tweet from Bridget Griffen-Foley:
Being in a quick-coding sort of mood, I had to see how long it would take me to create a graph showing the numbers of daily newspapers in Trove (where daily is defined as more than 300 issues in a year). The answer was about fifteen minutes.
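The 'more than 300 issues in a year' rule is simple enough to sketch in a few lines. This assumes the issue counts have already been gathered into a dictionary keyed by (title, year). The function name, the data structure, and the sample values are all hypothetical, and how the counts come out of the Trove API is left aside.

```python
from collections import Counter

DAILY_THRESHOLD = 300  # 'daily' means more than 300 issues in a year


def daily_titles_per_year(issue_counts, threshold=DAILY_THRESHOLD):
    """Count, for each year, the titles with more than `threshold` issues.

    `issue_counts` maps (title, year) tuples to a number of issues.
    """
    per_year = Counter()
    for (_title, year), issues in issue_counts.items():
        if issues > threshold:
            per_year[year] += 1
    # Sort by year so the result is ready to plot as a time series.
    return dict(sorted(per_year.items()))


# Made-up sample counts, purely for illustration.
sample = {
    ("Title A", 1900): 312,
    ("Title B", 1900): 52,
    ("Title A", 1901): 300,  # exactly 300 does not count as daily
    ("Title C", 1901): 365,
}
daily_by_year = daily_titles_per_year(sample)
```

Note that the comparison is strict, so a title with exactly 300 issues in a year is not treated as a daily.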
All of the graphs are created using the web service Plotly. Plotly has an easy-to-use Python API which means all you need to do to create a graph is to add a few lines of code. There are other Python visualisation libraries, but I like Plotly because it creates something instantly shareable — perfectly suited to this sort of quick and dirty experimentation.
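To give a sense of how little is involved, here is a rough sketch of a grouped bar chart described as a plain Plotly-style figure dictionary. It is an assumption-laden illustration rather than the code I actually ran: the helper `grouped_bar_figure` is hypothetical, the values are placeholders, and it uses the current open-source `plotly` package's figure structure instead of the hosted service's 2015 API.

```python
def grouped_bar_figure(categories, series):
    """Build a Plotly-style figure as a plain dict.

    `series` maps a trace name to a list of values, one per category.
    """
    data = [
        {"type": "bar", "name": name, "x": list(categories), "y": list(values)}
        for name, values in series.items()
    ]
    layout = {
        "barmode": "group",  # draw the series side by side
        "yaxis": {"title": {"text": "% of total"}},
    }
    return {"data": data, "layout": layout}


# Placeholder percentages, not real data.
fig = grouped_bar_figure(
    ["State A", "State B"],
    {"Population": [60.0, 40.0], "Contributors": [75.0, 25.0]},
)

# With the plotly package installed, the dict can be rendered with:
#   import plotly.io as pio
#   pio.show(fig)
```

Keeping the figure as a plain dictionary means the chart description can be built and inspected without the library, and handed to Plotly only at the point of display.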
I don’t think any of these graphs are particularly revealing, and I’ve made some assumptions about the data that probably wouldn’t hold up under scrutiny. But what this fiddling around emphasised was how an API and some simple tools make it possible to ask quick questions of the data.
All the code is in my Trove-Sketches repository on GitHub.