I have problems with the idea of infrastructure, particularly that of the e-research variety. It seems like we always end up talking about huge amounts of money and multi-institutional partnerships. It just doesn’t seem like a great model for innovation. As I’ve previously argued, I’d like to see something more like the funding schemes offered by the NEH Office for Digital Humanities. Encourage people with ideas, don’t just reward the good networkers. Build tools and APIs, not portals and platforms.
Of course I’d still like to see the digital humanities well represented in the list of Virtual Laboratories and eResearch Tools currently under consideration by NeCTAR. It’s time the digital research needs of the humanities were properly recognised. There are lots of possibilities, most of which we can’t yet envisage, but as I was asked what I would like to see as part of a Virtual Laboratory I had a go at setting down a few brief ideas. For what it’s worth, here’s my e-research infrastructure wishlist…
Grappling with abundance
Traditional historical research is often based on a presumed scarcity of resources: the skill is in tracking down the sources. But large digital collections, like the Trove newspapers database, change this. You now have to make sense of the sheer volume of material. Digital history, through techniques such as text-mining and visualisation, offers a way of using these new riches effectively. We need to ensure that investments in digitisation are accompanied by evolutions in scholarly practice.
Understanding what’s not online
At the same time, it must be recognised that large quantities of our cultural heritage are not available in digital form. For example, only about 10% of the holdings of the National Archives of Australia are described in their collection database, and only a small proportion of these are digitised. Easy online access could foster a certain circularity in historical research where only ‘known’ resources are consulted. We need to develop tools and visualisations that reveal the valleys as well as the mountaintops — identifying the holes in our research fabric.
More generally, we need to foster critical engagement with the tools and assumptions of digital research. Federated searching sounds great, but as scholars we need to expose the assumptions implicit in any such tool. What is being federated, from where, and how is relevance being determined? Humanities e-research infrastructure should have built-in levels of reflexivity that enable scholars to understand the limits and assumptions of their digital research. Every algorithm contains an argument.
The resources we build our arguments with are subject to change. The Trove newspapers database, for example, is constantly adding new titles and articles, while users are improving the text transcriptions. Any analysis based on the holdings of this database needs to explicitly recognise this. At the very least, the tools we have need to be able to generate time-stamped citations. It would be even better if we could capture a snapshot of the data to accompany our analyses. Perhaps there are possibilities for using something like the Memento project to ensure that the temporal context of humanities research is adequately documented.
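To make the idea of a time-stamped citation a bit more concrete, here is a minimal sketch in Python. It is not how Trove or any existing tool works: the function names, the citation fields and the example.org URL are all assumptions made up for illustration. The citation records when the text was seen plus a hash of what it said, so a later transcription correction can at least be detected. The last helper formats the same timestamp as an HTTP date, which is the form the Memento protocol (RFC 7089) expects in its `Accept-Datetime` request header.

```python
import hashlib
from datetime import datetime, timezone
from email.utils import format_datetime


def fingerprint(text):
    """Return a SHA-256 digest of the text exactly as it was retrieved."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def make_citation(url, title, text, retrieved=None):
    """Build a citation that records when the text was seen and what it said.

    The field names are hypothetical, not part of any citation standard.
    """
    retrieved = retrieved or datetime.now(timezone.utc)
    return {
        "url": url,
        "title": title,
        "retrieved": retrieved.isoformat(timespec="seconds"),
        "sha256": fingerprint(text),
    }


def has_changed(citation, current_text):
    """True if the text no longer matches what was originally cited."""
    return fingerprint(current_text) != citation["sha256"]


def accept_datetime(retrieved):
    """Format a datetime as an HTTP date for a Memento Accept-Datetime header."""
    return format_datetime(retrieved.astimezone(timezone.utc), usegmt=True)


# Hypothetical article, cited before a user corrects the OCR text.
seen = datetime(2011, 6, 1, tzinfo=timezone.utc)
citation = make_citation(
    "https://example.org/newspaper/article/1",  # placeholder URL
    "An example article",
    "The quick brovvn fox",
    retrieved=seen,
)
print(citation["retrieved"])
print(has_changed(citation, "The quick brown fox"))
```

A hash only tells you that something changed, not what. Keeping a full snapshot of the data alongside the analysis would still be the better option where it is practical.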
Show your working out
Scholarly publication in history, and the humanities generally, tends to present a finished product. But as we delve further into digital research the research processes themselves will be equally important, both for fostering critical engagement with tools and methods and for enabling others to reproduce or extend the research. We need easy ways for researchers to expose their working out (subject to whatever access controls they think appropriate). It should be possible to save a series of steps (search, analysis, visualisation and so on) as modules for sharing and re-use.
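One way of saving a series of steps for sharing and re-use might look something like the sketch below. Everything in it is an assumption for the sake of illustration: the step names, the tiny in-memory list of articles and the workflow format are invented, and a real tool would be searching a collection rather than a Python list. The point is only that a workflow described as data can be saved, published alongside the research, and replayed by someone else.

```python
import json

# Registry of named, reusable steps (a hypothetical design, not an existing tool).
STEPS = {}


def step(name):
    """Decorator that registers a function as a named workflow step."""
    def register(func):
        STEPS[name] = func
        return func
    return register


@step("search")
def search(articles, query):
    """Keep only the articles whose text contains the query (case-insensitive)."""
    return [a for a in articles if query.lower() in a["text"].lower()]


@step("count_by_year")
def count_by_year(articles):
    """Count articles per year, ready to hand to a charting step."""
    counts = {}
    for article in articles:
        counts[article["year"]] = counts.get(article["year"], 0) + 1
    return counts


def run(workflow, data):
    """Apply each recorded step in order, passing results along."""
    for entry in workflow:
        data = STEPS[entry["step"]](data, **entry.get("params", {}))
    return data


# The "working out", recorded as plain data so it can be saved and shared.
workflow = [
    {"step": "search", "params": {"query": "drought"}},
    {"step": "count_by_year"},
]
saved = json.dumps(workflow, indent=2)

# Made-up sample data standing in for a real collection.
articles = [
    {"year": 1902, "text": "Drought grips the colony"},
    {"year": 1902, "text": "The drought continues"},
    {"year": 1914, "text": "War declared"},
    {"year": 1914, "text": "Severe DROUGHT in the west"},
]

# Anyone holding the saved workflow can replay it against the same data.
print(run(json.loads(saved), articles))
```

Because the workflow is just JSON, it could sit next to a publication in the same way a footnote does, with access controls applied to the file like any other.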
Follow your nose
Search needs to be complemented by rich, exploratory environments that encourage browsing, enable you to follow relationships, and foster serendipitous discovery. The problem with many collections is knowing enough about what’s in them to frame a useful search. Browsing through a variety of interfaces (people, maps, events, record types, physical proximity) overcomes this problem. As more cultural institutions make use of Linked Open Data and shared identifiers, such as People Australia, Geonames or the Powerhouse Object Thesaurus, the possibilities for navigating this rich contextual space will increase.
We need to develop better models for embedding rich citations within scholarly research — citations that describe not only the resource in structured, machine-readable forms, but also relevant relationships. This will link research directly to resources, making scholarly outputs a means of resource discovery, and enabling resource databases to re-use the scholarly research to enhance their own descriptions and finding aids.
Moving beyond simple citation, we need better ways of exposing the structures of people, events, places and things that are referenced in our narratives. Linked Open Data provides a model, but we need tools to make it simple and examples to make it obvious.
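As a rough illustration of what a structured, machine-readable citation with relationships could look like, here is a small Python sketch that builds a JSON-LD description using schema.org terms (`NewsArticle`, `mentions`, `sameAs`). The choice of schema.org is my assumption rather than an agreed model, and every URL and identifier below is an example.org placeholder, not a real record.

```python
import json


def cite(article_url, headline, date_published, mentions):
    """Describe a cited article and the people and places it refers to.

    `mentions` is a list of (schema.org type, name, identifier URL) tuples.
    """
    return {
        "@context": "https://schema.org",
        "@type": "NewsArticle",
        "url": article_url,
        "headline": headline,
        "datePublished": date_published,
        "mentions": [
            {"@type": kind, "name": name, "sameAs": identifier}
            for kind, name, identifier in mentions
        ],
    }


# Placeholder values throughout; none of these point at real resources.
citation = cite(
    "https://example.org/newspaper/article/1",
    "An example article",
    "1902-03-14",
    [
        ("Person", "A. Person", "https://example.org/people/1"),
        ("Place", "Somewhere", "https://example.org/places/1"),
    ],
)

# Serialised like this, the citation could be embedded in a web page
# or harvested by the institution that holds the resource.
print(json.dumps(citation, indent=2))
```

The shared identifiers in `sameAs` are what would let a resource database find its way back from the research to its own records.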
This work is licensed under a Creative Commons Attribution 4.0 International License.