I’m interested in time — in the way we imagine, manipulate, experience and describe time, particularly in the service of ideas such as ‘progress’.
This was one of the themes of Atomic Wonderland, but beyond constructing a few case studies it’s not all that easy to study. Or at least it wasn’t. Now projects such as Victorian Books are showing how we can explore the changing weights of ideas across times and cultures by analysing the contents of large textual collections.
Returning visitors will probably be aware of my own experiments mining the contents of the National Library of Australia’s digitised newspapers database, available through Trove. So far I’ve focused on the development of generic tools and techniques, but I thought it would be interesting to apply these to my study of ‘progress’. Happily the NLA agreed and has awarded me a Harold White Fellowship for 2012 to do just that. Yippee!
I’ll be taking up the fellowship in February, but in preparation I’ve started to develop a few little sketches that prod at our fondness for periodisation. Labels such as ‘the Roaring Twenties’, ‘the Great Depression’ or even ‘the First World War’ are so familiar that we sometimes forget that they themselves have a history.
To begin with I decided to examine the question of when ‘the Great War’ became ‘the First World War’. At some point we realised that the Great War was not the final act in a centuries-long drama of European jealousy and jostling, but the first in a series of global conflicts. Can newspapers tell us when?
I already had a script that would generate a basic time series from a Trove query string. It simply takes the query, fires off a separate search for each year and grabs the number of matching articles. If the number of matches is more than zero, it also retrieves the total number of articles for that year and calculates the proportion matching the query. The results are saved in a JSON file which can be easily visualised using something like Highcharts. The original script needed a few tweaks to streamline the process, but I’ll describe these in detail in my next post.
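For anyone who wants to picture the loop, here is a minimal sketch of the idea in Python. It is not the actual script: `build_time_series`, `save_series` and the `count_articles` callable are hypothetical names, and how `count_articles` talks to Trove is deliberately left out. I also assume that passing an empty query returns the total number of articles for a year.

```python
import json
from typing import Callable, Dict, List


def build_time_series(
    query: str,
    start_year: int,
    end_year: int,
    count_articles: Callable[[str, int], int],
) -> List[Dict[str, float]]:
    """Return one record per year: matches, total articles and proportion.

    `count_articles(query, year)` is a hypothetical helper supplied by the
    caller; it should return the number of articles matching `query` in `year`.
    """
    series = []
    for year in range(start_year, end_year + 1):
        matches = count_articles(query, year)
        record = {"year": year, "matches": matches, "total": 0, "proportion": 0.0}
        if matches > 0:
            # Only ask for the yearly total when there is something to compare.
            # Assumption: an empty query stands in for "all articles".
            total = count_articles("", year)
            record["total"] = total
            record["proportion"] = matches / total if total else 0.0
        series.append(record)
    return series


def save_series(series: List[Dict[str, float]], path: str) -> None:
    """Write the series to a JSON file ready for a charting library."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(series, handle, indent=2)
```

Passing the counting function in as an argument keeps the loop separate from whatever does the searching, so the same code can be tried out with a stand-in function before it is pointed at a real service.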
For this experiment I constructed two queries. The first simply searched for the phrase ‘the great war’ between 1900 and 1954. The second was a bit more complicated: it searched for any of the phrases ‘first world war’, ‘world war one’, ‘world war 1’ or ‘world war i’ across the same period. I fed the queries to my script and after a bit of ker-chugging, whirring and clunking I ended up with a graph.
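As a rough illustration, the two queries could be assembled like this. The helper `any_phrase` is a hypothetical name, and the use of double-quoted phrases joined by `OR` is my assumption about the search syntax rather than a record of the exact strings I used.

```python
def any_phrase(phrases):
    """Build a query string that matches any one of the given phrases.

    Assumption: the search engine treats double-quoted text as an exact
    phrase and the keyword OR as an alternative.
    """
    return " OR ".join('"{}"'.format(phrase) for phrase in phrases)


GREAT_WAR_QUERY = any_phrase(["the great war"])
FIRST_WORLD_WAR_QUERY = any_phrase(
    ["first world war", "world war one", "world war 1", "world war i"]
)
```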
The result is not really surprising. As you can see on the full graph, the two lines cross late in 1941. With German victories across Europe and North Africa, the opening of the Eastern Front and, finally, the Japanese attack on Pearl Harbour, 1941 seems to make sense. But it’s interesting to see this reflected so clearly in such a rough and ready analysis.
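The crossover can also be estimated from the numbers rather than read off the graph. The sketch below assumes two lists of yearly proportions aligned to the same list of years, and uses straight-line interpolation between neighbouring points, which is roughly what a line chart does visually. The function name `crossover_point` is hypothetical, and any values you feed it to try it out are yours, not results from Trove.

```python
def crossover_point(years, old_series, new_series):
    """Estimate where `new_series` first rises above `old_series`.

    Returns a fractional year found by linear interpolation between the
    two neighbouring data points, or None if the lines never cross.
    """
    for i in range(1, len(years)):
        gap_before = old_series[i - 1] - new_series[i - 1]
        gap_after = old_series[i] - new_series[i]
        if gap_before >= 0 and gap_after < 0:
            # Fraction of the way between the two points where the gap hits zero.
            fraction = gap_before / (gap_before - gap_after)
            return years[i - 1] + fraction * (years[i] - years[i - 1])
    return None
```

With one data point per year, a fractional result such as 1941.9 is only as good as the straight-line assumption, so it is best treated as a pointer to where to start reading.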
What is perhaps more intriguing is the huge spike in 1939. Of course it makes sense that people would be referring back to the Great War as the prospect of a new conflict loomed, but it does make you wonder about the context of these discussions and how they might have developed as war edged closer.
Notable too are the earlier blips in the First World War count — the first centred on 1916 and the second on 1935. The peak in 1916 is actually due to the tags and comments added by Trove users. The standard ‘search everything’ option in Trove includes these as well as the text of the articles themselves. By using other search options you can choose to exclude the tags that match your query, but that seems rather messy. It would be nicer if Trove gave you the option of ignoring these matches from the start.
The second blip is a bit more interesting. By clicking on the graph and exploring the results from Trove, you can see that it’s due to the screening of a documentary film called ‘The First World War’. The film used archival footage drawn from a number of nations and was based on Laurence Stallings’s book The First World War: A Photographic History. As one newspaper article noted: ‘this picture presents war, stripped of its gaudy trappings, and fearful in its grim reality’.
By way of comparison I tried a similar query using the Google Books Ngram viewer. The crossover point seems a little later, but of course books take longer to publish than newspapers. There is, however, no peak in 1939 for ‘the Great War’ — at least not if you use the combined ‘English’ corpus. If you examine the British-English and American-English corpora separately it’s a rather different story. Querying the British-English corpus produces something much closer to our Trove graph, complete with a spike around 1939. Again, this is only as we’d expect given the lesser significance of the First World War in American history.
This is, of course, only a sketch — something to prompt new questions or suggest avenues for attack. It’s made me want to find out a bit more about the nature of discussions in 1939, so I’ve fired up my Trove Newspaper Harvester and downloaded the text of all 6,582 articles from 1939 that include the phrase ‘the Great War’. More about that soon…
This work is licensed under a Creative Commons Attribution 4.0 International License.