Back when I was looking at ‘When did the Great War become the First World War?’ I promised a detailed post on how I constructed the graphs. But of course I got distracted. Then I started adding new features to the script and redesigning the graphs, so…
Anyway, the result is a rather neat little gizmo henceforth named QueryPic (I got a bit sick of ‘search summariser’ and ‘graph-maker thing’). The first version just harvested data and left all the graph-making to you. But QueryPic does it all! It harvests the data and makes the graph. Woohoo.
Here’s an example showing ‘drought’ versus ‘flood’:
- Explore your Trove newspaper query over time in the form of a simple line graph.
- Interactive — click on a point to retrieve sample articles from that date.
- Combine data sources to compare queries.
- Choose your interval — plot by year or month.
- Switch views between total results and the proportion of all articles.
Yes, it’s a Python script and yes it runs on the command line. Let’s get that out of the way now. I don’t think I have the time and energy to develop cross-platform gui versions of all my tools. I’d rather spend the time adding new features or exploring new possibilities. Sorry, but until I have a wealthy benefactor or a technical support team, I think that’s the way it has to be. In any case, the code is all there — so build your own gui!
Actually, if I did have the time and energy I don’t think I’d build a standalone gui anyway. What would be much cooler would be a web service, where people could run, share and combine their queries. Social graph-making! A celebration of serendipity! A historical playground! Hmmm…
But for now there’s this Python script. It’s dead easy to use. Starting from the beginning…
- Do you have Python installed? If you have a Mac or Linux the answer is yes. Fire up a terminal and type ‘python -V’ — see, I told you. If you have Windows you can get a handy installer. Do it.
- Get the source code. Just download this zip file and unzip it into a new folder.
- Open a terminal and cd into the new folder.
- Run ‘python do_totals.py [your Trove query]’.
- Watch in excitement as the script chugs away retrieving data from Trove.
- Once the script is finished, go to the ‘graphs’ directory, where you’ll find your newly-created html page complete with fancy interactive graph.
- Open the html page in the web browser of your choice.
- Enjoy! Celebrate! Drink a toast in my honour!
There are a number of optional arguments that you can add to the command line to customise your results:
-n (or --name) [a query name]
Give a name to your query. The name is used to create filenames for the html and data files, and it is also used in the legend of the graph. The default is to use the search keywords as the name.
-d (or --directory) [a directory path]
The full pathname of the directory/folder for your results. The default is a ‘graphs’ sub-directory in the current directory.
-g (or --graph) [a graph name]
Specify the name of the html file that’s created. This is useful for displaying multiple queries on a single graph. Just run QueryPic for each query, using the same graph name each time. The default is either the value specified by the -n parameter or a name derived from the search keywords.
-m (or --monthly)
Plot the query at monthly intervals. The default interval is a year.
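As a rough illustration of how options like these fit together, here is a minimal sketch using Python's standard argparse module. This is an assumption for illustration only: the post doesn't say how do_totals.py actually parses its arguments, and the function name build_parser is hypothetical.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the options described above."""
    parser = argparse.ArgumentParser(prog="do_totals.py")
    parser.add_argument("query", help="your Trove newspaper query")
    parser.add_argument("-n", "--name",
                        help="query name (default: the search keywords)")
    parser.add_argument("-d", "--directory", default="graphs",
                        help="directory for the results")
    parser.add_argument("-g", "--graph",
                        help="name of the html file (default: the query name)")
    parser.add_argument("-m", "--monthly", action="store_true",
                        help="plot at monthly rather than yearly intervals")
    return parser


# Parse an example command line instead of sys.argv.
args = build_parser().parse_args(
    ["http://trove.nla.gov.au/newspaper/result?q=cat", "-g", "animals", "-m"]
)
print(args.graph, args.monthly, args.directory)
```

Leaving -n and -g unset gives None here, so the calling code would be responsible for falling back to the search keywords.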
What QueryPic actually does
QueryPic builds a simple visualisation of your search query in the Trove newspaper database. A list of search results is difficult to interpret and offers little context. QueryPic shows you the number of articles matching your query over time, enabling you to reframe your questions, pursue hunches, or simply play around.
QueryPic takes your Trove newspaper query and looks for a date range. If it doesn’t find one, it assumes you want your graph to go from 1803 to 1954 (the complete contents of the newspaper database — except for the Women’s Weekly). QueryPic then strips out any date parameters from the query, so it can fire off the query within the start and end dates, at the specified date interval.
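The date handling described above might look something like the following sketch. It assumes the date range arrives as the fromyyyy and toyyyy URL parameters (the only date parameters that appear in this post's examples); the function name split_date_range is hypothetical and this is not the script's actual code.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

DEFAULT_START, DEFAULT_END = 1803, 1954
# Assumption: these are the only date parameters we need to handle.
DATE_PARAMS = {"fromyyyy", "toyyyy"}


def split_date_range(query_url):
    """Return (url_without_dates, start_year, end_year)."""
    parts = urlsplit(query_url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    found = {key: value for key, value in params if key in DATE_PARAMS}
    # Fall back to the defaults when a date is missing or blank.
    start = int(found.get("fromyyyy") or DEFAULT_START)
    end = int(found.get("toyyyy") or DEFAULT_END)
    # Rebuild the URL without any date parameters.
    remaining = [(key, value) for key, value in params if key not in DATE_PARAMS]
    stripped = urlunsplit(parts._replace(query=urlencode(remaining)))
    return stripped, start, end


print(split_date_range(
    "http://trove.nla.gov.au/newspaper/result?q=cat&fromyyyy=1920&toyyyy=1921"
))
```

With the dates stripped out, the same base query can be reissued once per interval with whatever date limits that interval needs.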
Date interval? In the previous version of this script you could only plot points at yearly intervals, so it was impossible to zoom in and see what might be happening over the span of a single year or two. But amazing advances in QueryPic technology mean you can now plot changes by month. Here for example is a new version of my Great War/First World War graph, focused on 1938–1946 and plotted at monthly intervals.
So for each interval within the date range QueryPic fires off a request to Trove. From the response it scrapes out the total number of results for that date. If the total is greater than zero, it then fires off a second request to find the total number of newspaper articles for that year. Your query results divided by the total number of articles gives the proportion of articles for that date matching your search query.
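To make the loop concrete, here is a minimal sketch of the interval and proportion logic. The requests to Trove are replaced by two hypothetical callables, count_matches and count_all, so nothing here touches the network; all function names are illustrative rather than taken from the script, and the sketch assumes the article total is fetched for the same interval as the query total.

```python
def intervals(start_year, end_year, monthly=False):
    """Yield (year, month) pairs; month is None for yearly intervals."""
    for year in range(start_year, end_year + 1):
        if monthly:
            for month in range(1, 13):
                yield (year, month)
        else:
            yield (year, None)


def proportion(matches, total_articles):
    """Share of all articles in an interval that match the query."""
    if matches == 0 or total_articles == 0:
        return 0.0
    return matches / total_articles


def summarise(start_year, end_year, monthly, count_matches, count_all):
    """Collect (year, month, matches, proportion) for each interval.

    count_matches and count_all are hypothetical stand-ins for the two
    Trove requests; each takes (year, month) and returns an integer.
    """
    rows = []
    for year, month in intervals(start_year, end_year, monthly):
        matches = count_matches(year, month)
        # Only ask for the overall total when there is something to divide.
        total = count_all(year, month) if matches > 0 else 0
        rows.append((year, month, matches, proportion(matches, total)))
    return rows


# Example with made-up numbers in place of real Trove responses.
rows = summarise(1920, 1921, False,
                 count_matches=lambda y, m: 50 if y == 1920 else 0,
                 count_all=lambda y, m: 1000)
print(rows)
```

Skipping the second request when there are no matches halves the traffic for empty intervals, which matters when a monthly plot over the full date range means well over a thousand intervals.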
Plot ‘cat’ against ‘dog’ in a graph called ‘animals’:
python do_totals.py "http://trove.nla.gov.au/newspaper/result?q=cat" -g "animals"
python do_totals.py "http://trove.nla.gov.au/newspaper/result?q=dog" -g "animals"
Specify a directory for your results:
python do_totals.py "http://trove.nla.gov.au/newspaper/result?q=cat" -d "/User/bill/Documents/graphs"
Plot results at monthly intervals:
python do_totals.py "http://trove.nla.gov.au/newspaper/result?q=cat&fromyyyy=1920&toyyyy=1921" -m
Specify a name:
python do_totals.py "http://trove.nla.gov.au/newspaper/result?q=cat" -n "Felines"