With a bit of time to play over Christmas I had a go at applying some of the techniques described at ProgrammingHistorian to the ADB Online. I thought it might be interesting to create some word clouds, both for what they could reveal about the content of the ADB, and to see what they had to offer as a way of improving access to the articles.
So I set about learning Python and was soon downloading and scraping the more than 10,000 articles that make up the ADB Online.
My first tests revealed that the most frequent words in ADB articles were…
born and died
Who’d have thought it? In a biographical dictionary?
After further refining the stopwords list, I started to generate some useful clouds. Finally, after 147 minutes of processing time, I had a word cloud representing the content of all 16 volumes of the Australian Dictionary of Biography.
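For anyone curious about the counting step behind a word cloud, here is a rough sketch of the general idea, not the script I actually ran. The stopword list, the sample sentence and the function name `word_frequencies` are all made up for illustration; a real list would be much longer and tuned to the ADB's vocabulary.

```python
import re
from collections import Counter

# Hypothetical stopword list for illustration only. Note that 'born' and
# 'died' are included, since they would otherwise swamp a biographical corpus.
STOPWORDS = {
    "the", "a", "an", "and", "of", "in", "to", "was",
    "he", "she", "his", "her", "born", "died",
}


def word_frequencies(text, stopwords=STOPWORDS):
    """Count how often each word appears in text, ignoring stopwords."""
    # Lowercase the text and pull out runs of letters as words.
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(word for word in words if word not in stopwords)


# Made-up sample text standing in for a scraped article.
sample = "He was born in Sydney and died in Melbourne. Sydney was his home."
freqs = word_frequencies(sample)
print(freqs.most_common(3))
```

The resulting counts are what a word cloud scales its font sizes by, so refining the stopword list amounts to adding entries to the set and running the count again.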
The new version of my Greasemonkey userscript, RecordSearch Image Tools, gives RecordSearch’s digital image pages a rather new look. My previous version had done away with the tired old ‘lemon-chiffon’ background colour, but I decided it was time to get a bit more adventurous, so I blitzed the old design and rebuilt the page from the beginning.
As you can see from the screenshot, I’ve tried to give the images as much of the screen as possible. I’ve also created a consistent set of navigation buttons, and improved the functionality in various ways.