With a bit of time to play over Christmas, I had a go at applying some of the techniques described at ProgrammingHistorian to the ADB Online. I thought it might be interesting to create some word clouds, both for what they could reveal about the content of the ADB and for what they might offer as a way of improving access to the articles.
So I set about learning Python and was soon downloading and scraping the more than 10,000 articles that make up the ADB Online.
My first tests revealed that the most frequent words in ADB articles were…
born and died
Who’d have thought it? In a biographical dictionary?
After further refining the stopword list, I started to generate some useful clouds. Finally, after 147 minutes of processing time, I had a word cloud representing the content of all 16 volumes of the Australian Dictionary of Biography.
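For anyone curious about what the stopword step looks like, here is a rough sketch of the general idea rather than the script I actually ran. The stopword set, the `word_frequencies` function and the sample sentence are all made up for illustration. It counts the words in a text and skips anything on the stopword list, which is how "born" and "died" can be kept out of the results.

```python
import re
from collections import Counter

# Hypothetical stopword list for illustration only; the list used for the
# real clouds is not reproduced here.
STOPWORDS = {"the", "and", "of", "in", "a", "to", "was", "he", "his",
             "born", "died"}


def word_frequencies(text, stopwords=STOPWORDS):
    """Count lowercase words in text, ignoring anything in stopwords."""
    # Lowercase first so "Born" and "born" are treated as the same word.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(word for word in words if word not in stopwords)


# Invented sample text standing in for a scraped article.
sample = "He was born in Sydney and died in Melbourne. He farmed in Sydney."
freqs = word_frequencies(sample)

# The most frequent remaining words are the ones a word cloud would show largest.
print(freqs.most_common(3))
```

Refining the list is then just a matter of adding words to `STOPWORDS` and counting again until the top of the list starts to say something about the articles rather than about the genre.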