ADB Online

February 4, 2009

So I was thinking, wouldn’t it be nice if the Australian Dictionary of Biography’s ‘born on this day’ feature could be made available as an RSS feed? Every morning you’d get a new list of biographies delivered direct to your feed reader. And so…

[sounds of xpath wrangling and PHP coding]

here it is.

It’s pretty simple – it harvests all the links of people born on the current day, then loops through the links to gather the first paragraph of each biography. Then it’s just a matter of writing everything to an RSS file.

Read More: ADB DIY RSS
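
For the curious, the final step of writing the harvested entries out as RSS can be sketched roughly as follows. This is a Python illustration, not the PHP behind the actual feed: the function name `build_rss`, the dictionary keys `name`, `url` and `summary`, and the example.org addresses are all made up for the example, and the harvesting itself is left out because it depends on the structure of the ADB pages.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def build_rss(entries, feed_title, feed_link, feed_description, pub_date=None):
    """Return an RSS 2.0 document as a string, with one <item> per entry.

    Each entry is a dict with the hypothetical keys 'name', 'url' and
    'summary' (the first paragraph of the biography).
    """
    pub_date = pub_date or datetime.now(timezone.utc)

    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = feed_title
    ET.SubElement(channel, "link").text = feed_link
    ET.SubElement(channel, "description").text = feed_description
    # RSS expects RFC 822 style dates, which format_datetime produces.
    ET.SubElement(channel, "pubDate").text = format_datetime(pub_date)

    for entry in entries:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = entry["name"]
        ET.SubElement(item, "link").text = entry["url"]
        ET.SubElement(item, "description").text = entry["summary"]

    # ElementTree escapes characters such as & and < in the text for us.
    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    # Placeholder data standing in for the harvested biographies.
    sample = [
        {
            "name": "Example Person",
            "url": "https://example.org/biography/1",
            "summary": "First paragraph of the biography goes here.",
        }
    ]
    print(build_rss(sample, "Born on this day", "https://example.org/",
                    "Placeholder feed description"))
```

Building the feed with an XML library rather than string concatenation means that names or paragraphs containing ampersands or angle brackets won’t break the feed.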

January 24, 2009

With a bit of time to play over Christmas, I had a go at applying some of the techniques described at ProgrammingHistorian to the ADB Online. I thought it might be interesting to create some word clouds, both for what they could reveal about the content of the ADB, and to see what they had to offer as a way of improving access to the articles.

So I set about learning Python and was soon downloading and scraping the more than 10,000 articles that make up the ADB online.

My first tests revealed that the most frequent words in ADB articles were…

born and died

Who’d have thought it? In a biographical dictionary?

After further refining the stopwords list, I started to generate some useful clouds. Finally, after 147 minutes of processing time, I had a word cloud representing the content of all 16 volumes of the Australian Dictionary of Biography.
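
The counting behind a cloud like this can be sketched in a few lines of Python. This is only an illustration of the general technique, not the script used here: the function name `word_frequencies`, the tiny `STOPWORDS` set and the sample sentence are all invented for the example, and a real stopwords list would be much longer.

```python
import re
from collections import Counter

# A deliberately tiny, made-up stopwords list. Adding 'born' and 'died'
# is the sort of refinement a biographical dictionary calls for.
STOPWORDS = {
    "the", "and", "of", "in", "a", "to", "was", "he", "his", "she", "her",
    "on", "at", "born", "died",
}


def word_frequencies(text, stopwords=STOPWORDS):
    """Count how often each word occurs in text, skipping stopwords."""
    # Lowercase the text and pull out runs of letters, keeping an
    # internal apostrophe so that words like "o'brien" stay whole.
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return Counter(word for word in words if word not in stopwords)


if __name__ == "__main__":
    sample = "He was born in Sydney and died in Sydney. The farmer was born a farmer."
    for word, count in word_frequencies(sample).most_common(5):
        print(word, count)
```

The resulting counts are what a word cloud turns into font sizes: the more frequent the word, the bigger it is drawn.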

The complete ADB word cloud

Read More: Cloudy biographies and portrait walls