ADB DIY RSS

So I was thinking, wouldn’t it be nice if the Australian Dictionary of Biography’s ‘born on this day’ feature could be made available as an RSS feed? Every morning you’d get a new list of biographies delivered direct to your feed reader. And so…

[sounds of xpath wrangling and PHP coding]

here it is.

It’s pretty simple – it harvests all the links of people born on the current day, then loops through the links to gather the first paragraph of each biography. Then it’s just a matter of writing everything to an RSS file.

In case you missed it, I also created a Media RSS feed for portrait images used in the ADB. This enables them to be viewed in CoolIris.

Code follows…

[code language="php"]
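<?php
// The opening lines of this script are missing from the published post, so
// everything down to loadHTML() is a reconstruction. getPage($url, $ch) is
// called further down, but its body and the start URL here are assumptions.
function getPage($url, $ch) {
	curl_setopt($ch, CURLOPT_URL, $url);
	curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
	return curl_exec($ch);
}

$ch = curl_init();
$html = getPage("http://www.adb.online.anu.edu.au/scripts/adbp-births-deaths.php", $ch);
$dom = new DOMDocument();
// @ suppresses the warnings libxml raises on imperfect HTML
@$dom->loadHTML($html);

$xpath = new DOMXPath($dom);
// Links for the people born on the current day
$hrefs = $xpath->evaluate("//ul[@class='pb-results'][1]/li/a");
// Not used below; each title is read from its link node instead
$titles = $xpath->evaluate("//ul[@class='pb-results'][1]/li/a/text()");

// The RSS tags were stripped from the post as well; standard RSS 2.0 elements are assumed
echo "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n";
echo "<rss version=\"2.0\">\n";
echo "<channel>\n";
echo "<title>ADB Online - Born on this day</title>\n";
echo "<link>http://www.adb.online.anu.edu.au/scripts/adbp-births-deaths.php</link>\n";
echo "<description>A list of all those people in the Australian Dictionary of Biography who were born on this day.</description>\n";
for ($i = 0; $i < $hrefs->length; $i++) {
	$href = $hrefs->item($i);
	// Escape the name so characters like & stay valid XML
	$title = htmlspecialchars($href->nodeValue, ENT_QUOTES);
	$bio = "";
	// Drop the leading ".." from the relative href to get a site-absolute path
	$url = "http://www.adb.online.anu.edu.au" . substr($href->getAttribute('href'), 2);
	$html = getPage($url, $ch);
	$dom = new DOMDocument();
	@$dom->loadHTML($html);
	$xpath = new DOMXPath($dom);
	// Text nodes of the first paragraph of the biography
	$paras = $xpath->evaluate("//div[@id='content']/p[1]/text()");
	foreach ($paras as $para) {
		$bio .= $para->nodeValue;
	}
	$bio .= "...";
	$bio = htmlspecialchars($bio, ENT_QUOTES);
	// Strip line breaks (double quotes, so that \n really is a newline)
	$bio = str_replace("\n", '', $bio);
	echo "<item>\n";
	echo "<title>$title</title>\n";
	echo "<link>$url</link>\n";
	echo "<description>$bio</description>\n";
	echo "</item>\n";
}
echo "</channel>\n</rss>\n";
curl_close($ch);
?>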
[/code]
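
If you want to see what the link-harvesting XPath picks out without hitting the ADB site, here's a minimal sketch that runs the same expression against a made-up HTML fragment. The sample markup, the names in it, and the helper functions `harvestLinks()` and `rssItem()` are all hypothetical, invented for illustration; only the XPath expression, the `substr()` trick and the base URL come from the script above. I'm also assuming the real page has more than one `pb-results` list, which is why the expression takes only the first.

```php
<?php
// Hypothetical stand-in for the "born on this day" page. The class name
// matches the XPath used in the script; the people and paths are made up.
$sample = <<<HTML
<html><body>
<ul class="pb-results">
  <li><a href="../biogs/A000001b.htm">Example, Alice (1850 - 1920)</a></li>
  <li><a href="../biogs/A000002b.htm">Sample, Bob &amp; Co. (1861 - 1930)</a></li>
</ul>
<ul class="pb-results">
  <li><a href="../biogs/A000003b.htm">Second List (1900 - 1950)</a></li>
</ul>
</body></html>
HTML;

// Return an array of array(title, url) pairs from the first pb-results list.
function harvestLinks($html, $base) {
	$dom = new DOMDocument();
	@$dom->loadHTML($html); // suppress libxml warnings about imperfect markup
	$xpath = new DOMXPath($dom);
	$links = array();
	foreach ($xpath->evaluate("//ul[@class='pb-results'][1]/li/a") as $a) {
		// Drop the leading ".." so the relative href becomes a site-absolute path
		$links[] = array($a->nodeValue, $base . substr($a->getAttribute('href'), 2));
	}
	return $links;
}

// Build one RSS item, escaping the text so characters like & stay valid XML.
function rssItem($title, $url, $description) {
	return "<item>\n"
		. "<title>" . htmlspecialchars($title, ENT_QUOTES) . "</title>\n"
		. "<link>" . htmlspecialchars($url, ENT_QUOTES) . "</link>\n"
		. "<description>" . htmlspecialchars($description, ENT_QUOTES) . "</description>\n"
		. "</item>\n";
}

$links = harvestLinks($sample, "http://www.adb.online.anu.edu.au");
foreach ($links as $link) {
	echo rssItem($link[0], $link[1], "First paragraph of the biography...");
}
```

Only the two links in the first list are returned; the third is ignored because of the `[1]` in the expression. Escaping happens when the item is written out, so the harvested titles stay as plain text until then.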

This work is licensed under a Creative Commons Attribution 4.0 International License.

Written by: Tim Sherratt

I'm a historian and hacker who researches the possibilities and politics of digital cultural collections.