I link therefore I am

Let me be clear. I am not Tim Sherratt the sound engineer. Nor, indeed, am I Timothy Sherratt, author of Saints as Citizens: A Guide to Public Responsibilities for Christians. We are three different people, spread across three continents, locked in a deadly battle for global supremacy via Google search rankings. There can be only one…

Of course you probably knew I wasn’t a British sound engineer or an American politics professor. There are plenty of contextual clues within this website, even on this page, to indicate that my interests lie elsewhere. But while we humans are pretty good at picking up such clues, it’s much harder for computers. When Google comes to index my site, how does it know I’m not a sound engineer who likes to dabble in history? Indeed, how does Google, or any computer, know that the words ‘Tim Sherratt’ are actually a person’s name? These are questions of both identity and semantics.

Librarians have been dealing with questions of identity for many, many years, developing detailed name authority records. Such records allow name variations to be cross-referenced and individuals to be uniquely identified. For example, I have a control number of ‘n 2005043272’ in the Library of Congress authorities database, while Timothy R Sherratt, the politics professor, has been assigned ‘n 94106739’.

The National Library of Australia has developed its own name authority file. However, the NLA has realised that reliable identity data has a much broader application than simply cataloguing, and is using its name authority data as the foundation of an exciting new resource – People Australia. People Australia will mesh its own records with biographical data from a variety of outside sources, creating a rich collection of interlinked identities. Already entries from the Australian Dictionary of Biography have been ingested.

So now, thanks to People Australia, if I ever get confused about who I am I just have to remember one little url – my very own persistent identifier – http://nla.gov.au/nla.party-479364. I’m going to get a t-shirt made up.

But that doesn’t help our new machine overlords very much. How can a computer tell that the words ‘Tim Sherratt’ describe a person and that more information about that person can be found at http://nla.gov.au/nla.party-479364? This is the sort of problem that the semantic web hopes to solve. The semantic web aims to expose the structures that are buried in our documents and databases, to make explicit the contextual clues that humans pick up but computers ignore. As the slogan goes, it represents a change from a ‘web of documents’ to a ‘web of data’.

The semantic web uses a variety of tools and standards to encode information in a form that means something to computers. FOAF (Friend of a Friend) is, for example, a machine-readable ontology that describes people and their relationships. A computer visiting this page can in fact find out a fair bit about me, including my NLA persistent identifier, because there is a link to a small XML file in which my details are expressed using FOAF.
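That FOAF file isn’t reproduced here, but a minimal version might look something like the following. This Python sketch uses only the standard library’s xml.etree.ElementTree and includes just the two facts mentioned in this post (a name and the People Australia identifier); the real file may well hold more, and the function name foaf_person is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace URIs of the RDF syntax and the FOAF vocabulary.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF = "http://xmlns.com/foaf/0.1/"

# Serialise with the familiar 'rdf:' and 'foaf:' prefixes.
ET.register_namespace("rdf", RDF)
ET.register_namespace("foaf", FOAF)


def foaf_person(name, topic_page):
    """Return a minimal FOAF description of one person as RDF/XML text."""
    root = ET.Element(f"{{{RDF}}}RDF")
    person = ET.SubElement(root, f"{{{FOAF}}}Person")
    ET.SubElement(person, f"{{{FOAF}}}name").text = name
    # A page (here a People Australia persistent identifier) whose
    # primary topic is this person.
    ET.SubElement(person, f"{{{FOAF}}}isPrimaryTopicOf",
                  {f"{{{RDF}}}resource": topic_page})
    return ET.tostring(root, encoding="unicode")


print(foaf_person("Tim Sherratt", "http://nla.gov.au/nla.party-479364"))
```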

But if this seems a little daunting, the semantic web offers another technology which is really just as easy as marking up a page in HTML – it’s called RDFa. This link – Tim Sherratt – is more than it seems. Here is what a computer sees:

<a typeof="foaf:Person" property="foaf:name" content="Sherratt, Tim" rel="foaf:isPrimaryTopicOf" href="http://nla.gov.au/nla.party-479364">Tim Sherratt</a>

This says that Tim Sherratt is a person whose name has the standard form ‘Sherratt, Tim’ and who is the primary topic of the page to be found at http://nla.gov.au/nla.party-479364. There’s a fair bit of semantic goodness in that one little link. If the NLA page also expressed its data in a machine-readable form, this link could send search engines and browsers into a whole new world of associations and inferences.

But I suppose you’re thinking that the code still looks a bit complicated. Well, never fear, this long post is really just an introduction to a new project I’ve been working on – something that will help you generate markup like this with just a couple of clicks.

Introducing Wragge’s identity browser

I’ve been interested in publishing biographical data since the early days of Bright Sparcs and, sad as it may seem, I find the possibilities of People Australia pretty exciting. However, I don’t think we should expect the NLA to do all the work. People Australia provides a framework that we can all use to enrich our own documents, databases, finding aids, and applications.

You can easily access People Australia data through Trove. But to get a better idea of what’s in the database, I’d suggest you spend some time playing with its SRU interface. Using this you can query the database directly, retrieving results in XML – ready for your own application to suck up and use.
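As a rough illustration of what querying the SRU interface involves, here is a Python sketch that builds a searchRetrieve request and reads the hit count from the response. The parameter names (operation, version, query, startRecord, maximumRecords) are standard SRU 1.1, and the base URL is the one quoted in the comments below, though it may no longer be live. Addressing the surname index simply as surname is an assumption, and since the record format isn’t described here, the sketch stops at numberOfRecords rather than guessing at record contents. The function names are hypothetical.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Base URL as quoted in the comments on this post; it may have changed.
SRU_BASE = "http://www.nla.gov.au/apps/srw/search/peopleaustralia"
# Namespace of SRU 1.1 response envelopes.
SRW_NS = "http://www.loc.gov/zing/srw/"


def surname_query_url(surname, start=1, maximum=20):
    """Build an SRU 1.1 searchRetrieve URL for a surname search."""
    # Assumption: the index is addressed as plain 'surname'; in practice
    # a context-set prefix may be required.
    cql = f'surname="{surname}"'
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql,
        "startRecord": start,
        "maximumRecords": maximum,
    }
    return f"{SRU_BASE}?{urlencode(params)}"


def number_of_records(response_xml):
    """Read the total hit count from an SRU searchRetrieve response."""
    root = ET.fromstring(response_xml)
    count = root.find(f"{{{SRW_NS}}}numberOfRecords")
    return int(count.text) if count is not None else 0
```

Fetching the URL (for example with urllib.request.urlopen) and passing the body to number_of_records would give you the total; paging through a long result list is then a matter of bumping startRecord.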

To make this even easier, I’ve written a People Australia client library in Python. This enables you to quickly extract and use identity information. Using it, your own web application can talk to People Australia directly. I won’t go into the details here – the code is fairly heavily commented – but I welcome any feedback, suggestions or contributions. Copy it, change it, use it!

To try out my library, and to provide a tool that might be of use to the average punter, I’ve also built:

<TA-DA>Wragge’s identity browser!</TA-DA>

It’s pretty simple. Search for a surname, pick a name from the result list, and view their identity details. For example, here are Clement Wragge’s details.

But there are a couple of extra features that I am rather smugly pleased with. First of all, there’s an ‘Identify me!’ bookmarklet. Just drag the link to your browser’s bookmarks or favourites toolbar (see below for some further notes).

Once you have the bookmarklet installed, all you have to do to find the identity record for someone is to highlight their name on a webpage and click ‘Identify me!’. You could then grab the People Australia ID to store in your own application, allowing you (with the help of my client library) to automatically include links to relevant entries in the Australian Dictionary of Biography, for example.

Even better, Wragge’s identity browser will automagically generate the RDFa markup you need to semantically enrich your document. Whether you’re writing a blog post, publishing an article, drafting a caption, creating a database entry, or preparing a finding aid, you can quickly and easily find an individual and then cut and paste the code you need.
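The markup itself is simple enough to generate. The following Python sketch isn’t the identity browser’s own code; it just reproduces the link pattern shown earlier, assuming you already have the link text, the standard form of the name and the numeric People Australia party ID. The helper name rdfa_person_link is hypothetical, and the nla.party- prefix is taken from the persistent identifier above.

```python
from html import escape

# Prefix of People Australia persistent identifiers, as seen above.
PARTY_PREFIX = "http://nla.gov.au/nla.party-"


def rdfa_person_link(link_text, standard_name, party_id):
    """Return an RDFa-annotated link to a People Australia identity."""
    href = PARTY_PREFIX + str(party_id)
    # escape() (quote=True by default) keeps names containing quotes,
    # apostrophes or ampersands from breaking the attributes.
    return (
        f'<a typeof="foaf:Person" property="foaf:name" '
        f'content="{escape(standard_name)}" rel="foaf:isPrimaryTopicOf" '
        f'href="{escape(href)}">{escape(link_text)}</a>'
    )


print(rdfa_person_link("Tim Sherratt", "Sherratt, Tim", 479364))
```

Called with "Tim Sherratt", "Sherratt, Tim" and 479364, it produces exactly the link shown earlier in this post.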

To show this in action I used the bookmarklet to help me mark up many of the people named in one of my articles. We humans see a normal page with a few extra links. Computers, however, can extract the embedded RDFa to get at the structured information that’s hidden in the page.
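To make ‘computers can extract the embedded RDFa’ a little more concrete, here is a deliberately minimal Python sketch built on the standard library’s html.parser. It only understands the single-element pattern used in this post (typeof, property with content, rel with href, plus xmlns: prefix declarations on enclosing tags); a real RDFa processor follows many more rules, so treat it as an illustration rather than a tool. The class name PersonLinkExtractor is hypothetical.

```python
from html.parser import HTMLParser

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
# Elements with no end tag; they must not push onto the prefix stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class PersonLinkExtractor(HTMLParser):
    """Collect triples from the single-element typeof/property/rel pattern.

    Deliberately partial: this is not a conforming RDFa processor.
    """

    def __init__(self):
        super().__init__()
        self.prefixes = [{}]  # prefix mappings in scope, one per open element
        self.triples = []
        self.count = 0  # used to mint blank-node labels

    @staticmethod
    def expand(curie, scope):
        # 'foaf:name' -> full URI if 'foaf' is declared; otherwise unchanged.
        prefix, _, local = curie.partition(":")
        base = scope.get(prefix)
        return base + local if base else curie

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        scope = dict(self.prefixes[-1])
        for name, value in attrs.items():
            if name.startswith("xmlns:") and value:
                scope[name[len("xmlns:"):]] = value
        if tag not in VOID:
            self.prefixes.append(scope)

        types = (attrs.get("typeof") or "").split()
        if not types:
            return
        # typeof without 'about' introduces a new (blank-node) subject.
        subject = f"_:person{self.count}"
        self.count += 1
        for t in types:
            self.triples.append((subject, RDF_TYPE, self.expand(t, scope)))
        if attrs.get("property") and attrs.get("content") is not None:
            for p in attrs["property"].split():
                self.triples.append((subject, self.expand(p, scope), attrs["content"]))
        if attrs.get("rel") and attrs.get("href"):
            for r in attrs["rel"].split():
                self.triples.append((subject, self.expand(r, scope), attrs["href"]))

    def handle_endtag(self, tag):
        if tag not in VOID and len(self.prefixes) > 1:
            self.prefixes.pop()
```

Fed the link from earlier inside a div that declares xmlns:foaf, it yields three statements about one person: a type of foaf:Person, a foaf:name of ‘Sherratt, Tim’, and a foaf:isPrimaryTopicOf pointing at the People Australia identifier.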

Now I’ve got to go and semantify the rest of my articles…

Go forth and identify! And in the process help build a better web.

Notes on the bookmarklet

  • Internet Explorer has ‘Favorites’, Firefox has ‘Bookmarks’ – whatever you’re using, first make sure that your Bookmarks/Favourites toolbar is visible. Look under Tools->Toolbars in IE8, or View->Toolbars in Firefox.
  • Try dragging the ‘Identify me!’ link to your Bookmarks/Favourites toolbar. If that doesn’t work, try right-clicking on the link and choosing ‘Bookmark this link’ or ‘Add to Favourites’. Make sure you add it to the toolbar folder. IE will probably give you various warnings – ignore them.
  • You should now have a working bookmarklet – highlight a name and click on it, and a new window should open with results from Wragge’s identity browser. IE might complain about opening a pop-up – allow pop-ups and try again.
  • The bookmarklet is pretty clever about working out which part of the highlighted text is the surname, so you can highlight names in a number of formats including:
    • Surname
    • Surname’s
    • Surname, Othernames
    • Othernames Surname
    • Othernames Surname’s
  • Originally this only worked with ‘straight’, ie non-curly, apostrophes – but that’s now fixed!
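The bookmarklet itself is JavaScript and its code isn’t shown here, but the rule set is easy to sketch. This Python version is an assumption about how such a guess might work, not the bookmarklet’s actual logic: take whatever comes before a comma, otherwise take the last word, then drop a trailing possessive written with either a straight or a curly apostrophe. The function name guess_surname is hypothetical.

```python
import re

# A trailing possessive with a straight (') or curly (U+2019) apostrophe.
POSSESSIVE = re.compile(r"['\u2019]s$")


def guess_surname(selection):
    """Guess the surname in a highlighted name.

    Covers: Surname, Surname's, 'Surname, Othernames',
    'Othernames Surname' and "Othernames Surname's".
    """
    text = " ".join(selection.split())  # collapse stray whitespace
    if "," in text:
        # 'Surname, Othernames': the surname comes before the comma.
        text = text.split(",", 1)[0]
    else:
        # Otherwise assume the surname is the last word.
        text = text.split(" ")[-1]
    return POSSESSIVE.sub("", text.strip())
```

For example, guess_surname("Clement Wragge’s") and guess_surname("Wragge, Clement") both give "Wragge".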

Notes on RDFa markup

  • You have a choice between visible (ie clickable) links or invisible ones. They look the same to computers, so it’s just a matter of whether you want your human visitors to see them. Click ‘change’ to toggle between the two options.
  • You can just paste the RDFa markup straight into your document. If you’ve used the bookmarklet, the text you highlighted will be automatically inserted as the link text – so just copy and paste. If you haven’t used the bookmarklet you can insert the link text yourself.
  • Somewhere in your document you need to tell computers what the FOAF in your RDFa markup means. You do this by inserting the text xmlns:foaf="http://xmlns.com/foaf/0.1/" inside a tag that contains your marked-up text. If you can edit the raw html of your page, you can just insert it in the <html> tag itself, so it becomes <html xmlns:foaf="http://xmlns.com/foaf/0.1/">. Otherwise you can wrap your marked-up text in a <div> tag and put the extra code in there.
  • If you’re using something like WordPress that strips out or converts any markup that it doesn’t expect, you need to be able to enter the RDFa as ‘raw’ html. In WordPress you can do this using the Raw HTML plugin.
  • For more on using RDFa have a look at: RDFa for HTML Authors.
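If you’re generating pages programmatically, the namespace step can be automated. This Python sketch (prepare_for_paste is a hypothetical name) wraps a fragment in a <div> that declares the foaf prefix, but only when the fragment uses foaf: and doesn’t already declare it. It’s a plain substring check, so it can’t see a declaration made elsewhere on the page, such as in the <html> tag.

```python
FOAF_NS = "http://xmlns.com/foaf/0.1/"


def prepare_for_paste(fragment):
    """Wrap an RDFa fragment so that the foaf: prefix is declared.

    Fragments that don't use foaf:, or that already declare it,
    are returned unchanged.
    """
    uses_foaf = "foaf:" in fragment
    declares_foaf = "xmlns:foaf=" in fragment
    if uses_foaf and not declares_foaf:
        return f'<div xmlns:foaf="{FOAF_NS}">{fragment}</div>'
    return fragment
```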

This work is licensed under a Creative Commons Attribution 4.0 International License.

Written by: Tim Sherratt

I'm a historian and hacker who researches the possibilities and politics of digital cultural collections.

10 Comments

  1. Andrew Wilson
    January 20, 2010

    Hi Tim

    This is very cool stuff. However, I did a search on ‘wilson’ using Wragge’s and it came back with over 1300 names – I don’t think most people will be bothered to go through a list that long. Is there any way of refining the search or the result set?
    cheers
    Andrew

  2. January 20, 2010

    Andrew, I’d hoped the page links might alleviate some of that problem, but you’re quite right, it’s not very satisfactory.

    The main problem at the moment is that (last time I checked) pattern matching wasn’t working on the SRU interface, so that makes it impossible to filter by initial.

    I could put in an ‘other names’ box, but sometimes only initials are recorded, so I was concerned that there’d end up being quite a lot of false negatives.

    So I decided to stay just with the surname search until pattern matching was working. Once that’s fixed I’ll refine the search interface.

  3. b3rn
    January 20, 2010

    Great post. I’m working in my spare time on a WWII honour roll that lists local residents. I’ll be looking to RDF the content (that is, the people). Any suggestions or ideas? Ideally it will allow links (now or in the future) with other resources & institutions.

    • January 21, 2010

      B3rn – Well, there’s the WWII nominal roll – no rdf, but there do seem to be individual identifiers for people. So you could use foaf:isPrimaryTopicOf links to http://www.ww2roll.gov.au/script/veteran.asp?VeteranID=[veteran_id]. If the DVA could expose this data through OAI-PMH then People Australia could ingest it, creating the possibility of further links. (Another job for you, Basil?) Perhaps we should start lobbying DVA.

      The other possibility is to use their service records in the National Archives. There’s info on finding them here: Once you’ve found a service record (and perhaps requested it to be digitised) you can create a persistent url using the following formula: http://www.naa.gov.au/cgi-bin/Search?O=I&Number=[barcode] – where barcode is the file barcode. Barcodes are unique and, in most cases, persistent, so they can provide a quasi-identifier.

      Once again I would hope that the NAA will start to expose its people data so it can be ingested by People Australia, but…

  4. January 21, 2010

    Note that the People Australia SRU interface that Tim is using (http://www.nla.gov.au/apps/srw/search/peopleaustralia) provides 44 searchable indexes including surname, forename, identifiers, description, possessingInstitution, biography, relatedname, relatedresource and, the catchall, anywhere.

    Also, to make life even easier, in the near future we’ll be implementing pattern matching.

  5. […] Sometimes, though, you just want to link one or two names to Trove.  Maybe you want to disambiguate someone in a blog post.  In that case, think about using Wragge’s Identity Browser. Tim Sherratt has written Wragge’s Identity Browser as a Python library that helps you to quickly generate an RDFa link to someone in Trove. That means that you can link to a person in Trove and (with the help of a Friend of a Friend (FOAF) prefix mapping) tell harvesters that this is a link to a person and that person is Grace Cossington Smith.  All that semantic goodness. Tim has described how it all works in I link therefore I am. […]

  6. […] In that same post, I mentioned how people like Tim Sherratt are starting to build ways to access the data if you are not an institution. Wragge’s Identity Browser is designed for finding just one name and the link that will point to it. This is handy if you are writing a blog post about someone, for example, and want to link to them. Tim has described how it all works in I link therefore I am. […]
