Being sure that we’re all talking about the same person is a vital part of sharing collections and data between sources. Lucy Schrader, Digital Channels Outreach Manager, shows how we’re using Wikidata to make it easier.
Bob (IRN 45580)’s your uncle (Q888246)
Names are great. They’re steeped in culture, history and personality, and we feel deeply attached to them. I should know, I spent ages choosing mine. What they aren’t is unique.
Bob Semple is Bob Semple, but not the Semple family couple, and definitely not the Semple Tank.
This can make research, attribution, and sharing information much harder. If you’re looking at the work of one A. Cromwell Shepherd, you might have to do a lot of extra work filtering out things that have nothing to do with your Shepherd.
To make things harder, you might have dozens of A. Shepherds to check, or ‘Cromwell’ may be an old family name shared across generations, making it easy to confuse one relative with another.
This is why we have Wikipedia disambiguation pages. Often, a name is simply not enough to find the right person, so you add some context like dates, nationality, or what they’re known for.
However, there are ways to build up the certainty that this person is in fact that person. Like most collecting organisations that have records about people, Te Papa’s collection management system EMu assigns a unique identifier to each one. This is the number at the end of the URL when you view a person’s page on Collections Online.
When we want to associate a person with a collection object, we don’t just say that it was created by “A. Cromwell Shepherd”, we link the object record with the person record, using their internal record number (IRN).

We do the same thing with places, categories, species names, and more.
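To show why linking by identifier beats linking by name, here is a minimal Python sketch. It is not how EMu stores data: the record shapes, field names, IRNs, and titles below are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    irn: int   # internal record number (hypothetical values below)
    name: str


@dataclass(frozen=True)
class CollectionObject:
    title: str
    maker_irn: int  # points at a Person record, not at a name string


# Two different people who happen to share a name (invented records).
people = {
    101: Person(101, "A. Shepherd"),
    102: Person(102, "A. Shepherd"),
}

objects = [
    CollectionObject("Harbour sketch", maker_irn=101),
    CollectionObject("Portrait study", maker_irn=102),
    CollectionObject("Hillside", maker_irn=101),
]


def works_by(irn, objects):
    """Return the titles of objects linked to one specific person record."""
    return [obj.title for obj in objects if obj.maker_irn == irn]


# A name search cannot tell the two Shepherds apart; an IRN lookup can.
print(works_by(101, objects))
```

Searching on the name alone would return all three objects. Searching on the IRN returns only the two that belong to the Shepherd you actually care about.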
That’s helpful when you’re dealing with Te Papa material, but where things get interesting is that because we have a nice stable identifier for someone, we can match it up with another organisation’s identifier. When we both know that Rita Angus (Te Papa IRN 72) is the same person as Rita Angus (Auckland Art Gallery artist ID 461), a researcher can find more of Rita’s work with less fuss.
Admittedly, Rita’s not the trickiest person to track down, but the same principle applies for the 63,000 Party records in our system. By connecting our identifiers with external IDs like Getty’s Union List of Artist Names or the researcher authority ORCID, all sorts of connections become simpler.
Networking connectivity with Wikidata
Wikidata is the data backbone for Wikipedia. If a Wikipedia article has a helpful infobox on the side, it’s probably being filled in by structured data from a Wikidata item. It means the same information can be used in many different places, including on different language versions of Wikipedia.

Like us, Wikidata gives unique identifiers (QIDs, because they start with Q) to each item, making it another place we can link up to.
Wikidata items for people can store biographical information like their birth date and place, family relationships, and where they worked. They can also hold a huge range of identifiers, meaning if we point to an item, we’re also pointing to records for that person from around the world.
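As a rough illustration of how one item can carry many identifiers, the Python sketch below reads external IDs out of a dictionary shaped like Wikidata’s entity JSON (statements grouped by property under "claims"). The sample item is hand-written with placeholder values, not a real record, and the property numbers are assumptions worth checking on Wikidata before relying on them.

```python
# Property numbers are assumptions for illustration; verify them on Wikidata.
PROPERTY_LABELS = {
    "P3544": "Te Papa agent ID",
    "P245": "ULAN ID",
    "P496": "ORCID iD",
}


def external_ids(entity, labels=PROPERTY_LABELS):
    """Collect the values of selected identifier properties from one item."""
    found = {}
    for pid, label in labels.items():
        values = []
        for statement in entity.get("claims", {}).get(pid, []):
            snak = statement.get("mainsnak", {})
            # Skip "no value" and "unknown value" statements.
            if snak.get("snaktype") == "value":
                values.append(snak["datavalue"]["value"])
        if values:
            found[label] = values
    return found


def string_statement(value):
    """Build a minimal statement in the entity JSON shape (helper for the sample)."""
    return {
        "mainsnak": {
            "snaktype": "value",
            "datavalue": {"value": value, "type": "string"},
        }
    }


# Hand-written placeholder item, not a real Wikidata record.
sample_item = {
    "id": "Q0",
    "claims": {
        "P3544": [string_statement("72")],
        "P245": [string_statement("500000000")],
    },
}

print(external_ids(sample_item))
```

Once we hold the QID, every identifier on that item is one lookup away, which is what makes a single link so useful.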

When we find a match in Wikidata, we store that QID in EMu so it can be used later, including on the person’s Collections Online page.
But to get back to disambiguation, it’s not always obvious who’s who – the A. Shepherds and similar. If we want to link to Wikidata and the wider world, we should be pretty sure we’ve got the right person.
Connecting with confidence
Mix’n’Match is a tool by Magnus Manske that crowdsources disambiguation with a combination of automatic matching and human review, and it’s helped us make thousands of new matches already.
We use it by loading a spreadsheet of our person data (just the information that’s already public on Collections Online, by the way). First, the system checks for any matches it can make by itself. For example, if our data already has a Wikidata QID, that’s a definite yes, while a matching name and birth year is a maybe.
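The “definite yes” and “maybe” tiers can be sketched in a few lines of Python. This is not Mix’n’Match’s actual matching logic, just a simplified illustration of the idea; the field names, QIDs, and birth years are invented.

```python
def normalise(name):
    """Lower-case a name and drop full stops so 'A. Shepherd' equals 'A Shepherd'."""
    return " ".join(name.lower().replace(".", " ").split())


def classify(record, candidates):
    """Return (status, qid) for one of our person records.

    'definite'  - our record already carries a QID
    'maybe'     - a candidate shares the name and birth year; a human should check
    'unmatched' - nothing plausible found
    """
    if record.get("qid"):
        return ("definite", record["qid"])
    if record.get("birth_year") is not None:
        for candidate in candidates:
            same_name = normalise(candidate["name"]) == normalise(record["name"])
            same_year = candidate.get("birth_year") == record["birth_year"]
            if same_name and same_year:
                return ("maybe", candidate["qid"])
    return ("unmatched", None)


# Invented candidates: same name, different generations.
candidates = [
    {"qid": "Q1001", "name": "A Cromwell Shepherd", "birth_year": 1880},
    {"qid": "Q1002", "name": "A. Cromwell Shepherd", "birth_year": 1912},
]

ours = {"irn": 1, "name": "A. Cromwell Shepherd", "birth_year": 1912, "qid": None}
print(classify(ours, candidates))
```

A name match on its own is deliberately left as “unmatched” here, because that is exactly the case where two generations of Shepherds get confused and a person needs to look at the clues.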
Then anyone who’s interested can dig into the possible matches and unmatched records, looking at the available clues to decide if a match can be made. If a person isn’t in Wikidata, it’s easy to create a new item for them using the uploaded data.

We now have over 9,000 people matched to Wikidata items, helped along by a range of enthusiastic volunteers. We’ll be refreshing the Mix’n’Match dataset regularly to include new people and extra information.
Python code for combining EMu and API data and outputting it for Mix’n’Match
See how to use Mix’n’Match – YouTube
Roundtripping
Mix’n’Match adds our identifiers to Wikidata, but we also want Wikidata’s QIDs in EMu. This is known as roundtripping and it makes connections more robust and maintainable.
The Biodiversity Heritage Library has a great post about roundtripping and how they’re using it.
To make this happen, Collections Data Manager Gareth Watkins created a monthly report that checks our data in EMu against Wikidata, and tells us if anything needs updating in either location.
This video has more about our data and how the report works, so give it a watch:
Perl code for EMu roundtripping report.
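The real report is the Perl linked above. For readers who just want the gist, here is a simplified Python sketch of the comparison a roundtripping check makes. The input shapes (a mapping of IRN to QID from EMu, and a mapping of QID to IRN from Wikidata) and all the values are assumptions for illustration, and the sketch ignores awkward cases such as one item carrying several IRNs.

```python
def roundtrip_report(emu, wikidata):
    """Compare both directions of the link and list what needs attention.

    emu:      {irn: qid or None}  - what our records say
    wikidata: {qid: irn}          - what Wikidata items say about our IRNs
    """
    wikidata_by_irn = {irn: qid for qid, irn in wikidata.items()}
    report = {"add_qid_to_emu": [], "add_irn_to_wikidata": [], "conflict": []}

    for irn, emu_qid in sorted(emu.items()):
        wd_qid = wikidata_by_irn.get(irn)
        if emu_qid is None and wd_qid is not None:
            # Wikidata points at us, but we do not point back yet.
            report["add_qid_to_emu"].append((irn, wd_qid))
        elif emu_qid is not None and wd_qid is None:
            # We point at Wikidata, but no item carries this IRN.
            report["add_irn_to_wikidata"].append((irn, emu_qid))
        elif emu_qid is not None and emu_qid != wd_qid:
            # Both sides have a link, but they disagree.
            report["conflict"].append((irn, emu_qid, wd_qid))
    return report


# Invented data covering each case.
emu = {1: "Q1", 2: None, 3: "Q3", 4: "Q4", 5: None}
wikidata = {"Q1": 1, "Q2": 2, "Q40": 4}

print(roundtrip_report(emu, wikidata))
```

Conflicts are kept in their own list rather than resolved automatically, since deciding which side is right is a job for a person.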
What comes next
With Wikidata QIDs in hand, we can start doing more with the data they connect us to. We’re considering whether we might import details that we’re missing from our records, like full names, dates, places, or even images.
This will require some thought about how we decide which sources are reliable, and whether we store data like this in EMu or as part of Collections Online.
We’re also looking at technical improvements that make all this linking easier for us and anyone who wants to link to us. This might include revisions to EMu’s Parties module and adding resolvable URIs to our API, so watch out for those.
If you’re interested in how we handle identifiers, how we use Mix’n’Match, or the report we run to roundtrip, get in touch at digitaloutreach@tepapa.govt.nz.







