Unlocking collections: Casting new light with AI

The possibility of using Generative AI in a positive way to enhance collection accessibility and findability has sparked a pilot project by Curator History Katie Cooper and Collections Data Manager Gareth Watkins. In the first of a series of four blogs, Katie and Gareth take you on their AI journey.

We’re blogging our Generative AI journey as it happens. We don’t quite know where we’ll land, but our plan is to blog as we build on our experiences. So, we’ll follow up this blog with one looking at some of the Gen AI content and how we analysed it. We’ll then do some writing about rights and attribution, and finally look at the impact in terms of collection access, cost and the environment.

The overall aim of the pilot project is to test the usefulness and accuracy of Gen AI when creating descriptions of collection items that have been photographed but do not yet have an online text description.

Text descriptions are important for several reasons. Firstly, text is the main way people search, both on the web and on Collections Online. If a collection item isn’t well described through its title, description and tags, it becomes very hard to discover. Secondly, while a collection item may have an image showing what it looks like, the lack of an accompanying text description can limit access for people with a vision impairment.

As Gen AI is a field that is developing very fast, we decided to propose a project to our internal AI Guidance Group that didn’t require a lot of upfront cost and development. We proposed to use an already trained AI model on the OpenAI platform. We also wanted to be able to experiment without having to ultimately commit to publishing the descriptions.

Choosing the Collection

After careful consideration, Katie proposed the Silversmith collection found in the General History collection. In 1997, Te Papa purchased approximately 1,000 nineteenth- and twentieth-century silversmiths’ and jewellers’ tools. The collection was started by jeweller Norris Blaxall in the early 1920s, and his son continued to assemble and document the collection during research into the history of the silversmith and jewellery trades.

Four tools used for silversmithing laid out in a line on a white background.
Silversmith tools, left to right: reamer (GH006547), punch (GH006535), punch (GH006536), burnisher (GH006544). Purchased 1997 with New Zealand Lottery Grants Board funds. Te Papa

The collection provides a wonderful insight into the process of making and decorating silverware and jewellery, and many of the tools in the collection can be associated with specific makers. For example, the collection includes dies, punches, burnishers, stakes, vices, and reamers used by Frank Grady, a Wellington jeweller whose fine silverware is also represented in Te Papa’s collection.

Two silver knife rests in the form of the initials N and Z, with rests made of pounamu, or New Zealand greenstone
Knife rests, sterling silver and pounamu, Frank Grady, Wellington, Purchased 2017. Te Papa GH025106

Currently there are around 900 tools that have been photographed but do not have a published description.

As the collection comprises mass-produced work objects, rather than images of specific people or personal effects, we thought it was a great starting point for our journey into creating Gen AI content. We may then be able to apply what we’ve learnt to more personalised collection items in the future, for example describing artworks or people.

Additionally, the relatively small size of the collection allows us to test and assess the environmental, financial and rights implications of using Gen AI content in this way.

Influencing the Gen AI response

Firstly, we needed to understand the things that influence the quality of the Gen AI response. These included:

  • Selecting the right AI model.
  • Deciding whether to have the AI model use high or low image interpretation.
  • Supplying contextual information about the collection item to increase the AI model’s understanding of what it was looking at.
  • Supplying more than one image of the collection item for the AI model to assess.
  • Creating a successful System and User prompt.

We are currently experimenting with OpenAI’s GPT-4o mini model – a small, low-cost model with lower energy consumption, making it a more environmentally friendly choice.

One of the things that we are still working through is whether to get the AI model to use high or low image interpretation. Using a high setting potentially provides a more detailed description, but at a higher cost. And is a high level of image detail required given the nature of this particular collection?
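
For the technically curious, that setting corresponds to a single field in the request. Here is a minimal sketch of one image attachment as it might be sent to OpenAI’s Chat Completions API; the URL is just a placeholder, not one of our collection images.

```python
# A minimal sketch (not our production code) of one image attachment.
# The "detail" field switches between cheaper ("low") and richer ("high") image interpretation.
image_block = {
    "type": "image_url",
    "image_url": {
        "url": "https://example.org/collection-item.jpg",  # placeholder image URL
        "detail": "low",
    },
}
```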

We found that the more contextual information we supplied to the AI model, the better the result. Knowing this creates a real incentive for people working with our collections to add as much detail as possible into EMu, our collection management system. Title, dimensions, materials, maker, inscriptions and production place all help the AI model understand the object and craft a response.

But probably the biggest eureka moment for us so far was when Katie suggested we send not just one image per collection item to the AI model, but two, so that the model can understand the object’s three-dimensional structure. Providing only a top-down view sometimes led the model to generate factually incorrect information, such as “hallucinating” non-existent handles on objects.
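
To illustrate, both photographs can travel in a single user message alongside the text prompt. The sketch below assumes the same Chat Completions message format; the prompt text and URLs are placeholders rather than our actual data.

```python
# Placeholder values standing in for our real data.
item_prompt = "The object in the image is part of a silversmith's tool set. ..."
top_view_url = "https://example.org/item-top.jpg"
side_view_url = "https://example.org/item-side.jpg"

# One user message carrying the text prompt plus two views of the same object.
user_message = {
    "role": "user",
    "content": [
        {"type": "text", "text": item_prompt},
        {"type": "image_url", "image_url": {"url": top_view_url, "detail": "low"}},
        {"type": "image_url", "image_url": {"url": side_view_url, "detail": "low"}},
    ],
}
```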

Stateless vs. Learning-Based Model

While we are creating numerous descriptions, each of these is generated in a stateless interaction with the AI model. This means that we are not carrying over context or memory from one object to another.

For each description, we send a request to the AI model with information and images about that specific collection item. The AI model returns a response and then we repeat the process without preserving any memory of previous interactions.
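
In code, that stateless pattern looks roughly like the sketch below, using the OpenAI Python client. The helper and list names are our own placeholders; the key point is that each request contains only the system prompt and that one item’s message.

```python
from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

SYSTEM_PROMPT = "Your role is to describe in detail the physical object..."  # abridged; see Prompts below

def describe_item(user_message: dict) -> str:
    """Send one stateless request: only the system prompt and this item's message go to the model."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "system", "content": SYSTEM_PROMPT}, user_message],
    )
    return response.choices[0].message.content

# Each call below is independent; nothing from earlier objects is remembered.
# (user_messages is a hypothetical list of per-item messages like the one sketched above.)
drafts = [describe_item(message) for message in user_messages]
```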

An alternative approach would be to train an in-house AI model on our collections data so that it learns from each interaction. This could be useful if you were trying to summarise a body of work, for example drawing linkages between all the artworks by a particular artist.

Prompts

Finally, we experimented with creating both a System prompt and User prompts.

The System prompt is set up once and defines the role of the AI model. Our current System prompt is:

Your role is to describe in detail the physical object in the supplied images to a person with a visual impairment, ensuring the description is close to 300 words. Where possible, two images of the same object will be provided by the user. Your response should combine details from all images. Do not reference the images separately because it is the same object. Do not use any external knowledge beyond the supplied images and user prompt. Ignore the background surrounding the object. Do not extrapolate dimensions, weight, or describe parts of the object that aren’t visible. Do not extrapolate on the object’s significance or usage. Spell in New Zealand English and do not repeat the image title.

Then the User prompt changes for each collection item. The prompt includes two images of the collection item and information from our database. For example:

A split image, with the left side showing a golden-coloured badge with an insignia pressed into it, and the right side showing the back of the badge, which is orange and bears the words “New Zealand Badges made from original dies Limited Edition Artillery Circa 1901”
Badge (front and back), Cimino Jewellers; manufacturer(s); about 1901; New Zealand. Te Papa (GH005881/63)

The object in the image is part of a silversmith’s tool set. It has been given the title: Badge. It was made by Cimino Jewellers (manufacturer) in New Zealand, circa 1901. It is made of paper and metal. It is identified as badges and commemoratives. The object pictured contains a verified inscription, which should always be incorporated into your response: “Royal NZ Artillery”
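
For illustration, a user prompt like the one above can be assembled from catalogue fields with a simple template. The function below is a sketch of how that text might be put together; the field names are hypothetical stand-ins for the EMu fields we actually use.

```python
def build_item_prompt(item: dict) -> str:
    """Compose the per-item text prompt from catalogue fields (hypothetical field names)."""
    prompt = (
        "The object in the image is part of a silversmith's tool set. "
        f"It has been given the title: {item['title']}. "
        f"It was made by {item['maker']} ({item['maker_role']}) in {item['production_place']}, {item['date']}. "
        f"It is made of {item['materials']}. "
        f"It is identified as {item['classification']}."
    )
    if item.get("inscription"):
        prompt += (
            " The object pictured contains a verified inscription, which should always be "
            f"incorporated into your response: “{item['inscription']}”"
        )
    return prompt
```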

The Result

And what about the generated Gen AI descriptions – how did they turn out?

Well, that will be for a future blog, as we’re still working through the copyright and licensing implications of publishing Gen AI content. But to quote from a colleague who has read through some of the draft summaries, “Wow Katie and Gareth – this is fascinating!”
