Improving Access to Archives: Navigating the RES Index
By Elliot Smith
The focus of the Research & Education Space (RES) is to improve access to public archives for use in education. RES does this by indexing the data in those archives, then exposing that index as a single, aggregated point of entry to that data.
Rather than providing end user applications, RES instead supplies a service for developers, which enables them to include archive materials in their own Powered by RES applications. Most of these applications are intended for teachers and learners; typically, those applications help users find media resources stored in public archives. Some example use cases are:
- A geography lecturer searches for a video of glacial movement.
- A secondary school pupil needs an image of tadpoles to include in a presentation.
- A physics teacher looks for a web page which explains the different states of matter.
Here are some examples of what’s already in the RES index:
- BBC Images (c. 50,000 photos) - example media
- BBC Remarc (videos, photos) - example media
- BBC Teach (videos) - example media
- BBC Shakespeare Archive (videos, photos) - example media
Note that the media isn’t stored in the index; only the metadata about where it is, what it’s about, who contributed to it, when it was created etc. Also note that, at the moment, most of the data in the index comes from the BBC; but we’re working with external partners to add their data, too.
The Acropolis API is the gateway to finding data in the RES index. There is a general technical document which explains how to use this API in detail. The rest of this article builds on that document, giving examples of the kind of data returned and how to interpret it.
Using the Acropolis API
The Acropolis API is developer-focused and not intended for end users. However, it does have a simple interface you can experiment with through a web browser. As an example of how to use and make sense of the API without having to write any code, we’ll walk through one of the use cases above: finding an image of tadpoles for a presentation. The steps below take you through the process.
1. Search for a keyword
Go to http://acropolis.org.uk/?q=tadpole in your browser. Note the q=tadpole part of the URL. This asks Acropolis to search its index for the term tadpole and return matching resources. Note that the term “resource” here has a formal definition from RDF, the data format used by Acropolis. By this definition, a resource can represent anything, real or imagined: a person, a place, an event, a novel, a play, a web page, a video, an image, an idea etc.
The results are information resources which contain your keyword(s), displayed as a list of titles and descriptions (if available):
Each link points to a web page describing a proxy resource (see below).
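The same search URL can be built in code. The following is a minimal Python sketch, assuming only that the endpoint accepts the q query parameter shown above; the helper name search_url is illustrative and not part of Acropolis.

```python
from urllib.parse import urlencode

# Base URL of the Acropolis index, as used in this article.
ACROPOLIS_BASE = "http://acropolis.org.uk/"


def search_url(term: str) -> str:
    """Return the Acropolis search URL for a keyword (hypothetical helper)."""
    # urlencode escapes the term, so spaces and punctuation are safe in the URL.
    return ACROPOLIS_BASE + "?" + urlencode({"q": term})


print(search_url("tadpole"))
```

Opening the printed URL in a browser gives the same results list as typing the address by hand.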
2. Fetch the proxies
When a search is run against Acropolis, the result contains links to proxy resources (proxies). Proxies are created by Acropolis and are effectively aggregates of resources from different archives. For example, if both BBC Images and BBC Teach have data about the concept tadpole, Acropolis combines that data into a single proxy which stands for both tadpole in BBC Images and tadpole in BBC Teach. Media from both of those archives are then associated with the proxy, as you can see in the real Acropolis proxy for tadpole.
A client (whether person or program) using the index can follow links from tadpole proxies to data and media about the concept tadpole in all of the archives indexed by Acropolis. This simplifies the task of finding media across multiple archives: a client doesn’t have to individually search each archive, and can instead just search Acropolis.
For more details about how Acropolis generates proxy resources, see this blog post which explains aggregation.
In the case of our example search, clicking the first link takes us to the tadpole proxy page:
Notice that one section of the page is expanded. This section of the page contains data about the resource which exactly matches the URL in the browser address bar (http://acropolis.org.uk/4c56607b4e1b4fe8ac29a1ad1b245246#id); i.e. the proxy representing the concept of tadpole, mostly derived from data extracted from DBPediaLite.
The two columns inside the expandable section display statements about the proxy. For example:
dct:description "larva of amphibians" [en]
rdfs:seeAlso /15f8690cf16148f88db57bd79fe00595#id
You can read these statements as <proxy> has a <left-hand column value> property with a value <right-hand column value>, e.g.
the tadpole proxy has a dct:description property with a value '"larva of amphibians" [en]'
The property dct:description refers to the term description from the dct vocabulary; clicking on dct:description will take you to the web page which describes that vocabulary (DCMI Metadata Terms). So what the statement really says is:
the tadpole proxy has a description property (as defined in the DCMI Metadata Terms vocabulary) with a value '"larva of amphibians" [en]'
The reason for specifying the vocabulary for each property is to reduce ambiguity. Because the DCMI Metadata Terms vocabulary formally defines the meaning of description, a machine reading dct:description can differentiate this from other meanings of description in other vocabularies.
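To make the disambiguation concrete, here is a small Python sketch which expands a prefixed name such as dct:description into a full URI. It assumes the prefixes are bound to their conventional namespace URIs; Acropolis declares its own bindings in the data it returns, so treat this table as illustrative. The function name expand is hypothetical.

```python
# Conventional namespace URIs for the prefixes used in this article.
# Assumption: Acropolis binds these prefixes to the same namespaces.
PREFIXES = {
    "dct": "http://purl.org/dc/terms/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "mrss": "http://search.yahoo.com/mrss/",
}


def expand(name: str) -> str:
    """Expand a prefixed name like 'dct:description' into a full URI."""
    prefix, _, local = name.partition(":")
    if prefix not in PREFIXES:
        raise KeyError("unknown prefix: " + prefix)
    return PREFIXES[prefix] + local


print(expand("dct:description"))
```

Two vocabularies can both define a term called description, but their expanded URIs differ, which is what lets a machine tell them apart.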
Given a set of result proxies, a client can use their properties to find media players and assets depicting them, as described in the next section.
3. Find the media
Finding the media linked to a resource requires some understanding of how Acropolis organises its index. Acropolis uses two main properties to associate resources with media:
- mrss:player
This links a resource to a player: an HTML page (typically) which embeds some media, and which may provide controls for playing it and/or (optionally) some metadata about it (e.g. caption, copyright declaration, links to alternative sizes or formats). An iPlayer page is an example of a player, as are a YouTube video page and a Flickr image page.
The player for a piece of media may prompt for authentication, to prevent access by unauthorised users. (Note that Acropolis doesn’t require any authentication and relies on the server hosting the player to do this.)
- mrss:content
This links a resource to a digital asset (e.g. an audio, image or video file). The asset can be directly embedded into an application through its URI. For example, this resource has an mrss:content statement in the index:
mrss:content http://remarc-assets.pilots.bbcconnectedstudio.co.uk/images/1980_TV&Radio_CrackerjackA.jpg
The URI is a direct link to a JPEG image asset which can be embedded directly in an application. As with players, Acropolis doesn’t enforce any access control; but a client can expect that an asset linked to using mrss:content has no access controls and can be freely used (with the caveat that its copyright must be respected).
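As an illustration of embedding an mrss:content asset, the Python sketch below turns the asset URI above into an HTML img element. The helper name img_tag and the alt text are placeholders chosen for this example. Note that the URI contains an ampersand, which must be escaped when written into HTML.

```python
from html import escape

# The mrss:content value quoted in this article.
ASSET = (
    "http://remarc-assets.pilots.bbcconnectedstudio.co.uk"
    "/images/1980_TV&Radio_CrackerjackA.jpg"
)


def img_tag(asset_uri: str, alt: str) -> str:
    """Return an <img> element for an image asset (hypothetical helper)."""
    # escape() converts &, <, > and quotes so the attribute values are valid HTML.
    return '<img src="{}" alt="{}">'.format(
        escape(asset_uri, quote=True), escape(alt, quote=True)
    )


# "Archive image" is placeholder alt text, not metadata from the index.
print(img_tag(ASSET, "Archive image"))
```

An mrss:player link, by contrast, points to a page rather than a file, so an application would normally link to it or frame it instead of using it as an image source.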
For the purposes of finding media, a client can look for mrss:player and mrss:content statements attached to any proxies it has retrieved. Unfortunately, the tadpole proxy we have at the moment has no mrss:player or mrss:content statements. However, in such cases, we can fetch more proxies which are related to the resources we already have, in the hope that they will have mrss:player or mrss:content properties.
The most relevant are likely to be those associated with the proxy through rdfs:seeAlso statements: these resources have some kind of relationship with the proxy, such as representing the same entity, having the proxy as a subject, or being about the proxy.
For example, the first rdfs:seeAlso link for the tadpole proxy goes to this resource…
…which is a photograph of tadpoles.
The photo relates to our original tadpole proxy via rdfs:seeAlso. This is because the photo is about a Tadpole, as defined by DBPedia resource…
dct:subject http://dbpedia.org/resource/Tadpole
…and our original tadpole proxy is the same as (owl:sameAs) the Tadpole, as defined by DBPedia resource…
http://acropolis.org.uk/4c56607b4e1b4fe8ac29a1ad1b245246#id owl:sameAs http://dbpedia.org/resource/Tadpole
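The link between the two statements above can be sketched in Python: both point at the same DBPedia URI, so the photo and the proxy can be paired up. This is only an illustration of the relationship, not a description of how Acropolis itself derives rdfs:seeAlso. The photo proxy URI is a placeholder, and related_by_subject is a hypothetical helper.

```python
TADPOLE_PROXY = "http://acropolis.org.uk/4c56607b4e1b4fe8ac29a1ad1b245246#id"
# Placeholder URI: the real photo proxy URI is not quoted here.
PHOTO_PROXY = "http://acropolis.org.uk/photo-proxy-placeholder#id"
DBPEDIA_TADPOLE = "http://dbpedia.org/resource/Tadpole"

# Statements as (subject, property, value) tuples.
STATEMENTS = [
    (PHOTO_PROXY, "dct:subject", DBPEDIA_TADPOLE),
    (TADPOLE_PROXY, "owl:sameAs", DBPEDIA_TADPOLE),
]


def related_by_subject(statements):
    """Pair each proxy with the resources whose subject it is the same as."""
    # Map each external URI to the proxies declared owl:sameAs it.
    same_as = {}
    for subject, prop, value in statements:
        if prop == "owl:sameAs":
            same_as.setdefault(value, []).append(subject)
    # A resource is related to a proxy if its dct:subject is one of those URIs.
    pairs = []
    for subject, prop, value in statements:
        if prop == "dct:subject":
            for proxy in same_as.get(value, []):
                pairs.append((proxy, subject))
    return pairs


print(related_by_subject(STATEMENTS))
```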
In diagrammatic form, the relationships look like this:
As you can see, there is an mrss:player statement on the photo proxy:
mrss:player http://bbcimages.acropolis.org.uk/14605105/player
This URI is for a player page which displays an image of tadpoles (you may need to accept a licence agreement before you can see it).
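The whole of this step can be summarised as a short Python sketch: look for media statements on a proxy and, if there are none, try the proxies it links to through rdfs:seeAlso. The index here is a hand-written dictionary standing in for data fetched from Acropolis; the photo proxy URI is a placeholder, and the function names are hypothetical.

```python
TADPOLE_PROXY = "http://acropolis.org.uk/4c56607b4e1b4fe8ac29a1ad1b245246#id"
# Placeholder URI standing in for the photo proxy.
PHOTO_PROXY = "http://acropolis.org.uk/photo-proxy-placeholder#id"

# A tiny in-memory stand-in for the index: resource -> property -> values.
INDEX = {
    TADPOLE_PROXY: {"rdfs:seeAlso": [PHOTO_PROXY]},
    PHOTO_PROXY: {
        "mrss:player": ["http://bbcimages.acropolis.org.uk/14605105/player"]
    },
}

MEDIA_PROPERTIES = ("mrss:player", "mrss:content")


def media_links(index, uri):
    """Return (property, value) pairs for media attached directly to uri."""
    props = index.get(uri, {})
    return [(p, v) for p in MEDIA_PROPERTIES for v in props.get(p, [])]


def find_media(index, uri):
    """Return media for uri, falling back to its rdfs:seeAlso resources."""
    found = media_links(index, uri)
    if found:
        return found
    # Nothing attached directly: follow one hop of rdfs:seeAlso links.
    for related in index.get(uri, {}).get("rdfs:seeAlso", []):
        found.extend(media_links(index, related))
    return found


print(find_media(INDEX, TADPOLE_PROXY))
```

The sketch follows rdfs:seeAlso for a single hop only; a real client would need to decide how far to follow links and how to avoid revisiting resources.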
And, finally, we’ve achieved our goal of finding an image of some tadpoles.
Conclusions
The above only covers a small part of the capabilities of the Acropolis API, but should provide some clues to finding useful media before you write any code. The same steps can be used programmatically to find media, the main difference being that your code would fetch and parse RDF rather than HTML. But that’s a topic for another article.
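As a starting point for that programmatic route, the Python sketch below prepares a request for a proxy which asks for RDF instead of HTML. It assumes that Acropolis honours the HTTP Accept header and can serve Turtle (text/turtle); check the technical document for the formats actually supported. The helper name rdf_request is hypothetical, and the request is built but not sent.

```python
from urllib.parse import urldefrag
from urllib.request import Request

TADPOLE_PROXY = "http://acropolis.org.uk/4c56607b4e1b4fe8ac29a1ad1b245246#id"


def rdf_request(resource_uri, media_type="text/turtle"):
    """Build a request for the document describing a resource."""
    # The "#id" fragment identifies the resource within the document and is
    # never sent to the server, so drop it to get the document URL.
    document_url = urldefrag(resource_uri).url
    # Assumption: the server uses the Accept header to choose RDF over HTML.
    return Request(document_url, headers={"Accept": media_type})


req = rdf_request(TADPOLE_PROXY)
print(req.full_url, req.get_header("Accept"))
```

Passing the request to urllib.request.urlopen would perform the fetch; the response body could then be handed to an RDF parser.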
Some developers have already built applications which use data from Acropolis, using techniques similar to these. You can find out more about them on the RES website.
If you need any assistance getting started with the Acropolis API, please contact the RES team.