OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting) is an international protocol for requesting metadata from a repository.
What is OAI-PMH?
In brief, OAI-PMH provides a set of services for exposing and harvesting repository metadata. The protocol consists of six verbs, each specifying the service being invoked:
- Identify - used to retrieve information about the repository.
- ListIdentifiers - used to retrieve record headers from the repository.
- ListRecords - used to harvest full records from the repository.
- ListSets - used to retrieve the set structure of the repository.
- ListMetadataFormats - used to retrieve the metadata formats that the repository can disseminate.
- GetRecord - used to retrieve an individual record from the repository.
Selective harvesting is done with accompanying parameters. The available parameters are:
- identifier - the unique identifier of a single record (used with GetRecord and ListMetadataFormats).
- metadataPrefix - specifies the metadata format that the records will be returned in.
- set - specifies the set that returned records must belong to.
- from - specifies that records returned must have been created, updated or deleted on or after this date.
- until - specifies that records returned must have been created, updated or deleted on or before this date.
- resumptionToken - a token previously provided by the server to resume a request where it last left off.
The verbs and parameters can be combined to issue requests to the service.
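The verbs and parameters are sent as query-string arguments of an HTTP request. A minimal Python sketch of building such a GET request, assuming a hypothetical endpoint `https://repository.example.org/oai`; `build_request` is an illustrative helper, not part of any library:

```python
from typing import Mapping, Optional
from urllib.parse import urlencode

# Hypothetical base URL, for illustration only.
BASE_URL = "https://repository.example.org/oai"

# The six verbs defined by OAI-PMH.
VERBS = {"Identify", "ListIdentifiers", "ListRecords",
         "ListSets", "ListMetadataFormats", "GetRecord"}


def build_request(verb: str, arguments: Optional[Mapping[str, str]] = None) -> str:
    """Return a GET URL for `verb` combined with the given parameters."""
    if verb not in VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    arguments = dict(arguments or {})
    # resumptionToken is exclusive: apart from the verb, nothing may accompany it.
    if "resumptionToken" in arguments and len(arguments) > 1:
        raise ValueError("resumptionToken cannot be combined with other parameters")
    # A dict is used so that "from", a Python keyword, can be passed as a key.
    return f"{BASE_URL}?{urlencode({'verb': verb, **arguments})}"


# Selective harvest: lom records in set "testset" changed since 14 January 2014.
print(build_request("ListRecords",
                    {"metadataPrefix": "lom", "set": "testset", "from": "2014-01-14"}))
# → https://repository.example.org/oai?verb=ListRecords&metadataPrefix=lom&set=testset&from=2014-01-14
```

Validating the verb locally catches typos before a request is sent; the repository still reports invalid argument combinations itself.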
The OAI identifier uniquely identifies a metadata record, not the learning material it describes. Repositories offering metadata must therefore ensure an unambiguous reference (a “unique identifier”) to the metadata associated with each piece of educational content. Based on this reference, the metadata collecting application can determine whether the supplied data concerns a new learning resource or an update to an existing one. The following agreement contains the official specification:
The resumptionToken is used to retrieve the next page of a selection. The selection itself is determined by the arguments of the first request: set, metadataPrefix, from and until. The resumptionToken must therefore encode these arguments in addition to the page position. If there is no next page, the response must not contain a (non-empty) resumptionToken; otherwise the harvester would keep requesting pages in an infinite loop.
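On the harvester side, this means following tokens until a response carries no non-empty resumptionToken. A minimal Python sketch, assuming a hypothetical `fetch` callable that sends the parameters to the repository and returns the XML response body; as an extra safeguard it stops with an error if the repository hands out the same token twice:

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, Iterator

NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def harvest(fetch: Callable[[Dict[str, str]], str],
            first_request: Dict[str, str]) -> Iterator[ET.Element]:
    """Yield every <record> of a list request, following resumptionTokens.

    `fetch` is a hypothetical callable that performs the HTTP request
    and returns the response body as a string.
    """
    params = dict(first_request)
    seen_tokens = set()
    while True:
        root = ET.fromstring(fetch(params))
        yield from root.iterfind(".//oai:record", NS)
        token_element = root.find(".//oai:resumptionToken", NS)
        token = (token_element.text or "").strip() if token_element is not None else ""
        if not token:
            return  # no (non-empty) token: this was the last page
        if token in seen_tokens:
            raise RuntimeError(f"repository repeated resumptionToken {token!r}")
        seen_tokens.add(token)
        # Follow-up requests carry only the verb and the token.
        params = {"verb": first_request["verb"], "resumptionToken": token}
```

Injecting `fetch` keeps the paging logic independent of the HTTP client and makes it easy to test against canned responses.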
An example with a page size of 100: the response to the first request for set testset in metadata format lom contains the following resumptionToken:
<oai:resumptionToken cursor="0" completeListSize="605">100-1389654000-testset-lom</oai:resumptionToken>
The individual values are separated by a hyphen ('-'): the offset of the next page (100), the from argument converted to a UNIX timestamp (1389654000), the set (testset) and the metadataPrefix (lom).
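The next request passes only this resumptionToken. The set and metadataPrefix are no longer included because the token already contains them (OAI-PMH treats resumptionToken as an exclusive argument). The response to that request contains the token for the following page: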
<oai:resumptionToken cursor="100" completeListSize="605">200-1389654000-testset-lom</oai:resumptionToken>
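The token layout in these examples can be produced and parsed as sketched below. This follows only the field order visible in the examples (offset, from, set, metadataPrefix); the examples do not show how until is encoded, so it is left out. Because setSpec and metadataPrefix values may themselves contain hyphens, the sketch assumes they do not. The names `TokenState`, `from_to_unix`, `encode_token`, `decode_token` and `next_token` are illustrative. For reference, 1389654000 corresponds to 2014-01-13T23:00:00Z.

```python
import calendar
from datetime import datetime
from typing import NamedTuple, Optional

SEPARATOR = "-"


class TokenState(NamedTuple):
    """Selection and paging state carried by the token (assumed layout)."""
    offset: int           # index of the first record of the page to return
    from_ts: int          # the 'from' argument as a UNIX timestamp
    set_spec: str         # the 'set' argument
    metadata_prefix: str  # the 'metadataPrefix' argument


def from_to_unix(from_arg: str) -> int:
    """Convert a UTC datestamp (day or seconds granularity) to a UNIX timestamp."""
    for fmt in ("%Y-%m-%dT%H:%M:%SZ", "%Y-%m-%d"):
        try:
            return calendar.timegm(datetime.strptime(from_arg, fmt).timetuple())
        except ValueError:
            continue
    raise ValueError(f"not an OAI-PMH datestamp: {from_arg}")


def encode_token(state: TokenState) -> str:
    return SEPARATOR.join([str(state.offset), str(state.from_ts),
                           state.set_spec, state.metadata_prefix])


def decode_token(token: str) -> TokenState:
    # Assumes set and metadataPrefix never contain the '-' separator.
    offset, from_ts, set_spec, prefix = token.split(SEPARATOR)
    return TokenState(int(offset), int(from_ts), set_spec, prefix)


def next_token(current: TokenState, page_size: int,
               complete_list_size: int) -> Optional[str]:
    """Token for the page after `current`, or None if `current` is the last page."""
    next_offset = current.offset + page_size
    if next_offset >= complete_list_size:
        return None  # last page: no (non-empty) token, so harvesting stops
    return encode_token(current._replace(offset=next_offset))


first = TokenState(0, from_to_unix("2014-01-13T23:00:00Z"), "testset", "lom")
print(next_token(first, 100, 605))  # → 100-1389654000-testset-lom
```

Decoding "100-1389654000-testset-lom" and calling `next_token` again yields "200-1389654000-testset-lom", matching the second response; at offset 600 of 605 it returns None.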
OAI-PMH requires a repository to record when each item was last modified (created, updated or deleted), so a repository that offers metadata must store the moment of every change. The harvesting application can then use the from argument to fetch only the items that have been modified since its previous harvest.
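A minimal sketch of building such an incremental request, assuming the harvester stores a hypothetical `last_harvest` time (for example, the moment its previous harvest started). Since from is inclusive, records changed at exactly that moment are simply harvested again, which is harmless. Whether seconds may be used depends on the granularity the repository advertises in its Identify response.

```python
from datetime import datetime, timezone
from typing import Dict, Optional


def incremental_request(metadata_prefix: str,
                        last_harvest: Optional[datetime],
                        set_spec: Optional[str] = None,
                        seconds_granularity: bool = True) -> Dict[str, str]:
    """Parameters for a ListRecords request limited to changes since last_harvest."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        params["set"] = set_spec
    if last_harvest is not None:
        # OAI-PMH datestamps are UTC; last_harvest should be timezone-aware.
        utc = last_harvest.astimezone(timezone.utc)
        # Use YYYY-MM-DDThh:mm:ssZ only if the repository supports it,
        # otherwise fall back to day granularity (YYYY-MM-DD).
        fmt = "%Y-%m-%dT%H:%M:%SZ" if seconds_granularity else "%Y-%m-%d"
        params["from"] = utc.strftime(fmt)
    return params
```

Passing `None` for the first run produces a full harvest; every later run only asks for the changes.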
Keeping metadata up to date is essential for a repository to be used successfully. OAI-PMH lets the offering repository indicate that an item has been deleted by adding the status="deleted" attribute to the record header. This allows the metadata collecting application to remove the item from its own repository while harvesting.
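This means that every identifier that has ever been offered must remain retrievable indefinitely. This also applies to records that have been permanently deleted from the offering repository: they keep being returned as a header with status="deleted" (in the Identify response, this corresponds to deletedRecord="persistent").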
<request verb="ListRecords" metadataPrefix="oldarXiv">http://an.oa.org/OAI-script</request>
<!-- no metadata or about of the record required -->
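Together, the unique OAI identifier and the status="deleted" attribute let a harvester keep its own copy in sync. A minimal sketch, assuming a hypothetical in-memory index that maps each OAI identifier to its datestamp: an unknown identifier is a new resource, a known one is an update, and a deleted header removes the entry.

```python
import xml.etree.ElementTree as ET
from typing import Dict

OAI = "{http://www.openarchives.org/OAI/2.0/}"


def apply_record(record: ET.Element, store: Dict[str, str]) -> str:
    """Update a local index from one harvested <record>; return what happened.

    `store` is a hypothetical index mapping OAI identifier -> datestamp;
    a real harvester would also keep the metadata itself.
    """
    header = record.find(f"{OAI}header")
    identifier = header.findtext(f"{OAI}identifier")
    if header.get("status") == "deleted":
        store.pop(identifier, None)  # clean up; unknown identifiers are ignored
        return "deleted"
    outcome = "updated" if identifier in store else "new"
    store[identifier] = header.findtext(f"{OAI}datestamp")
    return outcome
```

Keying the index on the OAI identifier, not on the learning resource, mirrors the rule that the identifier refers to the metadata record.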