
...



Document information

Title: DARE use of OAI-PMH
Subject: Guidelines for the use of OAI-PMH within the DARE programme
Moderator: Feijen, Martin; Vanderfeesten, Maurice
Version: 1.2
Date published: 2007-03-22

(Optional information)
Type: Internal report
Format: Text/richtext
Identifier: SURF OZ
Language: Eng
Rights: Copyright Stichting SURF. The text of this document may be used freely, without permission of Stichting SURF.

Document History

| Date | Version | Owner | Changelog | PDF |
|---|---|---|---|---|
| 2007-03-22 | 1.2 | | Small changes: URL of the dare_didl schema changed to the KNAW; set naming is not mandatory, but preferred; resumption-token life span changed from at least 5 hours to at least 24 hours; service window example removed, since HTTP error 503 suffices to indicate that a repository is down (this nice-to-have feature can be worked out in the future); in the namespace section a line has to be changed. | Download |
| 2006-08-22 | 1.1 | | Additional agreements (on metadataPrefix naming, datestamp format, set naming, deleted records, resumption token life span, harvest batch size, service window properties, adminEmail for error logging feedback, prefix and namespace declaration, XML validation, communication about repository modifications) | |
| July 2006 | 1.0 | | First internal version presented to project managers | |

Abstract

The abstract describes what the application profile is about. It should contain a problem definition, the standards described by the application profile and the goal of the application profile.

...


It is important to make a distinction between Item and Record. The protocol text states:
"...An item is conceptually a container that stores or dynamically generates metadata about a single resource in multiple formats, each of which can be harvested as records via the OAI-PMH. [...] A record is metadata expressed in a single format. A record is returned in an XML-encoded byte stream in response to an OAI-PMH request for metadata from an item..." [bold added by MF]
Within DARE, the XML-encoded stream is constructed according to the XML-Container specifications. These specifications are given below.
The OAI unique identifier identifies an item within a repository. Do not confuse this identifier with the Dublin Core element dc:identifier. The two have different functions: the OAI identifier is used to request metadata about an item, whereas the DC identifier points to the resource itself.
Schematically: [figure: relation between item, records and identifiers; image not preserved]
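As an illustration of the distinction between the two identifiers, the record fragment below places them side by side. It is a minimal sketch only: the repository domain repository.example.org, the identifier values and the title are hypothetical.

```xml
<!-- Hypothetical record: all identifiers and URLs are invented for illustration -->
<record>
  <header>
    <!-- OAI identifier: identifies the item; used to request its metadata -->
    <identifier>oai:repository.example.org:1234</identifier>
    <datestamp>2007-03-01</datestamp>
  </header>
  <metadata>
    <oai_dc:dc
        xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
        xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>An example publication</dc:title>
      <!-- DC identifier: points to the resource itself -->
      <dc:identifier>http://repository.example.org/publications/1234.pdf</dc:identifier>
    </oai_dc:dc>
  </metadata>
</record>
```

A harvester uses the header identifier in a GetRecord request; an end user follows the dc:identifier to reach the publication.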







Additional agreements and recommendations

These additional agreements are based on the Open Archives Initiative Protocol for Metadata Harvesting - Version 2.0 found at http://www.openarchives.org/OAI/openarchivesprotocol.html.
These guidelines provide additional agreements on the implementation of OAI-PMH, to ensure smooth operation between the repositories and the harvester in the DARE network.
They cover the following topics: metadataPrefix naming, datestamp format, set naming, deleted records, resumption token life span, harvest batch size, service window properties, adminEmail for error logging feedback, prefix and namespace declaration, XML validation, and communication about repository modifications.

MetadataPrefix naming

See: http://www.openarchives.org/OAI/openarchivesprotocol.html#MetadataNamespaces
OAI-PMH supports the dissemination of records in multiple metadata formats from a repository. The ListMetadataFormats request returns the list of all metadata formats.
metadataPrefix arguments are used in ListRecords, ListIdentifiers, and GetRecord requests to retrieve records, or the headers of records that include metadata in the format specified by the metadataPrefix.
For purposes of interoperability, repositories must disseminate Dublin Core, without any qualification. Therefore, the protocol reserves the metadataPrefix 'oai_dc', and the URL of a metadata schema for unqualified Dublin Core, which is http://www.openarchives.org/OAI/2.0/oai_dc.xsd. The corresponding XML namespace URI is http://www.openarchives.org/OAI/2.0/oai_dc/.
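To show where the metadataPrefix appears in practice, the fragment below sketches a GetRecord exchange that asks for unqualified Dublin Core. The baseURL http://repository.example.org/oai and the identifier are hypothetical, and the record content is abbreviated.

```xml
<!-- Hypothetical request (baseURL and identifier are invented):
     http://repository.example.org/oai?verb=GetRecord&identifier=oai:repository.example.org:1234&metadataPrefix=oai_dc -->
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2007-03-22T10:00:00Z</responseDate>
  <!-- The request element echoes the metadataPrefix that selected the record format -->
  <request verb="GetRecord" identifier="oai:repository.example.org:1234"
           metadataPrefix="oai_dc">http://repository.example.org/oai</request>
  <GetRecord>
    <record>
      <!-- header and oai_dc:dc metadata omitted for brevity -->
    </record>
  </GetRecord>
</OAI-PMH>
```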

dare_didl

The DARE community supports the implementation of the metadataPrefix 'oai_dc' and the metadataPrefix 'dare_didl'. The schema for the XML container dare_didl is located at http://www.repository.knaw.nl/web/dare_didl.xsd.
The corresponding XML namespace URI for dare_didl is currently http://www.repository.knaw.nl/web/dare_didl.
Every DARE repository must support this 'dare_didl' metadata schema.
The specification of the dare_didl XML container is given in the document XMLcontainer1.1.1.pdf, available at: http://www.darenet.nl/upload.view/DARE_DIDL-XMLcontainter-Specification-v1.1.1.pdf
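A repository that follows these agreements would list both formats in its ListMetadataFormats response. The sketch below uses the schema and namespace URIs given above; only the baseURL http://repository.example.org/oai and the response date are hypothetical.

```xml
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2007-03-22T10:00:00Z</responseDate>
  <request verb="ListMetadataFormats">http://repository.example.org/oai</request>
  <ListMetadataFormats>
    <!-- Mandatory for every OAI-PMH repository -->
    <metadataFormat>
      <metadataPrefix>oai_dc</metadataPrefix>
      <schema>http://www.openarchives.org/OAI/2.0/oai_dc.xsd</schema>
      <metadataNamespace>http://www.openarchives.org/OAI/2.0/oai_dc/</metadataNamespace>
    </metadataFormat>
    <!-- Mandatory for every DARE repository -->
    <metadataFormat>
      <metadataPrefix>dare_didl</metadataPrefix>
      <schema>http://www.repository.knaw.nl/web/dare_didl.xsd</schema>
      <metadataNamespace>http://www.repository.knaw.nl/web/dare_didl</metadataNamespace>
    </metadataFormat>
  </ListMetadataFormats>
</OAI-PMH>
```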

...

See: http://www.openarchives.org/OAI/openarchivesprotocol.html#Set
The OAI-PMH document says the following:
Repositories may organize items into sets. Set organization may be flat, i.e. a simple list, or hierarchical.
The DARE agreement is that DARE repositories support at least two types of sets: the 'dare' set and the 'keur' set. Both sets are flat and have no hierarchical structure.
The table below shows the strongly preferred setName and setSpec for each set.

 

| Set | setName | setSpec* |
|---|---|---|
| The DARE set | DAREset | dare |
| The Keur set | Keurset | keur |

*A harvester uses only the setSpec (passed as the set argument of a request) to perform selective harvesting. The setSpec must be written in lowercase letters.
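For illustration, a ListSets response containing the preferred names from the table might look as follows. The baseURL http://repository.example.org/oai is hypothetical; the selective-harvesting request in the comment shows how a harvester would pass the setSpec.

```xml
<!-- Selective harvest of the DARE set (hypothetical baseURL):
     http://repository.example.org/oai?verb=ListRecords&metadataPrefix=oai_dc&set=dare -->
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2007-03-22T10:00:00Z</responseDate>
  <request verb="ListSets">http://repository.example.org/oai</request>
  <ListSets>
    <set>
      <setSpec>dare</setSpec>
      <setName>DAREset</setName>
    </set>
    <set>
      <setSpec>keur</setSpec>
      <setName>Keurset</setName>
    </set>
  </ListSets>
</OAI-PMH>
```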

Set Content

The specific content of each set is determined by the local repository.
A DARE repository using these sets must conform to the following rules when adding a record to one or both of them. These rules define the content of both sets at a global level:

  • The DAREset contains records that must include an object that is openly accessible to an ordinary internet user. (Which kinds of objects or records are included is left to the local repository.)
  • The Keurset contains records describing the collected published work of the institution's leading scientists. These records do not have to include an object. If a record does include an object, that object does not have to be open access, although open access is highly recommended.
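A record may belong to both sets at once, for example an open access article by one of the institution's leading scientists. In OAI-PMH this is expressed by repeating the setSpec element in the record header, as in this sketch with a hypothetical identifier:

```xml
<!-- Hypothetical header of a record that is in both the DAREset and the Keurset -->
<header>
  <identifier>oai:repository.example.org:5678</identifier>
  <datestamp>2007-03-01</datestamp>
  <!-- open access object, so the record qualifies for the DAREset -->
  <setSpec>dare</setSpec>
  <!-- work of a leading scientist, so the record qualifies for the Keurset -->
  <setSpec>keur</setSpec>
</header>
```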

Set Location

The DAREset and the Keurset can each be located at a different location/baseURL.

...

See: http://www.openarchives.org/OAI/openarchivesprotocol.html#DeletedRecords
If a record is no longer available then it is said to be deleted. Repositories must declare one of three levels of support for deleted records in the deletedRecord element of the Identify response:

  • no - the repository does not maintain information about deletions. A repository that indicates this level of support must not reveal a deleted status in any response.
  • persistent - the repository maintains information about deletions with no time limit. A repository that indicates this level of support must persistently keep track of the full history of deletions and consistently reveal the status of a deleted record over time.
  • transient - the repository does not guarantee that a list of deletions is maintained persistently or consistently. A repository that indicates this level of support may reveal a deleted status for records.


Under the DARE agreement, DARE repositories are requested to use the option 'transient'. The option 'persistent' may also be used; it makes it easier for the harvester to detect deleted records.
Goals: {yet to be filled in}
Use of transient: {yet to be filled in} {expiry time: after how long is a repository no longer required to tell a harvester that a record has been deleted?}
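The two fragments below sketch how the agreed level of support is declared in the Identify response and how a deleted record is then reported. Repository name, baseURL, e-mail address and dates are hypothetical, and the enclosing OAI-PMH envelope is omitted.

```xml
<!-- Fragment 1: Identify response body declaring transient support -->
<Identify>
  <repositoryName>Example repository</repositoryName>
  <baseURL>http://repository.example.org/oai</baseURL>
  <protocolVersion>2.0</protocolVersion>
  <adminEmail>oai-admin@repository.example.org</adminEmail>
  <earliestDatestamp>2003-01-01</earliestDatestamp>
  <deletedRecord>transient</deletedRecord>
  <granularity>YYYY-MM-DD</granularity>
</Identify>

<!-- Fragment 2: a deleted record carries only a header with status="deleted"
     and no metadata element -->
<record>
  <header status="deleted">
    <identifier>oai:repository.example.org:1234</identifier>
    <datestamp>2007-03-15</datestamp>
  </header>
</record>
```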

Resumption token life span

...

The batch size is the number of records a repository delivers to the harvester for one resumption token.
The agreement is that DARE repositories must set the batch size between 100 and 200 records.
Using this batch size for all DARE repositories will make the harvester operate at optimal performance.
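The sketch below shows the end of one batch of a ListRecords response. It assumes a batch of 150 records (within the agreed 100 to 200) and a resumption token that stays valid for 24 hours, following the life span noted in the document history; the baseURL, token value and list size are hypothetical.

```xml
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2007-03-22T10:00:00Z</responseDate>
  <request verb="ListRecords" metadataPrefix="dare_didl">http://repository.example.org/oai</request>
  <ListRecords>
    <!-- 150 record elements for this batch go here -->
    <!-- cursor="0": this is the first batch of the complete list;
         expirationDate is 24 hours after responseDate -->
    <resumptionToken expirationDate="2007-03-23T10:00:00Z"
                     completeListSize="1200"
                     cursor="0">dare_didl-batch-2</resumptionToken>
  </ListRecords>
</OAI-PMH>
```

To fetch the next batch, the harvester sends only the token: ?verb=ListRecords&resumptionToken=dare_didl-batch-2. The resumptionToken argument is exclusive, so metadataPrefix is not repeated.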

Service window properties

A service window is not a recommendation but a nice-to-have feature. A service window indicates, in the Identify response, when a repository is down or scheduled to go down for maintenance. This prevents the harvester from reporting unnecessary errors.
The service window also tells harvesters when it is most appropriate to harvest.
Currently, the standard and mandatory way to tell harvesters that the repository is down is the HTTP 503 error code. (Harvesters should handle this response gracefully.)
In the future:
Information about the service window can be placed in the <about> element of the Identify response. An appropriate format for the service window has not yet been developed. The DARE community strives to use international standards; at this time, the most appropriate standards would be calendar formats such as vCal or iCal.

AdminEmail for error logging feedback

...

For the <OAI-PMH> schema use:

```xml
<OAI-PMH
    xmlns="http://www.openarchives.org/OAI/2.0/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/
        http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd"
>
```

For the <didl:DIDL> schema use:

```xml
<didl:DIDL
    diext:DIDcreated="YYYY-MM-DDThh:mm:ssZ"
    xmlns:didl="urn:mpeg:mpeg21:2002:02-DIDL-NS"
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:diext="http://library.lanl.gov/2004-04/STB-RL/DIEXT"
    xmlns:dii="urn:mpeg:mpeg21:2002:01-DII-NS"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="
        urn:mpeg:mpeg21:2002:02-DIDL-NS
        http://purl.lanl.gov/STB-RL/schemas/2004-08/DIDL.xsd
        urn:mpeg:mpeg21:2002:01-DII-NS
        http://purl.lanl.gov/STB-RL/schemas/2003-09/DII.xsd
        http://library.lanl.gov/2004-04/STB-RL/DIEXT
        http://purl.lanl.gov/STB-RL/schemas/2004-04/DIEXT.xsd"
>
```

For the <oai_dc:dc> schema use:

```xml
<oai_dc:dc
    xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="
        http://www.openarchives.org/OAI/2.0/oai_dc/
        http://www.openarchives.org/OAI/2.0/oai_dc.xsd
        http://purl.org/dc/elements/1.1/
        http://dublincore.org/schemas/xmls/simpledc20021212.xsd"
>
```

For the <dare_qdc:qdc> schema use:

```xml
<dare_qdc:qdc
    xmlns:dare_qdc="http://dare.nl/dare_qdc:/2.0"
    xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="
        http://www.openarchives.org/OAI/2.0/oai_dc/
        http://www.openarchives.org/OAI/2.0/oai_dc.xsd
        http://purl.org/dc/elements/1.1/
        http://dublincore.org/schemas/xmls/simpledc20021212.xsd"
>
```

For the <qdc:dc> schema use (not tested):

```xml
<container_qdc:qualifieddc
    xmlns:container_qdc="urn:dc:qdc:container"
    xmlns:qdc="http://purl.org/dc/elements/1.1/"
    xmlns:dterms="http://purl.org/dc/terms/"
    xmlns:dcmitype="http://purl.org/dc/dcmitype/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="
        urn:dc:qdc:container
        http://dublincore.org/schemas/xmls/qdc/2003/04/02/qualifieddc.xsd
        http://purl.org/dc/elements/1.1/
        http://dublincore.org/schemas/xmls/qdc/2003/04/02/dc.xsd
        http://purl.org/dc/terms/
        http://dublincore.org/schemas/xmls/qdc/2003/04/02/dcterms.xsd
        http://purl.org/dc/dcmitype/
        http://dublincore.org/schemas/xmls/qdc/2003/04/02/dcmitype.xsd"
>
```

...

When a DARE repository modifies its baseURL, setSpec, metadataPrefix or metadata schemas in a way that affects the DARE content cycle, the repository administrator concerned must report this to the DARE community, and in particular to the DARE harvester administrator.

HTTP request format

...