...


Document information

Title: DARE use of OAI-PMH
Subject: Guidelines for the use of OAI-PMH within the DARE programme
Moderator: Feijen, Martin; Vanderfeesten, Maurice
Version: 1.2
Date published: 2007-03-22
Excerpt: Write an excerpt here (optional information)
Type: Internal report
Format: Text/richtext
Identifier: SURF OZ
Language: Eng
Rights: Copyright Stichting SURF. The text of this document may be used freely, without permission of Stichting SURF.
Tags: (the macro for assigning labels is not available in the current installation; it will be updated)

Document History

||Date||Version||Owner||Changelog||PDF||
|2007-03-22|1.2| |Small changes: URL of the dare_didl schema changed to the KNAW; set naming is not mandatory, but preferred; resumption-token life span changed from at least 5 to at least 24 hours; service window example removed, because HTTP error 503 suffices to indicate that a repository is down (this nice-to-have feature can be worked out in the future); in the namespace section a line has to be changed.|Download PDF|
|2006-08-22|1.1| |Additional agreements (on metadataPrefix naming, datestamp format, set naming, deleted records, resumption token life span, harvest batch size, service window properties, adminEmail for error logging feedback, prefix and namespace declaration, XML validation, communication for repository modification)| |
|July 2006|1.0| |First internal version presented to project managers| |

Abstract

The abstract describes what the application profile is about. It should contain a problem definition, the standards described by the application profile and the goal of the application profile.

...

Definitions and concepts: item, record and unique identifier


It is important to make a distinction between Item and Record. The protocol text states:
"...An item is conceptually a container that stores or dynamically generates metadata about a single resource in *multiple* formats, each of which can be harvested as records via the OAI-PMH [...] A record is metadata expressed in a *single* format. A record is returned in an XML-encoded byte stream in response to an OAI-PMH request for metadata from an item..." [bold added by MF]
Within DARE, the XML-encoded stream is constructed according to the XML-Container specifications. These specifications are given below.
The unique identifier identifies an item within a repository. Do not confuse this identifier with the element dc:identifier in Dublin Core. The OAI identifier has a different function: it is used to extract metadata, whereas the DC identifier is used to extract the resource.
Schematically:

[Images: aoi-pmh-1.JPG, worddav7194c559f0334a1c27aa6a15f284ac4f.png, aoi-pmh-2.PNG]
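The difference between the two identifiers can be illustrated with a minimal Python sketch. The record below is hypothetical and invented for illustration only (the repository name, identifier values and URL are assumptions, not taken from any DARE repository). The sketch reads the OAI identifier from the record header and the dc:identifier from the metadata part.

```python
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical record; all values are made up for this example.
SAMPLE_RECORD = """\
<record xmlns="http://www.openarchives.org/OAI/2.0/">
  <header>
    <identifier>oai:repository.example.org:1234</identifier>
    <datestamp>2007-03-22</datestamp>
  </header>
  <metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>Example title</dc:title>
      <dc:identifier>http://repository.example.org/fulltext/1234.pdf</dc:identifier>
    </oai_dc:dc>
  </metadata>
</record>
"""


def split_identifiers(record_xml):
    """Return (OAI item identifier, list of dc:identifier values)."""
    record = ET.fromstring(record_xml)
    # The OAI identifier lives in the header and names the item.
    oai_id = record.findtext("oai:header/oai:identifier", namespaces=NS)
    # dc:identifier lives in the metadata and points at the resource.
    dc_ids = [e.text for e in record.iterfind(".//dc:identifier", NS)]
    return oai_id, dc_ids


oai_id, dc_ids = split_identifiers(SAMPLE_RECORD)
print("metadata is requested with:", oai_id)
print("resource is located at:", dc_ids)
```

A harvester would pass the first value as the identifier argument of a GetRecord request, while the second value is what an end user follows to reach the object itself.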

Additional agreements and recommendations

These additional agreements are based on the Open Archives Initiative Protocol for Metadata Harvesting - Version 2.0, found at http://www.openarchives.org/OAI/openarchivesprotocol.html.
These guidelines provide additional agreements on the implementation of OAI-PMH for smooth operation between repository and harvester in the DARE network.
The agreements cover: metadataPrefix naming, datestamp format, set naming, deleted records, resumption token life span, harvest batch size, service window properties, adminEmail for error logging feedback, prefix and namespace declaration, XML validation, and communication for repository modification.

MetadataPrefix naming

Look at: http://www.openarchives.org/OAI/openarchivesprotocol.html#MetadataNamespaces
OAI-PMH supports the dissemination of records in multiple metadata formats from a repository. The ListMetadataFormats request returns the list of all metadata formats.
metadataPrefix arguments are used in ListRecords, ListIdentifiers, and GetRecord requests to retrieve records, or the headers of records that include metadata in the format specified by the metadataPrefix.
For purposes of interoperability, repositories must disseminate Dublin Core, without any qualification. Therefore, the protocol reserves the metadataPrefix 'oai_dc', and the URL of a metadata schema for unqualified Dublin Core, which is http://www.openarchives.org/OAI/2.0/oai_dc.xsd. The corresponding XML namespace URI is http://www.openarchives.org/OAI/2.0/oai_dc/.

dare_didl

The DARE community supports the implementation of the metadataPrefix 'oai_dc' and the metadataPrefix 'dare_didl'. The schema for the XMLcontainer dare_didl is located at http://www.repository.knaw.nl/web/dare_didl.xsd.
The corresponding XML namespace URI for dare_didl currently is http://www.repository.knaw.nl/web/dare_didl.
Every DARE repository must support this 'dare_didl' metadata schema.
The specification of the dare_didl XMLcontainer can be found in the document XMLcontainer1.1.1.pdf at the location: http://www.darenet.nl/upload.view/DARE_DIDL-XMLcontainter-Specification-v1.1.1.pdf
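As a rough illustration of the requirement above, a harvester could check a ListMetadataFormats response for both prefixes. The following Python sketch assumes the response has already been fetched as a string; the sample response is hypothetical and the helper name `missing_prefixes` is invented for this example.

```python
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Prefixes that a DARE repository is expected to offer.
REQUIRED_PREFIXES = {"oai_dc", "dare_didl"}

# Hypothetical response that offers only oai_dc.
SAMPLE_RESPONSE = """\
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListMetadataFormats>
    <metadataFormat>
      <metadataPrefix>oai_dc</metadataPrefix>
      <schema>http://www.openarchives.org/OAI/2.0/oai_dc.xsd</schema>
      <metadataNamespace>http://www.openarchives.org/OAI/2.0/oai_dc/</metadataNamespace>
    </metadataFormat>
  </ListMetadataFormats>
</OAI-PMH>
"""


def missing_prefixes(response_xml):
    """Return the required metadataPrefix values the response does not list."""
    root = ET.fromstring(response_xml)
    found = {
        e.text.strip()
        for e in root.iterfind(".//oai:metadataPrefix", OAI_NS)
        if e.text
    }
    return REQUIRED_PREFIXES - found


print(missing_prefixes(SAMPLE_RESPONSE))
```

For the sample above the result contains only 'dare_didl', which signals that the repository does not yet meet the DARE agreement.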

...

Look at: http://www.openarchives.org/OAI/openarchivesprotocol.html#Set
The OAI-PMH document says the following:
Repositories may organize items into sets. Set organization may be flat, i.e. a simple list, or hierarchical.
The DARE agreement is that DARE repositories support at least two types of sets: the 'dare' set and the 'keur' set. Both sets are flat and do not have any hierarchical structure.
The table below shows the highly preferred setName and setSpec that can be used for either set.

||Set||setName||setSpec*||
|The DARE set|DAREset|dare|
|The Keur set|Keurset|keur|

*A harvester only uses the setSpec in a request to perform selective harvesting. The setSpec must be written in lowercase letters.
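Selective harvesting with these setSpec values could look like the following Python sketch. The base URL is a placeholder and the helper name `list_records_url` is invented here; only the verb and argument names (ListRecords, metadataPrefix, set) come from OAI-PMH itself. The lowercase check is one possible reading of the agreement above.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix, set_spec):
    """Build a ListRecords request URL restricted to one set."""
    # Assumption: the agreement is read as "setSpec is all lowercase".
    if set_spec != set_spec.lower():
        raise ValueError("setSpec must be lowercase: %r" % set_spec)
    query = urlencode(
        {"verb": "ListRecords", "metadataPrefix": metadata_prefix, "set": set_spec}
    )
    return base_url + "?" + query


# Placeholder baseURL, not a real DARE repository.
url = list_records_url("http://repository.example.org/oai", "dare_didl", "dare")
print(url)
```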

Set Content

The specific content of the setSpec is determined at the local repository.
A DARE repository using these kinds of sets must conform to the following rules when inserting a record into one or both of these sets. DARE uses this guideline to globally define the content of both sets:

  • The DAREset contains records that must contain an object that is openly accessible to a normal internet user. (What kind of objects or records is left to the local repository.)
  • The Keurset contains records that together form a collection of all the published work of the institution's leading scientists. These records do not have to contain an object. When a record does contain an object, the object does not have to be open access, although that is highly recommended.

Set Location

The DAREset and the Keurset can each be located at a different location/baseURL.

...

Look at: http://www.openarchives.org/OAI/openarchivesprotocol.html#DeletedRecords
If a record is no longer available then it is said to be deleted. Repositories must declare one of three levels of support for deleted records in the deletedRecord element of the Identify response:

  • no - the repository does not maintain information about deletions. A repository that indicates this level of support must not reveal a deleted status in any response.
  • persistent - the repository maintains information about deletions with no time limit. A repository that indicates this level of support must persistently keep track of the full history of deletions and consistently reveal the status of a deleted record over time.
  • transient - the repository does not guarantee that a list of deletions is maintained persistently or consistently. A repository that indicates this level of support may reveal a deleted status for records.


The DARE agreement requests DARE repositories to use the option 'transient'. The option 'persistent' can also be used. Either option makes it easier for the harvester to detect deleted records.
Goals: {yet to be filled in}
Use of transient: {yet to be filled in} {expiry time: the period after which a repository no longer has to tell a harvester that a record is deleted.}
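A harvester can read the declared level from the Identify response and recognise deleted records by the status attribute on the record header. The Python sketch below assumes both responses are already available as strings; the sample XML and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Levels that satisfy the DARE agreement described above.
DARE_ACCEPTED_LEVELS = {"transient", "persistent"}

# Hypothetical responses, invented for illustration.
SAMPLE_IDENTIFY = """\
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <Identify>
    <repositoryName>Example repository</repositoryName>
    <deletedRecord>transient</deletedRecord>
  </Identify>
</OAI-PMH>
"""

SAMPLE_HEADERS = """\
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListIdentifiers>
    <header><identifier>oai:repository.example.org:1</identifier></header>
    <header status="deleted"><identifier>oai:repository.example.org:2</identifier></header>
  </ListIdentifiers>
</OAI-PMH>
"""


def deleted_record_level(identify_xml):
    """Return the deletedRecord level declared in an Identify response."""
    root = ET.fromstring(identify_xml)
    level = root.findtext(".//oai:deletedRecord", default="", namespaces=OAI_NS)
    return level.strip()


def deleted_identifiers(response_xml):
    """Return identifiers whose header carries status="deleted"."""
    root = ET.fromstring(response_xml)
    return [
        h.findtext("oai:identifier", namespaces=OAI_NS)
        for h in root.iterfind(".//oai:header", OAI_NS)
        if h.get("status") == "deleted"
    ]


level = deleted_record_level(SAMPLE_IDENTIFY)
print(level, level in DARE_ACCEPTED_LEVELS)
print(deleted_identifiers(SAMPLE_HEADERS))
```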

Resumption token life span

...

The batch size is the number of records a repository delivers to the harvester for one resumption token.
The agreement is that DARE repositories must set the batch size between 100 and 200 records.
Using this batch size for all DARE repositories will make the harvester operate at optimal performance.
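The batch size agreement could be verified on the harvester side with a sketch like the one below. It counts the records in one ListRecords response and compares the count with the agreed range. Two points are assumptions of this example and not part of the agreement text: the final batch (empty or absent resumptionToken) is allowed to be smaller than 100, and the helper `make_response` only exists to generate hypothetical test data.

```python
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}
MIN_BATCH, MAX_BATCH = 100, 200


def batch_size_ok(list_records_xml):
    """Check one ListRecords response against the agreed batch size."""
    root = ET.fromstring(list_records_xml)
    count = len(root.findall(".//oai:record", OAI_NS))
    token = root.findtext(".//oai:resumptionToken", default="", namespaces=OAI_NS)
    if not token.strip():
        # Assumption: the last batch may hold fewer than MIN_BATCH records.
        return count <= MAX_BATCH
    return MIN_BATCH <= count <= MAX_BATCH


def make_response(n, token):
    """Generate a hypothetical ListRecords response with n stub records."""
    records = "".join(
        "<record><header><identifier>oai:example:%d</identifier></header></record>" % i
        for i in range(n)
    )
    return (
        '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListRecords>'
        + records
        + "<resumptionToken>" + token + "</resumptionToken>"
        + "</ListRecords></OAI-PMH>"
    )


print(batch_size_ok(make_response(150, "token-1")))
print(batch_size_ok(make_response(50, "token-2")))
```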

Service window properties

A service window is not a recommendation, but a nice-to-have feature. A service window indicates, in the Identify response, when a repository is down or scheduled to go down for maintenance. This prevents the harvester from reporting unnecessary errors.
The service window also tells harvesters when the most appropriate time to harvest is.
Currently the standard, mandatory way to tell harvesters that the repository is down is the HTTP 503 error code. (Harvesters should deal gracefully with this message.)
In the future:
Information about the service window can be located in the <about> element in the Identify response. An appropriate format for the service window has not yet been developed. The DARE community strives to use international standards. At this time the most appropriate standard would be a calendar format such as vCal or iCal.
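How a harvester might deal gracefully with HTTP 503 can be sketched as follows. The function decides how long to wait before retrying, using the Retry-After header when it carries a number of seconds. The function name, the default delay of one hour and the plain dictionary used for headers are assumptions made for this illustration; a Retry-After value given as an HTTP date is not parsed here and falls back to the default.

```python
def retry_delay(status, headers, default_delay=3600):
    """Return the seconds to wait before retrying, or None if no retry is needed.

    status  -- HTTP status code of the repository's response
    headers -- dict of response headers (exact key "Retry-After" assumed)
    """
    if status != 503:
        return None
    value = headers.get("Retry-After", "").strip()
    if value.isdigit():
        # Retry-After given as a number of seconds.
        return int(value)
    # Header missing or given as a date: use the assumed default delay.
    return default_delay


print(retry_delay(200, {}))
print(retry_delay(503, {"Retry-After": "120"}))
print(retry_delay(503, {}))
```

Keeping the decision in a pure function makes it easy to log the repository as "temporarily down" instead of reporting a harvesting error.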

AdminEmail for error logging feedback

...

Code Block
xml
xml
<OAI-PMH
    xmlns="http://www.openarchives.org/OAI/2.0/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="
      http://www.openarchives.org/OAI/2.0/
      http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd"
>
  <...>
  <metadata>
    <didl:DIDL
        xmlns:didl="urn:mpeg:mpeg21:2002:02-DIDL-NS"
        xmlns:dii="urn:mpeg:mpeg21:2002:01-DII-NS"
        xmlns:dcterms="http://purl.org/dc/terms/"
        xsi:schemaLocation="
          urn:mpeg:mpeg21:2002:02-DIDL-NS http://standards.iso.org/.../didl.xsd
          urn:mpeg:mpeg21:2002:01-DII-NS  http://standards.iso.org/.../dii.xsd"
    >
      <...>
    </didl:DIDL>
  </metadata>
  </...>
</OAI-PMH>

...

According to the statement in the same document (http://www.w3.org/TR/REC-xml-names/#ns-using), the DARE agreement is that prefixes and namespaces may also be declared in the ancestors of an element:

"The namespace prefix, unless it is xml or xmlns, MUST have been declared in a namespace declaration attribute in either the start-tag of the element where the prefix is used or in an ancestor element (i.e. an element in whose content the prefixed markup occurs)."
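The effect of this rule can be checked with any namespace-aware XML parser. The Python sketch below parses three small, hypothetical documents: one declares the didl prefix on an ancestor, one declares it on the element itself, and one does not declare it at all. The document snippets and helper names are invented for this example.

```python
import xml.etree.ElementTree as ET

# Prefix declared on an ancestor element.
DOC_ANCESTOR = """\
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"
         xmlns:didl="urn:mpeg:mpeg21:2002:02-DIDL-NS">
  <metadata>
    <didl:DIDL/>
  </metadata>
</OAI-PMH>
"""

# Prefix declared on the element where it is used.
DOC_LOCAL = """\
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <metadata>
    <didl:DIDL xmlns:didl="urn:mpeg:mpeg21:2002:02-DIDL-NS"/>
  </metadata>
</OAI-PMH>
"""

# Prefix never declared: not namespace-well-formed.
DOC_UNDECLARED = """\
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <metadata>
    <didl:DIDL/>
  </metadata>
</OAI-PMH>
"""


def didl_tags(xml_text):
    """Return the fully qualified names of all DIDL elements."""
    root = ET.fromstring(xml_text)
    return [e.tag for e in root.iter() if e.tag.endswith("}DIDL")]


def parses(xml_text):
    """Return True if the parser accepts the document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


print(didl_tags(DOC_ANCESTOR) == didl_tags(DOC_LOCAL))
print(parses(DOC_UNDECLARED))
```

Both declaration styles resolve to the same qualified element name, which is why a harvester should accept either; only the undeclared prefix is rejected.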


Example of the optional, but not recommended, use of prefixes and namespaces.

...

For the <OAI-PMH> schema use:

Code Block
xml
xml
 
<OAI-PMH
            xmlns="http://www.openarchives.org/OAI/2.0/"
            xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/
            http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd"
>

For the <didl:DIDL> schema use:

Code Block
xml
xml
 
<didl:DIDL
diext:DIDcreated="YYYY-MM-DDThh:mm:ssZ"
            xmlns:didl="urn:mpeg:mpeg21:2002:02-DIDL-NS"
            xmlns:dcterms="http://purl.org/dc/terms/"
            xmlns:diext="http://library.lanl.gov/2004-04/STB-RL/DIEXT"
            xmlns:dii="urn:mpeg:mpeg21:2002:01-DII-NS"
            xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xsi:schemaLocation="
                        urn:mpeg:mpeg21:2002:02-DIDL-NS
                        http://purl.lanl.gov/STB-RL/schemas/2004-08/DIDL.xsd
                        urn:mpeg:mpeg21:2002:01-DII-NS
                        http://purl.lanl.gov/STB-RL/schemas/2003-09/DII.xsd
                        http://library.lanl.gov/2004-04/STB-RL/DIEXT
                        http://purl.lanl.gov/STB-RL/schemas/2004-04/DIEXT.xsd"
>

For the <oai_dc:dc> schema use:

Code Block
xml
xml
 
<oai_dc:dc
            xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
            xmlns:dc="http://purl.org/dc/elements/1.1/"
            xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xsi:schemaLocation="
                        http://www.openarchives.org/OAI/2.0/oai_dc/
                        http://www.openarchives.org/OAI/2.0/oai_dc.xsd
                        http://purl.org/dc/elements/1.1/
                        http://dublincore.org/schemas/xmls/simpledc20021212.xsd"
>

For the <dare_qdc:qdc> schema use:

Code Block
xml
xml
 
<dare_qdc:qdc
            xmlns:dare_qdc="http://dare.nl/dare_qdc:/2.0"
            xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
            xmlns:dc="http://purl.org/dc/elements/1.1/"
            xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xsi:schemaLocation="
                        http://www.openarchives.org/OAI/2.0/oai_dc/
                        http://www.openarchives.org/OAI/2.0/oai_dc.xsd
                        http://purl.org/dc/elements/1.1/
                        http://dublincore.org/schemas/xmls/simpledc20021212.xsd"
>

For the <qdc:dc> schema use (not tested):

Code Block
xml
xml
 
<container_qdc:qualifieddc
            xmlns:container_qdc="urn:dc:qdc:container"
            xmlns:qdc="http://purl.org/dc/elements/1.1/"
            xmlns:dcterms="http://purl.org/dc/terms/"
            xmlns:dcmitype="http://purl.org/dc/dcmitype/"
            xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xsi:schemaLocation="
                        urn:dc:qdc:container
                        http://dublincore.org/schemas/xmls/qdc/2003/04/02/qualifieddc.xsd
                        http://purl.org/dc/elements/1.1/
                        http://dublincore.org/schemas/xmls/qdc/2003/04/02/dc.xsd
                        http://purl.org/dc/terms/
                        http://dublincore.org/schemas/xmls/qdc/2003/04/02/dcterms.xsd
                        http://purl.org/dc/dcmitype/
                        http://dublincore.org/schemas/xmls/qdc/2003/04/02/dcmitype.xsd"
>

...

When a DARE repository modifies its baseURL, setSpec, metadataPrefix or metadata schemas in a way that influences the DARE content cycle, the repository administrator concerned must report this to the DARE community, and to the DARE harvester administrator in particular.

HTTP request format

...