...
Section | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
Document History
Date | Version | Owner | Changelog |
---|
2007-03-22 | 1.2 |
| Small changes; | |
2006-08-22 | 1.1 |
| Additional agreements (to metadataprefix naming, datestamp format, set naming, deleted records, resumption token life span, harvest batch size, service window properties, adminEmail for error logging feedback, Prefix & namespace declaration, XML validation, Communication for Repository modification) |
|
July 2006 | 1.0 |
| First internal version presented to project managers |
|
Abstract
The abstract describes what the application profile is about. It should contain a problem definition, the standards described by the application profile and the goal of the application profile.
...
It is important to make a distinction between Item and Record. The protocol text states:
"...An item is conceptually a container that stores or dynamically generates metadata about a single resource in multiple formats, each of which can be harvested as records via the OAI-PMH [...]A record is metadata expressed in a single format. A record is returned in an XML-encoded byte stream in response to an OAI-PMH request for metadata from an item...[bold added by MF]
Within DARE, the XML-encoded stream is constructed according to the XML-Container specifications. These specifications are given below.
The unique identifier identifies an item within a repository. Do not confuse this identifier with the element dc:identifier in Dublin Core. The OAI identifier has a different function: it is used to extract metadata, whereas the DC identifier is used to extract the resource.
Schematically:
Additional agreements and recommendations
These additional agreements are based on the Open Archives Initiative Protocol for Metadata Harvesting - Version 2.0 found at http://www.openarchives.org/OAI/openarchivesprotocol.html.
These guidelines will provide additional agreements on the implementation of OAI-PHM for a smooth operation between repository and harvester in the DARE network
Metadataprefix naming, datestamp format, set naming, deleted records, resumption token life span, harvest batch size, service window properties, adminEmail for error logging feedback, Prefix & namespace declaration, XML validation, Communication for Repository modification.
MetadataPrefix naming
Look at: http://www.openarchives.org/OAI/openarchivesprotocol.html#MetadataNamespaces
OAI-PMH supports the dissemination of records in multiple metadata formats from a repository. The ListMetadataFormats request returns the list of all metadata formats.
metadataPrefix arguments are used in ListRecords, ListIdentifiers, and GetRecord requests to retrieve records, or the headers of records that include metadata in the format specified by the metadataPrefix.
For purposes of interoperability, repositories must disseminate Dublin Core, without any qualification. Therefore, the protocol reserves the metadataPrefix 'oai_dc', and the URL of a metadata schema for unqualified Dublin Core, which is http://www.openarchives.org/OAI/2.0/oai_dc.xsd. The corresponding XML namespace URI is http://www.openarchives.org/OAI/2.0/oai_dc/.
dare_didl
The DARE community supports the implementation of the metadataPrefix 'oai_dc' and the the metadataPrefix 'dare_didl'. The schema for the XMLcontainer dare_didl is located at http://www.repository.knaw.nl/web/dare_didl.xsd.
The corresponding XML namespace for dare_didl namespace URI currently is http://www.repository.knaw.nl/web/dare_didl.
Every DARE repository must support this 'dare_didl' metadata schema.
The specification of the dare_didl XMLcontainer can be found in the document XMLcontainer1.1.1.pdf. at the location: http://www.darenet.nl/upload.view/DARE_DIDL-XMLcontainter-Specification-v1.1.1.pdf
...
Look at: http://www.openarchives.org/OAI/openarchivesprotocol.html#Set
The OAI-PMH document says the folowing:
Repositories may organize items into sets. Set organization may be flat, i.e. a simple list, or hierarchical.
The DARE agreement is that DARE repositories support at least two type of sets. The 'dare' set and the 'keur' set. Both sets are flat and do not have any hierarchical structure.
The table below shows the highly preferred setName and setSpec that can be used for either set.
| setName | setSpec* |
The DARE set | DAREset | dare |
The Keur set | Keurset | keur |
*A harvester only uses the setSpec request to perform selective harvesting. The letters must be in smallcaps.
Set Content
The specific content of the setSpec is determined at the local repository.
A DARE repository using these kind of sets must conform to the following rules when inserting a record into one/both of these sets. DARE uses this guideline to globally define the content of both sets:
- The DAREset contains records that must contain an object that is open accessable for a normal internet user. (What kind of objects/records is left to the local repository.)
- The Keurset contains records, which is a collection of all the published work of the institutions leading scientists. These records do not have to contain any object. When it contains an object it does not have to be open access, but it is highly recommended.
Set Location
The DAREset and the Keurset can each be located at a different location/baseURL.
...
Look at: http://www.openarchives.org/OAI/openarchivesprotocol.html#DeletedRecords
If a record is no longer available then it is said to be deleted. Repositories must declare one of three levels of support for deleted records in the deletedRecord element of the Identify response:
- no - the repository does not maintain information about deletions. A repository that indicates this level of support must not reveal a deleted status in any response.
- persistent - the repository maintains information about deletions with no time limit. A repository that indicates this level of support must persistently keep track of the full history of deletions and consistently reveal the status of a deleted record over time.
- transient - the repository does not guarantee that a list of deletions is maintained persistently or consistently. A repository that indicates this level of support may reveal a deleted status for records.
The DARE agreement requests the DARE repositories to use the option 'transient'. Also 'persistent' can be used. This option makes the harvester do an easier job to detect deleted records.
Goals: {yet to be filled in}
Use of transient: {yet to be filled in} {verval tijd, in what time does a repository not have to tell a harvester a record is deleted.}
Anchor | ||||
---|---|---|---|---|
|
...
The batch size is the number of records a repository delivers to the harvester for one resumption token.
The agreement is that DARE repositories must set the batch size between 100 and 200 records.
Using this batch size for all DARE repositories will make the harvester operate at optimal performance.
Service window properties
Not a recommendation, but a nice to have feature is a Service window. A service window indicates when a repository is down, or schedulled to go down for maintenance in the Identify request. This will prevent the harvester to report unnessesary errors.
Also the service window provides information to the harvesters when it is the most appropriate time to harvest.
Currently the standard mandatory message to tell the harvesters the repository is down is with the http 503 error code. (Harvesters should nicely deal with this message)
In the future:
Information about the service window can be located in the <about> element in the Identify request. An appropriate format for the service window has not yet been developed. The DARE community is striving for to use International standards. At this time the most appropriate standard will be Calendar formats like vCal or iCal.
AdminEmail for error logging feedback
...
For the <OAI-PMH> schema use:
Code Block | ||||
---|---|---|---|---|
| ||||
<OAI-PMH
xmlns="http://www.openarchives.org/OAI/2.0/"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/
http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd"
>
|
For the <didl:DIDL> schema use:
Code Block | ||||
---|---|---|---|---|
| ||||
<didl:DIDL
diext:DIDcreated="YYYY-MM-DDThh:mm:ssZ"
xmlns:didl="urn:mpeg:mpeg21:2002:02-DIDL-NS"
xmlns:dcterms="http://purl.org/dc/terms/"
xmlns:diext="http://library.lanl.gov/2004-04/STB-RL/DIEXT"
xmlns:dii="urn:mpeg:mpeg21:2002:01-DII-NS"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="
urn:mpeg:mpeg21:2002:02-DIDL-NS
http://purl.lanl.gov/STB-RL/schemas/2004-08/DIDL.xsd
urn:mpeg:mpeg21:2002:01-DII-NS
http://purl.lanl.gov/STB-RL/schemas/2003-09/DII.xsd
http://library.lanl.gov/2004-04/STB-RL/DIEXT
http://purl.lanl.gov/STB-RL/schemas/2004-04/DIEXT.xsd"
>
|
For the <oai_dc:dc> schema use:
Code Block | ||||
---|---|---|---|---|
| ||||
<oai_dc:dc
xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xsi:schemaLocation="
http://www.openarchives.org/OAI/2.0/oai_dc/
http://www.openarchives.org/OAI/2.0/oai_dc.xsd
http://purl.org/dc/elements/1.1/
http://dublincore.org/schemas/xmls/simpledc20021212.xsd"
>
|
For the <dare_qdc:qdc> schema use:
Code Block | ||||
---|---|---|---|---|
| ||||
<dare_qdc:qdc
xmlns:dare_qdc="http://dare.nl/dare_qdc:/2.0"
xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="
http://www.openarchives.org/OAI/2.0/oai_dc/
http://www.openarchives.org/OAI/2.0/oai_dc.xsd
http://purl.org/dc/elements/1.1/
http://dublincore.org/schemas/xmls/simpledc20021212.xsd"
>
|
For the <qdc:dc> schema use (not tested):
Code Block | ||||
---|---|---|---|---|
| ||||
<container_qdc:qualifieddc
xmlns:container_qdc="urn:dc:qdc:container"
xmlns:qdc="http://purl.org/dc/elements/1.1/"
xmlns:dterms="http://purl.org/dc/terms/"
xmlns:dcmitype="http://purl.org/dc/dcmitype/"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="
urn:dc:qdc:container
http://dublincore.org/schemas/xmls/qdc/2003/04/02/qualifieddc.xsd
http://purl.org/dc/elements/1.1/
http://dublincore.org/schemas/xmls/qdc/2003/04/02/dc.xsd
http://purl.org/dc/terms/
http://dublincore.org/schemas/xmls/qdc/2003/04/02/dcterms.xsd
http://purl.org/dc/dcmitype/
http://dublincore.org/schemas/xmls/qdc/2003/04/02/dcmitype.xsd"
>
|
...
When a DARE repository modifies either baseURL, setSpec, metadataPrefix or metadata schema's, that influences the DARE content cycle then: the concerning repository administrator must report this to the DARE community and the DARE harvester administrator in particular.
Anchor | ||||
---|---|---|---|---|
|
...