
This requirement may be relaxed in the context of usage data, since there is currently no direct use for this format. Nevertheless, it is strongly recommended to implement it anyway in order to comply with the requirements for a standards-compatible OAI-PMH interface. It is advisable to offer the data at least as a rudimentary DC data set (identifier and description) that describes the data offered and linked to by a given identifier (see above regarding the identifier discussion).

Warning: the XML excerpts given in these guidelines as illustrations do not necessarily contain all details regarding XML namespaces and XML schema. Nevertheless, this omitted information is to be included in actual implementations and must not be considered optional.

Example:

{code:xml|collapse=true|linenumber=true|title=OAI-PMH record}
<?xml version="1.0" encoding="UTF-8"?>
<record>
    <header> ... <!-- compare the notes about the record header --> </header>
    <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
            xmlns:dc="http://purl.org/dc/elements/1.1/">
            <dc:identifier>ID2</dc:identifier>
            <dc:description>Usage Event Data for Server ... from ... until ...</dc:description>
        </oai_dc:dc>
    </metadata>
</record>
{code}

The choice of identifiers also poses problems: according to the OAI-PMH specification, the identifier within the DC metadata set must link to the described document. When understood as metadata, the data contained in one <context-object> or in a <context-objects> aggregation is best described as metadata of the usage events in a given time frame. Those usage events, however, usually do not yet have identifiers of their own. So, in order to comply with DC requirements as well, identifiers would have to be generated for those usage events (ID2 in the excerpt above). So far, however, there seems to be no immediate use case for such identifiers. Therefore, in the context of these guidelines, offering DC metadata is not required.

* _Usage of Sets_ (see OAI-PMH, 2.7.2): OAI-PMH optionally allows the offered data to be structured in "sets" to support selective harvesting. Currently, this possibility is not further specified in these guidelines. Future refinements may use this feature, e.g. for selecting usage data for certain services. Provenance information is already included in the Context Objects.

* _Datestamps, Granularity_ (see OAI-PMH, 2.7.1; also compare the notes about datestamps in the OAI-PMH record header versus datestamps within the Context Objects): The OAI-PMH specification allows either exact-to-the-second or exact-to-the-day granularity for record header datestamps, and data providers may choose either. The service provider will almost certainly rely on overlapping harvesting, i.e. the most recent datestamp of the harvested data is used as the "from" parameter of the next OAI-PMH query. As a result, the data provider will deliver some records that have already been harvested. Duplicate records are matched by their identifiers (those in the OAI-PMH record header) and are silently discarded if their datestamp has not been updated (see the notes below on deletion tracking).

It is strongly recommended to implement exact-to-the-second datestamps to keep the redundancy of the transferred data as low as possible: with day granularity, every incremental harvest re-transfers all records carrying the datestamp of the most recent day.
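
The overlapping harvest described above can be illustrated with a minimal sketch of an incremental ListRecords response. The base URL http://repository.example.org/oai, the responseDate and the second record's identifier and datestamp are hypothetical. The "from" value is assumed to be the most recent datestamp of the previous harvest, and exact-to-the-second granularity is assumed (announced in the Identify response as <granularity>YYYY-MM-DDThh:mm:ssZ</granularity>).

{code:xml|collapse=true|linenumber=true|title=overlapping harvest}
<?xml version="1.0" encoding="UTF-8"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
    <responseDate>2009-06-03T08:00:00Z</responseDate>
    <!-- "from" is the most recent datestamp seen in the previous harvest; base URL is hypothetical -->
    <request verb="ListRecords" metadataPrefix="ctxo"
             from="2009-06-02T14:10:02Z">http://repository.example.org/oai</request>
    <ListRecords>
        <record>
            <header>
                <!-- Harvested before: same identifier, unchanged datestamp; the service provider discards it -->
                <identifier>urn:uuid:fd23522e-c447-4801-9be4-c93c60a2d550</identifier>
                <datestamp>2009-06-02T14:10:02Z</datestamp>
            </header>
            <metadata> ... </metadata>
        </record>
        <record>
            <header>
                <!-- New since the previous harvest (hypothetical identifier and datestamp) -->
                <identifier>urn:uuid:0b7c5e2a-3f41-4d9e-8a6b-2c1d9e4f7a10</identifier>
                <datestamp>2009-06-02T18:45:10Z</datestamp>
            </header>
            <metadata> ... </metadata>
        </record>
    </ListRecords>
</OAI-PMH>
{code}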

* _Deletion tracking_ (see OAI-PMH, 2.5.1): OAI-PMH provides functionality for tracking the deletion of records. Compared to the classic use case of OAI-PMH (metadata of documents), the use case presented here involves data that is not subject to long-term storage. Tracking deletion events therefore does not seem critical, particularly since the data needed to track deletions would add up to a significant amount.

However, the service provider will accept information about deleted records and will eventually delete the referenced information in its own data store. This allows data providers to correct wrongly issued data (e.g. in case of technical problems).

It is important to note that old data which rotates out of the data offered by the data provider because of its age is not to be marked as deleted, for storage reasons. Such data is still valid usage data; it is merely no longer visible.

Whether a data provider uses deletion tracking has to be stated in its response to the "Identify" OAI-PMH query, in the <deletedRecords> field. Currently, the only values permitted by these guidelines are "transient" (when a data provider marks deleted records or reserves the possibility of doing so) and "no".

The possible cases are:

** Incorrect data which has already been offered by the data provider shall be corrected. There are two possibilities:

*** Re-issuing of a corrected set of data carrying the same identifier in the OAI-PMH record header as the set of data to be corrected, with an updated OAI-PMH record header datestamp.

*** When the correction is a full deletion of the incorrectly issued data, the OAI-PMH record has to be re-issued without a Context Object payload, with the attribute status="deleted" and an updated datestamp in the OAI-PMH record header.

** Records that fall out of the time frame for which the data provider offers data: these records are silently dropped, i.e. no longer offered via the OAI-PMH interface, without using the deletion tracking features of OAI-PMH.
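
For the case of a full deletion, a minimal sketch of the re-issued record follows: it keeps the identifier of the incorrectly issued record, carries an updated datestamp (the value shown is hypothetical) and the OAI-PMH header attribute status="deleted", and has no <metadata> element. A data provider issuing such records would announce <deletedRecords>transient</deletedRecords> in its Identify response.

{code:xml|collapse=true|linenumber=true|title=deleted record}
<record>
    <!-- Deletion is signalled by the status attribute of the OAI-PMH record header -->
    <header status="deleted">
        <!-- Same identifier as the incorrectly issued record -->
        <identifier>urn:uuid:fd23522e-c447-4801-9be4-c93c60a2d550</identifier>
        <!-- Updated datestamp (hypothetical value) so that the next incremental harvest picks up the deletion -->
        <datestamp>2009-06-05T09:30:00Z</datestamp>
    </header>
    <!-- No metadata element: the Context Object payload is omitted -->
</record>
{code}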

* _Metadata formats_ (see OAI-PMH, 3.4): All data providers must support either single <context-object> documents or <context-objects> aggregations. This choice also has to be announced by the data provider in its response to the "ListMetadataFormats" query (see OAI-PMH, 4.4). While a specific "metadataPrefix" is not required, the values of "metadataNamespace" and "schema" are fixed for implementations:
{code:xml|collapse=true|linenumber=true|title=metadataPrefix}
<metadataFormat>
    <metadataPrefix>ctxo</metadataPrefix>
    <schema>http://www.openurl.info/registry/docs/xsd/info:ofi/fmt:xml:xsd:ctx</schema>
    <metadataNamespace>info:ofi/fmt:xml:xsd:ctx</metadataNamespace>
</metadataFormat>
{code}

Info: When using OAI-PMH, the mandatory metadataPrefix for OpenURL Context Objects is "ctxo".
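
Since these guidelines also recommend offering rudimentary Dublin Core, a data provider's ListMetadataFormats response might, as a sketch, announce both formats. The base URL and responseDate are hypothetical; the oai_dc schema and namespace are the standard values defined by OAI-PMH.

{code:xml|collapse=true|linenumber=true|title=ListMetadataFormats}
<?xml version="1.0" encoding="UTF-8"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
    <responseDate>2009-06-03T08:00:00Z</responseDate>
    <!-- Hypothetical base URL -->
    <request verb="ListMetadataFormats">http://repository.example.org/oai</request>
    <ListMetadataFormats>
        <!-- Mandatory: OpenURL Context Objects -->
        <metadataFormat>
            <metadataPrefix>ctxo</metadataPrefix>
            <schema>http://www.openurl.info/registry/docs/xsd/info:ofi/fmt:xml:xsd:ctx</schema>
            <metadataNamespace>info:ofi/fmt:xml:xsd:ctx</metadataNamespace>
        </metadataFormat>
        <!-- Recommended: rudimentary Dublin Core -->
        <metadataFormat>
            <metadataPrefix>oai_dc</metadataPrefix>
            <schema>http://www.openarchives.org/OAI/2.0/oai_dc.xsd</schema>
            <metadataNamespace>http://www.openarchives.org/OAI/2.0/oai_dc/</metadataNamespace>
        </metadataFormat>
    </ListMetadataFormats>
</OAI-PMH>
{code}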

* _Inclusion of Context Objects in OAI-PMH records_: In accordance with the definition of XML-encoded Context Objects as the data format exchanged via OAI-PMH, the embedding is to be done in conformance with OAI-PMH:
{code:xml|collapse=true|linenumber=true|title=record 1}
<record>
    <header>
        <identifier>urn:uuid:fd23522e-c447-4801-9be4-c93c60a2d550</identifier>
        <datestamp>2009-06-02T14:10:02Z</datestamp>
    </header>
    <metadata>
        <context-objects xmlns="info:ofi/fmt:xml:xsd:ctx">
            <context-object> ... </context-object>
            <context-object> ... </context-object>
        </context-objects>
    </metadata>
</record>
{code}

In the example above, the OAI-PMH record is identified by a UUID in the form of a URI (see RFC 4122). When offering single <context-object> documents rather than an aggregation in a <context-objects> container as above, a conforming OAI-PMH record may look like the following:

{code:xml|collapse=true|linenumber=true|title=record 2}
<record>
    <header>
        <identifier>urn:uuid:fd23522e-c447-4801-9be4-c93c60a2d550</identifier>
        <datestamp>2009-06-02T14:10:02Z</datestamp>
    </header>
    <metadata>
        <context-object xmlns="info:ofi/fmt:xml:xsd:ctx"
            timestamp="2009-06-01T19:20:57Z"> ...
        </context-object>
    </metadata>
</record>
{code}

5.2. SUSHI

OAI-PMH is a relatively lightweight protocol that does not allow for bidirectional traffic. If more reliable error handling is required, the Standardized Usage Statistics Harvesting Initiative (SUSHI) must be used. SUSHI (http://www.niso.org/schemas/sushi/) was developed by NISO (National Information Standards Organization) in cooperation with COUNTER. This document assumes that the communication between the aggregator and the usage data provider takes place as shown in Figure 1.

Figure 1.
The interaction begins when the log aggregator sends a request for a report on the daily usage of a given repository. Two parameters must be sent as part of this request: (1) the date of the report and (2) the file name of the most recent robot filter. The file name given in this request is compared with the local file name. The repository can return four possible responses.
