...

Note

Usage events can serve as a control mechanism that lets the aggregation database check whether duplicate events have been harvested. The provenance information of the repository can be found in the <resolver> element, see section 4.1.6. - Peter

Note

Some internet fora argue that the same MD5 hash can be produced by different strings, so it is not globally unique and cannot be used as a primary key in a global system. SHA-1 is suggested as a better method for hashing the content (better protection against IP address hacking??); some prefer GUIDs/UUIDs for better global uniqueness. Others say that MD5 collisions occur very rarely. See http://stackoverflow.com/questions/221165/pros-and-cons-of-using-md5-hash-of-uri-as-the-primary-key-in-a-database
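The alternatives mentioned in this note can be compared with a minimal sketch. The function name `event_ids` and the sample key (a URL joined to a timestamp) are hypothetical and not part of these guidelines; the sketch only shows that all three approaches are deterministic, so the same input always yields the same identifier.

```python
import hashlib
import uuid


def event_ids(event_key: str) -> dict:
    """Derive candidate identifiers from one string describing a usage event."""
    data = event_key.encode("utf-8")
    return {
        # 128-bit digest, 32 hex characters
        "md5": hashlib.md5(data).hexdigest(),
        # 160-bit digest, 40 hex characters
        "sha1": hashlib.sha1(data).hexdigest(),
        # Name-based UUID (version 5, built on SHA-1), deterministic per input
        "uuid5": str(uuid.uuid5(uuid.NAMESPACE_URL, event_key)),
    }


# Hypothetical key: requested URL plus the time of the usage event
ids = event_ids("http://repository.example.org/record/123|2010-03-01T12:00:00Z")
```

Because the identifiers are deterministic, an event that is harvested twice produces the same value both times, which is what makes duplicate detection possible.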

Occurrences of child elements in <context-object>

...

The data exchange between a data provider and a log aggregator may be based on the widely established OAI Protocol for Metadata Harvesting (http://www.openarchives.org/OAI/openarchivesprotocol.html), referred to as OAI-PMH.
OAI-PMH was originally designed for the exchange of document metadata. Because usage data does not meet the general requirements of the metadata formats typically used, the standard has to be applied in a specific way to handle this kind of data.
In principle, the protocol specifies a data synchronisation mechanism which supports a reliable implementation of one-way data synchronisation. This functionality also fits the purpose of usage data transfer well.
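As an illustration of the one-way synchronisation, a log aggregator would typically issue selective `ListRecords` requests restricted by datestamp. The following is a minimal sketch of how such a request URL could be built; the base URL and the metadata prefix `ctxo` are placeholders chosen for this example, not values prescribed here.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix, from_=None, until=None):
    """Build an OAI-PMH ListRecords request URL for a selective harvest."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_:
        params["from"] = from_    # lower bound on the record datestamp
    if until:
        params["until"] = until   # upper bound on the record datestamp
    return base_url + "?" + urlencode(params)


# Placeholder endpoint and prefix: harvest everything stored since the last run
url = list_records_url(
    "http://repository.example.org/oai",
    "ctxo",
    from_="2010-03-01T00:00:00Z",
)
```

The `from` and `until` arguments select on the datestamp in the OAI-PMH record header, not on the time of the usage event itself.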

The document-centric approach of OAI-PMH results in the following central problems when applied to usage data:

Requirement for metadata record identifiers (see OAI-PMH, 2.4)

Data providers must issue identifiers for data records to formally comply with OAI-PMH. These identifiers must be valid URIs. These identifiers are not used by the log aggregator.
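Since the log aggregator does not use these identifiers, any scheme that yields valid URIs is sufficient. One possible approach, sketched below, is to mint `urn:uuid:` identifiers; the function names and the way the input string is composed are assumptions made for this example only.

```python
import uuid
from urllib.parse import urlparse


def record_identifier(provider_url: str, record_key: str) -> str:
    """Mint a deterministic urn:uuid: identifier for one OAI record."""
    # Name-based UUID, so the same record always gets the same identifier
    return uuid.uuid5(uuid.NAMESPACE_URL, f"{provider_url}#{record_key}").urn


def has_uri_scheme(value: str) -> bool:
    """Rough check only: a URI must at least carry a scheme."""
    return bool(urlparse(value).scheme)


# Placeholder provider URL and record key
identifier = record_identifier("http://repository.example.org/oai", "usage-2010-03-01")
```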

Datestamps for records (see OAI-PMH, 2.7.1, also see below regarding OAI datestamps)

OAI-PMH requires datestamps for all records of provided data. This information has to be kept separately from the datestamp of the usage event itself:

  • Datestamp within the usage data contained in the metadata part of the OAI record, i.e. within the Context Object's data: This is the time at which the actual usage event took place. Also see the notes in the example data set given later on.
  • Datestamp within the OAI-PMH record header: This is the time at which the Context Object or the Context Objects container was stored in the database which feeds the OAI-PMH interface.
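The difference between the two datestamps can be illustrated with a small sketch that reads both from one record. The sample record is invented for this example; it assumes the event time is carried in the `timestamp` attribute of a `<context-object>` element in the OpenURL ContextObject namespace, and it omits schema details.

```python
import xml.etree.ElementTree as ET

OAI = "http://www.openarchives.org/OAI/2.0/"
CTX = "info:ofi/fmt:xml:xsd:ctx"  # assumed ContextObject XML namespace

# Invented sample record, reduced to the parts relevant here
SAMPLE = f"""<record xmlns="{OAI}">
  <header>
    <identifier>urn:example:event:1</identifier>
    <datestamp>2010-03-02T04:00:00Z</datestamp>
  </header>
  <metadata>
    <context-object xmlns="{CTX}" timestamp="2010-03-01T12:34:56Z"/>
  </metadata>
</record>"""


def datestamps(record_xml: str):
    """Return (storage time from the OAI header, list of usage event times)."""
    root = ET.fromstring(record_xml)
    stored = root.findtext(f"{{{OAI}}}header/{{{OAI}}}datestamp")
    events = [co.get("timestamp") for co in root.iter(f"{{{CTX}}}context-object")]
    return stored, events


stored, events = datestamps(SAMPLE)
```

In this sample the usage event took place on 1 March, while the record was stored in the database feeding the OAI-PMH interface on 2 March; only the latter is visible to selective harvesting.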

Mandated metadata in Dublin Core (DC) format

This requirement may be lifted in the context of usage data since currently there is no direct use for this format itself. Nevertheless it is strongly recommended to implement it anyway to comply with the requirements for a standards-compatible OAI-PMH interface. It is advisable to offer the data at least as a rudimentary DC data set (identifier and description) which should describe the data offered and linked to by a certain identifier (see above regarding the identifier discussion).

Example

Warning: the XML excerpts given in these guidelines as illustrations do not necessarily contain all details regarding XML namespaces and XML schema. Nevertheless this omitted information is to be included in actual implementations and must not be considered optional.

<record>
  <header> ... (compare notes about the record header)</header>
  <metadata>
    <dc xmlns="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:dc="http://purl.org/dc/elements/1.1/">
      <identifier>ID2</identifier>
      <description>Usage Event Data for Server ... from ... until ...</description>
    </dc>
  </metadata>
</record>

The choice of identifiers also poses problems. According to the OAI-PMH specification, the identifier within the DC metadata set must link to the described document. When understood as metadata, the data contained in one <context-object> or in a <context-objects> aggregation is best described as metadata of the usage events in a given time frame. Those usage events, however, usually do not have identifiers of their own yet. To comply with the DC requirements as well, identifiers would have to be generated for those usage events too (ID2 in the excerpt above). So far, however, there seems to be no immediate use case for such identifiers. Therefore, in the context of these guidelines, offering DC metadata is not required.

...