...

OAI-PMH repositories must be able to provide records with metadata expressed in Dublin Core. At a minimum, a rudimentary DC record (identifier and description) should be provided that describes the data offered under, and linked to by, a given identifier (see the identifier discussion above). When creating a DC record, follow the DRIVER guidelines.

Example

Warning: the XML excerpts given as illustrations in these guidelines do not necessarily contain all details regarding XML namespaces and XML schemas. This omitted information must nevertheless be included in actual implementations and must not be considered optional.


 

Code Block: OAI-PMH ListRecords (metadataPrefix=oai_dc)
<?xml version="1.0" encoding="UTF-8"?>
<OAI-PMH>
...
<record>
    <header> ... (compare notes about the record header)</header>
    <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
            xmlns:dc="http://purl.org/dc/elements/1.1/">
            <dc:identifier>ID2</dc:identifier>
            <dc:description>Usage Event Data for Server ... from ... until ...</dc:description>
        </oai_dc:dc>
    </metadata>
</record>
...
</OAI-PMH>
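
The minimal record above can also be generated programmatically. The following is a non-authoritative sketch using only the Python standard library: build_minimal_oai_dc is a hypothetical helper name, and it produces only the oai_dc:dc element, which a data provider would still have to wrap in the OAI-PMH record, header and metadata elements. The two namespace URIs are the standard ones for oai_dc and the Dublin Core element set.

```python
import xml.etree.ElementTree as ET

# Standard namespaces for OAI-PMH unqualified Dublin Core.
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Serialize with the conventional prefixes instead of ns0/ns1.
ET.register_namespace("oai_dc", OAI_DC_NS)
ET.register_namespace("dc", DC_NS)


def build_minimal_oai_dc(identifier: str, description: str) -> str:
    """Hypothetical helper: minimal oai_dc:dc holding dc:identifier and dc:description."""
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    ET.SubElement(root, f"{{{DC_NS}}}identifier").text = identifier
    ET.SubElement(root, f"{{{DC_NS}}}description").text = description
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_minimal_oai_dc("ID2", "Usage Event Data for Server ... from ... until ..."))
```

Registering the prefixes only affects serialization; any namespace-aware consumer treats the output as equivalent to a version using other prefixes.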

...

All data providers must support <context-objects> aggregations. No particular "metadataPrefix" is mandated, but the "metadataNamespace" and "schema" values are fixed for all implementations:

Code Block: OAI-PMH ListMetadataFormats
<?xml version="1.0" encoding="UTF-8"?>
<OAI-PMH>
...
<metadataFormat>
  <metadataPrefix>ctxo</metadataPrefix>
  <schema>http://www.openurl.info/registry/docs/xsd/info:ofi/fmt:xml:xsd:ctx</schema>
  <metadataNamespace>info:ofi/fmt:xml:xsd:ctx</metadataNamespace>
</metadataFormat>
...
</OAI-PMH>
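
Because only the namespace and schema are fixed, a harvester should locate the ContextObject format by those two values rather than by a prefix such as "ctxo". A minimal sketch, assuming the response carries the standard OAI-PMH namespace (omitted from the excerpt above, per the warning at the top); find_ctx_prefix is a hypothetical helper name:

```python
import xml.etree.ElementTree as ET

# Assumption: responses use the standard OAI-PMH 2.0 namespace.
NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}
# Fixed values from these guidelines; the metadataPrefix itself is not fixed.
CTX_NAMESPACE = "info:ofi/fmt:xml:xsd:ctx"
CTX_SCHEMA = "http://www.openurl.info/registry/docs/xsd/info:ofi/fmt:xml:xsd:ctx"


def find_ctx_prefix(list_metadata_formats_xml: str):
    """Hypothetical helper: return the prefix advertised for ContextObjects, or None."""
    root = ET.fromstring(list_metadata_formats_xml)
    for fmt in root.iterfind(".//oai:metadataFormat", NS):
        namespace = fmt.findtext("oai:metadataNamespace", default="", namespaces=NS).strip()
        schema = fmt.findtext("oai:schema", default="", namespaces=NS).strip()
        if namespace == CTX_NAMESPACE and schema == CTX_SCHEMA:
            return fmt.findtext("oai:metadataPrefix", default="", namespaces=NS).strip()
    return None
```

The returned prefix is then the value to pass as metadataPrefix in subsequent ListRecords requests.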

...

Since XML-encoded ContextObjects are defined as the format of the data exchanged via OAI-PMH, they must be embedded in OAI-PMH records in conformance with the protocol:

Code Block: Method 1: all ContextObjects in one OAI-PMH record (OAI-PMH ListRecords, metadataPrefix=ctxo)
<?xml version="1.0" encoding="UTF-8"?>
<OAI-PMH>
...
<record>
    <header>
        <identifier>urn:uuid:e5d037a0-633c-11df-a08a-0800200c9a66</identifier>
        <datestamp>2009-06-02T14:10:02Z</datestamp>
    </header>
    <metadata>
        <context-objects xmlns="info:ofi/fmt:xml:xsd:ctx">
            <context-object timestamp="2009-06-01T19:20:57Z">
              ...
            </context-object>
            <context-object timestamp="2009-06-01T19:21:07Z">
              ...
            </context-object>
        </context-objects>
    </metadata>
</record>
...
</OAI-PMH>
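
A harvester receiving records built according to method 1 can unpack the embedded ContextObjects as sketched below. This is an illustration under stated assumptions, not a reference implementation: responses are assumed to use the OAI-PMH namespace, records without a metadata element (for example deleted records) are skipped, and iter_context_objects is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Assumed namespaces: OAI-PMH 2.0 and the XML ContextObject format.
NS = {"oai": "http://www.openarchives.org/OAI/2.0/", "ctx": "info:ofi/fmt:xml:xsd:ctx"}


def iter_context_objects(list_records_xml: str):
    """Hypothetical helper: yield (record identifier, context-object element) pairs."""
    root = ET.fromstring(list_records_xml)
    for record in root.iterfind(".//oai:record", NS):
        record_id = record.findtext("oai:header/oai:identifier", namespaces=NS)
        metadata = record.find("oai:metadata", NS)
        if metadata is None:  # deleted records carry no metadata
            continue
        # All ContextObjects of one record share the same aggregation element.
        for ctx in metadata.iterfind("ctx:context-objects/ctx:context-object", NS):
            yield record_id, ctx
```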

...

This request activates a special tool that inspects the server logs and returns the requested data. These data are transferred as OpenURL ContextObject log entries, as part of a SUSHI response.
The response must repeat all the information from the request and provide the requested report as an XML payload.
The usage data are subsequently stored in a central database. External parties can obtain information about the contents of this central database through specially developed web services. The log harvester must ultimately expose these data in the form of COUNTER-compliant reports.
Listing 2 is an example of a SUSHI request, sent from the log aggregator to a repository.

Code Block: Listing 2
<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 xsi:schemaLocation="http://schemas.xmlsoap.org/soap/envelope/ http://schemas.xmlsoap.org/soap/envelope/">
 <soap:Body>
  <ReportRequest
   xmlns:ctr="http://www.niso.org/schemas/sushi/counter"
   xsi:schemaLocation="http://www.niso.org/schemas/sushi/counter
   http://www.niso.org/schemas/sushi/counter_sushi3_0.xsd"
   xmlns="http://www.niso.org/schemas/sushi"
   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" >
   <Requestor>
    <ID>www.logaggregator.nl</ID>
    <Name>Log Aggregator</Name>
    <Email>logaggregator@surf.nl</Email>
   </Requestor>
   <CustomerReference>
    <ID>www.leiden.edu</ID>
    <Name>Leiden University</Name>
   </CustomerReference>
   <ReportDefinition Release="urn:robots-v1.xml" Name="Daily Report v1">
    <Filters>
     <UsageDateRange>
      <Begin>2009-12-21</Begin>
      <End>2009-12-22</End>
     </UsageDateRange>
    </Filters>
   </ReportDefinition>
  </ReportRequest>
 </soap:Body>
</soap:Envelope>
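
The request in Listing 2 could be generated along the following lines. This minimal sketch uses only the Python standard library; the function names, the argument layout and the default Release value (copied from Listing 2) are illustrative assumptions. It serializes the SUSHI elements with a "sushi:" prefix, which is namespace-equivalent to the default-namespace form of the listing, and it omits the xsi:schemaLocation hints.

```python
import xml.etree.ElementTree as ET
from datetime import date, timedelta

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
SUSHI_NS = "http://www.niso.org/schemas/sushi"
ET.register_namespace("soap", SOAP_NS)
ET.register_namespace("sushi", SUSHI_NS)


def _sub(parent, tag, text=None, **attrs):
    """Append a SUSHI-namespaced child element with optional text and attributes."""
    child = ET.SubElement(parent, f"{{{SUSHI_NS}}}{tag}", attrs)
    if text is not None:
        child.text = text
    return child


def build_daily_report_request(day: date, requestor: dict, customer: dict,
                               release: str = "urn:robots-v1.xml") -> bytes:
    """Hypothetical helper: SOAP-wrapped ReportRequest with Begin = day, End = day + 1."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    request = _sub(body, "ReportRequest")

    req = _sub(request, "Requestor")
    for tag in ("ID", "Name", "Email"):
        _sub(req, tag, requestor[tag])
    cust = _sub(request, "CustomerReference")
    for tag in ("ID", "Name"):
        _sub(cust, tag, customer[tag])

    definition = _sub(request, "ReportDefinition", Release=release, Name="Daily Report v1")
    date_range = _sub(_sub(definition, "Filters"), "UsageDateRange")
    _sub(date_range, "Begin", day.isoformat())
    _sub(date_range, "End", (day + timedelta(days=1)).isoformat())
    return ET.tostring(envelope, encoding="utf-8", xml_declaration=True)
```

Deriving End from Begin means the aggregator can never accidentally request a range longer than one day.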

...

Note that the intent of the SUSHI request above is to retrieve all usage events that occurred on 21 December 2009. The SUSHI schema was originally developed for the exchange of COUNTER-compliant reports, and its documentation explains that COUNTER usage is only reported at the month level. In SURE, however, only daily reports can be provided. It is therefore assumed that the time implied by each date is 00:00, so the example request covers all usage events that occurred between 2009-12-21T00:00:00 and 2009-12-22T00:00:00.
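
The 00:00 convention amounts to one half-open interval per report. A minimal sketch, assuming UTC (the guidelines do not name a time zone) and assigning an event stamped exactly at 00:00 of the end date to the next day's report; daily_window and in_window are hypothetical names. The ValueError corresponds to the first error type described below.

```python
from datetime import date, datetime, time, timedelta, timezone


def daily_window(begin: date, end: date):
    """Map a SUSHI UsageDateRange to the interval [begin 00:00, end 00:00).

    Assumption: UTC. Raises ValueError unless the range spans exactly one day.
    """
    if end - begin != timedelta(days=1):
        raise ValueError("The range of dates that was provided is not valid. "
                         "Only daily reports are available.")
    start = datetime.combine(begin, time(0, 0), tzinfo=timezone.utc)
    stop = datetime.combine(end, time(0, 0), tzinfo=timezone.utc)
    return start, stop


def in_window(event: datetime, window) -> bool:
    """Half-open test: an event at exactly stop belongs to the next day's report."""
    start, stop = window
    return start <= event < stop


# The example request covers 21 December 2009.
window = daily_window(date(2009, 12, 21), date(2009, 12, 22))
```

The half-open form guarantees that consecutive daily reports neither overlap nor leave gaps.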
As explained previously, the repository can respond in four different ways. If the parameters of the request are valid and the requested report is available, the OpenURL ContextObjects are sent immediately, wrapped in a <Report> element, as can be seen in listing 3.

Code Block: Listing 3
<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
				xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
				xsi:schemaLocation="http://schemas.xmlsoap.org/soap/envelope/ http://schemas.xmlsoap.org/soap/envelope/">
	<soap:Body>
		<ReportResponse xmlns:ctr="http://www.niso.org/schemas/sushi/counter"
						xsi:schemaLocation="http://www.niso.org/schemas/sushi/counter http://www.niso.org/schemas/sushi/counter_sushi3_0.xsd"
						xmlns="http://www.niso.org/schemas/sushi"
						xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" >
			<Requestor>
				<ID>www.logaggregator.nl</ID>
				<Name>Log Aggregator</Name>
				<Email>logaggregator@surf.nl</Email>
			</Requestor>
			<CustomerReference>
				<ID>www.leiden.edu</ID>
				<Name>Leiden University</Name>
			</CustomerReference>
			<ReportDefinition Release="urn:DRv1" Name="Daily Report v1">
				<Filters>
					<UsageDateRange>
						<Begin>2009-12-22</Begin>
						<End>2009-12-23</End>
					</UsageDateRange>
				</Filters>
			</ReportDefinition>
			<Report>
				<context-objects xmlns="info:ofi/fmt:xml:xsd:ctx">
					<context-object>
						...
					</context-object>
					...
				</context-objects>
			</Report>
		</ReportResponse>
	</soap:Body>
</soap:Envelope>

...

If the begin date and the end date in the log aggregator's request span a period longer than one day, an error message must be sent. In the SUSHI schema, such messages may be sent in an <Exception> element. Three types of errors are distinguished, each identified by its own number under <Number>, and a human-readable error message is provided under <Message>.


 

Code Block: Listing 4
<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
				xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
				xsi:schemaLocation="http://schemas.xmlsoap.org/soap/envelope/ http://schemas.xmlsoap.org/soap/envelope/">
	<soap:Body>
		<ReportResponse xmlns:ctr="http://www.niso.org/schemas/sushi/counter"
						xsi:schemaLocation="http://www.niso.org/schemas/sushi/counter http://www.niso.org/schemas/sushi/counter_sushi3_0.xsd"
						xmlns="http://www.niso.org/schemas/sushi"
						xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" >
			<Requestor>
				<ID>www.logaggregator.nl</ID>
				<Name>Log Aggregator</Name>
				<Email>logaggregator@surf.nl</Email>
			</Requestor>
			<CustomerReference>
				<ID>www.leiden.edu</ID>
				<Name>Leiden University</Name>
			</CustomerReference>
			<ReportDefinition Release="urn:DRv1" Name="Daily Report v1">
				<Filters>
					<UsageDateRange>
						<Begin>2009-12-22</Begin>
						<End>2009-12-23</End>
					</UsageDateRange>
				</Filters>
			</ReportDefinition>
			<Exception>
				<Number>1</Number>
				<Message>The range of dates that was provided is not valid. Only daily reports are
				available.</Message>
			</Exception>
		</ReportResponse>
	</soap:Body>
</soap:Envelope>

...

A second type of error occurs when the file mentioned in the request cannot be accessed. In this situation, the response looks as follows:

Code Block: Listing 5
<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
				xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
				xsi:schemaLocation="http://schemas.xmlsoap.org/soap/envelope/ http://schemas.xmlsoap.org/soap/envelope/">
	<soap:Body>
		<ReportResponse xmlns:ctr="http://www.niso.org/schemas/sushi/counter"
						xsi:schemaLocation="http://www.niso.org/schemas/sushi/counter http://www.niso.org/schemas/sushi/counter_sushi3_0.xsd"
						xmlns="http://www.niso.org/schemas/sushi"
						xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" >
			<Requestor>
				<ID>www.logaggregator.nl</ID>
				<Name>Log Aggregator</Name>
				<Email>logaggregator@surf.nl</Email>
			</Requestor>
			<CustomerReference>
				<ID>www.leiden.edu</ID>
				<Name>Leiden University</Name>
			</CustomerReference>
			<ReportDefinition Release="urn:DRv1" Name="Daily Report v1">
				<Filters>
					<UsageDateRange>
						<Begin>2009-12-22</Begin>
						<End>2009-12-23</End>
					</UsageDateRange>
				</Filters>
			</ReportDefinition>
			<Exception>
				<Number>2</Number>
				<Message>The file describing the internet robots is not accessible.</Message>
			</Exception>
		</ReportResponse>
	</soap:Body>
</soap:Envelope>

...

When the repository is still producing the requested report, it sends a response very similar to listing 5. The estimated time of completion is provided in the <Data> element, which, according to the documentation of the SUSHI XML schema, may be used for any optional data.


 

Code Block: Listing 6
<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
				xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
				xsi:schemaLocation="http://schemas.xmlsoap.org/soap/envelope/ http://schemas.xmlsoap.org/soap/envelope/">
	<soap:Body>
		<ReportResponse xmlns:ctr="http://www.niso.org/schemas/sushi/counter"
						xsi:schemaLocation="http://www.niso.org/schemas/sushi/counter http://www.niso.org/schemas/sushi/counter_sushi3_0.xsd"
						xmlns="http://www.niso.org/schemas/sushi"
						xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" >
			<Requestor>
				<ID>www.logaggregator.nl</ID>
				<Name>Log Aggregator</Name>
				<Email>logaggregator@surf.nl</Email>
			</Requestor>
			<CustomerReference>
				<ID>www.leiden.edu</ID>
				<Name>Leiden University</Name>
			</CustomerReference>
			<ReportDefinition Release="urn:DRv1" Name="Daily Report v1">
				<Filters>
					<UsageDateRange>
						<Begin>2009-12-22</Begin>
						<End>2009-12-23</End>
					</UsageDateRange>
				</Filters>
			</ReportDefinition>
			<Exception>
				<Number>3</Number>
				<Message>The report is not yet available. The estimated time of completion is
				provided under "Data".</Message>
				<Data>2010-01-08T12:13:00+01:00</Data>
			</Exception>
		</ReportResponse>
	</soap:Body>
</soap:Envelope>
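
On the aggregator side, the four possible outcomes can be told apart by the presence of <Exception> and the value of its <Number>. A minimal sketch, assuming the SOAP and SUSHI namespaces used in the listings; classify_response, Outcome and the short labels are hypothetical, and the estimated completion time is only parsed for error type 3.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

NS = {"s": "http://www.niso.org/schemas/sushi"}

# Hypothetical short labels for the three error numbers used in these guidelines.
ERROR_LABELS = {
    1: "invalid date range (only daily reports)",
    2: "robot list not accessible",
    3: "report not yet available",
}


@dataclass
class Outcome:
    kind: str                            # "report" or "error"
    number: Optional[int] = None         # Exception/Number
    label: str = ""
    message: str = ""                    # Exception/Message, whitespace-normalized
    eta: Optional[datetime] = None       # Exception/Data, error type 3 only
    report: Optional[ET.Element] = None  # the <Report> element on success


def classify_response(envelope_xml: str) -> Outcome:
    root = ET.fromstring(envelope_xml)
    response = root.find(".//s:ReportResponse", NS)
    if response is None:
        raise ValueError("no SUSHI ReportResponse in the SOAP body")
    exception = response.find("s:Exception", NS)
    if exception is None:
        return Outcome(kind="report", report=response.find("s:Report", NS))
    number = int(exception.findtext("s:Number", namespaces=NS))
    message = " ".join(exception.findtext("s:Message", default="", namespaces=NS).split())
    data = exception.findtext("s:Data", namespaces=NS)
    eta = datetime.fromisoformat(data.strip()) if number == 3 and data else None
    return Outcome("error", number, ERROR_LABELS.get(number, "unknown"), message, eta)
```

For error type 3, an aggregator could schedule its next request for the returned time rather than polling.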

...

Info

In these guidelines, IP addresses are pseudonymized using a salted MD5 hash.
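
A minimal sketch of salted MD5 pseudonymization, formatted with the "data:," prefix used in Appendix A. Assumptions: the salt is a secret held by the repository and prepended to the address, and pseudonymize_ip is a hypothetical name. An MD5 hex digest is 32 characters long, whereas the sample values in Appendix A are 64 characters long, so implementers may want to confirm which hash function their aggregator expects.

```python
import hashlib


def pseudonymize_ip(ip: str, salt: bytes) -> str:
    """Hypothetical helper: salted MD5 pseudonym of an IP address as a data: URI.

    Assumption: the salt is secret and prepended. Without a secret salt, the
    small IPv4 address space could be brute-forced back to the original address.
    """
    digest = hashlib.md5(salt + ip.strip().encode("ascii")).hexdigest()
    return "data:," + digest
```

The same address and salt always yield the same pseudonym, so events from one requester can still be grouped without storing the address itself.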

Appendices

Code Block: Appendix A: Sample OpenURL Context Object File
<?xml version="1.0" encoding="UTF-8"?>
<context-objects xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xmlns:dcterms="http://dublincore.org/documents/2008/01/14/dcmi-terms/"
        xmlns:sv="info:ofi/fmt:xml:xsd:sch_svc"
        xsi:schemaLocation="info:ofi/fmt:xml:xsd:ctx http://www.openurl.info/registry/docs/info:ofi/fmt:xml:xsd:ctx"
        xmlns="info:ofi/fmt:xml:xsd:ctx">
    <context-object timestamp="2009-07-29T08:15:46+01:00" identifier="b06c0444f37249a0a8f748d3b823ef2a">

        <referent>
            <identifier>https://openaccess.leidenuniv.nl/bitstream/1887/12100/1/Thesis.pdf</identifier>
            <identifier>http://hdl.handle.net/1887/12100</identifier>
        </referent>

        <referring-entity>
            <identifier>http://www.google.nl/search?hl=nl&amp;q=beleidsregels+artikel+4%3A84&amp;meta="</identifier>
            <identifier>info:sid/google</identifier>
        </referring-entity>

        <requester>
            <identifier>data:,b505e629c508bdcfbf2a774df596123dd001cee172dae5519660b6014056f53a</identifier>

            <metadata-by-val>
                <format>http://dini.de/namespace/oas-requesterinfo</format>
                <metadata>
                    <requesterinfo xmlns="http://dini.de/namespace/oas-requesterinfo">
                        <hashed-ip>data:,b505e629c508bdcfbf2a774df596123dd001cee172dae5519660b6014056f53a</hashed-ip>
                        <hashed-c>data:,d001cee172dae5519660b6014056f5346d05e629c508bdcfbf2a774df596123d</hashed-c>
                        <hostname>uni-saarland.de</hostname>
                        <classification>institutional</classification>
                        <hashed-session>660b14056f5346d0</hashed-session>
                        <user-agent>mozilla/5.0 (windows; u; windows nt 5.1; de; rv:1.8.1.1) gecko/20061204</user-agent>
                    </requesterinfo>
                </metadata>
            </metadata-by-val>
        </requester>

        <service-type>
            <metadata-by-val>
                <format>http://dublincore.org/documents/2008/01/14/dcmi-terms/</format>
                <metadata>
                    <dcterms:format>info:eu-repo/semantics/objectFile</dcterms:format>
                </metadata>
            </metadata-by-val>
        </service-type>

        <resolver>
            <identifier>http://www.worldcat.org/libraries/53238</identifier>
        </resolver>

        <referrer>
            <identifier>info:sid/dlib.org:dlib</identifier>
        </referrer>

    </context-object>
</context-objects>
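
A log aggregator typically needs only a few fields from each ContextObject. The sketch below pulls the timestamp, referent identifiers, hashed IP and user agent from a file shaped like Appendix A; summarize_context_objects is a hypothetical name, and the choice of fields is an illustrative assumption.

```python
import xml.etree.ElementTree as ET

NS = {
    "ctx": "info:ofi/fmt:xml:xsd:ctx",
    "ri": "http://dini.de/namespace/oas-requesterinfo",
}


def summarize_context_objects(xml_text: str):
    """Hypothetical helper: yield one dict of selected fields per context-object."""
    root = ET.fromstring(xml_text)
    for ctx in root.iterfind("ctx:context-object", NS):
        # requesterinfo sits inside requester/metadata-by-val/metadata.
        info = ctx.find("ctx:requester/ctx:metadata-by-val/ctx:metadata/ri:requesterinfo", NS)
        yield {
            "timestamp": ctx.get("timestamp"),
            "referent": [(e.text or "").strip()
                         for e in ctx.iterfind("ctx:referent/ctx:identifier", NS)],
            "hashed_ip": info.findtext("ri:hashed-ip", namespaces=NS) if info is not None else None,
            "user_agent": info.findtext("ri:user-agent", namespaces=NS) if info is not None else None,
        }
```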
Code Block: Appendix B: Schema for Robot filter List
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">

	<xs:element name="exclusions">
		<xs:complexType>
			<xs:sequence>
				<xs:element ref="sources"/>
				<xs:element ref="robot-list"/>
			</xs:sequence>
			<xs:attributeGroup ref="attlist.exclusions"/>
		</xs:complexType>
	</xs:element>

	<xs:attributeGroup name="attlist.exclusions">
		<xs:attribute name="version" type="xs:string"/>
		<xs:attribute name="datestamp" type="xs:date"/>
	</xs:attributeGroup>

	<xs:element name="sources">
		<xs:complexType>
			<xs:sequence>
				<xs:element ref="source" minOccurs="0" maxOccurs="unbounded"/>
			</xs:sequence>
		</xs:complexType>
	</xs:element>

	<xs:element name="source">
		<xs:complexType>
			<xs:simpleContent>
				<xs:extension base="xs:string">
					<xs:attribute name="id" type="xs:ID" use="required"/>
					<xs:attribute name="name" type="xs:string"/>
					<xs:attribute name="version" type="xs:string"/>
					<xs:attribute name="datestamp" type="xs:date"/>
				</xs:extension>
			</xs:simpleContent>
		</xs:complexType>
	</xs:element>

	<xs:element name="sourceRef">
		<xs:complexType>
			<xs:simpleContent>
				<xs:extension base="xs:string">
					<xs:attribute name="id" type="xs:IDREF" use="required"/>
				</xs:extension>
			</xs:simpleContent>
		</xs:complexType>
	</xs:element>

	<xs:element name="robot-list">
		<xs:complexType>
			<xs:sequence>
				<xs:element ref="useragent" minOccurs="0" maxOccurs="unbounded"/>
			</xs:sequence>
		</xs:complexType>
	</xs:element>

	<xs:element name="useragent">
		<xs:complexType>
			<xs:sequence>
				<xs:element ref="regEx"/>
				<xs:element ref="sourceRef" minOccurs="0" maxOccurs="unbounded"/>
			</xs:sequence>
		</xs:complexType>
	</xs:element>

	<xs:element name="regEx" type="xs:string"/>

</xs:schema>
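
Full validation against this schema requires an XSD-aware parser, which the Python standard library does not provide. Where none is available, the schema's main cross-reference rule (every sourceRef id, an xs:IDREF, must point to a declared source id, an xs:ID) can at least be checked directly. A minimal sketch, not a substitute for schema validation; dangling_source_refs is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def dangling_source_refs(exclusions_xml: str) -> set:
    """Hypothetical helper: sourceRef ids that match no <source id="...">."""
    root = ET.fromstring(exclusions_xml)
    declared = {s.get("id") for s in root.iterfind("sources/source")}
    referenced = {r.get("id") for r in root.iterfind("robot-list/useragent/sourceRef")}
    return referenced - declared
```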


Code Block: Appendix C: Sample Robot filter list
<?xml version="1.0" encoding="UTF-8"?>

<exclusions xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
			xsi:noNamespaceSchemaLocation="robotlist.xsd"
			version="1.0"
			datestamp="2010-04-10">

	<sources>
		<source id="l1" name="COUNTER" version="R3" datestamp="2010-04-01">COUNTER list of internet robots</source>
		<source id="l2" name="PLOS">PLOS list of internet robots</source>
	</sources>

	<robot-list>
		<useragent>
			<regEx>[^a]fish</regEx>
			<sourceRef id="l2"/>
		</useragent>
		<useragent>
			<regEx>[+:,\.\;\/-]bot</regEx>
			<sourceRef id="l2"/>
		</useragent>
		<useragent>
			<regEx>acme\.spider</regEx>
			<sourceRef id="l2"/>
		</useragent>
		<useragent>
			<regEx>Brutus\/AET</regEx>
			<sourceRef id="l1"/>
			<sourceRef id="l2"/>
		</useragent>
		<useragent>
			<regEx>Code\sSample\sWeb\sClient</regEx>
			<sourceRef id="l1"/>
		</useragent>

	</robot-list>

</exclusions>
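
A robot list in this format can be applied to user-agent strings roughly as follows. This is a sketch under stated assumptions: each regEx is compiled with Python's re module and searched anywhere in the user agent, and matching is case-insensitive (the guidelines do not specify this, but the user agent in Appendix A is lowercased). load_robot_patterns and is_robot are hypothetical names.

```python
import re
import xml.etree.ElementTree as ET


def load_robot_patterns(exclusions_xml: str):
    """Hypothetical helper: compile every robot-list/useragent/regEx (no namespace, as in Appendix C)."""
    root = ET.fromstring(exclusions_xml)
    # Assumption: patterns are matched case-insensitively.
    return [re.compile(rx.text, re.IGNORECASE)
            for rx in root.iterfind("robot-list/useragent/regEx") if rx.text]


def is_robot(user_agent: str, patterns) -> bool:
    """True if any pattern occurs anywhere in the user agent (search, not full match)."""
    return any(p.search(user_agent) for p in patterns)
```

Usage events whose user agent makes is_robot return True would then be excluded before reports are compiled.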

...