
2. Aspects to be recorded


A usage event takes place when a user downloads a document that is managed in an institutional repository, or when a user views the metadata associated with this document (see also figure 1).

...

Data element

Description

IP-address of requestor

Providing the full IP-address is not permitted by international privacy laws. For this reason, the IP-address needs to be encrypted.

C-class Subnet

Once the IP-address is encrypted, information such as the geographic location can no longer be derived from it. For this reason, the C-class subnet, being the three most significant bytes of the IP-address, must also be provided.

Geographic location

The country from which the request originated is also provided explicitly.

Persistent identifier of requested document

See also section 4.

URL of document

See also section 4.

Date and time of the request

 

Request Type

It must be clear if a document was downloaded or if its metadata was viewed.

Host name

The institution that is responsible for the repository in which the requested document is stored.

Usage event ID

Unique number for a specific usage event.

Referrer URL

The URL which was received from the referring entity, if it was used. This URL often contains the search terms that were typed in by the user.

Referrer name

A classification of the referrer, based on a short list of search engines.
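The two requester-related elements above can be illustrated with a minimal sketch. It assumes IPv4 addresses, writes the C-class subnet as the address with its last byte set to zero, and obscures the full address with a salted SHA-256 hash. The guidelines only say that the IP-address must be "encrypted"; the hash function, the salt and the function names below are assumptions made for illustration.

```python
import hashlib
import ipaddress

# Hypothetical salt; a real deployment would choose and protect its own value.
SALT = "example-salt"


def c_class_subnet(ip: str) -> str:
    """Keep the three most significant bytes of an IPv4 address, zero the last."""
    address = ipaddress.IPv4Address(ip)
    network = ipaddress.ip_network(f"{address}/24", strict=False)
    return str(network.network_address)


def obscure_ip(ip: str, salt: str = SALT) -> str:
    """One-way hash of the full address (assumed stand-in for 'encryption')."""
    return hashlib.sha256((salt + ip).encode("utf-8")).hexdigest()


if __name__ == "__main__":
    print(c_class_subnet("192.0.2.17"))  # 192.0.2.0
    print(obscure_ip("192.0.2.17"))
```

Because the subnet is supplied separately, a coarse location can still be derived centrally even though the full address is never transmitted.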



3. Source of information


All Dutch repositories make use of Apache server software for the maintenance of their repository websites. Each usage event that takes place generates an entry in the Apache logging files. These logging files will be used in the SURF SURE project as the primary source of information for usage statistics.

...

Figure 2 shows that the aspects mentioned at the end of section 2 can normally be derived relatively easily from the log file.
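As an illustration of how these aspects might be extracted, the sketch below parses a single log entry. It assumes the server writes the widely used Apache "combined" log format; repositories with a different LogFormat configuration would need a different pattern. The sample line, the document path and the function name are invented for this example.

```python
import re
from datetime import datetime

# Pattern for the Apache "combined" log format (assumed configuration).
COMBINED_LOG = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line: str):
    """Return the fields of one log entry as a dict, or None if it does not match."""
    match = COMBINED_LOG.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Apache writes timestamps such as 10/Oct/2009:13:55:36 +0200
    # (month names are parsed according to the current locale).
    fields["time"] = datetime.strptime(fields["time"], "%d/%b/%Y:%H:%M:%S %z")
    fields["status"] = int(fields["status"])
    return fields


# Invented sample entry, for illustration only.
SAMPLE = (
    '192.0.2.17 - - [10/Oct/2009:13:55:36 +0200] '
    '"GET /bitstream/1234/5678/1/thesis.pdf HTTP/1.1" 200 2326 '
    '"http://www.example.org/search?q=usage+statistics" "Mozilla/5.0"'
)

if __name__ == "__main__":
    print(parse_line(SAMPLE))
```

The requester's IP-address, the date and time, the requested path, the referrer URL and the user agent all fall out of a single entry; the usage event ID and the country have to be added afterwards.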


4. Formatting guidelines


Different institutions may have configured the logging facilities of their servers in different ways. As a result, there may be variations in the way in which, for instance, the time and the date are formatted. To avoid problems arising from such variations, this section provides guidelines on the format in which the mandatory and optional data elements need to be provided. A general principle, however, is that information should be passed along to the central database in as 'pure' a form as possible, so that analysis can take place centrally and consistently.

...

Geographic location

Description

The country from which the request originated.

Usage

Mandatory

Format

A two-letter code in lower case, following the ISO 3166-1 alpha-2 code.[2]

Example

ne

...

Usage Event ID

Description

Unique identification of the usage event. This identification must be generated; it cannot be derived from the Apache log file.

Usage

Mandatory

Format

The identifier is formed by combining the item, the date and a three-letter code for the institution. This combination is then hashed using MD5, so that the identifier becomes a 32-digit hexadecimal number.

Example

b06c0444f37249a0a8f748d3b823ef2a
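A minimal sketch of how such an identifier might be generated is given below. The guidelines do not state how the three parts are joined or how the date is written before hashing, so the separator, the argument formats and the function name are assumptions.

```python
import hashlib


def usage_event_id(item: str, timestamp: str, institution_code: str) -> str:
    """Combine item, date and institution code, then take the MD5 digest."""
    if len(institution_code) != 3:
        raise ValueError("institution code must be three letters")
    # Assumed joining rule: the parts separated by a vertical bar.
    raw = f"{item}|{timestamp}|{institution_code}"
    # hexdigest() yields 32 hexadecimal digits.
    return hashlib.md5(raw.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # Invented values, for illustration only.
    print(usage_event_id("/bitstream/1234/5678/1/thesis.pdf",
                         "2009-10-10T13:55:36+02:00", "abc"))
```

Identical inputs always give the same identifier, so the date needs enough precision (and the item enough specificity) to keep separate events apart.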


5. Normalisation

  • The SURE Statistics project will attempt to restrict its focus to requests which have consciously been initiated by human users. For this reason, automated visits by internet robots must be filtered from the data as much as possible. The Log Aggregator must maintain a file which lists the names of internet robots; individual repositories must use this file when filtering their results. The name of this file must indicate its version. The first file to be published will be named robots-v1.xml. Repositories can use the version indication in the filename to check whether they are working with the most recent list of internet robots.

...

  • If a single user clicks repeatedly on the same document within the same 24 hours, this should be counted as a single request.

...

  • A single publication may be split into a set of different files. The impact of such variations in the organisation of complex objects must be nullified. The consultation of a part should count towards the statistics of the whole. It should make no difference whether a publication consists of one PDF file or of multiple PDF files.
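The three rules above can be sketched as a single filtering pass. The sketch assumes that each event is a dict with the hypothetical keys "user", "publication", "type", "agent" and "time", that robots are recognised by a substring of the user agent, and that "within the same 24 hours" means within 24 hours of the last counted request. The structure of robots-v1.xml is not described here, so the robot names below are invented placeholders.

```python
from datetime import datetime, timedelta

# Placeholder names standing in for the entries of robots-v1.xml.
ROBOT_MARKERS = ("examplebot", "examplecrawler")


def is_robot(user_agent: str, markers=ROBOT_MARKERS) -> bool:
    """True if the user agent contains one of the listed robot names."""
    agent = user_agent.lower()
    return any(marker in agent for marker in markers)


def normalise(events, window=timedelta(hours=24)):
    """Drop robot visits and repeated requests inside the time window."""
    last_counted = {}
    kept = []
    for event in sorted(events, key=lambda e: e["time"]):
        if is_robot(event["agent"]):
            continue
        # Keyed on the publication, not the file, so that every part of a
        # multi-file publication counts towards the whole.
        key = (event["user"], event["publication"], event["type"])
        previous = last_counted.get(key)
        if previous is not None and event["time"] - previous < window:
            continue
        last_counted[key] = event["time"]
        kept.append(event)
    return kept
```

Keeping downloads and metadata views apart in the key is a design choice of this sketch; the rule as stated speaks only of repeated clicks on the same document.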


6. Data format

In compliance with the JISC Usage Statistics Review, individual usage events need to be serialized in XML using the syntax that is specified in the OpenURL Context Objects schema.[3] This section describes a recommended practice for the use of this schema.

...

  • Under <referent>, the two identifiers for the requested document must both be given in separate <identifier> elements.
  • The <referring-entity> element contains information on the referrer. The URL that was received from the referrer and the classification of the search engine, if one was used, must each be given in an <identifier> element.
  • The <requester>, the agent who has requested the <referent>, must be identified by the C-class subnet and the encrypted IP-address, each given in a separate <identifier> element. In addition, the name of the country where the request was initiated must be provided. The <metadata-by-val> element must be used for this purpose. The country must be given in <dcterms:spatial>. The dcterms namespace must be declared in the <format> element as well.
  • The DC metadata term "type" is used to clarify whether the usage event involved a download of an object file or a metadata view.
  • Finally, an <identifier> for the institution that provided access to the downloaded document must be given within <resolver>.
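The element layout described in the list above might be assembled as in the following sketch, which uses Python's standard xml.etree.ElementTree module. Several details are assumptions rather than part of the recommendation as quoted here: the timestamp and identifier attributes on the context object, the placement of the "type" term under a service-type element, the use of the dcterms namespace URI as the content of the format element, and every value in EXAMPLE_EVENT.

```python
import xml.etree.ElementTree as ET

CTX_NS = "info:ofi/fmt:xml:xsd:ctx"
DCTERMS_NS = "http://purl.org/dc/terms/"
ET.register_namespace("ctx", CTX_NS)
ET.register_namespace("dcterms", DCTERMS_NS)


def _child(parent, namespace, tag, text=None):
    """Append a namespaced child element, optionally with text content."""
    element = ET.SubElement(parent, f"{{{namespace}}}{tag}")
    if text is not None:
        element.text = text
    return element


def _metadata(parent, term, value):
    """Add <metadata-by-val> holding a single dcterms term."""
    by_val = _child(parent, CTX_NS, "metadata-by-val")
    _child(by_val, CTX_NS, "format", DCTERMS_NS)  # assumed content of <format>
    metadata = _child(by_val, CTX_NS, "metadata")
    _child(metadata, DCTERMS_NS, term, value)


def build_context_object(event) -> str:
    """Serialise one usage event (a dict) as a context object string."""
    ctx = ET.Element(f"{{{CTX_NS}}}context-object",
                     {"timestamp": event["timestamp"],
                      "identifier": event["event_id"]})
    referent = _child(ctx, CTX_NS, "referent")
    _child(referent, CTX_NS, "identifier", event["persistent_id"])
    _child(referent, CTX_NS, "identifier", event["url"])
    referring = _child(ctx, CTX_NS, "referring-entity")
    _child(referring, CTX_NS, "identifier", event["referrer_url"])
    _child(referring, CTX_NS, "identifier", event["referrer_name"])
    requester = _child(ctx, CTX_NS, "requester")
    _child(requester, CTX_NS, "identifier", event["c_class_subnet"])
    _child(requester, CTX_NS, "identifier", event["encrypted_ip"])
    _metadata(requester, "spatial", event["country"])
    service_type = _child(ctx, CTX_NS, "service-type")  # assumed placement
    _metadata(service_type, "type", event["request_type"])
    resolver = _child(ctx, CTX_NS, "resolver")
    _child(resolver, CTX_NS, "identifier", event["institution"])
    return ET.tostring(ctx, encoding="unicode")


# Invented values, for illustration only.
EXAMPLE_EVENT = {
    "timestamp": "2009-10-10T13:55:36+02:00",
    "event_id": "b06c0444f37249a0a8f748d3b823ef2a",
    "persistent_id": "info:example/persistent-id",
    "url": "http://repository.example.org/bitstream/1234/5678/1/thesis.pdf",
    "referrer_url": "http://www.example.org/search?q=usage+statistics",
    "referrer_name": "example-search-engine",
    "c_class_subnet": "192.0.2.0",
    "encrypted_ip": "placeholder-for-encrypted-ip",
    "country": "nl",
    "request_type": "download",
    "institution": "http://repository.example.org",
}

if __name__ == "__main__":
    print(build_context_object(EXAMPLE_EVENT))
```

The output should be validated against the Context Objects schema before it is relied upon, since the sketch does not check element order or identifier syntax.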

[1] http://www.jisc.ac.uk/whatwedo/programmes/digitalrepositories2007/usagestatisticsreview.aspx
[2] http://www.iso.org/iso/english_country_names_and_code_elements
[3] The XML Schema for XML Context Objects can be accessed at http://www.openurl.info/registry/docs/info:ofi/fmt:xml:xsd:ctx