Search Retrieval via URL for HBO Kennisbank

This application profile has a draft status. Be sure to check back regularly. Please direct any comments to the author of this page.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.

Introduction

Search/Retrieval via URL (SRU) is a search protocol standard maintained by the Library of Congress. SRU supports multiple transport mechanisms, among them SOAP (SRU via HTTP SOAP), which in versions prior to SRU 1.2 was known as the twin protocol Search/Retrieve Web Service (SRW).
SRU has its origins in the Z39.50 protocol and uses Contextual Query Language (CQL, previously known as Common Query Language), which is based on Z39.50 semantics, as its query language.

This document describes the implementation of SRU by the HBO Kennisbank as of June 28, 2011.

HBO Kennisbank

The HBO Kennisbank can be divided into roughly three distinct components:

  • The portal (http://www.hbo-kennisbank.nl), providing the user interface for end users;
  • The backend, containing the metadata store and index functionality exposed by the webservices (e.g. SRU and RSS) used by the portal and third parties;
  • The harvester, responsible for harvesting metadata via OAI-PMH from the underlying repositories and feeding the harvested metadata to the store in the backend.

The SRU interface is located at http://hbo-kennisbank.nl:8000/hbokennisbank/sru. Resolving this URL will trigger a default explain operation with version 1.1 and return

<?xml version="1.0" encoding="UTF-8"?>
<srw:explainResponse xmlns:srw="http://www.loc.gov/zing/srw/" xmlns:zr="http://explain.z3950.org/dtd/2.0/">
  <srw:version>1.1</srw:version>
  <srw:record>
    <srw:recordPacking>xml</srw:recordPacking>
    <srw:recordSchema>http://explain.z3950.org/dtd/2.0/</srw:recordSchema>
    <srw:recordData>
      <zr:explain>
        <zr:serverInfo protocol="SRU" version="1.1">
          <host>hbo-kennisbank.nl</host>
          <port>8000</port>
          <database>hbokennisbank/sru</database>
        </zr:serverInfo>
        <zr:databaseInfo>
          <title lang="en" primary="true">SRU Database</title>
          <description lang="en" primary="true">Meresco SRU</description>
        </zr:databaseInfo>
      </zr:explain>
    </srw:recordData>
  </srw:record>
</srw:explainResponse>

If you can't get a valid response from the SRU interface when resolving the URL above (e.g. you're getting a 404 or another error), check that your local or network firewall policy allows traffic to TCP port 8000.


The HBO Kennisbank uses the combination of DIDL and MODS as its primary metadata standard, as described by the HBO Bibliographic Metadata application profile. This metadata is harvested from the institutional repositories using OAI-PMH.

While the DIDL/MODS tandem provides us with a uniform metadata format for both theses and publications, Simple Dublin Core is still widely implemented in the underlying legacy repository landscape. Moreover, in recent developments in Enhanced Publications, RDF has been recognised as the future-proof interoperable data model for knowledge management.

Following this trend, the HBO Kennisbank translates records from the DIDL/MODS and legacy DC XML implementations into an RDF/XML-based superset.

SRU Mechanics

The SRU standard recognises three main operations:

  • Explain
  • Search/Retrieve
  • Scan

In this application profile, we will only describe the Search/Retrieve operation. The HBO Kennisbank does not support the Scan operation at this time.

The protocol also specifies three means of transport:

  • SRU via HTTP SOAP
  • SRU via HTTP POST
  • SRU via HTTP GET

The HBO Kennisbank has implemented support for the latter two. However, it is RECOMMENDED as good practice not to overload HTTP semantics by using POST to retrieve data, and to use the GET method for idempotent queries instead. For more information and exceptions, see RFC 2616 section 9.1.1 and Possible reasons to use "POST" for idempotent queries.

Search/Retrieve Operation

The Search/Retrieve operation request is made by specifying the searchRetrieve value in the operation request parameter. The use of the version request parameter is REQUIRED, as is the use of the query request parameter. Currently, the version of SRU supported by the HBO Kennisbank is 1.1.


A minimal Search/Retrieve request looks like

operation=searchRetrieve&version=1.1&query=foo
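
As an illustration, the request above could be assembled in Python as sketched below. This is a minimal sketch rather than a reference client: the helper name build_search_url is hypothetical, and the base URL is the SRU endpoint given earlier in this document. Using urlencode ensures that spaces and characters such as = inside the CQL query are percent-encoded correctly.

```python
from urllib.parse import urlencode

# SRU endpoint as stated in this application profile.
BASE_URL = "http://hbo-kennisbank.nl:8000/hbokennisbank/sru"


def build_search_url(query, version="1.1", **extra):
    """Return a searchRetrieve URL for a CQL query (hypothetical helper).

    Additional request parameters (e.g. maximumRecords) may be passed
    as keyword arguments and are appended in the order given.
    """
    params = {"operation": "searchRetrieve", "version": version, "query": query}
    params.update(extra)
    return BASE_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_search_url("foo"))
    print(build_search_url("dc.title = foo", maximumRecords=5))
```

The resulting URL can be resolved with any HTTP client using the GET method.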

This will generate a response that looks similar to

<?xml version="1.0" encoding="UTF-8"?>
<srw:searchRetrieveResponse xmlns:srw="http://www.loc.gov/zing/srw/" xmlns:diag="http://www.loc.gov/zing/srw/diagnostic/" xmlns:xcql="http://www.loc.gov/zing/cql/xcql/" xmlns:dc="http://purl.org/dc/elements/1.1/"> 
  <srw:version>1.1</srw:version>
  <srw:numberOfRecords>26</srw:numberOfRecords>
  <srw:records>
    <srw:record>
      <srw:recordSchema>rdf</srw:recordSchema>
      <srw:recordPacking>xml</srw:recordPacking>
      <srw:recordData>
        (..)
        <!-- some field containing 'foo' -->
        (..)
      </srw:recordData>
    </srw:record>
    <srw:record>
      <srw:recordSchema>rdf</srw:recordSchema>
      <srw:recordPacking>xml</srw:recordPacking>
      <srw:recordData>
        (..)
        <!-- some field containing 'foo' -->
        (..)
      </srw:recordData>
    </srw:record>
    (..)
  </srw:records>
  <srw:nextRecordPosition>11</srw:nextRecordPosition>
  <srw:echoedSearchRetrieveRequest>
    <srw:version>1.1</srw:version>
    <srw:query>foo</srw:query>
    <srw:startRecord>1</srw:startRecord>
    <srw:maximumRecords>10</srw:maximumRecords>
    <srw:recordPacking>xml</srw:recordPacking>
    <srw:recordSchema>rdf</srw:recordSchema>
  </srw:echoedSearchRetrieveRequest>
</srw:searchRetrieveResponse>
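
A client will typically want the total hit count, the number of records in the current response and the position of the next record. The sketch below reads these from a response using the Python standard library. The SAMPLE string is an abbreviated, hand-written stand-in for a real response, and the function name summarise is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace used by the SRU response elements shown above.
NS = {"srw": "http://www.loc.gov/zing/srw/"}

# Abbreviated stand-in for a real response (assumption for illustration).
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<srw:searchRetrieveResponse xmlns:srw="http://www.loc.gov/zing/srw/">
  <srw:version>1.1</srw:version>
  <srw:numberOfRecords>26</srw:numberOfRecords>
  <srw:records>
    <srw:record>
      <srw:recordSchema>rdf</srw:recordSchema>
      <srw:recordPacking>xml</srw:recordPacking>
      <srw:recordData/>
    </srw:record>
  </srw:records>
  <srw:nextRecordPosition>11</srw:nextRecordPosition>
</srw:searchRetrieveResponse>"""


def summarise(response_xml):
    """Return hit count, records in this response and next record position."""
    root = ET.fromstring(response_xml.encode("utf-8"))
    next_position = root.findtext("srw:nextRecordPosition", namespaces=NS)
    return {
        "total": int(root.findtext("srw:numberOfRecords", namespaces=NS)),
        "returned": len(root.findall("srw:records/srw:record", NS)),
        # nextRecordPosition may be absent, e.g. on the last page of results.
        "next": int(next_position) if next_position is not None else None,
    }


if __name__ == "__main__":
    print(summarise(SAMPLE))
```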

Querying

Queries in the SRU protocol are expressed in CQL. The SRU interface of the HBO Kennisbank supports queries up to CQL conformance level 1.
A CQL expression consists of one or more search clauses. A search clause MUST specify at least a term (e.g. foo). A clause MAY also specify the index against which the clause is evaluated (i.e. the field name, e.g. dc.title) and a relation (e.g. = for case-insensitive equality or exact for case-sensitive equality).

The syntax of a clause is

index relation term

or, applied to the index of the DC title field,

dc.title = foo

The above will return records with foo, Foo, FOO etc. in the DC title field.

Relations and Tokenisation

The exact relation SHOULD only be used against untokenised indices, as using it against tokenised indices will most likely not yield the results you expect. Suppose a record contains <foo>Bar</foo> and foo is a tokenised index. The query foo exact Bar will not return this record, because the tokenised value of foo for this record is bar (note the lower-case 'b'), so the capitalisation of the term does not match the indexed value under the case-sensitive exact relation. With the exception of drilldown, only the case-insensitive = relation SHOULD be used in queries.

For a list of the available indices, see the Indices section.


Multiple search clauses MAY be linked by Boolean operators. Supported operators are AND, OR and NOT. The operators are evaluated left-to-right. Parentheses MAY be used to group clauses and override the left-to-right evaluation. In the first example below, the AND is evaluated first; in the second, the parentheses cause the OR to be evaluated first.

dc.title = foo AND dc.creator = bar OR dc.creator = baz
dc.title = foo AND (dc.creator = bar OR dc.creator = baz)
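
When expressions are generated by software, making the grouping explicit avoids surprises from left-to-right evaluation. The following sketch composes the parenthesised expression above from individual clauses. The helper names clause and combine are hypothetical, and terms are assumed to be single words that need no quoting.

```python
# Boolean operators supported according to this profile.
OPERATORS = {"AND", "OR", "NOT"}


def clause(index, term, relation="="):
    """Build a single search clause of the form 'index relation term'."""
    return f"{index} {relation} {term}"


def combine(operator, *clauses, grouped=False):
    """Link clauses with one Boolean operator, optionally in parentheses."""
    if operator not in OPERATORS:
        raise ValueError(f"unsupported Boolean operator: {operator}")
    expression = f" {operator} ".join(clauses)
    return f"({expression})" if grouped else expression


if __name__ == "__main__":
    creators = combine(
        "OR",
        clause("dc.creator", "bar"),
        clause("dc.creator", "baz"),
        grouped=True,
    )
    print(combine("AND", clause("dc.title", "foo"), creators))
```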

Sorting

Query results MAY be sorted using the sortKeys request parameter. The syntax of the sortKeys request parameter is

sortKeys=key,schema,ascending

The key component is REQUIRED and the schema and ascending components are OPTIONAL.
In the HBO Kennisbank implementation, the schema component is not used and MUST be left empty. The ascending component is a Boolean value that defaults to true (ascending) if not provided. A descending sort on dc.date may be performed by passing 0, as in the following expression:

sortKeys=sort.dc.date,,0

Sorts may also be performed based on multiple keys. Multiple keys are separated by whitespace:

sortKeys=sort.rating.averageScore sort.dc.date

or, reversing the sort order of both keys,

sortKeys=sort.rating.averageScore,,0 sort.dc.date,,0
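
The sortKeys syntax above can be generated with a small helper, sketched here under the assumption that the schema component is always left empty, as this profile requires. The function name sort_keys is hypothetical.

```python
from urllib.parse import urlencode


def sort_keys(*keys):
    """Build a sortKeys value from (key, ascending) pairs.

    The schema component is always left empty. Ascending keys rely on
    the default and are emitted without the optional components.
    """
    parts = []
    for key, ascending in keys:
        parts.append(key if ascending else f"{key},,0")
    # Multiple keys are separated by whitespace.
    return " ".join(parts)


if __name__ == "__main__":
    value = sort_keys(("sort.rating.averageScore", False), ("sort.dc.date", False))
    print(value)
    # Percent-encode the value before placing it in a request URL.
    print(urlencode({"sortKeys": value}))
```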

Drilldown

The HBO Kennisbank provides a so-called drilldown interface which can be used to create 'search facets' or filters, such as those shown on the left when searching the HBO Kennisbank. These facets provide a count of unique values for a given field, e.g. the number of publications per author in the current query result.

A drilldown query MAY be performed by specifying a valid (i.e. untokenised) field in the x-term-drilldown request parameter. Additional fields MAY be specified, separated by commas.

x-term-drilldown=dc.creator,dc.publisher

The example above returns, in the extraResponseData element, a count for each of the first 10 unique authors and publishers in alphabetical order.

(..)
<srw:extraResponseData>
  <dd:drilldown xmlns:dd="http://meresco.org/namespace/drilldown" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://meresco.org/namespace/drilldown http://meresco.org/files/xsd/drilldown-20070730.xsd">
    <dd:term-drilldown>
      <dd:navigator name="dc.creator">
        <dd:item count="2">Barnes, O</dd:item>
        <dd:item count="1">Johnson, E</dd:item>
        <dd:item count="1">Saunders, A</dd:item>
        (..)
      </dd:navigator>
      <dd:navigator name="dc.publisher">
        <dd:item count="1">Avans</dd:item>
        <dd:item count="4">Fontys</dd:item>
        <dd:item count="2">Hogeschool van Amsterdam</dd:item>
        (..)
      </dd:navigator>
    </dd:term-drilldown>
  </dd:drilldown>
</srw:extraResponseData>
(..)

By default, the first 10 unique values are returned for each field. A different number MAY be specified by a colon followed by a positive integer directly after the index key. The example below increases the number of unique values returned for dc.creator to 20.

x-term-drilldown=dc.creator:20,dc.publisher
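
The sketch below shows one possible way to build the x-term-drilldown value and to read the facet counts from a drilldown response. The helper names drilldown_param and read_navigators are hypothetical, and SAMPLE is an abbreviated, hand-written fragment in the drilldown namespace used by the response in this section.

```python
import xml.etree.ElementTree as ET

# Drilldown namespace as used in the drilldown response in this section.
NS = {"dd": "http://meresco.org/namespace/drilldown"}

# Abbreviated stand-in for a real drilldown fragment (assumption).
SAMPLE = """<dd:drilldown xmlns:dd="http://meresco.org/namespace/drilldown">
  <dd:term-drilldown>
    <dd:navigator name="dc.creator">
      <dd:item count="2">Barnes, O</dd:item>
      <dd:item count="1">Johnson, E</dd:item>
    </dd:navigator>
  </dd:term-drilldown>
</dd:drilldown>"""


def drilldown_param(*fields):
    """Build an x-term-drilldown value.

    Each field is either an index name or an (index name, maximum) pair.
    """
    parts = []
    for field in fields:
        if isinstance(field, tuple):
            name, maximum = field
            parts.append(f"{name}:{maximum}")
        else:
            parts.append(field)
    return ",".join(parts)


def read_navigators(xml_fragment):
    """Map each navigator name to a list of (value, count) pairs."""
    root = ET.fromstring(xml_fragment)
    result = {}
    for navigator in root.iter("{%s}navigator" % NS["dd"]):
        result[navigator.get("name")] = [
            (item.text, int(item.get("count")))
            for item in navigator.findall("dd:item", NS)
        ]
    return result


if __name__ == "__main__":
    print(drilldown_param(("dc.creator", 20), "dc.publisher"))
    print(read_navigators(SAMPLE))
```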

When 'zooming in' on a value found in the drilldown, a clause matching the exact term from the drilldown results against the untokenised index of the specified field MUST be appended to the current query. Suppose that for the example above the CQL expression in the query was

dc.title = foo

and we're zooming in on the two publications from O. Barnes using the value from the dc.creator drilldown facet, then the new expression will look like

dc.title = foo AND drilldown.dc.creator exact "Barnes, O"

Note that the drilldown. prefix is used when referencing an untokenised drilldown index as part of the CQL expression in the query request parameter, whereas it is omitted when referencing the index in the x-term-drilldown request parameter.
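
The zoom-in step can be sketched as follows. The function names quote_term and zoom_in are hypothetical. The sketch assumes standard CQL quoting, in which a term containing whitespace or other special characters is enclosed in double quotes and embedded double quotes are escaped with a backslash.

```python
def quote_term(term):
    """Enclose a term in double quotes, escaping backslashes and quotes."""
    escaped = term.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def zoom_in(query, field, value):
    """Append an exact-match clause on the untokenised drilldown index.

    'field' is the name as used in x-term-drilldown (without prefix);
    the drilldown. prefix is added here because the index is referenced
    inside the CQL expression.
    """
    return f"{query} AND drilldown.{field} exact {quote_term(value)}"


if __name__ == "__main__":
    print(zoom_in("dc.title = foo", "dc.creator", "Barnes, O"))
```

Because operators are evaluated left-to-right, appending the clause with AND narrows the result of the whole preceding expression without requiring additional parentheses.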

Request Parameters

Parameter         Use             Value
operation         used, REQUIRED  string, searchRetrieve or explain
version           used, REQUIRED  string, 1.1
query             used, REQUIRED  string, valid CQL expression
startRecord       used, OPTIONAL  positive integer, default is 1
maximumRecords    used, OPTIONAL  positive integer, default is 10
recordPacking     used, OPTIONAL  string, xml or string, default is xml
recordSchema      used, OPTIONAL  string, pointer to known recordschema, default is rdf
recordXPath       not used
resultSetTTL      not used
sortKeys          used, OPTIONAL  string, pointer to one or more untokenised fields; keys separated by whitespace, key components by commas
stylesheet        not used
extraRequestData  used            see drilldown and extra record data
x-recordSchema    used, OPTIONAL  string, pointer to known recordschema
x-term-drilldown  used, OPTIONAL  string, indexname:maximumvalues, comma separated, e.g. dc.creator:20,dc.publisher
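
The startRecord and maximumRecords parameters together allow a client to page through a result set. As a minimal sketch, the startRecord values needed to retrieve all records can be derived from the numberOfRecords value in the first response. The function name page_starts is hypothetical.

```python
def page_starts(number_of_records, maximum_records=10):
    """Return the startRecord value for each page of a result set.

    Record positions are 1-based; maximum_records defaults to 10,
    the default listed for maximumRecords in this profile.
    """
    if maximum_records < 1:
        raise ValueError("maximum_records must be a positive integer")
    return list(range(1, number_of_records + 1, maximum_records))


if __name__ == "__main__":
    # For the 26 records of the example response: three pages.
    print(page_starts(26))
```

Alternatively, a client may simply follow the nextRecordPosition value from each response until it is no longer present.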

Recordschemas

Schema  Description
rdf     'Lingua franca' schema containing both DIDL/MODS and Dublin Core.
meta    Metadata from the harvesting process added by the harvester (e.g. OAI-PMH baseURL, metadataPrefix, set, repositoryId etc.).
dc      The schema for records using Dublin Core.
didl    The DIDL schema for records using DIDL/MODS.
mods    The MODS schema for records using DIDL/MODS.
social  Ratings and comments. A diagnostics error is provided for records for which no social metadata is available.

Indices

Unless explicitly stated otherwise, all indices contain tokenised values.

RDF

The indices based on RDF provide a superset of all records using the DIDL/MODS or DC format. Using the RDF-based indices SHOULD therefore be sufficient in almost all scenarios, and the DIDL/MODS or DC-based indices SHOULD NOT be used directly.

RDF.Description.accessRights
RDF.Description.classification
RDF.Description.classification.unparsable
RDF.Description.contributor.Person.affiliation
RDF.Description.contributor.Person.name
RDF.Description.contributor.Person.roleTerm
RDF.Description.creator.Person.affiliation
RDF.Description.creator.Person.affiliation.disallowed
RDF.Description.creator.Person.identifier
RDF.Description.creator.Person.name
RDF.Description.creator.Person.roleTerm
RDF.Description.date
RDF.Description.department
RDF.Description.description
RDF.Description.hostTitle
RDF.Description.language
RDF.Description.lectorate
RDF.Description.metadataNamespace
RDF.Description.place
RDF.Description.publisher
RDF.Description.publisher.Organization.identifier
RDF.Description.resourceLocation
RDF.Description.setting
RDF.Description.subTitle
RDF.Description.subject
RDF.Description.title
RDF.Description.type

The following fields are mapped for reasons of backwards compatibility and SHOULD NOT be used in new implementations.

RDF.Description.creator.Person.name -> dc.creator
RDF.Description.subject -> dc.subject
RDF.Description.type -> dc.type
RDF.Description.publisher -> dc.publisher

Sortkeys

sort.rating.averageScore # average rating
sort.dc.date             # datestamp from metadata
sort.comment.datestamp   # datestamp from last comment
sort.header.datestamp    # datestamp from OAI-PMH header.

Drilldown

The drilldown indices are untokenised, and queries SHOULD use the exact relation in order to yield the expected results.

When referring to the indices below in the x-term-drilldown request parameter, the drilldown. prefix MUST be omitted, i.e. drilldown.dc.creator MUST be referenced in the request as x-term-drilldown=dc.creator.

drilldown.RDF.Description.accessRights
drilldown.RDF.Description.classification
drilldown.RDF.Description.classification.unparsable
drilldown.RDF.Description.creator.Person.affiliation
drilldown.RDF.Description.creator.Person.affiliation.disallowed
drilldown.RDF.Description.creator.Person.roleTerm
drilldown.RDF.Description.department
drilldown.RDF.Description.lectorate
drilldown.dc.creator
drilldown.dc.publisher
drilldown.dc.type
drilldown.extra.hasInstitute
drilldown.extra.hasLectorate
drilldown.meta.repository.repositoryGroupId
drilldown.normalized.date.year
drilldown.normalized.date.year.unparsable
drilldown.normalized.language
drilldown.normalized.language.unparsable
drilldown.normalized.subject
drilldown.sort.dc.date.unparsable

See also

SRU Version 1.2 Specifications
Meresco Core Public Interfaces
HBO Bibliographic Metadata
