This application profile has draft status; check back regularly for updates. Please direct any comments to the author of this page.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.
Introduction
Search/Retrieval via URL (SRU) is an search protocol standard maintained at the Library of Congress. SRU supports multiple transport mechanisms, amongst others by means of SOAP (SRU via HTTP SOAP) which in versions prior to SRU version 1.2 was known as its twin-protocol Search/Retrieve Web Service (SRW).
SRU has its origins in the Z39.50 protocol and implements Contextual Query Language (CQL, previously known as Common Query Language), which is based on Z39.50 semantics, as the query language.
This document describes the implementation of SRU by the HBO Kennisbank as of June 28 2011.
HBO Kennisbank
The HBO Kennisbank can be divided into roughly three distinct components:
- The portal (http://www.hbo-kennisbank.nl), providing the user interface for end users;
- The backend, containing the metadata store and the index functionality exposed through the web services (e.g. SRU and RSS) used by the portal and third parties;
- The harvester, responsible for harvesting metadata via OAI-PMH from the underlying repositories and feeding the harvested metadata to the store in the backend.
The SRU interface is located at http://hbo-kennisbank.nl:8000/hbokennisbank/sru. Resolving this URL triggers a default explain operation with version 1.1 and returns:
<?xml version="1.0" encoding="UTF-8"?>
<srw:explainResponse xmlns:srw="http://www.loc.gov/zing/srw/" xmlns:zr="http://explain.z3950.org/dtd/2.0/">
  <srw:version>1.1</srw:version>
  <srw:record>
    <srw:recordPacking>xml</srw:recordPacking>
    <srw:recordSchema>http://explain.z3950.org/dtd/2.0/</srw:recordSchema>
    <srw:recordData>
      <zr:explain>
        <zr:serverInfo protocol="SRU" version="1.1">
          <host>hbo-kennisbank.nl</host>
          <port>8000</port>
          <database>hbokennisbank/sru</database>
        </zr:serverInfo>
        <zr:databaseInfo>
          <title lang="en" primary="true">SRU Database</title>
          <description lang="en" primary="true">Meresco SRU</description>
        </zr:databaseInfo>
      </zr:explain>
    </srw:recordData>
  </srw:record>
</srw:explainResponse>
If you cannot get a valid response from the SRU interface when resolving the URL above (e.g. you receive a 404 or another error), check that your local or network firewall policy allows outgoing traffic to TCP port 8000.
The HBO Kennisbank uses a combination of DIDL and MODS as its primary metadata standard, as described by the HBO Bibliographic Metadata application profile. This metadata is harvested from the institutional repositories using OAI-PMH.
While the DIDL/MODS tandem provides a uniform metadata format for both theses and publications, Simple Dublin Core is still widely implemented in the underlying legacy repository landscape. Moreover, in recent developments around Enhanced Publications, RDF has been recognised as the future-proof, interoperable data model for knowledge management.
Following this trend, the HBO Kennisbank translates records from both the DIDL/MODS and the legacy DC XML implementations into an RDF/XML-based superset.
SRU Mechanics
The SRU standard recognises three main operations:
- Search/Retrieve;
- Explain; and
- Scan.
In this application profile we only describe the Search/Retrieve operation. The HBO Kennisbank does not support the Scan operation at this time.
The protocol also specifies three means of transport:
- the aforementioned SRU via HTTP SOAP;
- HTTP GET; and
- HTTP POST.
The HBO Kennisbank supports the latter two. However, it is RECOMMENDED as good practice not to overload HTTP semantics by using POST to retrieve data; use the GET method for idempotent queries instead. For more information and exceptions, see RFC 2616 section 9.1.1 and "Possible reasons to use "POST" for idempotent queries".
Search/Retrieve Operation
A Search/Retrieve request is made by specifying the value searchRetrieve in the operation request parameter. The version request parameter is REQUIRED, as is the query request parameter. Currently, the version of SRU supported by the HBO Kennisbank is 1.1.
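As a minimal sketch (Python, standard library only), a client might assemble these parameters into a GET URL as follows. The base URL is the one given for the SRU interface above; the helper name build_sru_url is hypothetical. Letting urlencode percent-encode the values matters as soon as a CQL expression contains spaces, quotes or =.

```python
from urllib.parse import quote, urlencode

# Base URL of the HBO Kennisbank SRU interface, as given in this profile.
SRU_BASE_URL = "http://hbo-kennisbank.nl:8000/hbokennisbank/sru"


def build_sru_url(query: str, **extra: str) -> str:
    """Build a searchRetrieve GET URL (hypothetical helper).

    operation, version and query are REQUIRED; other request parameters
    (e.g. maximumRecords, sortKeys) can be passed as keyword arguments.
    """
    params = {"operation": "searchRetrieve", "version": "1.1", "query": query}
    params.update(extra)
    # Percent-encode every value; quote() writes spaces as %20 rather than '+'.
    return SRU_BASE_URL + "?" + urlencode(params, quote_via=quote)


print(build_sru_url("foo"))
# http://hbo-kennisbank.nl:8000/hbokennisbank/sru?operation=searchRetrieve&version=1.1&query=foo
```

Parameter names that are not valid Python identifiers, such as x-term-drilldown, can be passed with dict unpacking, e.g. build_sru_url("foo", **{"x-term-drilldown": "dc.creator"}).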
A minimal Search/Retrieve request looks like
operation=searchRetrieve&version=1.1&query=foo
This will generate a response that looks similar to
<?xml version="1.0" encoding="UTF-8"?>
<srw:searchRetrieveResponse xmlns:srw="http://www.loc.gov/zing/srw/" xmlns:diag="http://www.loc.gov/zing/srw/diagnostic/" xmlns:xcql="http://www.loc.gov/zing/cql/xcql/" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <srw:version>1.1</srw:version>
  <srw:numberOfRecords>26</srw:numberOfRecords>
  <srw:records>
    <srw:record>
      <srw:recordSchema>rdf</srw:recordSchema>
      <srw:recordPacking>xml</srw:recordPacking>
      <srw:recordData>
        (..) <!-- some field containing 'foo' --> (..)
      </srw:recordData>
    </srw:record>
    <srw:record>
      <srw:recordSchema>rdf</srw:recordSchema>
      <srw:recordPacking>xml</srw:recordPacking>
      <srw:recordData>
        (..) <!-- some field containing 'foo' --> (..)
      </srw:recordData>
    </srw:record>
    (..)
  </srw:records>
  <srw:nextRecordPosition>11</srw:nextRecordPosition>
  <srw:echoedSearchRetrieveRequest>
    <srw:version>1.1</srw:version>
    <srw:query>foo</srw:query>
    <srw:startRecord>1</srw:startRecord>
    <srw:maximumRecords>10</srw:maximumRecords>
    <srw:recordPacking>xml</srw:recordPacking>
    <srw:recordSchema>rdf</srw:recordSchema>
  </srw:echoedSearchRetrieveRequest>
</srw:searchRetrieveResponse>
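The response above contains what a client needs for paging: the total hit count, the records on the current page and the position at which the next page starts. The following Python sketch reads these values with xml.etree.ElementTree; the function name and the returned dictionary layout are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Namespace of the SRU 1.1 response elements, as declared in the example response.
SRW = {"srw": "http://www.loc.gov/zing/srw/"}


def parse_search_response(xml_text: str) -> dict:
    """Extract hit count, page size and the next start position (sketch)."""
    root = ET.fromstring(xml_text)
    next_position = root.findtext("srw:nextRecordPosition", namespaces=SRW)
    return {
        "numberOfRecords": int(root.findtext("srw:numberOfRecords", "0", SRW)),
        "recordsOnPage": len(root.findall("srw:records/srw:record", SRW)),
        # SRU omits nextRecordPosition when no further records remain.
        "nextRecordPosition": int(next_position) if next_position else None,
    }
```

To fetch the next page, a client would repeat the request with startRecord set to the returned nextRecordPosition (11 in the example response) until that value is None.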
Querying
In SRU, queries are expressed in CQL. The SRU interface of the HBO Kennisbank supports queries up to CQL conformance level 1.
A CQL expression consists of one or more search clauses. A search clause MUST specify at least a term (e.g. foo). A clause MAY also specify the index it queries (i.e. the field name, e.g. dc.title) and a relation operator (i.e. = for case-insensitive equality or exact for case-sensitive equality).
The syntax of a clause is
index relation term
or, applied to the index of the DC title field:
dc.title = foo
The above returns records with foo, Foo, FOO etc. in the DC title field.
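When clauses are generated from user input, terms that contain whitespace or CQL special characters have to be double-quoted. The Python sketch below approximates the CQL quoting rules under stated assumptions: it treats whitespace, ( ) = < > " and / as characters requiring quotes, quotes the reserved words and, or, not and prox, and escapes embedded double quotes with a backslash. The helper names are hypothetical.

```python
import re

# Characters assumed not to be allowed in an unquoted CQL term: whitespace,
# parentheses, relation symbols, double quotes and '/' (relation modifiers).
_NEEDS_QUOTES = re.compile(r'[\s()=<>"/]')
# Words that would otherwise be read as CQL Boolean or proximity operators.
_RESERVED = {"and", "or", "not", "prox"}


def cql_term(term: str) -> str:
    """Return term as a CQL search term, double-quoting it only when needed."""
    if term and not _NEEDS_QUOTES.search(term) and term.lower() not in _RESERVED:
        return term
    # Inside a quoted term, a double quote must be escaped with a backslash.
    # Other backslashes are passed through so escapes such as \* keep working.
    return '"' + term.replace('"', '\\"') + '"'


def cql_clause(index: str, relation: str, term: str) -> str:
    """Build an 'index relation term' search clause."""
    return f"{index} {relation} {cql_term(term)}"


print(cql_clause("dc.title", "=", "foo"))      # dc.title = foo
print(cql_clause("dc.title", "=", "foo bar"))  # dc.title = "foo bar"
```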
Relations and Tokenisation
The exact relation SHOULD only be used against untokenised indices; used against tokenised indices, it is unlikely to yield the results you expect. For example, given a record containing <foo>Bar</foo> where foo is a tokenised index, the query foo exact Bar will not return this record, because the tokenised value of foo for this record is bar (note the lower-case 'b'). The capitalisation of the term therefore does not match the indexed value for the case-sensitive exact relation. With the exception of drilldown, only the case-insensitive = relation SHOULD be used in queries.
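To illustrate why exact fails on tokenised indices, the following Python sketch models the two kinds of index. The tokeniser is a deliberate simplification (lower-casing and splitting on non-word characters); the actual tokenisation rules of the backend are not described in this profile and may differ, and all function names are hypothetical.

```python
from __future__ import annotations

import re


def tokenise(value: str) -> list[str]:
    """Simplified tokeniser (assumption): lower-case, then split on non-word characters."""
    return [token for token in re.split(r"\W+", value.lower()) if token]


def matches_equals(tokenised_index: list[str], term: str) -> bool:
    """'=' relation: the term is normalised like the index, so case is ignored."""
    return all(token in tokenised_index for token in tokenise(term))


def matches_exact(index_values: list[str], term: str) -> bool:
    """'exact' relation: the term must equal a stored value character for character."""
    return term in index_values


stored = "Bar"                  # the record contains <foo>Bar</foo>
tokenised = tokenise(stored)    # ['bar']
untokenised = [stored]          # ['Bar'], as in a drilldown index

print(matches_equals(tokenised, "Bar"))   # True
print(matches_exact(tokenised, "Bar"))    # False: 'Bar' does not equal 'bar'
print(matches_exact(untokenised, "Bar"))  # True
```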
For a list of the available indices, see the Indices section.
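Multiple search clauses MAY be linked by Boolean operators. Supported operators are AND, OR and NOT. The operators are evaluated left-to-right; parentheses MAY be used to group clauses and override the left-to-right evaluation. For example, the following expression is evaluated as (dc.title = foo AND dc.creator = bar) OR dc.creator = baz:
dc.title = foo AND dc.creator = bar OR dc.creator = baz
whereas the parentheses below restrict the results to records with foo in the title and either bar or baz as creator:
dc.title = foo AND (dc.creator = bar OR dc.creator = baz)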
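A client that builds expressions programmatically can avoid surprises from left-to-right evaluation by always parenthesising nested Boolean expressions. The Python sketch below represents an expression as a small tree; the class names Clause and BoolExpr are hypothetical, and terms are assumed to need no quoting.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Clause:
    """A single 'index relation term' search clause (term assumed not to need quoting)."""
    index: str
    relation: str
    term: str

    def to_cql(self, nested: bool = False) -> str:
        # A single clause never needs parentheses.
        return f"{self.index} {self.relation} {self.term}"


@dataclass(frozen=True)
class BoolExpr:
    """Two sub-expressions joined by AND, OR or NOT."""
    operator: str
    left: Clause | BoolExpr
    right: Clause | BoolExpr

    def to_cql(self, nested: bool = False) -> str:
        text = f"{self.left.to_cql(True)} {self.operator} {self.right.to_cql(True)}"
        # Parenthesise every nested Boolean expression so the result does not
        # depend on CQL's left-to-right evaluation of the operators.
        return f"({text})" if nested else text


title = Clause("dc.title", "=", "foo")
by_bar = Clause("dc.creator", "=", "bar")
by_baz = Clause("dc.creator", "=", "baz")

print(BoolExpr("AND", title, BoolExpr("OR", by_bar, by_baz)).to_cql())
# dc.title = foo AND (dc.creator = bar OR dc.creator = baz)
print(BoolExpr("OR", BoolExpr("AND", title, by_bar), by_baz).to_cql())
# (dc.title = foo AND dc.creator = bar) OR dc.creator = baz
```

The second output is equivalent to the unparenthesised first example above; the explicit parentheses just make the evaluation order visible.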
Sorting
Query results MAY be sorted using the sortKeys request parameter. The syntax of this parameter is
sortKeys=key,schema,ascending
The key parameter is REQUIRED; the schema and ascending parameters are OPTIONAL.
In the HBO Kennisbank implementation, the schema parameter is not used and MUST be left empty. The ascending parameter is a boolean value, given as 1 (true) or 0 (false); it defaults to true (ascending order) if not provided. A descending sort on the date may be performed using the following expression:
sortKeys=sort.dc.date,,0
Sorts may also be performed on multiple keys, separated by whitespace:
sortKeys=sort.rating.averageScore sort.dc.date
or, to reverse both sorts:
sortKeys=sort.rating.averageScore,,0 sort.dc.date,,0
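The sortKeys syntax above can be generated from (key, ascending) pairs. In this Python sketch (the helper name sort_keys is hypothetical), ascending keys are written bare, since ascending is the default, and descending keys get an empty schema and 0, matching the examples above.

```python
from __future__ import annotations


def sort_keys(*keys: tuple[str, bool]) -> str:
    """Build a sortKeys value from (key, ascending) pairs (hypothetical helper).

    Ascending keys are written as the bare key, since ascending is the default.
    Descending keys are written as key,,0: the schema part is left empty, as this
    implementation requires. Keys are separated by a single space.
    """
    parts = []
    for key, ascending in keys:
        parts.append(key if ascending else f"{key},,0")
    return " ".join(parts)


print(sort_keys(("sort.dc.date", False)))
# sort.dc.date,,0
print(sort_keys(("sort.rating.averageScore", True), ("sort.dc.date", True)))
# sort.rating.averageScore sort.dc.date
```

When placed in a GET URL, the separating space must be percent-encoded (%20 or +), which urllib.parse.urlencode does automatically.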
Drilldown
The HBO Kennisbank provides a so-called drilldown interface which can be used to create ‘search facets’ or filters, such as those shown on the left-hand side of the search results on the HBO Kennisbank portal. These facets provide a count of unique values for a given field, e.g. the number of publications per author in the current query result.
A drilldown query MAY be performed by specifying a valid (i.e. untokenised) field in the x-term-drilldown request parameter. Additional fields MAY be specified, separated by commas:
x-term-drilldown=dc.creator,dc.publisher
The example above returns, in the extraResponseData element, a count for each of the first 10 unique authors and publishers in alphabetical order:
(..)
<srw:extraResponseData>
  <dd:drilldown xmlns:dd="http://meresco.org/namespace/drilldown" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://meresco.org/namespace/drilldown http://meresco.org/files/xsd/drilldown-20070730.xsd">
    <dd:term-drilldown>
      <dd:navigator name="dc.creator">
        <dd:item count="2">Barnes, O</dd:item>
        <dd:item count="1">Johnson, E</dd:item>
        <dd:item count="1">Saunders, A</dd:item>
        (..)
      </dd:navigator>
      <dd:navigator name="dc.publisher">
        <dd:item count="1">Avans</dd:item>
        <dd:item count="4">Fontys</dd:item>
        <dd:item count="2">Hogeschool van Amsterdam</dd:item>
        (..)
      </dd:navigator>
    </dd:term-drilldown>
  </dd:drilldown>
</srw:extraResponseData>
(..)
The number of unique values returned per field MAY be specified by a colon followed by a positive integer directly after the index key. The example below increases the number of unique values returned for dc.creator to 20.
x-term-drilldown=dc.creator:20,dc.publisher
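A client rendering facets has to turn the drilldown part of the response into (value, count) pairs per field. The Python sketch below does this with xml.etree.ElementTree, using the drilldown namespace shown in the example response; the function name parse_drilldown is hypothetical.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Namespace of the drilldown elements, as declared in the example response.
DD = "http://meresco.org/namespace/drilldown"


def parse_drilldown(xml_text: str) -> dict[str, list[tuple[str, int]]]:
    """Map each drilldown field to its (value, count) pairs, in response order."""
    root = ET.fromstring(xml_text)
    facets: dict[str, list[tuple[str, int]]] = {}
    # iter() searches the whole tree, so this accepts either the complete
    # searchRetrieveResponse or just the dd:drilldown fragment.
    for navigator in root.iter(f"{{{DD}}}navigator"):
        facets[navigator.get("name")] = [
            (item.text or "", int(item.get("count", "0")))
            for item in navigator.findall(f"{{{DD}}}item")
        ]
    return facets
```

For the example response, parse_drilldown would map "dc.creator" to [("Barnes, O", 2), ("Johnson, E", 1), ("Saunders, A", 1), ...]; each value can be used unchanged when zooming in, as described next.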
When 'zooming in' on a value found in the drilldown, the current query MUST be extended with a clause that matches the term from the drilldown results, using the exact relation, against the untokenised index of the specified field. Say that, for the example above, the CQL expression in the query looked like
dc.title = foo
and we are zooming in on the two publications by O. Barnes using the value from the dc.creator drilldown facet; the new expression will then look like
dc.title = foo AND drilldown.dc.creator exact "Barnes, O"
Note that the drilldown. prefix is used when referencing an untokenised drilldown index in the CQL expression of the query request parameter, but is omitted when referencing the same index in the x-term-drilldown request parameter.
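The zoom-in step can be sketched in Python as follows. The helper name zoom_in is hypothetical; it takes the field name as used in x-term-drilldown, adds the drilldown. prefix for use in the query, and always quotes the value, since drilldown values such as Barnes, O contain spaces and commas.

```python
def zoom_in(query: str, field: str, value: str) -> str:
    """Restrict query to one drilldown value (hypothetical helper).

    field is the name as used in x-term-drilldown (e.g. dc.creator); the
    drilldown. prefix is added here because the clause goes into the query.
    """
    # Always quote the value; embedded double quotes are backslash-escaped.
    escaped = value.replace('"', '\\"')
    # CQL evaluates Boolean operators left-to-right, so a trailing
    # 'AND clause' applies to the whole preceding expression.
    return f'{query} AND drilldown.{field} exact "{escaped}"'


print(zoom_in("dc.title = foo", "dc.creator", "Barnes, O"))
# dc.title = foo AND drilldown.dc.creator exact "Barnes, O"
```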
Request Parameters
| Parameter | Use | Value |
|---|---|---|
| | used, REQUIRED | string, |
| | used, REQUIRED | string, |
| | used, REQUIRED | string, valid CQL expression |
| | used, OPTIONAL | positive integer, default is |
| | used, OPTIONAL | positive integer, default is |
| | used, OPTIONAL | string, |
| | used, OPTIONAL | string, pointer to known recordschema, default is |
| | not used | |
| | not used | |
| | used, OPTIONAL | string, pointer to one or more untokenised fields, comma and whitespace separated |
| | not used | |
| | used, see drilldown and extra record data | |
| | used, OPTIONAL | string, pointer to known recordschema |
| | used, OPTIONAL | string, |
Recordschemas
| Schema | Description |
|---|---|
| | 'Lingua franca' schema containing both DIDL/MODS and Dublin Core. |
| | Metadata from the harvesting process added by the harvester (e.g. OAI-PMH baseURL, metadataPrefix, set, repositoryId etc.). |
| | The schema for records using Dublin Core. |
| | The DIDL schema for records using DIDL/MODS. |
| | The MODS schema for records using DIDL/MODS. |
| | Ratings and comments. A diagnostic error is returned for records for which no social metadata is available. |
Indices
Unless explicitly stated otherwise, all indices contain tokenised values.
RDF
The RDF-based indices provide a superset of all records using the DIDL/MODS or DC format. Using the RDF-based indices SHOULD therefore be sufficient in almost all scenarios, and the DIDL/MODS- or DC-based indices SHOULD NOT be used directly.
RDF.Description.accessRights
RDF.Description.classification
RDF.Description.classification.unparsable
RDF.Description.contributor.Person.affiliation
RDF.Description.contributor.Person.name
RDF.Description.contributor.Person.roleTerm
RDF.Description.creator.Person.affiliation
RDF.Description.creator.Person.affiliation.disallowed
RDF.Description.creator.Person.identifier
RDF.Description.creator.Person.name
RDF.Description.creator.Person.roleTerm
RDF.Description.date
RDF.Description.department
RDF.Description.description
RDF.Description.hostTitle
RDF.Description.language
RDF.Description.lectorate
RDF.Description.metadataNamespace
RDF.Description.place
RDF.Description.publisher
RDF.Description.publisher.Organization.identifier
RDF.Description.resourceLocation
RDF.Description.setting
RDF.Description.subTitle
RDF.Description.subject
RDF.Description.title
RDF.Description.type
The following fields are mapped for reasons of backwards compatibility and SHOULD NOT be used in new implementations.
RDF.Description.creator.Person.name -> dc.creator
RDF.Description.subject -> dc.subject
RDF.Description.type -> dc.type
RDF.Description.publisher -> dc.publisher
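Existing queries that still use these legacy aliases could be migrated mechanically. The Python sketch below (the helper name modernise_query is hypothetical) rewrites the four aliases listed above to their RDF-based indices. It works on the raw query string with a regular expression, so, as a known limitation, it would also rewrite an alias that happens to appear inside a quoted search term; avoiding that would require a CQL parser.

```python
import re

# Legacy aliases and the RDF-based indices they map to (from the list above).
LEGACY_TO_RDF = {
    "dc.creator": "RDF.Description.creator.Person.name",
    "dc.subject": "RDF.Description.subject",
    "dc.type": "RDF.Description.type",
    "dc.publisher": "RDF.Description.publisher",
}

# Match an alias only as a whole index name, not as part of a longer dotted
# name such as drilldown.dc.creator (a separate, untokenised drilldown index).
_LEGACY = re.compile(
    r"(?<![\w.])(" + "|".join(re.escape(k) for k in LEGACY_TO_RDF) + r")(?![\w.])"
)


def modernise_query(cql: str) -> str:
    """Replace legacy index aliases in a CQL string with RDF-based index names."""
    return _LEGACY.sub(lambda match: LEGACY_TO_RDF[match.group(1)], cql)


print(modernise_query("dc.creator = bar AND dc.title = foo"))
# RDF.Description.creator.Person.name = bar AND dc.title = foo
```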
Sortkeys
sort.rating.averageScore # average rating
sort.dc.date # datestamp from metadata
sort.comment.datestamp # datestamp from last comment
sort.header.datestamp # datestamp from OAI-PMH header
Drilldown
The drilldown indices are untokenised; queries against them SHOULD use the exact relation in order to yield the expected result.
When referencing the fields listed below in the x-term-drilldown request parameter, the drilldown. prefix MUST be omitted, i.e. drilldown.dc.creator MUST be referenced in the request as x-term-drilldown=dc.creator.
drilldown.RDF.Description.accessRights
drilldown.RDF.Description.classification
drilldown.RDF.Description.classification.unparsable
drilldown.RDF.Description.creator.Person.affiliation
drilldown.RDF.Description.creator.Person.affiliation.disallowed
drilldown.RDF.Description.creator.Person.roleTerm
drilldown.RDF.Description.department
drilldown.RDF.Description.lectorate
drilldown.dc.creator
drilldown.dc.publisher
drilldown.dc.type
drilldown.extra.hasInstitute
drilldown.extra.hasLectorate
drilldown.meta.repository.repositoryGroupId
drilldown.normalized.date.year
drilldown.normalized.date.year.unparsable
drilldown.normalized.language
drilldown.normalized.language.unparsable
drilldown.normalized.subject
drilldown.sort.dc.date.unparsable
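Because the drilldown. prefix must be dropped in x-term-drilldown, a client that keeps the full index names from the list above might build the parameter value as in this Python sketch. The helper name term_drilldown is hypothetical; it also applies the optional :count suffix described in the Drilldown section and rejects counts that are not positive.

```python
from __future__ import annotations

PREFIX = "drilldown."


def term_drilldown(fields: dict[str, int | None]) -> str:
    """Build an x-term-drilldown value (hypothetical helper).

    Keys are index names, with or without the drilldown. prefix (it is
    stripped, as this parameter requires); values are the number of unique
    values to return, or None to use the server default.
    """
    parts = []
    for name, count in fields.items():
        if name.startswith(PREFIX):
            name = name[len(PREFIX):]
        if count is None:
            parts.append(name)
        elif count > 0:
            parts.append(f"{name}:{count}")
        else:
            raise ValueError(f"count for {name} must be a positive integer")
    return ",".join(parts)


print(term_drilldown({"drilldown.dc.creator": 20, "dc.publisher": None}))
# dc.creator:20,dc.publisher
```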
See also
SRU Version 1.2 Specifications
Meresco Core Public Interfaces
HBO Bibliographic Metadata