
h1. Document information

| *Title:* The SWORD Protocol \\
*Subject:* \\
*Moderator:* \\
*Version:* \\
*Date published:* \\
*Excerpt*:{excerpt} Write an excerpt here {excerpt}\\
\\
(Optional information) \\
*Type:* \\
*Format:* \\
*Identifier:* \\
*Language:* \\
*Rights:* \\
*Tags:* {page-info:labels}\\ |

Document History

|| Date || Version || Owner || Changelog ||

Abstract


...

The SWORD AtomPub Profile is an application profile of the Atom Publishing Protocol (APP) (RFC 5023; Internet Engineering Task Force 2007, The Atom Publishing Protocol, http://tools.ietf.org/html/rfc5023) that contains specific features that allow for an application-level deposit of material into repositories.
The APP is based on the HTTP transfer of Atom-formatted representations. It is easy to think of APP as a way of publishing only Atom Syndication Format feeds. While APP does provide the means to publish Atom Syndication Format Entries to collections (such as blogs), it also provides a mechanism for publishing binary data, called Media Resources in the APP context (Internet Engineering Task Force 2007). In the blog scenario this mechanism may be used to add attachments to a blog post (e.g. images, audio, video, documents); SWORD exploits it for the publishing (or deposit) of material into repositories, usually in some form of content package in which data and descriptive metadata are held together in one container (see Figure 13).


Figure 13: Content Package or Container


An example of an implementation of such a container would be a ZIP-file containing a full-text manuscript in the PDF/A-1 format and descriptive metadata in the TEI-XML format.
A client submits the container to a SWORD interface service (server) as a bit stream using an HTTP POST request. The request header contains authorisation information and describes the bit stream (type and format of the container) so that the server can interpret it properly; the request body contains the bit stream itself (see Figure 14). Upon reception, the server sends an HTTP response back to the client, again consisting of a header and a body. The header contains an HTTP status code indicating success or failure of the attempted deposit according to regular HTTP semantics, and the body contains a response document with additional APP/SWORD-specific information about the deposit.
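The request structure described above can be sketched as follows. This is a minimal illustration, not a normative implementation: the function name, the collection URI passed by the caller and the use of HTTP Basic authentication are assumptions, and the header names `X-Packaging` and `Content-MD5` as well as the hexadecimal encoding of the checksum are assumed to follow common SWORD 1.3 practice (RFC 1864 itself defines `Content-MD5` as base64-encoded). The request is only built, not sent.

```python
import base64
import hashlib
import urllib.request


def build_deposit_request(collection_uri, package_bytes, username, password,
                          packaging="http://purl.org/net/sword-types/tei/peer"):
    """Build (but do not send) an HTTP POST request carrying a ZIP container.

    All parameter names are hypothetical; the header set is an assumption
    based on common SWORD 1.3 usage.
    """
    credentials = base64.b64encode(
        f"{username}:{password}".encode("utf-8")).decode("ascii")
    headers = {
        # Type of the bit stream, so the server can interpret the body.
        "Content-Type": "application/zip",
        # Checksum of the body (hex encoding is an assumption).
        "Content-MD5": hashlib.md5(package_bytes).hexdigest(),
        # Format of the container.
        "X-Packaging": packaging,
        # Authorisation information (HTTP Basic is an assumption).
        "Authorization": "Basic " + credentials,
    }
    # The body of the request is the bit stream itself.
    return urllib.request.Request(collection_uri, data=package_bytes,
                                  headers=headers, method="POST")
```

A caller would pass the resulting request object to `urllib.request.urlopen` and inspect the status code and response document that the server returns.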


Figure 14: HTTP request and response structure in the SWORD context

 

 

1.2 Use of SWORD in PEER

The PEER workflow specifies two scenarios for deposit into the PEER repositories: deposit made by PEER and deposit made by authors (see Figure 15).

[Figure 15 diagram labels: Eligible Journals / Articles; Publishers; PEER Depot; Authors; Publicly Available PEER Repositories (UGOE, KTU, MPG, INRIA, ULD, UNIBI); Long-Term Preservation: LTP Depot (e-Depot, KB)]

Figure 15: PEER Workflow

Figure 16: Deposit situation


This results in an n:n relation between repositories and deposit sources, the latter being either the PEER Depot or third-party services operated by an author (see Figure 16). To prevent multiple tailored solutions and implementations, it is important to define a standard process for the deposit of material into repositories.
The processes may be categorised into two types of mechanisms: push and pull. An example of the pull mechanism is the KB's eDepot, which harvests repositories through OAI-PMH and pulls content using a web client that downloads the objects specified in the location entries in the metadata (see Figure 17).
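As an illustration of the pull side, the sketch below builds the URL of an OAI-PMH `ListRecords` request, which is the kind of request a harvester issues before downloading the referenced objects. The function name, the base URL supplied by the caller and the default `oai_dc` metadata prefix are assumptions; the actual harvesting configuration of the eDepot is not described here.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Return the URL for an OAI-PMH ListRecords request (hypothetical helper)."""
    if resumption_token:
        # In OAI-PMH a resumptionToken is an exclusive argument: it is sent
        # without metadataPrefix to continue a partial result list.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)
```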


Figure 17: OAI-PMH data harvest


An example of the push mechanism is the SWORD deposit mechanism, where the data is pushed by an agent (e.g. a web service or desktop application representing a user) to the SWORD interface of a repository, which then accepts or rejects the deposit (see Figure 18).
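Because acceptance or rejection is reported through regular HTTP semantics, an agent can decide what to do next from the status code alone. The sketch below shows one possible interpretation; the function name and the returned labels are hypothetical, and the treatment of 202 as a deferred decision is an assumption about how a server might signal that a deposit is still being processed.

```python
def deposit_outcome(status_code):
    """Map an HTTP status code of a deposit response to a coarse outcome."""
    if status_code == 201:
        return "accepted"   # resource created; the deposit succeeded
    if status_code == 202:
        return "pending"    # assumption: accepted for later processing
    if 400 <= status_code < 600:
        return "rejected"   # client or server error; the deposit failed
    return "unknown"        # anything else needs case-by-case handling
```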


Figure 18: SWORD data deposit


Finally, a third, hybrid mechanism can be created by setting up an FTP server to which deposits can be uploaded (pushed) by an agent. A repository may then pull the content from the FTP server into the repository (see Figure 19).


Figure 19: SWORD versus FTP


A disadvantage of this mechanism is that it only gives the agent direct feedback about the status of the upload, not about the status of the actual deposit into the repository. An agent may successfully upload data to the FTP server, only for the repository to reject the data afterwards because it does not adhere to the rules the repository enforces on its contents, without the agent being informed about this rejection. This is not the case when using SWORD.
Figure 20 provides a schematic overview of the use of SWORD in the PEER deposit scenario. Here a publisher transfers manuscripts and metadata into the PEER Depot, where they are converted and crosswalked to the formats specified for the PEER deposit process. The converted and crosswalked manuscripts and metadata are then packaged into a container and sent to the SWORD interface service of a repository, where the contents are unpacked from the container. Upon reception these MAY be converted and crosswalked into an internal storage format before they are archived into the repository.


Figure 20: SWORD use in PEER for PEER Depot

 

2 Use of SWORD features

2.1 About this section

...

The PEER Submission Information Package (SIP) MAY be expressed using (a combination of) different formats (e.g. XML containers or RFC 1951 compliant ZIP archives) and/or serialised using different structural models (e.g. DIDL, METS, ORE, TEI, NLM, MODS, DC). The mappings between the SIP, its components and the formats and structures will be defined and expressed using specialised application profiles developed in the PEER context.


Figure 21: Submission Information Package structure


The SWORD profile offers the possibility to enumerate multiple packaging formats in the Service Document and supply a Quality Value attribute indicating a preference and level of support for a designated package format.
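A client can read these packaging entries from the Service Document and order them by preference. The sketch below is illustrative only: the `sword:acceptPackaging` element with a `q` attribute and the `http://purl.org/net/sword/` namespace are assumed from SWORD 1.3, and the sample collection, its `href` and the second packaging URI are made up for the example.

```python
import xml.etree.ElementTree as ET

SWORD_NS = "http://purl.org/net/sword/"  # assumed SWORD 1.3 namespace

# Hypothetical fragment of a Service Document (one collection).
SAMPLE = """<app:collection xmlns:app="http://www.w3.org/2007/app"
    xmlns:sword="http://purl.org/net/sword/"
    href="http://www.myrepository.org/geo/atom">
  <sword:acceptPackaging q="0.8">http://purl.org/net/sword-types/METSDSpaceSWAP</sword:acceptPackaging>
  <sword:acceptPackaging q="1.0">http://purl.org/net/sword-types/tei/peer</sword:acceptPackaging>
</app:collection>"""


def preferred_packaging(collection_xml):
    """Return the accepted packaging URIs, most preferred first."""
    root = ET.fromstring(collection_xml)
    ranked = []
    for element in root.iter("{%s}acceptPackaging" % SWORD_NS):
        # A missing q attribute is treated as full support (assumption).
        quality = float(element.get("q", "1.0"))
        ranked.append((quality, (element.text or "").strip()))
    ranked.sort(key=lambda pair: pair[0], reverse=True)
    return [uri for _, uri in ranked]
```

A client would then package its deposit in the first format in the list that it is able to produce.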

...

The following paragraph is considered informative, but is included for clarity in the use of the SWORD profile outside the PEER project.
The PEER workflow offers two ways a manuscript can be deposited into one of the publicly available PEER repositories: either by publisher deposit (through the PEER Depot) or by author deposit, where the publisher informs the author, who deposits his/her article(s) in the actual publicly available repository (PEER D2.1, Draft report on logfile harvesting systems and manuscript deposit procedures for publishers and repository managers, p. 8, http://www.peerproject.eu/reports/).
For the author deposit, the author MAY make the deposit by proxy through a web service (e.g. by filling in a form to provide the metadata and uploading a file containing the full-text material), after which the web service makes the actual deposit. The web service need not be used exclusively for the PEER project, in which case the web service MAY use its own credentials to authenticate at the server (at the repository side).
Figure 22 depicts an example of the use of this mechanism in the PEER context. Note that the greyed out parts of the figure are considered outside the scope of the PEER project.


Figure 22: PEER deposit workflow


It is recognized that the repository MAY want to keep track of data that is deposited within the PEER context by creating a single user account for the PEER Depot. This covers the publisher deposit workflow, but does not provide a solution for the case of author deposit through another web service, which MAY use different credentials.
A possible solution MAY be the use of mediated deposit, where a client authenticates using its assigned credentials on behalf of another known user (e.g. a web service authenticates using its own credentials and makes the deposit on behalf of the PEER user which is used by the PEER Depot).
This method MAY also be used to authenticate on behalf of other users (e.g. authors, librarians, data stewards, research assistants, etc.) that already have a valid user account at the repository.
The use of mediated deposit is considered OPTIONAL and is currently not implemented in the application of the SWORD profile within the PEER project.
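For illustration, a mediated deposit could be expressed in the request headers as sketched below. The `X-On-Behalf-Of` header name is assumed from SWORD 1.3; the function name, the account names and the use of HTTP Basic authentication are hypothetical.

```python
import base64


def mediated_headers(client_user, client_password, on_behalf_of=None):
    """Build headers in which a client authenticates with its own credentials
    and optionally names the user on whose behalf the deposit is made."""
    token = base64.b64encode(
        f"{client_user}:{client_password}".encode("utf-8")).decode("ascii")
    headers = {"Authorization": "Basic " + token}
    if on_behalf_of:
        # Assumed SWORD 1.3 header naming the target user of the deposit.
        headers["X-On-Behalf-Of"] = on_behalf_of
    return headers
```

In the scenario above, a web service would authenticate with its own account and pass the PEER user as `on_behalf_of`.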

...

To future-proof agreements and guidelines for a technical model, it is important to detach the technical implementation from the abstract object and information model. Furthermore, it is important to keep this abstract model aligned with other developments in the area in which the model will be used. For PEER, there are two such developments:
  • the OAIS Reference Model, for its use by the KB
  • the DRIVER object model for Enhanced Publications, for its use in the DRIVER context
In PEER, manuscripts and metadata will be transferred between authors, publishers, the PEER Depot, Open Access repositories and an LTP repository operated by the KB.
This results in a PEER object consisting of a manuscript object which is being described by one or more metadata objects (see Figure 23).
The D2.1 report provides an exhaustive metadata field set to be used in the PEER project (see Table 5, below).


Figure 23: PEER Object model ERD

|| Field Name || Semantics || Syntax ||
| Title | Article Title | |
| Creator | Corresponding author's name | Last name, first name |
| AuthorEmail | Corresponding author's e-mail address | |
| Description | Abstract | |
| Date | Date of publication | ISO 8601:2004; yyyy-MM-dd |
| Identifier | DOI of published article | |
| Coverage | Geographic location of the contributing Author | ISO 3166-1-A2 |
| Journal | Journal title | |
| Affiliation | multi-tier organisation list | Country, Organization, Laboratory |
| ISSN | | |
| Volume | | |
| Issue | | |
| Page | | |
| Type | Semantic type of the publication | info:eu-repo/semantics/article, info:eu-repo/semantics/acceptedVersion (defaults to article) |
| Subject | Subject headings; Scientific classification (defaults to what is provided in the general STM Journal table; see Appendix A) | |
| Language | Language of the publication | ISO 639-3 (defaults to 'eng') |
| Embargo | Embargo Period (defaults to what is provided in the general STM Journal table) | |

Table 5: PEER information model

For deposit the PEER object will be packaged into a container. The OAIS reference model specifies the Submission Information Package (SIP) as a specialised Information Package (IP), which is used by the KB in the eDepot, for submission purposes (see Figure 24).


Figure 24: OAIS Information Package ERD


Figure 25: OAIS Content Information Object ERD


An IP consists of content information that is described by Preservation Description Information (PDI).
The content information object is defined as a data object (e.g. a PDF file) interpreted using representation information (e.g. MIME type, encoding version, etc.) (see Figure 25). Note that the structure information is part of the representation information.
The PDI (see Figure 26) contains Reference Information (e.g. bibliographic descriptions and persistent identifiers), Provenance Information (e.g. information about the conversion process), Context Information (e.g. a reference to the research project a publication is based on) and Fixity Information (e.g. a checksum).
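The relation between these objects can be sketched as a small data model. This is an illustrative reading of the description above, not part of the OAIS specification: all class, field and function names are hypothetical, the identifier passed by the caller is a placeholder, and MD5 is used for the fixity value only as an example of a checksum.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class ContentInformation:
    data_object: bytes                 # e.g. the bytes of a PDF file
    representation_information: dict   # e.g. MIME type, encoding version


@dataclass
class PDI:
    reference: dict = field(default_factory=dict)    # identifiers, bibliographic data
    provenance: dict = field(default_factory=dict)   # e.g. conversion history
    context: dict = field(default_factory=dict)      # e.g. related research project
    fixity: dict = field(default_factory=dict)       # e.g. checksums


@dataclass
class InformationPackage:
    content: ContentInformation
    pdi: PDI


def make_package(data, mime_type, identifier):
    """Create a package whose fixity information is an MD5 checksum."""
    content = ContentInformation(data, {"mime_type": mime_type})
    pdi = PDI(reference={"identifier": identifier},
              fixity={"md5": hashlib.md5(data).hexdigest()})
    return InformationPackage(content, pdi)


def fixity_ok(package):
    """Check that the data object still matches its recorded checksum."""
    actual = hashlib.md5(package.content.data_object).hexdigest()
    return actual == package.pdi.fixity.get("md5")
```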


Figure 26: OAIS Preservation Description Information ERD


The PEER information model can be mapped onto the OAIS Reference Model as depicted in Figure 27. Here the structure object containing the structural information is being added to the PEER object model.


Figure 27: OAIS Reference Model-PEER Information Mapping


Figure 28 depicts a technical mapping of the PEER object model. The structure is expressed using the ORE abstract data model, which is serialised as an Atom feed contained in an XML file. The metadata is serialised in TEI, again contained in an XML file. The manuscript is encoded in PDF/A, contained in a PDF file. The XML file containing the ORE Atom feed, the XML file containing the TEI document and the PDF file containing the manuscript are then packaged in a compliant ZIP file. Upon deposit using SWORD, the ZIP file is placed into the body of the HTTP POST request. The header contains an MD5 checksum and MAY contain authorisation information (see Figure 29).
The HTTP POST request is then sent to the SWORD Interface Service as described in paragraph 1.2.
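The packaging step can be sketched as follows. The file names inside the ZIP file are hypothetical, since the text does not prescribe them, and the hexadecimal encoding of the MD5 checksum is an assumption. The function returns both the bit stream for the request body and the checksum for the request header.

```python
import hashlib
import io
import zipfile


def build_package(ore_atom_xml, tei_xml, manuscript_pdf):
    """Package the three components into an in-memory ZIP file.

    Returns a tuple (package_bytes, md5_hex). Member names are illustrative.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("structure.xml", ore_atom_xml)    # ORE Atom feed
        archive.writestr("metadata.xml", tei_xml)          # TEI metadata
        archive.writestr("manuscript.pdf", manuscript_pdf)  # PDF/A manuscript
    package = buffer.getvalue()
    return package, hashlib.md5(package).hexdigest()
```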


Figure 28: Technical Mapping of the PEER model


Figure 29: HTTP Mapping of the Technical Model

 

5 Implications on Repository level

...

...
SWORD APP Profile v1.3 states:
"The Location element of the HTTP header response MUST contain the URI of the Media Link Entry, as defined in ATOMPUB. The Media Link Entry URI MUST dereference, and MUST contain an atom:content element with a src attribute containing a URI."
Example:
HTTP/1.1 201 Created
Date: Mon, 18 August 2008 14:27:11 GMT
Content-Length: nnn
Content-Type: application/atom+xml; charset="utf-8"
Location: http://www.myrepository.org/geo/atom/my_deposit.atom

<entry ...>
  <title>My Deposit</title>
  <id>info:something:1</id>
  <updated>2008-08-18T14:27:08Z</updated>
  <author><name>jbloggs</name></author>
  <summary type="text">A summary</summary>
  ...
  <content type="application/zip" src="http://www.myrepository.ac.uk/geo/deposit1.zip"/>
  <sword:packaging>http://purl.org/net/sword-types/tei/peer</sword:packaging>
  <link rel="edit"
    href="http://www.myrepository.org/geo/atom/my_deposit.atom" />
</entry>
The Server MUST use the correct MIME-type reference in the type attribute of the atom:content element in the Media Link Entry.
The Media Link Entry MUST contain an atom:content element with a src attribute containing the location of the full text. This is used in the feedback e-mail to the author.
The Media Link Entry MAY contain an atom:element with a src attribute containing the location of the original submission information package, the raw metadata, ORE-ReM, jump-off page, etc.
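A client that receives such a Media Link Entry can extract the MIME type and location from the atom:content element, for example to compose the feedback e-mail. The sketch below is illustrative: the function name is hypothetical, and the sample entry is a shortened copy of the example response with the namespace declarations (which the example elides) filled in as assumptions.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"

# Shortened sample entry; the xmlns declarations are assumed.
ENTRY = """<entry xmlns="http://www.w3.org/2005/Atom"
       xmlns:sword="http://purl.org/net/sword/">
  <title>My Deposit</title>
  <id>info:something:1</id>
  <content type="application/zip" src="http://www.myrepository.ac.uk/geo/deposit1.zip"/>
  <sword:packaging>http://purl.org/net/sword-types/tei/peer</sword:packaging>
  <link rel="edit" href="http://www.myrepository.org/geo/atom/my_deposit.atom"/>
</entry>"""


def content_location(entry_xml):
    """Return (mime_type, uri) from the atom:content element of an entry."""
    entry = ET.fromstring(entry_xml)
    content = entry.find("{%s}content" % ATOM_NS)
    if content is None or not content.get("src"):
        raise ValueError("Media Link Entry lacks atom:content with a src attribute")
    return content.get("type"), content.get("src")
```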