This is a REST-based service that will generally reside on a server along with some sort of digital repository. The function of the service is to allow authorized clients to create, retrieve, update, and possibly delete digital packages from the repository.
The format for creation, update, and retrieval is a Zip archive containing a standardized header file, a METS file conformant to the ECHODEP Hub and Spoke profile, and all of the files and directories that comprise the object in the repository's native import or export format.
To identify specific packages in a repository, the service must map URIs to the local identifiers used by the repository. URIs used by the service must have the following form:
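The package layout above can be sketched with Python's standard `zipfile` module. This is a minimal sketch under stated assumptions: the member names `lrcruds_header.xml` and `mets.xml` and the helper `build_package` are hypothetical, because this document does not name the header or METS files. The object's own files are passed in as a mapping from archive path to bytes.

```python
import io
import zipfile

# Hypothetical member names: this document does not fix the names of the
# header file or the METS file inside the archive.
HEADER_NAME = "lrcruds_header.xml"
METS_NAME = "mets.xml"


def build_package(header_xml: bytes, mets_xml: bytes,
                  content: dict[str, bytes]) -> bytes:
    """Return the bytes of a Zip archive holding the header file, the METS
    file, and the object's files in the repository's native layout."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(HEADER_NAME, header_xml)
        zf.writestr(METS_NAME, mets_xml)
        for path, data in content.items():
            # Archive paths preserve the object's directory structure,
            # e.g. "bitstreams/page1.tif".
            zf.writestr(path, data)
    return buf.getvalue()
```

The returned bytes are what a client would send as the entity body of an Update (PUT) request with `Content-Type: application/zip`.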
BaseURL "/" EscapedLocalRepositoryIdentifier [ "?" query ]
The BaseURL conforms to the normal syntax for HTTP or HTTPS URLs, namely:
"http[s]:" "//" host [ ":" port ] [ abs_path ]
The [ "?" query ] portion may follow the EscapedLocalRepositoryIdentifier under special circumstances to be described below, but it is not considered part of the local identifier. The EscapedLocalRepositoryIdentifier must make up the final segment of the abs_path, with any characters not otherwise allowed in a path segment escaped according to the rules in RFC 2396. When unescaped, the EscapedLocalRepositoryIdentifier must be a local identifier for an object in the repository. In the case of a Create action, the local identifier can instead be the location in the repository in which the new record should be created. In Common Gateway Interface (CGI) terms, the EscapedLocalRepositoryIdentifier can be considered the PATH_INFO variable. For example, in the URI http://some.place.edu/dspace/lrcruds/2135.89342 the BaseURL is http://some.place.edu/dspace/lrcruds and the EscapedLocalRepositoryIdentifier is 2135.89342.
In general, the LRCRUDS service will reside on the same host as the repository it serves, but it may listen on any accessible port, and the path component of the BaseURL is arbitrary.
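The mapping between a local identifier and the final path segment can be sketched as follows. The function names are hypothetical. One assumption to flag: `urllib.parse.quote` follows RFC 3986 (the successor to RFC 2396), so it may escape a few characters RFC 2396 would allow in a segment; that is harmless here as long as the server unescapes the whole segment as described above. Passing `safe=""` also escapes `/`, so a hierarchical identifier still forms a single final segment, although some web servers (Apache httpd with its default `AllowEncodedSlashes` setting, for instance) may reject encoded slashes.

```python
from urllib.parse import quote, unquote, urlsplit


def build_package_uri(base_url: str, local_id: str) -> str:
    """BaseURL "/" EscapedLocalRepositoryIdentifier.

    safe="" makes quote() escape "/" as well, so a hierarchical local
    identifier still occupies exactly one (final) path segment.
    """
    return base_url.rstrip("/") + "/" + quote(local_id, safe="")


def local_id_from_uri(uri: str) -> str:
    """Recover the unescaped local identifier from the final path segment.

    urlsplit() separates any "?query" part, which is not part of the
    identifier, before the segment is taken.
    """
    path = urlsplit(uri).path
    return unquote(path.rsplit("/", 1)[-1])
```

For instance, a DSpace handle such as `2135.89342` passes through unchanged, while a path-like identifier such as `coll/sub dir` becomes `coll%2Fsub%20dir`.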
For this service the database CRUD actions (Create/Retrieve/Update/Delete) have been mapped to corresponding HTTP methods:
| CRUD action | HTTP method | Summary |
|---|---|---|
| Create | POST | Create a placeholder (stub) record and return its URI |
| Update | PUT | Replace an existing package or placeholder with a new package |
| Retrieve | GET | Return a package as a Zip archive |
| Delete | DELETE | Remove a package from the repository |
| Other administrative functions | POST, HEAD | Non-CRUD administrative requests (POST) and header-only retrieval (HEAD) |

Create (POST): The Create action uses the HTTP POST method, with the local identifier part of the URI representing the location in the repository in which to create the new record. For example, with DSpace the local identifier would identify the specific DSpace collection in which to create the new record. The local identifier is optional and depends on the particular repository: it could be the handle of a collection, as in DSpace, or a hierarchical path to a location on disk in which to store the files. If it is omitted, the new package is assumed to be created in a default location that depends on the repository. The URI of the newly created record is returned as part of the HTTP response to the POST. For example, to create a new package in the DSpace collection with the identifier 2135.12346:
POST /dspace/lrcruds/2135.12346 HTTP/1.1
Date: Sun, 06 Nov 1994 08:49:37 GMT
From: thabing@uiuc.edu
User-Agent: ECHODEP_Hub_and_Spoke/1.0
Content-Length: 3495
Content-MD5: <Base64-encoded MD5 Hash>
Content-Type: application/x-www-form-urlencoded
Last-Modified: Tue, 01 Nov 1994 12:45:26 GMT
<X-WWW-FORM-URLENCODED OCTETS>
A successful response would look like the following:
HTTP/1.1 201 Created
Date: Sun, 06 Nov 1994 08:51:09 GMT
Server: ECHODEP_LRCRUDS/1.0 DSpace/1.4
Allow: POST, HEAD
Location: http://some.place.edu/dspace/lrcruds/2135.89342
It is critical to note that the package itself is not uploaded as part of the POST request; the POST request creates only a stub or placeholder record. However, if the repository requires additional parameters to create the stub record, those parameters can be passed in the body of the request as URL-encoded form values. These depend on the requirements of the specific repository. Because there will be a lag, sometimes a significant one, between the creation of the stub record and its update with the actual package, the stub record should be clearly labeled as a placeholder and not a 'real' record. If the repository has a means to suppress this record, it should do so until the record is updated with the real package.
The actual package is not uploaded as part of the POST because the identifier the repository assigns to the package needs to be embedded in the METS file, which is part of the package. The typical sequence of operations to ingest a new package is therefore: use POST to create a new placeholder record and get the identifier for that record; use that identifier to update the provenance and other metadata in the package; then use PUT to update or overwrite the placeholder record with the actual package.
Update (PUT): The Update action uses the HTTP PUT method. If the identifier does not already exist in the repository, an HTTP 404 Not Found error is returned. If the identifier does exist, the package is replaced or updated with the new package. For example, to update the DSpace package with the identifier 2135.89342:
PUT /dspace/lrcruds/2135.89342 HTTP/1.1
Date: Sun, 06 Nov 1994 08:49:37 GMT
From: thabing@uiuc.edu
User-Agent: ECHODEP_Hub_and_Spoke/1.0
Content-Length: 3495
Content-MD5: <Base64-encoded MD5 Hash>
Content-Type: application/zip
Last-Modified: Tue, 01 Nov 1994 12:45:26 GMT
<ZIP FILE OCTETS>
A successful response would look like the following:
HTTP/1.1 204 No Content
Date: Sun, 06 Nov 1994 08:51:02 GMT
Server: ECHODEP_LRCRUDS/1.0 DSpace/1.4
Allow: GET, PUT, DELETE, HEAD
The HTTP PUT method must be idempotent, meaning that the same request with the same data must produce the same result no matter how many times it is performed.
Retrieve (GET): Retrieval of a record is mapped to the HTTP GET method. The URL is exactly the same as the one used to update the record, for example:
GET /dspace/lrcruds/2135.89342 HTTP/1.1
Date: Sun, 06 Nov 1994 08:49:37 GMT
From: thabing@uiuc.edu
User-Agent: ECHODEP_Hub_and_Spoke/1.0
A successful response would look like the following:
HTTP/1.1 200 OK
Date: Sun, 06 Nov 1994 08:51:02 GMT
Server: ECHODEP_LRCRUDS/1.0 DSpace/1.4
Allow: GET, PUT, DELETE, HEAD
Content-Length: 3495
Content-MD5: <Base64-encoded MD5 Hash>
Content-Type: application/zip
Last-Modified: Tue, 01 Nov 1994 12:45:26 GMT
<ZIP FILE OCTETS>
Delete (DELETE): Deletion of a record is mapped to the HTTP DELETE method. Again, the URL is exactly the same as the one used to update the record, for example:
DELETE /dspace/lrcruds/2135.89342 HTTP/1.1
Date: Sun, 06 Nov 1994 08:49:37 GMT
From: thabing@uiuc.edu
User-Agent: ECHODEP_Hub_and_Spoke/1.0
A successful response would look like the following:
HTTP/1.1 204 No Content
Date: Sun, 06 Nov 1994 08:51:02 GMT
Server: ECHODEP_LRCRUDS/1.0 DSpace/1.4
Allow: GET, PUT, DELETE, HEAD
Other administrative functions (POST, HEAD): In addition to Create, the POST method may be used for miscellaneous administrative functions that are not CRUD actions. All POST requests must be encoded as application/x-www-form-urlencoded, and the responses must be XML conforming to the schema outlined in this document. Unlike the PUT, GET, and DELETE methods, POST is not required to be idempotent. In fact, the Create action is not idempotent, because it generates a new record and returns a new identifier each time it is used. The HEAD method may be issued to retrieve meta-information about a resource. In general, a HEAD request is treated the same as a GET request except that the entity body is not returned; only the HTTP headers are returned.
For information on programming the PUT and DELETE methods from various client and server platforms see http://www.intertwingly.net/wiki/pie/PutDeleteSupport.
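As one illustration of the Create-then-Update ingest sequence described above, here is a client-side sketch using Python's standard `http.client`, whose `HTTPConnection.request()` accepts PUT and DELETE like any other method name. The function names `create_stub` and `replace_package` are hypothetical, and the connection is passed in (normally an `http.client.HTTPConnection` or `HTTPSConnection` for the repository host) so the logic can be exercised without a network. It assumes the Location URI returned by Create points at the same host.

```python
import base64
import hashlib
from urllib.parse import urlencode

USER_AGENT = "ECHODEP_Hub_and_Spoke/1.0"


def content_md5(body: bytes) -> str:
    # Base64 of the raw 16-byte MD5 digest, as the Content-MD5 header expects.
    return base64.b64encode(hashlib.md5(body).digest()).decode("ascii")


def create_stub(conn, location_path: str, fields: dict[str, str]) -> str:
    """POST to a location; return the Location URI of the new placeholder."""
    body = urlencode(fields).encode("ascii")
    headers = {"User-Agent": USER_AGENT,
               "Content-Type": "application/x-www-form-urlencoded",
               "Content-MD5": content_md5(body)}
    # A real http.client connection adds Content-Length for a bytes body.
    conn.request("POST", location_path, body=body, headers=headers)
    resp = conn.getresponse()
    resp.read()  # drain the response so the connection can be reused
    if resp.status != 201:
        raise RuntimeError(f"Create failed: {resp.status} {resp.reason}")
    return resp.getheader("Location")


def replace_package(conn, package_path: str, zip_bytes: bytes) -> None:
    """PUT the real package over the placeholder; expect 204 No Content."""
    headers = {"User-Agent": USER_AGENT,
               "Content-Type": "application/zip",
               "Content-MD5": content_md5(zip_bytes)}
    conn.request("PUT", package_path, body=zip_bytes, headers=headers)
    resp = conn.getresponse()
    resp.read()
    if resp.status != 204:
        raise RuntimeError(f"Update failed: {resp.status} {resp.reason}")
```

Between the two calls the client would embed the new identifier in the METS file and rebuild the Zip archive, then pass the path component of the returned Location (for example via `urllib.parse.urlsplit`) to `replace_package`.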
The following sections will describe the various actions in more detail. In general, the rules defined in the HTTP specification must be followed for all HTTP methods and headers unless stated otherwise below.
The following HTTP header fields are common to all methods:
Date: Sun, 06 Nov 1994 08:49:37 GMT
Because the files being transported may be large, chunked transfer encoding must be supported for both the Retrieve (GET) response and the Update (PUT) request. The following headers pertain to entity-body processing in both requests and responses:
Content-Length: 12345
Content-MD5: 1B2M2Y8AsgTpgAmY7PhCfg==
Content-Type: application/zip
Content-Type: application/x-www-form-urlencoded
Transfer-Encoding: chunked
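The two entity-body mechanisms above can be sketched in a few lines. `Content-MD5` carries the Base64 encoding of the raw 16-byte MD5 digest, not of its hex string; the example value `1B2M2Y8AsgTpgAmY7PhCfg==` shown above is in fact the MD5 of an empty body. The `chunked` generator shows the HTTP/1.1 chunk framing by hand; in practice Python's `http.client` can do this itself via `request(..., encode_chunked=True)`. Both helper names are hypothetical, and this sketch assumes the whole body is in memory so the MD5 can be computed before sending.

```python
import base64
import hashlib
from typing import Iterable, Iterator


def content_md5(body: bytes) -> str:
    """Content-MD5 value: Base64 of the raw MD5 digest of the entity body."""
    return base64.b64encode(hashlib.md5(body).digest()).decode("ascii")


def chunked(pieces: Iterable[bytes]) -> Iterator[bytes]:
    """Frame an entity body with chunked transfer encoding: each chunk is
    its size in hex, CRLF, the data, CRLF; a zero-size chunk followed by
    an empty line ends the body."""
    for piece in pieces:
        if piece:  # an empty chunk would signal end-of-body prematurely
            yield b"%x\r\n" % len(piece) + piece + b"\r\n"
    yield b"0\r\n\r\n"
```

When a body is sent this way, the Content-Length header must be omitted, as the Update rules below require.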
The following HTTP header fields are common to all requests:
User-Agent: ECHODEP_Hub_and_Spoke/1.0
From: xyz123@uiuc.edu
Authorization: Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
The following HTTP header fields are common to all responses:
Server: ECHODEP_LRCRUDS/1.0 DSpace/1.4
Allow: GET, PUT, DELETE, HEAD
Allow: POST, HEAD
WWW-Authenticate: Basic realm="LRCRUDS"
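The Authorization and WWW-Authenticate headers above use HTTP Basic authentication; the example token `QWxhZGRpbjpvcGVuIHNlc2FtZQ==` is simply the Base64 encoding of `Aladdin:open sesame`. Because Base64 is an encoding rather than encryption, Basic credentials are best sent over an HTTPS BaseURL. The sketch below builds the request header and shows how a server might answer missing or bad credentials with a 401 challenge; the function names and the in-memory `credentials` mapping are hypothetical.

```python
import base64


def basic_authorization(user: str, password: str) -> str:
    """Value for the Authorization request header (Basic scheme)."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return "Basic " + token


def check_basic(header_value, credentials: dict[str, str], realm: str = "LRCRUDS"):
    """Return None if the Authorization header matches a known user;
    otherwise return (status, reason, headers) for a 401 challenge."""
    challenge = (401, "Unauthorized", {"WWW-Authenticate": f'Basic realm="{realm}"'})
    scheme, _, token = (header_value or "").partition(" ")
    if scheme.lower() != "basic":
        return challenge
    try:
        decoded = base64.b64decode(token, validate=True).decode("utf-8")
    except ValueError:  # binascii.Error and UnicodeDecodeError both subclass it
        return challenge
    user, sep, password = decoded.partition(":")
    # A production server would compare password hashes in constant time;
    # a plain dictionary lookup keeps this sketch short.
    if not sep or credentials.get(user) != password:
        return challenge
    return None
```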
The Create action is used to create a placeholder or stub package in the repository and return the local identifier of that package back to the requester.
The EscapedLocalRepositoryIdentifier portion of the URL must contain the local identifier for the location in which the stub package is to be created. If no EscapedLocalRepositoryIdentifier is given the service may create the stub package in a default location if there is one or return an error message. If an EscapedLocalRepositoryIdentifier is given the service must verify that it does identify a location where new packages may be created.
The entity body of the request may optionally contain X-WWW-FORM-URLENCODED values. These depend on the requirements of the serviced repository, but may include minimal metadata required to create the stub package, such as a title or alternate identifiers. If the request has an X-WWW-FORM-URLENCODED entity body, it must include Content-Length, Content-Type, and Content-MD5 headers that correspond to that entity body.
After successful execution of the Create request the service must respond with an HTTP 201 Created status. The response must also include a Location header containing the absolute URI of the newly created package. This is the URI to use in later Retrieve, Update, or Delete interactions with that package.
If the location identified by the EscapedLocalRepositoryIdentifier does not exist, an HTTP 404 status must be returned. The reason phrase must be "Location not found".
If the EscapedLocalRepositoryIdentifier is required but not included with the request an HTTP 400 status must be returned with a reason phrase of "Location is required".
If the location identified by the EscapedLocalRepositoryIdentifier exists but is not a location in which new packages may be created, an HTTP 400 status must be returned with the reason phrase "Packages may not be created in this location". For example, the identifier may represent a package instead of a location that contains packages.
If the Create request requires authorization and does not carry valid credentials, the server must respond with an HTTP 401 Unauthorized status.
If the service has retrieved data from the underlying repository but the data is corrupt or the service is unable to process the data for any reason, the service must respond with an HTTP 502 Bad Gateway response. If a more definitive status message is available, then the service should use it in place of the generic "Bad Gateway." This message will be dependent on the underlying repository.
If the service is not able to communicate with the underlying repository for any reason, it must respond with an HTTP 503 Service Unavailable response. If a more definitive status message is available, then the service should use it in place of the generic "Service Unavailable." This message will be dependent on the underlying repository.
If the service has established a connection to the underlying repository and is attempting to retrieve data, but the request has timed out, the service must respond with an HTTP 504 Gateway Timeout response. If a more definitive status message is available, then the service should use it in place of the generic "Gateway Timeout." This message will be dependent on the underlying repository.
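The three repository-failure rules (502, 503, 504) can be captured in one small mapping. This is a sketch under an assumption: the service's repository adapter raises the three hypothetical exception classes below, and any message carried by the exception is treated as the "more definitive" repository-specific reason phrase. Since a reason phrase cannot contain CR or LF, whitespace is collapsed first.

```python
class RepositoryDataError(Exception):
    """Hypothetical: data came back but was corrupt or could not be processed."""


class RepositoryUnavailable(Exception):
    """Hypothetical: the repository could not be reached at all."""


class RepositoryTimeout(Exception):
    """Hypothetical: the repository was reached but did not answer in time."""


GENERIC_STATUS = {
    RepositoryDataError: (502, "Bad Gateway"),
    RepositoryUnavailable: (503, "Service Unavailable"),
    RepositoryTimeout: (504, "Gateway Timeout"),
}


def gateway_status(exc: Exception) -> tuple[int, str]:
    """Map a repository failure to a status code and reason phrase, preferring
    a more definitive repository-specific message when the exception has one."""
    for cls, (status, generic_reason) in GENERIC_STATUS.items():
        if isinstance(exc, cls):
            # Reason phrases may not contain CR or LF, so collapse whitespace.
            detail = " ".join(str(exc).split())
            return status, detail or generic_reason
    raise exc  # not a gateway problem; let the caller handle it
```

The same mapping applies unchanged to the Update and Retrieve actions, which repeat these three rules.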
If there are problems processing the entity body, such as missing required parameters, an HTTP 400 status must be returned. These messages depend on the requirements of the specific repository, but the reason phrase should contain enough information for an administrator to try to correct the problem before retrying the request.
If there is an optional entity body, the Content-Length header is required. If it is missing, the service must return an HTTP 411 Length Required status.
If there is an optional entity body, the Content-MD5 header is also required, and the LRCRUDS service must verify that the MD5 checksum contained in the header matches the checksum of the actual entity body. If it does not match, the service must respond with an HTTP 400 status with the reason phrase "MD5 checksum does not match".
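The Create rules above can be gathered into a single validation function. This is a minimal sketch, not a required implementation: `validate_create` is hypothetical, `locations` stands in for the repository (mapping each existing identifier to whether packages may be created there), header names are assumed to arrive in canonical capitalization, and the order of the checks is a choice this document does not mandate. Treating a missing Content-MD5 like a mismatch is also an assumption.

```python
import base64
import hashlib


def content_md5(body: bytes) -> str:
    return base64.b64encode(hashlib.md5(body).digest()).decode("ascii")


def validate_create(local_id: str, headers: dict[str, str], body: bytes,
                    locations: dict[str, bool], *, location_required: bool,
                    authorized: bool) -> tuple[int, str]:
    """Return (status, reason) for a Create (POST) request."""
    if not authorized:
        return 401, "Unauthorized"
    if not local_id:
        if location_required:
            return 400, "Location is required"
        # Otherwise fall through and use the repository's default location.
    elif local_id not in locations:
        return 404, "Location not found"
    elif not locations[local_id]:
        # e.g. the identifier names a package, not a container of packages
        return 400, "Packages may not be created in this location"
    if body:
        if "Content-Length" not in headers:
            return 411, "Length Required"
        # Assumption: a missing Content-MD5 is treated like a mismatch.
        if headers.get("Content-MD5") != content_md5(body):
            return 400, "MD5 checksum does not match"
    return 201, "Created"
```

On a 201 the caller would still need to create the stub and add the Location header, which this sketch leaves out.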
The Update action replaces the package identified by EscapedLocalRepositoryIdentifier with the new package which must be contained in the entity body of the PUT request.
The EscapedLocalRepositoryIdentifier portion of the URL must contain the local identifier for the package being put into the repository. Generally this will be an identifier that was returned by a previous Create (POST) operation. For Update the EscapedLocalRepositoryIdentifier is always mandatory. An Update always assumes that the package, or at least a placeholder for it, already exists in the repository, and the Update operation replaces the old contents of that package with the new contents.
The entity body of the request must contain a zip file which contains the files making up the package to be ingested. The Content-Type header value must be "application/zip".
After successful execution of the Update request the service must respond with an HTTP 204 No Content status. This signifies that the request was successful, but no entity body is returned.
If the package identified by the EscapedLocalRepositoryIdentifier does not exist, an HTTP 404 status must be returned. The reason phrase must be "Package not found".
If the Content-Type header of the request is not "application/zip", the service must respond with an HTTP 415 Unsupported Media Type status. The reason phrase must be "application/zip is the only supported media type".
If the Update request requires authorization and does not carry valid credentials, the server must respond with an HTTP 401 Unauthorized status.
Our current implementation must respond with an HTTP 501 status if the request contains a Content-Range header. The reason phrase must be "Content-Range is not implemented". This might change in a future version of the protocol.
If the service has retrieved data from the underlying repository but the data is corrupt or the service is unable to process the data for any reason, the service must respond with an HTTP 502 Bad Gateway response. If a more definitive status message is available, then the service should use it in place of the generic "Bad Gateway." This message will be dependent on the underlying repository.
If the service is not able to communicate with the underlying repository for any reason, it must respond with an HTTP 503 Service Unavailable response. If a more definitive status message is available, then the service should use it in place of the generic "Service Unavailable." This message will be dependent on the underlying repository.
If the service has established a connection to the underlying repository and is attempting to retrieve data, but the request has timed out, the service must respond with an HTTP 504 Gateway Timeout response. If a more definitive status message is available, then the service should use it in place of the generic "Gateway Timeout." This message will be dependent on the underlying repository.
If the client making the request to the LRCRUDS service does not complete the request in a timely fashion, the service may respond with an HTTP 408 Request Timeout status. This might happen if there is a long delay between the client sending the request headers and sending the entity body.
If there are problems processing the entity body, such as a corrupt zip file or a zip file that does not contain the files expected for ingest into the repository, an HTTP 400 status must be returned. These messages depend on the requirements of the specific repository, but the reason phrase should contain enough information for an administrator to try to correct the problem before retrying the request.
The Content-Length header is not allowed if chunked transfer encoding is used. If chunked transfer encoding is not used, this header is required, and if it is missing the service must return an HTTP 411 Length Required status.
The Content-MD5 header is also required, and the LRCRUDS service must verify that the MD5 checksum contained in the header matches the checksum of the actual entity body. If it does not match the service must respond with an HTTP 400 status with a reason phrase "MD5 checksum does not match".
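The Update rules can likewise be collected into one check. This sketch rests on stated assumptions: `validate_update` and the `packages` set of existing identifiers are hypothetical, `body` is the already de-chunked entity body, header names are canonical, and the check order is a choice rather than something this document specifies. The reason phrase for an unreadable zip is only an example, since that wording is repository-specific, and the case where both Transfer-Encoding: chunked and Content-Length appear is left unhandled because no status is given for it.

```python
import base64
import hashlib
import io
import zipfile


def content_md5(body: bytes) -> str:
    return base64.b64encode(hashlib.md5(body).digest()).decode("ascii")


def validate_update(local_id: str, headers: dict[str, str], body: bytes,
                    packages: set[str], *, authorized: bool) -> tuple[int, str]:
    """Return (status, reason) for an Update (PUT) request."""
    if not authorized:
        return 401, "Unauthorized"
    if "Content-Range" in headers:
        return 501, "Content-Range is not implemented"
    if local_id not in packages:  # real package or placeholder must exist
        return 404, "Package not found"
    if headers.get("Content-Type") != "application/zip":
        return 415, "application/zip is the only supported media type"
    is_chunked = headers.get("Transfer-Encoding", "").lower() == "chunked"
    if not is_chunked and "Content-Length" not in headers:
        return 411, "Length Required"
    if headers.get("Content-MD5") != content_md5(body):
        return 400, "MD5 checksum does not match"
    if not zipfile.is_zipfile(io.BytesIO(body)):
        # Example reason phrase only; the wording is repository-specific.
        return 400, "Entity body is not a readable zip file"
    return 204, "No Content"
```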
The Retrieve action gets a package identified by the EscapedLocalRepositoryIdentifier from a repository and returns all the associated files as a zip archive.
The EscapedLocalRepositoryIdentifier portion of the URL must contain the local identifier for the package which is being retrieved. Generally this will be an identifier which was returned as part of a previous Create (POST) operation. For Retrieve the EscapedLocalRepositoryIdentifier is always mandatory.
After successful execution of the Retrieve request the service must respond with an HTTP 200 OK message. This signifies that the request was successful, and the entity body contains the requested package.
If the package identified by the EscapedLocalRepositoryIdentifier does not exist, an HTTP 404 status must be returned. The reason phrase must be "Package not found".
If the Retrieve request requires authorization and does not carry valid credentials, the server must respond with an HTTP 401 Unauthorized status.
If the service has retrieved data from the underlying repository but the data is corrupt or the service is unable to process the data for any reason, the service must respond with an HTTP 502 Bad Gateway response. If a more definitive status message is available, then the service should use it in place of the generic "Bad Gateway." This message will be dependent on the underlying repository.
If the service is not able to communicate with the underlying repository for any reason, it must respond with an HTTP 503 Service Unavailable response. If a more definitive status message is available, then the service should use it in place of the generic "Service Unavailable." This message will be dependent on the underlying repository.
If the service has established a connection to the underlying repository and is attempting to retrieve data, but the request has timed out, the service must respond with an HTTP 504 Gateway Timeout response. If a more definitive status message is available, then the service should use it in place of the generic "Gateway Timeout." This message will be dependent on the underlying repository.
The Content-Type header of the response must be "application/zip". Unless the Transfer-Encoding is chunked, the Content-Length header is required. The Content-MD5 header is also required. A Last-Modified header should be present with the date on which the package was last modified by the repository, if available.
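The Retrieve response headers can be assembled as below, together with the matching client-side integrity check. The helper names are hypothetical, and the sketch assumes the repository exposes the package's last-modified time as a POSIX timestamp (or `None` when unavailable). `email.utils.formatdate(..., usegmt=True)` produces the RFC 1123 date form used in the examples in this document.

```python
import base64
import hashlib
from email.utils import formatdate
from typing import Optional


def content_md5(body: bytes) -> str:
    return base64.b64encode(hashlib.md5(body).digest()).decode("ascii")


def retrieve_headers(zip_bytes: bytes, last_modified: Optional[float],
                     chunked: bool = False) -> dict[str, str]:
    """Entity headers for a successful Retrieve (GET) response."""
    headers = {
        "Content-Type": "application/zip",
        "Content-MD5": content_md5(zip_bytes),
        "Allow": "GET, PUT, DELETE, HEAD",
    }
    if chunked:
        headers["Transfer-Encoding"] = "chunked"  # Content-Length omitted
    else:
        headers["Content-Length"] = str(len(zip_bytes))
    if last_modified is not None:
        # e.g. "Sun, 06 Nov 1994 08:49:37 GMT"
        headers["Last-Modified"] = formatdate(last_modified, usegmt=True)
    return headers


def body_matches(headers: dict[str, str], body: bytes) -> bool:
    """Client-side check of a retrieved package against its Content-MD5."""
    return headers.get("Content-MD5") == content_md5(body)
```

For a HEAD request the same headers would be sent without the body.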
<LRCRUDS date='' version=''>
<packageIdentifier>...</packageIdentifier>
<repositoryInformation>
<premis:agent>
...
</premis:agent>
</repositoryInformation>
<metsFilename>...</metsFilename>
</LRCRUDS>
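A header file with the element names shown above can be written and read back with Python's `xml.etree.ElementTree`. This is a sketch under stated assumptions: the function names, the default `version` of "1.0", and the date string format are hypothetical, since the attributes are left empty above. The `repositoryInformation` element is emitted empty because its `premis:agent` child needs a declared PREMIS namespace, which is not given here; an undeclared prefix would make the document fail to parse.

```python
import xml.etree.ElementTree as ET


def build_header(package_id: str, mets_filename: str, date: str,
                 version: str = "1.0") -> bytes:
    """Serialize an LRCRUDS header with the element names shown above."""
    root = ET.Element("LRCRUDS", {"date": date, "version": version})
    ET.SubElement(root, "packageIdentifier").text = package_id
    # Left empty: the premis:agent child needs a declared PREMIS namespace.
    ET.SubElement(root, "repositoryInformation")
    ET.SubElement(root, "metsFilename").text = mets_filename
    return ET.tostring(root, encoding="utf-8")


def read_header(data: bytes) -> dict[str, str]:
    """Pull the identifier, METS file name, and attributes back out."""
    root = ET.fromstring(data)
    return {
        "date": root.get("date", ""),
        "version": root.get("version", ""),
        "packageIdentifier": root.findtext("packageIdentifier", ""),
        "metsFilename": root.findtext("metsFilename", ""),
    }
```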
TODO: Investigate using the Java JAR file manifest instead of creating our own formats. This would allow functionality such as digitally signing the file. See JAR File Specification for details.
http://wiki.dspace.org/LightweightNetworkInterface