Content-MD5 Checksums

Content-MD5 checksums provide an end-to-end message integrity check of the content (excluding metadata) as it is sent to and returned from Swarm. A proxy or client can check the Content-MD5 header to detect modifications to the entity body while in transit. A client can provide this header to indicate that Swarm should compute and check the digest as it stores or returns the object data.

See https://perifery.atlassian.net/wiki/spaces/public/pages/2443820996.

Client-Provided Content-MD5

During a POST or PUT, the client can provide the following Content-MD5 header as specified in section 14.15 of the HTTP/1.1 RFC:

Content-MD5 = "Content-MD5" ":" md5-digest

Where md5-digest is the base64 encoding of the 128-bit MD5 digest (see RFC 1864 for more information).

The md5-digest is computed based on the content of the entity body, including any content coding applied, but not including any transfer-encoding applied to the message body.

  • If this header is present, Swarm computes an MD5 digest during data transfer and then compares the computed digest to the digest provided in the header.

  • When the write completes successfully, the Content-MD5 value is stored with the object and returned on subsequent GET or HEAD requests.

  • If the hashes do not match, Swarm returns a 400 Bad Request error response, abandons the object, and closes the client connection.
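As a minimal sketch of the header computation described above, the following Python function derives the md5-digest value from an in-memory entity body. The function name content_md5_header is illustrative and not part of any Swarm client; only the Python standard library is used.

```python
import base64
import hashlib


def content_md5_header(body: bytes) -> str:
    """Return the base64 encoding of the 128-bit MD5 digest of body (RFC 1864)."""
    raw_digest = hashlib.md5(body).digest()  # 16 raw bytes, not the hex string
    return base64.b64encode(raw_digest).decode("ascii")


# Header a client might send along with a POST or PUT of the same body.
headers = {"Content-MD5": content_md5_header(b"hello")}
print(headers)
```

Note that the digest is base64-encoded from the raw 16 bytes; encoding the hexadecimal string instead is a common mistake and produces a value that does not match.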

Swarm-Provided Content-MD5

Another way to associate a Content-MD5 value with an object is to have Swarm compute the Content-MD5 for the body data of the request. To do this, include the gencontentmd5 query argument in the request. Swarm returns the Content-MD5 as a header in the 201 Created response. Once computed, the Content-MD5 value is stored with the object and returned as a response header for any subsequent GET or HEAD requests. Note: the gencontentmd5 query argument replaces use of the "Expect: Content-MD5" request header, which is deprecated per RFC 7231. (v9.2)
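The query argument can be appended to a request URL as in the following sketch. The helper name with_gencontentmd5 and the host swarm.example.com are hypothetical; the only detail taken from this page is the argument name gencontentmd5, which is added without a value.

```python
from urllib.parse import urlsplit, urlunsplit


def with_gencontentmd5(url: str) -> str:
    """Return url with the gencontentmd5 query argument added if it is absent."""
    parts = urlsplit(url)
    args = [arg for arg in parts.query.split("&") if arg]
    # Compare on the argument name only, so "gencontentmd5=..." also counts.
    if not any(arg.split("=", 1)[0] == "gencontentmd5" for arg in args):
        args.append("gencontentmd5")
    return urlunsplit(parts._replace(query="&".join(args)))


# Hypothetical object URL, used for illustration only.
print(with_gencontentmd5("http://swarm.example.com/bucket/object.bin?domain=example.com"))
```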

Tip

The Swarm setting scsp.autoContentMD5Computation automates Content-MD5 hashing. When it is enabled, writes do not need to include the gencontentmd5 query argument or the deprecated Expect: Content-MD5 header (although you may still want to supply a separate Content-MD5 header for content integrity checking). This setting is ignored wherever it is invalid, such as on a multipart initiate/complete or an EC APPEND. (v9.1)

Ranges - When ?gencontentmd5 is included on a GET request with a Range header, any Content-MD5 header stored with the object is omitted from the response headers. Instead, a Content-MD5 of the selected range is returned as a trailing header in the response.

For details about Range headers, see section 14.35 (Range) in the HTTP/1.1 RFC.
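To check a range digest on the client side, the same MD5 and base64 steps can be applied to only the requested bytes. The sketch below assumes a single range of the form bytes=first-last, where both positions are inclusive as defined by HTTP; the function name range_content_md5 is illustrative, and suffix or multi-part ranges are not handled.

```python
import base64
import hashlib


def range_content_md5(data: bytes, first: int, last: int) -> str:
    """Return the base64 MD5 digest of data[first..last], both ends inclusive."""
    selected = data[first:last + 1]  # HTTP byte ranges include the last position
    return base64.b64encode(hashlib.md5(selected).digest()).decode("ascii")


# Digest a client would expect for "Range: bytes=0-4" on this content.
print(range_content_md5(b"hello world", 0, 4))
```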

Validation Failures

When an SCSP read operation requests Content-MD5 hash validation and there is a hash mismatch, the storage node is temporarily removed from the Gateway's connection pool because of how Swarm reports a hash validation failure.

Storing Content-MD5 Headers

Content-MD5 headers are stored with the object metadata and returned on all subsequent GET or HEAD requests.

  • If a Content-MD5 header is included with a GET request, Swarm computes the hash as the bytes are read, regardless of whether the header was originally stored with the object.

  • If the computed and provided hashes do not match, the connection is closed before the last bytes are transmitted, which is the standard way to indicate something went wrong with the transfer.
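From the client's point of view, a connection closed before the last bytes arrive appears as a body shorter than the advertised Content-Length. The following sketch shows one way a client might classify a streamed read; the function name check_streamed_read and its three return values are assumptions made for illustration and do not describe Swarm or Gateway behavior.

```python
import base64
import hashlib
from typing import Iterable


def check_streamed_read(chunks: Iterable[bytes], expected_md5: str, expected_length: int) -> str:
    """Hash a body incrementally and report "ok", "truncated", or "mismatch"."""
    digest = hashlib.md5()
    received = 0
    for chunk in chunks:
        digest.update(chunk)  # incremental hashing, no need to buffer the body
        received += len(chunk)
    if received != expected_length:
        return "truncated"  # fewer (or more) bytes than Content-Length promised
    actual = base64.b64encode(digest.digest()).decode("ascii")
    return "ok" if actual == expected_md5 else "mismatch"


print(check_streamed_read([b"hel", b"lo"], "XUFAKrxLKna5cZ2REBfFkg==", 5))
```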

Content-MD5 and Replication

When providing the gencontentmd5 query argument in a request on a replicated object, the following applies:

  • On a write request (POST, PUT, COPY, or APPEND), the Content-MD5 is calculated, stored with the object, and returned as a response header for that write operation.

  • The Content-MD5 is always returned for any GET or HEAD request on an object that was written with the gencontentmd5 query argument.

  • When ?gencontentmd5 is included on a range read (a GET request with the Range header), Swarm suppresses any stored Content-MD5 from the response headers and instead returns a Content-MD5 for the requested range as a trailing header.

Content-MD5 and Erasure-Coding

When providing the gencontentmd5 query argument in a request on an erasure-coded object, the following applies:

  • The APPEND operation is no longer supported. An APPEND that includes the gencontentmd5 query argument returns a 400 Bad Request error response.

  • The COPY operation is supported only if the gencontentmd5 query argument was provided when the existing object was written; otherwise, the COPY operation fails.

  • For a range read (a GET request with the Range header), Swarm suppresses any stored Content-MD5 from the response headers and instead returns a Content-MD5 for the requested range as a trailing header.

Example Download Verification

You can verify the integrity of a download from Swarm by comparing the Content-MD5 published in an object's metadata with the base64-encoded MD5 digest of the downloaded object. An example of how this is performed using the openssl utility is outlined below:

$ curl -sI https://support.cloud.datacore.com/tools/swarm-support-tools.tgz
HTTP/1.1 200 OK
Date: Tue, 10 Jan 2023 19:12:40 GMT
Gateway-Request-Id: A0A1788FF937057D
Server: CAStor Cluster/15.0.1
Via: 1.1 support.cloud.datacore.com (Cloud Gateway SCSP/7.10.2)
Gateway-Protocol: scsp
CAStor-application: CaringoTechSupport
Castor-System-CID: 664727e752ca7a48092c73699e909578
Castor-System-Cluster: gem.tx.caringo.com
Castor-System-Created: Mon, 09 Jan 2023 18:25:17 GMT
Castor-System-Name: swarm-support-tools.tgz
Castor-System-Version: 1673288717.693
Content-Type: application/x-www-form-urlencoded
Last-Modified: Mon, 09 Jan 2023 18:25:17 GMT
X-Last-Modified-By-Meta: tools+swarm
X-Owner-Meta: tools+swarm
Manifest: ec
ETag: "b5dea5b4048f21a0f99880873fa64865"
Castor-System-Path: /support.cloud.datacore.com/tools/swarm-support-tools.tgz
Castor-System-Domain: support.cloud.datacore.com
Volume: 1dc47666d09cdb27bd59cbb731d046ca
Content-MD5: EF8xHMmzt3xNjpksfRLo+A==
Content-Length: 28398358

$ curl -O https://support.cloud.datacore.com/tools/swarm-support-tools.tgz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 27.0M  100 27.0M    0     0  2928k      0  0:00:09  0:00:09 --:--:-- 2826k

$ cat swarm-support-tools.tgz | openssl dgst -md5 -binary | openssl enc -base64
EF8xHMmzt3xNjpksfRLo+A==

© DataCore Software Corporation. · https://www.datacore.com · All rights reserved.