Validating a Multipart Write

Validation with Composite-Content-MD5

Swarm supports transfer validation on multipart requests through a composite MD5 computed from the Content-MD5 hashes of all parts. The request header that enables multipart validation is Composite-Content-MD5. This header provides an end-to-end integrity check of the content (excluding metadata) of a multipart write request at the time of completion. The value can be used to validate the object contents only if all parts are stored with a Content-MD5.

  • Storage - The composite MD5 value is persisted and indexed in a header called Castor-System-CompositeMD5 in the metadata section of the completed object's manifest. It is not preserved across a PUT or APPEND, but it is automatically persisted across a COPY, so the MD5 does not need to be recalculated for very large files, which would be inefficient.

  • Behavior - On a complete request that includes the Composite-Content-MD5 header, Swarm computes the composite value as the MD5 of the concatenated binary Content-MD5 values stored with each part, in part order.

    • When a Composite-Content-MD5 header is sent in the request, Swarm calculates the composite value from the stored part hashes and compares it with the header value. The complete request can succeed only if the value Swarm calculates matches the value provided in the header.

    • On the completion response, the Castor-System-CompositeMD5 header is provided as a trailing header if a Content-MD5 is available for every part, regardless of whether Composite-Content-MD5 was provided on the request.

    • For newly completed multipart writes, the composite MD5 (Castor-System-CompositeMD5) is also indexed in Elasticsearch, so it appears in listings:

      curl 'https://www.example.com/mybucket?format=json&fields=name,tmborn,etag,content-md5,Castor-System-CompositeMD5'

      [{
        "last_modified": "2017-04-08T20:37:02.868400Z",
        "castor_system_compositemd5": "306cca04302861ed2620a328f286346f-5",
        "hash": "ae478cc4c3eb28b432825074673aeda9",
        "name": "samples/5G"
      }]

       (v9.2)

  • Usage - Compute the composite MD5 by taking the MD5 of the concatenated binary MD5s of all parts, in part order with no gaps, then append a hyphen and the number of parts. Provide the result as the value of the Composite-Content-MD5 header on the complete request. This triggers Swarm to collect the Content-MD5 of each part and assemble its own comparison value.

  • Failure - If the calculated value does not match the supplied value, or if any part is missing a Content-MD5 header, the request fails with a 409 (Conflict). In this case, review the error message and correct the problem (such as parts missing the Content-MD5 header) before retrying the complete request.
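The behavior and failure rules above can be modeled client-side as a small check, which is handy for predicting whether a complete request will be rejected. This is a sketch, not Swarm's implementation: the function name check_complete and the use of None to represent "part stored without a Content-MD5" are illustrative assumptions, and a False result simply stands for the documented 409 (Conflict).

```python
import base64
import hashlib
from typing import Optional, Sequence


def check_complete(part_content_md5s: Sequence[Optional[str]], header_value: str) -> bool:
    """Hypothetical model of the documented complete-time validation.

    part_content_md5s: Base64 Content-MD5 stored with each part, in part order
                       (None models a part stored without a Content-MD5).
    header_value:      Composite-Content-MD5 value supplied on the complete request.
    Returns False where the documentation says the request fails with 409.
    """
    # A part without a Content-MD5 makes the composite impossible to compute.
    if any(value is None for value in part_content_md5s):
        return False
    # MD5 of the binary part digests concatenated in order, plus "-<part count>".
    joined = b"".join(base64.b64decode(value) for value in part_content_md5s)
    expected = hashlib.md5(joined).hexdigest() + "-" + str(len(part_content_md5s))
    # A mismatch between the computed and supplied values also fails.
    return expected == header_value
```

Because the digests are concatenated in part order, reordering the parts changes the composite value even when no part's content changes.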

Calculating a Composite-Content-MD5

These are different ways to represent an MD5 hash:

  • Base64 (binary-to-text encoding): rbyRpD6YijtbdFuFKakLYQ==

  • hexdigest (hex string representing the hash): adbc91a43e988a3b5b745b8529a90b61

In Python 3, convert the Base64 form to the hex form with:

base64.b64decode('rbyRpD6YijtbdFuFKakLYQ==').hex()
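Conversions are needed in both directions: Content-MD5 headers carry the Base64 form, while md5sum prints the hex form. A minimal Python 3 sketch; the helper names b64_to_hex and hex_to_b64 are illustrative.

```python
import base64


def b64_to_hex(b64_md5: str) -> str:
    """Base64 Content-MD5 value -> 32-character hex digest."""
    return base64.b64decode(b64_md5).hex()


def hex_to_b64(hex_md5: str) -> str:
    """32-character hex digest (as printed by md5sum) -> Base64 Content-MD5 value."""
    return base64.b64encode(bytes.fromhex(hex_md5)).decode("ascii")


print(b64_to_hex("rbyRpD6YijtbdFuFKakLYQ=="))  # adbc91a43e988a3b5b745b8529a90b61
```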

For Composite-Content-MD5, compute the MD5 of the concatenated binary MD5 hashes of all parts, in order, and take its hex digest. The composite value is that hex digest followed by a hyphen and the number of parts:

{ hash of concatenated part MD5 hashes, in order }-{ number of parts }

{ hash of part1hash & part2hash & … & partnhash }-{ n }

754e6c52092a9c1134d7f047d61db168-3
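To sanity-check a composite value before sending it, a sketch like the following splits it into its hex digest and part count. The function name parse_composite and the strictness of the pattern (exactly 32 hex characters, a positive part count) are assumptions drawn from the format described above, not rules confirmed by Swarm.

```python
import re
from typing import Tuple

# "<32 hex chars>-<number of parts>", e.g. 754e6c52092a9c1134d7f047d61db168-3
_COMPOSITE_RE = re.compile(r"([0-9a-fA-F]{32})-([1-9][0-9]*)")


def parse_composite(value: str) -> Tuple[str, int]:
    """Split a Composite-Content-MD5 value into (hex digest, number of parts)."""
    match = _COMPOSITE_RE.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"not a composite MD5 value: {value!r}")
    return match.group(1), int(match.group(2))


digest, parts = parse_composite("754e6c52092a9c1134d7f047d61db168-3")
print(digest, parts)  # 754e6c52092a9c1134d7f047d61db168 3
```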

For example, consider a multipart object with three parts:

  • Part 1 Content-MD5 = rbyRpD6YijtbdFuFKakLYQ==

  • Part 2 Content-MD5 = 9lzbDNFcX99eTYqZB4QKjg==

  • Part 3 Content-MD5 = 2qHK6cuQufMzJAs6IxTmKQ==

The composite hash is:

  • Composite-Content-MD5 = 754e6c52092a9c1134d7f047d61db168-3

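In Python 3, the calculation looks like this, where contentMD5HeaderValue1 through contentMD5HeaderValue3 are the Base64 Content-MD5 header values of the three parts:

import base64
import hashlib
# Decode each part's Base64 Content-MD5 header value to its 16-byte binary MD5
partBinaryMD5_1 = base64.b64decode(contentMD5HeaderValue1)
partBinaryMD5_2 = base64.b64decode(contentMD5HeaderValue2)
partBinaryMD5_3 = base64.b64decode(contentMD5HeaderValue3)
# MD5 of the binary digests concatenated in part order, then "-<number of parts>"
compositeContentMD5 = hashlib.md5(partBinaryMD5_1 + partBinaryMD5_2 + partBinaryMD5_3).hexdigest() + '-3'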
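The same calculation generalizes to any number of parts. A sketch, assuming the parts' Content-MD5 values are available as a list in part order; the function name composite_content_md5 is illustrative.

```python
import base64
import hashlib
from typing import Iterable


def composite_content_md5(content_md5_values: Iterable[str]) -> str:
    """Build a Composite-Content-MD5 value from Base64 Content-MD5 values in part order."""
    digest = hashlib.md5()
    count = 0
    for value in content_md5_values:
        raw = base64.b64decode(value, validate=True)  # binary MD5 of one part
        if len(raw) != 16:
            raise ValueError(f"not an MD5 digest: {value!r}")
        # Feeding the digests one at a time is the same as hashing their concatenation.
        digest.update(raw)
        count += 1
    return f"{digest.hexdigest()}-{count}"


print(composite_content_md5([
    "rbyRpD6YijtbdFuFKakLYQ==",
    "9lzbDNFcX99eTYqZB4QKjg==",
    "2qHK6cuQufMzJAs6IxTmKQ==",
]))
```

Calling update() once per part avoids building the concatenated byte string, and the length check rejects values that are valid Base64 but not 16-byte MD5 digests.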

Composite-Content-MD5 Example

  1. Given a file divided into three parts, get the hex MD5 digest of each part using md5sum.

    • Part 1 hash: babfc3ceb8a4568587b7d31bfff36257

    • Part 2 hash: fae6c82883c12e289bc5f12f3ecf76ef2

    • Part 3 hash: 2afdd827a9e785029f9692e82ea07cca

  2. Append each part's MD5, in order, to a new file. For example, for part 1:

    echo "babfc3ceb8a4568587b7d31bfff36257" >> md5sums.txt

  3. See the result by catting the file:

    cat md5sums.txt

  4. Convert the hex digests to binary and hash the result with xxd and md5sum to get the composite MD5:

    xxd -r -p md5sums.txt | md5sum

  5. Append the part count (3) to get the final composite header value:

    12138b95c0af8f8e764f80d719cc7cbd-3

  6. Provide the value in the Composite-Content-MD5 header of the request that completes the multipart write.
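The md5sum and xxd steps above can also be done in Python directly from the part files. A sketch, assuming each part is a separate local file listed in upload order; the function name composite_from_part_files and the 1 MiB chunk size are arbitrary choices.

```python
import hashlib
from typing import Sequence


def composite_from_part_files(paths: Sequence[str], chunk_size: int = 1 << 20) -> str:
    """Composite-Content-MD5 value for parts stored as local files, in part order."""
    composite = hashlib.md5()
    for path in paths:
        part = hashlib.md5()
        with open(path, "rb") as f:
            # Stream each part so large parts are not read into memory at once.
            for chunk in iter(lambda: f.read(chunk_size), b""):
                part.update(chunk)
        # md5sum prints part.hexdigest(); xxd -r -p turns that hex back into these bytes.
        composite.update(part.digest())
    # Append the number of parts, as in step 5.
    return f"{composite.hexdigest()}-{len(paths)}"
```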

Validation with Content-MD5

To validate an object created by a completed multipart write, send a GET request with the gencontentmd5 query argument and compare the result with the known value.
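Here the known value is the ordinary MD5 of the whole object, not the composite value, so it can be computed locally from the original file. A sketch, assuming the gencontentmd5 result is reported in the same Base64 form as a Content-MD5 header; the function names are illustrative and no Swarm call is shown.

```python
import base64
import hashlib


def file_content_md5(path: str, chunk_size: int = 1 << 20) -> str:
    """Base64-encoded MD5 of an entire file, in Content-MD5 header format."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    return base64.b64encode(md5.digest()).decode("ascii")


def matches_known_value(returned_content_md5: str, known_content_md5: str) -> bool:
    """Compare the value reported for the stored object with the locally known value."""
    return returned_content_md5.strip() == known_content_md5.strip()
```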

See https://perifery.atlassian.net/wiki/spaces/public/pages/2443822141. (v9.2)

© DataCore Software Corporation. · https://www.datacore.com · All rights reserved.