In addition to updating object metadata directly (via COPY), you can append additional metadata to existing objects without altering the originals. This provides a way to extend the metadata of immutable objects and of historical versions, because each object's create date, original metadata, and version sequence remain undisturbed. Annotations also provide an additional way to find and manage objects; for example, they can store S3 object-level ACLs for the Gateway to enforce.

Important

Swarm cannot be downgraded to an earlier version once this feature is used.

Benefits - Keeping metadata annotation separate from the object itself provides several advantages:

  • Add helpful metadata without changing the object’s create date, original metadata, and version sequence.

  • Retrieve objects as originally written, so applications can distinguish between what was original and what was added later.

  • Annotate immutable objects.

  • Annotate historical versions of objects, independent of the current version of the object. This is especially important when the metadata is derived from analysis performed on the data, which changes from version to version, or when capturing information about specific versions.

Note

  • This lightweight implementation of annotation does not rely on the annotator and the target object interacting, and the objects do not operate as a pair. For example, there is no single request that returns both objects' headers, and there is no method to merge and resolve conflicts between them.

  • It is recommended to use the SCSP COPY operation to update the metadata on an object when the metadata must be directly available in listings. This COPY operation is efficient even for large objects, assuming they are erasure-coded, because only the manifest stream is copied. For S3, use the PUT with copy operation, specifying the same source and destination.

Annotation Cleanup

There are two key features of this annotation method: (1) validation that target objects exist before annotations are written, and (2) the Health Processor's automated tracking and cleanup of annotation objects after the target object is removed. An annotated target object may be removed from Swarm in one of several ways:

  • SCSP Delete

  • SCSP Write (invalidating the old version)

  • Lifepoint Delete

  • Recursive delete of a parent context (domain or bucket)

Note

The Health Processor logs a “DECORATION DELETE” AUDIT-level message when purging an annotation during garbage collection. Annotation objects "decorate" a targeted content object.

Regardless of the type of Swarm object annotated (named, alias, immutable, historical version) and the protection type (replicated or erasure-coded), metadata annotations operate largely the same way:

  • If an annotation is created and the target object is later deleted, Swarm deletes the orphaned annotation during garbage collection.

  • If an annotation is created and later deleted, the target object is completely unaffected.

    • For named objects only, Swarm replaces the annotation object with a delete marker.

  • If a domain or a bucket containing both the original object and the annotation is deleted, Swarm deletes both recursively.

  • If a versioned object is updated, create separate annotations for any historical versions; when a version is deleted, Swarm deletes its orphaned annotation during garbage collection.

  • If an annotation on a versioned object is created and later deleted, the outcome depends on the position in the version chain:

    • Historical versions: Swarm removes the annotation.

    • Current versions: Swarm replaces the annotation object with a delete marker.

Creating Annotations

Metadata annotation relies on a persisted header, Castor-System-Decorates, whose value is the ETag of the target object that the annotation object extends (decorates). Any object with this header present is an annotation object, subject to special Health Processor management. The header is valid for all Swarm object types (immutable, alias, and named), but not for context objects (domains and buckets). Both the annotator (decorator) and the annotated target object may be versioned.

Tip

Although it is common to create annotations as metadata-only (Content-Length: 0) objects, annotations are complete objects. Data may be included as part of the annotation.

To create a new annotation, write an object that points to the ETag of the target and includes the custom metadata to be added, such as GPS coordinates extracted from an existing, uploaded photo:

Extending Metadata with Post-Processed Data
Code Block
languagexml
Content-Length: 0
Castor-System-Decorates: 9282727ffcca3a09e0843281aafc13af
X-GPS-Meta-Latitude: 36; 16; 48.36000000000589
X-GPS-Meta-Longitude: 115; 10; 20.79299999981990

Note

The ETag has no quotes, even though returned ETags are quoted. The ETag must be of a content-containing object, not a delete marker.
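
Because returned ETags are quoted and the Castor-System-Decorates value is not, a client usually has to strip the quotes before reusing an ETag. The following is a minimal sketch, not part of Swarm: `etag_for_decorates` is a hypothetical helper name, and it assumes the response headers arrive on standard input in the form shown by `curl --head`.

```bash
# Hypothetical helper: read HTTP response headers on stdin and print the
# first ETag value with its double quotes and trailing carriage return removed.
etag_for_decorates() {
    awk 'tolower($1) == "etag:" { print $2; exit }' | tr -d '"\r'
}

# Demonstration with canned headers instead of a live cluster.
printf 'HTTP/1.1 200 OK\r\nEtag: "681b2470307b9260fb83542903e51828"\r\n' | etag_for_decorates
```

Against a live cluster, the same function could be fed by piping the output of a HEAD request for the target object into it.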

Alias Objects

The current ETag of an alias object, not its permanent UUID, must be used when creating annotations. Swarm returns a 400 - Bad Request error if the UUID is used.

Searching for Annotations

In the annotation (decorator) object’s Elasticsearch record, the Castor-System-Decorates header value is indexed under the key decorates, and the Elasticsearch configuration templates include the decorates field. Most Swarm queries return this value, if present, as part of the results.

Query argument - Use a “decorates=<uuid>” query argument in Swarm listing queries to find annotation objects for a given ETag (or earlier query result “hash”).

See Listing Operations.
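
As an illustration of the `decorates` query argument, the sketch below builds a listing URL for one target ETag. `decorates_query_url` is a hypothetical helper, not a Swarm tool. It assumes that the cluster is reachable under the domain name and that ETags are 32 hexadecimal characters, as in the examples on this page, so it rejects a value that still carries quotes.

```bash
# Hypothetical helper: print a listing URL that finds annotations of one target.
# $1 = domain (assumed to double as the host name), $2 = unquoted ETag
decorates_query_url() {
    domain=$1
    etag=$2
    # Assumption: ETags are 32 hexadecimal characters, as in this page's examples.
    case $etag in
        ''|*[!0-9a-fA-F]*)
            echo "ETag must be unquoted hexadecimal: $etag" >&2
            return 1 ;;
    esac
    if [ "${#etag}" -ne 32 ]; then
        echo "ETag must be 32 characters long: $etag" >&2
        return 1
    fi
    printf 'http://%s/?domain=%s&format=json&fields=all&decorates=%s\n' \
        "$domain" "$domain" "$etag"
}

decorates_query_url swarm.example.com 681b2470307b9260fb83542903e51828
```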

Sample Scenario for Annotations

Suppose a company needs to store surveillance videos as immutable objects (as protection from tampering) in the domain "swarm.example.com". To add a video, use the normal POST, adding the Content-Type of the video and custom metadata for the video's duration, camera location, and camera model:

Code Block
languagebash
curl -i --location-trusted -X POST --post301 \
--data-binary @20170311-972-9928817883.mp4 \
-H "Expect: 100-continue" \
-H "x-example-meta-Start-Time: 2017-03-11T12:00:01.678Z" \
-H "x-example-meta-End-Time: 2017-03-11T13:00:00.421Z" \
-H "x-example-meta-Building: Annex 2" \
-H "x-example-meta-Location: 972" \
-H "x-example-meta-CameraModel: SWDSK-850004A-US" \
-H "Content-Type: video/mp4" \
-H "Content-Disposition: inline" \
"http://swarm.example.com/"

HTTP/1.1 201 Created
Location: http://192.168.1.11:80/e970b3280d5501571c8c6fe9d6838557?domain=swarm.example.com
Location: http://192.168.1.12:80/e970b3280d5501571c8c6fe9d6838557?domain=swarm.example.com
Volume: b3381183a1cfc620d960db3eae1d086d
Volume: 604a44d1a351045553b5481391af0810
Manifest: ec
Content-UUID: e970b3280d5501571c8c6fe9d6838557
Last-Modified: Tue, 28 Mar 2017 19:19:48 GMT
Castor-System-Encoding: zfec 1.4(2, 1, 524288, 200000000)
Castor-System-Version: 1490728788.934
Etag: "681b2470307b9260fb83542903e51828"
Replica-Count: 2
Date: Tue, 28 Mar 2017 19:22:19 GMT
Server: CAStor Cluster/9.2.0
Content-Length: 46
Content-Type: text/html
Keep-Alive: timeout=14400
<html><body>New stream created</body></html>

To verify the video is successfully stored, use a HEAD command:

Code Block
languagebash
curl --head --location-trusted "http://swarm.example.com/e970b3280d5501571c8c6fe9d6838557"

HTTP/1.1 200 OK
Castor-System-CID: 7e7fd5d747d244726af93c726672408b
Castor-System-Cluster: swarm.example.com
Castor-System-Created: Tue, 28 Mar 2017 19:19:48 GMT
Content-Disposition: inline
Content-Type: video/mp4
Last-Modified: Tue, 28 Mar 2017 19:19:48 GMT
x-example-meta-Building: Annex 2
x-example-meta-CameraModel: SWDSK-850004A-US
x-example-meta-End-Time: 2017-03-11T13:00:00.421Z
x-example-meta-Location: 972
x-example-meta-Start-Time: 2017-03-11T12:00:01.678Z
Manifest: ec
Content-Length: 1500964975
Etag: "681b2470307b9260fb83542903e51828"
Castor-System-Domain: swarm.example.com
Volume: b3381183a1cfc620d960db3eae1d086d
Date: Tue, 28 Mar 2017 19:24:25 GMT
Server: CAStor Cluster/9.2.0
Keep-Alive: timeout=14400

The custom metadata is what makes it possible and practical to identify video of interest. Suppose an incident occurs in the Annex 2 building. Search for immutable video taken at Annex 2 during the time span to find surveillance video relevant to the investigation:

Code Block
languagebash
curl -i --location-trusted "http://swarm.example.com/?domain=swarm.example.com&format=json&fields=all\
&stype=immutable\
&content-type=video/mp4\
&x-example-meta-Building=Annex%202\
&x-example-meta-Start-Time:date=<2017-03-11T12:17:23Z\
&x-example-meta-End-Time:date=>2017-03-11T12:17:23Z"

HTTP/1.1 200 OK
Castor-System-Alias: 7e7fd5d747d244726af93c726672408b
Castor-System-CID: ffffffffffffffffffffffffffffffff
Castor-System-Cluster: swarm.example.com
Castor-System-Created: Tue, 28 Mar 2017 19:19:29 GMT
Castor-System-Name: swarm.example.com
Castor-System-Owner: @CAStor administrator
Castor-System-Version: 1490728769.536
X-Timestamp: Tue, 28 Mar 2017 19:19:29 GMT
Last-Modified: Tue, 28 Mar 2017 19:26:30 GMT
Transfer-Encoding: chunked
Content-Type: application/json; charset=utf-8
Castor-Object-Count: 1
Castor-System-Object-Count: 1
Date: Tue, 28 Mar 2017 19:26:30 GMT
Server: CAStor Cluster/9.2.0
Keep-Alive: timeout=14400
[{
   "contextid":"7e7fd5d747d244726af93c726672408b",
   "x_example_meta_start_time":"2017-03-11T12:00:01.678Z",
   "x_example_meta_end_time:date":"2017-03-11T13:00:00.421Z",
   "@timestamp":1490728939869,
   "domainid":"7e7fd5d747d244726af93c726672408b",
   "last_modified":"2017-03-28T19:19:48.932400Z",
   "bytes":1500964975,
   "hash":"681b2470307b9260fb83542903e51828",
   "x_example_meta_location:double":972.0,
   "content_disposition":"inline",
   "sizewithreps":2251447463,
   "content_type":"video/mp4",
   "timestamp":1490728939869,
   "x_example_meta_location:long":972,
   "x_example_meta_location:date":972000,
   "x_example_meta_end_time":"2017-03-11T13:00:00.421Z",
   "name":"e970b3280d5501571c8c6fe9d6838557",
   "castor_stream_type":"immutable",
   "x_example_meta_building":"Annex 2",
   "x_example_meta_location":"972",
   "x_example_meta_cameramodel":"SWDSK-850004A-US",
   "x_example_meta_start_time:date":"2017-03-11T12:00:01.678Z"
}] 

The search correctly finds a video of interest: e970b3280d5501571c8c6fe9d6838557
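
The time-span filter in this search works because a recording covers a given instant when it starts before that instant and ends after it. The sketch below builds such a query for one building and one instant. `incident_query_url` is a hypothetical helper; it assumes the host name equals the domain, reuses the `x-example-meta-*` headers from this scenario, and encodes only spaces in the building name.

```bash
# Hypothetical helper: print a search URL for immutable objects recorded in a
# building whose time span covers one instant (Start-Time < instant < End-Time).
# $1 = domain (assumed to double as the host name), $2 = building, $3 = ISO 8601 instant
incident_query_url() {
    domain=$1
    # Minimal URL encoding: only spaces are handled in this sketch.
    building=$(printf '%s' "$2" | sed 's/ /%20/g')
    instant=$3
    printf 'http://%s/?domain=%s&format=json&fields=all&stype=immutable' "$domain" "$domain"
    printf '&x-example-meta-Building=%s' "$building"
    printf '&x-example-meta-Start-Time:date=<%s' "$instant"
    printf '&x-example-meta-End-Time:date=>%s\n' "$instant"
}

incident_query_url swarm.example.com "Annex 2" 2017-03-11T12:17:23Z
```

When passing the printed URL to curl, keep it in double quotes so the shell does not interpret the `&`, `<`, and `>` characters.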

Adding Metadata Annotation

With the video stored securely, suppose the organization also needs to run an application that performs facial recognition on the video. The application generates data when it runs, including both information on the algorithm and settings and the detailed results. The original video object must remain read-only to serve as evidence, so the derived data and metadata must be stored in a way that associates them with the original object without altering it.

The solution is to annotate the video with a decoration object (which can be named or unnamed) to associate the results with the original video.

Important

The Castor-System-Decorates header always refers to the ETag of the original video, not the GUID; this is a precaution against the annotation becoming orphaned if the target object is mutable. 

Code Block
languagebash
curl -i -X POST --post301 --location-trusted --data-binary @results \
-H "Castor-System-Decorates: 681b2470307b9260fb83542903e51828" \
-H "x-VideoAnalysis-meta-Algorithm: facial-recognition" \
-H "x-VideoAnalysis-meta-Version: 8.7" \
-H "Content-Type: application/vnd.analysis.facerec" \
"http://swarm.example.com"

HTTP/1.1 201 Created
Location: http://192.168.1.13:80/0cb2d9e90a3341b10bc9dba27f27259c?domain=swarm.example.com
Location: http://192.168.1.12:80/0cb2d9e90a3341b10bc9dba27f27259c?domain=swarm.example.com
Volume: 6bd38289c2a8fb314caf902d9811fb87
Volume: 604a44d1a351045553b5481391af0810
Manifest: ec
Content-UUID: 0cb2d9e90a3341b10bc9dba27f27259c
Last-Modified: Tue, 28 Mar 2017 20:26:08 GMT
Castor-System-Encoding: zfec 1.4(2, 1, 524288, 200000000)
Castor-System-Version: 1490732768.888
Etag: "867c10c9e6649313a3a5eed2cc76f307"
Replica-Count: 2
Date: Tue, 28 Mar 2017 20:26:12 GMT
Server: CAStor Cluster/9.2.0
Content-Length: 46
Content-Type: text/html
Keep-Alive: timeout=14400
<html><body>New stream created</body></html>

Tip

Create annotations as part of the intake process that stores the original objects in Swarm.

To find any annotations that hold facial recognition results for the original object, search for objects that decorate the video and qualify the search to look for facial recognition results:

Code Block
languagebash
curl -i --location-trusted "http://swarm.example.com/?domain=swarm.example.com&format=json&stype=immutable&fields=all\
&decorates=681b2470307b9260fb83542903e51828\
&x_videoanalysis_meta_algorithm=facial%20recognition"

HTTP/1.1 200 OK
Castor-System-Alias: 7e7fd5d747d244726af93c726672408b
Castor-System-CID: ffffffffffffffffffffffffffffffff
Castor-System-Cluster: swarm.example.com
Castor-System-Created: Tue, 28 Mar 2017 19:19:29 GMT
Castor-System-Name: swarm.example.com
Castor-System-Owner: @CAStor administrator
Castor-System-Version: 1490728769.536
X-Timestamp: Tue, 28 Mar 2017 19:19:29 GMT
Last-Modified: Tue, 28 Mar 2017 20:36:40 GMT
Transfer-Encoding: chunked
Content-Type: application/json; charset=utf-8
Castor-Object-Count: 1
Castor-System-Object-Count: 1
Date: Tue, 28 Mar 2017 20:36:40 GMT
Server: CAStor Cluster/9.2.0
Keep-Alive: timeout=14400
[{
   "sizewithreps":11684987,
   "contextid":"7e7fd5d747d244726af93c726672408b",
   "content_type":"application/vnd.analysis.facerec",
   "name":"0cb2d9e90a3341b10bc9dba27f27259c",
   "castor_stream_type":"immutable",
   "timestamp":1490732772033,
   "@timestamp":1490732772033,
   "domainid":"7e7fd5d747d244726af93c726672408b",
   "decorates":"681b2470307b9260fb83542903e51828",
   "x_videoanalysis_meta_algorithm":"facial-recognition",
   "x_videoanalysis_meta_version:long":9,
   "x_videoanalysis_meta_version:date":8700,
   "x_videoanalysis_meta_version":"8.7",
   "hash":"867c10c9e6649313a3a5eed2cc76f307",
   "last_modified":"2017-03-28T20:26:08.888400Z",
   "x_videoanalysis_meta_version:double":8.7,
   "bytes":7789991
}]

The search correctly finds an annotation: 0cb2d9e90a3341b10bc9dba27f27259c

Note

Elasticsearch is a NoSQL (non-relational) database that does not support joins directly, so queries for a primary object and an annotation object cannot be combined.
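
Since the two queries cannot be combined, a client can perform the join itself: run the primary search, take each result's `hash` value, and issue one `decorates` query per hash. The sketch below shows only the text handling for that second step. `listing_hashes` and `annotation_queries` are hypothetical helpers, the host name is assumed to equal the domain, and the pattern matching assumes JSON shaped like the sample results on this page; a real JSON parser would be more robust.

```bash
# Hypothetical helper: print the "hash" value of every record in a JSON
# listing result read from stdin (text matching, not a full JSON parser).
listing_hashes() {
    grep -o '"hash" *: *"[0-9a-f]*"' | sed 's/.*"\([0-9a-f]*\)"$/\1/'
}

# Hypothetical helper: for each primary object in the listing on stdin,
# print the follow-up query URL that lists its annotations.
# $1 = domain (assumed to double as the host name)
annotation_queries() {
    domain=$1
    listing_hashes | while read -r hash; do
        printf 'http://%s/?domain=%s&format=json&fields=all&decorates=%s\n' \
            "$domain" "$domain" "$hash"
    done
}

# Demonstration with a canned, shortened search result instead of a live cluster.
printf '[{"name":"e970b3280d5501571c8c6fe9d6838557","hash":"681b2470307b9260fb83542903e51828"}]' |
    annotation_queries swarm.example.com
```

Each printed URL can then be fetched separately, and the application pairs the annotation results with the primary object they decorate.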