Walkthrough: Ordering Sets of Filtered Objects

This walkthrough gives details and guidance for a complex example: how to paginate (list ordered subsets of) the search results for objects that match specific metadata.

This walkthrough shows how and why to combine three related search query arguments (https://perifery.atlassian.net/wiki/spaces/public/pages/2443821835): size, marker, and sort.

How to Count Objects in a Bucket

This query returns an empty result set (size=0), so focus on the header output alone:

$ curl -si -u jdoe "https://jdoe.cloud.acme.com/public/?format=json&domain=jdoe.cloud.acme.com&size=0"
Enter host password for user 'jdoe':
HTTP/1.1 200 OK
Date: Wed, 16 Dec 2020 15:55:42 GMT
Gateway-Request-Id: 5BF093C3AECC45AD
Server: CAStor Cluster/12.0.0
Via: 1.1 jdoe.cloud.acme.com (Cloud Gateway SCSP/7.1.0)
Gateway-Protocol: scsp
Allow-Encoding: *;q=0
Castor-System-Alias: ac611714399ae0e5f22a628d4e8c26f4
Castor-System-CID: 924273bee8a6e01865d7b2a315ea5ae3
Castor-System-Cluster: foo.tx.acme.com
Castor-System-Created: Thu, 10 Sep 2015 19:45:24 GMT
Castor-System-Name: public
Castor-System-Version: 1441914324.106
X-Last-Modified-By-Meta: jdoe@
X-Owner-Meta: jdoe
X-Timestamp: Thu, 10 Sep 2015 19:45:24 GMT
X-timestamp: Wed, 16 Dec 2020 15:55:42 GMT
Content-Type: application/json;charset=utf-8
Castor-Object-Count: 62
Castor-System-Object-Count: 62
Last-Modified: Wed, 16 Dec 2020 15:55:42 GMT
Transfer-Encoding: chunked

[ ]

Check the value for Castor-Object-Count to determine how many objects are associated with the search performed. The number of objects in the "public" bucket under domain "jdoe.cloud.acme.com" is 62 per above.
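When the count is needed in a script, the header can be pulled out of the response text. The following is a minimal sketch, not part of the product: object_count is a hypothetical helper name, and the sample input is abbreviated from the response above so that the snippet runs without a cluster.

```shell
# Print the value of the Castor-Object-Count header read from stdin.
# HTTP header lines end in CRLF, so carriage returns are stripped first.
# Header names are matched case-insensitively.
object_count() {
  tr -d '\r' | awk -F': *' 'tolower($1) == "castor-object-count" { print $2; exit }'
}

# Sample input abbreviated from the response above. A live call would
# pipe the output of the curl -si command into object_count instead.
printf 'HTTP/1.1 200 OK\r\nCastor-Object-Count: 62\r\nCastor-System-Object-Count: 62\r\n\r\n[ ]\n' | object_count
# prints 62
```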

How to Count Filtered Objects

Drill down further and focus on items matching a metadata characteristic. For example, filter for a specific kind of content (application, audio, image, text, video) stored in the object, which is recorded in the Content-Type metadata header. Note: objects can be filtered by custom metadata as well.

This search filters for objects holding MP4 video content:

$ curl -si -u jdoe "https://jdoe.cloud.acme.com/public/?format=json&domain=jdoe.cloud.acme.com&size=0&content-type=video/mp4"
Enter host password for user 'jdoe':
HTTP/1.1 200 OK
Date: Wed, 16 Dec 2020 17:10:58 GMT
Gateway-Request-Id: C28EB97FE6EF3914
Server: CAStor Cluster/12.0.0
Via: 1.1 jdoe.cloud.acme.com (Cloud Gateway SCSP/7.1.0)
Gateway-Protocol: scsp
Allow-Encoding: *;q=0
Castor-System-Alias: ac611714399ae0e5f22a628d4e8c26f4
Castor-System-CID: 924273bee8a6e01865d7b2a315ea5ae3
Castor-System-Cluster: foo.tx.acme.com
Castor-System-Created: Thu, 10 Sep 2015 19:45:24 GMT
Castor-System-Name: public
Castor-System-Version: 1441914324.106
X-Last-Modified-By-Meta: jdoe@
X-Owner-Meta: jdoe
X-Timestamp: Thu, 10 Sep 2015 19:45:24 GMT
X-timestamp: Wed, 16 Dec 2020 17:10:58 GMT
Content-Type: application/json;charset=utf-8
Castor-Object-Count: 39
Castor-System-Object-Count: 39
Last-Modified: Wed, 16 Dec 2020 17:10:58 GMT
Transfer-Encoding: chunked

[ ]

Filtering the "public" bucket in domain "jdoe.cloud.acme.com" for MP4 content (content-type=video/mp4) produces a count of 39 videos (Castor-Object-Count: 39).

How to Limit (Page) the Results

Limit the size of the search results when a portion of the search results is needed or the entire set of objects is too large to be displayed in full. Combining three search query arguments provides the control needed:

  • size - Controls the size of the result set, unrelated to object size (content-length). Set it to 0 when the actual listing is not needed.

  • marker - Used with size to paginate large result sets. Use an empty key to begin a new search, then use the last sort key value of the results on the next request to continue pagination.

  • sort - Sorts the results on one or more fields, in the order listed. Sorting defaults to ascending, so add descending (:desc) as needed. Sorting is computationally intensive, so sort output only when necessary.

$ curl -s -u jdoe "https://jdoe.cloud.acme.com/public/?format=json&domain=jdoe.cloud.acme.com&content-type=video/mp4&marker=&size=5&sort=etag:desc"
Enter host password for user 'jdoe':
[
  {
    "last_modified": "2018-09-04T17:14:44.848000Z",
    "bytes": 261671693,
    "name": "recording-a.mp4",
    "hash": "ff3ea60737fe1aec9b4a506a23c29fe9",
    "written": "2018-09-04T17:14:44.848000Z",
    "accessed": "2018-09-04T17:14:44.848000Z",
    "content_type": "video/mp4"
  },
  {
    "last_modified": "2017-07-31T15:37:45.580000Z",
    "bytes": 77337274,
    "name": "recording-b.mp4",
    "hash": "f2402263315cad55c0909f50f7154c13",
    "written": "2017-07-31T15:37:45.580000Z",
    "accessed": "2017-07-31T15:37:45.580000Z",
    "content_type": "video/mp4"
  },
  {
    "last_modified": "2017-06-14T18:32:28.592000Z",
    "bytes": 24926795,
    "name": "recording-c.mp4",
    "hash": "ed35d20e43af0a5a1757f000905ff653",
    "written": "2017-06-14T18:32:28.592000Z",
    "accessed": "2017-06-14T18:32:28.592000Z",
    "content_type": "video/mp4"
  },
  {
    "last_modified": "2019-07-19T15:50:53.444000Z",
    "bytes": 3810394,
    "name": "recording-d.mp4",
    "hash": "ec3c93febe2ff19e3c6a6561f8c25363",
    "written": "2019-07-19T15:50:53.444000Z",
    "accessed": "2019-07-19T15:50:53.444000Z",
    "content_type": "video/mp4"
  },
  {
    "last_modified": "2018-06-29T19:02:45.724000Z",
    "bytes": 55816215,
    "name": "recording-e.mp4",
    "hash": "e7e4a3d4cd8ee0df2894520d0624ceca",
    "written": "2018-06-29T19:02:45.724000Z",
    "accessed": "2018-06-29T19:02:45.724000Z",
    "content_type": "video/mp4"
  }
]
  • The -i option is omitted to skip the response headers, because the total count of filtered objects was already determined above.

  • "marker=", set to empty, starts at the beginning of the result set.

  • "size=5" returns the first 5 of our filtered objects.

  • "sort=etag:desc" sorts the objects in descending order by the "hash" value (ETag) associated with each object, which is covered in detail below.
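The filtered count (39) and the page size (size=5) together determine how many requests a full walk of the result set takes. A small worked example, assuming the count does not change while paging; the variable names are illustrative only:

```shell
count=39   # Castor-Object-Count from the filtered query
size=5     # page size used in the query above

# Integer ceiling division: round up so a final partial page is counted.
pages=$(( (count + size - 1) / size ))
echo "$pages"
# prints 8 (seven full pages of 5, then one page of 4)
```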

How to Pull the Next Result Set

Select a marker for the subsequent query to get the next five results in the set. Subsequent requests can be selected (marked) by a characteristic (metadata field) returned for the last object in the set. There are many to choose from:

The best practice is to use the "hash" field:

Field           Downsides of Use as a Marker

name            Effort: must URL-encode any special characters.
                Not guaranteed to be unique except inside a given bucket.

last_modified   Not guaranteed to be unique.
                Can introduce gaps in paging the result sets.
                Changeable in real time, during the query run itself.

hash            None

The "hash" is the object's ETag (entity tag), which is guaranteed to be unique across the entire cluster. It supports queries spanning multiple buckets and domains.

How to Use the Hash as Marker

It takes two steps to page through result sets using the hash value as the marker:

  1. Parse the hash value out of the output for the last object in the previous set.

  2. Set the marker argument to be the hash string.
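The first step can be sketched with standard text tools. This is an illustration only: last_hash is a hypothetical helper, and the text match assumes the listing format shown in this walkthrough. Where a JSON parser is available, it is the more robust choice.

```shell
# Print the "hash" value of the last object in a JSON listing read from stdin.
# Prints nothing when the listing holds no objects.
last_hash() {
  grep -o '"hash": *"[0-9a-f]*"' | tail -n 1 | cut -d'"' -f4
}

# Two-object sample abbreviated from the first result set.
printf '%s\n' \
  '[ { "name": "recording-d.mp4", "hash": "ec3c93febe2ff19e3c6a6561f8c25363" },' \
  '  { "name": "recording-e.mp4", "hash": "e7e4a3d4cd8ee0df2894520d0624ceca" } ]' | last_hash
# prints e7e4a3d4cd8ee0df2894520d0624ceca
```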

The hash value listed for the last object in the first result set (recording-e.mp4) is "e7e4a3d4cd8ee0df2894520d0624ceca", so start the next search after that object by setting marker=e7e4a3d4cd8ee0df2894520d0624ceca and leaving the other query arguments unchanged.
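A sketch of that follow-up request, assuming the query is identical to the first page apart from the marker. The helper name next_page_url is hypothetical; it only builds the URL, so the snippet runs without a cluster.

```shell
# Build the listing URL for the page that follows the given marker.
# All other query arguments are copied from the first request.
next_page_url() {
  printf 'https://jdoe.cloud.acme.com/public/?format=json&domain=jdoe.cloud.acme.com&content-type=video/mp4&marker=%s&size=5&sort=etag:desc\n' "$1"
}

next_page_url e7e4a3d4cd8ee0df2894520d0624ceca

# A live request would pass the URL to curl, for example:
#   curl -s -u jdoe "$(next_page_url e7e4a3d4cd8ee0df2894520d0624ceca)"
```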

This returns the next set of 5 objects in descending ETag value ordering (sort=etag:desc).

For the next set, parse out the hash for the last object listed (b6e556acd26d43f052490afd0fe42e4f) and continue until all matching objects have been walked through.
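The whole walk can be sketched as a loop. This is a minimal illustration under two stated assumptions: a request past the final object returns an empty list, and fetch_page is a hypothetical function that takes a marker and prints one page of JSON (for example, by wrapping the curl command). Here fetch_page is a stand-in with fake short hashes so the sketch runs offline.

```shell
# Stand-in for a real request: three fake pages keyed by marker.
fetch_page() {
  case "$1" in
    '') echo '[ { "hash": "ff" }, { "hash": "ee" } ]' ;;
    ee) echo '[ { "hash": "dd" } ]' ;;
    *)  echo '[ ]' ;;
  esac
}

# Print every non-empty page, feeding each page's last hash back as the marker.
walk_pages() {
  marker=''
  while :; do
    page=$(fetch_page "$marker")
    # Last "hash" on this page; empty when the page holds no objects.
    last=$(printf '%s\n' "$page" | grep -o '"hash": *"[0-9a-f]*"' | tail -n 1 | cut -d'"' -f4)
    [ -n "$last" ] || break
    printf '%s\n' "$page"
    marker=$last
  done
}

walk_pages
```

With the stand-in above, the loop prints the two non-empty pages and stops at the empty third one. A real fetch_page would also need to handle request failures, which this sketch leaves out.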

Important

The "sort" argument is computationally intensive. Watch the load on the Elasticsearch cluster to gauge the performance impact when running queries like this.

© DataCore Software Corporation. · https://www.datacore.com · All rights reserved.