Time of Last Access - atime

Swarm can capture and persist the time of last access ("atime") on objects and add it to the search feed. This allows search queries to list objects that may be candidates for deletion or tiering (moving to cheaper storage). For example, an application could use atime values to purge "cold" objects not read in the last three years. Swarm stores the atime in the Castor-System-Accessed header and indexes it as the accessed field in Elasticsearch, which is useful for bulk evaluations of content.

Performance Impacts

Tracking atime affects performance, so enable it only if needed. Tracking access times can add long-tail latency to first reads, particularly when disk demand is heavy. For around 90% of objects read for the first time, the added latency is negligible (<1 ms); the effect becomes noticeable only when requests queue on specific volumes. Subsequent reads within the disk.atimeGranularity window have no performance impact.
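As a mental model of why later reads are cheap, the sketch below assumes that a read rewrites the stored atime only when the recorded value is at least disk.atimeGranularity seconds old. The function name should_update_atime and the exact comparison are hypothetical; this models the documented behavior and is not Swarm's internal code.

```python
from datetime import datetime, timedelta, timezone

DEFAULT_GRANULARITY_SECONDS = 86400  # disk.atimeGranularity default (1 day)


def should_update_atime(recorded_atime: datetime, read_time: datetime,
                        granularity_seconds: int = DEFAULT_GRANULARITY_SECONDS) -> bool:
    """Hypothetical model: return True if a read at read_time would rewrite atime.

    Assumes the window is measured from the currently recorded atime; reads
    inside the window leave the stored atime (and the disk) untouched.
    """
    return read_time - recorded_atime >= timedelta(seconds=granularity_seconds)


if __name__ == "__main__":
    recorded = datetime(2018, 10, 2, 23, 3, 56, tzinfo=timezone.utc)
    # A read 2 hours later falls inside the 1-day window: no atime write.
    print(should_update_atime(recorded, recorded + timedelta(hours=2)))  # False
    # A read 2 days later falls outside the window: atime is rewritten.
    print(should_update_atime(recorded, recorded + timedelta(days=2)))  # True
```

Under this model, lowering the granularity shrinks the window, so more reads fall outside it and trigger a disk write, which matches the GET penalty described for small granularity values.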

A high volume of reads of small objects (such as thumbnail images) can fill the in-memory indexes.

Implementing atime

Because support for atime involves changes to the underlying Elasticsearch schema, existing feeds cannot be restarted after a Swarm upgrade: the written and accessed fields are not populated or, for some records, have the incorrect type.

Tip

The atime feature requires rebuilding the search index, so take the opportunity to migrate to Elasticsearch 6 (https://perifery.atlassian.net/wiki/spaces/public/pages/2443809821) as part of the same reindexing.

  1. For intensive READ access scenarios, provision additional memory to support the load on the in-memory index.

  2. Finish upgrading the storage cluster to Swarm 10, and install the latest versions of the Swarm metrics and search RPMs in the Elasticsearch cluster.

  3. Enable the cluster setting for the feature, which is disabled by default: disk.atimeEnabled = true

  4. Create a new search feed, which uses the new Elasticsearch schema that supports atime.

  5. If a previous feed exists, complete these steps to transition to the new search feed:

    1. After the new feed completes processing, make the new feed the Primary.

    2. Pause the old feed.

    3. Delete the old feed and the old index data after verifying the new feed is working as expected.

Configuring atime

The public settings for atime are dynamic: update them on one node and Swarm propagates them to all other nodes, and the values persist across reboots. The following settings control the gathering of atime information:

disk.atimeEnabled (SNMP: accessedTimeEnabled)
Default: False
Type: bool
Description: Whether to track the time of last access on GET requests, stored in the Castor-System-Accessed header and indexed as the search field 'accessed'. Increases load in proportion to the GET load in the cluster.

disk.atimeGranularity (SNMP: accessedTimeGranularity)
Default: 86400
Type: int
Description: In seconds; defaults to 1 day. The window of time during which atime is not updated, so multiple reads may have occurred within that window.
Lowering the value affects GET performance. A 1-second granularity provides the most accurate accessed time results but incurs a GET performance penalty due to increased disk access.

disk.atimeEnabledTime (SNMP: accessedTimeEnabledTime)
Default: 0
Type: float
Description: Non-UI. Read-only. The Unix epoch timestamp recorded when disk.atimeEnabled was set to True.
This time is nulled out in SNMP, REST API, and phone home reports if the atime feature is later disabled.
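An application that relies on accessed values can use disk.atimeEnabledTime to decide whether a value was recorded during the current tracking period. A minimal sketch, assuming the application has already read disk.atimeEnabled and disk.atimeEnabledTime from the cluster by some means not shown here; the helper name accessed_is_current is hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional


def accessed_is_current(accessed: datetime, atime_enabled: bool,
                        atime_enabled_time: Optional[float]) -> bool:
    """Hypothetical helper: True if `accessed` was recorded at or after the
    time atime tracking was last enabled (disk.atimeEnabledTime).

    atime_enabled_time is a Unix epoch timestamp; 0 or None means not recorded.
    """
    if not atime_enabled or not atime_enabled_time:
        # Tracking is off, or the enable time was nulled: any value may be stale.
        return False
    enabled_at = datetime.fromtimestamp(atime_enabled_time, tz=timezone.utc)
    # Values older than the last enable time were recorded before the current
    # tracking period, so treat them as stale rather than as a last-read time.
    return accessed >= enabled_at
```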

Using atime with SCSP

When atime tracking is enabled for the cluster, Swarm records the request time of each object's last write or read (successful GET request) and sends that time to Elasticsearch as the accessed date field, for use in search queries. HEAD operations do not change an object's atime. To access atime without Elasticsearch, check the SCSP headers that Swarm adds to the objects.

With atime enabled, both SCSP HEAD and GET requests include a Castor-System-Accessed header in the response when the verbose query argument is used. The Castor-System-Accessed response header holds either the value of Castor-System-Created (if the object has not been read since it was written or since the feature was enabled) or the time of the last read, in the same GMT-based time format as Castor-System-Created. Because atime is updated at most once per granularity window (1 day by default), additional reads may have occurred within that window.

Exceptions - GET requests trigger atime updates, except for these situations:

  • Administrative and authorized admin requests

  • Swarm requests for replication and other internal GET requests, such as for domains, settings, or manifests

  • Any request with the special query argument to suppress recording atime: notaccessed

  • Any request performing an integrity check or other specialized operation

Tip

The atime information is most useful on a HEAD request since the atime is returned without changing it. Although atime is returned on a GET request, it is simultaneously updated by the operation.

To determine if an object has been read, HEAD the object using the verbose query argument.

The Castor-System-Accessed value matches Castor-System-Created if no read has occurred:

> curl -I "http://192.168.1.12:80/5647f528ea85667a44dc754f975816c6?verbose"
HTTP/1.1 200 OK
Castor-System-Alias: 5647f528ea85667a44dc754f975816c6
Castor-System-Cluster: Baker
Castor-System-Created: Wed, 19 Jul 2017 17:42:48 GMT
Castor-System-Accessed: Wed, 19 Jul 2017 17:42:48 GMT
...

The Castor-System-Accessed value is more recent than Castor-System-Created if a read has occurred:

> curl -I "http://192.168.1.12:80/5647f528ea85667a44dc754f975816c6?verbose"
HTTP/1.1 200 OK
Castor-System-Alias: 5647f528ea85667a44dc754f975816c6
Castor-System-Cluster: Baker
Castor-System-Created: Wed, 19 Jul 2017 17:42:48 GMT
Castor-System-Accessed: Tue, 02 Oct 2018 23:03:56 GMT
...
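The same check can be scripted. This sketch uses only the Python standard library: it builds a verbose HEAD request (the equivalent of curl -I) and compares the two headers as dates rather than as strings. The helper names build_verbose_head and has_been_read are hypothetical; the host and object name are the ones from the curl examples.

```python
import urllib.request
from email.utils import parsedate_to_datetime


def build_verbose_head(base_url: str, name: str) -> urllib.request.Request:
    """Build, but do not send, a HEAD request with the verbose query argument."""
    return urllib.request.Request(f"{base_url.rstrip('/')}/{name}?verbose", method="HEAD")


def has_been_read(headers) -> bool:
    """Return True if Castor-System-Accessed is later than Castor-System-Created.

    `headers` can be a plain dict or the headers of a urllib response; both
    support lookup by header name.
    """
    # Both headers use the HTTP date format, e.g. "Wed, 19 Jul 2017 17:42:48 GMT".
    created = parsedate_to_datetime(headers["Castor-System-Created"])
    accessed = parsedate_to_datetime(headers["Castor-System-Accessed"])
    # Equal values mean no qualifying read since the object was written
    # (or since atime tracking was enabled).
    return accessed > created


# Usage (requires a reachable Swarm node):
# req = build_verbose_head("http://192.168.1.12:80", "5647f528ea85667a44dc754f975816c6")
# with urllib.request.urlopen(req) as resp:
#     print(has_been_read(resp.headers))
```

Because the probe is a HEAD request, it reads the atime without bumping it.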

Using atime with Elasticsearch

In Elasticsearch, the atime value is indexed as the accessed date field, which can be used in Swarm search queries (see https://perifery.atlassian.net/wiki/spaces/public/pages/2443821817). Both the written and accessed fields are populated in the Elasticsearch record:

accessed
Type: date (written and listed as ISO 8601)
Description: The date of last access; appears in listing results if requested. The value does not reflect lifepoint conversion or segment consolidation that may have occurred.
Matches the value of written until the first GET operation occurs, after which it updates on each qualifying GET.

written
Type: date (written and listed as ISO 8601)
Description: Does not change for a particular object version (ETag).
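For bulk evaluation, such as finding purge or tiering candidates not read in three years, a range query on accessed is enough: accessed equals written until the first qualifying GET, so never-read objects written before the cutoff also match. A sketch of the Elasticsearch query body, assuming direct read access to the Swarm search index with the standard Elasticsearch query DSL (the index name and endpoint are not covered here); cold_objects_query is a hypothetical helper. Before acting on the results, confirm that tracking has been enabled long enough (disk.atimeEnabledTime).

```python
import json


def cold_objects_query(years: int = 3, size: int = 100) -> dict:
    """Build an Elasticsearch query body for objects whose 'accessed' date is
    older than `years` years (hypothetical helper)."""
    return {
        "size": size,
        # Elasticsearch date math: "now-3y" means three years before query time.
        "query": {"range": {"accessed": {"lt": f"now-{years}y"}}},
        # Oldest access first, so the coldest objects are listed first.
        "sort": [{"accessed": {"order": "asc"}}],
    }


if __name__ == "__main__":
    # Send this body to the search index's _search endpoint (index name assumed).
    print(json.dumps(cold_objects_query(), indent=2))
```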

Admin GET requests do not bump the atime value. To suppress the atime update on other SCSP GET requests, add the notaccessed query argument. This argument allows listing objects for management purposes without erroneously bumping the accessed date as if an end user or program had requested the object.

notaccessed
Value: "yes" or "true"
Description: Allows a GET request to complete without updating the object's accessed time when the atime feature is enabled.
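A management tool can add the argument mechanically to any object URL. A minimal sketch using the standard library; with_notaccessed is a hypothetical helper, and it appends the argument without re-encoding existing ones, so a bare flag such as verbose is preserved.

```python
from urllib.parse import urlsplit, urlunsplit


def with_notaccessed(url: str, value: str = "yes") -> str:
    """Append notaccessed=<value> to url, keeping any existing query arguments as-is."""
    parts = urlsplit(url)
    extra = f"notaccessed={value}"
    # Join with "&" only when a query string is already present.
    query = f"{parts.query}&{extra}" if parts.query else extra
    return urlunsplit(parts._replace(query=query))
```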

Using atime with Content UI

In the Content UI (v11.0), Last Accessed, indexed in Elasticsearch as 'accessed', can be shown as a column and used as a search criterion in the object filters. Filtering on the time of last access supports standard and custom time spans relative to the present, as well as Before and Since ranges.

Once in a domain or bucket, click the Search button to add filtering criteria. Use + Add Search Criteria to add Last Accessed, and use + Add Column Header to add Last Accessed as well if the access date needs to appear in the results.

See https://perifery.atlassian.net/wiki/spaces/public/pages/2443820093.

Limitations and Troubleshooting

  • The reported value may be stale if the feature was enabled and disabled repeatedly. To determine staleness, compare the value against disk.atimeEnabledTime, the time when the atime feature was last enabled.

  • The reported atime, if present, is only as precise as the configured granularity (1 day by default); therefore, it is not necessarily the precise time (hours and minutes) of last access.

  • Replication feeds do not get re-processed for atime changes.

  • If a volume is lost, Swarm can lose a recent read atime for objects on that volume.

  • Stale atimes remain in the Elasticsearch records if the atime feature is disabled after having been enabled. Applications using this field need to check the state of the feature (whether disk.atimeEnabled = True) and when the feature was last enabled (disk.atimeEnabledTime).

  • For monitoring, Swarm provides a per-node count of objects that were accessed but whose atime was not propagated to Elasticsearch.

© DataCore Software Corporation. · https://www.datacore.com · All rights reserved.