Managing and Optimizing Feeds

Configuring Target Clusters

Uneven Filling: Use one of these configuration strategies if there are concerns about uneven filling of the target (DR) cluster of the replication feed:

  • Run DR clusters in full performance mode by disabling Power-Saving Mode: power.savingMode = false (SNMP: powerSavingMode)

  • Lower the setting that limits the difference in capacity between volumes, which defaults to 20%: bidding.idleCost = 20 (SNMP: biddingIdleCost)

    • For pure DR clusters with no other traffic, set it to 5% to compensate for sleep cycles or feed definitions that favor particular nodes/volumes: bidding.idleCost = 5

    • For mixed-purpose clusters where remote replication causes uneven filling in the cluster, set it to 10%: bidding.idleCost = 10
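The choices above can be summarized in a small helper. This is only a sketch: the function names `dr_settings` and `render` and the role labels `pure_dr` and `mixed` are hypothetical and not part of Swarm. Only the setting names and values come from the guidance above.

```python
# Hypothetical helper: maps a cluster role to the settings suggested above.
# Only the setting names and values are taken from this page.
DEFAULT_IDLE_COST = 20  # documented default for bidding.idleCost (percent)


def dr_settings(role: str, full_performance: bool = False) -> dict:
    """Return suggested setting overrides for a target (DR) cluster."""
    idle_cost = {"pure_dr": 5, "mixed": 10}.get(role, DEFAULT_IDLE_COST)
    settings = {"bidding.idleCost": idle_cost}
    if full_performance:
        # Disabling Power-Saving Mode keeps the cluster in full performance mode.
        settings["power.savingMode"] = False
    return settings


def render(settings: dict) -> str:
    """Format settings as 'name = value' lines, as written on this page."""
    lines = []
    for name, value in settings.items():
        text = str(value).lower() if isinstance(value, bool) else str(value)
        lines.append(f"{name} = {text}")
    return "\n".join(lines)


print(render(dr_settings("pure_dr", full_performance=True)))
```

The page presents the two strategies as alternatives, so `full_performance` defaults to off and is applied only when requested.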

Optimizing Replication Rate

You may need to adjust the rate at which replication occurs for these situations:

| Need | Cause/Concern | Action |
| Speed up replication | Large volume of very small objects | Contact DataCore Support for specific settings adjustments. |
| Slow down replication | Low bandwidth and full cluster may trigger denial of service | In the networking routers/switches, enable the native rate-limiting features. |

How Swarm Parallelizes Replication

The replication feed feature in Swarm seeks a high degree of parallelism when replicating objects between your source and target clusters. On each node, each replication feed builds a batch of items to replicate. A batch holds up to 200 items, or fewer if it cannot be filled within 30 seconds. Once the batch is sufficiently full, it is sent to a node in the target cluster using a long-running GET request that waits for the target cluster to replicate the items in the batch. When the batch completes, the source cluster node can fill another batch. This mechanism creates a constant load of replication work for the target cluster. These GET/retrieve requests are relatively small and do not, in themselves, use much bandwidth.
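The batch-filling rule described above (send at 200 items, or after 30 seconds of filling) can be sketched as follows. This is an illustrative model only, not Swarm's implementation: the class and method names are hypothetical, the clock is passed in explicitly so the behavior is easy to follow, and the long-running GET that delivers the batch is not modelled.

```python
from dataclasses import dataclass, field

MAX_BATCH_ITEMS = 200      # upper bound on batch size, from the text above
MAX_FILL_SECONDS = 30.0    # a partial batch is sent after filling this long


@dataclass
class Batch:
    started_at: float
    items: list = field(default_factory=list)


class FeedBatcher:
    """Collects items and reports when a batch is ready to send."""

    def __init__(self):
        self.current = None

    def add(self, item, now: float):
        """Add one item; return the batch if the size limit was reached."""
        if self.current is None:
            self.current = Batch(started_at=now)
        self.current.items.append(item)
        if len(self.current.items) >= MAX_BATCH_ITEMS:
            return self._take()
        return None

    def poll(self, now: float):
        """Return a partial batch if it has been filling for too long."""
        if self.current and now - self.current.started_at >= MAX_FILL_SECONDS:
            return self._take()
        return None

    def _take(self):
        batch, self.current = self.current, None
        return batch


batcher = FeedBatcher()
full = [batcher.add(i, now=0.0) for i in range(200)][-1]
print(len(full.items))                    # size limit reached
batcher.add("late-item", now=1.0)
print(batcher.poll(now=10.0))             # still filling, nothing to send
print(len(batcher.poll(now=31.0).items))  # timed out with a partial batch
```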

Each node in the target cluster may be accepting work from multiple source cluster nodes from any number of source clusters and any number of feeds. Additionally, the source cluster may have a greater number of nodes than the target cluster. To prevent target cluster nodes from being overloaded, each node in the target cluster does two things. First, it delegates replication work to other nodes in the target cluster as a method of load balancing. Second, each node limits the number of replications that can be done simultaneously, regardless of source. Swarm's defaults assume a moderate client load.
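The second safeguard, a per-node cap on simultaneous replications, behaves much like a counting semaphore. The sketch below is a toy model under stated assumptions: the limit of 3 is a made-up value (the actual limit is not given here), `TargetNode` and `run_batch` are hypothetical names, and delegation to other nodes is not modelled.

```python
import asyncio

MAX_CONCURRENT = 3  # hypothetical limit; the real value is not stated here


class TargetNode:
    """Toy model of a target node that caps simultaneous replications."""

    def __init__(self, limit: int = MAX_CONCURRENT):
        self._slots = asyncio.Semaphore(limit)
        self.active = 0
        self.peak = 0

    async def replicate(self, item):
        async with self._slots:          # wait here if the node is at its limit
            self.active += 1
            self.peak = max(self.peak, self.active)
            await asyncio.sleep(0)       # stand-in for the actual transfer
            self.active -= 1
            return item


async def run_batch(items, limit: int = MAX_CONCURRENT):
    """Replicate all items on one node; return results and peak concurrency."""
    node = TargetNode(limit)
    done = await asyncio.gather(*(node.replicate(i) for i in items))
    return done, node.peak


done, peak = asyncio.run(run_batch(range(10)))
print(len(done), peak)
```

Regardless of how many sources submit work, the number of replications in flight on the node never exceeds the limit; excess work simply waits for a free slot.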

Precise throttling can be achieved with networking technologies, including QoS bandwidth limiting and bandwidth-limiting forward proxies out of the target cluster or reverse proxies into the source cluster.

Deleting Search Data

The Elasticsearch index (database) of search data remains on disk after you delete the feed; if you no longer need it, delete it manually.

To delete the search data, reference the search index, which has the same name as your cluster:

Delete search data
curl -X DELETE "http://{ip-elasticsearch}:9200/{cluster-name}"

Note

The Elasticsearch server manages additional indices related to the Swarm cluster: Swarm Storage and Gateway store historical metric information in rolling indices. Deleting search data does not affect historical data.
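For scripted cleanup, the same DELETE request can be built with Python's standard library. This is a minimal sketch: `build_delete_request` is a hypothetical helper, the host and cluster name are placeholders, and the call that actually sends the request is left commented out because it permanently removes the index.

```python
import urllib.request


def build_delete_request(es_host: str, cluster_name: str, port: int = 9200):
    """Build (but do not send) the DELETE request for the search index."""
    url = f"http://{es_host}:{port}/{cluster_name}"
    return urllib.request.Request(url, method="DELETE")


# Placeholder values; substitute your Elasticsearch address and cluster name.
request = build_delete_request("192.0.2.10", "cluster.example.com")
print(request.get_method(), request.full_url)

# Sending is left commented out because it permanently removes the index:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```

Because the URL names only the index that matches the cluster name, the rolling metrics indices mentioned in the note are not targeted.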

Feeds with Versioning

Note: Swarm Versioning is supported for both feed types. By default, feeds process each object twice, on creation and deletion. With versioning enabled, feeds push objects as frequently as needed to keep them current.

For an introduction, see https://perifery.atlassian.net/wiki/spaces/public/pages/2443812258 in Swarm Concepts.

For implementation, see https://perifery.atlassian.net/wiki/spaces/public/pages/2443812387 in the SCSP Reference.

Replication Feed

Because object versioning is based on domains and buckets, which are replicated between clusters, object versions are also replicated between clusters. The replication feed processes historical object versions as well as current object versions.

Required

To use versioning with replication feeds, upgrade both the source and target clusters to the same version of Swarm before enabling versioning in both clusters.

Replication feeds replicate all versions, current and historical, to the remote cluster and allow the remote cluster to decide whether to keep the versions and how to integrate them into its version chain linkages.

Troubleshooting

If an object is versioned before versioning is enabled on the bucket/domain in the target cluster, the older version may not be replicated to the target cluster. If this occurs, use the SCSP SEND command to re-transmit older versions manually.

Search Feed

The search indices represent the current version of every object in the cluster. When a formerly obsolete named or aliased object becomes the current version again, the version number is based on an update time provided in the object's metadata. This guarantees that when Swarm decides a replica is the new current version, that fact is eventually reflected in search. Because different replicas of the same object version may transition to "current" at different times, different replicas may update the Elasticsearch record for an alias or named object more than once. The latest such update wins. Swarm's update collision mechanisms prevent duplicate Elasticsearch updates and minimize redundant updates.
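The "latest update wins" rule can be illustrated with a small sketch. It is not Swarm's collision mechanism: the function `apply_update`, the record fields, and the numeric update times are all hypothetical, chosen only to show how a later update time takes precedence and how stale or duplicate updates are ignored.

```python
def apply_update(index: dict, name: str, version_id: str, update_time: float) -> bool:
    """Record version_id as current for name unless a newer update is stored."""
    existing = index.get(name)
    if existing is not None and existing["update_time"] >= update_time:
        return False  # stale or duplicate update; keep the stored record
    index[name] = {"version_id": version_id, "update_time": update_time}
    return True


search_index = {}
apply_update(search_index, "bucket/report.pdf", "v2", update_time=200.0)
apply_update(search_index, "bucket/report.pdf", "v1", update_time=100.0)  # arrives late
apply_update(search_index, "bucket/report.pdf", "v2", update_time=200.0)  # duplicate
print(search_index["bucket/report.pdf"]["version_id"])
```

Whatever order the replicas report in, the record ends up reflecting the update with the latest time.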

Effect on Search Indices

  • Two new indices represent all existing versions of alias and named objects, including delete markers, with no information about which is the current version.

  • When versioning is enabled or suspended, every newly created named or alias version results in a new versioned named or alias record.

  • When specific versions are deleted, either by SCSP or by the Health Processor (HP), the corresponding versioned record is removed.

  • The primary key of these records is the version ID (ETag), which is unique to each version.

  • Both alias and named object versions have a flag that indicates whether the version is a delete marker, and that information is captured in these indices.

  • The search index schema upgrade for versioning does not require reindexing of your data.
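The record behavior listed above can be modelled roughly as follows. This is a conceptual sketch rather than the actual Elasticsearch schema: `VersionRecord`, `VersionedIndex`, and their field names are hypothetical. One instance stands in for one of the two versioned indices (alias or named).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VersionRecord:
    version_id: str                 # ETag; primary key, unique per version
    name: str                       # object name or alias
    is_delete_marker: bool = False  # flag captured for every version


class VersionedIndex:
    """Toy model: one record per version, with no notion of 'current'."""

    def __init__(self):
        self._records = {}

    def on_version_created(self, record: VersionRecord) -> None:
        self._records[record.version_id] = record

    def on_version_deleted(self, version_id: str) -> None:
        self._records.pop(version_id, None)

    def versions_of(self, name: str) -> list:
        return [r for r in self._records.values() if r.name == name]


index = VersionedIndex()
index.on_version_created(VersionRecord("etag-1", "bucket/a.txt"))
index.on_version_created(VersionRecord("etag-2", "bucket/a.txt", is_delete_marker=True))
index.on_version_deleted("etag-1")
print([r.version_id for r in index.versions_of("bucket/a.txt")])
```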

Effect on Searches

  • Your existing search queries do not need to change.

  • You can add the versions query argument to obtain all existing versions. 

  • Swarm returns versions in the order of the version chain, starting with the current version and ending with the original version. (When simultaneous updates occur, Swarm saves all updates and determines which position each occupies in the chain.)

© DataCore Software Corporation. · https://www.datacore.com · All rights reserved.