An Elasticsearch cluster supports Swarm searches. The Swarm feeds mechanism (see Understanding Feeds) populates the metadata search servers running the Elasticsearch (ES) software.

See Elasticsearch Implementation

Info

Elasticsearch was previously used to store historical metrics, but that has moved to Prometheus starting with Swarm 14. Gateway Content Metering stores csmeter indices in Elasticsearch, but that does not impact Elasticsearch hardware requirements as much as a Swarm Search Feed.

This software requires one or more servers running RHEL/CentOS 7 or 8 Linux. Although Elasticsearch runs on other Linux platforms, DataCore currently provides support and testing only for these distributions. Only the Elasticsearch version provided with the Swarm distribution is supported.

See the Elasticsearch project website for more about Elasticsearch.

Do Not Install on Management Node

Both the Content Gateway and the production Elasticsearch cluster need to be on separate machines from the management node (SCS). The management node installs with Service Proxy and a single-node ES, which are dedicated to the Swarm UI.

Hardware Best Practices

Following are overall best practices, with hardware recommendations from Elasticsearch:

RAM for Elasticsearch

RAM is key for Elasticsearch performance. Use these guidelines as a basis for capacity planning:

For optimal performance, provision enough RAM in the ES servers to hold all database shards in memory. Take steps to disable or mitigate swapping; memory page swapping on an ES server degrades Elasticsearch performance.
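A minimal sketch of checking and mitigating swapping on a Linux ES server (the sysctl file name and the persistent settings shown in comments are assumptions; adapt them to the environment):

```shell
# Current swappiness (0-100; lower means the kernel swaps less eagerly).
cat /proc/sys/vm/swappiness
# To persist a minimal setting (as root), something like:
#   echo 'vm.swappiness = 1' > /etc/sysctl.d/99-elasticsearch.conf && sysctl --system
# In elasticsearch.yml, lock the JVM heap so it cannot be paged out:
#   bootstrap.memory_lock: true
```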

Important

Watch for sustained increases in page swapping and disk I/O when monitoring the ES servers. Either can mean additional RAM is needed on an ES server, or that additional servers need to be deployed to offset the load.
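As a sketch, these portable /proc checks show whether a node is actually swapping; sample the counters twice and compare to detect sustained activity:

```shell
# Swap in use = SwapTotal - SwapFree.
grep -E '^(SwapTotal|SwapFree)' /proc/meminfo
# Cumulative pages swapped in/out since boot; rising values between
# samples indicate active swapping.
grep -E '^(pswpin|pswpout)' /proc/vmstat
```

Tools such as vmstat, sar, and iostat (from the sysstat package) report the same counters with per-interval sampling.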

Disk Usage for Search

The storage on the Elasticsearch servers persists the shards of the Swarm Search indices. Follow these guidelines when planning disk capacity for those indices.

These are unique objects, not replicas: the number of Swarm replicas an object has is irrelevant to the ES servers. No matter how many replicas of an object exist in the storage cluster, the object has only one metadata entry.

Tip

  • Do not confuse this with the RAM-based Overlay Index each storage node maintains, which depends on the total number of replicas in the cluster.

  • Overprovision disk space to leave room for a reindex, that is, adding another search feed (see Add Search Feed) that uses the same Elasticsearch cluster.
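As a rough illustration of the sizing arithmetic, the sketch below doubles the raw index estimate to leave room for a concurrent reindex. The object count and per-entry figure are assumptions for the example only, not Swarm specifications; measure real index sizes before planning.

```shell
# Back-of-the-envelope disk sizing for a Swarm search index.
# Assumed example figures: 200 million unique objects, ~1536 bytes of
# index space per metadata entry (illustrative only).
OBJECTS=200000000
BYTES_PER_ENTRY=1536
# Double the raw figure so a concurrent reindex has room to run.
NEEDED_GIB=$(( OBJECTS * BYTES_PER_ENTRY * 2 / 1024 / 1024 / 1024 ))
echo "Provision at least ${NEEDED_GIB} GiB of index storage"
```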

Optimizing Disk I/O for ES

Elasticsearch makes heavy use of disks, so the more throughput the disks can sustain, the more stable the nodes are. Follow these Elasticsearch guidelines for optimizing disk I/O:
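One quick way to see which devices are busiest is the kernel's cumulative I/O counters; a sketch (sample twice and compare to get rates):

```shell
# Per-device cumulative I/O counters from /proc/diskstats.
# Column 3 is the device name, 4 is reads completed, 8 is writes completed.
awk '{printf "%-10s reads=%s writes=%s\n", $3, $4, $8}' /proc/diskstats
```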

Optimizing Disaster Recovery for ES

Elasticsearch clustering is designed to mitigate the impact of hardware and power failures so that users do not experience long delays while the search data is rebuilt. How much to invest in and optimize hardware depends on how important metadata search and querying are to the organization, and on how long these features can be offline while Elasticsearch rebuilds data.

These are the principles for making a configuration more disaster-proof:
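For example, one common principle is keeping at least one Elasticsearch-level replica of each shard so that losing a single node does not take the index offline. In Elasticsearch this is controlled by the index number_of_replicas setting; a sketch of a settings-API request body (the target index name is environment-specific):

```json
{ "index": { "number_of_replicas": 1 } }
```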