Hardware Requirements for Elasticsearch

An Elasticsearch cluster supports Swarm searches. The Swarm feeds mechanism (see https://perifery.atlassian.net/wiki/spaces/public/pages/2443811146) populates the metadata search servers running the Elasticsearch (ES) software.

See https://perifery.atlassian.net/wiki/spaces/public/pages/2443809573

Info

Elasticsearch was previously used to store historical metrics, but that has moved to Prometheus starting with Swarm 14. Gateway https://perifery.atlassian.net/wiki/spaces/public/pages/2443817185 stores csmeter indices in Elasticsearch, but that does not impact Elasticsearch hardware requirements as much as a Swarm Search Feed.

This software requires one or more servers running RHEL/CentOS 7 or 8 Linux. Although Elasticsearch runs on other Linux platforms, DataCore currently provides support and testing for these versions. The Elasticsearch version provided with the Swarm distribution is supported.

See the Elasticsearch project website for more about Elasticsearch.

Do Not Install on Management Node

Both the Content Gateway and the production Elasticsearch cluster need to be on separate machines from the management node (SCS). The management node installs with Service Proxy and a single-node ES, which are dedicated to the Swarm UI.

Hardware Best Practices

Following are overall best practices, with hardware recommendations from Elasticsearch:

  • Provision the machines with at least 4 CPU cores and 64 GB of memory (quick verification commands are sketched after this list). When choosing between faster processors and more cores, choose more cores.

  • Use solid-state drives (SSD), preferably local, not SAN, and never NFS. This is critical for S3, especially for rapid small object writes and for the listing of buckets with millions of objects.

  • Let Support know if you are using hard disks that do not handle concurrent I/O as well as SSDs:

    • Select high-performance server disks.

    • Use RAID 0 with a writeback cache.

    • Set index.merge.scheduler.max_thread_count to 1 to prevent too many merges from running at once.

      curl -X PUT "<ES_SERVER>:9200/<SWARM_SEARCH_INDEX>/_settings" \
        -H 'Content-Type: application/json' \
        -d '{"index": {"merge.scheduler.max_thread_count": 1}}'
  • As with the storage cluster, choose similar, moderate configurations for balanced resource usage.
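
As a quick sanity check when provisioning these machines, the following sketch uses standard Linux utilities (device names vary by system) to report core count, installed memory, and whether each disk is rotational:

    # Core count and installed memory (guideline: at least 4 cores, 64 GB RAM)
    nproc
    free -g

    # ROTA=0 marks a non-rotational (SSD) device; ROTA=1 marks a spinning disk
    lsblk -d -o NAME,ROTA,SIZE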

RAM for Elasticsearch

RAM is key for Elasticsearch performance. Use these guidelines as a basis for capacity planning:

  • 64 GB of RAM per machine is optimal (recommended by Elasticsearch).

  • Dedicate half of the total RAM to the Java Virtual Machine (JVM) running Elasticsearch, but do not exceed 31 GB for best performance. Heap assignment is automatic with Elasticsearch 7.17 (see the sketch below).

  • Disable swapping of the Elasticsearch image.

For optimal performance, provision enough RAM on the ES servers to store all database shards in memory. Take steps to disable or mitigate swapping; memory page swapping on an ES server degrades Elasticsearch performance.
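
The following is a minimal sketch of the related node settings, assuming an RPM-based install with configuration under /etc/elasticsearch; with Elasticsearch 7.17 the heap is sized automatically, so set it explicitly only to override that behavior:

    # Optional explicit heap override; keep -Xms and -Xmx equal and no larger than 31 GB
    printf -- '-Xms31g\n-Xmx31g\n' | sudo tee /etc/elasticsearch/jvm.options.d/heap.options

    # Disable swapping now and comment out swap entries so it stays disabled across reboots
    sudo swapoff -a
    sudo sed -i '/\sswap\s/ s/^/#/' /etc/fstab

    # Alternatively, Elasticsearch can lock its memory: set bootstrap.memory_lock: true in elasticsearch.yml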

Important

Watch for sustained increases in page swapping and disk I/O when monitoring the ES servers. This may mean additional RAM is needed on an ES server or additional servers need to be deployed to offset the load.
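
For example, standard Linux tools can reveal both symptoms (a sketch; iostat requires the sysstat package):

    # Sustained non-zero si/so columns indicate page swapping
    vmstat 5

    # Per-device I/O load and utilization
    iostat -x 5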

Disk Usage for Search

The storage on the Elasticsearch servers is used to persist the shards of the Swarm Search. Follow these guidelines for capacity planning for the Swarm Search indices.

  • Baseline metadata to support listing: 150 GB per 200 million objects

  • Full metadata to support ad-hoc searching: 300 GB per 200 million objects

  • Custom metadata: allocate additional storage in proportion to the amount of custom metadata being indexed

These counts are unique objects, not replicas; the number of Swarm replicas an object has is irrelevant to the ES servers. There is only one metadata entry per object, no matter how many replicas of that object exist in the cluster.
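
As a worked example of these guidelines (the object count is hypothetical): a cluster expected to hold 1 billion unique objects with full metadata indexing needs roughly 1 billion ÷ 200 million × 300 GB = 1,500 GB of Elasticsearch storage, before any custom metadata.

    # Hypothetical sizing sketch: 1 billion unique objects, full-metadata guideline
    OBJECTS=1000000000
    GB_PER_200M=300
    echo "$(( OBJECTS / 200000000 * GB_PER_200M )) GB"   # prints: 1500 GB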

Optimizing Disk I/O for ES

Elasticsearch heavily utilizes disks, so higher throughput results in more stable nodes. Follow these Elasticsearch guidelines for optimizing disk I/O:

  • Use SSDs: SSDs are critical for performance. With SSDs, verify the OS I/O scheduler is configured correctly (see the sketch after this list).

  • Use RAID 0: Striped RAID increases disk I/O, at the expense of potential failure if a disk dies. Do not use mirrored or parity RAID, because replicas already provide that redundancy.

  • Do not use remote-mounted storage, such as NFS or SMB/CIFS: the latency negatively impacts performance.

  • Avoid virtualized storage, such as a SAN or EBS (Amazon Elastic Block Store). Even when SSD-backed, it is often slower than local instance storage and it conflicts with the purpose of replicas and sharding.
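
For the SSD scheduler check mentioned above, a minimal sketch follows (the device name sda is an example; available scheduler names depend on the kernel):

    # The bracketed entry is the active scheduler for this device
    cat /sys/block/sda/queue/scheduler

    # Select a minimal scheduler for SSDs ("noop" on older kernels, "none" on blk-mq kernels)
    echo none | sudo tee /sys/block/sda/queue/scheduler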

Optimizing Disaster Recovery for ES

Elasticsearch clustering is designed to mitigate the impact of hardware and power failures, so users do not experience long delays while the search data is rebuilt. How much to invest in and optimize hardware depends on how important metadata search and querying are to the organization and how long these features can be offline while Elasticsearch rebuilds data.

These are the principles for making a configuration more disaster-proof:

  • Do not rely on a single Elasticsearch server. This introduces vulnerabilities and potential disruption of search capabilities, and it risks having too little capacity to support all search needs.

  • For power failure protection, deploy enough Elasticsearch servers to survive multiple server failures and distribute them across different power sources.

  • If the cluster is divided into subclusters to match the power groups, set up Elasticsearch with multiple nodes spread equally among those subclusters (see the sketch after this list). This strategy improves the survivability of a power group failure.
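
One way to express power groups to Elasticsearch, sketched here as an assumption rather than a required Swarm step, is shard allocation awareness: tag each node with a custom attribute (the name power_group is an example) in its elasticsearch.yml, then enable awareness of that attribute so replica shards are spread across the groups:

    # On each node, in elasticsearch.yml:  node.attr.power_group: groupA   (groupB, ..., one value per power group)

    # Enable allocation awareness cluster-wide (a dynamic cluster setting)
    curl -X PUT "<ES_SERVER>:9200/_cluster/settings" \
      -H 'Content-Type: application/json' \
      -d '{"persistent": {"cluster.routing.allocation.awareness.attributes": "power_group"}}'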

© DataCore Software Corporation. · https://www.datacore.com · All rights reserved.