Capacity Planning

The following is a high-level view of the factors to consider when determining the hardware capacity needed for a Swarm implementation.

Storage Capacity Factors

Expected Object Count and Average Object Size

  • Object count and object size are the primary drivers for capacity planning

  • Object count drives storage cluster memory requirements: more objects require more memory for the cluster's overlay index

  • Average object size multiplied by object count provides the logical storage footprint (the amount of content uploaded to the cluster), but it does not account for the space taken by replicas/segments from the protection scheme

  • Average object size is the key factor (along with cluster size) in choosing which protection scheme to use (replication vs. erasure coding)

Choice of Protection Scheme

  • The chosen protection scheme drives the memory requirements for the storage cluster

  • Erasure coding (EC) requires more memory than replication (which instead consumes more disk space)

  • Erasure coding increases CPU requirements (parity must be calculated for EC writes)

  • Required volume footprint is derived from the combination of (object count) x (average object size) x (protection scheme overhead), as sketched in the example after this list

    • Replication example: (1 million objects) x (1 megabyte/object) x (2 replicas) = 2 TB

    • EC example: (1 million objects) x (1 megabyte/object) x (7/5 overhead for a 5:2 EC scheme) = 1.4 TB
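
To make the footprint arithmetic concrete, here is a minimal sketch (Python; the function names are illustrative, not part of any Swarm tooling) that reproduces the two examples above. It covers only raw content plus protection overhead and ignores metadata, manifests, and trapped space.

```python
def replication_footprint(object_count, avg_object_size_bytes, replicas):
    """Raw footprint for whole-object replication: each object is stored `replicas` times."""
    return object_count * avg_object_size_bytes * replicas

def ec_footprint(object_count, avg_object_size_bytes, k, p):
    """Raw footprint for k:p erasure coding: k data segments plus p parity
    segments give an overhead factor of (k + p) / k."""
    return object_count * avg_object_size_bytes * (k + p) / k

TB = 10**12
# Examples from the text: 1 million objects of 1 MB each.
print(replication_footprint(1_000_000, 10**6, replicas=2) / TB)  # 2.0 TB
print(ec_footprint(1_000_000, 10**6, k=5, p=2) / TB)             # 1.4 TB
```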

| RAM per Node | Storage Node RAM Index Slots | Immutable Objects | Mutable Objects | 5:2 Erasure Coded Objects |
|--------------|------------------------------|-------------------|-----------------|---------------------------|
| 16 GB        | 268M                         | 268M              | 134M            | 26M                       |
| 32 GB        | 536M                         | 536M              | 268M            | 53M                       |
| 64 GB        | 1073M                        | 1073M             | 536M            | 107M                      |
| 128 GB       | 2146M                        | 2146M             | 1073M           | 214M                      |

See Configuring Content Policies
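
As an illustration of how the table above might be used for planning, the following sketch (Python; the names and per-object slot costs are assumptions inferred from the table ratios: roughly 1 slot per immutable object, 2 per mutable object, and about 10 per 5:2 erasure-coded object) converts an expected object mix into index slots per node and compares it against each RAM tier.

```python
# Hypothetical sizing check based on the RAM/index-slot table above.
# Per-object slot costs are inferred from the table ratios and are approximations.
SLOTS_PER_IMMUTABLE = 1          # 268M slots -> 268M immutable objects
SLOTS_PER_MUTABLE = 2            # 268M slots -> 134M mutable objects
SLOTS_PER_EC_5_2 = 268 / 26      # ~10.3 slots per 5:2 erasure-coded object

RAM_TIERS = {  # node RAM (GB) -> index slots (millions), from the table
    16: 268, 32: 536, 64: 1073, 128: 2146,
}

def slots_needed_millions(immutable_m, mutable_m, ec_5_2_m):
    """Total index slots (in millions) for an expected object mix."""
    return (immutable_m * SLOTS_PER_IMMUTABLE
            + mutable_m * SLOTS_PER_MUTABLE
            + ec_5_2_m * SLOTS_PER_EC_5_2)

# Example mix: 100M immutable, 50M mutable, 20M 5:2 EC objects across 8 nodes.
per_node = slots_needed_millions(100, 50, 20) / 8
for ram_gb, slots in RAM_TIERS.items():
    verdict = "OK" if slots >= per_node else "too small"
    print(f"{ram_gb} GB node: {verdict} "
          f"({per_node:.0f}M slots needed per node, {slots}M available)")
```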

Need for High Availability

Knowing what failure scenarios can and cannot be tolerated helps with design optimization:

  • A requirement for high availability (HA) drives extra capacity needed to cover more catastrophic disk and server failures

  • Designs typically account for either multiple-volume or multiple-server failure scenarios

  • Availability requirements vary in complexity and feed back into the choice of protection scheme (see the sketch after this list)
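
As a rough way to compare schemes on this axis, the following sketch (Python, illustrative only) reports how many simultaneous volume or node losses each scheme can absorb, assuming replicas and EC segments are distributed across distinct volumes or nodes.

```python
def failures_tolerated(scheme, *, replicas=None, k=None, p=None):
    """Simultaneous volume/node losses a scheme can absorb without data loss,
    assuming replicas or segments land on distinct volumes/nodes."""
    if scheme == "replication":
        return replicas - 1          # lose all but one copy and still recover
    if scheme == "ec":
        return p                     # any p of the k+p segments can be lost
    raise ValueError(f"unknown scheme: {scheme}")

print(failures_tolerated("replication", replicas=2))  # 1
print(failures_tolerated("ec", k=5, p=2))             # 2
```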

Best Practice

Start expanding the cluster when its used capacity reaches 80%.

Memory for Overlay Index

  • A cluster may have other features enabled that require more resources to support them

    • Example: Overlay Index for large clusters (32+ nodes)

  • Always consider and account for the resource impact of a given feature/setting before enabling it in a cluster

Best Practice

Allow an additional 25% of cluster memory to support the Overlay Index.
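
For budgeting, the 25% allowance can be folded into the memory estimate, as in this hypothetical sketch (Python; the 25% factor and the 32-node threshold follow the guidance above, everything else is an assumption).

```python
def required_cluster_ram_gb(index_ram_gb, node_count):
    """RAM to provision: the RAM implied by the index-slot sizing,
    plus a 25% allowance when the Overlay Index applies (32+ nodes)."""
    factor = 1.25 if node_count >= 32 else 1.0
    return index_ram_gb * factor

print(required_cluster_ram_gb(2048, node_count=40))  # 2560.0 -> plan ~2.5 TB of RAM
print(required_cluster_ram_gb(512, node_count=8))    # 512.0  -> no Overlay Index allowance
```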

Elasticsearch (Search and List)

  • Provides the ability to search for and list objects based on metadata

  • Always assume a full index of object metadata (including custom metadata)

  • Memory: 64 GB RAM per 1 billion distinct objects

  • Disk: 1.5 TB required for 1 billion distinct objects

  • Networking: 1 Gb Ethernet minimum

  • Minimum of 3 to 4 servers for redundancy and performance (see the sizing sketch after this list)

  • Scale out as needed by adding more Elasticsearch servers
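
The figures above translate directly into a simple sizing estimate. The sketch below (Python; the helper name and the assumption of 64 GB of RAM per Elasticsearch server are illustrative) scales the per-billion-object RAM and disk guidance and applies the 3-server minimum.

```python
import math

def elasticsearch_sizing(distinct_objects, ram_per_es_node_gb=64):
    """Rough Elasticsearch sizing from the guidance above:
    64 GB RAM and 1.5 TB disk per 1 billion distinct objects, minimum 3 servers."""
    billions = distinct_objects / 1e9
    ram_gb = 64 * billions
    disk_tb = 1.5 * billions
    nodes = max(3, math.ceil(ram_gb / ram_per_es_node_gb))
    return {"ram_gb": ram_gb, "disk_tb": disk_tb, "nodes": nodes}

print(elasticsearch_sizing(2_000_000_000))
# {'ram_gb': 128.0, 'disk_tb': 3.0, 'nodes': 3}
```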

Gateway (Including S3)

  • Provides reverse proxy into storage with added protocol conversion support (S3), authentication, and authorization policy enforcement

  • Best treated with a “scale out” approach (think “web farm” behind a load balancer)

  • The underlying engine is Java (Jetty)

  • Tuned out of the box to account for large session counts based on field feedback

  • Memory/CPU/Disk requirements are light for a single Gateway server (4 GB RAM/multi-core x86-64/4 GB Disk)

  • Networking needs to align with the choice used for the Storage Cluster (use 10 Gb interfaces for the Gateway servers if the Storage Cluster is using 10 Gb interfaces)

SwarmFS

  • Provides a protocol gateway for NFS clients (NFS v4.1 to SCSP+)

  • The level of concurrent write requests drives the resource requirements

Best Practice

Split up the different NFS client workloads across multiple SwarmFS servers (“scale out”).

  • Memory/CPU/Disk requirements are higher than Gateway (recommended baseline of 16 GB RAM/multi-core x86-64/40 GB Disk)

  • As with Gateway, align networking choices with the Storage Cluster choice to guarantee throughput

FileFly

  • Provides a transparent tiering mechanism to move data from Windows or NetApp file servers into a Swarm storage cluster

  • Deployments can range from “single server” configurations to multi-server/high-availability architectures

  • Agent software has a small footprint (a minimal server requires 4 GB RAM, an x86-64 CPU, 2 GB of disk for logs, etc.)

  • Treat as a “scale out” solution to support multiple Windows/NetApp file servers (multiple migration agents, multiple fpolicy servers)

  • Verify the servers under FileFly source management are “close” to Swarm on the network (avoid routing)

  • Align network interface choice for FileFly components with those used in Storage Cluster for best throughput/latency characteristics

  • Capacity planning for the FileFly source servers becomes important when performing a large de-migration from Swarm

  • Verify this scenario is planned for when assigning storage shares from the source servers to clients

 
