Application and Configuration Guidance

Follow the guidance below when developing Swarm applications or configuring Swarm clusters:

Changing Drive Controllers

Administrators must not move Swarm storage drives between drive array controller types after Swarm has formatted the drives. Each controller type reports the available drive space to Swarm in its own way, and many controllers claim the last section of the drive for their own use, which reduces the total available space. If a drive is moved to a different controller, the new controller may claim additional drive space that was never reported to Swarm. Swarm may then attempt to write data to drive space that no longer exists, generating I/O errors.

Indexer Query Arguments

The indexer search syntax allows repeated constraints on a field name in the HTTP query string. If queries do not return the expected results, verify that the HTTP client library passes every instance of the repeated name and does not consolidate the repeats into a single name/value pair.
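A common cause of consolidated repeats is building the query string from a dictionary, which can hold only one value per key. The following minimal sketch uses only the Python standard library to show the difference. The field name `color` and its values are hypothetical placeholders, not actual Swarm indexer syntax.

```python
from urllib.parse import urlencode

# Hypothetical repeated constraint on one field name (not real indexer syntax).
pairs = [("color", "red"), ("color", "blue")]

# A list of (name, value) pairs keeps every repeat, in order.
preserved = urlencode(pairs)

# A dict keeps only the last value per key, so one constraint is silently lost.
collapsed = urlencode(dict(pairs))

# A dict of lists expands back into repeated names when doseq=True.
expanded = urlencode({"color": ["red", "blue"]}, doseq=True)

print(preserved)  # color=red&color=blue
print(collapsed)  # color=blue
print(expanded)   # color=red&color=blue
```

Passing a list of pairs (or a dict of lists with `doseq=True`) is the safer habit whenever a name may repeat.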

SNMP Behavior with snmpwalk and snmpgetnext

To be consistent with standard SNMP behavior, Swarm's SNMP agent includes the following changes:

  • All scalar object identifiers (OIDs) end with ‘.0’.

  • For a table column OID ‘x’, the value for row ‘r’ is returned as ‘x.r’ by snmpwalk or snmpgetnext.

Custom applications that use snmpwalk or snmpgetnext may need to be changed.
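For a custom application, the adjustment usually amounts to stripping the ‘.0’ instance suffix from scalar OIDs and splitting the row index off table OIDs. The helpers below are a minimal, hypothetical sketch of that parsing; the enterprise number 99999 in the examples is a placeholder, not the real Swarm MIB OID.

```python
def strip_scalar_suffix(oid: str) -> str:
    """Return a scalar OID without its trailing '.0' instance suffix."""
    return oid[:-2] if oid.endswith(".0") else oid


def split_table_oid(returned_oid: str, column_oid: str) -> str:
    """Return the row index 'r' from a returned OID of the form 'x.r'.

    column_oid is the column OID 'x'. A ValueError means the returned OID lies
    outside that column, for example once a getnext walk has moved past it.
    """
    prefix = column_oid.rstrip(".") + "."
    if not returned_oid.startswith(prefix):
        raise ValueError(f"{returned_oid} is not under column {column_oid}")
    return returned_oid[len(prefix):]


# Placeholder OIDs for illustration only.
print(strip_scalar_suffix("1.3.6.1.4.1.99999.1.1.0"))                      # 1.3.6.1.4.1.99999.1.1
print(split_table_oid("1.3.6.1.4.1.99999.2.1.3.7", "1.3.6.1.4.1.99999.2.1.3"))  # 7
```

Including the trailing dot in the prefix keeps a column such as `...2.1.30` from being mistaken for rows of column `...2.1.3`.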

Known Issues with Windows 200x Server Time Synchronization

DataCore strongly recommends configuring a cluster to use Network Time Protocol (NTP), as documented for the timeSource setting. Windows 200x servers cannot be used as the NTP time source, because they are not reliable enough to provide highly accurate time synchronization, as discussed in Microsoft KB article 939322. Consider the following alternatives to the time synchronization available in Windows servers:

  • Use NTP servers available on the internet, such as the servers discussed on the NTP Pool Project page. Open UDP port 123 in the firewall so that the cluster can reach the external NTP servers.

  • Use an open-source NTP package such as the Windows-based NTP Time Server Monitor.

  • Use a commercial Windows NTP package or deploy a dedicated NTP hardware solution in the network.
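Whichever alternative is chosen, a quick sanity check of a candidate server's offset can help before pointing the cluster at it. The following is a minimal SNTP query sketch in Python (in the spirit of RFC 4330); the function names are hypothetical, UDP port 123 must be reachable, and the result is only a rough check, not a replacement for a proper NTP monitoring tool.

```python
import socket
import struct
import time

# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_EPOCH_OFFSET = 2208988800


def build_request() -> bytes:
    """48-byte SNTP client request: leap indicator 0, version 4, mode 3 (client)."""
    return bytes([(0 << 6) | (4 << 3) | 3]) + bytes(47)


def transmit_time(packet: bytes) -> float:
    """Server transmit timestamp (bytes 40-47) converted to Unix seconds."""
    seconds, fraction = struct.unpack("!II", packet[40:48])
    return seconds - NTP_EPOCH_OFFSET + fraction / 2**32


def rough_offset(server: str, timeout: float = 2.0) -> float:
    """Approximate offset of server time from the local clock, in seconds.

    Network delay is ignored, so the result is only accurate to within
    roughly the round-trip time to the server.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_request(), (server, 123))
        packet, _ = sock.recvfrom(48)
    return transmit_time(packet) - time.time()
```

For example, `rough_offset("pool.ntp.org")` should return a value close to zero when the local clock is already well synchronized.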

Virtual Deployments

Administrators who want to run Swarm in a virtual environment such as VMware must contact a Support representative for restrictions and guidelines before deploying Swarm in a VM.

Duplicate Domain and Bucket Creation in Mirrored Clusters

In a mirrored cluster configuration referred to as active-active, do not create the same domain, or the same bucket in the same domain, in each cluster. Instead, create the domain or bucket on one cluster and wait for it to be replicated to the other cluster. Otherwise, the domain or bucket with the latest creation date takes precedence, and the objects contained in the other domain or bucket become inaccessible.
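Waiting for replication can be automated with a simple poll against the second cluster. The sketch below is hypothetical: it assumes that a successful HEAD request on a URL addressing the domain or bucket means the replica has arrived, and the URL in the usage line is a placeholder that must be formed according to your cluster's addressing scheme.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def head_exists(url: str, timeout: float = 10.0) -> bool:
    """Return True if a HEAD request to url succeeds with a 2xx status."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except urllib.error.HTTPError:
        return False  # e.g. 404 while the domain or bucket has not replicated yet


def wait_until(check: Callable[[], bool], timeout: float, interval: float = 5.0) -> bool:
    """Poll check() until it returns True or timeout seconds have elapsed."""
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Usage (placeholder URL): `wait_until(lambda: head_exists("http://cluster-b.example.com/mybucket?domain=example.com"), timeout=600)`. Only after this returns True would the client proceed to use the domain or bucket on the second cluster.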

Use Curl 7.20.1 or Later

When using curl with Swarm together with the authorization feature, use curl 7.20.1 or later. Earlier curl versions have known issues.
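To confirm that an installed curl meets the 7.20.1 minimum, the version can be compared numerically rather than as a string (as strings, "7.9.8" would sort after "7.20.1"). This is a minimal sketch; the helper names are hypothetical, and it assumes the first line of `curl --version` starts with `curl X.Y.Z`.

```python
import re
import subprocess

MINIMUM_CURL = (7, 20, 1)


def parse_curl_version(output: str) -> tuple:
    """Extract (major, minor, patch) from the first line of `curl --version`."""
    match = re.match(r"curl (\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError("unrecognized curl --version output")
    return tuple(int(part) for part in match.groups())


def installed_curl_is_supported() -> bool:
    """Run the local curl binary and compare its version to the minimum."""
    result = subprocess.run(["curl", "--version"], capture_output=True, text=True, check=True)
    # Tuples compare element by element, so (7, 9, 8) < (7, 20, 1).
    return parse_curl_version(result.stdout) >= MINIMUM_CURL
```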

Consumer-Grade Drives

Some non-enterprise-class drives have lengthy error recovery logic. When an error occurs on these drives, a read or write operation can take minutes to complete. In these cases, the client can experience very long response times, or a socket timeout if the delay is long enough. Many enterprise or server-grade drives are designed to return errors within a limited time, which allows recovery or rebuild operations to begin immediately and eliminates lengthy delays on I/O operations.

Note: Swarm does not support consumer-grade drives in high-demand environments.

Avoid Client Timeouts with Large Objects

Client operations with large objects (1 GB or greater) can take several seconds or more, depending on object size. Clients that support large-object operations should set their socket timeouts long enough to avoid client timeout errors.
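One way to set the timeout "accordingly" is to scale it with the object size. The sketch below uses Python's `http.client`; the 60-second base and the 10 MiB/s pessimistic rate are hypothetical placeholders to tune for your network and cluster, not Swarm recommendations. Note that the `http.client` timeout applies to each individual blocking socket operation, not to the request as a whole.

```python
import http.client

MIB = 1024 * 1024


def timeout_for(size_bytes: int, base_seconds: float = 60.0,
                min_rate_bytes_per_s: float = 10 * MIB) -> float:
    """Hypothetical rule of thumb: a base allowance plus time at a pessimistic rate."""
    return base_seconds + size_bytes / min_rate_bytes_per_s


def open_connection(host: str, port: int, size_bytes: int) -> http.client.HTTPConnection:
    """Create (but do not yet open) a connection whose timeout scales with object size."""
    return http.client.HTTPConnection(host, port, timeout=timeout_for(size_bytes))
```

With these placeholder values, a 1 GiB object gets a timeout of about 162 seconds.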

Time Clock Synchronization for Client Servers

When formatting storage policies in lifepoint headers, the local clock on the machine that creates the lifepoint must be accurate so that the lifepoint end dates reflect the true UTC time. The Swarm cluster can synchronize itself to an accurate time source, but that does not correct the clocks of client machines. If a client specifies the wrong end date in an object's storage policy, for example because its local clock is set incorrectly, there can be unintended consequences, including premature deletion, when Swarm enforces the policy.
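The sketch below shows how a client might compute a lifepoint end date explicitly in UTC and format it as an HTTP date. The function name is hypothetical, and the header shape in the usage note is an assumption modeled on the bracketed-date lifepoint format; check the Swarm lifepoint documentation for the exact syntax. The computation still depends on the local system clock, which is exactly why that clock must be accurate.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime
from typing import Optional


def lifepoint_end_date(days_from_now: int, now: Optional[datetime] = None) -> str:
    """Format an end date in UTC as an RFC 1123 style HTTP date string.

    When `now` is omitted, the local system clock is used, so an incorrect
    clock produces an incorrect end date.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    end = (now + timedelta(days=days_from_now)).astimezone(timezone.utc)
    return format_datetime(end, usegmt=True)
```

Usage (assumed header shape): `headers = {"Lifepoint": f"[{lifepoint_end_date(30)}] reps=2"}`.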

Replica Terminology

The term replica has special meaning in the context of Swarm. All instances of an object stored in a Swarm cluster are identical; there is no original. Therefore, saying there are two replicas of an object means there are exactly two identical instances (not an original plus two copies).

Not Found Errors in a Busy Cluster

A Read, Info, Update, or Delete request to a heavily loaded cluster may, in rare cases, return a 404 Not Found response even though the requested object is present in the cluster. If the client knows that the object is stored, retry the request until it succeeds. A single retry is usually sufficient.
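A bounded retry keeps this from turning into an endless loop if the object really is missing. In this minimal sketch, `request` is a hypothetical callable that issues one Read, Info, Update, or Delete and returns a `(status, body)` pair; the attempt count and delay are placeholder values.

```python
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def retry_on_not_found(request: Callable[[], Tuple[int, T]],
                       attempts: int = 3, delay: float = 0.5) -> Tuple[int, T]:
    """Re-issue request() while it returns 404, up to `attempts` tries in total.

    Use this only for objects the client already knows are stored; otherwise
    a 404 is a genuine answer and retrying just adds latency.
    """
    status, body = request()
    for _ in range(attempts - 1):
        if status != 404:
            break
        time.sleep(delay)
        status, body = request()
    return status, body
```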

Network Interface Required

Every Swarm node requires a working network connection. If a network cable is unplugged, or if the network is not operational at any time during or after startup, neither SNMP nor SCSP is available. This condition is indicated on the attached console (if there is one), which displays errors such as "Network Unreachable". Once the connection is restored, Swarm recovers and continues running. Implement SNMP or ping monitoring to verify network connectivity among Swarm nodes.

HTTP Client Library Limitations

Some HTTP client libraries, including Microsoft .NET HttpWebRequest and httplib in Python, do not handle the Expect: 100-continue header properly. Clients should include this header when writing content larger than 64 KB and, after sending the initial headers, wait for a response before sending the content to Swarm. Possible responses are a redirect (301 or 307), an error response, or a 100 Continue response, which means the client can now send the data. Per the HTTP specification, it is not permissible to continue sending data before receiving a 100 Continue response from the server when an Expect: 100-continue header has been included in the request. These issues are resolved in the Swarm Software Development Kit, but integrators writing a non-SDK client need to consider these limitations.
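The wait described above can be made explicit with a plain socket, which keeps full control over when the body is sent. This is a minimal, hypothetical sketch, not the SDK's implementation: it omits following redirects, reading response bodies, and the 64 KB size decision, and the host and path are caller-supplied placeholders.

```python
import socket


def build_put_head(host_header: str, path: str, length: int) -> bytes:
    """Request line and headers for a PUT that asks the server to confirm first."""
    lines = [
        f"PUT {path} HTTP/1.1",
        f"Host: {host_header}",
        f"Content-Length: {length}",
        "Expect: 100-continue",
        "",  # these two empty entries produce the CRLF CRLF that ends the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def status_code(status_line: bytes) -> int:
    """Parse the code from a status line such as b'HTTP/1.1 100 Continue'."""
    parts = status_line.split()
    if len(parts) < 2:
        raise ValueError("malformed or missing status line")
    return int(parts[1])


def put_with_expect(host: str, port: int, path: str, body: bytes, timeout: float = 30.0) -> int:
    """Send the headers, wait for the first response, and send the body only after 100."""
    with socket.create_connection((host, port), timeout=timeout) as sock, sock.makefile("rb") as reader:
        sock.sendall(build_put_head(f"{host}:{port}", path, len(body)))
        code = status_code(reader.readline())
        if code != 100:
            return code  # redirect (301/307) or error: the body is never sent
        while reader.readline() not in (b"\r\n", b""):
            pass  # consume the remainder of the interim 100 response
        sock.sendall(body)
        return status_code(reader.readline())  # final status line
```

On a 301 or 307, a complete client would reissue the request, again with Expect: 100-continue, to the Location given in the redirect.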

Available Drive Space

The available drive space reported by the Swarm Management Console and SNMP is an accurate estimate of the usable space available on a volume or node. The calculation accounts for several internal factors that are not immediately visible to an administrator. For example, Swarm reserves space on a volume equal to twice the size of the largest object or EC segment stored on that volume, to allow for continuous defragmentation. As a result, the first object or EC segment stored on a volume appears to consume more space than expected. The UUID of the object or EC segment used to reserve defrag space is available in the CARINGO-CASTOR-MIB.
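The effect of the defragmentation reserve can be seen with a small worked example. This simplified model captures only the "twice the largest object or EC segment" rule and ignores the other internal factors, so it illustrates the behavior rather than reproducing Swarm's actual calculation; the function name is hypothetical.

```python
GB = 10**9


def usable_bytes(free_bytes: int, largest_stored_bytes: int) -> int:
    """Free space minus a defrag reserve of twice the largest object or EC segment."""
    reserve = 2 * largest_stored_bytes
    return max(free_bytes - reserve, 0)


# An empty 1,000 GB volume receives its first object, 5 GB in size.
# Raw free space drops to 995 GB, but 10 GB is also reserved for defragmentation,
# so the reported space drops by 15 GB for a single 5 GB object.
print(usable_bytes(995 * GB, 5 * GB) // GB)  # 985
```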

Available Index Slots

The management console may slightly overestimate the number of available index slots in a node. For capacity planning purposes, use the estimates provided in the memory table.

Retire in Small Clusters

To retire a node or volume, the cluster must have at least two suitable nodes with available storage space. Volume-less nodes and nodes that are retired or retiring do not count as suitable nodes. In a multi-server configuration, suitable nodes can include other nodes running in the same physical chassis as the retiring node or volume.
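A pre-check for retiring a node can encode these rules directly. The sketch below is hypothetical: the `Node` fields stand in for whatever inventory data your tooling collects (for example, from SNMP), and it covers node retirement only. Nodes in the same chassis are deliberately not filtered out, matching the rule above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    name: str
    volume_count: int
    retiring_or_retired: bool
    free_bytes: int


def can_retire(target: str, nodes: List[Node]) -> bool:
    """True if at least two other suitable nodes have storage space available."""
    suitable = [
        n for n in nodes
        if n.name != target              # the node being retired does not count
        and n.volume_count > 0           # volume-less nodes are not suitable
        and not n.retiring_or_retired    # retired or retiring nodes are not suitable
        and n.free_bytes > 0             # must have storage space available
    ]
    return len(suitable) >= 2
```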

© DataCore Software Corporation. · https://www.datacore.com · All rights reserved.