Upgrading a Storage Cluster

This section details how to upgrade a Swarm license and cluster nodes for non-CSN clusters.

For release-specific guidance, see https://perifery.atlassian.net/wiki/spaces/public/pages/2443804878 and https://perifery.atlassian.net/wiki/spaces/KB/pages/2443812915.

Types of Upgrades

A single cluster can contain nodes running mixed versions during the upgrade process, and no data conversion between versions is necessary unless noted for a release.

Simple Upgrade

The simplest upgrade method is to reboot the entire cluster at once after the software on all USB flash drives or the centralized configuration location has been updated.

  1. Shut down all nodes in the cluster.

  2. Upgrade the software.

  3. Reboot the nodes.

  4. Verify all nodes are healthy.

Rolling Upgrade

To upgrade the cluster without scheduling an outage or bringing down the cluster, restart the nodes one at a time with the new version; the cluster continues serving applications during the upgrade. Objects remain fully accessible during the upgrade if stored with at least two replicas. When using the rolling upgrade approach, wait at least 10 seconds between each node reboot so each node can properly communicate its rebooting state to the rest of the cluster and the other cluster nodes do not initiate recovery for the rebooting node.
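The per-node reboot can be scripted over SNMP in the same style as the shutdown script at the end of this section. The sketch below is a minimal illustration, not a supported procedure: it assumes "reboot" is an accepted CastorShutdownAction value and that pwd is the cluster's SNMP read/write community string, and it uses a fixed delay well above the 10-second minimum rather than a real health check.

#!/bin/sh
# Rolling reboot: restart cluster nodes one at a time.
# Assumes "reboot" is a valid CastorShutdownAction value and "pwd"
# is the SNMP read/write community string (both assumptions).
NODES="192.168.1.101 192.168.1.102 192.168.1.103"

for n in $NODES; do
    snmpset -v 2c -c pwd -m +CARINGO-CASTOR-MIB \
        $n caringo.castor.CastorShutdownAction = "reboot"
    # Pause well beyond the 10-second minimum so the node can
    # announce its rebooting state; in practice, wait until the
    # node rejoins the cluster before restarting the next one.
    sleep 60
done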

Errors

  • Ongoing Processes: Errors similar to 'Castor-System-Cluster value must refer to a remote cluster on RETRIEVE request' may be logged if there are any disk recovery or retire processes ongoing in the cluster during a rolling upgrade. These errors are harmless and stop once all nodes are running the new version.

  • Blocked Feeds: When a rolling upgrade from Swarm 8 begins, Swarm 9 modifies the feed definition in the persisted Settings object. Swarm 8 does not support the change, so the Swarm 8 nodes report blocked feeds with a configuration error ("Plugin validation error: Unknown attribute indexAlias"); the feeds are unblocked once the last node in the cluster has been upgraded. During the rolling upgrade, the feed is blocked on some nodes, so indexing and querying may be unavailable on those nodes. If the cluster is ever downgraded to Swarm 8, the feed blocks again in the same way; either delete the feed and redefine it, or contact DataCore Support for help updating the feed definition in the persisted Settings object.

Preparing for the Upgrade

To prepare for the upgrade:

  1. Download the upgraded Swarm software from the Downloads section on the DataCore Support Portal.

  2. Important: Review the https://perifery.atlassian.net/wiki/spaces/public/pages/2443804942 for upgrade instructions specific to the version downloaded.

  3. Important: Run the Storage Settings Checker and resolve any configuration issues with DataCore Support.

  4. Locate the node configuration data, backup configuration files, and license files (a backup sketch follows this list).

  5. Prepare the node configuration data on new USB flash drives or on a centralized configuration server.

  6. Verify the health of all cluster nodes.

  7. Schedule an offline window for the cluster downtime.
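As an illustration of steps 4 and 5, the sketch below backs up the configuration and license files from a mounted USB flash drive. The mount point and file names (/mnt/usb, caringo/node.cfg, license.txt) are assumptions and vary by installation:

#!/bin/sh
# Back up node configuration and license files before the upgrade.
# /mnt/usb and the file names below are assumptions; adjust them to
# match the layout of the boot device in your installation.
BACKUP_DIR=~/swarm-upgrade-backup/$(date +%Y%m%d)
mkdir -p "$BACKUP_DIR"

cp /mnt/usb/caringo/node.cfg "$BACKUP_DIR/"
cp /mnt/usb/caringo/license.txt "$BACKUP_DIR/"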

Review the release notes included with the new boot devices prior to starting the cluster upgrade. The release notes contain information about feature changes, operational differences, and any issues affecting how a storage cluster processes and stores data.

Remove the USB flash drives from the running nodes to view and back up the configuration and license files. The USB flash drives or the configuration server can be updated using the instructions in the README.txt file found in the latest Swarm installation package. Validate the node or cluster configuration file to verify no deprecated parameters need to be removed or renamed after performing the upgrade.
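One quick way to scan for deprecated parameters is to grep the configuration file against the list of removed or renamed settings published in the release notes. The names in DEPRECATED below are placeholders, not an actual deprecation list:

#!/bin/sh
# Scan node.cfg for deprecated parameters before rebooting.
# The names in DEPRECATED are hypothetical placeholders; take the
# real list from the release notes for the new version.
DEPRECATED="oldParamOne oldParamTwo"

for p in $DEPRECATED; do
    grep -n "^$p" /mnt/usb/caringo/node.cfg \
        && echo "WARNING: deprecated parameter $p found"
done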

See Configuring the Nodes for how to set the parameters in the node or cluster configuration files.

After all upgrades and validations are completed, return each USB flash drive to the node from which it was removed. Match each USB device to the original node in the cluster and verify the vols parameter, which defines the storage devices, matches the correct node.
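A minimal check before reinserting a drive, assuming the same /mnt/usb mount point and node.cfg location as above:

# Show which storage devices this boot drive expects; verify the
# list matches the hardware of the node receiving the drive.
grep "^vols" /mnt/usb/caringo/node.cfg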

Important

Verify the cluster health by checking for critical error messages on the status page of each node or the SNMP CastorErrTable OID before performing any node upgrade. This process verifies no hardware problems exist that can interrupt the upgrade process. Any problems need to be corrected prior to upgrading the cluster.
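The error table can be polled from a script as part of this health check. A minimal sketch, assuming CastorErrTable resolves by that name in the CARINGO-CASTOR-MIB and reusing the community string from the shutdown script below; any rows returned indicate problems to resolve before upgrading:

#!/bin/sh
# Check every node for logged errors before the upgrade.
NODES="192.168.1.101 192.168.1.102 192.168.1.103"

for n in $NODES; do
    echo "== $n =="
    snmpwalk -v 2c -c pwd -m +CARINGO-CASTOR-MIB $n CastorErrTable
done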

When upgrading a single node in a cluster, include the clusterSettingsUUID parameter value in the node or cluster configuration file prior to rebooting the node so the settings file can be located after the nodes reboot.
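For example, the configuration file entry might look like the following fragment; the UUID shown is a placeholder, not a real value:

# node.cfg (fragment) -- the UUID below is a placeholder; use the
# cluster settings UUID reported by your own cluster.
clusterSettingsUUID = 66a0ac0e-b191-4d39-9b55-a9eabf2a8ac6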

Upgrading the Nodes

To upgrade the cluster nodes:

  1. Shut down all cluster nodes (or one at a time for a rolling upgrade).

  2. Install the updated USB flash drives on the PXE boot server.

  3. Reboot all nodes.

  4. Verify the cluster is operating normally (see the sketch after this list).
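One way to watch the nodes come back after the reboot is to poll each one over SNMP until it answers again; a minimal sketch using the standard sysDescr object and the same assumed community string:

#!/bin/sh
# Wait for each node to answer SNMP again after the reboot.
NODES="192.168.1.101 192.168.1.102 192.168.1.103"

for n in $NODES; do
    until snmpget -v 2c -c pwd $n sysDescr.0 >/dev/null 2>&1; do
        echo "waiting for $n ..."
        sleep 10
    done
    echo "$n is back online"
done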

A simultaneous shutdown of all cluster nodes is the first step in the simple upgrade. If the cluster cannot be shut down during normal business operations, the nodes can be rebooted one at a time in a rolling upgrade so the cluster remains online.

When performing a rolling upgrade with a cluster node offline, the cluster detects the missing node and the remaining nodes attempt to recover the missing content via the failed volume recovery (FVR) and erasure coding recovery (ECR) processes. When the missing node is brought back online, the cluster detects it and the recovery processes stop.

Tip

Prepare USB flash drives with the upgraded version of Swarm for each node before beginning the upgrade to minimize node downtime and prevent the remaining nodes from filling with recovered content.

Initiating the disk recovery process is not a concern if all nodes are shut down within several seconds of each other. Volume recovery can also be suspended from the Settings window in the Swarm Admin Console to prevent recovery from kicking off while nodes reboot into the new software version.
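In some environments the same suspension may be scriptable over SNMP; the OID name below is a hypothetical illustration only, so confirm the actual object name in the CARINGO-CASTOR-MIB before relying on it:

# Suspend volume recovery before a cluster-wide reboot.
# "volRecoverySuspend" is a hypothetical OID name used purely for
# illustration; check CARINGO-CASTOR-MIB for the real object.
snmpset -v 2c -c pwd -m +CARINGO-CASTOR-MIB \
    192.168.1.101 volRecoverySuspend = 1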

See the Suspend Setting.

Example Shutdown Script

This UNIX shell script demonstrates a method of issuing the shutdown command to all cluster nodes. In this example, all nodes in the cluster are defined in the NODES variable.

NODES="192.168.1.101 192.168.1.102 192.168.1.103"

for n in $NODES; do
    snmpset -v 2c -c pwd -m +CARINGO-CASTOR-MIB \
        $n caringo.castor.CastorShutdownAction = "shutdown"
done
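In the snmpset command, the = tells snmpset to take the value's type from the MIB definition, and pwd stands in for the cluster's SNMP read/write community string; substitute the addresses in NODES with the actual node IP addresses.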
