Platform Administration

Platform's CLI (command-line interface) is installed by default on the Platform server and supports common administrative tasks.

Rebooting a Cluster

There are two CLI options for rebooting a cluster: full versus rolling.

  • Full reboot notifies every chassis in the cluster to reboot itself at the same time. The entire cluster is temporarily offline as the chassis reboot.

    Full reboot

    platform restart storagecluster --full
  • Rolling reboot is a long-running process that keeps the cluster operational by rebooting the cluster one chassis at a time, until the entire cluster is rebooted. A rolling reboot includes several options, such as limiting the reboot to one or more chassis:

    Rolling reboot

    platform restart storagecluster --rolling [--chassis <comma-separated system IDs>] [--skipConnectionTest] [--skipUptimeTest] [--continueWithOfflineChassis] [--stopOnNodeError]
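
    For example, using only the flags documented in the syntax above, a rolling reboot can be limited to specific chassis; the system IDs below are placeholders:

    ```
    # Rolling reboot restricted to two chassis (placeholder system IDs)
    platform restart storagecluster --rolling --chassis <systemid-1>,<systemid-2>
    ```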

Requirements

Before a rolling reboot can begin, these conditions must be met:

  1. All chassis targeted for rebooting must be running and reachable. If chassis are offline, set a flag to have them ignored:

    • To skip the connection check altogether, add the flag --skipConnectionTest

    • To have the reboot process ignore currently offline chassis, add the flag --continueWithOfflineChassis

  2. All chassis must have an uptime greater than 30 minutes. To skip this requirement, add the flag --skipUptimeTest
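
As a sketch, the two requirement-skipping flags above can be combined in a single command; only flags already documented in the rolling reboot syntax are used:

```
# Start a rolling reboot, skipping both the connection check and the
# 30-minute uptime requirement
platform restart storagecluster --rolling --skipConnectionTest --skipUptimeTest
```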

Managing Rolling Reboots

A 10-second window is allotted to cancel a rolling reboot before it begins. Once a rolling reboot has started, it stops and reports an error if any of the following occur:

  1. A chassis is offline when it is selected for reboot. To have the reboot process ignore currently offline chassis, add the flag --continueWithOfflineChassis.

  2. By default, the reboot process continues if the volumes come up but a node goes into an error state. To have the reboot process stop in this case, add the flag --stopOnNodeError.

  3. The chassis boots with a number of volumes that does not match the number present before the chassis was rebooted. A volume is considered up if it has a state of ok, retiring, retired, or unavailable.

  4. The chassis does not come back online within 3 hours.

If a rolling reboot has stopped due to an error, resolve the error and then resume the reboot using the resume command below.

Status check — To retrieve the status of a rolling reboot task, use the following commands for reboots remaining and reboots completed:

Rolling reboots remaining
platform status rollingreboot
Rolling reboots completed

Global states — When viewing the status for a rolling reboot, a rolling reboot task can have the following global states:

  • in-progress: The rolling reboot is currently running.

  • paused: The rolling reboot is paused (using the pause command).

  • completed: The rolling reboot finished successfully.

  • cancelled: The rolling reboot was cancelled per a user request.

  • error: The reboot is stopped due to an error of some kind.

Chassis states — The status listing shows the status for each chassis processed by the rolling reboot task. Each chassis can have one of the following states:

  • pending: The rolling reboot task has not processed the chassis.

  • in-progress: The rolling reboot task is in the process of rebooting the chassis.

  • completed: The chassis was successfully rebooted.

  • removed: The chassis was removed from the list of chassis to process after the rolling reboot was started (using the delete rolling reboot command).

  • error: The chassis encountered an error of some kind.

  • abandoned: The chassis was being processed when a user cancelled the rolling reboot.

  • dropped: The rolling reboot was in the process of waiting for the chassis to reboot when a user request was made to move to the next chassis (using the --skip flag).

  • offline: The chassis was already offline when the reboot task attempted to reboot the chassis.

Cancel reboot — To cancel (not pause) an active rolling reboot, issue the delete command, which stops the reboot process at the earliest opportunity; a cancelled reboot cannot be restarted later.

Exclude from reboot — To exclude one or more chassis that have not yet been rebooted from a currently running rolling reboot:

Pause reboot — To pause the current rolling reboot process so it can be restarted later:

Resume reboot — To resume a paused rolling reboot:

No-wait reboot — Normally, the rolling reboot process waits up to 3 hours for a rebooted chassis to come back online before proceeding to the next. To force the process to stop waiting and move to the next chassis, use the --skip flag:

Adding a Chassis

Which version of Swarm a given node uses is set at the time of provisioning.

To add a single chassis as a new Swarm node, use the following process:

  1. Create a node.cfg file and add any node-specific Swarm settings to apply, or leave it blank to accept all current settings.

  2. Power on the chassis for the first time.

  3. Wait until the chassis enlists and powers off.

  4. Deploy the new server:

Use the following process to deploy an individual chassis by system ID:

  1. Create a node.cfg file and add any node-specific Swarm settings to apply, or leave it blank to accept all current settings.

  2. Get a list of chassis that are available for deployment by using the following command:

  3. Choose a System ID to deploy a single chassis using a command like the following:

Service Proxy

If the Service Proxy is running on the Platform Server when adding or removing chassis, restart the service so it picks up the new chassis list:

Reconfiguring the Cluster

Modify the cluster-wide Swarm configuration at any time using the CLI and a configuration file. The reconfiguration process is additive: all existing settings not referenced in the file are preserved, while any setting defined in the file is overwritten or added.

  1. Create a supplemental .cfg file (such as changes.cfg) and specify any new or changed Swarm settings to apply.

  2. To upload the configuration changes, use the following CLI command:

The CLI parses the uploaded configuration file for changes to make to Platform.

If Swarm was running during the upload, Platform Server attempts to communicate the new configuration to Swarm. Any settings that cannot be communicated to Swarm require a reboot of the Swarm cluster to take effect. For each setting contained in the file, the CLI indicates whether the setting was communicated to the Storage cluster and whether a reboot is required. The Swarm UI also indicates which settings require rebooting.

Example: Increase Swarm processes

Swarm 10

Swarm Storage 10 has a single-process architecture, so the configuration setting chassis.processes is no longer used and cannot be increased.

Option 1: Create a configuration file:

To set all chassis throughout the cluster to a higher number of processes, create a configuration file and upload it to Platform Server. 

  1. Create a text file, such as update.cfg, containing only the setting to be changed.

  2. To upload the configuration changes, use the following CLI command:

Option 2: Use the CLI directly:

  1. Add the configuration change directly:

Reconfiguring a Chassis

Modify the node-specific settings for a single chassis using the same process, but specify the MAC address of any valid NIC on that chassis.

  1. Create a .cfg file (such as changes.cfg) and specify any new or changed node-specific settings to apply.

  2. To upload the configuration changes, use the following CLI command:

The CLI parses the uploaded configuration file for changes to make to that chassis.

Releasing a Chassis

There may be times when a chassis needs to be released from the Swarm cluster, either for temporary maintenance or for permanent removal.

Important

To guarantee a clean shutdown, power off the chassis through the UI or SNMP before running release commands.

Temporary release — Temporary release of a chassis assumes the chassis is added back into the cluster at a later time. Releasing a chassis allows cluster resources, such as IP addresses, to be deallocated, or the configuration to be wiped and reset.

Once the chassis is powered off, release the chassis from the Swarm cluster:

Temporary removal

Permanent removal — Permanent removal is for retiring a chassis altogether or changing the chassis' main identifying information, such as changing a NIC. Removing the chassis from management causes it to start the provisioning life cycle as if it were a brand-new chassis if it is powered on again.

Remove the chassis from Platform Server management permanently once the chassis is powered off:

Permanent removal

Resetting to Defaults

Issue the following commands to clear out all existing setting customizations from a given chassis or the entire cluster.

Delete All Default Chassis Settings
Delete All Cluster Settings

Managing Subclusters

Assign chassis to subclusters after all chassis are deployed and are running.

Use the list command to see the current subcluster assignments:

List subclusters

To assign a chassis to a subcluster, use the assign command:

Add to subcluster

Use the unassign command to remove a chassis from a subcluster:

Remove from subcluster

Changing the Default Gateway

By default, the Platform Server configures Swarm Storage to use the Platform Server as the default gateway.

To override this behavior, either add a network.gateway setting to the cluster configuration file or issue the following command:
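
For the configuration-file approach, a supplemental .cfg could contain just the one setting; the gateway address below is a placeholder, and the key=value form is assumed from the usual .cfg convention:

```
network.gateway = <gateway-ip-address>
```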

Managing Administrators

With one exception, modifying the admin users for the Storage cluster requires the Storage cluster to be up and running before the operations can be done. The exception is the "snmp" user, whose password can be set while the cluster is down or before the cluster is booted for the first time.

Adding or Updating Users

Use the following CLI command to add a new admin user:

Add admin user

The --askpassword flag avoids specifying a password on the command line by reading the password from stdin. When this flag is used, a prompt displays to enter a new or updated password for the user. Alternatively, the Linux pipe functionality can be used:
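
As a sketch of the pipe form, the placeholder below stands in for the add-admin-user command shown above; the pipe supplies the password on stdin instead of at the prompt:

```
# <add-admin-user-command> is a placeholder for the actual Platform CLI
# command; the new password is supplied via stdin instead of a prompt
echo "newPassword" | <add-admin-user-command> --askpassword
```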

Use the following CLI command to delete an admin user from the cluster:

Delete admin user

Upgrading Swarm Storage

To upgrade Swarm Storage in a live cluster, use the CLI to upload the new version and deploy it to the running nodes, restarting either the entire cluster or each chassis in turn.

  1. Upload the new version of the Swarm Storage software to Platform server, verifying the <version-name> matches the version of Swarm Storage being uploaded:


    Note: The zip file above is contained within the Swarm-{version}-{date}.zip file. Inside this zip, a folder called Storage contains a file called storage-{version}-x86_64.zip.

  2. Get a full listing of all nodes along with IPs, MAC addresses, and system IDs:

  3. Using the list of system IDs, deploy the upgrade on each of the nodes. To restart a node immediately after its upgrade, run that command as well:

  4. If each node was not restarted individually, restart the cluster now, using either a full or rolling restart:

Managing Service Proxy

Status — Use this command to check the status of the Service Proxy:

Upgrade — To upgrade the Service Proxy on the Platform server, use the CLI to upload the new version and deploy it:

Configuring DNS

The Storage nodes may need to resolve names for outside resources, such as Elasticsearch or Syslog. To enable this, configure the DNS server on the Platform Server to communicate with outside domains.

Option 1: Forwarding

A Slave/Backup DNS zone is a read-only copy of the DNS records; it receives updates from the Master zone of the DNS server.

If no DNS master/slave relationships are configured, perform forwarding by having the domain managed by the Platform server forward all lookups to outside domains:

  1. Edit /etc/bind/named.conf.options and add the following line after the "listen-on-v6" line:

  2. Run the following command to restart bind9 on the Platform Server:
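
As a sketch of step 1, assuming standard BIND 9 options syntax (the upstream DNS address is a placeholder):

```
// In /etc/bind/named.conf.options, after the "listen-on-v6" line:
forwarders { <upstream-dns-ip>; };
```

Afterward, restarting BIND applies the change; on systemd-based distributions this is typically `sudo systemctl restart bind9`.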

Option 2: Configuring a Slave DNS Zone

If an external DNS zone is configured, the Platform Server can become a slave DNS of that zone; the reverse can also be done, allowing other systems to resolve names for servers managed by the Platform server.

This process assumes the external DNS server is configured to allow zone transfers to the Platform server. The DNS server on the Platform server is not configured to restrict zone transfers to other DNS slaves.

  1. Edit /etc/bind/named.conf.local and add the following line at this location:

  2. Create a new file called /etc/bind/named.conf.slaves and add the settings in this format:

  3. Run the following command to restart bind9 on the Platform Server:
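
Assuming standard BIND 9 slave-zone syntax, an entry in /etc/bind/named.conf.slaves might look like the following; the zone name and master address are placeholders:

```
zone "storage.example.com" {
    type slave;
    file "/var/cache/bind/storage.example.com.db";
    masters { <external-dns-ip>; };
};
```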

Configuring Docker Bridge

Edit the file /etc/docker/daemon.json to configure or modify the network information used by the default Docker (docker0) bridge. Add networking properties to the root JSON object in the file:

The bip property sets the IP address and subnet mask to use for the default docker0 bridge. See the Docker documentation for details on the different properties.
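
As an illustration, a minimal /etc/docker/daemon.json that sets the docker0 bridge address might look like this; the address itself is an arbitrary example:

```json
{
  "bip": "172.26.0.1/24"
}
```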

© DataCore Software Corporation. · https://www.datacore.com · All rights reserved.