Platform Overview

The Swarm Platform Server (Platform) unifies and centralizes the services and management information needed to install, upgrade, and monitor Swarm storage clusters. Installed on a dedicated node, Platform simplifies the setup of network services and the Swarm product installation process for Swarm administrators.

The Platform infrastructure provides these essential features:

  • Required network services. Provides the network services required to support a Swarm cluster: DHCP, PXE/Network Boot, TFTP, Syslog, and NTP.

  • Service Proxy and Swarm UI. Provides a cluster management dashboard, dynamic configuration of Swarm settings, health report graphics and logs, and Swarm software version management.

  • Swarm configuration management. Provides the ability to configure and update the cluster and individual chassis as needed. 

  • Command-line interface. The CLI provides a direct interface to Platform management tasks.

Platform Concepts

Services Node for Swarm

The Platform server is an extensible services node for managing Swarm. On a single server, it configures the environment and coordinates the support services needed to deploy hardware into a Swarm cluster and to support that cluster. It is the next generation of the original Caringo Services Node (CSN).

The Platform server builds its Swarm-centric framework on Ubuntu MAAS (Metal As A Service), which allows physical servers to be provisioned quickly and managed collectively, like virtual machine instances. MAAS manages the machines, and Juju manages the services that run on them. When you bring up a Platform server, it automatically installs and configures the network infrastructure required for the Swarm environment.

Swarm Orchestration

Platform orchestration spans initial Swarm setup and deployment as well as all ongoing maintenance:

  • Installation and configuration of Swarm Storage software

  • Network boot support

  • Automatic provisioning of network and node configs

  • Integrated Swarm UI & Service Proxy, for browser-based management

Swarm Environment and Networking

Initial configuration of the Platform server sets up and initializes the following system services:

  1. DHCP server, to allocate Swarm node addresses

  2. DNS server, for the management subnet

  3. TFTP server, for Swarm network boot

  4. NTP time server, essential for Swarm clock synchronization (defaults to pool.ntp.org) 

  5. Syslog server, for centralized logging

    • Platform and Storage: /var/log/caringo

    • Service Proxy: /var/log/gateway

  6. Service Proxy, a lightweight Gateway instance to host the Swarm UI

  7. HTTP proxy, to allow HTTP access outside the private network

  8. Configuration and License server, for Swarm node access
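As a quick sanity check after initial configuration, you can verify from a shell on the Platform server that these services are listening and logging. This is a sketch only: ss and tail are standard Linux tools, not Platform-specific, the port numbers are the well-known ports for each protocol, and the exact log file names under these directories vary by release.

    # Confirm the network services are listening
    # (well-known ports: DHCP 67, TFTP 69, NTP 123, syslog 514)
    ss -lntu | grep -E ':(67|69|123|514)\b'

    # Tail the centralized logs
    tail -f /var/log/caringo/*.log    # Platform and Storage
    tail -f /var/log/gateway/*.log    # Service Proxy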

Network Architecture

Swarm architecture uses a private network dedicated to the Swarm Storage nodes and application servers. Access to this cluster is restricted to the Platform server and any applications that reside on the private subnet (in the absence of existing routers). The Platform server owns this private storage network, but it can support flexible network topologies.

  • Platform server addresses are IPs on a routable subnet.

  • Application servers can be dual-homed like the Platform server, for ease of access to the cluster.

  • The Swarm Storage nodes’ default gateway is the Platform server; this can be changed to support existing routers.

Swarm Interfaces

There are three interfaces to Platform functionality:

  • CLI (Command-Line Interface) — see Platform CLI Commands

    • The CLI is a native application installed on the Platform server. Versions for other platforms are available, so it can also run from other machines.

    • Type platform to get CLI help

    • All CLI commands start with “platform”:

      • To list all nodes: platform list nodes

      • To deploy a node: platform deploy storage

  • Swarm UI — see Platform UI

  • REST API, underlying both the CLI and the Swarm UI

    • The API is discoverable through its root: http://<platform>:8095/platform
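As a minimal sketch, the CLI and the REST API can both be exercised from a shell with access to the Platform server; the JSON returned by the API root is release-specific and not shown here.

    # CLI: list all nodes, then deploy a chassis
    platform list nodes
    platform deploy storage

    # REST API: discover the available resources from the root
    curl -i http://<platform>:8095/platform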

Chassis Lifecycle

Each chassis goes through the following lifecycle on the way to becoming a functioning Swarm Storage node. Once a chassis is running as a Storage node, it always boots as a Storage node and has a statically assigned IP address that is consistent across reboots.

Note

The Platform Server uses IPMI for power control, so the IPMI port must be accessible from the Platform Server.
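For example, you can verify that a chassis's IPMI interface is reachable from the Platform server with a standard tool such as ipmitool; the BMC address and credentials below are placeholders for your own values.

    # Check that the BMC answers power-status queries over the network
    ipmitool -I lanplus -H <bmc-ip> -U <user> -P <password> power status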

1. Add Hardware (Enlist)

Before new hardware can be managed by the Platform server, you need to power it on manually for the first and only time. This initial step enlists the machine with the Platform server, after which the hardware automatically powers itself off. After enlistment, new hardware appears in the CLI with the state “New”, and it can remain in that state indefinitely until needed.

Note: Enlistment is non-destructive to existing data. If a machine is enlisted by mistake because it was added to the subnet that the Platform server manages, you can correct it; no installation happens until you use the CLI to deploy Swarm to the chassis.
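For example, you can confirm from the Platform server shell that a chassis has enlisted; this is a minimal sketch, and the exact columns of the listing vary by release.

    # Newly enlisted hardware is reported with the state "New"
    platform list nodes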

2. Deploy Hardware (Commission)

Once new hardware is enlisted, Platform sees it as available for commissioning, which occurs when you run a CLI command that deploys the hardware and boots Swarm. The deployment command automatically pushes the new hardware through all required operations. After the deploy operation starts, you can use the “list nodes” CLI command to view the chassis moving through the different states.

Note: The first power-on and commissioning steps are also non-destructive to any existing data on the disks. Any data on the disks persists until Swarm boots on the second power-on, after which the persistence of any existing data is determined by Swarm.
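For example, a typical deployment sequence from the Platform server shell is sketched below; whether additional arguments are needed to select a specific chassis depends on your release.

    # Deploy Swarm Storage to enlisted hardware
    # (Platform pushes it through commissioning automatically)
    platform deploy storage

    # Re-run to watch the chassis advance through the states
    # listed under "Chassis Statuses" below
    platform list nodes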

Chassis Statuses

As a chassis moves through the phases of deployment, it may show the following statuses:

  • New — It has been powered on for the first time and has enlisted with the Platform Server. 

  • Commissioning — It is undergoing, or has recently finished, the initial hardware interrogation stage on the way to deployment.

  • Ready — It has completed all initial stages and is ready to be deployed with software.

  • Deploying/Deployed — It is undergoing, or has finished, the deployment of the Storage software.

  • Failed <X> — It encountered an error in completing one of the stages.

Chassis Grouping

If you make no explicit subcluster assignments in the Swarm configuration and more than one storage process is running on a chassis, each chassis forms a de facto subcluster. In Swarm, the "node.subcluster" value is a free-form name assigned to one or more chassis.

The storage process looks at all names assigned to the different chassis and forms them into groups, which can then be used to determine how replicas are distributed. You can group the chassis in any way needed to achieve the desired replication and failover scheme, as sketched below.
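For example, assigning the same free-form name to every chassis in a rack makes each rack a subcluster, so replicas can be separated by rack. This is a sketch only: the rack names are hypothetical, and the mechanism for applying a per-chassis setting (CLI or configuration file) depends on your release.

    # Swarm configuration for each chassis in rack A
    node.subcluster = rackA

    # Swarm configuration for each chassis in rack B
    node.subcluster = rackB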
