
The Reports section of the Swarm UI provides real-time and historical views into the health and activity of a Swarm cluster.

...

...

Health Report

The Health Report provides both summary and detailed information at the level of cluster, subcluster, chassis, and drive.

...

The status of each component is represented by the color of its wedge in the sunburst. Statuses include the following:

Status

Description

OK

The chassis or drive is working and there are no errors.

Alert
Warning

The chassis or drive has experienced one or more errors. 

Cluster-level alerts often relate to space thresholds or network issues (unable to reach NTP/Gateway/ES/Metrics servers or other nodes).

Initializing

The brief state after a chassis boots, while it reads the cluster's persisted settings and is not yet ready to accept requests.

Maintenance

The chassis has been shut down or rebooted by an administrator from either SNMP or the UI and should not be considered missing for recovery purposes. By default, a chassis can remain in the Maintenance state for 3 hours before it transitions to Offline and the cluster starts recovering its content. Maintenance mode is not initiated when the chassis is power-cycled outside of Swarm (either physically on the hardware or via a remote shutdown mechanism in an out-of-band management platform such as IPMI, Dell iDRAC, or HP iLO) or when there is a drive error; in both cases, recovery starts for the chassis or drive unless recovery is suspended.

Mounting

The chassis is mounting one or more drives, including formatting the drive if it is new and reading all objects on the volume into the RAM index for faster access.

Offline

The chassis or drive was previously present in the cluster but no longer is.

Retiring

The chassis or drive is in the process of retiring, verifying all objects are fully protected elsewhere in the cluster and then removing them locally.

Retired

The chassis or drive has completed the retiring process and may be removed from the cluster.

Idle

The chassis or drive is in power-saving mode after a configurable period of inactivity. (See Configuring Power Management.)

Subcluster and Cluster status are inherited from the chassis or drives contained within.
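The Maintenance timeout described in the table above can be illustrated with a minimal sketch. The helper name `expected_status` is hypothetical and is not part of Swarm. It models only the default 3-hour window, and it assumes a chassis counts as Offline once the window has fully elapsed. Drive errors, out-of-band power cycles, and suspended recovery are not modeled.

```python
from datetime import datetime, timedelta

# Default from the status table: 3 hours in Maintenance before Offline.
DEFAULT_MAINTENANCE_WINDOW = timedelta(hours=3)


def expected_status(entered_maintenance, now, window=DEFAULT_MAINTENANCE_WINDOW):
    """Return the status a chassis would be expected to show (illustrative only).

    Assumption: the chassis is treated as Offline once the full window
    has elapsed since it entered Maintenance.
    """
    if now - entered_maintenance < window:
        return "Maintenance"
    return "Offline"


if __name__ == "__main__":
    start = datetime(2024, 1, 1, 12, 0)
    print(expected_status(start, start + timedelta(hours=1)))
    print(expected_status(start, start + timedelta(hours=4)))
```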

The data table below the sunburst displays more detailed information about the cluster, including the amount of used and free capacity and how many streams reside on the chassis/drive. Clicking on a subcluster row loads the Subcluster page. Clicking on a chassis row loads the Chassis Details page unless the chassis status is Maintenance or Offline.

...

Storage Contents

The Storage Contents chart displays the total amount of used capacity in the cluster over time as well as the total stream count (including replicas and erasure coding segments).

...

Usage

...

Note

Historical usage charts may show artificial bumps in usage when adding or removing a large percentage of drives within a single day.

The Usage charts display percentages of Disk space and Stream index (memory):

  • Disk space: The amount of free, trapped, and used drive space as a percent of the total available over time.

  • Stream index: The amount of free, overlay, and used RAM index space as a percent of the total available over time.
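As a rough illustration of how the categories above relate to the charted percentages, the following sketch converts raw amounts into percentages of the total. The function name `as_percentages` and the sample numbers are hypothetical, and the sketch assumes the total is simply the sum of the listed categories. The Swarm UI may compute its totals differently.

```python
def as_percentages(amounts):
    """Convert category amounts to percentages of their sum, rounded to 1 decimal.

    `amounts` maps a category name (for example free, trapped, used)
    to a non-negative number in any consistent unit.
    """
    total = sum(amounts.values())
    if total <= 0:
        raise ValueError("total must be positive")
    return {name: round(100.0 * value / total, 1) for name, value in amounts.items()}


if __name__ == "__main__":
    # Hypothetical disk space sample, in TB.
    print(as_percentages({"free": 50, "trapped": 10, "used": 40}))
    # Hypothetical stream index sample, in GB of RAM index.
    print(as_percentages({"free": 6, "overlay": 1, "used": 1}))
```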

...

Network Traffic

The Network Traffic graphs display Requests, Responses, and Internal Requests (intra-cluster activity).

Requests: The count of each SCSP method type in incoming client requests to the cluster over time. The method types are SCSP INFO, SCSP READ, Writes (sum of SCSP WRITE, SCSP UPDATE, and SCSP APPEND), SCSP DELETE, and Other (sum of SCSP COPY and search operations). This information is useful in understanding both when and how a cluster is being used by client applications.

Responses: The count of HTTP response codes returned to clients by the storage cluster over time. This data is helpful in identifying problems in the cluster or client applications, including whether error responses occur at particular times.

Internal Requests: The count of various internal, cluster-initiated activities between nodes in the cluster over time. This information is helpful in understanding how much data movement is happening in the cluster as hardware is added, removed, retired, etc. Spikes in cluster activity that do not correlate with client activity are often associated with either a failed drive recovery or an admin-requested retire.
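The grouping used by the Requests graph can be sketched as a simple mapping from method to chart series. The method keys (`"WRITE"`, `"SEARCH"`, and so on), the `CHART_SERIES` table, and the `group_requests` helper are illustrative assumptions, not a documented Swarm data format. Only the grouping itself comes from the description above.

```python
from collections import Counter

# Grouping taken from the Requests description; the keys are
# illustrative labels, not a documented Swarm format.
CHART_SERIES = {
    "INFO": "SCSP INFO",
    "READ": "SCSP READ",
    "WRITE": "Writes",
    "UPDATE": "Writes",
    "APPEND": "Writes",
    "DELETE": "SCSP DELETE",
    "COPY": "Other",
    "SEARCH": "Other",
}


def group_requests(method_counts):
    """Sum per-method request counts into the series shown on the chart.

    Raises KeyError for a method that is not in CHART_SERIES.
    """
    series = Counter()
    for method, count in method_counts.items():
        series[CHART_SERIES[method]] += count
    return dict(series)


if __name__ == "__main__":
    # Hypothetical counts for one time bucket.
    sample = {"WRITE": 5, "UPDATE": 2, "APPEND": 1, "READ": 10, "COPY": 1, "SEARCH": 3}
    print(group_requests(sample))
```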

...

If the Elasticsearch panel on the Dashboard shows a problem, research the ES cluster status on the Elasticsearch Reports page. These reports are generated on demand and allow drilling into details spanning the ES nodes, thread pools, indices, and shards. (v2.0)

...

Important

Opening the Elasticsearch Reports page requires generation of a lot of status data; allow time for the page to display.

For details on the columns that are reported, see the relevant Elasticsearch Reference: version 2.3 or version 5.6.

Section

Setting

Notes

RESOURCES
Node details

  • name

  • ip

  • uptime

  • master

  • cpu

  • disk avail

  • memory size

  • tripped breaker

  • file desc current

  • heap max

  • heap percent

  • ram percent

  • indexing delete total

  • indexing index total

  • search query total

Shows the ES cluster topology. 

To see where a node resides and to check performance stats, focus on these columns:

  • ip

  • cpu

  • tripped breaker

Important

The tripped breaker field signals trouble. Contact DataCore Support if the status is red.

  • heap percent

  • ram percent

Other columns are more helpful when looking at larger clusters, such as when determining how many master-eligible nodes are available:

  • master

  • name

RESOURCES
Thread pool details

  • name

  • ip

  • bulk rejected

  • flush rejected

  • force_merge rejected

  • generic rejected

  • get rejected

  • index rejected

  • refresh rejected

  • search rejected

  • warmer rejected

Shows ES cluster-wide thread pool statistics per node. The rejected statistics are returned for all thread pools.

INDICES

  • index

  • health

  • status

  • docs count

  • docs deleted

  • pri

  • pri store size

  • rep

  • store size

Provides low-level information about the segments in the shards of an index.

Note

/wiki/spaces/DOCS/pages/2443811275

Swarm Metrics generates large numbers of indices.

docs.count: The number of non-deleted documents stored in this segment. These are Lucene documents, so the count includes hidden documents (such as from nested types).

docs.deleted: The number of deleted documents stored in this segment. The space for these documents is reclaimed when this segment is merged.

SHARDS

  • index

  • node

  • ip

  • docs

  • prirep

  • shard

  • state

  • store

A detailed view of which nodes contain which shards. It shows whether each shard is a primary or replica, the number of docs, the bytes consumed on disk, and the node where it is located.

prirep: Whether this segment belongs to a primary or replica shard.
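The SHARDS columns above have the same names as the columns of the Elasticsearch `_cat/shards` API (index, shard, prirep, state, docs, store, ip, node). Whether the Swarm UI gathers its data through that API is an assumption. The sketch below shows one way to summarize that kind of whitespace-separated output by counting primary and replica shards per node. The helper name `shards_per_node` and the sample text are hypothetical, and relocating shards (which list two nodes) are not handled.

```python
from collections import defaultdict

# Column order assumed to match the SHARDS report.
COLUMNS = ["index", "shard", "prirep", "state", "docs", "store", "ip", "node"]


def shards_per_node(cat_shards_text):
    """Count primary ('p') and replica ('r') shards per node."""
    counts = defaultdict(lambda: {"primary": 0, "replica": 0})
    for line in cat_shards_text.splitlines():
        fields = line.split()
        if not fields or fields[:3] == COLUMNS[:3]:
            continue  # skip blank lines and an optional header row
        if len(fields) < len(COLUMNS):
            # Unassigned shards carry no docs/store/ip/node values.
            node = "UNASSIGNED"
        else:
            # Node names may contain spaces, so join the remaining fields.
            node = " ".join(fields[7:])
        kind = "primary" if fields[2] == "p" else "replica"
        counts[node][kind] += 1
    return dict(counts)


if __name__ == "__main__":
    # Hypothetical sample output, not taken from a real cluster.
    sample = """
    index_a 0 p STARTED 120 1.2mb 10.0.0.1 es-node-1
    index_a 0 r STARTED 120 1.2mb 10.0.0.2 es-node-2
    index_a 1 p STARTED 98 900kb 10.0.0.2 es-node-2
    index_a 1 r UNASSIGNED
    """
    for node, kinds in shards_per_node(sample).items():
        print(node, kinds)
```

A node that holds many more primaries than its peers, or any entries under UNASSIGNED, would be a natural place to start when the report shows a problem.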

Feeds Reports

The Data Feeds Reports show the number of processed events for each configured search or replication feed over time, providing insight into how busy each feed is. Status markers alert you to problems with the feed.

Tip

For quick access to the configuration details for a feed, click the gear icon in the top right of the chart.

...
