Changes for Storage 9.1

New Features

Automatic EC segment optimization: Because design constraints may require applications to write EC objects with excessive segmentation, the health processor can be configured to automatically consolidate these segments into an optimal number in the background.

The new feature is enabled by setting the ec.segmentConsolidationFrequency parameter. See Settings Reference.
Automatic consolidation is turned off by default and can be enabled through SNMP or the storage interface.

Enforce explicit domain/bucket creation: Some applications have incorrectly created context objects (domains and buckets) and used them as if they were normal objects (images, videos, documents, etc.). A new configuration setting safeguards against the inadvertent creation of domains and buckets by requiring applications to explicitly acknowledge that they are creating a context instead of a normal object. This is already a requirement when using the Content Gateway, and this change allows non-Gateway sites to use the same safeguard.

To require explicit context creation, the new setting scsp.requireExplicitContextCreate is set to True. See Settings Reference.
Although the setting to enforce explicit create is disabled by default, all sites are recommended to enable this feature to catch application development mistakes before they go into production. Sites where production applications are already incorrectly using contexts as normal objects can leave the default setting value to maintain backward compatibility.
When enforcement is enabled, applications must include the "Content-type: application/castorcontext" header when creating domain or bucket objects. See SCSP Headers.
Since the Content Gateway requires this application behavior, applications running at sites that use the Gateway already conform to this explicit context creation requirement.
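
The explicit acknowledgement described above can be sketched in Python as follows. This is a minimal illustration rather than an official client: the host name, the use of POST, and the URL layout (bucket name in the path, domain passed as a query argument) are assumptions made for the example. Only the Content-type value comes from this release note. The request is built but not sent.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

# Header value taken from the release note; everything else here is illustrative.
CONTEXT_CONTENT_TYPE = "application/castorcontext"


def build_context_create_request(base_url, domain, bucket=None):
    """Build (but do not send) a request that explicitly creates a context.

    With bucket=None the request targets the domain itself; otherwise it
    targets a bucket inside that domain. The URL layout is an assumption.
    """
    path = ("/" + quote(bucket, safe="")) if bucket else "/"
    url = base_url.rstrip("/") + path + "?" + urlencode({"domain": domain})
    return Request(
        url,
        data=b"",
        method="POST",
        headers={"Content-Type": CONTEXT_CONTENT_TYPE},
    )


# Hypothetical host and names, for illustration only.
req = build_context_create_request(
    "http://swarm.example.com", "example.com", bucket="photos"
)
print(req.get_method(), req.full_url)
print("Content-type:", req.get_header("Content-type"))
```

An application that omits this header while enforcement is enabled would be creating a normal object, which is exactly the mistake the setting is meant to catch.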

Proactive metrics pausing: To avoid running out of storage space in Elasticsearch, Swarm proactively pauses the writing of operational metrics data when free space in the Elasticsearch cluster is too low. Writing resumes automatically as soon as more space becomes available.

The Elasticsearch critical space threshold and monitoring frequency are controlled with the settings: metrics.diskUtilizationThreshold, metrics.diskUtilizationCheckInterval. See Settings Reference.
For more information on the configuration of Swarm metrics, see Installing Swarm Metrics.

Automatic computation of Content-MD5 on all requests: A new Swarm setting automatically adds the Content-MD5 header on all writes to improve compatibility between SCSP and S3 applications. Without this setting, SCSP applications must request the computation on a per-transaction basis, and an object written without a computed Content-MD5 can cause problems when content is shared with some S3 applications.

The automatic computation is enabled by setting scsp.autoContentMD5Computation to True. See Settings Reference.
To maximize cross-protocol application compatibility, most sites are recommended to enable the feature.
When enabled, SCSP clients are not required to use the "Expect: Content-MD5" request header on writes to have the Content-MD5 header computed. See Content-MD5 Checksums.
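
For reference, a client can compute the same checksum locally to compare against the value stored with the object. The sketch below assumes the standard Content-MD5 encoding (base64 of the binary MD5 digest, as defined in RFC 1864). The helper names and the auto_md5_enabled flag are hypothetical and only mirror the behavior described above.

```python
import base64
import hashlib


def content_md5(body: bytes) -> str:
    """Return the Content-MD5 value for a body: base64 of the raw MD5 digest."""
    return base64.b64encode(hashlib.md5(body).digest()).decode("ascii")


def headers_for_write(body: bytes, auto_md5_enabled: bool) -> dict:
    """Build request headers for a write.

    When the cluster computes Content-MD5 automatically, the client does not
    need to ask for it; otherwise it requests it with "Expect: Content-MD5".
    """
    headers = {"Content-Length": str(len(body))}
    if not auto_md5_enabled:
        headers["Expect"] = "Content-MD5"
    return headers


body = b"example payload"
print(content_md5(body))
print(headers_for_write(body, auto_md5_enabled=False))
print(headers_for_write(body, auto_md5_enabled=True))
```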

Composite Content-MD5 for multipart writes: Swarm computes and saves a composite MD5 header with an object to improve S3 multipart upload compatibility and to provide SCSP clients with end-to-end transfer validation with parallel writes. This complements the use of the Content-MD5 request header on the pieces of a multipart object so transmission integrity is validated by both parties during all phases of a multipart (parallel) write.

For S3 application clients, no changes are necessary. The Content-MD5 returned with the S3 Complete Multipart Upload response is calculated in the same manner as with AWS S3.
For SCSP application clients, a new Composite-Content-MD5 header is returned when completing a multipart write. See Validating a Multipart Write.
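
As an illustration of the AWS S3 convention mentioned above, the sketch below computes a composite checksum the way S3 derives a multipart ETag: the MD5 of the concatenated binary MD5 digests of the parts, followed by a dash and the part count. Whether the Composite-Content-MD5 header uses exactly this textual format is not stated here, so treat the output format as an assumption. The function name is hypothetical.

```python
import hashlib


def s3_style_composite_md5(parts):
    """Compute an S3-style composite MD5 over an ordered list of part bodies."""
    # Concatenate the raw (binary) MD5 digest of each part, in part order.
    digests = b"".join(hashlib.md5(part).digest() for part in parts)
    # Hash the concatenation and append the number of parts.
    return f"{hashlib.md5(digests).hexdigest()}-{len(parts)}"


parts = [b"first part ", b"second part"]
print(s3_style_composite_md5(parts))
```

Because the composite is built from per-part digests, a client that already validated each part with Content-MD5 can compute it without rereading the object data.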

Synchronous object replication across clusters: In some deployment scenarios, an application must ensure that critical content is replicated across independent clusters before completing a transaction. A new SCSP method allows an application to request the immediate remote replication of an object and to receive positive confirmation that the object exists in the remote cluster.

The new SCSP SEND method is used to request a target cluster immediately replicate an object from a source cluster. See SCSP SEND - legacy.
This request complements replication feeds, and the two can be used together. While a replication feed is asynchronous, applies to a group of objects, and continues to retry until objects are replicated successfully, the SEND method is synchronous, applies to one object, and has one opportunity to succeed or fail.
The SEND method is an administrative request and is not intended for use by end-users through the Gateway.

Upgrade Impacts

These items are changes to the product function that may require operational or development changes for integrated applications.

Impacts for 9.1

Disk UUIDs required for ESXi VMs - When using ESXi virtual machines as Swarm storage nodes, enable disk UUIDs (set disk.EnableUUID=TRUE for each VM).
Legacy authentication disabled by default - If the legacy Swarm Admin Console is used to create or list storage domains, or if applications rely on Swarm's native auth/auth, add the setting security.noauth = False to continue doing so. The native Swarm auth/auth feature is deprecated and is scheduled for removal after June 2017. (SWAR-7399, SWAR-7115)
Content Gateway Upgrade Required - For deployments that use the Content Gateway, upgrade to at least version 5.1.3 because of the enhancements to multipart COPY and keep-alive handling. Older versions of the Content Gateway are unable to perform S3 PUT Object - Copy operations until they are upgraded.
Headers Changes - Swarm replaces the existing error message header with a set of detailed ones. See SCSP Headers.
- New: castor-system-error-token, castor-system-error-text, castor-system-error-code
- Replaced: x-castor-meta-error-message
- New: composite-content-MD5
- Custom: Swarm has more restrictive naming requirements for custom metadata headers, which helps prevent future problems. For all types of custom metadata headers, valid characters are limited to letters, numbers, underscore, and dash (hyphen). See Custom Metadata Headers. (SWAR-6513)
Dots in Metadata Field Names - When reindexing objects for a feed to Elasticsearch 2.x, any dots (periods) in custom metadata field names are converted to underscores. For example, "x_foo_meta_2016.12" becomes "x_foo_meta_2016_12". This addresses the new Elasticsearch 2.x restriction that normal field names may not contain dots. (SWAR-7292)
Elasticsearch field name change - The "_timestamp" field name for custom metadata entries is changed to "timestamp". Any SCSP queries that refer to the "_timestamp" field must be updated to use the new name. (SWAR-7348)
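
The custom metadata naming rule and the dot conversion listed above can be checked in applications with a sketch like the following. The function names are hypothetical. The character set (letters, numbers, underscore, dash) and the dot-to-underscore mapping come from the impacts above, and the check covers only the characters, not any header prefix conventions.

```python
import re

# Letters, numbers, underscore, and dash (hyphen), per the naming rule above.
_VALID_NAME = re.compile(r"[A-Za-z0-9_-]+")


def is_valid_custom_metadata_name(name: str) -> bool:
    """Return True if every character in the header name is allowed."""
    return _VALID_NAME.fullmatch(name) is not None


def to_elasticsearch_field_name(name: str) -> str:
    """Mirror the reindexing behavior: dots become underscores."""
    return name.replace(".", "_")


print(is_valid_custom_metadata_name("x-foo-meta-2016_12"))
print(is_valid_custom_metadata_name("x-foo-meta-2016.12"))
print(to_elasticsearch_field_name("x_foo_meta_2016.12"))
```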

Additional Changes

These items are other changes and improvements including those that come from testing and user feedback.

OSS Updates
- The irqbalance daemon is updated to 1.1.0. (SWAR-7159)
- Intel Network Driver updates include igb 5.3.5.4, ixgbe 4.4.0-k, ixgbevf 3.2.2, bnx2x 1.712.30-0, hpsa 3.4.16-0, and mpt2sas/mpt3sas 13.100.00.00. (SWAR-7160)
- The Linux kernel is updated to 4.8.11. (SWAR-7126)
Improvements
- The User-Agent header on requests that result in 400, 500, or 501 response codes is logged. This helps administrators identify the client application generating the errant requests. (SWAR-6828)
- Swarm allows an unlimited number of parts in a multipart upload. This was previously restricted to a maximum of 10,000. (SWAR-6056)
- Statistics on various search query types and features are collected. These statistics are published via SNMP, REST, and the health report. (SWAR-6731)
- Chunked transfer-encoded writes are optimized so they are less likely to generate excessive trapped space requiring defragmentation. (SWAR-7239)
- Swarm supports the definition of multiple Elasticsearch host targets for Historical Metrics in the metrics.target setting. Additionally, a new alias, metrics.targets, exists for this setting. (SWAR-7085)
- The Metrics Curator supports definition of multiple Elasticsearch host targets for Historical Metrics. In the metrics.cfg file, the "host" parameter accepts a comma-delimited set of Elasticsearch server addresses. The "hosts" parameter is a synonym for the original "host" parameter. (SWAR-7063)
- Per-object feed errors are no longer logged at the critical severity, so they no longer change the node state to 'error'. Continue to monitor the Dashboard (or legacy console Feeds page) for blocked feeds. (SWAR-7282)
- Rename performance of erasure-coded objects within buckets is significantly improved. (SWAR-7271)
- Statistics on the number of read or write requests closed prematurely by clients are tracked. This can help administrators identify misbehaving client applications. (SWAR-6783)
- Swarm prevents cluster booting if the SNMP security administrator (read/write user) is not set properly in the configuration file. (SWAR-7137)
- Disk light identification is improved and automatically chooses the best identification method based on the hardware. The included methods are: sg_ses, MegaCli64, sas2ircu, sas3ircu, and activity-based. End user modification of the plugin is discouraged unless a controller/driver not covered by these tools is used. (SWAR-7280)
- The length of time that uncleared error messages continue to appear in the Swarm Admin Console is now configurable. The new setting, console.messageExpirationSeconds, defaults to two weeks. (SWAR-6202)
- A multipart COPY by part operates like a multipart write complete, which sends back an HTTP 202 response code and keep-alive characters to prevent client timeouts. (9.1.2: SWAR-7309)
Resolved Issues
- When using a replication feed that restricts certain storage domains from replication, the feed can incorrectly cause all context objects (domains, buckets), but not the objects contained within them, to be replicated to the remote cluster. If these empty contexts exist in a cluster, contact DataCore Support for assistance before deleting them. (SWAR-7100)
- A Swarm node reports the "ok" state before the volumes are mounted. (SWAR-7323)
- During creation of a Search Feed index in 9.0.1, a race condition caused the error "unsupported operand type(s) for *: 'NoneType' and 'int'". Swarm automatically retries affected objects, so the error can be ignored. (SWAR-7226)
- During rolling upgrade and rolling reboot involving 9.0.x releases, critical log messages reported attempts to contact rebooting nodes. (SWAR-7317)
- Dynamic changes to the SNMP password prevented subsequent successful reboot. (SWAR-7136)
- ElasticSearchMetricsPusher unexpected exception errors occur during the upgrade process when metrics data is being preserved. (SWAR-7294)
- Emergency defragmentation is performed on disks without sufficient trapped space to warrant the effort. (SWAR-7079)
- Failures in attempts to look up host IP addresses at boot time slowed the overall boot process. (SWAR-7269)
- The Linux kernel mpt2sas driver's prot_mask option is set to 1. This overrides the driver's default Protection Information (T10-PI) setting because it can cause problems in disk arrays containing non-PI-enabled disks. (SWAR-7372)
- An issue with the Metrics Curator is fixed whereby an incorrect configuration could lead to an invalid Elasticsearch mapping during initial setup. This situation is indicated by a persistent "Data unavailable" message on the interface together with HTTP POST "400" errors in the browser's JavaScript console. Contact DataCore Support for the corrective actions. (SWAR-7332)
- Completed FVR/ECR recoveries are not remembered for a retired or failed volume left in the cluster after a reboot. (SWAR-7326)
- Storage rebalancing can stop before reaching an optimal balance in some clusters after the addition of new capacity. (SWAR-7246)
- Interfaces reported some nodes as offline rather than in maintenance mode during a reboot of a multi-node chassis. (SWAR-7318)
- Swarm may not recognize a blank disk if several are hot-plugged in quick succession. (SWAR-7225)
- Corrected an issue in which disk encryption and hot-plug incorrectly mapped storage volumes in a chassis containing more than 10 disks. (SWAR-7381)
- Disk serial numbers and firmware information are not displayed for encrypted volumes. (SWAR-7223)
- Swarm Metrics did not publish SCSP 404 count statistics. (SWAR-7350)
- When an EC object is renamed and another object is present under the former name, the renamed object may have segments reclaimed by the health processor, resulting in data loss. (9.1.2: SWAR-7445)
- The newname query argument did not properly handle non-ASCII characters, resulting in a successful rename to the incorrect name. (9.1.2: SWAR-7408)