Configuring Swarm for Gateway

This section provides information specific to running Swarm Storage with Gateway. Install and configure Swarm, the storage cluster (storage nodes running on dedicated hardware), before proceeding.

Network Placement

When deployed with Gateway, the storage nodes should be placed on a network subnet not directly accessible to client applications. All user communications with the storage cluster must go through the Gateway.

Caution

If users are allowed to communicate directly with the storage cluster nodes, they may bypass access security, the business rules for content metadata, and audit logging performed by the Gateway and may render content in the cluster unusable to the Gateway.

Only allow direct access to the storage cluster nodes under highly controlled circumstances, such as administrator-only operations or trusted applications.

Domain Management

The Swarm cluster provides for logical separation of content among multiple tenants through the use of storage domain names. Gateway has the following requirements beyond those for a baseline storage deployment and client usage.

  • An administrative domain must be created in the storage cluster.

  • Storage domains must adhere to IANA naming standards (valid DNS names).

  • Client applications should specify a storage domain in every request (if not, the request goes to the default domain, with enforceTenancy=True).
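As a rough illustration of the "valid DNS names" requirement above, the following Python sketch checks a candidate storage domain name against the usual host name rules (labels of 1 to 63 letters, digits, or hyphens, no leading or trailing hyphen, at most 253 characters overall). The helper name is_valid_storage_domain is hypothetical and is not part of Swarm or Gateway; the product may apply additional rules.

```python
import re

# One DNS label: 1-63 characters drawn from letters, digits, and hyphens,
# with no hyphen at the start or end of the label.
_LABEL = re.compile(r"^(?!-)[A-Za-z0-9-]{1,63}(?<!-)$")


def is_valid_storage_domain(name: str) -> bool:
    """Return True if name looks like a valid DNS host name.

    Hypothetical helper for illustration only; not a Swarm or Gateway API.
    """
    if name.endswith("."):
        name = name[:-1]  # tolerate a single trailing dot (fully qualified form)
    if not name or len(name) > 253:
        return False
    return all(_LABEL.match(label) for label in name.split("."))
```

A name such as oxygen.cloud.example.com passes this check, while a name containing an underscore or an empty label does not.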

The storage domain name for an operation is specified by the client application according to the following precedence from highest to lowest:

  • SCSP  domain=X query argument

  • HTTP  X-Forwarded-Host header

  • HTTP  Host header
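The precedence order above can be sketched in Python as follows. This is only an illustration of the ordering, not the Gateway's implementation: the function name resolve_storage_domain is hypothetical, and stripping a trailing port from the header value is an assumption made for the example.

```python
from typing import Mapping, Optional
from urllib.parse import parse_qs, urlsplit


def resolve_storage_domain(url: str, headers: Mapping[str, str]) -> Optional[str]:
    """Pick the storage domain for a request using the documented precedence.

    Hypothetical helper: query argument first, then X-Forwarded-Host, then Host.
    """
    # 1. SCSP domain=X query argument has the highest precedence.
    query = parse_qs(urlsplit(url).query)
    if query.get("domain"):
        return query["domain"][0]

    # HTTP header names are case-insensitive, so normalize them once.
    lowered = {key.lower(): value for key, value in headers.items()}

    # 2. X-Forwarded-Host, then 3. Host.
    for header in ("x-forwarded-host", "host"):
        value = lowered.get(header, "").strip()
        if value:
            # Assumption: drop an optional ":port" suffix (IPv6 literals not handled).
            return value.split(":")[0]
    return None
```

For example, a request to /bucket/object?domain=a.example.com with a Host header of b.example.com resolves to a.example.com, because the query argument outranks both headers.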

Storage domains in Swarm must resolve to at least one IP address ("A" record) for client applications to make use of the Host header to identify the storage domain with most HTTP/1.1 libraries. The resolved IP address should be for a Gateway or, if applicable, some other front-end network appliance such as a load balancer. DNS round-robin across multiple IP addresses is a valid configuration when there are multiple Gateway servers.

The following example BIND 9 zone file implements a wildcard for all storage domains within the cloud.example.com parent DNS domain and points them to the IP address 10.100.100.100.

    $TTL 600
    @ IN SOA cloud.example.com. dnsadmin.example.com. (
            2016070201 ; Serial number
            4H         ; Refresh every 4 hours
            1H         ; Retry every hour
            2W         ; Expire after 2 weeks
            300 )      ; nxdomain negative cache time of 5 minutes
      IN NS ns1.example.com.
    * IN A 10.100.100.100

In the example zone file, 10.100.100.100 is the IP address used by client applications to communicate with the Gateway or a front-end load balancer. The names hydrogen2.cloud.example.com and oxygen.cloud.example.com both resolve to the same IP address.
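One way to confirm that a wildcard record like this behaves as expected is to resolve a few storage domain names and compare the result with the Gateway address. The sketch below is a minimal example using Python's standard socket.gethostbyname; the function name check_domains_resolve is hypothetical, and the resolver is passed in as a parameter so that it can be replaced in tests. Note that with DNS round-robin a name may return any one of several addresses, so a single expected address is a simplification.

```python
import socket
from typing import Callable, Dict, Iterable


def check_domains_resolve(
    names: Iterable[str],
    expected_ip: str,
    resolver: Callable[[str], str] = socket.gethostbyname,
) -> Dict[str, bool]:
    """Map each name to True if it resolves to expected_ip, else False.

    Hypothetical helper for illustration; resolution failures count as False.
    """
    results = {}
    for name in names:
        try:
            results[name] = resolver(name) == expected_ip
        except OSError:
            # socket.gaierror (name not found, no DNS) is a subclass of OSError.
            results[name] = False
    return results
```

With the example zone file in place, calling check_domains_resolve(["hydrogen2.cloud.example.com", "oxygen.cloud.example.com"], "10.100.100.100") against that DNS server should report True for both names.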

Elasticsearch Servers

When using the S3 storage protocol, the metadata search service must be accessible to the Gateway servers.

Like the storage nodes, the Elasticsearch servers in a Gateway deployment are typically placed on a network subnet not directly accessible to client applications. No API calls made directly to the metadata search service are supported for end users.

Listing Consistency

Search feeds show eventual consistency as content changes, but enabling the [s3] option enhancedListingConsistency (see https://perifery.atlassian.net/wiki/spaces/public/pages/2443810201) improves the search-after-create response to client applications using the Gateway.

Configuration Requirements

Use these Swarm configuration settings and adhere to the following operational changes when using Swarm Storage with Gateway. These configuration changes refer to the configuration file(s) for Swarm.

  • CSN: This is the cluster-wide file: /var/opt/caringo/netboot/content/cluster.cfg

  • Platform Server: This is the cluster-wide file used to deploy, which is located by default here: /etc/caringo/cluster.cfg

  • No Platform Server: This is the node-specific configuration file: node.cfg

Caution

Failure to use these settings and operational changes can prevent Gateway from working properly with the storage cluster.

Requirement

Description

Optimize GETs 

With Swarm 12.0 and higher, a setting can be added to improve performance through Gateway. Enable scsp.enableVolumeRedirects to permit Gateway to redirect GET requests to volume processes. These redirects increase efficiency, especially when reading small objects.

scsp.enableVolumeRedirects = True

Enable an EC encoding

S3 multipart (large file) writes fail if erasure coding is not configured; define an ecEncoding if using S3.

policy.ecEncoding = {k:p}

See https://perifery.atlassian.net/wiki/spaces/public/pages/2443812150

Clear legacy settings

Unless needed for backwards compatibility (because untenanted objects are used in the cluster and S3 is not needed), enable tenancy for unnamed objects, which verifies that every object is written to a domain (see https://perifery.atlassian.net/wiki/spaces/public/pages/2443821769). If enforceTenancy was set to False, set it to True and reboot the cluster.

Storage Domain Management

Only create and manage storage domains through the Content UI or programmatically through the Gateway's management API.

If storage domain management displays in the legacy Admin Console (port 90), which is not supported by Content Gateway, the cluster configuration contains security.noauth=False. Set it to True and reboot the cluster.

Troubleshooting: If the Content UI reports "Page Not Found: The original bucket to which this collection refers cannot be found or has been replaced", it is likely the domain was created by the legacy Admin Console (port 90) and contains the legacy Castor-Authorization header. Contact DataCore Support for help correcting the domain.
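The settings in the table above can be spot-checked with a short script. The Python sketch below is only an illustration under stated assumptions: it assumes the configuration file consists of simple "name = value" lines as shown in this section and that lines beginning with # are comments, and the function names parse_settings and check_gateway_settings are hypothetical. It does not validate the ecEncoding value itself, only that one is defined.

```python
from typing import Dict, List


def parse_settings(text: str) -> Dict[str, str]:
    """Parse 'name = value' lines into a dict (assumed file format)."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, assumed '#' comments, and anything without '='.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def check_gateway_settings(text: str, uses_s3: bool = True) -> List[str]:
    """Return a list of findings for the settings named in this section."""
    settings = parse_settings(text)
    problems = []
    if settings.get("scsp.enableVolumeRedirects", "").lower() != "true":
        problems.append("scsp.enableVolumeRedirects is not True (Swarm 12.0 and higher)")
    if uses_s3 and not settings.get("policy.ecEncoding"):
        problems.append("policy.ecEncoding is not defined; S3 multipart writes fail")
    if settings.get("security.noauth", "").lower() == "false":
        problems.append("security.noauth is False; set it to True and reboot the cluster")
    return problems
```

An empty list means none of the checked findings apply. Reading the file contents (for example, the cluster.cfg or node.cfg listed earlier) and passing the text to check_gateway_settings is left to the caller.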



© DataCore Software Corporation. · https://www.datacore.com · All rights reserved.