SCSP Proxy 8.2

SCSP Proxy Essentials

The Swarm SCSP Proxy accepts HTTP requests from a host network, forwards them to a Swarm cluster, handles redirects transparently, and then supplies the response back to the requestor. In many deployments, a Swarm cluster may be isolated on an internal network, protecting it from undesired interaction with the host network, and also protecting the host network from services like PXE boot and multicast traffic that can interfere with other network resources.

This reverse proxy can serve several other purposes as well, both in production deployments and in test environments. The Swarm SCSP Proxy supports the following types of interactions:

  • Basic SCSP Proxy for a local cluster

  • Remote cluster coordination and communication

  • Validation of incoming client requests for proper syntax and formatting

In a local cluster deployment, the SCSP Proxy handles all local SCSP traffic and manages the associated communication with the local Swarm cluster. The SCSP Proxy listens for any inbound SCSP communications on the configured port and determines which Swarm node to send the initial request to.

For optimal performance, the SCSP Proxy caches open connections to Swarm for reuse.

Getting a List of Swarm Cluster IPs

The SCSP Proxy intercepts GET / and HEAD / requests and responds with information it stores internally about itself and about the cluster for which it serves as a forward proxy. Only GET / and HEAD / requests result in this type of special handling by the SCSP Proxy.

Query arguments are not processed, even when the resource is empty. A request with a query argument is forwarded as-is to the cluster and receives no special processing by the SCSP Proxy.

The SCSP Proxy responds to a GET / request from a client running in a private network with its list of Swarm IP addresses, along with data and metadata describing its Swarm cluster. You can prevent the SCSP Proxy from returning cluster IP addresses using the configuration parameter reportHosts.

Scsp-Proxy-Cluster Request Header

The optional Scsp-Proxy-Cluster: cluster-name request header, if included with a request, causes the SCSP Proxy to first compare the case-sensitive cluster-name against the SCSP Proxy's own configured cluster name. If the names do not match, no node IP addresses are returned in the response.

If Scsp-Proxy-Cluster is not present, the request uses the configured cluster name and the list of node IP addresses reflects the currently known node IPs for that cluster.
In either case, the response metadata includes a response header with the same name containing the cluster name used by the SCSP Proxy.
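A minimal Python sketch of a client that sends HEAD / with the optional Scsp-Proxy-Cluster header. Only the header name and its behavior come from the description above; the function names, host, port, and timeout are hypothetical.

```python
import http.client


def build_discovery_headers(cluster_name=None):
    """Return request headers for GET / or HEAD / against the SCSP Proxy.

    Scsp-Proxy-Cluster is optional. When present, the proxy compares it
    (case-sensitively) with its own configured cluster name.
    """
    headers = {}
    if cluster_name is not None:
        headers["Scsp-Proxy-Cluster"] = cluster_name
    return headers


def head_root(proxy_host, proxy_port=80, cluster_name=None, timeout=10):
    """Send HEAD / and return (status, cluster name echoed by the proxy)."""
    conn = http.client.HTTPConnection(proxy_host, proxy_port, timeout=timeout)
    try:
        conn.request("HEAD", "/", headers=build_discovery_headers(cluster_name))
        response = conn.getresponse()
        response.read()  # HEAD has no body; this completes the exchange
        return response.status, response.getheader("Scsp-Proxy-Cluster")
    finally:
        conn.close()
```

Comparing the echoed header with the name the client expected is one way to detect that the request reached a proxy for a different cluster.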

Response Headers

These are the response headers and their meanings:

Response

Description

Scsp-Proxy-Cluster

Cluster name. It is the value of scsp.clusterName in /etc/caringo/scspproxy/scspproxy.cfg if the cluster is configured on a CSN.

Scsp-Proxy-Nodes

ASCII string count of the number of node IP addresses returned in the body of the response. If this count is zero, an additional reason is supplied to explain why there are no nodes returned. More information about the zero-node response is discussed following the table.

Scsp-Proxy-Agent

SCSP Proxy and its software version.

Obtained by polling any available node on the local cluster. These headers are provided to maintain consistency with current SCSP Proxy and Swarm responses.

Note: These headers are returned if a cluster can be contacted.

The following response indicates there are 10 nodes in the cluster:

If the response indicates zero nodes in the cluster, it is accompanied by a reason, as follows:

These are the reason codes you can receive for why the node list is empty:

Reason code

Meaning

no-nodes

The SCSP Proxy reports no nodes when the cluster is off-line or is being rebooted.

bad-subnet

The request was made from a subnet from which the node IP addresses are not routable.
The SCSP Proxy determines whether or not a request originated from the same private subnet as its cluster and responds only if the request did originate in that subnet. The SCSP Proxy does this to help prevent malicious discovery about the cluster and because private IP addresses are not routable anyway.

bad-cluster

The cluster name supplied in the Scsp-Proxy-Cluster header did not match the currently configured cluster.

disabled

Indicates the SCSP Proxy's reportHosts configuration parameter is set to False.
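A client can turn these reason codes into actionable messages. The sketch below is illustrative only: the wording of each message paraphrases the table, and the retry policy is an assumption about sensible client behavior, not something the SCSP Proxy defines.

```python
# Reason codes the SCSP Proxy can report when the node list is empty,
# paraphrased from the table above.
EMPTY_LIST_REASONS = {
    "no-nodes": "cluster is off-line or being rebooted",
    "bad-subnet": "request came from a subnet that cannot route to the node IPs",
    "bad-cluster": "Scsp-Proxy-Cluster header did not match the configured cluster",
    "disabled": "reportHosts is set to False on the proxy",
}


def explain_empty_list(reason_code):
    """Return a human-readable explanation for an empty node list."""
    return EMPTY_LIST_REASONS.get(
        reason_code, "unrecognized reason code: " + reason_code
    )


def should_retry(reason_code):
    """Assumed client policy: only a rebooting or off-line cluster is transient.

    The other reasons reflect configuration or network placement, so
    repeating the same request is unlikely to change the result.
    """
    return reason_code == "no-nodes"
```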

Response Body

The body of the response to a GET / is a text/plain (Content-Type) list of IP addresses of the Swarm nodes local to that SCSP Proxy, one address per CRLF-terminated line.

The Content-Length header is calculated from the number of bytes in this list of CRLF-terminated IPs. If the list is empty, the Scsp-Proxy-Nodes header indicates count=0 and the Content-Length header is zero.

The body of the response to a HEAD / is always empty, although the Content-Length header indicates the total number of response bytes that would have been returned if the HEAD had been a GET. Additionally, the Scsp-Proxy-Nodes header indicates the number of nodes returned and, if this count is zero, provides the reason for the empty list exactly as it does for GET /.
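The body format lends itself to a very small parser. The following sketch assumes the body has already been read as bytes and that the node count has been extracted from the Scsp-Proxy-Nodes header by the caller (the exact header syntax is not reproduced here). The function names are hypothetical.

```python
def parse_node_list(body):
    """Split a GET / response body into a list of node IP address strings.

    The body is text/plain with one address per CRLF-terminated line.
    """
    text = body.decode("ascii")
    return [line for line in text.split("\r\n") if line]


def body_is_consistent(body, content_length, node_count):
    """Cross-check the body against Content-Length and the reported count."""
    return len(body) == content_length and len(parse_node_list(body)) == node_count
```

An empty body with a Content-Length of zero and a count of zero is consistent under this check, which matches the empty-list case described above.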

About Expect: 100-continue Behavior

If the initial request includes an Expect: 100-continue header, the SCSP Proxy waits until 100-continue is received before reading the input stream, and it does not rewind, seek, or reset on a WRITE, APPEND, or UPDATE for small streams. If the initial request does not include an Expect: 100-continue header, the SCSP Proxy buffers input from the client but attempts to stall the client's input if the buffer reaches a limit. If the request content length is greater than 128K, the SCSP Proxy adds Expect: 100-continue to the request it sends to the local cluster and handles Swarm's 100 response.
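The size threshold can be expressed as a one-line predicate. This is only a sketch of the rule as stated; it assumes that "128K" means 128 KiB (131072 bytes), and the names are hypothetical.

```python
# Assumption: "128K" is interpreted as 128 KiB.
EXPECT_THRESHOLD_BYTES = 128 * 1024


def proxy_adds_expect(content_length):
    """Return True when the proxy would add Expect: 100-continue upstream.

    Per the description, this happens only when the request content
    length is greater than the threshold.
    """
    return content_length > EXPECT_THRESHOLD_BYTES
```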

About Location Headers

To properly route subsequent requests, the SCSP Proxy returns one Location response header with its own external IP address and discards any other Location headers. For Replicate on Write requests, all Location response headers are rewritten and only the one generated Location header is returned.
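The header rewrite can be pictured as a filter over the cluster's response headers. In this sketch the shape of the generated Location value (an http URL built from the proxy address and the request path) is an assumption; the text above states only that the header carries the proxy's own external IP address.

```python
def rewrite_location_headers(headers, proxy_address, path):
    """Drop every Location header and append one proxy-generated header.

    headers is a list of (name, value) tuples from the cluster response.
    proxy_address is the proxy's external "ip:port" (hypothetical format).
    """
    kept = [(name, value) for name, value in headers if name.lower() != "location"]
    kept.append(("Location", "http://%s%s" % (proxy_address, path)))
    return kept
```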

Remote Cluster Communication and Coordination

Developers who need to interact with multiple Swarm clusters, either independently or in multi-cluster requests, can use the SCSP Proxy for remote cluster communication and can even coordinate requests between more than one cluster at a time.

To use the following syntax, the SCSP Proxy must be configured with a list of all known remote clusters. The following syntax is valid only with the SCSP Proxy; sending requests formatted as follows directly to a Swarm cluster results in a 404 (Not Found) error because Swarm attempts to resolve it as a path to a named object.

Syntax

Description

Sends a request for an object, referenced by UUID or by name, to a specific cluster-name.

Queries all configured clusters (remote and local) for a particular object, determines the current version, and returns that object to the client.

any is valid only for remote INFO requests and results in an error if used with any other method.
any causes a request to be sent for an object, referenced by UUID or by name, to any available cluster (local or remote).
The information is returned if the object exists in the local cluster. The request is sent to each remote cluster in random order if this condition is not met. The error response from the local server is returned if no cluster is able to locate the data.

remote is valid only for remote INFO requests and results in an error if used with any other method.
remote causes a request for an object to be sent, referenced by UUID or by name, to a remote cluster.
The information is returned from the first cluster that has the object. The error response received from the first remote cluster attempted is returned if the object is not found in the remote clusters.

Deprecated

Support for remote SCSP Proxy requests without the /_proxy prefix is deprecated and will be removed altogether in a future release.

Examples of using /_proxy Syntax

As shown in the examples below, you must separate the URI from the domain specification using the forward slash character /. In all cases, except where noted, the domain name must be passed as the Host in the request.

AggregateInfo of domain on either a local or remote cluster

Local cluster:

Remote cluster:



HEAD of a domain on either a remote or any cluster

Remote cluster:

Any cluster:

Note
The domain= query argument is required when executing a method (in this case, HEAD) on a domain.

POST of an unnamed object to the remote default cluster domain

Note
Because you are working with the remote default cluster domain, the domain name does not need to be sent as the Host in the request. For unnamed objects, POST authentication is supported only in the default cluster domain.

Remote synchronous write POST of an unnamed object to the remote default cluster domain, which has POST authentication enabled

For more information about remote synchronous write, see the next section.

Remote Synchronous Writes and Updates

Remote Synchronous Write enables you to write or update a copy of the same stream both locally and remotely as part of the same request.

A Remote Synchronous Write first writes two copies of the object to the local cluster.

  • Local success — If the local write succeeds, the SCSP Proxy writes the updated object to the specified remote cluster. This request is authenticated using the Swarm administrator credentials specified in the configuration of the remote cluster.

  • Local failure — If the local write fails for any reason, the error response is returned to the requestor and the operation is abandoned.

All query arguments except alias=yes as well as the Expect: Content-MD5 header are stripped from the remote write to simplify the response coordination between the two clusters.

  • Remote success — If the remote write is also successful, a 201 (Created) response is returned to the requestor.

  • Remote failure — If the remote write fails for any reason, a 202 (Accepted) response is returned, indicating that only the local write was successful and that remote replication can occur at a later time using Content Router (if enabled).
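The decision flow above can be summarized in a few lines of Python. This is a model of the documented outcomes, not the proxy's implementation; the function names are hypothetical, and treating any 2xx status as success is an assumption.

```python
from urllib.parse import parse_qsl, urlencode


def remote_sync_write_status(local_status, remote_status=None):
    """Return the status the requestor sees for a Remote Synchronous Write."""
    if not 200 <= local_status < 300:
        # Local failure: the error is returned and the operation is abandoned.
        return local_status
    if remote_status is not None and 200 <= remote_status < 300:
        return 201  # both writes succeeded
    return 202  # only the local write succeeded


def remote_query(query_string):
    """Keep only alias=yes; every other query argument is stripped."""
    pairs = parse_qsl(query_string, keep_blank_values=True)
    return urlencode([pair for pair in pairs if pair == ("alias", "yes")])
```

A client that receives 202 can treat the object as safely stored locally while knowing the remote copy is not yet confirmed.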

Remote Aggregate Info

The SCSP Proxy has an AggregateInfo method to validate that a set of content exists in a cluster. Aggregate Info can be issued against a local cluster, but it is usually used to validate remote replication.

You determine the desired data set by first creating a "consistency checkpoint" using the following format, terminated with CRLF:

For unnamed anchor objects, you must use the mutable parameter. (The default, with no parameter specified, is immutable.) All named objects are assumed to be mutable so no argument is required.

You must supply a list of either URL-encoded names or UUIDs. (Use percent encoding for object names, if needed.) The name, UUID, or the consistency checkpoint, is stored as a Swarm object.
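The percent-encoding and CRLF requirements can be sketched as follows. This builds only the list of entries; it does not produce any per-line parameter such as mutable, whose exact syntax is not shown here. Leaving "/" unescaped in names is an assumption, and the function name and sample entries are hypothetical.

```python
from urllib.parse import quote


def encode_checkpoint_entries(entries):
    """Percent-encode each name or UUID and terminate every line with CRLF.

    UUIDs contain only hexadecimal characters, so quoting leaves them
    unchanged; names have spaces and other special characters escaped.
    """
    lines = [quote(entry, safe="/") for entry in entries]
    return "".join(line + "\r\n" for line in lines).encode("ascii")
```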

Info requests in the AggregateInfo method are issued for each name or UUID in the consistency checkpoint and either object metadata or an error response is returned in the concatenated response body. Similar to the individual Info method, the response for a successful AggregateInfo method execution is a 200 code.

The following is a sample checkpoint manifest stream. All object names must be URL-encoded, so you must use percent encoding to escape special characters in named objects, including space. All line terminators must be CRLF:

The checkpoint manifest should be stored in the local cluster and its name or UUID used to issue the AggregateInfo request. The AggregateInfo method supports named and unnamed objects for both the manifests and the streams listed in the manifest. It supports authentication for the manifest stream itself; checkpoint streams in the manifest that are protected for HEAD are returned as 401s in the AggregateInfo response body.

Any additional query arguments and headers included with the AggregateInfo method apply to the GET request issued for the checkpoint manifest only and are not included in the individual Info requests for each name or UUID in the manifest. If the checkpoint manifest is stored in an unnamed anchor stream, the AggregateInfo method must be called with alias=yes in its queryArg dictionary.

AggregateInfo uses Aggregate-Stream-Count as its trailer header.

AggregateInfo does not support Expect: Content-MD5 and Range headers, or any integrity seal query arguments.

Aggregate Info Response

For each line of the checkpoint manifest, the SCSP Proxy uses chunked encoding to send, in the AggregateInfo response body, either a parse error for the line or the line's name or UUID followed by Swarm's verbatim response to the Info query for it. A CRLF line-end sequence separates the UUID line from the Info response. The following representative response body is returned for a manifest that includes two UUIDs, both of which the SDK was able to Info successfully:

If the first line does not have a valid UUID, the SCSP Proxy responds with a single error message in the response body stating that the checkpoint stream is incorrectly formatted and stops processing.

Installing the SCSP Proxy

The Swarm SCSP Proxy installs as an RPM and is managed like other RPM packages. The installation and base configuration are performed automatically as part of CSN setup if the Swarm SCSP Proxy is being used on a CSN.

Linux Requirements

SCSP Proxy has been developed and tested with the English versions of 64-bit Red Hat Enterprise Linux (RHEL) or CentOS 6.x. Other versions or Linux distributions, including languages other than English, are not currently supported. The following section assumes a RHEL Linux environment.

Red Hat does not support in-place upgrades between major versions, so to use RHEL 6.x for an existing installation running RHEL 5.x, see the Red Hat documentation.

Important

Before starting the SCSP Proxy installation, verify the RHEL installation is running the correct version.

About monit

The SCSP Proxy installation script installs the watchdog process monit, which performs a variety of functions, including starting and restarting SCSP Proxy services if they fail unexpectedly. If monit is not already installed on your system, SCSP Proxy installs and configures it automatically.

If monit is already installed on your system, your configuration files are backed up before being modified. If necessary, monit is upgraded to the version provided with SCSP Proxy.

When monit is installed or upgraded, the following is added to your existing configuration:

  • Changes /etc/monit.conf by adding the following:

  • Saves your original /etc/monit.conf as /etc/monit.conf.orig

  • Installs a product-specific configuration file under /etc/monit.d. The filename ends with .monitrc

  • With RHEL 6: Adds /etc/init/monit.conf to enable the upstart service to monitor monit so it runs when the machine restarts.

Extracting the SCSP Proxy Package

Extract the .zip file to a directory on the SCSP Proxy's local file system before the SCSP Proxy is installed. In the remainder of this section, this directory is referred to as the SCSP Proxy-unzip-dir.

After you extract the .zip file, there is one subdirectory named caringo-scspproxy-version-brand that has several subdirectories.

Installing or Upgrading SCSP Proxy Software

To install or upgrade the SCSP Proxy, log in as a user with root privileges on the CSN, if you use one, or else the RHEL server, if you have no CSN. Run the following commands in the order shown:

After completing an upgrade, you must restart the Proxy.

Warning

Do not upgrade the SCSP Proxy using standard Red Hat packaging tools like yum: upgrading with these tools can result in losing configuration data.

Configuring the SCSP Proxy

You can configure the SCSP Proxy either by using the Cluster Services Node (CSN) console or by editing the scspproxy.cfg and hosts.cfg files. This section details the second method.

After the SCSP Proxy is installed, you must copy the sample files and then modify them using a text editor. Run the following commands as a user with root privileges to create and edit the files in the default location:

You must configure the interface and the port that the SCSP Proxy listens on, logging, connection pool settings, the port and host list as well as the name for the local cluster, and any remote clusters.

Configuring scspproxy.cfg

To configure the SCSP Proxy when you are not using CSN, edit /etc/caringo/scspproxy/scspproxy.cfg.

The configuration file is divided into 5 sections: [proxy], [log], [connectionpool], [scsp], and [remote]. The following table discusses all configuration parameters:

Option Name

Default

Description

[proxy] interface

none

Required. The IP address of the external interface of the SCSP Proxy machine (e.g. 192.168.0.1).

[proxy] port

80

Required. The SCSP Proxy listen port. The port must be on the SCSP Proxy machine's external interface.

[proxy] reportHosts

True

Enables or disables the ability to detect the addresses of cluster nodes. Valid case-sensitive values are True and False.

[proxy] validationMode

False

Whether or not the SCSP Proxy should be running in validation mode or sending received requests to the Swarm cluster.

 

False

Determines whether or not the SCSP Proxy returns the IP addresses of Swarm nodes located in a different subnet.

  • False (default) means SCSP Proxy returns IP address of Swarm nodes only if the nodes are in the same subnet as the SCSP Proxy.

  • True means all Swarm IP addresses are returned, even if those nodes are in a subnet different from the SCSP Proxy.

 

localhost

Syslog server's fully qualified host name or IP address (e.g. 192.168.0.2). Comment out this parameter if you are using file-based logging instead of syslog.

 

514

The port number of the syslog server to send log messages to

 

stdout

Local log file name if you are not using syslog. To use file-based logging, the value of host must be null or the parameter must be absent.

 

0

The number of bytes allowed for all file-based log files. The default value, 0, means the log file size is unlimited. Comment out this parameter if you use remote logging.

 

40

One of the following log levels, listed from most to least verbose:

  • 10 = Debug (Request data, locator, and connection pool)

  • 20 = Info (Request, and response tracing at a header level)

  • 30 = Warn (Run-time warnings)

  • 40 = Error (System-level errors (failed to get IP address, port, host name, and so on for an existing connection), protocol errors (missing Host header, invalid body data received), configuration errors (bad remote host definition, invalid SCSP Proxy configuration), dependency failures (such as Avahi), caught exceptions, connection pool errors, internal state watching errors)

Note: Each log level includes all information from the less verbose levels. For example, Debug includes all information in Info, Warn, and Error.

 

0

The log facility (that is, channel) to log to when using syslog. Valid values are 0-6.

 

localhost

The IP address to use in log message headers.

 

200

The number of connections the SCSP Proxy saves for reuse. DataCore recommends you set this parameter to five times the number of Swarm cluster nodes. Valid values are 200 (minimum) to 4000 (maximum).

 

60

The length of time, in seconds, to wait for a connection to be made or maintained. If there is no response for longer than the specified timeout, the SCSP Proxy closes the connection. This timeout may need to be increased for unusually long operations, like a COPY operation on a large object.

 

300

The length of time, in seconds, an unused connection remains in the connection pool

 

80

The local Swarm cluster listen port. The value of the cluster's scspport configuration parameter in the node or cluster configuration file.

Note: This value must match the other SCSP Proxy's [proxy] port configuration parameter if this SCSP Proxy is configured to communicate with another SCSP Proxy.

 

none

Used by the StaticLocator (starting the SCSP Proxy with --staticlocator). If you use the SCSP Proxy this way, set the value of this parameter to a space-separated list of IP addresses of hosts in the local Swarm cluster, for example: “192.168.0.11 192.168.0.12”
Otherwise, comment this parameter out.

 

none

The name of the local Swarm cluster for use in discovering node IP addresses. This must match the configured cluster parameter in the cluster's node or cluster configuration file.
If the SCSP Proxy is installed on a Cluster Services Node (CSN), the cluster parameter is set during the initial network bootstrap process:
cluster.example.com

 

/etc/caringo/scspproxy/hosts.cfg

The path to the location of the hosts.cfg file on the local server.
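As a sketch of how the documented [proxy] options might be sanity-checked before starting the service, the following uses Python's configparser. It assumes that scspproxy.cfg uses INI-style "name = value" lines under bracketed section headers; the sample text and function name are hypothetical and cover only options whose names appear in the table.

```python
import configparser

# Hypothetical sample covering only the [proxy] options named in the table.
SAMPLE = """\
[proxy]
interface = 192.168.0.1
port = 80
reportHosts = True
validationMode = False
"""


def check_proxy_section(text):
    """Return a list of problems found in the [proxy] section."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep option names case-sensitive
    parser.read_string(text)
    if not parser.has_section("proxy"):
        return ["missing [proxy] section"]
    proxy = parser["proxy"]
    problems = []
    if not proxy.get("interface"):
        problems.append("interface is required")
    port = proxy.get("port")
    if port is None:
        problems.append("port is required")
    elif not (port.isdigit() and 0 < int(port) < 65536):
        problems.append("port must be an integer between 1 and 65535")
    report_hosts = proxy.get("reportHosts")
    if report_hosts is not None and report_hosts not in ("True", "False"):
        problems.append("reportHosts must be True or False (case-sensitive)")
    return problems
```

Calling check_proxy_section(SAMPLE) returns an empty list; changing reportHosts to a lowercase true yields one problem, reflecting the case-sensitive values described above.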

The following is an example of the scspproxy.cfg file:

Configuring hosts.cfg

The file hosts.cfg controls how the SCSP Proxy communicates with remote proxies or clusters. It consists of a list of all known remote clusters, each configured with the following values on a single line per cluster:

Option Name

Description

ClusterName

A string that identifies the remote cluster or SCSP Proxy. You must use this name in code that sends SCSP commands to the remote cluster. You can set ClusterName to either the cluster name or DNS name for the remote cluster but that is not required. (The remote cluster's name is specified by the value of the cluster.name configuration parameter in the node or cluster configuration file.)

ClusterName cannot contain whitespace and cannot be a 32-character hexadecimal string (to avoid confusion with a UUID).

RemoteAddress

The IP address for any node in the remote cluster. Cannot contain whitespace.

Note: If this SCSP Proxy is being configured to communicate with another SCSP Proxy, this value must match the value of the other SCSP Proxy's [proxy] interface parameter in its scspproxy.cfg configuration file.

Port

The port on which the remote SCSP Proxy listens for incoming requests. This is usually port 80. Cannot contain leading or trailing whitespace. The value of RemotePort must match the value of the scspport configuration parameter of the remote cluster.

Note: If this SCSP Proxy is being configured to communicate with another SCSP Proxy, this value must match the value of the other SCSP Proxy's [proxy] port parameter in its scspproxy.cfg configuration file.

RemoteAdminName

The name of an administrator that belongs to the CAStor administrators group for the remote Swarm cluster. This value cannot contain whitespace.

RemoteAdminPassword

The administrator's password. This option can contain whitespace as long as it is not leading or trailing.
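The per-field constraints in this table can be checked before a line is written to hosts.cfg. The sketch below validates already-separated field values only; it makes no assumption about how the fields are delimited within a line. The function name is hypothetical, and requiring the port to be all digits is an assumption.

```python
import re

_UUID_LIKE = re.compile(r"^[0-9a-fA-F]{32}$")
_WHITESPACE = re.compile(r"\s")


def validate_remote_cluster(cluster_name, address, port, admin_name, admin_password):
    """Return a list of constraint violations for one remote cluster entry.

    All arguments are strings, as they would appear in the file.
    """
    problems = []
    if _WHITESPACE.search(cluster_name):
        problems.append("ClusterName cannot contain whitespace")
    if _UUID_LIKE.match(cluster_name):
        problems.append("ClusterName cannot be a 32-character hexadecimal string")
    if _WHITESPACE.search(address):
        problems.append("RemoteAddress cannot contain whitespace")
    if not port.isdigit():
        problems.append("Port must be a number without surrounding whitespace")
    if _WHITESPACE.search(admin_name):
        problems.append("RemoteAdminName cannot contain whitespace")
    if admin_password != admin_password.strip():
        problems.append("RemoteAdminPassword cannot have leading or trailing whitespace")
    return problems
```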

The following is an example of the hosts.cfg file:

Running the SCSP Proxy

After the configuration file has been updated, start and stop the SCSP Proxy in either of the ways discussed below. You must run these commands as a user with root privileges.

  • If the SCSP Proxy is installed on a CSN, or if it is installed on RHEL and you do not want to use optional startup options, use the start/stop scripts:

  • Start the SCSP Proxy using proxyservice.py to use startup options (see table). Run the command from the directory where the install file was unzipped (or from any location the configured Python points to):

proxyservice.py startup option

Meaning

Start the SCSP Proxy and use the static locator.

Access the SCSP Proxy configuration file from the specified location.

Specify the location of the SCSP Proxy service Process Identifier (PID) file.
The default location is /var/run/scspproxy.pid.

Tip — To validate if the SCSP Proxy is running in front of a Swarm cluster, point your browser to the SCSP Proxy's IP address and port. If you see a Swarm status page, the SCSP Proxy is working correctly.

Validation Mode

In addition to the basic validation included in the Swarm SDK, the SCSP Proxy includes a validation mode that provides validation for incoming SCSP requests. This validation checks the syntax for common operations like adding query arguments and creating lifepoint headers. The SCSP Proxy does not discern between clients created using the Swarm SDK and those created without it.

Validation mode and execution mode are mutually exclusive. While in validation mode, requests are not sent to Swarm but are instead analyzed by the SCSP Proxy, which generates the response. If a request fails validation, the SCSP Proxy returns an error describing the reason for the failed validation.

These are the response codes in validation mode:

  • 200 for GET, HEAD, and DELETE

  • 201 for POST, PUT, COPY, and APPEND

Errors are returned as one of the following codes:

  • 400 for any validation error for a known method

  • 50x is returned by the SDK for general request errors such as bad host name, unknown method, HTTP syntax errors, and so on.

A client may return other response codes for these types of errors.
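A test harness that exercises validation mode can encode the expected status per method. This sketch covers only the known methods listed above; the function name is hypothetical, and raising an exception for unknown methods is a design choice of the sketch (the text states that those cases produce 50x errors).

```python
# Success codes returned in validation mode, per the list above.
VALIDATION_SUCCESS = {
    "GET": 200, "HEAD": 200, "DELETE": 200,
    "POST": 201, "PUT": 201, "COPY": 201, "APPEND": 201,
}


def expected_validation_status(method, passes_validation):
    """Return the status a known method should receive in validation mode."""
    if method not in VALIDATION_SUCCESS:
        raise ValueError("unknown method: " + method)
    return VALIDATION_SUCCESS[method] if passes_validation else 400
```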

Outside of validation mode, invalid headers or query arguments are either silently ignored or trigger an error at the Swarm server layer that is returned to the client as an SCSP error.

To enable validation mode in the SCSP Proxy, change the value of the validationMode parameter in the scspproxy.cfg file to True.

The following items are currently checked by the SCSP Proxy while in validation mode. Validation mode supports validation of lifepoint time inputs in RFC 1123 format.

(V = checked for that method; - = not checked)

Validation                     GET  HEAD  POST  DELETE  PUT  COPY  APPEND
Header: Lifepoint (date)       -    -     V     -       V    V     -
Header: Lifepoint (deletion)   -    -     V     -       V    V     -
Header: replica count          -    -     V     -       V    V     -
Header: Host                   V    V     V     V       V    V     V
Header: Content-Length         -    -     V     -       V    -     V
Header: Allow                  -    -     V     -       V    V     -
Header: Content-Type           -    -     V     -       V    V     -
Header: Content-Disposition    -    -     V     -       V    V     -
Query Argument: replicate      -    -     V     -       V    V     V
Query Argument: admin          V    V     V     V       V    V     V
Query Argument: alias          V    V     V     V       V    V     V
Query Argument: domain         V    V     V     V       V    V     V
Query Argument: validate       V    -     -     -       -    -     -
Query Argument: hashtype       V    -     V     -       V    V     V
Query Argument: hash           V    -     -     -       -    -     -
Query Argument: newhashtype    V    -     -     -       -    -     -
Query Argument: countreps      -    V     -     -       -    -     -
Query Argument: indirect       V    -     -     -       -    -     -
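For client-side tooling, the validation matrix above can be held as a lookup table. The sketch below transcribes the table into Python; the data structure and function name are hypothetical and are not part of the SCSP Proxy.

```python
ALL = frozenset(["GET", "HEAD", "POST", "DELETE", "PUT", "COPY", "APPEND"])
WRITES = frozenset(["POST", "PUT", "COPY"])

# Each check maps to the set of methods for which it is validated.
CHECKS = {
    "Header: Lifepoint (date)": WRITES,
    "Header: Lifepoint (deletion)": WRITES,
    "Header: replica count": WRITES,
    "Header: Host": ALL,
    "Header: Content-Length": frozenset(["POST", "PUT", "APPEND"]),
    "Header: Allow": WRITES,
    "Header: Content-Type": WRITES,
    "Header: Content-Disposition": WRITES,
    "Query Argument: replicate": WRITES | {"APPEND"},
    "Query Argument: admin": ALL,
    "Query Argument: alias": ALL,
    "Query Argument: domain": ALL,
    "Query Argument: validate": frozenset(["GET"]),
    "Query Argument: hashtype": frozenset(["GET", "POST", "PUT", "COPY", "APPEND"]),
    "Query Argument: hash": frozenset(["GET"]),
    "Query Argument: newhashtype": frozenset(["GET"]),
    "Query Argument: countreps": frozenset(["HEAD"]),
    "Query Argument: indirect": frozenset(["GET"]),
}


def checks_for(method):
    """Return the sorted names of the checks applied to the given method."""
    return sorted(name for name, methods in CHECKS.items() if method in methods)
```

For example, checks_for("DELETE") lists only the Host header and the admin, alias, and domain query arguments.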

Uninstalling the SCSP Proxy

Enter the following command as a user with root privileges to uninstall the SCSP Proxy:

© DataCore Software Corporation. · https://www.datacore.com · All rights reserved.