Rate limiting of S3 listings

A Content Gateway patch implements experimental rate limiting of S3 listing requests to prevent overloading the Elasticsearch cluster.

Installation

The latest RPM will be provided in a support ticket, with a name like the one below. Install it on your current Content Gateway 7.10.7 server. There is no official release of Gateway 7.10.8; this patch uses that version number only to ease yum install and downgrade. The patch will likely be updated, and all configuration, APIs, and behavior are subject to change before the feature is available in an official release on an 8.0-based branch.

yum install caringo-gateway-7.10.8-0.CLOUD.3331.ratelimitlistings.7108.0003.noarch.rpm

Verify with yum list installed | grep caringo and curl http://GATEWAY/_admin/manage/version that the new version is installed and running.

To roll back, run yum downgrade caringo-gateway-7.10.7-1.noarch.rpm. This should not be necessary, because no rate limiting is enabled by default.

Configuration

The maximum number of concurrent delimiter listings and non-delimiter listings can be configured separately, since delimiter listings are usually the problem queries for Elasticsearch. Add the following to /etc/caringo/cloudgateway/gateway.cfg, then run systemctl restart cloudgateway.

[debug]
auditLogVersion = 4
# Requires a special pre-release rpm from DataCore Support:
# caringo-gateway-7.10.7-0.CLOUD.3331.ratelimitlistings.0005.noarch.rpm
# This Gateway will allow a maximum of 50 delimiter listings. Additional requests will
# wait 5 seconds before returning an S3 503 SlowDown response to the client.
# Up to 100 non-delimiter listing requests are allowed with additional requests waiting
# the default 10 seconds before returning a 503 SlowDown response to the client.
rateLimitListings = delimiter:50,5 nondelimiter:100
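To make the value format in the example above concrete, here is a minimal Python sketch of how such a value could be read. It assumes the format is a space-separated list of type:limit[,waitSeconds] entries with a default wait of 10 seconds, as the comments in the example suggest. The function name is hypothetical and this is not the Gateway's actual parser.

```python
DEFAULT_WAIT_SECONDS = 10  # default wait described in the example's comments


def parse_rate_limit_listings(value):
    """Parse e.g. 'delimiter:50,5 nondelimiter:100' into {type: (limit, wait)}.

    Assumed format: space-separated 'type:limit[,waitSeconds]' entries.
    """
    limits = {}
    for entry in value.split():
        kind, _, spec = entry.partition(":")
        limit_text, _, wait_text = spec.partition(",")
        # Fall back to the default wait when no ',waitSeconds' part is given.
        wait = int(wait_text) if wait_text else DEFAULT_WAIT_SECONDS
        limits[kind] = (int(limit_text), wait)
    return limits


print(parse_rate_limit_listings("delimiter:50,5 nondelimiter:100"))
# prints {'delimiter': (50, 5), 'nondelimiter': (100, 10)}
```

Read this way, the example allows 50 concurrent delimiter listings with a 5 second wait and 100 concurrent non-delimiter listings with the default wait.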

Dynamic configuration

The rate limiting configuration can also be changed dynamically, without a Gateway restart, using the PUT APIs below.

Since all Gateways are independent, these requests must be issued to each Gateway server. The Gateway reverts to the rateLimitListings configuration in gateway.cfg on restart. For better control, consider having your load balancer direct all listing requests (a GET with query args like delimiter, prefix, max-keys, marker, or continuation-token) to one or two Gateways.
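The load balancer rule described above can be sketched as a small predicate. This Python example only checks the query args named in this article; a real load balancer rule would be written in the balancer's own configuration language and may need to match other arguments as well. The function name and the example URLs are illustrative assumptions.

```python
from urllib.parse import parse_qs, urlsplit

# Query args that mark a GET as a listing request, as named in this article.
LISTING_ARGS = {"delimiter", "prefix", "max-keys", "marker", "continuation-token"}


def classify_request(method, url):
    """Return 'delimiter', 'nondelimiter', or None for a non-listing request."""
    if method.upper() != "GET":
        return None
    # keep_blank_values=True so that 'prefix=' still counts as a listing arg.
    args = parse_qs(urlsplit(url).query, keep_blank_values=True)
    if not LISTING_ARGS.intersection(args):
        return None
    return "delimiter" if "delimiter" in args else "nondelimiter"


print(classify_request("GET", "http://GATEWAY/bucket?delimiter=/&prefix=photos/"))
print(classify_request("GET", "http://GATEWAY/bucket?prefix=photos/&max-keys=100"))
print(classify_request("GET", "http://GATEWAY/bucket/object.jpg"))
```

Requests classified as either listing type could be routed to the one or two Gateways that carry the rate limit configuration.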

For example, if the Elasticsearch cluster is having severe problems, you can prevent any further delimiter listings, with no wait time, and allow only 10 concurrent non-delimiter listings with:

curl -u dcadmin -X PUT http://GATEWAY/_admin/manage/_ratelimit/listings/delimiter/0,0
curl -u dcadmin -X PUT http://GATEWAY/_admin/manage/_ratelimit/listings/nondelimiter/10
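Because each Gateway must be updated separately, it can help to generate the same commands for every server. The following Python sketch builds the URLs in the form shown above and prints one curl command per Gateway rather than sending anything. The host names and the helper name are hypothetical placeholders.

```python
def ratelimit_url(gateway, kind, limit, wait_seconds=None):
    """Build the dynamic rate-limit URL, e.g. .../listings/delimiter/0,0."""
    if kind not in ("delimiter", "nondelimiter"):
        raise ValueError(f"unknown listing type: {kind}")
    # Omit the wait part to use the Gateway's default wait time.
    value = str(limit) if wait_seconds is None else f"{limit},{wait_seconds}"
    return f"http://{gateway}/_admin/manage/_ratelimit/listings/{kind}/{value}"


# Hypothetical host names; replace with your own Gateway servers.
GATEWAYS = ["gw1.example.com", "gw2.example.com"]

for gw in GATEWAYS:
    print("curl -u dcadmin -X PUT", ratelimit_url(gw, "delimiter", 0, 0))
    print("curl -u dcadmin -X PUT", ratelimit_url(gw, "nondelimiter", 10))
```

Printing the commands first lets you review them before running them against each server.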

This can be done on a running Gateway without affecting any other requests like object GET and PUT. Keep in mind that backup software and tools like rclone copy will likely stop if they are unable to list. Consider having your load balancer send requests in a critical domain to a separate Gateway configured with a higher limit.
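The behavior described in this article (a cap on concurrent listings, with extra requests waiting a bounded time before receiving a 503 SlowDown) can be pictured with a toy model. The Python sketch below uses a semaphore with a timeout. It is only an illustration under that assumption, not the Gateway's implementation, and the class name is hypothetical.

```python
import threading


class ListingLimiter:
    """Toy model: at most `limit` concurrent listings, bounded wait for a slot."""

    def __init__(self, limit, wait_seconds=10):
        self._slots = threading.BoundedSemaphore(limit)
        self._wait = wait_seconds

    def run(self, listing):
        """Run listing() if a slot frees up in time, else answer 503 SlowDown."""
        if not self._slots.acquire(timeout=self._wait):
            return 503, "SlowDown"
        try:
            return 200, listing()
        finally:
            self._slots.release()


# A limit of 0 with a wait of 0 rejects every listing immediately,
# like the delimiter/0,0 setting shown earlier.
blocked = ListingLimiter(0, 0)
print(blocked.run(lambda: "keys"))
```

Only requests that pass through the limiter are affected, which matches the note that object GET and PUT requests continue unchanged.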

You can DELETE any rate limiting configuration dynamically, to no longer rate limit listings, with:

curl -u dcadmin -X DELETE http://GATEWAY/_admin/manage/_ratelimit/listings/all

Monitoring

It’s good to tail cloudgateway_audit.log on the individual Gateways or on the centralized SCS/CSN syslog server to monitor the listing times and responses:

You can see the current rate limit configurations and see a list of active listing requests with:

The formatting of the list of listings needs to be improved.

© DataCore Software Corporation. · https://www.datacore.com · All rights reserved.