Using the s3cmd command-line tool with Content Gateway S3

The s3cmd command-line utility is a popular open-source tool (http://s3tools.org/s3cmd).

It has two main uses with Content Gateway:

  • Easy command-line syncing of files to and from a Swarm domain

  • Help with diagnosing and verifying a Content Gateway environment

Installing and Configuring s3cmd

The .s3cfg file configures the s3cmd utility so that it can access your Caringo Content Gateway domain. In this example, the domain you've created and want to access is mydomain.example.com and the Gateway S3 endpoint is running at 192.168.99.100:80.

Important: Your machine must be able to resolve the domain name to the Content Gateway S3 endpoint IP address. In a production environment this is handled by DNS (typically wildcard domain records); for local use of s3cmd you can simply edit your hosts file.
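
For reference, production wildcard DNS might look like the following BIND zone entries (illustrative values only; adapt to your DNS server):

  ; Route the domain and all virtual-hosted bucket names to the Gateway
  mydomain.example.com.    IN A 192.168.99.100
  *.mydomain.example.com.  IN A 192.168.99.100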

  1. Install s3cmd using OS X brew or Python pip. On Windows, install Python 2.7 and pip first; see the s3cmd README for details:

    sudo pip install s3cmd
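
    On OS X, Homebrew also packages s3cmd, so this works as an alternative (a sketch, assuming Homebrew is already installed):

    brew install s3cmd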
  2. Verify that s3cmd is version 1.5.2 or later:

    s3cmd --version
  3. Edit your /etc/hosts (or C:\Windows\System32\drivers\etc\hosts) file and add a mapping for your domain to your Content Gateway IP address.

          192.168.99.100 mydomain.example.com
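
    Before continuing, it is worth confirming that the name now resolves (on Windows use ping -n 1):

    ping -c 1 mydomain.example.com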

  4. Edit your ~/.s3cfg file and paste in all of the settings below. Note: if you do not increase the part size here, pass --multipart-chunk-size-mb=100 on the command line for s3cmd put/sync (see the example after step 8):

    # This should be your ~/.s3cfg file. It configures the s3cmd utility
    # to access your Swarm Content Gateway domain. 
    [default]
    access_key = {access-key-for-token}
    secret_key = {secret-key-for-token}
    # Must use default port 80 to avoid "S3 error: 403 (SignatureDoesNotMatch)".
    # Or you can use a custom S3 port if you configure V2 signatures below.
    host_base = mydomain.example.com:80
    host_bucket = mydomain.example.com:80
    # The format below may be needed by older s3cmd versions, but it requires wildcard DNS.
    #host_bucket = %(bucket)s.mydomain.example.com:80
    signature_v2 = True
    check_ssl_certificate = False
    use_https = False
    # Important for improving Swarm performance and reducing storage overhead!
    multipart_chunk_size_mb = 100

  5. Remember to replace "mydomain.example.com:80" everywhere with your actual Content Gateway domain and S3 port.
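
    For example, this one-liner rewrites every occurrence at once (a sketch assuming GNU sed; "storage.example.net" is a stand-in for your real domain):

    sed -i 's/mydomain\.example\.com:80/storage.example.net:80/g' ~/.s3cfg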

  6. Generate a new access key (token) via the Content Portal or a command-line curl, e.g.:

    # Create an S3 token that expires in 90 days; assumes the Gateway's SCSP port is 8081
    $ curl -v -u "caringoadmin" -X POST --data-binary "" -H "X-User-Secret-Key-Meta: secret" -H "X-User-Token-Expires-Meta: +90" "http://mydomain.example.com:8081/.TOKEN/"
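
    The token UUID comes back in the response; a sketch for capturing it, assuming the Gateway reports it in a "Token" response header (inspect the full -i output if yours differs):

    $ curl -si -u "caringoadmin" -X POST --data-binary "" -H "X-User-Secret-Key-Meta: secret" -H "X-User-Token-Expires-Meta: +90" "http://mydomain.example.com:8081/.TOKEN/" | grep -i '^Token:'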

  7. Set access_key to the 32-character token UUID and set secret_key to the secret string that was used.
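
    For example (illustrative values only; the access key here matches the sample signed URL in step 8, the secret matches step 6):

    access_key = 0e71169c9ab10b293bda2b454bf20c35
    secret_key = secret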

  8. You're now ready to use s3cmd to list and create buckets, and copy files in or out.

    # List all your buckets in the domain
    $ s3cmd ls ...

    # Problems connecting, signature mismatch? Show debug 
    # output to see exactly what's sent and returned.
    $ s3cmd ls -d 

    # Download all the files from your "images" bucket
    $ mkdir headshots && s3cmd get -r s3://images headshots

    # Generate a signed url that expires in an hour
    $ s3cmd signurl s3://mybucket/file.html +3600
    http://mybucket.mydomain.example.com:80/file.html?AWSAccessKeyId=0e71169c9ab10b293bda2b454bf20c35&Expires=1447998649&Signature=KKwTgl0x%2Fk96jaPzp60LQ97ozO0%3D
    The bucket can also be moved from the hostname into the path, as shown below. The command always outputs "http", but you can use "https" -- just make sure your front-end proxy routes requests carrying the "AWSAccessKeyId" query argument to the Content Gateway S3 port.
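
    # Path-style version of the same URL. With V2 signatures the canonicalized
    # resource already includes the bucket, so the same query string is
    # typically accepted (verify against your Gateway):
    http://mydomain.example.com:80/mybucket/file.html?AWSAccessKeyId=0e71169c9ab10b293bda2b454bf20c35&Expires=1447998649&Signature=KKwTgl0x%2Fk96jaPzp60LQ97ozO0%3D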

    # List S3 multipart uploads in progress that were begun in 2015 and delete them, including parts:
    $ s3cmd multipart s3://inbox | grep '^2015-' | sed 's/ /%20/g' | awk -F$'\t' '{print $2, $3}' | xargs -p -r -t -n 2 s3cmd abortmp
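
    # Upload with an explicit part size, as mentioned in step 4
    # ("bigfile.iso" and the "inbox" bucket are placeholders)
    $ s3cmd put --multipart-chunk-size-mb=100 bigfile.iso s3://inbox/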


The following S3 multipart / SCSP parallel-write requests rely on internal implementation details that are subject to change; they are intended for diagnostic use only.

CloudScaler 4.x (S3 Multipart)

# SCSP: list S3 multipart uploads in progress
$ curl -u "${myusername}" 'http://mydomain.example.com:8081/?content-type=application/caringo-multipart-id&fields=x-multipart-id,x-multipart-part-meta,X-Multipart-Content-Bucket-Meta,X-Multipart-Object-Meta,name,tmborn,etag,content-md5,content-type,X-Multipart-Content-type-Meta&stype=unnamed&format=json&sort=x-multipart-id-meta,x_multipart_part_meta'
...

{"content_type":"application/caringo-multipart-id", "name":"4bbc3b023f5d8e38d8da5064a9168d5d", "x_multipart_object_meta":"3076_20151017201832_mwi_9_3.iso", "hash":"4a66ed2e13c8a2b5e5165a288d8d02b2", "last_modified":"2015-11-17T18:18:33.898100Z", "x_multipart_content_type_meta":"application/octet-stream"},
...

# SCSP: list the uploaded parts for a specific "upload id":
$ curl -u "${myusername}" 'http://mydomain.example.com:8081/?x-multipart-id-meta=4bbc3b023f5d8e38d8da5064a9168d5d&fields=x-multipart-id-meta,x-multipart-part-meta,X-Multipart-Bucket-Meta,X-Multipart-Object-Meta,name,tmborn,etag,content-md5,content-type,X-Multipart-Content-type-Meta&stype=unnamed&format=json&sort=x-multipart-id-meta,x_multipart_part_meta&size=10000'
...

{"content_type":"application/caringo-multipart-part", "name":"97d528ebcb0545248ed57980f562a062", 
"x_multipart_id_meta":"4bbc3b023f5d8e38d8da5064a9168d5d", "x_multipart_part_meta":"02479", "x_multipart_bucket_meta":"inbox", "x_multipart_object_meta":"biglogs.tgz", "hash":"97d528ebcb0545248ed57980f562a062", "content_md5":"fD8MJjqMOwoUBNuSYz586A==", "last_modified":"2016-01-05T08:16:23.042100Z"},
... 

Swarm 9 (SCSP parallel write) / Gateway 5.x (SCSP parallel write and S3 Multipart)

# SCSP: list multipart uploads in progress (POST-initiated or PUT-initiated)
$ curl -i --location-trusted -H 'Host: mydomain.example.com' "http://${SWARM_ENDPOINT}/?stype=all&castor_system_partnumber=0&fields=context,name,tmborn,content-length,castor_system_uploadid,castor_system_partnumber&format=json&sort=tmborn:ASC"
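
# SCSP: a sketch for narrowing the listing to a single upload, assuming
# castor_system_uploadid works as a query filter like the x-multipart-id-meta
# filter shown earlier (UPLOAD_ID is a placeholder for an id from the listing)
$ curl -i --location-trusted -H 'Host: mydomain.example.com' "http://${SWARM_ENDPOINT}/?stype=all&castor_system_uploadid=UPLOAD_ID&fields=name,tmborn,content-length,castor_system_partnumber&format=json&sort=tmborn:ASC"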

# Query Elasticsearch directly to list the uploadIds of all uploads in progress,
# even if the initiating ("part 0") stream is missing. Replace ELASTICSEARCH and
# CARINGO-CLUSTER-NAME with your Elasticsearch host and Swarm index name.
$ curl -i -XPOST "http://ELASTICSEARCH:9200/CARINGO-CLUSTER-NAME/IMMUTABLE/_search?pretty" -d '{ "size" : 0, "aggregations" : { "castor_system_uploadid" : { "terms" : { "field" : "castor_system_uploadid" } } } }'

...

{
  "took" : 3,
  "timed_out" : false,
  ...
  "hits" : {
    "total" : 5645,
    ...
  },
  "aggregations" : {
    "castor_system_uploadid" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [ {
        "key" : "f8e96f441d2e32b57a8f3a3af84dc00ad7f9644799b818a158ad60b25abe3ac6d7f9644799b818a158ad60b25abe3ac60U",
        "doc_count" : 2048
      }, {
        "key" : "93e9937cf0b1e1e282b17d9b3c2fae301fe01052b949300bde8d1ed34c69507f1fe01052b949300bde8d1ed34c69507f0U",
        "doc_count" : 1863
      }, {
        "key" : "f207289dae46079bd182a9c3a41bb8993f10b199ea37143f3dcb1fa062a40d083f10b199ea37143f3dcb1fa062a40d081P",
        "doc_count" : 965
      }, {
        "key" : "f207289dae46079bd182a9c3a41bb899e7be48f920601b8c3f1a4f4ece5e7a3be7be48f920601b8c3f1a4f4ece5e7a3b1P",
        "doc_count" : 449
      }, {
        "key" : "0fb87a6d6c64af9db6e315ba76980da236afdcbe99ccff8f310637bede00b77c36afdcbe99ccff8f310637bede00b77c0U",
        "doc_count" : 289
      }, {
        "key" : "5539b3f8ad46a76b5f54a892c02e41032284fe283d4a8724597c58b1a34287de2284fe283d4a8724597c58b1a34287de1P",
        "doc_count" : 10
      ...
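
# To pull out just the upload IDs, pipe the same query through jq
# (a sketch; assumes jq is installed and uses the same placeholders as above)
$ curl -s -XPOST "http://ELASTICSEARCH:9200/CARINGO-CLUSTER-NAME/IMMUTABLE/_search" -d '{ "size" : 0, "aggregations" : { "castor_system_uploadid" : { "terms" : { "field" : "castor_system_uploadid" } } } }' | jq -r '.aggregations.castor_system_uploadid.buckets[].key'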
