SwarmFS Troubleshooting

Required

To use ganesha_mgr for these troubleshooting steps, first install the RPM package nfs-ganesha-utils.
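For example, on a RHEL/CentOS-based system (assuming the yum package manager):

yum install nfs-ganesha-utils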

General Troubleshooting

Start with a basic check: can a client inside the ACL mount successfully?

mount server:/export/nfs /export/nfs

If not, or if a permission denied error is received, check the following (example commands for items 3-5 appear after the list):

  1. Is iptables allowing access through the firewall, if any?

  2. Is SELinux blocking access? (See the next section.)

    /usr/sbin/setroubleshootd
    grep setroubleshoot /var/log/messages
  3. Are the portmap and nfs services running?

  4. Can NFS statistics be viewed through nfsstat?

  5. Can exported file systems be viewed through exportfs?
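Items 3-5 can be checked as follows on a systemd-based host; this is a sketch, and the rpcbind and nfs-ganesha service names are assumptions that vary by distribution and install:

systemctl status rpcbind nfs-ganesha   # item 3: are the portmap/NFS services running?
nfsstat                                # item 4: NFS statistics
exportfs -v                            # item 5: exported file systems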

SELinux Status

By default, SELinux does not allow any access to remote content. Run this status command to verify SELinux is disabled:

sestatus

To run SwarmFS with SELinux enabled, enable one of these SELinux booleans:

  • nfs_export_all_ro - allows file systems to be exported read-only

  • nfs_export_all_rw  - allows file systems to be exported read-write

  • use_nfs_home_dirs - allows home directories to be exported over NFS

Set this with the setsebool utility:
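For example, to persistently enable read-write exports:

setsebool -P nfs_export_all_rw 1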

Persistent .nfsXXXX Files

Per POSIX standards, Ganesha does not physically delete files that are still open at the time they are unlinked. Instead, it hides them using a mechanism known as "silly rename": the unlinked files are kept in the same directory but renamed to the form .nfsXXXX (where XXXX is a random number). These files are cleaned up after the last application using them closes its file handles; if for some reason this does not occur, they may linger indefinitely.

To ensure no "silly" files persist and consume storage space, add a cron job that periodically finds and deletes them, as sketched below.
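A minimal sketch, assuming exports are mounted under the hypothetical path /export/nfs; adjust the path and age threshold to the environment:

# crontab entry: every night at 02:00, delete silly-renamed files untouched for 7+ days
0 2 * * * find /export/nfs -name '.nfs*' -type f -mtime +7 -delete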

Changing Logging Levels

SwarmFS logs to /var/log/ganesha.log by default. The logging level for SwarmFS defaults to NIV_EVENT to optimize read performance.

Find Level

Run the appropriate command to determine the current log level for the SwarmFS plugin or all Ganesha components:
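Assuming ganesha_mgr from nfs-ganesha-utils, and that COMPONENT_FSAL is the component covering the SwarmFS plugin, the commands would look like this:

ganesha_mgr get_log COMPONENT_FSAL   # SwarmFS plugin only
ganesha_mgr get_log COMPONENT_ALL    # all Ganesha components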

Change Level

Edit the /etc/sysconfig/ganesha file to change the logging level permanently. These are the supported levels:

  • NIV_EVENT - SwarmFS default, for best performance.

  • NIV_INFO - Prints all logs at this level and at the more severe levels: NIV_FATAL, NIV_MAJ, NIV_CRIT, NIV_WARN, and NIV_EVENT.

  • FULL_DEBUG - Enable for troubleshooting.
    Best Practice: Enable debug temporarily without restarting Ganesha using these commands:

  • Start Debug: Run the appropriate command to enable debug logging for the SwarmFS plugin or all Ganesha components (see the sketch after this list).

Note

COMPONENT_ALL is the default for components with no individual log level set.

  • Stop Debug: Run the appropriate command to turn off debug logging for the SwarmFS plugin or all Ganesha components (see the sketch below).
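Assuming the same ganesha_mgr utility and component names as in Find Level, the start and stop commands would look like this:

# Start debug logging
ganesha_mgr set_log COMPONENT_FSAL FULL_DEBUG   # SwarmFS plugin only
ganesha_mgr set_log COMPONENT_ALL FULL_DEBUG    # all Ganesha components

# Stop debug logging (return to the SwarmFS default level)
ganesha_mgr set_log COMPONENT_FSAL NIV_EVENT    # SwarmFS plugin only
ganesha_mgr set_log COMPONENT_ALL NIV_EVENT     # all Ganesha components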

Failure to Load Export Configuration

SwarmFS may not be loading the configuration if, after starting Ganesha, client root export mounts [mount {server}:/ {/mntpoint}] list only /bkt. To diagnose:

  1. Start Ganesha manually in the foreground (see the first sketch after these steps).

  2. Wait 20 seconds. Expect output similar to the following if all is working:

  3. Look for one set of Remove Export with id 1, Remove Export with id x, and Add Export entries for each of the configured exports. If these complete sets do not display, proceed to the next step.

  4. Verify SwarmFS can retrieve the central configuration:

  5. In the Swarm UI, navigate to Settings > NFS and locate the Configuration URL.

  6. Use cURL to verify the configuration file can be manually retrieved (see the second sketch after these steps).

  7. If the configuration file cannot be manually retrieved using cURL, resolve that issue and then restart Ganesha manually in the foreground to verify.
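For step 1, a sketch assuming the standard ganesha.nfsd binary and log paths, which may vary by install:

systemctl stop nfs-ganesha                        # stop any background instance first
/usr/bin/ganesha.nfsd -F -L /var/log/ganesha.log  # -F runs Ganesha in the foreground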
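For step 6, substitute the Configuration URL value copied from the Swarm UI in step 5:

curl -i "{Configuration URL}"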

Client Mounts with Empty Listings

Follow these steps if client mounts show empty listings (and matching content exists):

  1. Navigate to Settings > NFS and verify both the export details and the authentication in the Swarm UI.

  2. Verify the bucket can be accessed with cURL from the SwarmFS server, using the configured export details (see the first sketch after these steps).

  3. If the bucket can be accessed, verify Elasticsearch (as defined in the Search host(s) export field) can be accessed from the SwarmFS server (see the second sketch after these steps).
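For step 2, a sketch using hypothetical placeholders; substitute the host, bucket, domain, and credentials from the export configuration:

curl -i -u {user}:{password} "http://{host}/{bucket}?domain={domain}"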
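For step 3, a basic reachability check; port 9200 is the Elasticsearch default, and {search-host} is a placeholder for a host from the Search host(s) field:

curl -i "http://{search-host}:9200/_cluster/health"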

Listing Exports and Clients

Exports

To list active exports from the SwarmFS server, run the following command:
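Assuming ganesha_mgr from nfs-ganesha-utils:

ganesha_mgr show_exports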

Clients

To list active clients from the SwarmFS server, run the following command:
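Again assuming ganesha_mgr:

ganesha_mgr show_client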

Matching Requests between SwarmFS and Storage Logs

An implementation can have large numbers of unrelated parallel NFS requests. If, for troubleshooting, storage requests need to be traced back to individual SwarmFS files being read and written, enable verbose (DEBUG) logging and use these labels to trace requests through the logs:

  • request-type prefix

  • fileid

  • download/upload id

  • part number

Caution

Do not enable DEBUG logging any longer than necessary if exports are mounted directly on the SwarmFS server.

Missing Custom Header

If an expected custom header is missing from an object, the header may have an invalid name; SwarmFS silently skips malformed custom headers.

See https://perifery.atlassian.net/wiki/spaces/public/pages/2443822018 for the rules of custom header naming in Swarm Storage.

Users Lost Permissions

If after a few hours a user becomes unable to read or write files, despite having permissions, session authorization may need to be enabled in the SwarmFS exports.

For normal reads, writes, and attribute updates to go through session authorization, superuser access also needs to be set up; it is necessary for numerous operations:

  • Directory management (create, delete, rename)

  • File renaming

  • Certain :metadata writes

To enable session-specific authorization (SwarmFS 2.1 and higher), do the following:

  1. To create session authorization, configure the token admin credentials in NFS (user + password, or token).

  2. Verify one of the following:

    • In the User Credentials of the NFS export configuration, specify a user with full access granted by the applicable policy.

    • Verify the token admin has full access granted by the applicable policy.

Performance Issues

See also Optimizing Performance in https://perifery.atlassian.net/wiki/spaces/public/pages/2443810509.

Symptom: Gateway is overloaded and experiencing timeouts from excessive SwarmFS requests.

Actions: Reconfigure the client (such as Samba) to use larger blocksizes (buffers) to transfer data, such as 1 MB or higher. (NFS-785)

Symptom: Performance for larger files is lagging.

Actions: Increase the storage setting ec.segmentConsolidationFrequency to 100. (NFS-786) Check whether the storage cluster is nearly full, and add capacity; increasing this setting generates additional trapped space.

© DataCore Software Corporation. · https://www.datacore.com · All rights reserved.