
...

  • k:p: Defines the encoding, where:

    • k (data segments) drives the footprint: An EC object's data footprint in the cluster is approximately size * (k+p)/k.

    • p (parity segments) sets the protection: Choose the protection level needed, two or higher; p=2 and p=3 are most common.

    • k+p (total segments) is the count of segments: The original object can still be reconstructed if up to p of these segments are lost.

  • Manifests: Segments are tracked in a manifest, which is itself protected with p+1 replicas, distributed across the cluster.

  • Sets of Sets: Very large EC objects (or incrementally written objects) are broken up into multiple EC sets because any segment that's over the size limit triggers another level of EC. Each set has its own k:p encoding, and the overall request combines them all in sequence.

See Elastic Content Protection with Erasure Coding
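
As a rough illustration of the footprint and manifest rules above, the following sketch computes the approximate data footprint and the manifest replica count for a k:p encoding. The function names (`ec_footprint`, `manifest_replicas`) are hypothetical helpers for illustration, not part of Swarm, and the result ignores any manifest or metadata overhead.

```python
def ec_footprint(size_bytes: int, k: int, p: int) -> float:
    """Approximate data footprint of an EC object: size * (k + p) / k."""
    if k < 1 or p < 1:
        raise ValueError("k and p must be positive integers")
    return size_bytes * (k + p) / k


def manifest_replicas(p: int) -> int:
    """Number of manifest replicas for p parity segments: p + 1."""
    if p < 1:
        raise ValueError("p must be a positive integer")
    return p + 1


if __name__ == "__main__":
    size = 1_000_000_000  # illustrative 1 GB object
    print(ec_footprint(size, k=5, p=2))  # 1400000000.0
    print(manifest_replicas(2))          # 3
```

With 5:2 encoding, a 1 GB object occupies roughly 1.4 GB of segment data, compared with 3 GB for three full replicas.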

How Many Nodes are Needed?

...

| EC Profile | Formula | Example: 5:2 | Notes |
|---|---|---|---|
| Manifest minimum | p+1 | 2 + 1 = 3 | Basic requirement for storing manifests. |
| Segment minimum | ceil((k+p)/p) | ceil((5 + 2) / 2) = 4 | Objects can be read (but not written) if one node is lost or offline. With 5:2, four nodes allow a 2+2+2+1 segment distribution because Swarm allows two segments per node. |
| Recommended protection | ceil((k+p)/p + p) | ceil((5 + 2) / 2 + 2) = 6 | Objects can be read and written if one node is lost or offline. |
| High protection | k+p | 5 + 2 = 7 | Objects can be read and written even if two entire nodes are lost or offline. |
| High performance | (k+p)*2 | (5 + 2) × 2 = 14 | Recommended for best performance and load distribution (load-balancing becomes easier as clusters expand). |

Info: The ceiling (ceil) is the smallest integer that is greater than or equal to the result.
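
The formulas above can be evaluated for any encoding with a short script. This is a minimal sketch: the function name `ec_node_counts` and the dictionary keys are hypothetical labels chosen to mirror the EC profiles listed above, not Swarm settings.

```python
import math


def ec_node_counts(k: int, p: int) -> dict:
    """Node counts per EC profile for a k:p encoding, using the formulas above."""
    if k < 1 or p < 1:
        raise ValueError("k and p must be positive integers")
    total = k + p  # total segments per EC set
    return {
        "manifest_minimum": p + 1,
        "segment_minimum": math.ceil(total / p),
        "recommended_protection": math.ceil(total / p + p),
        "high_protection": total,
        "high_performance": total * 2,
    }


if __name__ == "__main__":
    for profile, nodes in ec_node_counts(5, 2).items():
        print(f"{profile}: {nodes}")
```

For 5:2 this reproduces the example values: 3, 4, 6, 7, and 14 nodes.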

...

  1. Good-Enough Encoding: Do not over-protect. The more nodes that are involved, the more constraints an EC write must satisfy to succeed, and the more overhead is created.

    • Keeping k+p small reduces the overhead of EC writes.

    • Keeping k small reduces the overhead of EC reads.

  2. Consistent Scaling: As a rule of thumb, when scaling a cluster that uses erasure coding, add one additional node for every ceil((k+p)/p)+1 nodes.

  3. Faster Nodes: As a rule, an EC read/write is limited by the slowest node, and there is a significant constant expense to set up connections.

  4. More Nodes: Having more nodes in the cluster than needed for an encoding allows the cluster to better load-balance.

...