Flow Distribution Across ECMP Paths
ECMP is crucial for scaling and performance in modern data centers and wide-area networks, which rely on hash-based path selection. It leverages path diversity and keeps a flow’s packets on the same path, preventing reordering with useful properties like stateless operation and no reordering.
While simple and widely used, ECMP has some limitations. For example, it does not always distribute traffic evenly across all available paths. However, due to its ease of hardware implementation, ECMP remains the predominant approach. The core enabler for ECMP is hashing, which allows packet-by-packet path selection in a distributed manner across switches. ECMP limitations have also started getting more attention with the surge in building GPU clusters but fabrics suffer from Poor hashing due to a lack of flow entropy.
In this post, we’ll dive into ECMP and use statistical analysis to better understand the limitations.
Introduction
Here is a simplified explanation of how the lookup process functions. We aim to perform a prefix lookup that directs us to a specific ECMP Group listed in the ECMP group table. Each of these ECMP groups contains ECMP member counts for the ECMP group. A hash function takes Packet fields i.e. our typical five tuple (Source Continue reading