We are excited to share that Meta has deployed the Arista 7700R4 Distributed Etherlink Switch (DES) for its latest Ethernet-based AI cluster. It's useful to reflect on how we arrived at this point and the strength of the partnership with Meta.
As I think about the evolution of the CloudVision® platform over the last 10 years, and our latest announcement today, I’m reminded of three principles that have guided us along our journey: full network data without compromise, platform over point product, and a modern operating model. While the product and our plans have evolved over the years, each of these principles feels incredibly relevant to the problems facing enterprises today.
In 1984, Sun famously declared, “The Network is the Computer.” Forty years later, we are seeing this cycle come true again with the advent of AI. The collective nature of AI training relies on a lossless, highly available network to seamlessly connect every GPU in the cluster and enable peak performance. Networks also connect trained AI models to end users and to other systems in the data center, such as storage, allowing the system to become more than the sum of its parts. As a result, data centers are evolving into new AI Centers where the network becomes the epicenter of AI management.
Enterprises are under pressure to meet the challenges of rapidly increasing bandwidth requirements driven by AR/VR (augmented reality/virtual reality) applications, streaming multimedia, IoT proliferation, video applications, and high-density deployments.
Back in the early 2000s, store-and-forward networking was used by market data providers, exchanges, and customers executing electronic trading applications, where the lowest-latency execution can turn a strategy from a profit to a loss. Moving closer to the exchange to reduce link latency, eliminating unnecessary network hops, placing all feed-handler and trading-execution servers on the same switch to minimize transit time, and leveraging high-performance 10Gb NICs with embedded FPGAs all contributed to the ongoing effort to squeeze out every last microsecond and gain a performance edge.
The perimeter of networks is changing and collapsing. In a zero trust network, no one and nothing, inside or outside the enterprise network, is trusted without verification or network access control (NAC). However, for years organizations have been saddled with bolt-on NAC technologies that add cost and complexity while failing to be effective. Instead, security-conscious organizations are shifting to a “microperimeter” enterprise that embeds security into the network infrastructure as the proactive way to defend today’s wider attack surface.
The evolution of WAN architectures has historically paralleled that of application architectures. When we primarily connected terminals to mainframes, the WAN architecture was largely point-to-point links connecting back to data center facilities. As traffic converged to remove OpEx-intensive parallel network structures, the WAN evolved to architectures that enabled site-to-site connectivity in a full mesh or configurable mesh and then enabled multi-tenancy for carrier cost optimization.
Whether it is something as simple as what kind of coffee to order for your commute to the office, which route to take to avoid traffic, or in my case, whether to support the USA or England in the 2022 World Cup group stage game, we all make a myriad of choices every day.
The rapid arrival of real-time gaming, virtual reality, and metaverse applications is changing the way network, compute, memory, and interconnect I/O interact for the next decade. As metaverse applications evolve, the network must adapt to tenfold growth in traffic, connecting hundreds of processors with trillions of transactions and gigabits of throughput. AI is becoming more meaningful as distributed applications push the envelope of predictable network scale and performance. A common characteristic of these AI workloads is that they are both data- and compute-intensive. A typical AI workload involves a large sparse matrix computation, distributed across tens or hundreds of processors (CPUs, GPUs, TPUs, etc.), with intense computation for a period of time. Once the data from all peers is received, it is reduced or merged with the local data, and then another cycle of processing begins.
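The compute-then-reduce cycle described above can be sketched in a few lines. This is a minimal, single-process simulation for illustration only, not a real distributed framework: the worker count, the toy `local_compute` function, and the element-wise sum reduction are all assumptions standing in for a sparse matrix computation and a collective all-reduce across GPUs.

```python
def local_compute(worker_id, step):
    """Each worker produces a partial result (stand-in for a sparse matrix op)."""
    return [(worker_id + 1) * (step + 1) * i for i in range(4)]

def all_reduce(partials):
    """Merge the data received from all peers (element-wise sum)."""
    return [sum(vals) for vals in zip(*partials)]

def training_loop(num_workers=4, num_steps=3):
    state = [0, 0, 0, 0]
    for step in range(num_steps):
        # Intense local computation on each processor...
        partials = [local_compute(w, step) for w in range(num_workers)]
        # ...then, once data from all peers is received, reduce it
        # and merge with the local state before the next cycle begins.
        reduced = all_reduce(partials)
        state = [s + r for s, r in zip(state, reduced)]
    return state
```

In a real cluster, the `all_reduce` step is a network-bound collective operation across every worker, which is why a lossless, high-bandwidth fabric directly gates how quickly each cycle can complete.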
As an industry leader in data-driven networking, Arista’s introduction of 400G platforms in 2019 intersected the emerging needs of hyper-scale cloud and HPC customers to dramatically increase bandwidth for specific ultra-high performance applications.
Arista’s EOS (Extensible Operating System) has been nurtured over the past decade on the best principles of extensible, open, and scalable networks. While SDN evangelists insisted that the right way to build networks started with decoupling network hardware from software, manipulated by a centralized, shared controller, many companies failed to deliver the core customer requisite: a clean software architecture and implementation coupled with key technical differentiation. That combination has been the essence of Arista EOS.
Arista has a long history of joint development with hyper-scale cloud providers, delivering innovative solutions for a broad range of customers. Our integration with Google Cloud and Network Connectivity Center is a testament to that ongoing innovation, abstracting complex networking challenges to make them simple and agile for IT clients worldwide.
Over a decade ago, we entered the high-speed switching market with our low-latency switches. Our fastest switch then, the 7124, could forward L2/L3 traffic in 500ns, a big improvement over store-and-forward switches with 10x higher latency. Combined with Arista EOS®, our products were well received by financial trading and HPC customers.
I don’t know about you but I am eagerly looking forward to the new year erasing all the negativity and losses that 2020 brought to our broader lives, health and the global economy. Today I digress to make some predictions on the post-pandemic era that are likely to change the way we live, learn, work and play, blending the lines between those distinct functions we had once partitioned.
Today we are introducing the Arista 750 Series Modular Campus switch, a next generation modular platform based on merchant silicon that delivers more performance, more security, more visibility and more power capabilities than any other product in its class.
The networking industry is undergoing a metamorphosis. Modern network operations teams are challenged to cope with multiple operational models. As attackers become ever better at breaching our defenses, security analysts are increasingly at the heart of a security organization. These operators are responsible for detecting, investigating, and remediating potential breaches before they progress into brand, customer, financial, and IP damage. This confluence of DevOps, NetOps, SecOps, and CloudOps demands persistent operational control. How do you cope with decades of security, threat, and cyber detection done in reactive silos? What happens as more workloads move to the cloud? At Arista, we value our ecosystem of security partners, and we believe networking must adapt to today’s new, complex threats.
Traditional networking has been transformed by cloud-networking principles. These principles drive an open, software-first approach to efficient automation, granular telemetry, and proactive analytics that has simplified traditional network operations. At Arista, we align our product strategy to these cloud networking principles and build our products based on modern software approaches. One such approach is the network-wide state and inference-driven architecture for managing networks with CloudVision. Arista’s strategic approach to automation, analytics, and change control has made CloudVision a favorite choice among our enterprise customers.
Arista has a decade-long history of collaboration in open networking. We have pushed the envelope, co-developed open platforms, and deployed them to build the world’s largest cloud-scale networks.
Just a decade ago, the public cloud titans Amazon Web Services and Microsoft Azure became synonymous with elastic scaling and software provisioning through APIs, a phenomenon that didn’t exist within closed legacy systems.
Private clouds, by contrast, emerged as enterprise customers recreated an infrastructure based on public cloud principles at a smaller scale. In an ideal world, both clouds would allow application developers to create applications and choose where to deploy them without trade-offs. Arista pioneered technology development in this cloud networking category, and today, with Covid-19 restrictions driving millions of users to work from home, there is tremendous pressure on network access and bandwidth.
CIOs today mandate a ‘Cloud First’ or a ‘Cloud Only’ model for new IT investments with three different cloud models.