TL;DR:
- Cisco is championing Ethernet as the foundation of AI networks.
- They lead the Ultra Ethernet Consortium to advance Ethernet for AI.
- Nexus 9000 switches form the core of Cisco’s AI infrastructure strategy.
- Key technologies like RoCEv2 and ECN enhance Nexus AI networking.
- Automation scripts simplify network configuration.
- Silicon One processors offer a unified solution for routing and switching.
- Cisco addresses 400G integration, power consumption, and sustainability.
- Nexus Dashboard provides real-time insights for data center operations.
Main AI News:
In the fast-paced realm of technological innovation, where data is the lifeblood of progress, Cisco is charting a visionary course towards a future where Ethernet reigns supreme as the bedrock of artificial intelligence (AI) networks. With a rich history of active contributions to Ethernet development within the IEEE and other industry forums, Cisco has now assumed a leading role as a core contributor to the Ultra Ethernet Consortium (UEC). This consortium is tirelessly focused on enhancing the physical, link, transport, and software layers of Ethernet to empower it with the robust capabilities necessary to support the ever-evolving landscape of AI infrastructures.
Thomas Scheibe, Vice President of Product Management at Cisco’s cloud networking division, Nexus & ACI product line, underlines the pressing need for organizations to leverage AI technology to unlock the value hidden within their colossal reservoirs of data. “Customers seek guidance on optimizing their networking infrastructure to accommodate the colossal clusters of GPUs and the surging data volumes they generate,” Scheibe affirms. In response, Cisco has meticulously devised a blueprint that illuminates how existing data center Ethernet networks can seamlessly accommodate AI workloads today.
Central to Cisco’s AI blueprint are the Nexus 9000 data center switches. These switches support up to 25.6 terabits per second of bandwidth per ASIC and deliver the low latency, congestion management mechanisms, and telemetry capabilities that AI and machine learning applications require. Supported by tools like Cisco Nexus Dashboard Insights and Nexus Dashboard Fabric Controller, the Nexus 9000 switches emerge as ideal platforms for constructing high-performance AI/ML network fabrics.
Key technologies behind Nexus AI networking include NX-OS support for Remote Direct Memory Access over Converged Ethernet version 2 (RoCEv2) and Explicit Congestion Notification (ECN). RoCEv2 enables direct data transfer between the memories of two devices without involving a server CPU, thus reducing latency and enhancing throughput. ECN, on the other hand, helps keep the Ethernet network lossless: switches mark packets when their queues begin to build, and the endpoints respond by reducing their sending rate before buffers overflow.
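To make the ECN behavior concrete, here is a minimal sketch of threshold-based marking of the kind commonly paired with RoCEv2. It is not Cisco's implementation: the function name and the default threshold values are hypothetical and chosen only for illustration. Below a minimum queue depth nothing is marked, above a maximum depth every packet is marked, and in between the marking probability rises linearly.

```python
def ecn_mark_probability(queue_kb, min_kb=150.0, max_kb=3000.0, max_prob=0.07):
    """Return the probability that a switch marks a packet as
    'congestion experienced' at the given egress queue depth.

    All names and default values here are illustrative assumptions,
    not settings taken from any Cisco documentation.
    """
    if min_kb >= max_kb:
        raise ValueError("min_kb must be smaller than max_kb")
    if queue_kb <= min_kb:
        # Queue is shallow: no congestion signal is sent.
        return 0.0
    if queue_kb >= max_kb:
        # Queue is deep: every ECN-capable packet is marked.
        return 1.0
    # Between the thresholds, marking ramps linearly from 0 up to max_prob.
    return max_prob * (queue_kb - min_kb) / (max_kb - min_kb)


if __name__ == "__main__":
    for depth in (100.0, 1575.0, 3500.0):
        print(depth, ecn_mark_probability(depth))
```

Senders that receive congestion notifications slow down, which is why marking early, while the queue is still small, helps avoid drops.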
Priority Flow Control (PFC) further enhances congestion management, including in Layer 3-based networks, by pausing traffic on a per-priority basis so that mission-critical AI workloads are not dropped. Together, these technologies let Ethernet networks prioritize vital workloads and protect AI tasks from packet loss even during network congestion.
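Priority Flow Control works per traffic class: when the receive buffer for one priority fills past a threshold, the switch asks its neighbor to pause that priority only, and it lifts the pause once the buffer drains. The sketch below models that pause/resume hysteresis for a single priority. The class name, method name, and threshold values are hypothetical, and real switches derive their thresholds from buffer size and link length.

```python
class PfcQueue:
    """Toy model of one PFC-enabled priority queue (illustrative only)."""

    def __init__(self, xoff_kb, xon_kb):
        if xon_kb >= xoff_kb:
            raise ValueError("xon_kb must be below xoff_kb")
        self.xoff_kb = xoff_kb  # depth at which a pause is requested
        self.xon_kb = xon_kb    # depth at which the pause is lifted
        self.paused = False

    def update(self, depth_kb):
        """Record a new buffer depth and return True while the upstream
        sender should stay paused for this priority."""
        if not self.paused and depth_kb >= self.xoff_kb:
            self.paused = True
        elif self.paused and depth_kb <= self.xon_kb:
            self.paused = False
        return self.paused


if __name__ == "__main__":
    queue = PfcQueue(xoff_kb=100, xon_kb=40)
    for depth in (50, 120, 60, 30):
        print(depth, queue.update(depth))
```

The gap between the two thresholds keeps the link from flapping between paused and unpaused when the buffer depth hovers near a single value.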
Cisco has gone the extra mile by furnishing customers with automation scripts, streamlining network configuration and fabric setup. Additionally, Nexus 9000 switches offer built-in telemetry capabilities for issue correlation and optimization of RoCEv2 transport, thus ensuring peak network performance.
Beyond the realm of Nexus 9000, Cisco unveils its high-end, programmable Silicon One processors tailored for large-scale AI/ML infrastructures in enterprises and hyperscale environments. These processors, such as the 5nm 51.2Tbps Silicon One G200 and 25.6Tbps G202, provide a single-chipset solution for routing and switching, eliminating the need for diverse silicon architectures.
At the heart of Silicon One is its support for advanced Ethernet features, including enhanced flow control and congestion management and avoidance. Moreover, its load-balancing capabilities and “packet-spraying” mechanisms distribute traffic evenly across links, minimizing congestion and latency. Hardware-based link-failure recovery keeps the network operating efficiently when a link goes down.
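The difference between conventional per-flow load balancing and packet spraying can be illustrated with a small simulation. This is a sketch under stated assumptions, not a description of Silicon One internals: the function names and flow labels are hypothetical, per-flow balancing is modeled as a CRC32 hash of the flow label, and spraying is modeled as simple round-robin across links.

```python
import zlib


def per_flow_hash(flows, n_links):
    """Pin every packet of a flow to one link chosen by hashing the flow label.

    `flows` maps a flow label to its packet count. Returns packets per link.
    """
    load = [0] * n_links
    for flow_id, n_packets in flows.items():
        link = zlib.crc32(flow_id.encode("utf-8")) % n_links
        load[link] += n_packets
    return load


def packet_spray(flows, n_links):
    """Send successive packets over successive links, regardless of flow."""
    load = [0] * n_links
    next_link = 0
    for n_packets in flows.values():
        for _ in range(n_packets):
            load[next_link] += 1
            next_link = (next_link + 1) % n_links
    return load


if __name__ == "__main__":
    # Two large "elephant" flows and one small flow (hypothetical labels).
    flows = {"gpu0->gpu1": 1000, "gpu2->gpu3": 1000, "gpu4->gpu5": 10}
    print("per-flow hash:", per_flow_hash(flows, 4))
    print("packet spray: ", packet_spray(flows, 4))
```

With a handful of large flows, hashing can leave some links idle while others carry an entire flow, whereas spraying keeps per-link load within one packet of even. The trade-off is that sprayed packets of one flow can arrive out of order, so the receiving side has to tolerate or restore ordering.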
Combining these enhanced Ethernet technologies results in the creation of a “Scheduled Fabric,” where all physical components collaborate seamlessly, offering optimal scheduling behavior and vastly improved bandwidth throughput, particularly for AI/ML workloads.
While AI dominates the conversation, data center network operators face a trio of formidable challenges: the efficient integration of 400G networks, reducing power consumption, and bolstering sustainability practices. To address these concerns, Cisco offers solutions such as the Network Energy Utilization service within Nexus Cloud, providing insights into environmental impact. Furthermore, the Nexus Dashboard delivers real-time and historical power consumption data, empowering data centers to make informed, eco-conscious decisions.
In summation, Cisco’s unwavering commitment to evolving Ethernet into the linchpin of AI networks reflects a strategic vision that not only meets today’s demands but also anticipates the future needs of organizations striving for excellence in the realm of artificial intelligence.
Conclusion:
Cisco’s strategic focus on enhancing Ethernet for AI networks signifies a pivotal shift in the market. By providing advanced technologies and comprehensive solutions, Cisco is poised to empower organizations to harness the full potential of AI while optimizing network performance and sustainability. This strategic vision positions Cisco as a leader in shaping the future of AI infrastructure.