Joerg Hiller
Apr 11, 2025 10:56
AMD unveils its Pensando AI NICs, promising scalable AI infrastructure with high performance and flexibility, meeting the demands of next-gen AI workloads.
In a significant move to bolster AI infrastructure, AMD has announced the release of its Pensando Pollara 400 AI NICs, designed to meet the growing demands of AI and machine learning workloads. According to AMD, the new AI network interface cards (NICs) deliver scalable networking that meets the performance needs of AI clusters without sacrificing flexibility.
Addressing AI Infrastructure Challenges
As demand for AI and large language models grows, there is a pressing need for parallel computing infrastructure that can handle high-performance workloads. A major challenge has been the network bottleneck that hampers GPU utilization. AMD's new AI NICs aim to overcome this by optimizing the inter-node GPU-to-GPU communication network in data centers, improving data transfer speeds and overall network efficiency.
Features of Pensando AI NICs
The Pensando Pollara 400 AI NICs are described as the industry’s first fully programmable AI NICs. They are built to align with emerging Ultra Ethernet Consortium (UEC) standards, providing customers with the ability to program the hardware pipeline using AMD’s P4 architecture. This allows for the addition of new capabilities and custom transport protocols, ensuring that AI workloads can be accelerated without waiting for new hardware generations.
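To make the programmable-pipeline idea concrete, the sketch below models a match-action table, the abstraction that P4 programs are built from, in plain Python. This is a conceptual illustration only: the packet fields, table, and actions are hypothetical, and it is not AMD's P4 toolchain or the Pollara 400's actual pipeline.

```python
# Conceptual sketch of a match-action packet pipeline, the model behind
# P4-programmable NICs. Illustrative Python only; the fields, tables, and
# actions here are hypothetical, not AMD's P4 API.
from dataclasses import dataclass, field

@dataclass
class Packet:
    dst_ip: str
    protocol: str              # e.g. "roce_v2" or "uec_rdma"
    metadata: dict = field(default_factory=dict)

class MatchActionTable:
    """One pipeline stage: match on a header field, apply an action."""
    def __init__(self, match_field):
        self.match_field = match_field
        self.entries = {}      # match value -> action callable

    def add_entry(self, value, action):
        self.entries[value] = action

    def apply(self, pkt):
        action = self.entries.get(getattr(pkt, self.match_field))
        if action:
            action(pkt)

# Hypothetical actions: steer packets onto different transport queues.
def route_custom_transport(pkt):
    pkt.metadata["queue"] = "custom_rdma"

def route_roce(pkt):
    pkt.metadata["queue"] = "roce_v2"

# Reprogramming the pipeline amounts to installing new table entries,
# with no hardware change -- the property the article describes.
transport_table = MatchActionTable(match_field="protocol")
transport_table.add_entry("uec_rdma", route_custom_transport)
transport_table.add_entry("roce_v2", route_roce)

pkt = Packet(dst_ip="10.0.0.7", protocol="uec_rdma")
transport_table.apply(pkt)
print(pkt.metadata)  # {'queue': 'custom_rdma'}
```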
Some key features include:
- Transport Protocol Options: Supports RoCEv2, UEC RDMA, or any Ethernet protocol.
- Intelligent Packet Spray: Improves network bandwidth utilization by adaptively spraying packets across multiple network paths.
- Out-of-Order Packet Handling: Reduces buffer time by managing out-of-order packet arrivals efficiently.
- Selective Retransmission: Improves network performance by resending only lost or corrupted packets (illustrated in the sketch after this list).
- Path-Aware Congestion Control: Optimizes load balancing to maintain performance during congestion.
- Rapid Fault Detection: Minimizes GPU idle time with quick failover mechanisms.
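Two of these features, out-of-order packet handling and selective retransmission, are easy to see in miniature. The toy Python model below shows a receiver that keeps out-of-order arrivals and requests only the missing sequence numbers, in contrast to a go-back-N scheme that would resend everything after the first gap. The function and its protocol are illustrative assumptions, not the NIC's actual wire behavior.

```python
# Toy model of out-of-order handling with selective retransmission,
# showing why resending only the lost packets saves bandwidth compared
# with go-back-N. Purely illustrative; not the Pollara 400's protocol.

def receive(arrival_order, total):
    """Accept packets in any arrival order; report only the missing ones."""
    received = set(arrival_order)   # out-of-order arrivals are kept, not dropped
    return [seq for seq in range(total) if seq not in received]

arrivals = [0, 1, 3, 4, 6, 7]       # packets 2 and 5 were lost in transit
to_resend = receive(arrivals, total=8)

print("selective retransmission resends:", to_resend)             # [2, 5]
print("go-back-N would resend:", list(range(min(to_resend), 8)))  # [2, 3, 4, 5, 6, 7]
```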
Open Ecosystem and Scalability
AMD emphasizes the advantage of an open ecosystem, allowing organizations to build AI infrastructures that are easily scalable and programmable for future demands. This approach not only reduces capital expenditure but also avoids dependency on expensive switching fabrics, making it a cost-effective solution for cloud service providers and enterprises.
The Pensando Pollara 400 AI NIC has been validated in some of the largest scale-out data centers globally. Its programmability, high bandwidth, low latency, and extensive feature set have made it a preferred choice for cloud service providers looking to enhance their AI infrastructure capabilities.
Image source: Shutterstock