Nvidia and Intel are fierce competitors in the AI accelerator market for both training and inference. According to recent research conducted by Databricks, Intel Gaudi 2 technology has proven to be a powerful competitor to Nvidia's industry-leading AI accelerators. The study reveals that Gaudi 2 matches the decoding latency of Nvidia H100 systems for large language model (LLM) inference, while also outperforming the Nvidia A100. Gaudi 2 also achieves higher memory bandwidth utilization than the H100 and A100. However, Nvidia still holds the advantage in training performance on its top-end accelerators.
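Memory bandwidth utilization matters for decoding because each generated token must stream the full set of model weights from memory, making decode largely memory-bound. A back-of-the-envelope sketch of that metric, with purely hypothetical numbers (not figures from the Databricks study), looks like this:

```python
def mem_bw_utilization(params_billion, bytes_per_param, tokens_per_sec, peak_bw_tb_s):
    """Estimate memory bandwidth utilization during LLM decoding.

    Decode streams all model weights once per generated token, so
    achieved bandwidth ~= model bytes * tokens/sec; utilization is that
    divided by the chip's peak memory bandwidth.
    """
    model_bytes = params_billion * 1e9 * bytes_per_param
    achieved_bytes_per_sec = model_bytes * tokens_per_sec
    return achieved_bytes_per_sec / (peak_bw_tb_s * 1e12)

# Hypothetical example: a 70B-parameter model in BF16 (2 bytes/param)
# decoding at 15 tokens/sec on a chip with 2.45 TB/s peak bandwidth.
util = mem_bw_utilization(70, 2, 15, 2.45)
print(f"{util:.0%}")  # prints 86%
```

This is only an estimation method, not the study's methodology; real measurements also account for KV-cache traffic and batching.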

When it comes to training performance, Databricks' researchers found that Gaudi 2 ranks second in single-node LLM performance after the Nvidia H100, with over 260 TFLOPS per chip. Based on public cloud pricing, Gaudi 2 also offers the best performance per dollar for both training and inference compared with the A100 and H100. These findings provide third-party validation for Intel's Gaudi technology and its performance capabilities. Abhinav Venigalla, lead NLP architect at Databricks, expressed admiration for Gaudi 2's performance, particularly the high utilization it achieved for LLM inference. He also noted the potential for further gains from Gaudi 2's FP8 support in the latest software release, although due to time constraints the team was only able to examine performance using BF16.
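The performance-per-dollar comparison above can be sketched in a few lines. The prices and throughput figures below are made-up placeholders for illustration, not Databricks' measured numbers:

```python
def perf_per_dollar(tflops_per_chip, price_per_chip_hour):
    """Training throughput bought per dollar of cloud spend (TFLOPS per $/hour)."""
    return tflops_per_chip / price_per_chip_hour

# Placeholder (achieved TFLOPS/chip, $/chip-hour) pairs -- hypothetical,
# chosen only to show how a cheaper chip can win on perf/$ despite
# lower raw throughput.
accelerators = {
    "chip_a": (260, 1.50),
    "chip_b": (400, 4.00),
}
for name, (tflops, price) in accelerators.items():
    print(f"{name}: {perf_per_dollar(tflops, price):.1f} TFLOPS per $/hour")
```

In this toy example the slower chip delivers roughly 173 TFLOPS per dollar-hour versus 100 for the faster one, which is the shape of argument the study makes from actual cloud pricing.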

Intel’s Confidence in Gaudi’s Performance

The positive performance numbers reported by Databricks come as no surprise to Intel. Eitan Medina, COO at Habana Labs (an Intel company), emphasized that the findings align with Intel's own measurements and customer feedback. Medina added that such published reviews are crucial for raising customer awareness of Gaudi as a viable alternative to Nvidia's offerings. Intel acquired AI chip startup Habana Labs, along with its Gaudi technology, in 2019 for $2 billion and has been continually enhancing the technology to meet the demands of the AI market since.

To establish performance credibility, vendors like Nvidia and Intel often rely on industry-standard benchmarks such as MLPerf. Both companies actively participate in the MLPerf training and inference benchmarks, showcasing their ability to set new speed records. However, Medina stressed that many customers conduct their own testing to validate hardware and software for their specific models and use cases. While MLPerf serves as a maturity filter for technology stacks, it is not the sole factor in customers' decision-making. The maturity of the software stack plays a significant role, as customers are wary of vendor optimizations tailored narrowly to benchmark workloads.

Gaudi’s Future: Gaudi 3

Intel is preparing to launch Gaudi 3, the next generation of its AI accelerator technology, in 2024. Gaudi 2, built on a 7-nanometer process, will be succeeded by Gaudi 3, which moves to a more advanced 5-nanometer process and promises four times the processing power and double the network bandwidth of its predecessor. With Gaudi 3, Intel aims to lead in performance, performance per dollar, and performance per watt; Medina confirmed the chip will enter mass production in 2024.

Looking beyond Gaudi 3, Intel is focused on future generations that unify its high-performance computing (HPC) and AI accelerator technologies. The company also sees enduring value in its CPU technologies for AI inference workloads and recently announced its 5th Gen Xeon processors with AI acceleration. Medina highlighted the significant role CPUs play in inference, noting that even fine-tuning workloads can run advantageously on CPUs.

Intel's Gaudi 2 technology presents robust competition to Nvidia's AI accelerators. Databricks' research validates Gaudi 2's inference and training capabilities, matching the H100's decode latency while achieving higher memory bandwidth utilization. Intel's commitment to continuous improvement is underscored by the forthcoming Gaudi 3, which aims to deliver substantial gains in performance, efficiency, and capability in the AI accelerator sector. As Intel works to unify its HPC and AI accelerator technologies, it remains focused on meeting the ever-increasing demands of the AI industry.
