Most tech enthusiasts and industry experts were eagerly anticipating Nvidia's latest unveiling, and the tech giant did not disappoint. Nvidia has officially revealed the Blackwell B200 GPU, touted as the most powerful chip for AI to date. The new GPU marks a significant leap forward in the world of artificial intelligence and computing power.
According to Nvidia, the Blackwell B200 GPU offers an impressive 20 petaflops of FP4 horsepower from its 208 billion transistors. Additionally, the company introduced the GB200 "superchip," which combines two B200 GPUs with a Grace CPU to deliver up to 30 times the performance of an H100 for LLM inference workloads while also being more energy-efficient. Nvidia claims that the GB200 can reduce cost and energy consumption by up to 25 times compared to its predecessor, the H100.
One of the standout features of the Blackwell B200 GPU is its capability to drastically improve the efficiency and speed of training large AI models. Nvidia CEO Jensen Huang announced that training a 1.8 trillion parameter model, which previously required 8,000 Hopper GPUs and 15 megawatts of power, can now be accomplished with just 2,000 Blackwell GPUs consuming only four megawatts.
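Taking those keynote figures at face value, the cluster-level math works out as follows. This is a back-of-envelope sketch based only on the numbers quoted above; the per-GPU wattage is derived here and is not an official Nvidia figure (cluster power also covers CPUs, networking, and cooling):

```python
# Back-of-envelope check of Nvidia's training-cluster claim,
# using the GPU counts and megawatt figures from the keynote.

hopper_gpus, hopper_mw = 8_000, 15.0      # Hopper cluster
blackwell_gpus, blackwell_mw = 2_000, 4.0  # Blackwell cluster

gpu_reduction = hopper_gpus / blackwell_gpus    # 4x fewer GPUs
power_reduction = hopper_mw / blackwell_mw      # 3.75x less power

# Implied cluster-level watts per GPU (derived, not official):
hopper_w_per_gpu = hopper_mw * 1e6 / hopper_gpus          # 1875 W
blackwell_w_per_gpu = blackwell_mw * 1e6 / blackwell_gpus  # 2000 W

print(f"{gpu_reduction:.2f}x fewer GPUs, {power_reduction:.2f}x less power")
```

In other words, the headline efficiency gain comes from needing far fewer chips, not from each chip drawing less power at the cluster level.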
On a benchmark test with a GPT-3-class LLM containing 175 billion parameters, Nvidia claims that the GB200 offers seven times the performance of an H100 and four times the training speed. The key improvements in the new GPU include a second-gen transformer engine that doubles compute, bandwidth, and model size by representing each value with four bits (FP4) instead of eight (FP8).
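The doubling claim follows directly from the arithmetic: halving the bits per value halves the memory each parameter occupies. A rough illustration using the 175-billion-parameter model from the benchmark above (weight storage only, ignoring activations and overhead):

```python
# Rough memory footprint of a 175B-parameter model's weights
# at 8-bit vs 4-bit precision -- illustrative only.

params = 175e9

bytes_fp8 = params * 1.0   # 8 bits = 1 byte per parameter
bytes_fp4 = params * 0.5   # 4 bits = half a byte per parameter

gb_fp8 = bytes_fp8 / 1e9   # ~175 GB of weights
gb_fp4 = bytes_fp4 / 1e9   # ~87.5 GB of weights

ratio = gb_fp8 / gb_fp4    # 2.0 -- halving the bits doubles the
                           # model size that fits in the same memory
```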
Another noteworthy advancement is the next-gen NVLink switch that allows for 576 GPUs to communicate with each other, delivering 1.8 terabytes per second of bidirectional bandwidth. Nvidia had to develop an entirely new network switch chip with 50 billion transistors and 3.6 teraflops of FP8 onboard compute to support this capability.
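Assuming the 1.8 TB/s figure is per GPU, the aggregate bandwidth across a full 576-GPU NVLink domain is easy to estimate. This is a back-of-envelope extrapolation from the quoted numbers, not a figure Nvidia has published:

```python
# Aggregate bidirectional NVLink bandwidth across a full domain,
# assuming 1.8 TB/s per GPU (derived estimate, not an official spec).

gpus = 576
tb_per_s_per_gpu = 1.8

aggregate_tb_per_s = gpus * tb_per_s_per_gpu  # ~1036.8 TB/s, i.e. about
                                              # a petabyte per second
```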
The Blackwell B200 GPU is designed to be scalable for large-scale deployments, such as the GB200 NVL72 rack, which combines 36 Grace CPUs and 72 GPUs to deliver 720 petaflops of AI training performance or 1.4 exaflops of inference performance. Major cloud service providers like Amazon, Google, Microsoft, and Oracle are already planning to integrate the NVL72 racks into their offerings.
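The rack-level numbers line up with the per-chip figures quoted earlier. Dividing the rack totals by its 72 GPUs gives roughly 10 petaflops of training and about 19.4 petaflops of inference per GPU, the latter consistent with the 20-petaflop FP4 figure for a single B200. A quick sanity check (derived arithmetic, not official per-GPU specs):

```python
# Consistency check: rack totals divided across the NVL72's 72 GPUs.

rack_gpus = 72
training_petaflops = 720     # rack-level training figure
inference_exaflops = 1.4     # rack-level inference figure

pf_per_gpu_training = training_petaflops / rack_gpus           # 10 PF
pf_per_gpu_inference = inference_exaflops * 1000 / rack_gpus   # ~19.4 PF
# ~19.4 PF per GPU roughly matches the 20 PF FP4 figure quoted
# for a standalone B200.
```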
Overall, Nvidia's Blackwell B200 GPU represents a significant leap forward in AI computing power, offering unprecedented performance and efficiency for training large-scale models. With its advanced features and capabilities, Nvidia is poised to maintain its leadership in the AI chip market and drive innovation in artificial intelligence applications.