
With the launch of the HGX H200, NVIDIA has significantly upgraded its artificial intelligence computing platform. Based on the NVIDIA Hopper architecture, the platform features the NVIDIA H200 Tensor Core GPU, whose advanced memory is designed to handle the enormous amounts of data behind generative AI and high-performance computing (HPC) workloads.
The NVIDIA H200 is the first GPU to offer HBM3e, a faster and larger memory built to accelerate generative AI and large language models (LLMs) and to advance scientific computing for HPC workloads. With HBM3e, the NVIDIA H200 delivers 141 gigabytes of memory at 4.8 terabytes per second, nearly double the capacity and 2.4 times the bandwidth of its predecessor, the NVIDIA A100.
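For a rough sense of those figures, the short Python sketch below compares them with the published specs of the A100 80GB SXM (about 80 GB of HBM2e at roughly 2.0 TB/s). The exact values are approximate; only the ratios matter here.

```python
# Back-of-the-envelope comparison of H200 vs. A100 memory specs.
# A100 numbers are the published 80GB SXM figures (~80 GB, ~2.0 TB/s);
# treat all values as approximate.
h200_capacity_gb, h200_bandwidth_tbs = 141, 4.8
a100_capacity_gb, a100_bandwidth_tbs = 80, 2.0

print(f"Capacity:  {h200_capacity_gb / a100_capacity_gb:.2f}x")      # ~1.76x, "nearly double"
print(f"Bandwidth: {h200_bandwidth_tbs / a100_bandwidth_tbs:.1f}x")  # ~2.4x
```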
Server systems powered by the H200 are expected to begin shipping in the second quarter of 2024 from the world's leading server manufacturers and cloud service providers.
According to Ian Buck, NVIDIA's Vice President of Hyperscale and HPC, "To create intelligence with generative AI and HPC applications, vast amounts of data must be efficiently processed at high speed using large, fast GPU memory. With the NVIDIA H200, the industry's leading end-to-end AI supercomputing platform has just gotten faster to solve some of the world's most important challenges."
Continuously Pioneering New Technologies
The NVIDIA Hopper architecture delivers an unprecedented performance leap over its predecessor, and it continues to raise the bar through ongoing software enhancements for the H100, including the recent release of powerful open-source libraries like NVIDIA TensorRT-LLM.
The introduction of the H200 extends those gains further: inference speed on Llama 2, a 70 billion-parameter LLM, nearly doubles compared with the H100. Future software updates are expected to bring additional performance leadership and improvements with the H200.
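For readers curious what such a workload looks like in code, below is a minimal, hypothetical sketch of Llama 2 inference through TensorRT-LLM's high-level Python LLM API. The model name and sampling settings are illustrative, and the API surface has evolved across TensorRT-LLM releases, so consult the documentation for the version you install.

```python
# Hypothetical sketch: Llama 2 inference with TensorRT-LLM's high-level
# LLM API. The model name is illustrative and requires access to the
# Llama 2 weights; the API shape varies by TensorRT-LLM release.
from tensorrt_llm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-2-70b-hf")  # builds or loads a TensorRT engine
params = SamplingParams(temperature=0.8, max_tokens=128)

for output in llm.generate(["What makes HBM3e faster than HBM3?"], params):
    print(output.outputs[0].text)
```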
Form Factors for the NVIDIA H200
The NVIDIA H200 will be available in NVIDIA HGX H200 server boards in four- and eight-way configurations, which are compatible with both the hardware and software of HGX H100 systems. It is also found in the NVIDIA GH200 Grace Hopper Superchip with HBM3e, which was announced in August of this year.
These options allow the H200 to be deployed in any type of data center, whether on-premises, in the cloud, or in a hybrid-cloud environment. Server manufacturers in NVIDIA's global partner ecosystem, including ASRock Rack, ASUS, Dell Technologies, Eviden, GIGABYTE, Hewlett Packard Enterprise, Ingrasys, Lenovo, QCT, Supermicro, Wistron, and Wiwynn, can update their existing server solutions with an H200.
In addition to CoreWeave, Lambda, and Vultr, Amazon Web Services, Google Cloud, Microsoft Azure, and Oracle Cloud Infrastructure will be among the first cloud service providers to deploy H200-based instances starting in 2024.
Powered by NVIDIA NVLink and NVSwitch high-speed interconnects, the HGX H200 delivers the highest performance across a wide range of application workloads, including LLM training and inference for the largest models, which can exceed 175 billion parameters.
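As an illustration of how that interconnect bandwidth shows up in practice, here is a minimal, hypothetical micro-benchmark using PyTorch's NCCL backend, which routes collective traffic over NVLink/NVSwitch when they are present. It assumes a single multi-GPU node launched with torchrun; the tensor size and iteration counts are arbitrary.

```python
# Hypothetical micro-benchmark of inter-GPU all-reduce throughput.
# With the NCCL backend, collectives travel over NVLink/NVSwitch when present.
# Launch on one multi-GPU node with: torchrun --nproc_per_node=<gpus> bench.py
import os
import time

import torch
import torch.distributed as dist

def main():
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])  # set by torchrun
    torch.cuda.set_device(local_rank)

    # 256 MB of fp32 data per GPU (size is arbitrary for this sketch).
    tensor = torch.randn(64 * 1024 * 1024, device="cuda")

    for _ in range(5):  # warm-up iterations
        dist.all_reduce(tensor)
    torch.cuda.synchronize()

    iters = 20
    start = time.perf_counter()
    for _ in range(iters):
        dist.all_reduce(tensor)
    torch.cuda.synchronize()
    elapsed = time.perf_counter() - start

    if dist.get_rank() == 0:
        gb = tensor.numel() * 4 * iters / 1e9  # bytes pushed per rank
        print(f"Effective all-reduce throughput: ~{gb / elapsed:.1f} GB/s")

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```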
For the highest performance in generative AI and HPC applications, an eight-way HGX H200 provides over 32 petaflops of FP8 deep-learning compute and 1.1 terabytes of aggregate high-bandwidth memory.
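Those aggregate numbers follow directly from the per-GPU specs, as this quick sanity check shows (per-GPU figures taken from NVIDIA's published H200 datasheet and treated as approximate):

```python
# Sanity-check of the eight-way HGX H200 aggregate figures.
# Per-GPU specs: 141 GB HBM3e; ~4 PFLOPS FP8 (sparsity-assisted).
gpus = 8
hbm_per_gpu_gb = 141
fp8_pflops_per_gpu = 3.958  # ~4 PFLOPS FP8 with sparsity

print(f"Aggregate HBM3e: {gpus * hbm_per_gpu_gb / 1000:.2f} TB")   # ~1.13 TB
print(f"Aggregate FP8:   {gpus * fp8_pflops_per_gpu:.0f} PFLOPS")  # ~32 PFLOPS
```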
When combined with NVIDIA Grace central processing units (CPUs) over the ultra-fast NVLink-C2C interconnect, the H200 forms the GH200 Grace Hopper Superchip with HBM3e, an integrated module designed to serve large-scale HPC and AI applications.
NVIDIA Full-Stack Software
NVIDIA's accelerated computing platform is backed by powerful software tools that enable developers and enterprises to build and accelerate production-ready applications from AI to HPC. These include the NVIDIA AI Enterprise suite of software, which supports workloads such as speech, recommender systems, and hyperscale inference.
Availability
The NVIDIA H200 will be available from global server system manufacturers and cloud service providers (CSPs) starting in the second quarter of 2024.