The NVIDIA H100 is a high-end GPU designed specifically for AI, deep learning and HPC applications. It is based on the innovative Hopper architecture and uses powerful fourth-generation Tensor Cores to deliver exceptional performance. Thanks to its enormous computing capacity, NVIDIA’s H100 is ideal for training complex neural networks, data-intensive cloud workloads and intricate HPC simulations.

What are the features of the NVIDIA H100?

The NVIDIA H100 offers an exceptional level of performance based on the novel Hopper architecture, which combines fourth-generation Tensor Core technology with a Transformer Engine to provide more computing power and significantly accelerate the training of AI models. NVIDIA offers the H100 GPU in two variants: the H100 SXM and the H100 NVL.

The two versions differ in their form factor as well as in their performance, memory bandwidth and connectivity. The H100 SXM is primarily designed for use in high-density servers and hyperscale environments. The H100 NVL, on the other hand, was designed for PCIe slots, making it easier to integrate the GPU into existing server infrastructure. The following table provides a detailed overview of the performance features of the two NVIDIA H100 variants:

Performance feature | NVIDIA H100 SXM | NVIDIA H100 NVL
FP64 | 34 TFLOPS | 30 TFLOPS
FP64 Tensor Core | 67 TFLOPS | 60 TFLOPS
FP32 | 67 TFLOPS | 60 TFLOPS
TF32 Tensor Core | 989 TFLOPS | 835 TFLOPS
BFLOAT16 Tensor Core | 1,979 TFLOPS | 1,671 TFLOPS
FP16 Tensor Core | 1,979 TFLOPS | 1,671 TFLOPS
FP8 Tensor Core | 3,958 TFLOPS | 3,341 TFLOPS
INT8 Tensor Core | 3,958 TOPS | 3,341 TOPS
GPU memory | 80 GB | 94 GB
GPU memory bandwidth | 3.35 TB/s | 3.9 TB/s
Decoders | 7 NVDEC, 7 JPEG | 7 NVDEC, 7 JPEG
Maximum thermal design power (TDP) | Up to 700 W (configurable) | 350-400 W (configurable)
Multi-instance GPU (MIG) | Up to 7 MIGs with 10 GB each | Up to 7 MIGs with 12 GB each
Form factor | SXM | PCIe, dual-slot, air-cooled
Interconnect | NVLink: 900 GB/s; PCIe Gen5: 128 GB/s | NVLink: 600 GB/s; PCIe Gen5: 128 GB/s
Server options | NVIDIA HGX H100 partner and NVIDIA-certified systems with 4 or 8 GPUs; NVIDIA DGX H100 with 8 GPUs | Partner and NVIDIA-certified systems with up to 8 GPUs
NVIDIA AI Enterprise | Add-on | Included
Note

TFLOPS (tera floating point operations per second) is a unit describing how quickly a computer performs floating-point calculations: one TFLOPS corresponds to one trillion floating-point operations per second. The same applies to the unit TOPS (tera operations per second), with the difference that integer operations are counted here.
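
To get a feel for these figures, the following minimal Python sketch converts the 1,979 TFLOPS of FP16 Tensor Core performance (H100 SXM) into a theoretical training throughput. The 70-billion-parameter model and the rule of thumb of roughly 6 floating-point operations per parameter per trained token are illustrative assumptions, not H100 specifications, and real-world utilization always falls well below the theoretical peak.

```python
# Hypothetical back-of-the-envelope calculation, not a benchmark:
# how many tokens per second could 1,979 TFLOPS train in theory?
peak_flops = 1_979e12        # 1,979 TFLOPS as operations per second
flops_per_token = 6 * 70e9   # rule of thumb: ~6 FLOPs per parameter per token

tokens_per_second = peak_flops / flops_per_token
print(f"Theoretical peak: ~{tokens_per_second:,.0f} tokens per second")
# Prints roughly 4,712 tokens per second; real throughput is far lower.
```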

What are the advantages and disadvantages of the NVIDIA H100?

The NVIDIA H100 is one of the most powerful GPUs on the market and has been equipped with numerous advanced technologies and functions. The most important advantages of the H100 GPU are:

  • Very high computing power: The H100 offers tremendous FP8 and FP16 Tensor Core performance, making it ideal for complex, data-intensive workloads such as large language models (LLMs). The combination of fourth-generation Tensor Cores and the Transformer Engine can significantly increase the efficiency of AI operations (see the sketch after this list).
  • NVLink and NVSwitch: The NVIDIA H100 supports fourth-generation NVLink, which allows multiple server GPUs to be connected to each other with a bidirectional bandwidth of 900 GB/s. Thanks to NVSwitch, it’s also possible to flexibly scale corresponding clusters.
  • Multi-instance GPU (MIG): The GPU can be partitioned into up to seven independent GPU instances, enabling the simultaneous execution of multiple workloads with dedicated resources. This improves flexibility and efficiency in shared computing environments.
  • Confidential computing: Thanks to the integrated security function, the confidentiality and integrity of data are protected throughout the entire workload.
  • HBM3 memory and PCIe Gen5 support: With up to 94 GB of HBM3 memory and a bandwidth of up to 3.9 TB/s, the NVIDIA H100 offers one of the most powerful memory solutions for data-intensive workloads. In combination with PCIe Gen5, it enables very fast data transfer.
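
The following PyTorch sketch shows how the Tensor Core performance from the list above is typically accessed in practice. It is a minimal illustration rather than an official NVIDIA example: the single linear layer and the random data are stand-ins for a real network, and the code assumes a CUDA-capable PyTorch installation.

```python
import torch

# Minimal mixed-precision training step. Inside the autocast region,
# matrix multiplications run in BF16 on the Tensor Cores.
model = torch.nn.Linear(4096, 4096).cuda()      # stand-in for a real network
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

x = torch.randn(64, 4096, device="cuda")        # dummy batch
target = torch.randn(64, 4096, device="cuda")

with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
    loss = torch.nn.functional.mse_loss(model(x), target)

loss.backward()      # unlike FP16, BF16 autocast needs no gradient scaler
optimizer.step()
optimizer.zero_grad()
```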

However, this performance comes at a price. Depending on the version, the GPUs cost between 35,000 and 45,000 dollars, which also makes H100 instances comparatively expensive in cloud environments. Another disadvantage is the limited availability: due to high demand, supply bottlenecks and long waiting times occur time and again.

Which applications is NVIDIA’s H100 GPU best suited to?

The NVIDIA H100 GPU was specially developed for compute-intensive workloads and is particularly suitable for demanding AI and HPC applications. The following overview shows the key areas of application for the H100 GPU:

  • Training of large AI models: Thanks to its high computing power, the GPU significantly accelerates the training of complex neural networks and large language models such as GPT or LLaMA.
  • Real-time AI inference: The H100 can run pre-trained AI models at top speeds, which is an advantage in areas such as speech processing and image recognition.
  • Cloud and data centers: H100 GPUs form the basis of many GPU servers, providing the computing power required for complex workloads.
  • High-performance computing (HPC): Scientific calculations and simulations benefit from the high FP64 performance of the H100 graphics processors (see the sketch after this list).
  • Generative AI: NVIDIA’s H100 is ideal for text, image and video generation with AI models. The GPU enables fast and efficient processing of large data sets required for generative AI.
  • Data analysis: The Hopper GPUs support companies in various industries — such as logistics and finance — in deriving precise forecasts and predictions from large volumes of data.
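
The FP64 performance referenced in the HPC bullet above can be explored with a simple micro-benchmark. The following sketch is an illustrative example under stated assumptions (PyTorch with CUDA, an arbitrary matrix size, coarse wall-clock timing), not a rigorous benchmark; measured throughput will land below the peak values from the table.

```python
import time
import torch

# Rough HPC-style micro-benchmark: time one large double-precision
# matrix multiplication and estimate the achieved FP64 throughput.
n = 8192
a = torch.randn(n, n, dtype=torch.float64, device="cuda")
b = torch.randn(n, n, dtype=torch.float64, device="cuda")

torch.cuda.synchronize()            # finish setup before timing starts
start = time.perf_counter()
c = a @ b
torch.cuda.synchronize()            # wait until the GPU kernel completes
elapsed = time.perf_counter() - start

flops = 2 * n**3                    # multiply-adds in an n x n x n matmul
print(f"Achieved roughly {flops / elapsed / 1e12:.1f} FP64 TFLOPS")
```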

What are the possible alternatives to the H100 GPU?

Although the NVIDIA H100 is one of the most powerful GPUs for AI and HPC, alternative solutions may be the better choice depending on the use case and budget, for example because of their higher cost efficiency. Possible alternatives include, among others:

  • NVIDIA A100: The H100’s predecessor also offers solid performance for AI training, inference and HPC, and is less expensive.
  • NVIDIA A30: The A30 combines high performance with an affordable price.
  • NVIDIA H200: The H200 is a slightly improved version of the NVIDIA H100, which has an even higher memory bandwidth.
  • Intel Gaudi 3: The AI accelerator delivers high performance for AI inference.
Note

We present the most frequently used current graphics processors in more detail in our article comparing server GPUs.
