The NVIDIA H100 is a high-end GPU designed specifically for AI, deep learning, and HPC applications. It is based on NVIDIA's Hopper architecture and uses fourth-generation Tensor Cores to deliver exceptional performance. Thanks to its enormous computing capacity, the H100 is ideal for training complex neural networks, data-intensive cloud workloads, and intricate HPC simulations.

What are the features of the NVIDIA H100?

The NVIDIA H100 offers an exceptional level of performance based on the Hopper architecture, which combines fourth-generation Tensor Core technology with a Transformer Engine to provide more computing power and significantly accelerate the training of AI models. NVIDIA offers the H100 GPU in two variants: the H100 SXM and the H100 NVL.

The two versions differ in their form factor as well as in their performance, memory bandwidth, and connectivity. The H100 SXM is primarily designed for use in high-density servers and hyperscale environments. The H100 NVL, on the other hand, was designed for PCIe slots, making it easier to integrate the GPU into existing server infrastructure. The following table provides a detailed overview of the performance features of the two NVIDIA H100 variants:

| Performance feature | NVIDIA H100 SXM | NVIDIA H100 NVL |
| --- | --- | --- |
| FP64 | 34 TFLOPS | 30 TFLOPS |
| FP64 Tensor Core | 67 TFLOPS | 60 TFLOPS |
| FP32 | 67 TFLOPS | 60 TFLOPS |
| TF32 Tensor Core | 989 TFLOPS | 835 TFLOPS |
| BFLOAT16 Tensor Core | 1,979 TFLOPS | 1,671 TFLOPS |
| FP16 Tensor Core | 1,979 TFLOPS | 1,671 TFLOPS |
| FP8 Tensor Core | 3,958 TFLOPS | 3,341 TFLOPS |
| INT8 Tensor Core | 3,958 TOPS | 3,341 TOPS |
| GPU memory | 80 GB | 94 GB |
| GPU memory bandwidth | 3.35 TB/s | 3.9 TB/s |
| Decoders | 7 NVDEC, 7 JPEG | 7 NVDEC, 7 JPEG |
| Maximum thermal design power (TDP) | Up to 700 W (configurable) | 350-400 W (configurable) |
| Multi-instance GPU (MIG) | Up to 7 MIGs with 10 GB each | Up to 7 MIGs with 12 GB each |
| Form factor | SXM | PCIe, dual-slot, air-cooled |
| Interface | NVIDIA NVLink: 900 GB/s, PCIe Gen5: 128 GB/s | NVIDIA NVLink: 600 GB/s, PCIe Gen5: 128 GB/s |
| Server options | NVIDIA HGX H100 partner and NVIDIA-certified systems with 4 or 8 GPUs; NVIDIA DGX H100 with 8 GPUs | Partner and NVIDIA-certified systems with up to 8 GPUs |
| NVIDIA AI Enterprise | Add-on | Included |
Note

TFLOPS (tera floating-point operations per second) is a unit describing the floating-point processing speed of computers: one TFLOPS corresponds to one trillion calculations per second. The same applies to the unit TOPS (tera operations per second), with the difference that it measures integer operations.
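To get an intuition for the throughput figures in the table, it can help to convert a TFLOPS rate into a time estimate. The following sketch uses the H100 SXM's peak FP8 Tensor Core rate from the table above; the workload size is a hypothetical example, not an NVIDIA figure, and real sustained throughput is lower than the peak rate.

```python
def seconds_for_flops(total_flops: float, tflops: float) -> float:
    """Time to execute `total_flops` operations at a rate of `tflops` TFLOPS.

    One TFLOPS is 1e12 floating-point operations per second.
    """
    return total_flops / (tflops * 1e12)


# Hypothetical workload: 1e15 floating-point operations (1 PFLOP of work).
work = 1e15

# Peak FP8 Tensor Core rate of the H100 SXM from the table above.
t = seconds_for_flops(work, 3958)
print(f"{t:.4f} s")  # roughly a quarter of a second at the peak rate
```

The same arithmetic applies to the TOPS figures, substituting integer operations for floating-point ones.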

What are the advantages and disadvantages of the NVIDIA H100?

The NVIDIA H100 is one of the most powerful GPUs on the market and has been equipped with numerous advanced technologies and functions. The most important advantages of the H100 GPU are:

  • Very high computing power: The H100 offers tremendous FP8 and FP16 Tensor Core performance, making it ideal for complex, data-intensive workloads such as large language models (LLMs). The combination of fourth-generation Tensor Cores and the Transformer Engine can significantly increase the efficiency of AI operations.
  • NVLink and NVSwitch: The NVIDIA H100 supports fourth-generation NVLink, which allows multiple server GPUs to be connected to each other with a bidirectional bandwidth of 900 GB/s. Thanks to NVSwitch, it's also possible to scale such clusters flexibly.
  • Multi-instance GPU (MIG): The GPU can be partitioned into up to seven independent GPU instances, enabling the simultaneous execution of multiple workloads with dedicated resources. This improves flexibility and efficiency in shared computing environments.
  • Confidential computing: Thanks to the integrated security function, the confidentiality and integrity of data are protected throughout the entire workload.
  • HBM3 memory and PCIe Gen5 support: With up to 94 GB of HBM3 memory and a bandwidth of up to 3.9 TB/s, the NVIDIA H100 offers one of the most powerful memory solutions for data-intensive workloads. In combination with PCIe Gen5, it enables very fast data transfer.
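The memory and interconnect figures above can be turned into simple capacity checks. The sketch below estimates whether a model's weights fit into a single H100's HBM3 memory and how long moving them over NVLink versus PCIe Gen5 would take; the 70-billion-parameter model is an illustrative assumption, and real deployments also need memory for activations, optimizer state, and caches.

```python
def weight_bytes(params: float, bytes_per_param: int) -> float:
    """Raw size of a model's weights in bytes."""
    return params * bytes_per_param


def transfer_seconds(nbytes: float, gb_per_s: float) -> float:
    """Transfer time at `gb_per_s` GB/s (decimal gigabytes)."""
    return nbytes / (gb_per_s * 1e9)


params = 70e9  # hypothetical 70B-parameter model (illustrative assumption)

fp16 = weight_bytes(params, 2)  # 2 bytes per FP16 parameter
fp8 = weight_bytes(params, 1)   # 1 byte per FP8 parameter

print(fp16 / 1e9, "GB in FP16")  # 140 GB: exceeds 80/94 GB, needs >1 GPU
print(fp8 / 1e9, "GB in FP8")    # 70 GB: fits on a single H100

# Moving the FP8 weights between GPUs at the interface speeds from the table:
print(transfer_seconds(fp8, 900), "s over NVLink (H100 SXM)")
print(transfer_seconds(fp8, 128), "s over PCIe Gen5")
```

This is why NVLink matters for multi-GPU training: at 900 GB/s, shuttling tens of gigabytes of weights or gradients takes a fraction of a second rather than multiples of it.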

However, this performance comes at a price: depending on the version, the GPU costs between 35,000 and 45,000 dollars, and H100 instances are correspondingly expensive in cloud environments. Another disadvantage is limited availability. Due to high demand, supply bottlenecks and long waiting times occur repeatedly.

Which applications is NVIDIA's H100 GPU best suited to?

The NVIDIA H100 GPU was specially developed for compute-intensive workloads and is particularly suitable for demanding AI and HPC applications. The following overview shows the key areas of application for the H100 GPU:

  • Training of large AI models: Thanks to its high computing power, the GPU significantly accelerates the training of complex neural networks and large language models such as GPT or LLaMA.
  • Real-time AI inference: The H100 can run pre-trained AI models at top speeds, which is an advantage in areas such as speech processing and image recognition.
  • Cloud and data centers: H100 GPUs form the basis of many GPU servers by providing the computing power required for complex workloads.
  • High-performance computing (HPC): Scientific calculations and simulations benefit from the high FP64 performance of the H100 graphics processors.
  • Generative AI: NVIDIA's H100 is ideal for text, image, and video generation with AI models. The GPU enables fast and efficient processing of the large data sets required for generative AI.
  • Data analysis: The Hopper GPUs support companies in various industries, such as logistics and finance, in deriving precise forecasts and predictions from large volumes of data.
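For the real-time inference use case, a common rule of thumb is that single-stream LLM decoding is bound by memory bandwidth rather than compute: each generated token requires reading roughly the full set of weights from memory, so tokens per second is approximately bandwidth divided by weight size. The sketch below applies this to the two H100 variants; the 70 GB model size is an illustrative assumption, and real throughput depends on batching, caching, and kernel efficiency.

```python
def tokens_per_second(bandwidth_tb_s: float, weight_gb: float) -> float:
    """Rough upper bound on single-stream decoding speed.

    Assumes each token requires one full read of the weights
    (a memory-bandwidth-bound rule of thumb, not a benchmark).
    """
    return (bandwidth_tb_s * 1e12) / (weight_gb * 1e9)


weights = 70  # hypothetical 70B model quantized to FP8: ~70 GB

print(round(tokens_per_second(3.35, weights), 1))  # H100 SXM at 3.35 TB/s
print(round(tokens_per_second(3.90, weights), 1))  # H100 NVL at 3.9 TB/s
```

Under this rough model, the NVL's higher memory bandwidth translates directly into higher single-stream decoding speed, which is one reason it is positioned for inference workloads.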

What are the possible alternatives to the H100 GPU?

Although the NVIDIA H100 is one of the most powerful GPUs for AI and HPC, alternative solutions may be worth considering depending on the use case and budget, for example because of higher cost efficiency. Possible alternatives include, among others:

  • NVIDIA A100: The predecessor model also offers solid performance for AI training, inference, and HPC, but it's less expensive.
  • NVIDIA A30: The A30 combines high performance with an affordable price.
  • NVIDIA H200: The H200 is a slightly improved version of the NVIDIA H100 with an even higher memory bandwidth.
  • Intel Gaudi 3: The AI accelerator delivers high performance for AI inference.
Note

We present the most frequently used current graphics processors in more detail in our article comparing server GPUs.
