Nvidia continues to gain ground in high performance and quantum computing
Though Nvidia was built on the promise of the GPU for gaming and advanced rendering, its rise to a $1 trillion valuation was on the back of high performance compute and AI. What began as a small project called “general purpose GPU” (GPGPU) that looked at in-game physics and video transcoding applications transformed the company into the titan of the silicon space, displacing Intel as the clear thought-leader for the future of computing.
What has made Nvidia successful in this transition from being only a gaming company to one of the premier compute and AI leaders is its ability to not just build silicon, but to build an entire platform and ecosystem. It calls this the “Nvidia Scientific Computing Platform” and it’s a combination of hardware like its Hopper GPU and Grace CPU with system software like CUDA and PhysX that aim to simplify the programming models for developers. Add on to that end-to-end platforms like Nvidia Omniverse and Nvidia AI and you get applications that can spur scientific development and AI advancement, all of course optimized for Nvidia silicon.
But like all titans, Nvidia needs to stay one step ahead of the curve, and the struggle is to maintain the momentum that got it there. SC23, the leading conference for high performance computing, is currently underway in Denver. It combines product announcements from hardware and software companies with keynotes from astrophysicists and academic panels. Nvidia used this event to make a couple of interesting product announcements and give some insight into how it might be thinking about the next big thing in high performance computing.
A new GPU to maintain leadership
The primary announcement from Nvidia today was the H200, a mid-generation update to the H100 Hopper GPU that makes significant advancements in the memory design. By moving from HBM3 (high bandwidth memory) to HBM3e, the new H200 can improve memory bandwidth and max memory capacity, and as a result offer significant performance improvements over the current generation H100 product.
Memory technology is often just as important to HPC systems as the GPU or CPU itself. As new AI models and high performance compute workloads like climate simulation utilize larger datasets as the input, the limit on per-GPU memory capacity can be a problem. In fact, this is one of Intel’s major talking points on its CPU-based HPC advantages where a modern Xeon processor can support up to 10x the amount of system memory.
The new Nvidia H200 increases memory capacity by 76% and memory bandwidth by 43%. Nvidia showed benchmarks that translate that change into nearly a 2x performance improvement in training a Llama2 AI large language model. Nvidia was already the market leader in this space so the improvements might seem superficial. But as Intel continues to push its GPU and Gaudi AI accelerator strategy, and AMD executes the rollout of its MI300 AI chip, it’s critical for Nvidia to keep driving an aggressive roadmap.
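To see where those percentages come from, here is a minimal Python sketch of the arithmetic. The absolute figures are assumptions taken from Nvidia's public spec sheets rather than from this article: 80 GB and 3.35 TB/s for the H100 SXM, and 141 GB and 4.8 TB/s for the H200. The `percent_gain` helper is a hypothetical name used only for illustration.

```python
# Assumed spec-sheet values (not stated in the article): H100 SXM vs. H200.
H100 = {"memory_gb": 80, "bandwidth_tb_s": 3.35}
H200 = {"memory_gb": 141, "bandwidth_tb_s": 4.8}


def percent_gain(old, new):
    """Return the percentage increase going from old to new."""
    return (new - old) / old * 100


capacity_gain = percent_gain(H100["memory_gb"], H200["memory_gb"])
bandwidth_gain = percent_gain(H100["bandwidth_tb_s"], H200["bandwidth_tb_s"])

print(f"Memory capacity:  +{capacity_gain:.0f}%")
print(f"Memory bandwidth: +{bandwidth_gain:.0f}%")
```

Under those assumed figures, the rounded results match the 76% and 43% quoted above.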
Nvidia also talked about a new set of benchmark results based on the industry standard MLPerf testing suite built by an independent machine learning and AI engineering consortium called MLCommons. In these tests, the Nvidia H200 system set six new records including one for Stable Diffusion text-to-image creation.
For investors interested in what Nvidia has planned next, the company teased performance of its upcoming Blackwell architecture and the B100 GPU slated for 2024 as offering somewhere north of 2x the performance of the H200 in a GPT-3 AI inference benchmark.
Nvidia also walked out numerous partners and customers to back up both the product claims and long-term commitments. The EuroHPC Joint Undertaking showed the plan for a supercomputer named ‘Jupiter’ with almost 24,000 GH200 nodes, the combination of Arm-based Grace CPUs and Hopper GPUs. Another was the Texas Advanced Computing Center system called ‘Vista’ that will use both GH200 Grace Hopper and Grace CPU superchips. TACC sees this move to Arm+Nvidia as a bridge to its next-generation, 10x larger supercomputer called ‘Horizon.’
The next computing frontier – Quantum Computing
Perhaps the most substantial change to computing in our lifetime will be the move to quantum systems. Quantum computing differs from classical computing in that it depends on quantum mechanics, specifically the ideas of quantum superposition and entanglement, rather than simple electrical impulses used in classical compute.
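For readers who want a concrete picture of superposition and entanglement, the two ideas can be illustrated with a tiny classical simulation. This is a sketch in plain Python, not CUDA Quantum or any Nvidia API, and the function names are hypothetical. It tracks the four amplitudes of a two-qubit system, puts the first qubit into superposition with a Hadamard gate, then entangles the pair with a controlled-NOT gate.

```python
import math

# Two-qubit state as four amplitudes, ordered |00>, |01>, |10>, |11>.
# Both qubits start in 0, so all of the amplitude sits on |00>.
state = [1.0, 0.0, 0.0, 0.0]


def hadamard_on_first(s):
    """Apply a Hadamard gate to the first (left) qubit, creating superposition."""
    h = 1 / math.sqrt(2)
    a00, a01, a10, a11 = s
    return [h * (a00 + a10), h * (a01 + a11), h * (a00 - a10), h * (a01 - a11)]


def cnot_first_controls_second(s):
    """Flip the second qubit when the first is 1, i.e. swap |10> and |11>."""
    a00, a01, a10, a11 = s
    return [a00, a01, a11, a10]


bell = cnot_first_controls_second(hadamard_on_first(state))
probabilities = [abs(a) ** 2 for a in bell]

for label, p in zip(["00", "01", "10", "11"], probabilities):
    print(f"P({label}) = {p:.2f}")
```

Only the outcomes 00 and 11 end up with non-zero probability, so measuring one qubit fixes the other. Simulating this takes four numbers for two qubits, but the count doubles with every added qubit, which is one reason GPU-backed simulation is relevant to this field.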
For now, Nvidia does not have a QPU (quantum processing unit) in its stable of technology but that isn’t stopping the company from being involved in this developing technological space. Whether or not Nvidia chooses to build or perhaps buy a quantum processor at some point in the future, it is creating the supporting hardware and software ecosystem for quantum compute to make sure that it isn’t left out.
From a hardware perspective, nearly all quantum computing systems today have associated classical computing systems that are used to simulate or control the quantum systems. This can be for error calculation, systems control, or simply for secondary processing where quantum devices do not excel. Nvidia calls these “hybrid quantum systems” and it’s a way for them to ensure that quantum engineers and system researchers are working with Nvidia throughout the process.
Maybe even more ingenious is the creation of ‘CUDA Quantum’ which is the quantum equivalent to CUDA, the programming and software model that enabled Nvidia to dominate the AI and HPC spaces for the last 15 years. CUDA Quantum includes a high-level programming language that allows quantum system designers and application developers to write code that can run on both classical systems and quantum ones, diverting work between them or simply simulating the quantum portion.
During the company’s SC23 address, Nvidia’s VP of Hyperscale and HPC Ian Buck showcased its deep partnership with the growing quantum compute ecosystem including hardware builders, research centers, and more. Nvidia claims that 92% of the top 50 quantum startups currently use Nvidia GPUs and software, and that 78% of companies building and deploying quantum processors use CUDA Quantum as their programming model.
To me this represents a move that no other technology company today can make: enabling the high performance compute industry that is already tailoring its software for Nvidia GPUs, and ensuring that the next generation of quantum applications is created as “Nvidia-first” even before the company has a clear stake in the quantum computing hardware landscape.