Summary of "NVIDIA SC25 Fireside Chat"
Overview
Ian Buck, NVIDIA’s Vice President and General Manager of Hyperscale and HPC, delivered a keynote reflecting on NVIDIA’s supercomputing journey since 2006. He highlighted the evolution and future directions of accelerated computing in simulation, AI, and quantum computing. The session included a panel discussion with leaders from major supercomputing centers and labs, focusing on technological advances, ecosystem development, and future challenges.
Key Technological Concepts and Product Features
NVIDIA’s Supercomputing Evolution
- CUDA introduced in 2006 at SC (Supercomputing Conference).
- Innovations such as immersion cooling (2010), NVLink interconnects (2016), and the Grace Hopper CPU-GPU superchip (2022).
- Transition from PCIe GPU cards to side-mounted GPUs for better IO and multi-GPU connectivity.
- Current dominance: 88 of the top 100 supercomputers accelerated with NVIDIA GPUs.
Three Pillars of Modern Supercomputing
- Simulation: Traditional scientific simulations of natural phenomena at scales/timeframes impossible for physical experiments.
- AI: Explosive growth since 2016, applying AI to scientific discovery (e.g., climate modeling, CFD, seismology, drug discovery).
- Quantum Computing: Early stage, with GPUs helping simulate quantum algorithms (e.g., Shor’s algorithm at 50 qubits) and quantum processor design.
CUDA-X Ecosystem
- Over 6,000 CUDA-accelerated applications across disciplines.
- 600 million downloads, 7 million developers, 1,000+ SDKs and AI models.
- Specialized libraries for different domains: cuLitho (computational lithography), cuOpt (route and path optimization), PhysicsNeMo (AI physics), Holoscan (radio astronomy and sensor processing), ALCHEMI (materials science).
Notable Application Highlights
- Max Planck Institute’s Earth climate simulation at 1 km resolution simulating 146 days per day on 20,000+ Grace Hopper chips.
- Luminary Cloud’s AI physics model for spacecraft nozzle design reducing CFD runtime from hours to seconds.
- NASCAR CFD simulations accelerated from 11 hours to 48 minutes using NVIDIA GPUs.
- Real-time interactive digital twin of fusion reactor (General Atomics) built in NVIDIA Omniverse, integrating sensor data, physics, engineering, and AI.
- Real-time tsunami forecasting with AI-powered digital twins using seismic data.
NVIDIA Warp
- A Python package for physics simulations.
- Supports automatic differentiation enabling integration with AI frameworks (PyTorch, JAX).
- Allows physics simulations that can be optimized and solved interactively.
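Warp's differentiable kernels let a simulation's parameters be tuned with gradient-based optimization. As a rough illustration of that pattern (in plain Python with a finite-difference gradient rather than Warp's automatic differentiation, and with all names such as `simulate` and `loss` invented for the sketch):

```python
# Sketch of the optimization loop that differentiable simulation enables.
# NVIDIA Warp provides this natively via autodiff through GPU kernels; here
# we mimic the idea in plain Python with a finite-difference gradient.

def simulate(stiffness, steps=100, dt=0.01):
    """Damped spring: final position of a unit mass released from x = 1."""
    x, v = 1.0, 0.0
    for _ in range(steps):
        a = -stiffness * x - 0.5 * v   # spring force plus damping
        v += a * dt                    # semi-implicit Euler step
        x += v * dt
    return x

def loss(stiffness, target=0.2):
    """How far the simulated outcome is from the desired one."""
    return (simulate(stiffness) - target) ** 2

# Gradient descent on the physical parameter via central differences.
k, lr, eps = 1.0, 2.0, 1e-5
for _ in range(200):
    g = (loss(k + eps) - loss(k - eps)) / (2 * eps)
    k -= lr * g

print(round(loss(k), 6))  # loss should be near zero after optimization
```

With true automatic differentiation, as in Warp, the gradient comes from the simulation code itself instead of repeated re-runs, which is what makes coupling to PyTorch or JAX practical.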
Apollo AI Models
- A new family of pre-trained AI models focused on physics-based simulations (CFD, manufacturing, structural analysis).
- Demonstrated with aircraft wing design reducing simulation time from 8 hours to 5 seconds on RTX GPUs.
- Open-source with workflows for fine-tuning and inference.
Quantum Computing Integration
- GPU supercomputers simulate quantum algorithms at unprecedented scale.
- NVIDIA NVQLink: a reference architecture for connecting quantum processors to GPU supercomputers and processing their control signals.
- Collaboration with 23+ supercomputing centers worldwide on hybrid quantum-GPU-CPU systems.
- Full-chip simulation of quantum processors using thousands of GPUs.
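The scale problem is easy to see from first principles: an n-qubit statevector holds 2^n complex amplitudes, so memory doubles with every qubit. A minimal pure-Python sketch (the gate routine and variable names are illustrative, not any NVIDIA API):

```python
# Minimal statevector sketch of why quantum simulation stresses supercomputers:
# an n-qubit state needs 2**n complex amplitudes, so memory doubles per qubit.
import math

def hadamard_on_qubit(state, q, n):
    """Apply a Hadamard gate to qubit q of an n-qubit statevector."""
    h = 1.0 / math.sqrt(2.0)
    new = state[:]
    for i in range(2 ** n):
        if not (i >> q) & 1:           # pair basis states differing in bit q
            j = i | (1 << q)
            a, b = state[i], state[j]
            new[i] = h * (a + b)
            new[j] = h * (a - b)
    return new

n = 3
state = [0j] * (2 ** n)
state[0] = 1.0 + 0j                    # start in |000>
for q in range(n):                     # H on every qubit: uniform superposition
    state = hadamard_on_qubit(state, q, n)

probs = [abs(a) ** 2 for a in state]
print([round(p, 3) for p in probs])    # eight equal probabilities of 0.125

# Memory for a 50-qubit state at 16 bytes per complex amplitude:
print(2 ** 50 * 16 / 1e15, "PB")       # roughly 18 PB
```

At the ~50-qubit scale mentioned above, the statevector alone runs to petabytes, which is why such simulations are distributed across thousands of GPUs.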
NVLink and Interconnect Innovations
- NVLink: a high-throughput, low-latency shared-memory fabric connecting GPUs, CPUs, and accelerators.
- Enables a 10x improvement in AI inference throughput compared to previous generations.
- Upcoming Rubin Ultra systems will scale to 576 chips interconnected via NVLink.
- Introduction of co-packaged optics to reduce power consumption and improve reliability in data-center interconnects.
- NVLink Fusion IP enables third-party CPUs and accelerators (e.g., Arm-based designs) to join the NVLink ecosystem.
NVIDIA Vera CPU
- Next-generation NVIDIA-designed Arm-based CPU with custom Olympus cores.
- Twice the performance of Grace CPU.
- Supports 1.2–2 TB/s LPDDR5X memory bandwidth.
- Integrated chip-to-chip NVLink (NVLink-C2C) interconnect at 1.5 TB/s.
- Energy-efficient and optimized for HPC workloads like CFD and weather modeling.
- Central to NSF Horizon supercomputer.
New and Upcoming Supercomputers
- Horizon: NSF's largest academic supercomputer in the US, combining Grace Blackwell GPUs and Vera CPUs.
- Solstice and Equinox at Argonne National Laboratory, with 100,000 and 10,000 GPUs respectively.
- FugakuNEXT in Japan, pairing NVIDIA GPUs with Fujitsu CPUs.
- European AI factories and gigafactories with 100,000+ GPUs, built through public-private partnerships.
- 80+ new supercomputers announced at SC25, totaling over 300,000 GPUs and multiple zettaflops of compute.
Analysis and Insights from Panel Discussion
AI and HPC Integration
- Transition from batch job HPC to interactive, hybrid AI-HPC workflows.
- Shift from compute-centric to data-centric HPC ecosystems.
- AI as an integral tool across all scientific disciplines, including those not traditionally using HPC.
- Importance of coupling AI surrogates with HPC simulations for efficiency and new discovery modes.
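The surrogate pattern described above amounts to: run the expensive solver a modest number of times, fit a cheap model to the results, then query that model for fast design-space exploration. A toy sketch, with a least-squares quadratic standing in for a neural surrogate and `expensive_simulation` invented as a stand-in for an HPC solver:

```python
# AI-surrogate pattern in miniature: sample an expensive simulator, fit a
# cheap model, query the model instead. No libraries; a quadratic fit via
# the 3x3 normal equations stands in for training a neural surrogate.
import math

def expensive_simulation(x):
    """Stand-in for an HPC solver (e.g. a CFD run)."""
    return math.sin(x) + 0.1 * x * x

# 1) Sample the simulator at a few points (the costly step).
xs = [i * 0.2 for i in range(11)]          # 0.0 .. 2.0
ys = [expensive_simulation(x) for x in xs]

# 2) Fit y ~ c0 + c1*x + c2*x^2 by Gaussian elimination on the normal equations.
A = [[sum(x ** (i + j) for x in xs) for j in range(3)] for i in range(3)]
b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(3)]
for col in range(3):                        # forward elimination
    for row in range(col + 1, 3):
        f = A[row][col] / A[col][col]
        A[row] = [a - f * c for a, c in zip(A[row], A[col])]
        b[row] -= f * b[col]
c = [0.0, 0.0, 0.0]
for row in (2, 1, 0):                       # back substitution
    c[row] = (b[row] - sum(A[row][k] * c[k] for k in range(row + 1, 3))) / A[row][row]

surrogate = lambda x: c[0] + c[1] * x + c[2] * x * x

# 3) Query the surrogate cheaply; it tracks the simulator on this range.
err = max(abs(surrogate(x) - expensive_simulation(x)) for x in xs)
print("max error on samples:", round(err, 3))
```

The real workflows discussed in the session replace the quadratic with trained AI physics models (e.g., PhysicsNeMo or Apollo), but the economics are the same: many cheap surrogate queries per expensive simulation run.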
Software and Libraries
- Growing complexity of scientific codes necessitates reliance on well-maintained, optimized libraries.
- AI is transforming software development workflows (e.g., AI-assisted testing).
- Need for community effort to curate and update numerical libraries for evolving hardware.
- Open source and collaborative ecosystem critical for progress.
Supercomputer Design and Operation Challenges
- Power and cooling remain major constraints for scaling supercomputers.
- Networking and interconnect design are key to achieving high performance at scale.
- Software stability and maturity are as important as hardware innovation.
- Public-private partnerships help leverage cloud and commercial expertise for scientific supercomputing.
Quantum Computing Outlook
- Quantum computers will be integrated as accelerators within HPC systems.
- Near-term advances expected in quantum annealers and simulators.
- Logical qubit scale-up anticipated in 4–6 years to solve classically intractable problems.
- Hybrid quantum-classical programming models and hardware co-design are critical research areas.
Future Directions and Needs
- Focus on maintaining and evolving core numerical and AI libraries.
- Data curation and management to handle the scientific data deluge.
- Developing hybrid AI-HPC workflows and interactive supercomputing environments.
- Efficient utilization of hardware through mixed precision and emulation techniques.
- Extending AI models to handle noisy, high-dimensional scientific data governed by physical laws.
- Continued investment in software ecosystems and open frameworks to lower barriers for scientists.
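On the mixed-precision point above: the general principle is that careful compensation can recover accuracy lost to low-precision rounding. A classic stdlib-only illustration is Kahan summation (a toy example of the principle, not a technique from the talk):

```python
# Compensated (Kahan) summation: recovers accuracy lost to rounding when
# accumulating many small values, the same principle behind emulating
# higher precision on lower-precision hardware.

def kahan_sum(values):
    total = 0.0
    comp = 0.0                     # running compensation for lost low-order bits
    for v in values:
        y = v - comp
        t = total + y
        comp = (t - total) - y     # what was rounded away this step
        total = t
    return total

terms = [1.0] + [1e-16] * 1_000_000

naive = 0.0
for v in terms:                    # plain accumulation: each 1e-16 rounds away
    naive += v

print(naive)                       # 1.0 (the million tiny terms vanished)
print(kahan_sum(terms))            # ~1.0000000001 (compensation kept them)
```

Production HPC schemes (e.g., emulating FP64 arithmetic on lower-precision tensor cores) are far more elaborate, but rest on the same idea of tracking and reinserting rounding error.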
Tutorials, Guides, and Resources Highlighted
- NVIDIA Warp: Python physics simulation package with auto-differentiation, downloadable and demonstrated at the show.
- Apollo Models: Pre-trained AI physics models with open workflows for fine-tuning and inference, including real-time CFD simulation for aircraft wing design.
- Holoscan: open-source software for real-time AI filtering of radio astronomy data.
- NVIDIA ALCHEMI: AI models for chemistry and materials science, enabling exploration of vast chemical spaces.
- CUDA-X Libraries: Extensive ecosystem of CUDA-accelerated libraries and SDKs supporting diverse scientific applications.
Main Speakers / Sources
- Ian Buck: Vice President and General Manager, NVIDIA Hyperscale and HPC – keynote speaker and moderator.
- Thomas Lippert: Director, Jülich Supercomputing Centre (JSC).
- Jini Ramprakash: Argonne National Laboratory, involved with the Solstice supercomputer.
- Dan Stanzione: Executive Director, Texas Advanced Computing Center (TACC), which will host the Horizon supercomputer.
- Mohamed Wahib: Team Leader, RIKEN Center for Computational Science, involved with FugakuNEXT.
Conclusion
The NVIDIA SC25 Fireside Chat showcased the rapid evolution of supercomputing driven by NVIDIA's innovations in GPU acceleration, AI integration, and emerging quantum computing. The discussion emphasized the importance of hybrid AI-HPC workflows, advanced interconnects like NVLink, and the growing software ecosystem that supports scientific discovery across simulation, AI, and quantum domains.
Panelists highlighted ongoing challenges in power, data management, and software maturity, while expressing optimism about the future of supercomputing as a critical national and global infrastructure for science and industry.