NVIDIA announced several edge computing partnerships and products on Nov. 11 ahead of The International Conference for High Performance Computing, Networking, Storage and Analysis (aka SC22) on Nov. 13-18.
The High Performance Computing at the Edge Solution Stack includes the MetroX-3 InfiniBand extender; scalable, high-performance data streaming; and the BlueField-3 data processing unit for data migration acceleration and offload. In addition, the Holoscan SDK has been optimized for scientific edge instruments, with developer access through standard C++ and Python APIs, including for non-image data.
All of these are designed to address the edge needs of high-fidelity research and implementation. High performance computing at the edge addresses two major challenges, said Dion Harris, NVIDIA’s lead product manager of accelerated computing, in the pre-show virtual briefing.
First, high-fidelity scientific instruments process a large amount of data at the edge, and that data needs to be used more efficiently both at the edge and in the data center. Second, data migration challenges crop up when producing, analyzing and processing massive amounts of high-fidelity data. Researchers need to be able to automate data migration and decisions regarding how much data to move to the core and how much to analyze at the edge, all of it in real time. AI comes in handy here as well.
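As a rough illustration of that edge-versus-core decision, the sketch below splits an instrument's output between the link to the core and local processing. It is a deliberately simplified assumption, not anything NVIDIA or Zettar described: the function name `split_edge_core` and both rate parameters are hypothetical, and a real pipeline would also weigh latency, storage and the value of the data.

```python
from typing import Tuple


def split_edge_core(instrument_gbps: float, link_gbps: float) -> Tuple[float, float]:
    """Return (to_core_gbps, at_edge_gbps) for a steady data stream.

    Hypothetical policy: send as much as the link can carry to the core
    and keep whatever exceeds the link capacity for analysis at the edge.
    """
    if instrument_gbps < 0 or link_gbps < 0:
        raise ValueError("rates must be non-negative")
    to_core = min(instrument_gbps, link_gbps)
    at_edge = instrument_gbps - to_core
    return to_core, at_edge


if __name__ == "__main__":
    # Illustrative numbers only: a 400 Gb/s instrument behind a 200 Gb/s link.
    to_core, at_edge = split_edge_core(400.0, 200.0)
    print(f"to core: {to_core} Gb/s, at edge: {at_edge} Gb/s")
```

Under this policy, anything the link cannot carry in real time has to be reduced or analyzed where it is produced, which is the part of the workflow the announcements target.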
“Edge data collection instruments are turning into real-time interactive research accelerators,” said Harris.
“Near-real-time data transport is becoming desirable,” said Zettar CEO Chin Fang in a press release. “A DPU with built-in data movement abilities brings much simplicity and efficiency into the workflow.”
NVIDIA’s product announcements
Each of the new products announced addresses these challenges from a different direction. The MetroX-3 Long Haul extends NVIDIA’s InfiniBand connectivity platform to 25 miles (40 kilometers), allowing separate campuses and data centers to function as one unit. It’s applicable to a variety of data migration use cases and leverages NVIDIA’s native remote direct memory access capabilities as well as InfiniBand’s other in-network computing capabilities.
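To put that distance in perspective, the short calculation below converts the quoted range and estimates the one-way propagation delay over a fiber link of that length. The signal speed of roughly 200,000 km/s in optical fiber is a common rule-of-thumb assumption, not a figure from NVIDIA, and the result ignores switching and protocol overhead.

```python
KM_PER_MILE = 1.609344          # exact definition of the international mile
FIBER_KM_PER_S = 200_000.0      # assumption: light in fiber travels at about 2/3 of c


def miles_to_km(miles: float) -> float:
    """Convert miles to kilometers."""
    return miles * KM_PER_MILE


def one_way_delay_us(distance_km: float, speed_km_per_s: float = FIBER_KM_PER_S) -> float:
    """Estimate one-way propagation delay in microseconds."""
    return distance_km / speed_km_per_s * 1e6


if __name__ == "__main__":
    print(f"25 miles is about {miles_to_km(25):.1f} km")
    print(f"40 km of fiber adds roughly {one_way_delay_us(40):.0f} microseconds one way")
```

With these assumptions, a 40-kilometer span adds on the order of 200 microseconds in each direction from propagation alone.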
The BlueField-3 accelerator is designed to improve offload efficiency and security in data migration streams. Zettar demonstrated its use of the NVIDIA BlueField DPU for data migration at the conference, showing a reduction in the company’s overall footprint from 13U to 4U. Specifically, Zettar’s project uses a Dell PowerEdge R720 with the BlueField-2 DPU, plus a Colfax CX2265i server.
Zettar points out two trends in IT today that make accelerated data migration useful: edge-to-core/cloud paradigms and a composable and disaggregated infrastructure. More efficient data migration between physically disparate infrastructure can also be a step toward overall energy and space reduction, and reduces the need for forklift upgrades in data centers.
“Almost all verticals are facing a data tsunami these days,” said Fang. “… Now it’s even more urgent to move data from the edge, where the instruments are located, to the core and/or cloud to be further analyzed, in the often AI-powered pipeline.”
More supercomputing at the edge
Among the other NVIDIA edge partnerships announced at SC22 is a liquid immersion-cooled version of the OSS Rigel Edge Supercomputer, housed in TMGcore’s EdgeBox 4.5, from One Stop Systems and TMGcore.
“Rigel, along with the NVIDIA HGX A100 4GPU solution, represents a leap forward in advancing design, power and cooling of supercomputers for rugged edge environments,” said Paresh Kharya, senior director of product management for accelerated computing at NVIDIA.
Use cases for rugged, liquid-cooled supercomputers for edge environments include autonomous vehicles, helicopters, mobile command centers and aircraft or drone equipment bays, said One Stop Systems. The liquid inside this particular setup is a non-corrosive mix “similar to water” that removes the heat from electronics based on its boiling point properties, removing the need for large heat sinks. While this reduces the box’s size, power consumption and noise, the liquid also serves to dampen shock and vibration. The overall goal is to bring transportable data center-class computing levels to the edge.
Energy efficiency in supercomputing
NVIDIA also addressed plans to improve energy efficiency, with its H100 GPU boasting nearly twice the energy efficiency of the A100. The H100 Tensor Core GPU, based on the NVIDIA Hopper GPU architecture, is the successor to the A100. Second-generation multi-instance GPU technology dramatically increases the number of GPU clients available to data center users.
In addition, the company noted that its technologies power 23 of the top 30 systems on the Green500 list of the most energy-efficient supercomputers. Number one on the list, the Flatiron Institute’s supercomputer in New Jersey, is built by Lenovo. It includes the ThinkSystem SR670 V2 server from Lenovo and NVIDIA H100 Tensor Core GPUs connected to the NVIDIA Quantum 200Gb/s InfiniBand network. Tiny transistors, just 5 nanometers wide, help reduce size and power draw.
“This computer will allow us to do more science with smarter technology that uses less electricity and contributes to a more sustainable future,” said Ian Fisk, co-director of the Flatiron Institute’s Scientific Computing Core.
NVIDIA also talked up its Grace CPU and Grace Hopper Superchips, which look ahead to a future in which accelerated computing drives more research like that done at the Flatiron Institute. NVIDIA said data centers built on the new CPU and Superchips can get 1.8 times more work done for the same power budget than a similarly partitioned x86-based 1-megawatt HPC data center, where 20% of the power is allocated to the CPU partition and 80% to the accelerated portion.
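The arithmetic behind that comparison can be sketched as follows. The 1-megawatt budget, the 20/80 split and the 1.8x figure come from NVIDIA's claim; the function names are hypothetical, and the "same work at lower power" reading assumes throughput scales linearly with power, which NVIDIA did not state.

```python
from typing import Tuple


def partition_power(total_kw: float, cpu_share: float) -> Tuple[float, float]:
    """Split a power budget into (cpu_kw, accelerated_kw)."""
    if not 0.0 <= cpu_share <= 1.0:
        raise ValueError("cpu_share must be between 0 and 1")
    cpu_kw = total_kw * cpu_share
    return cpu_kw, total_kw - cpu_kw


def power_for_same_work(total_kw: float, work_ratio: float) -> float:
    """Power needed to match the baseline's output, assuming linear scaling."""
    if work_ratio <= 0:
        raise ValueError("work_ratio must be positive")
    return total_kw / work_ratio


if __name__ == "__main__":
    cpu_kw, accel_kw = partition_power(1000.0, 0.20)
    print(f"CPU partition: {cpu_kw:.0f} kW, accelerated portion: {accel_kw:.0f} kW")
    print(f"Power for the same work at 1.8x: {power_for_same_work(1000.0, 1.8):.0f} kW")
```

Read this way, the baseline spends about 200 kilowatts on CPUs and 800 kilowatts on accelerators, and a 1.8x gain would let the same workload run in roughly 556 kilowatts if scaling were linear.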