Overcoming the Biggest Risk to AI/HPC That No One Is Talking About: A Don’t-Miss Session at 7×24 Exchange Fall

The biggest risk to the data center industry’s major investments in AI/HPC is something that no one is talking about. It’s flying under the radar, but we need to talk about it if we want to prevent it from escalating into a crisis for the industry.

The biggest risk is not access to power and land for facilities. And it’s not access to capital or supply chain bottlenecks. Those are all significant challenges, but many of the smartest people in the room have been working on those since demand for high-density computing began accelerating nearly four years ago. The biggest risk no one is talking about is all about people.

The operational environment for high-density GPUs and servers is radically different from that of traditional enterprise/cloud servers. Direct-to-Chip liquid cooling is required to achieve adequate heat rejection, but introducing liquid to any data center brings tremendous risk. Even brief interruptions to liquid cooling can lead to temperature spikes that cause extremely expensive damage to AI/HPC equipment. Leaks in the liquid cooling systems can rapidly cause catastrophic equipment failures and endanger operators. And any incident of downtime comes at a high price in terms of lost revenue, SLA penalties and reputational damage. Those risks are real, and the price tag for those incidents is sky-high.

Current operational methodologies and employee skills simply aren’t a match for AI/HPC. To prevent the expensive incidents described above, organizations need a fundamentally new operational model, and they need operational teams with entirely new skillsets. Without those key ingredients, AI/HPC investments and momentum are destined to struggle.

That is why my colleague John Shultz’s presentation at 7×24 Exchange Fall this week is one of the most important sessions on the conference’s program. On October 20th, John’s session, “AI-Driven Direct-to-Chip Liquid Cooling: Operational Excellence, Best Practices, and World-Class Operator Training,” will provide attendees with a blueprint for establishing an operational model for AI/HPC that mitigates the significant risks I discussed above. His session will also map out the critical skillsets that operations teams need in order to de-risk Direct-to-Chip liquid cooling and AI/HPC operations.

John is the perfect person to lead the charge on this issue. He has spent the last decade working closely with GPU manufacturers like NVIDIA, leading financial investors in AI/HPC, data center operators, liquid cooling companies and others, and has become a preeminent expert on this issue. If you’re at 7×24 Exchange Fall, be sure to attend his session.

Or if you’re not attending the conference, please schedule a time to talk with John after the event to tap into his expertise in AI/HPC operational best practices.

Please also download his new white paper about AI/HPC Direct-to-Chip liquid cooling operations, which has invaluable information for mitigating risks to your AI/HPC investments. You can download the white paper here: https://salute.com/ai-hub/.

By Bob Proko, Vice President of AI/HPC Data Center Strategy & Development
