Nvidia GTC: Nvidia pivots from training to inference powerhouse

  • Nvidia is pivoting to inference, where analyst Jack Gold estimates 80-85% of AI workloads will land
  • Nvidia claims Vera Rubin's Groq 3 LPU integration delivers up to 35x inference throughput per megawatt
  • Nvidia's GTC keynote focused more on full-stack AI systems and software than chips, signaling a fundamental shift

NVIDIA GTC, SAN JOSE, CALIF. — Nvidia wants to own the next phase of AI. After dominating AI training, the company is now chasing inference. But the economics of that next phase are fundamentally different, analyst Jack Gold said.

"Nvidia is trying hard to reposition itself as the inference AI company, after it spent so much time being the premier training company over the past few years — and especially now that there are so many newcomers pushing into inference," said Gold, founder and principal analyst at J.Gold Associates, in an email bulletin. "We estimate 80%-85% of AI workloads will be inference in the next one to two years, so Nvidia must be a major player there."

The economics of inference differ sharply from those of training. Organizations deploying inference at scale won't commit capital at the same magnitude as those building frontier models from scratch. Nvidia's answer centers on its Vera Rubin platform: even as system costs rise, the architecture drives down the cost per token — the fundamental unit of AI output.
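To illustrate the cost-per-token framing, here is a minimal back-of-the-envelope sketch. All figures below are invented for the example — none come from Nvidia's announcement — and real deployments would also fold in power, networking and operations costs.

```python
# Hypothetical illustration of cost-per-token economics.
# Every number here is made up for the example.

def cost_per_million_tokens(system_cost_usd, lifetime_years,
                            tokens_per_second):
    """Amortized hardware cost per million output tokens."""
    seconds = lifetime_years * 365 * 24 * 3600
    total_tokens = tokens_per_second * seconds
    return system_cost_usd / total_tokens * 1_000_000

# A pricier system that generates tokens much faster can still
# come out cheaper per token than a cheaper, slower one.
old = cost_per_million_tokens(3_000_000, 4, 50_000)    # ~$0.48/M tokens
new = cost_per_million_tokens(5_000_000, 4, 500_000)   # ~$0.08/M tokens
print(f"old: ${old:.2f}/M tokens, new: ${new:.2f}/M tokens")
```

The point of the arithmetic is the one Nvidia is making: if throughput rises faster than system price, cost per token falls even as the capital outlay grows.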

Inference is cost-sensitive

Inference is a cost-sensitive compute structure, much like cloud hosting, Gold said. Convincing customers that expensive Nvidia systems can generate far more tokens, and therefore more revenue, is now a core strategic imperative — particularly as hyperscalers including Google, Amazon Web Services and Microsoft Azure develop their own custom silicon, and chip rivals such as Cerebras make inroads in the inference market.

Vera Rubin is the centerpiece of Nvidia's emerging AI strategy. The platform combines the new Vera CPU, Rubin GPU and Groq 3 LPU — the first chip from Nvidia's $20 billion Groq acquisition — into a rack-scale design that Nvidia claims delivers up to 35 times the inference throughput per megawatt of GPU-only configurations.

Dell Technologies has already built on Nvidia's new technology. The vendor cited more than 4,000 enterprise AI Factory deployments and announced new Vera Rubin-based hardware aimed at closing the gap between AI experimentation and production-scale deployment.

Embracing OpenClaw

Nvidia's embrace this week of OpenClaw, the open-source AI agent framework, is also strategically significant. Alongside endorsing the platform, Nvidia announced NemoClaw, an enterprise-hardened deployment path designed to give IT and compliance teams a sanctioned option and to head off "shadow AI" adoption.

Nvidia also announced an AI-RAN partnership with T-Mobile, reframing cell towers as distributed inference platforms. That got less keynote time than it deserved, given its implications for carrier revenue and the future of physical AI over 5G and eventually 6G, Gold said.

In another move to capture a share of that telco opportunity, Nvidia and Cisco unveiled an AI Grid reference architecture built on Nvidia RTX PRO Blackwell GPUs that pushes inference capacity deeper into carrier networks and closer to end users. Comcast is already trialing three use cases on the platform.

Overall, Nvidia's chip story is receding in favor of a systems story, Gold said. "It's interesting that Nvidia spent far more time on the overall AI factory and systems message than its chips," he said. "It shows it's now a full-stack AI systems company with hardware and software assets."