- Microsoft debuted its Maia 200 AI accelerator chip and system
- The chip has a beefy amount of memory and an Ethernet-based interconnect system
- It could help telcos offer differentiated AI services and lower costs — if Microsoft decides to make the chip available to partners
Cloud giant Microsoft just came out with Maia 200, the second generation of its custom AI accelerator. But it’s not just another chip. Analysts said Maia could give telcos a way to escape the dumb pipe trap and boost enterprise AI performance without increasing costs.
The new chip is the second from Microsoft, following the Maia 100 introduced in 2023. But there’s a notable difference: Maia 200 is Microsoft’s first “silicon and system platform optimized specifically for AI inference,” Microsoft’s Saurabh Dighe wrote. That means it was designed for efficiency, both in tokens delivered per dollar and in performance per watt.
In concrete terms, Maia 200 can deliver “30% better performance per dollar than the latest generation hardware in our fleet today,” Microsoft EVP for Cloud and AI Scott Guthrie wrote in a blog post.
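To put that figure in context, here is a minimal sketch of the arithmetic, using hypothetical instance prices and throughput numbers rather than anything Microsoft or Azure has published: at the same instance price, a 30% gain in performance per dollar works out to roughly 23% off the cost of each token.

```python
# Illustrative only: how per-token cost falls out of instance price and
# throughput. The dollar and tokens/sec figures below are hypothetical,
# not Azure or Maia 200 numbers.

def cost_per_million_tokens(instance_cost_per_hour: float,
                            tokens_per_sec: float) -> float:
    tokens_per_hour = tokens_per_sec * 3600
    return instance_cost_per_hour / tokens_per_hour * 1e6

baseline = cost_per_million_tokens(10.0, 5_000)        # ~$0.56 per 1M tokens
improved = cost_per_million_tokens(10.0, 5_000 * 1.3)  # ~$0.43 at +30% perf/$
print(f"{baseline:.2f} -> {improved:.2f}")
```

Whether savings on that order show up in customer pricing is a separate question, one the analysts return to below.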
Other headline features include a beefy memory subsystem, a network-on-chip for communication across clusters and memory subsystems, and an Ethernet-based scale-up interconnect built on an optimized AI Transport Layer (ATL) that runs over standard Ethernet.
Let’s break that down.
Gartner VP Analyst Chirag Dekate said the amount of memory in Maia is greater than that of chips offered by Microsoft’s peers. All that memory means Microsoft can “enable thinking and reason[ing] workloads that [today] are inherently memory bandwidth and memory capacity constrained,” Dekate said. If you’ve been following our coverage of the memory wall issue, you know this is a big deal.
“Overall, this chip is designed for extreme scale inference performance,” he said.
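To see why memory, rather than raw compute, is the constraint Dekate is pointing at, here is a rough back-of-envelope sketch; the model size, precision and bandwidth figures are hypothetical placeholders, not Maia 200 specs. During autoregressive decode, generating each token means streaming essentially all of the model’s weights from memory, so memory bandwidth caps single-stream throughput.

```python
# Back-of-envelope: why inference decode tends to be memory-bandwidth bound.
# All numbers here are hypothetical placeholders, not Maia 200 specifications.

def max_decode_tokens_per_sec(params_billions: float, bytes_per_param: float,
                              mem_bandwidth_gb_s: float) -> float:
    """Upper bound on single-stream decode throughput: each generated token
    requires reading roughly all model weights from memory once."""
    weight_bytes = params_billions * 1e9 * bytes_per_param
    return mem_bandwidth_gb_s * 1e9 / weight_bytes

# A 70B-parameter model with 8-bit weights on an accelerator offering
# 4 TB/s of memory bandwidth (hypothetical figure):
print(max_decode_tokens_per_sec(70, 1.0, 4_000))  # ~57 tokens/s per stream
```

More memory capacity and bandwidth raise that ceiling and let larger reasoning models sit closer to the compute, which is what the “memory wall” framing above is about.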
Telco potential
Maia 200’s Ethernet interconnect could be good news for telcos looking to offer value-added services and escape the dumb pipe data mover trap, Dekate said.
“There is an opportunity to engage in strategic dialogue, strategic conversations with Microsoft” to potentially host a Maia instance or Maia-powered data centers in their mix, he said. Doing so would enable them to deliver new, higher-performing agentic AI capabilities (think sovereign AI) with better energy and cost efficiency.
It could take a while for any such partnerships to bear fruit, though.
Microsoft has already deployed Maia 200 in its Central US Azure data center region and plans to light it up in its West US 3 region next. But it appears to be using the chip initially for internal use cases, like making Copilot work better.
Microsoft has made the Maia SDK available, meaning enterprises will eventually be able to run their own workloads on the chip. But that capability will likely roll out over time, with Maia initially offered to a select set of enterprise customers before becoming generally available.
The cost question
Dekate said he expects the first-order impacts of Maia 200’s rollout to come in the form of better results for enterprises and telcos using Microsoft Copilot. But as Maia 200 eventually becomes available for enterprise workloads, customers will be able to run more complex workloads using data they have stored in Microsoft Azure.
“They’re going to be able to use and run more complex agentic workflows, more reliable agentic workflows” across a range of models at a better cost, Dekate said.
And that cost bit is important. J. Gold Associates Founder Jack Gold told Fierce that as inference workloads ramp, the cost of operations will become an increasingly important factor for companies looking to run AI. That’s especially so for Azure users.
“For users of Azure, cost is an issue, especially heavy users,” Gold told Fierce. “Microsoft has created a chip it sees as the most capable for giving it a cost structure for AI inference workloads that is lower than the use of general purpose GPUs.” Essentially, he said, Microsoft has cut out the middleman by having the chip made for it directly rather than buying chips from Nvidia or AMD.
But it remains to be seen what Microsoft will do with those savings: whether it will lower costs for customers or parlay them into higher margins for Azure instances, Gold said.
Dekate agreed.
“Not only is it efficient, but cost profiles are likely going to be very complementary,” he said. “Now, it’ll be up to Microsoft to decide and define their pricing models…but theoretically, the capability is there.”