Q&A: Why AI speed, not scale, will define the next global digital economy

AI’s first act was all about scale. Bigger models, more parameters, more GPUs, more data centers. For the past several years, the industry has been locked in an arms race defined by size — who could train the largest models, raise the most capital and pour the most concrete. But as AI moves out of the lab and into the real economy, a different constraint is asserting itself: speed.

Inference latency, not training scale, is becoming the limiting factor for AI’s next phase. When AI shifts from being an occasional novelty to a system that humans and machines rely on continuously — inside software development, enterprise operations, industrial control systems and autonomous workflows — seconds matter. Then milliseconds matter. Eventually, delays become existential.

That shift is why Cerebras has emerged as one of the most closely watched companies in AI infrastructure. Rather than scaling out clusters of GPUs, Cerebras took a radically different approach: collapsing an entire data-center-class architecture onto a single wafer-scale chip (albeit a very big one), eliminating switches, cabling and much of the latency that defines conventional AI systems. Investors have taken notice. In early 2026, Cerebras closed a $1 billion Series H round, bringing total capital raised to nearly $3 billion and valuing the company at roughly $23 billion.

In this wide-ranging conversation, Cerebras CEO Andrew Feldman explains why AI speed — especially at inference — will define the next global digital economy; why agentic AI turns latency into a hard economic boundary; and why many of the industry’s most comfortable assumptions, including long-standing software moats, are beginning to erode. Along the way, we veer into Bell Labs, quantum-computing hype, space-based data centers, China’s infrastructure strategy, and the uncomfortable reality that AI is becoming industrial infrastructure — not a consumer toy.

What emerges is a picture of an AI future shaped less by scale for its own sake, and more by realism, discipline and unforgiving physics.

Interview: Andrew Feldman, CEO, Cerebras

Steve Saunders: Hey Andrew, how are you?

Andrew Feldman: I’m doing well, Steve. Good to see you.

Saunders: It’s been a few years since we last chatted, and you’re sort of putting the lie to that old saying that there are no second careers in American business. You’ve always been successful, but you’re having this staggering success with Cerebras right now. For anyone who’s been living at the bottom of a well, give us the view from the satellite. What is Cerebras, and why is it so important to the communications and AI industry?

Feldman: We’re an AI computer company. We design processors and build computer systems that are specialized in making AI fast.

We chose a very unusual path to do this. We built what is essentially the largest chip in history — a chip the size of a dinner plate. Normally, size is the enemy in chip design. You don’t want big chips in phones or laptops. But AI has very special characteristics, and we realized we could accelerate it for less power and less money using a very large chip.

By doing that, we eliminate switches, a ton of cabling, and a lot of other gunk that adds latency, cost, and power draw. The result is the fastest training and the fastest inference in the world, bar none.

Saunders: AI feels like it’s moved very quickly from novelty to something much more serious. What changed?

Feldman: AI went from being something you’d say, “That’s cool, do it once,” to “I use this every day, 25 times a day.” When that happens, you can’t wait 30 seconds for a response.

If you’re engaging with AI constantly — say in a coding IDE [Integrated Development Environment] — you can’t wait 20 seconds, 10 seconds, even five seconds. You want real-time engagement. As the world really began using AI, the performance advantage of our platform became obvious. People feel it. They see it.

That’s a big reason we’ve done pretty well over the last several months.

A data center on a chip

Saunders: You’re basically doing the 1966 film Fantastic Voyage, shrinking a data center down and putting it on silicon.

Feldman: That’s actually not a bad analogy.

If you think about what a data center is, it’s lots of small compute tiles — CPUs or GPUs — tied together with cables into clusters. If instead you could put a million tiles on a single piece of silicon, you could avoid all those cables and switches. You’d have a data center on a chip.

That was the core idea. For 75 years in the computer industry, that was a holy grail. Nobody had ever successfully yielded a chip this big.

It wasn’t easy. It took a long time. It took some gut-wrenching mistakes. It took a lot of capital. But the result is extraordinary.

Saunders: Whose decision was this? When did someone first write this on a whiteboard and say, “Let’s try this”?

Feldman: Around April of 2016. My co-founders and I came to the conclusion that we could do it and that it would provide an enormous advantage.

At that point, we weren’t talking about large language models. The market was convolutional neural networks. The transformer architecture didn’t exist yet. But what we knew was that AI had characteristics very different from most compute problems — and that GPUs would struggle as the market unfolded.

Saunders: You stuck your finger in the air ten years ago, looked way out into the future, and got it right. That must be one of the best feelings in business — after all the criticism.

Feldman: It is an amazing feeling. What’s the German word — schadenfreude or something like that?

Saunders: Freud’s sister, right? Charlene Freud? (laughs)

Feldman: (laughs) Exactly.

Inference becomes the battleground

Saunders: Someone said to me recently, “Nvidia rules training — Cerebras rules inference.” Is that a fair summary?

Feldman: I wouldn’t say we’re kings yet — maybe a rising prince. Nvidia is the 800-pound gorilla, and you have to respect that. They’re not going to roll over.

But right now, there are no benchmarks showing anything other than that we’re the fastest at inference. That’s a nice place to be.

Inference is where AI actually becomes useful. CUDA [Compute Unified Device Architecture] doesn’t matter in inference. Nobody uses CUDA there. Many people don’t even use PyTorch.

When a technology becomes useful, it leaves the small expert community and gets used by everyone else who just wants answers. If you’re building a web app and want a chatbot, you don’t want to think about CUDA any more than you want to think about credit-card clearing when you use Stripe.
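
In practice, that abstraction boils down to a single API call. The sketch below is illustrative only: it uses a hypothetical OpenAI-compatible endpoint, and the base URL, model name and key are placeholders rather than any documented Cerebras interface. The point is what the developer experience looks like when you "just want answers": no CUDA, no PyTorch, just a request and a response.

```python
# Hypothetical example: a web-app backend calling a hosted,
# OpenAI-compatible inference endpoint for a chatbot feature.
# The base_url, api_key and model name are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.example-inference.com/v1",  # placeholder endpoint
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="example-llm",  # placeholder model name
    messages=[{"role": "user", "content": "Summarize today's support tickets."}],
)

print(response.choices[0].message.content)
```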

Saunders: If I look 10 years out, I don’t see a consumer AI economy. I see an industrial one. Latency starts to look like five nines.

Feldman: That’s exactly right. In an industrial AI economy, delays are death.

Agentic AI, infrastructure and what comes next

Saunders: I’m working on a report about networked agentic AI. Does that matter to you?

Feldman: Very much so.

Agentic AI is just AI that takes an action, AI that doesn’t stop when it provides you with a result. For a coding AI, that means it will compile the code and then run it. You can ask it to run comparisons and do audit steps for you.

This is when AI moves from being useful to being a necessity — when it has some agency to it. That’s when labor productivity goes through the roof.
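
The compile-run-audit loop Feldman describes can be sketched in a few lines. The outline below is illustrative only: llm() is a placeholder for whatever code-generation model sits behind it (for instance, the kind of hosted endpoint sketched earlier), and nothing here reflects a specific Cerebras product. What it shows is structural: every repair and audit round is another inference call, which is why latency compounds once the AI has agency.

```python
# Illustrative sketch of an agentic compile-run-audit loop.
# llm() is a placeholder for any code-generation model call.
import subprocess
import sys
import tempfile

def llm(prompt: str) -> str:
    """Placeholder: send the prompt to a code-generation model and return its text."""
    raise NotImplementedError

def agentic_fix_loop(task: str, max_rounds: int = 3) -> str:
    """Generate code, execute it, feed errors back, then audit the result."""
    code = llm(f"Write a Python script that {task}. Return only code.")
    for _ in range(max_rounds):
        with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
            f.write(code)
            path = f.name
        # "Take an action": actually run the generated code.
        result = subprocess.run([sys.executable, path],
                                capture_output=True, text=True, timeout=60)
        if result.returncode == 0:
            # Audit step: ask the model to review its own output before accepting it.
            return llm(f"Audit this script for correctness and risky behavior:\n{code}")
        # Don't stop at the first answer; repair and retry.
        code = llm(f"This script failed with:\n{result.stderr}\nFix it:\n{code}")
    return code
```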

Saunders: When I travel, it feels like LLMs are a North American obsession. China feels much more focused on machine learning, robotics, and infrastructure. Are we overweighted here?

Feldman: There are dangers on multiple fronts.

China has invested enormously in robotics and power infrastructure; it sees robotics as foundational to the next industrial revolution. The U.S. grid, by contrast, is a disaster: 1950s technology, sometimes older.

World models — systems that understand the three-dimensional world — are critical. We haven’t had our “GPT moment” there yet, but it’s coming.

Saunders: What about quantum computing? Is that your next act?

Feldman: Quantum is fascinating — but it’s not going to help us in AI.

It solves problems we’ve never had the tools to attack before; it’s not going to make our existing problems any easier. And there’s a lot of BS floating around.

Saunders: I made the mistake of asking researchers at Bell Labs if we shouldn’t put quantum computers in space because it’s really cold up there. Apparently, it’s not cold enough, and this was a stupid question. Let me tell you, the nerds were angry that day, my friend.

Feldman: Yes — if you want to really annoy them, bring up the space data centers too.

Saunders: Last question — what’s next?

Feldman: We’ve got a lot of work to do. AI is becoming infrastructure. And fast AI changes everything.

Stephen M. Saunders MBE is a communications analyst and USPTO-registered inventor examining how digital infrastructure — 5G, cloud, and AI — is reshaping industry, power and society, as well as underpinning the emerging, ubiquitous global digital economy. As anchor of FNTV and a longtime industry insider, he focuses less on growth narratives and more on execution, risk and how hyperscale technology is distorting markets, governance and society at scale.