The AI model market is expanding rapidly: the number of major AI models released annually has grown more than 10x since 2019. In the experience of Benny Yufei Chen, co-founder of Fireworks AI, chasing the latest models has become a costly distraction for organizations. As enterprise leaders work to operationalize AI, the bottleneck Chen sees is rarely in model intelligence, but in operational alignment.
After years of working on PyTorch infrastructure at Meta, Chen launched Fireworks AI to help companies run and optimize their large language models (LLMs) without sacrificing adaptability. Having watched hundreds of AI systems move from research into production, he's noticed a pattern: organizations tend to underestimate what it takes to operationalize AI, and often to their detriment.
"We're here to help businesses scale AI sustainably—so they don't scale into bankruptcy," Chen says.
In this episode of MindMakers, delight.ai CEO John Kim sits down with Chen to unpack what it really takes to scale AI in production, and the four requirements most organizations overlook.
Requirement 1: Inference efficiency
The economics of enterprise generative AI are becoming a serious concern. According to Andreessen Horowitz, model inference costs can account for up to 80–90% of the total cost of running large AI systems in production. As usage grows, these costs can outpace the value the system generates. Most organizations can build a good AI pilot, says Chen, but fewer ask whether they can operate it sustainably at scale.
Meanwhile, token consumption is rising fast. Currently, Fireworks AI processes trillions of tokens per day across its thousands of customers, and Chen believes demand could grow dramatically as companies move toward AI-first workflows. What appears to be a simple interaction with an AI agent can involve dozens of internal reasoning steps and tens of thousands of tokens behind the scenes.
“Inference was always taking the lion’s share,” he says. As organizations move from experimentation to production, Chen insists they must consider computational efficiency.
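To make the arithmetic concrete, here is a minimal back-of-the-envelope sketch of how hidden reasoning steps multiply token spend. Every number and name in it (request volume, steps per request, tokens per step, price per million tokens, the function name) is a hypothetical assumption for illustration, not a figure from Chen or Fireworks AI.

```python
def monthly_inference_cost(requests_per_day, steps_per_request,
                           tokens_per_step, price_per_million_tokens,
                           days=30):
    """Estimate monthly token spend for an agent-style workload.

    All inputs are illustrative assumptions; real pricing usually
    distinguishes input and output tokens, which this sketch ignores.
    """
    tokens_per_request = steps_per_request * tokens_per_step
    total_tokens = requests_per_day * tokens_per_request * days
    return total_tokens / 1_000_000 * price_per_million_tokens


# Hypothetical workload: 10,000 requests per day at $1.00 per million tokens.
single_call = monthly_inference_cost(10_000, 1, 1_000, 1.0)
agentic = monthly_inference_cost(10_000, 24, 1_000, 1.0)

print(f"Single call per request: ${single_call:,.2f} per month")
print(f"24 reasoning steps per request: ${agentic:,.2f} per month")
```

Under these assumed numbers, the same request volume costs 24 times more once each interaction fans out into 24 internal steps, which is why per-request token counts matter as much as request counts when planning for scale.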
Requirement 2: Model optionality
Chen always reminds organizations that today's best model will soon be surpassed. This creates a genuine strategic challenge for leaders: how do you pick the right model for now while staying adaptable in a rapidly evolving market?
"It's very hard for buyers to pick which model to go for," Chen says. Committing too deeply to a single provider poses real risks: model switching can mean rewriting AI prompts, re-running evaluations, and validating that AI performance hasn't regressed since deployment. This is why Fireworks AI supports a broad portfolio of frontier, proprietary, and open-source AI models, allowing customers to experiment and adapt without unnecessary lock-in that they may regret.
Requirement 3: Disciplined AI evaluation
AI systems don’t just need to be scalable and flexible; they also need to be trustworthy. This requires rigorous AI evaluation, yet Chen finds that most organizations skip this step. "When talking with our customers, we find a large majority of them don't have evals," he says. Gartner predicted that over 60% of generative AI deployments would fail to meet expectations due to poor governance and inadequate evaluation, and that prediction is being borne out in 2026.
Evaluation frameworks, such as delight.ai’s readiness framework, establish the feedback loops teams need to test, iterate, and deploy AI safely. Without them, organizations can't answer basic questions: Is the new model actually better? Did a recent update introduce regressions? Is the system improving over time? As AI systems grow more complex, that lack of measurement becomes a significant blind spot, hiding serious risks and costs.
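As a rough illustration of the feedback loop described above, the sketch below scores a baseline model and a candidate model on the same fixed test set and flags a regression. It is a minimal sketch under stated assumptions: the stub models, the test cases, the exact-match scoring, and the function names are all hypothetical, and it does not represent delight.ai's readiness framework or any Fireworks AI tooling.

```python
from typing import Callable

EvalCase = tuple[str, str]  # (prompt, expected answer)


def accuracy(model: Callable[[str], str], cases: list[EvalCase]) -> float:
    """Fraction of cases where the model's answer matches exactly
    (ignoring case and surrounding whitespace)."""
    correct = sum(
        1 for prompt, expected in cases
        if model(prompt).strip().lower() == expected.strip().lower()
    )
    return correct / len(cases)


def compare(baseline, candidate, cases, tolerance=0.0):
    """Score both models on the same cases and flag a regression."""
    base = accuracy(baseline, cases)
    cand = accuracy(candidate, cases)
    return {"baseline": base, "candidate": cand,
            "regressed": cand < base - tolerance}


# Hypothetical test set and stub "models" standing in for real API calls.
cases = [("2+2", "4"), ("capital of France", "Paris"),
         ("opposite of hot", "cold"), ("3*3", "9")]
baseline_answers = {"2+2": "4", "capital of France": "Paris",
                    "opposite of hot": "cold", "3*3": "6"}
candidate_answers = {"2+2": "4", "capital of France": "paris",
                     "opposite of hot": "warm", "3*3": "6"}

report = compare(lambda p: baseline_answers[p],
                 lambda p: candidate_answers[p], cases)
print(report)
```

Real evaluations typically need richer scoring than exact match, but even a small fixed test set like this gives a team a repeatable way to answer whether a new model or update is actually better before it ships.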
Requirement 4: Organizational alignment
Even the best-designed AI system will underperform without the right people and processes to support it. The biggest gains from AI come from redesigning existing workflows around AI automation, not “bolting it on.” However, Gartner research shows 60% of organizational change efforts fail to meet their goals, largely because teams underestimate the human and operational work involved.
Chen encourages organizations to let AI do its part. AI can only automate what humans have clearly defined in quality documentation, but within those limits he often reminds companies: "Don't try to put yourself on the hook for repetitive work. Make a bot be on the hook for it." AI lets teams accomplish more in less time, especially when encoding skills and designing AI agents. “It is now much easier to be a 10× engineer than before,” Chen says.
The companies that benefit most from AI don’t simply give employees new tools; they redesign work itself as they redesign their AI workflows. Doing so helps ensure operational and cultural alignment, and it supports better AI performance over time as human teams grow their skills in parallel with these new AI systems.
Enterprise AI success begins with genuine readiness
Each of these overlooked requirements to operationalize AI—efficient inference, model flexibility, robust evaluation, and fully redesigned workflows—points to a larger theme: AI readiness. Readiness is a measure of an organization's preparedness to adopt AI safely and sustainably across its technology, operations, and culture.
According to Cisco’s 2025 AI Readiness Index, organizations that are highly AI-ready are four times more likely to move pilots into production and 50% more likely to see value from their investments. Readiness makes AI innovation a repeatable, cross-functional process, helping teams stay agile in a fast-moving market by sidestepping the pitfalls associated with failed AI projects. And ultimately, it's what builds trust.
"People watch you for your consistency," Chen says. "If you're not consistent, it is very hard to win trust." That consistency comes down to four things: infrastructure that doesn't fail, evaluation systems that catch problems early, workflows designed for repeatability, and teams that know how to supervise automated systems. When these elements are in place, supporting model intelligence, organizations gain the confidence to automate more complex workflows from a proven foundation of responsible AI that operates safely and reliably at scale.
To learn more about how Fireworks AI scales AI models with maximum efficiency and flexibility, listen to the full MindMakers podcast on Apple Podcasts or Spotify.
Wondering if your organization is truly AI-ready?
Take the AI readiness assessment for CX from delight.ai. Or contact sales to learn more.




