The seven hidden costs that decide whether your AI is worth running

Figure: The SaaS margin waterfall. Per-outcome revenue starts at 100 percent, then seven production-AI cost buckets each chip it down: build and integration, inference and hosting, orchestration, human-in-the-loop, failures and retries, compliance, and maintenance. What's left is roughly 15 percent margin, far below the roughly 85 percent margin a SaaS business used to keep. Proportions are illustrative.

Read the latest tech commentary, and you might think we are witnessing the end of Software as a Service. But the obituaries for SaaS are missing the real plot twist.

We aren't witnessing the end of software subscriptions. We are witnessing the complete rewrite of the unit economics of end-user applications.

For the past decade, the SaaS model has been built on a beautiful, predictable foundation: build it once, sell software seats infinitely, and enjoy 80-90% gross margins. But today, the paradigm is shifting from selling software seats to selling labor outcomes.

When you make that shift, you absorb the variable compute costs of actually doing that work. You are no longer just providing the digital desk; you are providing the digital worker. Goodbye, 80-90% gross margins. Hello, variable cost nightmares.

Where the cost actually comes from

I regularly speak with engineering leaders and founders who are successfully deploying AI internally. They are seeing leverage, but at what cost?

When they scale these agentic workflows using frontier models, the costs skyrocket. The biggest challenge in enterprise AI today isn't proving its value; it's balancing the immense leverage AI provides against its compounding, variable costs. The leverage is real, but so is the bill. And the question is the same whether you're building AI in-house or buying it from a vendor: how do you know what this costs to run at scale?

If you want to survive the transition to agentic workflows and protect your margins, you have to stop looking at flat subscription fees and master the seven hidden cost buckets of production AI.

1 Build & Integration (Amortized)

Moving from a weekend prototype to a production-grade agent requires heavy upfront engineering. It is not just about API calls; it is about deep enterprise system integrations, building custom infrastructure so business teams can securely build their own agents, and implementing rigorous safety controls. This foundational work is a massive capital expenditure before a single user outcome is delivered.

2 Inference & Hosting

Raw API and GPU costs spiral quickly. Massive context windows and models that burn "hidden reasoning tokens" to think through complex problems will eat your margins alive. As organizations scale, the difference between running a specialized internal model and hitting a premium frontier model for every minor query becomes the difference between profitability and bankruptcy.
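To make that gap concrete, here is a minimal sketch of per-query inference cost. Every number in it is hypothetical: the prices are not real vendor rates, the token counts are invented, and it assumes hidden reasoning tokens are billed at the output-token rate. Check your own provider's pricing before relying on any of it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPricing:
    """Hypothetical per-million-token prices in USD (not real vendor rates)."""
    name: str
    input_per_million: float
    output_per_million: float


def query_cost(pricing, input_tokens, output_tokens, reasoning_tokens=0):
    """Cost of one call in USD.

    Assumption: hidden reasoning tokens are billed at the output rate.
    """
    billed_output = output_tokens + reasoning_tokens
    return (
        input_tokens * pricing.input_per_million
        + billed_output * pricing.output_per_million
    ) / 1_000_000


frontier = ModelPricing("frontier (hypothetical)", 10.0, 40.0)
small = ModelPricing("small internal (hypothetical)", 0.5, 1.5)

# Same minor query on both: 2,000 tokens of context in, 300 tokens of answer out.
# Only the frontier model is assumed to spend 4,000 hidden reasoning tokens.
frontier_cost = query_cost(frontier, 2_000, 300, reasoning_tokens=4_000)
small_cost = query_cost(small, 2_000, 300)

print(f"{frontier.name}: ${frontier_cost:.4f} per query")
print(f"{small.name}: ${small_cost:.4f} per query")
print(f"ratio: {frontier_cost / small_cost:.0f}x")
```

Under these made-up numbers, the reasoning tokens dominate the frontier bill, not the visible answer. Multiply either figure by your real query volume to see which side of the margin line you land on.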

3 Orchestration Overhead

Multi-agent workflows act as a direct cost multiplier. Because Agent A's output becomes Agent B's input, your token consumption compounds with every back-and-forth interaction. It is all fun and games until two autonomous bots get stuck in a polite, infinite feedback loop and you wake up to a $47,000 API bill.
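The compounding is easy to underestimate, so here is a rough simulation of it. It assumes the simplest possible setup: every turn re-sends the full transcript as input and appends one fixed-size reply. The function name, the token sizes, and the budget cap are all illustrative, not taken from any particular framework.

```python
def simulate_handoffs(turns, prompt_tokens, tokens_per_reply, max_total_tokens=None):
    """Simulate agents passing a growing transcript back and forth.

    Assumption: every turn re-sends the full history as input, then
    appends one reply of `tokens_per_reply` tokens.
    Returns (turns_completed, total_input_tokens, total_output_tokens).
    """
    history = prompt_tokens
    total_in = total_out = 0
    completed = 0
    for _ in range(turns):
        next_spend = history + tokens_per_reply
        # Budget guard: stop before a turn that would exceed the cap.
        if max_total_tokens is not None and total_in + total_out + next_spend > max_total_tokens:
            break
        total_in += history
        total_out += tokens_per_reply
        history += tokens_per_reply
        completed += 1
    return completed, total_in, total_out


# Ten handoffs with no cap: input tokens grow much faster than output tokens.
done, tokens_in, tokens_out = simulate_handoffs(10, prompt_tokens=1_000, tokens_per_reply=500)
print(done, tokens_in, tokens_out)

# A runaway loop of 1,000 requested turns, stopped early by a 50,000-token budget.
capped_done, capped_in, capped_out = simulate_handoffs(
    1_000, prompt_tokens=1_000, tokens_per_reply=500, max_total_tokens=50_000
)
print(capped_done, capped_in + capped_out)
```

Input tokens grow roughly with the square of the number of turns, which is why a hard cap on turns or total tokens is the cheapest insurance against the infinite polite loop.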

4 Human-in-the-Loop Labor

Autonomous AI is not 100% autonomous. In these early stages of enterprise AI adoption, you simply cannot leave agents to make final, high-stakes decisions alone. The cost of human experts reviewing outputs, handling escalations, and fixing edge-case errors adds expensive, manual labor right back into your margins.

5 Failures & Retries

There is a false economy in using "cheap" models. A cheaper, less capable model that hallucinates or fails, requiring three automated retries to get a task right, often ends up costing far more in total tokens and debugging time than an expensive model that nails the workflow on the first attempt.
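One way to check the false economy is to price the expected cost per completed task rather than the cost per call. The sketch below assumes attempts are independent with a fixed success rate, that retries stop at a cap, and that a task which fails every attempt falls back to a human fix. The prices, success rates, and fallback cost are all hypothetical.

```python
def expected_attempts(success_rate, max_attempts):
    """Expected number of attempts made when retrying until success or the cap.

    Attempt k+1 only happens if the first k attempts all failed.
    """
    return sum((1 - success_rate) ** k for k in range(max_attempts))


def expected_cost_per_task(cost_per_attempt, success_rate, max_attempts, fallback_cost):
    """Expected cost including a fallback (e.g. a human fix) if every attempt fails."""
    all_fail = (1 - success_rate) ** max_attempts
    model_spend = cost_per_attempt * expected_attempts(success_rate, max_attempts)
    return model_spend + all_fail * fallback_cost


# Hypothetical numbers: one first try plus three retries, $2.00 human fallback.
cheap = expected_cost_per_task(0.004, success_rate=0.30, max_attempts=4, fallback_cost=2.00)
premium = expected_cost_per_task(0.012, success_rate=0.95, max_attempts=4, fallback_cost=2.00)

print(f"cheap model:   ${cheap:.4f} per task")
print(f"premium model: ${premium:.4f} per task")
```

With these inputs the model that is three times cheaper per call comes out far more expensive per task, almost entirely because of the fallback cost on tasks it never gets right. The conclusion flips if the cheap model's success rate is high enough, which is exactly the measurement worth taking.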

6 Compliance & Security

For enterprise adoption, governance is everything. Before internal teams can actually use these agents, you must build systems that comply with strict data privacy laws. The infrastructure required for data de-identification, complete audit logs, and a secure environment can cost more than the model inference itself.

7 Ongoing Maintenance

Production AI is never set-and-forget. It requires continuous MLOps, prompt tuning, and rigorous regression testing as underlying models update, drift, or get deprecated.
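Put together, the seven buckets form the margin waterfall shown in the figure at the top. Here is a minimal sketch of that arithmetic. The per-bucket shares are hypothetical placeholders chosen only so that the result matches the figure's illustrative 15 percent; they are not measurements, and your own split will differ.

```python
# Hypothetical shares of per-outcome revenue, in whole percent.
# Chosen only to reproduce the illustrative ~15% margin; not real data.
COST_BUCKETS = [
    ("Build & integration (amortized)", 10),
    ("Inference & hosting", 25),
    ("Orchestration overhead", 10),
    ("Human-in-the-loop labor", 20),
    ("Failures & retries", 8),
    ("Compliance & security", 7),
    ("Ongoing maintenance", 5),
]


def margin_waterfall(buckets, revenue_pct=100):
    """Return (bucket name, share, revenue remaining) after each bucket."""
    remaining = revenue_pct
    steps = []
    for name, share in buckets:
        remaining -= share
        steps.append((name, share, remaining))
    return steps


steps = margin_waterfall(COST_BUCKETS)
for name, share, remaining in steps:
    print(f"{name:<34} -{share:>3}%  -> {remaining:>3}% left")

final_margin = steps[-1][2]
print(f"Margin per outcome: {final_margin}%")
```

Replacing the placeholder shares with measured per-outcome costs turns this from an illustration into the actual unit-economics check.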

The bottom line

Whether you are building enterprise software or consumer apps, the fundamental math of software has changed. If your product is still priced per seat while your backend operates as a variable-cost digital workforce, the math will eventually catch up with you, regardless of how many VC rounds you have raised.

The teams that come out ahead won't be the ones with the cleverest prompts. They'll be the ones who ran the real cost math before they committed, built only what they genuinely needed to own, bought the rest, and walked away from the parts that were never going to pay off. That math is toil, and most teams excited about AI skip it until the bill arrives.

Run the cost math before you commit

Knowing what an agentic system costs to run at scale, and which parts are worth owning versus buying, is the diligence I embed with teams to do: the unit economics, the architecture underneath them, and the build/buy/walk-away call, made before the bill arrives.
