Why FinOps for AI Is No Longer Optional

AI workloads have shattered the logic that once governed cloud cost management. They don’t scale linearly, behave predictably, or expose their decision-making. From a FinOps perspective, the ground has shifted: the model is no longer just a consumer of resources; it has become a decision-maker, silently spending money on your behalf in a far from transparent way.

And the cloud vendors? They haven’t exactly brought clarity either. Their pricing is vague to say the least: built around tokens, abstract usage tiers, and hidden GPU allocation. But to be fair, it’s likely just as unclear to them. How do you make AI usage truly transparent when even you, as the provider, don’t have a clean way to map compute cycles and model selection to real cost?

Wall Street wants profits on AI, or the bubble bursts. Some of you know what happens when stock market bubbles burst; it is not the best news in the world ;-). In this new arms race for intelligence, the hyperscalers themselves are struggling to fund massive infrastructure at sustainable margins. They need consumption to rise, and visibility isn’t part of that equation (for now).

The Windowless Restaurant

Think of an AI service as a windowless all-you-can-eat restaurant. There is plenty of food on the buffet and you can eat whatever you want. The choice is enormous, so it is the perfect spot for an individual to decide what will give a satisfying answer to the hungry lion in their stomach. So the story continues… A hungry customer goes into the restaurant and out comes a satisfied one. Without windows you cannot spy on the customer and see what put the lion back to sleep.

Was it a grilled cheese or a kilogram of caviar? The same goes for AI questions: whatever the question, you hit enter and it goes into a black box. The service decides how it will answer. Out of the black box comes an answer… What happened inside, nobody really knows. Each call looks cheap, fractions of a cent per thousand tokens, but under the hood the system autonomously chooses which model to use, how much data to load, and how much compute to consume.

From a FinOps standpoint, it’s economic madness: a variable cost per transaction, no predictable logic, and no guaranteed value. A finance director once told me that breadcrumbs are also bread, her way of saying that small amounts add up to big piles of money in the end.

The Four Layers Where Costs Multiply

To understand why AI spending spirals, you need to know where the costs actually live. AI workloads have roughly four interdependent layers, each amplifying the other.

  1. The Infrastructure Layer: CPUs, GPUs, memory, and networking.

    The physical layer. In managed services you rarely see it, but it’s there: GPU bursts, parallel inference, autoscaling clusters. Costs can double in minutes.

  2. The Data Layer: what’s being read and how wide the context is.

    Every request loads prompts, embeddings, and documents.

    A broad context might improve quality but also multiplies tokens. Unbounded retrieval often means scanning entire data lakes for a single answer.

  3. The Model Layer: the economic multiplier.

    Earlier generations were named Curie and Davinci; now we have GPT-4o, Claude 3, Gemini 1.5, and Llama 3 70B.

    Heavier models deliver nuance and reasoning at several times the price per token.

    Model right-sizing is the new compute right-sizing.

  4. The Inference Layer: how often and how long models are invoked.

    Chatbots that lose context re-ask the same question. Pipelines retry automatically. One small feedback loop, and your budget is gone.

Together, these layers create cost amplification: every attempt to optimize accuracy at one layer increases the computational and data load in the layers beneath it.
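To make that amplification concrete, here is a minimal arithmetic sketch of a per-request cost across the four layers. All prices, token counts, and retry counts are made-up illustrations, not any vendor’s real rates:

```python
# Illustrative cost-amplification arithmetic; every number here is invented.
def request_cost(prompt_tokens, retrieved_tokens, output_tokens,
                 price_per_1k_tokens, retries=1):
    """Cost of one logical request: data-layer tokens plus inference output,
    times the model-layer price, times the inference-layer retry count."""
    total_tokens = prompt_tokens + retrieved_tokens + output_tokens
    return retries * (total_tokens / 1000) * price_per_1k_tokens

# Small model, tight context, one attempt:
cheap = request_cost(500, 1_000, 300, price_per_1k_tokens=0.0005)

# Premium model, unbounded retrieval, an automatic retry loop:
expensive = request_cost(500, 50_000, 1_200, price_per_1k_tokens=0.01, retries=3)

print(f"{cheap:.4f} vs {expensive:.4f}")
```

Same question, same user: the second configuration costs more than a thousand times the first, purely because the data, model, and inference layers each multiplied the bill.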

The Accountability Paradox

In traditional FinOps, you know who spun up a VM or attached a disk. With AI, that transparency collapses.

  • FinOps and finance teams have no insight into which models, parameters, or prompt strategies developers choose.

  • Data scientists and engineers optimize for accuracy, not cost.

  • Business owners celebrate user growth — unaware that every new query inflates spend exponentially.

The result is a governance black hole:

Those who create value don’t know the price. Those who pay the price don’t understand the value.

I’ve seen it firsthand, and I still have some war scars to remind me. A client burned through an entire project budget in four days because nobody had a kill switch, alerting, or budget caps in place. The service just kept eating the buffet. In this case it was a learning algorithm that had a lot to learn over the weekend. Its hunger to learn the full dataset burned all the money for the project. That puts your feet back on the ground really fast after a recharging long weekend ;-).
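The missing controls in that story fit in a few lines of code. This is a hypothetical `BudgetGuard` sketch, not any vendor’s API: a hard cap that trips before the weekend eats the budget, with an alert as spend approaches the limit:

```python
class BudgetExceeded(RuntimeError):
    """Raised when the hard spending cap would be breached."""
    pass

class BudgetGuard:
    """Hypothetical kill switch: track cumulative spend, alert near the cap,
    and refuse further calls once the cap is reached."""
    def __init__(self, cap_eur, alert_at=0.8, alert_fn=print):
        self.cap = cap_eur
        self.alert_at = alert_at      # fraction of cap that triggers an alert
        self.alert_fn = alert_fn
        self.spent = 0.0
        self._alerted = False

    def charge(self, cost_eur):
        # Kill switch: refuse the call before the cap is breached.
        if self.spent + cost_eur > self.cap:
            raise BudgetExceeded(f"cap of {self.cap} EUR reached")
        self.spent += cost_eur
        # Event-based alert instead of a monthly surprise in the invoice.
        if not self._alerted and self.spent >= self.alert_at * self.cap:
            self._alerted = True
            self.alert_fn(f"warning: {self.spent:.2f}/{self.cap} EUR spent")

guard = BudgetGuard(cap_eur=100.0)
guard.charge(79.0)   # fine, below the alert threshold
guard.charge(5.0)    # crosses 80% of the cap: the alert fires
# guard.charge(20.0) would raise BudgetExceeded instead of spending
```

Wire `charge()` in front of every model call and the buffet closes by itself, long before Monday morning.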

Putting Windows in the Restaurant

FinOps for AI isn’t about slowing innovation, it’s about making consumption visible.

You need to design transparency into the system itself:

  • Define the menu: limit datasets and context windows so models can’t crawl entire archives for a single answer.

  • Portion control: set caps on token count, runtime, and request frequency.

  • Deterministic routing: specify exactly which models are allowed, and block auto-escalation to premium variants.

  • Event-based monitoring: real-time alerts on anomalies, not a monthly surprise in the invoice.

  • Traceability: every model call, dataset, and output must tie back to an owner and a business purpose.

  • Put a human in the loop: say hi to flesh and blood. Sometimes it is fine to route the answer to a human. Yes, the cost might be higher, but the grey matter usually still absorbs things that are genuinely useful to the answer.
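The first three controls above can be sketched as a simple request gate. Model names and limits here are illustrative assumptions, not a real provider API:

```python
# Define the menu: only approved models, each with a bounded context window.
ALLOWED_MODELS = {"small-chat": 4_000, "large-reasoning": 16_000}

def gate_request(model, context_tokens, max_output_tokens):
    """Reject requests that stray off the menu or past their portion."""
    # Deterministic routing: no silent auto-escalation to premium variants.
    if model not in ALLOWED_MODELS:
        raise ValueError(f"model {model!r} is not on the menu")
    # Define the menu: the context may not crawl the entire archive.
    if context_tokens > ALLOWED_MODELS[model]:
        raise ValueError("context window exceeds the cap for this model")
    # Portion control: bound the output tokens per request.
    if max_output_tokens > 1_000:
        raise ValueError("output token budget exceeds the cap")
    return {"model": model, "max_tokens": max_output_tokens}

gate_request("small-chat", 2_500, 800)             # passes the gate
# gate_request("premium-xl", 2_500, 800) raises: off-menu model
```

Logging each gated request with its owner and purpose then gives you the traceability bullet almost for free.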

Without a clear link between cost and value, you’re not doing FinOps — you’re just processing bills.

The Maturity Gap

Few organizations are ready for this. FinOps for AI demands alignment from the top down and the bottom up:

  • Executives who recognize AI cost as strategic risk.

  • Engineers who understand that efficiency has a financial dimension.

  • FinOps teams who adapt tooling to token-driven economics.

It’s not just about maturity; it’s about shared accountability. AI’s speed is fine, as long as everyone understands what’s accelerating.

Before the Buffet Opens

Cloud vendors earn money per token, not per unit of efficiency. Their incentive is volume. If you don’t set the menu, someone else will, and they will profit from every extra bite.

FinOps for AI isn’t a brake on innovation. It’s the seatbelt. We all know that Volvo opened up its three-point seatbelt patent because it understood the design would save a lot of lives! So think about that statement when looking at AI ;-).

FinOps for AI is what keeps experimentation from turning into a fiscal black hole. It’s the discipline that turns AI from a cost center into a predictable value engine: real value, measurable in units or whatever metric you come up with, to prove it is not money down the drain but drives new money into your organisation. Until we start putting windows in the restaurant, AI will remain dazzling on the outside and dangerously expensive on the inside.

If you want to learn more about this topic, I can highly recommend the Finops.org website, as this question keeps the community busy looking for appropriate answers. You can also invest some money in formal training. For now, the available modules lack some real depth on the topic, but they are still a good way to learn, as understanding this will be super important in your day-to-day FinOps life! Of course, I’m happy to think along and share some thoughts… if you want a coffee chat, happy to set it up.

