By Renaud Deraison

Here is a fun game: you are a CTO and the AI bill just arrived

Uber burned its full-year 2026 AI coding budget by April. The CTO went back to the drawing board — not because the tools were bad, but because nobody could tie a single dollar of token spend to a single shipped change. The agents are fine. The visibility layer is the problem. Here is what that looks like, and what changes when every agent session is a structured record instead of a wall of scrollback.

Here is a fun game. You are a CTO. You have given your engineers a credit card with no statement, and now the credit card company would like a word. Uber's CTO put a number on the word last month: the full year's AI coding budget, gone by April. The interesting part wasn't the size of the bill. It was the second sentence — "I'm back to the drawing board because the budget I thought I would need is blown away already." Which, translated out of polite executive, is: we paid the invoice, we know exactly what it cost, we cannot tell you what it bought, and we would like to stop doing that.

Look, the modern way to buy software is to give engineers a thing that bills by the token, tell them to use it judiciously, and then wait. The Information reported last month that Uber's CTO, Praveen Neppalli Naga, did the waiting and discovered that his full-year 2026 AI coding budget was already gone in April, which, as schedules go, is ambitious. Claude Code usage at the company nearly doubled in a quarter; by March, 84% of Uber engineers were classified as agentic coding users, and good for them. Per-engineer spend ranged from roughly $500 to $2,000 a month, with the CTO himself burning $1,200 in a two-hour personal demo. That works out to $600 an hour, which is roughly the hourly billing rate of a junior associate at a midtown law firm, who would at minimum produce a timesheet. Uber's overall R&D line was $3.4 billion in 2025, so it isn't that they didn't have the room. It's that the model went looking for more room than they had, found some, and billed for it.

The post you would expect from a security-and-isolation company at this point is "use a sandbox." This is not that post. We wrote that one already, because of course we did. This is the other half of the same problem, the part where the invoice arrives, you can read it, and you cannot answer the only question anyone wants to ask, which is: fine, but for what.

The invoice is fine. The invoice is the problem.

Per-seat pricing was very easy to think about, and you should appreciate the per-seat era now that it is gone. You bought 200 seats of GitHub Copilot at $19 each, you got a line item for $3,800 a month, and when someone asked "what did we get for that," the answer was a shrug shaped like the industry — "tab completions, probably" — and everyone went back to their meetings. The thing about a shrug is that it scales. A $3,800 shrug is fine. A $1.8 million shrug starts to attract attention from the part of the building that has a fiduciary duty.

Token-priced agentic tooling is a different shape, and you have to think about it with your face. Two engineers on the same team, working on the same kind of feature, can differ in spend by a factor of forty. (Forty.) One of them opens a fresh session, asks for one thing, gets the thing, closes the session — $50, done, goes home and makes pasta. The other lets a parallel pack of agents hammer on a refactor for six hours, retries when the build breaks, retries again because a different model thought the first retry was wrong, retries a third time because at this point it is personal. Both engineers shipped code. One shipped at $50 of model time. The other shipped at $2,000. The invoice does not say which is which. The IDE does not say which is which. Finance gets a single aggregate number, engineering gets a Slack thread saying "Claude was amazing today," and neither of those, with respect, is a measurement.

The thing that makes this hard isn't the price. It's that none of the spend is attached to anything you can audit a quarter later. And so a PR landed. The PR has a diff. The diff has commits. The commits have authors. The author is a human. The human used an agent at some point — maybe. Used which agent? Which session? Which prompt? For how long? Doing what? The trail goes cold somewhere around the terminal scrollback, which has been closed, because nobody scrolls back. There is a name for this in finance, and it is "shrinkage," though in software we have not yet bothered to name it.

[Figure: "Today: the invoice arrives, the trail goes cold." Left panel, CTO / Finance: invoice.pdf, $1,847,302.18, period April, attribution aggregate. Middle panel, unattributed token spend: per-developer session counts and dollar totals (each marked with a question mark), prompts not retained, tool calls in scrollback only, files written, shell commands, and repo touched all unknown, model known per key. Right panel, shipped this month: PRs #4218, #4231, #4244, and #4257 merged, human author known, AI involvement, session id, and prompt id unknown. The trail ends there.]

The middle box is everything you bought. It is also the only thing you cannot see.

What May 2026 looks like in most engineering orgs: dollars flow out of the company in one big pipe, work flows out the other side in lots of little pipes, and the part in the middle — the part you actually paid for — is fog. The CFO can see the bill. The engineering manager can see the PRs. The thing connecting the two is a closed terminal window and a vibe.

Anyway, the shape of this problem is older than agentic coding, and if you have been in this industry long enough to be tired you will recognize it instantly. Cloud spend had the same arc. Five years of "the AWS bill is enormous and nobody knows why," then a generation of FinOps tooling started attaching dollars to teams and services and individual requests, and now the AWS bill is enormous and people know why, which, modest as that sounds, is the whole game. The fix was not to use AWS less. It was to make the spend legible.

Agentic coding is currently in pre-FinOps cloud. The spend is real, the tools are good, the productivity is genuinely there — none of that is in dispute. What's missing is the connective tissue between "this engineer, on this repo, in this session" and "these tokens, these commands, these files, this PR." Until that sentence exists, every dollar on the invoice is paid on faith, and the conversation with your CFO is one of those conversations.

What "visibility" has to mean, beyond the dashboard with the big number.

Everyone says "observability" and everyone nods, and almost nobody means the same thing by it. Coding agent vendors will happily show you a dashboard. The dashboard has token meters. It has adoption percentages. It has a slide-ready chart that goes up and to the right, which is the chart-shape of all software in 2026. None of this is an answer. "Your developers consumed 2.3 billion input tokens this week" is not an answer; it is a restatement of the bill in a larger font.

The thing a CTO actually needs, in order to defend a budget or raise one, comes in four parts.

Adoption you can show on a slide

How many of your engineers actually used a coding agent this week. For how long. On which projects, in which repos. Not seat licenses issued, not API key holders, not "people enrolled in the program" — sessions initiated, by a human, on a thing the org can name.

Every file the agent touched

Per session, per repo, per engineer — the exact set of files an AI agent created, modified, or deleted, with diffs, tied to the engineer who launched the session and the model they used. The unit of work is "a file mutation," not "a token." Tokens are how the vendor bills you. Files are what you have to ship.

Every command, every source

Every shell command the agent ran inside the session, every file it read, every tool it called, every API it hit. Captured live, retained centrally, queryable by team, by repo, by model. "What did the agent install yesterday" becomes a query, instead of an archaeology dig with a flashlight and a junior SRE.

The full dialogue, archived

Prompts, model responses, tool calls — the entire transcript. Somewhere stable. Reviewable by security, sampleable by engineering leadership, exportable into the same retention archive you already feed for email and chat. The session is now a record, not a memory of a thing somebody once typed.

Notice what isn't on that list. Token counts are not on the list. They are on the invoice. The point of the list is to put the invoice next to the thing it bought, so a finance team can do the division, an engineering team can stop arguing from vibes, and everyone can spend the meeting on the actual question.
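Read as a record layout, the four parts might look something like the sketch below. The field names and the dollars_per_file helper are hypothetical, chosen for illustration; this is not Bromure's schema. The two sample sessions reuse the $50 and $2,000 figures from earlier, with made-up file lists.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SessionRecord:
    """One agent session. All field names here are illustrative assumptions."""
    session_id: str
    engineer: str
    repo: str
    model: str
    files_changed: List[str] = field(default_factory=list)  # every file the agent touched
    commands: List[str] = field(default_factory=list)       # every command it ran
    transcript: List[str] = field(default_factory=list)     # the archived dialogue
    spend_usd: float = 0.0                                   # joined in from the invoice


def dollars_per_file(record: SessionRecord) -> Optional[float]:
    """Spend divided by file mutations; None when the session changed nothing."""
    if not record.files_changed:
        return None
    return record.spend_usd / len(record.files_changed)


# Two made-up sessions: the tidy one and the six-hour refactor.
tidy = SessionRecord("s-1", "alice", "checkout", "claude",
                     files_changed=[f"src/f{i}.py" for i in range(10)],
                     spend_usd=50.0)
sprawl = SessionRecord("s-2", "bob", "api", "claude",
                       files_changed=[f"src/g{i}.py" for i in range(4)],
                       spend_usd=2000.0)

for rec in (tidy, sprawl):
    print(rec.engineer, dollars_per_file(rec))
```

The division is the easy part. The record is what makes it possible, because the invoice alone never carries the denominator.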

How Bromure does the plumbing.

The agentic coding feature in Bromure was, in fairness, originally built for the security half of this conversation. Each coding agent runs inside a disposable Linux VM on your Mac. The VM has access only to the project folders you mounted; it has no SSH keys, no AWS credentials, no GitHub token sitting on disk. A credential broker on the host swaps stub tokens for real ones, but only at the wire, only for whitelisted endpoints. That is the story we told already, in which a poisoned npm package tries to walk off with your secrets and walks into a wall instead.
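For readers who want the broker idea in concrete terms, here is a minimal sketch of "swap the stub for the real token, but only for allowlisted hosts." Everything in it is an assumption made for illustration: the host list, the token values, and the broker_headers function are hypothetical and do not describe Bromure's implementation.

```python
from typing import Dict

# Hypothetical values for illustration only.
ALLOWED_HOSTS = {"api.github.com"}
REAL_TOKENS = {"stub-token-123": "real-token-held-on-host"}


def broker_headers(host: str, headers: Dict[str, str]) -> Dict[str, str]:
    """Return outbound headers, swapping a stub bearer token for the real one
    only when the destination host is on the allowlist."""
    out = dict(headers)  # never mutate the caller's headers
    scheme, _, token = out.get("Authorization", "").partition(" ")
    if host in ALLOWED_HOSTS and scheme == "Bearer" and token in REAL_TOKENS:
        out["Authorization"] = f"Bearer {REAL_TOKENS[token]}"
    return out


stub = {"Authorization": "Bearer stub-token-123"}
print(broker_headers("api.github.com", stub))    # real token goes out
print(broker_headers("evil.example", stub))      # stub goes out unchanged
```

The point of the shape is that the VM only ever holds the stub, so anything that exfiltrates it gets a string that is worthless off the allowlist.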

The observability story is the same plumbing, used for a different reason. The hypervisor sits between the agent and everything it touches. The agent doesn't open a file; the VM does. The agent doesn't run a shell command; the VM does. The agent doesn't make an API call; the proxy on the host does. Every one of those operations is — has to be, because the security half of the job demands it — a named event, with a timestamp, a session ID, an engineer identity, and a payload. Bromure already records all of this locally. It is called the Session Tracer and it ships with Agentic Coding today.
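A named event with a timestamp, a session ID, an engineer identity, and a payload is a small data structure. The sketch below assumes a hypothetical TraceEvent type and hypothetical kind labels ("shell", "file_write", "api_call"); the real Session Tracer format is not described in this post, so treat the names as placeholders.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, Tuple


@dataclass(frozen=True)
class TraceEvent:
    """One observed operation. Field names and kinds are assumptions."""
    timestamp: datetime
    session_id: str
    engineer: str
    kind: str      # hypothetical labels: "shell", "file_write", "api_call"
    payload: str   # the command line, file path, or request summary


def summarize(events: Iterable[TraceEvent]) -> Counter:
    """Count events per (session_id, kind) pair."""
    counts: Counter = Counter()
    for event in events:
        counts[(event.session_id, event.kind)] += 1
    return counts


now = datetime(2026, 5, 4, 12, 0, tzinfo=timezone.utc)
events = [
    TraceEvent(now, "s1", "alice", "shell", "npm test"),
    TraceEvent(now, "s1", "alice", "file_write", "src/cart.ts"),
    TraceEvent(now, "s1", "alice", "shell", "git status"),
    TraceEvent(now, "s2", "bob", "api_call", "POST /v1/messages"),
]
for key, n in sorted(summarize(events).items()):
    print(key, n)
```

Once every operation is an event of this shape, per-session and per-engineer rollups are a fold over a list and not a forensic exercise.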

The piece that closes the loop for CTOs (and CFOs, and CISOs, and the procurement person who would like to know what they procured) is the cloud side. When the Bromure client on a developer's Mac is enrolled into your organization, those local traces stop being a local debugging aid and start being a structured record streamed to your Bromure enterprise server. Per engineer. Per session. Per project. Per model. Filterable, exportable, retainable. The product copy on the agentic coding page calls this "AI usage monitoring" and currently labels it coming soon, which, for the avoidance of doubt, this post is partly the reason for.

[Figure: the same pipeline with a hypervisor in the middle. Top, developer Macs, one VM per agent session: alice (claude-code on monorepo/checkout, trace streaming), bob (cursor on monorepo/api, four hours running, flagged high spend / low diff), carol (codex on monorepo/docs, flagged tight spend / tight diff). Middle, hypervisor + credential proxy: every file, every command, every API call is a named event. Bottom, Bromure enterprise server: a structured per-session record with columns session_id, engineer, repo, model, files Δ, cmds, spend; a sample query, SELECT engineer, repo, SUM(spend), SUM(files_changed) FROM sessions WHERE week = '2026-W19' GROUP BY engineer, repo; exportable as CSV · JSON · CSV-for-finance · prompts-for-security. Output for CTO / Finance / CISO: $/PR shipped ($63.40 median), spend per repo (checkout $14.2k, api $32.8k), outliers (bob/api: high $, low Δ). The question now has an answer.]
The same picture once you've put a hypervisor in the middle of it. Each agent session runs inside a per-developer VM. The hypervisor and the credential proxy already see every shell command, every file, every outbound request — they have to, in order to do the security half of their job — and the enterprise server is where those events land in a table you can finally divide the invoice by. Claude Code is still Claude Code, Cursor is still Cursor, Codex is still Codex; the agents are unchanged. The middle box, the one that used to be fog, is now a database.
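The sample query in the figure runs as written against any SQL table of that shape. Here is a minimal sketch using Python's built-in sqlite3 module and the four sample rows from the figure. Several things are assumptions: the column names files_changed and week, the choice to put every row in week 2026-W19, and the reading of the fourth row as 11 files and 94 commands (the flattened figure does not make that split certain). The session IDs are truncated in the original and are left that way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE sessions (
           session_id TEXT, engineer TEXT, repo TEXT, model TEXT,
           files_changed INTEGER, cmds INTEGER, spend REAL, week TEXT)"""
)

# Sample rows from the figure; the week column is an assumed addition.
rows = [
    ("s_8a31…", "alice", "checkout", "claude", 25, 312, 47.20, "2026-W19"),
    ("s_4f02…", "bob", "api", "cursor", 2, 1184, 1907.55, "2026-W19"),
    ("s_b71d…", "carol", "docs", "codex", 31, 38, 12.04, "2026-W19"),
    ("s_3c8e…", "alice", "checkout", "claude", 11, 94, 18.90, "2026-W19"),
]
conn.executemany("INSERT INTO sessions VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows)


def weekly_rollup(week: str):
    """Spend and files changed per engineer and repo for one week."""
    cursor = conn.execute(
        """SELECT engineer, repo, SUM(spend), SUM(files_changed)
           FROM sessions
           WHERE week = ?
           GROUP BY engineer, repo
           ORDER BY engineer, repo""",
        (week,),
    )
    return cursor.fetchall()


for engineer, repo, spend, files in weekly_rollup("2026-W19"):
    print(f"{engineer:6} {repo:9} ${spend:>9.2f} {files:>3} files")
```

The only change from the figure's query is a bound parameter for the week and an ORDER BY so the output is stable.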

The reason the events are reliable is not a sidecar inside the agent, and it is not a wrapper around the model API, both of which the agent could in principle work around if it felt like it. The events are reliable because the agent is running inside a VM whose hypervisor has to know about every command and every file regardless of whether anyone is asking it to. The observability is, in a sense, free. You already paid for it. The bill said "isolation." The thing you got is also a log. And so you get the same record whether the engineer used Claude Code or Cursor in CLI mode or Codex or Aider or some internal agent we have never heard of — because the unit of recording is "thing the VM did," not "thing the vendor decided to expose this quarter."

What the conversation a quarter from now sounds like, instead.

The Uber number is a useful one to think with, because nothing about it is a failure. 84% adoption. Code shipped. A CTO who actually rolled up his sleeves and used the thing, which, by the way, more CTOs should do, even at $1,200 a demo. What broke was the conversation a quarter later. "Should we double down? Pull back? Tier our seats? Push people to cheaper models for some classes of work?" None of those decisions are makeable from an aggregate token bill. They are all perfectly makeable from a per-engineer, per-session, per-repo table. The decisions did not get harder. The table got removed.

A handful of questions that become answerable, more or less in the order that we hear them from people who are trying to defend a line item:

  • Which engineers are getting outsize value, and what are they doing differently? Look at the engineers whose spend-to-files-changed ratio is in the good corner of the distribution. Read a few of their session traces. Some of what they are doing well is style, some is technique, some is just which model they reach for. None of it is visible without the record. ("How does Alice ship so much code for so little money" is the kind of question that produces a useful internal lunch-and-learn, but only if Alice's sessions are not a closed terminal somewhere.)
  • Is the spend concentrated in one team, one repo, one kind of work? If 60% of your AI bill is coming from a single legacy service and the agent is mostly thrashing on flaky tests, that is a finding. It is also not a finding about AI.
  • Which projects are productive with which model? "Claude is better than Cursor" is a tweet. "Claude shipped 3x more file changes per dollar than Cursor on our front-end repo last month, and the reverse on our Go service" is a procurement conversation.
  • What did the agent install, and where? This is the security team's question, and it's the same query against the same table. Every npm install, every pip install, every apt-get the agent ran, per session, per repo, filterable by package name. The day a poisoned package shows up on a registry — which, gestures vaguely at most weeks of the calendar — the question "did any of our agents touch this" is a WHERE clause instead of a fire drill.
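The last question is the easiest to make concrete. The sketch below assumes each session record carries its captured command lines as plain strings; the record layout and both function names are hypothetical. It recognizes only the three installers named above in their simplest form (with an optional leading sudo), so aliases like pip3 or npm i would need their own entries.

```python
import re
import shlex
from typing import Dict, List

INSTALLERS = {("npm", "install"), ("pip", "install"), ("apt-get", "install")}


def installs_package(command: str, package: str) -> bool:
    """True if the command is an npm/pip/apt-get install naming the package."""
    try:
        tokens = shlex.split(command)
    except ValueError:          # unbalanced quotes and similar
        return False
    if tokens and tokens[0] == "sudo":
        tokens = tokens[1:]
    if len(tokens) < 3 or (tokens[0], tokens[1]) not in INSTALLERS:
        return False
    names = set()
    for token in tokens[2:]:
        if token.startswith("-"):   # skip flags such as -y or --save-dev
            continue
        scoped = token.startswith("@")          # npm scopes: @scope/name
        head = token[1:] if scoped else token
        # Drop a version pin such as left-pad@1.3.0 or requests==2.31.0.
        head = re.split(r"[@=<>~!]", head, maxsplit=1)[0]
        names.add("@" + head if scoped else head)
    return package in names


def sessions_touching(sessions: List[Dict], package: str) -> List[Dict]:
    """Sessions in which any recorded command installed the package."""
    return [s for s in sessions
            if any(installs_package(c, package) for c in s["commands"])]


sessions = [
    {"session_id": "s1", "repo": "api",
     "commands": ["npm install left-pad@1.3.0", "npm test"]},
    {"session_id": "s2", "repo": "docs",
     "commands": ["pip install requests"]},
]
print([s["session_id"] for s in sessions_touching(sessions, "left-pad")])
```

In a real deployment this filter would presumably be a WHERE clause against the central table, as the bullet says. The Python version just shows that the question reduces to string matching once the commands are retained.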

The thing worth being careful about: none of these questions are answerable from inside the model vendor's dashboard either, and they cannot be. The vendor sees tokens. The vendor sees, sometimes, prompts and completions for the duration of a request. The vendor does not see your repo, your file mutations, your shell commands, or which engineer in your org typed which thing. The vendor cannot. The vendor is on the wrong side of the wire. The visibility layer has to live on your side — on the developer's machine, in the VM, in your enterprise server — because that is where the work is actually happening. Token meters are a postcard from the field.

Disclaimers, because of course.

A few honest ones, because the worst version of a post like this is the one that promises an end-of-history finance report and then ships a dashboard.

Bromure's record tells you what the agent did. It does not tell you whether what the agent did was good. A session that wrote 40 files and shipped a PR can still have shipped 40 bad files. The record makes them easier to find, easier to talk about, and easier to revert. It does not, by itself, review them. Diff review is still on you, and on the human in the loop, and on the very tired senior engineer at 5pm.

Bromure's record covers what the agent does inside the VM. It does not cover what the engineer does in their head before they open the agent. The five-minute conversation an engineer has with Claude in their IDE before the actual session starts is not on this tape. That is a real gap. It is not one we pretend to fill, because we cannot fill it without a much weirder product.

Bromure's record is on your side of the wire, which is the entire point — but it also means your side of the wire is where the retention policy, the access controls, and the data-handling contracts have to live. Prompts include code. Code includes secrets your developers shouldn't have pasted in but absolutely did. The record is as sensitive as the work, and you should treat it that way. The storage is yours; the responsibility is yours. ("There is a name for this in finance, and it is 'shrinkage,'" except now the shrinkage is a SQL table and you can audit it.)

And finally, this is one piece of a bigger story. The other piece is the one we already covered — that running coding agents inside a VM with a credential broker is how you stop the next poisoned npm package from walking off with your SSH keys. The security story and the observability story are the same plumbing. They just file expense reports on different days.

Pay the invoice. But pay it knowing what it bought.

The Uber line, "I'm back to the drawing board because the budget I thought I would need is blown away already," is going to be a lot of CTOs' line this year. It was always going to be. Per-token pricing on tools whose appetite is bounded only by how ambitious the model feels at 2pm is going to produce that sentence at every company that adopts it without instrumenting it. The agents are not the problem. The plumbing is.

Bromure Agentic Coding, reluctantly, is what the plumbing looks like when it does its job. Each agent runs in its own VM, each session is a structured record, each record streams to a place your finance team and your CISO can both query, and the agents you already love stay exactly as they were. You will still pay the invoice. You will just, finally, know what it bought — which, the last time anyone checked, was the bare minimum a credit card company asks of its customers.