Return on Tokens (ROT)
Markie Wagner Co-Writes
Welcome to the 192 newly Not Boring people who have joined us since Monday! Join 269,285 smart, curious folks by subscribing here:
Hi friends 👋,
Happy Wednesday and welcome back!
A couple of months ago, my friends Adam and Ben at Genius Ventures asked if they could introduce me to one of their favorite founders, Markie Wagner.
The Markie Wagner? The Choose Good Quests Markie Wagner? The drop an all-timer then go quiet for years, cooking up something spoken of in hushed tones Markie Wagner? The grew up inside of a computer and dreamed as a young girl in Southern California, seriously, of making computers do the work that humans shouldn’t have to Markie Wagner?
Of course I wanted to meet Markie Wagner.
So we met a month ago at Soho Diner and I ordered a milkshake and she asked them to cut up a bowl of fruit. She asked for my lore, which was boring, and I asked for hers, which she wove non-stop for the next hour, landing so naturally on why she’s building what she’s building that it seemed almost predestined.
She also told me, before everyone else came to the same conclusion, that tokenmaxxing was bullshit, because behind closed doors, the Fortune 500 CEOs she works with were all saying some version of “We committed to all this token spend and I have no idea what we’re getting out of it.”
She was right, I think she’s going to be right again, she’s backed by Founders Fund, Kleiner Perkins, Genius Ventures, and OpenAI to go prove it, and now she’s explaining her logic publicly in her first written piece since Good Quests.
So this is where we are heading, according to Markie Wagner.
Let’s get to it.
Today’s Not Boring is brought to you by… Deel
New to global hiring? Start here.
Hiring internationally is complex. Learn what an Employer of Record is and how startups use EORs to hire global talent compliantly.
Return on Tokens (ROT)
Co-Written with Markie Wagner
The promise of AI is that it will turn businesses into software so that they can evolve over millions of tiny iterations. Beautiful, ideal, complex things can only emerge as the result of tremendous trial and error over time. You cannot build perfection, only discover it.
Capitalism is organizational evolution. Millions of businesses compete in the marketplace with offerings that they think customers will want. Some thrive and grow. Others die. Each company evolves, too. People come and go. An experiment becomes a process, a process becomes a web of tacit knowledge. Products are introduced, and products are retired.
This constant evolution is why we enjoy the standard of living we enjoy today, and why ours will look primitive to future generations. Accelerating it is my Good Quest, because if every business can evolve to its ideal form, it will create trillions of dollars of value and unblock all of the other Good Quests.
I dropped out of the research world because it felt like the wrong hill to climb, and I went out into America to just do work, so that I could figure out how to make computers do the work, so that humans could direct the computers in evolving the work.
What I suspected before and learned in my travels is that the way that the market has implemented AI thus far is the wrong way. It’s not endgame. It is too wasteful, too forgetful, and too imprecise. I’ve been in the fucking Sahara Desert out here fighting demons to learn this wisdom.
Tokenmaxxing Clearly Isn’t It
Tokenmaxxing - literally maximizing the number of tokens you or your organization spend, tracked in leaderboards and rewarded with trinkets - was a mass delusion, something like a commercial form of AI psychosis.
Tokenmaxxing was a lab-grown supermeme that worked better than the labs could have hoped.
Picture this. Anthropic and OpenAI release Agents, in the form of Claude Code/Cowork and Codex, respectively, that are basically lab employees working inside customers’ companies. They are given company credit cards with no spending limit (tokenmaxxing suggests the more they spend, the better they’re doing) to spend on behalf of their real employer, the lab. Anthropic ships a bunch of Agents into, say, KPMG, which commits to a certain spend in exchange for discounts (token commits). KPMG’s employees are encouraged to use Agents to do everything they can possibly think of (lots of dashboards). Then these Agents, which again you can think of as digital Anthropic employees with no-limit KPMG credit cards that they can use to spend on Anthropic, run up token bills to their hearts’ content. Employees who direct their Agents to use the most tokens are recognized as AI Innovators.
Certainly, some people recognized that it was a delusion. They would ask questions like, “But are the Agents doing anything useful? Aren’t they just building dashboards? Please can someone show me something useful they’ve built with an Agent?” but those sane few were met with the killer retort: “Skill Issue.”
Some people, they were told, were building immensely valuable things with Agents, the same way that some people had a super hot girlfriend at summer camp but you’ve never met her. If you couldn’t figure out how to do the same, well, welcome to the Permanent Underclass.
Everyone fell for it, for a while. The market incentivized companies to spend tokens, so boards incentivized leaders to spend tokens, so leaders incentivized managers to spend tokens, so managers incentivized employees to spend tokens. Nobody had an incentive to say that the tokens aren’t doing useful stuff.
I talk to these people all the time and every company has some version of the same conversation. Someone who’s running the AI team goes, “We’ve made a ton of progress this quarter. We spent $50 million on tokens.” And everyone nods and claps. “Usage is up. We’ve built 3,000 Agents. We shipped 10 million lines of code.” And you’re like… what? And then I’d ask, “Hey did you measure accuracy for the fraud Agent?” And they’d go, “Yeah… it’s about 50%.” The humans doing that work are at 99%!
But the models improve and everyone’s tokenmaxxing and you don’t want to be the luddite, so you keep lazily throwing Agents at everything and hoping they learn.
All of this happened, by the way, right as the labs switched from subscription-based to consumption-based revenue models, so companies had no time to prepare.
It is no wonder token usage, and therefore lab revenue, went parabolic.
It took Uber, that last era’s poster child of VC-subsidized demand, to break the spell. Its CTO said that the company had burned through its 2026 Claude Code token budget by April. In May, its COO said that the company was having a harder time justifying its AI spend, because the link between AI consumption and shipped features “is not there yet.”
What followed was like that scene in Mean Girls where Tina Fey asks the students to raise their hands if they feel personally victimized by Regina George, and one hand goes up, then all of the rest of the hands go up.
There was the consultant saying his client accidentally burned half a billion dollars on Claude Code. Amazon shut down its AI leaderboard. Legora CTO Jacob Lauritzen told Harry Stebbings that token leaderboards “lead to tokenmaxxing, which is people just burn tokens just to look good. That’s a really stupid way to do anything.” Ramp’s Veeral Patel called it the Token Casino: “useful software wrapped in mechanics that make spend feel like progress. It starts with the oldest trick in the book: abstract the money.” Palantir CEO Alex Karp told the TBPN boys that tokenmaxxing is like “a porn addiction.”
Even Sam Altman, a prominent token vendor himself, admitted on CNBC that “You hear companies saying, ‘I am spending a ton of money on AI, and I know some great stuff is happening, but I know there’s a ton of waste, and you know, when… how long do I have to wait for it to really show up in revenue, and how long do I have to wait to really get the costs under control?’” It had become, he admitted, a “huge issue.”
The issue is that companies have focused on maximizing tokens, assuming that tokens = value.
Every cycle has its dumb metric. In the mid-nineteenth century, the market wanted miles of railroad track as a proxy for future monopoly and the benefits thereof, and so railroads raced to lay miles, often along the same routes as competitors. At the turn of the 21st century, the market wanted eyeballs, and so dot-coms attracted eyeballs and served them up on a platter. In the 2010s, the market wanted top-line gross revenue, and so companies like WeWork delivered top-line gross revenue.
This cycle has tokenmaxxing.
Which is not to say that tokens can’t be valuable. Cornelius Vanderbilt’s New York Central ended up becoming very valuable, as did the Pennsylvania Railroad. Google and Facebook have converted eyeballs to cashflow better than anyone has ever converted anything to cashflow. Uber ended up turning top-line growth into market dominance and turning that into $10 billion in 2025 free cash flow.
The question is always: can the thing generate returns?
For tokens, the question is: what is your Return on Tokens (ROT)?
Return on Tokens
When you invest in a new machine, you expect it to generate a return. When you hire an employee, you expect them to generate a return. Business is the process of making investments big and small and expecting them to create more value than they cost.
Tokens need to be held to the same standard.
Return on Tokens = (Value of Output - Cost of Tokens) / Cost of Tokens × 100
There are two ways, then, to increase your ROT. You can create more valuable things with them, or you can spend less on them. Ideally, you spend less to create more value.
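To make the two levers concrete, here is a minimal sketch of the formula in Python. The function name and the dollar figures are hypothetical, chosen only so the arithmetic is easy to follow; the hard part in practice is putting a real number on the value of the output.

```python
def return_on_tokens(value_of_output: float, cost_of_tokens: float) -> float:
    """Return on Tokens as a percentage: (value - cost) / cost x 100."""
    if cost_of_tokens <= 0:
        raise ValueError("cost_of_tokens must be positive")
    # Multiply before dividing so round inputs give round outputs.
    return (value_of_output - cost_of_tokens) * 100 / cost_of_tokens


# Hypothetical baseline: $100 of tokens produces $80 of value.
baseline = return_on_tokens(80, 100)      # -20.0, a negative ROT

# Lever 1: create more value with the same spend.
more_value = return_on_tokens(150, 100)   # 50.0

# Lever 2: create the same value with a fifth of the spend.
less_spend = return_on_tokens(80, 20)     # 300.0
```

In this toy example, cutting spend moves ROT further than raising value does, because cost sits in both the numerator and the denominator.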
The first thing that companies are focused on, because it is easier to measure than output value, is spending less.
Now that the spell has been broken, cooler heads are proudly discussing “routing” as a way to lower costs. Use Anthropic and OpenAI’s best models for the really big brain stuff, but do most of the work with cheap Chinese open-source models. Coinbase CEO Brian Armstrong’s recent tweet is a good example of this logic:
You can see this in the OpenRouter AI Model Rankings. The move to Chinese models actually showed up in lockstep with consumption-based pricing, although this is a self-selecting group of users that were already thinking about routing tasks to the right models. The rest are scrambling to do the same now.
It’s a good start, but Agents spending tokens, American or Chinese, to figure everything out from scratch is not endgame, either.
Because you know what’s cheaper than Chinese models? Code.
Code, good old-fashioned deterministic code, is not only cheap, it is a better fit for most economically valuable work. We have learned this lesson.
In the past, companies hired humans to do all manner of repetitive tasks. Before “computers” were digital, they were humans.

About half a century ago, we began the process of taking repetitive tasks that humans did, like calculating the trajectory of a missile or the profits of a business, and handing them over to software. Code ran more reliably than even the most reliable human computer. It made no mistakes. It answered instantly. Enter the same numbers and same formulae in the same cells in an Excel spreadsheet anywhere in the world, at any time, and it spits out the same number.
Then we got Agents, and we forgot the lesson. We decided that we needed to throw these pseudo-humans at everything, because everyone else was. Agents are great at some things, but they’re not the right shape for a lot of others.
It’s no wonder companies aren’t getting a positive ROT. All the dashboards have been dashboarded, and now they’re sending Agents to do software’s job.
There is an argument to be made that a lot of companies aren’t getting a positive Return on Tokens because they don’t know how to use them yet, or because they haven’t re-architected their companies to be AI-native yet. This is one of the reasons, perhaps, that both Anthropic and OpenAI have launched consulting subsidiaries to help companies better deploy tokens. And there are certainly examples of startups built during the AI era that seem to be using tokens to great effect, which shows up in their own supersonic revenue growth. If the Old Economy can’t generate a ROT, well, this is creative destruction, baby.
And while it’s certainly true that not all companies are deploying AI equally well, we believe that there are more fundamental structural reasons for the negative ROT: Agents are the wrong architecture for most work.
Agents Aren’t It (For Most Work) Either
There are three structural reasons for Agents’ negative ROT:
1. The Agentic architecture can’t do long-running work at the nines of quality that real economic work requires.
Agents improvise. They’re spawned fresh onto repetitive tasks like every day is their first day on the job, which hurts consistent accuracy. For new features, prototypes, or dashboards, 80% accuracy is fine. For the real repetitive work on which the economy runs, like fraud detection or underwriting decisions, 80% accuracy is 0% usable.
2. Engineers don’t know what to build because they don’t do the work.
Most of the process-driven work we’re describing exists as a combination of written rules, which Agents can ingest, and then like 3,000 tacit rules and sub-rules that live in people’s heads, in offices around the country, far away from the engineers’ San Francisco desks. AI can only evolve what it can touch, which is why it’s been great at coding but has largely failed to do useful things in the enterprise.
3. The original sin is that there are no goals.
If people have no goals then the Agent has no goals, and then the thing achieves no end. Without a goal to hill-climb against, code (whether written by humans or generated by Agents) decays into slop in the limit because there’s no purifying force to evaluate what’s good and bad.
One of the beautiful things about Agents, from a laziness perspective, particularly when you are being encouraged to spend a lot of tokens, is that you can just set them loose without knowing exactly what it is that you’re solving for. They can go spin on a vague instruction for a while, bring something back that’s decent but not perfect, and then go out and spin some more.
This process drives more token spend without delivering any value, which is a fast track to negative ROT.
People are searching for new things for Agents to do assuming that AI will do for everything else what it’s done for code. But it doesn’t have to.
There is a surprising amount of work that is best done with plain old code. The challenge has been that, until recently, there were not enough engineers to turn everything every business does into code, and then update it as things changed. There are now. AI makes writing code trivial, so if we can get the knowledge out of people’s heads, we can turn businesses into code.
AI is a Compiler, Not a Runtime
Basically, software works in two steps, thinking and doing.
First, thinking: you take the goals and requirements for what a piece of software should do and compile them into code that a computer can run.
Then, doing (and doing and doing and doing): every time it needs to do the thing, the code runs cheaply and predictably (or deterministically).
While computer science has a precise definition of a compiler, you can also think of a software company or a software engineer as a compiler. They take the goals and requirements and turn them into code. Then customers buy the code they built, and run it over and over again. This is the beauty of the zero marginal cost software business, and it’s why companies can sell software that took millions of dollars to develop for $20 per month and still generate mouthwatering margins.
The way that most people think about (and use) Agents today is that they replace both the software company and the software. That is the wrong way to think about it.
Agents should replace the software company; they should take goals and requirements in English (or whatever language), and turn them into code that runs over and over and over, deterministically.
Thinking is expensive but happens rarely. Doing is cheap and happens forever.
Agents should do the thinking, code should do the doing.
For most economic work, you want to use humans to figure out the rules, use AI to turn the rules into code, and then run that code forever at near-zero token cost, only bringing the AI back in when the rules change.
Why would you use a prompt to add two numbers? Just write a line of Python, dog.
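Here is a minimal sketch of that split, under loud assumptions: `stub_compile` is a hypothetical stand-in for the AI thinking step (it just looks up two hard-coded rules), `CompiledProcess` is an illustrative name rather than anyone’s real product, and `compile_calls` is a crude proxy for token spend. The point is only the shape: think once, then do forever, and think again only when the rules change.

```python
from typing import Callable, Optional

# A "compiled" rule: plain deterministic code that takes two numbers.
Rule = Callable[[float, float], float]


def stub_compile(rules_text: str) -> Rule:
    """Hypothetical stand-in for the AI 'thinking' step (English in, code out)."""
    if rules_text == "add the two numbers":
        return lambda a, b: a + b
    if rules_text == "add the two numbers plus a flat fee of 5":
        return lambda a, b: a + b + 5
    raise ValueError(f"no compiled form for: {rules_text!r}")


class CompiledProcess:
    """Runs compiled code; recompiles only when the rules text changes."""

    def __init__(self, compile_rules: Callable[[str], Rule]) -> None:
        self._compile_rules = compile_rules
        self._rules_text: Optional[str] = None
        self._code: Optional[Rule] = None
        self.compile_calls = 0  # crude proxy for token spend

    def run(self, rules_text: str, a: float, b: float) -> float:
        if self._code is None or rules_text != self._rules_text:
            # Thinking: expensive, but only happens when the rules change.
            self._code = self._compile_rules(rules_text)
            self._rules_text = rules_text
            self.compile_calls += 1
        # Doing: cheap and deterministic, no model call involved.
        return self._code(a, b)


process = CompiledProcess(stub_compile)

# A thousand runs of the same rule cost exactly one compile.
results = [process.run("add the two numbers", 2, 3) for _ in range(1000)]
calls_after_1000_runs = process.compile_calls

# The world changes: the rule is recompiled once, then runs as code again.
changed = process.run("add the two numbers plus a flat fee of 5", 2, 3)
calls_after_rule_change = process.compile_calls
```

An Agent-as-runtime design would pay for the thinking step on all 1,001 calls. Here it pays twice.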
The current Thinking-Doing Ratio (TDR) in AI implementations is roughly 1000:1, which is not surprising. San Francisco is a Thinking town. Anthropic’s hats say, simply, “thinking.”

Silicon Valley built AI assuming work is mostly thinking, but work is mostly doing.
Chat is the rare exception where you genuinely don’t know what comes next. So maybe customer support chat continues to churn through thinking tokens (although even customer support Agents kick complex problems to humans). Almost nothing else in a business looks like constant improvisation.
So we use Agents for Doing, but Agents are the BlackBerry of doing. They are not where most work will get done inside of companies in five years’ time. It will get done in the deterministic code that they write.
The Agents’ role is to compile work into code, not to run it day to day. Which means that it’s more like CapEx than OpEx. Everything you think is gonna be AI running is just going to be code running.
Everyone thinks the thing that is going to change in the world is that AI is going to become a person, but the real change is that a business is going to become a piece of software.
That’s the world we’re building at Poetic.
Turning Businesses Into Software That Evolves
We are building the antidote to tokenmaxxing: software that tokenminns itself.
Poetic is a new class of software: adaptive like AI, reliable like code.
We use AI as the compiler. We learn everything that a business does by taking in all of the processes that are written down, then going on-site in Nebraska or Providence or wherever the work is done, sitting on people’s shoulders, and asking “What did you just do?” “Why did you do that?” hundreds of times to learn the thousands of hidden tacit rules on which every company runs. Then we turn it into code.
The code is the runtime. When the world stays the same, this code runs the exact same steps every time. When the world changes, it learns, regenerates, tests itself against the objective, and then runs the new code until the world changes again.
The result is 100x less token usage and nines of accuracy on complex tasks. Put differently, each token you spend does 100x more, and it does it right.
The value of the output increases, because Poetic does something that your business actually needs to do, over and over. And the cost of tokens is lower, because Poetic only uses tokens when the world changes. Combined, Poetic generates a clear, measurable Return on Tokens.
We are doing it today, for companies like AIG, SoFi, and Chime. AIG CEO Peter Zaffino said that Poetic has already “achieved 99%+ quality outcomes on multi-hour processes - delivering real enterprise value.”
These companies’ leaders believe what we believe: that every business will have to be re-founded as a software business. The story of the next decade is the beginning of those new businesses. Some will be truly new, built from scratch. Others will be businesses that have existed for hundreds of years, brave enough to reinvent themselves.
Everyone talks about the fact that it took reconfiguring factories around electricity to benefit from electricity, and follows that with the AI equivalent of “so the new businesses that are built to throw a ton of electricity at the problem will win.” What you really need to do is refactor the businesses into code.
Doing that takes a ground game, going deep into the guts of companies, wherever they are, to understand how they work and migrate their logic into programs. We need people to get out there into Minnesota to be like, what the hell do you guys do all day?
Most of our team are engineers, a lot of them ex-Palantir, who spend weeks at a time on-site with customers, learning from them, getting into the nittiest of gritty details.
The term gets a bad rap, but relative to engineers who spend all day at a desk prompting Claude, they are the most Social Engineers. Engineers who understand people, business, and AI will rule the world. If that sounds like you, come join us at Poetic.
It’s hard work, but the biggest mistake of the AI era so far has been believing that anything worth doing could be easy.
This is worth doing. It’s what I’ve wanted to do for as long as I can remember, because all of the institutions that run our world, every business, every government, do so much stuff that doesn’t make sense, operate way more slowly than they should, and gum up the works.
Our goal is to discover the perfect process for every business - the plan, the set of steps that is ideal for achieving your goals. This will evolve as the world evolves.
The role of AI is not “Agentic,” improvising as it goes. It is an evolutionary force: changing, testing, evaluating what plan is most successful and sticking to it until a better one emerges.
In the end, this is what a business is. A piece of living, evolving software reconfiguring, testing, evaluating itself, hurtling towards ideal form at the fastest clip imaginable. Humans exist to define what good looks like, not how to get there. Shaping behavior, directing behavior.
Through running in production, the system gains a record of what happened. What happened, step by step, for every dispute, underwriting case, insurance claim. Thus, every process change becomes testable. The answer to “what if” is known after minutes of backtesting. Run both scenarios in shadow, compare outcomes, decide which is better.
When impact is entirely known, there is little risk - you know exactly what would have happened. Change simply becomes a choice. Then the choice becomes: which outcome is ideal? The process lead, the person responsible for making sure the process achieves the goals above it, simply makes choices.
To make a change, you have to know what the impact of the change is. It’s easy to generate code, hard to know what happens if you run it. After months of running, you’ll be able to ask questions like “What if we approved every dispute under $25?” and know in minutes.
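A what-if like that can be sketched as a replay over recorded cases. Everything below is an illustrative assumption: the `Dispute` fields, the four sample records, and the metrics are made up for the example, and a real record would carry far more context. The sketch runs the recorded policy and the candidate policy side by side in shadow and compares outcomes.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Dispute:
    amount: float      # disputed amount in dollars
    legitimate: bool   # ground truth learned after the fact (hypothetical field)
    approved: bool     # what the production process actually decided


Policy = Callable[[Dispute], bool]  # True means "approve the dispute"


def recorded_policy(d: Dispute) -> bool:
    """Replays whatever production decided at the time."""
    return d.approved


def approve_under_25(d: Dispute) -> bool:
    """Candidate change: auto-approve anything under $25, else keep the old decision."""
    return d.amount < 25 or d.approved


def backtest(cases: List[Dispute], policy: Policy) -> Dict[str, float]:
    """Summarizes what a policy would have done on recorded history."""
    approved = [d for d in cases if policy(d)]
    return {
        "approvals": len(approved),
        "paid_out": sum(d.amount for d in approved),
        "paid_on_bad_disputes": sum(d.amount for d in approved if not d.legitimate),
    }


# Made-up history; a real system would load months of production records.
history = [
    Dispute(amount=10.0, legitimate=True, approved=False),
    Dispute(amount=20.0, legitimate=False, approved=False),
    Dispute(amount=40.0, legitimate=True, approved=True),
    Dispute(amount=300.0, legitimate=False, approved=False),
]

before = backtest(history, recorded_policy)
after = backtest(history, approve_under_25)
```

On this toy history, the candidate approves two more disputes and pays out $30 more, $20 of it on a bad dispute. Whether that trade is worth it is the process lead’s choice, which is the point: the impact is known before the change ships.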
The hill-climb towards the most beautiful process will then begin. Experts experiment, asking what-if questions. Now that humans are no longer bottlenecks, they can begin searching.
This is endgame.
Every business is not just a piece of software; it’s a piece of software constantly editing, testing, evaluating changes. Evolving at the highest frame-rate possible, climbing towards its most correct form. All energy is spent evolving, figuring out the ideal form of the rules.
We don’t use tokens to run the business. We use tokens to turn the business into code and evolve it. We tokenminn to ROTmaxx.
Over billions of years, we have evolved from ocean slime, through trial and error, into fish, lizards, voles, monkeys, and humans.
We don’t want to have to wait billions of years for businesses to evolve into their diverse and ideal forms, and Agents won’t build them. You cannot build the butterfly.
Beautiful, ideal, complex things can only emerge through evolution. I want to speed it up and see how far we can go.
Thanks to Markie for dropping her knowledge and to Adam and Ben for introducing us!
That’s all for today! We’ll be back in your inbox on Friday with a Weekly Dose.
Thanks for reading,
Packy

Thank you for writing this! It was an enjoyable read, and adds some nuance to the token spend / AI's ROT discussion. I'm working on a similar post too, but looking to calculate what it'd cost a business to use LLMs (albeit with some stylised math).
This was a joy to read, and the compiler-versus-runtime distinction is the cleanest cut through the token debate anyone has made. Thank you both for writing it.
One piece of history makes the case even stronger. We tried turning businesses into rules once before. The expert systems of the 1980s were exactly this dream: capture the tacit knowledge in people's heads and turn it into something a machine runs deterministically. They died of two diseases. Extraction was brutally expensive, with teams of knowledge engineers interviewing experts for months. And the rules rotted, because when the world changed, updating them was manual and slow. The field even coined a name for the first problem, the knowledge acquisition bottleneck. What's described here is that old dream with both fatal bugs fixed. AI collapses the cost of extraction, and the regenerate-and-retest loop cures the rot. Poetic reads less like a new idea and more like a forty-year-old correct idea whose missing piece finally arrived, which is usually what the best companies turn out to be.
There's also a strange macro consequence hiding in the thesis. If code does the doing and tokens only burn when the world changes, token demand stops tracking how much work the economy contains and starts tracking how fast the world changes. The labs end up long volatility. Quiet years starve them, chaotic ones feed them. That's a very different business from the one the current buildout is priced for.