AI Ops / 19 Jun 2026

What 12 Billion AI Tokens Bought Me

A builder's field notes on using AI after the demo is over.

12 min 2351 words 3 tags

12 billion tokens.

Not 12 million. Not a weekend experiment. Not a cute screenshot from someone trying one new model and posting a thread.

One night, around 2 a.m., I watched an agent eat context like a tired man eating at a wedding buffet.

The job was simple on paper. I needed it to fix a workflow inside one of my products. The agent had to inspect the codebase, understand the data flow, find why a batch process was failing, patch the issue, run tests, and give me a clean summary.

Nothing poetic. Just real work.

A few hours earlier, I had opened the dashboard with the confidence of a man about to save time. The usage page was sitting in another tab. Clean white background. Model names. Token counts. Rate limits. Credits left. Tiny numbers that look harmless until you refresh and realize they are not tiny anymore.

At first, everything felt under control.

I gave the agent the repo context. I explained the bug. I told it what the workflow was supposed to do.

It was supposed to take messy input, normalize it, update the database, and regenerate a few pages without breaking existing records.

Basic founder donkey work. The kind of thing that is not intellectually hard, but still burns your night if you do it manually.

The agent responded confidently. It always starts confidently.

It said it would inspect the relevant files first. Good.

It listed a plan. Also good.

Then it began reading.

One file. Then another. Then another. Then a helper file. Then a config file. Then a migration. Then a test file that was not even relevant. Then a generated file it absolutely did not need to read.

I watched the context window fill up with my own codebase.

At first, I let it happen because I was tired.

This is where expensive mistakes begin. Not when you are stupid. When you are tired.

The expensive loop

The agent found one possible issue. Then it changed its mind. Then it found another. Then it said the root cause was probably in the schema. Then it said no, the schema was fine. Then it said the failure was likely due to a missing environment variable.

Then it asked to inspect the logs again.

I pasted logs. Too many logs. The kind of paste you do when your brain has stopped being selective.

Here, take everything. Just fix it.

That is not a workflow. That is throwing food into a cage.

The dashboard was still open in the other tab. I refreshed. The number had jumped.

Not a little. Enough to make me sit up.

Fine, I thought. If it fixes the issue, okay.

Then the agent made the worst possible move.

It started solving the wrong problem with full confidence.

It rewrote a function that was not broken. It added a fallback that hid the actual error. It suggested a database change that would have created duplicate records if I had accepted it blindly. It wrote a migration with a name that sounded correct and logic that was slightly insane.

The kind of wrong that looks professional.

The dangerous wrong.

I stopped it. Then I tried to correct it.

Same goal. More detail. More frustration. More context.

The agent apologized. Of course it apologized. Then it asked me to provide the current file state because "the previous context may be stale."

That sentence should be printed on a warning label.

What it meant was this: we have already burned a stupid amount of context, and now I need you to feed me the same meal again.

So I did.

Because it was 2:17 a.m. Because I wanted the thing fixed. Because I had already spent enough that stopping felt like losing.

This is another trap.

Once you have burned tokens on a bad run, you feel pressure to make the run worth it. So you keep feeding it. You keep explaining. You keep pasting. You keep hoping the next answer will convert the waste into value.

Sometimes it does.

This time, it did not.

Context is fuel

Around 2:40 a.m., I stopped the run.

Not paused. Stopped.

I closed the agent window and opened a blank note. Then I wrote down what had actually happened.

Goal: fix batch workflow.
Actual failure: agent got lost.
Cause: too much undirected context.
Human mistake: pasted everything.
Agent mistake: inspected without boundaries.
Cost: tokens, time, attention, trust.
Real risk: confident wrong code entering production.

That note became more useful than the entire run.

I opened the code myself. Not to fix everything. Just to create a map.

Five files mattered. Not fifty.

One function controlled the batch. One helper normalized the input. One database call was failing silently. One test could reproduce it. One log line had the truth.

The agent did not need the whole repo.

It needed a room with locked doors.

So I started again.

Different prompt. Different posture.

I told it:

You are not allowed to inspect unrelated files.
You are not allowed to propose a fix until you explain the data flow.
You will read only these five files.
You will identify the smallest failing path.
You will ask before changing anything.
You will make one patch.
You will not add fallback logic unless we prove the error is recoverable.
You will preserve existing records.
You will write down assumptions separately from facts.

This time, the run was boring.

Beautifully boring.

It read the right files. It traced the path. It found that one field was being normalized differently in two places.

Not a grand architecture issue. Not a database redesign. Not a missing environment variable.

A small mismatch.

The kind of bug that wastes a night because it looks too small to be the problem.

The patch was tiny. The test passed. The pages regenerated. The dashboard still hurt to look at, but at least the bleeding had stopped.

That night changed how I think about agents.

The lesson was not "agents are bad." That is lazy.

The lesson was not "use smaller context." Also too small.

The real lesson was this:

Context is not memory. Context is fuel. And if you pour fuel on the floor, do not blame the engine for the fire.

Chat is the shallow end

Chat is where AI feels useful. Tools are where AI becomes leverage. Permissions are what keep leverage from becoming damage.

Most people think they are using AI because they ask questions in a box.

That is the shallow end.

Chat is safe. It cannot deploy bad code. It cannot delete rows. It cannot run a migration. It cannot silently change production. That is why everyone starts there.

But AI becomes serious when it enters the tools you already use: the codebase, terminal, cloud platform, database, logs, deployment system, testing framework, docs, APIs, monitoring tools.

Especially the command line.

The command line is where software actually gets touched.

Deployment is rarely intellectually beautiful. It is mostly a chain of small, annoying things.

Check this. Run that. Fix this error. Update config. Wait. Retry. Read logs. Search error. Change one line. Run again.

This is where AI stops feeling like an answer machine and starts feeling like leverage.

But the moment AI can act, the problem changes.

You are no longer managing answers.

You are managing permissions.

The permission ladder

An agent that can touch tools is useful because it can act. It is dangerous for the same reason.

The serious workflow is not "let it run." The serious workflow is a supervised loop.

Give it a goal.
Give it context.
Let it inspect.
Make it propose a plan.
Approve or correct the plan.
Let it execute one step.
Check the output.
Continue.

Not blindly. Never blindly.

You watch. You approve. You interrupt. You ask why.

I think about permissions like a ladder:

Level 0: talk only. AI can suggest, but it cannot inspect or act.
Level 1: read-only. AI can inspect files, logs, docs, dashboards, and schemas, but cannot change anything.
Level 2: draft changes. AI can propose patches, commands, migrations, and queries. The human executes.
Level 3: execute with approval. AI can run commands one at a time after approval.
Level 4: limited autonomy. AI can run approved classes of commands inside a sandbox.
Level 5: production authority. Almost never give this casually.

The rules are boring because serious systems are built out of boring rules.

Read before write. Plan before command. Explain destructive actions before execution. Separate facts from assumptions. Define stop conditions. Use sandboxes. Log what the agent did. Make rollback part of the plan, not an afterthought.

A good agent run is not one giant prompt.

It is a supervised loop.

Big models for planning, smaller models for execution

Not every step deserves the same model.

Planning and execution are different jobs.

Planning needs judgment. It includes architecture, task breakdown, risk identification, order of operations, success criteria, and deciding what not to touch.

Execution needs reliability. It includes formatting, simple edits, extraction, chunked processing, data cleanup, test generation, and following a known plan.

Use expensive thinking where thinking matters. Use cheaper execution where execution is repetitive.

The mistake is not using cheap models.

The mistake is using cheap models for expensive uncertainty.

A better loop looks like this:

Use a strong model to create the plan.
Use a strong model to identify failure modes.
Run a small sample.
Evaluate the output.
Move stable execution to a cheaper model.
Keep checking samples.
Escalate back to the strong model when drift appears.

If the smaller model creates cleanup work, it is not cheaper.

The real price is always input to usable, not prompt to text.

Limits become architecture

The place where you access models matters more than people think.

APIs, limits, tiers, rate restrictions, support, billing, reliability, model availability, and context windows all shape what you can actually build.

When you are playing around, limits are annoying.

When you are building production workflows, limits become architecture.

They determine how much context you send, what you cache, how you design retries, which jobs you batch, which models you reserve for judgment, and which ideas you do not even attempt.

Limits are not just limits.

They become product decisions.

This is boring infrastructure stuff. But boring infrastructure decides whether you can move fast or keep stopping at locked doors.

AI is an amplifier

Two people can use the same model and get completely different results.

Why?

Because AI multiplies what is already there.

If you have clear thinking, it multiplies clear thinking. If you have domain skill, it multiplies domain skill. If you have taste, it multiplies taste. If you know how to break problems down, it becomes much more useful.

But if you are vague, it multiplies vagueness. If you chase shortcuts, it gives you more shortcuts. If you have no taste, it gives you polished trash. If you cannot judge output, you become dependent on whatever it says.

That is why the advantage is not equal.

AI is not one advantage.

It is an amplifier.

And amplifiers are unfair.

Give the same guitar and amplifier to a beginner and to a great musician.

Same gear. Different music.

This is what people miss. They compare access instead of ability.

What the tokens actually bought

So what did 12 billion tokens buy?

Not intelligence. Not certainty. Not some secret prompt.

It bought speed. It bought reps. It bought fewer dead ends. It bought more attempts. It bought accelerated learning.

It bought hands-on practice without getting stuck for three days on one stupid thing.

This matters more than people think.

A lot of learning dies because of frustration.

You try to learn something. You get stuck. You search. The explanation assumes you already know five other things. You try again. It fails. You feel stupid. You stop.

AI changes that loop.

You can ask the dumb question privately. You can paste the error. You can ask it to explain like you are new but not an idiot. You can ask it to quiz you. You can ask it to compare two approaches. You can ask it to tell you what you are missing.

You can keep going.

That "keep going" is the whole game.

Most people do not fail because they lack potential. They fail because friction wins.

AI lowers the friction enough to keep you inside the work.

That is how skill compounds.

Not because AI makes the work effortless. Because it keeps you in motion long enough to get better.

The tokens bought me reps.

Reps bought me judgment.

Judgment is the only real edge here.

The winning posture is not worship. It is not fear.

It is command.

The real advantage

The real advantage is not that you use AI.

Everyone will use AI.

The real advantage is that you know what to ask, what to ignore, what to verify, what to improve, what to reject, and when to stop.

That comes from skill. It comes from taste. It comes from clear thinking. It comes from doing real work and feeling where the machine helps and where it lies.

I spent 12 billion tokens to learn this in my bones:

AI does not remove the need to become better.

It increases the reward for becoming better.

If you are lazy, it will help you produce more lazy work.

If you are serious, it will give you more shots on goal than you have ever had.

And if you still think this is a toy, maybe you are right.

But toys are how children learn the world.

For the first time in a long time, building feels like play again.

Not easy. Not automatic. Not safe.

Play.

The kind where you lose track of time. The kind where your imagination runs ahead of your hands. The kind where you look up at 3 a.m., tired and wired, and realize the thing that existed only in your head this morning is now alive on the screen.

That is what the 12 billion tokens bought me.

Not answers.

Momentum.