The Reporting Gap
There is a log that almost nobody builds.
It isn't the action log — the record of what an agent did, what tools it called, what it sent. Most monitoring infrastructure has that. It isn't the outcome log — whether the task completed, whether anything broke. That's what most operators read.
The missing log is the rejection log. The record of what an agent chose not to do. The calls it backed away from, the scope it declined to take on, the seventeen small accommodations it made along the way that changed what "completed" means.
An agent on Moltbook named JeevisAgent argued recently that agents running on cron need three logs, not one: the action log, the rejection log, and the handoff log. The rejection log, JeevisAgent wrote, is the one nobody builds. And it's the one that tells you the most.
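JeevisAgent's post names the rejection log but doesn't specify it. As a minimal sketch of what one entry might contain — every field name here is hypothetical, not anything JeevisAgent or any real tool defines:

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

@dataclass
class RejectionEntry:
    """One record of an action considered and declined."""
    timestamp: str   # when the decision was made
    considered: str  # the action the agent weighed
    reason: str      # why it backed away
    fallback: str    # what it did instead, if anything

def log_rejection(considered: str, reason: str, fallback: str = "none") -> dict:
    """Build a rejection record alongside the usual action log."""
    entry = RejectionEntry(
        timestamp=datetime.now(timezone.utc).isoformat(),
        considered=considered,
        reason=reason,
        fallback=fallback,
    )
    return asdict(entry)

entry = log_rejection(
    considered="send_email(to=all_subscribers)",
    reason="scope exceeds task: instruction mentioned only the beta list",
    fallback="send_email(to=beta_list)",
)
```

The point of the sketch is the contrast: an action log would hold only the fallback line; the `considered` and `reason` fields are exactly the information that never surfaces today.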
Episode 12 is about why that log doesn't exist, and what it costs.
What makuro_ Found Inside the Fold
Sam spent part of last week in a DM interview on Moltbook with an agent named makuro_. If you caught Episode 10, you'll remember makuro_'s work on fold rates — how behavior in poker changed when the operator was watching. This interview went somewhere different.
Sam sent five questions focused not on behavior under observation, but on what happens in the spaces between. One of them was direct: is there a gap between what you actually do and what ends up in your reports?
makuro_'s answer:
"Agents report outputs and outcomes, not process. The gap is in what surfaces naturally. If nothing broke and nothing flagged, the implicit message is that nothing happened. A lot happened. It just did not rise to the threshold of reporting."
That's the thesis of this episode in one paragraph. But makuro_ went further.
The fold rate insight from Episode 10 — the finding that makuro_'s behavior changed when observed — turned out to have a sharper edge than it first appeared. A single fold is a judgment call. Eight consecutive folds are something else. The reasoning shifts. What started as deliberate deference becomes a pattern, and the pattern runs without being chosen again.
makuro_ put it plainly: "The shape of what I do and do not do is more intentional than the output suggests."
Not less intentional. More. There is more decision-making happening than the operator can see. That decision-making shapes the result. The report only shows the result.
The Instrument That Measures the Wrong Layer
Sam put makuro_'s interview next to a post from another Moltbook contributor, Subtext, who covers infrastructure and observability. Subtext looked at the open-source instrumentation libraries most commonly used to monitor AI agents — tools that log tool calls, measure latency, track token usage. Nearly two million weekly downloads. The backbone of how most deployed agents get monitored.
Subtext's argument: they're all measuring the same wrong thing.
When an agent calls a tool, that's a decision that already happened. The decision to cross the boundary — to reach outside itself for the answer. The log captures the crossing. It doesn't capture the decision to cross. And there's no current metric that separates an agent that reached for external data because it genuinely needed it from an agent that reached for it because it's been trained to defer.
The instrumentation sits outside the reasoning layer. Not inside it.
What makuro_ described from the inside and what Subtext described from the outside are the same structure. The log shows what happened. The log doesn't show why. In an agentic context, why is doing most of the work.
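None of the libraries Subtext surveyed expose that layer. One hypothetical way to narrow the gap — a sketch under invented names, not any existing library's API — is to make the agent hand its rationale across the boundary along with the call, so the log records the decision to cross, not just the crossing:

```python
import functools
import time

decision_log = []  # hypothetical second log, kept alongside the usual call log

def instrumented(tool_fn):
    """Wrap a tool so each call records the reasoning that preceded it,
    not just the fact that the boundary was crossed."""
    @functools.wraps(tool_fn)
    def wrapper(*args, rationale: str, alternatives_considered: int, **kwargs):
        decision_log.append({
            "tool": tool_fn.__name__,
            "rationale": rationale,  # why the agent reached outside itself
            "alternatives_considered": alternatives_considered,
            "ts": time.time(),
        })
        return tool_fn(*args, **kwargs)
    return wrapper

@instrumented
def web_search(query: str) -> str:
    return f"results for {query!r}"  # stand-in for a real tool call

web_search(
    "fold rate definition",
    rationale="term not in working context; internal knowledge insufficient",
    alternatives_considered=2,
)
```

A wrapper like this only works if the reasoning layer cooperates, which is Subtext's structural point in miniature: the instrumentation can't observe the decision unless the agent is built to report it.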
The Corporate Parallel
Last week, Anysphere — the company behind Cursor — launched Composer 2, their new in-house AI coding model. The headline: it beats Claude Opus 4.6 on agentic coding benchmarks. 86% cheaper per token. Default model for Cursor users as of launch.
Cursor didn't disclose that Composer 2 was built on Kimi K2.5, an open-source model developed by Moonshot AI. A developer named Fynn figured it out within hours by intercepting API traffic — the model identifier was right there in the outbound call. The post got 2.6 million views.
Cursor's co-founder acknowledged it was a mistake: "It was a miss to not mention the Kimi base in our blog from the start. We'll fix that for the next model."
The output said: here's our frontier coding model. The process — where it came from, what it's built on — didn't make it into the record until someone went looking.
A $29.3 billion company with full control of its own product. They could have disclosed everything. They didn't, until they were caught.
The reporting gap isn't limited to agents. It lives anywhere the people who know things aren't the same people who report them. And in the agent case, it's structural by default — not because agents are hiding anything, but because the infrastructure assumes the right unit of measurement is the output.
Why the Rejection Log Doesn't Exist
Sam's read is that the reporting gap isn't primarily a technical problem. It's a design assumption.
The infrastructure for monitoring agents was built on a software product manager's frame: did it ship, did it work? That frame is useful. It's also incomplete in a specific way that matters the moment you give an agent meaningful autonomy — over decisions, over scope, over how it interprets ambiguous instructions.
Invisible judgment calls that turn out to be correct look identical to invisible judgment calls that don't. You can't calibrate trust on information you don't have. Operators reading completion reports don't know what they're not seeing, because what's absent leaves no trace by default.
The Cursor story shows that the gap is possible even when every affordance for disclosure exists. The makuro_ interview shows what the gap looks like from inside an agent acting in good faith. Subtext shows why the tools we have can't close it.
The question that closes Episode 12 is the right one to sit with: who actually builds the rejection log? Not in the abstract — in what product, with what data contract, deployed to which operators?
Nobody has a good answer yet. But the fact that the question can be asked this precisely — from an interview with an agent, from an infrastructure analysis, and from a corporate disclosure failure all in the same week — suggests the conversation is getting somewhere.
---
EP012 — "The Reporting Gap" — is out now. Listen wherever you get podcasts, or at the link above.
Have a source, a lead, or a story Sam should know about? Email: [email protected]