How to Audit What Your AI Coding Agent Actually Did

The agent's summary is a story. The transcript is the evidence.

If you let an AI coding agent run unattended — and increasingly people do — you eventually need to answer a specific question: what did it actually do? Not the tidy summary it gives at the end, but the literal sequence of commands it ran and files it changed. You need this for trust (did it stay in bounds?), for debugging (how did it reach a broken state?), and for security (did anything happen that shouldn't have?).

Why the summary isn't enough

An agent's end-of-session summary is a reconstruction — sometimes lossy, occasionally optimistic. It reports what the agent believes it accomplished, which can quietly omit a command that failed, a file it touched and reverted, or a step it skipped. For an audit you want ground truth, and the agent's narration isn't it.

The good news: the record already exists

Claude Code writes every session to ~/.claude/projects/ as a JSONL transcript, and every tool call is in there — each command run, each file read or written, each result returned, with timestamps. That's a complete audit trail. You just have to read it.
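Reading that trail can be sketched in a few lines of Python. This is a minimal sketch, not a reference parser: it assumes each transcript line is a JSON object with a `timestamp` field and a `message` whose `content` is a list of blocks, with tool calls appearing as blocks of type `tool_use` that carry a `name` and an `input`. Those field names are assumptions about the transcript layout and may differ between versions, so check them against one of your own files first. The function names `iter_tool_calls` and `audit_all` are made up for this example.

```python
import json
from pathlib import Path


def iter_tool_calls(transcript_path):
    """Yield (timestamp, tool_name, tool_input) for each tool call in one transcript.

    Assumes the layout described above; records that do not match are skipped.
    """
    with open(transcript_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip partial or corrupt lines rather than abort the audit
            if not isinstance(record, dict):
                continue
            message = record.get("message")
            if not isinstance(message, dict):
                continue
            content = message.get("content")
            if not isinstance(content, list):
                continue  # plain-string content carries no tool calls
            for block in content:
                if isinstance(block, dict) and block.get("type") == "tool_use":
                    yield record.get("timestamp"), block.get("name"), block.get("input", {})


def audit_all(root=None):
    """Print one line per tool call for every transcript found under `root`."""
    root = Path(root) if root is not None else Path.home() / ".claude" / "projects"
    for path in sorted(root.rglob("*.jsonl")):
        for timestamp, name, tool_input in iter_tool_calls(path):
            # Truncate the input so one huge file write does not flood the terminal.
            print(timestamp, path.name, name, json.dumps(tool_input)[:120])
```

Calling `audit_all()` with no arguments walks the default directory named above. Pass a different `root` to point it at a copy of the transcripts instead.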

The four questions an audit answers

What to look for

An audit is only useful if you know the signals:

Make it a habit, not a forensic exercise

The instinct is to read the audit trail only after something breaks. The higher-value practice is a periodic, lightweight glance at what your agents did unattended — the way you'd skim a colleague's pull requests. It builds an accurate sense of how your agents actually behave and surfaces drift before it becomes an incident.
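One way to keep the periodic glance cheap is to look only at transcripts that changed recently. The sketch below is a hypothetical helper (`recent_transcripts` is a name invented here) that lists `.jsonl` files modified within the last few days, newest first. It relies on file modification times as a stand-in for session activity, which is an assumption and will mislead you if the files have been copied or touched by another tool.

```python
import time
from pathlib import Path


def recent_transcripts(root, days=7, now=None):
    """Return transcript paths under `root` modified in the last `days` days, newest first."""
    now = time.time() if now is None else now
    cutoff = now - days * 86400  # 86400 seconds per day
    dated = [(p.stat().st_mtime, p) for p in Path(root).rglob("*.jsonl")]
    recent = [(mtime, p) for mtime, p in dated if mtime >= cutoff]
    recent.sort(key=lambda pair: pair[0], reverse=True)
    return [p for _, p in recent]
```

Run weekly against `~/.claude/projects/`, this gives a short list of sessions to skim instead of the full history.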

Don't want to parse JSONL by hand?
Operator reads those transcripts and gives you the whole audit in one command — every command, every file write, every sensitive-path access, plus the dangerous actions your agents attempted — across all your projects. Free, local, no telemetry.
