If you let an AI coding agent run unattended — and increasingly people do — you eventually need to answer a specific question: what did it actually do? Not the tidy summary it gives at the end, but the literal sequence of commands it ran and files it changed. You need this for trust (did it stay in bounds?), for debugging (how did it reach a broken state?), and for security (did anything happen that shouldn't have?).
An agent's end-of-session summary is a reconstruction — sometimes lossy, occasionally optimistic. It reports what the agent believes it accomplished, which can quietly omit a command that failed, a file it touched and reverted, or a step it skipped. For an audit you want ground truth, and the agent's narration isn't it.
Claude Code writes every session to ~/.claude/projects/ as a JSONL transcript, and every tool call is in there: each command run, each file read or written, each result returned, with timestamps. That's a complete audit trail. You just have to read it.
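
Reading the transcript can be as simple as walking the file line by line. The following is a minimal sketch, not a description of a documented format: it assumes each JSONL line is an object with a `timestamp` field and a `message.content` list whose tool-call entries have `type` set to `tool_use`, plus `name` and `input` fields. Those field names, and the helper `iter_tool_calls`, are assumptions made for illustration; check them against a real transcript before relying on them.

```python
import json
from pathlib import Path
from typing import Iterator


def iter_tool_calls(transcript: Path) -> Iterator[dict]:
    """Yield one dict per tool call found in a JSONL transcript.

    The record layout (message.content[*].type == "tool_use") is an
    assumption about the transcript schema, not a documented guarantee.
    """
    with transcript.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip partial or corrupt lines
            if not isinstance(record, dict):
                continue
            message = record.get("message")
            content = message.get("content") if isinstance(message, dict) else None
            if not isinstance(content, list):
                continue  # plain-text messages carry no tool calls
            for block in content:
                if isinstance(block, dict) and block.get("type") == "tool_use":
                    yield {
                        "timestamp": record.get("timestamp"),
                        "tool": block.get("name"),
                        "input": block.get("input", {}),
                    }


if __name__ == "__main__":
    # Walk every transcript under the directory named in the text.
    root = Path.home() / ".claude" / "projects"
    for path in sorted(root.rglob("*.jsonl")):
        print(f"== {path}")
        for call in iter_tool_calls(path):
            summary = json.dumps(call["input"])[:120]
            print(call["timestamp"], call["tool"], summary)
```

Run as a script, this prints one line per tool call in chronological file order, which is usually enough to compare against the agent's own summary.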
Pay particular attention to access to sensitive files (.env, keys, credentials) the task had no reason to touch.
An audit is only useful if you know the signals:
- Reads of .env / key material in a session that shouldn't need them.
- Destructive or risky commands, such as rm -rf on risky paths or curl | sh, whether or not a guardrail stopped them.
The instinct is to read the audit trail only after something breaks. The higher-value practice is a periodic, lightweight glance at what your agents did unattended, the way you'd skim a colleague's pull requests. It builds an accurate sense of how your agents actually behave and surfaces drift before it becomes an incident.
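
Part of that periodic glance can be mechanized by scanning each tool call's input for the signals mentioned above. The sketch below is illustrative only: the function `flag_call`, the flag labels, and the input keys `command` and `file_path` are assumptions introduced here, and the regular expressions are deliberately simple rather than exhaustive (for example, they do not catch `rm -r -f` written as separate flags, and they do not judge whether a deleted path is actually risky).

```python
import re

# Paths or command text that mention secrets-like files.
SENSITIVE = re.compile(
    r"(?:^|[\s/'\"=])\.env\b|\.pem\b|\.key\b|\bid_(?:rsa|ed25519)\b|credentials",
    re.IGNORECASE,
)
# rm with combined recursive+force flags, e.g. -rf or -fr.
RM_RF = re.compile(r"\brm\s+(?:-[a-zA-Z]*\s+)*-[a-zA-Z]*(?:r[a-zA-Z]*f|f[a-zA-Z]*r)")
# A download piped straight into a shell.
PIPE_TO_SHELL = re.compile(r"\b(?:curl|wget)\b[^|;&]*\|\s*(?:sudo\s+)?(?:ba|z)?sh\b")


def flag_call(tool_input: dict) -> list[str]:
    """Return labels for suspicious patterns in one tool call's input.

    Assumes shell commands arrive under "command" and file operations
    under "file_path"; both key names are assumptions for this sketch.
    """
    command = str(tool_input.get("command", ""))
    path = str(tool_input.get("file_path", ""))
    flags = []
    if SENSITIVE.search(path) or SENSITIVE.search(command):
        flags.append("sensitive-file")
    if RM_RF.search(command):
        flags.append("recursive-delete")
    if PIPE_TO_SHELL.search(command):
        flags.append("pipe-to-shell")
    return flags


if __name__ == "__main__":
    examples = [
        {"file_path": "/repo/.env"},
        {"command": "rm -rf /tmp/build"},
        {"command": "curl -fsSL https://example.com/install.sh | sh"},
        {"command": "ls -la"},
    ]
    for tool_input in examples:
        print(tool_input, "->", flag_call(tool_input))
```

Because the check looks at what the agent asked to run rather than at the result, it flags an attempt whether or not a guardrail stopped it. Treat any hit as a prompt to read that part of the transcript, not as a verdict.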